# LLM Post-Filter Experiment: Data Extraction

This notebook extracts GLITCH detections, together with the context an LLM needs to evaluate them, from our baseline experiment results.

## Goals:
1. Extract all True Positive and False Positive GLITCH detections
2. Prepare a data structure for LLM post-filtering
3. Create summary statistics for the experiment

## Data Sources:
- **Baseline**: Chef and Puppet experiment results
- **Target Smells**: Hard-coded secret, Suspicious comment, Use of weak cryptography algorithms
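Goal 1 depends on labeling each GLITCH detection against the oracle. The `hybrid.data_extractor` source is not shown here, so what follows is only a minimal sketch of one way to do it. It assumes both sources can be reduced to `(file_path, line_number, smell_category)` rows. A left merge keeps every GLITCH detection and marks it TP when the oracle flags the same location. The input rows are made-up toy data.

```python
import pandas as pd

# Toy inputs (hypothetical): one row per flagged (file, line, smell) location.
glitch = pd.DataFrame({
    "file_path": ["a.rb", "a.rb", "b.rb"],
    "line_number": [3, 10, 7],
    "smell_category": ["Hard-coded secret"] * 3,
})
oracle = pd.DataFrame({
    "file_path": ["a.rb", "c.rb"],
    "line_number": [3, 5],
    "smell_category": ["Hard-coded secret"] * 2,
})

KEYS = ["file_path", "line_number", "smell_category"]


def label_detections(glitch: pd.DataFrame, oracle: pd.DataFrame) -> pd.DataFrame:
    """Mark a GLITCH detection as TP when the oracle flags the same file, line and smell."""
    # indicator=True adds a `_merge` column: "both" if the key exists in the oracle,
    # "left_only" if only GLITCH reported it (a false positive).
    merged = glitch.merge(
        oracle[KEYS].drop_duplicates(), on=KEYS, how="left", indicator=True
    )
    merged["is_true_positive"] = merged["_merge"] == "both"
    return merged.drop(columns="_merge")


labeled = label_detections(glitch, oracle)
print(labeled[["file_path", "line_number", "is_true_positive"]])
```

Oracle rows with no GLITCH match (`c.rb` line 5 here) are false negatives. They are not extracted, because a post-filter can only discard GLITCH detections and cannot recover missed ones.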


## Setup and Imports


In [4]:
import sys
import pandas as pd
import numpy as np
from pathlib import Path

# The notebook runs from experiments/llm-postfilter/notebooks, three levels
# below the project root; add src/ to sys.path so `hybrid` can be imported
project_root = Path.cwd().parent.parent.parent
sys.path.append(str(project_root / "src"))

from hybrid.data_extractor import GLITCHDetectionExtractor

print(f"Project root: {project_root}")
print(f"Working directory: {Path.cwd()}")

# Initialize the extractor
extractor = GLITCHDetectionExtractor(project_root)
print("✅ Data extractor initialized successfully!")


Project root: /Users/colemei/Library/Mobile Documents/com~apple~CloudDocs/01.Work/04.Master/Course/Research Program/Project/LLM-IaC-SecEval
Working directory: /Users/colemei/Library/Mobile Documents/com~apple~CloudDocs/01.Work/04.Master/Course/Research Program/Project/LLM-IaC-SecEval/experiments/llm-postfilter/notebooks
✅ Data extractor initialized successfully!
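The three `.parent` hops only work when the notebook is launched from `experiments/llm-postfilter/notebooks`. A sketch of a less brittle alternative walks upward until it finds a directory containing `src/hybrid`. The function name `find_project_root` and the marker path are our own choices, not part of the project.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "src/hybrid") -> Path:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker` as a directory."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains '{marker}/'")
```

Usage: `project_root = find_project_root(Path.cwd())`, which gives the same result as the current code when run from the notebooks directory but also works from other subdirectories of the project.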


## Extract Chef Detections


In [5]:
print("🔍 Extracting Chef detections...")

# Extract Chef detections
chef_detections = extractor.extract_detections_for_llm('chef')

print("\n📊 Chef Detection Summary:")
for smell, detections in chef_detections.items():
    tp_count = sum(1 for d in detections if d['is_true_positive'])
    fp_count = sum(1 for d in detections if not d['is_true_positive'])
    print(f"  {smell}:")
    print(f"    Total: {len(detections)} | TP: {tp_count} | FP: {fp_count}")

# Show example detection structure
if chef_detections:
    first_smell = next(iter(chef_detections.keys()))
    if chef_detections[first_smell]:
        print(f"\n📝 Example detection structure ({first_smell}):")
        example = chef_detections[first_smell][0]
        for key, value in example.items():
            print(f"  {key}: {value}")
    else:
        print(f"\n⚠️ No detections found for {first_smell}")
else:
    print("\n❌ No detections extracted!")


INFO:hybrid.data_extractor:Loaded chef data: Oracle=148, GLITCH=166
INFO:hybrid.data_extractor:Hard-coded secret: 9 TP, 37 FP
INFO:hybrid.data_extractor:Suspicious comment: 4 TP, 6 FP
INFO:hybrid.data_extractor:Use of weak cryptography algorithms: 1 TP, 1 FP
INFO:hybrid.data_extractor:Extracted 46 detections for Hard-coded secret
INFO:hybrid.data_extractor:Extracted 10 detections for Suspicious comment
INFO:hybrid.data_extractor:Extracted 2 detections for Use of weak cryptography algorithms


🔍 Extracting Chef detections...

📊 Chef Detection Summary:
  Hard-coded secret:
    Total: 46 | TP: 9 | FP: 37
  Suspicious comment:
    Total: 10 | TP: 4 | FP: 6
  Use of weak cryptography algorithms:
    Total: 2 | TP: 1 | FP: 1

📝 Example detection structure (Hard-coded secret):
  detection_id: chef_Hard-coded secret_chef-boneyard_qa-chef-server-cluster-attributes-default.rb_3
  iac_tool: chef
  smell_category: Hard-coded secret
  glitch_smell: hardcoded-secret
  file_path: chef-boneyard_qa-chef-server-cluster-attributes-default.rb
  line_number: 3
  detection_id_raw: chef-boneyard_qa-chef-server-cluster-attributes-default.rb_3
  is_true_positive: True
  glitch_detection: True
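The extractor returns each detection as a plain dict. Judging from the example above, `detection_id_raw` appears to be `<file_path>_<line_number>`, and `detection_id` prefixes it with the tool and smell category. This is inferred from a single record, not confirmed against `hybrid.data_extractor`. A hypothetical typed record that makes that schema explicit:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical typed view of one extracted detection dict."""

    iac_tool: str
    smell_category: str
    glitch_smell: str
    file_path: str
    line_number: int
    is_true_positive: bool
    glitch_detection: bool = True

    @property
    def detection_id_raw(self) -> str:
        # Inferred from the example output: "<file_path>_<line_number>"
        return f"{self.file_path}_{self.line_number}"

    @property
    def detection_id(self) -> str:
        # Inferred: "<iac_tool>_<smell_category>_<detection_id_raw>"
        return f"{self.iac_tool}_{self.smell_category}_{self.detection_id_raw}"

    def to_dict(self) -> dict:
        # Same keys as the extractor's dicts (key order may differ)
        return {
            **asdict(self),
            "detection_id": self.detection_id,
            "detection_id_raw": self.detection_id_raw,
        }


example = Detection(
    iac_tool="chef",
    smell_category="Hard-coded secret",
    glitch_smell="hardcoded-secret",
    file_path="chef-boneyard_qa-chef-server-cluster-attributes-default.rb",
    line_number=3,
    is_true_positive=True,
)
print(example.detection_id)
# chef_Hard-coded secret_chef-boneyard_qa-chef-server-cluster-attributes-default.rb_3
```

The smell prefix matters because `detection_id_raw` alone would collide if GLITCH reported two different smells on the same line.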


## Extract Puppet Detections


In [6]:
print("🔍 Extracting Puppet detections...")

# Extract Puppet detections
puppet_detections = extractor.extract_detections_for_llm('puppet')

print("\n📊 Puppet Detection Summary:")
for smell, detections in puppet_detections.items():
    tp_count = sum(1 for d in detections if d['is_true_positive'])
    fp_count = sum(1 for d in detections if not d['is_true_positive'])
    print(f"  {smell}:")
    print(f"    Total: {len(detections)} | TP: {tp_count} | FP: {fp_count}")

# Compare with Chef
print("\n🔄 Chef vs Puppet Comparison:")
all_smells = set(chef_detections.keys()) | set(puppet_detections.keys())
for smell in sorted(all_smells):
    chef_count = len(chef_detections.get(smell, []))
    puppet_count = len(puppet_detections.get(smell, []))
    print(f"  {smell}: Chef={chef_count}, Puppet={puppet_count}")

total_detections = sum(len(d) for d in chef_detections.values()) + sum(
    len(d) for d in puppet_detections.values()
)
print(f"\n✅ Total detections extracted: {total_detections}")


INFO:hybrid.data_extractor:Loaded puppet data: Oracle=117, GLITCH=197
INFO:hybrid.data_extractor:Hard-coded secret: 9 TP, 57 FP
INFO:hybrid.data_extractor:Suspicious comment: 9 TP, 14 FP
INFO:hybrid.data_extractor:Use of weak cryptography algorithms: 4 TP, 3 FP
INFO:hybrid.data_extractor:Extracted 66 detections for Hard-coded secret
INFO:hybrid.data_extractor:Extracted 23 detections for Suspicious comment
INFO:hybrid.data_extractor:Extracted 7 detections for Use of weak cryptography algorithms


🔍 Extracting Puppet detections...

📊 Puppet Detection Summary:
  Hard-coded secret:
    Total: 66 | TP: 9 | FP: 57
  Suspicious comment:
    Total: 23 | TP: 9 | FP: 14
  Use of weak cryptography algorithms:
    Total: 7 | TP: 4 | FP: 3

🔄 Chef vs Puppet Comparison:
  Hard-coded secret: Chef=46, Puppet=66
  Suspicious comment: Chef=10, Puppet=23
  Use of weak cryptography algorithms: Chef=2, Puppet=7

✅ Total detections extracted: 154
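The per-tool loops above print counts but do not compute the number the post-filter is trying to improve: GLITCH-only precision, TP / (TP + FP). A minimal sketch, assuming the `{smell: [detection_dict, ...]}` shape returned by `extract_detections_for_llm`, collects both tools into one table. `summarize` is a hypothetical helper.

```python
import pandas as pd


def summarize(detections_by_tool: dict[str, dict[str, list[dict]]]) -> pd.DataFrame:
    """One row per (tool, smell) with TP/FP counts and GLITCH-only precision."""
    rows = []
    for tool, by_smell in detections_by_tool.items():
        for smell, detections in by_smell.items():
            total = len(detections)
            tp = sum(d["is_true_positive"] for d in detections)  # bools sum as ints
            rows.append({
                "iac_tool": tool,
                "smell": smell,
                "total": total,
                "tp": tp,
                "fp": total - tp,
                # Every extracted row is a GLITCH detection, so total == TP + FP
                "precision": tp / total if total else float("nan"),
            })
    return pd.DataFrame(rows)
```

Called as `summarize({"chef": chef_detections, "puppet": puppet_detections})`, the counts above give a `total` column summing to 154, and Hard-coded secret precision of 9/46 ≈ 0.20 for Chef and 9/66 ≈ 0.14 for Puppet.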


## Save Extracted Data


In [7]:
print("💾 Saving extracted data...")

# Save both Chef and Puppet detections
output_dir = project_root / "experiments/llm-postfilter/data"
output_dir.mkdir(parents=True, exist_ok=True)

# Save Chef data
chef_saved = extractor.save_detections('chef', output_dir)
print(f"✅ Chef data saved to {output_dir}")

# Save Puppet data  
puppet_saved = extractor.save_detections('puppet', output_dir)
print(f"✅ Puppet data saved to {output_dir}")

# List saved files
print("\n📁 Saved files:")
for file_path in sorted(output_dir.glob("*.csv")):
    file_size = file_path.stat().st_size
    print(f"  {file_path.name} ({file_size:,} bytes)")

print("\n🎯 Next steps:")
print("1. Design LLM prompts for each security smell category")
print("2. Implement LLM post-filtering pipeline") 
print("3. Evaluate GLITCH+LLM vs GLITCH-only performance")
print("\n🚀 Data extraction phase completed successfully!")


INFO:hybrid.data_extractor:Loaded chef data: Oracle=148, GLITCH=166
INFO:hybrid.data_extractor:Hard-coded secret: 9 TP, 37 FP
INFO:hybrid.data_extractor:Suspicious comment: 4 TP, 6 FP
INFO:hybrid.data_extractor:Use of weak cryptography algorithms: 1 TP, 1 FP
INFO:hybrid.data_extractor:Extracted 46 detections for Hard-coded secret
INFO:hybrid.data_extractor:Extracted 10 detections for Suspicious comment
INFO:hybrid.data_extractor:Extracted 2 detections for Use of weak cryptography algorithms
INFO:hybrid.data_extractor:Saved 46 detections to /Users/colemei/Library/Mobile Documents/com~apple~CloudDocs/01.Work/04.Master/Course/Research Program/Project/LLM-IaC-SecEval/experiments/llm-postfilter/data/chef_hard_coded_secret_detections.csv
INFO:hybrid.data_extractor:Saved 10 detections to /Users/colemei/Library/Mobile Documents/com~apple~CloudDocs/01.Work/04.Master/Course/Research Program/Project/LLM-IaC-SecEval/experiments/llm-postfilter/data/chef_suspicious_comment_detections.csv
INFO:hybrid

💾 Saving extracted data...
✅ Chef data saved to /Users/colemei/Library/Mobile Documents/com~apple~CloudDocs/01.Work/04.Master/Course/Research Program/Project/LLM-IaC-SecEval/experiments/llm-postfilter/data
✅ Puppet data saved to /Users/colemei/Library/Mobile Documents/com~apple~CloudDocs/01.Work/04.Master/Course/Research Program/Project/LLM-IaC-SecEval/experiments/llm-postfilter/data

📁 Saved files:
  chef_detection_summary.csv (181 bytes)
  chef_hard_coded_secret_detections.csv (12,068 bytes)
  chef_suspicious_comment_detections.csv (3,176 bytes)
  chef_use_of_weak_cryptography_algorithms_detections.csv (795 bytes)
  puppet_detection_summary.csv (188 bytes)
  puppet_hard_coded_secret_detections.csv (18,591 bytes)
  puppet_suspicious_comment_detections.csv (6,281 bytes)
  puppet_use_of_weak_cryptography_algorithms_detections.csv (2,504 bytes)

🎯 Next steps:
1. Design LLM prompts for each security smell category
2. Implement LLM post-filtering pipeline
3. Evaluate GLITCH+LLM vs GLITCH-o
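The post-filtering phase will need to read these CSVs back. A sketch, assuming the file-naming pattern visible in the listing above (`<tool>_<smell slug>_detections.csv`, next to a separate `<tool>_detection_summary.csv`). `smell_slug` and `load_detections` are hypothetical helpers, and the slug rule is reverse-engineered from the file names rather than taken from the extractor.

```python
import re
from pathlib import Path

import pandas as pd


def smell_slug(smell: str) -> str:
    """'Hard-coded secret' -> 'hard_coded_secret' (matches the saved file names)."""
    return re.sub(r"[^a-z0-9]+", "_", smell.lower()).strip("_")


def load_detections(data_dir: Path, iac_tool: str) -> pd.DataFrame:
    """Concatenate every per-smell CSV for one tool.

    The glob requires a '_detections.csv' suffix, so '<tool>_detection_summary.csv'
    is skipped.
    """
    paths = sorted(data_dir.glob(f"{iac_tool}_*_detections.csv"))
    if not paths:
        raise FileNotFoundError(f"No {iac_tool}_*_detections.csv files in {data_dir}")
    return pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)
```

If each CSV holds one row per detection, as the `Saved N detections` log lines suggest, `load_detections(output_dir, "chef")` should return 58 rows (46 + 10 + 2).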