# Security Smells Detection Analysis: GLITCH vs Ground Truth

This notebook analyzes how well the GLITCH static analysis tool detects three critical security smells in Chef cookbooks:

1. **Hard-coded secret**
2. **Suspicious comment** 
3. **Use of weak cryptography algorithms**

We compare GLITCH detection results against manually annotated ground truth data to calculate precision, recall, and F1-scores.


## Data Loading and Setup


In [1]:
import pandas as pd
import numpy as np

# Load datasets
oracle_df = pd.read_csv('../../data/oracle-dataset-chef.csv')
glitch_df = pd.read_csv('../../data/GLITCH-chef-oracle.csv')

print(f"Oracle dataset: {oracle_df.shape[0]} records")
print(f"GLITCH dataset: {glitch_df.shape[0]} records")

# Map each oracle category name to the corresponding GLITCH error label
category_mapping = {
    'Hard-coded secret': 'hardcoded-secret',
    'Suspicious comment': 'suspicious comment',
    'Use of weak cryptography algorithms': 'weak cryptography algorithms'
}

# Derive both category lists from the mapping so they cannot drift apart
target_categories = list(category_mapping.keys())
glitch_target_categories = list(category_mapping.values())


Oracle dataset: 148 records
GLITCH dataset: 166 records


## Ground Truth Analysis


In [2]:
# Filter ground truth for target security smells
oracle_security_smells = oracle_df[oracle_df['CATEGORY'].isin(target_categories)].copy()
oracle_security_smells['ID'] = oracle_security_smells['PATH'] + '_' + oracle_security_smells['LINE'].astype(str)

print("Ground Truth Security Smells Distribution:")
print(oracle_security_smells['CATEGORY'].value_counts())
print(f"\nTotal: {len(oracle_security_smells)} instances")

oracle_security_smells.head()


Ground Truth Security Smells Distribution:
Hard-coded secret                      13
Suspicious comment                      4
Use of weak cryptography algorithms     1
Name: CATEGORY, dtype: int64

Total: 18 instances


| | PATH | LINE | CATEGORY | AGREEMENT | ID |
|---|---|---|---|---|---|
| 10 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mys... | 22 | Hard-coded secret | 2 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mys... |
| 11 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mys... | 24 | Hard-coded secret | 2 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mys... |
| 12 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mys... | 26 | Hard-coded secret | 2 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mys... |
| 41 | aws_aws-parallelcluster-cookbook-cookbooks-aws... | 93 | Suspicious comment | 3 | aws_aws-parallelcluster-cookbook-cookbooks-aws... |
| 43 | aws_aws-parallelcluster-cookbook-cookbooks-aws... | 112 | Suspicious comment | 3 | aws_aws-parallelcluster-cookbook-cookbooks-aws... |


## GLITCH Detection Results


In [3]:
# Filter GLITCH results for target security smells
glitch_security_smells = glitch_df[glitch_df['ERROR'].isin(glitch_target_categories)].copy()
glitch_security_smells['ID'] = glitch_security_smells['PATH'] + '_' + glitch_security_smells['LINE'].astype(str)

print("GLITCH Security Smells Detection Results:")
print(glitch_security_smells['ERROR'].value_counts())
print(f"\nTotal: {len(glitch_security_smells)} detections")

glitch_security_smells.head()


GLITCH Security Smells Detection Results:
hardcoded-secret                46
suspicious comment              10
weak cryptography algorithms     2
Name: ERROR, dtype: int64

Total: 58 detections


| | PATH | LINE | ERROR | ID |
|---|---|---|---|---|
| 1 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-gra... | 130 | hardcoded-secret | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-gra... |
| 4 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-gra... | 153 | hardcoded-secret | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-gra... |
| 7 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-gra... | 174 | hardcoded-secret | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-gra... |
| 10 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-gra... | 184 | hardcoded-secret | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-gra... |
| 12 | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mys... | 33 | hardcoded-secret | DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mys... |
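
The cells above build a string `ID` by concatenating `PATH` and `LINE`, and later cells compare sets of those IDs. As an alternative sketch (not the approach this notebook uses), the same matching could be expressed as an outer merge on the key columns with `indicator=True`. The rows below are hypothetical toy data, and the `ERROR` column added to the oracle frame is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; not taken from the real datasets.
oracle = pd.DataFrame({
    "PATH": ["a.rb", "a.rb", "b.rb"],
    "LINE": [10, 20, 5],
    "CATEGORY": ["Hard-coded secret", "Hard-coded secret", "Suspicious comment"],
})
glitch = pd.DataFrame({
    "PATH": ["a.rb", "b.rb", "c.rb"],
    "LINE": [10, 5, 7],
    "ERROR": ["hardcoded-secret", "suspicious comment", "hardcoded-secret"],
})

category_mapping = {
    "Hard-coded secret": "hardcoded-secret",
    "Suspicious comment": "suspicious comment",
}

# Translate oracle categories into GLITCH labels so both frames share the same keys.
keys = ["PATH", "LINE", "ERROR"]
oracle_keys = oracle.assign(ERROR=oracle["CATEGORY"].map(category_mapping))[keys].drop_duplicates()
glitch_keys = glitch[keys].drop_duplicates()

# The _merge column records whether each key came from the oracle, GLITCH, or both.
merged = oracle_keys.merge(glitch_keys, on=keys, how="outer", indicator=True)

tp = int((merged["_merge"] == "both").sum())        # found by GLITCH and in the oracle
fn = int((merged["_merge"] == "left_only").sum())   # in the oracle only
fp = int((merged["_merge"] == "right_only").sum())  # reported by GLITCH only
print(tp, fp, fn)
```

Merging on separate columns avoids any ambiguity from string concatenation and keeps the category as part of the match key.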


## Performance Analysis: Precision and Recall

We match GLITCH detections to ground truth instances by file path and line number within each category, then compute precision, recall, and F1-score per category.
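
For reference, the next cell follows the standard definitions, where TP counts instances present in both sets, FP counts GLITCH detections absent from the ground truth, and FN counts ground truth instances GLITCH did not report:

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```

The code sets a metric to 0 whenever its denominator is 0.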


In [4]:
# Calculate performance metrics by category
results = {}

for oracle_cat, glitch_cat in category_mapping.items():
    # Get ground truth and GLITCH instances for this category
    oracle_instances = set(oracle_security_smells[oracle_security_smells['CATEGORY'] == oracle_cat]['ID'])
    glitch_instances = set(glitch_security_smells[glitch_security_smells['ERROR'] == glitch_cat]['ID'])
    
    # Calculate metrics
    tp = len(oracle_instances.intersection(glitch_instances))
    fp = len(glitch_instances - oracle_instances)
    fn = len(oracle_instances - glitch_instances)
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    
    results[oracle_cat] = {
        'Ground_Truth': len(oracle_instances),
        'GLITCH_Detections': len(glitch_instances),
        'True_Positives': tp,
        'False_Positives': fp,
        'False_Negatives': fn,
        'Precision': precision,
        'Recall': recall,
        'F1_Score': f1_score
    }

# Create results dataframe
results_df = pd.DataFrame(results).T.round(3)
print("Performance Results by Category:")
results_df


Performance Results by Category:


| Category | Ground_Truth | GLITCH_Detections | True_Positives | False_Positives | False_Negatives | Precision | Recall | F1_Score |
|---|---|---|---|---|---|---|---|---|
| Hard-coded secret | 13.0 | 46.0 | 9.0 | 37.0 | 4.0 | 0.196 | 0.692 | 0.305 |
| Suspicious comment | 4.0 | 10.0 | 4.0 | 6.0 | 0.0 | 0.4 | 1.0 | 0.571 |
| Use of weak cryptography algorithms | 1.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.5 | 1.0 | 0.667 |


## Overall Performance Summary


In [5]:
# Calculate overall metrics across all categories.
# IDs are PATH_LINE only, so a match here does not require the categories to agree.
all_oracle_ids = set(oracle_security_smells['ID'])
all_glitch_ids = set(glitch_security_smells['ID'])

overall_tp = len(all_oracle_ids.intersection(all_glitch_ids))
overall_fp = len(all_glitch_ids - all_oracle_ids)
overall_fn = len(all_oracle_ids - all_glitch_ids)

overall_precision = overall_tp / (overall_tp + overall_fp) if (overall_tp + overall_fp) > 0 else 0
overall_recall = overall_tp / (overall_tp + overall_fn) if (overall_tp + overall_fn) > 0 else 0
overall_f1 = 2 * (overall_precision * overall_recall) / (overall_precision + overall_recall) if (overall_precision + overall_recall) > 0 else 0

print("Overall Performance Metrics:")
print(f"Ground Truth Instances: {len(all_oracle_ids)}")
print(f"GLITCH Detections: {len(all_glitch_ids)}")
print(f"True Positives: {overall_tp}")
print(f"False Positives: {overall_fp}")
print(f"False Negatives: {overall_fn}")
print(f"Precision: {overall_precision:.3f}")
print(f"Recall: {overall_recall:.3f}")
print(f"F1-Score: {overall_f1:.3f}")

# Show examples
print("\nExamples of True Positives (first 3):")
if overall_tp > 0:
    correct_detections = list(all_oracle_ids.intersection(all_glitch_ids))[:3]
    for i, detection_id in enumerate(correct_detections):
        oracle_info = oracle_security_smells[oracle_security_smells['ID'] == detection_id].iloc[0]
        print(f"{i+1}. {oracle_info['PATH']}:{oracle_info['LINE']} - {oracle_info['CATEGORY']}")

print("\nExamples of False Negatives (first 3):")
if overall_fn > 0:
    missed_detections = list(all_oracle_ids - all_glitch_ids)[:3]
    for i, detection_id in enumerate(missed_detections):
        oracle_info = oracle_security_smells[oracle_security_smells['ID'] == detection_id].iloc[0]
        print(f"{i+1}. {oracle_info['PATH']}:{oracle_info['LINE']} - {oracle_info['CATEGORY']}")


Overall Performance Metrics:
Ground Truth Instances: 18
GLITCH Detections: 58
True Positives: 14
False Positives: 44
False Negatives: 4
Precision: 0.241
Recall: 0.778
F1-Score: 0.368

Examples of True Positives (first 3):
1. cookbooks_ic-pig-cookbooks-zabbix-attributes-default.rb:52 - Hard-coded secret
2. aws_aws-parallelcluster-cookbook-cookbooks-aws-parallelcluster-slurm-recipes-install_slurm.rb:93 - Suspicious comment
3. cookbooks_ic-pig-cookbooks-zabbix-attributes-default.rb:34 - Hard-coded secret

Examples of False Negatives (first 3):
1. DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mysql.rb:24 - Hard-coded secret
2. chef_mixlib-install-acceptance-windows-server-2012r2-fips-.acceptance-acceptance-cookbook-recipes-verify.rb:12 - Hard-coded secret
3. DrRayWang_Chef-Base-cookbooks-bcpc-recipes-mysql.rb:22 - Hard-coded secret
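
The overall figures above pool all instances, which amounts to micro-averaging: the per-category counts sum to the same totals (14 TP, 44 FP, 4 FN). A minimal sketch, using only the counts reported in the per-category table, contrasts this with a macro average that weights each category equally. The helper `prf` and the variable names are illustrative, not part of the notebook.

```python
# Per-category counts copied from the results table in this notebook.
counts = {
    "Hard-coded secret": {"tp": 9, "fp": 37, "fn": 4},
    "Suspicious comment": {"tp": 4, "fp": 6, "fn": 0},
    "Use of weak cryptography algorithms": {"tp": 1, "fp": 1, "fn": 0},
}


def prf(tp, fp, fn):
    """Return (precision, recall, F1), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Micro average: sum the counts first, then compute the metrics once.
micro = prf(
    sum(c["tp"] for c in counts.values()),
    sum(c["fp"] for c in counts.values()),
    sum(c["fn"] for c in counts.values()),
)

# Macro average: compute the metrics per category, then take the unweighted mean.
per_category = [prf(**c) for c in counts.values()]
macro = tuple(sum(m[i] for m in per_category) / len(per_category) for i in range(3))

print("micro P/R/F1:", [round(x, 3) for x in micro])
print("macro P/R/F1:", [round(x, 3) for x in macro])
```

Because Hard-coded secret accounts for 46 of the 58 detections, the micro average is dominated by that category, while the macro average gives the single weak-cryptography instance the same weight as the other two categories.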
