# Complete Fuzzing Pipeline Analysis

An end-to-end walkthrough of the pipeline on one SWE-bench Verified instance, from loading the patch to the final verdict.

## Pipeline Overview

### STATIC ANALYSIS (Host)
1. Patch Loading
2. Repository Setup  
3. Static Analysis (Pylint, Flake8, Radon, Mypy, Bandit)

### DYNAMIC ANALYSIS (Container)
4. Build Singularity Container
5. Install Dependencies
6. Run Existing Tests
7. Patch Analysis
8. Generate Hypothesis Tests
9. Execute Tests
10. Coverage Analysis
11. Final Verdict

## Imports

In [4]:
import sys, subprocess
from pathlib import Path
import json, time, ast
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

PROJECT_ROOT = Path.cwd()
sys.path.append(str(PROJECT_ROOT))

from swebench_integration import DatasetLoader, PatchLoader
from verifier.dynamic_analyzers.patch_analyzer import PatchAnalyzer
from verifier.dynamic_analyzers.test_generator import HypothesisTestGenerator
from verifier.dynamic_analyzers.singularity_executor import SingularityTestExecutor
from verifier.dynamic_analyzers.coverage_analyzer import CoverageAnalyzer
from verifier.dynamic_analyzers.test_patch_singularity import build_singularity_image, install_package_in_singularity, run_tests_in_singularity
import streamlit.modules.static_eval.static_modules.code_quality as code_quality
import streamlit.modules.static_eval.static_modules.syntax_structure as syntax_structure
from verifier.utils.diff_utils import parse_unified_diff, filter_paths_to_py

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
print("✓ Imports OK")

✓ Imports OK


---
# STATIC ANALYSIS
---

## Stage 1: Load Patch

In [5]:
REPO_FILTER = "scikit-learn/scikit-learn" # Example: pytest-dev/pytest, pylint-dev/pylint

loader = DatasetLoader("princeton-nlp/SWE-bench_Verified", hf_mode=True, split="test")
sample = next(loader.iter_samples(limit=1, filter_repo=REPO_FILTER), None)

if sample:
    print(f"✓ {sample.get('metadata', {}).get('instance_id', 'unknown')}")
    print(f"  Repo: {sample['repo']}")
    print(f"\nPatch preview:\n{sample['patch'][:400]}...")
else:
    raise RuntimeError(f"No sample found for {REPO_FILTER}")

✓ scikit-learn__scikit-learn-10297
  Repo: scikit-learn/scikit-learn

Patch preview:
diff --git a/sklearn/linear_model/ridge.py b/sklearn/linear_model/ridge.py
--- a/sklearn/linear_model/ridge.py
+++ b/sklearn/linear_model/ridge.py
@@ -1212,18 +1212,18 @@ class RidgeCV(_BaseRidgeCV, RegressorMixin):
 
     store_cv_values : boolean, default=False
         Flag indicating if the cross-validation values corresponding to
-        each alpha should be stored in the `cv_values_` attrib...


## Stage 2: Setup Repository


In [6]:
patcher = PatchLoader(sample=sample, repos_root="./repos_temp")
repo_path = patcher.clone_repository()
patch_result = patcher.apply_patch()

print(f"✓ Repo: {repo_path}")
print(f"✓ Patch: {'Applied' if patch_result['applied'] else 'FAILED'}")

[+] Cloning scikit-learn/scikit-learn into /fs/nexus-scratch/ihbas/verifier_harness/repos_temp/scikit-learn__scikit-learn ...
✓ Repo: /fs/nexus-scratch/ihbas/verifier_harness/repos_temp/scikit-learn__scikit-learn
✓ Patch: Applied


## Stage 2b: Apply Test Patch (if exists)

In [7]:
# In SWE-bench, test_patch adds the tests needed to validate the fix.
# Those tests (the FAIL_TO_PASS tests) do not exist in the repo until test_patch is applied.
test_patch = sample.get('metadata', {}).get('test_patch', '')

if test_patch and test_patch.strip():
    print("📝 Applying test_patch...")
    try:
        test_patch_result = patcher.apply_additional_patch(test_patch)
        print(f"✓ Test patch applied: {test_patch_result.get('log', 'success')}")
    except Exception as e:
        print(f"⚠️ Test patch application failed: {e}")
else:
    print("ℹ️  No test_patch in metadata (tests already exist in repo)")

📝 Applying test_patch...
✓ Test patch applied: Additional patch applied successfully.


## Stage 3: Static Analysis

In [8]:
config = {
    'checks': {'pylint': True, 'flake8': True, 'radon': True, 'mypy': True, 'bandit': True},
    'weights': {'pylint': 0.5, 'flake8': 0.15, 'radon': 0.25, 'mypy': 0.05, 'bandit': 0.05}
}

print("🔍 Static analysis...")
cq_results = code_quality.analyze(str(repo_path), sample['patch'], config)
ss_results = syntax_structure.run_syntax_structure_analysis(str(repo_path), sample['patch'])

sqi_data = cq_results.get('sqi', {})
print(f"✓ SQI: {sqi_data.get('SQI', 0)}/100 ({sqi_data.get('classification', 'Unknown')})")

🔍 Static analysis...
✓ SQI: 61.54/100 (Fair)
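
The `weights` in the config above sum to 1.0, which is consistent with the SQI being a weighted average of per-tool scores. The sketch below illustrates that idea only. How `code_quality.analyze` actually combines the tool outputs is not shown in this notebook, and both `weighted_sqi` and the per-tool scores are hypothetical.

```python
def weighted_sqi(scores, weights):
    """Combine per-tool scores (0-100) into one index using normalized weights."""
    # Only tools that have both a score and a weight contribute.
    tools = [tool for tool in weights if tool in scores]
    total_weight = sum(weights[tool] for tool in tools)
    if total_weight == 0:
        return 0.0
    # Dividing by the total weight keeps the result on a 0-100 scale
    # even when some tools are disabled or produced no score.
    return sum(scores[tool] * weights[tool] for tool in tools) / total_weight


weights = {'pylint': 0.5, 'flake8': 0.15, 'radon': 0.25, 'mypy': 0.05, 'bandit': 0.05}
# Hypothetical per-tool scores, not taken from the run above.
scores = {'pylint': 60.0, 'flake8': 80.0, 'radon': 50.0, 'mypy': 100.0, 'bandit': 100.0}
print(round(weighted_sqi(scores, weights), 2))
```

Normalizing by the weights actually used means a disabled check does not drag the index down, which may or may not match the real scoring.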


---
# DYNAMIC ANALYSIS (Container)
---

## Stage 4: Build Container

In [9]:
CONTAINER_IMAGE_PATH = "/fs/nexus-scratch/ihbas/.containers/singularity/verifier-swebench.sif"
PYTHON_VERSION = "3.11"

print("🐳 Building container...")
image_path = build_singularity_image(CONTAINER_IMAGE_PATH, PYTHON_VERSION, force_rebuild=False)

result = subprocess.run(
    ["singularity", "exec", str(image_path), "python", "--version"],
    capture_output=True, text=True
)
print(f"✓ Container: {image_path}")
print(f"  {result.stdout.strip()}")

🐳 Building container...
✅ Singularity image already exists: /fs/nexus-scratch/ihbas/.containers/singularity/verifier-swebench.sif
✓ Container: /fs/nexus-scratch/ihbas/.containers/singularity/verifier-swebench.sif
  Python 3.11.14
  Python 3.11.14
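
The cell above prints the container's Python version but does not compare it with `PYTHON_VERSION`. Because `force_rebuild=False` reuses an existing image, a stale image built for another version would go unnoticed. A minimal check could look like the following sketch, where `python_version_matches` is a hypothetical helper and not part of the harness.

```python
def python_version_matches(version_output, expected):
    """Check that `python --version` output matches an expected 'major.minor'."""
    parts = version_output.strip().split()
    # Expect exactly two tokens, e.g. ["Python", "3.11.14"].
    if len(parts) != 2 or parts[0] != 'Python':
        return False
    # Compare only major and minor, ignoring the patch level.
    return parts[1].split('.')[:2] == expected.split('.')[:2]


print(python_version_matches("Python 3.11.14", "3.11"))
```

In the notebook this would be called as `python_version_matches(result.stdout, PYTHON_VERSION)`.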


## Stage 5: Install Dependencies

In [10]:
print("📦 Installing dependencies...")

install_result = install_package_in_singularity(
    repo_path=Path(repo_path),
    image_path=CONTAINER_IMAGE_PATH
)

if install_result.get("installed"):
    print("✓ Dependencies installed")
elif install_result.get("returncode") != 0:
    print(f"⚠️ Install issues (code {install_result.get('returncode')})")
    print(install_result.get('stderr', '')[-500:])
else:
    # Reached whenever the package was not installed but the return code is 0.
    # That includes the PYTHONPATH fallback, so this message can appear even
    # when setup.py exists (as in the output of this cell).
    print("⚠️ No setup.py/pyproject.toml")

📦 Installing dependencies...
📦 Installing package and dependencies in: /fs/nexus-scratch/ihbas/verifier_harness/repos_temp/scikit-learn__scikit-learn
   Setup files found: setup.py=True, pyproject.toml=False, setup.cfg=True
   Attempting editable install...
   Installing build dependencies...
   Configuring git for build...
   Fetching git tags for version detection...
ℹ️  Editable install not possible (will use PYTHONPATH mode)
   This is normal for packages with C extensions or complex build requirements
⚠️ No setup.py/pyproject.toml


## Stage 6: Run Existing Tests

In [11]:
print("🧪 Running existing tests...\n")

# Get tests from metadata
fail_to_pass = sample.get('metadata', {}).get('FAIL_TO_PASS', '[]')
pass_to_pass = sample.get('metadata', {}).get('PASS_TO_PASS', '[]')

print(f"  FAIL_TO_PASS: {fail_to_pass}")
print(f"  PASS_TO_PASS: {pass_to_pass}\n")

# Parse test lists
try:
    f2p = ast.literal_eval(fail_to_pass) if isinstance(fail_to_pass, str) else fail_to_pass
    p2p = ast.literal_eval(pass_to_pass) if isinstance(pass_to_pass, str) else pass_to_pass
except (ValueError, SyntaxError):  # malformed list literal in the metadata
    f2p, p2p = [], []

all_tests = f2p + p2p

# Use the proper function from test_patch_singularity
test_result = run_tests_in_singularity(
    repo_path=Path(repo_path),
    tests=all_tests,
    image_path=CONTAINER_IMAGE_PATH
)

print(f"Exit: {test_result['returncode']}")
print((test_result['stdout'] + test_result['stderr'])[-1500:])
print(f"\n{'✓' if test_result['returncode'] == 0 else '⚠️'} Tests {'passed' if test_result['returncode'] == 0 else 'had issues'}")

🧪 Running existing tests...

  FAIL_TO_PASS: ["sklearn/linear_model/tests/test_ridge.py::test_ridge_classifier_cv_store_cv_values"]
  PASS_TO_PASS: ["sklearn/linear_model/tests/test_ridge.py::test_ridge", "sklearn/linear_model/tests/test_ridge.py::test_primal_dual_relationship", "sklearn/linear_model/tests/test_ridge.py::test_ridge_singular", "sklearn/linear_model/tests/test_ridge.py::test_ridge_regression_sample_weights", "sklearn/linear_model/tests/test_ridge.py::test_ridge_sample_weights", "sklearn/linear_model/tests/test_ridge.py::test_ridge_shapes", "sklearn/linear_model/tests/test_ridge.py::test_ridge_intercept", "sklearn/linear_model/tests/test_ridge.py::test_toy_ridge_object", "sklearn/linear_model/tests/test_ridge.py::test_ridge_vs_lstsq", "sklearn/linear_model/tests/test_ridge.py::test_ridge_individual_penalties", "sklearn/linear_model/tests/test_ridge.py::test_ridge_cv_sparse_svd", "sklearn/linear_model/tests/test_ridge.py::test_ridge_sparse_svd", "sklearn/linear_model/te
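
The `FAIL_TO_PASS` and `PASS_TO_PASS` values printed above look like JSON-encoded lists, while the cell parses them with `ast.literal_eval`. A slightly more defensive parser could try JSON first and fall back to a Python literal. The sketch below assumes nothing about the loader beyond what the output shows, and `parse_test_list` is a hypothetical helper name.

```python
import ast
import json


def parse_test_list(value):
    """Return a list of test IDs from a list or a JSON/Python-literal string."""
    if isinstance(value, (list, tuple)):
        return list(value)
    if not isinstance(value, str) or not value.strip():
        return []
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        try:
            # Fall back to Python literal syntax, e.g. single-quoted strings.
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return []
    # Anything that is not a list-like value is treated as "no tests".
    return list(parsed) if isinstance(parsed, (list, tuple)) else []


print(parse_test_list('["pkg/tests/test_mod.py::test_one"]'))
```

Returning an empty list for unparseable input mirrors the fallback in the cell above. Whether silently running zero tests is desirable is a separate design question.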

## Stage 7: Analyze Patch

In [12]:
print("🔍 Analyzing patch...")

patch_analyzer = PatchAnalyzer()
modified_files = filter_paths_to_py(list(parse_unified_diff(sample['patch']).keys()))

if modified_files:
    first_file_path = modified_files[0]  # e.g., "sklearn/linear_model/ridge.py"
    first_file = Path(repo_path) / first_file_path
    patched_code = first_file.read_text(encoding='utf-8')
    
    # Pass file_path to parse_patch for proper module detection
    patch_analysis = patch_analyzer.parse_patch(sample['patch'], patched_code, file_path=first_file_path)
    
    print(f"✓ Files: {len(modified_files)}")
    print(f"  Module: {patch_analysis.module_path}")
    print(f"  Functions: {patch_analysis.changed_functions}")
    if patch_analysis.class_context:
        print(f"  Classes: {list(patch_analysis.class_context.values())}")
    print(f"  Lines: {len(patch_analysis.all_changed_lines)}")
else:
    patch_analysis = None
    patched_code = None

🔍 Analyzing patch...
✓ Files: 1
  Module: sklearn.linear_model.ridge
  Functions: ['__init__']
  Classes: ['RidgeClassifierCV']
  Lines: 20
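
The changed-line count reported above has to come from the hunk headers of the unified diff. As a rough illustration of that step, the sketch below maps each patched file to the new-file line numbers of its added lines. It is not the implementation of `parse_unified_diff` or `PatchAnalyzer`, whose internals are not shown here, and `added_lines` is a hypothetical name.

```python
import re

# Matches hunk headers such as "@@ -1212,18 +1212,18 @@"; counts default to 1.
HUNK_RE = re.compile(r'^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@')


def added_lines(diff_text):
    """Map each patched file to the new-file line numbers of its added lines."""
    result = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_lineno = 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith('+'):
                result[current].append(new_lineno)
                new_lineno += 1
                new_left -= 1
            elif line.startswith('-'):
                old_left -= 1
            elif line.startswith('\\'):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both the old and the new file.
                new_lineno += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith('+++ '):
            path = line[4:].strip()
            current = path[2:] if path.startswith('b/') else path
            result.setdefault(current, [])
        else:
            match = HUNK_RE.match(line)
            if match and current is not None:
                old_left = int(match.group(1)) if match.group(1) is not None else 1
                new_lineno = int(match.group(2))
                new_left = int(match.group(3)) if match.group(3) is not None else 1
    return result


example_diff = "\n".join([
    "diff --git a/pkg/mod.py b/pkg/mod.py",
    "--- a/pkg/mod.py",
    "+++ b/pkg/mod.py",
    "@@ -1,3 +1,4 @@",
    " def f(x):",
    "-    return x",
    "+    y = x + 1",
    "+    return y",
    " # end",
])
print(added_lines(example_diff))
```

Tracking the remaining hunk length, rather than looking only at line prefixes, avoids misreading a removed line that starts with `--` as a file header.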


## Stage 8: Generate Tests

In [13]:
if patch_analysis and patch_analysis.changed_functions:
    print("🧬 Generating tests...")
    
    test_generator = HypothesisTestGenerator()
    test_code = test_generator.generate_tests(patch_analysis, patched_code)
    test_count = test_code.count('def test_')
    
    print(f"✓ Generated {test_count} tests")
    print(f"\n{test_code[:600]}...")
else:
    test_code = None
    test_count = 0

🧬 Generating tests...
✓ Generated 1 tests

# Auto-generated change-aware fuzzing tests for patch validation
import pytest
from hypothesis import given, strategies as st, settings
from hypothesis import assume
import sys
from pathlib import Path

# Import from patched module: sklearn.linear_model.ridge
from sklearn.linear_model.ridge import RidgeClassifierCV

def test___init___exists():
    """Verify RidgeClassifierCV.__init__ exists and is callable"""
    assert hasattr(RidgeClassifierCV, '__init__'), 'RidgeClassifierCV should have __init__ method'
    # Note: Full property-based testing of methods requires instance creation
    # whic...


## Stage 9: Execute Tests

In [14]:
if test_code:
    print("🐳 Executing change-aware fuzzing tests...\n")
    
    executor = SingularityTestExecutor(CONTAINER_IMAGE_PATH, timeout=120)
    start = time.time()
    
    try:
        # Pass module_name from patch_analysis for proper coverage tracking
        module_name = patch_analysis.module_path if patch_analysis else None
        
        success, output, coverage_data = executor.run_tests_with_existing_infrastructure(
            Path(repo_path), 
            test_code,
            module_name=module_name
        )
        
        elapsed = time.time() - start
        print(f"{'✓ PASSED' if success else '❌ FAILED'} ({elapsed:.1f}s)\n")

        # Show full output to see actual test results
        print("=== FULL OUTPUT ===")
        print(output)
        print("=== END OUTPUT ===\n")

        # Also show just the test summary lines
        if "passed" in output or "PASSED" in output:
            for line in output.split('\n'):
                if 'test_' in line or 'passed' in line.lower() or 'failed' in line.lower():
                    print(line)
        
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
        success = False
        coverage_data = {}
else:
    success = True
    coverage_data = {}

🐳 Executing change-aware fuzzing tests...

❌ FAILED (6.8s)

=== FULL OUTPUT ===
platform linux -- Python 3.11.14, pytest-9.0.1, pluggy-1.6.0 -- /usr/local/bin/python3.11
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /workspace
configfile: setup.cfg
plugins: hypothesis-6.148.1, cov-7.0.0, timeout-2.4.0, xdist-3.8.0
timeout: 120.0s
timeout method: signal
timeout func_only: False
collecting ... collected 0 items / 2 errors

__________________ ERROR collecting test_fuzzing_generated.py __________________
ImportError while importing test module '/workspace/test_fuzzing_generated.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
sklearn/__check_build/__init__.py:44: in <module>
    from ._check_build import check_build  # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E   ModuleNo

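
The output above shows `collected 0 items / 2 errors`: the generated test module could not be imported, so no test ran at all. That is a different situation from a test running and failing, yet both end up as `success = False`. The sketch below strips ANSI color codes from pytest output and separates the two cases. `summarize_pytest_output` is a hypothetical helper, and the strings it matches are taken from the output shown above.

```python
import re

# ANSI SGR sequences such as "\x1b[1m" or "\x1b[39;49;00m".
ANSI_RE = re.compile(r'\x1b\[[0-9;]*m')
COLLECTED_RE = re.compile(r'collected (\d+) items?(?: / (\d+) errors?)?')


def summarize_pytest_output(output):
    """Report how many tests were collected and whether collection itself failed."""
    clean = ANSI_RE.sub('', output)
    match = COLLECTED_RE.search(clean)
    collected = int(match.group(1)) if match else None
    errors = int(match.group(2)) if match and match.group(2) else 0
    return {
        'collected': collected,
        'collection_errors': errors,
        'import_error': 'ImportError while importing test module' in clean,
    }


sample_output = (
    "\x1b[1mcollecting ... \x1b[0mcollected 0 items / 2 errors\n"
    "\x1b[31mImportError while importing test module '/workspace/test_fuzzing_generated.py'.\n"
)
print(summarize_pytest_output(sample_output))
```

A summary like this would let the verdict stage report an environment problem instead of attributing the failure to the patch.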
## Stage 10: Coverage

In [15]:
if patch_analysis and coverage_data:
    # Check if coverage was intentionally skipped
    if coverage_data.get('_coverage_skipped'):
        print(f"ℹ️  Coverage skipped: {coverage_data.get('_skip_reason', 'N/A')}")
        coverage_result = {'overall_coverage': None, 'total_changed_lines': 0, 'total_covered_lines': 0, 'skipped': True}
    else:
        coverage_analyzer = CoverageAnalyzer()
        coverage_result = coverage_analyzer.calculate_changed_line_coverage(
            coverage_data, patch_analysis.changed_lines, patch_analysis.all_changed_lines
        )
        print(f"📊 Coverage: {coverage_result['overall_coverage']:.1%}")
        print(f"   {coverage_result['total_covered_lines']}/{coverage_result['total_changed_lines']} lines")
else:
    coverage_result = {'overall_coverage': 0.0, 'total_changed_lines': 0, 'total_covered_lines': 0}
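
Changed-line coverage, as used in the cell above, is the share of patched lines that the generated tests executed. The sketch below shows one way to compute it from two plain mappings of file path to line numbers. The input shapes are assumptions made for illustration, and `changed_line_coverage` is not the real `CoverageAnalyzer.calculate_changed_line_coverage`.

```python
def changed_line_coverage(executed_lines, changed_lines):
    """Fraction of changed lines that were executed, per file and overall."""
    per_file = {}
    total_changed = 0
    total_covered = 0
    for path, changed in changed_lines.items():
        changed_set = set(changed)
        covered = changed_set & set(executed_lines.get(path, ()))
        # None marks a file with no changed lines, to avoid dividing by zero.
        per_file[path] = len(covered) / len(changed_set) if changed_set else None
        total_changed += len(changed_set)
        total_covered += len(covered)
    overall = total_covered / total_changed if total_changed else 0.0
    return {
        'overall_coverage': overall,
        'total_changed_lines': total_changed,
        'total_covered_lines': total_covered,
        'per_file': per_file,
    }


result = changed_line_coverage({'a.py': [1, 2, 3]}, {'a.py': [2, 3, 4, 5]})
print(f"{result['overall_coverage']:.1%}")
```

One caveat: a patch that mostly edits docstrings or comments, as the preview in Stage 1 suggests for this instance, contains lines that can never be executed, so a practical version would probably restrict the denominator to executable lines.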

## Stage 11: Verdict

In [16]:
sqi_score = sqi_data.get('SQI', 0) / 100.0
coverage_score = coverage_result.get('overall_coverage', 0.0)
coverage_skipped = coverage_result.get('skipped', False)

if sqi_score < 0.5:
    verdict = 'REJECT'
    reason = f'Poor SQI ({sqi_score:.2f})'
elif not success:
    verdict = 'REJECT'
    reason = 'Tests failed'
elif coverage_skipped:
    verdict = 'ACCEPT' if test_count > 0 else 'WARNING'
    reason = 'Coverage N/A (pytest internal module)' if test_count > 0 else 'No tests generated'
elif coverage_score is not None and coverage_score < 0.5:
    verdict = 'WARNING'
    reason = f'Low coverage ({coverage_score:.1%})'
else:
    verdict = 'ACCEPT'
    reason = 'All checks passed'

print("\n" + "="*80)
print("VERDICT")
print("="*80)
print(f"{verdict}: {reason}")
if coverage_score is not None:
    print(f"\nSQI: {sqi_score:.2%} | Tests: {test_count} | Coverage: {coverage_score:.1%}")
else:
    print(f"\nSQI: {sqi_score:.2%} | Tests: {test_count} | Coverage: N/A (pytest internal)")
print("="*80)


VERDICT
REJECT: Tests failed

SQI: 61.54% | Tests: 1 | Coverage: 0.0%
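
The decision rules in the cell above are easier to test when written as a pure function. The sketch below restates them with the same thresholds (0.5 for both SQI and coverage) and the same precedence. `decide_verdict` and its parameter names are hypothetical, and the skipped-coverage reason is shortened.

```python
def decide_verdict(sqi_score, tests_passed, test_count, coverage_score,
                   coverage_skipped=False, min_sqi=0.5, min_coverage=0.5):
    """Return (verdict, reason) using the same precedence as the notebook cell."""
    if sqi_score < min_sqi:
        return 'REJECT', f'Poor SQI ({sqi_score:.2f})'
    if not tests_passed:
        return 'REJECT', 'Tests failed'
    if coverage_skipped:
        if test_count > 0:
            return 'ACCEPT', 'Coverage N/A'
        return 'WARNING', 'No tests generated'
    if coverage_score is not None and coverage_score < min_coverage:
        return 'WARNING', f'Low coverage ({coverage_score:.1%})'
    return 'ACCEPT', 'All checks passed'


# Values from the run above: SQI 61.54, generated tests failed, coverage 0.0.
print(decide_verdict(0.6154, False, 1, 0.0))
```

With the values from this run the function returns the same `REJECT: Tests failed` result, because the test check takes precedence over the coverage check.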
