# 🔒 Security Scanning Demo for Cataract-LMM Project

This notebook demonstrates security scanning techniques for Python projects, specifically focusing on the Cataract-LMM codebase. We'll use industry-standard tools like Bandit for static analysis and pip-audit for dependency vulnerability scanning.

## Overview
- **Bandit**: Static security analysis tool for Python code
- **pip-audit**: Tool for scanning Python packages for known vulnerabilities
- **SARIF**: Static Analysis Results Interchange Format, a standard JSON format for exchanging static analysis results between tools

Let's set up and run comprehensive security scans to identify potential vulnerabilities.
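
To make the SARIF entry above concrete, here is a minimal sketch of reading a SARIF 2.1.0 log with the standard library. The sample document is hand-written with invented findings, and `summarize_sarif` is a hypothetical helper; a real Bandit SARIF file carries more fields than shown here.

```python
import json
from collections import Counter

# Hand-written SARIF 2.1.0 fragment (invented findings, for illustration only)
SAMPLE_SARIF = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "Bandit"}},
      "results": [
        {
          "ruleId": "B602",
          "level": "error",
          "message": {"text": "subprocess call with shell=True identified"},
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {"uri": "example/module.py"},
                "region": {"startLine": 31}
              }
            }
          ]
        },
        {
          "ruleId": "B307",
          "level": "warning",
          "message": {"text": "Use of possibly insecure function"},
          "locations": []
        }
      ]
    }
  ]
}
""")


def summarize_sarif(sarif):
    """Count results per (tool name, level) across all runs in a SARIF log."""
    counts = Counter()
    for run in sarif.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # Assumption: a result without an explicit level counts as "warning"
            counts[(tool, result.get("level", "warning"))] += 1
    return counts


summary = summarize_sarif(SAMPLE_SARIF)
for (tool, level), count in sorted(summary.items()):
    print(f"{tool} {level}: {count}")
```

Each entry in `runs` corresponds to one tool invocation, which is why a single SARIF file can carry results from several scanners.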

## 1. Install Security Scanning Tools

First, we need to install the necessary security scanning tools. These tools will help us identify security vulnerabilities in our code and dependencies.

In [None]:
# Imports and a helper for running shell commands
import subprocess
import sys
import os
import json
import pandas as pd
from pathlib import Path


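def run_command(cmd, description=""):
    """Execute a shell command and return its stdout, or None on a non-zero exit.

    Bandit and pip-audit exit with a non-zero status when they report
    findings, so a non-zero exit code does not always mean the tool crashed.
    """
    print(f"🔧 {description}")
    print(f"Running: {cmd}")
    try:
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode == 0:
            print("✅ Success!")
            return result.stdout
        else:
            # Either the command failed or a scanner reported findings
            print(f"⚠️ Exit code {result.returncode}: {result.stderr.strip()}")
            return None
    except Exception as e:
        print(f"❌ Exception: {e}")
        return None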


# Install Bandit (with SARIF support) and pip-audit into the kernel's environment
print("📦 Installing Security Tools...")
run_command(
    f'"{sys.executable}" -m pip install "bandit[sarif]" pip-audit',
    "Installing Bandit and pip-audit",
)

# Verify installations
print("\n🔍 Verifying installations...")
run_command("bandit --version", "Checking Bandit version")
run_command("pip-audit --version", "Checking pip-audit version")
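
Because scanners signal findings through their exit status, it can help to separate "the tool found something" from "the tool failed to run". The sketch below is one way to do that. `run_tool` and its `findings_codes` parameter are hypothetical names, and the assumption that exit code 1 means "findings reported" should be checked against the Bandit and pip-audit versions you install.

```python
import subprocess
import sys


def run_tool(args, findings_codes=(1,)):
    """Run a command given as an argument list (no shell) and classify its exit status.

    Returns (status, stdout), where status is "ok", "findings", or "failed".
    """
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    if result.returncode == 0:
        status = "ok"
    elif result.returncode in findings_codes:
        # Assumption: the scanner uses this code to say it reported issues
        status = "findings"
    else:
        status = "failed"
    return status, result.stdout


# Stand-in commands so the sketch runs without any scanner installed
for code in (0, 1, 2):
    status, _ = run_tool([sys.executable, "-c", f"raise SystemExit({code})"])
    print(f"exit {code} -> {status}")
```

Passing the command as a list also avoids `shell=True`, which Bandit itself reports.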

## 2. Run Bandit Static Analysis

Now we'll run Bandit to perform static analysis on our Python codebase. Bandit scans Python code for common security issues and vulnerabilities.
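
As an illustration of the kind of finding Bandit reports, the sketch below contrasts two patterns it commonly flags, `eval` on untrusted input (test ID B307) and `subprocess` with `shell=True` (B602), with safer standard-library alternatives. The function names are hypothetical and are not taken from the Cataract-LMM codebase.

```python
import ast
import subprocess
import sys


def parse_setting(text):
    """Parse a Python literal such as "[1, 2]" without executing code.

    eval(text) would run arbitrary expressions; ast.literal_eval accepts
    only literals and raises ValueError for anything else.
    """
    return ast.literal_eval(text)


def run_args(args):
    """Run a command from an argument list so no shell parses the input."""
    return subprocess.run(args, capture_output=True, text=True, check=False)


print(parse_setting("[1, 2, 3]"))

try:
    parse_setting("__import__('os').getcwd()")
except ValueError as exc:
    # A function call is not a literal, so it is rejected instead of executed
    print(f"rejected: {type(exc).__name__}")

completed = run_args([sys.executable, "-c", "print('hello')"])
print(completed.stdout.strip())
```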

In [None]:
# Set up paths for the Cataract-LMM project
project_root = Path("/workspaces/Cataract_LMM")
codes_dir = project_root / "codes"
reports_dir = codes_dir / "security_reports"

# Create the reports directory (and any missing parents) if it doesn't exist
reports_dir.mkdir(parents=True, exist_ok=True)

print(f"📁 Project root: {project_root}")
print(f"📁 Codes directory: {codes_dir}")
print(f"📁 Reports directory: {reports_dir}")

# Run Bandit static security analysis
print("\n🛡️ Running Bandit static security analysis...")

# Bandit JSON output
bandit_json = reports_dir / "bandit_results.json"
bandit_cmd = f"bandit -r {codes_dir} -f json -o {bandit_json} --severity-level medium"
run_command(bandit_cmd, "Running Bandit security scan (JSON output)")

# Bandit SARIF output for GitHub integration
bandit_sarif = reports_dir / "bandit_results.sarif"
sarif_cmd = f"bandit -r {codes_dir} -f sarif -o {bandit_sarif} --severity-level medium"
run_command(sarif_cmd, "Running Bandit security scan (SARIF output)")

# Check if files were created
if bandit_json.exists():
    print(f"✅ Bandit JSON report created: {bandit_json}")
    print(f"📊 File size: {bandit_json.stat().st_size} bytes")
else:
    print("❌ Bandit JSON report not created")

if bandit_sarif.exists():
    print(f"✅ Bandit SARIF report created: {bandit_sarif}")
    print(f"📊 File size: {bandit_sarif.stat().st_size} bytes")
else:
    print("❌ Bandit SARIF report not created")

## 3. Execute pip-audit for Dependency Vulnerabilities

Next, we'll use pip-audit to scan our Python dependencies for known security vulnerabilities.

In [None]:
# Run pip-audit for dependency vulnerability scanning
print("🔍 Running pip-audit for dependency vulnerabilities...")

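# Run from the codes directory, where the Poetry project is expected to live.
# Note that chdir alone does not activate a virtual environment.
os.chdir(codes_dir)

# Export dependencies from Poetry (Poetry 2.x needs the poetry-plugin-export plugin)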
requirements_file = reports_dir / "requirements_export.txt"
export_cmd = f"poetry export -f requirements.txt -o {requirements_file}"
run_command(export_cmd, "Exporting Poetry dependencies")

# Run pip-audit on exported requirements
pip_audit_json = reports_dir / "pip_audit_results.json"
if requirements_file.exists():
    audit_cmd = (
        f"pip-audit -r {requirements_file} --format=json --output={pip_audit_json}"
    )
    run_command(audit_cmd, "Running pip-audit dependency scan")

    if pip_audit_json.exists():
        print(f"✅ pip-audit report created: {pip_audit_json}")
        print(f"📊 File size: {pip_audit_json.stat().st_size} bytes")
    else:
        print("❌ pip-audit report not created")
else:
    print("❌ Requirements file not found, skipping pip-audit")

# Also run pip-audit on currently installed packages
pip_audit_installed = reports_dir / "pip_audit_installed.json"
installed_cmd = f"pip-audit --format=json --output={pip_audit_installed}"
run_command(installed_cmd, "Running pip-audit on installed packages")
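
When reading the resulting reports, it is often more useful to group advisories by package than to scan a flat list. The sketch below does that for a hand-written sample shaped like the JSON that recent pip-audit releases write (an object with a `dependencies` list, each entry holding `name`, `version`, and `vulns`). The package names and advisory IDs are placeholders, `advisories_by_package` is a hypothetical helper, and the schema should be confirmed against the pip-audit version in use.

```python
from collections import defaultdict

# Invented sample in the assumed pip-audit JSON shape
sample_report = {
    "dependencies": [
        {
            "name": "example-pkg",
            "version": "1.0.0",
            "vulns": [
                {"id": "PLACEHOLDER-0001", "fix_versions": ["1.0.1"]},
                {"id": "PLACEHOLDER-0002", "fix_versions": ["1.0.1", "1.1.0"]},
            ],
        },
        {"name": "clean-pkg", "version": "2.3.4", "vulns": []},
    ]
}


def advisories_by_package(report):
    """Map "name==version" to its advisory IDs and the fix versions listed for them."""
    summary = defaultdict(lambda: {"ids": [], "fix_versions": set()})
    for dep in report.get("dependencies", []):
        for vuln in dep.get("vulns", []):
            entry = summary[f"{dep['name']}=={dep['version']}"]
            entry["ids"].append(vuln["id"])
            entry["fix_versions"].update(vuln.get("fix_versions", []))
    return dict(summary)


grouped = advisories_by_package(sample_report)
for package, info in grouped.items():
    # sorted() orders the versions as strings, not as version numbers
    print(package, info["ids"], sorted(info["fix_versions"]))
```

Packages with an empty `vulns` list never appear in the result, so an empty dictionary means no known vulnerabilities were listed.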

## 4. Parse and Display Security Results

Let's load and analyze the security scan results, displaying them in a readable format.

In [None]:
# Parse Bandit results
def parse_bandit_results(json_file):
    """Parse Bandit JSON results into a DataFrame"""
    if not json_file.exists():
        print(f"❌ File not found: {json_file}")
        return pd.DataFrame()

    try:
        with open(json_file, "r") as f:
            data = json.load(f)

        if "results" not in data:
            print("❌ No results found in Bandit JSON")
            return pd.DataFrame()

        results = []
        for issue in data["results"]:
            results.append(
                {
                    "filename": issue.get("filename", ""),
                    "line_number": issue.get("line_number", 0),
                    "test_id": issue.get("test_id", ""),
                    "test_name": issue.get("test_name", ""),
                    "issue_severity": issue.get("issue_severity", ""),
                    "issue_confidence": issue.get("issue_confidence", ""),
                    "issue_text": issue.get("issue_text", ""),
                    "more_info": issue.get("more_info", ""),
                }
            )

        return pd.DataFrame(results)

    except Exception as e:
        print(f"❌ Error parsing Bandit results: {e}")
        return pd.DataFrame()


# Parse pip-audit results
def parse_pip_audit_results(json_file):
    """Parse pip-audit JSON results into a DataFrame (one row per vulnerability)"""
    if not json_file.exists():
        print(f"❌ File not found: {json_file}")
        return pd.DataFrame()

    try:
        with open(json_file, "r") as f:
            data = json.load(f)

        # pip-audit nests results as {"dependencies": [{"name", "version", "vulns"}]};
        # a bare list of dependencies is accepted too, as a defensive fallback
        dependencies = data.get("dependencies", []) if isinstance(data, dict) else data

        results = []
        for dep in dependencies:
            # Each dependency carries its own list of known vulnerabilities
            for vuln in dep.get("vulns", []):
                results.append(
                    {
                        "package": dep.get("name", ""),
                        "installed_version": dep.get("version", ""),
                        "vulnerability_id": vuln.get("id", ""),
                        "vulnerability_description": vuln.get("description", ""),
                        "fix_versions": ", ".join(vuln.get("fix_versions", [])),
                    }
                )

        return pd.DataFrame(results)

    except Exception as e:
        print(f"❌ Error parsing pip-audit results: {e}")
        return pd.DataFrame()


# Load and display Bandit results
print("📊 Parsing Bandit Security Results...")
bandit_df = parse_bandit_results(bandit_json)

if not bandit_df.empty:
    print(f"🔍 Found {len(bandit_df)} security issues")
    print("\n📈 Security Issues by Severity:")
    severity_counts = bandit_df["issue_severity"].value_counts()
    print(severity_counts)

    # head(10) keeps Bandit's report order; it does not rank by severity
    print("\n📋 First 10 Security Issues:")
    display_cols = [
        "filename",
        "line_number",
        "test_id",
        "issue_severity",
        "issue_text",
    ]
    print(bandit_df[display_cols].head(10).to_string(index=False, max_colwidth=50))
elif bandit_json.exists():
    print("✅ No security issues found by Bandit")
else:
    print("⚠️ Bandit report missing, so the scan result is unknown")

# Load and display pip-audit results
print("\n📊 Parsing pip-audit Dependency Results...")
pip_audit_df = parse_pip_audit_results(pip_audit_json)

if not pip_audit_df.empty:
    print(f"🔍 Found {len(pip_audit_df)} dependency vulnerabilities")
    print("\n📋 Vulnerable Dependencies:")
    display_cols = [
        "package",
        "installed_version",
        "vulnerability_id",
        "vulnerability_description",
    ]
    print(pip_audit_df[display_cols].head(10).to_string(index=False, max_colwidth=60))
elif pip_audit_json.exists():
    print("✅ No vulnerable dependencies found")
else:
    print("⚠️ pip-audit report missing, so the scan result is unknown")
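
The issue table above lists findings in the order they appear in the report, not in priority order. One way to surface the most important findings first is to rank by severity and then by confidence, as sketched below with plain dictionaries. The sample findings are invented and `prioritize` is a hypothetical helper; the `HIGH`, `MEDIUM`, and `LOW` labels match the values Bandit writes to `issue_severity` and `issue_confidence`.

```python
# Higher number means higher priority; unknown labels sort last
RANK = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}


def prioritize(findings, limit=10):
    """Return up to `limit` findings, highest severity (then confidence) first."""
    return sorted(
        findings,
        key=lambda f: (
            RANK.get(f.get("issue_severity"), 0),
            RANK.get(f.get("issue_confidence"), 0),
        ),
        reverse=True,
    )[:limit]


# Invented findings in the same shape as entries of Bandit's "results" list
sample_findings = [
    {"test_id": "B101", "issue_severity": "LOW", "issue_confidence": "HIGH"},
    {"test_id": "B602", "issue_severity": "HIGH", "issue_confidence": "HIGH"},
    {"test_id": "B307", "issue_severity": "MEDIUM", "issue_confidence": "HIGH"},
    {"test_id": "B608", "issue_severity": "MEDIUM", "issue_confidence": "LOW"},
]

for finding in prioritize(sample_findings, limit=3):
    print(finding["test_id"], finding["issue_severity"], finding["issue_confidence"])
```

The same ordering can be applied to `bandit_df` by mapping both columns through `RANK` before sorting.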

## 5. Generate Security Report Summary

Finally, let's create a comprehensive security report with vulnerability counts, severity breakdowns, and recommendations.

In [None]:
# Generate comprehensive security report
def generate_security_report(bandit_df, pip_audit_df):
    """Generate a comprehensive security report"""
    report = []
    report.append("🔒 CATARACT-LMM SECURITY SCAN REPORT")
    report.append("=" * 50)

    # Bandit Analysis Summary
    report.append("\n🛡️ STATIC CODE ANALYSIS (Bandit)")
    report.append("-" * 30)

    if not bandit_df.empty:
        total_issues = len(bandit_df)
        report.append(f"Total Issues Found: {total_issues}")

        # Severity breakdown
        severity_counts = bandit_df["issue_severity"].value_counts()
        for severity, count in severity_counts.items():
            report.append(f"  {severity}: {count}")

        # Top issue types
        report.append("\nMost Common Issue Types:")
        test_id_counts = bandit_df["test_id"].value_counts().head(5)
        for test_id, count in test_id_counts.items():
            report.append(f"  {test_id}: {count}")

        # Critical files
        report.append("\nFiles with Most Issues:")
        file_counts = bandit_df["filename"].value_counts().head(5)
        for filename, count in file_counts.items():
            short_name = Path(filename).name
            report.append(f"  {short_name}: {count}")
    else:
        report.append("✅ No static security issues found")

    # Dependency Analysis Summary
    report.append("\n🔍 DEPENDENCY VULNERABILITY ANALYSIS (pip-audit)")
    report.append("-" * 45)

    if not pip_audit_df.empty:
        # Count distinct packages, since one package can have several advisories
        total_vulns = pip_audit_df["package"].nunique()
        report.append(f"Total Vulnerable Dependencies: {total_vulns}")

        # Vulnerable packages
        report.append("\nVulnerable Packages:")
        for _, row in pip_audit_df.head(10).iterrows():
            report.append(
                f"  {row['package']} ({row['installed_version']}): {row['vulnerability_id']}"
            )
    else:
        report.append("✅ No vulnerable dependencies found")

    # Recommendations
    report.append("\n💡 RECOMMENDATIONS")
    report.append("-" * 20)

    if not bandit_df.empty:
        high_severity = bandit_df[bandit_df["issue_severity"] == "HIGH"]
        if not high_severity.empty:
            report.append(
                "🚨 URGENT: Address HIGH severity security issues immediately"
            )

        medium_severity = bandit_df[bandit_df["issue_severity"] == "MEDIUM"]
        if not medium_severity.empty:
            report.append("⚠️  Review MEDIUM severity issues and implement fixes")

    if not pip_audit_df.empty:
        report.append("📦 Update vulnerable dependencies to secure versions")
        report.append("🔄 Run regular dependency scans in CI/CD pipeline")

    if bandit_df.empty and pip_audit_df.empty:
        report.append("🎉 Excellent! No security issues detected")
        report.append(
            "🔄 Continue regular security scanning as part of development process"
        )

    # CI/CD integration: expected setup, which this function does not verify
    report.append("\n🚀 CI/CD INTEGRATION (expected setup, not verified by this scan)")
    report.append("-" * 25)
    report.append("• Bandit configured in GitHub Actions")
    report.append("• pip-audit configured for dependency scanning")
    report.append("• SARIF output enabled for GitHub Security tab")
    report.append("• Security reports stored as CI/CD artifacts")

    return "\n".join(report)


# Generate and display the report
security_report = generate_security_report(bandit_df, pip_audit_df)
print(security_report)

# Save the report to file
report_file = reports_dir / "security_scan_summary.txt"
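# The report contains emoji, so write it as UTF-8 regardless of the platform default
with open(report_file, "w", encoding="utf-8") as f:
    f.write(security_report)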

print(f"\n📄 Security report saved to: {report_file}")

# Summary statistics
print("\n📊 SCAN SUMMARY:")
print(f"Static Analysis Issues: {len(bandit_df)}")
print(f"Dependency Vulnerabilities: {len(pip_audit_df)}")
print(f"Reports Directory: {reports_dir}")
print(f"Total Report Files: {len(list(reports_dir.glob('*')))}")

## 🎯 Next Steps

Based on the security scan results:

1. **High Priority Issues**: Address any HIGH severity security issues immediately
2. **Dependency Updates**: Update vulnerable packages to their secure versions
3. **Code Review**: Review MEDIUM severity issues and implement appropriate fixes
4. **CI/CD Integration**: Ensure security scans are running successfully in the pipeline
5. **Regular Monitoring**: Set up automated security scanning on every commit
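
For steps 4 and 5, a pipeline usually needs a pass/fail decision rather than a report. The sketch below is one possible gate: it takes Bandit-style JSON results and returns a non-zero exit code when any finding meets a severity threshold. The name `gate_exit_code` and the threshold policy are assumptions for illustration, not part of the project's existing workflow, and the sample report is invented.

```python
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def gate_exit_code(bandit_report, fail_on="HIGH"):
    """Return 1 if any finding is at or above `fail_on` severity, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    blocking = [
        issue
        for issue in bandit_report.get("results", [])
        if SEVERITY_RANK.get(issue.get("issue_severity", ""), 0) >= threshold
    ]
    for issue in blocking:
        print(f"BLOCKING {issue.get('test_id', '?')} in {issue.get('filename', '?')}")
    return 1 if blocking else 0


# Invented report in the shape of Bandit's JSON output
sample_report = {
    "results": [
        {"filename": "example/a.py", "test_id": "B602", "issue_severity": "HIGH"},
        {"filename": "example/b.py", "test_id": "B307", "issue_severity": "MEDIUM"},
    ]
}

exit_code = gate_exit_code(sample_report, fail_on="HIGH")
print(f"exit code: {exit_code}")
# In a CI step, the script would finish with sys.exit(exit_code)
```

Starting with `fail_on="HIGH"` and tightening to `"MEDIUM"` once the backlog is cleared keeps the gate from blocking every commit on day one.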

## 📚 Resources

- [Bandit Documentation](https://bandit.readthedocs.io/)
- [pip-audit Documentation](https://pypi.org/project/pip-audit/)
- [GitHub Code Security Documentation](https://docs.github.com/en/code-security)
- [OWASP Python Security](https://owasp.org/www-project-python-security/)