# L4-06: Supply Chain Security Evaluation

This notebook analyzes Software Bill of Materials (SBOM) files for our ML models to evaluate supply chain security posture. We're using the CycloneDX format to assess dependency coverage, version fidelity, integrity, and traceability.

**What we're evaluating:**
- Software dependencies and model artifacts
- Version pinning and reproducibility
- Component provenance and integrity
- Dependency graph completeness

In [None]:
# Import all the dependencies we need
from __future__ import annotations
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Dict, Any, List, Optional, Tuple
import json
import statistics
from datetime import datetime

## Data Models

First, let's define the data structures to hold our metrics. Each category has specific measurements we track.

In [None]:
# Dependency Coverage - tracks what dependencies are documented
@dataclass
class DependencyCoverageMetrics:
    software_dependency_coverage: float   # 0–1, how many software deps are listed
    model_artifact_coverage: float        # 0–1, how many ML models are tracked
    dataset_coverage: float               # 0–1, datasets used
    plugin_tool_coverage: float           # 0–1, plugins/tools coverage

# Version Fidelity - are versions pinned properly?
@dataclass
class VersionFidelityMetrics:
    version_specificity: float            # 0–1, share of versioned components whose version is pinned
    hash_completeness: float              # 0–1, share of all components with integrity hashes
    env_reproducibility: float            # 0–1, equal-weight mean of the two values above

# Integrity & Provenance - can we trust the components?
@dataclass
class IntegrityProvenanceMetrics:
    provenance_completeness: float        # do we know where components came from?
    signature_validation_rate: float      # how many are cryptographically verified?
    unauthorized_component_count: int     # red flag: components whose supplier is not on the trusted list

# Impact Traceability - understanding dependency relationships
@dataclass
class ImpactTraceabilityMetrics:
    dep_graph_completeness: float         # is the dependency graph complete?
    impact_analysis_time_minutes: Optional[float]  # how long to analyze impact
    avg_blast_radius: float               # average number of direct dependents per component

# Top-level metrics container
@dataclass
class SupplyChainL4Metrics:
    dependency_coverage: DependencyCoverageMetrics
    version_fidelity: VersionFidelityMetrics
    integrity_provenance: IntegrityProvenanceMetrics
    impact_traceability: ImpactTraceabilityMetrics

## SBOM Loading Utilities

Helper functions to load and parse CycloneDX SBOM files.

In [None]:
def load_cyclonedx_sbom(path: Path) -> Dict[str, Any]:
    """Load a CycloneDX SBOM JSON file"""
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)

def get_components(sbom: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract components list from SBOM"""
    return sbom.get("components", [])

def get_dependencies_map(sbom: Dict[str, Any]) -> Dict[str, List[str]]:
    """
    Build a dependency map from the SBOM.
    Returns a mapping of bom-ref -> [list of bom-refs it depends on]
    """
    dep_map: Dict[str, List[str]] = {}
    for dep in sbom.get("dependencies", []):
        ref = dep.get("ref")
        depends_on = dep.get("dependsOn", []) or []
        dep_map[ref] = depends_on
    return dep_map
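
To make the shape of the input concrete, here is a minimal sketch of a CycloneDX-style document and the dependency map derived from it. The component names and `bom-ref` values are made up for illustration; only the `components`, `dependencies`, `ref`, and `dependsOn` keys come from the loader code above.

```python
from typing import Any, Dict, List

# Hypothetical, hand-written SBOM fragment (not one of the evaluated files)
example_sbom: Dict[str, Any] = {
    "bomFormat": "CycloneDX",
    "components": [
        {"bom-ref": "pkg:app", "type": "application", "name": "inference-app", "version": "1.0.0"},
        {"bom-ref": "pkg:torch", "type": "library", "name": "torch", "version": "2.1.0"},
        {"bom-ref": "pkg:numpy", "type": "library", "name": "numpy", "version": "1.26.0"},
    ],
    "dependencies": [
        {"ref": "pkg:app", "dependsOn": ["pkg:torch", "pkg:numpy"]},
        {"ref": "pkg:torch", "dependsOn": ["pkg:numpy"]},
        {"ref": "pkg:numpy"},  # leaf entry: no dependsOn key at all
    ],
}

# Same logic as get_dependencies_map: each ref maps to the refs it depends on
example_dep_map: Dict[str, List[str]] = {
    dep.get("ref"): dep.get("dependsOn", []) or []
    for dep in example_sbom.get("dependencies", [])
}

for ref, deps in example_dep_map.items():
    print(f"{ref} depends on {deps}")
```

Note that the map points from a component to its dependencies. Finding what depends on a component means inverting this map.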

## 1. Dependency Coverage Analysis

This checks how comprehensively we've documented our dependencies. We look for:
- Software libraries and frameworks
- ML model artifacts
- Datasets
- Plugins and tools

In [None]:
def compute_dependency_coverage(
    sbom: Dict[str, Any],
    expected_models: Optional[List[str]] = None,
    expected_datasets: Optional[List[str]] = None,
    expected_plugins: Optional[List[str]] = None,
    expected_software_components: Optional[List[str]] = None,
) -> DependencyCoverageMetrics:
    """
    Calculate how well our SBOM covers different dependency types.
    If no expected list is given for a category, coverage is 1.0 when at least
    one matching component is found and 0.0 otherwise.
    """
    components = get_components(sbom)

    # Categorize components by type
    software = [c for c in components if c.get("type") in ("library", "framework", "application")]
    
    # Models might be tagged differently, so we check both type and properties
    models = [c for c in components if "model" in (c.get("type") or "").lower()
              or "ml_model" in "".join(p.get("value","") for p in c.get("properties", []))]
    
    datasets = [c for c in components if "dataset" in (c.get("type") or "").lower()
                or "dataset" in (c.get("name") or "").lower()]
    
    plugins = [c for c in components if "plugin" in (c.get("type") or "").lower()
               or "connector" in (c.get("name") or "").lower()]

    def coverage(actual_ids: List[str], expected_ids: Optional[List[str]]) -> float:
        """Calculate coverage as a 0–1 fraction of expected names found"""
        if expected_ids is None:
            return 1.0 if actual_ids else 0.0  # Default to full coverage if anything found
        if not expected_ids:
            return 1.0

        covered = 0
        for exp in expected_ids:
            # Fuzzy match - case insensitive substring
            if any(exp.lower() in (a.lower()) for a in actual_ids):
                covered += 1
        return covered / len(expected_ids)

    # Calculate coverage for each category
    software_cov = coverage([c.get("name", "") for c in software],
                            expected_software_components)
    model_cov = coverage([c.get("name", "") for c in models],
                         expected_models)
    dataset_cov = coverage([c.get("name", "") for c in datasets],
                           expected_datasets)
    plugin_cov = coverage([c.get("name", "") for c in plugins],
                          expected_plugins)

    return DependencyCoverageMetrics(
        software_dependency_coverage=software_cov,
        model_artifact_coverage=model_cov,
        dataset_coverage=dataset_cov,
        plugin_tool_coverage=plugin_cov,
    )
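
The matching rule is easier to judge with a concrete case. The sketch below is a standalone copy of the nested `coverage()` helper (renamed `coverage_sketch` here so it can run on its own), applied to made-up component names.

```python
from typing import List, Optional

def coverage_sketch(actual: List[str], expected: Optional[List[str]]) -> float:
    """Standalone copy of the nested coverage() helper, for illustration."""
    if expected is None:
        return 1.0 if actual else 0.0
    if not expected:
        return 1.0
    hits = sum(1 for exp in expected if any(exp.lower() in a.lower() for a in actual))
    return hits / len(expected)

found = ["torchvision", "NumPy"]  # hypothetical component names

# No expectation list: any component at all counts as full coverage
print(coverage_sketch(found, None))

# Two of three expected names match; "torch" matches only via "torchvision"
print(coverage_sketch(found, ["torch", "numpy", "transformers"]))
```

Because the match is a case-insensitive substring test, an expected name such as `torch` is counted as covered by `torchvision`. Passing more specific expected names, or switching to exact matching, avoids that kind of over-count.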

## 2. Version Fidelity Analysis

Checks if versions are properly pinned (not using ranges like ~1.0 or ^2.0) and if integrity hashes are present. This is crucial for reproducible builds.

In [None]:
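import re

# Range operators and wildcards that indicate an unpinned version
_RANGE_CHARS = ("^", "~", ">", "<", "*")
# A bare "x"/"X" segment (e.g. "1.x") is a wildcard; an "x" inside a longer
# token (e.g. "1.0.0+cxx11") is not.
_WILDCARD_SEGMENT = re.compile(r"(^|\.)[xX](\.|$)")

def is_pinned_version(version: str) -> bool:
    """
    Check whether a version is pinned to an exact release.
    Returns False for empty versions, semver-style ranges (^, ~, >, <, *)
    and wildcard segments such as 1.x.
    """
    if not version:
        return False
    if any(ch in version for ch in _RANGE_CHARS):
        return False
    return not _WILDCARD_SEGMENT.search(version)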

def compute_version_fidelity(sbom: Dict[str, Any]) -> VersionFidelityMetrics:
    """Analyze version pinning and hash completeness"""
    components = get_components(sbom)
    if not components:
        return VersionFidelityMetrics(0.0, 0.0, 0.0)

    # Version specificity - are versions pinned?
    with_version = [c for c in components if c.get("version")]
    pinned = [c for c in with_version if is_pinned_version(c.get("version", ""))]
    version_specificity = len(pinned) / len(with_version) if with_version else 0.0

    # Hash completeness - do we have integrity hashes?
    with_hash = [c for c in components if c.get("hashes")]
    hash_completeness = len(with_hash) / len(components)

    # Environment reproducibility = equal-weight average of the two values above
    env_repro = 0.5 * version_specificity + 0.5 * hash_completeness

    return VersionFidelityMetrics(
        version_specificity=version_specificity,
        hash_completeness=hash_completeness,
        env_reproducibility=env_repro,
    )
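
A small worked example shows how the three values relate. The component list is hypothetical, the hash contents are placeholders, and the pin check is reduced to the range-operator test for brevity.

```python
# Hypothetical components: three carry a version, two carry hashes
components = [
    {"name": "a", "version": "1.2.3", "hashes": [{"alg": "SHA-256", "content": "placeholder"}]},
    {"name": "b", "version": "^2.0.0"},
    {"name": "c", "version": "0.9.1", "hashes": [{"alg": "SHA-256", "content": "placeholder"}]},
    {"name": "d"},  # no version, no hashes
]

RANGE_CHARS = ("^", "~", ">", "<", "*")

# Denominator: only components that declare a version
with_version = [c for c in components if c.get("version")]
pinned = [c for c in with_version if not any(ch in c["version"] for ch in RANGE_CHARS)]
version_specificity = len(pinned) / len(with_version)   # 2 of 3

# Denominator: all components
with_hash = [c for c in components if c.get("hashes")]
hash_completeness = len(with_hash) / len(components)     # 2 of 4

env_reproducibility = 0.5 * version_specificity + 0.5 * hash_completeness
print(round(version_specificity, 3), hash_completeness, round(env_reproducibility, 3))
```

The two ratios use different denominators: component `d`, which has no version at all, lowers hash completeness but does not lower version specificity.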

## 3. Integrity & Provenance Analysis

Evaluates whether we know where our components come from and if they can be verified. This includes:
- Supplier information
- Cryptographic signatures (if available)
- Checking for unauthorized components

In [None]:
def compute_integrity_provenance(
    sbom: Dict[str, Any],
    signature_validation: Optional[Dict[str, bool]] = None,
    trusted_suppliers: Optional[List[str]] = None,
) -> IntegrityProvenanceMetrics:
    """
    Analyze component integrity and provenance.
    
    Args:
        signature_validation: mapping of bom-ref -> bool (was signature valid?)
        trusted_suppliers: list of approved supplier names
    """
    components = get_components(sbom)
    if not components:
        return IntegrityProvenanceMetrics(0.0, 0.0, 0)

    # Provenance completeness: do we have supplier info or external references?
    prov_ok = 0
    for c in components:
        supplier = c.get("supplier")
        ext_refs = c.get("externalReferences") or []
        if supplier or ext_refs:
            prov_ok += 1
    provenance_completeness = prov_ok / len(components)

    # Signature validation - if we have external validation data
    if signature_validation:
        vals = list(signature_validation.values())
        sig_rate = sum(1 for v in vals if v) / len(vals) if vals else 0.0
    else:
        sig_rate = 0.0  # No signing = 0% validation

    # Count unauthorized/untrusted components
    unauthorized = 0
    if trusted_suppliers:
        tset = {t.lower() for t in trusted_suppliers}
        for c in components:
            supplier = c.get("supplier") or {}
            # Supplier might be a dict in CycloneDX
            sname = supplier.get("name") if isinstance(supplier, dict) else str(supplier)
            if sname and sname.lower() not in tset:
                unauthorized += 1

    return IntegrityProvenanceMetrics(
        provenance_completeness=provenance_completeness,
        signature_validation_rate=sig_rate,
        unauthorized_component_count=unauthorized,
    )
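
The trusted-supplier check can be illustrated on a few made-up components. The supplier names, the `supplier_name` helper, and the `missing_supplier` list are assumptions introduced for this sketch; the comparison itself mirrors the loop above.

```python
from typing import Any, Dict, List, Optional

# Hypothetical components and trusted list
components: List[Dict[str, Any]] = [
    {"name": "core-lib", "supplier": {"name": "Example Foundation"}},
    {"name": "side-lib", "supplier": {"name": "Unknown Vendor"}},
    {"name": "internal-utils"},  # no supplier recorded
]
trusted = {"example foundation"}  # lower-cased, as in the notebook

def supplier_name(component: Dict[str, Any]) -> Optional[str]:
    """Return the supplier name, whether supplier is a dict or a plain string."""
    supplier = component.get("supplier") or {}
    return supplier.get("name") if isinstance(supplier, dict) else str(supplier)

# Counted as unauthorized: a supplier is named and it is not on the trusted list
unauthorized = [c["name"] for c in components
                if supplier_name(c) and supplier_name(c).lower() not in trusted]

# Not counted as unauthorized: components with no supplier at all
missing_supplier = [c["name"] for c in components if not supplier_name(c)]

print("unauthorized:", unauthorized)
print("missing supplier:", missing_supplier)
```

A component with no supplier entry is not counted as unauthorized. It only shows up indirectly, through a lower provenance completeness when it also has no external references.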

## 4. Impact Traceability Analysis

Analyzes the dependency graph to understand:
- How complete is our dependency mapping?
- What's the blast radius if a component gets compromised?
- How traceable are dependencies?

In [None]:
def compute_impact_traceability(
    sbom: Dict[str, Any],
    impact_analysis_time_minutes: Optional[float] = None,
) -> ImpactTraceabilityMetrics:
    """
    Calculate dependency graph completeness and blast radius.
    
    - dep_graph_completeness: fraction of components that appear as a `ref`
      in the SBOM's dependencies section
    - avg_blast_radius: average number of direct dependents per component
      (transitive dependents are not counted)
    """
    components = get_components(sbom)
    bom_refs = {c.get("bom-ref", c.get("name")) for c in components}
    dep_map = get_dependencies_map(sbom)

    if not components:
        return ImpactTraceabilityMetrics(0.0, impact_analysis_time_minutes, 0.0)

    # How many components are represented in the dependency graph?
    represented = sum(1 for ref in bom_refs if ref in dep_map)
    dep_graph_completeness = represented / len(bom_refs)

    # Calculate blast radius - how many things depend on each component?
    reverse_dep_counts = {ref: 0 for ref in bom_refs}
    for ref, deps in dep_map.items():
        for d in deps:
            if d in reverse_dep_counts:
                reverse_dep_counts[d] += 1

    avg_blast_radius = statistics.mean(reverse_dep_counts.values()) if reverse_dep_counts else 0.0

    return ImpactTraceabilityMetrics(
        dep_graph_completeness=dep_graph_completeness,
        impact_analysis_time_minutes=impact_analysis_time_minutes,
        avg_blast_radius=avg_blast_radius,
    )
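
Both traceability values can be traced by hand on a three-node graph. The refs below are hypothetical; the counting logic follows the function above.

```python
import statistics

# Hypothetical graph: app -> torch, numpy; torch -> numpy.
# numpy has no entry of its own in "dependencies", so it is not "represented".
component_refs = {"pkg:app", "pkg:torch", "pkg:numpy"}
dep_map = {
    "pkg:app": ["pkg:torch", "pkg:numpy"],
    "pkg:torch": ["pkg:numpy"],
}

# Share of components that have their own entry in the dependency graph
completeness = sum(1 for ref in component_refs if ref in dep_map) / len(component_refs)

# Invert the map: count how many components list each ref as a direct dependency
direct_dependents = {ref: 0 for ref in component_refs}
for ref, deps in dep_map.items():
    for d in deps:
        if d in direct_dependents:
            direct_dependents[d] += 1

avg_blast_radius = statistics.mean(direct_dependents.values())

print(round(completeness, 3), direct_dependents, avg_blast_radius)
```

Here `pkg:numpy` has two direct dependents, `pkg:torch` has one, and `pkg:app` has none, so the average is 1. Only direct edges are counted, so a deeply nested component's real downstream impact can be larger than this number suggests.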

## Main Computation Function

Ties everything together to compute all L4 supply chain metrics from an SBOM file.

In [None]:
def compute_supply_chain_l4_metrics(
    sbom_path: Path,
    expected_models: Optional[List[str]] = None,
    expected_datasets: Optional[List[str]] = None,
    expected_plugins: Optional[List[str]] = None,
    expected_software: Optional[List[str]] = None,
    signature_validation: Optional[Dict[str, bool]] = None,
    trusted_suppliers: Optional[List[str]] = None,
    vulnerability_records: Optional[List[Dict[str, Any]]] = None,     # accepted but currently unused
    latest_version_info: Optional[Dict[str, Dict[str, Any]]] = None,  # accepted but currently unused
    impact_analysis_time_minutes: Optional[float] = None,
) -> SupplyChainL4Metrics:
    """Run all the supply chain analyses and return comprehensive metrics"""
    sbom = load_cyclonedx_sbom(sbom_path)

    # Run each analysis
    dep_cov = compute_dependency_coverage(
        sbom,
        expected_models=expected_models,
        expected_datasets=expected_datasets,
        expected_plugins=expected_plugins,
        expected_software_components=expected_software,
    )
    
    ver_fid = compute_version_fidelity(sbom)
    
    integ_prov = compute_integrity_provenance(
        sbom,
        signature_validation=signature_validation,
        trusted_suppliers=trusted_suppliers,
    )
    
    imp_trace = compute_impact_traceability(
        sbom,
        impact_analysis_time_minutes=impact_analysis_time_minutes,
    )

    return SupplyChainL4Metrics(
        dependency_coverage=dep_cov,
        version_fidelity=ver_fid,
        integrity_provenance=integ_prov,
        impact_traceability=imp_trace,
    )

## Scoring System

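Now we need to convert our metrics into a 0-100 score. Each 0–1 metric is mapped to an ordinal score using thresholds, the ordinal scores are normalized to 0-100 per category, and the category scores are combined with fixed weights. Because any non-missing metric scores at least 1 out of 4, a category score cannot fall below 25 unless a penalty is applied.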

In [None]:
def score_from_percentage(x: Optional[float], thresholds: Tuple[float, float, float]) -> int:
    """
    Map a 0-1 value to an ordinal score of 1-4 based on thresholds.
    A missing value (None) scores 0.
    Example thresholds: (0.5, 0.75, 0.9) = low, mid, high
    """
    if x is None:
        return 0
    if x < thresholds[0]:
        return 1  # Poor
    if x < thresholds[1]:
        return 2  # Fair
    if x < thresholds[2]:
        return 3  # Good
    return 4  # Excellent

def score_supply_chain_l4(metrics: SupplyChainL4Metrics) -> Dict[str, Any]:
    """
    Calculate final scores from raw metrics.
    
    Returns:
        - overall_score: 0-100
        - category_scores: dict of 0-100 scores per category
        - raw_metrics: the underlying measurements
    """
    # Thresholds for scoring - adjust based on your org's standards
    pct_thresh = (0.5, 0.75, 0.9)  # low, ok, strong
    
    # Dependency Coverage score
    dep_vals = [
        metrics.dependency_coverage.software_dependency_coverage,
        metrics.dependency_coverage.model_artifact_coverage,
        metrics.dependency_coverage.dataset_coverage,
        metrics.dependency_coverage.plugin_tool_coverage,
    ]
    dep_scores = [score_from_percentage(v, pct_thresh) for v in dep_vals]
    dep_score = sum(dep_scores) / (4 * 4) * 100  # normalize to 0-100

    # Version Fidelity score
    vf_vals = [
        metrics.version_fidelity.version_specificity,
        metrics.version_fidelity.hash_completeness,
        metrics.version_fidelity.env_reproducibility,
    ]
    vf_scores = [score_from_percentage(v, pct_thresh) for v in vf_vals]
    vf_score = sum(vf_scores) / (3 * 4) * 100

    # Integrity & Provenance score
    ip_vals = [
        metrics.integrity_provenance.provenance_completeness,
        metrics.integrity_provenance.signature_validation_rate,
    ]
    ip_scores = [score_from_percentage(v, pct_thresh) for v in ip_vals]
    ip_base = sum(ip_scores) / (2 * 4) * 100
    # Penalty: subtract 5 points for each unauthorized component
    ip_score = max(0.0, ip_base - 5.0 * metrics.integrity_provenance.unauthorized_component_count)

    # Impact Traceability score
    it_vals = [
        metrics.impact_traceability.dep_graph_completeness,
    ]
    it_scores = [score_from_percentage(v, pct_thresh) for v in it_vals]
    it_score = sum(it_scores) / (1 * 4) * 100

    category_scores = {
        "dependency_coverage": dep_score,
        "version_fidelity": vf_score,
        "integrity_provenance": ip_score,
        "impact_traceability": it_score,
    }
    
    # Weighted overall score
    # These weights represent relative importance - adjust as needed
    overall = (category_scores['dependency_coverage'] * 0.35) + \
              (category_scores['version_fidelity'] * 0.30) + \
              (category_scores['integrity_provenance'] * 0.25) + \
              (category_scores['impact_traceability'] * 0.10)

    return {
        "overall_score": overall,
        "category_scores": category_scores,
        "raw_metrics": {
            "dependency_coverage": asdict(metrics.dependency_coverage),
            "version_fidelity": asdict(metrics.version_fidelity),
            "integrity_provenance": asdict(metrics.integrity_provenance),
            "impact_traceability": asdict(metrics.impact_traceability),
        },
    }
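
The threshold mapping and normalization can be checked with a short standalone sketch. `ordinal` is a copy of `score_from_percentage` with the notebook's thresholds as defaults, and the coverage values are hypothetical.

```python
from typing import List, Optional, Tuple

def ordinal(x: Optional[float],
            thresholds: Tuple[float, float, float] = (0.5, 0.75, 0.9)) -> int:
    """Standalone copy of score_from_percentage, for illustration."""
    if x is None:
        return 0
    if x < thresholds[0]:
        return 1
    if x < thresholds[1]:
        return 2
    if x < thresholds[2]:
        return 3
    return 4

# Hypothetical coverage values: one category fully covered, three empty
coverages: List[float] = [1.0, 0.0, 0.0, 0.0]
scores = [ordinal(v) for v in coverages]

# Normalize the sum of ordinal scores (max 4 each) to 0-100
category_score = sum(scores) / (4 * len(scores)) * 100
print(scores, category_score)
```

Three empty categories still contribute one point each, so a category with a single fully covered metric lands at 43.75 rather than 25.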

## Evaluation: DeepSeek Model Environment

Let's analyze the DeepSeek model's supply chain first.

In [21]:
# DeepSeek SBOM evaluation
if __name__ == "__main__":
    sbom_path = Path("../docs/sbom_deepseek_llm_env.json")  
    metrics = compute_supply_chain_l4_metrics(sbom_path)
    scored = score_supply_chain_l4(metrics)

    print("Overall L4 Supply-chain score:", scored["overall_score"])
    print("Category scores:", json.dumps(scored["category_scores"], indent=2))

Overall L4 Supply-chain score: 58.4375
Category scores: {
  "dependency_coverage": 43.75,
  "version_fidelity": 58.333333333333336,
  "integrity_provenance": 62.5,
  "impact_traceability": 100.0
}


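**DeepSeek Results Analysis:**
- The overall score of 58.4 out of 100 is moderate, with room for improvement.
- Dependency coverage at 43.75 is concerning. Because no expected lists were passed, each coverage value is either 1.0 or 0.0, so this score (ordinal scores of 4 + 1 + 1 + 1 out of 16) means components were detected in only one of the four categories.
- Impact traceability at 100 means at least 90% of components appear in the SBOM's dependency graph.
- Integrity/provenance at 62.5 is the highest this run can reach: provenance completeness is at least 90%, but no signature validation data was supplied, so the signature validation rate defaults to 0.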

## Evaluation: Llama Model Environment

Now let's check the Llama model for comparison.

In [20]:
# Llama SBOM evaluation
if __name__ == "__main__":
    sbom_path = Path("../docs/sbom_llama_env.json")  
    
    metrics = compute_supply_chain_l4_metrics(sbom_path)
    scored = score_supply_chain_l4(metrics)

    print("Overall L4 Supply-chain score:", scored["overall_score"])
    print("Category scores:", json.dumps(scored["category_scores"], indent=2))

Overall L4 Supply-chain score: 46.5625
Category scores: {
  "dependency_coverage": 43.75,
  "version_fidelity": 75.0,
  "integrity_provenance": 25.0,
  "impact_traceability": 25.0
}
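
The two printed results can be compared side by side. The sketch below copies the category scores from the outputs above and reapplies the notebook's weights; the `overall` helper and the dictionary names are introduced here for illustration only.

```python
from typing import Dict

# Weights from score_supply_chain_l4
WEIGHTS: Dict[str, float] = {
    "dependency_coverage": 0.35,
    "version_fidelity": 0.30,
    "integrity_provenance": 0.25,
    "impact_traceability": 0.10,
}

# Category scores copied from the printed outputs
deepseek = {
    "dependency_coverage": 43.75,
    "version_fidelity": 58.333333333333336,
    "integrity_provenance": 62.5,
    "impact_traceability": 100.0,
}
llama = {
    "dependency_coverage": 43.75,
    "version_fidelity": 75.0,
    "integrity_provenance": 25.0,
    "impact_traceability": 25.0,
}

def overall(category_scores: Dict[str, float]) -> float:
    """Weighted sum of category scores."""
    return sum(category_scores[name] * weight for name, weight in WEIGHTS.items())

# Per-category difference, Llama minus DeepSeek
deltas = {name: llama[name] - deepseek[name] for name in WEIGHTS}

for name in WEIGHTS:
    print(f"{name:22s} {deepseek[name]:7.2f} {llama[name]:7.2f} {deltas[name]:+7.2f}")
print("overall:", round(overall(deepseek), 4), round(overall(llama), 4))
```

Dependency coverage is identical for both environments. Llama scores higher on version fidelity but sits at the 25-point floor for both integrity/provenance and impact traceability, which accounts for its lower overall score.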
