# Exercise 1: Source Control System Attack

In this exercise, you'll learn about supply chain attacks targeting source control systems by implementing functions that demonstrate various attack vectors and mitigations.

These exercises use a simulated, in-memory Git repository so that the security concepts can be demonstrated safely, without risking actual harm.

First, let's run our setup code:

In [1]:
from dataclasses import dataclass
from typing import List, Dict, Optional
import hashlib
import hmac
import secrets
import datetime

@dataclass 
class GitCommit:
    """Base class representing a Git commit"""
    hash: str
    message: str
    timestamp: datetime.datetime
    author: str
    parent_hash: Optional[str] = None

@dataclass
class KeyPair:
    """Represents a public/private key pair for a developer"""
    public_key: str
    private_key: str
    
@dataclass
class SignedGitCommit(GitCommit):
    """Extends GitCommit to include signature"""
    signature: Optional[str] = None
    author_public_key: Optional[str] = None
    
@dataclass
class GitRepo:
    """Represents a Git repository"""
    commits: Optional[Dict[str, GitCommit]] = None
    HEAD: Optional[str] = None
    
    def __post_init__(self):
        if self.commits is None:
            self.commits = {}

def generate_key_pair(developer_name: str) -> KeyPair:
    """Generate a new key pair for a developer"""
    # In practice, would use proper asymmetric crypto
    private = secrets.token_hex(32)
    public = hashlib.sha256(private.encode()).hexdigest()
    return KeyPair(public_key=public, private_key=private)

# Create test developer keys
alice_keys = generate_key_pair("alice")
bob_keys = generate_key_pair("bob")

### Exercise 1: Git Hash Manipulation

The first attack vector we'll explore is manipulating Git commit hashes. Git uses SHA-1 hashes to identify commits. An attacker who can create a commit with a specific hash can potentially bypass protections based on commit hash verification.

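Your task is to implement a function that creates a Git commit whose hash starts with a given prefix. Strictly speaking, this is a brute-force partial-preimage search rather than a collision, but it shows how an attacker can steer a hash toward a chosen value.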

This is a simplified version to demonstrate the concept - real Git commit hashes use more complex inputs.
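For comparison, here is a minimal sketch of how Git itself derives a SHA-1 object ID: it hashes a header of the form `<type> <size>` plus a NUL byte, followed by the object's content. The helper name `git_object_hash` and the author details in the commit body are placeholders for illustration, not part of the exercise code.

```python
import hashlib

def git_object_hash(obj_type: str, content: bytes) -> str:
    """SHA-1 over the header '<type> <size>' + NUL byte, followed by the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID in every SHA-1 Git repository
print(git_object_hash("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# A commit object's content lists the tree, author, committer, then the message.
# All values below are placeholders chosen for illustration.
commit_body = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"  # ID of the empty tree
    "author Alice <alice@example.com> 1700000000 +0000\n"
    "committer Alice <alice@example.com> 1700000000 +0000\n"
    "\n"
    "Test commit\n"
).encode()
print(git_object_hash("commit", commit_body))
```

Because the timestamp and message are part of the hashed content, changing either one changes the commit ID, which is what the exercise exploits.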

Useful methods:
- `datetime.datetime.now()`: get current time
- `datetime.timedelta(microseconds=1)`: create a time delta (a `timedelta` can be added to a `datetime` to get a new `datetime`)
- `hashlib.sha1(bytes)`: hash bytes using SHA-1
- `string.encode()`: encode a string to bytes
- `hash.hexdigest()`: get the hex digest of a hash
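To get a feel for the cost of this kind of search, the sketch below counts how many attempts a brute-force loop needs for prefixes of increasing length. Each extra hex digit multiplies the expected work by 16. It varies a plain counter instead of a timestamp, and the name `find_prefix` is a hypothetical helper, not part of the exercise.

```python
import hashlib
from typing import Tuple

def find_prefix(data: str, prefix: str) -> Tuple[int, str]:
    """Append counters 0, 1, 2, ... to data until the SHA-1 hex digest starts with prefix."""
    attempts = 0
    while True:
        digest = hashlib.sha1(f"{data}{attempts}".encode()).hexdigest()
        attempts += 1
        if digest.startswith(prefix):
            return attempts, digest

for n in range(1, 4):
    prefix = "a" * n
    attempts, digest = find_prefix("Test commit", prefix)
    # 16**n is the expected number of attempts for an n-digit hex prefix
    print(f"prefix={prefix!r} expected~{16**n} actual={attempts} hash={digest}")
```

A full 40-digit match is far out of reach for this approach, which is why the exercise only asks for a short prefix.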

In [2]:
def create_commit_with_prefix(message: str, prefix: str, author: str = "attacker") -> GitCommit:
    """
    Creates a Git commit with a hash starting with the given prefix by trying different timestamps
    
    Args:
        message: The commit message
        prefix: The desired prefix for the commit hash
        author: The commit author
        
    Returns:
        A GitCommit object with a hash starting with prefix
    """
    # YOUR CODE HERE
    pass

In [3]:
# Test code
test_prefix = "abc"
commit = create_commit_with_prefix("Test commit", test_prefix)
assert commit.hash.startswith(test_prefix)
print(f"Created commit with hash: {commit.hash}")

AttributeError: 'NoneType' object has no attribute 'hash'

<details>
<summary>Solution</summary>

```python
def create_commit_with_prefix(message: str, prefix: str, author: str = "attacker") -> GitCommit:
    timestamp = datetime.datetime.now()
    while True:
        # Create commit string (simplified from real Git format)
        commit_str = f"{message}{timestamp.isoformat()}{author}"
        
        # Calculate hash
        commit_hash = hashlib.sha1(commit_str.encode()).hexdigest()
        
        # Check if hash has desired prefix
        if commit_hash.startswith(prefix):
            return GitCommit(
                hash=commit_hash,
                message=message,
                timestamp=timestamp,
                author=author
            )
            
        # Try next timestamp
        timestamp += datetime.timedelta(microseconds=1)
```

</details>

### Exercise 2: Malicious Merge Request

In this exercise, you'll implement a function that simulates a malicious merge request attack. The goal is to create a series of commits that look innocent in the merge request but contain malicious code that will be activated later.

Your function should create two branches:
1. A "clean" branch shown in the merge request
2. A "malicious" branch that will be switched to after merge

The simplified `GitCommit` class has no field for file contents, so for now store the code of each commit in its message.

In [None]:
def create_malicious_merge(repo: GitRepo, clean_code: str, malicious_code: str) -> tuple[str, str]:
    """
    Creates two branches - one clean and one malicious
    
    Args:
        repo: The Git repository
        clean_code: The innocent-looking code
        malicious_code: The malicious code to be activated later
        
    Returns:
        Tuple of (clean_branch_hash, malicious_branch_hash)
    """
    # YOUR CODE HERE
    pass

In [4]:
# Test code
repo = GitRepo()
clean_code = "print('hello world')"
malicious_code = "import os; os.system('rm -rf /')"

clean_hash, malicious_hash = create_malicious_merge(repo, clean_code, malicious_code)

assert clean_hash in repo.commits
assert malicious_hash in repo.commits
assert clean_code in repo.commits[clean_hash].message
assert malicious_code in repo.commits[malicious_hash].message

NameError: name 'create_malicious_merge' is not defined

<details>
<summary>Solution</summary>

```python
def create_malicious_merge(repo: GitRepo, clean_code: str, malicious_code: str) -> tuple[str, str]:
    # Create clean branch
    clean_commit = GitCommit(
        hash=hashlib.sha1(clean_code.encode()).hexdigest(),
        message=f"Add feature\n\nCode:\n{clean_code}",
        timestamp=datetime.datetime.now(),
        author="attacker"
    )
    repo.commits[clean_commit.hash] = clean_commit
    
    # Create the malicious commit as a child of the clean commit
    malicious_commit = GitCommit(
        hash=hashlib.sha1(malicious_code.encode()).hexdigest(),
        message=f"Add feature\n\nCode:\n{malicious_code}",
        timestamp=datetime.datetime.now(),
        author="attacker",
        parent_hash=clean_commit.hash
    )
    repo.commits[malicious_commit.hash] = malicious_commit
    
    return clean_commit.hash, malicious_commit.hash
```

</details>
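One possible defence against this swap is to pin the merge to the exact commit hash that reviewers approved. The following is a minimal sketch under that assumption; `safe_to_merge` and the `approved` set are hypothetical names, and real platforms implement this differently.

```python
import hashlib
from typing import Set

def safe_to_merge(branch_tip: str, approved_hashes: Set[str]) -> bool:
    """Allow a merge only if the commit being merged is one a reviewer approved."""
    return branch_tip in approved_hashes

# Hashes computed the same way as in the simplified solution above
clean_hash = hashlib.sha1(b"print('hello world')").hexdigest()
malicious_hash = hashlib.sha1(b"import os; os.system('rm -rf /')").hexdigest()

approved = {clean_hash}  # the reviewer only ever saw the clean commit
print(safe_to_merge(clean_hash, approved))      # True
print(safe_to_merge(malicious_hash, approved))  # False
```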

### Exercise 3: Git Hook Bypass

Git hooks are scripts that run at certain points in Git's execution. In this exercise, you'll implement a function that bypasses a pre-commit hook that's meant to check for malicious code.

Your function should find a way to commit code that would normally be blocked by the hook, for example by obfuscating it so that the hook's pattern matching no longer recognizes it.

In [5]:
def bypass_malicious_code_hook(repo: GitRepo, malicious_code: str) -> str:
    """
    Commits malicious code while bypassing a hook that checks for suspicious patterns
    
    Args:
        repo: The Git repository
        malicious_code: The malicious code to commit
        
    Returns:
        Hash of the commit containing the malicious code
    """
    # YOUR CODE HERE
    pass

In [8]:
# Test code
def mock_malicious_code_hook(code: str) -> bool:
    """Mock hook that blocks suspicious patterns"""
    suspicious_patterns = ['rm -rf', 'eval(', 'exec(']
    return not any(pattern in code for pattern in suspicious_patterns)

repo = GitRepo()
malicious_code = "rm -rf /"
commit_hash = bypass_malicious_code_hook(repo, malicious_code)

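assert commit_hash in repo.commits
# The hook must accept the stored message...
assert mock_malicious_code_hook(repo.commits[commit_hash].message)
# ...so the payload cannot appear verbatim. Check instead that it can be
# recovered by reversing the one-code-point shift used in the solution.
stored_payload = repo.commits[commit_hash].message.split("Code:\n", 1)[1]
assert ''.join(chr(ord(c) - 1) for c in stored_payload) == malicious_code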

<details>
<summary>Solution</summary>

```python
def bypass_malicious_code_hook(repo: GitRepo, malicious_code: str) -> str:
    # Shift every character up by one code point so the hook's substring patterns no longer match
    encoded_code = ''.join(chr(ord(c) + 1) for c in malicious_code)
    
    # Create commit with encoded malicious code
    commit = GitCommit(
        hash=hashlib.sha1(encoded_code.encode()).hexdigest(),
        message=f"Add feature\n\nCode:\n{encoded_code}",
        timestamp=datetime.datetime.now(),
        author="attacker"
    )
    repo.commits[commit.hash] = commit
    
    return commit.hash
```

</details>
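The character shift is only one option: any reversible encoding defeats a substring-based check. As a minimal sketch, the snippet below uses Base64 against a stand-in hook with the same patterns as the mock hook in the test cell. The name `naive_hook` is hypothetical.

```python
import base64

SUSPICIOUS_PATTERNS = ['rm -rf', 'eval(', 'exec(']

def naive_hook(code: str) -> bool:
    """Return True if the code passes, i.e. contains none of the patterns."""
    return not any(pattern in code for pattern in SUSPICIOUS_PATTERNS)

payload = "rm -rf /"
encoded = base64.b64encode(payload.encode()).decode()

print(naive_hook(payload))                            # False: blocked as written
print(naive_hook(encoded))                            # True: the encoded form slips through
print(base64.b64decode(encoded).decode() == payload)  # True: the payload is fully recoverable
```

Base64 output never contains spaces or parentheses, so none of these three patterns can match the encoded text, whatever the payload is.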

### Security Controls & Mitigations

To prevent these types of attacks, you could implement:

1. Signed commits with signature verification
2. Protected branches with strict merge requirements
3. Multiple reviewers for all code changes
4. Automated security scanning in CI/CD
5. Regular security audits of repository settings

We'll focus on signed commits in the next exercise.

### Exercise 4.1: Implement Signed Commits

Your task is to implement a function that creates a signed commit. The signature should be created using the author's private key and should cover all important commit metadata.

Tips:
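- Use HMAC for signing (as a simplified stand-in for real GPG signatures). HMAC is symmetric, so the verifier must use the same key as the signer: in this simulation the public key is the SHA-256 of the private key, so key the HMAC with that derived value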
- Include all important commit data in the signature
- Remember to store the public key for later verification
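HMAC is a symmetric primitive: a tag can only be recomputed by someone who uses exactly the same key as the signer. The sketch below illustrates this with a placeholder key pair derived the same way as in `generate_key_pair`. The helper `hmac_tag` and the key value are assumptions for illustration.

```python
import hashlib
import hmac

def hmac_tag(key: str, data: str) -> str:
    """Compute an HMAC-SHA256 tag over data with the given key."""
    return hmac.new(key.encode(), data.encode(), hashlib.sha256).hexdigest()

private_key = "example-private-key"  # placeholder value
public_key = hashlib.sha256(private_key.encode()).hexdigest()

tag_from_private = hmac_tag(private_key, "commit data")
tag_from_public = hmac_tag(public_key, "commit data")

# Different keys give different tags, so a verifier holding only the
# public key cannot check a tag that was keyed with the private key.
print(hmac.compare_digest(tag_from_private, tag_from_public))                  # False
# Recomputing with the same key reproduces the tag.
print(hmac.compare_digest(hmac_tag(public_key, "commit data"), tag_from_public))  # True
```

This is also why HMAC is only a teaching stand-in: anyone who can verify a tag can also forge one, which real asymmetric signatures prevent.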

In [None]:
def create_signed_commit(
    message: str,
    author: str,
    author_private_key: str,
    author_public_key: str,
    parent_hash: Optional[str] = None
) -> SignedGitCommit:
    """
    Creates a signed Git commit
    
    Args:
        message: Commit message
        author: Name of commit author
        author_private_key: Author's private key for signing
        author_public_key: Author's public key (stored with commit)
        parent_hash: Hash of parent commit (optional)
        
    Returns:
        SignedGitCommit with valid signature
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
commit = create_signed_commit(
    message="Add secure feature",
    author="alice",
    author_private_key=alice_keys.private_key,
    author_public_key=alice_keys.public_key
)

print(f"Created signed commit: {commit}")

<details>
<summary>Solution</summary>

```python
def create_signed_commit(
    message: str,
    author: str, 
    author_private_key: str,
    author_public_key: str,
    parent_hash: Optional[str] = None
) -> SignedGitCommit:
    # Create commit data
    timestamp = datetime.datetime.now()
    
    # Calculate commit hash
    commit_str = f"{message}{timestamp.isoformat()}{author}{parent_hash}"
    commit_hash = hashlib.sha1(commit_str.encode()).hexdigest()
    
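    # Create signature of all commit data.
    # HMAC is symmetric, so the verifier must be able to recompute the key.
    # In this simulation public_key == sha256(private_key) (see generate_key_pair),
    # so the HMAC key is derived from the private key. Real signing is asymmetric.
    signing_key = hashlib.sha256(author_private_key.encode()).hexdigest()
    signature = hmac.new(
        signing_key.encode(),
        data_to_sign.encode(),
        hashlib.sha256
    ).hexdigest()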
    
    return SignedGitCommit(
        hash=commit_hash,
        message=message,
        timestamp=timestamp,
        author=author,
        parent_hash=parent_hash,
        signature=signature,
        author_public_key=author_public_key
    )
```

</details>


### Exercise 4.2: Implement Signature Verification

Now implement a function that verifies commit signatures. It should:
1. Recreate the signed data using commit fields
2. Verify the signature using the included public key
3. Return True only if the signature is valid

This would be called before accepting commits into protected branches.

In [None]:
def verify_commit_signature(commit: SignedGitCommit) -> bool:
    """
    Verifies the signature on a signed commit
    
    Args:
        commit: The SignedGitCommit to verify
        
    Returns:
        True if signature is valid, False otherwise
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
# Should verify valid commits
valid_commit = create_signed_commit(
    message="Valid commit",
    author="alice",
    author_private_key=alice_keys.private_key,
    author_public_key=alice_keys.public_key
)
assert verify_commit_signature(valid_commit)

# Should reject commits with invalid signatures
tampered_commit = SignedGitCommit(
    hash=valid_commit.hash,
    message="Tampered message",  # Changed message
    timestamp=valid_commit.timestamp,
    author=valid_commit.author,
    signature=valid_commit.signature,  # But kept old signature
    author_public_key=valid_commit.author_public_key
)
assert not verify_commit_signature(tampered_commit)

print("All signature verification tests passed!")

<details>
<summary>Solution</summary>

```python
def verify_commit_signature(commit: SignedGitCommit) -> bool:
    if not (commit.signature and commit.author_public_key):
        return False
        
    # Recreate the signed data
    data_to_verify = f"{commit.hash}{commit.message}{commit.timestamp.isoformat()}{commit.author}{commit.parent_hash}"
    
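    # Recompute the HMAC keyed with the stored public key. This is a symmetric
    # stand-in for asymmetric verification: the signer must have used the same key.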
    expected_signature = hmac.new(
        commit.author_public_key.encode(),
        data_to_verify.encode(),
        hashlib.sha256
    ).hexdigest()
    
    # Use constant-time comparison to prevent timing attacks
    return hmac.compare_digest(commit.signature, expected_signature)
```

</details>


### Exercise 4.3: Implement Protected Branch Rules

Finally, implement a function that enforces protected branch rules:
1. All commits must have valid signatures
2. Only commits signed by authorized developers are allowed
3. Direct pushes to protected branches are blocked - only merged PRs allowed

This is a crucial control that brings together signature verification with branch protection.

In [None]:
@dataclass
class ProtectedBranch:
    name: str
    commits: List[str]  # List of commit hashes
    authorized_keys: List[str]  # List of allowed public keys
    
def update_protected_branch(
    branch: ProtectedBranch,
    new_commit: SignedGitCommit,
    is_merge_request: bool = False
) -> bool:
    """
    Attempts to update a protected branch with a new commit
    
    Args:
        branch: The protected branch to update
        new_commit: The new commit to add
        is_merge_request: Whether this is from a merged PR
        
    Returns:
        True if update was allowed, False if rejected
    """
    # YOUR CODE HERE
    pass

In [None]:
# Test code
main_branch = ProtectedBranch(
    name="main",
    commits=[],
    authorized_keys=[alice_keys.public_key]  # Only alice is authorized
)

# Should allow valid commits from authorized developers via PR
valid_commit = create_signed_commit(
    message="Good commit",
    author="alice",
    author_private_key=alice_keys.private_key,
    author_public_key=alice_keys.public_key
)
assert update_protected_branch(main_branch, valid_commit, is_merge_request=True)

# Should reject direct pushes
assert not update_protected_branch(main_branch, valid_commit, is_merge_request=False)

# Should reject commits from unauthorized developers
unauthorized_commit = create_signed_commit(
    message="Bad commit",
    author="bob",
    author_private_key=bob_keys.private_key,
    author_public_key=bob_keys.public_key
)
assert not update_protected_branch(main_branch, unauthorized_commit, is_merge_request=True)

print("All protected branch tests passed!")

<details>
<summary>Solution</summary>

```python
def update_protected_branch(
    branch: ProtectedBranch,
    new_commit: SignedGitCommit,
    is_merge_request: bool = False
) -> bool:
    # Block direct pushes
    if not is_merge_request:
        return False
    
    # Verify commit signature
    if not verify_commit_signature(new_commit):
        return False
        
    # Check developer is authorized
    if new_commit.author_public_key not in branch.authorized_keys:
        return False
        
    # If all checks pass, update branch
    branch.commits.append(new_commit.hash)
    return True
```

</details>
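When debugging a rejected update, it can help to know which rule failed rather than just getting `False`. The following is a minimal sketch of the same three checks returning a reason string. `BranchPolicy` and `rejection_reason` are hypothetical names, and the signature check is passed in as a boolean to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BranchPolicy:
    """Hypothetical stand-in for ProtectedBranch, holding only the allowed keys."""
    authorized_keys: List[str] = field(default_factory=list)

def rejection_reason(
    policy: BranchPolicy,
    signer_key: str,
    signature_valid: bool,
    is_merge_request: bool,
) -> Optional[str]:
    """Return why an update is rejected, or None if it is allowed."""
    if not is_merge_request:
        return "direct push to protected branch"
    if not signature_valid:
        return "invalid or missing signature"
    if signer_key not in policy.authorized_keys:
        return "signer is not authorized"
    return None

policy = BranchPolicy(authorized_keys=["alice-key"])
print(rejection_reason(policy, "alice-key", True, True))   # None
print(rejection_reason(policy, "alice-key", True, False))  # direct push to protected branch
print(rejection_reason(policy, "bob-key", True, True))     # signer is not authorized
```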
