# Hashing with Salt and Pepper

This notebook explores:
1. **Hash Collisions** - What they are and how they can be exploited
2. **Rainbow Tables** - Pre-computed hash tables for password cracking
3. **Salting & Peppering** - Techniques to protect against attacks
4. **Implementation** - Practical example of secure password hashing

## 1. Hash Collisions

### What is a Hash Collision?

A **hash collision** occurs when two different inputs produce the same hash output. This happens because:
- Hash functions map an infinite input space to a finite output space
- By the **pigeonhole principle**, collisions are inevitable
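
The pigeonhole argument can be made concrete by shrinking the output space. The sketch below is illustrative only: it keeps just the first two bytes of SHA-256 (65,536 possible outputs) and hashes successive inputs until two of them share a digest. The helper names `truncated_hash` and `find_collision` are hypothetical and exist only for this demonstration.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Return only the first n_bytes of SHA-256, shrinking the output space."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> tuple[bytes, bytes]:
    """Hash successive inputs until two distinct ones share a truncated digest."""
    seen: dict[bytes, bytes] = {}  # truncated digest -> first input that produced it
    for i in count():
        msg = f"input-{i}".encode("utf-8")
        digest = truncated_hash(msg, n_bytes)
        if digest in seen:
            return seen[digest], msg
        seen[digest] = msg


a, b = find_collision()
print(f"{a!r} and {b!r} both map to {truncated_hash(a).hex()}")
```

With only 65,536 possible outputs, a collision is guaranteed after at most 65,537 inputs and usually appears after a few hundred. A full-length digest makes the same search infeasible unless the function itself has a weakness.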

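### Example: What a Collision Would Look Like
```
MD5("hello") = 5d41402abc4b2a76b9719d911017c592
MD5("world") = 7d793037a0760186574b0282f2f435e7
```

These two inputs produce different digests, so they do not collide. A collision would be two different inputs with an identical digest. If two different passwords produced the same hash, an attacker could use either one to authenticate.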

### Types of Collisions

1. **Natural Collisions**: Occur by chance because the output space is finite
2. **Cryptographic Attacks**: Deliberately crafted inputs designed to produce collisions
   - Example: MD5 and SHA-1 are considered broken due to practical collision attacks

### Attack Scenarios Using Collisions

1. **Authentication Bypass**: If an attacker finds any input that hashes to the stored password hash (strictly, a second preimage rather than an arbitrary collision), they can authenticate without knowing the original password
2. **Digital Signature Forgery**: Collisions can be used to create fraudulent documents with the same signature
3. **Integrity Attacks**: Malicious files can be crafted to have the same hash as legitimate files

## 2. Rainbow Tables

### What are Rainbow Tables?

**Rainbow tables** are pre-computed tables for reversing the hashes of likely passwords. They trade storage space for computation time: the expensive hashing is done once, in advance, so each later lookup is fast.

### How Rainbow Tables Work

1. **Pre-computation Phase**:
   - Generate millions/billions of common passwords
   - Hash each password using the target hash function
   - Store the mapping: `hash → password`
   - Optimize storage using reduction functions and chains

2. **Attack Phase**:
   - Obtain the target hash from a compromised database
   - Look up the hash in the rainbow table
   - If found, retrieve the original password instantly
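
The reduction functions and chains mentioned above can be sketched on a toy keyspace. The example below is a simplified illustration, not a production-grade rainbow table: it assumes the passwords are 4-digit PINs, uses MD5 to match the examples in this notebook, and stores only the start and end of each chain. The names `reduce_to_pin`, `build_table`, `lookup`, `CHAIN_LENGTH`, and `KEYSPACE` are hypothetical.

```python
import hashlib
from typing import Optional

CHAIN_LENGTH = 100
KEYSPACE = 10_000  # toy keyspace: 4-digit PINs "0000".."9999"


def h(pin: str) -> str:
    return hashlib.md5(pin.encode("utf-8")).hexdigest()


def reduce_to_pin(digest: str, step: int) -> str:
    """Map a digest back into the keyspace; `step` varies the mapping per column."""
    return f"{(int(digest, 16) + step) % KEYSPACE:04d}"


def build_table(starts: list[str]) -> dict[str, str]:
    """Pre-computation: walk each chain and keep only endpoint -> start."""
    table = {}
    for start in starts:
        pin = start
        for step in range(CHAIN_LENGTH):
            pin = reduce_to_pin(h(pin), step)
        table[pin] = start
    return table


def lookup(target: str, table: dict[str, str]) -> Optional[str]:
    """Attack: guess the column of the target hash, then replay the chain."""
    for col in range(CHAIN_LENGTH - 1, -1, -1):
        # Walk from the guessed column to the end of the chain.
        pin = reduce_to_pin(target, col)
        for step in range(col + 1, CHAIN_LENGTH):
            pin = reduce_to_pin(h(pin), step)
        if pin in table:
            # Rebuild the chain from its start up to the guessed column.
            candidate = table[pin]
            for step in range(col):
                candidate = reduce_to_pin(h(candidate), step)
            if h(candidate) == target:  # rejects false alarms from merged chains
                return candidate
    return None


starts = [f"{i:04d}" for i in range(0, KEYSPACE, 25)]  # 400 chains
table = build_table(starts)
# Prints "4242" if some chain covers that PIN, otherwise None.
print(lookup(h("4242"), table))
```

The table holds 400 entries instead of one entry per PIN, at the cost of extra hashing during lookup. Coverage is not complete: PINs that never appear in any chain cannot be recovered, which is why real tables use many chains and tuned chain lengths.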

### Example Attack Scenario

```
Database breach reveals:
User: alice@example.com
Hash: 5f4dcc3b5aa765d61d8327deb882cf99

Attacker looks up in rainbow table:
5f4dcc3b5aa765d61d8327deb882cf99 → "password"

Result: Instant password recovery!
```

### Why Rainbow Tables are Dangerous

- **Speed**: Millions of hashes can be checked in seconds
- **Scalability**: One rainbow table can crack passwords for millions of users
- **Common passwords**: "password123", "qwerty", "123456" are immediately vulnerable
- **Dictionary attacks**: All common words and variations are pre-computed

In [41]:
import hashlib
import os
import secrets

## 3. Protection Mechanisms: Salting and Peppering

### How Salting Protects Against Rainbow Tables

**Salt** is a random value unique to each password, stored alongside the hash.

**Benefits:**
1. **Defeats Rainbow Tables**: Each password has a unique salt, requiring separate rainbow tables for each user
2. **Prevents Duplicate Detection**: Same passwords produce different hashes
3. **Increases Computation Cost**: Attacker must compute hashes for each salt individually

**Formula**: `hash = H(salt + password)` (the same concatenation order as the implementation in Section 4)

**Example:**
```
User 1: salt=abc123, password="password" → hash1
User 2: salt=xyz789, password="password" → hash2
Even with same password, hash1 ≠ hash2!
```
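
The third benefit, that an attacker must redo the work for every salt, can be sketched by counting hash computations. This is a minimal illustration under assumed values: the four-word `WORDLIST`, the three simulated users, and the helper names `salted_hash` and `crack` are all hypothetical.

```python
import hashlib
import secrets

WORDLIST = ["123456", "qwerty", "letmein", "password"]


def salted_hash(password: str, salt: bytes) -> bytes:
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


def crack(salt: bytes, target: bytes):
    """Try every candidate against one user's salt; return (password, attempts)."""
    for attempts, candidate in enumerate(WORDLIST, start=1):
        if salted_hash(candidate, salt) == target:
            return candidate, attempts
    return None, len(WORDLIST)


# Three users who all chose "password", each with a fresh random salt
users = []
for _ in range(3):
    salt = secrets.token_bytes(16)
    users.append((salt, salted_hash("password", salt)))

total_attempts = sum(crack(salt, target)[1] for salt, target in users)
print(total_attempts)  # 12: four hashes per user, nothing reusable between users
```

Without salts, the attacker would hash the word list once and match the result against every user. With salts, the work scales with the number of users.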

### How Peppering Adds Extra Protection

**Pepper** is a secret value shared across all passwords, stored separately from the database (e.g., in environment variables or secure key storage).

**Benefits:**
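1. **Database Breach Protection**: Even if the database is compromised, the attacker does not have the pepper
2. **Additional Computation Barrier**: Without the pepper, an attacker cannot test password guesses against the stolen hashes
3. **Key Rotation**: The pepper can be rotated, but existing hashes cannot be recomputed without the plaintext password, so each record is typically re-hashed at the user's next successful login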

**Formula**: `hash = H(salt + password + pepper)` (the same concatenation order as the implementation in Section 4)

**Key Differences:**

| Aspect | Salt | Pepper |
|--------|------|--------|
| **Uniqueness** | Unique per password | Shared across all passwords |
| **Storage** | In database (with hash) | Separate secure storage |
| **Visibility** | Can be public | Must remain secret |
| **Purpose** | Prevent rainbow tables | Protect against database breach |
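
The visibility row of the table can be illustrated with a small experiment. In this sketch the attacker is assumed to hold the salt and the stored hash from a database dump, but not the pepper. The function name `peppered_hash` and the three-word list are hypothetical.

```python
import hashlib
import secrets


def peppered_hash(password: str, salt: bytes, pepper: bytes) -> bytes:
    return hashlib.sha256(salt + password.encode("utf-8") + pepper).digest()


pepper = secrets.token_bytes(32)  # secret: stays on the application server
salt = secrets.token_bytes(16)    # not secret: stored next to the hash
stored = peppered_hash("password", salt, pepper)

wordlist = ["123456", "password", "qwerty"]

# Attacker view: salt and hash are known, the pepper is not
guesses_without_pepper = [w for w in wordlist if peppered_hash(w, salt, b"") == stored]

# Application view: the pepper is available, so verification works
guesses_with_pepper = [w for w in wordlist if peppered_hash(w, salt, pepper) == stored]

print(guesses_without_pepper)
print(guesses_with_pepper)
```

Knowing the salt does not help the attacker here, because every guess also needs the 32-byte pepper to reproduce the stored hash. The protection disappears if the pepper leaks together with the database.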

### Combined Protection

Using both salt and pepper provides defense in depth:
- **Salt** makes rainbow tables impractical
- **Pepper** protects even if database is compromised
- **Slow password hashing function** (e.g., bcrypt, scrypt, Argon2, or PBKDF2) resists brute force; a single pass of a fast hash such as SHA-256 does not
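
As one possible way to combine all three layers using only the standard library, the sketch below uses `hashlib.pbkdf2_hmac` as the slow hash. Several choices here are assumptions rather than recommendations from this notebook: the iteration count is an illustrative value, the pepper is appended to the password before key derivation, and the function names `hash_password_pbkdf2` and `verify_password_pbkdf2` are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware


def hash_password_pbkdf2(password: str, pepper: bytes) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    salt = secrets.token_bytes(16)
    secret = password.encode("utf-8") + pepper  # assumption: pepper appended to password
    return salt, hashlib.pbkdf2_hmac("sha256", secret, salt, ITERATIONS)


def verify_password_pbkdf2(password: str, salt: bytes, stored: bytes, pepper: bytes) -> bool:
    secret = password.encode("utf-8") + pepper
    candidate = hashlib.pbkdf2_hmac("sha256", secret, salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison


demo_pepper = secrets.token_bytes(32)
salt, key = hash_password_pbkdf2("MySecurePassword123!", demo_pepper)
print(verify_password_pbkdf2("MySecurePassword123!", salt, key, demo_pepper))
print(verify_password_pbkdf2("WrongPassword123", salt, key, demo_pepper))
```

Each guess now costs the attacker `ITERATIONS` rounds of HMAC instead of one hash, which slows offline brute force by roughly that factor while staying fast enough for a single login.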

In [42]:
# Demonstration: Why unsalted hashes are vulnerable

# Simulating a simple rainbow table (very small example)
rainbow_table = {
    hashlib.md5("password".encode()).hexdigest(): "password",
    hashlib.md5("123456".encode()).hexdigest(): "123456",
    hashlib.md5("qwerty".encode()).hexdigest(): "qwerty",
    hashlib.md5("admin".encode()).hexdigest(): "admin",
    hashlib.md5("letmein".encode()).hexdigest(): "letmein",
}

print("Simple Rainbow Table:")
print("=" * 70)
for hash_val, password in rainbow_table.items():
    print(f"{password:15} → {hash_val}")

# Simulate a database breach
print("\n" + "=" * 70)
print("SIMULATED DATABASE BREACH")
print("=" * 70)
leaked_hash = hashlib.md5("password".encode()).hexdigest()
print(f"Leaked hash: {leaked_hash}")

# Attack using rainbow table
if leaked_hash in rainbow_table:
    cracked = rainbow_table[leaked_hash]
    print(f"✓ Password cracked instantly: '{cracked}'")
else:
    print("✗ Hash not found in rainbow table")

Simple Rainbow Table:
password        → 5f4dcc3b5aa765d61d8327deb882cf99
123456          → e10adc3949ba59abbe56e057f20f883e
qwerty          → d8578edf8458ce06fbc5bb76a58c5ca4
admin           → 21232f297a57a5a743894a0e4a801fc3
letmein         → 0d107d09f5bbe40cade3de5c71e9e9b7

SIMULATED DATABASE BREACH
Leaked hash: 5f4dcc3b5aa765d61d8327deb882cf99
✓ Password cracked instantly: 'password'


## 4. Implementation: Secure Password Hashing

### Setup and Configuration

In [43]:
# Pepper is a secret stored separately (e.g., environment variable or config)
# In production: load it from a secure key management system, never hardcode!
# The fallback value below exists only so this demo runs without configuration.
PEPPER = os.getenv("PASSWORD_PEPPER", "my-secret-pepper-key-12345")

print(f"Pepper loaded: {'*' * len(PEPPER)} (hidden for security)")
print("Pepper would be stored in: Environment variable or secrets manager")

Pepper loaded: ************************** (hidden for security)
Pepper would be stored in: Environment variable or secrets manager


### Password Hashing Functions with Salt and Pepper

In [44]:
def hash_password(password: str, pepper: str = PEPPER) -> tuple[str, str]:
    """
    Hash a password with salt and pepper.

    :param str password:
        The plain text password to hash
    :param str pepper:
        The pepper value (secret, shared across all passwords)

    :returns tuple[str, str]:
        A tuple containing (salt_hex, hash_hex)
    """
    # Generate a random salt (16 bytes = 128 bits)
    salt = secrets.token_bytes(16)

    # Combine salt + password + pepper
    combined = salt + password.encode("utf-8") + pepper.encode("utf-8")

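    # Hash using SHA-256 (a single fast hash keeps the demo simple;
    # production systems should use a slow KDF such as bcrypt, scrypt, or Argon2)
    password_hash = hashlib.sha256(combined).digest()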

    # Return salt and hash as hex strings for storage
    return salt.hex(), password_hash.hex()


def verify_password(password: str, salt_hex: str, hash_hex: str, pepper: str = PEPPER) -> bool:
    """
    Verify a password against a stored salt and hash.

    :param str password:
        The plain text password to verify
    :param str salt_hex:
        The salt value (hex string)
    :param str hash_hex:
        The stored hash value (hex string)
    :param str pepper:
        The pepper value (secret, shared across all passwords)

    :returns bool:
        True if password matches, False otherwise
    """
    # Convert hex strings back to bytes
    salt = bytes.fromhex(salt_hex)
    stored_hash = bytes.fromhex(hash_hex)

    # Recreate the hash with provided password
    combined = salt + password.encode("utf-8") + pepper.encode("utf-8")
    password_hash = hashlib.sha256(combined).digest()

    # Compare hashes (use secrets.compare_digest to prevent timing attacks)
    return secrets.compare_digest(password_hash, stored_hash)

### Example 1: User Registration (Creating Hash with Salt & Pepper)

In [45]:
# User registration - hashing a new password
print("=" * 70)
print("USER REGISTRATION - Creating Secure Password Hash")
print("=" * 70)

password = "MySecurePassword123!"
salt, password_hash = hash_password(password)

print("\nInput:")
print(f"  Password: {password}")
print(f"  Pepper: {'*' * 20} (secret, not stored with hash)")

print("\nGenerated:")
print(f"  Salt (random, 128-bit): {salt}")
print(f"  Hash (SHA-256): {password_hash}")

print("\nStored in database:")
print("  ┌─────────────────────────────────────────────┐")
print("  │ username: alice@example.com                 │")
print(f"  │ salt:     {salt[:40]}... │")
print(f"  │ hash:     {password_hash[:40]}... │")
print("  └─────────────────────────────────────────────┘")

print("\nPassword hashed securely with salt and pepper!")

USER REGISTRATION - Creating Secure Password Hash

Input:
  Password: MySecurePassword123!
  Pepper: ******************** (secret, not stored with hash)

Generated:
  Salt (random, 128-bit): aea67563c1013f77b798a9030f8b6645
  Hash (SHA-256): 11dd99c6e4e236f7afe7be97f9931de32692830985113dec4db69c0ff52d5fd8

Stored in database:
  ┌─────────────────────────────────────────────┐
  │ username: alice@example.com                 │
  │ salt:     aea67563c1013f77b798a9030f8b6645... │
  │ hash:     11dd99c6e4e236f7afe7be97f9931de326928309... │
  └─────────────────────────────────────────────┘

Password hashed securely with salt and pepper!


### Example 2: User Login (Verifying Password)

In [46]:
# Correct password verification
print("\n" + "=" * 70)
print("USER LOGIN - Correct Password Attempt")
print("=" * 70)

correct_password = "MySecurePassword123!"
print(f"\nAttempted password: {correct_password}")
print(f"Retrieved from DB: salt={salt[:20]}..., hash={password_hash[:20]}...")

is_valid = verify_password(correct_password, salt, password_hash)
print("\nVerification process:")
print("  1. Retrieve salt from database")
print("  2. Combine: password + salt + pepper")
print("  3. Hash using SHA-256")
print("  4. Compare with stored hash")

print(f"\n{'[SUCCESS] - Authentication granted!' if is_valid else '[FAILED] - Access denied!'}")


USER LOGIN - Correct Password Attempt

Attempted password: MySecurePassword123!
Retrieved from DB: salt=aea67563c1013f77b798..., hash=11dd99c6e4e236f7afe7...

Verification process:
  1. Retrieve salt from database
  2. Combine: password + salt + pepper
  3. Hash using SHA-256
  4. Compare with stored hash

[SUCCESS] - Authentication granted!


In [47]:
# Wrong password verification
print("\n" + "=" * 70)
print("USER LOGIN - Wrong Password Attempt")
print("=" * 70)

wrong_password = "WrongPassword123"
print(f"\nAttempted password: {wrong_password}")

is_valid = verify_password(wrong_password, salt, password_hash)
print("\nVerification process:")
print("  1. Retrieve salt from database")
print("  2. Combine: password + salt + pepper")
print("  3. Hash using SHA-256")
print("  4. Compare with stored hash → Mismatch!")

print(f"\n{'[SUCCESS] - Authentication granted!' if is_valid else '[FAILED] - Access denied!'}")


USER LOGIN - Wrong Password Attempt

Attempted password: WrongPassword123

Verification process:
  1. Retrieve salt from database
  2. Combine: password + salt + pepper
  3. Hash using SHA-256
  4. Compare with stored hash → Mismatch!

[FAILED] - Access denied!


### Example 3: Demonstration - Why Salt Prevents Rainbow Table Attacks

In [48]:
# Demonstration: Same password produces different hashes with different salts
print("\n" + "=" * 70)
print("SALT PROTECTION: Same Password → Different Hashes")
print("=" * 70)

same_password = "CommonPassword123"
print(f"\nScenario: Two users choose the same password: '{same_password}'")

# User 1
salt1, hash1 = hash_password(same_password)
print(f"\n{'─' * 70}")
print("User 1 (alice@example.com):")
print(f"  Password:  {same_password}")
print(f"  Salt:      {salt1}")
print(f"  Hash:      {hash1}")

# User 2 (same password, different salt)
salt2, hash2 = hash_password(same_password)
print(f"\n{'─' * 70}")
print("User 2 (bob@example.com):")
print(f"  Password:  {same_password}")
print(f"  Salt:      {salt2}")
print(f"  Hash:      {hash2}")

print(f"\n{'─' * 70}")
print("Analysis:")
print(f"    Different salts: {salt1 != salt2}")
print(f"    Different hashes: {hash1 != hash2}")

print(f"\n{'─' * 70}")
print("Security Implications:")
print("  1. Rainbow table for 'CommonPassword123' is useless")
print("  2. Each user requires individual brute-force attack")
print("  3. Attacker cannot detect duplicate passwords")
print("  4. Computation cost increased by factor of # of users")


SALT PROTECTION: Same Password → Different Hashes

Scenario: Two users choose the same password: 'CommonPassword123'

──────────────────────────────────────────────────────────────────────
User 1 (alice@example.com):
  Password:  CommonPassword123
  Salt:      ff939d3b1aeabb67b0798cd2ec22b912
  Hash:      31c0a52fcc801ecc5c0520feabdfd20e2154998d527c0b2580222a0d9ef85413

──────────────────────────────────────────────────────────────────────
User 2 (bob@example.com):
  Password:  CommonPassword123
  Salt:      69aa59520611e3eea16f06a397c5a528
  Hash:      b95f4202eb64ecfaf1f39edc233401176d188f3d5280c670ed1c178b63923eca

──────────────────────────────────────────────────────────────────────
Analysis:
    Different salts: True
    Different hashes: True

──────────────────────────────────────────────────────────────────────
Security Implications:
  1. Rainbow table for 'CommonPassword123' is useless
  2. Each user requires individual brute-force attack
  3. Attacker cannot detect duplicat