# Password Hashing Fundamentals

**Learning Objectives:**
- Understand why passwords must be hashed
- Learn the difference between hashing and encryption
- Implement Argon2 password hashing in Python
- Test hashing and verification

## Part 1: Why Hash Passwords?

**Plain text passwords = Security disaster**

When a database is breached, plain text passwords are immediately compromised. Attackers can:
- Log into user accounts
- Try the same password on other sites (credential stuffing)
- Impersonate users

**Hashing makes passwords one-way**

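Even with the hashes in hand, attackers cannot easily:
- Reverse a hash to recover the original password
- Use pre-computed tables (a unique salt per password defeats them)
- Brute-force at scale (Argon2 makes every guess expensive)

Weak or common passwords can still be guessed one candidate at a time, which is why both hashing cost and password strength matter.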

In [None]:
# Let's see why plain text is dangerous

# Imagine a database breach:
users_plaintext = [
    {"email": "alice@example.com", "password": "alice123"},
    {"email": "bob@example.com", "password": "bob456"},
]

print("⚠️  PLAIN TEXT DATABASE BREACH")
print("All passwords immediately exposed!")
for user in users_plaintext:
    print(f"  {user['email']}: {user['password']}")

# Now with hashing:
users_hashed = [
    {"email": "alice@example.com", "hash": "$argon2id$v=19$..."},
    {"email": "bob@example.com", "hash": "$argon2id$v=19$..."},
]

print("\n✅ HASHED DATABASE BREACH")
print("Attackers have hashes, but cannot recover passwords!")
for user in users_hashed:
    print(f"  {user['email']}: {user['hash'][:50]}...")

## Part 2: Hashing vs Encryption

| Aspect | Password Hashing | Encryption |
|--------|------------------|------------|
| Reversible | ❌ No | ✅ Yes (with key) |
| Use Case | Storing passwords | Protecting data at rest |
| Speed | Slow (intentional) | Fast |
| Examples | Argon2, bcrypt | AES, ChaCha20 |
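
The "Speed" row is easier to appreciate with numbers. The sketch below times a single SHA-256 call against a deliberately slow key-derivation function from the standard library. PBKDF2 is only a stand-in for Argon2 here, and the `time_call` helper and the iteration count of 200,000 are illustrative assumptions rather than recommendations.

```python
import hashlib
import os
import time


def time_call(fn, repeats: int) -> float:
    """Return the average wall-clock time of fn() in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000 / repeats


password = b"MyPassword123!"
salt = os.urandom(16)

# Fast general-purpose hash: a single SHA-256 call.
fast_ms = time_call(lambda: hashlib.sha256(salt + password).digest(), repeats=1000)

# Deliberately slow key-derivation function (stand-in for Argon2 in this sketch).
# 200_000 iterations is an illustrative value, not a recommendation.
slow_ms = time_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000), repeats=3
)

print(f"SHA-256: {fast_ms:.4f} ms per hash")
print(f"PBKDF2:  {slow_ms:.1f} ms per hash")
```

The exact figures depend on the machine, but the gap is the point: an attacker guessing passwords pays the slow cost on every attempt.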

In [None]:
# Demonstrate one-way hashing
import hashlib

def simple_hash(text: str) -> str:
    """Simple hash for demonstration (NOT FOR PRODUCTION!)"""
    return hashlib.sha256(text.encode()).hexdigest()

password = "MyPassword123!"
hashed = simple_hash(password)

print(f"Password: {password}")
print(f"Hash: {hashed}")
print("\nCan we reverse it? No: hashing is a one-way function.")

## Part 3: Salt - The Rainbow Table Killer

**What is Salt?**
- Random data added to password before hashing
- Stored WITH the hash
- Never reused

**Why it matters:**
- Defeats pre-computed rainbow tables
- Same password = different hash (with different salts)
- Forces attacker to compute hash for EACH salt
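
Because the salt is stored with the hash, verification simply recomputes the hash with the stored salt. The following is a minimal sketch of that round trip. The `salt_hex$hash_hex` record layout and the names `make_record` and `check_record` are assumptions made for illustration, and salted SHA-256 is used only to keep the demo in the standard library (it is still too fast for real password storage).

```python
import hashlib
import hmac
import os


def make_record(password: str) -> str:
    """Return 'salt_hex$hash_hex' using a fresh random salt (demo only)."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return f"{salt.hex()}${digest}"


def check_record(password: str, record: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, stored_digest = record.split("$")
    salt = bytes.fromhex(salt_hex)
    candidate = hashlib.sha256(salt + password.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)


record_a = make_record("samepassword")
record_b = make_record("samepassword")

print(f"Records differ: {record_a != record_b}")
print(f"Correct password verifies: {check_record('samepassword', record_a)}")
print(f"Wrong password verifies: {check_record('wrongpassword', record_a)}")
```

Argon2 libraries handle this bookkeeping themselves by embedding the salt in the encoded hash string.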

In [None]:
# Demonstrate salt effect
import hashlib
import os

def hash_with_salt(password: str, salt: bytes) -> str:
    return hashlib.sha256(password.encode() + salt).hexdigest()

password = "samepassword"

# Two different salts for same password
salt1 = os.urandom(16)
salt2 = os.urandom(16)

hash1 = hash_with_salt(password, salt1)
hash2 = hash_with_salt(password, salt2)

print("Same password, different salts:")
print(f"Password: {password}")
print(f"Salt 1: {salt1.hex()}")
print(f"Hash 1: {hash1}")
print(f"\nSalt 2: {salt2.hex()}")
print(f"Hash 2: {hash2}")
print("\nHashes are different! (salt prevents rainbow tables)")

## Part 4: Argon2 - Memory-Hard Hashing

**Why Argon2?**
- Winner of Password Hashing Competition 2015
- Memory-hard: Requires lots of RAM (expensive for GPUs)
- Tunable: Adjust memory and time cost
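
The tuning parameters travel inside the encoded hash string, so they can be read back later. Below is a small sketch that splits such a string into labelled fields using only the standard library. The function name `parse_argon2_hash` is hypothetical, and the salt and hash segments in the example string are placeholders, not output from a real hasher.

```python
def parse_argon2_hash(encoded: str) -> dict:
    """Split an Argon2 encoded hash string into labelled fields."""
    # The leading '$' produces an empty first element, which is discarded.
    _, variant, version, params, salt_b64, hash_b64 = encoded.split("$")
    costs = dict(item.split("=") for item in params.split(","))
    return {
        "variant": variant,
        "version": int(version.split("=")[1]),
        "memory_kib": int(costs["m"]),
        "time_cost": int(costs["t"]),
        "parallelism": int(costs["p"]),
        "salt_b64": salt_b64,
        "hash_b64": hash_b64,
    }


# Placeholder salt and hash segments, for illustration only.
example = "$argon2id$v=19$m=65536,t=3,p=4$c2FsdHNhbHRzYWx0$aGFzaGhhc2hoYXNo"
fields = parse_argon2_hash(example)

print(f"Variant: {fields['variant']}")
print(f"Memory: {fields['memory_kib'] // 1024} MiB")
print(f"Time cost: {fields['time_cost']}, parallelism: {fields['parallelism']}")
```

Reading the stored parameters this way is also how an application can tell that an old hash was created with weaker settings than the current ones.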

**Attack resistance:**
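- GPU attacks: Each hash needs 64 MiB of memory at the parameters used below, which limits how many guesses can run in parallel
- ASIC attacks: The memory requirement makes custom hardware far more expensive
- Rainbow tables: Defeated by salt
- Brute force: Each guess costs hundreds of milliseconds at the target settings (200-500ms per hash)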
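
A rough back-of-the-envelope calculation shows how memory cost caps parallel guessing. Every number below is an assumption chosen for illustration: a hypothetical attacker device with 24 GiB of memory, 0.3 seconds per guess, and a password of 8 random lowercase letters. Real hardware will behave differently.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

memory_per_hash = 64 * MIB   # m=65536 KiB
device_memory = 24 * GIB     # hypothetical attacker device (assumption)
seconds_per_hash = 0.3       # assumed cost of one guess (assumption)

# Memory, not compute, bounds how many guesses fit on the device at once.
concurrent_guesses = device_memory // memory_per_hash
guesses_per_second = concurrent_guesses / seconds_per_hash

# Search space for 8 random lowercase letters (assumption).
keyspace = 26 ** 8
days_to_exhaust = keyspace / guesses_per_second / 86_400

print(f"Concurrent guesses: {concurrent_guesses}")
print(f"Guesses per second: {guesses_per_second:.0f}")
print(f"Days to try every candidate: {days_to_exhaust:.0f}")
```

Raising `memory_per_hash` lowers `concurrent_guesses` proportionally, which is the practical meaning of "memory-hard".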

In [None]:
# Install argon2-cffi
!pip install argon2-cffi -q

In [None]:
# Use our Argon2PasswordHasher
import sys
sys.path.append('..')

from src.adapters.security.password_hasher import hash_password, verify_password

# Hash a password
password = "MySecurePassword123!"
hashed = hash_password(password)

print("=== Argon2 Hashing Demo ===")
print(f"\nPassword: {password}")
print(f"Hash: {hashed}")
print(f"\nHash format: $argon2id$v=19$m=65536,t=3,p=4$salt$hash")
print(f"  v=19: Algorithm version")
print(f"  m=65536: Memory cost (64MB)")
print(f"  t=3: Time cost (iterations)")
print(f"  p=4: Parallelism (threads)")

In [None]:
# Verify passwords
print("\n=== Password Verification ===")

# Correct password
correct_result = verify_password("MySecurePassword123!", hashed)
print(f"Correct password: {correct_result}")

# Incorrect password
wrong_result = verify_password("WrongPassword123!", hashed)
print(f"Wrong password: {wrong_result}")

In [None]:
# Benchmark different memory costs
import time
from argon2 import PasswordHasher, Type

passwords = ["password123!", "AnotherPass456!", "ThirdPass789!"]

for mem_cost in [16384, 65536, 262144]:  # memory_cost is in KiB: 16 MiB, 64 MiB, 256 MiB
    hasher = PasswordHasher(
        memory_cost=mem_cost,
        time_cost=3,
        parallelism=4,
        hash_len=32,
        salt_len=16,
        type=Type.ID,  # argon2-cffi expects the Type enum, not a string
    )

    # perf_counter is a monotonic, high-resolution clock suited to benchmarks
    start = time.perf_counter()
    for pwd in passwords:
        hasher.hash(pwd)
    total_time = (time.perf_counter() - start) * 1000  # ms
    
    print(f"\nMemory: {mem_cost//1024}MB")
    print(f"Total time for {len(passwords)} hashes: {total_time:.1f}ms")
    print(f"Per hash: {total_time/len(passwords):.1f}ms")
    
    if total_time/len(passwords) < 100:
        print("⚠️  Too fast! Increase memory_cost or time_cost")
    elif total_time/len(passwords) > 1000:
        print("⚠️  Too slow! Decrease memory_cost or time_cost")
    else:
        print("✅ Good balance!")

## Part 5: Security Best Practices

**DO:**
- ✅ Hash passwords before storing
- ✅ Use Argon2id with proper parameters
- ✅ Generate unique salt for each password
- ✅ Use constant-time comparison
- ✅ Rate limit login attempts

**DON'T:**
- ❌ Store passwords in plain text
- ❌ Use fast general-purpose hashes (MD5, SHA-1, SHA-256) on their own
- ❌ Log passwords (hashed or plain)
- ❌ Use custom hash implementations
- ❌ Reuse salts

**Common Vulnerabilities:**
1. Weak password requirements
2. Insufficient hashing parameters
3. Timing attacks (non-constant comparison)
4. Passwords in error messages
5. Missing rate limiting
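
Rate limiting is the one item above that hashing alone cannot provide. The following is a minimal in-memory sketch of a sliding-window limiter keyed by email address. The class name `LoginRateLimiter` and its defaults are hypothetical, and a real deployment would more likely keep this state in a shared store so that it survives restarts and works across processes.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class LoginRateLimiter:
    """Allow at most max_attempts per key within window_seconds (in-memory sketch)."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Record an attempt and return True if it is within the limit."""
        now = time.monotonic() if now is None else now
        attempts = self._attempts[key]
        # Drop timestamps that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


limiter = LoginRateLimiter(max_attempts=3, window_seconds=60.0)
# Explicit timestamps make the demo deterministic.
results = [limiter.allow("alice@example.com", now=t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
```

The fourth attempt is rejected because three attempts already sit inside the 60-second window. By the fifth, the two oldest have expired.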

In [None]:
# Exercise: Implement password strength checker
import re

def check_password_strength(password: str) -> dict:
    """Check password strength and return recommendations."""
    issues = []
    
    # Length check
    if len(password) < 12:
        issues.append("At least 12 characters")
    
    # Uppercase
    if not re.search(r'[A-Z]', password):
        issues.append("At least one uppercase letter")
    
    # Lowercase
    if not re.search(r'[a-z]', password):
        issues.append("At least one lowercase letter")
    
    # Number
    if not re.search(r'[0-9]', password):
        issues.append("At least one number")
    
    # Special character
    if not re.search(r'[^A-Za-z0-9]', password):
        issues.append("At least one special character")
    
    # Common passwords
    common = ["password", "123456", "qwerty", "admin"]
    if password.lower() in common:
        issues.append("Not a common password")
    
    # An empty issues list means the password passed every check
    return {"valid": not issues, "issues": issues}

# Test
test_passwords = [
    "weakpass",
    "StrongPass123!",
    "P@ssw0rd2024"
]

for pwd in test_passwords:
    result = check_password_strength(pwd)
    print(f"\nPassword: {pwd}")
    print(f"Valid: {result['valid']}")
    if result['issues']:
        print(f"Issues: {', '.join(result['issues'])}")

## Part 6: Production Checklist

**Before deploying password hashing:**

- [ ] Use Argon2id (not Argon2d) for passwords
- [ ] Set memory_cost >= 65536 (the value is in KiB, so 64 MiB)
- [ ] Set time_cost >= 3
- [ ] Benchmark on production hardware
- [ ] Aim for 200-500ms per hash
- [ ] Implement rate limiting on login
- [ ] Log verification failures (not passwords)
- [ ] Plan for parameter updates (rehashing)
- [ ] Store pepper separately (optional but recommended)
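
One common way to apply a pepper is to key an HMAC over the password before handing the result to the password hasher. The sketch below shows only that pre-hashing step. The helper name `apply_pepper` and the hard-coded demo value are assumptions for illustration. In practice the pepper would be loaded from a secrets manager or environment variable, never from source code or the database.

```python
import hashlib
import hmac


def apply_pepper(password: str, pepper: bytes) -> str:
    """Return the hex HMAC-SHA256 of the password, keyed with the pepper."""
    return hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical demo value: a real pepper lives outside the code and the database.
demo_pepper = b"demo-only-pepper-not-for-production"

peppered = apply_pepper("MySecurePassword123!", demo_pepper)

# The peppered string, not the raw password, would then go to the hasher,
# for example hash_password(peppered) at registration and
# verify_password(peppered, stored_hash) at login.
print(f"Peppered value length: {len(peppered)} hex characters")
```

Since the pepper is not stored with the hashes, a database dump alone is not enough to start guessing passwords. The trade-off is that rotating the pepper requires users to log in again or reset their passwords.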

**Monitoring metrics:**
- Hash computation time (detect slow/fast anomalies)
- Failed verification attempts (detect attacks)
- needs_rehash calls (detect outdated parameters)
- Password strength distribution (analyze user behavior)

## Summary

**Key Takeaways:**
1. Passwords must ALWAYS be hashed (never plain text)
2. Argon2id is a widely recommended default for new systems (memory-hard, tunable)
3. Salt prevents rainbow table attacks
4. Constant-time comparison prevents timing attacks
5. Benchmark your parameters (aim for 200-500ms per hash)

**Next Steps:**
- JWT Authentication (Lesson 2)
- User Registration Flow (Lesson 3)
- Rate Limiting (Lesson 6)

**Further Reading:**
- OWASP Password Storage Cheat Sheet
- Password Hashing Competition (PHC)
- RFC 9106: Argon2 Memory-Hard Function for Password Hashing and Proof-of-Work Applications