# Break: ECB Mode Pattern Leakage

**Module 03** | Breaking Weak Parameters

*Identical plaintext blocks produce identical ciphertext blocks. This is catastrophic.*

## Why This Matters

A block cipher like AES encrypts **fixed-size blocks** (16 bytes each). But real messages
are longer than one block. A **mode of operation** defines how to apply the block cipher
to multi-block messages.

The simplest mode is **ECB (Electronic Codebook)**: encrypt each block independently
with the same key. This sounds reasonable, but it has a fatal flaw:

$$\text{If } P_i = P_j \text{, then } C_i = C_j$$

Identical plaintext blocks produce identical ciphertext blocks. Any **pattern** in the
plaintext is preserved in the ciphertext, even though the individual block values are
scrambled. An attacker learns the structure of your message without decrypting a single byte.
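
To see the property at real block size, the sketch below splits a message into 16-byte blocks and encrypts each one independently. It is plain Python, and `toy_block_encrypt` is a hypothetical stand-in for $E_K$ (HMAC-SHA256 truncated to 16 bytes), not AES. It is deterministic under a fixed key, which is all the leak needs, but it is not invertible. The key and message are illustrative values.

```python
import hashlib
import hmac

BLOCK_SIZE = 16  # bytes, the same block size as AES

def toy_block_encrypt(key, block):
    """Hypothetical stand-in for E_K: deterministic under a fixed key, NOT real AES."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_SIZE]

def ecb_encrypt_blocks(key, message):
    """Split message into 16-byte blocks and encrypt each one independently."""
    assert len(message) % BLOCK_SIZE == 0, "sketch assumes no padding is needed"
    blocks = [message[i:i + BLOCK_SIZE] for i in range(0, len(message), BLOCK_SIZE)]
    return [toy_block_encrypt(key, blk) for blk in blocks]

key = b"demo key"  # illustrative value only
message = b"ATTACK AT DAWN!!" + b"RETREAT AT DUSK!" + b"ATTACK AT DAWN!!"
ct_blocks = ecb_encrypt_blocks(key, message)

for i, blk in enumerate(ct_blocks):
    print(i, blk.hex())
print("blocks 0 and 2 equal:", ct_blocks[0] == ct_blocks[2])
print("blocks 0 and 1 equal:", ct_blocks[0] == ct_blocks[1])
```

Blocks 0 and 2 carry the same plaintext, so their ciphertext blocks match, and an observer can see that without the key.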

## The Scenario

We'll use a toy block cipher on **8-bit blocks** (1-byte blocks) to make the patterns
visible. The cipher is the AES S-box itself: a fixed, unkeyed bijection on single bytes,
standing in for $E_K$ under one fixed key. It provides good confusion, but ECB mode adds
no diffusion across blocks.

We'll encrypt structured messages and see how the structure leaks through.

In [None]:
# === Setup: Build the AES S-box as our toy block cipher ===
R.<x> = GF(2)[]
F.<a> = GF(2^8, modulus=x^8 + x^4 + x^3 + x + 1)

def byte_to_gf(b):
    return sum(GF(2)((b >> i) & 1) * a^i for i in range(8))

def gf_to_byte(elem):
    p = elem.polynomial()
    return sum(int(p[i]) << i for i in range(8))

# Build S-box
A_mat = matrix(GF(2), [
    [1,0,0,0,1,1,1,1],[1,1,0,0,0,1,1,1],[1,1,1,0,0,0,1,1],[1,1,1,1,0,0,0,1],
    [1,1,1,1,1,0,0,0],[0,1,1,1,1,1,0,0],[0,0,1,1,1,1,1,0],[0,0,0,1,1,1,1,1]
])
c_vec = vector(GF(2), [(0x63 >> i) & 1 for i in range(8)])

SBOX = [0] * 256
INV_SBOX = [0] * 256
for b in range(256):
    if b == 0:
        inv_bits = vector(GF(2), [0]*8)
    else:
        inv_byte = gf_to_byte(byte_to_gf(b)^(-1))
        inv_bits = vector(GF(2), [(inv_byte >> i) & 1 for i in range(8)])
    result_bits = A_mat * inv_bits + c_vec
    SBOX[b] = sum(int(result_bits[i]) << i for i in range(8))
    INV_SBOX[SBOX[b]] = b

def encrypt_block(b):
    """Toy block cipher: encrypt one byte using the AES S-box."""
    return SBOX[b]

def decrypt_block(b):
    """Toy block cipher: decrypt one byte."""
    return INV_SBOX[b]

print('Toy block cipher ready (AES S-box on 8-bit blocks).')
print(f'Example: encrypt(0x41) = 0x{encrypt_block(0x41):02X}')
print(f'         decrypt(0x{encrypt_block(0x41):02X}) = 0x{decrypt_block(encrypt_block(0x41)):02X}')

## Step 1: ECB Mode Encryption

In ECB mode, we encrypt each block independently:

$$C_i = E_K(P_i)$$

No chaining, no IV, no interaction between blocks. Each block is a standalone encryption.

In [None]:
def ecb_encrypt(plaintext_bytes):
    """Encrypt a list of bytes in ECB mode."""
    return [encrypt_block(b) for b in plaintext_bytes]

def ecb_decrypt(ciphertext_bytes):
    """Decrypt a list of bytes in ECB mode."""
    return [decrypt_block(b) for b in ciphertext_bytes]

# Encrypt a message with repeating structure
message = 'AAAA BBBB AAAA CCCC AAAA BBBB'
plaintext = [ord(c) for c in message]
ciphertext = ecb_encrypt(plaintext)

print(f'Plaintext:  {message}')
print(f'PT bytes:   {" ".join(f"{b:02X}" for b in plaintext)}')
print(f'CT bytes:   {" ".join(f"{b:02X}" for b in ciphertext)}')
print()

# Highlight the pattern preservation
print('Pattern analysis:')
print(f'  PT "A" (0x41) always encrypts to 0x{encrypt_block(0x41):02X}')
print(f'  PT "B" (0x42) always encrypts to 0x{encrypt_block(0x42):02X}')
print(f'  PT " " (0x20) always encrypts to 0x{encrypt_block(0x20):02X}')
print()
print('The ciphertext has the SAME repetition structure as the plaintext!')

In [None]:
# Visualize with a larger structured message: a simple 32x32 "image"
# Create a toy grayscale image with clear structure

# Build a 32x32 image from four 8-row bands: stripes, solid gray, checkerboard, gradient
width, height = 32, 32
image = []
for row in range(height):
    for col in range(width):
        if row < 8:
            # Top band: alternating light/dark columns
            image.append(0x20 if col % 4 < 2 else 0xE0)
        elif row < 16:
            # Middle-upper band: solid medium gray
            image.append(0x80)
        elif row < 24:
            # Middle-lower band: checkerboard
            image.append(0x40 if (row + col) % 2 == 0 else 0xC0)
        else:
            # Bottom band: gradient
            image.append((col * 8) % 256)

# Encrypt in ECB mode
ecb_image = ecb_encrypt(image)

print(f'Image size: {width}x{height} = {len(image)} bytes')
print(f'Unique plaintext values:  {len(set(image))}')
print(f'Unique ciphertext values: {len(set(ecb_image))}')
print()
print('Plaintext image (hex, first 8 rows):')
for row in range(8):
    print(' '.join(f'{image[row*width+col]:02X}' for col in range(min(16, width))))
print()
print('ECB-encrypted image (hex, first 8 rows):')
for row in range(8):
    print(' '.join(f'{ecb_image[row*width+col]:02X}' for col in range(min(16, width))))

## Step 2: Visualize the Pattern Leakage

Even though individual byte values are different (the S-box scrambled them), the
**pattern structure** is perfectly preserved. Let's visualize this with a histogram
and a structural comparison.
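
The histogram keeps its shape because $E_K$ is a bijection on blocks. As a short derivation in the notation above, for any ciphertext value $c$:

$$\#\{i : C_i = c\} = \#\{i : E_K(P_i) = c\} = \#\{i : P_i = E_K^{-1}(c)\}$$

So the count of each ciphertext value equals the count of exactly one plaintext value: the bars are relabeled, never merged or split. The same argument gives $C_i = C_j \iff P_i = P_j$, so the equality pattern carries over in both directions.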

In [None]:
# Block frequency analysis: does the ciphertext reveal structure?
from collections import Counter

pt_counts = Counter(image)
ct_counts = Counter(ecb_image)

print('=== Block Frequency Analysis ===')
print()
print('Plaintext byte frequencies (top 10):')
for val, count in pt_counts.most_common(10):
    bar = '#' * (count // 4)
    print(f'  0x{val:02X}: {count:3d}  {bar}')
print()

print('ECB ciphertext byte frequencies (top 10):')
for val, count in ct_counts.most_common(10):
    bar = '#' * (count // 4)
    print(f'  0x{val:02X}: {count:3d}  {bar}')
print()

print('Observation: the FREQUENCY DISTRIBUTION is identical!')
print('The S-box just relabels the bars --- their heights don\'t change.')
print()

# Verify: sorted frequency lists should match
pt_freqs = sorted(pt_counts.values(), reverse=True)
ct_freqs = sorted(ct_counts.values(), reverse=True)
print(f'Sorted frequency lists match: {pt_freqs == ct_freqs}')

In [None]:
# Structural leakage: detect repeating blocks
print('=== Detecting Repeating Blocks ===')
print()

# An attacker doesn't know what the bytes mean, but can detect repetitions
def detect_patterns(data, block_size=1):
    """Count repeated blocks; return (repeat_count, unique_block_count)."""
    seen = {}
    repeats = 0
    for i in range(0, len(data), block_size):
        block = tuple(data[i:i+block_size])
        if block in seen:
            repeats += 1
        else:
            seen[block] = i
    return repeats, len(seen)

pt_reps, pt_unique = detect_patterns(image)
ct_reps, ct_unique = detect_patterns(ecb_image)

print(f'Plaintext:  {pt_unique} unique blocks, {pt_reps} repeated positions')
print(f'Ciphertext: {ct_unique} unique blocks, {ct_reps} repeated positions')
print()
print(f'The ciphertext has exactly the same repetition count as the plaintext.')
print(f'An attacker can recover the STRUCTURE of the plaintext from ECB ciphertext.')
print()

# Demonstrate: attacker can tell which blocks are equal
print('Attacker\'s view (block equality map):')
print('Encoding each unique ciphertext block as a letter...')
block_to_label = {}
label_idx = 0
labels = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop'
for b in ecb_image:
    if b not in block_to_label:
        block_to_label[b] = labels[label_idx % len(labels)]
        label_idx += 1

print('First 4 rows of the image (32 bytes each), labeled by block identity:')
for row in range(4):
    row_data = ecb_image[row*width:(row+1)*width]
    print('  ' + ''.join(block_to_label[b] for b in row_data))
print()
print('Clear repeating patterns visible, even without knowing the key!')

## Step 3: Compare with CBC Mode

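In **CBC (Cipher Block Chaining)** mode, each plaintext block is XORed with the previous
ciphertext block before encryption, and the IV plays the role of the previous block for
the first one:

$$C_i = E_K(P_i \oplus C_{i-1}) \text{ for } i \ge 1, \quad C_0 = \text{IV}$$

Because of the chaining, identical plaintext blocks produce **different** ciphertext blocks
unless the preceding ciphertext blocks also coincide. For 128-bit blocks that is
astronomically unlikely; in our 8-bit toy, chance collisions are common because only 256
ciphertext values exist.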
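
Decryption reverses the two steps: $P_i = D_K(C_i) \oplus C_{i-1}$, with the IV standing in for the missing previous block at the start. A minimal round-trip sketch in plain Python follows. The byte permutation `PERM` is a hypothetical stand-in for the toy cipher (a seeded shuffle, not the AES S-box), and the function names are illustrative. It uses `operator.xor` rather than `^` so it behaves the same under the Sage preparser, which treats `^` as exponentiation.

```python
from operator import xor
from random import Random

# Hypothetical stand-in for the toy cipher: a fixed, seeded permutation of 0..255.
PERM = list(range(256))
Random(2024).shuffle(PERM)
INV_PERM = [0] * 256
for plain_byte, cipher_byte in enumerate(PERM):
    INV_PERM[cipher_byte] = plain_byte

def cbc_encrypt_bytes(plaintext, iv):
    """CBC with 1-byte blocks: XOR with the previous ciphertext byte, then encrypt."""
    out, prev = [], iv
    for p in plaintext:
        prev = PERM[xor(p, prev)]
        out.append(prev)
    return out

def cbc_decrypt_bytes(ciphertext, iv):
    """Invert CBC: decrypt the byte, then XOR with the previous ciphertext byte."""
    out, prev = [], iv
    for c in ciphertext:
        out.append(xor(INV_PERM[c], prev))
        prev = c
    return out

plaintext = [0x41] * 8  # eight identical plaintext blocks
iv = 0x5A               # illustrative value; a real IV must be unpredictable
ciphertext = cbc_encrypt_bytes(plaintext, iv)

print("CT:", " ".join(f"{c:02X}" for c in ciphertext))
print("round trip ok:", cbc_decrypt_bytes(ciphertext, iv) == plaintext)
```

Under ECB the eight identical plaintext bytes would all encrypt to the same byte. Here each ciphertext byte feeds into the next encryption, so the printed bytes typically vary.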

In [None]:
set_random_seed(42)  # reproducible (Sage's randint is seeded via set_random_seed)

def cbc_encrypt(plaintext_bytes, iv):
    """Encrypt a list of bytes in CBC mode."""
    ciphertext = []
    prev = iv
    for b in plaintext_bytes:
        # XOR with previous ciphertext block, then encrypt
        encrypted = encrypt_block(b ^^ prev)
        ciphertext.append(encrypted)
        prev = encrypted
    return ciphertext

# Encrypt the same structured image with CBC
iv = randint(0, 255)
cbc_image = cbc_encrypt(image, iv)

print('=== ECB vs CBC Comparison ===')
print()
print(f'Same plaintext image ({width}x{height}), two modes:')
print()

# Frequency analysis
cbc_counts = Counter(cbc_image)

print(f'ECB unique ciphertext values: {len(ct_counts)}')
print(f'CBC unique ciphertext values: {len(cbc_counts)}')
print()

print('ECB ciphertext (first 4 rows):')
for row in range(4):
    print(' '.join(f'{ecb_image[row*width+col]:02X}' for col in range(min(16, width))))
print()

print('CBC ciphertext (first 4 rows):')
for row in range(4):
    print(' '.join(f'{cbc_image[row*width+col]:02X}' for col in range(min(16, width))))
print()

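# Repetition comparison
# Note: with 1-byte blocks only 256 ciphertext values exist, so a 1024-byte CBC
# ciphertext must repeat values. What matters is that the repeats no longer
# line up with the plaintext structure.
cbc_reps, cbc_unique = detect_patterns(cbc_image)
print(f'Repeated block positions: ECB = {ct_reps}, CBC = {cbc_reps}')
print('CBC breaks the pattern: equal plaintext blocks no longer map to equal ciphertext blocks.')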

In [None]:
# Quantify the leak with a simple pairwise agreement rate: for each pair of
# positions (i, j), does "ciphertext blocks equal" match "plaintext blocks equal"?

def block_equality_vector(data):
    """For each pair (i,j), record whether data[i] == data[j]."""
    n = len(data)
    equalities = []
    for i in range(min(n, 200)):  # sample to keep tractable
        for j in range(i+1, min(n, 200)):
            equalities.append(1 if data[i] == data[j] else 0)
    return equalities

pt_eq = block_equality_vector(image)
ecb_eq = block_equality_vector(ecb_image)
cbc_eq = block_equality_vector(cbc_image)

# Correlation: do equal-plaintext pairs correspond to equal-ciphertext pairs?
ecb_match = sum(1 for a, b in zip(pt_eq, ecb_eq) if a == b)
cbc_match = sum(1 for a, b in zip(pt_eq, cbc_eq) if a == b)
total = len(pt_eq)

print('=== Pattern Correlation ===')
print()
print(f'Do equal plaintext blocks produce equal ciphertext blocks?')
print(f'  ECB: {ecb_match}/{total} pairs match ({100*ecb_match/total:.1f}%)')
print(f'  CBC: {cbc_match}/{total} pairs match ({100*cbc_match/total:.1f}%)')
print()
print('ECB: 100% agreement = complete pattern leakage.')
print('CBC: agreement is about the share of unequal plaintext pairs (~50% here),')
print('     because equal plaintext pairs rarely give equal ciphertext.')

## The Fix: Chained Modes of Operation

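Never use ECB for multi-block messages. Use a mode in which each ciphertext block depends on more than its own plaintext block: CBC chains in the previous ciphertext block, while CTR and GCM mix in a nonce and a block counter.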

| Mode | How it works | Advantage |
|------|-------------|----------|
| **CBC** | $C_i = E_K(P_i \oplus C_{i-1})$ | Hides patterns, widely deployed |
| **CTR** | $C_i = P_i \oplus E_K(\text{nonce} \| i)$ | Parallelizable, random access |
| **GCM** | CTR + GHASH authentication | Encryption + integrity (gold standard) |

With a fresh IV or nonce for every message, all of these make identical plaintext blocks encrypt to different ciphertext blocks, except with negligible probability.

**AES-GCM** is the standard choice in TLS 1.3, and we'll explore it in the Connect
notebook on AES-GCM authenticated encryption.

## Exercises

### Exercise 1

Encrypt the string `'HELLO HELLO HELLO HELLO HELLO'` in both ECB and CBC mode.
How many repeated ciphertext blocks does each mode produce?

### Exercise 2

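Implement **CTR (Counter) mode**. With 1-byte blocks there is no room to concatenate a
nonce and a counter, so add them instead: $C_i = P_i \oplus E_K((\text{nonce} + i) \bmod 256)$.
Encrypt the same structured image. Does it hide patterns like CBC?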

### Exercise 3

In CBC mode, what happens if you reuse the same IV for two different messages
that share the same first block? What does the attacker learn from
$C_1 \oplus C_1'$ where $C_1 = E_K(P_1 \oplus \text{IV})$ and
$C_1' = E_K(P_1' \oplus \text{IV})$?

In [None]:
# Exercise space

# Exercise 1: Encrypt and compare
msg = [ord(c) for c in 'HELLO HELLO HELLO HELLO HELLO']
ecb_msg = ecb_encrypt(msg)
cbc_msg = cbc_encrypt(msg, randint(0, 255))

ecb_reps_msg, _ = detect_patterns(ecb_msg)
cbc_reps_msg, _ = detect_patterns(cbc_msg)
print(f'ECB repeated blocks: {ecb_reps_msg}')
print(f'CBC repeated blocks: {cbc_reps_msg}')

# Exercise 2: Implement CTR mode
# TODO: def ctr_encrypt(plaintext_bytes, nonce): ...

# Exercise 3: CBC IV reuse analysis
# TODO

## Summary

| Property | ECB | CBC / CTR / GCM |
|----------|-----|------------------|
| Equal plaintext blocks → equal ciphertext? | **Yes** (fatal) | No |
| Pattern leakage | Complete | None, given a fresh IV or nonce |
| Block independence | Each block isolated | Each block tied to the previous ciphertext block (CBC) or to a nonce and counter (CTR, GCM) |
| Safe for multi-block messages? | **No** | Yes |

**Key takeaways:**
- ECB mode encrypts blocks independently, so **patterns in the plaintext are perfectly
  preserved** in the ciphertext.
- An attacker can detect which plaintext blocks are equal without knowing the key.
- Modes such as CBC, CTR, and GCM break this by making each ciphertext block depend on
  more than just its plaintext block: the previous ciphertext block, or a nonce and counter.
- This is why **ECB should never be used** for messages longer than one block.
- The underlying block cipher (AES) is perfectly fine --- the weakness is entirely in the mode.
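
The second takeaway can be turned into a simple detector. The sketch below, in plain Python, counts repeated 16-byte blocks in a ciphertext. The function name and the sample byte strings are illustrative assumptions, not part of the notebook. Repeats among 16-byte blocks are a strong hint of ECB, since under a randomizing mode a collision between two 128-bit blocks is extremely unlikely.

```python
from collections import Counter

def count_repeated_blocks(ciphertext, block_size=16):
    """Return how many blocks duplicate an earlier block in the ciphertext."""
    blocks = [ciphertext[i:i + block_size]
              for i in range(0, len(ciphertext), block_size)]
    counts = Counter(blocks)
    return sum(n - 1 for n in counts.values())

# Illustrative data: made-up byte strings standing in for captured ciphertexts.
block_a = bytes(range(16))
block_b = bytes(range(16, 32))
ecb_like = block_a + block_b + block_a + block_a   # block_a appears three times
no_repeats = bytes(range(64))                      # four distinct blocks

print(count_repeated_blocks(ecb_like))    # 2
print(count_repeated_blocks(no_repeats))  # 0
```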

---

*Back to [Module 03: Galois Fields and AES](../README.md)*