# Practical Activity 1: Classical Cryptography and Perfect Encryption

## Objectives
- Implement several **classical ciphers** using Python.  
- Analyse their **security properties**.  
- Understand the concept of **perfect secrecy**.  
- Solve related **problems and questions**.  

## 1. Introduction

Classical cryptography refers to the historical methods of encrypting messages before the rise of modern algorithms.  
Examples include the **Caesar cipher**, the **Affine cipher**, and the **Vigenère cipher**.  

While these systems were groundbreaking for their time, they are **cryptographically weak** by modern standards.  

The **One-Time Pad (OTP)**, however, is a special cipher that provides **perfect secrecy** (in Shannon’s sense) but has practical limitations due to key distribution.  


## 2. Classical Ciphers Implementation

Our code needs a little modular arithmetic. Concretely, we need to compute the inverse of a number in $\mathbb{Z}_m$, which is done as follows:

In [1]:
def modinv(a, m):
    """
    Compute the modular inverse of a modulo m.
    Returns x such that (a * x) % m == 1, or None if no inverse exists.
    """
    # Extended Euclidean Algorithm
    def egcd(x, y):
        if y == 0:
            return x, 1, 0
        g, u, v = egcd(y, x % y)
        return g, v, u - (x // y) * v

    g, x, _ = egcd(a, m)
    if g != 1:
        return None  # inverse does not exist
    else:
        return x % m


<strong> Explanation: </strong> 
* Given integers $a,b$, the <strong>Extended Euclidean Algorithm (EEA)</strong> computes integers $u,v$ and $g$ such that $au+bv=g$, where $g$ is the greatest common divisor of $a$ and $b$.
* Given integers $a,m$, $a\equiv 1 \pmod m$ means that $a=mx+1$ for some integer $x$. Hence $ab\equiv 1 \pmod m$ iff $ab=my+1$ for some integer $y$, that is, $ab-my=1$. Such a $b$ exists iff the greatest common divisor of $a$ and $m$ is $1$, and in that case the EEA applied to $a$ and $m$ returns $b$ (the inverse of $a$ modulo $m$) as the coefficient of $a$.
* We need modular inverses to decrypt the affine cipher.
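As a quick sanity check of the explanation above, the following sketch recomputes the inverse with an iterative version of the EEA and compares it with Python's built-in `pow(a, -1, m)` (available from Python 3.8). The name `modinv_iter` is introduced here only for illustration; it is not part of the activity's code.

```python
def modinv_iter(a, m):
    """Iterative extended Euclid: return a^{-1} mod m, or None if gcd(a, m) != 1."""
    old_r, r = a % m, m      # remainders
    old_s, s = 1, 0          # coefficients of a, tracked modulo m
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    # Here old_r == gcd(a, m) and old_s * a ≡ old_r (mod m)
    if old_r != 1:
        return None
    return old_s % m

print(modinv_iter(5, 26))    # 21, since 5 * 21 = 105 = 4 * 26 + 1
print(modinv_iter(13, 26))   # None, since gcd(13, 26) = 13
print(pow(5, -1, 26))        # 21, same result from the built-in
```

Note that `pow(a, -1, m)` raises `ValueError` when no inverse exists, whereas this sketch returns `None`.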

In [2]:
# 2.1 Caesar Cipher
def caesar_encrypt(text, key):
    result = ""
    for char in text.upper():
        if char.isalpha():
            result += chr(((ord(char) - 65 + key) % 26) + 65)
        else:
            result += char
    return result

def caesar_decrypt(cipher, key):
    return caesar_encrypt(cipher, -key)

# Example
plaintext = "Cryptography IS FUN"
cipher = caesar_encrypt(plaintext, 3)
decrypted = caesar_decrypt(cipher, 3)
plaintext, cipher, decrypted

('Cryptography IS FUN', 'FUBSWRJUDSKB LV IXQ', 'CRYPTOGRAPHY IS FUN')

In [3]:
# 2.2 Affine Cipher
def affine_encrypt(text, a, b):
    result = ""
    for char in text.upper():
        if char.isalpha():
            result += chr(((a * (ord(char) - 65) + b) % 26) + 65)
        else:
            result += char
    return result

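def affine_decrypt(cipher, a, b):
    result = ""
    inv_a = modinv(a, 26)
    if inv_a is None:
        # a must be coprime to 26, otherwise encryption cannot be inverted
        raise ValueError("a has no inverse modulo 26")
    for char in cipher.upper():
        if char.isalpha():
            result += chr(((inv_a * ((ord(char) - 65) - b)) % 26) + 65)
        else:
            result += char
    return result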

# Example
cipher = affine_encrypt("HELLO WORLD", 5, 8)
decrypted = affine_decrypt(cipher, 5, 8)
cipher, decrypted

('RCLLA OAPLX', 'HELLO WORLD')
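Since decryption relies on the inverse of `a` modulo 26, only multipliers coprime to 26 give a usable key. The short sketch below lists them, counts the resulting key space, and shows what goes wrong for a non-invertible multiplier. The variable names are illustrative.

```python
from math import gcd

# Multipliers a that are invertible modulo 26 (gcd(a, 26) == 1)
valid_a = [a for a in range(1, 26) if gcd(a, 26) == 1]
print(valid_a)             # [1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25]
print(len(valid_a) * 26)   # 312 possible (a, b) keys

# With a non-invertible multiplier, x -> (a*x + b) mod 26 is not injective
images = {(13 * x + 8) % 26 for x in range(26)}
print(sorted(images))      # [8, 21]: all 26 letters collapse onto 2
```

With only 312 keys, the affine cipher can be brute-forced almost as easily as the Caesar cipher.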

In [4]:
# 2.3 Vigenère Cipher
def vigenere_encrypt(text, key):
    result = ""
    key = key.upper()
    key_index = 0
    for char in text.upper():
        if char.isalpha():
            shift = ord(key[key_index % len(key)]) - 65
            result += chr(((ord(char) - 65 + shift) % 26) + 65)
            key_index += 1
        else:
            result += char
    return result

def vigenere_decrypt(cipher, key):
    result = ""
    key = key.upper()
    key_index = 0
    for char in cipher.upper():
        if char.isalpha():
            shift = ord(key[key_index % len(key)]) - 65
            result += chr(((ord(char) - 65 - shift) % 26) + 65)
            key_index += 1
        else:
            result += char
    return result

# Example
cipher = vigenere_encrypt("ATTACK AT DAWN", "LEMON")
decrypted = vigenere_decrypt(cipher, "LEMON")
cipher, decrypted

('LXFOPV EF RNHR', 'ATTACK AT DAWN')

In [5]:
# 2.4 One-Time Pad (OTP) with XOR
import os

# Imports Python’s os module, which includes os.urandom() to generate cryptographically secure random bytes.
# The os module in Python is part of the standard library. 
# It provides functions for interacting with the operating system (OS).
# Think of it as a bridge between Python code and the system resources 
# (like files, directories, environment variables, random number generation, etc.)


def otp_encrypt(text, key):
    return bytes([ord(c) ^ k for c, k in zip(text, key)])


# zip(text, key) pairs each character of the plaintext with one byte of the key.

# ord(c) converts each character c into its ASCII code.

# ord(c) ^ k applies XOR between the plaintext byte and key byte.

# The result is collected into a list and wrapped in bytes([...]) so the ciphertext is stored as raw bytes.



def otp_decrypt(cipher, key):
    return ''.join([chr(c ^ k) for c, k in zip(cipher, key)])

# Note that encryption and decryption apply the same XOR operation; they differ only in their input and output types.

# Example
plaintext = "HELLO"
key = os.urandom(len(plaintext))  # random key, as long as the message
# os.urandom(n) returns n bytes from the operating system's cryptographically secure random source.
cipher = otp_encrypt(plaintext, key)
decrypted = otp_decrypt(cipher, key)
plaintext, cipher, decrypted

('HELLO', b'\x85\x0b\xbd\x90_', 'HELLO')
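One detail worth noting: because `zip` stops at the shorter input, the functions above silently truncate the message if the key is too short. A minimal sketch of a stricter variant that works on bytes and rejects short keys could look as follows (the name `otp_xor` is hypothetical, chosen for this example).

```python
import os

def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with key; refuse keys shorter than the data."""
    if len(key) < len(data):
        raise ValueError("OTP key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

message = "HELLO".encode("utf-8")
key = os.urandom(len(message))
cipher = otp_xor(message, key)
recovered = otp_xor(cipher, key).decode("utf-8")  # same function decrypts
print(recovered)  # HELLO
```

Working on bytes also avoids the implicit assumption that every character fits in a single byte.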

Let us now display everything in bits, since the OTP works by XORing bits:

In [6]:
def to_bits(data):
    """Convert a bytes or string object into a string of bits."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # convert string to bytes
    return ''.join(f"{byte:08b}" for byte in data)

def otp_encrypt(text, key):
    """Encrypt plaintext (string) with key (bytes)."""
    return bytes([ord(c) ^ k for c, k in zip(text, key)])

def otp_decrypt(cipher, key):
    """Decrypt ciphertext (bytes) with key (bytes)."""
    return ''.join([chr(c ^ k) for c, k in zip(cipher, key)])


# Example
plaintext = "HELLO"
key = os.urandom(len(plaintext))  # random key (bytes)
cipher = otp_encrypt(plaintext, key)
decrypted = otp_decrypt(cipher, key)

print("Plaintext (ASCII):", plaintext)
print("Plaintext (bits): ", ' '.join(f"{ord(c):08b}" for c in plaintext))

print("\nKey (bits):       ", ' '.join(f"{k:08b}" for k in key))

print("\nCipher (bits):    ", ' '.join(f"{c:08b}" for c in cipher))
print("Cipher (raw):     ", cipher)

print("\nDecrypted (bits): ", ' '.join(f"{ord(c):08b}" for c in decrypted))
print("Decrypted (text):", decrypted)


Plaintext (ASCII): HELLO
Plaintext (bits):  01001000 01000101 01001100 01001100 01001111

Key (bits):        01101101 00110011 11111111 01100101 00100011

Cipher (bits):     00100101 01110110 10110011 00101001 01101100
Cipher (raw):      b'%v\xb3)l'

Decrypted (bits):  01001000 01000101 01001100 01001100 01001111
Decrypted (text): HELLO
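Perfect secrecy (in Shannon's sense) can be illustrated with a small experiment: given a ciphertext, every plaintext of the same length is consistent with it under some key, so the ciphertext alone reveals nothing beyond the message length. The sketch below assumes byte strings and uses illustrative names (`xor_bytes`, `candidate`, `fake_key`).

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"HELLO"
key = os.urandom(len(plaintext))
cipher = xor_bytes(plaintext, key)

# Pick any other message of the same length...
candidate = b"WORLD"
# ...and there is a key under which the same ciphertext decrypts to it.
fake_key = xor_bytes(cipher, candidate)

print(xor_bytes(cipher, key))       # b'HELLO'
print(xor_bytes(cipher, fake_key))  # b'WORLD'
```

An attacker who sees only `cipher` cannot tell which of the two keys was used, so both plaintexts remain equally plausible.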


## 3. Cryptanalysis Exercises

In [7]:
# 3.1 Brute force attack on Caesar cipher
cipher = caesar_encrypt("MEET ME AT MIDNIGHT", 7)
print("Cipher:", cipher)

print("\nBrute force results:")
for k in range(26):
    print(f"Key {k}: {caesar_decrypt(cipher, k)}")

Cipher: TLLA TL HA TPKUPNOA

Brute force results:
Key 0: TLLA TL HA TPKUPNOA
Key 1: SKKZ SK GZ SOJTOMNZ
Key 2: RJJY RJ FY RNISNLMY
Key 3: QIIX QI EX QMHRMKLX
Key 4: PHHW PH DW PLGQLJKW
Key 5: OGGV OG CV OKFPKIJV
Key 6: NFFU NF BU NJEOJHIU
Key 7: MEET ME AT MIDNIGHT
Key 8: LDDS LD ZS LHCMHFGS
Key 9: KCCR KC YR KGBLGEFR
Key 10: JBBQ JB XQ JFAKFDEQ
Key 11: IAAP IA WP IEZJECDP
Key 12: HZZO HZ VO HDYIDBCO
Key 13: GYYN GY UN GCXHCABN
Key 14: FXXM FX TM FBWGBZAM
Key 15: EWWL EW SL EAVFAYZL
Key 16: DVVK DV RK DZUEZXYK
Key 17: CUUJ CU QJ CYTDYWXJ
Key 18: BTTI BT PI BXSCXVWI
Key 19: ASSH AS OH AWRBWUVH
Key 20: ZRRG ZR NG ZVQAVTUG
Key 21: YQQF YQ MF YUPZUSTF
Key 22: XPPE XP LE XTOYTRSE
Key 23: WOOD WO KD WSNXSQRD
Key 24: VNNC VN JC VRMWRPQC
Key 25: UMMB UM IB UQLVQOPB


In [8]:
# 3.2 Frequency analysis helper
from collections import Counter

def frequency_analysis(text):
    text = ''.join([c for c in text.upper() if c.isalpha()])
    counts = Counter(text)
    total = sum(counts.values())
    return {c: round(counts[c] / total, 3) for c in counts}

cipher = caesar_encrypt("THIS IS A SECRET MESSAGE THAT WE WILL TRY TO BREAK", 5)
frequency_analysis(cipher)

{'Y': 0.15,
 'M': 0.05,
 'N': 0.075,
 'X': 0.125,
 'F': 0.1,
 'J': 0.15,
 'H': 0.025,
 'W': 0.075,
 'R': 0.025,
 'L': 0.025,
 'B': 0.05,
 'Q': 0.05,
 'D': 0.025,
 'T': 0.025,
 'G': 0.025,
 'P': 0.025}
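These frequencies already suggest the key. A minimal sketch, assuming that one of the most frequent ciphertext letters stands for plaintext `E` (a heuristic that can fail on short texts), derives candidate shifts as follows. The helper `caesar_shift` is a compact stand-in for the functions defined earlier.

```python
from collections import Counter

def caesar_shift(text, key):
    """Shift uppercase letters by key positions (a negative key decrypts)."""
    return ''.join(
        chr((ord(c) - 65 + key) % 26 + 65) if c.isalpha() else c
        for c in text.upper()
    )

plaintext = "THIS IS A SECRET MESSAGE THAT WE WILL TRY TO BREAK"
cipher = caesar_shift(plaintext, 5)

counts = Counter(c for c in cipher if c.isalpha())
# Hypothesis: each of the two most frequent ciphertext letters encrypts 'E'
candidates = [(ord(letter) - ord('E')) % 26 for letter, _ in counts.most_common(2)]

for shift in candidates:
    print(shift, caesar_shift(cipher, -shift))
```

Here `Y` and `J` are tied as the most frequent letters; the hypothesis `E → J` yields the true shift 5, while `E → Y` yields a shift that produces gibberish and is discarded on inspection.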

<strong> Index of Coincidence (IC) in Vigenère Cryptanalysis</strong>

The Index of Coincidence (IC) is a statistical measure that estimates how likely it is that two randomly chosen letters from a text are the same.

Formally, for a text of length $N$, with letter counts 
$f_A, f_B, \dots, f_Z$, the IC is defined as:
$$ IC = \frac{\sum_{i=A}^{Z} f_i (f_i - 1)}{N(N-1)}.$$

Here:
* $f_i$ = number of occurrences of letter $i$,
* $N$ = total number of letters in the text.


Intuition: 
* If the text is completely random, all letters appear with probability $\tfrac{1}{26}$.
    $$\text{Expected } IC \approx \frac{1}{26} \approx 0.038.$$
* If the text is English plaintext, letters follow natural frequencies (E, T, A are more common).
   $$\text{Expected } IC \approx 0.066.$$


So:
$$\text{Random text} \;\;\Rightarrow\;\; IC \approx 0.038$$

$$\text{English text} \;\;\Rightarrow\;\; IC \approx 0.066$$
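For long texts the IC approaches $\sum_i p_i^2$, where $p_i$ is the probability of letter $i$. The two reference values can be checked numerically; the sketch below uses an approximate English frequency table (the exact figures are assumed values and vary slightly between sources).

```python
# Approximate English letter probabilities (assumed values, summing to roughly 1)
ENGLISH_FREQ = {
    'A': 0.082, 'B': 0.015, 'C': 0.028, 'D': 0.043, 'E': 0.13,
    'F': 0.022, 'G': 0.02, 'H': 0.061, 'I': 0.07, 'J': 0.0015,
    'K': 0.0077, 'L': 0.04, 'M': 0.024, 'N': 0.067, 'O': 0.075,
    'P': 0.019, 'Q': 0.00095, 'R': 0.06, 'S': 0.063, 'T': 0.091,
    'U': 0.028, 'V': 0.0098, 'W': 0.024, 'X': 0.0015, 'Y': 0.02, 'Z': 0.00074
}

# Uniform letters: 26 * (1/26)^2 = 1/26
uniform_ic = sum((1 / 26) ** 2 for _ in range(26))
# English-like letters: sum of squared probabilities
english_ic = sum(p ** 2 for p in ENGLISH_FREQ.values())

print(round(uniform_ic, 4))  # ≈ 0.0385
print(round(english_ic, 4))  # ≈ 0.0664
```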

Application to the Vigenère cipher:

The Vigenère cipher shifts the plaintext with a periodic key of length $k$. If we guess a key length $k$ and split the ciphertext into $k$ subsequences (taking every $k$-th letter), then:

* If $k$ is correct, each subsequence is essentially a Caesar cipher of English text, so its IC is closer to $0.066$.
* If $k$ is wrong, subsequences mix multiple Caesar shifts, so they look more random and the IC is closer to $0.038$.
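The claim that each subsequence is a Caesar cipher can be verified directly on a letters-only message. The sketch below re-declares compact versions of the encryption functions so that it runs on its own; it assumes uppercase input without spaces.

```python
def caesar_encrypt(text, key):
    """Caesar shift on uppercase letters only."""
    return ''.join(chr((ord(c) - 65 + key) % 26 + 65) for c in text)

def vigenere_encrypt(text, key):
    """Vigenère on uppercase, letters-only text."""
    return ''.join(
        chr((ord(c) - 65 + ord(key[i % len(key)]) - 65) % 26 + 65)
        for i, c in enumerate(text)
    )

plaintext = "ATTACKATDAWN"
key = "LEMON"
cipher = vigenere_encrypt(plaintext, key)
print(cipher)  # LXFOPVEFRNHR

for i, k in enumerate(key):
    shift = ord(k) - 65
    # Letters at positions i, i+5, i+10, ... all use the key letter k
    same = cipher[i::len(key)] == caesar_encrypt(plaintext[i::len(key)], shift)
    print(k, shift, cipher[i::len(key)], same)
```

Every row prints `True`: slicing with the correct stride isolates one Caesar shift per key letter.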

Workflow:
* For each candidate key length $k$:
    * Split the ciphertext into subsequences.
    * Compute the IC of each subsequence.
    * Take the average IC.
* Compare the results:
    * Peaks closer to $0.066$ suggest the likely key length.
    * Flat values near $0.038$ suggest incorrect lengths.
   
   
Key insight: the Index of Coincidence works because it distinguishes between:
* Natural language distribution: biased toward some letters, higher IC.
* Uniform random distribution: all letters equally likely, lower IC.


Thus, the IC is a statistical tool for guessing the Vigenère key length.


In [9]:
# 3.3 Index of Coincidence (IC) for Vigenère analysis
def index_of_coincidence(text):
    text = ''.join([c for c in text.upper() if c.isalpha()])
    N = len(text)
    freqs = Counter(text)
    ic = sum([f*(f-1) for f in freqs.values()]) / (N*(N-1))
    return ic

sample = vigenere_encrypt("THIS IS A LONGER TEXT TO COMPUTE INDEX OF COINCIDENCE", "KEY")
index_of_coincidence(sample)

0.05813953488372093

Note that spaces do not affect the computation above:

In [15]:
sample = vigenere_encrypt("THISISALONGERTEXTTOCOMPUTEINDEXOFCOINCIDENCE", "KEY")
index_of_coincidence(sample)

0.05813953488372093

## 4. Questions & Problems (Solved)

1. **Show that Caesar cipher can be brute-forced easily**  
   → Done in section 3.1, only 26 possibilities.  

2. **Why is the Affine cipher vulnerable to frequency analysis?**  
   → Because it’s a substitution cipher with a fixed mapping, so letter frequencies remain.  

3. **Implement a function that guesses Vigenère key length using IC.**  
   → Implemented below as `guess_vigenere_key_length`.  

4. **Challenge: Encrypt with Vigenère and break it using frequency analysis.**  
   → Implemented below with an IC-based key length guess followed by frequency analysis.  

In [10]:
def guess_vigenere_key_length(cipher, max_len):
    scores = {}
    for key_len in range(1, max_len+1):
        ic_values = []
        for i in range(key_len):
            subseq = cipher[i::key_len]
            ic_values.append(index_of_coincidence(subseq))
        scores[key_len] = sum(ic_values)/len(ic_values)
    return scores

# cipher[i::key_len] is Python slicing with a stride:
#   for i = 0 it takes positions 0, key_len, 2*key_len, ...
#   for i = 1 it takes positions 1, key_len+1, ... and so on.
# Each subseq contains the letters encrypted with the same key character (same Caesar shift).
# We compute the IC of each subseq and store it.
#
# Finally, the line
#   scores[key_len] = sum(ic_values)/len(ic_values)
# averages the ICs of all subsequences into one number for that key length.
# The guessed key length is the one with the highest average IC.

cipher = vigenere_encrypt("It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair.", "LEMON")
guess_vigenere_key_length(cipher, 8)


{1: 0.049325823519371904,
 2: 0.046896484927008805,
 3: 0.0492217777199196,
 4: 0.04599463741939607,
 5: 0.043794430816552196,
 6: 0.04345366421652006,
 7: 0.048437675763556065,
 8: 0.042696407304728144}

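Note that we used the key LEMON, so k=5, but the highest score was obtained for length 1! The reason is that `cipher[i::key_len]` slices the raw string, spaces and punctuation included, whereas the Vigenère key only advances on letters, so the subsequences no longer line up with the key.
Let us repeat the computation after removing spaces and punctuation: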

In [11]:
cipher = vigenere_encrypt("ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMESITWASTHEAGEOFWISDOMITWASTHEAGEOFFOOLISHNESSITWASTHEEPOCHOFBELIEFITWASTHEEPOCHOFINCREDULITYITWASTHESEASONOFLIGHTITWASTHESEASONOFDARKNESSITWASTHESPRINGOFHOPEITWASTHEWINTEROFDESPAIR", "LEMON")
cipher

'TXIOFELQPRDXATGTQQGVEAMGGSIICEDXATGTQQGVEAMGGSIMURZJIWFOSYWGHEEHUPESSBQJACYTWTBRDWUHJLWFVRPTAQUZJNSYTIRWGHEEHUPIBCPSSRWANVQRHWMFMVEAMGGSIESNDSZCSWMSVGTXIOFELQGRLWABBQHMFXYIEGVEAMGGSIEDETRSCSSSBSVEAMGGSIIWAEIDCSOIEDNTV'

In [12]:
guess_vigenere_key_length(cipher, 8)

{1: 0.049325823519371904,
 2: 0.04870895932372396,
 3: 0.05305807447424272,
 4: 0.04723651610444063,
 5: 0.08705325682069867,
 6: 0.05256685256685256,
 7: 0.04761904761904762,
 8: 0.040649165649165646}

Now the length is guessed correctly: k=5.
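The manual clean-up step can be folded into the key-length guess. The following is a minimal sketch under the assumption that the plaintext is English and the key is at most `max_len` letters long; `letters_only` and `best_key_length` are hypothetical helper names, and the other functions are compact re-declarations so the block runs on its own.

```python
from collections import Counter

def letters_only(text):
    """Keep only letters, uppercased, so that slicing aligns with the key."""
    return ''.join(c for c in text.upper() if c.isalpha())

def vigenere_encrypt(text, key):
    """Vigenère encryption; the key advances only on letters."""
    out, j = [], 0
    for c in text.upper():
        if c.isalpha():
            shift = ord(key[j % len(key)]) - 65
            out.append(chr((ord(c) - 65 + shift) % 26 + 65))
            j += 1
        else:
            out.append(c)
    return ''.join(out)

def index_of_coincidence(text):
    n = len(text)
    counts = Counter(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

def best_key_length(cipher, max_len):
    """Return (key length with the highest average IC, all scores)."""
    cipher = letters_only(cipher)  # drop spaces and punctuation first
    scores = {}
    for key_len in range(1, max_len + 1):
        ics = [index_of_coincidence(cipher[i::key_len]) for i in range(key_len)]
        scores[key_len] = sum(ics) / key_len
    return max(scores, key=scores.get), scores

text = ("It was the best of times, it was the worst of times, it was the age of wisdom, "
        "it was the age of foolishness, it was the epoch of belief, it was the epoch of "
        "incredulity, it was the season of Light, it was the season of Darkness, "
        "it was the spring of hope, it was the winter of despair.")
cipher = vigenere_encrypt(text, "LEMON")
best, scores = best_key_length(cipher, 8)
print(best)  # 5
```

Multiples of the true key length also tend to score high, so when `max_len` is large it is safer to prefer the smallest length that shows a clear peak.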

Once the key length $k$ is known, the standard approach to crack a Vigenère cipher is:

* Split the ciphertext into $k$ groups, where each group corresponds to letters encrypted with the same Caesar shift.

* For each group, guess the Caesar shift by comparing letter frequencies with English letter frequencies.

* Collect all shifts → reconstruct the key.

* Decrypt the ciphertext.

In [14]:
import string
from collections import Counter

# English letter frequencies (approximate, normalized)
ENGLISH_FREQ = {
    'A': 0.082, 'B': 0.015, 'C': 0.028, 'D': 0.043, 'E': 0.13,
    'F': 0.022, 'G': 0.02, 'H': 0.061, 'I': 0.07, 'J': 0.0015,
    'K': 0.0077, 'L': 0.04, 'M': 0.024, 'N': 0.067, 'O': 0.075,
    'P': 0.019, 'Q': 0.00095, 'R': 0.06, 'S': 0.063, 'T': 0.091,
    'U': 0.028, 'V': 0.0098, 'W': 0.024, 'X': 0.0015, 'Y': 0.02, 'Z': 0.00074
}

LETTERS = string.ascii_uppercase

def vigenere_decrypt(ciphertext, key):
    """Decrypt Vigenère cipher given ciphertext and key."""
    plaintext = []
    key = key.upper()
    key_index = 0  # advance only on letters, matching vigenere_encrypt
    for c in ciphertext.upper():
        if c in LETTERS:
            shift = LETTERS.index(key[key_index % len(key)])
            p = (LETTERS.index(c) - shift) % 26
            plaintext.append(LETTERS[p])
            key_index += 1
        else:
            plaintext.append(c)
    return ''.join(plaintext)

def caesar_score(text):
    """Score a text segment by comparing letter frequencies with English frequencies."""
    N = len(text)
    if N == 0:
        return float('inf')
    counts = Counter(text)
    chi2 = 0
    for letter in LETTERS:
        observed = counts.get(letter, 0)
        expected = ENGLISH_FREQ[letter] * N
        chi2 += (observed - expected) ** 2 / (expected + 1e-9)
    return chi2

def guess_caesar_shift(segment):
    """Guess the Caesar shift for a segment using chi-squared statistic."""
    best_shift, best_score = None, float('inf')
    for shift in range(26):
        decrypted = ''.join(LETTERS[(LETTERS.index(c) - shift) % 26] for c in segment)
        score = caesar_score(decrypted)
        if score < best_score:
            best_score = score
            best_shift = shift
    return best_shift

def guess_vigenere_key(ciphertext, key_len):
    """Guess the Vigenère key of given length."""
    key = ""
    for i in range(key_len):
        segment = ciphertext[i::key_len]  # letters at positions i, i+key_len, i+2*key_len, ...
        shift = guess_caesar_shift(segment)
        key += LETTERS[shift]
    return key

# =====================
# 🔍 Example
# =====================
#ciphertext = "LXFOPVEFRNHR"  # "ATTACKATDAWN" encrypted with key "LEMON"

ciphertext = vigenere_encrypt("ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMESITWASTHEAGEOFWISDOMITWASTHEAGEOFFOOLISHNESSITWASTHEEPOCHOFBELIEFITWASTHEEPOCHOFINCREDULITYITWASTHESEASONOFLIGHTITWASTHESEASONOFDARKNESSITWASTHESPRINGOFHOPEITWASTHEWINTEROFDESPAIR", "LEMON")


key_len = 5  # assume we already guessed correctly

guessed_key = guess_vigenere_key(ciphertext, key_len)
decrypted = vigenere_decrypt(ciphertext, guessed_key)

print("Ciphertext: ", ciphertext)
print("Guessed Key:", guessed_key)
print("Decrypted:  ", decrypted)


Ciphertext:  TXIOFELQPRDXATGTQQGVEAMGGSIICEDXATGTQQGVEAMGGSIMURZJIWFOSYWGHEEHUPESSBQJACYTWTBRDWUHJLWFVRPTAQUZJNSYTIRWGHEEHUPIBCPSSRWANVQRHWMFMVEAMGGSIESNDSZCSWMSVGTXIOFELQGRLWABBQHMFXYIEGVEAMGGSIEDETRSCSSSBSVEAMGGSIIWAEIDCSOIEDNTV
Guessed Key: LEMON
Decrypted:   ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMESITWASTHEAGEOFWISDOMITWASTHEAGEOFFOOLISHNESSITWASTHEEPOCHOFBELIEFITWASTHEEPOCHOFINCREDULITYITWASTHESEASONOFLIGHTITWASTHESEASONOFDARKNESSITWASTHESPRINGOFHOPEITWASTHEWINTEROFDESPAIR


## 5. Conclusions

- **Classical ciphers** are useful for learning but offer **no real security** today.  
- **Brute force** and **frequency analysis** break them quickly.  
- **Perfect secrecy** is achieved by the OTP, but it needs a random key as long as the message that is never reused, so key distribution makes it impractical.  
- This motivates **modern cryptography**, which balances strong security with efficiency.  
