# Frequency Analysis on Caesar Cipher

## Classical Ciphers

https://en.wikipedia.org/wiki/Classical_cipher

## Encryption and Decryption

In the Caesar cipher, each letter of the plain text is replaced with another letter determined by an integer key. I will use `k` to denote the key and `p` to denote the plain text. Each letter in `p` is replaced with the letter that is `k` positions after it in the alphabet, wrapping around at the end. For example, with `k=1` the letter `A` becomes `B`, `E` becomes `F`, and `Z` becomes `A`. From this explanation we can derive mathematical functions for **encryption** and **decryption** with the Caesar cipher.

If `x` is the position in the alphabet of the letter to be processed (`A` = 0, ..., `Z` = 25), we can use the formula `E(x) = (x + k) mod 26` to encrypt and `D(x) = (x - k) mod 26` to decrypt.

Note: a key `k ≥ 26` is equivalent to the key `k mod 26`, because the English alphabet has 26 letters.
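
To make the two formulas concrete, here is a minimal sketch that works on letter positions only. The helper name `shift_index` is hypothetical and exists only for this illustration; it is not part of the notebook code below.

```python
ALPHABET_SIZE = 26


def shift_index(x: int, k: int) -> int:
    """Apply E(x) = (x + k) mod 26. Passing -k gives D(x) = (x - k) mod 26."""
    return (x + k) % ALPHABET_SIZE


z = ord("z") - ord("a")           # position of 'z' is 25
encrypted = shift_index(z, 1)     # wraps around to 0, i.e. 'a'
decrypted = shift_index(encrypted, -1)

print(encrypted, decrypted)       # 0 25

# Keys that differ by a multiple of 26 produce the same substitution.
print(shift_index(4, 27) == shift_index(4, 1))  # True
```

Python's `%` operator returns a non-negative result for a positive modulus, so the same function handles the negative shift used for decryption.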

Ref: https://medium.com/@Nougat-Waffle/caesar-cipher-and-frequency-analysis-with-python-635b04e0186f


In [None]:
from string import ascii_lowercase

ALPHABET = ascii_lowercase  # abcdefghijklmnopqrstuvwxyz
ALPHABET_SIZE = len(ALPHABET)
# print(f"Alphabet: {ALPHABET}, Size: {ALPHABET_SIZE}")


def _crypt(text: str, key: int) -> str:
    """
    Encrypt or decrypt the text. Pass a negative key to decrypt.
    """
    output = ""

    for char in text:
        # Leave any character outside the English alphabet unchanged
        # (digits, punctuation, accented letters, ...).
        if char.lower() not in ALPHABET:
            output += char
            continue

        index = ALPHABET.index(char.lower())
        # E(x) = (x +/- k) mod 26
        new_char = ALPHABET[(index + key) % ALPHABET_SIZE]

        # Setting the right case for the letter and adding it to the output
        output += new_char.upper() if char.isupper() else new_char

    return output


def encrypt(plain_text: str, key: int) -> str:
    return _crypt(plain_text, key)


def decrypt(cipher_text: str, key: int) -> str:
    # Decryption is a shift in the opposite direction, so negate the key.
    # This also round-trips correctly when the encryption key is negative.
    return _crypt(cipher_text, -key)

## Frequency Analysis and Breaking The Caesar Cipher

> In cryptanalysis, frequency analysis is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers.
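
A small illustration of why this helps against the Caesar cipher: the shift renames letters but leaves the shape of the frequency distribution intact. The sketch below is standalone, and the helper names `letter_counts` and `caesar_shift` are hypothetical, chosen only for this example.

```python
from collections import Counter
from string import ascii_lowercase


def letter_counts(text: str) -> Counter:
    """Count only the letters a-z, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch in ascii_lowercase)


def caesar_shift(text: str, k: int) -> str:
    """Shift letters by k positions (output is lowercased for simplicity)."""
    k %= 26
    table = str.maketrans(ascii_lowercase, ascii_lowercase[k:] + ascii_lowercase[:k])
    return text.lower().translate(table)


plain = "attack at dawn"
cipher = caesar_shift(plain, 3)

print(cipher)                                # dwwdfn dw gdzq
print(letter_counts(plain).most_common(2))   # [('a', 4), ('t', 3)]
print(letter_counts(cipher).most_common(2))  # [('d', 4), ('w', 3)]
```

The most frequent letter moves from `a` to `d`, exactly three positions, which is the key used in this example.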

In [None]:
import matplotlib.pyplot as plt


def calculate_letter_frequency(text: str) -> dict:
    letter_freq: dict = {}
    text = text.lower()
    total_count = 0
    for char in text:
        if char in ALPHABET:
            total_count += 1
            if char not in letter_freq:
                letter_freq[char] = 1
            else:
                letter_freq[char] += 1
    # Convert the raw counts to percentages of all counted letters.
    for letter, count in letter_freq.items():
        letter_freq[letter] = count * 100 / total_count
    return letter_freq


def plot_histogram(dictionary, title="Frequency analysis"):
    sorted_items = sorted(dictionary.items(), key=lambda item: item[1], reverse=True)
    keys = [item[0] for item in sorted_items]
    values = [item[1] for item in sorted_items]

    plt.bar(keys, values)
    plt.xlabel("Characters")
    plt.ylabel("Frequency %")
    plt.title(title)
    plt.show()

In [None]:
# Taken from https://en.wikipedia.org/wiki/Letter_frequency.
# Values are percentage.
LETTER_FREQUENCY = {
    "e": 12.7,
    "t": 9.1,
    "a": 8.2,
    "o": 7.5,
    "i": 7.0,
    "n": 6.7,
    "s": 6.3,
    "h": 6.1,
    "r": 6.0,
    "d": 4.25,
    "l": 4.0,
    "c": 2.8,
    "u": 2.8,
    "m": 2.4,
    "w": 2.4,
    "f": 2.2,
    "g": 2.0,
    "y": 2.0,
    "p": 1.9,
    "b": 1.5,
    "v": 0.98,
    "k": 0.77,
    "j": 0.15,
    "x": 0.15,
    "q": 0.095,
    "z": 0.074,
}
# print(sum(LETTER_FREQUENCY.values()))
plot_histogram(LETTER_FREQUENCY, title="Frequency analysis of English alphabet")

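To break the cipher, we try every possible key, decrypt the cipher text with it, and compare the letter frequencies of the result with the English letter frequencies above. The comparison score is the mean absolute difference between the two distributions, and the key with the lowest score is taken as the encryption key. This works best on longer texts; for very short inputs the letter frequencies can be too noisy to identify the correct key.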
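
As a compact, standalone illustration of breaking a Caesar cipher by frequency analysis, the sketch below tries all 26 keys and keeps the one whose decryption is closest to English. It assumes lowercase input, and the names `shift`, `score` and `guess_key` are hypothetical; it is not the notebook's own implementation, which follows in the next cell.

```python
from collections import Counter
from string import ascii_lowercase

# Approximate English letter frequencies in percent, ordered a..z
# (the same values as the LETTER_FREQUENCY table in this notebook).
ENGLISH_FREQ = dict(zip(ascii_lowercase, [
    8.2, 1.5, 2.8, 4.25, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074,
]))


def shift(text: str, k: int) -> str:
    """Shift lowercase letters by k positions; other characters are kept."""
    k %= 26
    table = str.maketrans(ascii_lowercase, ascii_lowercase[k:] + ascii_lowercase[:k])
    return text.translate(table)


def score(text: str) -> float:
    """Mean absolute difference (percentage points) from English frequencies."""
    counts = Counter(ch for ch in text if ch in ascii_lowercase)
    total = sum(counts.values()) or 1  # avoid dividing by zero on letter-free input
    return sum(abs(100 * counts[c] / total - ENGLISH_FREQ[c]) for c in ascii_lowercase) / 26


def guess_key(cipher_text: str) -> int:
    """Return the key whose decryption looks most like English."""
    return min(range(26), key=lambda k: score(shift(cipher_text, -k)))


plain = (
    "it was the best of times it was the worst of times "
    "it was the age of wisdom it was the age of foolishness"
)
cipher = shift(plain, 7)
key = guess_key(cipher)
print(key, shift(cipher, -key) == plain)  # 7 True
```

The sample text has 82 letters, which is enough for the correct key to stand out clearly; much shorter inputs may not give a reliable result.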


In [None]:
from math import inf


def calculate_difference(text: str) -> float:
    letter_freq = calculate_letter_frequency(text)
    difference = (
        sum([abs(letter_freq.get(letter, 0) - LETTER_FREQUENCY[letter]) for letter in ALPHABET]) / ALPHABET_SIZE
    )
    return difference


def break_cipher(cipher_text: str) -> int:
    lowest_difference = inf
    encryption_key = 0

    # Try every key, including 0 (the text may not have been shifted at all).
    for key in range(ALPHABET_SIZE):
        current_plain_text = decrypt(cipher_text, key)
        current_difference = calculate_difference(current_plain_text)
        if current_difference < lowest_difference:
            lowest_difference = current_difference
            encryption_key = key
            print(f"break_cipher -> encryption_key: {encryption_key}, lowest_difference: {lowest_difference}")
    return encryption_key

## Demo: Encrypt, break and decrypt

In [None]:
%pip install -q ipywidgets
from IPython.display import HTML, clear_output, display
from ipywidgets import IntText, Layout, Text, interact

clear_output()

In [None]:
def demo(plain_text, encryption_key):
    letter_freq = calculate_letter_frequency(plain_text)
    plot_histogram(letter_freq, title="Frequency analysis of given plain text")

    encrypted_text = encrypt(plain_text, encryption_key)
    print(f"Encrypted text: {encrypted_text}")
    letter_freq = calculate_letter_frequency(encrypted_text)
    plot_histogram(letter_freq, title="Frequency analysis of the encrypted text")
    plot_histogram(LETTER_FREQUENCY, title="Frequency analysis of English alphabet")

    encryption_key = break_cipher(encrypted_text)
    print(f"Encryption key: {encryption_key}")

    decrypted_text = decrypt(encrypted_text, encryption_key)
    display(HTML(f"<br><strong>Decrypted text</strong>: {decrypted_text}"))


# First display an explanation.
display(
    HTML("""
        <br>
        <p>
          Here we will first encrypt a plain text, then break it (obtain the encryption key),
          and finally decrypt it (using the recovered key) to check that the output is the same as the input.
        </p>
        <p>
          Please enter a plain text and an encryption key below.
          We provide example values to start with.
        </p>
        """)
)
# Then display the interactive widget.
text_widget = Text(
    value="My name is John.",
    placeholder="Insert a plain text.",
    description="Plain Text: ",
    disabled=False,
    continuous_update=False,
    layout=Layout(width="auto", height="auto"),
)
key_widget = IntText(
    value=0,
    description="Key: ",
    disabled=False,
    continuous_update=False,
)
_ = interact(demo, plain_text=text_widget, encryption_key=key_widget)