# What is a Cipher

Cipher methods are techniques for scrambling messages to keep their contents confidential. They work by transforming readable text (called plaintext) into an unreadable format (ciphertext) using a secret key. This key acts like a decoder ring, allowing the intended recipient to decrypt the message back to plaintext.

There are many cipher methods, but here's a quick look at a classic one:

Caesar Cipher: This is a simple substitution cipher, where each letter in the message is shifted a fixed number of positions down the alphabet, wrapping around from Z back to A.

For example, shifting every letter by 3 positions would turn "hello" into "khoor". The key in this case is the number of positions to shift (3). Caesar ciphers are easy to crack, but they are a good starting point for understanding encryption.

Modern ciphers are far more complex and use sophisticated algorithms to scramble data. These algorithms are often categorized based on the type of key used (symmetric or asymmetric) and the way data is processed (block ciphers or stream ciphers).


Table of Contents:

1. [Basic Letter Encryption](#1)
   1. [Caesar Cipher](#1.1)
   2. [Random Mapping](#1.2)
   3. [Multiple Random Mappings](#1.3)
   4. [Advanced Mapping by Looping Different Ciphers](#1.4)
   5. [Growing Cipher Size Issue](#1.5)
   6. [Hacker Game](#1.6)

In [None]:
%load_ext autoreload
%autoreload 2

<a id="1"></a> 
## 1. Basic Letter Encryption

In [None]:
%%writefile basic_cipher.py

import re
import sys
import random
import string
from datetime import datetime

# basic class of cipher
class BasicCipher:
    def __init__(self, name="Basic"):
        self.ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        self.name = name + "_Cipher"
        self.cipher = {}
        self.reverse_cipher = {} 
        self.ciphers = []
        self.reverse_ciphers = []  

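    def __str__(self) -> str:
        # count letter-to-letter entries across all mappings (26 per mapping);
        # stored as one byte per letter, this is also the approximate key size in bytes
        cipher_size = sum(len(cipher) for cipher in self.ciphers)
        description = f"""
        {"*"*37}
        Name: {self.name}
        Ciphers: {len(self.ciphers)}
        Total Size: {cipher_size} entries, ~{cipher_size/1024:.2f} KB
        {"*"*37}
        """
        return description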

    def __repr__(self):
        return self.__str__()   

    def selfcheck(self):
        plaintext = self.format_string("Hello World")
        ciphertext = self.encrypt(plaintext)
        message = self.decrypt(ciphertext)
        
        print("_"*79)
        print(self)
        run_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"Init Run at {run_time}")
        print(f"input/{plaintext      = }")
        print(f"encoding/{ciphertext  = }")
        print(f"decoding/{message     = }")        
        print("_"*79)


    def format_string(self, input_string): 
        # Only keep the letters and space from input string
        # ! normally we don't care about the space, but for some cipher,
        # ! space is also important, so we keep it for demonstration purpose.
        return re.sub(r'[^A-Za-z ]', '', input_string).upper()
         
    # functions that must be implemented in the subclass
    def generate_ciphers(self):
        pass

    def generate_reverse_ciphers(self):
        pass

    # encode and decode are aliases of encrypt and decrypt
    def encode(self, plaintext) -> str:
        return self.encrypt(plaintext)

    def decode(self, ciphertext) -> str:
        return self.decrypt(ciphertext)

    def encrypt(self, plaintext) -> str:
        letters = self.format_string(plaintext)
        # apply every cipher mapping in order (with no ciphers the formatted text is returned);
        # characters without a mapping (e.g. spaces) pass through unchanged
        for cipher in self.ciphers:
            letters = "".join(cipher.get(letter, letter) for letter in letters)
        return letters

    def decrypt(self, ciphertext) -> str:
        letters = self.format_string(ciphertext)
        # reverse_ciphers is stored last-to-first, so applying it in order
        # undoes the mappings that encrypt applied
        for reverse_cipher in self.reverse_ciphers:
            letters = "".join(reverse_cipher.get(letter, letter) for letter in letters)
        return letters
 

    def run_cipher_demo(self) -> tuple:
        plaintext="The quick brown fox jumps over the lazy dog".upper()
        ciphertext = self.encrypt(plaintext)
        message = self.decrypt(ciphertext)
        run_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"Demo Run at {run_time}")
        print(f"{plaintext   = }")
        print(f"{ciphertext  = }")
        print(f"{message     = }")
        print("_"*79)
        return plaintext, ciphertext, message 
 
    def run_cipher_demo_repeat(self) -> tuple:
        plaintext="AAABBB BBBCCC XXXYYY YYYZZZ ABCXYZ ABCXYZ".upper()
        ciphertext = self.encrypt(plaintext)
        message = self.decrypt(ciphertext)
        run_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"Demo Run at {run_time}")
        print(f"{plaintext   = }")
        print(f"{ciphertext  = }")
        print(f"{message     = }")
        print("_"*79)
        return plaintext, ciphertext, message 


# test the class
if __name__ == "__main__":
    cipher = BasicCipher()
    cipher.selfcheck()
    cipher.run_cipher_demo() 

In [None]:
from basic_cipher import BasicCipher
cipher_basic = BasicCipher()
cipher_basic.selfcheck() 
# results = cipher_basic.run_cipher_demo()
results = cipher_basic.run_cipher_demo_repeat()

<a id="1.1"></a> 
### 1.1 Caesar Cipher

Caesar Cipher: letter mapping by shifting the alphabet order.

To decrypt a Caesar cipher, we only need to know the shift number.
For example, A -> F means a right shift of 5, so the key is 5.
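The shift is plain modular arithmetic on letter positions (A=0, ..., Z=25): encryption adds the key modulo 26 and decryption subtracts it. A minimal standalone sketch, independent of the `CaesarCipher` class below; the function name `caesar_shift` is hypothetical:

```python
def caesar_shift(text: str, key: int) -> str:
    """Shift each A-Z letter by `key` positions, wrapping around modulo 26.

    A negative key shifts left, which undoes a right shift.
    Non-letters (e.g. spaces) are kept as-is.
    """
    result = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            index = ord(ch) - ord("A")  # A=0, B=1, ..., Z=25
            result.append(chr((index + key) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


print(caesar_shift("A", 5))       # F  (key 5: A -> F)
print(caesar_shift("HELLO", 3))   # KHOOR
print(caesar_shift("KHOOR", -3))  # HELLO
print(caesar_shift("XYZ", 3))     # ABC  (wraps around after Z)
```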




In [None]:
%%writefile caesar_cipher.py

from basic_cipher import BasicCipher

class CaesarCipher(BasicCipher):
    def __init__(self, name="Caesar", right_shift=1):
        super().__init__(name)
        self.key = right_shift % 26  # 26 possible shifts, but a shift of 0 leaves the text unchanged
        self.generate_ciphers() # generate the cipher dictionary
        self.generate_reverse_ciphers()

    def generate_ciphers(self):
        alphabet = list(self.ALPHABET)
        alphabet_shift = alphabet[self.key:] + alphabet[:self.key]
        self.cipher = dict(zip(alphabet, alphabet_shift))
        self.ciphers.append(self.cipher)
        return self.cipher

    def generate_reverse_ciphers(self):
        reverse_cipher = dict(zip(self.cipher.values(), self.cipher.keys()))
        self.reverse_cipher = reverse_cipher
        self.reverse_ciphers.append(reverse_cipher)
        return reverse_cipher

In [None]:
from caesar_cipher import CaesarCipher

# use the default shift key (1)
# cipher_caesar = CaesarCipher()
# use the shift key of 10
cipher_caesar = CaesarCipher(right_shift=10)
cipher_caesar.selfcheck()
# results = cipher_caesar.run_cipher_demo()
results = cipher_caesar.run_cipher_demo_repeat()

The Caesar cipher is considered weak because it has only 25 useful keys (a shift of 0 leaves the text unchanged), so it is easily broken by brute force: try every key from 1 to 25.
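A brute-force attack can be sketched in a few lines: decrypt with every key and look for the candidate that reads as language. This is a standalone illustration; `caesar_shift` and `brute_force` are hypothetical helper names, not part of the classes in this notebook:

```python
def caesar_shift(text: str, key: int) -> str:
    # shift A-Z letters by `key` positions modulo 26; other characters are unchanged
    return "".join(
        chr((ord(ch) - ord("A") + key) % 26 + ord("A")) if "A" <= ch <= "Z" else ch
        for ch in text.upper()
    )


def brute_force(ciphertext: str) -> dict[int, str]:
    """Try every non-trivial key (1-25) and return {key: candidate plaintext}."""
    return {key: caesar_shift(ciphertext, -key) for key in range(1, 26)}


candidates = brute_force("KHOOR ZRUOG")
for key, text in candidates.items():
    print(key, text)
# a human (or a dictionary lookup) picks out the readable line: key 3 -> HELLO WORLD
```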

However, we can use a random mapping to make it more difficult.

<a id="1.2"></a> 
### 1.2 Random Mapping

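A random mapping (a shuffled alphabet) helps because there are 26! (about 4 × 10^26) possible mappings instead of 25 shifts, but we now have to share the whole mapping (the cipher code) with the recipient so they can decrypt the message.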
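To make "the cipher code we have to share" concrete: the whole key is just the shuffled alphabet, a 26-letter string. A minimal sketch that uses Python's built-in `str.maketrans`/`str.translate` instead of the dictionaries used in `RandomCipher` below; `make_key` and the fixed seed are illustrative assumptions:

```python
import random
import string

ALPHABET = string.ascii_uppercase


def make_key(seed=None) -> str:
    """Return a random permutation of A-Z; this string is the secret to share."""
    letters = list(ALPHABET)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


def encrypt(plaintext: str, key: str) -> str:
    # map ALPHABET[i] -> key[i]; characters not in the table (spaces) are kept
    return plaintext.upper().translate(str.maketrans(ALPHABET, key))


def decrypt(ciphertext: str, key: str) -> str:
    # the inverse table simply swaps the two strings
    return ciphertext.upper().translate(str.maketrans(key, ALPHABET))


key = make_key(seed=42)
secret = encrypt("HELLO WORLD", key)
print(key, secret, decrypt(secret, key))
```

Anyone holding the 26-letter key can decrypt, so in practice it has to travel over a separate, secure channel.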

In [None]:
%%writefile random_cipher.py

from basic_cipher import BasicCipher
import random

class RandomCipher(BasicCipher):
    def __init__(self, name="Random", cipher_number=1):
        super().__init__(name)
        self.cipher_number = cipher_number
        self.generate_ciphers() # generate the cipher dictionary
        self.generate_reverse_ciphers()

    def generate_ciphers(self):
        alphabet = list(self.ALPHABET)
        alphabet_shuffled = alphabet.copy()
        for i in range(self.cipher_number):
            random.shuffle(alphabet_shuffled) 
            self.cipher = dict(zip(alphabet, alphabet_shuffled))
            self.ciphers.append(self.cipher)
        return True

    def generate_reverse_ciphers(self):
        for i in range(self.cipher_number):
            self.reverse_cipher = dict(zip(self.ciphers[i].values(), self.ciphers[i].keys()))
            self.reverse_ciphers.append(self.reverse_cipher)
        # decrypt must undo the mappings last-to-first, so reverse the list
        self.reverse_ciphers.reverse()
        return True

In [None]:
from random_cipher import RandomCipher

cipher_random = RandomCipher()
cipher_random.selfcheck()
# results = cipher_random.run_cipher_demo()
results = cipher_random.run_cipher_demo_repeat() 

We can try to make it more difficult by applying multiple random mappings in sequence.

<a id="1.3"></a> 
### 1.3 Multiple Random Mappings

In [None]:
from random_cipher import RandomCipher

cipher_random_many = RandomCipher(cipher_number=3)
cipher_random_many.selfcheck()
# results = cipher_random_many.run_cipher_demo()
results = cipher_random_many.run_cipher_demo_repeat()

But there is still a problem: because we are mapping letter by letter, the same plaintext letter is always mapped to the same ciphertext letter.

Using statistical analysis, we can therefore use letter frequencies to recover the mapping.


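For example, the "LL" in "HELLO WORLD" is mapped to a doubled letter at every layer ("LL", then "PP", then "MM", and so on), so the final ciphertext still shows a doubled letter in the same place.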
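Why chaining does not help: applying several substitution mappings one after another is equivalent to applying a single combined substitution, so doubled letters stay doubled and letter frequencies are only relabelled. A standalone sketch (helper names are hypothetical, and the seed is fixed only for reproducibility):

```python
import random
import string

ALPHABET = string.ascii_uppercase


def random_mapping(rng: random.Random) -> dict:
    # one random substitution: A-Z -> a shuffled A-Z
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def apply(mapping: dict, text: str) -> str:
    return "".join(mapping.get(ch, ch) for ch in text)


rng = random.Random(0)
layers = [random_mapping(rng) for _ in range(3)]

# compose the three layers into one equivalent mapping
combined = {}
for letter in ALPHABET:
    out = letter
    for layer in layers:
        out = layer[out]
    combined[letter] = out

text = "HELLO WORLD"
layered = text
for layer in layers:
    layered = apply(layer, layered)

print(layered == apply(combined, text))  # True: three layers == one substitution
print(layered[2] == layered[3])          # True: the doubled L is still doubled
```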

As a result, simply matching ciphertext letters to plaintext letters by frequency is enough to decode (hack) the encryption; section 1.6 demonstrates this on a full book.
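The frequency attack can be sketched as: rank ciphertext letters from most to least common, then pair them with a reference ranking. `ENGLISH_ORDER` below is a commonly quoted approximate ranking for English (exact orders vary by corpus), and the function names are hypothetical. The usage example passes the plaintext's own ranking as the reference, which is the best case for the attacker:

```python
from collections import Counter

# Approximate English letter order, most to least frequent (varies by corpus).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def frequency_rank(text: str) -> list[str]:
    """Return the letters of `text`, most frequent first."""
    counts = Counter(ch for ch in text.upper() if ch.isalpha())
    return [letter for letter, _ in counts.most_common()]


def guess_reverse_mapping(ciphertext: str, reference_order=ENGLISH_ORDER) -> dict:
    """Pair the i-th most common ciphertext letter with the i-th most common reference letter."""
    return dict(zip(frequency_rank(ciphertext), reference_order))


def apply_mapping(mapping: dict, text: str) -> str:
    return "".join(mapping.get(ch, ch) for ch in text.upper())


plain = "AAAABBBCCD"
secret_mapping = {"A": "X", "B": "Y", "C": "Z", "D": "W"}
cipher = apply_mapping(secret_mapping, plain)          # XXXXYYYZZW
guess = guess_reverse_mapping(cipher, frequency_rank(plain))
print(apply_mapping(guess, cipher))                    # AAAABBBCCD
```

On real text the reference ranking is only approximate, so a few low-frequency letters usually need manual correction.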

<a id="1.4"></a> 
### 1.4 Advanced Mapping by Looping Different Ciphers

To avoid mapping the same letter to the same cipher letter every time, we roll through the mappings based on each letter's position.
This means that if we have N ciphers:
- the 1st letter uses the 1st cipher mapping,
- the 2nd letter uses the 2nd cipher mapping,
- ...
- the Nth letter uses the Nth cipher mapping,
- the (N+1)th letter wraps around and uses the 1st mapping again,
- ...
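The same rolling idea underlies polyalphabetic ciphers such as the Vigenère cipher, where each of the N mappings is a Caesar shift given by one letter of a keyword, so the whole key is just the keyword. A minimal sketch; unlike `RollingCipher` below (which also counts spaces when choosing the mapping), this version advances to the next mapping only on letters:

```python
def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Letter i (counting letters only) is Caesar-shifted by keyword[i mod len(keyword)]."""
    shifts = [ord(k) - ord("A") for k in keyword.upper()]
    out = []
    i = 0  # position among letters only
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = shifts[i % len(shifts)]
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


print(vigenere("ATTACKATDAWN", "LEMON"))                 # LXFOPVEFRNHR
print(vigenere("LXFOPVEFRNHR", "LEMON", decrypt=True))   # ATTACKATDAWN
print(vigenere("AAAAAA", "ABC"))                         # ABCABC: repeated A's differ
```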

In [None]:
%%writefile rolling_cipher.py

from random_cipher import RandomCipher

class RollingCipher(RandomCipher):
    def __init__(self, name="Rolling", cipher_number=1):
        super().__init__(name, cipher_number)
        self.cipher_number = cipher_number

    # override encrypt and decrypt: letter i uses cipher i mod N instead of chaining all ciphers
    def encrypt(self, plaintext) -> str:
        letters = self.format_string(plaintext)
        if not self.ciphers:
            return letters
        cipher_letters = []
        for i, letter in enumerate(letters):
            cipher = self.ciphers[i % self.cipher_number]
            # characters outside the mapping (e.g. spaces) pass through unchanged
            cipher_letters.append(cipher.get(letter, letter))
        return "".join(cipher_letters)

    def decrypt(self, ciphertext) -> str:
        letters = self.format_string(ciphertext)
        if not self.ciphers:
            return letters
        # invert each mapping once, keeping the same order as self.ciphers
        inverses = [{v: k for k, v in cipher.items()} for cipher in self.ciphers]
        plain_letters = []
        for i, letter in enumerate(letters):
            reverse_cipher = inverses[i % self.cipher_number]
            plain_letters.append(reverse_cipher.get(letter, letter))
        return "".join(plain_letters)

In [None]:
from rolling_cipher import RollingCipher

cipher_rolling = RollingCipher(cipher_number = 5)
cipher_rolling.selfcheck()
# results = cipher_rolling.run_cipher_demo() 
results = cipher_rolling.run_cipher_demo_repeat()
# repeated letters are now encrypted differently

<a id="1.5"></a>
### 1.5 The Growing Cipher Size Issue

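From 1.1 to 1.4, we have seen the encrypt and decrypt functions evolve.

At the same time, the key (the list of ciphers) keeps growing: every random mapping adds 26 letters, so a rolling cipher with N mappings needs a key of 26 × N letters. Storing and securely sharing such a large key is impractical in the real world.

This is where the Enigma machine comes in.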
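A rough sketch of how fast the key grows, assuming each mapping is stored as one 26-letter shuffled alphabet at one byte per letter (the same count that `BasicCipher.__str__` reports as entries). The entropy figure is log2(26!) bits per independent random mapping; `key_size` is a hypothetical helper:

```python
import math

BITS_PER_MAPPING = math.log2(math.factorial(26))  # about 88.4 bits for one shuffled alphabet


def key_size(cipher_number: int) -> dict:
    """Rough storage and entropy for `cipher_number` independent random A-Z mappings."""
    letters = 26 * cipher_number  # one shuffled 26-letter string per mapping
    return {
        "letters": letters,
        "kilobytes": letters / 1024,  # assuming 1 byte per stored letter
        "bits_of_entropy": cipher_number * BITS_PER_MAPPING,
    }


for n in (1, 3, 5, 5000):
    print(n, key_size(n))
```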



In [None]:
Previous_Cipher_list = [cipher_caesar, cipher_random, cipher_random_many, cipher_rolling]
for one_cipher in Previous_Cipher_list:
    print(one_cipher)

<a id="1.6"></a>
### 1.6 Hacker Game: Simple Mapping Cipher vs Rolling Mapping Cipher

To demonstrate the issue, I use the book Moby Dick (from Project Gutenberg) as the plaintext, then encrypt it with `RandomCipher` using `3` ciphers and try to break it by matching letter frequencies.

Next, we redo the case study, replacing my cipher with `RollingCipher`.

We will see that the frequency-based mapping can no longer be used to decrypt the ciphertext.
And with a larger rolling cipher, the letter distribution of the ciphertext becomes flatter (closer to uniform), so it no longer resembles the distribution of human language.
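One way to put a number on "flatter" (not part of the original script) is the index of coincidence: the probability that two letters drawn at random from the text are the same. Typical English text scores around 0.066, while uniformly random letters give about 1/26 ≈ 0.038. A single substitution, including any number of chained `RandomCipher` mappings, only relabels letters and leaves this value unchanged; a rolling cipher pushes it toward the random value. A minimal sketch you could call on `plaintext` and `cipher_text` inside `main()`:

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement from `text` are equal."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


print(index_of_coincidence("AAAA"))   # 1.0: all letters identical
print(index_of_coincidence("ABCD"))   # 0.0: no letter repeats
print(index_of_coincidence("ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 100))  # close to 1/26
# a substitution cipher does not change the value:
print(index_of_coincidence("HELLO") == index_of_coincidence("KHOOR"))  # True
```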

In [None]:
%%writefile hack_the_code_demo.py

import re
import requests
import pandas as pd
import matplotlib.pyplot as plt
from random_cipher import RandomCipher
from rolling_cipher import RollingCipher 
from pathlib import Path
import argparse

def format_clean(text):
    text = str(text)
    # Remove all non-alphabetic characters
    text = re.sub('[^a-zA-Z]', ' ', text)
    # Replace multiple spaces with a single space
    # text = ' '.join(text.split())
    text = ''.join(text.split()) # <- remove all spaces
    return text

def letter_frequency(text):
    text = text.upper()
    letter_count = {}
    for letter in text:
        if letter in letter_count:
            letter_count[letter] += 1
        else:
            letter_count[letter] = 1
    letter_count = dict(sorted(letter_count.items()))
    return letter_count

def hack_decrypt(cipher_text, hack_cipher):
    cipher_text = cipher_text.upper() 
    reverse_cipher = {v: k for k, v in hack_cipher.items()}
    plaintext = []
    for letter in cipher_text:
        if letter in reverse_cipher:
            plaintext.append(reverse_cipher[letter])
        else:
            plaintext.append(letter)
    return "".join(plaintext)    

# Download the text
def download_text(textbook_url):
    response = requests.get(textbook_url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    text = response.text
    plaintext = format_clean(text)
    print(f"The Book has {len(plaintext)} letters")
    return plaintext

def main(cipher_type: str, cipher_number: int) -> dict:
    # Step 1: Download the text (Moby Dick, Project Gutenberg eBook #2701)
    textbook_url = "https://www.gutenberg.org/cache/epub/2701/pg2701.txt"
    plaintext = download_text(textbook_url)

    # Step 2: init cipher and get the cipher text
    match cipher_type:
        case "random_cipher":
            my_cipher = RandomCipher(cipher_number=cipher_number)
        case "rolling_cipher":
            my_cipher = RollingCipher(cipher_number=cipher_number)
        case _:
            raise ValueError(f"Unknown cipher_type: {cipher_type!r}")
    my_cipher.selfcheck()
    cipher_text = my_cipher.encrypt(plaintext)
    
    # Step 3: Get the plain text letter count and the cipher text letter count 
    letter_count = letter_frequency(plaintext)
    cipher_count = letter_frequency(cipher_text)

    # Step 4: Plot the letter count of the plain text and the cipher text
    fig, axs = plt.subplots(2, 2, figsize=(10, 7))
    df_letter_count = pd.DataFrame(letter_count.items(), columns=['letter', 'count'])
    df_cipher_count = pd.DataFrame(cipher_count.items(), columns=['letter', 'count'])
    df_letter_count.plot(kind='bar', x='letter', y='count', title='Plaintext A-Z',ax=axs[0][0], rot=0) 
    df_cipher_count.plot(kind='bar', x='letter', y='count', title=f'{cipher_type.upper()} Text A-Z', ax=axs[0][1], rot=0, color='red')
    df_letter_count = df_letter_count.sort_values(by='count', ascending=False)
    df_cipher_count = df_cipher_count.sort_values(by='count', ascending=False)
    df_letter_count = df_letter_count.reset_index(drop=True)
    df_cipher_count = df_cipher_count.reset_index(drop=True) 
    df_letter_count = df_letter_count.add_prefix('plain_')
    df_cipher_count = df_cipher_count.add_prefix('cipher_')
    df_letter_count.plot(kind='bar', x='plain_letter', y='plain_count', title='Plaintext High-Low', ax=axs[1][0], rot=0)
    df_cipher_count.plot(kind='bar', x='cipher_letter', y='cipher_count', title=f'{cipher_type.upper()}  High-Low', ax=axs[1][1], rot=0, color='red')
    plt.tight_layout() 

    Path("img").mkdir(parents=True, exist_ok=True)
    fig.savefig(f"img/{cipher_type}_{str(cipher_number)}.png", dpi=300)

    # Step 5: Hack the cipher text by mapping the letter frequency ranking of the cipher text to that of the plain text
    df_letter_cipher_join = pd.concat([df_letter_count, df_cipher_count], axis=1)
    # use the plain_letter and cipher_letter to create a cipher and reverse_cipher
    hack_cipher = dict(zip(df_letter_cipher_join['plain_letter'], df_letter_cipher_join['cipher_letter']))
    
    # Step 6: Hack the cipher text and get the message
    plaintext_sample = plaintext[:200]
    cipher_text_sample = cipher_text[:200]
    hack_message = hack_decrypt(cipher_text_sample, hack_cipher)
    message = my_cipher.decrypt(cipher_text_sample)

    results = { "plaintext_sample": plaintext_sample,
                "cipher_text_sample": cipher_text_sample,
                "hack_message": hack_message,
                "message": message}
    return results

def text_segment(results):
    from wordsegment import load, segment
    load()
    hack_message = results['hack_message']
    hack_message_segments = segment(hack_message)
    fmt_hack_message = " ".join(hack_message_segments)
    print("-" * 80)
    print(f"{hack_message       = }")
    print(f"{fmt_hack_message   = }")

if __name__ == "__main__":
    # add the command line argument for cipher_type and cipher_number
    # python hack_the_code_demo.py random_cipher 1
    # python hack_the_code_demo.py rolling_cipher 3

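    # Create the parser
    parser = argparse.ArgumentParser(description="Frequency-analysis attack on a random or rolling cipher.")

    # Add the arguments (nargs="?" makes the positionals optional so the defaults apply)
    parser.add_argument("cipher_type", type=str, nargs="?", default="random_cipher",
                        choices=["random_cipher", "rolling_cipher"], help="The type of cipher to use.")
    parser.add_argument("cipher_number", type=int, nargs="?", default=1, help="The number of ciphers to use.")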

    # Parse the arguments
    args = parser.parse_args() 
    results = main(args.cipher_type, args.cipher_number)

    print(f"{results['plaintext_sample'] = }")
    print(f"{results['cipher_text_sample'] = }")
    print(f"{results['hack_message'] = }")
    print(f"{results['message'] = }")
    text_segment(results)


In [None]:
%run hack_the_code_demo.py random_cipher 3

In [None]:
%run hack_the_code_demo.py rolling_cipher 5000

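The Enigma machine is similar to the rolling cipher above, but instead of storing a huge list of mappings it generates a new mapping for every letter mechanically by stepping its rotors, and it adds extra letter swapping (the plugboard). The secret key is only the machine's settings, yet decryption without them is extremely difficult.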
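To see how a small key can stand in for a long list of mappings, here is a toy single-rotor sketch: one fixed wiring (a permutation of A-Z) is rotated by one position after every letter, so each position gets a different mapping while the key is just the wiring plus a start position. This is a deliberately simplified assumption, not a model of the real Enigma (no reflector, no plugboard, a single rotor with a period of 26); `WIRING` is an arbitrary example permutation and `rotor` is a hypothetical name:

```python
import string

ALPHABET = string.ascii_uppercase
WIRING = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"  # example permutation of A-Z (any permutation works)
assert sorted(WIRING) == list(ALPHABET)
# INVERSE undoes WIRING: if WIRING[i] == ALPHABET[j], then INVERSE[j] == ALPHABET[i]
INVERSE = "".join(ALPHABET[WIRING.index(ch)] for ch in ALPHABET)


def rotor(text: str, start: int = 0, decrypt: bool = False) -> str:
    """Toy rotor: the k-th letter is mapped by the wiring rotated by (start + k)."""
    table = INVERSE if decrypt else WIRING
    out = []
    position = start
    for ch in text.upper():
        if ch in ALPHABET:
            x = ALPHABET.index(ch)
            # rotate into the rotor, pass through the wiring, rotate back out
            y = (ALPHABET.index(table[(x + position) % 26]) - position) % 26
            out.append(ALPHABET[y])
            position += 1  # the rotor steps after every letter
        else:
            out.append(ch)
    return "".join(out)


print(rotor("AAAA"))  # EJKC: the same letter encrypts differently at each position
secret = rotor("AAAA HELLO", start=7)
print(rotor(secret, start=7, decrypt=True))  # AAAA HELLO
```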

**END**