## Mono-alphabetic substitution cipher

We've seen that the shift cipher is easy to break because it has only 26 possible keys. A better approach is to substitute each letter of the alphabet by another letter chosen at random, that is, to use a random permutation of the alphabet as the key. This gives $26!$ possible keys (about $4 \times 10^{26}$), so an exhaustive search is no longer feasible on an ordinary computer.
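
To get a feel for the size of this key space, the following sketch computes $26!$ exactly and estimates how long an exhaustive search would take. The rate of $10^9$ keys per second is an assumed figure chosen only for illustration, not a measurement.

```python
import math

# Number of possible keys: every permutation of the 26 letters
n_keys = math.factorial(26)

# Assumed (hypothetical) brute-force speed: one billion keys per second
keys_per_second = 10**9
seconds_per_year = 60 * 60 * 24 * 365

# Time needed to try every key at the assumed speed
years = n_keys / (keys_per_second * seconds_per_year)

print("Number of keys: {:.3e}".format(n_keys))
print("Years to try them all: {:.3e}".format(years))
```

Even at that optimistic rate, the search would take on the order of $10^{10}$ years.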

Let's see an example of this encryption scheme:

In [None]:
import string
from copy import deepcopy
from random import randint, seed

seed(1) #fix seed so that we can reproduce the results
characters = string.ascii_lowercase

def random_permutation(characters):
    old_chars = list(deepcopy(characters))
    permut_ = []
    
    while len(old_chars)>0:
        elem = old_chars.pop(randint(0,len(old_chars)-1))
        permut_.append(elem)
    return ''.join(permut_)
    
print("Plaintext characters are: \n\t{}".format(characters))
print("Equivalent in one random mono-alphabetic order: \n\t{}".format(random_permutation(characters)))

Alice and Bob just have to meet once and exchange the key, which now has length 26 (the random substitution). In this particular case $a$ in the plaintext is substituted by $e$ in the ciphertext, $b$ by $t$, and so on, according to the results above. Let's write the functions to encrypt and decrypt:

In [None]:
# The random key generator is simply the permutation of the original characters
key_generator = lambda c: random_permutation(c)

def mono_encrypt(plaintext, characters, key):
    convert_dict = {}
    for p, c in zip(characters, key):
        convert_dict[p] = c    
    convert_dict[' '] = ' '
    
    c = ''
    for p in plaintext:
        c += convert_dict[p]
        
    return c


def mono_decrypt(ciphertext, characters, key):
    convert_dict = {}
    for p, c in zip(characters, key):
        convert_dict[c] = p    
    convert_dict[' '] = ' '
    
    c = ''
    for p in  ciphertext:
        c += convert_dict[p]
        
    return c
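
As an aside, Python's built-in `str.maketrans` and `str.translate` can express the same substitution more compactly. This is only an alternative sketch, not a replacement for the functions above; the names `translate_encrypt`, `translate_decrypt` and `demo_key` are made up for illustration. One difference in behaviour: characters missing from the table, such as spaces or punctuation, pass through unchanged instead of raising a `KeyError`.

```python
import string

def translate_encrypt(plaintext, characters, key):
    # Build a table mapping each plaintext letter to its key letter
    table = str.maketrans(characters, key)
    return plaintext.translate(table)

def translate_decrypt(ciphertext, characters, key):
    # Invert the table: key letter -> plaintext letter
    table = str.maketrans(key, characters)
    return ciphertext.translate(table)

characters = string.ascii_lowercase
demo_key = characters[::-1]  # hypothetical key: the reversed alphabet

c = translate_encrypt("hello world", characters, demo_key)
print(c)                                            # svool dliow
print(translate_decrypt(c, characters, demo_key))   # hello world
```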

Now let's encrypt and decrypt a message using a freshly generated key:

In [3]:
key = key_generator(characters)
sentence = 'it was a bright cold day in april and the clocks were striking thirteen winston smith his chin nuzzled into his breast in an effort to escape the vile wind slipped quickly through the glass doors of victory mansions though not quickly enough to prevent a swirl of gritty dust from entering along with him'

ciphertext = mono_encrypt(sentence, characters, key)
plaintext = mono_decrypt(ciphertext, characters, key)
print("The key is: {}\n\n".format(key))
print("THE SENTENCE:\n\n{}\n\nCIPHERTEXT:\n\n{}\n\nPLAINTEXT:\n\n{}".format(sentence, ciphertext, plaintext))

The key is: avsboircylxmpgkhjwqdtzefun


THE SENTENCE:

it was a bright cold day in april and the clocks were striking thirteen winston smith his chin nuzzled into his breast in an effort to escape the vile wind slipped quickly through the glass doors of victory mansions though not quickly enough to prevent a swirl of gritty dust from entering along with him

CIPHERTEXT:

yd eaq a vwyrcd skmb bau yg ahwym agb dco smksxq eowo qdwyxygr dcywdoog eygqdkg qpydc cyq scyg gtnnmob ygdk cyq vwoaqd yg ag oiikwd dk oqsaho dco zymo eygb qmyhhob jtysxmu dcwktrc dco rmaqq bkkwq ki zysdkwu pagqykgq dcktrc gkd jtysxmu ogktrc dk hwozogd a qeywm ki rwyddu btqd iwkp ogdowygr amkgr eydc cyp

PLAINTEXT:

it was a bright cold day in april and the clocks were striking thirteen winston smith his chin nuzzled into his breast in an effort to escape the vile wind slipped quickly through the glass doors of victory mansions though not quickly enough to prevent a swirl of gritty dust from entering along with him


Seems to work well, but hold your horses... There is a plausible attack we can carry out. Imagine the attacker knows the language in which Alice and Bob are communicating: that alone gives him a lot of information, because he knows the frequency distribution of its letters. Let me load George Orwell's book to estimate the probabilities of letters in the English language.

In [4]:
from utils import download_data, process_load_textfile
import string
import os

url = 'http://gutenberg.net.au/ebooks01/0100021.txt'
filename = 'Nineteen-eighty-four_Orwell.txt'
download_path = '/'.join(os.getcwd().split('/')[:-1]) + '/data/'

#download data to specified path
download_data(url, filename, download_path)
#load data and process
data = process_load_textfile(filename, download_path)#.replace(" ","")

In [5]:
print("The length of the book is {} characters".format(len(data)))

The length of the book is 569015 characters


In [6]:
# just a sample: the first 1000 characters, to see what the text looks like
data[:1000]

'  project gutenberg australia    title nineteen eightyfour author george orwell pseudonym of eric blair   a project gutenberg of australia ebook  ebook no  txt language   english date first posted august  date most recently updated november   project gutenberg of australia ebooks are created from printed editions which are in the public domain in australia unless a copyright notice is included we do not keep any ebooks in compliance with a particular paper edition  copyright laws are changing all over the world be sure to check the copyright laws for your country before downloading or redistributing this file  this ebook is made available at no cost and with almost no restrictions whatsoever you may copy it give it away or reuse it under the terms of the project gutenberg of australia license which may be viewed online at httpgutenbergnetaulicencehtml  to contact project gutenberg of australia go to httpgutenbergnetau   title      nineteen eightyfour author     george orwell pseudonym

We assume that English letters occur with the same distribution as in Orwell's book, so we count the letters as follows:

In [7]:
def count_char_freqs(text, characters = string.ascii_lowercase):
    freqs = {}
    for letter in characters:
        f = text.count(letter)
        freqs[letter] = f
    return freqs

english_frequencies = count_char_freqs(data)
print(english_frequencies)

{'a': 36523, 'b': 7653, 'c': 11636, 'd': 19022, 'e': 59619, 'f': 10188, 'g': 9283, 'h': 29164, 'i': 31950, 'j': 463, 'k': 3609, 'l': 18657, 'm': 10828, 'n': 31986, 'o': 35051, 'p': 8614, 'q': 409, 'r': 26126, 's': 28972, 't': 43877, 'u': 13037, 'v': 4313, 'w': 12243, 'x': 792, 'y': 9423, 'z': 306}
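
Raw counts depend on the length of the text, so it can help to turn them into relative frequencies before comparing texts of different sizes. The sketch below does this for a short sentence; `relative_freqs` is a hypothetical helper introduced here, not part of the notebook's `utils` module, and it assumes the text contains at least one letter.

```python
import string

def relative_freqs(text, characters=string.ascii_lowercase):
    # Count each letter, then divide by the total number of counted letters
    counts = {letter: text.count(letter) for letter in characters}
    total = sum(counts.values())
    return {letter: counts[letter] / total for letter in characters}

sample = "it was a bright cold day in april"
freqs = relative_freqs(sample)

# Show the five most frequent letters of the sample
top = sorted(freqs.items(), key=lambda item: item[1], reverse=True)[:5]
for letter, f in top:
    print("{}: {:.3f}".format(letter, f))
```

Applied to the full book, the same normalisation would give an estimate of the probability of each letter rather than an absolute count.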


Let's take a random sample whose length is 0.05 times the size of the original text and encrypt it. Then we will try to infer some information just by looking at the ciphertext letter frequencies and knowing the English letter distribution.

In [8]:
n = round(len(data)*0.05)
i = randint(0, len(data)-1)
sampled_data = data[i:i+n]
encrypted_sampled_data = mono_encrypt(sampled_data, characters, key)

print("We sample a chunk of {} characters from the book starting at position {}".format(n, i))
print("Using the private key k = {}\n\n".format(key))
print("Sampled Plaintext:")
print(sampled_data)

We sample a chunk of 28451 characters from the book starting at position 348856
Using the private key k = avsboircylxmpgkhjwqdtzefun


Sampled Plaintext:


In [9]:
ciphertext_frequencies = count_char_freqs(encrypted_sampled_data)

In [10]:
def find_key_attack(ciphertext_frequencies, english_frequencies):
    """Takes two frequency dictionaries on letters and outputs a plausible
    key
    inputs like: {'a': 36548, 'b': 7668, 'c': 11642 ...
    outputs a key
    """
    cf = sorted(ciphertext_frequencies.items(), key=lambda item: item[1])
    ef = sorted(english_frequencies.items(), key=lambda item: item[1])
    
    # map each English letter to the ciphertext letter with the same frequency rank
    mapping = {}
    for e, c in zip(ef, cf):
        mapping[e[0]] = c[0]
    
    m = ''
    for letter in string.ascii_lowercase:
        m += mapping[letter]
        
    return m


inferred_key = find_key_attack(ciphertext_frequencies, english_frequencies)

print("The original key is: \n\t{}".format(key))
print("The inferred key: \n\t{}".format(inferred_key))

The original key is: 
	avsboircylxmpgkhjwqdtzefun
The inferred key: 
	arimohuqgjxbpykvncwdsztfel


In [11]:
count = 0
for a, b in zip(key, inferred_key):
    if a == b:
        count += 1
        
print("We have correctly guessed {} out of {} letters of the key".format(count, len(characters)))

We have correctly guessed 8 out of 26 letters of the key


We see that some positions of the key match. Let's try to decrypt the message using the inferred key:

In [12]:
mono_decrypt(encrypted_sampled_data, characters, inferred_key)

' trnh yowdl fsovnle oidg tre euoiomnu ail iot tre emotnoiad pahnh cos a rnesasurnuad hounetg yrat nh uoiuesiel rese nh iot tre mosade oc mahheh yrohe attntwle nh winmfostait ho doib ah treg ase keft htealndg at yosk pwt tre mosade oc tre fastg nthedc evei tre rwmpdeht fastg mempes nh exfeutel to pe uomfeteit nilwhtsnowh ail evei niteddnbeit yntrni iassoy dnmnth pwt nt nh adho ieuehhasg trat re hrowdl pe a uselwdowh ail nbiosait caiatnu yrohe fsevandnib moolh ase ceas ratsel alwdatnoi ail osbnahtnu tsnwmfr ni otres yoslh nt nh ieuehhasg trat re hrowdl rave tre meitadntg affsofsnate to a htate oc yas nt loeh iot mattes yretres tre yas nh autwaddg raffeinib ail hniue io leunhnve vnutosg nh fohhnpde nt loeh iot mattes yretres tre yas nh bonib yedd os paldg add trat nh ieelel nh trat a htate oc yas hrowdl exnht tre hfdnttnib oc tre niteddnbeiue yrnur tre fastg sejwnseh oc nth mempesh ail yrnur nh mose eahndg aurnevel ni ai atmohfrese oc yas nh ioy admoht winveshad pwt tre rnbres wf tre sai

We can see that some words are easily readable to the human eye. In fact, we have recovered quite a lot of information just by looking at the ciphertext, and this is dangerous! If the attacker keeps gathering encrypted messages between Alice and Bob, he will collect more and more information and will eventually be able to find the key.
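
One way to quantify how much leaks is to count the fraction of characters that decrypt correctly under the inferred key, rather than the number of matching key positions. The sketch below reuses the two keys printed above and part of the opening sentence as a test message; `fraction_recovered` is a hypothetical helper introduced here, and the choice of message is arbitrary.

```python
import string

characters = string.ascii_lowercase
true_key = "avsboircylxmpgkhjwqdtzefun"      # key printed earlier
guessed_key = "arimohuqgjxbpykvncwdsztfel"   # key inferred by the attack

def fraction_recovered(message, characters, true_key, guessed_key):
    # A plaintext letter is recovered when both keys map it to the
    # same ciphertext letter
    correct = {p for p, t, g in zip(characters, true_key, guessed_key) if t == g}
    # Ignore spaces and anything else outside the alphabet
    letters = [ch for ch in message if ch in characters]
    hits = sum(1 for ch in letters if ch in correct)
    return hits / len(letters)

message = "it was a bright cold day in april and the clocks were striking thirteen"
print("{:.1%} of the letters decrypt correctly".format(
    fraction_recovered(message, characters, true_key, guessed_key)))
```

Because the correctly guessed positions include very common letters such as $e$, $t$, $a$ and $o$, the share of readable characters is larger than the 8 out of 26 matching key positions would suggest.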