# Shift Cipher

A shift cipher is an encryption technique in which every character of the given text is shifted by a fixed amount determined by a key, which is an integer value.

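Assume a plaintext letter P, represented as a number from 0 (A) to 25 (Z), needs to be encrypted using a key k. The cipher text letter C can be calculated as -
C = (P + k) mod 26

Decryption reverses the shift -
P = (C - k) mod 26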
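
As a small worked example of the shift with wrap-around at 26, the sketch below encrypts and decrypts a single upper-case letter. The helper names `shift_letter` and `unshift_letter` are illustrative only and are not used elsewhere in this notebook.

```python
def shift_letter(letter, k):
    """Encrypt one upper-case letter: C = (P + k) mod 26."""
    p = ord(letter) - ord('A')      # map 'A'..'Z' to 0..25
    c = (p + k) % 26                # shift and wrap around
    return chr(c + ord('A'))        # map back to a letter


def unshift_letter(letter, k):
    """Decrypt one upper-case letter: P = (C - k) mod 26."""
    c = ord(letter) - ord('A')
    p = (c - k) % 26
    return chr(p + ord('A'))


print(shift_letter('E', 24))    # E is 4, (4 + 24) % 26 = 2, which is C
print(unshift_letter('C', 24))  # C is 2, (2 - 24) % 26 = 4, which is E
```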

## Implementation

First, we define an encoding function that reduces the plaintext to a 26-character alphabet by converting all letters to upper case and discarding all other characters.

In [1]:
def encode(string):
    result = ''
    for letter in string:
        if letter.isalpha():
            result += letter.upper()
    return result

Let's declare a plain text that we want to encrypt.

In [2]:
P = 'Enemy can attack tonight. Stay alert!'

Encoding this string, we get -

In [3]:
T = encode(P)
print(T)

ENEMYCANATTACKTONIGHTSTAYALERT


Since we only have 26 characters, we declare Zp as the ring of integers modulo 26.

In [4]:
Zp = Integers(26)
print(Zp)

Ring of integers modulo 26


We now declare a key, which is a random number from our key domain (the integers from 0 to 25).

In [5]:
from random import choice
k_domain = [i for i in range(26)] # integers from 0 to 25 (included)
key = choice(k_domain)

In [6]:
print('Key - ', key)

Key -  24


In [7]:
def shiftcipher(text, cipher_key):
    cipher = ''
    
    for e in text:
        text_value = ord(e) - ord('A')
        cipher_value = (text_value + cipher_key) % 26
        cipher += chr(int(cipher_value + ord('A')))
    return cipher
    

def shiftdecipher(cipher_text, cipher_key):
    text = ''
    for e in cipher_text:
        cipher_value = ord(e) - ord('A')
        text_value = (cipher_value - cipher_key) % 26
        text += chr(int(text_value + ord('A')))
    return text
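
Because a shift is a fixed substitution on the alphabet, the same mapping can also be written as a translation table. The following is a minimal sketch in plain Python, assuming the input contains only the letters A to Z; `shift_translate` is a hypothetical name and not part of the implementation above.

```python
import string


def shift_translate(text, cipher_key):
    """Shift upper-case text by cipher_key using a translation table."""
    alphabet = string.ascii_uppercase
    k = cipher_key % 26
    # Rotating the alphabet left by k sends the letter at index i to index (i + k) % 26.
    rotated = alphabet[k:] + alphabet[:k]
    return text.translate(str.maketrans(alphabet, rotated))


cipher = shift_translate('ENEMY', 24)
print(cipher)
# Shifting by the negated key undoes the encryption.
print(shift_translate(cipher, -24))
```

Passing the negated key decrypts, so a single function covers both directions.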

Now we test the cipher and decipher functions by encrypting and decrypting the text P.

In [8]:
T = encode(P)
C = shiftcipher(T, key)
D = shiftdecipher(C, key)
print(f'Given text - "{P}"')
print(f'Encoded - {T}')
print(f'Key - {key}')
print(f'Cipher text - {C}')
print(f'Decipher text - {D}')

Given text - "Enemy can attack tonight. Stay alert!"
Encoded - ENEMYCANATTACKTONIGHTSTAYALERT
Key - 24
Cipher text - CLCKWAYLYRRYAIRMLGEFRQRYWYJCPR
Decipher text - ENEMYCANATTACKTONIGHTSTAYALERT


## Test Against Builtin Cipher

Now, we can test the result against the built-in shift cipher in SageMath.

In [9]:
A = ShiftCryptosystem(AlphabeticStrings())
E = A.encoding(encode(P))
print(f'Text - {P}')
print(f'Encoded - {E}')
print(f'Key - {key}')
C_test = A.enciphering(key, E)
D_test = A.deciphering(key, C_test)

# convert to python string
C_test = str(C_test)
D_test = str(D_test)

print(f'Cipher text - {C_test}')
print(f'Decipher text - {D_test}')

Text - Enemy can attack tonight. Stay alert!
Encoded - ENEMYCANATTACKTONIGHTSTAYALERT
Key - 24
Cipher text - CLCKWAYLYRRYAIRMLGEFRQRYWYJCPR
Decipher text - ENEMYCANATTACKTONIGHTSTAYALERT


Comparing the built-in cipher result with our implementation -

In [10]:
print('Results \t Implementation \t Built-in\n')
print(f'Cipher Text \t {C} \t {C_test}')
print(f'Decipher Text \t {D} \t {D_test}\n')
if C_test == C and D_test == D:
    print('Implementation is CORRECT')
else:
    print('Implementation is INCORRECT')

Results 	 Implementation 	 Built-in

Cipher Text 	 CLCKWAYLYRRYAIRMLGEFRQRYWYJCPR 	 CLCKWAYLYRRYAIRMLGEFRQRYWYJCPR
Decipher Text 	 ENEMYCANATTACKTONIGHTSTAYALERT 	 ENEMYCANATTACKTONIGHTSTAYALERT

Implementation is CORRECT


## Cryptanalysis

### Brute Force Attack

Since the key domain of a shift cipher contains only 26 keys, a brute force attack is easy to carry out, which makes this cipher very weak.

In [11]:
print('Total keys -', len(k_domain))

Total keys - 26
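
To make the size of the search concrete, the sketch below lists the candidate plaintext for each of the 26 keys of a short cipher text. It is self-contained plain Python: `decipher` mirrors `shiftdecipher` above, and `brute_force` is an illustrative name introduced only for this example.

```python
def decipher(cipher_text, key):
    """Undo a shift of `key` on upper-case text."""
    return ''.join(
        chr((ord(ch) - ord('A') - key) % 26 + ord('A')) for ch in cipher_text
    )


def brute_force(cipher_text):
    """Return a dict mapping every possible key to its candidate plaintext."""
    return {key: decipher(cipher_text, key) for key in range(26)}


candidates = brute_force('CLCKW')
for key, text in candidates.items():
    print(key, text)
```

With so few candidates a person could pick out the readable one by eye; the rest of this section automates that step.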


We now need a list of English words that can be used to detect English words in the brute-forced candidate plaintexts.

A list of the 3000 most used English words is here -
https://github.com/aneeshsharma/EnglishWords/raw/main/common3000.txt

We download the file, split it into a list of words and convert every word to upper case.

In [12]:
import requests
url = 'https://github.com/aneeshsharma/EnglishWords/raw/main/common3000.txt'

words_file = requests.get(url, allow_redirects=True)
# the with-block closes the file even if the write fails
with open('words.txt', 'wb') as words_file_obj:
    words_file_obj.write(words_file.content)

In [13]:
words = open('words.txt').read().split()
words = [word.upper() for word in words]

In [14]:
print(f'Number of words in dictionary - {len(words)}')

Number of words in dictionary - 3000


A function can be defined to find all substrings of a string that are among the 3000 most common English words. The number of matches gives a rough measure of how likely the string is to be an English sentence.

In [15]:
# function to find english words in a string according to word list
def find_words(string):
    l = len(string)
    found = []
    for i in range(l):
        for j in range(i, l):
            word = string[i:j+1]
            if len(word) <= 1:
                continue
            if word in words:
                found.append(string[i:j+1])
    return found
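
Testing membership against a list scans the whole list for every substring. As a hedged sketch of one possible optimisation, the variant below looks words up in a set and never builds substrings longer than the longest word. The name `find_words_in`, its `min_len` parameter and the small `sample_words` list are assumptions made for this example; they are not part of the notebook or of the downloaded word list.

```python
def find_words_in(string, vocabulary, min_len=2):
    """Return every substring of `string` that appears in `vocabulary`."""
    vocab = set(vocabulary)                       # hash lookups instead of list scans
    longest = max((len(w) for w in vocab), default=0)
    found = []
    for i in range(len(string)):
        # only consider substrings between min_len and the longest word length
        for j in range(i + min_len, min(len(string), i + longest) + 1):
            word = string[i:j]
            if word in vocab:
                found.append(word)
    return found


sample_words = ['ENEMY', 'CAN', 'ATTACK', 'AT', 'TO', 'NIGHT', 'TONIGHT', 'STAY', 'ALERT']
print(find_words_in('ENEMYCANATTACKTONIGHT', sample_words))
```

Overlapping matches such as TO, TONIGHT and NIGHT are all counted, which is the same behaviour as `find_words`.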

Now we decipher the encrypted text with every key in the key domain and count the English words found in each candidate. The more words are detected, the more likely it is that the key is correct. Only keys that yield more than 3 words are kept.

In [16]:
keys = {}
max_words = 0
for candidate in k_domain:
    candidate_text = shiftdecipher(C, candidate)
    found = find_words(candidate_text)
    if len(found) > 3:
        if len(found) > max_words:
            max_words = len(found)
        keys[candidate] = len(found)

print('Key \t\t Likelihood')
for likely_key in keys:
    print(f'{likely_key} \t\t {keys[likely_key]}')

Key 		 Likelihood
9 		 4
17 		 4
24 		 10


Now that we have a list of keys and their likelihood of being correct, we can display the candidate plain texts that are most likely to be correct.

In [17]:
text_list = [[] for _ in range(max_words + 1)]
for likely_key in keys:
    count = keys[likely_key]
    text_list[count].append(shiftdecipher(C, likely_key))

print('Most likely strings -')
for text in text_list[max_words]:
    print(f'{text}')

Most likely strings -
ENEMYCANATTACKTONIGHTSTAYALERT


### Statistical Mono

We get the frequency of every letter in the English language. The file at - https://github.com/aneeshsharma/EnglishWords/blob/main/character_frequency.json is a JSON file with the frequency of each letter in English.

In [18]:
import requests
url = 'https://github.com/aneeshsharma/EnglishWords/raw/main/character_frequency.json'

characters_list = requests.get(url, allow_redirects=True)
characters_file_obj = open('characters.json', 'wb')
characters_file_obj.write(characters_list.content)
characters_file_obj.close()

In [19]:
def dict_to_sorted_tuple(d):
    d = [(key, d[key]) for key in d]
    return sorted(d, key=lambda item: -item[1])

In [20]:
import json
frequencies_file = json.loads(open('characters.json', 'r').read())
frequencies = dict_to_sorted_tuple(frequencies_file)

In [21]:
frequencies

[('E', 12.6),
 ('T', 9.37),
 ('A', 8.34),
 ('O', 7.7),
 ('N', 6.8),
 ('I', 6.71),
 ('H', 6.11),
 ('S', 6.11),
 ('R', 5.68),
 ('L', 4.24),
 ('D', 4.14),
 ('U', 2.85),
 ('C', 2.73),
 ('M', 2.53),
 ('W', 2.34),
 ('Y', 2.04),
 ('F', 2.03),
 ('G', 1.92),
 ('P', 1.66),
 ('B', 1.54),
 ('V', 1.06),
 ('K', 0.87),
 ('J', 0.23),
 ('X', 0.2),
 ('Q', 0.09),
 ('Z', 0.06)]

Now we map the most frequent character in the cipher text to the most frequent English letter and check the resulting plain text. If that does not produce English words, we try the next most frequent English letter, and so on.

In [22]:
def frequency(string):
    f = {}
    for x in string:
        if x in f:
            f[x] += 1
        else:
            f[x] = 1
    return f
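
The standard library's `collections.Counter` performs the same tally, and its `most_common()` method returns the pairs already sorted by descending count. The sketch below applies it to the cipher text produced earlier with key 24 and derives the key guess for one assumed plaintext letter. The variable names are illustrative only.

```python
from collections import Counter

cipher_text = 'CLCKWAYLYRRYAIRMLGEFRQRYWYJCPR'

# Tally the letters and take the most frequent one.
counts = Counter(cipher_text)
top_letter, top_count = counts.most_common(1)[0]
print(top_letter, top_count)


def key_guess(cipher_letter, plain_letter):
    """Key that would map plain_letter to cipher_letter."""
    return (ord(cipher_letter) - ord(plain_letter)) % 26


# Guess the key by assuming the top cipher letter stands for E, then for T.
print(key_guess(top_letter, 'E'))
print(key_guess(top_letter, 'T'))
```

For this short message the most frequent cipher letter corresponds to T rather than E, so the first guess is wrong and the second recovers the key. This is why the attack falls back to the next most frequent English letter.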

In [23]:
C_freq = dict_to_sorted_tuple(frequency(C))

In [24]:
statistic_keys = {}
max_words = 0
for freq in frequencies:
    letter = freq[0]
    candidate = (ord(C_freq[0][0]) - ord(letter)) % 26
    candidate_text = shiftdecipher(C, candidate)
    found = find_words(candidate_text)
    if len(found) > 3:
        if len(found) > max_words:
            max_words = len(found)
        statistic_keys[candidate] = len(found)

print('Key \t\t Likelihood')
for likely_key in statistic_keys:
    print(f'{likely_key} \t\t {statistic_keys[likely_key]}')

Key 		 Likelihood
24 		 10
17 		 4
9 		 4


In [25]:
text_list = [[] for _ in range(max_words + 1)]
for likely_key in statistic_keys:
    count = statistic_keys[likely_key]
    text_list[count].append(shiftdecipher(C, likely_key))

print('Most likely strings -')
for text in text_list[max_words]:
    print(f'{text}')

Most likely strings -
ENEMYCANATTACKTONIGHTSTAYALERT
