# **Cryptography - Edgar Allan Poe's The Gold Bug**

###The following script computationally reproduces the cryptogram around which Edgar Allan Poe's tale The Gold Bug revolves. It belongs to the third chapter of Rebecca Guolo's Bachelor thesis "Debugging Edgar Allan Poe's cryptography".

####*Frequency analysis of Poe's cryptogram.*  
First, the cryptogram found by Legrand is shown and converted to a list for further analysis. The number of symbols it contains is also printed.

In [76]:
PoeCiphertext = "53‡‡†305))6*;4826)4‡.)4‡);806*;48†8¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡1;48†85;4)485†528806*81(‡9;48;(88;4(‡?34;48)4‡;161;:188;‡?;"
print(PoeCiphertext)

PoeCryptogram = list(PoeCiphertext)
print(PoeCryptogram)

print(len(PoeCryptogram))

53‡‡†305))6*;4826)4‡.)4‡);806*;48†8¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡1;48†85;4)485†528806*81(‡9;48;(88;4(‡?34;48)4‡;161;:188;‡?;
['5', '3', '‡', '‡', '†', '3', '0', '5', ')', ')', '6', '*', ';', '4', '8', '2', '6', ')', '4', '‡', '.', ')', '4', '‡', ')', ';', '8', '0', '6', '*', ';', '4', '8', '†', '8', '¶', '6', '0', ')', ')', '8', '5', ';', '1', '‡', '(', ';', ':', '‡', '*', '8', '†', '8', '3', '(', '8', '8', ')', '5', '*', '†', ';', '4', '6', '(', ';', '8', '8', '*', '9', '6', '*', '?', ';', '8', ')', '*', '‡', '(', ';', '4', '8', '5', ')', ';', '5', '*', '†', '2', ':', '*', '‡', '(', ';', '4', '9', '5', '6', '*', '2', '(', '5', '*', '—', '4', ')', '8', '¶', '8', '*', ';', '4', '0', '6', '9', '2', '8', '5', ')', ';', ')', '6', '†', '8', ')', '4', '‡', '‡', ';', '1', '(', '‡', '9', ';', '4', '8', '0', '8', '1', ';', '8', ':', '8', '‡', '1', ';', '4', '8', '†', '8', '5', ';', '4', ')', '4', '8', '5', '†', '5', 

The first goal is to identify each symbol in the cryptogram together with the number of times it occurs.  
The following function returns a dictionary whose keys are the symbols and whose values are their occurrence counts.

In [77]:
def Recurrence(ciphertext):
  DictFreq = {}
  for symbol in ciphertext:
    if symbol in DictFreq:
        DictFreq[symbol] += 1
    else:
        DictFreq[symbol] = 1
  return DictFreq

DictCiphertext = Recurrence(PoeCryptogram)
print(DictCiphertext)

{'5': 12, '3': 4, '‡': 16, '†': 8, '0': 6, ')': 16, '6': 11, '*': 13, ';': 26, '4': 19, '8': 33, '2': 5, '.': 1, '¶': 2, '1': 8, '(': 10, ':': 4, '9': 5, '?': 3, '—': 1}
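As a cross-check, the same tally can be sketched with `collections.Counter` from the standard library. This is a minimal sketch rather than the method used in the thesis: the helper name `recurrence_counter` is hypothetical, and `sample` holds only the first 13 symbols of the cryptogram.

```python
from collections import Counter

def recurrence_counter(ciphertext):
    """Count how often each symbol occurs (hypothetical helper, not from the thesis)."""
    return dict(Counter(ciphertext))

# First 13 symbols of Legrand's cryptogram, used as a small sample
sample = "53‡‡†305))6*;"
sample_counts = recurrence_counter(sample)
print(sample_counts)
```

`Counter` accepts either a string or a list of symbols, so it could be applied to `PoeCiphertext` or `PoeCryptogram` alike.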


Starting from the Recurrence dictionary, the relative frequency of each symbol can be computed by dividing its count by the total number of symbols.

In [78]:
def Frequency(dictionary):
  #build a new dictionary so that the counts passed in are left untouched
  totFreq = sum(dictionary.values())
  return {symbol: count/totFreq for symbol, count in dictionary.items()}

freqPoe = Frequency(DictCiphertext)
print(freqPoe)

{'5': 0.059113300492610835, '3': 0.019704433497536946, '‡': 0.07881773399014778, '†': 0.03940886699507389, '0': 0.029556650246305417, ')': 0.07881773399014778, '6': 0.054187192118226604, '*': 0.06403940886699508, ';': 0.12807881773399016, '4': 0.09359605911330049, '8': 0.1625615763546798, '2': 0.024630541871921183, '.': 0.0049261083743842365, '¶': 0.009852216748768473, '1': 0.03940886699507389, '(': 0.04926108374384237, ':': 0.019704433497536946, '9': 0.024630541871921183, '?': 0.014778325123152709, '—': 0.0049261083743842365}
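As a worked example of the arithmetic, the symbol `8` occurs 33 times among 203 symbols, so its relative frequency is 33/203. The sketch below re-enters the counts printed earlier by hand (the name `counts` is introduced here for illustration) and checks that the relative frequencies add up to 1.

```python
import math

# Counts copied from the Recurrence output printed earlier
counts = {'5': 12, '3': 4, '‡': 16, '†': 8, '0': 6, ')': 16, '6': 11,
          '*': 13, ';': 26, '4': 19, '8': 33, '2': 5, '.': 1, '¶': 2,
          '1': 8, '(': 10, ':': 4, '9': 5, '?': 3, '—': 1}

total = sum(counts.values())
share_of_8 = counts['8'] / total

print(total)                 # 203
print(round(share_of_8, 4))  # 0.1626

# A relative-frequency table should always sum to 1 (up to rounding error)
all_shares = [count / total for count in counts.values()]
print(math.isclose(sum(all_shares), 1.0))
```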


Finally, the symbols are sorted from the most frequent to the least frequent.

In [79]:
def orderFreq(dictionary):
  return dict(sorted(dictionary.items(), key=lambda item: item[1], reverse=True))
freqCryptogram = orderFreq(freqPoe)

for (symbol,freq) in freqCryptogram.items():
  print("{:<2}: {:1.4f}".format(symbol,freq))

8 : 0.1626
; : 0.1281
4 : 0.0936
‡ : 0.0788
) : 0.0788
* : 0.0640
5 : 0.0591
6 : 0.0542
( : 0.0493
† : 0.0394
1 : 0.0394
0 : 0.0296
2 : 0.0246
9 : 0.0246
3 : 0.0197
: : 0.0197
? : 0.0148
¶ : 0.0099
. : 0.0049
— : 0.0049
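Several symbols in this ranking are tied (for example `‡` and `)`, with 16 occurrences each). Python's `sorted` is stable, so tied symbols keep the order in which they first appear in the cryptogram, and that order decides which letter each of them receives in the substitution later on. A small sketch with four of the counts from above; the helper name `order_by_frequency` is hypothetical.

```python
def order_by_frequency(freqs):
    """Sort symbols from most to least frequent; ties keep their insertion order."""
    return dict(sorted(freqs.items(), key=lambda item: item[1], reverse=True))

# Four counts taken from the tally above; '‡' is inserted before ')'
tied = {"†": 8, "‡": 16, ")": 16, "8": 33}
ordered = order_by_frequency(tied)
print(list(ordered))   # ['8', '‡', ')', '†']
```

Swapping the insertion order of `‡` and `)` would swap their ranks, and with them the letters they are mapped to.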


####*Corpora frequency analysis*  
A reference corpus is essential to decrypt the cryptogram along the same path as Legrand.  
Two corpora have been created to check whether letter frequencies have changed over the centuries: the first reflects the letter frequencies of the 19th century (Poe's time), while the second reflects the letter frequencies of contemporary English.  
The decryption will be carried out with both frequency analyses, one at a time.  


 1. The first corpus has been built from all the tales written by Edgar Allan Poe (retrieved from https://www.gutenberg.org/), with the exception of The Gold Bug. The texts have been converted to lowercase and punctuation has been removed; the words have then been split into letters in order to calculate their frequency.

In [80]:
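def noPunctuation(text):
  #iterating over a string yields single characters; keep only letters and digits
  noPunct = []
  for char in text:
    if char.isalnum():
      noPunct.append(char)
  return noPunct

def noCAP(text):
  #avoid the name `min`, which would shadow the built-in function
  lowered = text.lower()
  return lowered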

In [81]:
with open("EAP_tales.txt", mode="r", encoding="UTF-8") as EAP_tales: 
  tales = EAP_tales.read()
  ncTales = noCAP(tales)
  npTales = noPunctuation(ncTales)
  rTales = Recurrence(npTales)
  freqTales = Frequency(rTales)
  freqPoesTime = orderFreq(freqTales)
  for (symbol,freq) in freqPoesTime.items():
    print("{:<2}: {:1.4f}".format(symbol,freq)) #the first 26 results are the letters of the alphabet

e : 0.1275
t : 0.0942
a : 0.0782
o : 0.0752
i : 0.0729
n : 0.0701
s : 0.0604
h : 0.0590
r : 0.0574
d : 0.0416
l : 0.0399
u : 0.0294
c : 0.0272
m : 0.0265
f : 0.0253
w : 0.0206
p : 0.0198
y : 0.0192
g : 0.0184
b : 0.0156
v : 0.0108
k : 0.0050
x : 0.0021
q : 0.0012
j : 0.0009
z : 0.0007
1 : 0.0001
é : 0.0001
0 : 0.0001
2 : 0.0001
8 : 0.0000
3 : 0.0000
ê : 0.0000
5 : 0.0000
è : 0.0000
4 : 0.0000
7 : 0.0000
æ : 0.0000
6 : 0.0000
ö : 0.0000
à : 0.0000
9 : 0.0000
â : 0.0000
α : 0.0000
ô : 0.0000
œ : 0.0000
υ : 0.0000
ο : 0.0000
ε : 0.0000
ι : 0.0000
λ : 0.0000
σ : 0.0000
τ : 0.0000
ν : 0.0000
ς : 0.0000
π : 0.0000
μ : 0.0000
ρ : 0.0000
ë : 0.0000
ï : 0.0000
ä : 0.0000
ξ : 0.0000
δ : 0.0000
ῦ : 0.0000
γ : 0.0000
ω : 0.0000
κ : 0.0000
χ : 0.0000
½ : 0.0000
η : 0.0000
ú : 0.0000
φ : 0.0000
î : 0.0000
û : 0.0000
ü : 0.0000
ῆ : 0.0000
ç : 0.0000
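The listing shows that `str.isalnum` also keeps digits, accented letters and Greek characters, which is why only the first 26 entries matter for the substitution. If only the 26 English letters are wanted, one option is to filter against `string.ascii_lowercase`. This is a sketch under that assumption: the helper name `only_english_letters` and the sample sentence are invented for illustration.

```python
import string

def only_english_letters(text):
    """Keep only the characters a-z after lowercasing (hypothetical helper)."""
    allowed = set(string.ascii_lowercase)
    return [char for char in text.lower() if char in allowed]

sample_text = "Café 1843: the Gold-Bug!"   # invented sample, not taken from the corpus
letters = only_english_letters(sample_text)
print("".join(letters))   # cafthegoldbug
```

Dropping the non-English characters changes the total used as the denominator only slightly here, since together they account for a very small share of the corpus.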


2. For contemporary letter frequencies, the following corpus has been built on the Corpus of Contemporary American English (COCA), since it offers a list of the most frequent English words. Once the corpus had been downloaded (retrieved from www.wordfrequency.info), a .csv file was created by concatenating column H "Words" of the third sheet "WordForms" with column I "WordFreq". This .csv file is the starting point for building the letter-frequency corpus.

In [82]:
frequentEnglishWords = []
with open("WF.csv") as WF:
  next(WF) #skip the header row
  for row in WF:
    #keep the first whitespace-separated field of each row; every listed word counts once
    frequentEnglishWords.append(row.split()[0])

nop = noPunctuation(str(frequentEnglishWords))
nocap = (noCAP(''.join(nop)))
Rnop = Recurrence(nocap)
FRnop = Frequency(Rnop)
freqContemporary = orderFreq(FRnop)

for (symbol,freq) in freqContemporary.items():
  print("{:<2}: {:1.4f}".format(symbol,freq))

e : 0.1257
s : 0.0993
i : 0.0844
r : 0.0734
t : 0.0729
n : 0.0727
a : 0.0697
o : 0.0591
l : 0.0440
c : 0.0433
d : 0.0406
g : 0.0317
p : 0.0304
u : 0.0284
m : 0.0246
h : 0.0217
f : 0.0147
b : 0.0141
v : 0.0122
y : 0.0111
w : 0.0096
k : 0.0088
x : 0.0030
j : 0.0016
q : 0.0015
z : 0.0014
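The cell above counts the letters of each listed word once, so these figures describe the word list rather than running text. If the word frequencies are to be taken into account, each letter can be weighted by the frequency of the word it occurs in. The following is only a sketch: the two-column layout (word, frequency), the rows themselves and the helper name `weighted_letter_counts` are assumptions, not the structure of the actual `WF.csv`.

```python
import csv
import io
from collections import Counter

def weighted_letter_counts(rows):
    """Add each word's frequency to the tally of every letter it contains."""
    totals = Counter()
    for word, freq in rows:
        for char in word.lower():
            if char.isalpha():
                totals[char] += int(freq)
    return totals

# Invented rows standing in for the real file; the layout is an assumption
fake_csv = io.StringIO("word,freq\nthe,100\nbe,40\nto,30\n")
rows = csv.reader(fake_csv)
next(rows)                         # skip the header row
weighted = weighted_letter_counts(rows)
print(weighted.most_common(3))     # [('e', 140), ('t', 130), ('h', 100)]
```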


####*Cracking the cryptogram in a computational way*  
With these three frequency analyses in hand, Legrand's cryptogram can be decrypted computationally: the most frequent symbol is replaced with the most frequent letter of Poe's time, the second most frequent symbol with the second most frequent letter, and so on for every symbol. The procedure is then repeated with the contemporary frequencies.

In [83]:
LfreqCryptogram = list(freqCryptogram) #lists can be indexed and sliced, unlike dictionaries
LfreqPoesTime = list(freqPoesTime)
LfreqContemporary = list(freqContemporary)

#the cryptogram has 20 unique symbols, so only the 20 most frequent letters of each corpus are needed for the substitution, i.e. the decryption
print(len(LfreqCryptogram))

dictPoesTime = dict(zip(LfreqCryptogram, LfreqPoesTime[0:20]))
print(dictPoesTime)

dictContemporary = dict(zip(LfreqCryptogram, LfreqContemporary[0:20]))
print(dictContemporary)

#look each symbol up directly instead of scanning the whole dictionary
decryptionPoesTime = [dictPoesTime[symbol] for symbol in PoeCryptogram]
print(decryptionPoesTime)

decryptionContemporary = [dictContemporary[symbol] for symbol in PoeCryptogram]
print(decryptionContemporary)

20
{'8': 'e', ';': 't', '4': 'a', '‡': 'o', ')': 'i', '*': 'n', '5': 's', '6': 'h', '(': 'r', '†': 'd', '1': 'l', '0': 'u', '2': 'c', '9': 'm', '3': 'f', ':': 'w', '?': 'p', '¶': 'y', '.': 'g', '—': 'b'}
{'8': 'e', ';': 's', '4': 'i', '‡': 'r', ')': 't', '*': 'n', '5': 'a', '6': 'o', '(': 'l', '†': 'c', '1': 'd', '0': 'g', '2': 'p', '9': 'u', '3': 'm', ':': 'h', '?': 'f', '¶': 'b', '.': 'v', '—': 'y'}
['s', 'f', 'o', 'o', 'd', 'f', 'u', 's', 'i', 'i', 'h', 'n', 't', 'a', 'e', 'c', 'h', 'i', 'a', 'o', 'g', 'i', 'a', 'o', 'i', 't', 'e', 'u', 'h', 'n', 't', 'a', 'e', 'd', 'e', 'y', 'h', 'u', 'i', 'i', 'e', 's', 't', 'l', 'o', 'r', 't', 'w', 'o', 'n', 'e', 'd', 'e', 'f', 'r', 'e', 'e', 'i', 's', 'n', 'd', 't', 'a', 'h', 'r', 't', 'e', 'e', 'n', 'm', 'h', 'n', 'p', 't', 'e', 'i', 'n', 'o', 'r', 't', 'a', 'e', 's', 'i', 't', 's', 'n', 'd', 'c', 'w', 'n', 'o', 'r', 't', 'a', 'm', 's', 'h', 'n', 'c', 'r', 's', 'n', 'b', 'a', 'i', 'e', 'y', 'e', 'n', 't', 'a', 'u', 'h', 'm', 'c', 'e', 's', 'i',
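The same substitution can also be sketched with `str.maketrans` and `str.translate`, which apply a one-to-one symbol mapping to a whole string at once. The mapping below contains only three pairs copied from the Poe's-time dictionary printed above; `partial_key` and `fragment` are illustrative names.

```python
# Three symbol-to-letter pairs copied from the Poe's-time mapping printed above
partial_key = {"8": "e", ";": "t", "4": "a"}
table = str.maketrans(partial_key)

fragment = ";48"                    # a three-symbol group that occurs in the cryptogram
translated = fragment.translate(table)
print(translated)                   # tae
```

In Legrand's solution the group `;48` stands for "the", so the purely frequency-based guess already gets one of these three letters wrong.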

The decryption based on the letter frequencies of Poe's time follows. The original cryptogram is printed first for comparison, then the frequency-based decryption, and finally the actual solution of the cryptogram.

In [84]:
decryption = "A good glass in the bishop’s hostel in the devil’s seat forty-one degrees and thirteen minutes north-east and by north main branch seventh limb east side shoot from the left eye of the death’s-head a bee-line from the tree trough the shot fifty feet out."
print(PoeCiphertext)
print(''.join(decryptionPoesTime))
print("The decryption proposed by Legrand is the following: {}".format(decryption))

53‡‡†305))6*;4826)4‡.)4‡);806*;48†8¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡1;48†85;4)485†528806*81(‡9;48;(88;4(‡?34;48)4‡;161;:188;‡?;
sfoodfusiihntaechiaogiaoiteuhntaedeyhuiiestlortwonedefreeisndtahrteenmhnpteinortaesitsndcwnortamshncrsnbaieyentauhmcesitihdeiaootlromtaeuelteweoltaedestaiaesdsceeuhnelromtaetreetaropfataeiaotlhltwleetopt
The decryption proposed by Legrand is the following: A good glass in the bishop’s hostel in the devil’s seat forty-one degrees and thirteen minutes north-east and by north main branch seventh limb east side shoot from the left eye of the death’s-head a bee-line from the tree trough the shot fifty feet out.


The decryption based on contemporary letter frequencies follows, printed in the same order: the original cryptogram, the frequency-based decryption, and the actual solution of the cryptogram.

In [85]:
print(PoeCiphertext)
print(''.join(decryptionContemporary))
print("The decryption proposed by Legrand is the following: {}".format(decryption))

53‡‡†305))6*;4826)4‡.)4‡);806*;48†8¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡1;48†85;4)485†528806*81(‡9;48;(88;4(‡?34;48)4‡;161;:188;‡?;
amrrcmgattonsiepotirvtirtsegonsiecebogtteasdrlshrnecemleetancsiolseenuonfsetnrlsieatsancphnrlsiuaonplanyitebensigoupeatstocetirrsdlrusiegedseherdsieceasitieacapeegonedlrusiesleesilrfmisietirsdodshdeesrfs
The decryption proposed by Legrand is the following: A good glass in the bishop’s hostel in the devil’s seat forty-one degrees and thirteen minutes north-east and by north main branch seventh limb east side shoot from the left eye of the death’s-head a bee-line from the tree trough the shot fifty feet out.
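One simple way to quantify how close a frequency-based guess comes to the real plaintext is to count the positions at which the two strings carry the same letter. The sketch below is not part of the thesis: `normalize` and `match_rate` are hypothetical helpers, and the example uses only the first six letters of the outputs above.

```python
def normalize(text):
    """Lowercase and keep ASCII letters only, so both strings align position by position."""
    return "".join(ch for ch in text.lower() if "a" <= ch <= "z")

def match_rate(guess, reference):
    """Fraction of positions where guess and reference carry the same letter."""
    if not reference or len(guess) != len(reference):
        raise ValueError("strings must be non-empty and of equal length")
    hits = sum(g == r for g, r in zip(guess, reference))
    return hits / len(reference)

guess = "sfoodf"                     # first six letters of the Poe's-time decryption
reference = normalize("A good g")    # first six letters of Legrand's solution
print(match_rate(guess, reference))  # 0.5
```

Applied to the full strings, the same function would give one number per corpus, which makes the two decryptions directly comparable.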


An analysis of these results is presented in the second section of the third chapter of the above-mentioned Bachelor thesis.

####*Encryption*    
The opposite process, i.e. encryption, can also be carried out computationally.  

By modern standards, the kind of encryption proposed by Edgar Allan Poe is obsolete. Libraries such as *cryptography* make it possible to create ciphertexts based on the AES algorithm, which is not unbreakable but offers a far higher level of security than Poe's substitution cipher.  
The code that follows has been retrieved from https://www.thepythoncode.com/article/encrypt-decrypt-files-symmetric-python and has been adapted to Legrand's cryptogram.

In [86]:
!pip install cryptography
from cryptography.fernet import Fernet



In [87]:
key = Fernet.generate_key()
with open("AES_key.txt", "wb") as key_file:
  key_file.write(key)

message = decryption.encode()
f = Fernet(key)
encrypted = f.encrypt(message)
print(encrypted)

b'gAAAAABjHzJXl_pTeI91Xpf7kENpgeSR_Jfb0plpTabGilxfqL2i8COr42e4RElmvho160F6CTPS49VbCACPjSBnKeuGaHmbeREv8T1QpA0K-vb_aZl9NRwcub-uOVJLlh8qThE7Foq7Mk-juxXW27pAsnf9Z8Bl_0eUmzapcykVRwM6WtjBN55PTjk6WPJcBYt7z9EgaZG6BsjIu0piUvQX0e9Va5r7Xv4WZNyv4w6H68gg4U3_lw1ieYjwn73nua0mhIvAWU7wycBzBrS1TT25k0xe58TeSweHn36rVJfB8a41lZbM0vG0v9IPPOB-8wiy4WTE-KZgHOOgrsu5Mqo3evHKihkYUYFImoTM-guWIMXz9jQjTh0p-wzZrAxhycXZxU3XVQ6hJoja62TWp6yEfAvht1gFni7upSf5iK7fHRHanbrL76g='
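Fernet is a symmetric scheme, so the same key that produced the token is needed to read it back. The following sketch shows a full round trip on a short message; it generates its own throwaway key rather than reusing the one saved in `AES_key.txt`, and the token differs on every run.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # fresh key for this example only
f = Fernet(key)

token = f.encrypt("A good glass".encode())
recovered = f.decrypt(token).decode()
print(recovered)                     # A good glass
```

Trying to decrypt with a different key raises `cryptography.fernet.InvalidToken`, which is why the key file has to be kept safe.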


Another way to encrypt the message with more modern technology is to generate a public and a private key with the *rsa* library. Only the public key is needed for encryption; the private key is what allows decryption.  
The code has been retrieved and adapted from https://stuvel.eu/python-rsa-doc/usage.html#generating-keys

In [88]:
import rsa
#"RSA can only encrypt messages that are smaller than the key. A couple of bytes are lost on random padding, and the rest is available for the message itself."
(pubkey, privkey) = rsa.newkeys(2162)
keys = (pubkey, privkey)

with open("rsa.text", 'w+') as fp:
  fp.write(pubkey.save_pkcs1().decode())
  fp.write(privkey.save_pkcs1().decode())

message = decryption.encode('utf8')
crypto = rsa.encrypt(message, pubkey)
print(crypto)

b"\x02\x13\xc7\xc2\xcf\x81{\xd8u\xff\x08\xac(\xe0Y\xea\x19\xde>e\xbd\xb3\x9c\xbe\x96X\x1c\x18p\r!\x9a\xe1\x84{Z\xb1z)\xb9E\x0e&\xd6Yi>k-&\xd7iQ\x8f\tW\x8fV$x\xb1\xab+\x88\xcc\xd0\xdeH\x8b\xfeO^\x18\x9b\xeeu{\xfd\xbb\xca\x15\x9f\xa7O\xd9\x8dF\xcb\xa4%\x03\xe3m\xe3\xfb\xabz\xa8\x92\xee\xf2\x7f\xa0\xc8?\xc2\x03\xe9\xb4K\x1e1dg\x07*\xfb\xb5VqM\xcc\xa9\xf2\xa8\xb4+AN\x99\xac\x01aD\x19\xd1X\xc9\xc4\xb5m\xa4\xfc(%\x07\x1f\xaf\x12;\xa1\x8fA\xf9\xc7\xaf\xc5\xfc?6\xcb\xcf<\xa6\xabU,J\x15\xbf\xbe\xfc\xd1M'\xa9\xda\xb5\x1a\x96\xbc\xa8\xe5\x05O\x8c_\x16\xc9\xee\x9b\x01i\xbd1z\xb97.m\xbcH\xa1Dx\xb4\xbe\xd0\xcd\xad\xbf.~X\xe3\xda\x08\xa7/\x8b\x89\xed!\xe8A\x17w\x99\x07\x82\xdbb\x13a\x06\xee3b\xf3\x95\x86|\x1b\x82\x98\xd4\xa2\x7fED\xe4C\xc1\x0e\x9e\xcb>\xd3T\x15W\xb6\xab\x95\x97\xb3`\x8f\x98\x9b\x8d"
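The unusual key size follows from the limit quoted in the code comment. With the PKCS#1 v1.5 padding that `rsa.encrypt` applies, 11 bytes of every block are reserved for padding, so a key of n bits can carry at most ceil(n/8) - 11 message bytes. The sketch below is plain arithmetic and does not call the *rsa* library; the helper name `max_message_bytes` is hypothetical.

```python
import math

def max_message_bytes(key_bits, padding_overhead=11):
    """Largest plaintext (in bytes) that fits in one RSA block with PKCS#1 v1.5 padding."""
    key_bytes = math.ceil(key_bits / 8)
    return key_bytes - padding_overhead

print(max_message_bytes(2162))   # 260
print(max_message_bytes(2048))   # 245
```

A common 2048-bit key therefore leaves room for only 245 bytes, while the 2162-bit key used above leaves room for 260. Comparing `len(message)` with this limit before calling `rsa.encrypt` shows whether a given message fits.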
