# Assignment

The file substitution.bin is a substitution cipher of an English-language document, using ASCII
character encoding. Your task is to figure out the substitution pattern used to encrypt the document.
Then, using the substitution, provide the encryption of your kerberos email in lowercase (including
the @mit.edu) as your answer. Write each byte in hexadecimal notation, e.g. 03, a2, 4e.

In [11]:
import numpy as np
import pandas as pd

import substitution as sub

In [45]:
cipher_bytes = sub.read_file("substitution.bin")
cipher = cipher_bytes  # alias: the cells below refer to the same bytes as `cipher`
print('message length in bytes (characters):')
print(len(cipher))

message length in bytes (characters):
10865


In [28]:
print(cipher[:100])

b'%MDXTXZmff:*M*[AMCmfOZZHK:TUf:(f:*[f%MX*(fHCgX*If(9ZfTMCgmZqf[MCf:gZfxZDTMAZf(MfmZ*HfCmf:*M*[AMCmfOZ'


In [27]:
sub_test = sub.try_substitution({58: 'a'}, cipher_bytes)

In [29]:
print(sub_test[:100])

 37 77 68 88 84 88 90 109 102 102 a 42 77 42 91 65 77 67 109 102 79 90 90 72 75 a 84 85 102 a 40 102
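
The `substitution` helper module is not shown in this notebook. Below is a minimal sketch of what its two functions might look like, inferred only from the outputs printed here. The function bodies are assumptions, not the module's actual code, and `sample` is a made-up stand-in for the file contents.

```python
def read_file(path):
    """Read the whole file as raw bytes (assumed behaviour of sub.read_file)."""
    with open(path, "rb") as f:
        return f.read()


def try_substitution(mapping, data):
    """Decode known bytes via mapping; show unknown bytes as decimal codes.

    Every token is preceded by a single space, which matches the spacing
    of the outputs printed in this notebook.
    """
    return "".join(" " + mapping.get(b, str(b)) for b in data)


# Hypothetical sample: a handful of ciphertext byte values.
sample = bytes([37, 77, 68, 58, 42])
print(try_substitution({58: "a"}, sample))
```

Because every token carries its own leading space, a decoded space byte shows up as two spaces, which is why word gaps look wide in the later outputs.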


Get the cipher as a sequence of ints representing the bytes.

In [46]:
# Get the cipher as a list of ASCII decimals

chars_list = [b for b in cipher]
chars_s = pd.Series(chars_list)
assert(len(chars_list) == len(cipher))
print(chars_list[:10])

[37, 77, 68, 88, 84, 88, 90, 109, 102, 102]


How many unique characters are in the text?

- There are 95 printable ASCII characters in total: https://en.wikipedia.org/wiki/ASCII#Printable_characters
- codes 32 - 126
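
The printable-range check can also be done without pandas. This is a minimal sketch, assuming the ciphertext is available as a `bytes` object; `non_printable` and `sample` are illustrative names, and `sample` holds only the first ten byte values rather than the real file.

```python
# Printable ASCII occupies codes 32 (space) through 126 (~): 95 codes in total.
PRINTABLE = set(range(32, 127))


def non_printable(data: bytes) -> set:
    """Return the byte values in `data` that fall outside printable ASCII."""
    return set(data) - PRINTABLE


# Hypothetical sample: the first ten ciphertext bytes.
sample = bytes([37, 77, 68, 88, 84, 88, 90, 109, 102, 102])
print(len(set(sample)), "unique bytes; non-printable:", non_printable(sample))
```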

In [35]:
printable_chars = list(range(32,127))

In [42]:
print('How many unique characters?')
print(chars_s.nunique())
print('Are they all in the list of printable characters?')
print(pd.Series(chars_s.unique()).isin(printable_chars).sum() == chars_s.nunique())
print('yes')

How many unique characters?
51
Are they all in the list of printable characters?
True
yes


## Basic frequency analysis

Get the distribution of characters, etc. Match it against some known distributions from English language text.
- char counts
- bigram counts
- trigram counts
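
All three counts follow the same sliding-window pattern, so one helper can cover them. This is a minimal sketch using `collections.Counter`; `ngram_counts` is a hypothetical helper name and `sample` is a short stand-in for the full list of byte values.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count every window of n consecutive items in seq (windows overlap)."""
    windows = zip(*(seq[i:] for i in range(n)))
    return Counter(windows)


# Hypothetical sample: the first ten ciphertext byte values.
sample = [37, 77, 68, 88, 84, 88, 90, 109, 102, 102]
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
print(bigrams.most_common(3))
```

Keeping each n-gram as a tuple of ints avoids any ambiguity that comes from joining the codes into a single string.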

In [94]:
char_counts_df = (
    chars_s.value_counts().rename('count').to_frame().reset_index()
    .rename(columns={'index':'c'})
    .assign(freq=lambda x: x['count']/len(chars_list)).round(3)
    .sort_values(by='count',ascending=False)
)
print('distribution of character counts:')
print(char_counts_df.describe())
char_counts_df.head(5)

distribution of character counts:
                c        count       freq
count   51.000000    51.000000  51.000000
mean    81.764706   213.039216   0.019608
std     24.694605   346.754897   0.031858
min     32.000000     1.000000   0.000000
25%     64.500000     6.500000   0.001000
50%     81.000000    38.000000   0.003000
75%    102.500000   263.500000   0.024000
max    124.000000  1863.000000   0.171000


Unnamed: 0,c,count,freq
0,102,1863,0.171
1,90,1038,0.096
2,40,786,0.072
3,77,763,0.07
4,109,650,0.06


Code 102 might be the space character: it is by far the most frequent (17.1% of all characters). I expect the space character to be spread evenly through the text and, in ordinary prose, rarely to appear several times in a row.

Does 102 ever occur twice in a row? Collecting the bigrams, which we need anyway, will answer this.

In [83]:
# Get the bigrams
bigrams_list = []
for i in range(1, len(chars_list)):
    bigrams_list += [[chars_list[i-1], chars_list[i]]]
# Get the trigrams
trigrams_list = []
for i in range(2, len(chars_list)):
    trigrams_list += [[chars_list[i-2], chars_list[i-1], chars_list[i]]]

In [85]:
print('chars_list[:6]',chars_list[:6])
print('bigrams_list[:5]:', (bigrams_list[:5]))
print('trigrams_list[:4]:', (trigrams_list[:4]))

chars_list[:6] [37, 77, 68, 88, 84, 88]
bigrams_list[:5]: [[37, 77], [77, 68], [68, 88], [88, 84], [84, 88]]
trigrams_list[:4]: [[37, 77, 68], [77, 68, 88], [68, 88, 84], [88, 84, 88]]


In [96]:
bigram_counts_df = (
    pd.Series(bigrams_list).apply(lambda b: ','.join([str(c) for c in b]))
    .value_counts().rename('count').to_frame().reset_index()
    .rename(columns={'index':'c'})
    .assign(freq=lambda x: x['count']/len(bigrams_list)).round(3)
    .sort_values(by='count',ascending=False)
)
print('distribution of bigram counts:')
print(bigram_counts_df.describe())

distribution of bigram counts:
            count        freq
count  543.000000  543.000000
mean    20.007366    0.001766
std     35.392162    0.003304
min      1.000000    0.000000
25%      1.000000    0.000000
50%      5.000000    0.000000
75%     22.000000    0.002000
max    324.000000    0.030000


In [99]:
trigram_counts_df = (
    pd.Series(trigrams_list).apply(lambda b: ','.join([str(c) for c in b]))
    .value_counts().rename('count').to_frame().reset_index()
    .rename(columns={'index':'c'})
    .assign(freq=lambda x: x['count']/len(trigrams_list)).round(3)
    .sort_values(by='count',ascending=False)
)
print('distribution of trigram counts:')
print(trigram_counts_df.describe())

distribution of trigram counts:
             count         freq
count  2133.000000  2133.000000
mean      5.092827     0.000359
std       9.395461     0.000923
min       1.000000     0.000000
25%       1.000000     0.000000
50%       2.000000     0.000000
75%       5.000000     0.000000
max     158.000000     0.015000


In [95]:
print('char counts')
char_counts_df.head(10)

char counts


Unnamed: 0,c,count,freq
0,102,1863,0.171
1,90,1038,0.096
2,40,786,0.072
3,77,763,0.07
4,109,650,0.06
5,58,620,0.057
6,103,589,0.054
7,88,568,0.052
8,42,530,0.049
9,68,386,0.036


In [98]:
print('bigram counts')
bigram_counts_df.head(10)

bigram counts


Unnamed: 0,c,count,freq
0,"90,102",324,0.03
1,"102,40",235,0.022
2,"40,57",201,0.019
3,"77,67",195,0.018
4,"102,58",191,0.018
5,"109,102",179,0.016
6,"40,102",160,0.015
7,"57,90",148,0.014
8,"102,109",137,0.013
9,"88,42",130,0.012


In [100]:
print('trigram counts')
trigram_counts_df.head(10)

trigram counts


Unnamed: 0,c,count,freq
0,"102,40,57",158,0.015
1,"40,57,90",120,0.011
2,"91,77,67",102,0.009
3,"102,91,77",101,0.009
4,"57,90,102",99,0.009
5,"77,67,103",80,0.007
6,"102,84,77",76,0.007
7,"102,58,42",64,0.006
8,"77,67,102",62,0.006
9,"102,102,102",60,0.006


In [92]:
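# Is 102 a space? Is there a bigram '102,102'? Yes.
# bigram_counts: counts indexed by the 'a,b' bigram string
bigram_counts = bigram_counts_df.set_index('c')['count']
print(bigram_counts.loc['90,102'])
print(bigram_counts.loc['90,90'])
'90,102' in bigram_counts.index
bigram_counts.loc['102,102']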

324
27


108

In [106]:
# Are there any characters that appear at least 3 times in a row?
# i.e. trigrams of all the same character?
s_trigrams = []
for (a,b,c) in trigrams_list:
    if (a==b) and (b==c):
        s_trigrams += [a]
print('%s times characters appear 3 times in a row. Unique characters:' % len(s_trigrams))
pd.Series(s_trigrams).value_counts()

60 times characters appear 3 times in a row. Unique characters:


102    60
dtype: int64

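Something strange is going on with char 102: it is the only character that ever appears three times in a row, and it does so 60 times. No letter would repeat like that, so this only seems to validate it as a space.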
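
Counting identical trigrams uses overlapping windows, so a run of four equal characters is counted twice. For a clearer picture of the runs themselves, here is a minimal sketch using `itertools.groupby`; `run_lengths` is a hypothetical helper and `sample` is made-up data, not the ciphertext.

```python
from collections import Counter
from itertools import groupby


def run_lengths(seq, value):
    """Histogram of the lengths of maximal runs of `value` in seq."""
    return Counter(len(list(group)) for key, group in groupby(seq) if key == value)


# Made-up sequence in which 102 appears in runs of length 1, 2 and 4.
sample = [90, 102, 40, 102, 102, 57, 102, 102, 102, 102, 90]
print(run_lengths(sample, 102))
```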

In [112]:
sub_test = sub.try_substitution({102: ' '}, cipher_bytes)
# sub_test

The most common trigram in English text is 'the'. Assuming 102 is ' ', the most common trigram here, '102,40,57', starts with a space, so set it aside and consider the second most common trigram: '40,57,90'.

This makes sense as 'the': 40 is the second most common character once 102 is excluded, which matches what is known about 't'; 57 as 'h' is not among the 10 most frequent characters; and 90 as 'e' is the most common character after 102 as ' '.

So let's try assuming 102=' ' and 40,57,90='the'.
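
Each guess like this extends the mapping by hand. A small helper can make sure a new guess does not contradict earlier ones. This is a minimal sketch; `extend_mapping` is a hypothetical function, not part of the `substitution` module.

```python
def extend_mapping(mapping, cipher_codes, plaintext):
    """Return a copy of mapping extended so cipher_codes decodes to plaintext.

    Raises ValueError if the guess contradicts an existing assignment or
    would map two different codes to the same plaintext character.
    """
    if len(cipher_codes) != len(plaintext):
        raise ValueError("length mismatch")
    new = dict(mapping)
    for code, ch in zip(cipher_codes, plaintext):
        if new.get(code, ch) != ch:
            raise ValueError(f"code {code} is already {new[code]!r}, not {ch!r}")
        if any(c != code and p == ch for c, p in new.items()):
            raise ValueError(f"{ch!r} is already assigned to another code")
        new[code] = ch
    return new


guess = extend_mapping({102: ' '}, [40, 57, 90], 'the')
print(guess)
```

Returning a copy rather than mutating the input keeps a failed guess from leaving the mapping half-updated.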

In [169]:
substitution = {
    102:' ',40: 't', 57:'h',90:'e'
}
sub_test = sub.try_substitution(substitution, cipher_bytes)
sub_test[:1000]

' 37 77 68 88 84 88 e 109     58 42 77 42 91 65 77 67 109   79 e e 72 75 58 84 85   58 t   58 42 91   37 77 88 42 t   72 67 103 88 42 73   t h e   84 77 67 103 109 e 113   91 77 67   58 103 e   120 e 68 84 77 65 e   t 77   109 e 42 72   67 109   58 42 77 42 91 65 77 67 109   79 e e 72 75 58 84 85   108 88 58   t h 88 109   79 77 103 65 92   91 77 67   84 58 42   67 109 e   t h e   79 77 103 65   58 t   58 42 91   t 88 65 e 113   79 77 103   58 42 91   103 e 58 109 77 42   81 81   t 77   103 e 37 77 103 t   37 58 103 t   77 79   t h e   84 77 67 103 109 e   91 77 67   68 88 85 e 72   77 103   72 88 72 42 107 t   68 88 85 e 113   77 103   t 77   109 67 73 73 e 109 t   120 58 91 109   t h 58 t   120 e   84 77 67 68 72   88 65 37 103 77 108 e   77 67 103   t e 58 84 h 88 42 73 92   63 67 88 98     120 e   120 88 68 68   h 58 108 e   77 42 e   88 42 81 84 68 58 109 109   63 67 88 98   77 42   t h e   72 58 t e   68 88 109 t e 72   77 42   t h e   84 77 67 103 109 e   84 58 68 e 42 72 58 103

Find 'a'. 'a' should have high frequency and 'aa' should not be a bigram.

In [149]:
# Test the most frequent chars 'c' to see if there are no 'cc'
for i, r in char_counts_df.sort_values(by='count', ascending=False)[['c','freq']].head(8).iterrows():
    c = int(r['c'])
    cc = '%s,%s'%(c,c)
    print('c=%s; freq=%s; cc=%s; freq[%s]=%s' % (
        c, r['freq'], cc, cc, 
        bigram_counts_df.set_index('c').loc[cc]['freq'] if cc in bigram_counts_df['c'].values else 0
    ))    

c=102; freq=0.171; cc=102,102; freq[102,102]=0.01
c=90; freq=0.096; cc=90,90; freq[90,90]=0.002
c=40; freq=0.072; cc=40,40; freq[40,40]=0.001
c=77; freq=0.07; cc=77,77; freq[77,77]=0.0
c=109; freq=0.06; cc=109,109; freq[109,109]=0.005
c=58; freq=0.057; cc=58,58; freq[58,58]=0
c=103; freq=0.054; cc=103,103; freq[103,103]=0.0
c=88; freq=0.052; cc=88,88; freq[88,88]=0


It seems 'a' is likely 77 or 58.

The only one-letter words are 'a' and 'I'. Check occurrences of ' a ' by getting trigrams starting and ending in 102=' '.
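
Once a space candidate is chosen, another way to find short words is to split the byte sequence on it and count whole words. This is a minimal sketch under the assumption that 102 is the space; `split_words` and `words_of_length` are hypothetical helpers and `sample` is made-up data.

```python
from collections import Counter


def split_words(codes, space_code=102):
    """Split a list of byte codes into words (tuples) on the assumed space code."""
    words, current = [], []
    for code in codes:
        if code == space_code:
            if current:
                words.append(tuple(current))
            current = []
        else:
            current.append(code)
    if current:
        words.append(tuple(current))
    return words


def words_of_length(codes, n, space_code=102):
    """Count the distinct words made of exactly n codes."""
    return Counter(w for w in split_words(codes, space_code) if len(w) == n)


# Made-up sample standing in for the full list of byte values.
sample = [58, 102, 40, 77, 102, 102, 58, 102, 40, 57, 90]
print(words_of_length(sample, 1))
```

One caveat: punctuation is not separated, so a word followed by a comma or full stop is counted as a longer word.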

In [166]:
print('all trigrams starting and ending with 102:')
trigram_counts_df[trigram_counts_df['c'].apply(lambda c: (c[:3]=='102') and (c[-3:] == '102'))]

all trigrams starting and ending with 102:


Unnamed: 0,c,count,freq
9,"102,102,102",60,0.006
48,"102,58,102",29,0.003
1909,"102,105,102",1,0.0
1904,"102,60,102",1,0.0


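It seems 'a' is 58: it occurs 29 times as a one-character word, while the only other candidates (105 and 60) occur once each. It also seems this document does not contain the standalone word 'I'.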

Hehe they gave us a=58 in the example as an extra clue. Thanks.

Perhaps 105 or 60 are capital 'A'.

In [171]:
substitution = {
    102:' ',40: 't', 57:'h',90:'e', 58:'a'
}
sub.try_substitution(substitution, cipher_bytes)[:1000]

' 37 77 68 88 84 88 e 109     a 42 77 42 91 65 77 67 109   79 e e 72 75 a 84 85   a t   a 42 91   37 77 88 42 t   72 67 103 88 42 73   t h e   84 77 67 103 109 e 113   91 77 67   a 103 e   120 e 68 84 77 65 e   t 77   109 e 42 72   67 109   a 42 77 42 91 65 77 67 109   79 e e 72 75 a 84 85   108 88 a   t h 88 109   79 77 103 65 92   91 77 67   84 a 42   67 109 e   t h e   79 77 103 65   a t   a 42 91   t 88 65 e 113   79 77 103   a 42 91   103 e a 109 77 42   81 81   t 77   103 e 37 77 103 t   37 a 103 t   77 79   t h e   84 77 67 103 109 e   91 77 67   68 88 85 e 72   77 103   72 88 72 42 107 t   68 88 85 e 113   77 103   t 77   109 67 73 73 e 109 t   120 a 91 109   t h a t   120 e   84 77 67 68 72   88 65 37 103 77 108 e   77 67 103   t e a 84 h 88 42 73 92   63 67 88 98     120 e   120 88 68 68   h a 108 e   77 42 e   88 42 81 84 68 a 109 109   63 67 88 98   77 42   t h e   72 a t e   68 88 109 t e 72   77 42   t h e   84 77 67 103 109 e   84 a 68 e 42 72 a 103 92   t h e   63 67 88

I found some hints online: The order of most frequent letter combinations.
https://www3.nd.edu/~busiforc/handouts/cryptography/cryptography%20hints.html

- Order Of Frequency Of Most Common Doubles: ss ee tt ff ll mm oo
- Order Of Frequency Of Final Letters: E S T D N R Y F L O G H A K M P U W
- Most Frequent Two-Letter Words: of, to, in, it, is, be, as, at, so, we, he, by, or, on, do, if, me, my, up, an, go, no, us, am

Given this, find 's'.
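
Both hints (common doubles and common final letters) can be collected in a single pass over adjacent pairs. This is a minimal sketch assuming 102 is the space; `doubles_and_finals` is a hypothetical helper and `sample` is made-up data rather than the ciphertext.

```python
from collections import Counter


def doubles_and_finals(codes, space_code=102):
    """Count doubled codes and codes that sit immediately before a space."""
    doubles, finals = Counter(), Counter()
    for prev, cur in zip(codes, codes[1:]):
        if prev == cur and cur != space_code:
            doubles[cur] += 1
        if cur == space_code and prev != space_code:
            finals[prev] += 1
    return doubles, finals


# Made-up sample: 109 is both doubled and a frequent final code.
sample = [109, 109, 102, 90, 102, 109, 102, 102, 68, 68, 109, 102]
doubles, finals = doubles_and_finals(sample)

# Codes that appear in both lists, most common final first.
candidates = [code for code, _ in finals.most_common() if code in doubles]
print(candidates)
```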

In [190]:
# reminder of what we have so far:
substitution

{102: ' ', 40: 't', 57: 'h', 90: 'e', 58: 'a'}

In [214]:
# get the common doubles
print('most common doubles:')
def is_double(c):
    [a,b] = c.split(',')
    return (a==b)
bigram_counts_df[bigram_counts_df['c'].apply(is_double)].head(10)

most common doubles:


Unnamed: 0,c,count,freq
17,"102,102",108,0.01
66,"109,109",50,0.005
70,"68,68",48,0.004
122,"90,90",27,0.002
137,"79,79",22,0.002
160,"37,37",15,0.001
221,"65,65",8,0.001
216,"40,40",8,0.001
257,"42,42",6,0.001
281,"81,81",5,0.0


In [196]:
# get the most common final characters, i.e. characters before 102=' '
print('most common characters before 102=" ":')
bigram_counts_df[bigram_counts_df['c'].apply(lambda c: c[-3:]=='102')].head()

most common characters before 102=" ":


Unnamed: 0,c,count,freq
0,"90,102",324,0.03
5,"109,102",179,0.016
6,"40,102",160,0.015
12,"103,102",124,0.011
15,"42,102",113,0.01


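's' clearly seems to be 109: '109,109' is the most common double after the space, and 109 is the second most common final character after 'e'.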

In [198]:
substitution = {
    102:' ',40: 't', 57:'h',90:'e', 58:'a', 109:'s'
}
sub.try_substitution(substitution, cipher_bytes)[:500]

' 37 77 68 88 84 88 e s     a 42 77 42 91 65 77 67 s   79 e e 72 75 a 84 85   a t   a 42 91   37 77 88 42 t   72 67 103 88 42 73   t h e   84 77 67 103 s e 113   91 77 67   a 103 e   120 e 68 84 77 65 e   t 77   s e 42 72   67 s   a 42 77 42 91 65 77 67 s   79 e e 72 75 a 84 85   108 88 a   t h 88 s   79 77 103 65 92   91 77 67   84 a 42   67 s e   t h e   79 77 103 65   a t   a 42 91   t 88 65 e 113   79 77 103   a 42 91   103 e a s 77 42   81 81   t 77   103 e 37 77 103 t   37 a 103 t   77 79  '

Look at most common 2-letter words.

Use this in combination with character frequencies and known characters to find 'o'.

Do this by getting 4-grams.

In [206]:
# Get the 4grams
fourgrams_list = []
for i in range(3, len(chars_list)):
    fourgrams_list += [[chars_list[i-3], chars_list[i-2], chars_list[i-1], chars_list[i]]]

fourgrams_df = (
    pd.Series(fourgrams_list).apply(lambda b: ','.join([str(c) for c in b]))
    .value_counts().rename('count').to_frame().reset_index()
    .rename(columns={'index':'c'})
    .assign(freq=lambda x: x['count']/len(fourgrams_list)).round(3)
    .sort_values(by='count',ascending=False)
)
fourgrams_df.head()

Unnamed: 0,c,count,freq
0,"102,40,57,90",110,0.01
1,"102,91,77,67",101,0.009
2,"40,57,90,102",99,0.009
3,"91,77,67,102",62,0.006
4,"88,42,73,102",49,0.005


In [216]:
# Get the 4grams that start and end with 102=' '
print('most common 2-letter words:')
fourgrams_df[fourgrams_df['c'].apply(lambda c: c[:3]=='102' and (c[-3:]=='102'))].head(10)

most common 2-letter words:


Unnamed: 0,c,count,freq
7,"102,40,77,102",44,0.004
10,"102,102,102,102",39,0.004
14,"102,77,79,102",32,0.003
25,"102,77,42,102",24,0.002
27,"102,88,42,102",23,0.002
26,"102,120,90,102",23,0.002
33,"102,77,103,102",21,0.002
38,"102,88,109,102",21,0.002
67,"102,88,79,102",16,0.001
119,"102,58,109,102",12,0.001


77 seems likely 'o' because
- 77 commonly comes after 't'=40 --> 'to'
- 77 commonly comes first in 2-letter words
- 77 is among the most frequent characters

Find 'n' as a letter commonly in 2-letter words following 'o' or 'a'

In [230]:
# I need a helper method to look at this with substitutions
def apply_subs(c):
    cs = c.split(',')
    ms = [substitution[int(ci)] if int(ci) in substitution else ci for ci in cs]
    return ','.join(ms)

df = fourgrams_df[fourgrams_df['c'].apply(lambda c: c[:3]=='102' and (c[-3:]=='102'))].head(15).copy()
df['c'] = df['c'].apply(apply_subs)
df

Unnamed: 0,c,count,freq
7,",t,o,",44,0.004
10,", , ,",39,0.004
14,",o,79,",32,0.003
25,",o,42,",24,0.002
27,",88,42,",23,0.002
26,",120,e,",23,0.002
33,",o,103,",21,0.002
38,",88,s,",21,0.002
67,",88,79,",16,0.001
119,",a,s,",12,0.001


'n' seems likely 42, giving 'on'. Then 'i' seems likely 88, giving 'in' and 'is', in which case 'f' seems likely 79, giving 'of' and 'if'.

In [232]:
substitution = {
    102:' ',40: 't', 57:'h',90:'e', 58:'a', 109:'s', 77:'o', 42: 'n', 88: 'i', 79: 'f'
}
df = fourgrams_df[fourgrams_df['c'].apply(lambda c: c[:3]=='102' and (c[-3:]=='102'))].head(15).copy()
df['c'] = df['c'].apply(apply_subs)
df

Unnamed: 0,c,count,freq
7,",t,o,",44,0.004
10,", , ,",39,0.004
14,",o,f,",32,0.003
25,",o,n,",24,0.002
27,",i,n,",23,0.002
26,",120,e,",23,0.002
33,",o,103,",21,0.002
38,",i,s,",21,0.002
67,",i,f,",16,0.001
119,",a,s,",12,0.001


Find 'd' as the letter that completes common word 'and'

In [233]:
# Get the 5grams
fivegrams_list = []
for i in range(4, len(chars_list)):
    fivegrams_list += [[chars_list[i-4], chars_list[i-3], chars_list[i-2], chars_list[i-1], chars_list[i]]]

fivegrams_df = (
    pd.Series(fivegrams_list).apply(lambda b: ','.join([str(c) for c in b]))
    .value_counts().rename('count').to_frame().reset_index()
    .rename(columns={'index':'c'})
    .assign(freq=lambda x: x['count']/len(fivegrams_list)).round(3)
    .sort_values(by='count',ascending=False)
)
fivegrams_df.head()

Unnamed: 0,c,count,freq
0,"102,40,57,90,102",99,0.009
1,"102,91,77,67,102",62,0.006
2,"102,91,77,67,103",39,0.004
3,"102,58,42,72,102",39,0.004
4,"91,77,67,103,102",39,0.004


In [235]:
df = fivegrams_df[fivegrams_df['c'].apply(lambda c: c[:3]=='102' and (c[-3:]=='102'))].head(10).copy()
df['c'] = df['c'].apply(apply_subs)
df

Unnamed: 0,c,count,freq
0,",t,h,e,",99,0.009
1,",91,o,67,",62,0.006
3,",a,n,72,",39,0.004
5,",f,o,103,",30,0.003
14,", , , ,",21,0.002
25,",a,103,e,",18,0.002
79,",n,o,t,",12,0.001
95,",65,a,91,",11,0.001
88,",84,a,n,",11,0.001
118,", ,120,e,",9,0.001


'd' seems to be 72 ('and'); 'r' seems to be 103 ('for', 'are').

'y' is likely 91 and 'u' is likely 67, which makes the second most common three-letter word 'you'.

In [238]:
substitution = {
    102:' ',40: 't', 57:'h',90:'e', 58:'a', 109:'s', 77:'o', 42: 'n', 88: 'i', 79: 'f',
    103: 'r', 72: 'd', 91:'y', 67:'u'
}
df = fivegrams_df[fivegrams_df['c'].apply(lambda c: c[:3]=='102' and (c[-3:]=='102'))].head(15).copy()
df['c'] = df['c'].apply(apply_subs)
df

Unnamed: 0,c,count,freq
0,",t,h,e,",99,0.009
1,",y,o,u,",62,0.006
3,",a,n,d,",39,0.004
5,",f,o,r,",30,0.003
14,", , , ,",21,0.002
25,",a,r,e,",18,0.002
79,",n,o,t,",12,0.001
95,",65,a,y,",11,0.001
88,",84,a,n,",11,0.001
118,", ,120,e,",9,0.001


Find 'l'. Revisit the common characters and common doubles: 'l' should be near the top of both.

In [239]:
df = bigram_counts_df[bigram_counts_df['c'].apply(is_double)].head(10).copy()
df['c'] = df['c'].apply(apply_subs)
df

Unnamed: 0,c,count,freq
17,",",108,0.01
66,"s,s",50,0.005
70,"68,68",48,0.004
122,"e,e",27,0.002
137,"f,f",22,0.002
160,"37,37",15,0.001
221,"65,65",8,0.001
216,"t,t",8,0.001
257,"n,n",6,0.001
281,"81,81",5,0.0


'l' seems likely 68 ('68,68' is the most common double that is still unassigned), but not definitely.

In [245]:
sub.try_substitution(substitution, cipher_bytes)[:1000]

' 37 o 68 i 84 i e s     a n o n y 65 o u s   f e e d 75 a 84 85   a t   a n y   37 o i n t   d u r i n 73   t h e   84 o u r s e 113   y o u   a r e   120 e 68 84 o 65 e   t o   s e n d   u s   a n o n y 65 o u s   f e e d 75 a 84 85   108 i a   t h i s   f o r 65 92   y o u   84 a n   u s e   t h e   f o r 65   a t   a n y   t i 65 e 113   f o r   a n y   r e a s o n   81 81   t o   r e 37 o r t   37 a r t   o f   t h e   84 o u r s e   y o u   68 i 85 e d   o r   d i d n 107 t   68 i 85 e 113   o r   t o   s u 73 73 e s t   120 a y s   t h a t   120 e   84 o u 68 d   i 65 37 r o 108 e   o u r   t e a 84 h i n 73 92   63 u i 98     120 e   120 i 68 68   h a 108 e   o n e   i n 81 84 68 a s s   63 u i 98   o n   t h e   d a t e   68 i s t e d   o n   t h e   84 o u r s e   84 a 68 e n d a r 92   t h e   63 u i 98   120 i 68 68   t e s t   y o u r   85 n o 120 68 e d 73 e   o f   65 a t e r i a 68   f r o 65   68 e 84 t u r e s 113   37 r o 75 68 e 65   s e t s 113   a n d   r e a d i 

Reading the partially decoded text suggests more letters:

- p: 37
- m: 65
- b: 75
- c: 84
- k: 85
- v: 108
- w: 120
- l: 68 (confirmed)
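
Guesses like these come from reading partly decoded words. The same idea can be sketched as a pattern match against a word list: decoded letters must agree, and each remaining numeric code must stand for one consistent, not yet used letter. `matches` and the tiny `WORDS` list are hypothetical, for illustration only.

```python
def matches(tokens, word):
    """True if word fits tokens: letters must agree, numeric codes are unknowns.

    The same unknown code must always stand for the same letter, two different
    codes may not share a letter, and an unknown may not stand for a letter
    that is already decoded elsewhere in the pattern.
    """
    if len(tokens) != len(word):
        return False
    known = {t for t in tokens if not t.isdigit()}
    assigned = {}
    for tok, ch in zip(tokens, word):
        if not tok.isdigit():
            if tok != ch:
                return False
        elif ch in known or assigned.setdefault(tok, ch) != ch:
            return False
    return len(set(assigned.values())) == len(assigned)


# Tiny illustrative word list; a real run would use a full dictionary.
WORDS = ["could", "would", "policies", "course", "feedback", "welcome"]
print([w for w in WORDS if matches("84 o u 68 d".split(), w)])
```

A single pattern can fit more than one word (here both 'could' and 'would' fit '84 o u 68 d'), so each candidate still has to be checked against other words that use the same codes.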

In [250]:
substitution = {
    102:' ',40: 't', 57:'h',90:'e', 58:'a', 109:'s', 77:'o', 42: 'n', 88: 'i', 79: 'f',
    103: 'r', 72: 'd', 91:'y', 67:'u', 68: 'l', 37: 'p', 65:'m',75:'b',84:'c',85:'k',108:'v',
    120: 'w', 
}
sub.try_substitution(substitution, cipher_bytes)[:1000]

' p o l i c i e s     a n o n y m o u s   f e e d b a c k   a t   a n y   p o i n t   d u r i n 73   t h e   c o u r s e 113   y o u   a r e   w e l c o m e   t o   s e n d   u s   a n o n y m o u s   f e e d b a c k   v i a   t h i s   f o r m 92   y o u   c a n   u s e   t h e   f o r m   a t   a n y   t i m e 113   f o r   a n y   r e a s o n   81 81   t o   r e p o r t   p a r t   o f   t h e   c o u r s e   y o u   l i k e d   o r   d i d n 107 t   l i k e 113   o r   t o   s u 73 73 e s t   w a y s   t h a t   w e   c o u l d   i m p r o v e   o u r   t e a c h i n 73 92   63 u i 98     w e   w i l l   h a v e   o n e   i n 81 c l a s s   63 u i 98   o n   t h e   d a t e   l i s t e d   o n   t h e   c o u r s e   c a l e n d a r 92   t h e   63 u i 98   w i l l   t e s t   y o u r   k n o w l e d 73 e   o f   m a t e r i a l   f r o m   l e c t u r e s 113   p r o b l e m   s e t s 113   a n d   r e a d i n 73 s 92   t h e r e   i s   n o   f i n a l   e 112 a m 92   n o   c o 

The rest can be filled in by reading the text from here.

In [269]:
substitution[112]='x'
substitution[73]='g'
sub.try_substitution(substitution, cipher_bytes)

' p o l i c i e s     a n o n y m o u s   f e e d b a c k   a t   a n y   p o i n t   d u r i n g   t h e   c o u r s e ,   y o u   a r e   w e l c o m e   t o   s e n d   u s   a n o n y m o u s   f e e d b a c k   v i a   t h i s   f o r m .   y o u   c a n   u s e   t h e   f o r m   a t   a n y   t i m e ,   f o r   a n y   r e a s o n   - -   t o   r e p o r t   p a r t   o f   t h e   c o u r s e   y o u   l i k e d   o r   d i d n \' t   l i k e ,   o r   t o   s u g g e s t   w a y s   t h a t   w e   c o u l d   i m p r o v e   o u r   t e a c h i n g .   q u i z     w e   w i l l   h a v e   o n e   i n - c l a s s   q u i z   o n   t h e   d a t e   l i s t e d   o n   t h e   c o u r s e   c a l e n d a r .   t h e   q u i z   w i l l   t e s t   y o u r   k n o w l e d g e   o f   m a t e r i a l   f r o m   l e c t u r e s ,   p r o b l e m   s e t s ,   a n d   r e a d i n g s .   t h e r e   i s   n o   f i n a l   e x a m .   n o   c o l l a b o r a t i o n   i s   p e

In [270]:
substitution[107]="'"
substitution[63]='q'
substitution[98]='z'
substitution[92]='.'
substitution[113]=','
substitution[32]='j'
substitution[81]='-'
substitution[64]='"'
substitution[118]='/'
substitution[124]='@'
sub.try_substitution(substitution, cipher_bytes)

' p o l i c i e s     a n o n y m o u s   f e e d b a c k   a t   a n y   p o i n t   d u r i n g   t h e   c o u r s e ,   y o u   a r e   w e l c o m e   t o   s e n d   u s   a n o n y m o u s   f e e d b a c k   v i a   t h i s   f o r m .   y o u   c a n   u s e   t h e   f o r m   a t   a n y   t i m e ,   f o r   a n y   r e a s o n   - -   t o   r e p o r t   p a r t   o f   t h e   c o u r s e   y o u   l i k e d   o r   d i d n \' t   l i k e ,   o r   t o   s u g g e s t   w a y s   t h a t   w e   c o u l d   i m p r o v e   o u r   t e a c h i n g .   q u i z     w e   w i l l   h a v e   o n e   i n - c l a s s   q u i z   o n   t h e   d a t e   l i s t e d   o n   t h e   c o u r s e   c a l e n d a r .   t h e   q u i z   w i l l   t e s t   y o u r   k n o w l e d g e   o f   m a t e r i a l   f r o m   l e c t u r e s ,   p r o b l e m   s e t s ,   a n d   r e a d i n g s .   t h e r e   i s   n o   f i n a l   e x a m .   n o   c o l l a b o r a t i o n   i s   p e

At this point I just want to finish the assignment. I have enough characters to encode my email.

In [281]:
reverse_substitution = {a:b for (b,a) in substitution.items()}
def encrypt(m):
    return [reverse_substitution[char] if char in reverse_substitution else char for char in m]
my_c = encrypt('aberke@mit.edu')
print('ascii characters encrypted')
my_c

ascii characters encrypted


[58, 75, 90, 103, 85, 90, 124, 65, 88, 40, 92, 90, 72, 67]
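
The `encrypt` helper above passes any unmapped character through unchanged, which would later break the hex formatting. This is a stricter sketch that fails loudly instead; `invert_mapping` and `encrypt_strict` are hypothetical names, and `partial` copies only the table entries the email needs.

```python
def invert_mapping(mapping):
    """Invert a code->char mapping, refusing mappings that are not one-to-one."""
    inverse = {}
    for code, ch in mapping.items():
        if ch in inverse:
            raise ValueError(f"{ch!r} is mapped from both {inverse[ch]} and {code}")
        inverse[ch] = code
    return inverse


def encrypt_strict(message, inverse):
    """Encode message, raising KeyError if any character has no known code."""
    return [inverse[ch] for ch in message]


# Subset of the recovered substitution table (only the entries needed here).
partial = {58: 'a', 75: 'b', 90: 'e', 103: 'r', 85: 'k', 124: '@',
           65: 'm', 88: 'i', 40: 't', 92: '.', 72: 'd', 67: 'u'}
codes = encrypt_strict('aberke@mit.edu', invert_mapping(partial))
print(codes)
```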

In [280]:
# write in hexadecimal notation
print('in hex:')
''.join('{:02x}'.format(x) for x in my_c)

in hex:


'3a4b5a67555a7c4158285c5a4843'
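
The assignment's example ("03, a2, 4e") lists the bytes separately, so it may be worth producing both forms. This is a minimal sketch using the built-in `bytes.hex()`; `codes` repeats the encrypted values from the cell above.

```python
# Encrypted byte values of the email, copied from the earlier output.
codes = [58, 75, 90, 103, 85, 90, 124, 65, 88, 40, 92, 90, 72, 67]

# bytes.hex() yields the same concatenated string as the format() join.
compact = bytes(codes).hex()

# One two-digit hex value per byte, separated as in the assignment's example.
listed = ', '.join(f'{c:02x}' for c in codes)

print(compact)
print(listed)
```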