---
# SUBSTITUTION CIPHERS
*Alan Mathison Turing (23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer. Turing is widely considered to be the father of theoretical computer science and artificial intelligence.*

![](https://s2.glbimg.com/3auOxS3cG2mc_H5jFXDpxC7ol-w=/e.glbimg.com/og/ed/f/original/2016/09/12/dr-alan-turing-2956483.jpg)


In cryptography, a classical cipher is a type of cipher that was used historically but has, for the most part, fallen into disuse. In contrast to modern cryptographic algorithms, most classical ciphers can be practically computed and solved by hand. However, they are also usually very simple to break with modern technology. The term includes the simple systems used since Greek and Roman times, the elaborate Renaissance ciphers, World War II cryptography such as the Enigma machine, and beyond.

In a substitution cipher, letters (or groups of letters) are systematically replaced throughout the message with other letters (or groups of letters).
A well-known example of a substitution cipher is the Caesar cipher. To encrypt a message with the Caesar cipher, each letter of the message is replaced by the letter three positions later in the alphabet. Hence, A is replaced by D, B by E, C by F, and so on. Finally, X, Y and Z wrap around and are replaced by A, B and C respectively. So, for example, "WIKIPEDIA" encrypts as "ZLNLSHGLD". Caesar rotated the alphabet by three letters, but any shift works.
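The rotation described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notebook: the function name `caesar_encrypt` and its `shift` parameter are assumptions introduced here, and only the 26 unaccented Latin letters are shifted.

```python
import string

def caesar_encrypt(message: str, shift: int = 3) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around at Z."""
    shift %= 26
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    # Map the plain alphabet onto the rotated alphabet, for both cases
    table = str.maketrans(
        upper + lower,
        upper[shift:] + upper[:shift] + lower[shift:] + lower[:shift],
    )
    # Characters outside A-Z / a-z (spaces, digits, punctuation) stay unchanged
    return message.translate(table)

print(caesar_encrypt("WIKIPEDIA"))  # ZLNLSHGLD
```

Decryption is the same operation with the opposite shift, for example `caesar_encrypt(text, -3)`.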

Another method of substitution cipher is based on a keyword. All spaces and repeated letters are removed from a word or phrase, which the encoder then uses as the start of the cipher alphabet. The end of the cipher alphabet is the rest of the alphabet in order without repeating the letters in the keyword.
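As a rough sketch of the keyword method, the cipher alphabet can be built by walking through the keyword and then the normal alphabet, keeping each letter only the first time it appears. The names `keyword_alphabet` and `keyword_encrypt` and the keyword `"zebras"` are illustrative assumptions, not taken from the notebook.

```python
import string

def keyword_alphabet(keyword: str) -> str:
    """Cipher alphabet: the keyword's distinct letters, then the unused letters in order."""
    seen = []
    for ch in keyword.lower() + string.ascii_lowercase:
        # Skip spaces/other symbols and any letter that was already used
        if ch in string.ascii_lowercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def keyword_encrypt(message: str, keyword: str) -> str:
    """Replace each plain letter with the letter at the same position in the cipher alphabet."""
    table = str.maketrans(string.ascii_lowercase, keyword_alphabet(keyword))
    return message.lower().translate(table)

print(keyword_alphabet("zebras"))  # zebrascdfghijklmnopqtuvwxy
```

Because the result is always a permutation of the 26 letters, the mapping can be inverted for decryption by swapping the two arguments of `str.maketrans`.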

---
# PART 2 - STATISTICAL NLP

## 2.1. Text Normalization 

In [1]:
# Import the packages
from collections import Counter
import unidecode
import re

# Read the file >> lowercase >> remove accents >> replace special characters with spaces
with open('rap_lord.txt', 'r', encoding="utf8") as f:
    raw_text = f.read().replace('\n', ' ').lower()
rap_lord = re.sub(r'\W+', ' ', unidecode.unidecode(raw_text))
print(rap_lord)

lutei pra entrar e nao vou sair os que nao pertencem eu devolvi acido no metal causa efeito letal teto baixo te espreme respira quem pira ta na mira da minha firma entao me espera recupera o folego se comigo nao morre nunca cai nao tento a sorte woodstock num flow metodico toma e pra quem quer dou e pra quem pode e nosso destino e uma caixa de surpresa leopardo ou zebra me diz ce quer ser predador ou presa e assim o percorri pela beira da terra ate a sorte me dizer menino voce tem uma aval no tempo essencia eu elevo no peito o excesso essencial e muito bom nao se acomodar satisfacao se o verso ecoar vendo em polpa nao vou me poupar entao demorou meu mano let s go quero que se foda o que disser to de pe vou mantendo a fe ate meu mano vou correndo igual rale adivinha o que tu quer vagabundo quer mas e quem nao quer ne quero ver dinheiro na responsa ser amigo da onca jacare que banca vira bolsa mano entao me mostre a cara convivencia com malandro que ja foi da costa fala pra carai diz que
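The same normalization steps can be tried on a short string without the lyrics file. The sketch below swaps in the standard-library `unicodedata` module in place of `unidecode`; that substitution is an assumption made here for a dependency-free example, and it only strips combining accents rather than transliterating arbitrary scripts. The function name `normalize_text` is hypothetical.

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    # Decompose accented characters (e.g. "ã" becomes "a" + a combining tilde)
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks, keeping only the base letters
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse each run of non-word characters (newlines, punctuation) into one space
    return re.sub(r"\W+", " ", stripped).strip()

print(normalize_text("Não vou sair!\nÁcido no metal"))  # nao vou sair acido no metal
```

Unlike the cell above, this version also trims leading and trailing spaces, which keeps the example output tidy but slightly changes the character total used for frequencies.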

In [8]:
# Define a function that counts character frequencies
def freq_text(sentence, n_top):
  # Count every character once; spaces are included in the total
  counts = Counter(sentence.lower())
  letters_count = counts.most_common(n_top)
  letters_total = sum(counts.values())

  letters_list = []
  for letter, count in letters_count:
    # Spaces are counted in the total but not reported
    if letter != " ":
        print(f"{letter} => {round((count/letters_total)*100,2)}%")
        letters_list.append(letter)
  return letters_list

# Apply
list_letters = freq_text(rap_lord, 30)

a => 12.49%
e => 10.74%
o => 8.77%
n => 4.7%
r => 4.62%
i => 4.36%
m => 4.23%
u => 4.08%
s => 3.98%
d => 3.81%
t => 2.97%
c => 2.93%
p => 2.29%
l => 2.12%
v => 1.92%
q => 1.56%
b => 1.09%
f => 1.01%
g => 0.83%
h => 0.58%
z => 0.47%
j => 0.34%
x => 0.17%
w => 0.06%
k => 0.06%


## 2.2. Caesar Cipher and Language Statistics

---
## Test 04 

Using the information below and knowing that the text was written in Portuguese, try to decipher the cipher sentence.

In [9]:
cipher=''' balp wyh luayhy l uhv cvb zhpy vz xbl uhv wlyalujlt lb klcvscp hjpkv uv tlahs jhbzh lmlpav slahs alav ihpev al lzwyltl ylzwpyh xblt wpyh ah uh tpyh kh tpuoh mpyth luahv tl lzwlyh yljbwlyh v mvslnv zl jvtpnv uhv tvyyl ubujh jhp uhv aluav h zvyal dvvkzavjr ubt msvd tlavkpjv avth l wyh xblt xbly kvb l wyh xblt wvkl l uvzzv klzapuv l bth jhpeh kl zbywylzh slvwhykv vb gliyh tl kpg jl xbly zly wylkhkvy vb wylzh l hzzpt v wlyjvyyp wlsh ilpyh kh alyyh hal h zvyal tl kpgly tlupuv cvjl alt bth hchs uv altwv lzzlujph lb lslcv uv wlpav v lejlzzv lzzlujphs l tbpav ivt uhv zl hjvtvkhy zhapzmhjhv zl v clyzv ljvhy clukv lt wvswh uhv cvb tl wvbwhy luahv kltvyvb tlb thuv sla z nv xblyv xbl zl mvkh v xbl kpzzly av kl wl cvb thualukv h ml hal tlb thuv cvb jvyylukv pnbhs yhsl hkpcpuoh v xbl ab xbly chnhibukv xbly thz l xblt uhv xbly ul xblyv cly kpuolpyv uh ylzwvuzh zly htpnv kh vujh qhjhyl xbl ihujh cpyh ivszh thuv luahv tl tvzayl h jhyh jvucpclujph jvt thshukyv xbl qh mvp kh jvzah mhsh wyh jhyhp kpg xbl nvzah zl nvzah ahtilt gl chnhibukv cl h ivah l uhv cl v wl thz uhv xbly tl cly lt wl qhv zlp hal xblt zhv av uh jvualujhv ihihihylihihylih whwv kl jbghv cvjl xbly wyvchy qh wyvclp xbl zlp ilt al ylwylzlualp slclp wyh jhtpuohkh xbhukv ult lyh upunblt uhv whshcyh kl jvumvyav yljlipkh tpuoh cpkh zl ylzbtl uv tlb kvt qhv chp chp lzwlyv xbl zlb lnv uhv hayhwhsol zbh jvukbah zl uhv chnhibukv jhp l jvtv jhp klwluklukv vukl lb zlp ilt klzzh mliyl l ahsclg uhv slchual thpz tltiyv kv ohprhpzz zvb jhiyh kh wlzal xbl hnypkl thpz zvb jhwhg kl mhgly lzzh tbsapkhv hsphkh uh tpzzhv jvujlkpkh uh cpkh kl bt altwv hayhz uhv zlp zl l jplujph vb wluzv lt jvtv v hjlzzv l lzzlujphs h avkvz xbl lualuklyht uhv hkphuah hjbzhy v kvt uhzjlb jvtpnv l chjpsv l uhv v bzhy kpglt wvy hp xbl l mhjps mhgly abkv xbl lb zlp l uhv mhglt l uhv zhilt uh cpkh cvjl wlykl altwv vb lualukl v jvujlpav kl zhilkvyph nhuoh tlsvkph jhuzhkv kl cl apv vbcpy tbsapkhv mhshy kl tpuvyph uhv chp zl jvchykph lewshuhy kpcpkpyht 
tlth ahn zl epunh lzzh mhsah kl lapjh wyhapjh lejluaypjh lslnlt l clqv v zvt uh lzxbpch thz xbl mpah uhv zlp xbl wvbjvz zhv ivuz wlsv ayvjv zlp kvt jhkh wshuv uhv l lt chv zlt xblyly zly gvphv thuv zv hbtluah uh pkhkl l thuapkh h jpkhkl ylkbg iwt luayh svunlcpkhkl jpluapzah kv nyhcl xbhukv xbly zhil xbhs xbl lsl l lb cpt kvtpulp vz whsjv yvkhwl vz tpj jvt mpv yvkvcphyph uvcv ypv zvb whbspzah tltv l johtv vz vbayvz kl apv lb uhv clqv bth hytpuoh wyhph zltwyl mvp kvsshy ipss ivyh mpsoh zhihkv kl hiyps ihshkh qh hiypb jhthyhkh tlb qh ah h tps yhw whyh tl klpehy mliyps lb alualp uhv zlycpb bupmvytl l whyh nhyjvt kl uhcpv bt zhscl hv ptvyahs zhivahnl xbl mhg kh ypth bt mbgps xbpugl huvz klwvpz jvuzaybpukv thpz wvualz xbl lunluolpyv jpcps pzxblpyv wyh hjlukly v whcpv yhjpvuhpz ygv lunhqhtluav uh sbah l chuahnlt tl kvb spilykhkl kl ylwylzluahy jpkhkl zlt kpclyzpkhkl gvuh uvyal wyv tbukv luahv whyal ho bt zhscl h xblt uhv mhsoh uh jvukbah mpsoh kl bth wbah clzal h jhyhwbjh clqh h jhyh kl xblt lzjbah tvkh clsoh xbl al tbkh xbl zl mvkh tbkv v ybtv thz uhv tbkv v xbl zl wshuah clt wyh nblyyh l kl xbliyh clukl v hstvjv wlnh qhuah thsvxblpyv jhuah qbuav jvt h cvuahkl klzzh wvyyh wyh lzzl tbukv zly tlsovy thz uh clykhkl v xbl zl wylnh kpmlylual kh uvclsh cpkh svrh cpkh jbyah chp uh mhsoh xbl al jvyah chsl thpz kv xbl zl wluzh abkv tbukv xbl klmlukl chsl thpz zlnbpy lt mylual jhtpuohukv kpmlylual jhtpuohukv jvt h tpuoh nlual jhyh h jhyh jvuxbpzahkv uv xbl wlnh uvzzh tlual uh clykhkl lb jhuav hxbpsv xbl kpmlyl v upcls jhyh jvtwhapcls thz uhv whzzh uv jhuhs kvtpunv hxbpsv xbl zl mhsh kl ptwvyahual wyh uhjhv thz xbl zl mvkh lb mhsv tltv yhwpkv jvtv xblt ihal v jvyhjhv lt jhkh whzzv lb vsov l clqv uh ihnhnlt jhslqhkh tlb jvtbupjhkv thuv l jvtwspjhkv jhkh shjv xbl thualuov chsl v vbyv thz uhv chsl v jvbyv hxbp zl cpzh v tbyv wvl uh jvuah kv thuv xbl jhsv v wvcv lb xblyv cl uh jhyh jhyh jvt v tluvy vo alt tbpav cluluv l wvbjh kv vo mhsht kh cpavyph thz uhv mhsht kh klyyvah thz uhv whyh 
whyh whyh whyh whyh yhw svyk uhv zlp zl l jplujph vb wluzv lt jvtv v hjlzzv l lzzlujphs h avkvz xbl lualuklyht uhv hkphuah hjbzhy v kvt uhzjlb jvtpnv l chjpsv l uhv v bzhy ho uhv wluzh xbl lb whylp uhv hjhiv uhv hjhiv uhv klpeh lb hwyvclpah xbl lzzl tvtluav l ivt qhv ah ahv ivt pythv xbl lb mhslp xbl lb kvtlzapjv thpz bt wvbjv kh ihapkh klzzl msvd chnhibukhv chnhibukv mpjh svbjv av svbjhv zlual h jvspzhv luahv cpukv kl bt tvslxbl jhapchukv wyv tlb yhw xbl al whzzh bth lulynph xbl cpyvb tlb nhuoh whv kpglt wvy hp xbl l mhjps mhgly abkv xbl lb zlp l uhv mhglt l uhv zhilt uhv zhilt uhv zhilt uhv zhilt mhsht kh cpavyph thz uhv mhsht kh klyyvah thz uhv whyh whyh whyh whyh whyh yhw svyk''' 
#
cipher_sentence= "sbalp wyh luayhy l uhv cvb zhpy vz xbl uhv wlyalujlt lb klcvscp hjpkv uv tlahs"
#
list_letters=freq_text(cipher,30)

h => 12.49%
l => 10.74%
v => 8.77%
u => 4.7%
y => 4.62%
p => 4.36%
t => 4.23%
b => 4.08%
z => 3.98%
k => 3.81%
a => 2.97%
j => 2.93%
w => 2.29%
s => 2.1%
c => 1.92%
x => 1.56%
i => 1.09%
m => 1.01%
n => 0.83%
o => 0.58%
g => 0.47%
q => 0.34%
e => 0.17%
d => 0.06%
r => 0.06%
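Comparing the two frequency tables suggests a shortcut: if the cipher is a Caesar shift, the most frequent cipher letter (`h`) should correspond to the most frequent plaintext letter (`a`), and the distance between them gives the shift. The sketch below works under that assumption; `estimate_shift` and `caesar_decrypt` are hypothetical helper names, and the guess can fail on short texts where the top letter is not the language's usual one.

```python
def estimate_shift(cipher_top: str, plain_top: str) -> int:
    """Shift that maps the most frequent plain letter onto the most frequent cipher letter."""
    return (ord(cipher_top) - ord(plain_top)) % 26

def caesar_decrypt(text: str, shift: int) -> str:
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            # Move the letter back by `shift` positions, wrapping around the alphabet
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        else:
            out.append(ch)  # keep spaces and other characters as they are
    return "".join(out)

shift = estimate_shift("h", "a")  # top cipher letter vs. top plaintext letter
print(shift)                                      # 7
print(caesar_decrypt("sbalp wyh luayhy", shift))  # lutei pra entrar
```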


In [15]:
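# Cipher letters, ordered from most to least frequent (output of freq_text on the cipher)
cipher_code=   ['h','l','v','u','y','p','t','b','z','k','a','j','w','s','c','x','i','m','n','o','g','q','e','d','r']
# Plaintext letters in the same frequency rank (output of freq_text on the reference text)
key_code=      ['a','e','o','n','r','i','m','u','s','d','t','c','p','l','v','q','b','f','g','h','z','j','x','w','k']
# Deciphered sentence obtained by replacing each cipher letter with the key letter of the same rank
decipher="lutei pra entrar e nao vou sair os que nao pertencem eu devolvi acido no metal"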
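Rather than typing the deciphered sentence by hand, the rank-to-rank mapping can be applied programmatically. The following is a minimal sketch using `str.maketrans`; the variable names `table` and `decoded` are introduced here for illustration. Pairing letters by frequency rank works exactly in this case only because the reference text and the plaintext are the same lyrics; on unrelated texts the ranks would differ and the result would need manual correction.

```python
cipher_code = ['h','l','v','u','y','p','t','b','z','k','a','j','w','s','c','x','i','m','n','o','g','q','e','d','r']
key_code    = ['a','e','o','n','r','i','m','u','s','d','t','c','p','l','v','q','b','f','g','h','z','j','x','w','k']

# Build a translation table: the i-th cipher letter maps to the i-th key letter
table = str.maketrans("".join(cipher_code), "".join(key_code))

cipher_sentence = "sbalp wyh luayhy l uhv cvb zhpy vz xbl uhv wlyalujlt lb klcvscp hjpkv uv tlahs"
decoded = cipher_sentence.translate(table)  # characters not in the table (spaces) are kept
print(decoded)
# lutei pra entrar e nao vou sair os que nao pertencem eu devolvi acido no metal
```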

---
## 2.3.Pareto Chart and Stop Words