In [28]:
import numpy as np
import operator

Let's encrypt a message. The simplest way is to replace every letter with the next one in the alphabet:

    Original message:  Secrecy
    Encrypted message: Tfdsfdz
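As a quick check of this example, here is a minimal sketch (not part of the original notebook) of the same shift applied to mixed-case text. The helper name `shift_text` is hypothetical. It builds a translation table with `str.maketrans`, so lower- and upper-case letters are shifted separately and every other character is left untouched.

```python
import string


def shift_text(text, n):
    """Shift every ASCII letter in text by n positions, keeping its case.

    Hypothetical helper for illustration; non-letters are left unchanged.
    """
    n %= 26  # a shift of -1 is the same as a shift of 25
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[n:] + lower[:n] + upper[n:] + upper[:n],
    )
    return text.translate(table)


print(shift_text("Secrecy", 1))   # Tfdsfdz
print(shift_text("Tfdsfdz", -1))  # Secrecy
```

Shifting by `-n` undoes a shift by `n`, which is all a decoder needs once the shift is known.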
   
We will take a text from Wikipedia and see why it is not a good idea to encrypt a message using this method.

In [81]:
# Helper functions
LETTERS = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]


# Convert a letter to an int: its 0-based position in the alphabet (A -> 0, ..., Z -> 25)
def letter2num(letter):
    return LETTERS.index(letter)


# Convert a 0-based position back to the corresponding letter in the alphabet
def num2Letter(num):
    return LETTERS[num]


# Shift a letter forward by num positions in the alphabet
def LetterPlusNum(letter, num):

    letter_num = letter2num(letter)

    # np.mod wraps the result into 0..25, so shifting past Z starts again at A
    addition = np.mod(letter_num + num, 26)

    return num2Letter(addition)


# Calculate the shift (0 to 25) that turns letter2 into letter1
def diffIndices(letter1, letter2):

    letter1_num = letter2num(letter1)
    letter2_num = letter2num(letter2)

    diff = np.mod(letter1_num - letter2_num, 26)

    return diff


# Shift a letter back by num positions in the alphabet
def LetterMinusNum(letter, num):

    letter_num = letter2num(letter)

    # np.mod also wraps negative results into 0..25, so shifting back from A continues at Z
    subtraction = np.mod(letter_num - num, 26)

    return num2Letter(subtraction)


# Example
print("A + 1 = " + LetterPlusNum("A", 1))
print("B - A = " + str(diffIndices("B","A")))
print("A - 1 = " + LetterMinusNum("A",1))

A + 1 = B
B - A = 1
A - 1 = Z
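The helpers above use `np.mod` for the wrap-around. A minimal sketch of an equivalent without NumPy, assuming upper-case input: Python's own `%` operator already returns a value between 0 and 25 when the left operand is negative, and `ord`/`chr` can replace the list lookup. The names `shift_letter` and `letter_distance` are hypothetical.

```python
def shift_letter(letter, num):
    """Shift an upper-case letter by num positions, wrapping around Z -> A."""
    position = ord(letter) - ord("A")  # A -> 0, ..., Z -> 25
    return chr((position + num) % 26 + ord("A"))


def letter_distance(letter1, letter2):
    """Shift (0 to 25) that turns letter2 into letter1."""
    return (ord(letter1) - ord(letter2)) % 26


print("A + 1 = " + shift_letter("A", 1))            # A + 1 = B
print("B - A = " + str(letter_distance("B", "A")))  # B - A = 1
print("A - 1 = " + shift_letter("A", -1))           # A - 1 = Z
```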


In [54]:
TEXT2ENCODE = "Secrecy is the practice of hiding information from certain individuals " + \
              "or groups who do not have the need to know perhaps while sharing it with other individuals " + \
              "That which is kept hidden is known as the secret Secrecy is often controversial depending " + \
              "on the content or nature of the secret the group or people keeping the secret and the motivation " + \
              "for secrecy Secrecy by government entities is often decried as excessive or in promotion " + \
              "of poor operation excessive revelation of information on individuals can conflict with virtues " + \
              "of privacy and confidentiality It is often contrasted with social transparency Secrecy can exist " + \
              "in a number of different ways encoding or encryption where mathematical and technical strategies " + \
              "are used to hide messages true secrecy where restrictions are put upon those who take part of the " + \
              "message such as through government security classification and obfuscation where secrets are hidden " + \
              "in plain sight behind complex idiosyncratic language or steganography"

# Prepare the message to be encrypted: convert it to upper case and remove the spaces
text2Encode = TEXT2ENCODE.upper().replace(" ", "")
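The counting helpers defined later assume the prepared text contains only the letters A to Z. The Wikipedia excerpt above was written without punctuation, so removing spaces is enough here. For arbitrary text, one possible preparation step (a sketch; `prepare_text` is a hypothetical name) keeps only the ASCII letters:

```python
import string


def prepare_text(text):
    """Upper-case text and keep only the letters A-Z (drops spaces, digits, punctuation)."""
    return "".join(c for c in text.upper() if c in string.ascii_uppercase)


print(prepare_text("Secrecy, (from Wikipedia)!"))  # SECRECYFROMWIKIPEDIA
```

Non-ASCII letters such as accented characters are dropped as well, which matches the 26-letter alphabet used throughout.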

In [64]:
# Simple encryption: replace each letter with the next one in the alphabet

textEncoded = ""
for curr_letter in text2Encode:
    textEncoded += LetterPlusNum(curr_letter, 1)


print(textEncoded)

TFDSFDZJTUIFQSBDUJDFPGIJEJOHJOGPSNBUJPOGSPNDFSUBJOJOEJWJEVBMTPSHSPVQTXIPEPOPUIBWFUIFOFFEUPLOPXQFSIBQTXIJMFTIBSJOHJUXJUIPUIFSJOEJWJEVBMTUIBUXIJDIJTLFQUIJEEFOJTLOPXOBTUIFTFDSFUTFDSFDZJTPGUFODPOUSPWFSTJBMEFQFOEJOHPOUIFDPOUFOUPSOBUVSFPGUIFTFDSFUUIFHSPVQPSQFPQMFLFFQJOHUIFTFDSFUBOEUIFNPUJWBUJPOGPSTFDSFDZTFDSFDZCZHPWFSONFOUFOUJUJFTJTPGUFOEFDSJFEBTFYDFTTJWFPSJOQSPNPUJPOPGQPPSPQFSBUJPOFYDFTTJWFSFWFMBUJPOPGJOGPSNBUJPOPOJOEJWJEVBMTDBODPOGMJDUXJUIWJSUVFTPGQSJWBDZBOEDPOGJEFOUJBMJUZJUJTPGUFODPOUSBTUFEXJUITPDJBMUSBOTQBSFODZTFDSFDZDBOFYJTUJOBOVNCFSPGEJGGFSFOUXBZTFODPEJOHPSFODSZQUJPOXIFSFNBUIFNBUJDBMBOEUFDIOJDBMTUSBUFHJFTBSFVTFEUPIJEFNFTTBHFTUSVFTFDSFDZXIFSFSFTUSJDUJPOTBSFQVUVQPOUIPTFXIPUBLFQBSUPGUIFNFTTBHFTVDIBTUISPVHIHPWFSONFOUTFDVSJUZDMBTTJGJDBUJPOBOEPCGVTDBUJPOXIFSFTFDSFUTBSFIJEEFOJOQMBJOTJHIUCFIJOEDPNQMFYJEJPTZODSBUJDMBOHVBHFPSTUFHBOPHSBQIZ


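This does look like gibberish, and at first sight there seems to be no quick way to find patterns that would decode it.
However, frequency analysis makes it easy: a shift replaces every occurrence of a letter with the same encrypted letter, so each letter keeps its frequency and only its label changes.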
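A small sketch with `collections.Counter` makes this visible. It is independent of the notebook's helpers; `shift_upper`, the sample sentence and the shift of 3 are arbitrary choices for illustration. The sorted counts of the plaintext and of the ciphertext are identical, and the count of E reappears as the count of its encrypted letter H.

```python
from collections import Counter


def shift_upper(text, n):
    """Shift each letter of an upper-case A-Z string by n positions."""
    return "".join(chr((ord(c) - ord("A") + n) % 26 + ord("A")) for c in text)


plain = "THESECRETISKEPTHIDDEN"
cipher = shift_upper(plain, 3)

plain_counts = Counter(plain)
cipher_counts = Counter(cipher)

# The same multiset of counts, just attached to different letters
print(sorted(plain_counts.values()) == sorted(cipher_counts.values()))  # True
print(plain_counts["E"], cipher_counts["H"])  # 5 5 (E is encrypted as H)
```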

In [84]:
# More helper functions

# Count the occurrences of each letter in the text and return them as a dictionary
def countLetters(text):

    # Use the LETTERS list defined above so that every letter starts at 0
    letters_occurrences = {letter: 0 for letter in LETTERS}
    for letter in text:
        letters_occurrences[letter] += 1

    return letters_occurrences


# Return (letter, frequency in %) pairs, sorted from most to least frequent
def sortLettersByFrequency(text):

    dict_occurrences = countLetters(text)
    nb_letters = float(len(text))
    dict_frequency = {k: v / nb_letters * 100 for k, v in dict_occurrences.items()}
    # sorted() is stable, so letters with equal frequency stay in alphabetical order
    sortedLetters = sorted(dict_frequency.items(), key=operator.itemgetter(1), reverse=True)

    return sortedLetters



# Example 
MSG = "HELLO WORLD".replace(" ","")
print("Occurrences of each letter in the text : ")
print(countLetters(MSG))
print("\nMost frequent letters :")
print(sortLettersByFrequency(MSG))

Occurrences of each letter in the text : 
{'A': 0, 'B': 0, 'C': 0, 'D': 1, 'E': 1, 'F': 0, 'G': 0, 'H': 1, 'I': 0, 'J': 0, 'K': 0, 'L': 3, 'M': 0, 'N': 0, 'O': 2, 'P': 0, 'Q': 0, 'R': 1, 'S': 0, 'T': 0, 'U': 0, 'V': 0, 'W': 1, 'X': 0, 'Y': 0, 'Z': 0}

Most frequent letters :
[('L', 30.0), ('O', 20.0), ('D', 10.0), ('E', 10.0), ('H', 10.0), ('R', 10.0), ('W', 10.0), ('A', 0.0), ('B', 0.0), ('C', 0.0), ('F', 0.0), ('G', 0.0), ('I', 0.0), ('J', 0.0), ('K', 0.0), ('M', 0.0), ('N', 0.0), ('P', 0.0), ('Q', 0.0), ('S', 0.0), ('T', 0.0), ('U', 0.0), ('V', 0.0), ('X', 0.0), ('Y', 0.0), ('Z', 0.0)]
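For comparison, the standard library's `collections.Counter` produces the same counts and ordering in fewer lines: `most_common()` returns (letter, count) pairs sorted by decreasing count. This is an alternative sketch, not what the notebook uses, and unlike `countLetters` it leaves out letters that never appear.

```python
from collections import Counter

msg = "HELLO WORLD".replace(" ", "")
counts = Counter(msg)

# Percentages, most frequent first (letters absent from msg are not listed)
frequencies = [(letter, n / len(msg) * 100) for letter, n in counts.most_common()]
print(frequencies[:2])  # [('L', 30.0), ('O', 20.0)]
```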


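In English, the most frequent letter is "E", with a frequency of around 12%. Since the shift encrypts every E as the same letter, the most frequent letter in the encoded message is very likely the encrypted E, and its distance from E in the alphabet gives the shift.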
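Once the most frequent encoded letter is known, the key is its distance from E, taken modulo 26. A worked sketch (the helper name `guess_shift` is hypothetical): numbering letters from A = 0, a shift of 1 encrypts E as F, giving (5 - 4) mod 26 = 1, while a most frequent letter of C gives (2 - 4) mod 26 = 24.

```python
def guess_shift(most_frequent_letter, reference="E"):
    """Guess a shift-cipher key, assuming most_frequent_letter encrypts reference."""
    return (ord(most_frequent_letter) - ord(reference)) % 26


print(guess_shift("F"))  # 1
print(guess_shift("C"))  # 24
```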

In [85]:
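# With a +1 shift, E is encrypted as F, so F should come first at about 12.74%.
# The output below appears to have been recorded after the shift loop further down
# had overwritten textEncoded with a +24 shift, which is why C (E + 24) is listed instead.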
print("Most frequent letters : \n")
print(sortLettersByFrequency(textEncoded))
print("\nMost frequent letter : \n")
print(sortLettersByFrequency(textEncoded)[0][0])

Most frequent letters : 

[('C', 12.738095238095237), ('G', 9.166666666666666), ('R', 8.80952380952381), ('L', 8.214285714285714), ('M', 7.976190476190475), ('P', 7.142857142857142), ('Q', 6.785714285714286), ('Y', 6.428571428571428), ('A', 5.714285714285714), ('F', 4.523809523809524), ('B', 3.4523809523809526), ('N', 2.619047619047619), ('D', 2.380952380952381), ('E', 2.142857142857143), ('S', 2.0238095238095237), ('J', 1.9047619047619049), ('W', 1.7857142857142856), ('K', 1.5476190476190477), ('T', 1.5476190476190477), ('U', 1.5476190476190477), ('I', 0.5952380952380952), ('V', 0.4761904761904762), ('Z', 0.4761904761904762), ('H', 0.0), ('O', 0.0), ('X', 0.0)]

Most frequent letter : 

C


In [86]:
# Decode: the key is the shift between the most frequent encoded letter and E

textDecoded = ""
diff = diffIndices(sortLettersByFrequency(textEncoded)[0][0], "E")
for curr_letter in textEncoded:
    textDecoded += LetterMinusNum(curr_letter, diff)


print("Is the message correctly decoded?")
print(textDecoded == text2Encode)

Is the message correctly decoded?
True
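The steps above (count the letters, take the most frequent one, assume it encrypts E, shift back) can be bundled into a single function. A minimal sketch, assuming upper-case A-Z input long enough for E to dominate; `crack_shift`, `shift_upper` and the sample sentence are illustrative, not part of the original notebook:

```python
from collections import Counter


def shift_upper(text, n):
    """Shift each letter of an upper-case A-Z string by n positions."""
    return "".join(chr((ord(c) - ord("A") + n) % 26 + ord("A")) for c in text)


def crack_shift(ciphertext):
    """Recover (key, plaintext) by assuming the most frequent letter encrypts E."""
    top_letter, _ = Counter(ciphertext).most_common(1)[0]
    key = (ord(top_letter) - ord("E")) % 26
    return key, shift_upper(ciphertext, -key)


plain = "THESECRETISKEPTHIDDENFROMEVERYONE"
for n in range(26):
    key, decoded = crack_shift(shift_upper(plain, n))
    assert key == n and decoded == plain
print("All 26 shifts recovered")
```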


In [87]:
# Repeat the attack for every shift from 2 to 24 (note: this overwrites textEncoded)

for n in range(2, 25):
    textEncoded = ""
    for curr_letter in text2Encode:
        textEncoded += LetterPlusNum(curr_letter, n)

    textDecoded = ""
    diff = diffIndices(sortLettersByFrequency(textEncoded)[0][0], "E")
    for curr_letter in textEncoded:
        textDecoded += LetterMinusNum(curr_letter, diff)

    print("\nMessage encoded by shifting letters by " + str(n))
    print("Message successfully decoded: " + str(textDecoded == text2Encode))


Message encoded by shifting letters by 2
Message successfully decoded: True

Message encoded by shifting letters by 3
Message successfully decoded: True

Message encoded by shifting letters by 4
Message successfully decoded: True

Message encoded by shifting letters by 5
Message successfully decoded: True

Message encoded by shifting letters by 6
Message successfully decoded: True

Message encoded by shifting letters by 7
Message successfully decoded: True

Message encoded by shifting letters by 8
Message successfully decoded: True

Message encoded by shifting letters by 9
Message successfully decoded: True

Message encoded by shifting letters by 10
Message successfully decoded: True

Message encoded by shifting letters by 11
Message successfully decoded: True

Message encoded by shifting letters by 12
Message successfully decoded: True

Message encoded by shifting letters by 13
Message successfully decoded: True

Message encoded by shifting letters by 14
Message successfu

No matter how far we shifted the letters of the original message, frequency analysis decoded it every time.
This automated method may fail on shorter messages, where E is not guaranteed to be the most frequent letter, but with only 25 possible shifts such a message is still easy to decode by hand.
To get a more robust encrypted message, we can use a key to encrypt it.
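Decoding a short message by hand amounts to trying every shift and reading the result. A sketch of that brute-force search; the ciphertext is "ATTACKATDAWN" shifted by 3, chosen for illustration because its most frequent letter encrypts A rather than E, so the frequency method alone would pick the wrong key:

```python
def shift_upper(text, n):
    """Shift each letter of an upper-case A-Z string by n positions."""
    return "".join(chr((ord(c) - ord("A") + n) % 26 + ord("A")) for c in text)


ciphertext = "DWWDFNDWGDZQ"  # "ATTACKATDAWN" shifted by 3

# Try every non-trivial key; a human can spot the one readable line
candidates = {key: shift_upper(ciphertext, -key) for key in range(1, 26)}
for key, text in candidates.items():
    print(key, text)
# The line for key 3 reads ATTACKATDAWN
```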