### Frequency Analysis for a Vigenere Cipher

This analysis assumes that we know:
1. The ciphertext
2. The key length (determined by the Kasiski or index of coincidence methods)
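
The key length in item 2 is an input to this notebook. As a rough illustration of how it might be estimated, the sketch below ranks candidate key lengths by the average index of coincidence of their columns. The function names (`index_of_coincidence`, `average_column_ioc`, `rank_key_lengths`) and the `max_length` parameter are illustrative assumptions and are not part of the notebook's code.

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two characters drawn at random from text are equal."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def average_column_ioc(cipher_text, keylength):
    """Mean index of coincidence over the keylength columns of cipher_text."""
    # column i holds every keylength-th character, starting at offset i
    columns = [cipher_text[i::keylength] for i in range(keylength)]
    return sum(index_of_coincidence(col) for col in columns) / keylength

def rank_key_lengths(cipher_text, max_length=10):
    """Return candidate key lengths, highest average column IoC first."""
    lengths = range(1, max_length + 1)
    return sorted(lengths, key=lambda k: average_column_ioc(cipher_text, k), reverse=True)
```

Multiples of the true key length tend to score about as well as the key length itself, so the shortest high-scoring candidate is usually the one worth trying first.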

The goal of the algorithm is to perform frequency analysis on the ciphertext to determine the possible letter offsets used to encode each letter of the plaintext. The algorithm will return a list of possible letter values for each letter of the key.

This frequency list can be used to run a simplified brute-force search over the possible key combinations.
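
A minimal sketch of that brute-force step, assuming the candidates arrive as one list of letters per key position. The helper names (`decrypt`, `score`, `brute_force`) and the word-count scoring rule are assumptions made for illustration; the notebook itself does not define them, and a real scorer would likely be more robust.

```python
from itertools import product

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 "

def decrypt(cipher_text, key, alphabet=ALPHABET):
    """Vigenere decryption: subtract the key letter's index, modulo the alphabet size."""
    n = len(alphabet)
    return "".join(
        alphabet[(alphabet.index(c) - alphabet.index(key[i % len(key)])) % n]
        for i, c in enumerate(cipher_text)
    )

def score(text, common_words=(" THE ", " AND ", " OF ", " TO ")):
    """Crude plausibility score: count occurrences of a few common English words."""
    return sum(text.count(w) for w in common_words)

def brute_force(cipher_text, candidates, alphabet=ALPHABET):
    """Try every key built from the per-position candidates; return the best (key, plaintext)."""
    best_key, best_text, best_score = None, "", -1
    for letters in product(*candidates):
        key = "".join(letters)
        text = decrypt(cipher_text, key, alphabet)
        s = score(text)
        if s > best_score:
            best_key, best_text, best_score = key, text, s
    return best_key, best_text
```

With `c` candidates per position and a key of length `n`, this tries `c**n` keys instead of `37**n`, which is what makes the search "simplified".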

In [1]:
pt_1 = """\
The end when it finally came was just as chaotic messy and jaw dropping \ 
as every other chapter \ 
of Boris Johnsons political career Holed up in Downing Street on \
Wednesday night the prime minister faced an open rebellion of his cabinet\ 
a catastrophic loss of support in his Conservative Party and a wholesale \
exodus of ministers which threatened to leave significant parts of \
the British government without functioning leadership \
Yet far from surrendering Mr Johnsons aides put out word that \ 
he would continue to fight It looked like a last roll of \
the dice by one of the great gamblers in British politics \ 
His brazen refusal to bow to reality invited comparisons to\ 
Donald J Trumps defiance in the chaotic days after he lost the twenty twenty presidential election \
\
By Thursday morning however political gravity had finally reasserted itself \
For one of the few times in his career Mr Johnson was unable to bend the narrative \
to his advantage through the sheer force of his personality \
At midday the prime minister went to a lectern in front of \
ten Downing Street to announce he was relinquishing the leadership \
of a party that no longer supported him and giving up a job he had \
pursued for much of his adult life I want to tell you how sorry I am\ 
to be giving up the best job in the world Mr Johnson said Then \
defusing the solemnity of the moment with a wry line from the pool \
halls of America he added Thems the breaks\
George Washington is often described as childless which is true 
but only in the strictly biological definition. 
When I started digging into his archives, I was surprised 
to see that in reality, he was raising children from his late 
20s until the day he died. When Washington met Martha Custis, 
she was a wealthy widow with a young daughter and son, and when they married, 
he became the legal guardian to Patsy and Jacky Custis. 
Washington’s letters and ledgers indicate that he spent 
significant time and money (though he often reimbursed 
himself from the Custis estate) making sure the children 
were happy, healthy and well educated. His youth had been 
defined by relative struggle and deprivation, and he wanted 
them to have the very best of everything.
\
In the end however Mr Johnsons risk taking bravado was \
not enough to compensate for his shortcomings He engaged in \
behavior that critics said revealed a sense of entitlement and a \
belief that the rules did not apply to him his staff or his \
loyalists Critics accused him of being disorganized \
ideologically and administratively\
\
After leading Britain out of the European Union in 2020 \
the prime minister did not have much of a plan for what \
to do next He quickly became hostage to events lurching \
from crisis to crisis as the coronavirus pandemic engulfed Britain \
A pattern of scandals which followed him throughout his career soon overtook Downing Street\
\
In the end however Mr Johnsons risk taking bravado was not enough \
to compensate for his shortcomings He engaged in behavior that critics \
said revealed a sense of entitlement and a belief that the rules did not \
apply to him his staff or his loyalists \
Critics accused him of being disorganized ideologically and administratively"""

ct_1 = "MEUHWODIIRTLOHKFV JZBR CSUX"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 "

In [2]:
# helper function to return a Python dictionary mapping each letter of the cipher alphabet
# to its position in the alphabet
# currently not used
def get_alpha_dict(alphabet):
    alphaDict = {}
    for i, letter in enumerate(alphabet):
        alphaDict[letter]=i
    return alphaDict

# encryption function for the Vigenere cipher.
# encrypts each letter with a different cipher alphabet, shifted by the position of the
# corresponding key letter in the alphabet: A -> shift by 0; Z -> shift by 25
# (digits and the space character give shifts of 26 through 36 with the ALPHABET above).
# The key is repeated to match the size of the plaintext.
def encrypt_vignere(message, key, alphabet):
    message = message.upper()
    kLen = len(key)
    ciphertext = ""
    kIndex = 0
    
    for letter in message:
        if letter not in alphabet:
            continue
            
        #encrypt each letter with corresponding letter from vigenere key
        kLetter = key[kIndex]
        shiftValue = (alphabet.find(letter)+alphabet.find(kLetter)) % len(alphabet)
        ciphertext += alphabet[shiftValue]
        kIndex = (kIndex + 1) % kLen

    print(f"Message : {message[:15]}...")
    print(f"Ciphertext : {ciphertext[:15]}...")
    return ciphertext

# decryption function for the Vigenere cipher.
# decrypts each letter with a different cipher alphabet, shifted by the position of the
# corresponding key letter in the alphabet: A -> shift by 0; Z -> shift by 25
# (digits and the space character give shifts of 26 through 36 with the ALPHABET above).
# The key is repeated to match the size of the ciphertext.
def decrypt_vignere(message, key, alphabet):
    kLen = len(key)
    plaintext = ""
    kIndex = 0
    
    for letter in message:
        #decrypt each letter with corresponding letter from vigenere key
        kLetter = key[kIndex]
        shiftValue = (alphabet.find(letter)-alphabet.find(kLetter)) % len(alphabet)
        plaintext += alphabet[shiftValue]
        kIndex = (kIndex + 1) % kLen

    print(f"Plaintext : {plaintext[:15]}...")
    return plaintext


In [3]:
from operator import itemgetter

# Frequency analysis on a Vigenere cipher with a candidate key length is done by analyzing
# the frequencies of each "column" of letters in the ciphertext that were encrypted by the same
# letter of the repeating Vigenere key, i.e. if the length of the key is N, every Nth letter must
# have been encrypted with the same letter from the key.
#
# We then report the possible letters for each position of the key, which can be used in a
# brute-force analysis.

def generate_frequency_table(cipher_text, keylength):
    # split cipher_text into one substring per key position: substring i is the 'column'
    # of ciphertext letters that were encrypted with key letter i
    substrings = ["" for i in range(keylength)]
    for i, letter in enumerate(cipher_text):
        substrings[i % keylength] += letter
            
    # create a corresponding table of frequency distribution dictionary for each substring
    # table contains a frequency dictionary per substring
    # initialize frequencies of all letters to 0 for all substrings
    frequency_table = [{} for i in range(keylength)]
    for substring_freqs in frequency_table:
        for letter in ALPHABET:
            substring_freqs[letter] = 0
    
    # calculate the frequency of occurrence of each letter for each substring
    for i, substr in enumerate(substrings):
        for letter in substr:
            if letter in ALPHABET:
                frequency_table[i][letter] += 1
    
    return substrings, frequency_table


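# given a table of substring frequencies, determine a possible key letter for each key position
def substr_frequency_analysis(substrings, table, keylength):
    for i in range(keylength):
        # sort (letter, count) pairs by count, most frequent first
        substr_frequency = sorted(table[i].items(), key=itemgetter(1), reverse=True)
        # ALPHABET includes the space character, which is assumed to be the most frequent
        # plaintext character, so the second most frequent entry is taken to be an encrypted 'E'
        pkey_index = (ALPHABET.find(substr_frequency[1][0]) - ALPHABET.find('E')) % len(ALPHABET)
        pkey = ALPHABET[pkey_index]
        print(f"Substring {i}: {substrings[i][:15]}..., Possible Key: {pkey}")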
        

In [4]:
if __name__ == "__main__":
    myKey = "BEARS"
    myCiphertext = encrypt_vignere(pt_1,myKey,ALPHABET)
    myPlaintext = decrypt_vignere(myCiphertext,myKey,ALPHABET)
    substrings, table = generate_frequency_table(myCiphertext,len(myKey))
    substr_frequency_analysis(substrings, table, len(myKey))


Message : THE END WHEN IT...
Ciphertext : ULEQWOH CZFR ZA...
Plaintext : THE END WHEN IT...
Substring 0: UOFAMBBTAUFBBPH..., Possible Key: B
Substring 1: LHRJPQWXGMWR1TD..., Possible Key: E
Substring 2: E  IYE  HCSD P ..., Possible Key: A
Substring 3: QCZ5QQ1RRQEQUZR..., Possible Key: R
Substring 4: WZASUDB 75R206 ..., Possible Key: S
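
The run above prints a single guess per key position. One way to get several candidates per position instead is sketched below. It assumes that each of the `top_n` most frequent characters in a column could be the encryption of one reference character, by default the space, which is presumed to be the most common plaintext character when the alphabet includes it. The function name `candidate_key_letters` and its parameters are hypothetical and are not part of the notebook.

```python
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 "

def candidate_key_letters(cipher_text, keylength, top_n=3, reference=" ", alphabet=ALPHABET):
    """For each key position, map the top_n most frequent column characters back to
    key letters, assuming each one could be the encryption of `reference`."""
    candidates = []
    for i in range(keylength):
        # column i: every keylength-th ciphertext character, starting at offset i
        column = cipher_text[i::keylength]
        most_common = [ch for ch, _ in Counter(column).most_common(top_n)]
        # key letter = ciphertext index minus reference index, modulo the alphabet size
        candidates.append([
            alphabet[(alphabet.index(ch) - alphabet.index(reference)) % len(alphabet)]
            for ch in most_common
        ])
    return candidates
```

The result is one list of letters per key position, ordered from most to least likely under the stated assumption. Passing `reference="E"` with a letters-only alphabet gives the classic "most frequent letter is E" heuristic.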
