## Python Module: Frequency Analysis

Frequency analysis is the process of counting how often each ciphertext letter occurs in order to guess which plaintext letter it came from. In its basic form, frequency analysis only works for *monoalphabetic substitution ciphers*, in which each letter is always enciphered the same way (for example, every 'e' in the plaintext is enciphered as a 'T'). It breaks down (though it can be modified to work) for *polyalphabetic substitution ciphers*, in which different occurrences of a plaintext letter can be enciphered differently (for example, one 'e' may be enciphered as a 'T' and another as an 'R').
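To see why counting works for a monoalphabetic cipher, here is a minimal sketch that uses a simple shift cipher as the example cipher. The function name `shift_encipher` and the shift of 3 are illustrative choices, not part of this module, and `Counter` is a standard-library shortcut for counting. Because every 'T' becomes the same ciphertext letter, the pattern of counts survives encipherment.

```python
from collections import Counter

def shift_encipher(plaintext, shift):
    """Encipher capital letters A-Z by shifting each one a fixed number of places."""
    result = ""
    for letter in plaintext:
        position = ord(letter) - ord("A")                   # A = 0, B = 1, ..., Z = 25
        result += chr((position + shift) % 26 + ord("A"))   # wrap around after Z
    return result

plaintext = "TEST"
ciphertext = shift_encipher(plaintext, 3)
print(ciphertext)                              # WHVW

# The letters change, but the counts (one letter twice, two letters once) do not.
print(sorted(Counter(plaintext).values()))     # [1, 1, 2]
print(sorted(Counter(ciphertext).values()))    # [1, 1, 2]
```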

For example, in the text "test" the frequency of 'e' is 1, of 's' is 1, and of 't' is 2. The input to our function will be an encrypted body of text that contains only the capital letters A-Z. As output, we will print a list giving the frequency of each of the letters A-Z.

To begin, let's look at the code for a frequency count of the text "TEST".


We'll start with a `count_list` made up of 26 zeroes (`[0,0,0,...,0,0]`), one for each letter of the alphabet. Each 0 in the list will eventually reflect the number of occurrences of a letter in our message. For example, we'll go through the message "TEST" looking for the letter "A". Since we don't find any "A"s, we'll leave the first 0 as a 0.

This part of the code adds 26 zeroes to our list of letter counts. We want to use a "for loop" here rather than adding 26 zeroes to our list by hand. In other words, we want to tell our computer to "append" 0 to our count_list 26 times.

Do you remember how to specify the range of numbers between 0 and 25, including 0 and 25? **Edit the chunk of code below to add the correct numerical range in between the parentheses after `range` in the line `for i in range():`**

In [0]:
alphabet = ['A','B','C','D','E','F','G','H','I','J','K','L','M',
           'N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
message = "TEST"

count_list = []

## Edit the line of code below ##
for i in range():
    count_list.append(0)
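As a reminder of how `range` behaves, here is a small sketch with a shorter, made-up list (the name `short_list` and the length 3 are illustrative only, not the values needed above). Note that `range(n)` starts at 0 and stops just before `n`.

```python
# A shorter, made-up list: we want 3 zeroes instead of 26.
short_list = []
for i in range(3):        # i takes the values 0, 1, 2
    short_list.append(0)

print(short_list)         # [0, 0, 0]
print(list(range(3)))     # [0, 1, 2]
```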

Next, we convert our message string to a list of its characters, so that we can step through the message one character at a time. (Any character that is not a capital letter, such as a space, will simply not match anything in `alphabet` when we count.)
We do this by going through our message letter by letter and adding each letter to our list.
This is done using the list method `.append`.

If our list were called `cool_list`, the code to do this would be as follows:

In [20]:
# start with a blank list
cool_list = []
for i in message:
    cool_list.append(i)

Below, write the code to begin with a blank list called `letter_list` and append all the letters in our message to `letter_list`. (Hint: copy the code chunk above and just change what you need.)

In [0]:
## Write your code here ##

Now, we need to go through `letter_list` and record each occurrence of each letter in `count_list`. For example, in the word 'TEST', we start at the first T in the message and add $1$ to the entry of `count_list` corresponding to the letter T. Counting from $A = 0$, we have $T = 19$, so we want to add 1 to entry 19 of `count_list` every time we see a T.

The code to add 1 to the 19th entry of `count_list` is:

```
count_list[19] += 1
```

Here, "+=" denotes that we're replacing the 19th entry of `count_list` with the 19th entry of `count_list` plus 1.
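To make the shorthand concrete, here is a small sketch on a short, made-up list (the name `small_counts` is illustrative only). Both update lines do the same thing, so after running them the chosen entry has gone up by 2.

```python
# A short, made-up list of counts (not the 26-entry list from this module).
small_counts = [0, 0, 0, 0]

small_counts[2] += 1                     # shorthand for the line below
small_counts[2] = small_counts[2] + 1    # the same update, written out in full

print(small_counts)                      # [0, 0, 2, 0]
```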

**Below, complete the code to replace the $i$th entry of `count_list` with the $i$th entry plus 1.**

In [0]:
# counts occurrences of each letter
for x in letter_list:
    for i in range(0,26):
        if x == alphabet[i]:
            ## write your code here ##
            


After the code above runs, `count_list` is a list of 26 numbers, where each number is equal to the number of occurrences of the corresponding letter. For example, if we run the above code on the message `'TEST'` and then print `count_list`, we'd get the list

```
[0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,0]
```

where the 1s in spots 4 and 18 of the list (remember, Python starts counting at 0) reflect the counts of the letters $E = 4$ and $S = 18$, and the 2 in spot 19 reflects the count of $T = 19$.
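If you'd rather not count positions by hand, Python can look them up. The sketch below assumes the same 26-letter `alphabet` list as above (built here from a string for brevity) and uses the list method `.index`, which returns the position of the first match, counting from 0.

```python
alphabet = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

for letter in ["E", "S", "T"]:
    # .index returns the position of the first match, counting from 0
    print(letter, alphabet.index(letter))

# Output:
# E 4
# S 18
# T 19
```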

Do you see any problems? It's annoying to have to count how far out in the list we are in order to determine which letter's frequency we're looking at. It would be ideal if each spot in the list were "labeled" with the corresponding letter.

The way we'll do this is by *merging* our `alphabet` list with `count_list`. This takes the first entries in each list, in this case `'A'` and `0`, and pairs them together into the entry `('A', 0)`.

Similarly, we'll get the entries

```
('B', 0), ('C', 0), ('D', 0), ('E', 1)
```

and so on.

The Python command to pair up two lists is `zip(list1, list2)`. In Python 3, `zip` on its own does not return a list, so wrap it as `list(zip(list1, list2))` to get a list you can print. **Below, write the command to merge the lists `alphabet` and `count_list` together, then name your "zipped" list `letter_freq`**.

In [13]:
# pair each letter of the alphabet with its frequency in a new list

## Write your code here ##
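As a sketch of how `zip` behaves, here it is on two short, made-up lists rather than the ones in the exercise (the names `colors`, `votes`, and `paired` are illustrative only). The `list(...)` wrapper is there because, in Python 3, `zip` alone gives back an object that cannot be printed as a list.

```python
colors = ['red', 'green', 'blue']
votes = [5, 0, 2]

# Pair the first entries together, then the second entries, and so on.
paired = list(zip(colors, votes))

print(paired)   # [('red', 5), ('green', 0), ('blue', 2)]
```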


Lastly, it'd be useful to sort our list from most frequent to least frequent in order to use it to break codes.

Python sorts lists with the command

```
{name of new sorted list} = sorted({list to sort}, key = lambda {entry}: {entry}[{number of column to sort by}], reverse = True)
```

Here, I use curly brackets `{}` to denote where you fill in the blanks. There should be no curly brackets in your code. `{entry}` is any name you like for a single entry of the list, and `reverse = True` tells Python to sort from greatest to least (by default, `sorted` goes from least to greatest).

In this case, we want to sort the list `letter_freq` according to the counts that came from `count_list` (which form column 1 of each entry, since the letters from `alphabet` form column 0).
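Here is a small sketch of this kind of sort on a short, made-up list of pairs (the names `paired`, `entry`, and `by_votes` are illustrative only). It sorts by column 1 and passes `reverse=True`, since `sorted` otherwise goes from least to greatest.

```python
paired = [('red', 5), ('green', 0), ('blue', 2)]

# entry[1] picks out column 1 (the number) of each pair to sort by.
least_first = sorted(paired, key=lambda entry: entry[1])
by_votes = sorted(paired, key=lambda entry: entry[1], reverse=True)

print(least_first)   # [('green', 0), ('blue', 2), ('red', 5)]
print(by_votes)      # [('red', 5), ('blue', 2), ('green', 0)]
```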

Below, write the code to perform this sort, name the new sorted list `sorted_letter_freq`, then print the list `sorted_letter_freq`.

In [19]:
# sort in decreasing order of frequency, keeping each frequency paired with its ciphertext letter

## write your code here ##

Now, let's bring it all together. **Below, copy-paste all your code in order, beginning with the code defining the `alphabet` list, and run it in order to perform a frequency count on the message `'TEST'`.**

In [21]:
## Copy-paste your code here ##

Finally, we don't just want code to perform frequency analysis on one message; we want to be able to input any message and have our code perform frequency analysis on that message.

As usual, we'll define a *function* that takes as input a ciphertext `message` and outputs the list of letter frequencies, sorted from greatest to least.
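As a reminder of the general shape of a function, here is a minimal sketch of a much simpler one that counts a single letter. The name `count_letter` and its second input `letter` are hypothetical and are not part of `freq_analysis`; the point is only that the same code can be reused on any message.

```python
def count_letter(message, letter):
    """Return how many times `letter` appears in `message`."""
    total = 0
    for x in message:
        if x == letter:
            total += 1
    return total

# The same function works on any message we give it.
print(count_letter("TEST", "T"))    # 2
print(count_letter("HELLO", "L"))   # 2
```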

**Copy-paste your code from above, changing your code as necessary, in order to define such a function `freq_analysis` below.** Don't forget to end the code with a line that `print`s the frequencies.

In [17]:
def freq_analysis(message):
    ## copy-paste and edit your code here to define the function ##

**Once you think you have a working function, test it below by performing frequency analysis on three messages of your choice.** Note that your messages must be all caps and enclosed in `'apostrophes'` or `"quotation marks"`.
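One possible way to sanity-check your results, offered here as an optional cross-check rather than as part of the exercise, is Python's built-in `collections.Counter`. Note two differences from the list built in this module: `Counter` only reports letters that actually appear, so letters with a count of 0 are missing, and letters with equal counts may come out in a different order.

```python
from collections import Counter

# most_common() lists (letter, count) pairs from most to least frequent.
check = Counter("TEST").most_common()

print(check)   # [('T', 2), ('E', 1), ('S', 1)]
```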

In [22]:
#Tests

## write your code here ##