**Q1**. Ciphers and Statistics

A Caesar cipher is a very simple method of encoding and decoding data. The cipher replaces each letter with the letter $k$ places later in the alphabet. For example, if the offset is 3, we replace `a` with `d`, `b` with `e`, etc. The cipher wraps around, so we replace `y` with `b`, `z` with `c`, and so on. Punctuation, spaces and numbers are left unchanged.
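The wrap-around rule is modular arithmetic on a letter's position in the alphabet. A minimal sketch for a single lowercase letter, assuming ASCII input (the helper name `shift_char` is illustrative and not part of the assignment):

```python
def shift_char(ch, k):
    """Shift one lowercase ASCII letter by k places, wrapping past 'z'."""
    # Hypothetical helper for illustration only.
    position = ord(ch) - ord('a')          # 0 for 'a', ..., 25 for 'z'
    return chr((position + k) % 26 + ord('a'))

print(shift_char('a', 3))  # d
print(shift_char('y', 3))  # b
```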

- (25 points) Write a function `encode` that takes as arguments a string and an integer offset and returns the encoded cipher.

In [2]:
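import string

def encode(toBeEncoded, offset):
    """Return toBeEncoded with every letter shifted forward by offset places."""
    offset = offset % 26  # normalize so large or negative offsets wrap around
    lc = string.ascii_lowercase
    uc = string.ascii_uppercase
    # Map each letter to the one `offset` places later; all other characters pass through.
    table = str.maketrans(lc + uc, lc[offset:] + lc[:offset] + uc[offset:] + uc[:offset])
    return toBeEncoded.translate(table)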

In [36]:
encode ("hello worldz", 2)

'jgnnq yqtnfb'

- (5 points) Write a function `decode` that takes as arguments a cipher and an integer offset and returns the decoded string. 

In [4]:
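def decode(toBeDecoded, offset):
    """Return toBeDecoded with every letter shifted back by offset places."""
    offset = offset % 26  # normalize so large or negative offsets wrap around
    lc = string.ascii_lowercase
    uc = string.ascii_uppercase
    # Inverse of the encode table: map each shifted letter back to the original.
    table = str.maketrans(lc[offset:] + lc[:offset] + uc[offset:] + uc[:offset], lc + uc)
    return toBeDecoded.translate(table)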

In [5]:
decode ('jgnnq yqtnfb', 2)

'hello worldz'
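Decoding with offset $k$ is the same as encoding with offset $-k$, so `decode` could also be a one-line wrapper around `encode`. A minimal, self-contained sketch of that alternative with a round-trip check (the name `decode_via_encode` is hypothetical; `encode` is restated here so the block runs on its own):

```python
import string

def encode(text, offset):
    offset %= 26  # in Python, -2 % 26 == 24, so negative offsets wrap correctly
    lc, uc = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lc + uc, lc[offset:] + lc[:offset] + uc[offset:] + uc[:offset])
    return text.translate(table)

def decode_via_encode(cipher, offset):
    # Shifting back by `offset` equals shifting forward by -offset.
    return encode(cipher, -offset)

# Round trip: decoding an encoded string returns the original for any offset.
sample = "Hello, World! 123"
for k in range(-30, 31):
    assert decode_via_encode(encode(sample, k), k) == sample

print(decode_via_encode('jgnnq yqtnfb', 2))  # hello worldz
```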

- (50 points) Write a function `auto_decode` that takes as argument a cipher and uses a statistical method to guess the optimal offset to decode the cipher, assuming the original string is in English which has the following letter frequency:

```python
freq = {
 'a': 0.08167,
 'b': 0.01492,
 'c': 0.02782,
 'd': 0.04253,
 'e': 0.12702,
 'f': 0.02228,
 'g': 0.02015,
 'h': 0.06094,
 'i': 0.06966,
 'j': 0.00153,
 'k': 0.00772,
 'l': 0.04025,
 'm': 0.02406,
 'n': 0.06749,
 'o': 0.07507,
 'p': 0.01929,
 'q': 0.00095,
 'r': 0.05987,
 's': 0.06327,
 't': 0.09056,
 'u': 0.02758,
 'v': 0.00978,
 'w': 0.0236,
 'x': 0.0015,
 'y': 0.01974,
 'z': 0.00074
}
```

In [6]:
def count_letters(myString):
    """count_letters takes a string argument and returns the number of letters in it, ignoring digits, whitespace and punctuation"""
    
    # Count only a-z and A-Z so that digits and unlisted punctuation are never treated as letters.
    return sum(1 for char in myString if char in string.ascii_letters)

In [7]:
def find_ratio(char, myString):
    """find_ratio returns the ratio of a particular lowercase letter to the total number of letters in a string (case-insensitive)"""
    
    totalLetterCount = count_letters(myString)
    if totalLetterCount == 0:
        return 0.0  # no letters at all, so avoid dividing by zero
    myString = myString.lower()
    charCount = myString.count(char)
    ratio = charCount/totalLetterCount
    return ratio

In [8]:
def ratio_list(myString):
    """ratio_list returns the relative frequency of each letter a-z in a given string, in alphabetical order"""
    
    frequencies = []
    for char in string.ascii_lowercase:
        frequencies.append(find_ratio(char, myString))
    return frequencies

In [59]:
def offset_score(listOfFrequencies):
    """offset_score takes a list of 26 letter frequencies (a-z) and returns a list of 26 scores, one per letter.
    Each score is the squared difference between the observed frequency and the English frequency of that letter
    in the table below. The sum of the scores measures how far the string's letter distribution is from English;
    a sum of approximately 0 indicates a distribution similar to the frequency table below."""
    freq = {
        'a': 0.08167,
        'b': 0.01492,
        'c': 0.02782,
        'd': 0.04253,
        'e': 0.12702,
        'f': 0.02228,
        'g': 0.02015,
        'h': 0.06094,
        'i': 0.06966,
        'j': 0.00153,
        'k': 0.00772,
        'l': 0.04025,
        'm': 0.02406,
        'n': 0.06749,
        'o': 0.07507,
        'p': 0.01929,
        'q': 0.00095,
        'r': 0.05987,
        's': 0.06327,
        't': 0.09056,
        'u': 0.02758,
        'v': 0.00978,
        'w': 0.0236,
        'x': 0.0015,
        'y': 0.01974,
        'z': 0.00074
    }
    freqList = [freq[x] for x in string.ascii_lowercase]
    listOfScores = []
    for i in range(26):
        listOfScores.append((listOfFrequencies[i] - freqList[i])**2)
        
    return listOfScores

In [83]:
import math
def best_offset(myString):
    """best_offset takes a string argument, decodes it with every offset from 0 to 25, and uses the
    offset_score function to score each candidate against the English frequency table. This gives
    26 total scores, one for every possible offset. Returns the offset with the lowest total score."""
    
    offsetScoreList = []
    for i in range(26):
        offsetString = decode(myString, i)
        individualScore = math.fsum(offset_score(ratio_list(offsetString)))
        offsetScoreList.append(individualScore)
    
    return offsetScoreList.index(min(offsetScoreList))
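In symbols, the selection rule that `best_offset` implements can be written as follows. The notation is introduced here for illustration: $f_k(c)$ is the relative frequency of letter $c$ in the string decoded with offset $k$, $p(c)$ is the English frequency of $c$ from the table, and $S(k)$ is the total score for offset $k$.

$$
S(k) = \sum_{c \in \{a, \dots, z\}} \bigl(f_k(c) - p(c)\bigr)^2,
\qquad
\hat{k} = \arg\min_{0 \le k \le 25} S(k)
$$

If several offsets tie for the minimum, `list.index` returns the smallest one.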

In [94]:
def auto_decode(encryptedString):
    """auto_decode takes an encrypted string, finds the offset that minimizes the sum of squared differences 
    from the English letter frequency table, and prints the decoded text"""
    
    print(decode(encryptedString, best_offset(encryptedString)))
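Because a shift only permutes the letter counts, the ciphertext can be counted once and the counts re-indexed for each candidate offset, instead of decoding the string 26 times. The sketch below is one possible compact variant of the same sum-of-squared-differences score, not the graded solution; the names `ENGLISH` and `guess_offset` are hypothetical.

```python
import string
from collections import Counter

# English letter frequencies from the table above, in a-z order.
ENGLISH = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
           0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
           0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
           0.00978, 0.0236, 0.0015, 0.01974, 0.00074]

def guess_offset(cipher):
    """Return the offset whose decoding best matches English letter frequencies."""
    lc = string.ascii_lowercase
    # Count each ciphertext letter once, ignoring case and non-letters.
    counts = Counter(ch for ch in cipher.lower() if ch in lc)
    total = sum(counts.values())
    if total == 0:
        return 0  # no letters, so every offset scores the same

    def score(k):
        # Under offset k, plain letter i appears as cipher letter (i + k) % 26.
        return sum((counts[lc[(i + k) % 26]] / total - ENGLISH[i]) ** 2
                   for i in range(26))

    # min() returns the smallest offset among ties, like list.index(min(...)).
    return min(range(26), key=score)
```

Returning the offset rather than printing the decoded text lets the caller decide what to do with it, for example `decode(cipher, guess_offset(cipher))`.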
    

- (20 points) Encode the following nursery rhyme using a random offset from 10 to 20, then recover the original using `auto_decode`:

```text
Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full;
One for the master,
And one for the dame,
And one for the little boy
Who lives down the lane.
```

In [98]:
plsEncode = """Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full;
One for the master,
And one for the dame,
And one for the little boy
Who lives down the lane."""
message = encode(plsEncode, 19)
print("Encrypted message:\n", message, "\n\nDecrypted message:")
auto_decode(message)

Encrypted message:
 Utt, utt, uetvd laxxi,
Atox rhn tgr phhe?
Rxl, lbk, rxl, lbk,
Makxx utzl ynee;
Hgx yhk max ftlmxk,
Tgw hgx yhk max wtfx,
Tgw hgx yhk max ebmmex uhr
Pah eboxl whpg max etgx. 

Decrypted message:
Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full;
One for the master,
And one for the dame,
And one for the little boy
Who lives down the lane.
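The cell above uses a fixed offset of 19. To draw the offset at random from 10 to 20 as the task asks, one option is `random.randint`, which includes both endpoints. A minimal sketch:

```python
import random

# random.randint(a, b) returns an integer N with a <= N <= b,
# so this yields one of 10, 11, ..., 20.
offset = random.randint(10, 20)
print("Chosen offset:", offset)
```

Passing this `offset` to `encode` in place of the literal 19 changes the encrypted text from run to run, while `auto_decode` should still recover the rhyme as long as the frequency score picks the right offset.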
