# Case Study 05 - Caesar Cipher  <img src="https://www.cryptomuseum.com/crypto/caesar/img/302021/000/small.jpg" width="150" align="right">
<hr>

## Introduction

Encryption is the process of obscuring information to make it unreadable without special knowledge. For centuries, people have devised schemes to encrypt messages - some better than others - but the advent of the computer and the Internet revolutionized the field. These days, it's hard not to encounter some sort of encryption, whether you are buying something online or logging into a shared computer system. Encryption lets you share information with other trusted people, without fear of disclosure.

A `cipher` is an algorithm for performing encryption (and the reverse, decryption). The original information is called plaintext. After it is encrypted, it is called ciphertext. The ciphertext message contains all the information of the plaintext message, but it is not in a format readable by a human or computer without the proper mechanism to decrypt it; it should resemble random gibberish to those for whom it is not intended.

A cipher usually depends on a piece of auxiliary information, called a key. The key is incorporated into the encryption process; the same plaintext encrypted with two different keys should have two different ciphertexts. Without the key, it should be difficult to decrypt the resulting ciphertext into readable plaintext.

This assignment will deal with a well-known (though not very secure) encryption method called the `Caesar cipher`. Some vocabulary to get you started on this problem:

- **Encryption** - the process of obscuring or encoding messages to make them unreadable until they are decrypted
- **Decryption** - making encrypted messages readable again by decoding them
- **Cipher** - algorithm for performing encryption and decryption
- **Plaintext** - the original message
- **Ciphertext** - the encrypted message. Note: a ciphertext still contains all of the original message information, even if it looks like gibberish.

The idea of the `Caesar Cipher` is to pick an integer and shift every letter of your message by that integer. In other words, suppose the shift is k. Then, all instances of the i-th letter of the alphabet that appear in the plaintext should become the (i+k)-th letter of the alphabet in the ciphertext. You will need to be careful with the case in which i + k > 26 (the length of the English alphabet that we will use for this program). Here is what the whole alphabet looks like shifted three spots to the right:

Original:  a b c d e f g h i j k l m n o p q r s t u v w x y z  
 3-shift:  d e f g h i j k l m n o p q r s t u v w x y z a b c

Using the above key, we can quickly translate the message "happy" to "kdssb" (note how the 3-shifted alphabet wraps around at the end, so x -> a, y -> b, and z -> c).
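
The wrap-around can be written with modular arithmetic. The snippet below is a minimal sketch, not part of the assignment code: `shift_letter` is a hypothetical helper that handles lowercase letters only, and it counts positions from 0 (so `a` is 0 and `z` is 25), which lets `% 26` take care of the case where the shifted position runs past the end of the alphabet.

```python
import string

def shift_letter(letter, k):
    """Shift one lowercase letter k places, wrapping around after 'z'."""
    alphabet = string.ascii_lowercase
    index = alphabet.index(letter)        # 0-based position: 'a' -> 0, 'z' -> 25
    return alphabet[(index + k) % 26]     # modulo 26 handles the wrap-around

print(shift_letter('x', 3))                          # a
print(''.join(shift_letter(c, 3) for c in 'happy'))  # kdssb
```
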

We will treat uppercase and lowercase letters individually, so that uppercase letters are always mapped to an uppercase letter, and lowercase letters are always mapped to a lowercase letter. If an uppercase letter maps to "A", then the same lowercase letter should map to "a". Punctuation and spaces should be retained and not changed. For example, a plaintext message with a comma should have a corresponding ciphertext with a comma in the same position.  

| plaintext | shift | ciphertext |
| --- |---|--- |
| 'abcdef' | 2 | 'cdefgh' |
| 'Hello, World!' | 5 | 'Mjqqt, Btwqi!' |
| '' | any value | '' |


## Getting Started

To get started, you need to put the following accompanying text files in your working directory. 

- `words.txt` - a file containing valid English words.
- `story.txt` - a file containing an encrypted message that you will have to decode.

First, we implement three helper functions, `load_words`, `is_word`, and `get_story_string`, with the following docstring specifications. Then, we will use these to implement three classes.

In [None]:
import string

def load_words(file_name):
    '''
    file_name (string): the name of the file containing 
    the list of words to load    
    
    Returns: a list of valid words. Words are strings of lowercase letters.
    
    Depending on the size of the word list, this function may
    take a while to finish.
    '''
    print('Loading word list from file...')
    # in_file: file
    in_file = open(file_name, 'r')
    # line: string
    line = in_file.readline()
    # word_list: list of strings
    word_list = line.split()
    print('  ', len(word_list), 'words loaded.')
    in_file.close()
    return word_list


def is_word(word_list, word):
    '''
    Determines if word is a valid word, ignoring
    capitalization and punctuation

    word_list (list): list of words in the dictionary.
    word (string): a possible word.
    
    Returns: True if word is in word_list, False otherwise

    Example:
    >>> is_word(word_list, 'bat') returns
    True
    >>> is_word(word_list, 'asdf') returns
    False
    '''
    word = word.lower()
    # The backslash is escaped explicitly: "\:" is an invalid escape sequence
    word = word.strip(" !@#$%^&*()-_+={}[]|\\:;'<>?,./\"")
    return word in word_list


def get_story_string():
    """
    Returns: a joke in encrypted text.
    """
    f = open("story.txt", "r")
    story = str(f.read())
    f.close()
    return story

WORDLIST_FILENAME = 'words.txt'

## Problem 1 - Build the Shift Dictionary and Apply Shift

We will have a `Message` class with two subclasses, `PlaintextMessage` and `CiphertextMessage`. The `Message` class contains methods that can be used to apply a cipher to a string, either to encrypt or to decrypt a message (for Caesar ciphers, both are the same action with a different shift). 

In the next two questions, you will fill in the methods of the Message class according to the specifications in the docstrings. The methods in the Message class already filled in are:

- `__init__(self, text)`
- The getter method `get_message_text(self)`
- The getter method `get_valid_words(self)`. Notice that this one returns a copy of `self.valid_words` to prevent someone from mutating the original list.

In this problem, you will fill in two methods:

1. Fill in the `build_shift_dict(self, shift)` method of the `Message` class. Be sure that your dictionary includes both lower and upper case letters, but that the shifted character for a lower case letter and its uppercase version are lower and upper case instances of the same letter. What this means is that if the original letter is "a" and its shifted value is "c", the letter "A" should shift to the letter "C".

For the ordering of characters of the English alphabet, we will be following the letter ordering displayed by `string.ascii_lowercase` and `string.ascii_uppercase`.

A reminder from the introduction: characters such as spaces, commas, periods, and exclamation points are not encrypted by this cipher. This covers all the characters in `string.punctuation`, plus the space (`' '`) and the numerical characters (0 - 9) found in `string.digits`.

2. Fill in the `apply_shift(self, shift)` method of the `Message` class. You may find it easier to use `build_shift_dict(self, shift)`. Remember that spaces and punctuation should not be changed by the cipher.
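
For comparison, Python's built-in `str.maketrans` and `str.translate` express the same dictionary-then-apply idea in a few lines. This is only a sketch of an alternative, not the approach the assignment asks for; `caesar_translate` is a hypothetical standalone function (it is not a method of `Message`), and it assumes `0 <= shift < 26`. It reproduces the example table from the introduction.

```python
import string

def caesar_translate(text, shift):
    """Shift letters with a translation table; other characters pass through."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # Map each letter to the letter `shift` places later, keeping its case
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)

print(caesar_translate('abcdef', 2))          # cdefgh
print(caesar_translate('Hello, World!', 5))   # Mjqqt, Btwqi!
```

Characters that are missing from the table (spaces, digits, punctuation) are left untouched by `translate`, which matches the requirement that they are not changed by the cipher.
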


In [None]:
class Message(object):
    
    def __init__(self, text):
        '''
        Initializes a Message object
                
        text (string): the message's text

        a Message object has two attributes:
            self.message_text (string, determined by input text)
            self.valid_words (list, determined using helper function load_words)
        '''
        self.message_text = text
        self.valid_words = load_words(WORDLIST_FILENAME)

    
    def get_message_text(self):
        '''
        Used to safely access self.message_text outside of the class
        
        Returns: self.message_text
        '''
        return self.message_text

    
    def get_valid_words(self):
        '''
        Used to safely access a copy of self.valid_words outside of the class
        
        Returns: a COPY of self.valid_words
        '''
        return self.valid_words[:]
        
    
    def build_shift_dict(self, shift):
        '''
        Creates a dictionary that can be used to apply a cipher to a letter.
        The dictionary maps every uppercase and lowercase letter to a
        character shifted down the alphabet by the input shift. The dictionary
        should have 52 keys of all the uppercase letters and all the lowercase
        letters only.        
        
        shift (integer): the amount by which to shift every letter of the 
        alphabet. 0 <= shift < 26

        Returns: a dictionary mapping a letter (string) to 
                 another letter (string). 
        '''
        lower_keys = list(string.ascii_lowercase)
        lower_values = list(string.ascii_lowercase)
        shift_lower_values = lower_values[shift:] + lower_values[:shift]
        
        upper_keys = list(string.ascii_uppercase)                 
        upper_values = list(string.ascii_uppercase)
        upper_shift_values = upper_values[shift:] + upper_values[:shift]

        full_keys = lower_keys + upper_keys
        full_values = shift_lower_values + upper_shift_values

        self.shift_dict = dict(zip(full_keys, full_values))
        return self.shift_dict


    def apply_shift(self, shift):
        '''
        Applies the Caesar Cipher to self.message_text with the input shift.
        Creates a new string that is self.message_text shifted down the
        alphabet by some number of characters determined by the input shift        
        
        shift (integer): the shift with which to encrypt the message.
        0 <= shift < 26

        Returns: the message text (string) in which every character is shifted
        down the alphabet by the input shift
        '''
        # Build the mapping once instead of rebuilding it for every character
        shift_dict = self.build_shift_dict(shift)
        # Characters without a mapping (spaces, digits, punctuation) are kept as-is
        return ''.join(shift_dict.get(char, char) for char in self.message_text)


# Example Test Case:

# apply_shift on "TESTING.... so many words we are testing out your code: last one" with a shift of 3
# Output:
# WHVWLQJ.... vr pdqb zrugv zh duh whvwlqj rxw brxu frgh: odvw rqh

## Problem 2 - PlaintextMessage

`PlaintextMessage` is a subclass of `Message` and has methods to encode a string using a specified shift value. Our class will always create an encoded version of the message, and will have methods for changing the encoding.

Implement the methods in the class `PlaintextMessage` according to the docstring specifications below. The methods you should fill in are:

- `__init__(self, text, shift)`: Use the parent class constructor to make your code more concise.  
- The getter method `get_shift(self)`  
- The getter method `get_encrypting_dict(self)`: This should return a COPY of self.encrypting_dict to prevent someone from mutating the original dictionary.  
- The getter method `get_message_text_encrypted(self)`  
- `change_shift(self, shift)`: Think about what other methods you can use to make this easier. It shouldn’t take more than a couple lines of code. 
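
The reason for returning a copy from a getter can be seen in a small sketch. `ShiftHolder` and its method names are hypothetical and exist only for this illustration; they are not part of the assignment classes.

```python
class ShiftHolder:
    """Hypothetical stand-in for a class that stores a mapping."""

    def __init__(self):
        self._mapping = {'a': 'c', 'b': 'd'}

    def get_mapping_unsafe(self):
        return self._mapping          # caller receives the internal dict itself

    def get_mapping(self):
        return self._mapping.copy()   # caller receives an independent shallow copy

holder = ShiftHolder()
holder.get_mapping()['a'] = '?'         # changes only the copy
print(holder.get_mapping()['a'])        # c
holder.get_mapping_unsafe()['a'] = '?'  # changes the object's internal state
print(holder.get_mapping()['a'])        # ?
```

A shallow copy is enough here because the keys and values are immutable strings.
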


In [None]:
class PlaintextMessage(Message):
    def __init__(self, text, shift):
        '''
        Initializes a PlaintextMessage object        
        
        text (string): the message's text
        shift (integer): the shift associated with this message

        A PlaintextMessage object inherits from Message and has five attributes:
            self.message_text (string, determined by input text)
            self.valid_words (list, determined using helper function load_words)
            self.shift (integer, determined by input shift)
            self.encrypting_dict (dictionary, built using shift)
            self.message_text_encrypted (string, created using shift)

        Hint: consider using the parent class constructor so less 
        code is repeated
        '''
        # Let the parent constructor set message_text and valid_words
        super(PlaintextMessage, self).__init__(text)
        self.shift = shift
        self.encrypting_dict = self.build_shift_dict(shift)
        self.message_text_encrypted = self.apply_shift(shift)
    
    def get_shift(self):
        '''
        Used to safely access self.shift outside of the class
        
        Returns: self.shift
        '''
        return self.shift
    
    def get_encrypting_dict(self):
        '''
        Used to safely access a copy of self.encrypting_dict outside of the class
        
        Returns: a COPY of self.encrypting_dict
        '''
        encrypting_dict_copy = self.encrypting_dict.copy()
        return encrypting_dict_copy
    
    def get_message_text_encrypted(self):
        '''
        Used to safely access self.message_text_encrypted outside of the class
        
        Returns: self.message_text_encrypted
        '''
        return self.message_text_encrypted

    def change_shift(self, shift):
        '''
        Changes self.shift of the PlaintextMessage and updates other 
        attributes determined by shift (i.e. self.encrypting_dict and 
        message_text_encrypted).
        
        shift (integer): the new shift that should be associated with this message.
        0 <= shift < 26

        Returns: nothing
        '''
        self.shift = shift
        self.encrypting_dict = super(PlaintextMessage, self).build_shift_dict(shift)
        self.message_text_encrypted = super(PlaintextMessage, self).apply_shift(shift)
        
# Example Test Case 1:
# Testing get_shift with message "1.hello!!"
# Output:
# 7

# Example Test Case 2:
# Testing get_encrypting_dict with message "1.hello!!"
# Output:
# Successfully made a copy of encrypting_dict.

# Example Test Case 3:
# Testing get_message_text_encrypted with message "1.hello!!"
# Output:
# 1.jgnnq!!

## Problem 3 - CiphertextMessage

Given an encrypted message, if you know the shift used to encode the message, decoding it is trivial. If message is the encrypted message, and `s` is the shift used to encrypt the message, then `apply_shift(26-s)` on a message gives you the original plaintext message. Do you see why?
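
One way to convince yourself is to check the arithmetic on letter positions. The sketch below uses a hypothetical helper, `round_trip_index`, and counts positions from 0; it shows that shifting by `s` and then by `26 - s` moves every position by exactly 26, which is a full trip around the alphabet.

```python
ALPHABET_SIZE = 26

def round_trip_index(i, s):
    """Shift position i by s, then by 26 - s, wrapping around each time."""
    encrypted_index = (i + s) % ALPHABET_SIZE
    return (encrypted_index + (ALPHABET_SIZE - s)) % ALPHABET_SIZE

# (i + s) + (26 - s) = i + 26, which is position i again modulo 26
for s in range(ALPHABET_SIZE):
    for i in range(ALPHABET_SIZE):
        assert round_trip_index(i, s) == i

print('every shift s is undone by the shift 26 - s')
```
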

The problem, of course, is that you don’t know the shift. But our encryption method only has 26 distinct possible values for the shift! We know English is the main language of the messages, so if we can write a program that tries each shift and picks the one that maximizes the number of English words in the decoded message, we can break the cipher! A simple indication of whether or not the correct shift has been found is if most of the words obtained after a shift are valid words. Note that this only means that most of the words obtained are actual words. It is possible to have a message that can be decoded by two separate shifts into different sets of words. While there are various strategies for deciding between ambiguous decryptions, for this problem we are only looking for a simple solution.
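
The ambiguity mentioned above can be shown with a small sketch. Everything here is an assumption made for illustration: `shift_text` and `score_shifts` are hypothetical helpers that handle lowercase text only, and `tiny_words` stands in for the real word list. The words "jolly" and "cheer" happen to be Caesar shifts of each other, so both shifts score equally well.

```python
import string

def shift_text(text, shift):
    """Hypothetical helper: Caesar-shift lowercase letters, keep the rest."""
    alphabet = string.ascii_lowercase
    return ''.join(
        alphabet[(alphabet.index(c) + shift) % 26] if c in alphabet else c
        for c in text
    )

def score_shifts(ciphertext, words):
    """Return {shift: number of valid words} for all 26 shifts."""
    scores = {}
    for shift in range(26):
        candidate = shift_text(ciphertext, shift)
        scores[shift] = sum(word in words for word in candidate.split())
    return scores

tiny_words = {'cheer', 'jolly'}   # hypothetical stand-in for words.txt
scores = score_shifts('jolly', tiny_words)
best = max(scores.values())
print([s for s, n in scores.items() if n == best])   # [0, 19]
print(shift_text('jolly', 19))                       # cheer
```

Note that the count is computed separately for each shift; carrying a running total across shifts would make later shifts look better than they are.
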

Fill in the methods in the class `CiphertextMessage` according to the docstring specifications. The methods you should fill in are:

- `__init__(self, text)`: Use the parent class constructor to make your code more concise.  
- `decrypt_message(self)`: You may find the helper function `is_word(wordlist, word)` and the string method `split()` useful. Note that `is_word` will ignore punctuation and other special characters when considering whether a word is valid.
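
To see how `split()` and punctuation stripping work together, here is a minimal sketch. `looks_like_word` is a hypothetical function that mirrors the idea of `is_word`; as an assumption for brevity it strips the characters in `string.punctuation` rather than the exact character list used by `is_word`, and `tiny_words` stands in for the real word list.

```python
import string

tiny_words = ['hello', 'world']   # hypothetical stand-in for the word list

def looks_like_word(words, token):
    """Lowercase the token, strip surrounding punctuation, then look it up."""
    cleaned = token.lower().strip(string.punctuation + ' ')
    return cleaned in words

tokens = 'Hello, wxyz world!'.split()
print(tokens)                                            # ['Hello,', 'wxyz', 'world!']
print([looks_like_word(tiny_words, t) for t in tokens])  # [True, False, True]
```

`strip` only removes characters from the two ends of a token, so punctuation inside a word (such as an apostrophe) is kept.
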

In [None]:
class CiphertextMessage(Message):
    def __init__(self, text):
        '''
        Initializes a CiphertextMessage object
                
        text (string): the message's text

        a CiphertextMessage object has two attributes:
            self.message_text (string, determined by input text)
            self.valid_words (list, determined using helper function load_words)
        '''
        # Reuse the parent constructor, which sets message_text and valid_words
        super(CiphertextMessage, self).__init__(text)

    def decrypt_message(self):
        '''
        Decrypt self.message_text by trying every possible shift value
        and find the "best" one. We will define "best" as the shift that
        creates the maximum number of real words when we use apply_shift(shift)
        on the message text. If s is the original shift value used to encrypt
        the message, then we would expect 26 - s to be the best shift value 
        for decrypting it.

        Note: if multiple shifts are equally good such that they all create 
        the maximum number of valid words, you may choose any of those shifts
        (and their corresponding decrypted messages) to return

        Returns: a tuple of the best shift value used to decrypt the message
        and the decrypted message text using that shift value
        '''
        best_shift = 0
        max_count = -1
        decrypted_msg = self.message_text
        for shift in range(26):
            candidate = self.apply_shift(shift)
            # Count the valid words for this shift only (reset on every shift)
            word_count = 0
            for word in candidate.split():
                if is_word(self.valid_words, word):
                    word_count += 1
            # Keep the first shift that yields the highest count
            if word_count > max_count:
                max_count = word_count
                best_shift = shift
                decrypted_msg = candidate

        return (best_shift, decrypted_msg)

In [None]:
#Example test case (PlaintextMessage)
plaintext = PlaintextMessage('hello', 2)
print('Expected Output: jgnnq')
print('Actual Output:', plaintext.get_message_text_encrypted())

In [None]:
#Example test case (CiphertextMessage)
ciphertext = CiphertextMessage('jgnnq')
print('Expected Output:', (24, 'hello'))
print('Actual Output:', ciphertext.decrypt_message())

## Problem 4 - Decrypt a Story

Now that you have all the pieces to the puzzle, use them to decode the file story.txt. The helper function `get_story_string()` returns the encrypted version of the story as a string. Create a `CiphertextMessage` object using the story string and use `decrypt_message` to return the appropriate shift value and unencrypted story string.

In [None]:
def decrypt_story():
    joke_code = CiphertextMessage(get_story_string())
    return joke_code.decrypt_message()


<div class="alert alert-info" role="alert" style="margin-top: 1px">

### This notebook was created by [ALIREZA RAFIYI](www.linkedin.com/in/alireza-rafiyi) and last updated in June 2020.  

The code above is my solution to problem set 5 of the MIT online course [Introduction to Computer Science and Programming in Python](https://www.edx.org/course/introduction-to-computer-science-and-programming-7), offered through the [edX](www.edx.org) platform. For better presentation, the problem statement and the code are presented in a Jupyter notebook rather than in '.py' modules.