We will implement a very simple encryption scheme that closely resembles the one-time pad. You have probably seen this method used in movies like [Unknown](http://www.imdb.com/title/tt1401152/?ref_=nm_flmg_act_43). The idea is that you and your counterparty share a book whose words serve as the raw material for a codebook. For this assignment, the shared book is [Metamorphosis, by Franz Kafka](https://storage.googleapis.com/class-notes-181217.appspot.com/pg5200.txt).

Your job is to create a codebook of 2-tuples, each of which maps to a specific word in the given text by the line on which the word appears and the word's position within that line. Because the text is long, many words appear more than once. Before building the codebook, strip out all punctuation and convert everything to lowercase.
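As a minimal sketch of this preprocessing, assuming 1-based line and word numbers and that "punctuation" means the ASCII characters in Python's `string.punctuation`, the codebook can be a dictionary from each normalized word to the list of every position where it occurs. The name `build_codebook` is hypothetical.

```python
import string
from collections import defaultdict

# Assumption: "punctuation" means the ASCII characters in string.punctuation.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def build_codebook(text):
    """Map each normalized word to every (line, word) position where it occurs.

    Hypothetical helper: positions are 1-based, so (1682, 4) means the
    fourth word on line 1682.
    """
    codebook = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Lowercase and delete punctuation before splitting into words.
        words = line.lower().translate(STRIP_PUNCT).split()
        for word_no, word in enumerate(words, start=1):
            codebook[word].append((line_no, word_no))
    return dict(codebook)


sample = "Let us go.\nThe cat, the dog!\n"
print(build_codebook(sample)["the"])  # [(2, 1), (2, 3)]
```

Building the whole dictionary once means each message word is then a single lookup, instead of a rescan of the entire text per word.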

For example, the word **let** appears on line `1682` of the text as the fourth word (counting from left to right). Similarly, the word **us** appears on line `1760` as the fifth word. Line and word numbers are both counted from 1.
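Decoding a single tuple is the reverse lookup: take the given line, normalize it the same way, and pick the requested word. A minimal sketch, assuming 1-based numbering and a list `lines` holding the text split into lines; `word_at` is a hypothetical name.

```python
import string

STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def word_at(lines, line_no, word_no):
    """Return the normalized word at 1-based position (line_no, word_no).

    Hypothetical helper; `lines` is the source text split into lines.
    """
    if line_no < 1 or word_no < 1:
        # Without this guard, 0 or negative values would silently wrap around.
        raise IndexError("line and word numbers start at 1")
    words = lines[line_no - 1].lower().translate(STRIP_PUNCT).split()
    return words[word_no - 1]


lines = ["It was a dark night.", "Let us, then, begin."]
print(word_at(lines, 2, 2))  # us
```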

Thus, if the message you want to send is the following:

    let us not say we met late at the night about the secret
    
Then, one possible valid sequence for that message is the following:
    
    [(1682,4),(1760,5),(1650,2),(304,7),(1190,4),(2327,2),(731,4),(988,4),(1091,6),(958,7),(564,10),(1923,9),(849,2)]

Your counterparty receives the above sequence of tuples and, because she has the same text, can look up the line and word numbers of each tuple to recover the message. Notice that the word **the** appears twice in the above message but is encoded differently each time (`(1091,6)` and `(1923,9)`). This is because re-using codewords (i.e., 2-tuples) destroys the encryption strength: an eavesdropper who sees the same tuple twice learns that a word was repeated. When a word repeats, use a randomized scheme that ensures no 2-tuple appears more than once in a message. If a word occurs more often in the message than in the text (for example, it appears only once in the text but several times in the message), its occurrences cannot all receive unique 2-tuples, and the message should be rejected (i.e., assert against this).
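A minimal sketch of the randomized, no-reuse assignment, assuming a `codebook` dictionary that maps each word to the list of all its positions in the text. The helper name `assign_codewords` and its `rng` parameter are hypothetical; `rng` only exists so the choice can be seeded for testing.

```python
import random
from collections import Counter


def assign_codewords(words, codebook, rng=random):
    """Give every word in `words` a (line, word) tuple, never reusing one.

    Hypothetical helper: `codebook` maps each word to the list of all its
    positions in the text. Raises AssertionError if some word occurs more
    often in the message than in the text.
    """
    needed = Counter(words)
    pools = {}
    for word, count in needed.items():
        positions = codebook.get(word, [])
        assert len(positions) >= count, (
            f"{word!r} occurs {len(positions)} time(s) in the text "
            f"but {count} time(s) in the message"
        )
        # sample() draws without replacement, so the tuples are distinct.
        pools[word] = rng.sample(positions, count)
    # Each pre-drawn tuple is popped exactly once, in message order.
    return [pools[word].pop() for word in words]


codebook = {"the": [(1, 1), (2, 3), (5, 2)], "cat": [(2, 2)]}
print(assign_codewords(["the", "cat", "the"], codebook, random.Random(0)))
```

Drawing all of a word's positions up front with `random.sample`, rather than calling `random.choice` once per occurrence and checking for collisions afterwards, means a valid message is never rejected merely because two random picks happened to coincide.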

Your assignment is to create an encryption function and the corresponding decryption function to implement this scheme. Note that your downloaded text should have 2362 lines and 25186 words in it.
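One way to check the downloaded file against these figures is to count its lines and normalized words. This is a minimal sketch with a hypothetical function name; whether it reproduces 2362 and 25186 exactly depends on choices the assignment leaves open, such as how punctuation-only tokens and a final trailing newline are treated.

```python
import string


def count_lines_and_words(text):
    """Return (line count, word count) after lowercasing and stripping punctuation.

    Hypothetical helper; str.splitlines() does not count an empty
    "line" after a final trailing newline.
    """
    strip_punct = str.maketrans("", "", string.punctuation)
    lines = text.splitlines()
    n_words = sum(len(line.lower().translate(strip_punct).split()) for line in lines)
    return len(lines), n_words


print(count_lines_and_words("One, two.\n-- three\n\nfour\n"))  # (4, 4)
```

To check the real file, call it on the contents of the downloaded `pg5200.txt`.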



import random
import string
from collections import Counter, defaultdict

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def normalize(line):
    '''Lowercase `line`, strip its punctuation, and split it into words.'''
    return line.lower().translate(PUNCT_TABLE).split()


def encrypt_message(message, fname):
    '''
    Given `message`, which is a lowercase string without any punctuation, and `fname`, which is the
    name of a text file source for the codebook, generate a sequence of 2-tuples that
    represents the `(line number, word number)` of each word in the message. Both numbers
    start at 1. The output is a list of 2-tuples for the entire message. Repeated words in
    the message never receive the same 2-tuple.

    :param message: message to encrypt
    :type message: str
    :param fname: filename for source text
    :type fname: str
    :returns: list of 2-tuples
    '''
    assert isinstance(fname, str)
    assert isinstance(message, str)

    # Normalize the message exactly like the source text.
    words = normalize(message)

    # Build the codebook once: word -> every (line, word) position in the text.
    with open(fname) as f:
        lines = f.read().split('\n')
    codebook = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word_no, word in enumerate(normalize(line), start=1):
            codebook[word].append((line_no, word_no))

    # Reject the message if a word occurs in it more often than in the text,
    # since its occurrences could not all get distinct 2-tuples.
    counts = Counter(words)
    for word, needed in counts.items():
        available = len(codebook.get(word, []))
        assert available >= needed, \
            'word {!r} occurs {} time(s) in the text but {} time(s) in the message'.format(
                word, available, needed)

    # Draw each word's positions without replacement, then hand them out
    # in message order, so no 2-tuple is ever reused.
    pools = {word: random.sample(codebook[word], needed) for word, needed in counts.items()}
    return [pools[word].pop() for word in words]


def decrypt_message(inlist, fname):
    '''
    Given `inlist`, which is a list of 2-tuples, and `fname`, which is the
    name of a text file source for the codebook, return the decrypted message.

    :param inlist: list of 1-based `(line number, word number)` 2-tuples to decrypt
    :type inlist: list
    :param fname: filename for source text
    :type fname: str
    :returns: decrypted message, words separated by single spaces
    '''
    assert isinstance(fname, str)
    assert isinstance(inlist, list)

    with open(fname) as f:
        lines = f.read().split('\n')

    words = []
    for line_no, word_no in inlist:
        assert line_no >= 1 and word_no >= 1, 'line and word numbers start at 1'
        # Index directly into the line, normalized the same way as in encrypt_message.
        words.append(normalize(lines[line_no - 1])[word_no - 1])
    return ' '.join(words)