# CS1102 Lab: Python Jupyter Lab on Security

## Introduction

In this lab, your goal is to recover a secret message from its encryption in the file `topsecret.txt`. You will accomplish this using a text-based programming language called **Python**, which is very popular among hackers and scientists. You will do the lab in an interactive coding environment called **Jupyter Lab**, which allows you to run code in different languages and typeset your notes and equations as well. You will learn how to use *conditionals*, *loops*, *lists*, *Unicode*, and *modules* to *encrypt* and *decrypt* messages and to launch a *dictionary attack*. This covers the basic ideas of *cryptology* (*cryptography* and *cryptanalysis*) in **cyber-security**.

## Task 1: Encrypt a character

Each character of the message is encrypted using the following **Python** code. Select the following cell by clicking on it, and run it by pressing Shift+Enter. You should see `In [ ]` on the left change to `In [*]`, meaning that the input is being evaluated. After evaluation, `In [1]` (the first evaluated input) should appear. A cell whose last line produces a value also shows `Out [1]` (the first evaluated output).

In [None]:
def encrypt_character(char, key):
    """
    Given a character and a key, encrypt the character by the key 
    using transposition cipher, and return the encrypted character.

    Parameters
    ----------
    char : character
        The character to be encrypted
    key : int
        The secret key
    Returns
    -------
    The encrypted character : character
    """
    char_int_code = ord(char)
    shifted_char_int_code = (char_int_code + key) % 1114112
    encrypted_char = chr(shifted_char_int_code)
    return encrypted_char

The code may look a bit scary at first. Don't worry, it is easy to use. 
To encrypt a letter 'H' using the key 5, for example, we write `encrypt_character('H', 5)`. The returned value (encryption) can be printed to an output cell using the `print` function. Run the following cell to see this in action.

In [None]:
print(encrypt_character('H', 5))
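
To make the three steps inside `encrypt_character` concrete, the following sketch repeats them by hand for the letter 'H' and the key 5. The variable names `code`, `shifted` and `encrypted` are chosen only for this illustration.

```python
# Repeat the steps of encrypt_character by hand for 'H' and the key 5.
code = ord('H')                    # Unicode code point of 'H', i.e. 72
shifted = (code + 5) % 1114112     # shift by the key, wrapping around if needed
encrypted = chr(shifted)           # convert the integer back to a character
print(code, shifted, encrypted)    # prints: 72 77 M
```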

In the following cell, write and run the code that prints the encryption of 
- 'H' by the key 6;
- 'I' by the key 5;
- '🥑' by the key 5;
- '🥖' by the key -5.

In [None]:
# Task 1 [Write your code below to perform the tasks described above
#         Run your code by Shift+Enter]


## Task 2: Decrypt a character

You will write a Python function called `decrypt_character` to decrypt an encrypted character. It should satisfy the following equality
$$\operatorname{decrypt\_character}(\operatorname{encrypt\_character}(a,k),k) = a \quad \forall \;\text{character }a, \text{key }k$$
To put it simply, a legitimate user who possesses the key should be able to recover the original character from its encryption.
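
The equality can be checked on plain integers before any function is written. The sketch below is only an illustration under the assumption that characters are represented by their code points. The names `N`, `x`, `k` and `y` are made up for this example.

```python
N = 1114112  # the modulus used by encrypt_character

for x in [0, 72, N - 1]:          # a few sample code points
    for k in [-1, 0, 5, N]:       # a few sample keys
        y = (x + k) % N           # shift forward, as in encryption
        assert (y - k) % N == x   # shifting back by the same key recovers x
print("round trip holds for all samples")
```

Because `%` in Python returns a value between 0 and `N - 1` when `N` is positive, the check also passes for the negative key and for the code point at the very end of the range.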

The following cell contains a skeleton of the function. Read the descriptions below the cell and follow each step to complete the code.

In [None]:
def decrypt_character(encrypted_char, key):
    """
    Given a character encrypted using transposition cipher,
    return the decrypted character.

    Parameters
    ----------
    encrypted_char : character
        The character to be decrypted
    key : int
        The secret key
    Returns
    -------
    The decrypted character : character
    """
    
    # Step 1 [Write your code below]

    # Step 2 [Write your code below]
    
    # Step 3 [Write your code below]
    
    # Last Step [Write your code below]
    
    # After completing the above code, run it by Shift+Enter
    

The keyword `def` defines a function named `decrypt_character`. The function takes two input arguments, whose values are stored in the variables named `encrypted_char` and `key`. The colon signals the beginning of the body of the function definition. Every line of the body must be indented because, similar to Scratch, Python uses indentation (conventionally 4 spaces per level, typed directly or with the Tab key) to organize consecutive lines of code into blocks (instead of `{}` as in JavaScript). To learn more about a keyword in Python, you can use the `help` function. For example, run the following cell to learn more about function definitions.

In [None]:
help('def')

You can also use the help function to see the documentation of any function defined before. Run the following cell to see the documentation of the `decrypt_character` function. 

In [None]:
help(decrypt_character)

Note that the body of the decryption function is currently empty, with only a description enclosed by `"""`, and some lines of comments (pieces of code that are not evaluated) preceded by `#`. Follow the procedure below to fill in the missing code in the decryption function.
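
As a small illustration of these pieces, the sketch below defines a hypothetical function `double` (not part of the lab) with a one-line description, a comment, and a `return` statement. The description is what `help` displays, and it is also available as the `__doc__` attribute.

```python
def double(number):
    """Return twice the given number."""
    # This comment is ignored when the function runs.
    return 2 * number

print(double(21))        # prints: 42
print(double.__doc__)    # prints: Return twice the given number.
```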

### Step 1
First, convert the encrypted character to its Unicode code point (an integer) using the function `ord`, and then assign the result to a variable named `encrypted_char_int_code`.

In [None]:
# Try out the ord() function here.
ord('N')

### Step 2
Next, shift `encrypted_char_int_code` by the `key` as follows:
- subtract `key` from `encrypted_char_int_code` using the operator `-`.
- take the result modulo 1114112 using the operator `%` and parentheses `()`.
- assign the value to a variable named `shifted_int_code` using the operator `=`.

The operators `-` and `=` are pretty straightforward. The modulo operation `a % b` essentially gives the remainder when `a` is divided by `b`. The parentheses `()` can be used to override the order in which operations are performed. You can learn more about the modulo operator and its precedence by running the following cell.

In [None]:
# Try out the modulo operator %.
print((1-2) % 3)
print(1-2 % 3)
help('%')
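
The wrap-around behaviour of `%` can also be seen with the modulus used in this lab. The following sketch assumes a standard Python 3 interpreter, where `sys.maxunicode` is the largest code point accepted by `chr`. The name `N` is chosen only for this example.

```python
import sys

N = sys.maxunicode + 1      # number of valid code points
print(N)                    # prints: 1114112
print(-1 % N)               # prints: 1114111, a negative number wraps around
print((N + 5) % N)          # prints: 5, a number past the end wraps around too
print(chr(-1 % N) == chr(sys.maxunicode))  # prints: True
```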

### Step 3
Transform the shifted integer representation back to a character using the function `chr` and then assign it to a variable named `decrypted_char`.

In [None]:
chr(72)

### Last step
Return the decrypted character using the `return` statement. Run the cell again to update the definition of the decryption function.

**Test**: 
Run the cell below to test the correctness of your implementation.

In [None]:
message = ""
for key in [-1,0,5,1114112]:
    for char in 'CS1102':
        decrypted_char = decrypt_character(encrypt_character(char,key),key)
        message += "'"+char+"' encrypted with key " + str(key) + ": "
        if decrypted_char!=char:
            message += "Decryption failed! Decrypted to '"+decrypted_char+"'.\n"
        else:
            message += "Decryption succeeded.\n"
print(message)

The above code uses
- two nested `for` loops that assign each value in the list `[-1,0,5,1114112]` to the variable `key`, and each character in the string `'CS1102'` to the variable `char` (a string is a sequence of characters, so it can be looped over like a list),
- an `if` statement to check if the decryption fails, i.e., whether the decryption of the encryption of `char` by `key` is not equal to (`!=`) `char`,
- the operator `+` to concatenate two strings into one string, and
- the augmented assignment operator `+=` to append a string to a variable containing a string.
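
The same four ingredients can be tried on a smaller example. The sketch below is only an illustration: the variable `report` and the strings in it are made up for this purpose.

```python
report = ""
for word in ['CS', '1102']:        # outer loop over a list of strings
    for char in word:              # inner loop over the characters of a string
        if char != 'S':
            report += char + " is not S\n"
        else:
            report += char + " is S\n"
print(report)
```

Each pass through the inner loop appends one line to `report`, so the printed text has six lines, one for each character of `'CS'` and `'1102'`.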

You will need to use the above statements and operators very soon. Remember that the first line of a `for` statement and of an `if` statement must end with a colon `:`. Once again, you can type `help('for')` and `help('if')` for more details.

In [None]:
help('if')
help('for')
help('+=')

## Task 3: Decrypt a ciphertext

Instead of decrypting one character at a time, you will write the function `decrypt`, which conveniently decrypts all characters in a given ciphertext and returns the entire plaintext. Complete the function below.

In [None]:
def decrypt(ciphertext, key):
    """
    Given a ciphertext and the key, return the plaintext.

    Parameters
    ----------
    ciphertext : string
        The text to be decrypted
    key : int
        The secret key
    Returns
    -------
    string
        The plaintext
    """
    # Task 3 [Write your code below to complete this function]


To test your code, run the following cell to see if you can get a meaningful plaintext.

In [None]:
ciphertext = 'Mjqqt%\\twqi&'
decrypt(ciphertext, 5)

## Task 4: Decrypt without the secret key
You will launch a dictionary attack to decrypt the ciphertext in `topsecret.txt` without knowing the secret key. The idea is simple: you try decrypting the ciphertext with different keys and see which of the resulting plaintexts makes the most sense (is the most English-like). Execute the following pieces of code step by step to recover the top secret.
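
The idea can be previewed on a toy example before working with the real file. The sketch below is not the lab solution: `shift_text` is a hypothetical helper, `toy_words` is a two-word stand-in for a real dictionary, and the key 3 and the range of 10 guesses are chosen only to keep the example short.

```python
def shift_text(text, key):
    """Hypothetical helper: shift every character of text by key."""
    return ''.join(chr((ord(c) + key) % 1114112) for c in text)

toy_words = {'hello', 'world'}           # tiny stand-in for a dictionary
secret = shift_text('hello world', 3)    # pretend the key 3 is unknown

for guess in range(10):                  # try a small range of keys
    candidate = shift_text(secret, -guess)
    words = candidate.split()
    hits = sum(word in toy_words for word in words)
    if words and hits / len(words) >= 0.5:
        print(guess, candidate)          # prints: 3 hello world
        break
```

Wrong guesses produce text in which no word belongs to `toy_words`, so only the correct key passes the threshold.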

### Step 1: Read the file

Read the content of the file `topsecret.txt` by running the following cell.

In [None]:
import io               # Python module for stream handling

# open the file for reading, using the UTF-8 encoding of Unicode
with io.open('topsecret.txt','r',encoding="utf-8") as f:
    ciphertext = f.read()   # read the content of the file
    
print(ciphertext)

The `import` statement loads a module (a library of predefined functions), in this case one that handles reading and writing files. You can learn more about different Python modules using the `help` function, e.g., `help('io')`. Once the module is loaded, you can call its function `open` as `io.open` to open a file as above. The code above uses a `with` statement, which properly closes the file after use. If you are curious how the `with` statement works, take a look at <http://effbot.org/zone/python-with-statement.htm>. Don't worry if you do not fully understand it now. It just handles all the tedious tasks for you.
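
To see that the `with` statement really closes the file, the following sketch writes and reads back a small file in a temporary folder. The file name `demo.txt` and its content are made up for this illustration, and no lab files are touched.

```python
import io
import os
import tempfile

# Work in a temporary directory that is deleted automatically afterwards.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical file name
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write('🥑 secret')
    print(f.closed)                           # prints: True
    with io.open(path, 'r', encoding='utf-8') as f:
        content = f.read()
print(content)                                # prints: 🥑 secret
```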

### Step 2: Give a score to a text

Now, you will type all the English words in a dictionary into a list.

In [None]:
dictionary = ['all','english','words',...]

Just kidding! You do not need to do it manually. Obtain the list from the Natural Language Toolkit (NLTK) by running the following cell. Python has many great tools for hackers!

In [None]:
import nltk             # import the NLTK package. For details, run help(nltk).
nltk.download('words')  # download the Corpus of English words. Run help(words).
from nltk.corpus import words
dictionary = set(word.lower() for word in words.words())

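The word list is very long, so you may need to wait a little while for the download to finish (if the package has been downloaded before, you will see the message "Package words is already up-to-date!"). Note that `dictionary` is stored as a `set` rather than a list, which makes checking whether a word is in it fast. Check how many words there are using the `len` function.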

In [None]:
len(dictionary)

We can use the dictionary as follows:

In [None]:
print('hello' in dictionary)
print('helo' in dictionary)
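
Membership tests like these can be combined with a loop to count how many words of a text are known. The sketch below uses a made-up three-word set `toy_dictionary` in place of the NLTK word set, so it runs without any download.

```python
toy_dictionary = {'apple', 'is', 'red'}    # tiny stand-in for the NLTK set
tokens = 'Apple is rde'.lower().split()    # ['apple', 'is', 'rde']

# Keep only the tokens that are members of the set.
matches = [token for token in tokens if token in toy_dictionary]
print(len(matches), 'of', len(tokens))     # prints: 2 of 3
```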

You will now write the following function `get_score`, which computes how English-like a given text is using the formula
$$\frac{\text{number of English words in the text}}{\text{number of words in the text}}$$

In [None]:
def get_score(text):
    """
    Given a text, return a score within [0,1] defined by 
        (# english words)/(# words), 
    or 0 if text is empty.

    Parameters
    ----------
    text : string

    Returns
    -------
    float
        The score of a sentence
    """
    # when text is empty
    if len(text) == 0:
        return 0
    
    # change all characters to lower case to match the words in dictionary
    text = text.lower()
    
    # transform the text into a list of words
    list_of_words = text.split() # e.g. 'apple is red' -> ['apple', 'is', 'red']

    # Get the total number of words in the sentence
    num_words = len(list_of_words)

    # Step 2 [Write your code below to complete this function,
    #         i.e., compute the score given by the formula above this cell]


Test your function by running the following cell.

In [None]:
# Test the function
print(get_score('Today is Monday'))  # score should be 1
print(get_score('abc is asdf'))      # score should be 1/3

### Step 3: Dictionary attack
Finally, complete the `for` loop below to search for one possible plaintext with a score of at least 0.5. Print the plaintext and its corresponding score and key. (You can use `break` to break out of a loop.)

In [None]:
for key in range(1114112):
    decrypted_content = decrypt(ciphertext, key)
    # Step 3 [Write your code below to complete this step, i.e., check the
    #         score and print the plaintext with score of at least 0.5]


## Task 5 (Optional): Understanding the level of security

__*Task 5a:*__


How does the transposition cipher work? What is the reason for taking modulo `1114112`?

__*Task 5b:*__


In the dictionary attack, is it necessary to try more keys outside `range(1114112)`? Why?

__*Task 5c:*__


If we encrypt the characters at odd positions and even positions using two different keys, is the security level higher? Why?

__*Task 5d:*__


Using the transposition cipher, encrypt a secret message with a secret key of your choice and write it to a file called `mysecret.txt`. Try to write an ethical and meaningful message that is hard to decode without the key. Send the file to your friend or post your message to the discussion page [Lab 10: Your secret messages](https://canvas.cityu.edu.hk/courses/23846/discussion_topics/164818) to see if anyone can decrypt your message without the secret key.

In [None]:
# Task 5d [Write your code below]
def encrypt(text,key):
    # your code here
    

myplaintext =          # your message here
mykey =                # your key

In [None]:
# Encrypt
myciphertext = encrypt(myplaintext,mykey)
if decrypt(myciphertext,mykey)==myplaintext:
    print('Success!')
else:
    print('Failed.')

In [None]:
# write to file
with io.open('mysecret.txt','w',encoding='utf-8') as f:
    f.write(myciphertext)

__*Task 5e:*__


Take a look at an encryption scheme called the [one-time pad (OTP)](https://en.wikipedia.org/wiki/One-time_pad), and explain how one can achieve perfect secrecy. 
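
As a starting point for experimenting, here is a minimal sketch of one common formulation of the one-time pad, which XORs each byte of the message with a byte of a random pad. It is an illustration under stated assumptions, not a vetted implementation: the sample message and the names `message`, `pad`, `ciphertext` and `recovered` are made up, and the pad must be as long as the message, kept secret, and never reused.

```python
import secrets

message = 'attack at dawn'.encode('utf-8')   # sample plaintext as bytes
pad = secrets.token_bytes(len(message))      # random pad, as long as the message

# XOR each message byte with the matching pad byte.
ciphertext = bytes(m ^ p for m, p in zip(message, pad))
# XOR with the same pad again to undo the encryption.
recovered = bytes(c ^ p for c, p in zip(ciphertext, pad))
print(recovered.decode('utf-8'))             # prints: attack at dawn
```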