## Using AES encryption in Python

### Due 10/22/19

Below you will see a simple example of how to use AES for encryption in Python.  

The Advanced Encryption Standard (AES) is one of the most widely used and secure ciphers.

AES is a _block cipher_, meaning that it encrypts data block by block. A _block_ for AES is 128 bits (16 bytes) of information.  This is about 16 keyboard characters (or about 5 pixels) worth of information.  
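
To make the block size concrete, here is a small sketch that cuts a byte string into 16-byte blocks. The helper name `split_into_blocks` is made up for this illustration; it is not part of any library.

```python
BLOCK_SIZE = 16  # AES block size in bytes (128 bits)

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Return the consecutive block_size-byte slices of data (hypothetical helper)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

message = b"The answer is noBye mom, love you.**************"
blocks = split_into_blocks(message)
print(len(message), "bytes make", len(blocks), "blocks")
for block in blocks:
    print(block)
```

Each printed line is one block that AES would process as a unit.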

The details of what AES does to a block of data to encrypt it are a little complicated, but it basically involves repeatedly making substitutions in the bytes (like our substitution cipher) and then mixing the bytes around.  Exactly what comes out also depends on the _key_, which is mixed into the data during the encryption process.  
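
The "substitute, mix, add key" idea can be sketched with a toy cipher. To be clear, this is NOT AES and it is not secure: the substitution table, the byte rotation, and the function names below are all invented for this illustration, and it assumes a 16-byte key and a 16-byte block.

```python
# Toy "substitute, mix, add key" cipher on one 16-byte block. Not AES, not secure.

# A made-up substitution table: x -> 7x + 3 (mod 256) is a permutation of 0..255
SBOX = [(7 * x + 3) % 256 for x in range(256)]
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

def toy_encrypt_block(block, key, rounds=4):
    state = bytes(block)
    for _ in range(rounds):
        state = bytes(SBOX[b] for b in state)              # substitute each byte
        state = state[1:] + state[:1]                      # move bytes around: rotate left by one
        state = bytes(b ^ k for b, k in zip(state, key))   # mix in the key with XOR
    return state

def toy_decrypt_block(block, key, rounds=4):
    state = bytes(block)
    for _ in range(rounds):                                # undo each step in reverse order
        state = bytes(b ^ k for b, k in zip(state, key))
        state = state[-1:] + state[:-1]                    # rotate right by one
        state = bytes(INV_SBOX[b] for b in state)
    return state

key = b"This is a key123"
block = b"The answer is no"
scrambled = toy_encrypt_block(block, key)
print(scrambled)
print(toy_decrypt_block(scrambled, key))
```

Real AES follows the same broad pattern but with a carefully designed substitution table, stronger mixing, and a different round key for each round.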

A block cipher like AES usually requires a mode of operation, which is a rule for applying the cipher to messages longer than one block.  Again, this is beyond the scope of our project, but the curious can read the Wikipedia page below.  

[https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation)

The goals of the block cipher modes are to blur the statistical relationship between the plaintext and the ciphertext, and also to ensure that if a message is encrypted twice with the same key then different ciphertexts come out.

That's possible because, in addition to the _key_, the mode of operation takes an initialization vector (IV), which is a random value chosen fresh for each encryption.  It does not need to be kept private.  
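
Because the IV is not secret, one common arrangement is to generate a fresh random IV for each message and store it in front of the ciphertext. The sketch below shows only that bookkeeping, using the standard library; the helper names `attach_iv` and `split_iv` are hypothetical, and `fake_ciphertext` is a stand-in for real AES output.

```python
import os

IV_SIZE = 16  # CBC needs an IV the size of one AES block

def attach_iv(iv, ciphertext):
    """Store the IV in front of the ciphertext (hypothetical helper)."""
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be exactly 16 bytes")
    return iv + ciphertext

def split_iv(package):
    """Recover (iv, ciphertext) from the output of attach_iv."""
    return package[:IV_SIZE], package[IV_SIZE:]

iv = os.urandom(IV_SIZE)          # fresh random IV for this one message
fake_ciphertext = b"\x00" * 32    # placeholder bytes, not real AES output
package = attach_iv(iv, fake_ciphertext)
recovered_iv, recovered_ct = split_iv(package)
print(recovered_iv == iv, recovered_ct == fake_ciphertext)
```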

There are a lot of libraries for doing cryptography in Python.

The library we will use is called [pycrypto](https://pypi.org/project/pycrypto/).  

The basic usage is demonstrated below.  Ciphertext is not printable (because it consists of random-looking bytes, many of which do not render as normal characters).  We print the ciphertext out in a few different ways to give you a feel for what it is.  In a Python byte string, '\xAA' is the byte with hexadecimal value AA, '\x10' is the byte with hexadecimal value 10, etc.  

Sometimes the ciphertext is printed as a mix of printable characters and hex numbers.  For example the following is a ciphertext:

    ';\x9a~2|\x187\x83v\x07p?\x93\x19,z\x83\xd6\xdd\x12\xff\xf5`\xa2J\x11E\xa1m~-\xa8'
   
Where possible the characters are printed (e.g. ';' '~' '2' etc).  When a non-printing character is encountered, the hex is printed instead (e.g. 0x9a 0x18 0x83 etc).   
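
You can see this display rule in action without any encryption at all. The sketch below builds a byte string from the first eight byte values of the example above and shows three views of it.

```python
# The first eight bytes of the example ciphertext, written as numbers
sample = bytes([0x3b, 0x9a, 0x7e, 0x32, 0x7c, 0x18, 0x37, 0x83])

print(sample)         # printable bytes appear as characters, the rest as \x escapes
print(sample.hex())   # every byte as exactly two hex digits
print(list(sample))   # every byte as an integer from 0 to 255
```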


In [1]:
#https://pypi.org/project/pycrypto/

from Crypto.Cipher import AES
obj = AES.new('This is a key123', AES.MODE_CBC, 'This is an IV456')
message = "The answer is noBye mom, love you.**************"
ciphertext = obj.encrypt(message)

print("Encrypting the message M='{}' using the key 'This is a key123' and the IV 'This is an IV456'".format(message))

print("\nHere is the ciphertext as a byte string:")

print( ciphertext)

print("\nHere is the ciphertext as a list of hexadecimal numbers:")
print( [hex(x) for x in list(ciphertext)])

print("\nHere is the ciphertext as a sequence of bits:")
# format(x, '08b') keeps leading zeros, so every byte contributes exactly 8 bits
print("".join( [format(x, '08b') for x in list(ciphertext)]))

Encrypting the message M='The answer is noBye mom, love you.**************' using the key 'This is a key123' and the IV 'This is an IV456'

Here is the ciphertext as a byte string:
b'\xd6\x83\x8dd!VT\x92\xaa`A\x05\xe0\x9b\x8b\xf1n6D^-:\xb2\xfd\xdc,\xcc0\x9adp\x92\xe7\xdd\x19\xfd\x06%\xac\xcc\x04`\xd4\x88\xfe\t\x95\x96'

Here is the ciphertext as a list of hexadecimal numbers:
['0xd6', '0x83', '0x8d', '0x64', '0x21', '0x56', '0x54', '0x92', '0xaa', '0x60', '0x41', '0x5', '0xe0', '0x9b', '0x8b', '0xf1', '0x6e', '0x36', '0x44', '0x5e', '0x2d', '0x3a', '0xb2', '0xfd', '0xdc', '0x2c', '0xcc', '0x30', '0x9a', '0x64', '0x70', '0x92', '0xe7', '0xdd', '0x19', '0xfd', '0x6', '0x25', '0xac', '0xcc', '0x4', '0x60', '0xd4', '0x88', '0xfe', '0x9', '0x95', '0x96']

Here is the ciphertext as a sequence of bits:
110101101000001110001101011001000010000101010110010101001001001010101010011000000100000100000101111000001001101110001011111100010110111000110110010001000101111000101101001110101011001011111101110111000010110011001100001100001001101001100100011100001001001011100111110111010001100111111101000001100010010110101100110011000000010001100000110101001000100011111110000010011001010110010110

## Decryption

Decryption works like this.

In [2]:

obj2 = AES.new('This is a key123', AES.MODE_CBC, 'This is an IV456')
obj2.decrypt(ciphertext)

b'The answer is noBye mom, love you.**************'

In [3]:
# Exercise:
## What happens if the IV or the key is wrong in the decryption (try it)?
obj2 = AES.new('This is a key623', AES.MODE_CBC, 'This is an IV196')
obj2.decrypt(ciphertext)

b'?\xa9Dv\xa4\xde\xad\xfc\xeb8\xd9I\xf8\x83\xad\xf3\x1a\x98\xeb\x009\x9a\xb6_\xb8\xaf\x833S\xe8o\x0c\xba\xed\xe6\x83\xe7\xe3\x06\xd5\xcf\xb6rO\xe1\x04\xd7\xba'

In [4]:
# You might want to read the documentation for AES.new, below. 

In [5]:
AES.new?

## An irritating detail (part 1)

You might notice that the message encrypted above is exactly a multiple of 16 bytes (characters) long.

If you change the message to be shorter or longer (but not an exact multiple of 16 characters) you'll find that you get an error message.

This happens because the block cipher mode CBC requires the message to divide evenly into blocks. Because a block is 16 bytes, messages must have lengths that are multiples of 16.  

You can get around this with "padding" which rounds out the length of a message.

Here is an example of how it works.


In [5]:
# Much of this code is borrowed from here:  https://gist.github.com/crmccreary/5610068
BS = 16
pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS) 
unpad = lambda s : s[0:-ord(s[-1])]

#The above lambdas do the padding and unpadding

padded_message = pad("squ")
padded_message  

'squ\r\r\r\r\r\r\r\r\r\r\r\r\r'

In [6]:
ps = "squ" + 13*chr(13)
print(repr( ps))
ps[0:-13]

'squ\r\r\r\r\r\r\r\r\r\r\r\r\r'


'squ'

Because the message "squ" is length 3, it lacks 13 characters to make a full block.

The padding scheme works by filling out 13 more bytes in the message with the value 13 (which happens to represent the ASCII carriage return character '\r').


In [7]:
#Another example
padded_message = pad("Class of 2023")
padded_message

'Class of 2023\x03\x03\x03'

Because the message "Class of 2023" is 13 characters (bytes) long, it lacks 3 characters (bytes) to make a full block.  The padding scheme works by appending 3 bytes consisting of 0x03 to the message to make the length a multiple of 16 bytes.
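
The same scheme can be written with ordinary functions that work on bytes and check the padding before stripping it. This is a sketch, not the code used elsewhere in this notebook; the names `pad_bytes` and `unpad_bytes` are made up here. (The scheme itself is commonly known as PKCS#7 padding.)

```python
BLOCK_SIZE = 16

def pad_bytes(data, block_size=BLOCK_SIZE):
    """Append n copies of the byte n so the length becomes a multiple of block_size."""
    n = block_size - len(data) % block_size   # always between 1 and block_size
    return data + bytes([n]) * n

def unpad_bytes(padded, block_size=BLOCK_SIZE):
    """Strip the padding added by pad_bytes, raising ValueError if it looks wrong."""
    if len(padded) == 0 or len(padded) % block_size != 0:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    n = padded[-1]                            # the last byte says how many bytes to remove
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]

print(pad_bytes(b"Class of 2023"))
print(unpad_bytes(pad_bytes(b"Class of 2023")))
```

Checking the padding means that garbage input is more likely to be rejected with an error than to be silently cut short.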

In [8]:
#Now we encrypt the padded message as usual...

obj3 = AES.new('This is a key123', AES.MODE_CBC, 'This is an IV456')
ciphertext = obj3.encrypt(padded_message)
print(ciphertext)

b'_\xbf\xe8"\x86*\xc6\xf3\xc2\n\xf5\x0c\x98\xcc\x01z'


In [9]:
# Aside:  The decode method turns a bytes object into a Python string
type(b'hello'), type(b'hello'.decode())

(bytes, str)

In [10]:
type( b'Nicholas Cage'.decode())

str
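
Going the other way, not every byte string can be decoded. The sketch below shows `encode` and `decode` undoing each other, and then a byte string that is not valid UTF-8 raising `UnicodeDecodeError`. This is one reason decrypting with the wrong key tends to fail at the `decode` step.

```python
text = "Class of 2023"
as_bytes = text.encode()      # str -> bytes (UTF-8 by default)
print(as_bytes, as_bytes.decode() == text)

garbage = b"\xff\xfe\xfd"     # the byte 0xff never appears in valid UTF-8
try:
    garbage.decode()
except UnicodeDecodeError as err:
    print("could not decode:", err.reason)
```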

In [11]:
# Now we decrypt the ciphertext and "unpad" the result to get the original message.

obj4 = AES.new('This is a key123', AES.MODE_CBC, 'This is an IV456')
msg_with_pad = obj4.decrypt(ciphertext)
print(msg_with_pad)
message = unpad(msg_with_pad.decode())
message

b'Class of 2023\x03\x03\x03'


'Class of 2023'

### Exercise:

Encrypt and then decrypt a message of your choice using a key and IV of your choice.  The key must be exactly 16, 24, or 32 characters long, and the IV must be exactly 16 characters long. 

Be sure to use padding so that you don't need to worry about the length of the message.


In [24]:
# Solution to exercise goes here
#https://pypi.org/project/pycrypto/

from Crypto.Cipher import AES
import hashlib

m = hashlib.sha256()
passphrase = input("Please enter a passphrase: ")
passphrase = passphrase.encode() #turns a string into a byte string
m.update(passphrase)
digest = m.digest() #this "digest" is the hash
key = digest[:16]  # The first 16 bytes of the 32 byte hash are the key
IV = digest[16:]   # The last 16 bytes of the 32 byte hash are the IV

message = input("Enter a short message to encrypt: ")
BS = 16
pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS) 
unpad = lambda s : s[0:-ord(s[-1])]

obj = AES.new(key, AES.MODE_CBC,IV)
message = pad(message)
ciphertext = obj.encrypt(message)

print("This produced the ciphertext below.")
print(ciphertext)

m = hashlib.sha256()
passphrase = input("Please enter a passphrase for decryption (same as encryption): ")
passphrase = passphrase.encode() #turns a string into a byte string
m.update(passphrase)
digest = m.digest() #this "digest" is the hash
key = digest[:16]  # The first 16 bytes of the 32 byte hash are the key
IV = digest[16:]   # The last 16 bytes of the 32 byte hash are the IV

obj = AES.new(key, AES.MODE_CBC,IV)

try:
    message = obj.decrypt(ciphertext)
    message = unpad(message.decode())
    print("The plaintext was:")
    print(message)
except Exception:  # e.g. UnicodeDecodeError when the decrypted bytes are not valid text
    print("Decryption failed.  Was the passphrase correct?")

Please enter a passphrase:  bluemoon

Enter a short message to encrypt:  how lucky can one guy be

This produced the ciphertext below.
b'\xe7\x9e\x19e\xaf\xac\xdfb\xeakF\x92\xbcF(\xbf;&\x9a\xbd6&L\x8d\xe7"?\xde\xfe\xda\xe2X'


Please enter a passphrase for decryption (same as encryption):  redmoon

Decryption failed.  Was the passphrase correct?


In [13]:

# What happens if the message is already a multiple of 16 in length
# and you pad it?

S = "Even God doesn't work on Sundays"
print( len(S))
pad(S)

32


"Even God doesn't work on Sundays\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10"

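The extra block is what keeps unpadding unambiguous: `unpad` always trusts the last character to say how much to remove. The sketch below reuses the `pad` and `unpad` lambdas from above (redefined so the snippet runs on its own) and compares them with a hypothetical `lazy_pad` that skips padding when the message already fills whole blocks.

```python
BS = 16
pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS)
unpad = lambda s: s[0:-ord(s[-1])]

def lazy_pad(s):
    """Hypothetical variant that leaves full-block messages untouched."""
    return s if len(s) % BS == 0 else pad(s)

S = "Even God doesn't work on Sundays"   # exactly 32 characters
print(unpad(pad(S)) == S)         # the extra block of 16 chr(16) characters is removed cleanly
print(repr(unpad(lazy_pad(S))))   # the final 's' (code 115) is misread as a padding length
```

With `lazy_pad`, the last letter of the message is treated as a padding count and the message is destroyed, which is why a full block of padding is always added.
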
## An irritating detail (part 2)

Notice that the "key" must be exactly 16, 24, or 32 bytes long.  This means that we can't use ordinary passwords like "swordfish" as a key.

We can get around this and use natural passphrases as keys.  We just need to be sure to use the _hash_ of the passphrase as the key.  Below is an example of how this can work.

We use [SHA256](https://en.wikipedia.org/wiki/SHA-2) as a hash function.

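It returns a 256-bit (32-byte) hash digest.  We use the first 16 bytes for the key, and the last 16 bytes for the IV.  Now we can relax and use regular passwords, plus we don't have to remember the IV because it is derived from the passphrase.  The trade-off is that the IV is no longer random: the same passphrase always gives the same IV, so encrypting the same message twice with the same passphrase gives the same ciphertext.  For this project we accept that. 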

In [14]:
# A quick example of hashing ...

import hashlib
m = hashlib.sha256()
message_to_hash = "I liked you the whole time".encode()
m.update(message_to_hash)  # feed the message into the hash; without this call we would get the digest of the empty string
D = m.digest()
print("Here's the digest of the message:")
print(D)
print("\nHere are the first 16 bytes")
print(D[:16])
print("\nHere are the last 16 bytes")
print(D[16:])


In [20]:
"""
This code turns a natural passphrase into a 16 byte key, a 16 byte IV, and encrypts a given message.
"""

import hashlib
m = hashlib.sha256()
passphrase = input("Please enter a passphrase: ")
passphrase = passphrase.encode() #turns a string into a byte string
m.update(passphrase)
digest = m.digest() #this "digest" is the hash
key = digest[:16]  # The first 16 bytes of the 32 byte hash are the key
IV = digest[16:]   # The last 16 bytes of the 32 byte hash are the IV

message = input("Enter a short message to encrypt: ")
obj = AES.new(key, AES.MODE_CBC,IV)
message = pad(message)
ciphertext = obj.encrypt(message)

print("This produced the ciphertext below.")
print(ciphertext)

Please enter a passphrase:  qweqwe

Enter a short message to encrypt:  asd

This produced the ciphertext below.
b'0A\x89-g\xcek"\n1\xf2\xfd\x93\xe8\x942'


In [25]:
# Now we get the passphrase once again for decryption.

import hashlib
m = hashlib.sha256()
passphrase = input("Please enter a passphrase for decryption (same as encryption): ")
passphrase = passphrase.encode() #turns a string into a byte string
m.update(passphrase)
digest = m.digest() #this "digest" is the hash
key = digest[:16]  # The first 16 bytes of the 32 byte hash are the key
IV = digest[16:]   # The last 16 bytes of the 32 byte hash are the IV

obj = AES.new(key, AES.MODE_CBC,IV)

try:
    message = obj.decrypt(ciphertext)
    message = unpad(message.decode())
    print("The plaintext was:")
    print(message)
except Exception:  # e.g. UnicodeDecodeError when the decrypted bytes are not valid text
    print("Decryption failed.  Was the passphrase correct?")

Please enter a passphrase for decryption (same as encryption):  qweqwe

The plaintext was:
asd


In [3]:
# Exercise: 
## Complete the function below so that it works as described.
from Crypto.Cipher import AES
import hashlib

def get_AES_obj(passphrase):
    """Input: a passphrase string
       Output: the function returns AES.new(key, AES.MODE_CBC,IV), where key and IV are derived from the passphrase as above.
    """
    m = hashlib.sha256()
    passphrase = passphrase.encode() #turns a string into a byte string
    m.update(passphrase)
    digest = m.digest() #this "digest" is the hash
    key = digest[:16]  # The first 16 bytes of the 32 byte hash are the key
    IV = digest[16:]   # The last 16 bytes of the 32 byte hash are the IV
    obj = AES.new(key, AES.MODE_CBC,IV)
    return obj

## Complete the function below so that it works as described.    
def encrypt(message,passphrase):
    """
    Input:  a message string to encrypt and an arbitrary passphrase string.
    Output: a byte string corresponding to the ciphertext produced by encrypting
    "message" in the manner shown in the above cells of this notebook.  
    Be sure to use the hash trick and padding. 
    """
    obj = get_AES_obj(passphrase)
    BS = 16
    pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS) 
    padded_message = pad(message)
    ciphertext = obj.encrypt(padded_message)
    return ciphertext


## Complete the function below so that it works as described.
def decrypt(ciphertext,passphrase):
    """
    Input:  a ciphertext byte string to decrypt and an arbitrary passphrase string.
    Output: a string corresponding to the original plaintext

    Be sure to unpad the message and use try/except structures for robustness.
    """    
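    obj = get_AES_obj(passphrase)
    unpad = lambda s : s[0:-ord(s[-1])]
    try:
        pdm = obj.decrypt(ciphertext)
        return unpad(pdm.decode())
    except Exception:
        # A wrong passphrase usually yields bytes that cannot be decoded as text
        print("Decryption failed.  Was the passphrase correct?")
        return None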


In [22]:
# Test your encryption and decryption functions...

CT = encrypt("prof johnson is in his kitchen","stalker666")
PT = decrypt(CT,"stalker666")
print(PT)
### The output should be "prof johnson is in his kitchen"

prof johnson is in his kitchen


In [21]:
## Exercise:

###  Read in and encrypt a file.
###  Write the ciphertext to a new file.
###  Read in the ciphertext and decrypt it.
###  Print out the plaintext

### I've done some of the rigmarole for you to get you started.

### Update:  Because this is not a class on Python I have completed this exercise for you.
### Be sure to read through and see how this program works.

m = hashlib.sha256()
passphrase = input("Please enter a passphrase: ")
passphrase = passphrase.encode() #turns a string into a byte string
m.update(passphrase)
digest = m.digest() #this "digest" is the hash
key = digest[:16]  # The first 16 bytes of the 32 byte hash are the key
IV = digest[16:]   # The last 16 bytes of the 32 byte hash are the IV
obj = AES.new(key, AES.MODE_CBC,IV)


## This creates a file you can encrypt
file_to_encrypt = open("fte.txt","w")
file_to_encrypt.write("Luxembourg is a country in central Europe.")
file_to_encrypt.close()

## Open and read in the plaintext
fp = open("fte.txt","r")
plaintext = fp.read()
fp.close()
plaintext = pad(plaintext)   ## Pad the plaintext
ct = obj.encrypt(plaintext)  ## Encrypt the plaintext

## Write the ciphertext to a file
fp = open("fte.enc","wb")
fp.write(ct)   ### Write the ciphertext to the fte.enc file
fp.close()

## Open the ciphertext for decryption
fp = open("fte.enc","rb")
obj = AES.new(key, AES.MODE_CBC,IV)
ct = fp.read()    ## Read the file
fp.close()

pt = obj.decrypt(ct)   ## Decrypt the ciphertext
pt = unpad(pt.decode())   ## Unpad the message
pt                        ## Print the plaintext


Please enter a passphrase:  not blah

'Luxembourg is a country in central Europe.'
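
The open/read/close pattern above can also be written with `with` blocks, which close the file automatically even if an error occurs. The sketch below shows only the file handling, with placeholder bytes instead of a real ciphertext and a temporary directory so that it does not touch your files; the file name `demo.enc` is made up.

```python
import os
import tempfile

placeholder_ct = bytes(range(32))   # stands in for ciphertext; any bytes would do

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.enc")

    # "wb" = write binary: ciphertext is raw bytes, not text
    with open(path, "wb") as fp:
        fp.write(placeholder_ct)

    # "rb" = read binary: we get back exactly the bytes we wrote
    with open(path, "rb") as fp:
        read_back = fp.read()

print(read_back == placeholder_ct)
```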

## The Project

The goal of this project is to create a standalone command line tool which encrypts and decrypts files.  You already have the basic parts:  File IO and functions for encrypting and decrypting messages. You just need to put all of these parts together.  

You should have absorbed a bit about how Python works in the above.  You can watch this video for a little more info:

[Python in 10 Minutes](https://www.youtube.com/watch?v=pxXTHGRtXXs&lc=z23qcnhq1ujpfxgnkacdp43axgdsz4tk00mduure3nhw03c010c)

Two things about that video:  

1. You don't need to install Python if you're using Cocalc.  
1. That tutorial is for Python 2.

The main practical difference between Python 2 and Python 3 is that print statements look a little different (in Python 3, `print` is a function, so it needs parentheses):

Python 2:

    print "hello, no parens!"
    
Python 3:

    print("Oh no, I have to use parentheses!")
    
#### A Standalone Program

Here is how you create a standalone program in Python.

**First** Create a new file with the name you want your program to have.  Please call your program for this project `encrypt360`.  

From the command line you can do this by entering:

    $ touch encrypt360
    

**Second** Modify the file permissions for this file so that it is executable. You do this using *chmod* as follows (from the command line):

    $ chmod u+x encrypt360

This says "allow the user to execute the file named encrypt360."

**Third** Add the following line to the top of the file:

    #! /usr/bin/python3
    
This tells the system:  "When this file is executed, run it using the Python 3 interpreter, which is located in the directory `/usr/bin`."

To make sure your setup is working, add the line

    print("Hello world!")
    
to your file.  Then execute it from the command line like this:

    $ ./encrypt360
    
This should print "Hello world!" to the terminal.  (You can delete that line now if the check worked).

#### Specifications

Your command line tool should take 2 command line inputs:  A mode and a filename.

The *mode* will be either "e" or "d" representing *encrypt* or *decrypt*.  

The *filename* will be the name of the file to encrypt or decrypt.  

For example the user might type

    $ ./encrypt360 e diary.txt
    
where the command line arguments are 'e' and 'diary.txt'.

If the user doesn't include these command line arguments you want to yell at them and tell them how the program works.

Here is some starter code:

    #! /usr/bin/python3

    import sys

    arguments = sys.argv

    if len(arguments) != 3:
        print("Usage:  encrypt360 <mode> <filename>")
        exit(0)
        
    print(arguments)

Try running this.  This program just gets the command line inputs and makes sure that there are two of them (in addition to the program's own name, which is always given to a program as the 0th command line input).
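
One way to keep this check easy to test is to put it in a function that takes the argument list as a parameter. This is only a sketch: the function name `parse_arguments` and the choice to return `None` on bad input are assumptions, not part of the assignment.

```python
USAGE = "Usage:  encrypt360 <mode> <filename>"

def parse_arguments(argv):
    """Return (mode, filename) if argv holds exactly two user arguments, else None."""
    if len(argv) != 3:      # argv[0] is the program name, so we expect 3 items in total
        return None
    return argv[1], argv[2]

# In the real program you would pass sys.argv; here we pass hand-made lists.
print(parse_arguments(["./encrypt360", "e", "diary.txt"]))
print(parse_arguments(["./encrypt360"]))
```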

#### Robustness 

You can make your program more robust by checking for other user goofs.

You should add code to make sure the following are true:

1. Verify that `arguments[1]` is either 'e' or 'd'.  If not, yell at the user and stop the program (`exit(0)`). 
2. Verify that a file called `<filename>` (whatever the user typed) actually exists.  If it doesn't, yell at the user and stop the program.  

The following code shows you how to do (2). 

    #! /usr/bin/python3

    import sys

    arguments = sys.argv

    if len(arguments) != 3:
        print("Usage:  encrypt360 <mode> <filename>")
        exit(0)


    mode, filename = arguments[1],arguments[2]

    import os

    if not os.path.exists(filename):
        print("File does not exist!")
        exit(0)

This bit of code uses the Python `os` module to check that the filename given as the 2nd input by the user refers to a file that really exists.  If the file does not exist, the program shuts down.
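
Check (1) can be handled in the same style. Here is a possible sketch in which a hypothetical helper, `find_problem`, returns an error message (or `None` when everything looks fine) so that the caller decides how to yell. It uses `os.path.isfile` rather than `os.path.exists` so that directories are rejected too; that is a design choice made here, not a requirement.

```python
import os
import tempfile

def find_problem(mode, filename):
    """Return a message describing what is wrong with the inputs, or None if they look fine."""
    if mode not in ("e", "d"):
        return "Mode must be 'e' (encrypt) or 'd' (decrypt)."
    if not os.path.isfile(filename):
        return "File does not exist!"
    return None

print(find_problem("x", "diary.txt"))                  # the bad mode is reported first
with tempfile.NamedTemporaryFile() as tmp:
    print(find_problem("e", tmp.name))                 # None: valid mode, file exists
    print(find_problem("e", tmp.name + ".missing"))    # valid mode, but no such file
```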

#### The Real Job

Now it's time to add the "guts" to your program:  the encryption and decryption.  

You will probably want to copy the `get_AES_obj`, `encrypt`, and `decrypt` functions you wrote above into `encrypt360` so that you can use them in your code.  You can just put them at the top (but below the #! line).  You will also need `pad` and `unpad` from the padding section above.  You will also need to add the line

    from Crypto.Cipher import AES

at the top of your program (above the functions but below #!).

Now, borrow code from the file encryption exercise above to actually implement your program.

To name your ciphertext file, just add ".enc" to whatever filename the user gave:
    
    ciphertext_name = filename+".enc"
    
When decrypting, you can strip off the ".enc" like this:

    plaintext_name = filename[:-4]
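
Slicing off the last four characters assumes the name really ends in ".enc". A slightly more defensive sketch, with a hypothetical helper called `output_name`, might look like this:

```python
def output_name(mode, filename):
    """Pick the name of the file to write, following the naming rule described above."""
    if mode == "e":
        return filename + ".enc"
    if mode == "d":
        if not filename.endswith(".enc"):
            raise ValueError("expected a file name ending in .enc")
        return filename[:-len(".enc")]    # same as filename[:-4]
    raise ValueError("mode must be 'e' or 'd'")

print(output_name("e", "diary.txt"))
print(output_name("d", "diary.txt.enc"))
```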
    
Your program should prompt the user for a passphrase, as in the file encryption exercise above, and use this to generate the key/IV.

#### Testing

Test your program like this:

**First** Use it to encrypt a plaintext `pt.txt` that you create.

**Second** Use xxd to look at the resulting ciphertext:

    $ xxd pt.txt.enc

You should see some random looking bytes.  *Do not* open the ciphertext using the Cocalc text viewer (because it breaks binaries).

**Third** Use your program to decrypt `pt.txt.enc` and verify that the result is the same as `pt.txt`.

(Be sure not to overwrite your original `pt.txt` -- keep a safe copy of that file under a different name.)
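
The comparison in the third step can also be done from Python. Here is a minimal sketch using the standard library's `filecmp` module; the helper name `same_contents` is made up, and the files are placeholders created in a temporary directory so that the snippet runs anywhere.

```python
import filecmp
import os
import tempfile

def same_contents(path_a, path_b):
    """True if the two files hold exactly the same bytes."""
    return filecmp.cmp(path_a, path_b, shallow=False)

with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "pt_backup.txt")
    decrypted = os.path.join(folder, "pt.txt")
    for path in (original, decrypted):
        with open(path, "w") as fp:
            fp.write("Luxembourg is a country in central Europe.")
    matches = same_contents(original, decrypted)

    with open(decrypted, "a") as fp:
        fp.write("!")                     # simulate a botched decryption
    differs = not same_contents(original, decrypted)

print(matches, differs)
```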