<h1 align='center'> COMP2420/COMP6420 - Introduction to Data Management, Analysis and Security</h1>

<h2 align='center'> Lab 11 - Security II</h2>

*****

## Aim

Our aim in this lab is the following:
- Become experienced with hash functions, their uses and weaknesses
- Understand the use of Digital Signatures

## Learning Outcomes
- L10: Explain key security concepts and the use of cryptographic techniques, digital signatures and PKI in security


## Preparation


Before starting this lab, we suggest you complete the following:
- Finish watching the lectures on security
- Read the [following article](https://coindoo.com/how-does-a-hashing-algorithm-work/) on hashing functions (with the added information about how it is used in blockchain technologies!)


In [4]:
# Code Imports - Place here to ensure students don't miss it!
import numpy as np
import pandas as pd
import random

****
## Topic 1:  Hash Functions
Hash functions are considered very very (very!) important to cryptography today. Without repeating the lessons of the lecture, a hash function should have the following properties:
- Collision Resistance - It should be computationally infeasible to find two different messages that hash to the same value. That is:
```
M = Message
M' = Another Message
M != M' (Messages are not the same)
H() = Hash Function 
H(M) != H(M')
```
If an attacker can find `M != M'` with `H(M) == H(M')`, they can swap one message for the other without changing the hash, which makes it possible to falsify the hash. Obviously, not a good thing!
- Pre-image resistance - It should be computationally infeasible to determine the original message `M` given a hash function `H()` and a hashed message `H(M)`. Also known as _one-wayness_.
- 2nd-pre-image resistance - Given the knowledge of a hash function `H()` and a message `M`, it should be computationally infeasible to find another message `M'` that will hash to the same value.

A recent-ish paper is a good introduction to this: [Cryptographic Hash Functions: Recent Design Trends and Security Notions](https://eprint.iacr.org/2011/565.pdf).

Other nice properties include:
- A small change in the original message should yield a large change in the resulting hash so it seems unrelated completely to a similar message.
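This last property is often called the avalanche effect. As a minimal sketch (assuming SHA-256 from `hashlib`; the helper `bit_difference` is a hypothetical name introduced here), the following hashes two messages that differ in one character and counts how many of the 256 output bits change:

```python
import hashlib

def bit_difference(hex_a, hex_b):
    """Count the bits that differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# Two messages that differ only in their final character
digest_a = hashlib.sha256(b"this is a short test string").hexdigest()
digest_b = hashlib.sha256(b"this is a short test strinh").hexdigest()

changed_bits = bit_difference(digest_a, digest_b)
print(digest_a)
print(digest_b)
print(f"{changed_bits} of 256 bits differ")
```

For a well-behaved hash function you would expect roughly half of the bits to flip, so the two digests look unrelated.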

Let's look at a quick comparison with a regular encryption method:

<img src="./img/hashing.png" alt="Hashing Example" style="width: 400px;"/>

Where we can encrypt and decrypt to get back to the original message, you don't have the same ability with hashing.
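The contrast can be sketched in a few lines. The `xor_cipher` function and the key below are hypothetical toys for illustration (not a secure cipher); the point is only that encryption has an inverse, while a hash squeezes inputs of any length into a fixed-size digest and so cannot be undone:

```python
import hashlib

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy XOR cipher: applying it twice with the same key restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

message = b"this is a short test string"
key = b"k3y"  # hypothetical key, for illustration only

ciphertext = xor_cipher(message, key)
recovered = xor_cipher(ciphertext, key)
print(recovered == message)  # encryption is reversible with the key

# A hash has no key and no inverse: every input maps to a digest of the same size
short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 10_000).hexdigest()
print(len(short_digest), len(long_digest))
```

Because a 10,000-byte input and a 1-byte input both map to 64 hex characters, information is necessarily discarded, which is why there is no "unhash" operation.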

Enough reading, let's get onto some questions.

*****

## Question 1:  Hash Functions
This question reinforces the properties of hash functions and gives you practice at spotting what is wrong with a flawed hash function.

### Q1.1:  Hash Functions Properties
With the knowledge of the above properties (and properties discussed in the lectures), check out Alex's custom hashing functions and explain what is wrong with each of them.

#### Q1.1.1
What is wrong with the following hash function? Run it multiple times, and feel free to change the message.

In [6]:
def hash1(message):
    candidate_list = [chr(x) for x in range(ord('a'), ord('z') + 1)]  + [str(i) for i in range(0,9)]
    hashed = ''.join([random.choice(candidate_list) for i in range(8)])
    return hashed

message = 'this is a short test string'
print (hash1(message))

es4wgxaq


#### Q1.1.2
What is wrong with the following hash function? Run it multiple times, and feel free to change the message.

In [13]:
def hash2(message):
    candidate_list = [chr(x) for x in range(ord('a'), ord('z') + 1)]  + [str(i) for i in range(0,9)]
    return ''.join([candidate_list[hash(char)%35] for char in message])

message = 'this is a short test string'
print (hash2(message))

ojrq6rq616qj10o6oiqo6qo0rao


#### Q1.1.3
What is wrong with the following hash function? Run it multiple times, and feel free to change the message.

In [17]:
def hash3(message):
    candidate_list = [chr(x) for x in range(ord('a'), ord('z') + 1)]  + [str(i) for i in range(0,9)]
    splited = np.array_split(list(message), 8,axis=0)
    hashed = ''.join([ candidate_list[ sum([ord(char) for char in chunk])%35] for chunk in splited])
    return hashed

message = 'this is a short test string'
print (hash3(message))

ue70es4d


### Q1.2:  Real Hash Functions
Python is lovely with its packages, as we already have various hash functions to call and use.

You have a number of tasks. Firstly:
- Generate a hash of the message `this is a short test string` using the `md5`, `sha256`, and `blake2b` hash functions in the [`hashlib` library](https://docs.python.org/3.7/library/hashlib.html)

In [20]:
# Your Code here
# TODO, Import the hashlib library, and hash the message from the block above
from hashlib import sha256
from hashlib import md5
from hashlib import blake2b

msg = "this is a short test string"
# Hash the same message with each function; store the results under new names
# so that the imported blake2b function is not overwritten
md5_hash = md5(msg.encode("UTF-8")).hexdigest()
sha_hash = sha256(msg.encode("UTF-8")).hexdigest()
blake2b_hash = blake2b(msg.encode("UTF-8")).hexdigest()

print ("md5 hash: ", md5_hash)
print ("sha256 hash: ", sha_hash)
print ("blake2b hash: ", blake2b_hash)

md5 hash:  0af57bc31c06d32bb81e6c7171a1e51e
sha256 hash:  db44d7adb45d2edaffd11d29d1f83a44308c26978a76f6d13ba99c28fbccfe53
blake2b hash:  a13c9bff609acfac9225e0301b8dcbedd3715b29c6f5391a774dbda75997359f1ec45cda7fa63d1b88ab914bdc019717d39927d1f34cd2e17fa9fceee7e31759


Now that you have the functions imported, modify the message to convince yourself that the functions hold to the properties we spoke about above. For each hash function, show:
- The function is deterministic
- The function is pre-image resistant
- The function is collision resistant

In [None]:
# Your code here

### Q1.3:  MD5 Collision
Despite what you saw in the previous question, MD5 is widely considered to be insecure (various references exist for this, but for now we will use [this one](https://www.zdnet.com/article/a-quarter-of-major-cmss-use-outdated-md5-as-the-default-password-hashing-scheme/) and [this one](https://www.kb.cert.org/vuls/id/836068/)).

Informally, a hashing function is **considered insecure when**: an attack that finds a collision takes fewer than 2<sup>n/2</sup> tries (where n is the length of the hash output in bits).
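To get a feel for the 2<sup>n/2</sup> figure, here is a minimal sketch that shrinks the problem. It assumes a hypothetical 16-bit hash, built by keeping only the first 4 hex digits of SHA-256 (the names `truncated_hash` and `find_collision` are introduced here for illustration), and simply tries messages until two of them collide:

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, hex_chars: int = 4) -> str:
    """First `hex_chars` hex digits of SHA-256 (4 hex digits = 16 bits)."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]

def find_collision(hex_chars: int = 4):
    """Hash 'msg-1', 'msg-2', ... until two inputs share a truncated hash."""
    seen = {}  # truncated digest -> first message that produced it
    for tries in count(1):
        candidate = f"msg-{tries}".encode()
        digest = truncated_hash(candidate, hex_chars)
        if digest in seen:
            return seen[digest], candidate, tries
        seen[digest] = candidate

first, second, tries = find_collision()
print(first, second, tries)
```

With n = 16 there are 65,536 possible outputs, yet a collision typically turns up after a number of tries on the order of 2<sup>8</sup> = 256. This generic search works against any hash function; a function is considered broken when an attack does noticeably better than this.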

Now, we will demonstrate this. We have included two files in the `data` folder. Using the hashlib library, hash each file with the md5 function and compare the resulting hashes. Open the files in a text editor to verify they are different messages.

In [21]:
# Your Code here
import hashlib  # hash_file below calls hashlib.md5, so the module itself must be imported
def hash_file(file):
    with open(file, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()
file1 = './data/h1.ps'
file2 = './data/h2.ps'
print ("file 1: ",hash_file(file1))
print ("file 2: ",hash_file(file2))
hash_file(file1) == hash_file(file2) # Expecting to be True

file 1:  39d9fae4c5233948bf014bc52f453e93
file 2:  602de8f129e417a1e39e5ca55456b968


False

In [22]:
# If the two files above do not show a collision, you can try downloading letter_of_rec.ps and order.ps from the following resource
# and check whether those two files collide
# source: http://web.archive.org/web/20071226014140/http://www.cits.rub.de/MD5Collisions/
import hashlib
def hash_file(file):
    with open(file, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()
file1 = './data/letter_of_rec.ps'
file2 = './data/order.ps'
print (hash_file(file1))
print (hash_file(file2))
hash_file(file1) == hash_file(file2) # violates collision resistance property (you also need to check whether the messages are different as well)

# You can find more examples like this in the following resource
# https://crypto.stackexchange.com/questions/1434/are-there-two-known-strings-which-have-the-same-md5-hash-value

a25f7f0b29ee0b3968c860738533a4b9
a25f7f0b29ee0b3968c860738533a4b9


True

Based on the properties above, what properties are violated here? What are the security implications for this issue? How can you mitigate this issue if you were using md5 in a workplace?

*****
## Topic 2: Digital Signatures
Alex received his rental agreement the other day to sign, and he was unsure of whether the real estate agent would know if he had actually signed the document, or whether someone signed it on his behalf. He didn't have time to visit the agent physically to sign the document, so how could they communicate that he had signed it in a trusted manner?

Digital signatures are the answer!

Recall the digital signature method:

<img src="./img/dsig.png" alt="Dig Example" style="width: 400px;"/>
<sub> Source: By FlippyFlink - https://en.wikipedia.org/wiki/File:Private_key_signing.svg, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=83220480 </sub>

<br>
This has been covered in detail in the lectures, so we will get onto implementing some signature methods.
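As a warm-up, the sign-then-verify flow in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the lab solution: the key is the textbook-sized pair built from p = 61 and q = 53, the hash is reduced modulo `n` so that it fits this tiny key, and the names `digest_as_int`, `sign` and `verify` are hypothetical.

```python
import hashlib

# Textbook-sized RSA key (assumed for illustration): p = 61, q = 53
n, e, d = 3233, 17, 2753  # public key is (n, e), private key is (n, d)

def digest_as_int(message: str) -> int:
    """Hash the message and reduce it modulo n so it fits the toy key."""
    return int(hashlib.sha256(message.encode("utf-8")).hexdigest(), 16) % n

def sign(message: str) -> int:
    """Signer: transform the hash with the private exponent."""
    return pow(digest_as_int(message), d, n)

def verify(message: str, signature: int) -> bool:
    """Receiver: undo the signature with the public exponent and compare hashes."""
    return pow(signature, e, n) == digest_as_int(message)

agreement = "I, Alex, agree to the rental terms."
signature = sign(agreement)

print(verify(agreement, signature))            # genuine message and signature
print(verify(agreement, (signature + 1) % n))  # forged signature is rejected
# An altered message should also fail, unless its reduced hash happens to
# collide modulo n (a weakness of the toy key size, not of the scheme)
print(verify(agreement + " Rent is free.", signature))
```

Only the holder of `d` can produce a value that the public exponent maps back to the message hash, which is what lets the real estate agent trust that Alex signed the document.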

****
## Question 2: **Digital Signature**

In Question 2, you will be dealing with two methods for implementing digital signatures: RSA Digital Signatures, and Hash Functions.

### Q2.1: Digital Signature with RSA

We have already been introduced to RSA in the previous lab, so this functionality should be somewhat familiar. If this is not familiar, please revisit Lab 10. We will be modifying the RSA code we used in the last lab to implement digital signature verification through RSA.

Your task is as follows:
1. Modify the `encrypt()` function such that it receives the public key pair and a string, and encodes the string into a list of numbers (each number represents a character).
2. Modify the `decrypt()` function such that it receives the private key pair and a list of numbers, and converts the number list into the original string.

Hint: You can use ord(char) to convert a character into number in Python. 
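To make the hint concrete, here is a small sketch of the round trip for a single character. The key values are an assumption for illustration (the textbook pair built from p = 61 and q = 53), not keys produced by the lab code:

```python
# Toy key from small primes (assumed for illustration): p = 61, q = 53
n, e, d = 3233, 17, 2753

char = "A"
number = ord(char)            # character -> integer code point (65)
encoded = pow(number, e, n)   # modular exponentiation with one exponent
decoded = pow(encoded, d, n)  # the matching exponent undoes it

print(number, encoded, decoded, chr(decoded))
```

Python's three-argument `pow(base, exp, mod)` performs the modular exponentiation efficiently, and `chr()` is the inverse of `ord()`, so applying the same steps to every character of a string gives you the list of numbers the task asks for.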

In [23]:

def are_relatively_prime(a, b):
    """Return ``True`` if ``a`` and ``b`` are two relatively prime numbers.

    Two numbers are relatively prime if they share no common factors,
    i.e. there is no integer (except 1) that divides both.
    """
    for n in range(2, min(a, b) + 1):
        if a % n == b % n == 0:
            return False
    return True



def generate_key_pair(p,q):

    # Calculate N = p*q and r = (p-1)(q-1)
    N = p*q 
    r = (p-1)*(q-1) 
    
    # choose e as the smallest odd number such that 1 < e < r and e and r share no common factors.
    for cand in range(3, r, 2):
        if are_relatively_prime(cand, r):
            e = cand
            break
            
    # calculate number d, such that ed mod (r) = 1
    d = 0
    for cand in range(3, r, 2):
        if cand * e % r == 1:
            d = cand
            break     
    
    # return public key and private key pairs.
    public_key = (N,e)
    private_key = (N,d)
    return public_key, private_key

def encrypt(public_key, message):
    # FIX ME
    n, key = public_key
    return [pow(ord(char),key,n) for char in message]

def decrypt(private_key, message):
    # FIX ME
    # retrieve the original message
    n, key = private_key
    return  ''.join([chr(pow(char, key, n)) for char in message])



Run the below block to test!

In [25]:
def get_primes(start, stop):
    """Return a list of prime numbers in ``range(start, stop)``."""
    if start >= stop:
        return []

    primes = [2]

    for n in range(3, stop + 1, 2):
        for p in primes:
            if n % p == 0:
                break
        else:
            primes.append(n)

    while primes and primes[0] < start:
        del primes[0]

    return primes

    
def hashFunction(message):
    hashed = sha256(message.encode("UTF-8")).hexdigest()
    return hashed


primes_candidates = get_primes(1000,9999)
# Draw two distinct primes: the key generation above assumes p != q
p, q = random.sample(primes_candidates, 2)
public_key, private_key = generate_key_pair(p,q)
print ('keys:', public_key, private_key)
file_path = './data/script.txt'
with open('./data/script.txt','r', encoding='utf-8') as script:
    text = ''.join(script.readlines())

text = hashFunction(text)

encrypted = encrypt(private_key,text)
decrypted = decrypt(public_key,encrypted)

if decrypted != text:
    print ('Your decrypted signature does not equal the original hash')
else:
    print ('Your digital signature works perfectly!')

keys: (16575787, 3) (16575787, 11043387)
Your digital signature works perfectly!


### Q2.2: Digital Signature with Hash Functions 
While RSA is commonly used and considered (somewhat) secure, signing a whole document with it is quite a hassle compared to signing a short hash of the document. We aim to show this now.

Do the following with our completed RSA signature functions and hash functions in hashlib:
1. Create a text file, write something in it, and save it locally.
2. Compute the hash of the file and record it somewhere (write it down, create another file, etc).
3. Generate a key pair with the RSA algorithm.
4. Use your private key to sign the hash of the file.
5. Send the original file, the public key and the signature to your friend, and let them verify your signature.

In [26]:
# Code here
# The following code only sends across the signed hash of the message
# The original message also needs to be sent across to the recipient

primes_candidates = get_primes(1000,9999)
# Draw two distinct primes: the key generation above assumes p != q
p, q = random.sample(primes_candidates, 2)

#3. generate a key pair with rsa algorithm
public_key, private_key = generate_key_pair(p,q)  # generating the key pair for the sender
print ('keys:', public_key, private_key)
file_path = './data/script.txt'

#1. create a text file and write something, save it locally.
with open('./data/script.txt','r', encoding="utf8") as script:
    text = ''.join(script.readlines())
    
#text = 'a simple message'

#2. compute the hash of the file, record it in some place (write down, create another file, etc)
hashed_text = hashFunction(text) 
print(hashed_text)

#4. use your private key to sign the hash of the file.
encrypted = encrypt(private_key,hashed_text) #sign the hashed message with your private key

#5. send the original file, the public key and the signature to your friend, and let them verify your signature.
#THIS IS DONE BY THE RECEIVER
#can your friend verify your signature with the original file, the public key and the encrypted message (signature)?
decrypted = decrypt(public_key, encrypted) # this retrieves the hash of the file (and shows the signature was made with the sender's private key)

# now your friend should hash the original message and see if it matches with the decrypted message (i.e. the hash of the file)
hashed_text2 = hashFunction(text)

decrypted == hashed_text2 # this would verify the signature 


keys: (13042391, 5) (13042391, 2606957)
182e01077ed37d006c2c729b312bd3d5206c727195a226541556c616315e213a


True

*****

## Homework & Extension Questions

With the upcoming exam and time for revision, there will be no extra questions for this week. If you're looking for additional information, talk with your tutors. 

*****
## Resources

All the resources have been linked within each topic, so any extra resources can be found by following the link.