In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab05.ipynb")

# Lab 5: Hashing and Length-Extension Attacks
Contributions From: Ryan Cottone

Welcome to Lab 5! In this lab, you will construct a basic Merkle-Damgard hash function and implement a length-extension attack.

In [None]:
%%capture
import sys
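# pycryptodome is the maintained drop-in replacement for pycrypto (it provides the same `Crypto` package)
!{sys.executable} -m pip install pycryptodome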

In [None]:
from Crypto.Cipher import AES

## Hash Functions

A **hash function**, denoted $H(x)$, is a deterministic function that takes in an arbitrary amount of data and outputs a fixed-size value that appears random. This makes it useful as a "tag": it condenses a large amount of data into a short value that (almost) uniquely identifies it.
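
As a quick illustration of these properties, the sketch below uses SHA-256 from Python's standard `hashlib` module. SHA-256 is only a stand-in here, not the hash we build in this lab, and the helper name `sha256_hex` is made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_tag = sha256_hex(b"hi")
long_tag = sha256_hex(b"a" * 10_000)

# Deterministic: hashing the same input again gives the same tag.
assert sha256_hex(b"hi") == short_tag

# Fixed size: 256 bits = 64 hex characters, regardless of input length.
print(len(short_tag), len(long_tag))  # prints: 64 64

# Changing one character of the input gives an unrelated-looking tag.
print(sha256_hex(b"hello")[:16])
print(sha256_hex(b"hellp")[:16])
```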

Hash functions are often used for password storage: a website stores a hash of your password rather than the password itself. When a user tries to log in, the website compares $H($submitted_password$)$ to its database entry to see whether they match.

"Attacks" on hash functions usually involve finding a *collision*, that is, two values $m_1$ and $m_2$ such that $m_1 \neq m_2$ and $H(m_1) = H(m_2)$. Say an attacker had the hash of your password, $H(p)$, and was able to find some value $k$ such that $H(k) = H(p)$. They could submit your username and the password $k$ to the server, which would then accept it as the correct password! (Finding an input that matches a *given* hash, as in this example, is the harder variant of the problem, known as a preimage attack.)

We will see how often these collisions can occur later in the lab. For now, we will focus on building our very own hash function from scratch.
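
To get a feel for collisions before moving on, here is a minimal sketch that searches for one in a deliberately tiny hash. The function `tiny_hash` (the first 2 bytes of SHA-256) is an assumption made up for this demo. With only $2^{16}$ possible tags, a collision must show up within 65,537 distinct inputs, and in practice it appears much sooner.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> bytes:
    """Toy 16-bit hash for this demo only: the first 2 bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]

def find_collision():
    """Hash the messages b'0', b'1', b'2', ... until two share a tag."""
    seen = {}  # tag -> first message seen with that tag
    for i in count():
        msg = str(i).encode()
        tag = tiny_hash(msg)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg

m1, m2 = find_collision()
print(m1, m2, tiny_hash(m1).hex())
```

Real hash functions use far longer outputs precisely so that this kind of brute-force search is infeasible.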

## Merkle-Damgard Construction

The Merkle-Damgard construction is a general method for building hash functions that operate in a "block" mode. We first split our data into equal-size blocks (in our case, 128 bits), and then process the data one block at a time.

For a Merkle-Damgard hash, we need a *compression function* $f(v, m)$ that takes in two equal-size inputs and produces an output of the same size (compressing two 128-bit inputs into one 128-bit output). At each step, we compress the previous result together with the next block of data.

Denote the internal state of the hash function after $i$ blocks as $h_i$. To derive $h_i$, we compute $f(h_{i-1}, m_i)$, where $m_i$ is the $i$-th block of the input data. We repeat this, compressing each new block of input against the previous result of the compression function. Once we run out of input data, we output our final $h$.

![test](https://www.researchgate.net/publication/322094216/figure/fig3/AS:608783873105920@1522156794791/Merkle-Damgard-construction-A-compression-function-accepts-two-inputs-a-chaining.png)

The initial $h_0$ is called the Initialization Vector (IV), and is some fixed, public value. 
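
The chaining rule $h_i = f(h_{i-1}, m_i)$ can be sketched on small integers. Everything in this block is a toy assumption for illustration: the 8-bit compression function `toy_compress`, the IV of 7, and "blocks" that are plain integers rather than 128-bit byte strings. It is not secure and is not the compression function used in this lab.

```python
def toy_compress(h: int, m: int) -> int:
    """Hypothetical 8-bit compression function (illustration only, not secure)."""
    return (h * 31 + m) % 256

def toy_md_hash(blocks, iv=7):
    h = iv                       # h_0 is the fixed, public IV
    for m in blocks:             # h_i = f(h_{i-1}, m_i)
        h = toy_compress(h, m)
    return h                     # the final internal state is the hash

# h_0 = 7, h_1 = 218, h_2 = 104, h_3 = 155
print(toy_md_hash([1, 2, 3]))  # prints: 155
```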

**Question 1.1**: Implement a function that splits the input data into blocks, padding the final block with ASCII `'0'` characters if needed.

In [None]:
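BLOCK_SIZE = 16  # block size in bytes (128 bits)

def processInput(data):
    # Turn data into a list of 16-byte (128-bit) blocks, each a bytes object
    data = data.replace(" ", "")  # spaces are stripped before hashing
    byteData = bytes(data, 'UTF-8')

    # Pad the final block with ASCII '0' characters (byte 0x30) up to a full block
    if len(byteData) % BLOCK_SIZE != 0:
        byteData += (BLOCK_SIZE - (len(byteData) % BLOCK_SIZE)) * b'0'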
    
    blocks = []
    i = 0
    
    while i < len(byteData):
        ...
        ...
    
    return blocks

In [None]:
grader.check("q1_1")

Here is a dummy compression function we will use just for demonstration:

In [None]:
def bxor(b1, b2):  # XOR two byte strings element-wise
    result = bytearray()
    for x, y in zip(b1, b2):
        result.append(x ^ y)
    return result

# Returns the AES-ECB encryption of plaintext under key (both 16-byte bytestrings)
def encrypt(key,plaintext):
    if len(key) != 16 or len(plaintext) != 16:
        raise AssertionError("Key and plaintext must be 16 bytes long")
    
    cipher = AES.new(key, AES.MODE_ECB)
    
    return cipher.encrypt(plaintext)

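# Davies-Meyer-style compression: encrypt the state using the data block
# as the AES key, then XOR the result with the state.
def compress(state, data):
    return bytes(bxor(state, encrypt(data, state)))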

**Question 1.2**: Implement a Merkle-Damgard hash function!

In [None]:
def merkleDamgardHash(data):
    
    # Split data into blocks
    dataBlocks = ...
    
    IV = b'DEADBEEF' * 2
    
    h = IV    
    
    for i in range(len(dataBlocks)):
        h = ...
        
    return h.hex()

In [None]:
grader.check("q1_2")

## Length-Extension Attacks

Merkle-Damgard based hash functions are often susceptible to *length-extension attacks*, in which an attacker can take an existing hash $H(x)$ and compute $H(x||n)$ for some arbitrary data $n$, without knowing $x$. This is because the output of our Merkle-Damgard hash is just the final internal state, not some output separable from the state. We can just pick up where we left off and treat $H(x)$ as $h_k$, and then proceed to compute the hash function like usual with new data blocks ($h_{k+1} = f(h_k, n_1)$, etc).
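
The idea can be checked on a toy hash. The sketch below uses a made-up 8-bit compression function and integer "blocks" (both assumptions for illustration, not the lab's AES-based construction). The attacker function `toy_extend` sees only the published hash and the new blocks, never the secret.

```python
def toy_compress(h: int, m: int) -> int:
    """Hypothetical 8-bit compression function (illustration only, not secure)."""
    return (h * 31 + m) % 256

def toy_md_hash(blocks, iv=7):
    h = iv
    for m in blocks:
        h = toy_compress(h, m)
    return h

def toy_extend(existing_hash, extra_blocks):
    """Resume hashing from a known hash value instead of from the IV."""
    return toy_md_hash(extra_blocks, iv=existing_hash)

secret_blocks = [42, 99]   # known only to the victim
extra_blocks = [5, 6]      # chosen by the attacker

published = toy_md_hash(secret_blocks)
forged = toy_extend(published, extra_blocks)

# The forged value equals the honest hash of the extended message.
print(forged == toy_md_hash(secret_blocks + extra_blocks))  # prints: True
```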

A problem occurs with padding, however, if the length of the original message is not a multiple of 128 bits. While there are ways to get around this, for now, we will use a 'convenient' original message that's 128 bits long.
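
To see why padding gets in the way, the sketch below mimics this lab's padding convention (ASCII `'0'` characters up to a multiple of 16 bytes) in a standalone helper called `pad`, a name made up for this example. When the original message is not block-aligned, the bytes that were actually hashed include the padding, so continuing from $H(x)$ yields the hash of $x$, then the padding, then $n$, rather than $H(x||n)$.

```python
BLOCK_SIZE = 16  # bytes, matching the lab's block size

def pad(data: bytes) -> bytes:
    """Pad with ASCII '0' characters up to a multiple of BLOCK_SIZE."""
    return data + b"0" * (-len(data) % BLOCK_SIZE)

aligned = b"A" * 16     # exactly one block
unaligned = b"A" * 10   # needs 6 bytes of padding
extra = b"B" * 16

# Aligned: the blocks of x followed by the blocks of n are the blocks of x || n.
print(pad(aligned) + pad(extra) == pad(aligned + extra))      # prints: True

# Unaligned: the padding sits between x and n, so the block sequences differ.
print(pad(unaligned) + pad(extra) == pad(unaligned + extra))  # prints: False

# What the attacker really extends is x || padding || n.
print(pad(unaligned) + pad(extra) == pad(unaligned + b"0" * 6 + extra))  # prints: True
```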

In [None]:
def lengthExtensionAttack(existingHash, extraData):
    # Compute H(x || extraData) given H(x)
    
    dataBlocks = processInput(extraData) 
        
    existingHash = bytes.fromhex(existingHash)

    h = ...
    
    for i in range(len(dataBlocks)):
        h = ...
    
    return h.hex()

In [None]:
grader.check("q2_1")

Now that we have a length-extension attack set up, let's try to use it to forge a digital signature! While we will cover signatures in more detail next week, let's assume for now that Alice has sent her bank a request like m = "send twelve thousand dollars to alice" alongside a **signature** $H(k||m)$. For the purposes of our example, $H(k||m)$ uses some secret $k$, so one must know $k$ to compute $H(k||m)$ directly.

We as the attacker don't know $k$, but want to add the data " and bob" to the end of the message and produce a matching signature, so we can pass off our new malicious message as legitimate.

Alice's old signature was **8b9eca2ac997048e6687f8d8887482e1**. Figure out a way to make a new valid signature on the message we want!

**Question 2.2**: Figure out the signature for the message "send twelve thousand dollars to alice and bob"

In [None]:
oldsignature = '8b9eca2ac997048e6687f8d8887482e1'

newsignature = ...

print('Our new signature is:', newsignature)

In [None]:
grader.check("q2_2")

Congrats on finishing Lab 5!

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Once you have generated the zip file, go to the Gradescope page for this assignment to submit.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)