# Prerequisites
- Hashes

# Theory

- https://medium.com/@StevieCEllis/the-beautiful-hash-algorithm-f18d9d2b84fb - good read

**Task**
- Construct collision resistant hash functions

*Idea*
- Start from a short collision resistant function
- Build a bigger collision resistant function

## Merkle-Damgard paradigm

### The construction

![image.png](attachment:image.png)

Let 
- $h:\mathcal{X}\times \mathcal{Y} \longrightarrow \mathcal{X}$ be a compression hash function
    - It is called a *compression* function because it maps a longer input ($\mathcal{X}\times\mathcal{Y}$) to a shorter output ($\mathcal{X}$)
    - Usually $\mathcal{X} = \{0,1\}^n$ and $\mathcal{Y} = \{0,1\}^l$
- $IV$ = initial value and it's **fixed**
- $m_1, \dots, m_s$ = message blocks in $\mathcal{Y}$
- $t_0, \dots, t_s$ = chaining variables in $\mathcal{X}$
- $PB$ = padding block

**Merkle-Damgard Construction**
- Pad $M$ using a padding scheme => $M' = M || PB$
- Divide $M'$ into $s$ blocks of $l$ bits each (the padding makes the length of $M'$ a multiple of $l$)
    - $M' = m_1 || m_2 ||...|| m_s$ where $m_i \in \{0,1\}^l$
- $t_0 = IV \in \mathcal{X}$
- for $i=1$ to $s$
    - $t_i = h(t_{i-1}, m_i)$
- return $t_s \in \mathcal{X}$
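The loop above can be sketched in Python as follows. This is a toy illustration, assuming an already padded message and a hypothetical compression function `toy_h` obtained by truncating SHA-256 to 64 bits; the sizes `N_BYTES`/`L_BYTES` and the all-zero `IV` are arbitrary choices, not SHA256's.

```python
import hashlib

N_BYTES = 8          # toy chaining-variable size: n = 64 bits (arbitrary)
L_BYTES = 16         # toy block size: l = 128 bits (arbitrary)
IV = bytes(N_BYTES)  # fixed initial value, all zeros (arbitrary choice)

def toy_h(t: bytes, m: bytes) -> bytes:
    """Hypothetical compression function h: X x Y -> X (truncated SHA-256)."""
    assert len(t) == N_BYTES and len(m) == L_BYTES
    return hashlib.sha256(t + m).digest()[:N_BYTES]

def merkle_damgard(padded: bytes, h=toy_h, iv: bytes = IV) -> bytes:
    """Iterate h over the blocks m_1..m_s of an already padded message."""
    if len(padded) % L_BYTES:
        raise ValueError("message must be padded to a multiple of the block size")
    t = iv                                    # t_0 = IV
    for i in range(0, len(padded), L_BYTES):
        m_i = padded[i:i + L_BYTES]
        t = h(t, m_i)                         # t_i = h(t_{i-1}, m_i)
    return t                                  # t_s

blocks = b"A" * L_BYTES + b"B" * L_BYTES      # two blocks, m_1 and m_2
print(merkle_damgard(blocks).hex())
```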

We denote the Merkle-Damgard function derived from $h$ by $H_{MD}$.

**Example**
- SHA256 is a Merkle-Damgard function where $l = 512$ and $n = 256$
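Python's `hashlib` reports both sizes (in bytes) for its SHA-256 object, which gives a quick check of $l$ and $n$:

```python
import hashlib

sha = hashlib.sha256()
# digest_size is n/8, block_size is l/8
print(sha.digest_size * 8, sha.block_size * 8)  # 256 512
```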

### The padding block

Let's look at $PB$ - the padding block

Format
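- $PB = 100\ldots00 || \text{len}(M)$
- $\text{len}(M)$ = the length of the original message $M$ (in bits for SHA256)
    - Usually the length field is 64 bits => only messages shorter than $2^{64}$ bits are accepted
- The leading $1$ marks the position where the message ends and the padding begins
- If the last message block has no room left for $PB$, an extra block is added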
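A sketch of this padding for byte-aligned messages, following the SHA256 layout (the 1 bit, zeros, then the message length in bits as a 64-bit big-endian integer). `md_pad` is our own helper name; since the message is whole bytes, the 1 bit and the next seven 0 bits form the byte `0x80`.

```python
def md_pad(message: bytes, block_bytes: int = 64) -> bytes:
    """SHA256-style padding: M || 1 || 0...0 || <64-bit length of M in bits>."""
    bit_len = 8 * len(message)
    if bit_len >= 1 << 64:
        raise ValueError("message too long for a 64-bit length field")
    # number of zero bytes so that message + 0x80 + zeros + 8-byte length
    # is a multiple of block_bytes
    zeros = (-(len(message) + 1 + 8)) % block_bytes
    return message + b"\x80" + b"\x00" * zeros + bit_len.to_bytes(8, "big")

print(len(md_pad(b"a" * 55)), len(md_pad(b"a" * 56)))  # 64 128
```

With 64-byte blocks, a 55-byte message still fits in one block, while a 56-byte message needs two: after 56 bytes there is no room left for `0x80` plus the 8-byte length.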

### Security

**Theorem**
> If $h$ is collision resistant then the Merkle-Damgard hash function $H_{MD}$ derived from $h$ is collision resistant

**Proof**
- We show that any collision on $H_{MD}$ yields a collision on $h$
- Suppose the attacker finds $M \neq M'$ s.t. $H_{MD}(M) = H_{MD}(M')$
    - Let the padded $M$ be $m_1 \dots m_u$ with chaining variables $t_0, \dots, t_u$
    - Let the padded $M'$ be $m'_1 \dots m'_v$ with chaining variables $t'_0, \dots, t'_v$ (where $t_0 = t'_0 = IV$)
- We work from the last block backwards
    - Since $H_{MD}(M) = H_{MD}(M')$, we have $t_u = t'_v$, i.e. $h(t_{u-1}, m_u) = h(t'_{v-1}, m'_v)$
    - If $t_{u-1} \neq t'_{v-1}$ or $m_u \neq m'_v$, then we have a collision on $h$ -> Finished
    - Else $t_{u-1} = t'_{v-1}$ and $m_u = m'_v$. The last blocks contain the padding, which encodes the message length, so $|M| = |M'|$ => $u = v$
- Since $t_{u-1} = t'_{u-1}$, i.e. $h(t_{u-2}, m_{u-1}) = h(t'_{u-2}, m'_{u-1})$, we repeat the process
    - If we find a collision -> Finished
- Suppose we never find a collision => $m_i = m'_i$ for all $1\leq i \leq u$ => the padded messages are equal => $M = M'$, which contradicts $M \neq M'$
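The backward walk in the proof is constructive, so it can actually be run. A sketch, assuming messages are already padded into lists of integer blocks whose last block encodes the length; `weak_h` is a deliberately weak, hypothetical compression function chosen so that a collision is easy to write down.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[int, int]

def chaining_values(h: Callable[[int, int], int], iv: int, blocks: List[int]) -> List[int]:
    """Return [t_0, t_1, ..., t_s] for the padded blocks m_1..m_s."""
    ts = [iv]
    for m in blocks:
        ts.append(h(ts[-1], m))
    return ts

def h_collision_from_md(h, iv, blocks1, blocks2) -> Optional[Tuple[Pair, Pair]]:
    """Walk backwards from the last block as in the proof.

    Returns two distinct inputs (t, m) != (t', m') with h(t, m) == h(t', m'),
    or None if the walk reaches t_0 without finding one (the length padding
    rules this out for distinct messages).
    """
    ts1 = chaining_values(h, iv, blocks1)
    ts2 = chaining_values(h, iv, blocks2)
    if ts1[-1] != ts2[-1]:
        raise ValueError("inputs are not an H_MD collision")
    i, j = len(blocks1), len(blocks2)
    while i > 0 and j > 0:                    # invariant: ts1[i] == ts2[j]
        x1 = (ts1[i - 1], blocks1[i - 1])
        x2 = (ts2[j - 1], blocks2[j - 1])
        if x1 != x2:
            return x1, x2                     # distinct inputs, same output
        i, j = i - 1, j - 1                   # equal inputs => previous t's are equal
    return None

def weak_h(t: int, m: int) -> int:
    """Hypothetical, deliberately weak 8-bit compression function."""
    return (31 * t + m) % 256

LEN_BLOCK = 2                                 # stands in for the length/padding block
msg1 = [0, 31, LEN_BLOCK]
msg2 = [1, 0, LEN_BLOCK]
print(h_collision_from_md(weak_h, 0, msg1, msg2))  # ((0, 31), (1, 0))
```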
    

## Constructing Compression functions

### Davies-Meyer

![image.png](attachment:image.png)

Block ciphers are secure and fast primitives, so we use a block cipher to build our compression function
- Let $(E,D)$ be a block cipher defined over $(\mathcal{K}, \mathcal{X})$
- Let $h_{DM}:\mathcal{X} \times \mathcal{K} \rightarrow \mathcal{X}$ be our compression function
- $h_{DM}(x,y) = E(y,x) \oplus x$
- Setting
    - $x = t_i \in \mathcal{X}$ -> the chaining variable is the input
    - $y = m_i \in \mathcal{K}$ -> the message is the key
    - $\boxed{h_{DM}(t_i,m_i) = E(m_i,t_i) \oplus t_i} \in \mathcal{X}$
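A sketch of $h_{DM}$ chained Merkle-Damgard style. There is no block cipher in the Python standard library, so `toy_E`/`toy_D` are a hypothetical 32-bit invertible permutation family standing in for $(E, D)$; they are **not** secure and only show the data flow (message block as key, chaining variable as plaintext, XOR feed-forward).

```python
MASK = 0xFFFFFFFF
MUL = 0x9E3779B1                  # odd constant, hence invertible mod 2**32
MUL_INV = pow(MUL, -1, 1 << 32)   # modular inverse (Python 3.8+)

def toy_E(k: int, x: int) -> int:
    """Toy 32-bit 'block cipher': invertible for each key k, NOT secure."""
    return (((x ^ k) * MUL) + k) & MASK

def toy_D(k: int, y: int) -> int:
    """Inverse of toy_E under the same key."""
    return (((y - k) * MUL_INV) & MASK) ^ k

def h_dm(t: int, m: int) -> int:
    """Davies-Meyer: message block m is the key, chaining variable t the plaintext."""
    return toy_E(m, t) ^ t

t = 0  # hypothetical IV
for m in [0x61626364, 0x65666768, 0x00000040]:   # three 32-bit message blocks
    t = h_dm(t, m)                               # t_i = E(m_i, t_{i-1}) XOR t_{i-1}
print(hex(t))
```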


There are several ways to combine $E$, the chaining variable and the message block with XOR; some of them are secure and some are not
- https://en.wikipedia.org/wiki/One-way_compression_function
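As an example of an insecure variant: dropping the feed-forward, i.e. $h(t, m) = E(m, t)$, is not collision resistant, because anyone can decrypt. Pick any $(t, m)$ and any other key $m'$, and set $t' = D(m', E(m, t))$; then $E(m', t') = E(m, t)$. A sketch of this attack with a toy invertible cipher (hypothetical `toy_E`/`toy_D`, not secure):

```python
MASK = 0xFFFFFFFF
MUL = 0x9E3779B1                  # odd, hence invertible mod 2**32
MUL_INV = pow(MUL, -1, 1 << 32)   # Python 3.8+

def toy_E(k, x):   # toy invertible 32-bit cipher, NOT secure
    return (((x ^ k) * MUL) + k) & MASK

def toy_D(k, y):   # inverse of toy_E under key k
    return (((y - k) * MUL_INV) & MASK) ^ k

def h_naive(t, m):
    """Insecure variant without feed-forward: h(t, m) = E(m, t)."""
    return toy_E(m, t)

def naive_collision(t, m, m2):
    """For any (t, m) and any other key m2, solve t2 = D(m2, E(m, t))."""
    t2 = toy_D(m2, h_naive(t, m))
    return (t, m), (t2, m2)

a, b = naive_collision(0x12345678, 1, 2)
print(h_naive(*a) == h_naive(*b))  # True
```

In $h_{DM}$ the same trick does not give a collision: the output becomes $E(m', t') \oplus t' = E(m, t) \oplus t'$, which differs from $h_{DM}(t, m)$ unless $t' = t$.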

**Theorem**
> If $(E,D)$ is an *ideal* block cipher, then finding a collision for $h_{DM}$ takes $O(\sqrt{|\mathcal{X}|})$ evaluations of $E$ or $D$. This matches the generic birthday attack, so it is the best possible
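The $\sqrt{|\mathcal{X}|}$ figure is the generic birthday bound. A sketch of that attack on a 24-bit output, assuming the top 24 bits of SHA-256 behave like an ideal function; with $|\mathcal{X}| = 2^{24}$ a repeat is expected after roughly $2^{12}$ evaluations (for SHA256 itself, $\sqrt{2^{256}} = 2^{128}$ is far out of reach).

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bits: int) -> int:
    """Stand-in for an ideal n-bit function: the top n_bits of SHA-256 (assumption)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - n_bits)

def birthday_collision(n_bits: int):
    """Hash distinct inputs until an output repeats; about 2**(n_bits/2) tries expected."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        d = truncated_hash(msg, n_bits)
        if d in seen:
            return seen[d], msg, i + 1
        seen[d] = msg

m1, m2, tries = birthday_collision(24)
print(m1, m2, tries)
```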

### Hard problems

We can also build compression functions based on hard problems
- Security reduces to the hardness of the problem => finding a collision means solving an instance of the underlying hard problem
- These functions are usually much slower than block-cipher based ones, so they are rarely used in practice
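A classic example is $h(x, y) = g^x u^y \bmod p$ on $\mathbb{Z}_q \times \mathbb{Z}_q$, where $g$ generates a subgroup of prime order $q$ and $u$ is another element of it: any collision reveals the discrete log of $u$ to base $g$, since $g^{x_1} u^{y_1} = g^{x_2} u^{y_2}$ gives $\log_g u = (x_1 - x_2)/(y_2 - y_1) \bmod q$. A sketch with toy parameters (far too small to be secure; `secret` is only there so the result can be checked):

```python
# Toy parameters: p = 2q + 1 with p and q prime.
p, q = 23, 11
g = 4                 # generates the subgroup of order q in Z_p^*
secret = 7            # hypothetical discrete log, unknown in a real setting
u = pow(g, secret, p)

def h_dlog(x: int, y: int) -> int:
    """Compression function h(x, y) = g^x * u^y mod p."""
    return (pow(g, x, p) * pow(u, y, p)) % p

def find_collision():
    """Brute force, feasible only because q is tiny: q*q inputs, q outputs."""
    seen = {}
    for x in range(q):
        for y in range(q):
            v = h_dlog(x, y)
            if v in seen:
                return seen[v], (x, y)
            seen[v] = (x, y)

def dlog_from_collision(c1, c2):
    """g^x1 u^y1 = g^x2 u^y2  =>  log_g(u) = (x1 - x2) / (y2 - y1) mod q."""
    (x1, y1), (x2, y2) = c1, c2
    return ((x1 - x2) * pow((y2 - y1) % q, -1, q)) % q

c1, c2 = find_collision()
print(c1, c2, dlog_from_collision(c1, c2))  # (0, 8) (1, 0) 7
```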

## SHA256

- Merkle-Damgard construction
- Davies-Meyer compression function $h$
- $h:\{0,1\}^{256} \times \{0,1\}^{512} \longrightarrow \{0,1\}^{256}$
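    - The block cipher is SHACAL-2 (256-bit block, 512-bit key)
    - The message is split into 512-bit blocks; each block is the key of one SHACAL-2 call
    - $IV$ = `6A09E667 BB67AE85 3C6EF372 A54FF53A 510E527F 9B05688C 1F83D9AB 5BE0CD19`
    - Instead of XOR, SHA256 uses addition mod $2^{32}$, word by word on the eight 32-bit words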
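The IV and the 64 round constants are not arbitrary: FIPS 180-4 defines them as the first 32 bits of the fractional parts of the square roots of the first 8 primes (IV) and of the cube roots of the first 64 primes (round constants). A sketch that recomputes them with exact integer arithmetic; `first_primes` and `icbrt` are our own helpers.

```python
from math import isqrt

MASK = 0xFFFFFFFF

def first_primes(count: int) -> list:
    """Return the first `count` primes by trial division (fine for small counts)."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes

def icbrt(n: int) -> int:
    """Largest r with r**3 <= n (binary search on integers)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# floor(sqrt(p) * 2**32) == isqrt(p << 64); its low 32 bits are the fractional part
IV = [isqrt(p << 64) & MASK for p in first_primes(8)]
# floor(cbrt(p) * 2**32) == icbrt(p << 96)
K = [icbrt(p << 96) & MASK for p in first_primes(64)]

print([hex(w) for w in IV])
print(hex(K[0]), hex(K[-1]))
```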

# Code

## SHA256

- Merkle-Damgard construction
- Davies-Meyer compression function $h$
- $h:\{0,1\}^{256} \times \{0,1\}^{512} \longrightarrow \{0,1\}^{256}$
    - SHACAL-2 block cipher
        - https://crypto.stackexchange.com/questions/25233/shacal-2-vs-aes-as-underlying-block-cipher-for-secure-hash-aka-sha-256
    - Splits the message into 512 bit blocks
    - $IV =6A09E667 BB67AE85 3C6EF372 A54FF53A 510E527F 9B05688C 1F83D9AB 5BE0CD19$
    - Instead of XOR, SHA256 uses adition mod $2^{32}$

![image.png](attachment:image.png)

```python
from Crypto.Util.number import long_to_bytes, bytes_to_long  # pycryptodome
import hashlib
```

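```python
def shr(x, n):
    # SHR^n: right shift of a 32-bit word
    return (x & 0xffffffff) >> n

def rotr(x, n):
    # ROTR^n: right rotation of a 32-bit word
    return ((x >> n) | (x << (32 - n))) & 0xffffffff

def ch(x, y, z):
    # Ch(x,y,z) = (x∧y)⊕(¬x∧z): bitwise, x chooses y (bit 1) or z (bit 0)
    return z ^ (x & (y ^ z))

def maj(x, y, z):
    # Maj(x,y,z) = (x∧y)⊕(x∧z)⊕(y∧z): bitwise majority
    return (x & y) ^ (x & z) ^ (y & z)

# σ0, σ1: used in the message schedule (round keys)
def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ shr(x, 3)

def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ shr(x, 10)

# Σ0, Σ1: used in the round function
def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

def split_blocks(x, block_size, block_nr):
    '''Split the int x into block_nr blocks of block_size bits, most significant block first'''
    res = []
    mask = (1 << block_size) - 1
    for i in range(block_nr):
        r = (x >> (block_size * i)) & mask
        res.insert(0, r)  # blocks are extracted from the least significant end, so insert in front
    return res
```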

```python
x = 0x6A09E667BB67AE853C6EF372A54FF53A510E527F9B05688C1F83D9AB5BE0CD19
y = split_blocks(x, 32, 8)
print([hex(i) for i in y])
```

```
['0x6a09e667', '0xbb67ae85', '0x3c6ef372', '0xa54ff53a', '0x510e527f', '0x9b05688c', '0x1f83d9ab', '0x5be0cd19']
```


Notation
- `x` = an int, `xs` = the int split into blocks / words
    - `t` = the chaining variable, `ts` = t split into 8 words of 32b
    - `m` = a message block, `ms` = m split into 16 words of 32b
    - `M` = the message, `Ms` = the message split into blocks of 512b
    - `Ws` = the round keys (message schedule) as 64 words of 32b
    - `Ks` = the 64 round constants of 32b

```python
class SHA256:
    def __init__(self):
        self.mask = 0xffffffff
        # IV as 8 words of 32b
        self.iv = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
        # round constants K_0..K_63
        self.Ks = [
                   0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
                   0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
                   0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
                   0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
                   0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
                   0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
                   0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
                   0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
                ]
        self.word_size = 32
        self.block_size = 512

    def hash(self, M_as_bytes):
        '''
        Takes a message M of arbitrary length, pads it and sends it to the Merkle-Damgard construction along with the iv
        Returns the hash as int
        '''
        M = bytes_to_long(M_as_bytes)
        # take the length from the bytes, not from M: bytes_to_long drops leading zero bytes
        M_bits = 8 * len(M_as_bytes)
        M_pad, s = self.pad2(M, M_bits, self.block_size)
        ts = self.merkle_damgard_construction(M_pad, s, self.iv)
        t = 0
        for i, ti in enumerate(ts):
            t |= ti << (self.word_size * (7 - i))  # concatenate the 8 words
        return t

    def merkle_damgard_construction(self, M_pad, s, ts_init: list):
        '''
        Input: M_pad - padded message ; s - number of 512b blocks ; ts_init - the initial 256b chaining variable as a list of 8 32b words
        Output: ts - final chaining variable as a list of 8 32b words
        '''
        ts = ts_init
        Ms = split_blocks(M_pad, self.block_size, s)  # split the message into s blocks of 512b
        for m in Ms:
            ms = split_blocks(m, self.word_size, 16)  # split the message block into 16 32b words
            ts = self.davies_meyer_function(ms, ts)
        return ts

    def davies_meyer_function(self, ms: list, ts: list):
        '''Input: ms - the message block as 16 32b words; ts - the chaining variable as 8 32b words'''
        es = self.encrypt_block(ms, ts)
        # feed-forward: word-wise addition mod 2^32 instead of XOR
        return [(ei + ti) & self.mask for ei, ti in zip(es, ts)]

    def get_round_keys(self, ks: list):
        '''Message schedule: expand the 16 key words into 64 round keys'''
        Ws = [0] * 64
        # copy the first 16
        Ws[:16] = ks[:16]
        # extend the rest
        for i in range(16, 64):
            Ws[i] = (sigma1(Ws[i-2]) + Ws[i-7] + sigma0(Ws[i-15]) + Ws[i-16]) & self.mask
        return Ws

    def encrypt_block(self, ks: list, ts: list):
        '''The SHACAL-2 block cipher: key = message block (16 words), plaintext = chaining variable (8 words)'''
        Ws = self.get_round_keys(ks)
        a, b, c, d, e, f, g, h = ts
        for i in range(64):
            T1 = (h + big_sigma1(e) + ch(e, f, g) + self.Ks[i] + Ws[i]) & self.mask
            T2 = (big_sigma0(a) + maj(a, b, c)) & self.mask
            a, b, c, d, e, f, g, h = (T1 + T2) & self.mask, a, b, c, (d + T1) & self.mask, e, f, g
        return a, b, c, d, e, f, g, h

    def pad2(self, M, M_bits, block_size):
        '''
        Input: M - the message as int ; M_bits - its length in bits
        Output: (M || 1 || 0...0 || 64b length, number of blocks)
        '''
        res = (M << 1) | 1  # append the 1 bit
        # free bits left in the last block after M and the 1 bit
        last_block_space = block_size - M_bits % block_size - 1
        # make space for the 64b length field if necessary
        if last_block_space < 64:
            last_block_space += block_size
        res = (res << last_block_space) | M_bits  # zeros, then the bit length in the last 64 bits
        s = (M_bits + 1 + last_block_space) // block_size
        return res, s
```

```python
sha256 = SHA256()
msg = b'hello world'
h = sha256.hash(msg)
h, int(hashlib.sha256(msg).hexdigest(), 16)
```

```
(83814198383102558219731078260892729932246618004265700685467928187377105751529,
 83814198383102558219731078260892729932246618004265700685467928187377105751529)
```

```python
m1 = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
# compare against hashlib on short, multi-block, empty and non-ASCII messages
test_list = [m1, m1 * 7, '', 'hello⊕', u'\uE52D', m1 * 999]
for m in test_list:
    to_enc = m.encode()
    h1 = sha256.hash(to_enc)
    h2 = int(hashlib.sha256(to_enc).hexdigest(), 16)
    print(h1 == h2)
```

```
True
True
True
True
True
True
```


# Resources

- https://en.wikipedia.org/wiki/SHA-2
- https://www.youtube.com/watch?v=DMtFhACPnTY&t