# Cryptography (CC4017) -- Week 5

### Extra 5

**Q1: Collision resistant Hash Functions**

Consider $ H : M → T $ a collision resistant hash function that takes messages of any size $ m ∈ M = \{0, 1\}^∗$
and produces outputs with 64 bit length $t ∈ T = \{0, 1\}^64$.
1. $ H´ = (H(m) || H(m) || H(m)) $
2. $ H´ = H(m || m || m) $
3. $ H´ = H(64) $ 
4. $ H´ = H(m||64) $
5. $ H´ = H(m)[0 . . . 10] $ // truncate the output to 10 bits
6. $ H´ = H(m[0 . . . |m|-2]) $ // hash m without its last bit
7. $ H´ = H(m) || H(m ⊕ 1^{|m|})$
8. $ H´ = H(m) \ if \ m = 0^{64} ∧ m = 1^{64},\ H(m ⊕ 1|m| )$ otherwise

**Question:** 
Which of the proposed hash constructions H' are also collision resistant?
>
>1. It's Collision Resistant. This simply concatenates three instances of H(m)H(m). If HH is collision-resistant, finding a collision for H′H′ requires finding a collision for HH
>
>2. It's Collision Resistant. The message m is repeated three times before hashing. However, if a collision exists for H, it would still hold in this construction since the input format doesn’t fundamentally change the collision properties
>
>3. Isn't Collision Resistant. This is a constant input to the hash function H. It doesn't depend on m, so every input produces the same hash output, making it trivially non-collision-resistant.
>
>4. It's Collision Resistant. This construction appends the constant 64 to m before hashing. Doesn't do much, if H is collision-resistant the function H' keep the property.
>
>5. Isn't Collision Resistant. By truncating the output of H to 10 bits, we significantly reduce the output space, making it much easier to find collisions (by the pigeonhole principle). 
>
>6. Isn't Collision Resistant. While this might preserve some security properties, the modification of the input could introduce vulnerabilities, especially for short inputs. If two distinct messages differ only in their last bit, they would collide in H′
>
>7. It's Collision Resistant. In this m and m XORed with all ones (i.e., bitwise negation of m) are both hashed and concatenated. This construction depends on the distinctness of $H(m)$ and $H(m ⊕ 1^{∣m∣})$, which is likely robust if H is collision-resistant.
>
>8. It's Collision Resistant. This conditional hashing could introduce weaknesses if these cases overlap or behave inconsistently, but it likely remains collision-resistant if H is robust.


**Q2: Rho method to find Hash collisions**

![alt text](image.png)

As described in [1], the Rho method is an algorithm for finding collisions that, unlike the naive birthday attack, requires only a small amount of memory. To find collision in hash function H(m), it works as
follows.
1. Given a hash function with n-bit values, pick some random hash value h1 and define $h'_1 = h_1$.

2. Compute $h_2 = H(h_1)$ and $h'_2 = H(H(h'_1))$. 
In the first case, we apply the hash function once. In the second, we apply it twice.

3. Iterate the process and compute $h_{i+1} = H(h_i)$ and $h'_{i+1} = H(H(h'_i))$, until you reach a i such that $h'_{i+1} = h_{i+1}$

4. If this is the case, then you have found a loop within the possible hash values. How can we find the collision now? Check out this proof.

Complete the code in **rho_exercise.py** to do this.

• You must complete **function rho**, which is parametrized by an initial value

• Function H computes hashes truncated as necessary.

• You can adjust the global parameter during testing, but the goal is to find a collision in L = 5.

Also include a succinct analysis of how long it takes to find these collisions, both in cycle iterations and real time. How does this scale with L?

In [29]:
from cryptography.hazmat.primitives import hashes
import os
import time


L = 6  # Set output hash length in bytes

# Define a helper function for hashing, with fixed output size of LENGTH bytes
def H(X):
	digest = hashes.Hash(hashes.SHA256())
	digest.update(X)
	return (digest.finalize()[0:L])

# Function to detect a hash collision using Floyd's cycle-finding algorithm
def rho(h0):
    print("Hash is "+str(8*L)+" bits")

    # Initialize the "tortoise" and "hare" starting points
    tortoise = h0
    hare = h0

    attempts = 0

    # Iterate until a collision is found
    while True:
        attempts += 1
        begin_time = time.time()
        next_hare = H(hare)
        next_tortoise = H(H(tortoise))

        tortoise = next_tortoise
        hare = next_hare

        if tortoise == hare:
            break

    # Trace back to find the exact collision points
    if tortoise == hare:
        tortoise = h0

        while H(tortoise) != H(hare):
            attempts += 1
            tortoise = H(tortoise)
            hare = H(hare)
    
        finish_time = time.time()
        print("Collision detected:")
        print(f"First value: {tortoise.hex()}")
        print(f"Hash of first value: {H(tortoise).hex()}")
        print(f"Second value: {hare.hex()}")
        print(f"Hash of second value: {H(hare).hex()}\n")
        print(f"Time taken: {finish_time - begin_time} seconds")
        print(f"Total attempts: {attempts}")
        return (tortoise, hare)
    else:
        print("No collision found.")
        return None

# Generate a random starting point and find a collision
start = os.urandom(L)
(h0, h1) = rho(start)


Hash is 48 bits
Collision detected:
First value: e516d402a94f
Hash of first value: 0bace0db6b43
Second value: bc8651b23f20
Hash of second value: 0bace0db6b43

Time taken: 12.772124767303467 seconds
Total attempts: 8957024


>**ANALYSIS** 
>
>After some tests, for L = 5 it usually takes around between 1.8 second and 12 seconds, and between 340000 and 1950000 iterations.
>
>If we lower the L to 3, it takes significantly less. The lowest was 0.009 seconds and 2930 iterations, but mostly every one took around less then 0.1 second and 21000 iterations.
>
> With L = 6 takes a lot more time. Running just once we took 1m20s and 8957024 iterations.

**Q3: Weak ciphers**

The code in **ciphersuite_fsr.py** contains a very poorly implemented "stream cipher".

1. Consider the IND-CPA security experiment. How many calls to the encryption oracle do you have to do to succeed?

>The functions of encryption(E) and decryption(D) both rely on LFSR to generate a pseudo-random keystream that is XORed with the plaintext or ciphertext. The LFSR does not reset between calls to E or D, meaning the internal state X remains the same. 
>
>Two Points can be made from this:
>
>1.  The attacker can call enc() with a known plaintext and obtain the keystream directly, as XORing known plaintext with ciphertex reveals the keystream.
>
>2. The LFSR does not reinitialize with a unique seed for each message. The lack of per-encryption randomness makes it predictable.
>
>As result, in the excelent guess, the oracle just neeeds one call to reveal the keystream, if the attacker can't make a good guess it takes a max of 1009 calls to find the keystream.


2. Describe how one can construct an attacker against the IND-CPA experiment running this encryption scheme.

> We can exploit the encryption scheme's predictable keystream generation using the following method, we begin by generating two distinct plaintext messages, $M_0$​ and $M_1$​, ensuring they are of the same length. These messages are then submitted to the encryption oracle. Upon receiving these messages, the oracle returns the ciphertext of one of the plaintexts, encrypted using the keystream generated by an LFSR (Linear Feedback Shift Register) sequence starting from an initial state.
>
> Next step is extract the keystream. Due to the nature of the LFSR-based keystream generator, which lacks per-encryption variability or resetting, we can deduce the keystream by encrypting a known plaintext message and then XORing it with the resulting ciphertext. This operation directly reveals the keystream sequence.
>
> Finally, equipped with the extracted keystream, we can distinguish between the two original plaintexts. By XORing the oracle-provided ciphertext with the known keystream the original plaintext is recovered, thereby determining whether the ciphertext corresponds to $M_0$​ or $M_1$​. This attack succeeds because the stream cipher fails to provide unique randomness for each encryption instance.


