# Cryptography (CC4017) -- Week 3

## EXTRA 3

### **Q1: Weak Security**


Unpredictability of key generation is a central requirement for the security of an encryption scheme.
If the key can be efficiently guessed, then no encryption scheme can ever be shown to be IND-CPA
secure, as any adversary can simply enumerate the possible keys and test for decryptions.

The code **ciphersuite_aesnotrand.py** is encrypting a block message using a very weak key. Check
it out to understand what it is doing wrong.

**QUESTION P1:**

Program **q1.py** produces weak_ciphertexts. Suppose you know that the encrypted
message was “Attack at Dawn!!”. Extend that program to read the file and guess the key used for that
encryption.

> The problem with ciphersuite is that the offset is too small.
> As we can see, the offset is set to a value between 1 and 3:
> >
> > offset = sysrand.randint(1,3)
> >
> 
> This exposes the code inside ciphersuite to a brute-force attack, because each offset leaves only a small number of keys to try:
> 
> $offset = 1 : 2^8$ values
> 
> $offset = 2 : (2^8)^2$ values
> 
> $offset = 3 : (2^8)^3$ values
> 
> Added together, this gives $2^8 + 2^{16} + 2^{24} = 16\,843\,008$ possible keys to test.
> 
> The other obvious problem is the use of ECB mode.
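
The count above can be checked with a few lines of Python, and the same loop structure gives a fixed order in which to try the candidates. This is only a sketch: the key layout it uses (zero bytes followed by `offset` unpredictable bytes, 16-byte keys) is an assumption made for illustration, and the helper names are hypothetical. The real layout is whatever `ciphersuite_aesnotrand.gen()` produces.

```python
def keyspace_size(max_offset):
    """Number of candidates when the offset ranges over 1..max_offset."""
    return sum(256 ** offset for offset in range(1, max_offset + 1))


def candidate_keys(max_offset, key_len=16):
    """Yield the candidates for each offset in turn.

    ASSUMPTION (hypothetical layout): a key is key_len - offset zero bytes
    followed by offset unpredictable bytes. Under this layout some keys
    repeat across offsets, which only costs a few redundant tests.
    """
    for offset in range(1, max_offset + 1):
        for value in range(256 ** offset):
            yield bytes(key_len - offset) + value.to_bytes(offset, "big")


print(keyspace_size(3))
print(next(candidate_keys(3)).hex())
```

A loop over `candidate_keys(3)` visits the candidates in a fixed order, so it needs neither a set of already-tried keys nor repeated calls to the random generator.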

In [2]:
import time

import ciphersuite_aesnotrand as ciphersuite

# Produce the weak ciphertext: one block encrypted under a weakly generated key
key = ciphersuite.gen()
msg = 'Attack at dawn!!'
cph = ciphersuite.enc(key, bytearray(msg, 'ascii'))

with open("weak_ciphertext", "wb") as f:
    f.write(cph)

# Known plaintext and the largest offset used by ciphersuite.gen()
plaintext_msg = b'Attack at dawn!!'
offset_value = 3

with open("weak_ciphertext", "rb") as encrypted_file:
    start_time = time.time()
    encrypted_data = encrypted_file.read()

    # Upper bound on the number of candidates: 256^0 + 256^1 + ... + 256^offset_value
    possible_keys_count = sum((2**8)**j for j in range(offset_value + 1))
    attempted_keys = set()

    print(f"Attempting {possible_keys_count} possible keys...")

    for attempt in range(possible_keys_count):
        print(f"Attempt {attempt + 1}...")

        # Draw a candidate from the same weak generator, skipping keys already tried
        candidate_key = ciphersuite.gen()
        while candidate_key in attempted_keys:
            candidate_key = ciphersuite.gen()

        attempted_keys.add(candidate_key)

        # Decrypt the ciphertext with the candidate key
        decrypted_msg = ciphersuite.dec(candidate_key, encrypted_data)

        # The key is found when the decryption equals the known plaintext
        if decrypted_msg == plaintext_msg:
            print(decrypted_msg)
            print("Decryption successful!")
            print(f"Key found: {candidate_key}")
            print(f"Time taken: {time.time() - start_time} seconds; Attempts: {attempt + 1}")
            break


Attempting 16843009 possible keys...
Attempt 1...
Attempt 2...
Attempt 3...
[output truncated]

**QUESTION P2:**
Increase the size of the offset in the ciphersuite. How large must it be for your machine to be unable to test it in 3 hours?


>To find the offset that makes a brute-force attack infeasible within 3 hours, we need to measure how long it takes to run through all the possibilities. This is done by altering the conditional statement so that it never matches our target message byte string, which forces every candidate key to be tested.
>
>$$ 16\,843\,009 \ possibilities \ took \ 0.0268 \ seconds $$
>
>Assuming offset=4, multiplying the number of 4-byte keys by the 0.0268 seconds measured above gives an upper bound for brute-forcing of about
>$$ 4\,294\,967\,296 \times 0.0268 \ seconds \approx 31\,974 \ hours $$
>
>So offset=4 would take my machine far longer than 3 hours to test all possible combinations of 4 bytes, even before adding the keys for the smaller offsets.
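
The same estimate can be made for any machine once its key-testing rate is known. The sketch below is illustrative only: the function names are hypothetical, and the rate of one million keys per second is a placeholder to be replaced with a rate measured on the machine in question.

```python
def worst_case_hours(offset, keys_per_second):
    """Hours needed to test all 256**offset keys at the given rate."""
    return 256 ** offset / keys_per_second / 3600


def smallest_infeasible_offset(keys_per_second, budget_seconds=3 * 3600):
    """First offset whose keys cannot all be tested within the time budget."""
    offset = 1
    while 256 ** offset / keys_per_second <= budget_seconds:
        offset += 1
    return offset


rate = 1_000_000  # hypothetical rate: one million key tests per second
for offset in range(1, 6):
    print(offset, worst_case_hours(offset, rate))
print(smallest_infeasible_offset(rate))
```

Each extra byte of offset multiplies the worst-case time by 256, so the answer moves by at most one or two bytes even when the measured rate changes by several orders of magnitude.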

### **Q2: Fixed Initialization Vectors**

![alt text](image.png)

Figure 1 depicts the Cipher Block Chaining (CBC) mode. One key characteristic of AES-CBC (AES as the block
cipher, used in combination with CBC) is that it requires initialization vectors to be unique and
unpredictable.

Recall the IND-CPA experiment:

- Challenger generates a secret key k and a random bit b
- Attacker can send m and receive Enc(k, m) – has access to an encryption oracle
- Attacker provides ($m_0$ , $m_1$) such that $|m_0| = |m_1|$ and receives $Enc(k, m_b)$.
- Attacker guesses b′
- Attacker is victorious if b = b′ .

The scheme is broken if this occurs with probability non-negligibly greater than $\frac{1}{2}$.

**Question:**

Suppose our encryption scheme is AES-CBC using a fixed IV. Construct an attack against the IND-CPA security experiment of this scheme, i.e. write an algorithm for our adversary to beat the IND-CPA security experiment, namely:
- What are the queries performed to the encryption oracle
- What are the messages produced as $m_0$ , $m_1$
- How b is decided

> **Attack Outline:**
> 
>To break the IND-CPA security of AES-CBC with a fixed IV, an attacker can exploit the fact that encryption becomes deterministic: the same first plaintext block always produces the same first ciphertext block.
> 
>**STEPS:**
> 
> 1. **Oracle Query:** The attacker has access to the encryption oracle, so they pick a message $m_0$ and query it, obtaining $Enc(k, m_0)$, whose first ciphertext block is $c_0 = E(k, IV \oplus p_0)$.
> 
> 2. **Choosing Plaintexts:** Select $m_0$ and $m_1$ of equal length whose first blocks differ (a block is 128 bits for AES). For simplicity, let:
>
>    - $ m_0 = p_0,p_1,... \ with \ p_0 \ as \ the \ first \ block $
>    - $ m_1 = p_0',p_1',... \ where \ p_0' \neq p_0 $
>    
> 3. **Observing Ciphertext:** Submit ($m_0$, $m_1$) as the challenge, receive the encryption of $m_b$, where b is either 0 or 1, and observe the resulting ciphertext $\hat{c}_0,\hat{c}_1,...$
> 
> 4. **Attack Decision:** Since the IV is fixed, if the first challenge block $\hat{c}_0$ is identical to the block $c_0$ obtained from the oracle query, the attacker guesses b = 0; otherwise, b = 1. Because $E(k, \cdot)$ is a permutation, $p_0' \neq p_0$ guarantees that the two first ciphertext blocks differ, so the guess is always correct.
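
The distinguishing game can be simulated in a few lines. The sketch below is not real AES: to stay within the standard library it uses a truncated HMAC as a stand-in for the block cipher (an assumption, and not invertible, which is enough here because the attack only ever calls encryption). All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK = 16
FIXED_IV = bytes(BLOCK)  # the flaw under study: the same IV for every message


def block_encrypt(key, block):
    """Stand-in for AES block encryption (assumption: truncated HMAC-SHA256)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key, message, iv=FIXED_IV):
    """CBC encryption of a message whose length is a multiple of BLOCK."""
    previous, out = iv, b""
    for i in range(0, len(message), BLOCK):
        previous = block_encrypt(key, xor(previous, message[i:i + BLOCK]))
        out += previous
    return out


def play_ind_cpa_game():
    """Run one IND-CPA experiment; return True if the adversary guesses b."""
    key = secrets.token_bytes(16)   # challenger's secret key
    b = secrets.randbelow(2)        # challenger's secret bit

    m0, m1 = b"A" * BLOCK, b"B" * BLOCK      # first blocks differ
    reference = cbc_encrypt(key, m0)         # oracle query on m0
    challenge = cbc_encrypt(key, (m0, m1)[b])  # challenger encrypts m_b
    guess = 0 if challenge[:BLOCK] == reference[:BLOCK] else 1
    return guess == b


print(all(play_ind_cpa_game() for _ in range(100)))
```

The adversary never learns the key; comparing one oracle answer with the challenge is enough because nothing in the encryption changes between the two calls.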

### **Q3: Predictable Initialization Vectors**

Nonce-based encryption schemes are encryption schemes that take the nonce as a parameter.
Enc(k, m, n) takes key k, message m and nonce n. These are secure, as long as no nonce is ever used
twice. E.g. AES-CTR is a nonce-based encryption scheme.

Consider the following encryption scheme:
- Use the block encryption function (with the same key) on the nonce to generate an IV ← E(k, n)
- Compute the encryption of the message using AES-CBC with that IV

Observe that this prevents trivial attacks, such as setting the IV to 0 – as it is encrypted – and also disallows fixing the IV – as the same nonce cannot be reused. However, the IV is predictable, and that can lead to an attack.

**QUESTION P1:** 

Construct an attack against the nonce-based IND-CPA security experiment of this scheme (Nonce-based IND-CPA is exactly the same as IND-CPA, but repeated nonces are disallowed.)

**Hint:** Consider encrypting $0^l$ with nonce $0^l$ . How can I request a correlated encryption that can help me break the indistinguishability of the cipher?

>**Attack Strategy:**
>
>To break the nonce-based IND-CPA security, we can exploit the predictability of the IV: the IV of a nonce can be known before that nonce is ever used. We assume, as is usual for CBC, that the IV is returned as part of the ciphertext. The attack steps are as follows:
>
>1. **Query the Oracle with a Zero Nonce and a Zero Message:**
>
>   - Following the hint, we set both the nonce and the message to $0^l$ (a sequence of zero bits of length $l$, the block size) and ask the oracle for $Enc(k, 0^l, 0^l)$.
>
>   - Since the scheme is nonce-based, the attacker chooses the nonces, as long as none is repeated.
>
>2. **Calculate the IV:**
>
>   - Given $n = 0^l$, the scheme generates the IV by encrypting $n$ with the key $k \ :$
>     $$
>     IV = E(k, 0^l)
>     $$
>   - Since XORing $0^l$ with the IV leaves the IV unchanged, the ciphertext block is:
>     $$
>     c = E(k, IV \oplus 0^l) = E(k, IV)
>     $$
>   - The attacker now knows both $IV$ and $c = E(k, IV)$.
>
>3. **Predict the IV of a Fresh Nonce:**
>
>   - Choose the new nonce $n' = IV$. It has not been used before (except in the negligible case $E(k, 0^l) = 0^l$), and its IV is already known:
>     $$
>     IV' = E(k, n') = E(k, IV) = c
>     $$
>
>4. **Choose Two Distinct Messages for the IND-CPA Game:**
>
>   - Set $ m_0 = IV' \oplus IV $ and $ m_1 $ to any other message of the same length, say $ m_1 = m_0 \oplus 1^l $ (where $ 1^l $ is a sequence of ones of length $ l $), and submit them with nonce $ n' $.
>
>5. **Distinguish Between the Ciphertexts:**
>
>   - If $ b = 0 $, the challenge ciphertext block is $ E(k, IV' \oplus m_0) = E(k, IV) = c $, the block already seen in step 2.
>
>   - If $ b = 1 $, it is $ E(k, IV' \oplus m_1) = E(k, IV \oplus 1^l) $, which differs from $ c $ because $ E(k, \cdot) $ is a permutation.
>   
>   - The attacker outputs $ b' = 0 $ if the challenge block equals $ c $ and $ b' = 1 $ otherwise, and therefore always wins.
>

**Question P2:** 

Write a program that prints the messages/ciphertexts used in this attack, and that shows this IND-breaking correlation.

In [None]:
from functools import reduce

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Encryption key and IVs. In the scheme each IV would be E(k, n) for a different
# nonce n; fixed values are used here because the correlation holds for any two known IVs.
key = b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F'
iv_old = b'\x01\x02\x03\x04\x05\x06\x07\x08\x09\x10\x11\x12\x13\x14\x15\x16'
iv_new = b'\x16\x15\x14\x13\x12\x11\x10\x09\x08\x07\x06\x05\x04\x03\x02\x01'

# Message to encrypt
message = b'\x00' * 16

def xor_bytes(*args):
    """Returns the XOR of multiple byte sequences of the same length."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*args))

def aes_encrypt(key, iv, message):
    """Encrypts the message using AES in CBC mode (no padding, one full block)."""
    cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
    encryptor = cipher.encryptor()
    return encryptor.update(message) + encryptor.finalize()

def main():
    print("Starting a predictable vector attack...")

    # Encrypt with iv_old and the original message
    ciphertext_0 = aes_encrypt(key, iv_old, message)
    print(f"The ciphertext with IV_old and message = 0^l is:\n C0: {ciphertext_0}\n")

    # Craft new message as IV_new XOR IV_old XOR 0^l, so that the input to the
    # block cipher is again IV_old XOR 0^l
    crafted_message = xor_bytes(iv_new, iv_old, message)
    ciphertext_1 = aes_encrypt(key, iv_new, crafted_message)
    print(f"The ciphertext with IV_new and message = IV_new XOR IV_old XOR 0^l is:\n C1: {ciphertext_1}\n")

    if ciphertext_0 == ciphertext_1:
        print("Both encryptions produced the exact same ciphertext: the indistinguishability property has been broken!")
    else:
        print("The ciphertexts differ: no correlation was observed.")

if __name__ == '__main__':
    main()
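
The correlation can also be played out as a full nonce-based IND-CPA game in which the IV really is derived from the nonce. The sketch below makes two assumptions that are not in the exercise text: the IV is returned together with the ciphertext (as is usual for CBC), and a truncated HMAC stands in for the AES block function so that only the standard library is needed. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK = 16


def block_encrypt(key, block):
    """Stand-in for AES block encryption (assumption: truncated HMAC-SHA256)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def nonce_cbc_encrypt(key, message, nonce):
    """IV <- E(k, nonce), then CBC over full blocks. Returns (iv, ciphertext)."""
    iv = block_encrypt(key, nonce)
    previous, out = iv, b""
    for i in range(0, len(message), BLOCK):
        previous = block_encrypt(key, xor(previous, message[i:i + BLOCK]))
        out += previous
    return iv, out


def play_nonce_game():
    """Run one nonce-based IND-CPA experiment; True if the adversary wins."""
    key = secrets.token_bytes(16)
    b = secrets.randbelow(2)
    used_nonces = set()

    def oracle(message, nonce):
        assert nonce not in used_nonces, "repeated nonces are disallowed"
        used_nonces.add(nonce)
        return nonce_cbc_encrypt(key, message, nonce)

    zero = bytes(BLOCK)
    iv, c = oracle(zero, zero)      # iv = E(k, 0^l) and c = E(k, iv)
    predicted_iv = c                # the IV for the fresh nonce n' = iv is E(k, iv) = c
    m0 = xor(predicted_iv, iv)      # makes the block cipher input equal to iv again
    m1 = xor(m0, b"\xff" * BLOCK)   # any other message of the same length
    _, challenge = oracle((m0, m1)[b], iv)
    guess = 0 if challenge == c else 1
    return guess == b


print(all(play_nonce_game() for _ in range(100)))
```

No nonce is ever repeated: the first query uses $0^l$ and the challenge uses the IV learned from that query.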

### **Q4: Padding Attacks**

Encryption schemes such as AES-CBC can encrypt messages of varying size, by dividing the input message into chunks of size b, where b is the block size. However, it is common for messages to not be multiples of b, and for these cases one can use padding.

Let k denote the next multiple of b after the length of the message m. PKCS#7 padding entails filling the last $k - |m|$
bytes with the value $k - |m|$, e.g.
- 0x01 means 1 byte of padding added with this value
- 0x03 means 3 bytes of padding added with this value

**Question P1:**

Consider a message that is already of size multiple of b. Why is it necessary to add padding?

>Padding is a technique that allows you to encrypt a message of any length, even one smaller than a single block. It expands a message to fill complete blocks by adding extra bytes to the plaintext.
>
>When a message is already a multiple of the block size b, padding is still necessary because otherwise the receiver could not distinguish a message that was padded from one that was not. For example, an unpadded message whose last byte happens to be 0x01 would be read as carrying one byte of padding and would lose its last byte. In the PKCS#7 padding scheme, a full block of b bytes with value b is therefore added when the message exactly fits the block size.
>
>Because padding is added regardless of the original message length, the last byte always indicates the number of padding bytes. This way, the receiver can unambiguously remove the padding and recover the original message.
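
The ambiguity is easy to reproduce. The sketch below is a minimal PKCS#7 pad/unpad pair written for illustration (the function names are hypothetical, and a 16-byte block is assumed). It shows what would go wrong if a message that already fills a block were sent without padding.

```python
def pkcs7_pad(message, block_size=16):
    """Append 1..block_size padding bytes, each equal to the padding length."""
    pad_len = block_size - len(message) % block_size  # never 0
    return message + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded, block_size=16):
    """Remove PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size:
        raise ValueError("invalid length")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size:
        raise ValueError("invalid padding")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]


# A 16-byte message that happens to end in 0x01
message = b"fifteen bytes.." + b"\x01"

padded = pkcs7_pad(message)
print(len(padded))                      # a whole extra block was added
print(pkcs7_unpad(padded) == message)   # round trip recovers the message

# If full-block messages were sent unpadded, the receiver would strip the
# final 0x01 as if it were padding and silently lose one byte:
print(pkcs7_unpad(message))
```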

**Question - P2:**

Consider an AES-CBC encryption scheme that, upon decryption, produces an error whenever a padding error occurs, i.e. if the decrypted message does not follow PKCS#7 padding.

How can an adversary that is given a ciphertext use a decryption oracle to extract information about the original message?

Hint: Consider how AES-CBC decrypts messages. How can we provoke alterations on the last block, where padding must be observed?

> In CBC (Cipher Block Chaining) mode, each plaintext block is XORed with the previous ciphertext block before encryption. During decryption, each plaintext block is reconstructed by XORing the decrypted ciphertext block with the previous ciphertext block: $P_i = D(k, C_i) \oplus C_{i-1}$. This dependency between blocks is central to the attack, as flipping a bit in one ciphertext block flips the same bit in the decryption of the following block.
>
>The attacker's goal is to manipulate ciphertext blocks and see whether they produce valid padding upon decryption. Specifically, the attacker focuses on the last block in the ciphertext, where padding should be present. By altering the preceding block, the attacker changes the decrypted value of this last block, and the oracle's answer reveals whether the result follows the PKCS#7 padding rules.
>
> The attacker replaces the second-to-last ciphertext block $C_{n-1}$ with a modified block $C'$ and tries all 256 values for its last byte. When the oracle reports no padding error, the last decrypted byte is (almost always) 0x01, so the last byte of $D(k, C_n)$ equals the last byte of $C'$ XOR 0x01. XORing that value with the last byte of the original $C_{n-1}$ gives the last byte of the plaintext.
>
> With successive guesses, the attacker deduces the plaintext of the last block byte by byte: to find the next byte, the bytes already known are set so that they decrypt to 0x02, then 0x03, and so on. After reconstructing the last block, the process can be repeated on earlier blocks by truncating the ciphertext, allowing the attacker to decrypt the entire message block by block using only the padding oracle's feedback.
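
The byte-by-byte recovery can be sketched as follows. This is an illustration under stated assumptions, not an attack on a real AES implementation: a toy 4-round Feistel permutation built from HMAC stands in for AES so that only the standard library is needed, and all names are hypothetical. The attacker code in `recover_block` only ever calls the oracle and never sees the key.

```python
import hashlib
import hmac
import secrets

BLOCK = 16


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, i, half):
    # Round function of the toy cipher (assumption: truncated HMAC-SHA256)
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def block_encrypt(key, block):
    """Toy 4-round Feistel permutation standing in for AES."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _f(key, i, right))
    return left + right


def block_decrypt(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _f(key, i, left)), left
    return left + right


def pad(message):
    n = BLOCK - len(message) % BLOCK
    return message + bytes([n]) * n


def cbc_encrypt(key, iv, message):
    previous, out = iv, b""
    data = pad(message)
    for i in range(0, len(data), BLOCK):
        previous = block_encrypt(key, xor(previous, data[i:i + BLOCK]))
        out += previous
    return out


def make_oracle(key):
    """Decryption oracle that only reveals whether the padding is valid."""
    def oracle(previous, block):
        plain = xor(block_decrypt(key, block), previous)
        n = plain[-1]
        return 1 <= n <= BLOCK and plain[-n:] == bytes([n]) * n
    return oracle


def recover_block(oracle, previous, target):
    """Recover the plaintext of `target` using only padding-validity answers."""
    intermediate = bytearray(BLOCK)  # D(k, target), filled in from the last byte
    for pad_len in range(1, BLOCK + 1):
        pos = BLOCK - pad_len
        forged = bytearray(BLOCK)
        for j in range(pos + 1, BLOCK):
            forged[j] = intermediate[j] ^ pad_len  # known bytes decrypt to pad_len
        for guess in range(256):
            forged[pos] = guess
            if not oracle(bytes(forged), target):
                continue
            if pad_len == 1:
                # Rule out an accidental longer padding such as 02 02
                forged[pos - 1] ^= 0xFF
                confirmed = oracle(bytes(forged), target)
                forged[pos - 1] ^= 0xFF
                if not confirmed:
                    continue
            intermediate[pos] = guess ^ pad_len
            break
    return xor(intermediate, previous)


key, iv = secrets.token_bytes(16), secrets.token_bytes(16)
secret = b"padding oracle demo"
ciphertext = cbc_encrypt(key, iv, secret)
c1, c2 = ciphertext[:BLOCK], ciphertext[BLOCK:]
recovered = recover_block(make_oracle(key), c1, c2)
print(recovered)
print(recovered == pad(secret)[BLOCK:])
```

Recovering one block takes at most 256 oracle calls per byte. Calling `recover_block(oracle, iv, c1)` in the same way would recover the first block as well.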