In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab03.ipynb")

In [None]:
import math
import matplotlib.pyplot as plt
import numpy as np
import string
import itertools
import re
from functools import reduce

In [None]:
primes = [1]
def gen_primes():
    D = {}
    q = 2
    
    while primes[-1] < 100000:
        if q not in D:
            primes.append(q)
            D[q * q] = [q]
        else:
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            del D[q]
        
        q += 1
gen_primes()
primes = primes[100:]

# Lab 3: RSA, Low-Exponent Attack, Diffie-Hellman
Contributions From: Ryan Cottone

Welcome to Lab 3! In this lab, we will build an RSA cryptosystem and demonstrate how to break it via a low-exponent attack. We will also look at chosen-ciphertext attacks and the Diffie-Hellman cryptosystem.

## Helpers
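
As a quick sanity check on the encoding the helpers in this section use, the sketch below reproduces it with Python's built-in `int.from_bytes` and `int.to_bytes`. It assumes every character has a code point below 256 (hence the `latin-1` codec). The names `text_to_int_builtin` and `int_to_text_builtin` are illustrative only and are not part of the lab.

```python
def text_to_int_builtin(s):
    # Character i contributes ord(s[i]) * 256**i, i.e. little-endian base 256.
    return int.from_bytes(s.encode("latin-1"), "little")

def int_to_text_builtin(n):
    # Use just enough bytes to hold n, then decode each byte as one character.
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "little").decode("latin-1")

encoded = text_to_int_builtin("hi")
# 'h' is 104 and 'i' is 105, so the value is 104 + 105 * 256 = 26984.
print(encoded)
print(int_to_text_builtin(encoded))
```

Note that trailing `\x00` characters do not survive the round trip, because the high-order zero digits of an integer are not stored.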

In [None]:
def getExpansion(n,m):
    arr = []
    
    while n > 0:
        r = n % m
        n //= m
        
        arr.append(r)
    
    return arr

In [None]:
def textToInt(s):
    total = 0
    
    for i in range(len(s)):
        total += ord(s[i])*(256**i)
    
    return total

In [None]:
def intToText(n):
    expansion = getExpansion(n, 256)
    
    finalStr = ""
    
    for i in range(len(expansion)):
        finalStr += chr(expansion[i])
        
    return finalStr

In [None]:
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

def modularInverse(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m

## Basics of RSA

As we covered before, RSA is based on the concept of *public-key cryptography*, in which Alice and Bob (two people who want to communicate secretly) each publish a public key. Anyone can use a public key to encrypt a secret message, and only the owner of the matching private key can read it. 

Formally, Alice picks two (large) primes $p$ and $q$, computes $N = pq$, chooses an exponent $e$ that is coprime to $(p-1)(q-1)$, and publishes $(N, e)$ as her **public key**. 

She also defines her private key as $d = e^{-1} \mod (p-1)(q-1)$.

For the purposes of this lab, we will be using $N=55$ and $e=3$. Normal values are much larger, for reasons you will see shortly.
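
To make the definitions concrete, here is a minimal sketch of the toy parameters above, assuming the factorization $55 = 5 \cdot 11$. It uses Python's built-in `pow(e, -1, phi)` (available from Python 3.8) rather than the lab's helper, and the message `m = 7` is an arbitrary choice for illustration.

```python
# Toy parameters from the text: N = 55 = 5 * 11 and e = 3.
p, q, e = 5, 11, 3
N = p * q
phi = (p - 1) * (q - 1)      # 40

# pow(e, -1, phi) computes the modular inverse of e modulo phi.
d = pow(e, -1, phi)          # 27, since 3 * 27 = 81 = 2 * 40 + 1

m = 7                        # an arbitrary message smaller than N
c = pow(m, e, N)             # encryption: m^e mod N
recovered = pow(c, d, N)     # decryption: c^d mod N

print(d, c, recovered)       # 27 13 7
```

With a modulus this small, anyone can factor $N$ and recompute $d$, which is one reason real keys use much larger primes.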

**Question 1.1**: Complete the following function to compute the private key $d$ from $p$, $q$, and the exponent $e$.

*HINT: Use our modularInverse helper function*

In [None]:
def computePrivateKey(p,q,e):
    ...

In [None]:
grader.check("q1_1")

**Question 1.2**: Implement textbook RSA encryption and decryption.

You **must** use **pow(base, exponent, modulus)** to compute $m^e \mod N$, as other methods are too slow.

In [None]:
def encryptRSA(message, e, N):
    ...

In [None]:
def decryptRSA(encrypted, d, N):
    ...

In [None]:
grader.check("q1_2")

# The Chinese Remainder Theorem 

The Chinese Remainder Theorem (hereafter CRT) is a very powerful theorem that asserts that a system of modular equations whose moduli are pairwise coprime has a unique solution modulo the product of those moduli.

For example, if we have

$$x \equiv 3 \mod 5$$
$$x \equiv 4 \mod 7$$

we can find a unique solution mod 35, in this case:

$$x \equiv 18 \mod 35$$
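
The uniqueness claim can be checked by brute force for this small example. The sketch below simply tries every residue modulo 35; it is only a sanity check and is not how CRT is computed for large moduli.

```python
# Try every x in 0..34 and keep those satisfying both congruences.
solutions = [x for x in range(35) if x % 5 == 3 and x % 7 == 4]
print(solutions)  # [18]

# Adding any multiple of 35 gives another integer with the same remainders.
print((18 + 35) % 5, (18 + 35) % 7)  # 3 4
```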

In [None]:
def CRT(pairs):
    total = 0
    N = 1
    
    for pair in pairs:
        N *= pair[1]
    
    for i in range(len(pairs)):
        a_i = pairs[i][0]
        b = pairs[i][1]
        
        
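        # Integer division keeps everything exact; N / b would yield a float
        # and lose precision once the moduli get large.
        b_i = (N // b) * modularInverse(N // b, b)
        
        total += a_i * b_i 
    
    return total % N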

# The Low-Exponent Attack

Using CRT, we can attack a faulty use of RSA!

Let's see what happens when we broadcast one message to three different people, each with their own public keys:

$$c_1 = m^e \mod N_1$$
$$c_2 = m^e \mod N_2$$
$$c_3 = m^e \mod N_3$$

These three congruences form a system that CRT can solve for $m^e$ mod $N_1 \cdot N_2 \cdot N_3$ (assuming the moduli are pairwise coprime). The key is that $e = 3$, so the CRT solution is $m^3 \mod N_1N_2N_3$. Critically, $m < N_1, N_2, N_3$, since a message must be smaller than each modulus to be decrypted correctly. Therefore, $m^3 < N_1N_2N_3$, so the CRT solution is exactly $m^3$ as an integer, and we can just take the cube root without worrying about whether it looped around the modulus before (a much harder problem to solve).
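
The argument above can be traced on toy numbers. The following is a minimal sketch, not the lab's reference solution: the moduli, the message `m = 42`, and the helper names `crt_sketch` and `integer_cube_root` are all assumptions made for illustration, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
def crt_sketch(pairs):
    # pairs: list of (remainder, modulus) with pairwise coprime moduli.
    product = 1
    for _, modulus in pairs:
        product *= modulus
    total = 0
    for remainder, modulus in pairs:
        partial = product // modulus  # product of the other moduli
        total += remainder * partial * pow(partial, -1, modulus)
    return total % product

def integer_cube_root(n):
    # Binary search for the largest r with r**3 <= n, using only integers.
    low, high = 0, 1
    while high ** 3 <= n:
        high *= 2
    while high - low > 1:
        mid = (low + high) // 2
        if mid ** 3 <= n:
            low = mid
        else:
            high = mid
    return low

# Toy moduli (assumed for illustration), each a product of two small primes.
moduli = [5 * 11, 17 * 23, 29 * 41]
e = 3
m = 42  # secret message, smaller than every modulus

ciphertexts = [pow(m, e, N) for N in moduli]
m_cubed = crt_sketch(list(zip(ciphertexts, moduli)))
recovered = integer_cube_root(m_cubed)
print(recovered)  # 42
```

An exact integer cube root is used here because a floating-point cube root can lose precision on very large values; for small inputs, a float root followed by rounding typically gives the same answer.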

**Question 2.1**: Implement the low exponent attack!

In [None]:
def lowExponentAttack(ciphertexts, moduli):
    pairs = [(ciphertexts[i], moduli[i]) for i in range(len(ciphertexts))]
    
    CRT_solution = ...
    
    cuberoot = ...
    
    return round(cuberoot) 

In [None]:
grader.check("q2_1")

# Chosen-Ciphertext Attacks

Textbook RSA is particularly susceptible to a **chosen-ciphertext attack**, in which our attacker Eve can choose an arbitrary ciphertext and have it decrypted by Bob. For example, she can encrypt a message using his public key and have him tell her the result of decrypting it.

This may sound strange: if Eve can have any ciphertext decrypted, she could simply ask Bob to decrypt the ciphertext she wants to break. In this sense a raw chosen-ciphertext attack is not feasible in practice. However, we will see how a variant of it can gradually reveal the full message by exploiting the fact that textbook RSA is unpadded. 

First, let's go over how a chosen-ciphertext attack works in the basic case.

Say Alice encrypts her message as $m^e$ and sends it over to Bob. Eve intercepts this, and has an oracle that tells her the decryption of any arbitrary ciphertext via Bob's private key **except for $m^e$**. This way we can't just pass it immediately to the oracle. However, Eve can use some properties of modular arithmetic to her advantage.

Define $C = m^e$ as our ciphertext.

$$C^* = C \cdot 2^e = (2m)^e$$

We then send this $C^*$ to the oracle and receive $(2m)^{ed} = 2m \mod N$. From here we can simply divide by two (modulo $N$) to get our message.

This only works because the RSA ciphertext is malleable: multiplying a ciphertext by $k^e$ multiplies the underlying message by $k$, a change the attacker can predict and undo.
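
Here is a small sketch of that doubling trick. The key ($N = 61 \cdot 53$, $e = 17$), the message, and the function `oracle_decrypt` are hypothetical stand-ins chosen for illustration; `pow(x, -1, n)` requires Python 3.8 or later.

```python
# Toy key (assumed for illustration): N = 61 * 53, e = 17.
p, q, e = 61, 53, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))

def oracle_decrypt(ciphertext):
    # Hypothetical stand-in for Bob's decryption oracle.
    return pow(ciphertext, d, N)

m = 65                               # Alice's secret message
C = pow(m, e, N)                     # intercepted ciphertext

C_star = (C * pow(2, e, N)) % N      # encrypts 2m, and differs from C
doubled = oracle_decrypt(C_star)     # 2m mod N
recovered = (doubled * pow(2, -1, N)) % N  # undo the factor of 2 modulo N

print(recovered)  # 65
```

Multiplying by the inverse of 2 modulo $N$ works even when $2m$ wraps around the modulus, whereas plain integer halving is only correct when $2m < N$.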

## A Practical Example: Tencent's QQ Browser
Sounds interesting in theory, but what about reality? Unfortunately for internet users, this type of attack is far more common than one might expect. Tencent, a Chinese technology conglomerate, owns a popular web browser named QQ Browser. Researchers tore the browser apart in a [2018 paper](https://arxiv.org/pdf/1802.03367.pdf) that found a shocking number of vulnerabilities. We will take a look at the CCA2 attack in this lab.

First, let's set the stage for the attack. Whenever a user interacts with the browser, it sends a variety of sensitive personal information to QQ Browser servers (why a browser needs its own servers is suspicious enough), using RSA encryption to exchange a 128-bit (for our example, 16-bit) AES key. You don't need to know anything about AES for now; just know that it is a very secure symmetric cipher. The device and server then use this key to send encrypted data. 

Our goal is to reveal a 16-bit key to decrypt all the message traffic. There are a few key observations about the server that make this feasible:

1. When the server receives a session request, it attempts to decrypt it using the last 128 bits of the RSA plaintext as an AES key. If the decryption results in a valid session packet, it responds with some sort of success message. Otherwise, it responds with failure (or not at all).
2. The RSA encryption has no padding.
3. We can make requests to the server on our own (without needing access to the client device). 

In [None]:
# We don't actually need to use AES for this. Dummy cipher will do 
def dummyAES(msg, key):
    return msg ^ key 

class QQServer:
    def __init__(self, privkey, N):
        self.privkey = privkey
        self.N = N
        
    def recoverAESKey(self, rsaData):
        mask = (1 << 16) - 1
                
        data = decryptRSA(rsaData, self.privkey, self.N)
                        
        data = data & mask
                
        return data
    
    def decryptSession(self, rsaData, sessionData):
        key = self.recoverAESKey(rsaData)
                
        return intToText(dummyAES(sessionData, key))[:11] == "sessiondata"        
        

Armed with this knowledge, let's take a look at what we can do on our end. Say we intercept the rsaData of some user we want to spy on.

Shifting said rsaData left by 15 bits, we can make it so the last bit of their AES key becomes the first bit of the 16-bit key the server extracts, with the remaining bits zero. This is possible because the RSA is unpadded and because the server only looks at the last 16 bits.

**Remember, "shifting" rsaData is not just bitshifting the raw ciphertext. You MUST use the chosen-ciphertext technique above (which exploits the multiplicative homomorphism of textbook RSA) to multiply the plaintext by 2^k in order to shift it by k bits.**

**To do this, multiply the ciphertext by encryptRSA(2\*\*k, e, N)** so that the plaintext is multiplied by 2^k.

For example,

Original RSA data: 1111 0000 1111 0101

Shifted RSA data: 0111 1000 0111 1010 **1000 0000 0000 0000**


The server will only look at the last 16 bits, and will therefore try to decrypt with the key 1000 0000 0000 0000. As the attacker, we know the last 15 bits (all zero), but not the first (most significant) bit. So we might set it up as such:

X000 0000 0000 0000

And try to figure out whether that X is 1 or 0. We know the server will respond with True if the key is correct or False if not, and we can encrypt a session with whatever key we want. So we try encrypting a session with the following key: 1000 0000 0000 0000. If it works, we know X=1, otherwise, X=0. 

In our example, the key works, so we know that bit is 1. We store it in the most-significant bit position and move on to the next iteration. Each time, we shift the key over by 1, so we now have

X100 0000 0000 0000

and encrypt with 1100 0000 0000 0000. This will again tell us if the X is 1 or 0. We then repeat for all 16 bits.
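
The bit-by-bit logic can be rehearsed without any RSA at all. The sketch below works directly on the plaintext key, with a hypothetical `oracle` function standing in for the server's success/failure response; in the real attack the shift has to be applied through the ciphertext, as described above.

```python
MASK = (1 << 16) - 1
secret = 0b1111000011110101  # the key the attacker does not know

def oracle(shift, guess):
    # Hypothetical stand-in for the server: it only sees the low 16 bits
    # of the shifted plaintext and reports whether our guess matches them.
    return ((secret << shift) & MASK) == guess

recovered = 0
for shift in range(15, -1, -1):
    # Bits recovered so far move one place right to make room at the top.
    recovered >>= 1
    # Guess that the newly exposed bit is 1; keep it only if the oracle agrees.
    if oracle(shift, (1 << 15) | recovered):
        recovered |= 1 << 15

print(bin(recovered))  # 0b1111000011110101
```

Each iteration settles exactly one bit, starting from the least significant bit of the secret, which is why 16 queries suffice for a 16-bit key.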

**Question 3**: Implement a function to recover the 16 bit AES key from an intercepted RSA-encrypted session.

In [None]:
def recoverAESKey(interceptedRSA, server, e, N):
    aesKey = 0

    # Loop from a shift of 15 to a shift of 0
    for i in range(15, -1, -1):

        # Generate a session of our own
        testsession = textToInt("sessiondata blah")

        # Shift over our key by one bit to the right
        aesKey = aesKey >> 1

        # Shift the interceptedRSA by i bits to the left to make room for our key.
        # Remember, we have to transform our shift into an encrypted ciphertext first!
        # Re-read the chosen ciphertext paragraph if you are unsure how to do this.
        # HINT: Shifting by i bits is equivalent to multiplying by 2**i
        encryptedShift = ...
        
        modifiedRSA = interceptedRSA * encryptedShift

        # Encrypt our test session with our AES key + one bit in the MSB
        encryptedTestSession = dummyAES(testsession, 1<<15 | aesKey)

        # Test whether that index with a 1 works, if so, set that bit to 1 in the final key
        if server.decryptSession(modifiedRSA, encryptedTestSession):
            aesKey = ...
    
    return aesKey

In [None]:
p,q = 100043,100049
# Note: despite its name, `prime` holds the RSA modulus N = p*q, which is composite.
prime = p*q

e = 3

d = modularInverse(e, (p-1)*(q-1))

sampleAESkey = 0b1111000011110101

server = QQServer(d, prime)

recovered = recoverAESKey(encryptRSA(sampleAESkey, e, prime), server, e, prime)
print('Our used AES key:', bin(sampleAESkey))
print('Our recovered AES key:', bin(recovered))

We've just broken the entire AES key with 16 requests!

In [None]:
grader.check("q3")

# Diffie-Hellman Key Exchange

In the real world, RSA encryption is too slow for messages of any appreciable length. Therefore, we want to use it only to trade a symmetric cipher key (like the previous example). 

Instead of raw RSA, however, there is a more efficient solution in the form of Diffie-Hellman. It works as follows:

Alice and Bob wish to communicate, and Eve can spy on all their communications. Alice and Bob agree on and publish a value $g$ and modulus $N$. (There are considerable restrictions on what these need to be, but for now, let's abstract that away).

Alice chooses her private key $a$, and Bob chooses his key $b$. 

Alice then finds the value $g^a \mod N$, and Bob finds $g^b \mod N$. 

They send these values to each other over the insecure channel. Eve now has both $g^a$ and $g^b$.

Once Alice receives $g^b$, she raises it to the $a$-th power, finding $(g^b)^a \equiv g^{ab} \mod N$. Bob does the same to also find $g^{ab} \mod N$. This is a number they can then use as their symmetric key.

Eve, on the other hand, only has $g^a$ and $g^b$, and no easy way to find $g^{ab}$. At best, she can multiply them to find $g^{a+b}$, but that is worthless.

This system relies on the fact that finding $a$ given $g^a \mod N$ is hard. We will explore why this is the case next lab!
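
The exchange can be traced by hand on tiny numbers. The values below ($N = 23$, $g = 5$, $a = 4$, $b = 3$) are assumed purely for illustration and are far too small to be secure.

```python
# Toy public parameters (assumed for illustration; real ones are far larger).
N = 23
g = 5

a = 4   # Alice's private key
b = 3   # Bob's private key

alice_sends = pow(g, a, N)   # g^a mod N = 4
bob_sends = pow(g, b, N)     # g^b mod N = 10

alice_key = pow(bob_sends, a, N)   # (g^b)^a mod N
bob_key = pow(alice_sends, b, N)   # (g^a)^b mod N

print(alice_key, bob_key)  # 18 18
```

Eve sees only 4 and 10 on the wire. With numbers this small she could recover $a$ or $b$ by trying every exponent, which is exactly what large parameters are meant to prevent.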

**Question 4**: Implement the Diffie-Hellman Key Exchange.

In [None]:
# Compute g^a mod N given g, a, N
def computePublicMessage(generator, privatekey, N):
    ...

# Given g^x and y, find g^{xy}
def computeFinalKey(received, exponent, N):
    ...

In [None]:
grader.check("q4_1")

**Question 4.2:** Walk through the steps of the Diffie-Hellman algorithm using your new functions.

In [None]:
# Modulus (N) = p
p = 2612691

# Generator = g
g = 2 

# Alice's private key = a
a = 2553

# Bob's private key = b
b = 26511

# Find g^a mod N and g^b mod N
alice_pub = ...
bob_pub = ...


# Given g^a mod N and g^b mod N (and a,b) find g^ab for both users
alice_final = ...
bob_final = ...

print("Alice's final key:", alice_final)
print("Bob's final key:", bob_final)

In [None]:
grader.check("q4_2")

That concludes Lab 3, congratulations!

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Once you have generated the zip file, go to the Gradescope page for this assignment to submit.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)