# Lecture 9.2 - cryptography: the RSA protocol in action

## Summary 

### Programming

- Preliminary computational results/reminders for use in the RSA Protocol
    - Modular Arithmetic in python
    - The `pow` function
    - Computing the Totient function
    - Computing the modular inverse 
- Implementing the RSA protocol: key generation, encryption and decryption

### Mathematics 

- Preliminary mathematical results/reminders for use in the RSA Protocol
    - The Euler-Fermat Theorem
    - Multiplicative Functions
    - The totient function
    - Modular Roots
- The RSA protocol 


# Extra resources

You will need functions belonging to the modules contained in the files
- `number_theory_lecture_functions.py`,
- `conversion_functions.py`, 
- `miller_rabin.py`. 

So you should make sure to download/upload these files into your present working directory. 


    
### The `is_prime` function



An important function that we will import from  the `miller_rabin` module and use below is the function  `is_prime`. This is a fast prime testing function (that replaces `isprime_basic` developed in Lecture 8.1) designed using probabilistic methods. 

<div class="alert alert-warning">

The development of `is_prime` is carried out  in **Bonus Lecture 9.3**. Note that this Bonus Lecture is for enthusiasts: it is entirely optional and you do not need to study it. 
</div>
    
However we recommend that you cast an eye over the definition of the  `is_prime` function by opening/inspecting  the python (text) file `miller_rabin.py` via the Jupyter Dashboard.  


## Preliminary results for use in the RSA Protocol

We begin with a reminder of a few number theoretic results that will be useful in our development of the RSA protocol. We also test our findings! 

### Modular Arithmetic in python

<div class="alert alert-block alert-info">
This is a reminder: we have already implemented modular arithmetic in earlier lectures.
</div>

Remember that `%` is the remainder operator in python. 
Also remember  that if $r$ is the remainder of $a$ when divided by $b$, then $a \equiv r \; \mbox{modulo}\; b$, i.e$.$ $0 \le r < b$ is the *natural representative* of $a$ working modulo $b$. 

**Example.** What is $279$ modulo $43$?

In [None]:
279 % 43         # We have 279 = 6 * 43 + 21 so expect 21 to be output

So $279 \equiv 21$ modulo $43$, i.e. $21$ is the natural representative of $279$ working modulo $43$. But going further, you also saw in the first year that the modulo operator distributes over the arithmetic operations $+$, $\times$ etc.  
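To make "distributes over" concrete, here is a small check, with arbitrarily chosen illustrative values, that reducing modulo $n$ before or after adding (or multiplying) gives the same natural representative.

```python
a, b, n = 279, 1000, 43   # arbitrary illustrative values

# Operate first and then reduce, or reduce first and then operate
sum_direct = (a + b) % n
sum_reduced = ((a % n) + (b % n)) % n
prod_direct = (a * b) % n
prod_reduced = ((a % n) * (b % n)) % n

print(sum_direct == sum_reduced)     # the two sums agree
print(prod_direct == prod_reduced)   # the two products agree
```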

**Example.**  Suppose we want to compute $17^{212}$ modulo $43$. Then we can do this via brute force by firstly computing a huge number...

In [None]:
17**212         # Well yes, this is rather large

and then apply the modulo operator. 

In [None]:
17 ** 212 % 43

In other words a lot of work to extract a 2 digit number... or instead use the fact that the modulo operator distributes over $\times$ and so achieve the same outcome by repeatedly multiplying by $17$ and computing the result modulo $43$ at each step. 

In [None]:
product = 1 
for i in range(212):             #  We repeat the next line 212 times 
    product = product * 17 % 43  #  Apply mod at each stage 
product    

###  The `pow` function

OK. That's good to know. But there is better news: python has a built-in function `pow` for both non-modular and modular exponentiation. 

In [None]:
pow(17,212)       # The same as 17**212

In [None]:
pow(17,212,43)    # The same as 17**212 % 43 

(We will use `pow` for modular exponentiation when we implement the RSA protocol below.) 
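Modular exponentiation does not need as many multiplications as the loop above. The sketch below shows the standard square-and-multiply idea, which takes roughly as many steps as the exponent has binary digits. The name `mod_pow_sketch` is hypothetical and used only for illustration; we do not claim that this is how python implements `pow` internally.

```python
def mod_pow_sketch(base, exponent, modulus):
    """Compute base**exponent % modulus by repeated squaring (exponent >= 0)."""
    result = 1 % modulus
    base = base % modulus
    while exponent > 0:
        if exponent & 1:                    # current binary digit of the exponent is 1
            result = result * base % modulus
        base = base * base % modulus        # square for the next binary digit
        exponent >>= 1
    return result

print(mod_pow_sketch(17, 212, 43) == pow(17, 212, 43))
```

Here the loop body runs 8 times (212 has 8 binary digits) rather than 212 times.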

### The Euler-Fermat Theorem 
<div class="alert alert-block alert-info">
This is a reminder from Lecture 8.2.
</div>

For $N \in \mathbb{N}$ we define  $\phi : \mathbb{N} \rightarrow \mathbb{N}$ to be the function such that $\phi(N)$ is the number of elements of $\{0,\dots,N-1\}$ that are coprime to $N$. We say that $\phi(N)$ is the **totient** of $N$ and that $\phi$ is the (or Euler's) **totient function**. 

**The Euler-Fermat Theorem** states that, if $m$ and $N$ are  positive integers such that  $\mathrm{gcd}(m,N) = 1$, then

$$
m^{\phi(N)} \,\equiv\, 1 \; (\mathrm{mod}\; N) \,.
$$

**Note.** Fermat's Little Theorem is a special case of the Euler-Fermat Theorem. 
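As a quick sanity check of the theorem, the sketch below computes $\phi(N)$ by brute force for a small $N$ and verifies the congruence for every $m$ coprime to $N$. It uses `math.gcd` from the standard library (rather than the lecture's own `gcd`) so that it is self-contained; `totient_brute` is a hypothetical helper name.

```python
from math import gcd

def totient_brute(N):
    """Count the i in {0,...,N-1} with gcd(i, N) = 1 (slow: small N only)."""
    return sum(1 for i in range(N) if gcd(i, N) == 1)

N = 40
phi = totient_brute(N)
for m in range(1, N):
    if gcd(m, N) == 1:
        assert pow(m, phi, N) == 1    # Euler-Fermat holds for this m
print("phi({}) = {}".format(N, phi))
```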

### Multiplicative functions

<div class="alert alert-block alert-info">
This is a reminder from Lecture 8.2.
</div>

Saying that an arithmetic function $f$ is a **multiplicative function**  means that **(i)** $f(1) = 1$ and **(ii)** for any positive integers $a$ and $b$, if $\mathrm{gcd}(a,b) = 1$, then $f(a \cdot b) = f(a) \cdot f(b)$. 
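The coprimality condition in **(ii)** matters. The sketch below checks this for the totient function, using a brute force count (`totient_brute` is a hypothetical helper name, and `math.gcd` is used to keep the cell self-contained).

```python
from math import gcd

def totient_brute(N):
    """Count the i in {0,...,N-1} with gcd(i, N) = 1 (slow: small N only)."""
    return sum(1 for i in range(N) if gcd(i, N) == 1)

# gcd(8, 9) = 1, so the two values agree
print(totient_brute(8 * 9), totient_brute(8) * totient_brute(9))

# gcd(2, 2) = 2, so the two values need not agree (and here they do not)
print(totient_brute(2 * 2), totient_brute(2) * totient_brute(2))
```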

### Fast Method for computing $\phi(N)$

It's easy to write a "brute force" algorithm  to compute $\phi(N)$ which simply counts all the numbers $0 \le i < N$ such that $\mathrm{gcd}(i,N) = 1$.   Note however that any such algorithm is **VERY SLOW**. Nevertheless we can show the following. 

**Lemma 9.1.** Suppose that the prime decomposition of $N$ is $\prod_{i=1}^k p_i^{e_i}$ (i.e. that $N = \prod_{i=1}^k p_i^{e_i}$). Then

$$
\phi(N) \;=\; \prod_{i=1}^k \phi\left(p_i^{e_i}\right) 
        \;=\; \prod_{i=1}^k p_i^{e_i - 1}(p_i - 1) \,.  \tag{1}
$$

**Proof.**  This follows from the fact that the totient function is multiplicative and that, for prime $p$, 
$\phi(p^e) = p^e - p^{e-1} = p^{e-1}(p-1)$. See Lecture 8.2. 

**Corollary 9.2.** Suppose  $N = p \cdot q$ for some prime numbers $p, q$. Then 

$$ 
\phi(N) \;=\; \phi(p) \cdot \phi(q) \;=\; (p - 1)\cdot(q - 1) \,. \tag{2}
$$

**Proof.** This follows from Lemma 9.1. However, taking into account that $\phi$ is multiplicative, we can also derive this directly from the fact that, given prime $p$, $\phi(p) = p - 1$, as $\phi(p)$ simply counts the  set of non-zero numbers in $\{0,\dots,p-1\}$, i.e. the set $\{1,\dots,p-1\}$. 

**Remark.** The point of this is that, if I (for decryption purposes) know the prime decomposition of $N$ then I have a very fast method of computing $\phi(N)$. However if you (the potential hacker) don't have this information then no fast method of computing $\phi(N)$ is known: for large $N$ you are left with some form of brute force algorithm (or with first factorising $N$)... and this is very slow.
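Formula (1) translates directly into code once the prime decomposition is known. The sketch below assumes the decomposition is supplied as a dictionary mapping each prime to its exponent. This input format and the names `totient_from_factors` and `totient_brute` are illustrative assumptions, not functions from the lecture modules.

```python
from math import gcd

def totient_from_factors(factors):
    """factors: dict mapping each prime p_i to its exponent e_i, as in (1)."""
    result = 1
    for p, e in factors.items():
        result *= p ** (e - 1) * (p - 1)
    return result

def totient_brute(N):
    """Count the i in {0,...,N-1} with gcd(i, N) = 1 (slow: small N only)."""
    return sum(1 for i in range(N) if gcd(i, N) == 1)

# 360 = 2**3 * 3**2 * 5
print(totient_from_factors({2: 3, 3: 2, 5: 1}), totient_brute(360))
```

The first function performs one multiplication per prime factor, whereas the second computes a gcd for every number below $N$.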



### Modular Roots 

<div class="alert alert-block alert-info">
This is a reminder from Lecture 8.3.
</div>


Suppose that $e,t$ are positive integers such that $\mathrm{gcd}(e,t) = 1$. Then there exists a positive integer $f$ such that $e \cdot f \equiv 1 \;(\mathrm{mod}\; t)$. We call $f$ the **multiplicative inverse** of $e$ modulo $t$. 

**Lemma 9.3.**  If $\mathrm{gcd}(m,N) = 1$, $\mathrm{gcd}(e,\phi(N)) = 1$, and $f$ is the multiplicative inverse of $e$ modulo $\phi(N)$ $-$ i.e. $e \cdot f \equiv 1\;(\mathrm{mod} \;\phi(N))$ $-$ then 

$$                                                                                             
m^{e \cdot f} \equiv m \;(\mathrm{mod}\; N) \,.                                               
$$ 
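
One way to see why Lemma 9.3 holds, using only the Euler-Fermat Theorem: since $e \cdot f \equiv 1 \;(\mathrm{mod}\; \phi(N))$ with $e$ and $f$ positive, we can write $e \cdot f = 1 + k \cdot \phi(N)$ for some integer $k \ge 0$. Then, as $\mathrm{gcd}(m,N) = 1$,

$$
m^{e \cdot f} \;=\; m^{1 + k \cdot \phi(N)} \;=\; m \cdot \left(m^{\phi(N)}\right)^{k} \;\equiv\; m \cdot 1^{k} \;\equiv\; m \;(\mathrm{mod}\; N) \,.
$$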




<font color="blue"><b>Important Fact 9.4.</b></font> Suppose we have  $m$, $e$, and $N$ as stated in Lemma 9.3 with $m < N$ and such that we know the value of $e$ and $N$ but NOT the value of $m$. Suppose also that we receive the integer $m^e \;(\mathrm{mod}\; N)$. Then it suffices that we compute $f$ - the multiplicative inverse of $e$ modulo $\phi(N)$ - in order to retrieve $m$ via the computation  

$$
\left(m^e\right)^f \;(\mathrm{mod}\; N) = m \,. \tag{3}
$$

### Computing the Modular Inverse

Now we need a function that computes the modular inverse. But we already developed a function `modular_inverse` in Lecture 8.1 which, on input `(a,n)` such that `a` and `n` are coprime positive integers, returns the inverse of `a` modulo `n`. This function has been added to the module `number_theory_lecture_functions` and so we simply import it here. 

**Note.** For completeness we also show how to develop the function `modular_inverse` in the Appendix below.   

In [None]:
from number_theory_lecture_functions import modular_inverse

Let's try this. 

In [None]:
modular_inverse(16,37)
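
As an independent cross-check (not the lecture's own method), python 3.8 and later also accept a negative exponent in the three-argument form of `pow`, which returns the modular inverse when it exists.

```python
# Requires python 3.8 or later
inverse = pow(16, -1, 37)
print(inverse)               # 7, since 16 * 7 = 112 = 3 * 37 + 1
print(16 * inverse % 37)     # 1
```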

### Testing <font color="blue">Important Fact 9.4</font> in python (in readiness for the RSA Protocol)

Now suppose that $N = p \cdot q$ with $p$, $q$ prime (as this is the case that we will use). Let's randomly test whether we can retrieve $m$ from $m^e$ modulo $N$ in the manner described in (3), under the assumption that $m < N$ and that the first two assumptions of Lemma 9.3 are satisfied. 

**Note 1.** Having generated the two primes $p$, $q$ whose product forms $N$, our  test consists of a `while` loop that only processes the random numbers generated once the first two assumptions of Lemma 9.3 are satisfied $-$ i.e. in the case when $\mathrm{gcd}(m,N) = 1$ and $\mathrm{gcd}(e,\phi(N)) = 1$. (Satisfaction of the second condition means that we can compute $f$ such that  $e \cdot f \equiv 1\;(\mathrm{mod} \;\phi(N))$). 

**Note 2.** Since $N = p \cdot q$, we know by Corollary 9.2 that 

$$ 
\phi(N) 
\;=\; (p - 1)\cdot(q - 1) 
\;=\; p \cdot q - (p + q) + 1
\;=\; N - (p + q) + 1 \,.
$$ 

We use this below.

**Remark.** In the test below $N = p \cdot q$ with $p$ and $q$ both prime, and $m$ is a positive integer with fewer digits than both $p$ and $q$. Hence $m$ is smaller than both primes, so it is a multiple of neither, and the test `gcd(m,N) == 1` is always satisfied. 

In [None]:
# MAKE SURE THAT THE FILES number_theory_lecture_functions.py AND miller_rabin.py ARE IN YOUR
# CURRENT WORKING FOLDER TO ALLOW US TO IMPORT THE FUNCTIONS gcd, modular_inverse AND is_prime
from random import randint
from number_theory_lecture_functions import gcd, modular_inverse  
from miller_rabin import is_prime

# We start by generating two prime numbers p, q
primes = []
while len(primes) < 2: 
    n = randint(1000,9999)      # A random 4-digit number
    if is_prime(n) and n not in primes:   # p and q must be distinct for Corollary 9.2
        primes.append(n)

[p, q] = primes                 # For clarity 

not_found = True
while not_found:
    N = p * q                   # N is a 7 or 8-digit number 
    e = randint(1000,9999)      # A random 4-digit number
    m = randint(100,999)        # A random 3-digit number
    if gcd(m,N) == 1:           
        totient = N - (p + q) + 1  # This is \phi(N) - see Note 2 above
        if gcd(e,totient) == 1:
            f = modular_inverse(e,totient)
            me_received = pow(m, e, N)  # This is m**e (mod N), without building the huge number m**e
            m_computed = pow(m, e*f, N) # This computes m**(e*f) (mod N) [alternatively: pow(me_received, f, N)] 
            print("Done! Our random search resulted in the following.")
            print("m = {}, e = {}, N = {}, euler_totient(N) = {}.".format(m,e,N,totient))
            print("Also gcd(m,N) = 1 and gcd(e,euler_totient(N)) = 1.")
            print("The multiplicative inverse of {} modulo {} is f = {}.".format(e,totient,f))
            print("We check below that our computation yields {} as required.".format(m))
            print("\n(m**e)**f = {}**({} * {}) = {} (mod {})\n".format(m,e,f,m_computed,N))
            print("So on reception of m**e = {}**{} = {} (mod {})".format(m,e,me_received,N), end = " ")
            print("we compute {}.".format(m))
            print("\nTHE POINT: under the assumption that we know e = {} and N = {}".format(e,N))
            print("and that we have a fast algorithm", end = " ")       
            print("for computing euler_totient(N) = {},".format(totient))
            print("when we receive\n\n\t\t\t m**e (mod N) = {}\n".format(me_received))
            print("we are able to easily compute and retrieve m = {}.".format(m_computed))
            not_found = False

## The RSA Protocol in action

**Note.** We have already studied the number theoretic core of the RSA Protocol in Lecture 8.3. Here we implement the protocol via a realistic example in which a textual message is encrypted and decrypted using 512 bit (private key) prime numbers. 

The RSA Encryption protocol, invented by **R**ivest, **S**hamir and **A**dleman, is asymmetric. Alice, who is to receive a message, has two keys: a **public key**, that she publishes online and that Bob (or anyone else) can use to encrypt his message, and a **private key** that Alice uses to decrypt the message that Bob sends her. 

In the RSA protocol the private key is a pair $(p,q)$ where $p$ and $q$ are large (e.g$.$ 512 bit) prime numbers. The public key is the pair $(N,e)$ where $N$ is defined to be the product $N = p \cdot q$ and $e$ is an auxiliary number called the **exponent**. Note that $e$ can be repeatedly used by different people and that often the number $e = 65537$ is used. (You should check that this is a prime with nice properties.) 
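
A minimal sketch of such a check is given below. It uses plain trial division (the helper `is_prime_trial` is hypothetical and only sensible for small numbers) and then looks at the binary form of $e$. Having only two binary digits equal to 1 is one of the "nice properties": square-and-multiply exponentiation by $e$ then needs very few multiplications.

```python
e = 65537

def is_prime_trial(n):
    """Trial division; fine for a number as small as 65537."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

print(is_prime_trial(e))        # True
print(e == 2**16 + 1)           # True
print(bin(e).count("1"))        # 2: only two binary digits of e are 1
```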

Note that the asymmetry of the protocol is due to the fact that only Alice generates the keys used when Bob is going to send her a message (as opposed to both Alice and Bob generating keys as is the case under, for example, the Diffie-Hellman protocol). Note also that it is VITAL that a private key $(p,q)$ is NOT used by different people. 

To proceed we need to be able to generate large randomly chosen primes. We use the function `is_prime` that we  imported from the `miller_rabin` module in the testing cell above (but we perform the import again below to make sure - in case you have not run the testing cell in your present session).  We also need the class `SystemRandom` from the `random` module, which draws on the operating system's source of randomness, to generate cryptographically secure random numbers. 

In [None]:
from random import SystemRandom
from miller_rabin import is_prime

# Find a cryptographically secure random prime of bit_length many bits. 
def random_prime(bit_length):
    '''
    Returns a cryptographically secure random prime number 
    of bit_length many (binary) bits 
    '''
    while True:
        p = SystemRandom().getrandbits(bit_length)  
        # Check whether p is a prime of the right bit length
        if p >= 2**(bit_length-1):
            if is_prime(p):
                return p

In [None]:
test_p = random_prime(1024)   # A random prime of 1024 (binary) bits
print("The length of the binary representation of test_p is: {}".format(len(bin(test_p)[2:])))
print("\ntest_p = \n{}".format(test_p))

## Alice: key generation.

We have given Alice the necessary means to generate her private and public key. 

In [None]:
def rsa_private_key(bit_length):
    '''
    Given input bit_length returns a private RSA key (p,q) where 
    both p and q are primes with bit_length number of (binary) bits. 
    '''
    p = random_prime(bit_length)
    q = random_prime(bit_length)
    return (p,q)    # This is a 'tuple'. 

Alice now generates her private key: a pair of 512-bit primes. 

In [None]:
(p,q) = rsa_private_key(512)
print("Alice's private key is (p,q) where: \n")
print("p = ")
print(p)
print("\nq = ")
print(q)


Alice also needs a public key. 

In [None]:
def rsa_public_key(p,q, e = 65537):
    '''
    Given input (p,q,e) returns the RSA public key 
    from the two prime numbers p and q and auxiliary 
    exponent e. If only (p,q) input, e = 65537 is used.
    '''
    N = p * q
    return (N,e)

In [None]:
(N,e) = rsa_public_key(p,q)
print("Alice's public key is (N,e) where:\n")
print("N = ")
print(N)
print("\ne = {}".format(e))

Alice  now makes her public key `(N,e)` available online. 
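
One condition that `rsa_public_key` does not check is $\mathrm{gcd}(e,\phi(N)) = 1$, which Alice needs in order to compute the inverse of $e$ modulo $\phi(N)$ when she decrypts. A minimal sketch of such a check follows. The helper `exponent_is_valid` is hypothetical (it is not part of the lecture modules) and uses `math.gcd` to stay self-contained. Toy primes are used for readability.

```python
from math import gcd

def exponent_is_valid(p, q, e=65537):
    """True when e has a multiplicative inverse modulo (p-1)*(q-1)."""
    return gcd(e, (p - 1) * (q - 1)) == 1

print(exponent_is_valid(61, 53, 17))   # True
print(exponent_is_valid(61, 53, 3))    # False: 3 divides 61 - 1 = 60
```

If the check fails for her primes, Alice can simply generate a new private key.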

## Bob: encryption.

Bob writes his message and then uses the `convert_to_integer` function developed in the last lecture to convert it into an integer `m`. 

In [None]:
message = "This is a secret message meant only for Alice."

In [None]:
from conversion_functions import convert_to_integer

m = convert_to_integer(message)
print("Bob's message is : '" + message + "'")
print("Bob has converted this into the number:\n\nm = {}".format(m))

**Note.** The integer `m` must be smaller than `N`, since this ensures that, when we compute $m \;\left( \mathrm{mod}\; N  \right)\,$ we obtain $m$ itself. We also need $\mathrm{gcd}(m,N) = 1$ (the first condition of Lemma 9.3). To guarantee both, we make sure that the number of bits of `m` is less than the number of bits of `p` and of `q` (i.e$.$ 512 bits in the present case): then `m` is less than both `p` and `q`, hence not a multiple of either, which implies that $\mathrm{gcd}(m,N) = 1$. This is the case here (as `m` is a 369-bit integer). For a longer message Bob should slice it into pieces to be sent one at a time.  
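
The slicing can be sketched as follows. This is only an illustration under stated assumptions: `split_into_chunks` is a hypothetical helper, and `int.from_bytes` on the UTF-8 bytes is used as a stand-in for `convert_to_integer`, whose exact encoding is not shown here.

```python
def split_into_chunks(message, max_bits):
    """Split the UTF-8 bytes of message into chunks of fewer than max_bits bits."""
    max_bytes = (max_bits - 1) // 8      # whole bytes that fit in fewer than max_bits bits
    data = message.encode("utf-8")
    return [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)]

long_message = "This is a secret message meant only for Alice. " * 4
chunks = split_into_chunks(long_message, 512)

# Stand-in conversion (an assumption, not the lecture's convert_to_integer)
integers = [int.from_bytes(chunk, "big") for chunk in chunks]
print(len(chunks), [n.bit_length() for n in integers])

# Reversing the conversion and joining the pieces gives back the message
pieces = [n.to_bytes((n.bit_length() + 7) // 8, "big") for n in integers]
recovered = b"".join(pieces).decode("utf-8")
print(recovered == long_message)
```

Each integer has fewer than 512 bits and so is smaller than any 512-bit prime. Note that this stand-in conversion would lose a leading zero byte in a chunk, which does not arise for ordinary text.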

Bob's encryption procedure is now simple. He encrypts $m$ as a ciphertext integer $c$ by setting 

$$
c \,=\, m^e \;\left( \mathrm{mod}\; N  \right)\, . 
$$ 

In [None]:
def rsa_encrypt(m, N, e):  
    '''
    Given input (m,N,e) where m is the numerical 
    encoding of a message, returns the RSA 
    encryption of m using public key (N,e).
    '''
    return pow(m, e, N) # This is m**e (mod N)

In [None]:
c = rsa_encrypt(m,N,e)
print("Bob has encrypted his converted message m into ciphertext c.")
print("\nc = ")
print(c)

Bob now sends his message to Alice. 

## Alice: decryption.

To decrypt the ciphertext $c$ Alice needs to find $m$ - i.e$.$ the $e$-th root of $c$ modulo $N$. But we know how to do this from Lemma 9.3 (see Important Fact 9.4): as 
$c \equiv m^e \;\left( \mathrm{mod}\;N  \right)$ it suffices to find an integer $f$ such that $e \cdot f \equiv 1$ modulo $\phi(N)$. Indeed, as we saw above, Alice can then compute: 

$$
c^f \equiv m^{e \cdot f} \equiv m \;\left( \mathrm{mod}\;N  \right) \,.
$$

In other words, by raising $c$ to the power of $f$ Alice extracts $m$. So Alice just needs to find $f$, the multiplicative inverse of $e$ modulo $\phi(N)$. 

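**Remark.** We have used the fact that the two conditions of Lemma 9.3 are satisfied. Firstly, $m < p$ and $m < q$, so that $\mathrm{gcd}(m,N) = 1$. Secondly, $e$ is prime, so $\mathrm{gcd}(e,\phi(N)) = 1$ unless $e$ divides $p - 1$ or $q - 1$. For randomly chosen large primes this is very unlikely, and if it does happen Alice should simply generate a new private key.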

**What about an eavesdropper?** Suppose that an eavesdropper Eve - who,
like everyone else, knows the public key $(N,e)$ - managed to get hold of the ciphertext $c$. Then, if Eve could find $f$, she could also decipher the message! However to do this Eve must compute $\phi(N)$. Remember that $N$ is very large. Without further information on $N$ this is VERY DIFFICULT to do and would take a VERY LONG TIME. (This is the point... as already mentioned above.) 
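
Conversely, anyone who learned $\phi(N)$ could factorise $N$, so computing $\phi(N)$ from $N$ alone cannot be easier than factorising $N$. The sketch below illustrates this with toy values ($N = 3233 = 53 \cdot 61$). It uses the identity $p + q = N - \phi(N) + 1$ and then solves the quadratic whose roots are $p$ and $q$. The helper `factor_from_totient` is hypothetical, and `math.isqrt` requires python 3.8 or later.

```python
from math import isqrt

def factor_from_totient(N, totient):
    """Recover (p, q) from N = p*q and phi(N) = (p-1)*(q-1)."""
    s = N - totient + 1               # s = p + q
    root = isqrt(s * s - 4 * N)       # (p - q)**2 = s**2 - 4*N, so root = |p - q|
    return ((s - root) // 2, (s + root) // 2)

print(factor_from_totient(3233, 3120))   # (53, 61)
```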

**Alice and her private key.** Alice however knows that $N = p \cdot q$ and knows the pair of prime numbers $(p,q)$. Indeed this is her private key. So she can use the fact, already mentioned above (see Corollary 9.2), that

$$
\phi(N) \,=\, (p-1)\cdot(q-1) \,=\, p \cdot q - (p + q) + 1 \,. 
$$

So Alice can compute $\phi(N)$, and hence $f$, quickly, and thereby decrypt the ciphertext easily. 

In [None]:
def rsa_decrypt(c,p,q,N,e): 
    '''
    Given input (c,p,q,N,e) returns the RSA decryption of ciphertext
    c using private key (p,q) and public key (N,e). (We input N as a 
    parameter to avoid having to recompute N = p*q.)
    '''
    totient = N - (p + q) + 1       # This is (p-1)*(q-1)
    f = modular_inverse(e,totient)  # Note: f * e = 1 (mod totient)
    return pow(c,f,N)               # This is c**f (mod N)

In [None]:
m_new = rsa_decrypt(c,p,q,N,e)
print("Alice has deciphered the ciphertext c.")
print("The result of Alice's decryption is as follows.")
print("\nm_new = ")
print(m_new)

if (m_new == m): 
    print("\nThis is the number that Bob sent.")

Alice now just has to convert this back into text form and she's done!

In [None]:
from conversion_functions import convert_to_text

new_message = convert_to_text(m_new)
print("Alice is now able to read the message printed below.\n")
print("new_message = '" + new_message + "'")
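
The whole exchange can also be replayed with toy numbers that are small enough to check by hand. This is only an illustrative sketch: the values are arbitrary, the primes are far too small to be secure, and the inverse is computed with the built-in `pow(e, -1, totient)` (python 3.8 or later) instead of `modular_inverse`, so that the cell is self-contained.

```python
p, q, e = 61, 53, 17              # toy private key (p, q) and exponent e
N = p * q                         # public modulus: 3233
totient = N - (p + q) + 1         # (p-1)*(q-1) = 3120
f = pow(e, -1, totient)           # inverse of e modulo totient

m = 42                            # toy "message": m < p and m < q
c = pow(m, e, N)                  # Bob encrypts
m_new = pow(c, f, N)              # Alice decrypts
print("f = {}, c = {}, m_new = {}".format(f, c, m_new))
```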

### Sources. 

Much of the material in this lecture is drawn from the book and Jupyter notebooks by Martin H. Weissman as listed below.

[1] Martin H. Weissman. *An Illustrated Theory of Numbers*. AMS, 2017. 

[2] Martin H. Weissman. [Python for number Theory (Jupyter Notebooks)](http://illustratedtheoryofnumbers.com/prog.html)

## Appendix 

This Appendix is a reminder of how we developed the function `modular_inverse` in Lecture 8.1   using the  extended version of Euclid's algorithm.  The latter was implemented in python as the function `gcd_ext` where for (positive) integers `a` and `b`, `gcd_ext(a,b)` returns the triple `(g,x,y)` where $g = \mathrm{gcd}(a,b)$ and 

$$
g = x\cdot a + y\cdot b. 
$$

This means that $x \cdot a = g + (-y)\cdot b$. Thus, when the greatest common divisor $g = 1$, we have $x \cdot a \equiv 1\; (\mathrm{mod}\; b)$, so that $x \;(\mathrm{mod}\; b)$ is the multiplicative inverse of $a \; (\mathrm{mod}\; b)$.
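
The lecture's `gcd_ext` lives in `number_theory_lecture_functions.py` and is not reproduced here. For readers who want to see the idea, the following is one possible iterative version of the extended Euclidean algorithm. The name `gcd_ext_sketch` is hypothetical, and the lecture's implementation may differ in its details.

```python
def gcd_ext_sketch(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g = x*a + y*b."""
    # Invariants: x0*a_orig + y0*b_orig = a  and  x1*a_orig + y1*b_orig = b
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        quotient, remainder = divmod(a, b)
        a, b = b, remainder
        x0, x1 = x1, x0 - quotient * x1
        y0, y1 = y1, y0 - quotient * y1
    return (a, x0, y0)

(g, x, y) = gcd_ext_sketch(16, 37)
print(g, x, y)        # 1 7 -3, and indeed 7*16 - 3*37 = 1
print(x % 37)         # 7, the multiplicative inverse of 16 modulo 37
```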

We then implemented this as the function `modular_inverse`. 

In [None]:
from number_theory_lecture_functions import gcd_ext

def modular_inverse(a,b): 
    '''
    Given input (a,b) with a and b integers returns 
    the multiplicative inverse of a modulo b provided 
    gcd(a,b) = 1. Otherwise prints an error message 
    and returns None.
    '''
    ic_message = 'The numbers are not coprime'
    # We firstly extract (g,x,y) such that g = x*a + y*b
    (g,x,y) = gcd_ext(a,b)
    if g != 1: 
        print(ic_message)
        return None
    # If we get here we know that 1 = x*a + y*b so x*a = 1 (mod b)
    # So (as it may be that x < 0 or x >= b) we have that x (mod b) 
    # is the modular inverse of a (mod b)
    x = x % b 
    return x 

Let's test this. You can repeatedly execute the following cell to do this. 

In [None]:
from random import randint
from number_theory_lecture_functions import gcd

not_found = True
while not_found:
    a = randint(1, 10**6) 
    b = randint(1,10**6)
    if gcd(a,b) == 1:
        f = modular_inverse(a,b)
        print("The multiplicative inverse of {} (mod {}) is: {}".format(a, b, f))
        print("Indeed {} * {} (mod {}) = {}".format(a, f, b, a*f % b))
        not_found = False       