# Cryptography and RSA

Cryptography is the study and practice of techniques for secure communication in the presence of adversaries. It involves transforming information in such a way that only authorized parties can access it, while preventing unauthorized access.

In this lecture, we'll explore how number theory provides the mathematical foundation for modern cryptography, particularly through the RSA encryption system. This is a really neat application of modular arithmetic and some of the structure we have uncovered there!

_Acknowledgement:_ The material on this page is inspired by the notes _Introduction to Cryptography_ by Shishir Agrawal (https://sagrawalx.github.io/crypt/), and some of the code presented here is taken from those notes.

#### Converting Text to Numbers

In cryptography, we want to send messages, and classically this means messages consisting of text, like "Bet all on red". Nowadays we may want to send text, or other data such as bank account details, in a secure manner. For mathematical cryptography, we are mostly concerned with sending numbers, but if you can do this securely, then you can also send text or anything else you want by just turning your information into numbers. We call this process **integrification**, since we are turning information into integers. This section runs through one possible way of doing that.

#### Integrification 

The benefit of integrification is that it allows us to:

- Apply mathematical transformations to messages
- Use modular arithmetic for encryption
- Leverage properties of prime numbers and number theory

The basic idea is to map each letter to a number. A common approach is:
- A → 0, B → 1, C → 2, ..., Z → 25

For longer messages, we can represent entire words or sentences as a single large integer using a **base-26 representation**. For example, if we have the letters with values $a_0, a_1, a_2, \ldots, a_{n-1}$ (where each $a_i$ is between 0 and 25), we can create the integer:

$$M = a_0 \cdot 26^{n-1} + a_1 \cdot 26^{n-2} + \cdots + a_{n-1} \cdot 26^0$$

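This converts a text string into an integer, which we can then encrypt using mathematical operations. The integer determines the text uniquely as long as the text does not begin with the letter A: a leading A contributes a leading zero digit, so for example AB and B both give the integer 1.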
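
As a quick worked example of the formula above, take the text HIBOB (the default input of the interactive tool further down). The letter values are H → 7, I → 8, B → 1, O → 14, B → 1, so with $n = 5$:

$$M = 7 \cdot 26^{4} + 8 \cdot 26^{3} + 1 \cdot 26^{2} + 14 \cdot 26^{1} + 1 \cdot 26^{0} = 3198832 + 140608 + 676 + 364 + 1 = 3340481$$

Going backwards, repeatedly dividing $3340481$ by 26 and recording the remainders recovers the letter values from last to first.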

Of course, you may want to include other characters, like spaces, punctuation, or digits, and maybe you care about the difference between upper-case and lower-case letters. We won't worry about any of this: in what follows we only keep letters (dropping spaces and all other characters) and we ignore case.

In [1]:
from re import sub

# Remove all non alphabetic characters and capitalize
def encode(text: str):
    stripped = sub(r"[^a-zA-Z]", "", text)
    return stripped.upper()

# Encode a string as a list of numbers 0--25
def numerify(text: str):
    return [(ord(x) - 65) for x in encode(text)]

# Turn a list of numbers 0--25 back into a string
def denumerify(nums: list):
    return "".join([chr((x % 26) + 65) for x in nums])
    
# Get integer representation
def integerify(text: str):
    n = 0
    for i, x in enumerate(reversed(numerify(text))):
        n += x * 26**i  # ** is exponentiation in both Sage and plain Python
    return n
    
# Get text from integer representation
def deintegerify(nums: str):
    n = int(sub(r'\D', '', nums))
    rems = []
    while n > 0:
        rems.append(n % 26)
        n = n // 26
    return denumerify(reversed(rems))
    
# Prints an output div aligning with the interact controls   
def output_div(label: str, content: str):
    s = '<div class="sagecell_interactControlCell" style="width: 100%;">'
    s += f'<label class="sagecell_interactControlLabel">{label}</label>'
    s += f'<div class="sagecell_interactControl">{content}</div>'
    s += '</div>'
    pretty_print(html(s))

@interact
def _(text=input_box(default="HIBOB", label="Input", height=2, width=70, type = str),
      actions=selector(["integerify", "deintegerify"], buttons=True, label="Action")):
    output = eval(actions)(text)
    output_div("Output", f'<textarea readonly rows="2" cols="70">{ output }</textarea>')


## The RSA Cryptosystem

The _RSA algorithm_, named after its inventors Rivest, Shamir, and Adleman (1977), is one of the first public-key cryptosystems and is widely used for secure data transmission. Its security is based on the computational difficulty of factoring large integers.

#### Mathematical Setup

RSA relies on the following mathematical concepts from number theory:

1. _Euler's totient function:_ For a positive integer $n$, $\phi(n)$ counts the number of integers from 1 to $n$ that are coprime to $n$.
   - If $n = pq$ where $p$ and $q$ are distinct primes, then $\phi(n) = (p-1)(q-1)$.

2. _Euler's theorem:_ If $\gcd(a, n) = 1$, then $a^{\phi(n)} \equiv 1 \pmod{n}$.

3. _Modular multiplicative inverses:_ If $\gcd(e, \phi(n)) = 1$, there exists a unique $d$ modulo $\phi(n)$ such that $ed \equiv 1 \pmod{\phi(n)}$.
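
The three facts above can be checked numerically on a small example. The following is a minimal sketch in plain Python (standard library only, not Sage); the helper name `phi` and the choice of $p = 31$, $q = 53$, $e = 7$ are just for illustration, and `pow(e, -1, m)` for modular inverses assumes Python 3.8 or newer.

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by direct counting (only sensible for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

p, q = 31, 53
n = p * q
phi_n = phi(n)

# 1. Totient of a product of two distinct primes
assert phi_n == (p - 1) * (q - 1)

# 2. Euler's theorem: a^phi(n) = 1 (mod n) for every a coprime to n
assert all(pow(a, phi_n, n) == 1 for a in range(1, n) if gcd(a, n) == 1)

# 3. Modular inverse of e modulo phi(n), which exists because gcd(e, phi(n)) = 1
e = 7
d = pow(e, -1, phi_n)
assert (e * d) % phi_n == 1

print(phi_n, d)  # prints: 1560 223
```

Counting coprime residues one by one is hopeless for cryptographic sizes; the point of the formula $\phi(pq) = (p-1)(q-1)$ is that whoever knows $p$ and $q$ never has to do this.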

#### RSA Key Generation

To set up RSA encryption, we perform the following steps:

1. _Choose two distinct large prime numbers_ $p$ and $q$.
2. _Compute_ $n = pq$. This will be the **modulus** for both encryption and decryption.
3. _Compute_ $\phi(n) = (p-1)(q-1)$.
4. _Choose an encryption exponent_ $e$ such that $1 < e < \phi(n)$ and $\gcd(e, \phi(n)) = 1$.
5. _Compute the decryption exponent_ $d$ such that $de \equiv 1 \pmod{\phi(n)}$.
   - In other words, $d = e^{-1} \bmod \phi(n)$.

The **public key** is the pair $(n, e)$, which can be shared openly.

The **private key** is the pair $(n, d)$, which must be kept secret. The primes $p$ and $q$ should also be kept secret (or destroyed after generating the keys).
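
The key generation steps can be sketched in a few lines of plain Python (not Sage). The function name `make_keys` is a hypothetical helper introduced here for illustration; it assumes the caller supplies genuine primes and only checks the conditions listed in the steps above. The sample values $p = 61$, $q = 53$, $e = 17$ are toy numbers, far too small for real use.

```python
from math import gcd

def make_keys(p: int, q: int, e: int):
    """Return (public_key, private_key) for primes p, q and exponent e.

    Illustrative only: primality of p and q is assumed, not tested.
    """
    if p == q:
        raise ValueError("p and q must be distinct")
    n = p * q                        # step 2: the modulus
    phi_n = (p - 1) * (q - 1)        # step 3: Euler's totient of n
    if not (1 < e < phi_n) or gcd(e, phi_n) != 1:
        raise ValueError("need 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi_n)            # step 5: inverse of e mod phi(n), Python 3.8+
    return (n, e), (n, d)

public, private = make_keys(61, 53, 17)
print(public, private)  # prints: (3233, 17) (3233, 2753)
```

Here $17 \cdot 2753 = 46801 = 15 \cdot 3120 + 1$, so $d = 2753$ is indeed the inverse of $e = 17$ modulo $\phi(n) = 3120$.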

#### RSA Encryption and Decryption

Once the keys are generated:

**Encryption**: To encrypt a message $M$ (represented as an integer with $0 \leq M < n$), compute:
$$C = M^e \bmod n$$
where $C$ is the ciphertext.

**Decryption**: To decrypt the ciphertext $C$, compute:
$$M = C^d \bmod n$$

**Why this works**: Since $ed \equiv 1 \pmod{\phi(n)}$, we have $ed = 1 + k\phi(n)$ for some integer $k$. Therefore:
$$C^d \equiv (M^e)^d = M^{ed} = M^{1+k\phi(n)} = M \cdot (M^{\phi(n)})^k \equiv M \cdot 1^k = M \pmod{n}$$
by Euler's theorem. If you are astute, you've noticed that we used Euler's theorem here without checking that $M$ satisfies the hypothesis $\gcd(M, n) = 1$. In fact, this won't always be the case, but miraculously $M^{ed} \equiv M \pmod{n}$ even when $\gcd(M, n) \neq 1$!

**Exercise:** Prove this! 


<div class="theorem" style="border: 1px solid #ccc; padding: 10px; margin: 5px 0; background-color: #f9f9f9; border-radius: 5px; overflow: hidden;">
    <p style="font-size: 1.2em; font-weight: bold; margin-top: 10px;">Theorem (RSA Correctness)</p>
    <p>Let $n = pq$ where $p$ and $q$ are distinct primes. Let $e$ and $d$ be positive integers such that $ed \equiv 1 \pmod{\phi(n)}$. Then for any integer $M$ with $0 \leq M < n$:
    $$M^{ed} \equiv M \pmod{n}$$
    </p>
</div>
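
Before attempting a proof, it can be reassuring to test the theorem exhaustively on a tiny modulus. The sketch below is plain Python (not Sage) and uses the illustrative values $p = 3$, $q = 11$, $e = 7$; it is numerical evidence for one key, not a proof.

```python
from math import gcd

p, q, e = 3, 11, 7
n = p * q
phi_n = (p - 1) * (q - 1)
d = pow(e, -1, phi_n)  # modular inverse, Python 3.8+

# Messages that share a factor with n, where Euler's theorem does not apply
not_coprime = [M for M in range(n) if gcd(M, n) != 1]

# Check M^(ed) = M (mod n) for every residue, coprime to n or not
holds_for_all = all(pow(M, e * d, n) == M for M in range(n))

print(len(not_coprime), holds_for_all)  # prints: 13 True
```

So 13 of the 33 possible messages are not coprime to $n$, and decryption still returns each of them unchanged.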

**Example:** The following code generates a small RSA key, encrypts a random message, and checks that decryption gives back the original message.

In [6]:
## Simple RSA example
# Key generation 
p = 31
q = 53
n = p*q

phi = euler_phi(n)
e = 7
d = inverse_mod(e, phi)
print(f"Full RSA key: (n, e, d) = {n,e,d}\n")

# Here is our message
m = ZZ.random_element(1,n)

# Encryption
c = power_mod(m,e,n)
print(f"Encrypting the message M = {m} using the public key {n,e} gives ciphertext C = {c}\n")

# Decryption
print(f"Decrypting the ciphertext C = {c} using the private key {n,d} gives back our original message: {power_mod(c,d,n)}")

Full RSA key: (n, e, d) = (1643, 7, 223)

Encrypting the message M = 1132 using the public key (1643, 7) gives ciphertext C = 504

Decrypting the ciphertext C = 504 using the private key (1643, 223) gives back our original message: 1132
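
To connect this with integrification, here is a hedged end-to-end sketch in plain Python (not Sage): text is turned into an integer, encrypted, decrypted, and turned back into text. The two helpers are simplified re-implementations of `integerify` and `deintegerify` that work on integers rather than strings, and the primes $2003$ and $2011$ with $e = 17$ are arbitrary illustrative choices, picked only so that $n$ exceeds the integer for HIBOB.

```python
import re

def integerify(text: str) -> int:
    """Base-26 integer for the letters of text (A=0, ..., Z=25)."""
    n = 0
    for ch in re.sub(r"[^A-Za-z]", "", text).upper():
        n = n * 26 + (ord(ch) - 65)  # Horner's rule for the base-26 sum
    return n

def deintegerify(n: int) -> str:
    """Inverse of integerify (leading A's cannot be recovered)."""
    letters = []
    while n > 0:
        letters.append(chr(n % 26 + 65))
        n //= 26
    return "".join(reversed(letters))

# Toy key: large enough for a five-letter message, far too small to be secure
p, q, e = 2003, 2011, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+

M = integerify("HIBOB")
assert M < n                       # RSA needs 0 <= M < n
C = pow(M, e, n)                   # encrypt with the public key (n, e)
recovered = deintegerify(pow(C, d, n))  # decrypt with the private key (n, d)

print(M, recovered)  # prints: 3340481 HIBOB
```

A longer text would give an integer larger than $n$, so it would have to be split into blocks (or a larger key used) before encrypting.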


#### Security of RSA

The security of RSA depends on the difficulty of factoring the modulus $n$. If an adversary can factor $n = pq$, they can compute $\phi(n) = (p-1)(q-1)$ and then calculate the private key $d$ from the public exponent $e$. 

However, for sufficiently large primes (typically 1024 bits or larger each in practice, which gives a modulus $n$ of 2048 bits or more), no publicly known algorithm can factor $n$ in a feasible amount of time on current computers. This is what makes RSA secure for practical purposes.

**Code-breaking approach**: If you know $n$ and $e$ (the public key) and can factor $n$ into $p$ and $q$, then you can:
1. Compute $\phi(n) = (p-1)(q-1)$
2. Compute $d = e^{-1} \bmod \phi(n)$
3. Decrypt any ciphertext using $M = C^d \bmod n$
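
The attack above can be sketched as follows in plain Python (not Sage). The names `factor_semiprime` and `break_rsa` are hypothetical helpers for illustration, the factoring is naive trial division (feasible only for tiny moduli), and the code assumes $n$ really is a product of two distinct primes.

```python
def factor_semiprime(n: int):
    """Return (p, q) with p the smallest prime factor of n, by trial division."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError("n has no nontrivial factor")

def break_rsa(n: int, e: int, c: int) -> int:
    """Recover the message from ciphertext c using only the public key (n, e)."""
    p, q = factor_semiprime(n)
    phi_n = (p - 1) * (q - 1)     # step 1
    d = pow(e, -1, phi_n)         # step 2: modular inverse, Python 3.8+
    return pow(c, d, n)           # step 3

# Toy public key and a message encrypted with it
n, e = 3233, 17
m = 65
c = pow(m, e, n)

print(break_rsa(n, e, c))  # prints: 65
```

Trial division needs on the order of $\sqrt{n}$ steps, which is why this works instantly for a 12-bit modulus and is out of reach for a 2048-bit one.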

#### Interactive RSA Key Generation

Use the tool below to generate RSA keys. Specify the number of bits for the primes $p$ and $q$:

In [7]:
# Prints an output div aligning with the interact controls   
def output_div(label: str, content: str):
    s = '<div class="sagecell_interactControlCell" style="width: 100%;">'
    s += f'<label class="sagecell_interactControlLabel">{label}</label>'
    s += f'<div class="sagecell_interactControl">{content}</div>'
    s += '</div>'
    pretty_print(html(s))
    
@interact
def _(bits=input_box(default="10", label="Bits", width=62)):
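    p = random_prime(2^bits-1, lbound=2^(bits-1))
    q = random_prime(2^bits-1, lbound=2^(bits-1))
    # RSA needs distinct primes, so redraw q if it equals p (assumes bits >= 3)
    while q == p:
        q = random_prime(2^bits-1, lbound=2^(bits-1))
    n = p*q
    phi_n = (p-1)*(q-1)
    # Pick d coprime to phi(n), skipping the trivial choices 0 and 1
    d = randint(2, phi_n - 1)
    while gcd(d, phi_n) != 1:
        d = randint(2, phi_n - 1)
    # e is the inverse of d modulo phi(n); the roles of e and d are symmetric
    e = inverse_mod(d, phi_n)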
    
    output_div("n", str(n))
    output_div("e", str(e))
    output_div("d", str(d))


### Conclusion

RSA encryption demonstrates the beautiful connection between pure number theory and practical applications. The security of RSA relies on:
- The difficulty of factoring large numbers
- Properties of Euler's totient function
- Modular exponentiation and multiplicative inverses

In practice, RSA is used with key sizes of 2048 bits or larger to ensure security against modern factoring algorithms and computational power. While RSA encryption of long messages can be slow, it's often used to encrypt symmetric keys, which are then used for faster symmetric encryption of the actual message data (a hybrid cryptosystem).

#### Interactive Code-Breaking Example

Try breaking RSA with small keys yourself: factor $n$, compute $\phi(n)$, find $d$, and decrypt the message. The worked example that follows goes through these steps by hand.

### Example: Breaking RSA with Small Keys

Let's see an example of how RSA can be broken if the modulus $n$ is too small. Suppose Alice uses the following public key:
- $n = 33$
- $e = 7$

Bob sends Alice an encrypted message: $C = 26$

An eavesdropper Eve can break this by:
1. Factoring $n = 33 = 3 \times 11$
2. Computing $\phi(n) = (3-1)(11-1) = 2 \times 10 = 20$
3. Finding $d$ such that $ed \equiv 1 \pmod{20}$, i.e., $7d \equiv 1 \pmod{20}$
4. Solving: $d = 3$ (since $7 \times 3 = 21 \equiv 1 \pmod{20}$)
5. Decrypting: $M = 26^3 \bmod 33 = 17576 \bmod 33 = 20$

So the original message was $M = 20$, which corresponds to the letter 'U' in our encoding (A → 0, ..., Z → 25). As a check, $20^7 \bmod 33 = 26$, which is exactly the ciphertext Bob sent.

This demonstrates why RSA requires large primes: a small modulus can be factored easily!