In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab09.ipynb")

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Lab 9: Lattice Cryptography

Welcome to Lab 9! In this lab, we will explore properties of cryptographic lattices and construct the GGH cryptosystem.

Recall that a lattice is the set of all **integer** linear combinations of basis vectors: 

$$L = \{a_1v_1 + a_2v_2 + \ldots + a_nv_n : a_1, a_2, \ldots, a_n \in \mathbb{Z}\}$$

For example, the figure below shows a two-dimensional lattice together with its fundamental region $F$:

<img src="./lattice1.png" style="size:400px">

(Image credit: Hoffstein et al. textbook.) The region $F$ is the **fundamental region** of the basis $B = \{b_1, \ldots, b_n\}$, that is, the (continuous) set

$$F(B) = \{t_1b_1 + t_2b_2 + \ldots + t_nb_n : 0 \leq t_i < 1 \}$$

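The volume of $F(B)$ equals the determinant of $L$: $\operatorname{Vol}(F(B)) = \det L$. When the basis vectors are the rows of a square matrix $B$, this is $\det L = |\det B|$.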
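As a minimal sketch, assuming (as in the code later in this lab) that the basis vectors are stored as the **rows** of a NumPy array, the helpers below test lattice membership by solving for a vector's coefficients and checking that they are integers, and compute $\det L = |\det B|$. The names `in_lattice` and `lattice_det` are hypothetical helpers written for illustration, not part of any library.

```python
import numpy as np

def in_lattice(basis, w, tol=1e-9):
    """Return True if w is an integer combination of the rows of `basis`."""
    # Rows are basis vectors, so w = x @ basis  <=>  basis.T @ x = w.
    x = np.linalg.solve(basis.T, w)
    return bool(np.all(np.abs(x - np.rint(x)) < tol))

def lattice_det(basis):
    """Volume of the fundamental region F(B): |det B| for a full-rank square basis."""
    return abs(np.linalg.det(basis))

B = np.array([[137, 312], [215, -187]])
print(in_lattice(B, 2 * B[0] - 3 * B[1]))       # True: coefficients (2, -3)
print(in_lattice(B, B[0] + np.array([1, 0])))   # False: not a lattice vector
print(lattice_det(B))                           # approximately 92699.0
```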

Notably, in dimension 2 or higher there are infinitely many different bases that describe the same lattice $L$. We can transform any basis $B$ into another basis $B'$ of the same lattice by the matrix product $B' = UB$, where $U$ must be a **unimodular** matrix: a matrix with all integer entries and determinant $\pm 1$. Integer entries guarantee that the rows of $B'$ are lattice vectors. The determinant condition is needed because every basis of $L$ has a fundamental region of the same volume $\det L$; since $|\det B'| = |\det U|\,|\det B|$, any $U$ with $|\det U| \neq 1$ would scale the volume to some other value.
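The volume argument can be checked numerically. In the sketch below, `is_unimodular` is a hypothetical helper that tests the two conditions on $U$, and the matrices are arbitrary illustrative choices: $U = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$ is unimodular, while $M = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$ is an integer matrix with determinant 2, so $MB$ has twice the fundamental-region volume and generates a proper sublattice rather than $L$ itself.

```python
import numpy as np

def is_unimodular(U, tol=1e-9):
    """Integer entries and determinant +1 or -1."""
    U = np.asarray(U)
    integer_entries = np.all(np.abs(U - np.rint(U)) < tol)
    return bool(integer_entries and abs(abs(np.linalg.det(U)) - 1) < tol)

B = np.array([[137, 312], [215, -187]])
U = np.array([[2, 1], [1, 1]])   # det = 1: unimodular
M = np.array([[2, 0], [0, 1]])   # det = 2: integer, but NOT unimodular

for name, T in [("U", U), ("M", M)]:
    new_basis = T @ B
    # Prints whether T is unimodular and the volume |det(T B)|.
    print(name, is_unimodular(T), round(abs(np.linalg.det(new_basis))))
# U True 92699
# M False 185398
```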

This leads us to the concept of a *good basis* versus a *bad basis*. A good basis consists of nearly orthogonal vectors: roughly, they are at right angles to each other. A bad basis, by contrast, has vectors that point in nearly the same direction.

<img src="goodbadbasis.png">


To quantify this, we define the $\textbf{Hadamard ratio}$ as $$H(B) = \left(\frac{\det L}{\prod_{i=1}^n \Vert v_i \Vert}\right)^{\frac{1}{n}}.$$

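Note that $0 < H(B) \leq 1$ by Hadamard's inequality, with $H(B) = 1$ exactly when the basis vectors are mutually orthogonal; a larger Hadamard ratio therefore means a more orthogonal basis. Why does this matter? It turns out that solving certain problems on lattices is far easier when working in a good basis than in a bad one.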
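A direct translation of the formula, assuming basis vectors are stored as rows (the function name `hadamard_ratio` is just an illustrative choice), reproduces the values quoted later in this lab for the two example bases:

```python
import numpy as np

def hadamard_ratio(basis):
    """H(B) = (det L / prod ||v_i||)^(1/n), with basis vectors as rows."""
    n = basis.shape[0]
    det_L = abs(np.linalg.det(basis))
    norms = np.linalg.norm(basis, axis=1)   # length of each basis vector
    return (det_L / np.prod(norms)) ** (1 / n)

good = np.array([[137, 312], [215, -187]])
bad = np.array([[1975, 438], [7548, 1627]])
print(round(hadamard_ratio(good), 3))   # 0.977
print(round(hadamard_ratio(bad), 3))    # 0.077
```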

Several computational problems on lattices are considered hard to solve in the worst case, even for quantum computers. This makes them a natural foundation for cryptographic schemes.

The $\textbf{shortest vector problem}$, or SVP, asks for a nonzero vector in the lattice with the smallest $\textbf{norm}$ (usually the $\ell_2$ norm, i.e., Euclidean length). The exact version of this problem is known to be $\textbf{NP-hard}$ (under randomized reductions). The approximation version, often denoted $\text{SVP}_{\gamma}$ or apprSVP, asks for a nonzero vector $v' \in L$ with $\Vert v' \Vert \leq \gamma\lambda(L)$, where $\lambda(L)$ is the length of a shortest nonzero vector in $L$ and $\gamma \geq 1$ is a fixed approximation factor. Interestingly, this problem is also considered NP-hard for small $\gamma$, but it is solvable in polynomial time (for example, by the LLL algorithm) for exponentially large factors such as $\gamma \geq 2^n$, where $n$ is the dimension of the lattice.
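In two dimensions we can attack SVP by brute force. The sketch below enumerates every integer coefficient vector with $|a_i| \le R$ and keeps the shortest nonzero lattice vector. The box size `R` is an arbitrary illustrative assumption: for a general basis a small box is not guaranteed to contain a shortest vector, and the search costs $(2R+1)^n$ steps, which is exactly why exhaustive search does not scale to cryptographic dimensions.

```python
import itertools
import numpy as np

def shortest_vector_bruteforce(basis, R=10):
    """Shortest nonzero vector among sum a_i * b_i with integer |a_i| <= R."""
    best, best_norm = None, np.inf
    for coeffs in itertools.product(range(-R, R + 1), repeat=basis.shape[0]):
        if not any(coeffs):
            continue  # skip the zero vector
        v = np.array(coeffs) @ basis
        norm = np.linalg.norm(v)
        if norm < best_norm:
            best, best_norm = v, norm
    return best, best_norm

B = np.array([[137, 312], [215, -187]])
v, length = shortest_vector_bruteforce(B)
print(v, length)
```

Because the basis vectors themselves lie inside the box, the result is never longer than the shortest basis vector.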

The $\textbf{closest vector problem}$, or CVP, asks for the lattice vector closest to a given target vector $v$ (which need not lie in $L$). Formally, we want $v' \in L$ such that $\Vert v' - v \Vert$ is minimized. CVP is known to be NP-hard even under deterministic reductions. The approximation version, often denoted $\text{CVP}_{\gamma}$ or apprCVP, asks for some $v' \in L$ such that $\Vert v' - v \Vert \leq \gamma\Vert a - v \Vert$, where $a$ is an optimal closest vector. $\text{CVP}_{\gamma}$ is believed to remain hard for subexponential $\gamma$.
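For a small example, a local search makes CVP concrete. The sketch below (the function name and the window size `radius` are illustrative assumptions) computes the real coefficients $x$ of the target and tries every integer coefficient vector in a small window around them. It only returns the closest vector *within that window*, so it is not a general CVP solver, but because the window contains the rounded coefficients it always does at least as well as simple rounding.

```python
import itertools
import numpy as np

def closest_vector_search(basis, target, radius=2):
    """Closest lattice vector among integer coefficients near x = t B^{-1}."""
    x = np.linalg.solve(basis.T, target)      # real coefficients of the target
    center = np.floor(x).astype(int)
    best, best_dist = None, np.inf
    # Offsets -radius..radius+1 cover both floor(x_i) and ceil(x_i).
    for offset in itertools.product(range(-radius, radius + 2), repeat=len(x)):
        coeffs = center + np.array(offset)
        v = coeffs @ basis
        dist = np.linalg.norm(target - v)
        if dist < best_dist:
            best, best_dist = v, dist
    return best, best_dist

B = np.array([[137, 312], [215, -187]])
t = np.array([53172, 81743])
v, dist = closest_vector_search(B, t)
print(v, dist)
```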

## Babai's Algorithm
$\textbf{Babai's algorithm}$ is a rather straightforward approximation algorithm for CVP. Given a basis $b_1, \ldots, b_n$ and the target vector written with real coefficients as $v = a_1b_1 + a_2b_2 + \ldots + a_nb_n$, we round each coefficient to the nearest integer and return $v' = \lfloor a_1 \rceil b_1 + \lfloor a_2 \rceil b_2 + \ldots + \lfloor a_n \rceil b_n$. The algorithm is fast, but its approximations can be very poor in high dimensions and when the basis vectors are far from orthogonal.
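A short bound makes the dependence on the basis concrete. The error of Babai's output is itself a combination of the basis vectors, with coefficients of absolute value at most $\tfrac{1}{2}$, so by the triangle inequality

$$v - v' = \sum_{i=1}^n \left(a_i - \lfloor a_i \rceil\right) b_i, \qquad \Vert v - v' \Vert \leq \sum_{i=1}^n \left|a_i - \lfloor a_i \rceil\right| \Vert b_i \Vert \leq \frac{1}{2}\sum_{i=1}^n \Vert b_i \Vert.$$

This guarantees a small error when the basis vectors are short. A bad basis of the same lattice has a small Hadamard ratio, so with $\det L$ fixed its vectors must be long, and the guarantee becomes weak.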


Let's take a look at an example for Babai's Algorithm. We are given a basis $B$:

$$v_1 = \begin{bmatrix} 137 \\ 312 \end{bmatrix}, v_2 = \begin{bmatrix} 215 \\ -187 \end{bmatrix}$$

and wish to solve the CVP for $$t = \begin{bmatrix} 53172 \\ 81743\end{bmatrix}$$

First, we need to express $t$ as a linear combination of $v_1$ and $v_2$, i.e., find real coefficients $x$ with $xB = t$. Because we store the basis vectors as the *rows* of $B$, $x$ is a row vector multiplying $B$ from the left (the reverse of the usual $Bx = t$), so we just need to invert $B$: $x = tB^{-1}$.

In [None]:
target = np.array([53172, 81743])
basis = np.array([[137, 312], [215, -187]])

linear_coeffs = np.dot(target, np.linalg.inv(basis))
linear_coeffs

This confirms that $$\begin{bmatrix} 53172 \\ 81743\end{bmatrix} \approx 296.85\begin{bmatrix} 137 \\ 312 \end{bmatrix} + 58.15\begin{bmatrix} 215 \\ -187 \end{bmatrix}$$

Babai's algorithm says we need to round those coefficients to their closest integer, so our returned lattice vector is

$$v = 297\begin{bmatrix} 137 \\ 312 \end{bmatrix} + 58\begin{bmatrix} 215 \\ -187 \end{bmatrix} = \begin{bmatrix} 53159 \\ 81818 \end{bmatrix}$$
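The arithmetic above can be double-checked with a short sketch. It uses `np.linalg.solve` rather than forming $B^{-1}$ explicitly, which is the usual NumPy recommendation for solving linear systems; since the basis vectors are the rows of $B$, the system to solve is $B^{\mathsf{T}}x = t$.

```python
import numpy as np

B = np.array([[137, 312], [215, -187]])
t = np.array([53172, 81743])

# Rows of B are the basis vectors, so t = x @ B  <=>  B.T @ x = t.
x = np.linalg.solve(B.T, t)
coeffs = np.rint(x).astype(int)   # Babai rounding step
v = coeffs @ B                    # the lattice vector Babai returns
print(x)        # approximately [296.85  58.15]
print(coeffs)   # [297  58]
print(v)        # [53159 81818]
```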

In [None]:
v = np.array([53159, 81818])
np.linalg.norm(target - v, 2)

We see that the $\ell_2$ distance $\Vert t - v \Vert$ is $\approx 76$. This is a really good solution to the approximate CVP, given that the target itself has length about $97{,}500$.

This was possible because our original basis was quite *good*: its Hadamard ratio is **0.977**!

Let's see what happens when we use a worse basis. Take our new basis $B'$:

$$v_1 = \begin{bmatrix} 1975 \\ 438 \end{bmatrix}, v_2 = \begin{bmatrix} 7548 \\ 1627 \end{bmatrix}$$


This was formed via the unimodular matrix $$U = \begin{bmatrix} 5 & 6 \\ 19 & 23 \end{bmatrix}$$

You can verify for yourself that $U$ has determinant 1 and all-integer entries, and that $B' = UB$.
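One way to run that check in NumPy: the sketch below computes $\det U$, forms $UB$, and also confirms that $U^{-1} = \begin{bmatrix} 23 & -6 \\ -19 & 5 \end{bmatrix}$ has integer entries, so $B$ can be recovered from $B'$ and both bases generate the same lattice.

```python
import numpy as np

B = np.array([[137, 312], [215, -187]])
U = np.array([[5, 6], [19, 23]])
B_prime = U @ B

print(int(round(np.linalg.det(U))))   # 1
print(B_prime)                        # rows (1975, 438) and (7548, 1627)

# U^{-1} is also an integer matrix, so B can be recovered from B'.
U_inv = np.rint(np.linalg.inv(U)).astype(int)
print(U_inv @ B_prime)                # recovers B
```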

Notably, the Hadamard ratio of this new basis is quite a lot worse at **0.077**. 

First, we need to implement Babai's algorithm.

In [None]:
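def babai(basis, target):
    # The rows of `basis` are the basis vectors, so target = x B for a real
    # row vector of coefficients x; solving gives x = target B^{-1}.
    
    x = np.dot(target, np.linalg.inv(basis))
    
    # Now, round each component of x and add it to base
    # Remember to multiply the rounded component by its corresponding basis vector!
    # base has one entry per coordinate of the lattice vectors (basis.shape[1]).
    base = np.zeros(basis.shape[1], dtype=basis.dtype)
    
    for i in range(len(x)):
        # int(...) keeps the sum in the basis dtype: a float array cannot be
        # added in place to an integer array.
        base += int(round(x[i])) * basis[i] # SOLUTION
        
    # Finally, return a recombined vector.
    return base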

In [None]:
grader.check("q1")

In [None]:
target = np.array([53172, 81743])
result = babai(np.array([[1975, 438], [7548, 1627]]), target)
print('Found closest vector:', result)
np.linalg.norm(target - result)

We see that the vector returned with the bad basis is *significantly* farther from the target than the one found with the good basis. The problem gets worse as the dimension grows: with a bad basis, the error of Babai's algorithm can grow exponentially in the number of basis vectors. Most modern lattice schemes use *hundreds* of basis vectors, so you can imagine how inaccurate the algorithm gets.
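To get a feel for higher dimensions, here is a small, hedged experiment. It builds a nearly orthogonal "good" basis in dimension `n`, scrambles it with a random unimodular matrix (a product of random elementary row operations, an illustrative choice rather than how any real scheme generates keys), and compares Babai's error in the two bases on the same random target. The parameters (`n`, the number of row operations, the seed) are arbitrary, so no particular output is claimed here.

```python
import numpy as np

def babai_round(basis, target):
    """Babai rounding with the basis vectors stored as rows."""
    x = np.linalg.solve(basis.T, target)   # real coefficients: target = x @ basis
    return np.rint(x) @ basis

def random_unimodular(n, steps, rng):
    """Product of elementary row operations row_i += k * row_j, each with det 1."""
    U = np.eye(n, dtype=np.int64)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        U[i] += rng.choice([-1, 1]) * U[j]
    return U

rng = np.random.default_rng(0)
n = 8
# "Good" basis: 100 * I plus small integer noise, so the rows are nearly orthogonal.
good = 100 * np.eye(n, dtype=np.int64) + rng.integers(-10, 11, size=(n, n))
# "Bad" basis for the same lattice: scramble the rows with a unimodular matrix.
bad = random_unimodular(n, steps=20, rng=rng) @ good
target = rng.uniform(-1e4, 1e4, size=n)

for name, B in [("good", good), ("bad", bad)]:
    error = np.linalg.norm(target - babai_round(B, target))
    print(f"{name}: Babai error = {error:.1f}")
```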

## GGH Cryptosystem

Now that we've seen how hard problems arise on lattices, let's implement a cryptosystem based on them! The **GGH public-key cryptosystem**, named after its designers Goldreich, Goldwasser, and Halevi (Shafi Goldwasser is a Berkeley professor and Turing Award winner), relies on the hardness of CVP and uses Babai's algorithm for decryption.

We generate a $\textbf{good basis}$ for a lattice $L$, denoted $B_{good}$. This will be our $\textbf{private key}$. We then transform it using a unimodular matrix $M$ (integer entries, determinant $\pm 1$) to get $B_{bad} = MB_{good}$, our $\textbf{public key}$. Both of these bases describe the same lattice.
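Here is a toy sketch of key generation under stated assumptions: the private basis is $100I$ plus small integer noise (strictly diagonally dominant, hence nonsingular, for the small `n` used here), and $M$ is built as the product of a unit lower-triangular and a unit upper-triangular integer matrix, which has determinant exactly 1. The function name and all parameters are hypothetical and chosen for illustration; they are far from secure.

```python
import numpy as np

def generate_keys(n, rng, scale=100, noise=10, spread=2):
    """Toy GGH-style key generation (illustrative parameters, not secure)."""
    # Private key: a nearly orthogonal basis (rows), scale * I plus small noise.
    private = scale * np.eye(n, dtype=np.int64) + rng.integers(-noise, noise + 1, size=(n, n))
    # M = (unit lower triangular) @ (unit upper triangular) has det exactly 1.
    lower = np.tril(rng.integers(-spread, spread + 1, size=(n, n)), k=-1) + np.eye(n, dtype=np.int64)
    upper = np.triu(rng.integers(-spread, spread + 1, size=(n, n)), k=1) + np.eye(n, dtype=np.int64)
    M = lower @ upper
    public = M @ private   # same lattice, typically a much less orthogonal basis
    return private, public, M

rng = np.random.default_rng(0)
private, public, M = generate_keys(4, rng)
print(int(round(np.linalg.det(M))))   # 1
```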

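To send a message, Bob encodes it as an integer vector $m$ and uses the public basis to compute the lattice vector $mB_{bad}$. He then generates a short random error vector $r$ (often using Gaussian noise) to hide which lattice vector he is sending, and sends the ciphertext $e = mB_{bad} + r$ to Alice.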

Remember that Bob only has access to the public (bad) basis, but Alice knows the good basis. As long as $r$ is small enough, Alice can run Babai's algorithm on $e$ in the good basis to recover the lattice vector $v = mB_{bad}$, and then recover the message coordinates via $m = vB_{bad}^{-1}$. An eavesdropper who only knows $B_{bad}$ faces the same CVP instance with a bad basis, where Babai's algorithm fails.
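The whole round trip fits in two small functions. This is a minimal sketch: `ggh_encrypt` and `ggh_decrypt` are hypothetical names, the Gaussian noise level `sigma = 1.0` is an assumption, and `np.linalg.solve` stands in for the explicit inverses used in the notebook cells.

```python
import numpy as np

def ggh_encrypt(message, public_basis, rng, sigma=1.0):
    """Ciphertext e = m B_pub + r, with Gaussian error r (sigma is an assumption)."""
    r = rng.normal(scale=sigma, size=public_basis.shape[1])
    return message @ public_basis + r

def ggh_decrypt(ciphertext, private_basis, public_basis):
    """Babai rounding in the private basis, then coordinates in the public basis."""
    x = np.linalg.solve(private_basis.T, ciphertext)   # real coefficients w.r.t. B_good
    v = np.rint(x) @ private_basis                     # closest lattice vector (Babai)
    m = np.linalg.solve(public_basis.T, v)             # v = m B_pub  =>  m = v B_pub^{-1}
    return np.rint(m).astype(int)

private_basis = np.array([[137, 312], [215, -187]])
public_basis = np.array([[1975, 438], [7548, 1627]])
rng = np.random.default_rng(0)

message = np.array([2, 4])
e = ggh_encrypt(message, public_basis, rng)
print(ggh_decrypt(e, private_basis, public_basis))   # [2 4]
```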

In [None]:
# These are the bases from earlier, but any good/bad pair for the same lattice will work
private_basis = np.array([[137, 312], [215, -187]])
public_basis = np.array([[1975, 438], [7548, 1627]])

Bob wants to send a message to Alice, and encodes it as the vector $m = \begin{bmatrix} 2 & 4 \end{bmatrix}$. To hide which lattice vector $mB_{bad}$ he is sending, he adds a small random error vector $r$.

In [None]:
message = np.array([2, 4])
print('Original message:', message)

r = np.array([np.random.normal(), np.random.normal()])

print('Random vector:', r)

enc = np.dot(message, public_basis) + r
print('Final encrypted message:', enc)

To decrypt, Alice first runs Babai's algorithm using her private (good) basis.

In [None]:
close_enc = babai(private_basis, enc)
print('Lattice vector found:', close_enc)

Finally, she needs to convert the lattice vector back into Bob's original message. Bob encoded $m$ as coordinates with respect to the public basis, so Alice computes $m = vB_{bad}^{-1}$ and rounds away any floating-point error.

In [None]:
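# Convert the lattice vector to coordinates in the public basis (m = v B_pub^{-1}),
# rounding away the small floating-point error introduced by the inverse.
dec = np.rint(np.dot(close_enc, np.linalg.inv(public_basis))).astype(int)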

print("Message found:", dec)

And Alice has recovered the message!
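Decryption works only because $r$ is small compared with the good basis: Babai's algorithm must round $mB_{bad} + r$ back to $mB_{bad}$ itself. A quick check with two fixed error vectors (arbitrary illustrative values, not drawn from any noise distribution) shows the failure mode: the small error is removed, while the large one makes Babai land on a different lattice vector, so Alice decodes the wrong message.

```python
import numpy as np

private_basis = np.array([[137, 312], [215, -187]])
public_basis = np.array([[1975, 438], [7548, 1627]])
message = np.array([2, 4])

def decrypt(ciphertext):
    # Babai: round the coefficients with respect to the good basis.
    x = np.linalg.solve(private_basis.T, ciphertext)
    v = np.rint(x) @ private_basis
    # Express the lattice vector in public-basis coordinates.
    return np.rint(np.linalg.solve(public_basis.T, v)).astype(int)

for r in [np.array([3.0, -2.0]), np.array([1000.0, 1000.0])]:
    e = message @ public_basis + r
    print(r, decrypt(e))
# r = [3, -2]       -> [2 4]    (message recovered)
# r = [1000, 1000]  -> [56 -10] (wrong message)
```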

There are many more lattice-based cryptosystems than just GGH (in fact, GGH is now considered *insecure*), but unfortunately we don't have the space to cover them. If you are interested, I highly recommend looking up schemes like NTRU. Next week, we will cover an application of the Learning With Errors problem, which is based on lattices, to homomorphic encryption!

Congrats on finishing Lab 9.

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Once you have generated the zip file, go to the Gradescope page for this assignment to submit.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)