<h1><center>SimpleFHE: Fully Homomorphic Encryption Lab</center></h1>
<h3><center>CS456 <br /><br />
    Dr. Joseph Gersch </center></h3>

----


## Lab Overview

You will use the `simpleFHE` library to experiment with fully homomorphic encryption (FHE); that is, performing additive and multiplicative operations directly on encrypted data.  `simpleFHE` is a Python wrapper around the Microsoft SEAL library.


### Table of Contents

* [Miscellaneous Notes](#notes)
* [Polynomial Example](#polynomial_example)
    * [The Problem](#problem)
    * [The Solution](#solution)
    * [A More Realistic Example](#realistic_example)
        * [Step 1: Keypair Generation](#step1)
        * [Step 2: Client-Side Encryption](#step2)
        * [Step 3: Server-Side Processing](#step3)
        * [Step 4: Client-Side Decryption](#step4)
* [Task: Prediction based on Simple Linear Regression](#frankenstein)
* [What to submit to CANVAS](#submittal)
* [Grading Rubric](#rubric)


### Sources

This lab is a light rewrite of the examples found in the [simpleFHE README document](https://libraries.io/pypi/simplefhe), formatted to run in a Jupyter Notebook.  The linear regression assignment is adapted, with an added storyline, from a Python article on [Performing Linear Regression from Scratch](https://www.edureka.co/blog/linear-regression-in-python/).

### Installation Dependencies

You can run this notebook on the CS department lab servers using SSH port forwarding for jupyter notebooks.
In order to gain access to the libraries, you will have to execute the following commands:

>export PYTHONPATH=/usr/local/microsoft-seal/lib64/python3.8/site-packages/:/usr/local/simplefhe/lib/python3.8/site-packages/

>export LD_LIBRARY_PATH=/usr/local/microsoft-seal/lib/

As a test, our system administrators were able to import these modules without any errors:

`python
Python 3.8.3 (default, May 29 2020, 00:00:00) 
[GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
import seal
from simplefhe import initialize`
 
----


You can also run this notebook on your own laptop if you follow the instructions on the `SEAL-Python` web page.  They should work for Linux and Windows.  I was unable to get the dependencies to work on my MacBook Pro, although they did work in an Ubuntu VM on the same machine.


`simplefhe` depends on [SEAL-Python](https://github.com/Huelse/SEAL-Python) and all its prerequisites. After installing SEAL-Python, the simplefhe library is just a pip install away: 

>`pip3 install simplefhe`

Finally, you might need to install matplotlib and numpy if you don't have them already.

>`pip3 install numpy
pip3 install matplotlib`


----
<a id='notes'></a>
## Miscellaneous Notes

* To enable floating point computations (results will be approximate):

>`from simplefhe import initialize
initialize('float')`

This must be done before any other simplefhe code (keygen, encryption/decryption, etc.) is executed. A full example is shown later.

* To increase the maximum range of allowable integers, execute the cell below.  Integers in the range [-MAX_INT + 1, MAX_INT] inclusive are representable.

In [None]:
from simplefhe import initialize, display_config
MAX_INT = pow(2, 25)
initialize('int', max_int=MAX_INT)
display_config()

* Comparison operations (<, =, >) are not supported on encrypted data. If they were, it would be pretty easy to figure out what the plaintext is! As a side effect, it's not really possible to branch based on encrypted data.
* There is some randomness in the encryption process: the same value, encrypted with the same key, will yield different ciphertexts. This prevents a simple plaintext enumeration attack.
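The integer range in the notes above can be checked in plain Python before any data is encrypted. The sketch below is only an illustration: `fits_in_range` and `polynomial_terms` are hypothetical helpers, not part of `simplefhe`, and the check is deliberately conservative in that it also tests the intermediate terms of the cubic polynomial used later in this notebook.

```python
# Plain-Python range check; nothing here is encrypted.
MAX_INT = pow(2, 25)


def fits_in_range(value, max_int=MAX_INT):
    """Return True if value lies in [-max_int + 1, max_int] inclusive."""
    return -max_int + 1 <= value <= max_int


def polynomial_terms(x):
    """Intermediate and final values of x**3 - 3*x + 1."""
    cube = x**3
    linear = 3 * x
    return [cube, linear, cube - linear + 1]


sensitive_data = [-30, -5, 17, 28]
all_fit = all(
    fits_in_range(value)
    for x in sensitive_data
    for value in polynomial_terms(x)
)
print(all_fit)
```

If the check fails for your own data, a larger `max_int` passed to `initialize` is the first thing to try.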

----
<a id='polynomial_example'></a>
# Polynomial Example

`simpleFHE`, using the Microsoft SEAL library, can perform addition, subtraction and multiplication on encrypted data.  It cannot divide by encrypted data, although it does allow division by plaintext data.  There are some other limitations as well, listed in the Miscellaneous Notes section above.

This example uses a simple cubic polynomial to illustrate how simpleFHE works.
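The division restriction mentioned above shapes how computations are split between server and client. The sketch below uses a small mock class, `PlainDivisionOnly`, which is a hypothetical stand-in written for this illustration and not the `simplefhe` encrypted type. It allows division by a plain number, rejects division by another wrapped value, and shows the usual workaround: return the numerator and the denominator separately and divide after decryption.

```python
class PlainDivisionOnly:
    """Hypothetical stand-in for an encrypted number (NOT simplefhe)."""

    def __init__(self, value):
        self._value = value

    @staticmethod
    def _unwrap(other):
        return other._value if isinstance(other, PlainDivisionOnly) else other

    def __add__(self, other):
        return PlainDivisionOnly(self._value + self._unwrap(other))

    def __sub__(self, other):
        return PlainDivisionOnly(self._value - self._unwrap(other))

    def __mul__(self, other):
        return PlainDivisionOnly(self._value * self._unwrap(other))

    def __truediv__(self, other):
        # Only plaintext divisors are allowed, mirroring the restriction above.
        if isinstance(other, PlainDivisionOnly):
            raise TypeError("division by an encrypted value is not supported")
        return PlainDivisionOnly(self._value / other)

    def reveal(self):
        """Stand-in for client-side decryption."""
        return self._value


numerator = PlainDivisionOnly(12)
denominator = PlainDivisionOnly(4)

# Division by a plaintext constant works.
mean = (numerator + denominator) / 2

# Division by another wrapped value is rejected.
try:
    numerator / denominator
    division_rejected = False
except TypeError:
    division_rejected = True

# Workaround: "decrypt" both parts first, then divide in the clear.
ratio = numerator.reveal() / denominator.reveal()
print(mean.reveal(), division_rejected, ratio)
```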


<a id='problem'></a>
## The Problem

Suppose we have some sensitive data we wish to process on a remote server. The usual approach is to send the data over a secure connection to be processed server-side.

The communication link may be secure, but the data on the cloud server is still open to attack.

In [None]:
#  Example of an INSECURE solution.  Pretend this is running on a remote server.
#  Data is sent from the client to the server, processed at the server, then sent 
#  back to the client.

# The server
def process(x):
    return x**3 - 3*x + 1


# The client
sensitive_data = [-30, -5, 17, 28]
for entry in sensitive_data:
    print(entry, process(entry)) # Bad! We are leaking sensitive information.

However, this solution requires trusting the server to keep your data confidential. One rogue admin or database hack is all it takes to expose your sensitive data to the public.

<a id='solution'></a>
## The Solution

A few lines of extra code are all it takes to implement Fully Homomorphic Encryption (FHE).  We require FHE because the polynomial we are evaluating uses both addition and multiplication.

In this example, we encrypt the data on the client, send only the encrypted data to the server, process the encrypted data server-side, and return the encrypted result to be client-side decrypted. This requires zero trust of the remote server.

Execute the cell below and compare the results to the previous insecure example.  There may be slight differences in floating point, but the answers are basically the same even though the operations were performed on encrypted data.


In [None]:
from simplefhe import (
    encrypt, decrypt,
    generate_keypair,
    set_public_key, set_private_key, set_relin_keys,
    display_config
)

# In a real application, the keypair would be generated once,
# and only the public key would be provided to the server.
# The relin keys are relinearization keys, which SEAL uses to reduce the
# size of a ciphertext after a multiplication.
# A more realistic example is given later.
public_key, private_key, relin_keys = generate_keypair()
set_private_key(private_key)
set_public_key(public_key)
set_relin_keys(relin_keys)

display_config()


# The server
def process(x):
    return x**3 - 3*x + 1


# The client
sensitive_data = [-30, -5, 17, 28]
for entry in sensitive_data:
    encrypted = encrypt(entry) # Encrypt the data...
    result = process(encrypted) # Process the encrypted data on the server...
    print(entry, decrypt(result)) # Decrypt the result on the client.
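
When checking decrypted results against the plaintext results, a tolerance-based comparison is safer than `==`, since results may differ slightly in floating point. The sketch below is plain Python and does not call `simplefhe`; the `decrypted` list holds hypothetical stand-in values (the exact result plus a small artificial error), not real output from the library, and the tolerances are arbitrary choices for illustration.

```python
import math


def process(x):
    return x**3 - 3*x + 1


sensitive_data = [-30, -5, 17, 28]
expected = [process(x) for x in sensitive_data]

# Hypothetical stand-ins for decrypted values: exact result plus a small error.
decrypted = [value + 1e-6 for value in expected]

# Compare with a tolerance instead of testing for exact equality.
matches = [
    math.isclose(d, e, rel_tol=1e-6, abs_tol=1e-3)
    for d, e in zip(decrypted, expected)
]
print(expected)
print(matches)
```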

----
<a id='realistic_example'></a>
## A More Realistic Example

Of course, the client and server will generally be separate applications. Here we demonstrate a more realistic pipeline.

<a id='step1'></a>
### Step 1: Keypair Generation

Create the directories needed by this notebook.

Generate and store a set of keys to be used throughout the process.  As usual, a public and private keypair will be generated.  A separate set of relinearization keys, `relin`, is also created; SEAL uses them to reduce the size of a ciphertext after a multiplication.

In [None]:
! mkdir -p keys
! mkdir -p inputs
! mkdir -p outputs

In [None]:
from simplefhe import generate_keypair

public_key, private_key, relin_keys = generate_keypair()
public_key.save('keys/public.key')
private_key.save('keys/private.key')
relin_keys.save('keys/relin.key')
print('Keypair saved to keys/ directory')

In [None]:
! ls keys

<a id='step2'></a>
### Step 2: Client-Side Data Encryption

We encrypt the sample data points and save the encrypted data to disk. In the real world, the encrypted data would be sent to the server over a (possibly insecure) network.


In [None]:
from simplefhe import encrypt, load_public_key, load_relin_keys, display_config

load_public_key('keys/public.key')
load_relin_keys('keys/relin.key')
display_config()


# Encrypt our data (client-side)
sensitive_data = [-30, -5, 17, 28]

for i, entry in enumerate(sensitive_data):
    encrypted = encrypt(entry)
    encrypted.save(f'inputs/{i}.dat')
    print(f'[CLIENT] Input {entry} encrypted to inputs/{i}.dat')


# We may now safely send these files to the server
# over a (possibly insecure) network connection

<a id='step3'></a>
### Step 3: Server-Side Processing

We process the encrypted data from the client. The server never has access to the private key, and can never decrypt the client's sensitive data.

In [None]:
from simplefhe import load_public_key, load_relin_keys, display_config, load_encrypted_value


# The private key never leaves the client.
load_public_key('keys/public.key')
load_relin_keys('keys/relin.key')
display_config()

# Process values on server.
def f(x): return x**3 - 3*x + 1

for i in range(4):
    # Load encrypted value sent from client
    value = load_encrypted_value(f'inputs/{i}.dat')

    # simplefhe seamlessly translates all arithmetic to
    # FHE encrypted operations.
    # We never gain access to the unencrypted information.
    result = f(value) 

    # Send encrypted result back to client
    result.save(f'outputs/{i}.dat')
    print(f'[SERVER] Processed entry {i}: inputs/{i}.dat -> outputs/{i}.dat')


<a id='step4'></a>
### Step 4: Client-Side Decryption

Finally, the encrypted results are sent back to the client, where they are decrypted. The private key never needs to leave the client.

In [None]:
from simplefhe import (
    load_private_key, load_relin_keys,
    display_config,
    decrypt, load_encrypted_value
)

# Note: this is the only step at which the private key is used!
load_private_key('keys/private.key')
load_relin_keys('keys/relin.key')
display_config()


# Decrypt results from the server (client-side)
sensitive_data = [-30, -5, 17, 28]

for i, entry in enumerate(sensitive_data):
    encrypted = load_encrypted_value(f'outputs/{i}.dat')
    result = decrypt(encrypted)
    print(f'[CLIENT] Result for {entry}: {result}')

----
<a id='frankenstein'></a>
## Task: Prediction based on Simple Linear Regression

_Warning:  the following story contains scenes of graphic violence and may not be suitable for the faint-of-heart.  This does not give you an excuse for skipping the assignment._ 

You are the great-great-great-grandchild of the famed scientist and medical doctor, Viktor Frankenstein.  You have discovered his notebook containing formulae and instructions for re-animating dead corpses.  

(If this story sounds familiar, you may have seen the movie *Young Frankenstein* by comedic genius Mel Brooks.)

You hire an assistant, appropriately named Igor, in your newly renovated castle in Transylvania.  Igor has been busy robbing graves from local cemeteries and you have been making scientific measurements of their head size and brain weights.  You have dug up 30 samples (sorry for the bad pun...) and completed your measurements.  The data seems random... but maybe there is a linear relationship pattern linking head size to brain weight.

You are now almost ready to repeat your grandfather's famous experiment!!!  You intend to send Igor to steal the brain of famed historian, scientist, scholar and saint,  Hans Delbrück, which is appropriately stored in the CSU biochemistry lab.  You have read the data published on this brain.  It weighs 1820 grams.  You don't know how big it is, however.  You need to predict the head size needed and acquire an appropriate-sized body.  

Your data should be able to help.  But you don't know how to create a linear regression formula to predict the head size needed.  You call the CSU statistics department for help but you certainly cannot tell them what the data actually is.  You need to save yourself from embarrassment and possible arrest.   You tell the statistics department that you have a highly confidential set of data to analyze in order to create a simple linear formula.  

You are in luck!!!  The statistics department has a server that will perform the linear regression without needing to actually see the data.  The server uses fully homomorphic encryption.  All you have to do is send encrypted data to them and they will send you the coefficients of the linear formula `y = mx + c`.  If you know the values of `m` and `c` you can predict the size of skull you need.

Now, switching roles back to you as a student, your assignment is to write the CSU server code to return the coefficients `m` and `c` based on linear regression.  Make the appropriate changes to the code in the cells for steps 3 and 4 and run all cells. 
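As a reference point, the regression arithmetic can be tried on plaintext first. The sketch below is plain Python on a made-up four-point dataset (not the lab data) and does not use `simplefhe`; the function names `regression_parts` and `finish` are hypothetical. It is split the same way the lab is: one function produces the numerator, denominator and means (the part a server could do with additions, multiplications and division by the plaintext count), and a second function performs the final division, which has to happen after decryption.

```python
def regression_parts(xs, ys):
    """Server-style part: sums, products and division by the plain count n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator, denominator, mean_x, mean_y


def finish(numerator, denominator, mean_x, mean_y):
    """Client-style part: the division by a computed value happens here."""
    m = numerator / denominator
    c = mean_y - m * mean_x
    return m, c


# Toy data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

m, c = finish(*regression_parts(xs, ys))
print(m, c)  # 2.0 1.0

# Inverting y = m*x + c predicts x from a known y.
predicted_x = (11 - c) / m
print(predicted_x)  # 5.0
```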

In [None]:
# Step 1.  Key Generation.  You don't have to make any changes to this cell.

from simplefhe import initialize, generate_keypair

# All subsequent processing must be done with the same initialization
initialize('float')

# Generate keypair
public_key, private_key, relin_keys = generate_keypair()

# Save keys
public_key.save('keys/public.key')
private_key.save('keys/private.key')
relin_keys.save('keys/relin.key')

print('Keys saved to keys/ directory')

In [None]:
# Step 2.  Encrypt your extremely sensitive data. 

# You don't need to change anything in this cell.


# Initialization and keys

from simplefhe import initialize, encrypt, load_public_key, load_relin_keys, display_config
initialize('float')
load_public_key('keys/public.key')
load_relin_keys('keys/relin.key')

display_config()

# Here is your highly sensitive data. 
# The first number is head size in cubic centimeters.  The second number is brain weight in grams.

data = [[4512,1530],
        [3738,1297],
        [4261,1335],
        [3777,1282],
        [4177,1590],
        [3585,1300],
        [3785,1400],
        [3559,1255],
        [3613,1355],
        [3982,1375],
        [3993,1380],
        [3640,1355],
        [4208,1522],
        [3832,1208],
        [3876,1405],
        [3497,1358],
        [3466,1292],
        [3095,1340],
        [4424,1400],
        [3878,1357],
        [4046,1287],
        [3804,1275],
        [3710,1270],
        [4747,1635],
        [4423,1505],
        [4036,1490],
        [4022,1485],
        [3454,1310],
        [4175,1420],
        [3787,1318]
]

# split into separate arrays to be used for generating a graph later
head_size, brain_weight = zip(*data)


# encrypt the data
for i, d in enumerate(data):
    encrypt(d[0]).save(f'inputs/x-{i}.dat')
    encrypt(d[1]).save(f'inputs/y-{i}.dat')
    
    
! ls inputs

In [None]:
# Step 3:  THE CSU FHE LINEAR REGRESSION SERVER.  

# You must add your code to make this cell work.
# We will use raw python and avoid numpy in this assignment.
# The client has sent you data files and the public key and relinearization key files

# Initialization and keys:  Read the keyfiles sent to you by the client.
from simplefhe import initialize, encrypt, load_public_key, load_relin_keys, load_encrypted_value
initialize('float')
load_public_key('keys/public.key')
load_relin_keys('keys/relin.key')

# Read the data files. 
 
N = 30
X = []
Y = []
# **** add your code here.  You should read the 30 X data and 30 Y data files 
# and put them into the two arrays named X and Y.


# Linear regression formula: m = ∑(x-mean_x)(y-mean_y) 
#                                ----------------------
#                                    ∑(x-mean_x)^2


# In order to find the value of m and c, you first need to calculate the mean of X and Y
# do these calculations in a loop to perform FHE addition and the scalar division.

mean_x = encrypt(0) # starting value
mean_y = encrypt(0) # starting value
#** add a loop here to create a total sum of encrypted x and a total sum of encrypted y.
#** calculate the mean_x and mean_y by dividing each by the unencrypted value of N

# Use the linear formula to calculate the numerator and the denominator
numerator = 0
denominator = 0
for i in range(N):
    numerator += (X[i] - mean_x) * (Y[i] - mean_y)
    denominator += (X[i] - mean_x) ** 2
    
# We cannot finish the calculation of m because encrypted division is not supported:
#     m = numerator / denominator
 
# Instead, send intermediate data back to the client for post-processing

numerator.save('outputs/numerator')
denominator.save('outputs/denominator')
mean_x.save('outputs/mean_x')
mean_y.save('outputs/mean_y')
 
! ls outputs

In [None]:
# step 4:  client decrypts data, does the post-processing, plots the data, 
# then calculates the head size needed that would be big enough for this particular brain!!!
# 
# Note: You still have the unencrypted data values in the arrays head_size and brain_weight (from step 2) for plotting the data

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

from simplefhe import (
    initialize,
    decrypt, load_encrypted_value,
    load_private_key, load_relin_keys
)

# Initialization and keys
initialize('float')
load_private_key('keys/private.key')
load_relin_keys('keys/relin.key')

# **** decrypt the four files; add your code here
numerator =  # add your code
denominator = # add your code
mean_x = # add your code
mean_y = # add your code

print (numerator, denominator)
print (mean_x, mean_y)

# Postprocessing: get the values for the linear equation y = mx + c

m = numerator / denominator
c = mean_y - (m * mean_x)
print (m,c)

# Plot the data and the linear regression line

plt.rcParams['figure.figsize'] = (20.0, 10.0)

# Plotting Values and Regression Line
max_x = np.max(head_size) + 100
min_x = np.min(head_size) - 100
# Calculating line values x and y
x = np.linspace(min_x, max_x, 1000)
y = c + m * x 
 
# Plotting Line
plt.plot(x, y, color='#52b920', label='Regression Line')
# Plotting Scatter Points
plt.scatter(head_size, brain_weight, c='#ef4423', label='Scatter Plot')
 
plt.xlabel('Head Size in cm3')
plt.ylabel('Brain Weight in grams')
plt.legend()
plt.show()

# Now calculate just how big a head is needed to fit this very nice brain!

print ("the linear formula is: weight = ",m," * head_size + ", c)
print ("inverting this gives: head_size = (weight - ", c, ") / ", m)
print ("The brain weighs 1820 grams")
print ("You need a head size of", (1820-c)/m, "cubic centimeters")

### Conclusion to our exciting parody

You go into a panic!  How will I ever find a head that big???!??!?!?!?!

Igor says, "no problem, we can use a politician".... and immediately books a flight to Washington DC.

The end.

----
<a id='submittal'></a>
## What to submit to CANVAS

You must submit a copy of the cells for steps 3 and 4 showing your additions to the code, as well as the value of the final calculated head size.

<a id='rubric'></a>
## Grading Rubric

Total points: 50

* Correct code: 30 points 
* Correct head size value: 20 points