# Homomorphic Encrypted LeNet-1
This notebook shows a practical example of running the well-known LeNet-1 deep learning (DL) model directly on encrypted data.

![scheme](HE_processing.png)

## Homomorphic encryption operations
First of all, we will look at Pyfhel, a Python library that wraps SEAL, one of the most widely used frameworks for homomorphic encryption (HE).
Pyfhel supports the BFV scheme, so that is the scheme we will use.

In [1]:
from Pyfhel import Pyfhel, PyPtxt, PyCtxt

HE = Pyfhel()
HE.contextGen(p=65537, m=4096)
HE.keyGen()

print(HE)

a = 127.15717263
b = -2.128965182
ctxt1 = HE.encryptFrac(a)
ctxt2 = HE.encryptFrac(b)

ctxtSum = ctxt1 + ctxt2
ctxtSub = ctxt1 - ctxt2
ctxtMul = ctxt1 * ctxt2

resSum = HE.decryptFrac(ctxtSum)
resSub = HE.decryptFrac(ctxtSub) 
resMul = HE.decryptFrac(ctxtMul)

print(f"Expected sum: {a+b}, decrypted sum: {resSum}")
print(f"Expected sub: {a-b}, decrypted sub: {resSub}")
print(f"Expected mul: {a*b}, decrypted mul: {resMul}")

<Pyfhel obj at 0x7f0de4174b50, [pk:Y, sk:Y, rtk:-, rlk:-, contx(p=65537, m=4096, base=2, sec=128, dig=64i.32f, batch=False)]>
Expected sum: 125.028207448, decrypted sum: 125.02820744784549
Expected sub: 129.286137812, decrypted sub: 129.28613781183958
Expected mul: -270.7131931708334, decrypted mul: -270.7131931686308


In [2]:
m1 = HE.encryptFrac(a)
print(HE.noiseLevel(m1))

m2 = HE.encodeFrac(2)

82


In [3]:
print(HE.noiseLevel(m1+m1))
print(HE.noiseLevel(m1*m1))
print(HE.noiseLevel(m1+m2))
print(HE.noiseLevel(m1*m2))

81
54
82
82
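
We assume here that the values printed by `noiseLevel` are the remaining noise budget of each ciphertext, in bits. The small sketch below only reuses the numbers printed in this run (they may differ with other parameters or Pyfhel versions) to make the cost of each operation explicit:

```python
# Remaining noise budget (bits), copied from the outputs printed above.
fresh = 82
after = {
    "ctxt + ctxt": 81,
    "ctxt * ctxt": 54,
    "ctxt + ptxt": 82,
    "ctxt * ptxt": 82,
}

# Budget consumed by each operation, relative to the fresh ciphertext.
consumed = {op: fresh - level for op, level in after.items()}

for op, bits in consumed.items():
    print(f"{op}: {bits} bits consumed")
```

In this run the ciphertext-ciphertext multiplication is by far the most expensive operation, while the operations with the encoded value `2` leave the reported budget unchanged.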


Before starting, let's note that:
  1. We will use the fractional encoder to encode (and encrypt) the values in our examples. BFV was designed for integer arithmetic, so CKKS would be the natural choice when the use case involves fractional values. However, CKKS is a more complex scheme, and for this example BFV is sufficient.
  2. We will not use batching (also called *packing*). While batching can greatly speed up the computations, it introduces limitations that make encrypted ML much more complex. For this example, we will encrypt/encode each number in its own polynomial.

## LeNet-1
LeNet-1 is a small CNN developed by LeCun et al. It is composed of 5 layers: a convolutional layer with 4 kernels of size 5x5 and tanh activation, an average pooling layer with kernel of size 2, another convolutional layer with 12 kernels of size 5x5 and tanh activation, another average pooling layer with kernel of size 2, and a fully connected layer of size 192x10.
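
The input size of the fully connected layer follows from the layer sizes. As a quick check, here is a minimal sketch of the shape arithmetic, assuming 28x28 MNIST images, no padding, and the 12 output channels used for the second convolution in the code of this notebook (`conv_out` is a helper introduced only for this illustration):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                                   # MNIST images are 28x28
side = conv_out(side, kernel=5)             # first convolution
side = conv_out(side, kernel=2, stride=2)   # first average pooling
side = conv_out(side, kernel=5)             # second convolution
side = conv_out(side, kernel=2, stride=2)   # second average pooling

channels = 12  # output channels of the second convolution in this notebook
flattened = channels * side * side
print(side, flattened)
```

The final feature maps are 4x4, so flattening 12 of them gives the 192 inputs of the last layer.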

The index of the highest value in the output tensor is the label that LeNet-1 assigns to the input image.

For this tutorial we will use the MNIST dataset.

In [4]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import numpy as np

In [5]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [6]:
transform = transforms.ToTensor()

train_set = torchvision.datasets.MNIST(
    root = './data',
    train=True,
    download=True,
    transform=transform
)

test_set = torchvision.datasets.MNIST(
    root = './data',
    train=False,
    download=True,
    transform=transform
)

train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=50,
    shuffle=True
)

test_loader = torch.utils.data.DataLoader(
    test_set,
    batch_size=50,
    shuffle=True
)

In [7]:
def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

def train_net(network, epochs, device):
    optimizer = optim.Adam(network.parameters(), lr=0.001)
    for epoch in range(epochs):

        total_loss = 0
        total_correct = 0

        for batch in train_loader: # Get Batch
            images, labels = batch 
            images, labels = images.to(device), labels.to(device)

            preds = network(images) # Pass Batch
            loss = F.cross_entropy(preds, labels) # Calculate Loss

            optimizer.zero_grad()
            loss.backward() # Calculate Gradients
            optimizer.step() # Update Weights

            total_loss += loss.item()
            total_correct += get_num_correct(preds, labels)

        
def test_net(network, device):
    network.eval()
    total_loss = 0
    total_correct = 0
    
    with torch.no_grad():
        for batch in test_loader: # Get Batch
            images, labels = batch 
            images, labels = images.to(device), labels.to(device)

            preds = network(images) # Pass Batch
            loss = F.cross_entropy(preds, labels) # Calculate Loss

            total_loss += loss.item()
            total_correct += get_num_correct(preds, labels)

    # Fraction of correctly classified test images, in [0, 1]
    return total_correct / len(test_loader.dataset)

In [8]:
train = True # If set to false, it will load models previously trained and saved.

In [9]:
experiments = 1

In [10]:
if train:
    accuracies = []
    for i in range(0, experiments):
        LeNet1 = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),

            nn.Conv2d(4, 12, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),

            nn.Flatten(),

            nn.Linear(192, 10),
        )
        
        LeNet1.to(device)
        train_net(LeNet1, 15, device)
        acc = test_net(LeNet1, device)
        accuracies.append(acc)
        
    torch.save(LeNet1, "LeNet1.pt")
else:
    LeNet1 = torch.load("LeNet1.pt")
    LeNet1.eval()
    LeNet1.to(device)

In [11]:
m = np.array(accuracies)
print(f"Mean accuracy on test set: {np.mean(m)}")
print(f"Var: {np.var(m)}")

Mean accuracy on test set: 0.9882
Var: 0.0


## Approximating
As we know, some operations cannot be performed homomorphically on encrypted values, most notably division and comparison. Only additions and multiplications are available, so only polynomial functions can be evaluated.

Consequently, in the LeNet-1 model we used, we cannot use `tanh()`, because it is not a polynomial function.


One of the most common approaches is to replace it with a simple polynomial function, for example a square layer (which simply performs $x \rightarrow x^2$).

We call **approximated** the model in which every HE-incompatible activation has been replaced in this way. This model can be re-trained, and it will then be ready to be used on encrypted values.
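
One might wonder why `tanh()` is not simply replaced by a polynomial that imitates it. As a hedged illustration (this comparison is not part of the original experiments), the sketch below evaluates the degree-3 Taylor polynomial of `tanh` around 0, which uses only additions and multiplications, and shows how quickly it drifts away from `tanh` outside a small interval:

```python
import math

def tanh_taylor3(x):
    """Degree-3 Taylor polynomial of tanh around 0: x - x^3 / 3."""
    return x - x**3 / 3

for x in (0.1, 0.5, 1.0, 2.0):
    exact = math.tanh(x)
    approx = tanh_taylor3(x)
    print(f"x={x}: tanh={exact:.4f}, taylor3={approx:.4f}, error={abs(exact - approx):.4f}")
```

Note also that computing $x^3$ requires two multiplications in sequence, while the square layer needs only one. Retraining the network with the square activation avoids both problems.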

In [52]:
class Square(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, t):
        return torch.pow(t, 2)

LeNet1_Approx = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=5),
    Square(),
    nn.AvgPool2d(kernel_size=2),
            
    nn.Conv2d(4, 12, kernel_size=5),
    Square(),
    nn.AvgPool2d(kernel_size=2),
    
    nn.Flatten(),
    
    nn.Linear(192, 10),
)

In [72]:
if train:
    approx_accuracies = []
    for i in range(0, experiments):
        LeNet1_Approx = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5),
            Square(),
            nn.AvgPool2d(kernel_size=2),

            nn.Conv2d(4, 12, kernel_size=5),
            Square(),
            nn.AvgPool2d(kernel_size=2),

            nn.Flatten(),

            nn.Linear(192, 10),
        )
        
        LeNet1_Approx.to(device)
        train_net(LeNet1_Approx, 15, device)
        acc = test_net(LeNet1_Approx, device)
        approx_accuracies.append(acc)
        
    torch.save(LeNet1_Approx, "LeNet1_Approx.pt")

else:
    LeNet1_Approx = torch.load("LeNet1_Approx.pt")
    LeNet1_Approx.eval()
    LeNet1_Approx.to(device)

In [73]:
m = np.array(approx_accuracies)
print(f"Mean: {np.mean(m)}")
print(f"Var: {np.var(m)}")

Mean: 0.9889
Var: 0.0


We can see that replacing `tanh()` with `square()` did not impact the accuracy of the model dramatically. Usually this is not the case, and approximating DL models can significantly degrade their accuracy. This is one of the challenges that HE-ML will have to address: designing DL models with the HE constraints in mind from the beginning.

In any case, now the network is HE-compatible.

## Encoding
From an implementation point of view, we have two options for running our Torch model on encrypted values:
  1. Modify the code of the Torch layers so that it is fully compatible with arrays of Pyfhel ciphertexts/encoded values;
  2. Write our own code for the building blocks of LeNet-1 (convolutional layer, linear layer, square layer, flatten...).
  
We opt for the second path, having already done this in our previous work: https://github.com/AlexMV12/PyCrCNN

Remember that, in order to be used with encrypted values, the weights of the model also have to be **encoded**. This means that each value in the weights of each layer will be encoded as a polynomial.

First, we define some useful functions to encrypt/encode matrices:

In [74]:
def encode_matrix(HE, matrix):
    try:
        return np.array(list(map(HE.encodeFrac, matrix)))
    except TypeError:
        return np.array([encode_matrix(HE, m) for m in matrix])


def decode_matrix(HE, matrix):
    try:
        return np.array(list(map(HE.decodeFrac, matrix)))
    except TypeError:
        return np.array([decode_matrix(HE, m) for m in matrix])


def encrypt_matrix(HE, matrix):
    try:
        return np.array(list(map(HE.encryptFrac, matrix)))
    except TypeError:
        return np.array([encrypt_matrix(HE, m) for m in matrix])


def decrypt_matrix(HE, matrix):
    try:
        return np.array(list(map(HE.decryptFrac, matrix)))
    except TypeError:
        return np.array([decrypt_matrix(HE, m) for m in matrix])
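
The four helpers above share the same recursion: they first try to apply the Pyfhel function to every element and, when that fails with a `TypeError` because the elements are not scalars, they descend one dimension. A minimal sketch of this pattern without Pyfhel, where `fake_encode` is a hypothetical stand-in for `HE.encodeFrac` and nested lists replace NumPy arrays:

```python
def fake_encode(value):
    """Stand-in for HE.encodeFrac: accepts a single number only."""
    if not isinstance(value, (int, float)):
        raise TypeError("expected a scalar")
    return f"ptxt({value})"

def map_nested(fn, data):
    """Apply fn to every scalar of an arbitrarily nested list."""
    try:
        # Succeeds only when every element of `data` is a scalar.
        return [fn(x) for x in data]
    except TypeError:
        # Otherwise descend one nesting level and try again.
        return [map_nested(fn, sub) for sub in data]

weights = [[0.5, -1.0], [2.0, 0.25]]
print(map_nested(fake_encode, weights))
```

The result keeps the shape of the input, which is what lets the layers below index encoded weights exactly like the original ones.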

Then, we need the actual code for the convolutional, linear, square, flatten and average pooling layers:

In [75]:
class ConvolutionalLayer:
    def __init__(self, HE, weights, stride=(1, 1), padding=(0, 0), bias=None):
        self.HE = HE
        self.weights = encode_matrix(HE, weights)
        self.stride = stride
        self.padding = padding
        self.bias = bias
        if bias is not None:
            self.bias = encode_matrix(HE, bias)

    def __call__(self, t):
        t = apply_padding(t, self.padding)
        result = np.array([[np.sum([convolute2d(image_layer, filter_layer, self.stride)
                                    for image_layer, filter_layer in zip(image, _filter)], axis=0)
                            for _filter in self.weights]
                           for image in t])

        if self.bias is not None:
            return np.array([[layer + bias for layer, bias in zip(image, self.bias)] for image in result])
        else:
            return result


def convolute2d(image, filter_matrix, stride):
    x_d = len(image[0])
    y_d = len(image)
    x_f = len(filter_matrix[0])
    y_f = len(filter_matrix)

    y_stride = stride[0]
    x_stride = stride[1]

    x_o = ((x_d - x_f) // x_stride) + 1
    y_o = ((y_d - y_f) // y_stride) + 1

    def get_submatrix(matrix, x, y):
        index_row = y * y_stride
        index_column = x * x_stride
        return matrix[index_row: index_row + y_f, index_column: index_column + x_f]

    return np.array(
        [[np.sum(get_submatrix(image, x, y) * filter_matrix) for x in range(0, x_o)] for y in range(0, y_o)])

def apply_padding(t, padding):
    y_p = padding[0]
    x_p = padding[1]
    zero = t[0][0][y_p+1][x_p+1] - t[0][0][y_p+1][x_p+1]
    return [[np.pad(mat, ((y_p, y_p), (x_p, x_p)), 'constant', constant_values=zero) for mat in layer] for layer in t]
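
To see what `convolute2d` computes, it can help to run the same sliding-window logic on plain integers. The sketch below is a simplified plaintext analogue written for this illustration (`convolve2d_plain` is not part of the notebook code); it uses only multiplications and additions, which is why the same computation also works on ciphertexts and encoded weights:

```python
def convolve2d_plain(image, kernel, stride=(1, 1)):
    """2D sliding-window product-sum on nested lists, without padding."""
    y_stride, x_stride = stride
    y_f, x_f = len(kernel), len(kernel[0])
    y_o = (len(image) - y_f) // y_stride + 1
    x_o = (len(image[0]) - x_f) // x_stride + 1

    def window_sum(y, x):
        # Element-wise product of the window and the kernel, then a sum.
        return sum(
            image[y * y_stride + i][x * x_stride + j] * kernel[i][j]
            for i in range(y_f)
            for j in range(x_f)
        )

    return [[window_sum(y, x) for x in range(x_o)] for y in range(y_o)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(convolve2d_plain(image, kernel))  # [[-4, -4], [-4, -4]]
```

As in PyTorch's `Conv2d`, the kernel is not flipped, so strictly speaking this is a cross-correlation.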

In [76]:
class LinearLayer:
    def __init__(self, HE, weights, bias=None):
        self.HE = HE
        self.weights = encode_matrix(HE, weights)
        self.bias = bias
        if bias is not None:
            self.bias = encode_matrix(HE, bias)

    def __call__(self, t):
        result = np.array([[np.sum(image * row) for row in self.weights] for image in t])
        if self.bias is not None:
            result = np.array([row + self.bias for row in result])
        return result

In [77]:
class SquareLayer:
    def __init__(self, HE):
        self.HE = HE

    def __call__(self, image):
        return square(self.HE, image)


def square(HE, image):
    try:
        return np.array(list(map(lambda x: HE.power(x, 2), image)))
    except TypeError:
        return np.array([square(HE, m) for m in image])

In [78]:
class FlattenLayer:
    def __call__(self, image):
        dimension = image.shape
        return image.reshape(dimension[0], dimension[1]*dimension[2]*dimension[3])

In [79]:
class AveragePoolLayer:
    def __init__(self, HE, kernel_size, stride=(1, 1), padding=(0, 0)):
        self.HE = HE
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding

    def __call__(self, t):
        t = apply_padding(t, self.padding)
        return np.array([[_avg(self.HE, layer, self.kernel_size, self.stride) for layer in image] for image in t])


def _avg(HE, image, kernel_size, stride):
    x_s = stride[1]
    y_s = stride[0]

    x_k = kernel_size[1]
    y_k = kernel_size[0]

    x_d = len(image[0])
    y_d = len(image)

    x_o = ((x_d - x_k) // x_s) + 1
    y_o = ((y_d - y_k) // y_s) + 1

    denominator = HE.encodeFrac(1 / (x_k * y_k))

    def get_submatrix(matrix, x, y):
        index_row = y * y_s
        index_column = x * x_s
        return matrix[index_row: index_row + y_k, index_column: index_column + x_k]

    return [[np.sum(get_submatrix(image, x, y)) * denominator for x in range(0, x_o)] for y in range(0, y_o)]
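
Since division is not available homomorphically, `_avg` multiplies the sum of each window by the encoded constant `1 / (x_k * y_k)`, which is known in advance. A plaintext sketch of the same idea, written only for illustration (`avg_pool_plain` is not part of the notebook code), with a 2x2 kernel and a stride of 2:

```python
def avg_pool_plain(image, kernel_size=(2, 2), stride=(2, 2)):
    """Average pooling that multiplies by 1/(k*k) instead of dividing."""
    y_k, x_k = kernel_size
    y_s, x_s = stride
    y_o = (len(image) - y_k) // y_s + 1
    x_o = (len(image[0]) - x_k) // x_s + 1
    reciprocal = 1 / (x_k * y_k)  # a public constant, so it can be encoded

    def window_sum(y, x):
        return sum(
            image[y * y_s + i][x * x_s + j]
            for i in range(y_k)
            for j in range(x_k)
        )

    return [[window_sum(y, x) * reciprocal for x in range(x_o)] for y in range(y_o)]

image = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 2, 8, 8],
         [2, 2, 8, 8]]
print(avg_pool_plain(image))  # [[2.0, 6.0], [2.0, 8.0]]
```

Each output value costs one ciphertext-plaintext multiplication, independently of the kernel size.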

We can now define a function to "convert" a PyTorch model to a list of sequential HE-ready-to-be-used layers:

In [80]:
def build_from_pytorch(HE, net):
    # Define builders for every possible layer

    def conv_layer(layer):
        if layer.bias is None:
            bias = None
        else:
            bias = layer.bias.detach().numpy()

        return ConvolutionalLayer(HE, weights=layer.weight.detach().numpy(),
                                  stride=layer.stride,
                                  padding=layer.padding,
                                  bias=bias)

    def lin_layer(layer):
        if layer.bias is None:
            bias = None
        else:
            bias = layer.bias.detach().numpy()
        return LinearLayer(HE, layer.weight.detach().numpy(),
                           bias)

    def avg_pool_layer(layer):
        # This proxy is required because in PyTorch an AvgPool2d can have kernel_size, stride and padding either of
        # type (int, int) or int, unlike in Conv2d
        kernel_size = (layer.kernel_size, layer.kernel_size) if isinstance(layer.kernel_size, int) else layer.kernel_size
        stride = (layer.stride, layer.stride) if isinstance(layer.stride, int) else layer.stride
        padding = (layer.padding, layer.padding) if isinstance(layer.padding, int) else layer.padding

        return AveragePoolLayer(HE, kernel_size, stride, padding)

    def flatten_layer(layer):
        return FlattenLayer()

    def square_layer(layer):
        return SquareLayer(HE)

    # Maps every PyTorch layer type to the correct builder
    options = {"Conv": conv_layer,
               "Line": lin_layer,
               "Flat": flatten_layer,
               "AvgP": avg_pool_layer,
               "Squa": square_layer
               }

    encoded_layers = [options[str(layer)[0:4]](layer) for layer in net]
    return encoded_layers
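
The lookup in `build_from_pytorch` relies on the first four characters of each layer's string representation. The sketch below reproduces that mechanism without PyTorch, using hypothetical stand-in classes whose representations imitate those of the real layers, and shows what happens when a layer without a builder (such as `Tanh`) is passed:

```python
class Conv2d:
    def __repr__(self):
        return "Conv2d(1, 4, kernel_size=(5, 5), stride=(1, 1))"

class Square:
    def __repr__(self):
        return "Square()"

class Tanh:
    def __repr__(self):
        return "Tanh()"

# Builders keyed by the first four characters of the layer representation.
builders = {
    "Conv": lambda layer: "encoded convolution",
    "Squa": lambda layer: "encoded square",
}

def build(layers):
    return [builders[str(layer)[0:4]](layer) for layer in layers]

print(build([Conv2d(), Square()]))

try:
    build([Conv2d(), Tanh()])
except KeyError as missing:
    print(f"No builder for layer prefix {missing}")
```

In other words, only models made exclusively of supported layers can be converted; the original LeNet-1 with `tanh()` would be rejected with a `KeyError`.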

## Encrypted processing

Let's list the activities that we will now do:
  1. Create an HE context, specifying the encryption parameters `m` (polynomial modulus degree) and `p` (plaintext modulus). Remember that `q` will be chosen automatically in order to guarantee a 128-bit security level;
  2. Convert our Torch approximated model to a list of layers able to work on matrices of encrypted values. The weights will be encoded;
  3. Encrypt an image from our testing set;
  4. Verify that the final classification result is correct.

If we look at our model, we can see that we have two **square layers**: these are the layers with the greatest impact on our noise!
Two square layers correspond to two ciphertext-ciphertext multiplications. Let's see if $m=4096$ gives us enough room to perform 2 encrypted multiplications.
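
Before running the experiment, we can make a rough estimate. The sketch below counts the square layers and subtracts, for each of them, the 28 bits that a ciphertext-ciphertext multiplication consumed in the earlier noise test (82 - 54). This is an optimistic upper bound and an assumption on our side: it ignores the budget consumed by the convolutional, pooling and linear layers, and a multiplication on an already noisy ciphertext may cost a different amount.

```python
FRESH_BUDGET_BITS = 82   # printed earlier for a fresh ciphertext
SQUARE_COST_BITS = 28    # 82 - 54, measured earlier for ctxt * ctxt

def ctxt_mult_depth(layer_names):
    """Count the sequential ciphertext-ciphertext multiplications.

    Only "Square" multiplies a ciphertext by a ciphertext here; the other
    layers combine ciphertexts with encoded (plaintext) weights.
    """
    return sum(1 for name in layer_names if name == "Square")

approx = ["Conv2d", "Square", "AvgPool2d",
          "Conv2d", "Square", "AvgPool2d",
          "Flatten", "Linear"]

depth = ctxt_mult_depth(approx)
optimistic_left = FRESH_BUDGET_BITS - depth * SQUARE_COST_BITS
print(depth, optimistic_left)  # 2 26
```

The remaining margin is small even under these optimistic assumptions, so the only reliable way to know whether the parameters are sufficient is to test them.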

We can define a function which, given $n$ and $p$, encrypts an image and forwards it through our approximated model (suitably encoded). This will let us see homomorphic encryption in action.

In [96]:
import time

def test_parameters(n, p, model):
    HE = Pyfhel()
    HE.contextGen(p=p, m=n) # what Pyfhel calls m, we call n.
    HE.keyGen()
    relinKeySize=3
    HE.relinKeyGen(bitCount=2, size=relinKeySize)
    
    images, labels = next(iter(test_loader))

    sample_image = images[0]
    sample_label = labels[0]
    
    model.to("cpu")
    model_encoded = build_from_pytorch(HE, model)
    
    with torch.no_grad():
        expected_output = model(sample_image.unsqueeze(0))
    
    encrypted_image = encrypt_matrix(HE, sample_image.unsqueeze(0).numpy())
    
    start_time = time.time()
    for layer in model_encoded:
        encrypted_image = layer(encrypted_image)
        print(f"Passed layer {layer}...")
    
    requested_time = round(time.time() - start_time, 2)
    
    result = decrypt_matrix(HE, encrypted_image)
    difference = expected_output.numpy() - result
    
    print(f"\nThe encrypted processing of one image requested {requested_time} seconds.")
    print(f"\nThe expected result was:")
    print(expected_output)
    
    print(f"\nThe actual result is: ")
    print(result)
    
    print(f"\nThe error is:")
    print(difference)    

Let's try with $n=4096$ and $p=65536$ on our approximated model.

In [97]:
test_parameters(4096, 65536, LeNet1_Approx)

Passed layer <__main__.ConvolutionalLayer object at 0x7f0d3d651820>...
Passed layer <__main__.SquareLayer object at 0x7f0d3f110850>...
Passed layer <__main__.AveragePoolLayer object at 0x7f0d3f110ee0>...
Passed layer <__main__.ConvolutionalLayer object at 0x7f0d3f110ca0>...
Passed layer <__main__.SquareLayer object at 0x7f0d3f110eb0>...
Passed layer <__main__.AveragePoolLayer object at 0x7f0d3f1101c0>...
Passed layer <__main__.FlattenLayer object at 0x7f0d3f110880>...
Passed layer <__main__.LinearLayer object at 0x7f0d3f110310>...

The encrypted processing of one image requested 148.9 seconds.

The expected result was:
tensor([[ -8.9517,   2.1390,  -5.7292, -15.0571,  -6.9279, -10.5527,  -8.9003,
          -3.8584,  -4.2362, -10.4316]])

The actual result is: 
[[-0.46503287  0.21019887  0.0091486  -1.42267955 -0.37975147 -0.45589089
  -0.00624331 -0.05148689 -0.63904862 -1.1372307 ]]

The error is:
[[ -8.48666639   1.92882038  -5.73832035 -13.63442551  -6.54817841
  -10.09681296  -8.89

Unfortunately, the noise budget (NB) is not sufficient and the decryption fails.
We could try decreasing $p$ in order to reduce the NB consumption. However, its current value is already quite low, so this is unlikely to make a difference.
We could try increasing $n$ to 8192, but this would result in a huge overhead in terms of memory and time.
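
To get a feeling for the memory overhead of a larger $n$, we can estimate the size of a fresh ciphertext, which consists of 2 polynomials with $n$ coefficients modulo $q$. The sketch below assumes a total size of $q$ of 109 bits for $n=4096$ and 218 bits for $n=8192$, the maximum values commonly tabulated for 128-bit security; whether Pyfhel selects exactly these values is an assumption, and the real in-memory size is larger because of how coefficients are stored.

```python
# Assumed sizes of q (in bits) for a 128-bit security level.
LOG_Q_BITS = {4096: 109, 8192: 218}

def fresh_ciphertext_bytes(n):
    """Lower bound on the size of a fresh ciphertext: 2 polynomials of n coefficients."""
    return 2 * n * LOG_Q_BITS[n] // 8

small = fresh_ciphertext_bytes(4096)
large = fresh_ciphertext_bytes(8192)
print(small, large, large / small)  # 111616 446464 4.0
```

Under these assumptions, doubling $n$ makes every ciphertext about four times larger, and every operation on it correspondingly slower.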

For now, it is simpler to remove a square layer from the DL model (usually removing the last one is better). This will result in a less NB demanding processing, allowing us to use $n=4096$.


**Note that this does not happen on every image, but if you test a few you will see that it happens often, making the computation unreliable. It means that we are very close to the limit of the NB. Moreover, even when decryption does not fail, the results are quite inaccurate.**

In [98]:
LeNet1_Approx_singlesquare = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=5),
    Square(),
    nn.AvgPool2d(kernel_size=2),

    nn.Conv2d(4, 12, kernel_size=5),
#     Square(),
    nn.AvgPool2d(kernel_size=2),

    nn.Flatten(),

    nn.Linear(192, 10),
)

LeNet1_Approx_singlesquare.to(device)
train_net(LeNet1_Approx_singlesquare, 15, device)
acc = test_net(LeNet1_Approx_singlesquare, device)
print(f"Accuracy on test set (single square layer): {acc}")

Accuracy on test set (single square layer): 0.9827


Let's try again.

In [99]:
test_parameters(4096, 65536, LeNet1_Approx_singlesquare)

Passed layer <__main__.ConvolutionalLayer object at 0x7f0d3d6513d0>...
Passed layer <__main__.SquareLayer object at 0x7f0d3d651760>...
Passed layer <__main__.AveragePoolLayer object at 0x7f0d3d651ca0>...
Passed layer <__main__.ConvolutionalLayer object at 0x7f0d58654ca0>...
Passed layer <__main__.AveragePoolLayer object at 0x7f0d586545b0>...
Passed layer <__main__.FlattenLayer object at 0x7f0d58654cd0>...
Passed layer <__main__.LinearLayer object at 0x7f0d58654610>...

The encrypted processing of one image requested 135.65 seconds.

The expected result was:
tensor([[ -2.3643, -10.4025,  -9.4854,   8.1968, -13.4863,  23.9925,  -2.2109,
          -9.5026,   3.1787,   6.5724]])

The actual result is: 
[[-0.12954384  0.54565309  0.54853017 -0.28500836 -1.45542385  0.78199623
  -0.38797344  1.00632607 -0.23414565 -0.98088525]]

The error is:
[[ -2.23472518 -10.94810842 -10.03392465   8.48180477 -12.03084037
   23.21052506  -1.82297359 -10.50893819   3.41285397   7.55328582]]


Now the decryption works! However, the decrypted result still differs noticeably from the expected one.
A suitable value for $p$ can be found by trial and error. For our purposes, it has been set to 953983721.

In [100]:
test_parameters(4096, 953983721, LeNet1_Approx_singlesquare)

Passed layer <__main__.ConvolutionalLayer object at 0x7f0d58648fa0>...
Passed layer <__main__.SquareLayer object at 0x7f0d58654610>...
Passed layer <__main__.AveragePoolLayer object at 0x7f0d586544f0>...
Passed layer <__main__.ConvolutionalLayer object at 0x7f0d3d6518b0>...
Passed layer <__main__.AveragePoolLayer object at 0x7f0d3d651730>...
Passed layer <__main__.FlattenLayer object at 0x7f0d3d651c70>...
Passed layer <__main__.LinearLayer object at 0x7f0d3d651760>...

The encrypted processing of one image requested 136.9 seconds.

The expected result was:
tensor([[ 14.9175,  -9.0088,  -4.3928,   3.4706,  -8.4480,   4.0986,   7.3694,
         -12.8007,   1.6725,   0.8914]])

The actual result is: 
[[ 14.88360887  -9.00425966  -4.39821204   3.46483131  -8.40817041
    4.07068493   7.36104876 -12.79374951   1.67553707   0.89423412]]

The error is:
[[ 0.03386778 -0.00449126  0.00543603  0.00572877 -0.03981619  0.02789209
   0.00833744 -0.00692779 -0.00305124 -0.00278729]]


We are now happy with the precision of the result, which should ensure that the final accuracy of the encrypted processing on the whole test set is equal (or, at least, very similar) to the one obtained on unencrypted data.

### Computational load
Obviously, we cannot ignore the huge computational overhead generated by the encrypted processing.

In fact, the processing of one image took about 2 minutes on a common desktop machine.
The computation has not been parallelized, so it used only one thread.

While parallelization can speed up the computation, memory usage is also a concern: the processing of this image occupied ~700MB of RAM.
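
To see where this load comes from, we can count how many ciphertexts and multiplications the single-square model handles for one image. The sketch below derives these counts from the layer sizes only (28x28 input, no padding, one ciphertext per value, as in this notebook); it does not measure time or memory, and `conv_out` is a helper introduced for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a square convolution or pooling window."""
    return (size - kernel) // stride + 1

side0 = 28
side1 = conv_out(side0, 5)      # after the first convolution
side2 = conv_out(side1, 2, 2)   # after the first average pooling
side3 = conv_out(side2, 5)      # after the second convolution
side4 = conv_out(side3, 2, 2)   # after the second average pooling

# Number of ciphertexts produced by each layer (one per value).
ciphertexts = {
    "input": 1 * side0 * side0,
    "conv1": 4 * side1 * side1,
    "pool1": 4 * side2 * side2,
    "conv2": 12 * side3 * side3,
    "pool2": 12 * side4 * side4,
    "linear": 10,
}

# Ciphertext-plaintext multiplications (encoded weights or pooling constant).
plain_mults = {
    "conv1": ciphertexts["conv1"] * 1 * 5 * 5,
    "pool1": ciphertexts["pool1"],
    "conv2": ciphertexts["conv2"] * 4 * 5 * 5,
    "pool2": ciphertexts["pool2"],
    "linear": 10 * 192,
}

# Ciphertext-ciphertext multiplications: one per output of the first convolution.
square_mults = ciphertexts["conv1"]

print(ciphertexts)
print(sum(plain_mults.values()), square_mults)
```

A single image therefore involves more than a hundred thousand polynomial multiplications, each on its own ciphertext. These operations are largely independent of each other, which is why the computation lends itself to parallelization.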