# IBM HElayers


![title](img/fhe.jpg)

<br>


## Why FHE?

Business data stored across cloud environments is exposed to security risks and vulnerabilities every day. The IBM X-Force Threat Intelligence Index reported 8.5 billion records breached in 2019, giving attackers access to more stolen credentials. Securing credentials and access controls is more important than ever.

While encryption allows data to be protected both during transit and storage, the data typically must be decrypted while it is being accessed for computing and business-critical operations – creating the opportunity for potential compromise of privacy and confidentiality controls.

FHE is a more advanced form of encryption, which is designed to close this gap -- by allowing data to remain encrypted even during computation. The mathematics behind FHE are designed so that computations can be performed on encrypted data (ciphertext), without the service behind it needing to "see" that data in order to provide accurate results.
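
To make the idea of computing on ciphertexts concrete, here is a minimal sketch using a toy version of the Paillier cryptosystem. This is an illustration only: Paillier supports just addition on encrypted values (it is not fully homomorphic), the primes are far too small to be secure, and none of these function names come from HElayers.

```python
import math
import random

# Toy parameters: two small primes, for illustration only (not secure).
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1  # a standard simple choice for the Paillier generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse of LAM mod N (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key (N, G)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key (LAM, MU)."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiply two ciphertexts; the hidden plaintexts are added modulo N."""
    return (c1 * c2) % N_SQ


# The "service" only ever sees ciphertexts, yet the decrypted result is the sum.
c_sum = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(c_sum))  # 42
```

The party calling `add_encrypted` never needs the private key. FHE schemes extend this property so that both additions and multiplications can be applied to encrypted data.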

<br>

## Introduction

These tutorials are a set of Jupyter notebooks showcasing different aspects of Fully Homomorphic
Encryption (FHE). All of these notebooks present a guided interactive experience, and you
are encouraged to run them and try out the different use cases. The goal of these tutorials is to
familiarize the user with FHE and to showcase how the HElayers library can be used to implement FHE in Python.

The FHE demos use three backends: SEAL, HElib, and HEaaN. For most demos it is easy to switch between backends and explore which works best in each case.
<br>
<br>
**Switch to the Table of Contents view to get the best viewing experience for the notebook.**

<br>


## How to Use

If you are unfamiliar with Jupyter notebooks, they are self-contained Python applications that create a web-based development environment in the browser.  They integrate code, text, formatting, and output into a single document that can be run in individual steps or all together in sequence.  

Each file that ends in `.ipynb` is one notebook.  From the menu on the left, choose a notebook to open by double-clicking it.  Each notebook consists of groups of code organized into cells.  To run a cell, highlight it and click the `Run` button, which is marked by the Play icon.  To run the whole notebook, choose `Run All Cells` from the `Run` menu at the top of the notebook. 

Some of the notebooks use large data sets, so they require more memory than might be set by default.  Please allocate at least 8 GB of memory in the Resources tab, which can be found by navigating to `Docker -> Preferences -> Resources`.  A few demos require more memory, as noted at the beginning of their notebook (in particular, the `05_Deep_neural_networks.ipynb` demo may require up to 150 GB of memory).

More information about how to set resources can be found on the [Docker website](https://docs.docker.com/config/containers/resource_constraints/).  Additionally, you can check out the user manual for Windows [https://docs.docker.com/desktop/windows/](https://docs.docker.com/desktop/windows/), or Mac [https://docs.docker.com/desktop/mac/](https://docs.docker.com/desktop/mac/) to learn how to allocate resources on the platform you are using.


## Table of Contents


#### 1. Basics of FHE [`01_FHE_basics.ipynb`]

This covers the basics of **FHE** and showcases how **FHE** can enable us to do tensor mathematics while keeping the computations fully encrypted. It walks through the HElayers library, showcasing how to set it up correctly, and how to use it to carry out basic operations such as addition, multiplication, and rotation all while keeping the data encrypted. 
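
As a rough mental model for these operations, the sketch below works on plain Python lists that stand in for the slots of a ciphertext. Nothing here is encrypted and none of the function names are HElayers API; they are hypothetical helpers that only show what element-wise addition, element-wise multiplication, and rotation do, and how rotations can be combined to sum all slots.

```python
def add(a, b):
    """Element-wise addition of two slot vectors."""
    return [x + y for x, y in zip(a, b)]


def multiply(a, b):
    """Element-wise multiplication of two slot vectors."""
    return [x * y for x, y in zip(a, b)]


def rotate(a, k):
    """Cyclically rotate the slots left by k positions."""
    k %= len(a)
    return a[k:] + a[:k]


def sum_all_slots(a):
    """Rotate-and-add: afterwards every slot holds the sum of all slots.

    Assumes the number of slots is a power of two.
    """
    result = list(a)
    shift = 1
    while shift < len(a):
        result = add(result, rotate(result, shift))
        shift *= 2
    return result


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
print(rotate(a, 1))                      # [2, 3, 4, 1]
print(sum_all_slots(multiply(a, b))[0])  # 70, the inner product of a and b
```

Because individual slots of a ciphertext cannot be read or indexed directly, operations such as an inner product are typically built from these three primitives.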


#### 2. Neural Network Inference on a Credit Card Fraud Detection Dataset [`02_Neural_network_fraud_detection.ipynb`]

This demonstrates a use case that combines FHE with Neural Networks (NN). The notebook uses a pre-trained model that is assumed to have been trained locally in the user's infrastructure. This notebook walks through the steps of

- Encrypting the pre-trained NN model such that the weights of the NN are now encrypted 
- Encrypting the data to be used for inference, such that the test data is also encrypted
- Carrying out inference using encrypted data and encrypted NN and getting an encrypted result back
- Decrypting the encrypted result back to get the actual result

Its purpose is to show that one can encrypt the NN and the data in a local trusted environment and then use the cloud to carry out inference on the encrypted data with the encrypted NN. The data and the weights of the NN model remain encrypted throughout the computation, and the results are encrypted as well. One can then decrypt these results in the local trusted environment.

To run this notebook, you first need to run the notebook that generates a trained model on the plaintext data. That notebook can be found under `data_gen/fraud_detection_demo.ipynb`.

#### 3. Logistic Regression Inference on a Credit Card Fraud Detection Dataset [`03_Logistic_regression_fraud_detection.ipynb`]

This notebook showcases how you can use FHE with logistic regression. It is done by encrypting a logistic regression based model that was trained in a trusted environment, along with encrypting the data that will be used to carry out the inference. Predictions can then be carried out in a public cloud environment, while data and computations remain encrypted along with the results. The results can then be decrypted in a trusted environment. 

To run this notebook, you first need to run the notebook that generates a trained model on the plaintext data. That notebook can be found under `data_gen/fraud_detection_lr_demo.ipynb`.

#### 4. Text Classification [`04_Text_classification.ipynb`]

This tutorial displays text classification under FHE using a neural network.

#### 5. Deep Neural Networks [`05_Deep_neural_networks.ipynb`]

This classifies 224 x 224 pixel RGB images using deep neural networks: AlexNet, SqueezeNet or ResNet-18.
<br>
<br>
**NOTE:** This tutorial requires up to 150 GB memory and may not run on all machines.

#### 6. Country / Capital Database Search [`06_Database_search.ipynb`]

This tutorial demonstrates an encrypted query over an encrypted database.  It uses the BGV scheme and Fermat's little theorem to compute equality over the modular arithmetic supplied by the scheme.
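
The equality trick can be sketched in plain Python, without any encryption. For a prime modulus `p`, Fermat's little theorem says that `d**(p-1) % p` is 1 for every nonzero `d` and 0 for `d == 0`, so `1 - (x - y)**(p-1)` is an equality indicator built only from subtraction and multiplication. The modulus and the integer codes below are hypothetical values chosen for illustration, and the way the notebook encodes countries and capitals may differ.

```python
P = 257  # hypothetical prime plaintext modulus, for illustration only


def is_equal(x: int, y: int, p: int = P) -> int:
    """Return 1 if x == y (mod p), else 0, using only modular arithmetic."""
    diff = (x - y) % p
    # Fermat's little theorem: diff**(p-1) % p is 1 if diff != 0, and 0 if diff == 0.
    return (1 - pow(diff, p - 1, p)) % p


# Hypothetical integer codes standing in for countries and their capitals.
keys = [3, 8, 21]
values = [40, 55, 90]

query = 8
# Each value is multiplied by its equality indicator; only the match survives.
result = sum(is_equal(query, k) * v for k, v in zip(keys, values)) % P
print(result)  # 55
```

The lookup contains no branching on the query, which is why the same arithmetic can in principle be evaluated when the query and the database are both encrypted.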

#### 7. K-means Nearest Neighbor [`07_Kmeans.ipynb`]

This illustrates how to encrypt a set of centroids and find the nearest neighbor under HE.  Given a set of encrypted samples, we compute the distance between each sample and each centroid under encryption. On the client side, the results are decrypted and automatically post-processed to obtain the nearest neighbor.
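
The split between the two sides can be sketched on plaintext values. The sketch assumes squared Euclidean distance, which needs only subtraction, multiplication, and addition; the distance metric used by the notebook, and all function names and data below, are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance, built from subtraction, multiplication, addition."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_table(samples, centroids):
    """The part that would run under encryption: all sample-to-centroid distances."""
    return [[squared_distance(s, c) for c in centroids] for s in samples]


def nearest_centroids(table):
    """Client-side post-processing after decryption: index of the closest centroid."""
    return [min(range(len(row)), key=row.__getitem__) for row in table]


# Hypothetical data for illustration.
centroids = [[0.0, 0.0], [5.0, 5.0]]
samples = [[1.0, 0.5], [4.0, 6.0], [2.0, 2.0]]

table = distance_table(samples, centroids)
print(nearest_centroids(table))  # [0, 1, 0]
```

Taking the minimum is a comparison, which is costly to evaluate on encrypted data, so deferring it to the client after decryption keeps the encrypted part of the computation simple.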

#### 8. Linear Regression [`08_Linear_regression.ipynb`]

This computes linear regression using an encrypted model and data.

#### 9. Neural Network Inference on the MNIST Dataset [`09_Neural_network_MNIST.ipynb`]

This notebook demonstrates how to build a NN encrypted under FHE, and run inference on encrypted samples from the MNIST dataset.

#### 10. Neural Network Inference on Heart Disease UCI medical Dataset [`10_Neural_network_heart_disease.ipynb`]

This demonstrates how to build a NN encrypted under FHE, and run inference on encrypted samples from a dataset of patients and their medical information.  The purpose is to detect a potential heart attack. 

To run this notebook, you first need to run the notebook that generates a trained model on the plaintext data. That notebook can be found under  *TODO: add path*

The dataset can be downloaded from: https://www.kaggle.com/ronitf/heart-disease-uci

#### 11. Tile Tensor [`11_Tile_tensor.ipynb`]

Tile tensors are useful tools for handling tensors in a packing-oblivious way.

This tutorial has code examples that help show how to encrypt, decrypt and perform basic operations using tile tensors.

For a more extensive tutorial that explains the tile tensor concept in detail, see the 3 additional notebooks in the `tile_tensors` subfolder.

#### 12. Logistic Regression Inference on a Credit Card Fraud Detection Dataset [`12_Pyhelayers_ext_Logistic_regression_fraud_detection.ipynb`] 
#### 13. Neural Network Inference on a Credit Card Fraud Detection Dataset [`13_Pyhelayers_ext_Neural_network_fraud_detection.ipynb`]

These notebooks demonstrate the `pyhelayers.ext` API, which offers easy integration with the scikit-learn/keras libraries. It replaces the scikit-learn/keras predictions with the FHE implementation. The FHE configuration details are taken from the `fhe.json` configuration file.
This config file contains FHE parameters that the user can tune (e.g. batch size, security level, etc.).

#### 14. NN inference for detecting Covid19 from CT image [`14_COVID_inference.ipynb`]
This tutorial demonstrates a NN with 3 convolutional and 2 fully-connected layers that securely detects Covid19 from CT images of non-trivial size (224x224x3).

#### 15. Logistic regression training over encrypted data [`15_Logistic_regression_fraud_detection_training.ipynb`]
This tutorial demonstrates the training of logistic regression with encrypted samples from Kaggle's creditcardfraud dataset.

#### 16. Completely Random Forest training over encrypted data [`16_Complete_random_forest_training.ipynb`]
This tutorial demonstrates the training process of a Completely Random Forest model with encrypted samples from UCI's Adult dataset.

#### 17. Entity Resolution [`17_Entity_resolution.ipynb`]
This tutorial demonstrates the Entity Resolution process using a Privacy Preserving Record Linkage (PPRL) Protocol between two parties.

#### 18. ARIMA training and prediction on encrypted data [`18_ARIMA.ipynb`]
This tutorial demonstrates an ARIMA model training and prediction on encrypted data.

#### 19. MLToolbox demonstration [`19_MLToolbox.ipynb`]
This tutorial demonstrates how MLToolbox can be used to convert a NN into an FHE-Friendly model with nearly the same performance, that can later be encrypted and used for prediction on encrypted data.

#### 20. XGBoost prediction on encrypted data [`20_XGBoost_prediction.ipynb`]
This tutorial demonstrates how to perform prediction over encrypted data with an XGBoost model.

#### 21. One-Hot demonstration [`21_One_hot_encoding.ipynb`]
This tutorial demonstrates calculation of one-hot encoding under homomorphic encryption.

#### 22. Private Set Intersection for Vertical Federated Learning demo [`22_PSI_federated_learning.ipynb`]
This tutorial demonstrates a Private Set Intersection process between three parties to be used for Vertical Federated Learning.

#### 23. Multi-Party FHE demo [`23_Multi_Party_FHE.ipynb`]
This tutorial demonstrates the use of a multi-party FHE setting.