# Logistic Regression Inference for Fraud Detection Using FHE
Expected memory usage: 900 MB.  
Expected runtime: 10 seconds.

## Introduction

This example is very similar to the previous neural network example in Notebook 02. It demonstrates how a logistic regression (LR) model trained in a trusted environment can be encrypted and then used, together with encrypted data, to carry out predictions in a less trusted public environment (the cloud), producing encrypted results. These encrypted results can then be sent back to the data owner and decrypted in a trusted environment. Fully outsourcing computation to a cloud provider while keeping everything encrypted is a major motivating factor in the field of FHE, and this use case shows how the SDK can be used to build such applications.

**NOTE: while the client and server are not literally separated here (and no true remote cloud computation takes place), the concepts generalize. One can imagine running the trusted code in a local environment and the prediction code in a less trusted environment such as the cloud. Additionally, we are working on FHE cloud, which simplifies much of this.**

#### This demo uses the Credit Card Fraud Detection dataset, originally taken from: https://www.kaggle.com/mlg-ulb/creditcardfraud
This dataset contains real, anonymized transactions made by credit card holders in September 2013, each labeled as fraudulent or genuine. See the references at the bottom of the page.

## Use case

Global credit card fraud is expected to reach $35B by 2025 (Nilson Report, 2020), and since the beginning of the COVID-19 pandemic, 40% of financial services firms have seen an increase in fraudulent activity (LIMRA, 2020). Beyond higher fraud volumes, COVID-19 has worsened the false positive problem for over two-thirds of institutions (69%). A key challenge for many institutions is that significant changes in consumer behavior have often caused existing fraud detection systems to wrongly flag legitimate behavior as suspected fraud (Omdia, 2021).

With FHE, you can unlock the value of regulated and sensitive PII in a less trusted cloud environment by performing AI, machine learning, and data analytics computations without ever decrypting the data. Training your AI models on additional sensitive data can help improve fraud detection accuracy and reduce the false positive rate, while still taking advantage of the many benefits of cloud computing.

FHE can also help support a zero trust strategy and strong access control by keeping the data, the models that process it, and the generated results encrypted and confidential; only the data owner holds the private key and can decrypt the results.

Since release 1.5.5, this demo uses the SEAL backend.

<br>

## Step 1. Load the existing model and dataset in the trusted environment and encrypt them
#### 1.1. Preliminary setup

In this step, operating in a trusted client environment, we load an LR model and a credit card fraud dataset.


For convenience, the model has been pre-trained and is available in the `../data/lr_fraud` folder. You can also experiment with a model you generate yourself: first run the notebook at `data_gen/fraud_detection_lr_demo.ipynb`, then set the `load_from_pre_prepared` flag below to `False`, and continue with this notebook.

In [None]:
import pyhelayers
import utils
from pathlib import Path

utils.verify_memory()

# Set to False to use a model generated by data_gen/fraud_detection_lr_demo.ipynb
load_from_pre_prepared = True

if load_from_pre_prepared:
    INPUT_DIR = Path(utils.get_data_sets_dir()) / 'lr_fraud'
else:
    INPUT_DIR = Path('data/lr_fraud/')

X_H5 = INPUT_DIR / 'x_test.h5'
Y_H5 = INPUT_DIR / 'y_test.h5'
MODEL_JSON = str(INPUT_DIR / 'model.json')

batch_size = 8192
plain_samples, labels = utils.extract_batch_from_files(X_H5, Y_H5, batch_size, 0)

print(plain_samples.shape)
print(plain_samples[0])

#### 1.2. Encrypt the model in a trusted environment

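We encrypt the model in a similar manner to the previous example. First we specify the HE run requirements: a SEAL CKKS context, optimized for our batch size. Then we encode and encrypt the model.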
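The next cell calls `he_run_req.optimize_for_batch_size(batch_size)`. Here is some intuition for why batch size matters. CKKS encrypts a whole vector of numbers into the slots of a single ciphertext: a ring of degree N gives N/2 slots. One homomorphic operation therefore acts on many samples at once. The sketch below is a deliberately simplified model. It assumes each feature column is packed across samples into its own ciphertext(s), and it does not reflect helayers' actual tile-tensor layout. The ring degrees and the feature count of 30 are hypothetical illustration values, not values read from this notebook's context.

```python
import math


def ckks_slots(poly_degree: int) -> int:
    """Number of real-valued slots in one CKKS ciphertext (N/2 for ring degree N)."""
    if poly_degree <= 0 or poly_degree & (poly_degree - 1):
        raise ValueError("poly_degree must be a positive power of two")
    return poly_degree // 2


def ciphertexts_for_batch(batch_size: int, num_features: int, poly_degree: int) -> int:
    """Ciphertexts needed if each feature column (one value per sample) is packed
    into its own ciphertext(s). Simplified model, NOT helayers' tile-tensor layout."""
    per_column = math.ceil(batch_size / ckks_slots(poly_degree))
    return per_column * num_features


# num_features=30 is a hypothetical value used only for illustration.
for n in (8192, 16384, 32768):
    print(n, ckks_slots(n), ciphertexts_for_batch(8192, 30, n))
```

Under this simplified model, a batch of 8192 samples exactly fills the 8192 slots of an N = 16384 ciphertext. A smaller ring needs twice as many ciphertexts per column, and a larger ring leaves half of its slots unused. Trading off ciphertext count, slot utilization, and security parameters is the kind of choice the helayers optimizer makes for you.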

In [None]:
he_run_req = pyhelayers.HeRunRequirements()
# Request a SEAL context
he_run_req.set_he_context_options([pyhelayers.HeContext.create(["SEAL_CKKS"])])
he_run_req.optimize_for_batch_size(batch_size)

client_lr = pyhelayers.LogisticRegression()
client_lr.encode_encrypt([MODEL_JSON], he_run_req)
client_context = client_lr.get_created_he_context()

#### 1.3. Encrypt the data in a trusted environment

Also in a similar manner to the previous example, the plaintext samples are encrypted:

In [None]:
modelIoEncoder = pyhelayers.ModelIoEncoder(client_lr)

client_samples = pyhelayers.EncryptedData(client_context)
modelIoEncoder.encode_encrypt(client_samples, [plain_samples])
print('Prediction data has been encrypted.')

#### 1.4. Save and send
In this notebook we demonstrate the serialization API. We save the encrypted model, the context, and the samples in preparation for sending them to the server.

In [None]:
lr_buffer = client_lr.save_to_buffer()
samples_buffer = client_samples.save_to_buffer()

# Save the context. Note that this saves all the HE library information, including the 
# public key, allowing the server to perform HE computations.
# The secret key is not saved here, so the server won't be able to decrypt.
# The secret key is never stored unless explicitly requested by the user using the designated 
# method.
context_buffer = client_context.save_to_buffer()

print('Context, model, and samples saved')

<br>

## Step 2. Perform predictions on a remote server using encrypted data and logistic regression

#### 2.1. Load the model, samples and context in the server

On the server side, we use the previously saved buffers to set up the server:

In [None]:
server_context = pyhelayers.load_he_context(context_buffer)
server_lr = pyhelayers.load_he_model(server_context, lr_buffer)
server_samples = pyhelayers.load_encrypted_data(server_context, samples_buffer)

#### 2.2. Perform inference in cloud/server using encrypted data and encrypted LR

We can now run inference on the encrypted data with the encrypted LR model to obtain encrypted results. This computation does not use the secret key and operates entirely on encrypted values.

**NOTE: the data, the LR model and the results always remain in an encrypted state, even during computation.**
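For each sample, logistic regression inference computes p = σ(w·x + b). CKKS natively supports only additions and multiplications, so a non-polynomial function like the sigmoid is typically replaced by a low-degree polynomial approximation over a bounded input range. The NumPy sketch below illustrates that general idea in plaintext. It is not how helayers implements its LR internally. The weights, data, polynomial degree (3), and fitting interval ([-8, 8]) are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Fit a degree-3 polynomial to the sigmoid on [-8, 8] by least squares
# (hypothetical choice of degree and interval, for illustration only).
grid = np.linspace(-8.0, 8.0, 2001)
coeffs = np.polyfit(grid, sigmoid(grid), deg=3)


def lr_predict_exact(X, w, b):
    return sigmoid(X @ w + b)


def lr_predict_poly(X, w, b):
    # Only additions and multiplications: the kind of computation CKKS can evaluate.
    return np.polyval(coeffs, X @ w + b)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))     # hypothetical standardized features
w = rng.normal(scale=0.3, size=30)  # hypothetical model weights
b = -1.0                            # hypothetical bias

exact = lr_predict_exact(X, w, b)
approx = lr_predict_poly(X, w, b)
print("max abs error:", np.max(np.abs(exact - approx)))
print("label agreement at 0.5:", np.mean((exact >= 0.5) == (approx >= 0.5)))
```

The polynomial tracks the sigmoid only inside the fitting interval, so inputs must be scaled to stay within it. A higher degree fits more closely but consumes more multiplicative depth (CKKS levels), which is one reason FHE-friendly models trade some accuracy for efficiency.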

In [None]:
server_predictions = pyhelayers.EncryptedData(server_context)
with utils.elapsed_timer('predict', batch_size) as timer:
    server_lr.predict(server_predictions, server_samples)

predictions_buffer = server_predictions.save_to_buffer()
print('Predictions saved.')

<br>

## Step 3. Decrypt the prediction results in the trusted environment

The encrypted predictions computed by the server (stored at `predictions_buffer`) can now be decrypted and decoded in the client:

In [None]:
# Load the encrypted predictions.
client_predictions = pyhelayers.load_encrypted_data(client_context, predictions_buffer)
print('predictions loaded')

# Decrypt and decode the results
plain_predictions = modelIoEncoder.decrypt_decode_output(client_predictions)
print('predictions', plain_predictions)

<br>

## Step 4. Assess the results - precision, recall, F1 score

Finally, we assess the results by comparing the positive and negative classifications with the true labels and computing the precision, recall, and F1 score.

When running the model on plaintext data (see `data_gen/fraud_detection_lr_demo.ipynb`), we get the following confusion matrix:  
`[[8175, 1], [6, 10]]`  
Comparing these plain results with the confusion matrix reported below shows that the FHE model produces the same results as the plain one.
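Precision, recall, and F1 follow directly from a confusion matrix. The sketch below computes them for the plain-model matrix quoted above. It assumes the scikit-learn layout: rows are true labels, columns are predicted labels, and genuine (0) comes before fraud (1). Under that layout TN = 8175, FP = 1, FN = 6, and TP = 10. If `utils.assess_results` uses a different layout, swap the off-diagonal entries accordingly.

```python
def precision_recall_f1(cm):
    """cm = [[TN, FP], [FN, TP]], rows = true labels, columns = predicted labels."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


cm_plain = [[8175, 1], [6, 10]]
p, r, f1 = precision_recall_f1(cm_plain)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
# prints: precision=0.9091 recall=0.6250 f1=0.7407
```

The four cells sum to 8192, the batch size used above. Accuracy would look excellent here simply because fraud is rare. This is why precision, recall, and F1 on the fraud class are the more informative metrics.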

In [None]:
utils.assess_results(labels, plain_predictions)

In [None]:
print("RAM usage:", utils.get_used_ram(), "MB")

<br>

References:

<sub><sup> 1.	Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015 </sup></sub>

<sub><sup> 2.	Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon </sup></sub>

<sub><sup> 3.	Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE </sup></sub>

<sub><sup> 4.	Dal Pozzolo, Andrea. Adaptive Machine Learning for Credit Card Fraud Detection. ULB MLG PhD thesis (supervised by G. Bontempi) </sup></sub>

<sub><sup> 5.	Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier </sup></sub>

<sub><sup> 6.	Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing </sup></sub>

<sub><sup> 7.	Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection, INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019 </sup></sub>

<sub><sup> 8.	Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection, Information Sciences, 2019 </sup></sub>

<sub><sup> 9.	Yann-Aël Le Borgne, Gianluca Bontempi. Machine Learning for Credit Card Fraud Detection - Practical Handbook </sup></sub>