# Logistic Regression Inference for Fraud Detection Using FHE (simple)

Expected RAM usage: less than 1 GB.  
Expected runtime: less than 1 minute.

## Introduction

This notebook is similar to the Logistic Regression fraud detection example in Notebook 03, but it is much simpler because it uses a single line of FHE code. Unlike Notebook 03, you do not have to call the Optimizer, specify FHE parameters, or encode and encrypt the data yourself. You only make a single FHE call through the pyhelayers extension API.

This example demonstrates the *pyhelayers extension API* (`pyhelayers.ext`). It integrates easily with the scikit-learn library by replacing a scikit-learn model's predictions with an FHE implementation. The FHE configuration is read from the `fhe.json` configuration file, which contains FHE parameters that the user can tune (e.g., batch size and security level).

This demo uses the Credit Card Fraud Detection dataset, originally taken from: https://www.kaggle.com/mlg-ulb/creditcardfraud
The dataset contains real, anonymized transactions made by credit card holders in September 2013, each labeled as fraudulent or genuine. See the references at the bottom of the page.

## Step 1. Train a Logistic Regression model using standard practices

In this step we'll train a Logistic Regression model using a standard ML package: scikit-learn (sklearn).

### 1.1. Imports

In [None]:
import os
import sys
import random
import warnings

# Environment settings are applied before the remaining imports.
# Note: PYTHONHASHSEED only affects Python processes started after this point,
# not hash randomization in the already running kernel.
seed_value = 1
os.environ['PYTHONHASHSEED'] = str(seed_value)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # Disable TensorFlow log messages
random.seed(seed_value)                   # For reproducibility

warnings.filterwarnings("ignore")

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# Make the local utils module importable, then check the available memory
path_to_utils = '.'
sys.path.append(path_to_utils)
import utils

utils.verify_memory()

### 1.2. Read the data set

In [None]:
df = pd.read_csv(os.path.join(utils.get_data_sets_dir(path_to_utils), 'net_fraud', 'creditcard.csv'))

print(f'Reading {df.shape[0]} samples')

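# Features: columns 1..29 (V1..V28 and Amount); column 0 (Time) and the 'Class' label are excluded
X = df.loc[:, df.columns.tolist()[1:30]].values
Y = df.loc[:, 'Class'].values
print(f'number of features: {X.shape[1]}')

# Stratified split keeps the fraud/genuine ratio similar in the train and test sets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, stratify=Y, random_state=0)

# Reshape the labels into column vectors so they can be sliced like the 2-D feature arrays
y_train = y_train.reshape(y_train.shape[0], -1)
y_test = y_test.reshape(y_test.shape[0], -1)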

print("Training data ready")

### 1.3. Train the logistic regression model

In [None]:
lr = LogisticRegression(C=0.1)
lr.fit(x_train, y_train.ravel())  # fit expects a 1-D target array

print('LR model ready')
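For a binary problem, a trained logistic regression model predicts class 1 when the linear score `w·x + b` is positive, which is the same as the sigmoid probability exceeding 0.5. This is the computation the encrypted `predict` later has to reproduce. How pyhelayers evaluates it under encryption (for example, how it handles the sigmoid) is not described in this notebook. The hypothetical helper below is a plaintext sketch of the rule. For the trained model, `lr_predict(lr.coef_[0], lr.intercept_[0], x)` should give the same labels as the plaintext `lr.predict(x)`.

```python
import numpy as np

def lr_predict(coef, intercept, X):
    """Hypothetical plaintext sketch of binary logistic-regression prediction.

    coef: shape (n_features,), intercept: scalar, X: shape (n_samples, n_features).
    Returns (labels, probabilities of class 1).
    """
    scores = X @ coef + intercept           # linear decision function w.x + b
    probs = 1.0 / (1.0 + np.exp(-scores))   # sigmoid -> probability of class 1
    labels = (scores > 0).astype(int)       # score > 0  <=>  probability > 0.5
    return labels, probs

# Toy weights and samples for illustration only
coef = np.array([2.0, -1.0])
intercept = -0.5
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
labels, probs = lr_predict(coef, intercept, X)
print(labels)  # [1 0 0]  (the last score is exactly 0, so it is not class 1)
```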

## Step 2. Test the trained model

In this step we'll test the model we have trained.

We are still using the standard sklearn package and performance metrics. However, a single statement that imports the pyhelayers extension makes the usual `predict` method run under encryption.

This allows a data scientist to easily test the performance of AI models under encryption.

### 2.1. Create a batch of 8192 test samples 

In [None]:
batch_size = 8192
batch_x_test = x_test[0:batch_size,:]
batch_y_test = y_test[0:batch_size,:]
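The notebook evaluates only the first 8192 test samples. To cover the whole test set, a hypothetical helper such as `iter_batches` below could split it into consecutive chunks of the same size. Each chunk could then be passed to `lr.predict` after the replacement, with the results concatenated. This assumes, without confirmation here, that the FHE `predict` also accepts a final batch smaller than the configured batch size.

```python
import numpy as np

def iter_batches(x, y, batch_size=8192):
    """Hypothetical helper: yield consecutive (x, y) chunks of at most batch_size rows.

    y is expected to be 2-D (n, 1), matching the reshaped labels in this notebook.
    """
    for start in range(0, x.shape[0], batch_size):
        yield x[start:start + batch_size, :], y[start:start + batch_size, :]

# Synthetic arrays for illustration: 20000 samples with 29 features
x_demo = np.zeros((20000, 29))
y_demo = np.zeros((20000, 1))
sizes = [bx.shape[0] for bx, _ in iter_batches(x_demo, y_demo)]
print(sizes)  # [8192, 8192, 3616]
```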


### 2.2. Run prediction under encryption
The first line below is the only line of code that deals with FHE!  

It imports the pyhelayers extension and replaces the model's usual `predict` method with one that runs under encryption, using the settings in `fhe.json`. Expect it to run noticeably slower than the plaintext `predict`.

In [None]:
# Replace predict with FHE version
lr.predict = __import__('pyhelayers.ext').ext.replace(lr.predict, config_file='./fhe.json')

batch_y_pred = lr.predict(batch_x_test)
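The one-line replacement relies on ordinary Python attribute assignment. Assigning to `lr.predict` creates an instance attribute that shadows the class's `predict` method. Code that calls `lr.predict(...)` therefore gets the new function, while other `LogisticRegression` objects are unaffected. The sketch below illustrates only that mechanism, using a toy model and a hypothetical `replace_predict` wrapper. It performs no encryption and is not how `pyhelayers.ext.replace` is implemented.

```python
class ToyModel:
    """Hypothetical stand-in for a fitted scikit-learn estimator."""
    def predict(self, xs):
        return [int(x > 0) for x in xs]

def replace_predict(original_predict):
    """Hypothetical stand-in for an extension's replace(): wraps a bound method.

    A real FHE replacement would encrypt the inputs, evaluate the model and
    decrypt the output; this sketch only records each call's batch size.
    """
    calls = []
    def wrapped(xs):
        calls.append(len(xs))          # record the batch size of each call
        return original_predict(xs)    # delegate to the original bound method
    wrapped.calls = calls
    return wrapped

model = ToyModel()
model.predict = replace_predict(model.predict)  # instance attribute shadows the class method
out = model.predict([-1.0, 2.0, 0.5])
print(out)                        # [0, 1, 1]
print(model.predict.calls)        # [3]
print(ToyModel().predict([3.0]))  # [1]  (other instances keep the class method)
```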

### 2.3. Assess the results

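We now assess the results. The confusion matrix should be similar to `[[8175, 1], [6, 10]]`. Rows are the true classes and columns the predicted classes, so this means 8175 true negatives, 1 false positive, 6 false negatives and 10 true positives.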

Obtaining this result indicates that the FHE `predict` above produced essentially the same predictions as the ordinary `predict` on plaintext data.
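The confusion matrix compares the FHE predictions with the true labels. They can also be compared directly with the plaintext predictions. After the replacement, the plaintext version should still be reachable through the class, e.g. `LogisticRegression.predict(lr, batch_x_test)`. This assumes the extension only replaces the instance attribute, as the assignment in the cell suggests. A small hypothetical helper such as `prediction_agreement` below then reports the fraction of matching labels and the indices that differ.

```python
import numpy as np

def prediction_agreement(y_plain, y_fhe):
    """Hypothetical helper: return (fraction of equal labels, indices where they differ)."""
    y_plain = np.asarray(y_plain).ravel()
    y_fhe = np.asarray(y_fhe).ravel()
    if y_plain.shape != y_fhe.shape:
        raise ValueError("prediction arrays must have the same length")
    mismatches = np.flatnonzero(y_plain != y_fhe)
    return 1.0 - mismatches.size / y_plain.size, mismatches

# Toy labels for illustration only (not results from this notebook)
rate, diff = prediction_agreement([0, 0, 1, 1], [0, 1, 1, 1])
print(rate, diff)  # 0.75 [1]
```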

In [None]:
f,t,thresholds = metrics.roc_curve(batch_y_test, batch_y_pred)
cm = metrics.confusion_matrix(batch_y_test, batch_y_pred)
print(f"AUC Score: {metrics.auc(f,t):.3f}")
print("Classification report:")
print(metrics.classification_report(batch_y_test, batch_y_pred))
print("Confusion Matrix:")
print(cm)

# Test we didn't get more than 10 misclassified samples
if (cm[1][0]+cm[0][1]>10):
    raise Exception("Failed to obtain the expected confusion matrix")
print('Prediction under FHE succeeded')
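As a worked check, the sketch below derives the reported quantities from the expected confusion matrix `[[8175, 1], [6, 10]]`. scikit-learn's `confusion_matrix` puts true classes in rows and predicted classes in columns, so the entries are TN, FP, FN and TP. In this cell `roc_curve` receives hard 0/1 predictions rather than scores, so the ROC curve has a single interior point. Its area therefore reduces to (1 + TPR - FPR) / 2.

```python
# Expected confusion matrix from step 2.3 (rows: true class, columns: predicted class)
tn, fp = 8175, 1
fn, tp = 6, 10

total = tn + fp + fn + tp
misclassified = fp + fn          # the quantity the notebook's check compares to 10
precision = tp / (tp + fp)       # precision for the fraud class
recall = tp / (tp + fn)          # recall for the fraud class (true positive rate)
fpr = fp / (fp + tn)             # false positive rate

# With hard 0/1 predictions the ROC curve is (0,0) -> (fpr, recall) -> (1,1);
# the trapezoidal area under it simplifies to (1 + TPR - FPR) / 2.
auc_hard = (1 + recall - fpr) / 2

print(total, misclassified)                            # 8192 7
print(f"{precision:.3f} {recall:.3f} {auc_hard:.3f}")  # 0.909 0.625 0.812
```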

<br>

References:

<sub><sup> 1. Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015 </sup></sub>

<sub><sup> 2. Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Aël; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915-4928, 2014, Pergamon </sup></sub>

<sub><sup> 3. Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3784-3797, 2018, IEEE </sup></sub>

<sub><sup> 4. Dal Pozzolo, Andrea. Adaptive Machine Learning for Credit Card Fraud Detection. ULB MLG PhD thesis (supervised by G. Bontempi) </sup></sub>

<sub><sup> 5. Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182-194, 2018, Elsevier </sup></sub>

<sub><sup> 6. Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5(4), 285-300, 2018, Springer International Publishing </sup></sub>

<sub><sup> 7. Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019 </sup></sub>

<sub><sup> 8. Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019 </sup></sub>

<sub><sup> 9. Yann-Aël Le Borgne, Gianluca Bontempi. Machine Learning for Credit Card Fraud Detection - Practical Handbook </sup></sub>