## Encrypted Logistic Regression

In this demo, we compare plaintext logistic regression from Scikit-Learn with the encrypted implementation in our VenumML library, which is built on VENumpy, the Vaultree FHE Python library.

### Plaintext Logistic Regression with Scikit-Learn

The code below demonstrates a basic implementation of Logistic Regression for binary classification using Scikit-Learn. Logistic Regression is a statistical model that predicts the probability of a binary outcome (0 or 1) from a set of features.

Here's a breakdown of the steps involved:

1. **Data Generation:**
    - We use `sklearn.datasets.make_regression` to generate sample data with 100 samples, 2 features, and a small amount of noise.

2. **Data Preprocessing:**
    - The generated data represents continuous values. For logistic regression, we typically want the target variable to be binary (0 or 1).
    - We binarize the target variable `y` using a threshold (set to 4 in this example). Values above the threshold are converted to 1, and values at or below it are converted to 0.
    - This process creates a binary classification problem where the model predicts the probability of a sample belonging to the class labeled 1.
    - The threshold value can be adjusted based on your specific problem.

3. **Train-Test Split:**
    - We split the data into training and testing sets using `sklearn.model_selection.train_test_split`.
    - The training set is used to train the model, and the testing set is used to evaluate its performance on unseen data.

4. **Model Training:**
    - We create a Logistic Regression model instance from `sklearn.linear_model.LogisticRegression`.
    - The `solver` parameter is set to `'liblinear'`, which is a suitable choice for smaller datasets.

5. **Model Fitting:**
    - We call the `fit` method on the model, passing the training data (X_train and y_train_binary) to train the model.

6. **Prediction:**
    - We use the trained model to make predictions on the testing data (X_test) using the `predict` method. This results in an array of predicted binary labels (0 or 1).

7. **Evaluation (not shown in this example):**
    - Typically, we would use metrics like accuracy, precision, recall, or F1-score to evaluate the model's performance on the testing set. This step is not included in this basic example.

The following code block implements these steps:

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

# Generate Sample Data
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)

# Binarize the target variable (consider adjusting the threshold)
threshold = 4
y_binary = np.where(y > threshold, 1, 0)

In [2]:
# Split data into training and testing sets
X_train, X_test, y_train_binary, y_test_binary = train_test_split(X, y_binary, test_size=0.2, random_state=42)

# Train Scikit-Learn LogisticRegression
sk_lr = LogisticRegression(solver='liblinear')
sk_lr.fit(X_train, y_train_binary)

# Make Predictions on Test Set
sk_lr_predictions = sk_lr.predict(X_test)

print("Scikit-Learn Predictions:", sk_lr_predictions)

Scikit-Learn Predictions: [0 0 1 1 0 0 1 1 0 0 1 1 0 1 1 0 1 1 1 1]
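
Step 7 above (evaluation) can be sketched as follows. This is a minimal, self-contained example that rebuilds the same pipeline and reports the usual classification metrics from `sklearn.metrics`. The `random_state=0` passed to `make_regression` is an assumption added here for reproducibility; the original cell does not fix a seed, so its numbers may differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Rebuild the pipeline from the cells above.
# random_state=0 is an assumption (the original data generation is unseeded).
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=0)
y_binary = np.where(y > 4, 1, 0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42
)

model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compare predicted labels with the held-out true labels
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same metric functions could be applied to the decrypted, thresholded predictions of the encrypted model, since they only need two arrays of labels.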


### Encrypted Logistic Regression with VenumML

The code below trains a Logistic Regression model with our VenumML library and uses it to make predictions on encrypted data.

1. **Venumpy Context and Security:**
   - A `SecretContext` object (`ctx`) is created from `venumML.venumpy.small_glwe` (imported as `vp`), targeting 128 bits of security in this example, and its `precision` attribute is set to 6. This context manages the encryption process.

2. **Encrypted Logistic Regression Model:**
   - The `EncryptedLogisticRegression` class, imported from `venumML.linear_models.regression.logistic_regression`, handles training and prediction. In this example, the model coefficients and the test data are encrypted before inference, so neither is exposed in plaintext during prediction.

3. **Model Training:**
   - An instance of `EncryptedLogisticRegression` is created, passing the `ctx` object to establish the encryption context.
   - The `fit` method trains the model. In this example it is called with the training arrays `X_train` and `y_train_binary`, and the coefficients are encrypted afterwards. As in the non-encrypted case, it iteratively updates the model's coefficients to minimize the loss function.
   - **Note:** Because Stochastic Gradient Descent (SGD) updates the model's coefficients using randomly chosen mini-batches of data, the decision boundary can vary slightly between runs, and the predictions may not always match Scikit-Learn's exactly.

4. **Encryption and Prediction:**
   - After training, `my_lr.encrypt_coefficients(ctx)` encrypts the model's coefficients, ensuring they are not stored or revealed in plain text.
   - The testing data `X_test` is encrypted using `encrypt_array` from `venum_tools` before making predictions. This protects the raw data from unauthorized access.

5. **Encrypted Predictions and Decryption:**

   - The `predict` method is called on the model with the encrypted testing data (`cipher_X`). The predictions themselves are also encrypted.
   - `decrypt_array` from `venum_tools` is used to decrypt the model's predictions, allowing you to work with the results in plain text.

6. **Output:**
   - The code applies the sigmoid function to the decrypted predictions and thresholds the resulting probabilities at 0.5 to obtain class labels (0 or 1). It then prints the labels and counts how many differ from the Scikit-Learn predictions.
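
The note on SGD above can be illustrated with a small plaintext sketch. This is not VenumML's implementation; the function `sgd_logistic_regression`, its hyperparameters, and the synthetic data are all hypothetical and chosen only to show how mini-batch order affects the learned coefficients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic_regression(X, y, lr=0.1, epochs=50, batch_size=8, seed=0):
    """Plaintext mini-batch SGD for logistic regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # new random batch order each epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            # Gradient of the log-loss with respect to the linear score
            error = sigmoid(X[idx] @ w + b) - y[idx]
            w -= lr * X[idx].T @ error / len(idx)
            b -= lr * error.mean()
    return w, b

# Hypothetical synthetic data: the label is 1 when x0 + x1 > 0
data_rng = np.random.default_rng(123)
X = data_rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Two runs that differ only in the seed controlling the mini-batch order
w_a, b_a = sgd_logistic_regression(X, y, seed=1)
w_b, b_b = sgd_logistic_regression(X, y, seed=2)

pred_a = (X @ w_a + b_a > 0).astype(int)
pred_b = (X @ w_b + b_b > 0).astype(int)
print("coefficients (seed 1):", w_a, b_a)
print("coefficients (seed 2):", w_b, b_b)
print("labels that differ between runs:", int(np.sum(pred_a != pred_b)))
```

Only samples that lie very close to the decision boundary are likely to change label between runs, which is why small disagreements with another solver are expected rather than alarming.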


In [None]:
from venumML.venumpy import small_glwe as vp
from venumML.venum_tools import encrypt_array, decrypt_array
from venumML.linear_models.regression.logistic_regression import EncryptedLogisticRegression

# Create venumpy context with 128 bits of security
ctx = vp.SecretContext()
ctx.precision = 6

# Train EncryptedLogisticRegression Model
my_lr = EncryptedLogisticRegression(ctx)
my_lr.fit(X_train, y_train_binary)

# Encrypt the trained model coefficients
my_lr.encrypt_coefficients(ctx)

# Encrypt the test features
cipher_X = encrypt_array(X_test, ctx)

# Make predictions on encrypted data
encrypted_prediction = np.array(my_lr.predict(cipher_X, ctx))

In [4]:
threshold = 0.5

# Decrypt the encrypted predictions
decrypted_prediction = decrypt_array(encrypted_prediction)
# Apply the sigmoid, then threshold the probabilities to get class labels
probabilities = 1 / (1 + np.exp(-decrypted_prediction))
binary_predictions = np.where(probabilities > threshold, 1, 0)

print("VENUmpy predictions: ", binary_predictions)
# Check differences
differences = binary_predictions != sk_lr_predictions
print("Number of differing predictions:", np.sum(differences))
print("Percentage of differing predictions: {:.2f}%".format(np.sum(differences) / len(X_test) * 100))


VENUmpy predictions:  [0 0 1 1 0 0 1 1 0 0 1 1 0 1 1 0 1 1 1 1]
Number of differing predictions: 0
Percentage of differing predictions: 0.00%
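
The post-processing step above (sigmoid followed by a 0.5 threshold) has a simple plaintext analogue. Since the sigmoid of a score exceeds 0.5 exactly when the score is positive, thresholding the probability at 0.5 is equivalent to checking the sign of the linear score. The sketch below shows this with Scikit-Learn's `decision_function`. It assumes, based on the sigmoid being applied after decryption in the cell above, that the encrypted `predict` returns raw linear scores; the `random_state=0` seed is also an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LogisticRegression

# Same kind of data as in the demo; the seed is an assumption for reproducibility
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=0)
y_binary = np.where(y > 4, 1, 0)
model = LogisticRegression(solver="liblinear").fit(X, y_binary)

scores = model.decision_function(X)        # raw linear scores w.x + b
probabilities = 1 / (1 + np.exp(-scores))  # sigmoid applied in plaintext

labels_from_sigmoid = np.where(probabilities > 0.5, 1, 0)
labels_from_sign = np.where(scores > 0, 1, 0)

print("sigmoid/threshold matches sign test:",
      np.array_equal(labels_from_sigmoid, labels_from_sign))
print("sigmoid/threshold matches predict():",
      np.array_equal(labels_from_sigmoid, model.predict(X)))
```

One practical consequence is that a threshold other than 0.5 can be applied directly to the decrypted score by comparing it with the corresponding logit, without evaluating the sigmoid at all.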
