<img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/horizontal-primary-light.png" alt="he-black-box" width="600"/>


# Homomorphic Encryption using Duet: Data Owner
## Tutorial 1: Training and Evaluating a Logistic Regression over Encrypted Data


Welcome!
This tutorial shows you how to train and evaluate a logistic regression model using Duet and TenSEAL. This notebook presents the Data Owner's view of the operations.

We recommend going through Tutorial 0 before trying this one.

### Setup

All modules are imported here. Make sure everything is installed by running the cell below.

In [None]:
import os
import syft as sy
import tenseal as ts
import torch
import pandas as pd
import random
import numpy as np
import requests

import pytest
from time import time
import matplotlib.pyplot as plt
import sys

sy.load("tenseal")

sy.logger.add(sys.stdout)


### Start Duet Data Owner instance

In [None]:
# Start Duet local instance
duet = sy.launch_duet(loopback=True)

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 1 : Now STOP and run the Data Scientist notebook until the same checkpoint.

## Scenario 1:  Evaluation of the Logistic Regression on Encrypted Data

### Prepare the data

We now prepare the training and test data. The dataset was downloaded from Kaggle [here](https://www.kaggle.com/dileep070/heart-disease-prediction-using-logistic-regression).
This dataset provides patients' information along with a 10-year risk of future coronary heart disease (CHD) as a label. The goal is to build a model that can predict this 10-year CHD risk based on patients' information. You can read more about the dataset at the link provided.

In [None]:
from syft.util import get_root_data_path

def split_train_test(x, y, test_ratio=0.3):
    idxs = [i for i in range(len(x))]
    random.shuffle(idxs)
    # delimiter between test and train data
    delim = int(len(x) * test_ratio)
    test_idxs, train_idxs = idxs[:delim], idxs[delim:]
    return x[train_idxs], y[train_idxs], x[test_idxs], y[test_idxs]

def download_dataset():
    try:
        os.makedirs(get_root_data_path(), exist_ok=True)
    except BaseException as e:
        print(e)

    url = "https://raw.githubusercontent.com/OpenMined/TenSEAL/master/tutorials/data/framingham.csv"
    path = f"{get_root_data_path()}/framingham.csv"
    
    r = requests.get(url)
    # fail early instead of saving an HTTP error page as the CSV
    r.raise_for_status()

    with open(path, 'wb') as f:
        f.write(r.content)
            
def heart_disease_data():
    download_dataset()
    
    data = pd.read_csv(f"{get_root_data_path()}/framingham.csv")
    # drop rows with missing values
    data = data.dropna()
    # drop some features
    data = data.drop(columns=["education", "currentSmoker", "BPMeds", "diabetes", "diaBP", "BMI"])
    # balance data
    grouped = data.groupby('TenYearCHD')
    data = grouped.apply(lambda x: x.sample(grouped.size().min(), random_state=73).reset_index(drop=True))
    # extract labels
    y = torch.tensor(data["TenYearCHD"].values).float().unsqueeze(1)
    data = data.drop(columns="TenYearCHD")
    # standardize data
    data = (data - data.mean()) / data.std()
    x = torch.tensor(data.values).float()
    return split_train_test(x, y)

x_train, y_train, x_test, y_test = heart_disease_data()
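
Note that `split_train_test` shuffles with the global `random` state, so the split, and with it the accuracy reported later, can change between runs. The sketch below shows one possible way to make the split reproducible. `split_indices` and its `seed` parameter are hypothetical names introduced for illustration; they are not part of the tutorial's code.

```python
import random


def split_indices(n, test_ratio=0.3, seed=73):
    """Return (train_idxs, test_idxs) for n samples, reproducibly.

    Hypothetical helper: mirrors the index logic of split_train_test,
    but shuffles with a private, seeded random.Random instance.
    """
    rng = random.Random(seed)
    idxs = list(range(n))
    rng.shuffle(idxs)
    # delimiter between test and train data
    delim = int(n * test_ratio)
    return idxs[delim:], idxs[:delim]


train_idxs, test_idxs = split_indices(10)
print(len(train_idxs), len(test_idxs))
```

Using a dedicated `random.Random` instance keeps the split independent of any other code that touches the global random state.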

### Make Training data Referenceable over Duet

In this scenario, the model is trained on the plain (unencrypted) data over Duet.

In [None]:
x_train_ptr = x_train.send(duet, pointable=True, tags=["x_train"])
y_train_ptr = y_train.send(duet, pointable=True, tags=["y_train"])

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 2 : Now STOP and run the Data Scientist notebook until the same checkpoint.

### Approve the requests

In [None]:
duet.requests[0].accept()
duet.requests[0].accept()

In [None]:
duet.requests.pandas
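
The approval cell calls `duet.requests[0].accept()` twice instead of indexing `[0]` and then `[1]`. This appears to rely on an accepted request leaving the pending list, so that the next request moves to index 0. That is an assumption about Duet's behavior, not something this notebook states. The toy class below only illustrates the pattern in plain Python; `PendingRequests` and `accept_first` are hypothetical names and not part of the Duet API.

```python
class PendingRequests:
    """Toy stand-in for a request queue (hypothetical, not the Duet API)."""

    def __init__(self, names):
        self._names = list(names)

    def __len__(self):
        return len(self._names)

    def accept_first(self):
        # Accepting removes the request, so the next one moves to index 0.
        return self._names.pop(0)


queue = PendingRequests(["x_train", "y_train"])
accepted = [queue.accept_first(), queue.accept_first()]
print(accepted, "remaining:", len(queue))
```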

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 3 : Now STOP and run the Data Scientist notebook until the same checkpoint.

### TenSEAL Context

The next step is to prepare the data for encrypted evaluation.

As you may recall from the first tutorial, the first step for that is to create a __TenSEAL context__.

In [None]:
context = ts.Context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60]
)
context.global_scale = 2**40
context.generate_galois_keys()
context
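
The context parameters are related to each other, and a quick sanity check can make that explicit. The sketch below uses plain Python only. The table of 128-bit security limits is an assumption taken from the default bounds published for Microsoft SEAL, and the depth estimate is a rule of thumb rather than a guarantee.

```python
# Parameters copied from the context cell above.
poly_modulus_degree = 8192
coeff_mod_bit_sizes = [60, 40, 40, 60]

# Assumption: 128-bit security upper bounds on the total coefficient
# modulus size, as published for Microsoft SEAL's default parameters.
MAX_TOTAL_BITS_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}

total_bits = sum(coeff_mod_bit_sizes)
# The inner primes (between the first and the last) are consumed by rescaling,
# roughly one per ciphertext multiplication.
mult_depth = len(coeff_mod_bit_sizes) - 2
# CKKS packs up to N/2 values into one ciphertext.
slots = poly_modulus_degree // 2

assert total_bits <= MAX_TOTAL_BITS_128[poly_modulus_degree]
print(f"total bits: {total_bits}, depth: {mult_depth}, slots: {slots}")
```

The global scale of `2**40` matches the 40-bit inner primes, which is the usual pairing in CKKS so that each rescale brings the scale back close to its original value.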

### Encrypt the data


In [None]:
t_start = time()
x_test = x_test[:10]
enc_x_test = sy.lib.python.List([ts.ckks_vector(context, x.tolist()) for x in x_test])
t_end = time()
print(f"Encryption of the test-set took {int(t_end - t_start)} seconds")

### Make Context and Encrypted Vectors Referenceable over Duet

In [None]:
# tag them so our partner can easily reference them

ctx_ptr = context.send(duet, pointable=True, tags=["context"])
enc_x_test_ptr = enc_x_test.send(duet, pointable=True, tags=["enc_x_test"])

In [None]:
# we can see that the context and the encrypted test vectors are now inside the store we control
duet.store.pandas

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 4 : Now STOP and run the Data Scientist notebook until the same checkpoint.

In [None]:
# We can see that there are two requests, for the context and for the encrypted data.
duet.requests.pandas

### Approve the requests

In [None]:
duet.requests[0].accept()
duet.requests[0].accept()

In [None]:
# The requests should have been handled
duet.requests.pandas

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 5 : Now STOP and run the Data Scientist notebook until the same checkpoint.

In [None]:
print(duet.store.pandas)

In [None]:
# Test the accuracy

result_eval = duet.store["result_eval"].get(delete_obj=False)
correct = 0
for actual, expected in zip(result_eval, y_test):
    actual.link_context(context)
    actual = torch.tensor(actual.decrypt())
    actual = torch.sigmoid(actual)

    if torch.abs(actual - expected) < 0.5:
        correct += 1
        
print(f"Evaluated test_set of {len(x_test)} entries. Accuracy: {correct}/{len(x_test)} = {correct / len(x_test)}")
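
The accuracy cell applies the sigmoid after decryption, which suggests that the encrypted evaluation returns the linear output (the logit) and not a probability; this is an assumption about the Data Scientist notebook. The sketch below checks in plain Python that the test `|sigmoid(logit) - label| < 0.5` amounts to thresholding the probability at 0.5, which is a sign test on the logit. `is_correct` and `predicted_label` are hypothetical helper names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def is_correct(logit, label):
    """Same test as the accuracy cell: |sigmoid(logit) - label| < 0.5."""
    return abs(sigmoid(logit) - label) < 0.5


def predicted_label(logit):
    """Thresholding the probability at 0.5 is a sign test on the logit."""
    return 1.0 if logit > 0 else 0.0


for logit in (-3.0, -0.2, 0.4, 2.5):
    for label in (0.0, 1.0):
        assert is_correct(logit, label) == (predicted_label(logit) == label)
```

A logit of exactly 0 gives a probability of 0.5, which the strict `< 0.5` comparison counts as incorrect for either label.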

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 6: Well Done!

# Congratulations!!! - Time to Join the Community!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement toward privacy-preserving, decentralized ownership of AI and the AI supply chain (data), you can do so in the following ways!

### Star PySyft and TenSEAL on GitHub

The easiest way to help our community is just by starring the repos! This helps raise awareness of the cool tools we're building.

- [Star PySyft](https://github.com/OpenMined/PySyft)
- [Star TenSEAL](https://github.com/OpenMined/TenSEAL)

### Join our Slack!

The best way to keep up to date on the latest advancements is to join our community! You can do so by filling out the form at [http://slack.openmined.org](http://slack.openmined.org). #lib_tenseal and #code_tenseal are the main channels for the TenSEAL project.

### Donate

If you don't have time to contribute to our codebase, but would still like to lend support, you can also become a Backer on our Open Collective. All donations go toward our web hosting and other community expenses such as hackathons and meetups!

[OpenMined's Open Collective Page](https://opencollective.com/openmined)