<img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/horizontal-primary-light.png" alt="he-black-box" width="600"/>


# Homomorphic Encryption using Duet: Data Owner
## Tutorial 1: Training and Evaluating a Logistic Regression over Encrypted Data


Welcome!
This tutorial will show you how to train and evaluate a Logistic Regression model using Duet and TenSEAL. This notebook shows the Data Owner's view of the operations.

We recommend going through Tutorial 0 before trying this one.

### Setup

All modules are imported here. Run the cell below to make sure everything is installed.

In [1]:
import os
import syft as sy
import tenseal as ts
import torch
import pandas as pd
import random
import numpy as np
import requests
from syft.grid.client.client import connect
from syft.grid.client.grid_connection import GridHTTPConnection
from syft.core.node.domain.client import DomainClient

import pytest
from time import time
import matplotlib.pyplot as plt
import sys

sy.load("tenseal")

sy.logger.add(sys.stdout)


## Connect to PyGrid

Connect to the PyGrid Domain server.

In [2]:
client = connect(
    url="http://localhost:5000", # Domain Address
    credentials={"email":"admin@email.com", "password":"pwd123"},
    conn_type= GridHTTPConnection, # HTTP Connection Protocol
    client_type=DomainClient) # Domain Client type

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 1 : Now STOP and run the Data Scientist notebook until the same checkpoint.

## Scenario 1:  Evaluation of the Logistic Regression on Encrypted Data

### Prepare the data

We now prepare the training and test data. The dataset was downloaded from Kaggle [here](https://www.kaggle.com/dileep070/heart-disease-prediction-using-logistic-regression).
This dataset provides patients' information along with a 10-year risk of future coronary heart disease (CHD) as a label. The goal is to build a model that can predict this 10-year CHD risk from the patients' information. You can read more about the dataset at the link provided.

In [3]:
def split_train_test(x, y, test_ratio=0.3):
    idxs = [i for i in range(len(x))]
    random.shuffle(idxs)
    # delimiter between test and train data
    delim = int(len(x) * test_ratio)
    test_idxs, train_idxs = idxs[:delim], idxs[delim:]
    return x[train_idxs], y[train_idxs], x[test_idxs], y[test_idxs]

def download_dataset():
    # create the data directory if it does not exist yet
    os.makedirs("./data", exist_ok=True)

    url = "https://raw.githubusercontent.com/OpenMined/TenSEAL/master/tutorials/data/framingham.csv"
    path = "./data/framingham.csv"

    r = requests.get(url)
    # fail early on an HTTP error instead of saving an error page as CSV
    r.raise_for_status()

    with open(path, 'wb') as f:
        f.write(r.content)
            
def heart_disease_data():
    download_dataset()
    
    data = pd.read_csv("./data/framingham.csv")
    # drop rows with missing values
    data = data.dropna()
    # drop some features
    data = data.drop(columns=["education", "currentSmoker", "BPMeds", "diabetes", "diaBP", "BMI"])
    # balance data
    grouped = data.groupby('TenYearCHD')
    data = grouped.apply(lambda x: x.sample(grouped.size().min(), random_state=73).reset_index(drop=True))
    # extract labels
    y = torch.tensor(data["TenYearCHD"].values).float().unsqueeze(1)
    data = data.drop(columns="TenYearCHD")
    # standardize data
    data = (data - data.mean()) / data.std()
    x = torch.tensor(data.values).float()
    return split_train_test(x, y)

x_train, y_train, x_test, y_test = heart_disease_data()
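The index-based split used by `split_train_test` can be illustrated on a toy range of indices without any tensors. The sketch below is a plain-Python illustration and not part of the tutorial pipeline. The helper name `split_indices` and its `seed` parameter are assumptions added here so that the shuffle is reproducible; the cell above shuffles with the unseeded global `random` state, so its split changes between runs.

```python
import random


def split_indices(n, test_ratio=0.3, seed=None):
    """Return (train_idxs, test_idxs) for n samples.

    Hypothetical helper: mirrors the shuffle-then-slice logic of
    split_train_test, with an optional seed for reproducibility.
    """
    rng = random.Random(seed)
    idxs = list(range(n))
    rng.shuffle(idxs)
    # the first `delim` shuffled indices become the test set
    delim = int(n * test_ratio)
    return idxs[delim:], idxs[:delim]


train_idxs, test_idxs = split_indices(10, test_ratio=0.3, seed=73)
print(len(train_idxs), len(test_idxs))
```

Every index ends up in exactly one of the two lists, and calling the helper again with the same seed returns the same split.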

### Make Training data Referenceable over Duet

In this scenario, the model is trained on the plaintext training data, which we make referenceable over Duet.

In [4]:
x_train_ptr = x_train.send(client, pointable=True, tags=["x_train"])
y_train_ptr = y_train.send(client, pointable=True, tags=["y_train"])

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 2 : Now STOP and run the Data Scientist notebook until the same checkpoint.

### Approve the requests

In [5]:
client.requests[0].accept()
client.requests[0].accept()

In [6]:
client.requests.pandas
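Cell `In [5]` calls `client.requests[0].accept()` twice instead of indexing `[0]` and then `[1]`. This suggests that a handled request leaves the pending list, so the next one moves to index 0. The sketch below imitates that behaviour with stand-in classes. `ToyRequest` and `ToyQueue` are hypothetical names used only for illustration and are not part of the PySyft API.

```python
class ToyRequest:
    """Hypothetical stand-in for a pending access request."""

    def __init__(self, tag, queue):
        self.tag = tag
        self._queue = queue

    def accept(self):
        # accepting removes the request from the pending list
        self._queue.pending.remove(self)
        self._queue.accepted.append(self.tag)


class ToyQueue:
    """Hypothetical stand-in for the list of pending requests."""

    def __init__(self, tags):
        self.accepted = []
        self.pending = [ToyRequest(tag, self) for tag in tags]

    def __getitem__(self, index):
        return self.pending[index]

    def __len__(self):
        return len(self.pending)


requests_queue = ToyQueue(["request_a", "request_b"])
# always take the head of the queue: it shrinks after every accept()
while len(requests_queue) > 0:
    requests_queue[0].accept()
print(requests_queue.accepted)
```

Under this assumption, indexing `[1]` after the first `accept()` would fail, because only one request would be left in the list.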

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 3 : Now STOP and run the Data Scientist notebook until the same checkpoint.

### TenSEAL Context

The next step is to prepare the data for encrypted evaluation.

As you may recall from the first tutorial, the first step for that is to create a __TenSEAL context__.

In [7]:
context = ts.Context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60]
)
context.global_scale = 2**40
context.generate_galois_keys()
context

<tenseal.enc_context.Context at 0x7f79f5a77ca0>
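The parameters passed to `ts.Context` are linked by a few rules of thumb from CKKS as implemented in Microsoft SEAL, the library TenSEAL wraps. The plain-Python sketch below checks them for the values used above and does not call TenSEAL. The table of limits reflects SEAL's defaults for 128-bit security, and the helper name `describe_ckks_params` is an assumption introduced for illustration.

```python
# Maximum total coefficient-modulus bits for 128-bit security,
# per polynomial modulus degree (defaults used by Microsoft SEAL).
MAX_COEFF_BITS_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}


def describe_ckks_params(poly_modulus_degree, coeff_mod_bit_sizes, scale_bits):
    """Hypothetical helper summarising a CKKS parameter choice."""
    total_bits = sum(coeff_mod_bit_sizes)
    return {
        # a CKKS ciphertext packs up to N / 2 values
        "slots": poly_modulus_degree // 2,
        "total_coeff_bits": total_bits,
        "within_128_bit_limit": total_bits <= MAX_COEFF_BITS_128[poly_modulus_degree],
        # one inner prime is consumed per rescale, i.e. per multiplication level
        "multiplicative_depth": len(coeff_mod_bit_sizes) - 2,
        # the inner primes are usually chosen to match the scale
        "inner_primes_match_scale": all(b == scale_bits for b in coeff_mod_bit_sizes[1:-1]),
    }


summary = describe_ckks_params(8192, [60, 40, 40, 60], scale_bits=40)
for key, value in summary.items():
    print(f"{key}: {value}")
```

For the context above, the two inner 40-bit primes match `global_scale = 2**40` and allow two consecutive multiplications. Each patient row is a short feature vector, far below the slot count of a single ciphertext.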

### Encrypt the data


In [8]:
t_start = time()
x_test = x_test[:10]
enc_x_test = sy.lib.python.List([ts.ckks_vector(context, x.tolist()) for x in x_test])
t_end = time()
print(f"Encryption of the test-set took {int(t_end - t_start)} seconds")

Encryption of the test-set took 0 seconds


### Make Context and Encrypted Vectors Referenceable over Duet

In [9]:
# tag them so our partner can easily reference them

ctx_ptr = context.send(client, pointable=True, tags=["context"])
enc_x_test_ptr = enc_x_test.send(client, pointable=True, tags=["enc_x_test"])

In [10]:
# we can see that our four objects are now inside the store we control
client.store.pandas

| | ID | Tags | Description | object_type |
|---|---|---|---|---|
| 0 | `<UID: 90a2a838f5b149aeb39a62b8bbe2835e>` | `[x_train]` | | `<class 'torch.Tensor'>` |
| 1 | `<UID: fb476d205c804da9a8d5bbff12140398>` | `[y_train]` | | `<class 'torch.Tensor'>` |
| 2 | `<UID: 28666e88d3694db7bfa214f794f931f9>` | `[context]` | | `<class 'tenseal.enc_context.Context'>` |
| 3 | `<UID: 5a0a3af5aee7477bb15ec01f55cd07e8>` | `[enc_x_test]` | | `<class 'syft.lib.python.list.List'>` |


### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 4 : Now STOP and run the Data Scientist notebook until the same checkpoint.

In [11]:
# We can see that there are two requests, for the context and for the encrypted data.
client.requests.pandas

| | Requested Object's tags | Reason | Request ID | Requested Object's ID | Requested Object's type |
|---|---|---|---|---|---|
| 0 | `[context]` | I would like to get the context | `<UID: e7d67605f01742d0b36fbee554cc01df>` | `<UID: 28666e88d3694db7bfa214f794f931f9>` | `<class 'tenseal.enc_context.Context'>` |
| 1 | `[enc_x_test]` | I would like to get encrypted test set | `<UID: 78d26c8fa7804d70b0912b5d94306f6f>` | `<UID: 5a0a3af5aee7477bb15ec01f55cd07e8>` | `<class 'syft.lib.python.list.List'>` |


### Approve the requests

In [12]:
client.requests[0].accept()
client.requests[0].accept()

In [13]:
# The requests should have been handled
client.requests.pandas

### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 5 : Now STOP and run the Data Scientist notebook until the same checkpoint.

In [15]:
client.store.pandas

| | ID | Tags | Description | object_type |
|---|---|---|---|---|
| 0 | `<UID: 90a2a838f5b149aeb39a62b8bbe2835e>` | `[x_train]` | | `<class 'torch.Tensor'>` |
| 1 | `<UID: fb476d205c804da9a8d5bbff12140398>` | `[y_train]` | | `<class 'torch.Tensor'>` |
| 2 | `<UID: 28666e88d3694db7bfa214f794f931f9>` | `[context]` | | `<class 'tenseal.enc_context.Context'>` |
| 3 | `<UID: 5a0a3af5aee7477bb15ec01f55cd07e8>` | `[enc_x_test]` | | `<class 'syft.lib.python.list.List'>` |
| 4 | `<UID: 5588e9efb0344e249630e6c40a7b356f>` | `[result_eval]` | | `<class 'syft.lib.python.list.List'>` |


In [16]:
# Test the accuracy

result_eval = client.store["result_eval"].get(delete_obj=False)
correct = 0
for actual, expected in zip(result_eval, y_test):
    actual.link_context(context)
    actual = torch.tensor(actual.decrypt())
    actual = torch.sigmoid(actual)

    if torch.abs(actual - expected) < 0.5:
        correct += 1
        
print(f"Evaluated test_set of {len(x_test)} entries. Accuracy: {correct}/{len(x_test)} = {correct / len(x_test)}")

Evaluated test_set of 10 entries. Accuracy: 8/10 = 0.8
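The accuracy loop counts a prediction as correct when `sigmoid(logit)` lies within 0.5 of the 0/1 label, with the sigmoid applied in plaintext after decryption. For binary labels this is the same as thresholding the probability at 0.5, or equivalently the decrypted logit at 0. The plain-Python sketch below checks that equivalence on made-up logits. The sample values and the helper name `is_correct` are illustrative assumptions, not outputs of the tutorial.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def is_correct(logit, label):
    """Rule used in the accuracy loop: |sigmoid(logit) - label| < 0.5."""
    return abs(sigmoid(logit) - label) < 0.5


# made-up (logit, label) pairs for illustration only
samples = [(2.3, 1.0), (-1.7, 0.0), (0.4, 0.0), (-0.2, 1.0)]

for logit, label in samples:
    predicted_label = 1.0 if logit > 0 else 0.0
    # both formulations agree whenever the logit is not exactly 0
    assert is_correct(logit, label) == (predicted_label == label)

correct = sum(is_correct(logit, label) for logit, label in samples)
print(f"Accuracy: {correct}/{len(samples)}")
```

Because the decision depends only on the sign of the logit, small approximation errors introduced by CKKS change the outcome only for samples whose logit is very close to 0.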


### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="100"/> Checkpoint 6: Well Done!

# Congratulations!!! - Time to Join the Community!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement toward privacy-preserving, decentralized ownership of AI and the AI supply chain (data), you can do so in the following ways!

### Star PySyft and TenSEAL on GitHub

The easiest way to help our community is just by starring the repos! This helps raise awareness of the cool tools we're building.

- [Star PySyft](https://github.com/OpenMined/PySyft)
- [Star TenSEAL](https://github.com/OpenMined/TenSEAL)

### Join our Slack!

The best way to keep up to date on the latest advancements is to join our community! You can do so by filling out the form at [http://slack.openmined.org](http://slack.openmined.org). #lib_tenseal and #code_tenseal are the main channels for the TenSEAL project.

### Donate

If you don't have time to contribute to our codebase, but would still like to lend support, you can also become a Backer on our Open Collective. All donations go toward our web hosting and other community expenses such as hackathons and meetups!

[OpenMined's Open Collective Page](https://opencollective.com/openmined)