# Confidential ML Training Demo (Data Owner 1)

This notebook is the Data Owner's part of the *Confidential ML Training Demo*. It shows how a simple logistic regression classifier can be trained while the training data stays provably confidential. The demo requires the [Training Client API](https://github.com/decentriq/avato-python-client-training) and its dependencies to be installed.

## 1 - Import dependencies and submission code

In [4]:
import pandas as pd
from avato import Client
from avato import Secret
from avato_training import Training_Instance
import example

dataowner1_username, dataowner1_***REMOVED*** = example.dataowner1_credentials

# Expected measurement: the hash of the code running in the enclave, checked in section 3
expected_measurement = "4ff505f350698c78e8b3b49b8e479146ce3896a06cd9e5109dfec8f393f14025"

# The data files uploaded by the data owners; Data Owner 1 uses the first one
dataowner1_file, _ = example.data_filenames

backend_host = "localhost" 
backend_port = 3000 
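
The expected measurement above is a 64-character hexadecimal string, which is the length of a hex-encoded 256-bit digest. When such a value is copied by hand, a quick shape check can catch truncation or stray characters before it is used. The following is a minimal sketch using only the Python standard library. `is_well_formed_measurement` and `measurements_match` are hypothetical helper names for illustration and are not part of the `avato` client.

```python
import hmac
import string

# Value copied from the cell above.
EXPECTED_MEASUREMENT = "4ff505f350698c78e8b3b49b8e479146ce3896a06cd9e5109dfec8f393f14025"


def is_well_formed_measurement(value: str) -> bool:
    """Return True if value looks like a hex-encoded 256-bit digest."""
    return len(value) == 64 and all(c in string.hexdigits for c in value)


def measurements_match(reported: str, expected: str) -> bool:
    """Compare two hex digests case-insensitively and in constant time."""
    return hmac.compare_digest(
        reported.lower().encode("ascii"),
        expected.lower().encode("ascii"),
    )


print(is_well_formed_measurement(EXPECTED_MEASUREMENT))
```

This only checks the form of the string. The actual comparison against the enclave's reported measurement is done by the client in section 3.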

## 2 - Set instance id received from Analyst

In [5]:
instance_id_from_analyst = "651a23f6-0233-41f3-b13a-fb216b1f11b8"
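
The instance ID in this example has the form of a UUID. Since it is passed along by the Analyst out of band, it may pick up whitespace or be truncated on the way. The sketch below normalizes and validates such a string with the standard `uuid` module. It assumes that instance IDs are always UUIDs, which the example suggests but does not confirm, and `parse_instance_id` is a hypothetical helper name.

```python
import uuid


def parse_instance_id(raw: str) -> str:
    """Return the canonical UUID string, or raise ValueError if malformed."""
    return str(uuid.UUID(raw.strip()))


# Example: an ID pasted with surrounding whitespace.
instance_id = parse_instance_id(" 651a23f6-0233-41f3-b13a-fb216b1f11b8\n")
print(instance_id)
```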

## 3 - Submit Data

#### Create client and connect to the instance created by the analyst

In [8]:
# Create client
dataowner1_client = Client(
    username=dataowner1_username,
    ***REMOVED***=dataowner1_***REMOVED***,
    instance_types=[Training_Instance],
    backend_host=backend_host,
    backend_port=backend_port
)

# Connect to instance (using ID from the analyst user)
dataowner1_instance = dataowner1_client.get_instance(instance_id_from_analyst)

#### Verify security and create own keypair

In [9]:
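# Verify security: check the enclave's quote against the expected measurement.
# accept_debug and accept_group_out_of_date relax the check for this demo.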
dataowner1_instance.validate_fatquote(
    expected_measurement=expected_measurement,
    accept_debug=True,
    accept_group_out_of_date=True
)

# Create and set public-private keypair for secure communication.
dataowner1_secret = Secret()
dataowner1_instance.set_secret(dataowner1_secret)
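
To make the two `accept_*` flags more concrete, here is a simplified sketch of the kind of policy decision they plausibly control, judging by their names. It is not how `validate_fatquote` is implemented, which the notebook does not show. The `Quote` class and `quote_is_acceptable` function are hypothetical, and the status strings `"OK"` and `"GROUP_OUT_OF_DATE"` follow the quote statuses reported by Intel's SGX attestation service. Whether the demo backend uses exactly these values is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical, simplified stand-in for an attestation report."""
    measurement: str  # hash of the code running in the enclave
    is_debug: bool    # True if the enclave was launched in debug mode
    status: str       # e.g. "OK" or "GROUP_OUT_OF_DATE"


def quote_is_acceptable(quote: Quote,
                        expected_measurement: str,
                        accept_debug: bool = False,
                        accept_group_out_of_date: bool = False) -> bool:
    """Apply a strict policy by default; each flag relaxes one check."""
    if quote.measurement != expected_measurement:
        return False  # different code is running; never acceptable
    if quote.is_debug and not accept_debug:
        return False  # debug enclaves can be inspected by the host
    allowed_statuses = {"OK"}
    if accept_group_out_of_date:
        allowed_statuses.add("GROUP_OUT_OF_DATE")
    return quote.status in allowed_statuses


demo_quote = Quote(measurement="abc", is_debug=True, status="GROUP_OUT_OF_DATE")
print(quote_is_acceptable(demo_quote, "abc"))
print(quote_is_acceptable(demo_quote, "abc",
                          accept_debug=True, accept_group_out_of_date=True))
```

In this sketch the measurement check cannot be relaxed, while the other two checks can. Outside of a demo, one would typically leave both flags at their strict setting.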

#### Get data format from the enclave

In [10]:
# Get data format from the enclave
data_format = dataowner1_instance.get_data_format()
print("Data format:\n{}".format(data_format))

Data format:
categoriesColumns: "fixed acidity"
categoriesColumns: "volatile acidity"
categoriesColumns: "citric acid"
categoriesColumns: "residual sugar"
categoriesColumns: "chlorides"
categoriesColumns: "free sulfur dioxide"
categoriesColumns: "total sulfur dioxide"
categoriesColumns: "density"
categoriesColumns: "pH"
categoriesColumns: "sulphates"
categoriesColumns: "alcohol"
valueColumn: "quality"
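
Before submitting, it can be useful to compare the columns of the local data with the format reported by the enclave. The sketch below parses the printed text shown above and compares it with the header of the CSV file loaded in the next step. Parsing the printed text is an assumption made for illustration: the object returned by `get_data_format()` may expose these fields directly. `parse_data_format` and `compare_columns` are hypothetical helper names.

```python
import re

# Text as printed by the cell above.
DATA_FORMAT_TEXT = '''categoriesColumns: "fixed acidity"
categoriesColumns: "volatile acidity"
categoriesColumns: "citric acid"
categoriesColumns: "residual sugar"
categoriesColumns: "chlorides"
categoriesColumns: "free sulfur dioxide"
categoriesColumns: "total sulfur dioxide"
categoriesColumns: "density"
categoriesColumns: "pH"
categoriesColumns: "sulphates"
categoriesColumns: "alcohol"
valueColumn: "quality"
'''


def parse_data_format(text):
    """Extract feature column names and target column names from the text."""
    features = re.findall(r'^categoriesColumns:\s*"([^"]*)"', text, flags=re.MULTILINE)
    targets = re.findall(r'^valueColumn:\s*"([^"]*)"', text, flags=re.MULTILINE)
    return features, targets


def compare_columns(expected, actual):
    """Return (missing, extra) column names, preserving input order."""
    missing = [c for c in expected if c not in actual]
    extra = [c for c in actual if c not in expected]
    return missing, extra


features, targets = parse_data_format(DATA_FORMAT_TEXT)

# Header of the CSV file used in this notebook.
csv_header = ["Unnamed: 0"] + features + targets

missing, extra = compare_columns(features + targets, csv_header)
print("missing:", missing)
print("extra:", extra)
```

For the file used here, no expected column is missing and `Unnamed: 0` is the only extra column. How the enclave treats extra columns is not stated in this notebook.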



#### Load data

In [12]:
# Load data
df = pd.read_csv(dataowner1_file)
df.head(2)

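| Unnamed: 0 | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 1 | 6.2 | 0.32 | 0.16 | 7.0 | 0.045 | 30.0 | 136.0 | 0.9949 | 3.18 | 0.47 | 9.6 | 6 |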


#### Submit data

In [14]:
(ingested_rows, failed_rows) = dataowner1_instance.submit_data(df)
print("\nNumber of successfully ingested rows: {}, failed rows: {}".format(ingested_rows, failed_rows))


Number of successfully ingested rows: 2483, failed rows: []
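
The printed result suggests that `submit_data` returns a count of ingested rows together with a list of failed rows. A small check after submission can turn a partial upload into an explicit error instead of a line of output that is easy to overlook. This is a sketch under two assumptions: that the return values have the shape shown above, and that every submitted row should be ingested. `check_submission` is a hypothetical helper name.

```python
def check_submission(total_rows: int, ingested_rows: int, failed_rows) -> int:
    """Raise ValueError unless every row was ingested; return the row count."""
    failed = list(failed_rows)
    if failed:
        raise ValueError("{} row(s) failed: {}".format(len(failed), failed))
    if ingested_rows != total_rows:
        raise ValueError(
            "expected {} ingested rows, got {}".format(total_rows, ingested_rows)
        )
    return ingested_rows


# Values mirror the output above; the total of 2483 rows is assumed.
print(check_submission(total_rows=2483, ingested_rows=2483, failed_rows=[]))
```

In the notebook, `total_rows` would be `len(df)`.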
