# Confidential ML Training Demo (Dataowners Part)

This notebook is the dataowners' part of the *Confidential ML Training Demo*, which shows how a simple logistic regression classifier can be trained while keeping the training data provably confidential. The demo requires the [Training Client API](https://github.com/decentriq/avato-python-client-training) and its dependencies to be installed.

In a realistic, non-demo use of the Confidential Training tool, the analyst user and each of the dataowner users would work from different computers. In this demo, for simplicity, both dataowners submit their data from the same notebook.

## 1 - Import dependencies

In [1]:
import pandas as pd
from avato import Client
from avato import Secret
from avato_training import Training_Instance
import example

dataowner1_username, dataowner1_***REMOVED*** = example.dataowner1_credentials
dataowner2_username, dataowner2_***REMOVED*** = example.dataowner2_credentials

# Expected measurement (hash of the code running in the enclave), checked by validate_fatquote() below
expected_measurement = "4ff505f350698c78e8b3b49b8e479146ce3896a06cd9e5109dfec8f393f14025"

# The data files uploaded by the two dataowners
dataowner1_file, dataowner2_file = example.data_filenames

backend_host = "localhost" 
backend_port = 3000 
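
The `expected_measurement` above is a 64-character hexadecimal string, which is the length of a 256-bit digest. A copy-paste error in it would presumably only show up later as a failed check in `validate_fatquote`. As a minimal sketch, a quick format check before connecting could look like the following; the helper name `looks_like_256_bit_hex` is hypothetical and not part of the avato client.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def looks_like_256_bit_hex(value: str) -> bool:
    """Return True if value is exactly 64 hexadecimal characters (256 bits)."""
    return len(value) == 64 and all(ch in HEX_DIGITS for ch in value)


# Value copied from the cell above.
expected_measurement = "4ff505f350698c78e8b3b49b8e479146ce3896a06cd9e5109dfec8f393f14025"

if not looks_like_256_bit_hex(expected_measurement):
    raise ValueError("expected_measurement is not a 64-character hex string")
```

This only checks the shape of the string. Whether it matches the enclave is still decided by the instance's own validation.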

## 2 - Set Instance ID Received from the Analyst

In [2]:
instance_id_from_analyst = "95921473-6649-4a7d-92df-8e76a96903f5"
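
Because the instance ID is passed between people by hand, it can pick up stray whitespace or lose characters on the way. The ID shown above has the form of a UUID, so, assuming all instance IDs follow that form (the source does not state this), a small sketch using the standard-library `uuid` module can normalize and validate it before use. The helper name `parse_instance_id` is hypothetical.

```python
import uuid


def parse_instance_id(raw: str) -> str:
    """Return the canonical lowercase UUID string, or raise ValueError if malformed."""
    return str(uuid.UUID(raw.strip()))


# The ID from the cell above, with whitespace as it might arrive from a chat or e-mail.
instance_id = parse_instance_id("  95921473-6649-4a7d-92df-8e76a96903f5\n")
```

A malformed value, such as an ID with a missing character, raises `ValueError` here instead of failing later in `get_instance`.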

## 3 - Submit Data

In [3]:
# Submit a data file to the instance on behalf of the given dataowner.
def dataowner_submit_data(dataowner_username, dataowner_***REMOVED***, instance_id, data_file):

    # Create client
    dataowner_client = Client(
        username=dataowner_username,
        ***REMOVED***=dataowner_***REMOVED***,
        instance_types=[Training_Instance],
        backend_host=backend_host,
        backend_port=backend_port
    )

    # Connect to instance (using ID from the analyst user)
    dataowner_instance = dataowner_client.get_instance(instance_id)

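    # Check security guarantees: the enclave must report the expected measurement.
    # accept_debug and accept_group_out_of_date relax the check and are
    # presumably enabled only because this is a demo.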
    dataowner_instance.validate_fatquote(
        expected_measurement=expected_measurement,
        accept_debug=True,
        accept_group_out_of_date=True
    )

    # Create a public-private key pair and set it on the instance for secure communication.
    dataowner_secret = Secret()
    dataowner_instance.set_secret(dataowner_secret)

    # Get data format from the enclave
    data_format = dataowner_instance.get_data_format()
    print(f"Data format:\n{data_format}")

    # Load data
    df = pd.read_csv(data_file)
    
    print("Loaded data:\n")
    print(df.head(2))

    # Submit data
    ingested_rows, failed_rows = dataowner_instance.submit_data(df)
    print(f"\nNumber of successfully ingested rows: {ingested_rows}, number of failed rows: {failed_rows}")
    
    return dataowner_instance
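
The function above fetches the data format from the enclave and prints it, but it does not compare it with the columns of the loaded file. The following is a minimal sketch of such a comparison. It works on plain lists of column names, so it can be fed with `list(df.columns)`. The helper `compare_columns` is hypothetical, and the expected column list is hard-coded from the data format printed later in this notebook, since the structure of the object returned by `get_data_format()` is not shown here.

```python
def compare_columns(actual, expected):
    """Return (missing, unexpected) column names, each in its original order."""
    actual_set, expected_set = set(actual), set(expected)
    missing = [name for name in expected if name not in actual_set]
    unexpected = [name for name in actual if name not in expected_set]
    return missing, unexpected


# Column names as printed in the "Data format" output of this notebook.
expected_columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]

# Hypothetical file header: the label column is missing and an extra "id" column is present.
file_columns = ["id"] + expected_columns[:-1]

missing, unexpected = compare_columns(file_columns, expected_columns)
print("Missing columns:", missing)
print("Unexpected columns:", unexpected)
```

Running a check like this before `submit_data` reports a mismatched file locally, in terms of column names, instead of leaving it to be inferred from the failed-row count.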

### Dataowner 1 - Submit Data

In [4]:
dataowner1_instance = dataowner_submit_data(
    dataowner1_username, 
    dataowner1_***REMOVED***, 
    instance_id_from_analyst, 
    data_file=dataowner1_file
)

Data format:
categoriesColumns: "fixed acidity"
categoriesColumns: "volatile acidity"
categoriesColumns: "citric acid"
categoriesColumns: "residual sugar"
categoriesColumns: "chlorides"
categoriesColumns: "free sulfur dioxide"
categoriesColumns: "total sulfur dioxide"
categoriesColumns: "density"
categoriesColumns: "pH"
categoriesColumns: "sulphates"
categoriesColumns: "alcohol"
valueColumn: "quality"

Loaded data:

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.2              0.23         0.32             8.5      0.058   
1            6.2              0.32         0.16             7.0      0.045   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 47.0                 186.0   0.9956  3.19       0.40   
1                 30.0                 136.0   0.9949  3.18       0.47   

   alcohol  quality  
0      9.9        6  
1      9.6        6  

Number of successfully ingested rows: 2483, number of f

### Dataowner 2 - Submit Data

In [5]:
dataowner2_instance = dataowner_submit_data(
    dataowner2_username, 
    dataowner2_***REMOVED***, 
    instance_id_from_analyst, 
    data_file=dataowner2_file
)

Data format:
categoriesColumns: "fixed acidity"
categoriesColumns: "volatile acidity"
categoriesColumns: "citric acid"
categoriesColumns: "residual sugar"
categoriesColumns: "chlorides"
categoriesColumns: "free sulfur dioxide"
categoriesColumns: "total sulfur dioxide"
categoriesColumns: "density"
categoriesColumns: "pH"
categoriesColumns: "sulphates"
categoriesColumns: "alcohol"
valueColumn: "quality"

Loaded data:

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.0              0.27         0.36            20.7      0.045   
1            6.3              0.30         0.34             1.6      0.049   

   free sulfur dioxide  total sulfur dioxide  density   pH  sulphates  \
0                 45.0                 170.0    1.001  3.0       0.45   
1                 14.0                 132.0    0.994  3.3       0.49   

   alcohol  quality  
0      8.8        6  
1      9.5        6  

Number of successfully ingested rows: 2415, number of fail
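
`dataowner_submit_data` prints the ingested and failed row counts but returns only the instance, so nothing in the notebook reacts to failed rows. As a sketch under the assumption that `submit_data` returns two integers (as the tuple unpacking in the function suggests), a small guard could turn an unexpectedly high failure rate into an error. The helper `check_ingestion` and its `max_failed_fraction` parameter are hypothetical, and the counts used below are made up for illustration.

```python
def check_ingestion(ingested_rows: int, failed_rows: int,
                    max_failed_fraction: float = 0.0) -> float:
    """Return the fraction of failed rows, raising ValueError if it exceeds the limit."""
    total = ingested_rows + failed_rows
    if total == 0:
        raise ValueError("no rows were submitted")
    failed_fraction = failed_rows / total
    if failed_fraction > max_failed_fraction:
        raise ValueError(
            f"{failed_rows} of {total} rows failed to ingest "
            f"({failed_fraction:.1%} > allowed {max_failed_fraction:.1%})"
        )
    return failed_fraction


# Illustrative counts only, not taken from the outputs above.
clean_fraction = check_ingestion(100, 0)
tolerated_fraction = check_ingestion(90, 10, max_failed_fraction=0.2)
```

To use it in this notebook, the submission function would need to return the two counts alongside the instance.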