# Confidential ML Training Demo - Analyst

This notebook is the Analyst part of the *Confidential ML Training Demo*, which shows how a simple logistic regression classifier can be trained while the training data stays provably confidential. The demo requires the [Training Client API](https://github.com/decentriq/avato-python-client-training) and its dependencies to be installed.

Note that although this demo trains a logistic regression classifier, the same enclave-based approach can be used to train a variety of other classifiers.

## 1 - Import dependencies and set parameters

In [1]:
import os
import json
import example

analyst_username = os.getenv('ANALYST_ID')
analyst_***REMOVED*** = os.getenv('ANALYST_PASSWORD')

# The analyst needs these to control who can upload data
data_owner_usernames = (os.getenv('DATAOWNER1_ID'), os.getenv('DATAOWNER2_ID'))

# How the analyst expects the data to be formatted
feature_columns = [
    'fixed acidity', 
    'volatile acidity', 
    'citric acid', 
    'residual sugar', 
    'chlorides', 
    'free sulfur dioxide', 
    'total sulfur dioxide', 
    'density', 
    'pH', 
    'sulphates', 
    'alcohol'
]
label_column = "quality"
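
`os.getenv` returns `None` for an unset variable, so a missing credential would only surface later, during instance set-up. A minimal sketch of an up-front check follows; `require_env` is a hypothetical helper and not part of the Training Client API.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`; raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError("Environment variable {} is not set".format(name))
    return value


# Hypothetical usage with the variable names from the cell above:
# analyst_username = require_env('ANALYST_ID')
```

Failing early like this keeps a configuration mistake from looking like an authentication or enclave problem.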

## 2 - Set up instance

In [2]:
analyst_instance = example.analyst_set_up_instance(
    analyst_username, 
    analyst_***REMOVED***, 
    data_owner_usernames, 
    feature_columns, 
    label_column
)

Created Instance with ID: 3fa57b0e-2914-4312-bacd-f8748be7e0b1

Configured instance with feature columns 
  'fixed acidity'
  'volatile acidity'
  'citric acid'
  'residual sugar'
  'chlorides'
  'free sulfur dioxide'
  'total sulfur dioxide'
  'density'
  'pH'
  'sulphates'
  'alcohol'
 and label column 
  'quality'


## 3 - Train model (after data has been uploaded by the Data Owners)
### Train with first set of hyperparameters
`start_execution` launches the training run; `get_results` then returns the trained classifier and its metadata.

In [3]:
hyperparameters = {
    "learning_rate": 1.0,
    "num_splits": 10,
    "num_epochs": 5,
    "l2_penalty": 0.0,
    "l1_penalty": 0.0,
}

analyst_instance.start_execution(analyst_***REMOVED***, hyperparameters)

In [4]:
classifier, metadata = analyst_instance.get_results(analyst_***REMOVED***)
print("metadata: {}".format(json.dumps(metadata, indent=2)))

metadata: {
  "CV Test Accuracy": "0.47443762",
  "CV Train Accuracy": "0.47695622",
  "Fullset Accuracy": "0.48040017"
}
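
The metadata keys suggest that `num_splits` controls a k-fold cross-validation inside the enclave, with "Fullset Accuracy" measured on all rows. That reading is an assumption; the notebook does not confirm it. Under that assumption, the splitting step can be sketched in plain NumPy, where `k_fold_indices` is a hypothetical helper:

```python
import numpy as np


def k_fold_indices(num_rows, num_splits, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    # Shuffle the row indices, then cut them into num_splits nearly equal folds.
    folds = np.array_split(rng.permutation(num_rows), num_splits)
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


# Each row lands in exactly one test fold across the splits.
splits = list(k_fold_indices(num_rows=25, num_splits=10))
```

Averaging the per-fold train and test accuracies would then give figures comparable to the two "CV" entries, again only if the enclave works this way.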


### Train with second set of hyperparameters

In [5]:
hyperparameters = {
    "learning_rate": 0.5,
    "num_splits": 10,
    "num_epochs": 5,
    "l2_penalty": 0.0,
    "l1_penalty": 0.0,
}

analyst_instance.start_execution(analyst_***REMOVED***, hyperparameters)

In [6]:
classifier, metadata = analyst_instance.get_results(analyst_***REMOVED***)
print("metadata: {}".format(json.dumps(metadata, indent=2)))

metadata: {
  "CV Test Accuracy": "0.48936605",
  "CV Train Accuracy": "0.49038333",
  "Fullset Accuracy": "0.48672926"
}
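
The accuracies in `metadata` arrive as strings, so they need converting before runs can be compared. A small sketch using the two results printed above follows; `runs` and `best_run` are illustrative names, and selecting on "CV Test Accuracy" is one reasonable choice rather than a method prescribed by the demo.

```python
# Metadata copied from the two runs above, keyed by learning rate.
runs = {
    1.0: {
        "CV Test Accuracy": "0.47443762",
        "CV Train Accuracy": "0.47695622",
        "Fullset Accuracy": "0.48040017",
    },
    0.5: {
        "CV Test Accuracy": "0.48936605",
        "CV Train Accuracy": "0.49038333",
        "Fullset Accuracy": "0.48672926",
    },
}


def best_run(runs, metric="CV Test Accuracy"):
    """Return the key whose metadata has the highest value for `metric`."""
    return max(runs, key=lambda key: float(runs[key][metric]))


best_learning_rate = best_run(runs)
print("Best learning rate: {}".format(best_learning_rate))
```

Selecting on the cross-validated test score rather than the full-set score avoids rewarding a setting that merely fits the training rows better.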


## 4 - Use the classifier
### Predict, compute accuracy on the full dataset, and compare with the metadata
In a realistic setting the analyst would not have access to the dataset; it is loaded here only to check the accuracy reported in the metadata.

In [7]:
X, y = example.load_data()
some_predictions = classifier.predict(X[0:2, :])
accuracy = example.compute_accuracy(classifier, X, y)

print("Some predictions of the classifier: \n{}\n".format(some_predictions))
print("Accuracy of the enclave classifier on the full dataset (from the metadata object): \n{:.4f}\n".format(float(metadata["Fullset Accuracy"])))
print("Accuracy of the local classifier on the full dataset: \n{:.4f}\n".format(accuracy))

Some predictions of the classifier: 
['6' '6']

Accuracy of the enclave classifier on the full dataset (from the metadata object): 
0.4867

Accuracy of the local classifier on the full dataset: 
0.4867
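
The notebook does not show `example.compute_accuracy`. A plausible sketch, assuming it is simply the fraction of rows whose prediction equals the label, is given below; `accuracy_sketch` and `ConstantClassifier` are hypothetical stand-ins. Predictions and labels are compared as strings because the classifier above returns labels such as `'6'`.

```python
import numpy as np


def accuracy_sketch(classifier, X, y):
    """Fraction of rows where the predicted label matches the true label."""
    predictions = np.asarray(classifier.predict(X)).astype(str)
    labels = np.asarray(y).astype(str)
    return float(np.mean(predictions == labels))


class ConstantClassifier:
    """Stand-in model that predicts the same label for every row."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return np.full(len(X), self.label)


# Four dummy rows with 11 features each, matching the feature count above.
X_demo = np.zeros((4, 11))
y_demo = np.array([6, 5, 6, 7])
demo_accuracy = accuracy_sketch(ConstantClassifier("6"), X_demo, y_demo)
```

Here two of the four labels equal `6`, so `demo_accuracy` is 0.5.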



## 5 - Clean Up

In [8]:
analyst_instance.shutdown()
analyst_instance.delete()
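
If training or result retrieval raises an exception, the two cleanup calls above are never reached. One way to guard against that is a context manager. The sketch below uses a hypothetical `FakeInstance` stand-in so that it runs without the Training Client API, and it assumes only the `shutdown()` and `delete()` methods shown above.

```python
from contextlib import contextmanager


@contextmanager
def cleaned_up(instance):
    """Yield the instance and always shut it down and delete it afterwards."""
    try:
        yield instance
    finally:
        instance.shutdown()
        instance.delete()


class FakeInstance:
    """Stand-in that records which cleanup methods were called."""

    def __init__(self):
        self.calls = []

    def shutdown(self):
        self.calls.append("shutdown")

    def delete(self):
        self.calls.append("delete")


fake = FakeInstance()
try:
    with cleaned_up(fake):
        raise ValueError("simulated training failure")
except ValueError:
    pass  # Cleanup has already run by the time the error propagates here.
```

After this runs, `fake.calls` holds both cleanup steps in order even though the body failed.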