# Sonar - Decentralized Model Training Simulation (local)

DISCLAIMER: This is a proof-of-concept implementation. It is nowhere near production-ready and does not follow proper conventions for security, convenience, or scalability. It is part of a broader proof of concept demonstrating the vision of the OpenMined project, its major moving parts, and how they might work together.


# Getting Started: Installation

##### Step 1: Install IPFS

- https://ipfs.io/docs/install/

##### Step 2: Turn on IPFS Daemon
Execute on command line:
> ipfs daemon

##### Step 3: Install Ethereum testrpc

- https://github.com/ethereumjs/testrpc

##### Step 4: Turn on testrpc with 1000 initialized accounts (each with some money)
Execute on command line:
> testrpc -a 1000

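Before moving on, it can help to confirm that testrpc is reachable. The following is a minimal sketch using only the Python standard library. It assumes testrpc is listening on its default address (`http://localhost:8545`); the helper names `build_rpc_request` and `list_accounts` are hypothetical and not part of sonar.

```python
import json
import urllib.request

# Assumption: testrpc listens on its default address.
RPC_URL = "http://localhost:8545"


def build_rpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as bytes."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": request_id,
    }
    return json.dumps(payload).encode("utf-8")


def list_accounts(url=RPC_URL):
    """Ask the node for its accounts via the standard eth_accounts method."""
    request = urllib.request.Request(
        url,
        data=build_rpc_request("eth_accounts"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))["result"]
```

With testrpc running as started above, `len(list_accounts())` should report the 1000 initialized accounts. A connection error usually means the node is not running or is listening on a different port.
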
##### Step 5: Install openmined/sonar and all dependencies (truffle)

##### Step 6: Locally Deploy Smart Contracts in openmined/sonar
From the OpenMined/Sonar repository root, run:
> truffle compile
> truffle migrate

You should see something like this when you run migrate:
```
Using network 'development'.

Running migration: 1_initial_migration.js
  Deploying Migrations...
  Migrations: 0xf06039885460a42dcc8db5b285bb925c55fbaeae
Saving successful migration to network...
Saving artifacts...
Running migration: 2_deploy_contracts.js
  Deploying ConvertLib...
  ConvertLib: 0x6cc86f0a80180a491f66687243376fde45459436
  Deploying ModelRepository...
  ModelRepository: 0xe26d32efe1c573c9f81d68aa823dcf5ff3356946
  Linking ConvertLib to MetaCoin
  Deploying MetaCoin...
  MetaCoin: 0x6d3692bb28afa0eb37d364c4a5278807801a95c5
```

The address after `ModelRepository:` is the one you need to copy and paste into the code
below when you initialize the `ModelRepository` object. In this case, the address to
copy is `0xe26d32efe1c573c9f81d68aa823dcf5ff3356946`.

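If you prefer not to copy the address by hand, it can be pulled out of the migrate output with a regular expression. This is only a sketch: `find_contract_address` is a hypothetical helper, and it assumes the output keeps the `Name: 0x...` layout shown above.

```python
import re


def find_contract_address(migrate_output, contract_name="ModelRepository"):
    """Return the 0x-prefixed address printed after '<contract_name>:', or None."""
    pattern = re.compile(
        r"^\s*" + re.escape(contract_name) + r":\s*(0x[0-9a-fA-F]{40})\s*$",
        re.MULTILINE,
    )
    match = pattern.search(migrate_output)
    return match.group(1) if match else None


# A few lines taken from the sample output above.
sample_output = """\
  Deploying ModelRepository...
  ModelRepository: 0xe26d32efe1c573c9f81d68aa823dcf5ff3356946
  Linking ConvertLib to MetaCoin
"""

print(find_contract_address(sample_output))
# prints 0xe26d32efe1c573c9f81d68aa823dcf5ff3356946
```
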
##### Step 7: Execute the following code

# The Simulation: Diabetes Prediction

In this example, a diabetes research center (Cure Diabetes Inc) wants to train a model to predict the progression of diabetes from several indicators. They have collected data from a small sample (42 patients), but that is not enough to train a model. So they intend to offer a bounty of $5,000 to the OpenMined community to train a high-quality model.

As it turns out, there are 400 diabetics in the network who are candidates for the model (they are collecting the relevant fields). In this simulation, we're going to facilitate the training of Cure Diabetes Inc's model by incentivizing these 400 anonymous contributors, using the Ethereum blockchain.

Note that in this simulation we only use the sonar and syft packages, and everything is deployed locally on a test blockchain. Future simulations will incorporate mine and capsule for greater anonymity and automation.

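Stripped of encryption and the blockchain, the training scheme described above can be sketched in plain Python: each contributor computes a gradient on their own data point, the gradients are averaged, and the model owner checks the result against a validation set. The function names, the toy data, and the rule of keeping an update only when the validation error improves are illustrative assumptions, not the sonar or syft API.

```python
def predict(weights, features):
    """Linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def local_gradient(weights, features, target):
    """Gradient of 0.5 * (prediction - target)^2 for one contributor's data point."""
    error = predict(weights, features) - target
    return [error * x for x in features]


def average_gradients(gradients):
    """Element-wise mean of a list of gradient vectors."""
    count = len(gradients)
    return [sum(column) / count for column in zip(*gradients)]


def mean_squared_error(weights, samples):
    return sum((predict(weights, x) - y) ** 2 for x, y in samples) / len(samples)


def training_round(weights, contributors, alpha):
    """One round: every contributor submits a gradient, then the average is applied."""
    gradients = [local_gradient(weights, x, y) for x, y in contributors]
    averaged = average_gradients(gradients)
    return [w - alpha * g for w, g in zip(weights, averaged)]


# Made-up data generated from y = 2*x0 + 1*x1 (not the diabetes dataset).
contributors = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
validation = [([2.0, 1.0], 5.0)]

weights = [0.0, 0.0]
best_error = mean_squared_error(weights, validation)
for _ in range(200):
    candidate = training_round(weights, contributors, alpha=0.5)
    candidate_error = mean_squared_error(candidate, validation)
    # Assumption: the owner only keeps an update that improves validation error.
    if candidate_error < best_error:
        weights, best_error = candidate, candidate_error

print("final weights:", weights)
print("validation error:", best_error)
```

In the notebook below, the same roles appear as `generate_gradient` (the patient), `generate_gradient_avg` (the aggregation step), and `evaluate_gradient_from_avg` (the evaluation step).
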
### Imports and Convenience Functions

In [1]:
import warnings
import numpy as np
import phe as paillier
import time
from sonar.contracts_listclass_unencrypted import ModelRepository,Model,Gradient_List
from syft.he.paillier.keys import KeyPair,SecretKey,PublicKey
from syft.nn.linear import LinearClassifier
from sklearn.datasets import load_diabetes

#import pandas
#from sklearn import model_selection
#import pickle
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
#names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
#dataframe = pandas.read_csv(url, names=names)
#array = dataframe.values
#X = array[:,0:8]
#print(type(X))
#y = array[:,8]
#test_size = 0.33
#seed = 7
#X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, y, test_size=test_size, random_state=seed)

#print(X_train)


start = time.time()
def get_balance(account):
    return repo.web3.fromWei(repo.web3.eth.getBalance(account),'ether')

warnings.filterwarnings('ignore')


1.1.0

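The `get_balance` helper above reports balances in ether rather than wei. As a small illustration of that unit conversion (1 ether is 10^18 wei), here is a standalone sketch that does not require web3; `wei_to_ether` is a hypothetical name used only for this example.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 ether = 10^18 wei


def wei_to_ether(wei):
    """Convert an integer amount of wei to ether without floating-point error."""
    return Decimal(wei) / WEI_PER_ETHER


print(wei_to_ether(5 * 10**18))
print(wei_to_ether(1))
```

Using `Decimal` keeps small differences between two balances exact, which matters when the incentive is computed as `new_balance - old_balance`.
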

### Setting up the Experiment

In [2]:
# for the purpose of the simulation, we're going to split our dataset up amongst
# the relevant simulated users

diabetes = load_diabetes()
y = diabetes.target
X = diabetes.data

#print (type(diabetes.data))

validation = (X[0:5],y[0:5])
anonymous_diabetes_users = (X[6:],y[6:])

# we're also going to initialize the model trainer smart contract, which in the
# real world would already be on the blockchain (managing other contracts) before
# the simulation begins

# ATTENTION: copy paste the correct address (NOT THE DEFAULT SEEN HERE) from truffle migrate output.
repo = ModelRepository('0xBE30EC73A0b86b2632783EA5414cE07df7C94be6') # blockchain hosted model repository

web3.version.api 3.15.0
No account submitted... using default[2]
Connected to ModelRepository:0xBE30EC73A0b86b2632783EA5414cE07df7C94be6


In [3]:


# we're going to set aside 9 accounts (indices 2 through 10) for the anonymous patients.
# Let's go ahead and pair each data point with a patient's
# address so that we don't get them confused
patient_addresses = repo.web3.eth.accounts[2:11]
anonymous_diabetics = list(zip(patient_addresses,
                               anonymous_diabetes_users[0],
                               anonymous_diabetes_users[1]))

# we're going to set aside 1 account for Cure Diabetes Inc
cure_diabetes_inc = repo.web3.eth.accounts[0]
# and 1 account (agg_addr) that is used when evaluating the averaged gradient
agg_addr = repo.web3.eth.accounts[1]

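One detail of the pairing above is worth spelling out: `zip` stops at its shortest input. Because the slice `accounts[2:11]` holds nine addresses, only the first nine data points are paired with a patient, no matter how many rows the dataset has. The sketch below shows the same behavior with made-up stand-in values.

```python
# Hypothetical stand-ins for testrpc accounts and dataset rows.
addresses = ["0xaaa", "0xbbb", "0xccc"]
features = [[0.1], [0.2], [0.3], [0.4], [0.5]]
targets = [10, 20, 30, 40, 50]

# zip truncates to the shortest input, so the last two rows are dropped.
paired = list(zip(addresses, features, targets))

print(len(paired))   # 3
print(paired[-1])    # ('0xccc', [0.3], 30)
```

To involve more contributors in the simulation, widen the account slice rather than the dataset.
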
## Step 1: Cure Diabetes Inc Initializes a Model and Provides a Bounty

In [4]:
pubkey,prikey = KeyPair().generate(n_length=1024)
#pubkey,prikey=paillier.paillier.generate_paillier_keypair()
diabetes_classifier = LinearClassifier(desc="DiabetesClassifier",n_inputs=10,n_labels=1)
initial_error = diabetes_classifier.evaluate(validation[0],validation[1])
#diabetes_classifier.encrypt(pubkey)
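# derive key shares from the secret key; st is used later when patients submit gradients
s1,s2=paillier.paillier.genKeyShares(prikey.sk,pubkey.pk)
st=SecretKey(s1)
sab=SecretKey(s2)
# derive two more shares from s2; sa is used when averaging gradients,
# scb when decrypting the average
s3,s4=paillier.paillier.genKeyShares(s2,pubkey.pk)
sa=SecretKey(s3)
scb=SecretKey(s4)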

diabetes_model = Model(owner=cure_diabetes_inc,
                       syft_obj = diabetes_classifier,
                       bounty = 10,
                       initial_error = initial_error,
                       target_error =1 ,
                       best_error= initial_error
                      )
model_id = repo.submit_model(diabetes_model)
print('initial error',initial_error)

initial error 27

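The cell above generates a Paillier key pair. The property that makes Paillier useful for this kind of training is that it is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts, so values can be summed without decrypting them individually. The following is a toy sketch of that property in pure Python, with deliberately tiny primes and non-negative integers only. It is not the syft or phe implementation and must not be used for real security; all function names are illustrative.

```python
import math
import random


def generate_keypair(p, q):
    """Toy Paillier key generation from two primes, using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # With g = n + 1, L(g^lam mod n^2) equals lam mod n, so mu is its inverse.
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n_squared = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^message mod n^2 simplifies to 1 + message * n.
    return ((1 + message * n) * pow(r, n, n_squared)) % n_squared


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


# Tiny primes, for illustration only.
n, private_key = generate_keypair(1009, 1013)
encrypted_sum = add_encrypted(n, encrypt(n, 3), encrypt(n, 4))
print(decrypt(n, private_key, encrypted_sum))  # 7
```

Real gradients are signed floating-point values, so practical libraries add an encoding step on top of this integer scheme.
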

## Step 2: An Anonymous Patient Downloads the Model and Improves It

In [5]:
model_id

0

In [6]:
model = repo[model_id]

In [7]:
diabetic_address,input_data,target_data = anonymous_diabetics[0]
print(diabetic_address)

0x0029e651892dbcc4dfa3d4ab9da3f77361a8a614


In [8]:
#local_loss = 0
#alpha = 0.5
#gradient,candidate=model.generate_gradient(diabetic_address,prikey,input_data,target_data,alpha)
#local_loss=candidate.evaluate(validation[0],validation[1])
#model.submit_transformed_gradients(gradient,pubkey,st)

## Step 3: Cure Diabetes Inc. Evaluates the Gradient 

In [9]:
#print (model.gradient_list)
#print(model.model_id)

In [10]:
#old_balance = get_balance(diabetic_address)
#print(old_balance)

In [11]:
#alpha = 0.5
#gradient_list=Gradient_List(model_id, repo=repo, model=model)
#gradient_list[model_id]
#model=repo[model_id]
#avg_gradient=gradient_list.generate_gradient_avg(agg_addr,sa,alpha)
#decrypted_avg = model.decrypt_avg(scb)
#new_error = model.evaluate_gradient_from_avg(agg_addr,decrypted_avg,prikey,pubkey,validation[0],validation[1],alpha)

In [12]:
#new_error

In [13]:
#new_balance = get_balance(diabetic_address)
#incentive = new_balance - old_balance
#print(incentive)

## Step 4: Rinse and Repeat

In [14]:
repo[model_id]

Desc:DiabetesClassifier
Owner:0x3fb8d374f4a68f5e0d2221bc06266c8f851239c2
Bounty:10
Initial Error:27
Best Error:27
Target Error:1
Model ID:0
Num Grads:0

In [None]:
local_losses=0
alpha = 0.5
j=0
new_error = repo[model_id].syft_obj.evaluate(validation[0],validation[1])
print ("new_error", new_error)
while new_error >= model.target_error:
    print("round", j)
    print("model_id",model_id)
    if j >0:
        model=repo[j - 1]
        model.model_id =j
    for i,(addr, data, target) in enumerate(anonymous_diabetics):
        #for j in range (3,10):
        #address=repo.web3.eth.accounts[i]
        print("i", i)
        old_balance = get_balance(addr)
        print('model_id',model_id)

        # the patient does this
        gradient,candidate=model.generate_gradient(addr,prikey,data,target,alpha)
        print("number of gradients of model",len(model))
        local_losses=candidate.evaluate(validation[0],validation[1])
        #local_loss=model.evaluate_gradient(addr, gradient, prikey, pubkey,validation[0],validation[1], alpha)
        print("local loss",local_losses)
        model.submit_transformed_gradients(gradient,pubkey,st)

    # Cure Diabetes Inc does this
    #old_balance = get_balance(address)
    #print(old_balance)
    gradient_list=Gradient_List(model_id, repo=repo, model=model)
    #gradient_list=gradient_list[model_id]
    avg_gradient=gradient_list.generate_gradient_avg(addr,sa,alpha)
    decrypted_avg=model.decrypt_avg(scb)
    new_error = model.evaluate_gradient_from_avg(agg_addr,decrypted_avg,prikey,pubkey,validation[0],validation[1],alpha)
    #repo[model_id].syft_obj.decrypt(prikey)
    print("model best error", model.best_error)
    updatedModel=model
    print("updated model's best error", updatedModel.best_error)
    print("new error from averaged gradients = "+str(new_error))
    incentive = (get_balance(addr) - old_balance)
    print("incentive = "+str(incentive))
    #umodelid=repo.submit_updated_model(updatedModel)
    #model2=repo.getUpdatedModel(umodelid)
    model_id=repo.submit_model(model)
    end = time.time()
    j=j+1
    print('execution time', end - start)
    if (new_error <= model.target_error):
        print("broken round", j)
        break





new_error 27
round 0
model_id 0
i 0
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 0
local loss 25
i 1
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 1
local loss 26
i 2
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 2
local loss 30
i 3
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 3
local loss 31
i 4
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 4
local loss 23
i 5
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 5
local loss 26
i 6
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 6
local loss 21
i 7
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 7
local loss 29
i 8
model_id 0
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 8
local loss 22
length of gradient list

number of gradients of model 2
local loss 22
i 3
model_id 5
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 3
local loss 20
i 4
model_id 5
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 4
local loss 25
i 5
model_id 5
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 5
local loss 17
i 6
model_id 5
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 6
local loss 16
i 7
model_id 5
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 7
local loss 20
i 8
model_id 5
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 8
local loss 19
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
typeeeee <class 'syft.he.paillier.basic.PaillierTensor'>
type i 

i 6
model_id 10
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 6
local loss 11
i 7
model_id 10
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 7
local loss 12
i 8
model_id 10
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 8
local loss 14
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
typeeeee <class 'syft.he.paillier.basic.PaillierTensor'>
type i wanna know <class 'syft.tensor.TensorBase'>
type i wanna know2 <class 'syft.tensor.TensorBase'>
new_model_error_iwannaknow 37
model best error 27
updated model's best error 27
new error from averaged gradients = 37
incentive = -2.65323E-13
execution time 152.02710914611816
round 11
model_id 11
i 0
model_id 11
gradvaltype <class 'syft.tensor.TensorBase'>
n

length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
length of gradient list is 9
typeeeee <class 'syft.he.paillier.basic.PaillierTensor'>
type i wanna know <class 'syft.tensor.TensorBase'>
type i wanna know2 <class 'syft.tensor.TensorBase'>
new_model_error_iwannaknow 14
self.best_error 14
model best error 14
updated model's best error 14
new error from averaged gradients = 14
incentive = 4.999999999999734677
execution time 1804.5789499282837
round 16
model_id 16
i 0
model_id 16
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 0
local loss 6
i 1
model_id 16
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 1
local loss 38
i 2
model_id 16
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 2
local loss 2
i 3
model_id 16
gradvaltype <class 'syft.tensor.TensorBase'>
number of gradients of model 3
loc