![fig](https://raw.githubusercontent.com/zk-ml/demo/main/protocol_overview.jpg)

## zk-ml: truly private machine learning on blockchain

Peiyuan Liao, Milo Cress, @ludens

### **Thesis: modern machine learning platforms lack security and transparency**

* https://www.kaggle.com/, https://codalab.org/, https://www.crowdai.org/
* Experience of competing on Kaggle as a Competitions Grandmaster (Peiyuan):
 - user downloads dataset
 - trains model
 - uploads it (???)
 - organizer evaluates against some test set (???)
 - and some metric (???)
 - some random meetings with the organizers (???)
 - prize is delivered 
* https://www.kaggle.com/c/deepfake-detection-challenge/discussion/157983#885598
 - A top ML competition team was denied 500k USD due to opaque data issues
* https://www.kaggle.com/c/global-wheat-detection/discussion/167298
 - Several under-the-hood re-runs, edits to the private test set, and arbitrary invalidation of submissions
 
### Solution: zk-SNARKs on Blockchain

* What zk-SNARKs provide: 
 - succinct proofs of computation, showing that a neural network performs a certain way on a given dataset
* What the blockchain provides:
 - a secure way to post and claim bounties
 - a reliable way to transfer the model as part of a key agreement protocol
* ML models usually operate in float, double, and half precision, but zk-SNARKs work in prime fields
* Quantization!
  - Prime complement for negative numbers
  - Slight precision loss on both sides (sometimes better, sometimes worse)
  - Division circuits are very expensive but doable (negatives, quotient-remainder relationships)
  - Multiply-add approximations for nonlinearities (sigmoid, tanh, etc.)
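A minimal sketch of the quantization ideas listed here, assuming a simple affine scheme and the BN254 scalar field that is commonly paired with Groth16. The function names, the 8-bit width, and the choice of prime are illustrative assumptions and are not taken from the zk-ml circuit itself.

```python
import numpy as np

# Assumption: BN254 scalar field order, a common choice for Groth16 circuits.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def quantize(x, alpha, beta, bits=8):
    """Affine-quantize floats in [alpha, beta] to integers in [0, 2**bits - 1]."""
    scale = (beta - alpha) / (2**bits - 1)
    q = np.round((np.clip(x, alpha, beta) - alpha) / scale)
    return q.astype(np.int64), scale

def dequantize(q, alpha, scale):
    """Map quantized integers back to approximate floats."""
    return q * scale + alpha

def to_field(v):
    """Map a signed integer to a field element; a negative v becomes P - |v|."""
    return int(v) % P

def from_field(e):
    """Read a field element back as signed: values above P // 2 are negative."""
    return e - P if e > P // 2 else e

def check_division(a, b, q, r):
    """Quotient-remainder relation a circuit can enforce instead of dividing."""
    return a == b * q + r and 0 <= r < b
```

The precision loss mentioned in the list shows up as the gap between `x` and `dequantize(*quantize(x, ...))`, which is bounded by half a quantization step for values inside `[alpha, beta]`. The division check takes the quotient and remainder as witnesses, so the circuit only multiplies, adds, and range-checks.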

### The zk-ml protocol

* The circuit: Linear Regression
    - Checks that all the public inputs that define an ML dataset (x, y pairs with quantization constants) hash correctly to hash_input, which serves as the main key for bounty deposit and claim
    - Confirms that the actual model params encrypt correctly to the public model params. This way, the bounty issuer can use the public key and the encrypted params in the calldata, along with their own private key, to recover the params
    - Performs the model inference and confirms that it indeed achieves a certain measure with respect to the public targets
    - Theory behind GEMM quantization: https://leimao.github.io/article/Neural-Networks-Quantization/
* The contract
    - Organizers post bounties with IPFS links to datasets
    - Competitors query for bounties and download datasets
    - Competitors train a model, quantize it, and generate proofs locally
    - (if conditions are met) Competitors upload a proof and claim the bounty
    - (if conditions are not met) Organizers remove the bounty and get their funds back
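The circuit checks described above can be mirrored in plain Python to make the proven statement concrete. This is a sketch under stated assumptions: SHA-256 stands in for whatever hash the circuit uses, the encryption check is omitted, all tensors are assumed to already share one integer scale, and `dataset_hash`, `quantized_mse`, and `statement_holds` are hypothetical names rather than part of the zk-ml code.

```python
import hashlib
import numpy as np

def dataset_hash(X_q, Y_q, constants):
    """Hash the public dataset; SHA-256 is a stand-in for the circuit's hash."""
    h = hashlib.sha256()
    for arr in (X_q, Y_q, np.asarray(constants)):
        h.update(np.ascontiguousarray(arr, dtype=np.int64).tobytes())
    return h.hexdigest()

def quantized_mse(X_q, W_q, b_q, Y_q):
    """Integer-only linear regression inference followed by a floored MSE."""
    pred = X_q @ W_q + b_q
    sq_err = (pred - Y_q) ** 2
    return int(sq_err.sum()) // sq_err.size

def statement_holds(X_q, Y_q, constants, W_q, b_q, hash_input, mse_cap):
    """True when the dataset matches hash_input and the model meets the cap."""
    hash_ok = dataset_hash(X_q, Y_q, constants) == hash_input
    mse_ok = quantized_mse(X_q, W_q, b_q, Y_q) <= mse_cap
    return hash_ok and mse_ok
```

In the protocol, the model parameters `W_q` and `b_q` would be private witnesses, while the dataset, its hash, and the MSE cap are public, so a verifier learns only that some model satisfying the cap exists.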

### Future work

* More ML models
 - Language models: I-BERT (https://arxiv.org/abs/2101.01321)
 - Gradient Boosted Decision Trees, XGBoost, LightGBM, etc: trees are representable as circuits
 - Compiler from neural network IR to a Groth16 verifier (ZoKrates with optimizations)
* Protocol revamp
 - Multiple contributions to a contract: proxy contracts
 - Pushing beyond contract size limit:
     - Diamond patterns, libraries
     - 16/8/4-bit quantization and bit packing inside uint256
     - Moving zk-SNARK computation to cryptoprocessors
 - Preventing frontrunners: the bounty issuer can see the calldata, front-run the transaction, and remove the bounty
 - Preventing over-fitting for over-parameterized models (MLP, ConvNets): a two-stage competition with public leaderboard proofs and private leaderboard proofs
 - Preventing adversarial organizers from creating invalid test sets (random noise, exceptionally hard examples, etc.)
     - User public keys are kept private during the public leaderboard stage
     - Slashing if the organizer does not release the private dataset in time
     - Slashing if the private dataset behaves in an adversarial manner
* DAO
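The bit-packing item in the list above can be illustrated off-chain with Python integers. This sketch assumes unsigned quantized values packed least-significant-first, so 32 eight-bit values fit in one 256-bit word. The layout and the function names are illustrative assumptions; an on-chain version would apply the same shifts and masks to `uint256` values.

```python
def pack(values, bits=8):
    """Pack unsigned `bits`-wide integers into a list of 256-bit words."""
    per_word = 256 // bits
    words = []
    for i in range(0, len(values), per_word):
        word = 0
        for j, v in enumerate(values[i:i + per_word]):
            if not 0 <= v < (1 << bits):
                raise ValueError(f"value {v} does not fit in {bits} bits")
            word |= v << (j * bits)
        words.append(word)
    return words

def unpack(words, count, bits=8):
    """Recover the first `count` values from packed 256-bit words."""
    per_word = 256 // bits
    mask = (1 << bits) - 1
    out = []
    for word in words:
        for j in range(per_word):
            if len(out) == count:
                return out
            out.append((word >> (j * bits)) & mask)
    return out
```

Dropping from 16-bit to 8-bit or 4-bit quantization doubles or quadruples the number of parameters per word, which is what makes packing attractive when calldata or contract size is the constraint.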
 

In [1]:
alias zkml source ~/.nvm/nvm.sh >/dev/null && nvm use 14.0.0 >/dev/null && yarn >/dev/null && bash zkml

In [2]:
alias prepare source ~/.nvm/nvm.sh >/dev/null && nvm use 14.0.0 >/dev/null && yarn >/dev/null && bash zkml add_bounty --amount 5 >/dev/null 

In [5]:
prepare

An unexpected error occurred:
HTTPError: basic auth failure: invalid project id or project secret

    at Object.errorHandler [as handleError] (/home/bowen/Desktop/workspace/demo/eth/node_modules/ipfs-http-client/src/lib/core.js:100:15)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at Client.fetch (/home/bowen/Desktop/workspace/demo/eth/node_modules/ipfs-utils/src/http.js:145:9)
    at Object.addAll (/home/bowen/Desktop/workspace/demo/eth/node_modules/ipfs-http-client/src/add-all.js:40:17)
    at SimpleTaskDefinition.action (/home/bowen/Desktop/workspace/demo/eth/hardhat.config.js:533:22)
    at Environment._runTaskDefinition (/home/bowen/Desktop/workspace/demo/eth/node_modules/hardhat/src/internal/core/runtime-environment.ts:217:14)
    at Environment.run (/home/bowen/Desktop/workspace/demo/eth/node_modules/hardhat/src/internal/core/runtime-environment.ts:129:14)
    at main (/home/bowen/Des

In [4]:
zkml list_datasets

Available datasets:
[
  '14797455496207951391356508759149962584765968173479481191220882411966396840571'
]


In [5]:
zkml list_bounties --hash '14797455496207951391356508759149962584765968173479481191220882411966396840571'

Available bounties on dataset: 14797455496207951391356508759149962584765968173479481191220882411966396840571
[
  {
    PubKey1: '4335450774744029667338374268876724953162212166350367311071783936960844219437',
    PubKey2: '12294985779291632745949915528747628813970908319399977746384186218556045373103',
    MSEcap: '12888',
    Bounty: '5.0',
    Issuer: '0xd3162F2B88d05C882a1B26031E144753337ACDBF',
    IPFS: 'QmWLRJVL5uViT7h64bdeUM3GKMWP9DSWRggGC8igDuQdHR'
  }
]


Command:

```
zkml download_dataset --hash '14797455496207951391356508759149962584765968173479481191220882411966396840571' --publickey ./keys/out_public.json --mse 12888 --path ./ipfs_dataset
```

In [6]:
zkml download_dataset --hash '14797455496207951391356508759149962584765968173479481191220882411966396840571' --publickey ./keys/out_public.json --mse 12888 --path ./ipfs_dataset

Downloading from IPFS to ./ipfs_dataset ...


In [7]:
!ls ./ipfs_dataset

X.npy Y.npy


In [68]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

In [117]:
X = np.load('./dataset/X_first.npy')
Y = np.load('./dataset/Y_first.npy')

In [118]:
Y.shape, X.shape

((6955, 1), (6955, 67))

In [129]:
# Standardizing the features
scaler = StandardScaler()

X = scaler.fit_transform(X)
# Keeping the first 20 samples and the first 4 features
X = X[0:20, 0:4]
Y = Y[0:20]

# Instantiating LinearRegression() Model
lr = LinearRegression()

# Training/Fitting the Model
lr.fit(X, Y)

# Making Predictions
pred = lr.predict(X)

# Evaluating Model's Performance
mse = mean_squared_error(Y, pred)
print('Mean Squared Error:', mse)

Mean Squared Error: 0.19852095774279208


In [130]:
W = lr.coef_.reshape(-1, 1)
b = lr.intercept_.reshape(-1, 1)
Yt_expected = Y.reshape(-1, 1)

np.save('model/W.npy',W)
np.save('model/b.npy',b)
np.save('dataset/X.npy',X)
np.save('dataset/Y.npy',Yt_expected)

In [131]:
print(W)

[[-8.94693752]
 [ 8.82111425]
 [ 0.03092156]
 [ 0.25509201]]


In [132]:
print(b)

[[0.6]]


In [133]:
# Corrupt the trained parameters in place; np.random.shuffle permutes along the first axis
np.random.shuffle(W)
np.random.shuffle(b)
print(mean_squared_error(Y, np.matmul(X, W.reshape(-1)) + b.reshape(-1)))

np.save('model_shuffled/W.npy',W)
np.save('model_shuffled/b.npy',b)

143.19081226152846


In [143]:
from scripts import *
from copy import deepcopy
import json

data = dict(
    alpha_X = 0,
    beta_X = 8,

    alpha_W = -1,
    beta_W = 8,

    alpha_Y = 0,
    beta_Y = 8,

    alpha_Yt = 0,
    beta_Yt = 8,

    alpha_b = 0,
    beta_b = 8,

    alpha_R = -1,
    beta_R = 1,

    alpha_S = 0,
    beta_S = 10,

    m = 20,
    p = 4,
    n = 1,
    
    mse_target = 50
)

# Write the settings with a context manager so the file is flushed and closed
with open('./settings.json', 'w') as f:
    json.dump(data, f, indent=2)

Command:

```
zkml claim_bounty --payment 0x2546BcD3c84621e976D8185a91A922aE77ECEc30 --model ./model_shuffled --dataset ./dataset --publickey ./keys/out_public.json --settings ./settings.json
```

In [144]:
zkml claim_bounty --payment 0xDD63369Cd353f731De50cd2d5F6594Dd7B1083bA --model ./model_shuffled --dataset ./dataset --publickey ./keys/out_public.json --settings ./settings.json

Mean Squared Error actual:  143.19081226152846
... quantized  65536
Mean Squared Error simulated:  56.018128547884998625
... quantized  65536
Circuit Outputs:
367102
Proof took 17.514 s
ERROR: Invalid proof
An unexpected error occurred:

Error: Could not verify the proof
    at SimpleTaskDefinition.action (/home/bowen/Desktop/workspace/demo/eth/hardhat.config.js:273:26)
    at Environment._runTaskDefinition (/home/bowen/Desktop/workspace/demo/eth/node_modules/hardhat/src/internal/core/runtime-environment.ts:217:14)
    at Environment.run (/home/bowen/Desktop/workspace/demo/eth/node_modules/hardhat/src/internal/core/runtime-environment.ts:129:14)
    at main (/home/bowen/Desktop/workspace/demo/eth/node_modules/hardhat/src/internal/cli/cli.ts:197:5)


Command:

```
zkml claim_bounty --payment 0x2546BcD3c84621e976D8185a91A922aE77ECEc30 --model ./model  --dataset ./dataset  --publickey ./keys/out_public.json --settings ./settings.json
```

In [145]:
zkml claim_bounty --payment 0xDD63369Cd353f731De50cd2d5F6594Dd7B1083bA --model ./model  --dataset ./dataset  --publickey ./keys/out_public.json --settings ./settings.json

Mean Squared Error actual:  0.19852095774279208
... quantized  1305
Mean Squared Error simulated:  48.60190441311114995
... quantized  65536
Circuit Outputs:
318502
Proof took 20.227 s
ERROR: Invalid proof
An unexpected error occurred:

Error: Could not verify the proof
    at SimpleTaskDefinition.action (/home/bowen/Desktop/workspace/demo/eth/hardhat.config.js:273:26)
    at Environment._runTaskDefinition (/home/bowen/Desktop/workspace/demo/eth/node_modules/hardhat/src/internal/core/runtime-environment.ts:217:14)
    at Environment.run (/home/bowen/Desktop/workspace/demo/eth/node_modules/hardhat/src/internal/core/runtime-environment.ts:129:14)
    at main (/home/bowen/Desktop/workspace/demo/eth/node_modules/hardhat/src/internal/cli/cli.ts:197:5)


In [17]:
zkml list_bounties --hash 14797455496207951391356508759149962584765968173479481191220882411966396840571

Available bounties on dataset: 14797455496207951391356508759149962584765968173479481191220882411966396840571
[
  {
    PubKey1: '4335450774744029667338374268876724953162212166350367311071783936960844219437',
    PubKey2: '12294985779291632745949915528747628813970908319399977746384186218556045373103',
    MSEcap: '12888',
    Bounty: '5.0',
    Issuer: '0xd3162F2B88d05C882a1B26031E144753337ACDBF',
    IPFS: 'QmWLRJVL5uViT7h64bdeUM3GKMWP9DSWRggGC8igDuQdHR'
  }
]


In [18]:
zkml list_datasets

Available datasets:
[]
