# Section: Encrypted Deep Learning

Learn how to predict on encrypted data while the model that you're using for prediction is also encrypted. We take all the parameters of our model and our dataset, encrypt them, and run prediction while everything stays in an encrypted state.

- Lesson: Reviewing Additive Secret Sharing
- Lesson: Encrypted Subtraction and Public/Scalar Multiplication
- Lesson: Encrypted Computation in PySyft
- Project: Build an Encrypted Database
- Lesson: Encrypted Deep Learning in PyTorch
- Lesson: Encrypted Deep Learning in Keras
- Final Project

# Lesson: Reviewing Additive Secret Sharing

_For more great information about SMPC protocols like this one, visit https://mortendahl.github.io. With permission, Morten's work directly inspired this first teaching segment._

* `encode()`, `decode()`
    * encode a decimal number as an integer field element using fixed precision, and decode it back
* `encrypt()`, `decrypt()`
    * `encrypt()` takes a plaintext (encoded) number and splits it into exactly three shares; `decrypt()` does the opposite
    * we focus on three-party MPC

In [1]:
import random
import numpy as np

BASE = 10

PRECISION_INTEGRAL = 8
PRECISION_FRACTIONAL = 8
Q = 293973345475167247070445277780365744413

PRECISION = PRECISION_INTEGRAL + PRECISION_FRACTIONAL

assert(Q > BASE**PRECISION)

def encode(rational):
    upscaled = int(rational * BASE**PRECISION_FRACTIONAL)
    field_element = upscaled % Q
    return field_element

def decode(field_element):
    # field elements above Q // 2 represent negative numbers
    upscaled = field_element if field_element <= Q // 2 else field_element - Q
    rational = upscaled / BASE**PRECISION_FRACTIONAL
    return rational

def encrypt(secret):
    first  = random.randrange(Q)
    second = random.randrange(Q)
    third  = (secret - first - second) % Q
    return [first, second, third]

def decrypt(sharing):
    return sum(sharing) % Q

def add(a, b):
    c = list()
    for i in range(len(a)):
        c.append((a[i] + b[i]) % Q)
    return tuple(c)
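The `encrypt()` above hard-codes three shares. As a minimal sketch (the names `encrypt_n` and `decrypt_n` are hypothetical, not part of the lesson), the same idea extends to any number of parties: draw n - 1 uniformly random shares and choose the last one so that all shares sum to the secret modulo Q.

```python
import random

Q = 293973345475167247070445277780365744413  # same prime field as above

def encrypt_n(secret, n_shares=3):
    # n - 1 uniformly random shares; the last one makes the total equal the secret mod Q
    shares = [random.randrange(Q) for _ in range(n_shares - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def decrypt_n(shares):
    # summing every share modulo Q recovers the secret
    return sum(shares) % Q

print(decrypt_n(encrypt_n(42, n_shares=5)))  # 42
```

With `n_shares=3` this behaves like `encrypt()`. Any n - 1 of the shares are uniformly random and reveal nothing about the secret on their own.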

In [11]:
# encode 5.5 and encrypt it
x = encrypt(encode(5.5))
x

[254896672618373432957458641401131068308,
 70507723943966050542400807089569501182,
 262542294387995010641031107070580919336]

In [8]:
decode(decrypt(x))

5.5

In [12]:
# encode 2.3 and encrypt it
y = encrypt(encode(2.3))
y

[13511971430580352317927105767615962278,
 122325886910351309041885371848514781736,
 158135487134235585710632800164465000398]

In [13]:
# addition modulo a large prime
z = add(x,y)
z

(268408644048953785275385747168747030586,
 192833610854317359584286178938084282918,
 126704436047063349281218629454680175321)

In [14]:
# decrypt z and decode it
decode(decrypt(z))

7.79999999
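Why 7.79999999 rather than 7.8? `encode()` uses `int()`, which truncates, and 2.3 * 10**8 is not exactly representable as a float: it comes out slightly below 230000000, so the encoding of 2.3 loses its last unit. A quick plain-Python check (`SCALE` is a name introduced here for illustration):

```python
PRECISION_FRACTIONAL = 8
SCALE = 10 ** PRECISION_FRACTIONAL

scaled = 2.3 * SCALE        # float result is slightly below 230000000
truncated = int(scaled)     # int() truncates, as encode() does
rounded = round(scaled)     # round() snaps to the nearest integer instead

print(truncated)  # 229999999
print(rounded)    # 230000000
print((int(5.5 * SCALE) + truncated) / SCALE)  # 7.79999999
```

Using `round()` instead of `int()` inside `encode()` would encode 2.3 as exactly 230000000.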

Each encrypted number is a sequence of three shares. Next, we build a few more functions that operate on these share tuples and combine them to do things like subtraction, scalar multiplication, and encrypted multiplication.

# Lesson: Encrypted Subtraction and Public/Scalar Multiplication

* subtract two encrypted numbers from each other
* multiply an encrypted number by a public (unencrypted) scalar

#### subtraction

In [15]:
def sub(a, b):
    c = list()
    for i in range(len(a)):
        c.append((a[i] - b[i]) % Q)
    return tuple(c)
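The lesson defines `sub()` but never runs it on encrypted values. Here is a self-contained sketch that repeats the helpers from the first cell (using `Q // 2` in `decode()` so the comparison stays in integer arithmetic) and checks that share-by-share subtraction works, including when the result is negative:

```python
import random

BASE = 10
PRECISION_FRACTIONAL = 8
Q = 293973345475167247070445277780365744413

def encode(rational):
    return int(rational * BASE**PRECISION_FRACTIONAL) % Q

def decode(field_element):
    upscaled = field_element if field_element <= Q // 2 else field_element - Q
    return upscaled / BASE**PRECISION_FRACTIONAL

def encrypt(secret):
    first = random.randrange(Q)
    second = random.randrange(Q)
    third = (secret - first - second) % Q
    return [first, second, third]

def decrypt(sharing):
    return sum(sharing) % Q

def sub(a, b):
    # each party subtracts its own shares locally
    return tuple((a_i - b_i) % Q for a_i, b_i in zip(a, b))

x = encrypt(encode(5.5))
y = encrypt(encode(2.0))
print(decode(decrypt(sub(x, y))))  # 3.5
print(decode(decrypt(sub(y, x))))  # -3.5
```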

In [16]:
field = 23740629843760239486723
x = 5

In [17]:
bob_x_share = 2372385723 # random number
alices_x_share = field - bob_x_share + x
alices_x_share

23740629843757867101005

Field arithmetic: when we add the two shares together modulo the field size, they sum to x.

In [18]:
# getting x
(bob_x_share + alices_x_share) % field

5

In [20]:
field = 10
x = 5
bobs_x_share = 8

In [21]:
alices_x_share = field - bobs_x_share + x
alices_x_share

7

In [22]:
# sum of the two shares
bobs_x_share + alices_x_share

15

In [23]:
# reduce the sum modulo the field to recover x
(bobs_x_share + alices_x_share) % field

5

In [24]:
# any sum ending in 5 (such as 25 or 55) also gives 5
25 % field, 55 % field

(5, 5)

In [25]:
# an encoding of 5 (15) plus an encoding of 3 (13)
15 + 13

28

In [26]:
28 % field

8

In [27]:
(15 + 2353) % field

8

When we add two numbers inside a field, their remainders add up, while the multiples of the field size contribute nothing to the final result (they are equivalent to zero). With a field of 10, only the last decimal digit matters, because everything else is evenly divisible by the field size. This is what makes addition of encrypted numbers work.

Subtraction works the same way, in reverse.

In [28]:
(55 - 23) % field

2

In [29]:
(15 - 13) % field

2

In [30]:
field = 10

x = 5

bob_x_share = 8
alice_x_share = field - bob_x_share + x

y = 1

bob_y_share = 9
alice_y_share = field - bob_y_share + y

In [31]:
alice_x_share + bob_x_share

15

In [32]:
alice_y_share + bob_y_share

11

In [33]:
# this encodes a 6
(alice_x_share + bob_x_share) + (alice_y_share + bob_y_share)

26

In [34]:
# reducing modulo the field reveals the 6
((alice_x_share + bob_x_share) + (alice_y_share + bob_y_share)) % field

6

In [35]:
# this encodes a 4
(alice_x_share + bob_x_share) - (alice_y_share + bob_y_share)

4

In [36]:
((bob_x_share + alice_x_share) - (bob_y_share + alice_y_share)) % field

4

In [37]:
# equivalent to the cell above, but each party subtracts its own shares first
((bob_x_share - bob_y_share) + (alice_x_share - alice_y_share)) % field

4

When you add numbers that are in a field, the multiples of the field size drop out: even though we are adding 15, we are effectively adding 5. This means addition (and subtraction) can be done locally, with each party combining only its own shares.
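A minimal sketch of this local computation in the toy field of size 10. Unlike the cells here, which compute Alice's share as `field - bob_x_share + x`, the hypothetical `share()` helper reduces her share modulo the field, so both shares lie in [0, 10):

```python
import random

FIELD = 10  # toy field size, as in the cells above

def share(x, field=FIELD):
    # Bob's share is random; Alice's share makes the pair sum to x modulo the field
    bob = random.randrange(field)
    alice = (x - bob) % field
    return bob, alice

def reconstruct(bob, alice, field=FIELD):
    return (bob + alice) % field

bob_x, alice_x = share(5)
bob_y, alice_y = share(3)

# each party combines only its own shares; nobody sees x or y
bob_sum, alice_sum = (bob_x + bob_y) % FIELD, (alice_x + alice_y) % FIELD
bob_diff, alice_diff = (bob_x - bob_y) % FIELD, (alice_x - alice_y) % FIELD

print(reconstruct(bob_sum, alice_sum))    # 8
print(reconstruct(bob_diff, alice_diff))  # 2
```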

In [39]:
bob_x_share + alice_x_share + bob_y_share + alice_y_share

26

In [40]:
bob_z_share = (bob_x_share - bob_y_share)
alice_z_share = (alice_x_share - alice_y_share)

In [41]:
(bob_z_share + alice_z_share) % field

4

In [43]:
field = 10

x = 5

bob_x_share = 8
alice_x_share = field - bob_x_share + x

y = 6

bob_y_share = 9
alice_y_share = field - bob_y_share + y

In [44]:
bob_x_share + alice_x_share

15

In [45]:
bob_y_share + alice_y_share

16

In [46]:
# 5 + 6 = 11 overflows the field of size 10, so this decrypts to 1
((bob_x_share + alice_x_share) + (bob_y_share + alice_y_share)) % field

1

The field size bounds the numbers we can represent: here 5 + 6 = 11 wraps around to 1. That's why we use a large prime as the field size, so that we can represent large numbers.

In [47]:
bob_x_share + alice_x_share

15

In [48]:
bob_y_share + alice_y_share

16

In [49]:
((bob_x_share + alice_x_share) - (bob_y_share + alice_y_share)) % field

9

Negative numbers are more challenging to represent: 5 - 6 = -1 comes out as 9.

In [50]:
# -4 wraps around to 6 in a field of size 10
(11 - 15) % field

6

* in a field of size 10, 6 can represent 6, 16, 26, and so on, or -4, -14, and so on

In [51]:
# scalar multiplication on the shares: 3 * 6 = 18, which wraps around to 8
((bob_y_share * 3) + (alice_y_share * 3)) % field

8

In [52]:
encode(-4)

293973345475167247070445277779965744413

In [53]:
x = encrypt(encode(-5.5))
x

[114147498003504527214657343162622905568,
 261707851591041489824458958147491208014,
 212091341355788477101774254250067375244]

In [54]:
encode(-5.5)

293973345475167247070445277779815744413

In [55]:
# our field
Q

293973345475167247070445277780365744413

In [56]:
decode(decrypt(x))

-5.5

`decode()` splits the field into two halves. If the field element is at most Q/2, it is returned unchanged (a non-negative number); otherwise, Q is subtracted from it, which yields a negative number:
- [0, Q/2] represents 0 to Q/2
- (Q/2, Q) represents -Q/2 to 0

So `encode()` maps positive decimal numbers to small positive integers and negative decimal numbers to integers just below Q.
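The same split can be tried in the toy field of size 10. In this sketch, the hypothetical helpers `to_field` and `from_field` mirror `encode()` and `decode()` without the fixed-precision scaling: elements 0 to 5 stand for themselves, and 6 to 9 stand for -4 to -1. It also shows overflow: 3 + 4 = 7 lands in the upper half and comes back as -3.

```python
FIELD = 10

def to_field(n, field=FIELD):
    # map any integer into the field, like the "% Q" in encode()
    return n % field

def from_field(e, field=FIELD):
    # same rule as decode(): the upper half of the field represents negatives
    return e if e <= field // 2 else e - field

print(from_field(to_field(5 - 6)))  # -1
print(from_field(to_field(-4)))     # -4
print(from_field(to_field(3 + 4)))  # -3, because 7 overflows the positive half
```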

In [57]:
encode(-5.5)

293973345475167247070445277779815744413

In [58]:
# this is greater than Q/2, so it decodes to a negative number
decode(293973345475167247070445277779815744413)

-5.5

In [59]:
field = 10

x = 5

bob_x_share = 8
alice_x_share = field - bob_x_share + x

y = 1

bob_y_share = 9
alice_y_share = field - bob_y_share + y

In [60]:
# encoding 1
bob_y_share + alice_y_share

11

In [61]:
# encoding 2
bob_y_share + alice_y_share + bob_y_share + alice_y_share

22

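Adding an encoded value to itself is the same as `(bob_y_share + alice_y_share)*2`. Because multiplication distributes over addition, each party can multiply its own share by a public scalar locally.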

In [62]:
# doubling the sum of the shares doubles the encoded value (22 encodes 2)
(bob_y_share + alice_y_share)*2

22

In [63]:
((bob_y_share*2) + (alice_y_share*2)) % field

2

#### multiplication of an encrypted number with a decrypted number

In [64]:
# multiplication of an encrypted number with a decrypted number
def imul(a, scalar):
    # each party multiplies its own share by the public (integer) scalar
    c = list()
    for i in range(len(a)):
        c.append((a[i] * scalar) % Q)
    return tuple(c)

In [65]:
x = encrypt(encode(5.5))
x

[46476210617726845268786143540053252409,
 186186630550439737189265430853650139174,
 61310504307000664612393703387212352830]

In [66]:
z = imul(x, 3)
z

(139428631853180535806358430620159757227,
 264586546176151964497351014780584673109,
 183931512921001993837181110161637058490)

In [67]:
decode(decrypt(z))

16.5

# Lesson: Encrypted Computation in PySyft

In [69]:
import syft as sy
import torch as th
from torch import nn, optim

In [70]:
hook = sy.TorchHook(th)

In [71]:
# create virtual workers
bob = sy.VirtualWorker(hook, id="bob").add_worker(sy.local_worker)
alice = sy.VirtualWorker(hook, id="alice").add_worker(sy.local_worker)
secure_worker = sy.VirtualWorker(hook, id="secure_worker").add_worker(sy.local_worker)

In [72]:
# create data
x = th.tensor([1,2,3,4])
y = th.tensor([2,-1,1,0])

In [73]:
# share x to bob and alice
# the crypto provider generates random numbers (needed e.g. for encrypted multiplication) and distributes them to the share holders
# using a crypto provider increases performance significantly
x = x.share(bob, alice, crypto_provider=secure_worker)
y = y.share(bob, alice, crypto_provider=secure_worker)

In [74]:
# add x, y
z = x + y
z.get()

tensor([3, 1, 4, 4])

In [75]:
# subtract x, y
z = x - y
z.get()

tensor([-1,  3,  2,  4])

In [76]:
# encrypted multiplication
z = x * y
z.get()

tensor([ 2, -2,  3,  0])
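Addition and subtraction only need local work, but multiplying two shared values does not: the products of each party's own shares don't sum to x * y. One standard technique is Beaver-triple multiplication, in which a trusted dealer hands out shares of a random triple (a, b, c = a * b); in SPDZ-style protocols, this is the kind of randomness a crypto provider typically supplies. The sketch below is a plain-Python illustration for two parties and integer values (no fixed-precision rescaling), assuming a trusted dealer. It is not PySyft's implementation, and `share`, `reconstruct`, `dealer_triple` and `beaver_mul` are hypothetical names.

```python
import random

Q = 293973345475167247070445277780365744413  # same large prime as in the first lesson

def share(secret, n_parties=2):
    # additive sharing: n - 1 random shares plus one that fixes the sum mod Q
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

def dealer_triple(n_parties=2):
    # trusted dealer (crypto provider): random a, b and c = a * b, all secret-shared
    a = random.randrange(Q)
    b = random.randrange(Q)
    return share(a, n_parties), share(b, n_parties), share(a * b % Q, n_parties)

def beaver_mul(x_shares, y_shares):
    a_sh, b_sh, c_sh = dealer_triple(len(x_shares))
    # each party masks its shares locally; only the masked values are opened
    alpha = reconstruct([(x - a) % Q for x, a in zip(x_shares, a_sh)])  # alpha = x - a
    beta = reconstruct([(y - b) % Q for y, b in zip(y_shares, b_sh)])   # beta = y - b
    # x * y = c + alpha * b + beta * a + alpha * beta: every party computes its share
    # of the first three terms locally, and one party also adds the public alpha * beta
    z_shares = [(c + alpha * b + beta * a) % Q for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_shares[0] = (z_shares[0] + alpha * beta) % Q
    return z_shares

x_shares = share(6)
y_shares = share(7)
print(reconstruct(beaver_mul(x_shares, y_shares)))  # 42
```

Opening alpha and beta reveals nothing about x and y, because a and b are uniformly random and used only once.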

#### comparison operator

In [77]:
z = x < y
z.get()

tensor([1, 0, 0, 0])

In [78]:
z = x == y
z.get()

tensor([0, 0, 0, 0])

#### operations on fixed precision tensors

In [79]:
x = th.tensor([1,2,3,4])
y = th.tensor([2,-1,1,0])

x = x.fix_precision().share(bob, alice, crypto_provider=secure_worker)
y = y.fix_precision().share(bob, alice, crypto_provider=secure_worker)

In [80]:
z = x + y
z.get().float_precision()

tensor([3., 1., 4., 4.])

In [81]:
z = x - y
z.get().float_precision()

tensor([-1.,  3.,  2.,  4.])

In [82]:
z = x * y
z.get().float_precision()

tensor([ 2., -2.,  3.,  0.])

In [83]:
z = x > y
z.get().float_precision()

tensor([0., 1., 1., 1.])

In [84]:
z = x < y
z.get().float_precision()

tensor([1., 0., 0., 0.])

In [86]:
x[0:3].get().float_precision()

tensor([1., 2., 3.])

# Project: Build an Encrypted Database

* store strings either as one-hot encodings of characters or as lists of character indices (map each char to a specific number)
* key-value database string representations: convert the individual strings into tensors
* these tensor operations can be used to perform queries
* the database owners can't see any of the data because it is encrypted
* they also can't see what people are querying for or the results of those queries
* because the whole database is encrypted using MPC, it can have a group of database owners (joint ownership)

In [87]:
# numerical representations for strings
# lookup tables that map chars to integers
import string

In [88]:
chr2index = {}
index2char = {}

In [89]:
# the characters we want to encode
# no uppercase letters: strings are lowercased before lookup
# store them in two lookup tables
for i, char in enumerate(' ' + string.ascii_lowercase + '0123456789' + string.punctuation):
    chr2index[char] = i
    index2char[i] = char

These lookup tables map between characters and numbers. We need tensors so that we can use encrypted computation.

In [90]:
def string2values(str_input, max_length=8):

    # truncate to max_length and convert to lowercase
    str_input = str_input[:max_length].lower()
    # if we get a string too short, pad it with periods
    if (len(str_input) < max_length):
        str_input = str_input + "." * (max_length - len(str_input))
    # create the tensor of char integers, char indexes
    values = list()
    for char in str_input:
        values.append(chr2index[char])
    # convert it to a tensor and return it
    return th.tensor(values).long()

In [91]:
string2values("Hello")

tensor([ 8,  5, 12, 12, 15, 50, 50, 50])

In [92]:
string2values("Hello|fdgfgrdgfdsg")

tensor([ 8,  5, 12, 12, 15, 66,  6,  4])

We can also represent characters as one-hot vectors.

In [93]:
def one_hot(index, length):
    # start with a vector of zeros
    vect = th.zeros(length).long()
    # set the index position to 1
    vect[index] = 1
    return vect

In [94]:
# encode 1 on index 3, the rest of them are zero
one_hot(3, 5)

tensor([0, 0, 0, 1, 0])

How can we represent characters as one-hot vectors? Each character becomes a vector of length 69, the number of characters in our lookup table.

In [98]:
# position of p in lookup table
one_hot(chr2index['p'], len(index2char))

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [99]:
len(index2char)

69

To encode a whole string as a series of one-hot vectors, we build one vector per character and concatenate these tensors together.

In [104]:
def strings_2_one_hot_matrix(str_input, max_length=8):
    # truncate to max_length and convert to lowercase
    str_input = str_input[:max_length].lower()
    # if we get a string too short, pad it with periods
    if (len(str_input) < max_length):
        str_input = str_input + "." * (max_length - len(str_input))
    # build one one-hot row per character
    char_vector = list()

    for char in str_input:
        char_v = one_hot(chr2index[char], len(index2char)).unsqueeze(0)
        char_vector.append(char_v)

    # concatenate the rows into a (max_length, 69) matrix
    return th.cat(char_vector, dim=0)

In [105]:
matrix = strings_2_one_hot_matrix('Hello')

In [107]:
# 8 characters and 69 columns
matrix.shape

torch.Size([8, 69])

In [106]:
matrix

tensor([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 

Build the database as a key-value store of strings:

* store keys as one-hot matrices, using `strings_2_one_hot_matrix`
* store values as tensors of character indices, using `string2values`

In [110]:
str_a = strings_2_one_hot_matrix('abcdefghij')
str_b = strings_2_one_hot_matrix('Hello')

In [112]:
# count the positions where the two strings have the same character
# zero positions in common: the strings are not identical
(str_a * str_b).sum()

tensor(0)

In [140]:
str_a = strings_2_one_hot_matrix('Hello')
str_b = strings_2_one_hot_matrix('Hello')

In [141]:
# identical strings: all 8 positions (including the '.' padding) match
(str_a * str_b).sum()

tensor(8)

In [142]:
# per vocabulary character: how many positions match on that character
(str_a * str_b).sum(0)

tensor([0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [143]:
# per position: 1 if the characters at that position match, 0 otherwise
vect = (str_a * str_b).sum(1)

In [144]:
x = vect[0]
x

tensor(1)

We care whether the strings match completely. Multiplying the per-position matches together gives 1 only if every position matches, so with MPC we can do string comparison using just encrypted multiplication.

In [145]:
x = vect[0]

# check if all of them match: multiply in every remaining position, including the last one
for i in range(1, vect.shape[0]):
    x = x * vect[i]

# boolean value to determine whether or not these two keys match
key_match = x
key_match

tensor(1)

In [127]:
str_a = strings_2_one_hot_matrix('Hello')
str_b = strings_2_one_hot_matrix('Hello')

(str_a * str_b).sum()

tensor(8)

This multiplication tells us whether or not a given key is a perfect match for a query, which is exactly what a key-value lookup needs.
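Here is the comparison as a plain-Python sketch without encryption (the helpers `one_hot_rows` and `strings_equal_plain` are hypothetical; they use the same 69-character vocabulary and '.' padding as above). The product includes every position, so a mismatch in the last character is caught too. In the encrypted version, the same multiplications run on shared tensors.

```python
import string

VOCAB = ' ' + string.ascii_lowercase + '0123456789' + string.punctuation
CHAR2INDEX = {c: i for i, c in enumerate(VOCAB)}

def one_hot_rows(s, max_length=8):
    # truncate, lowercase and pad with '.', then one-hot encode each character
    s = s[:max_length].lower().ljust(max_length, '.')
    return [[1 if j == CHAR2INDEX[c] else 0 for j in range(len(VOCAB))] for c in s]

def strings_equal_plain(a, b):
    # per position: the dot product of two one-hot rows is 1 iff the characters match
    per_position = [sum(x * y for x, y in zip(row_a, row_b)) for row_a, row_b in zip(a, b)]
    # multiply all positions together: the result is 1 only if every position matches
    result = 1
    for match in per_position:
        result *= match
    return result

print(strings_equal_plain(one_hot_rows('Hello'), one_hot_rows('hello')))  # 1
print(strings_equal_plain(one_hot_rows('key1'), one_hot_rows('key2')))    # 0
```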

How do we want to store things in the database?

In [166]:
# string comparison
def strings_equal(str_a, str_b):
    # per position: 1 if the characters match, 0 otherwise
    vect = (str_a * str_b).sum(1)

    # multiply the matches of every position together
    x = vect[0]
    for i in range(1, vect.shape[0]):
        x = x * vect[i]

    # 1 if the two strings match completely, 0 otherwise
    str_match = x
    return str_match

In [157]:
keys = list()
values = list()

In [158]:
# dummy key and value
# store them as numerical representations
keys.append(strings_2_one_hot_matrix("key1"))
values.append(string2values('value1'))

keys.append(strings_2_one_hot_matrix("key2"))
values.append(string2values('value2'))

In [159]:
keys

[tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

In [160]:
values

[tensor([22,  1, 12, 21,  5, 28, 50, 50]),
 tensor([22,  1, 12, 21,  5, 29, 50, 50])]

How can we perform a query against this pseudo-database? A query to a key-value store first needs to compute whether the query matches any of the stored keys.

In [172]:
query_str = 'key2'

# convert it into a correct representation
query_matrix = strings_2_one_hot_matrix(query_str)

key_matches = list()
# iterate over all the keys and figure out whether any of them match
for key in keys:
    key_match = strings_equal(key, query_matrix)
    key_matches.append(key_match)
key_matches

[tensor(0), tensor(1)]

We can use these matches to decide which value to return: multiplying each value by its key match masks out (zeroes) every value whose key doesn't match.

In [173]:
values

[tensor([22,  1, 12, 21,  5, 28, 50, 50]),
 tensor([22,  1, 12, 21,  5, 29, 50, 50])]

In [176]:
value_match = values[0] * key_matches[0]
value_match

tensor([0, 0, 0, 0, 0, 0, 0, 0])

In [178]:
result = values[0] * key_matches[0]
for i in range(len(values) - 1):
    result +=  values[i+1] * key_matches[i+1]
result

tensor([22,  1, 12, 21,  5, 29, 50, 50])

Summing the masked values gives the numerical representation of the matching value.
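This select-by-masking step assumes exactly one key matches. A plain-Python sketch (the helpers `select_value` and `values_to_string` are hypothetical) makes the edge cases visible: if no key matches, every value is zeroed out and the all-zero vector decodes to spaces (index 0 in the lookup table), so a query for a missing key returns a blank string rather than an error. If two keys matched, their values would be added together.

```python
import string

VOCAB = ' ' + string.ascii_lowercase + '0123456789' + string.punctuation

def select_value(values, key_matches):
    # values: equal-length lists of character indices; key_matches: 0 or 1 per key
    result = [0] * len(values[0])
    for value, match in zip(values, key_matches):
        result = [r + v * match for r, v in zip(result, value)]
    return result

def values_to_string(indices):
    # reverse the character-index encoding and strip the '.' padding
    return ''.join(VOCAB[i] for i in indices).replace('.', '')

values = [[22, 1, 12, 21, 5, 28, 50, 50], [22, 1, 12, 21, 5, 29, 50, 50]]
print(values_to_string(select_value(values, [0, 1])))        # value2
print(repr(values_to_string(select_value(values, [0, 0]))))  # '        ' (no key matched)
```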

In [175]:
values[1] * key_matches[1]

tensor([22,  1, 12, 21,  5, 29, 50, 50])

How can we get the string value back?
* write a helper function that reverses the `string2values` encoding

In [194]:
def values2string(input_values):
    s = ""
    for v in input_values:
        s += index2char[int(v)] 
    return s.replace(".", "")

In [196]:
# convert to string, the result from our db
values2string(result)

'value2'

In [198]:
def query(query_str):

    # convert it into a correct representation
    query_matrix = strings_2_one_hot_matrix(query_str)

    key_matches = list()
    # iterate over all the keys and figure out whether any of them match
    for key in keys:
        key_match = strings_equal(key, query_matrix)
        key_matches.append(key_match)

    result = values[0] * key_matches[0]
    for i in range(len(values) - 1):
        result +=  values[i+1] * key_matches[i+1]
        
    return values2string(result)

In [201]:
query("key1")

'value1'

Let's wrap these functions in a class that works like an encrypted database. First, here are all the helper functions in one cell:

In [259]:
def string2values(str_input, max_length=8):

    # truncate to max_length and convert to lowercase
    str_input = str_input[:max_length].lower()
    # if we get a string too short, pad it with periods
    if (len(str_input) < max_length):
        str_input = str_input + "." * (max_length - len(str_input))
    # create the tensor of char integers, char indexes
    values = list()
    for char in str_input:
        values.append(chr2index[char])
    # convert it to a tensor and return it
    return th.tensor(values).long()

def one_hot(index, length):
    # start with a vector of zeros
    vect = th.zeros(length).long()
    # set the index position to 1
    vect[index] = 1
    return vect

def strings_2_one_hot_matrix(str_input, max_length=8):
    # truncate to max_length and convert to lowercase
    str_input = str_input[:max_length].lower()
    # if we get a string too short, pad it with periods
    if (len(str_input) < max_length):
        str_input = str_input + "." * (max_length - len(str_input))
    # build one one-hot row per character
    char_vector = list()

    for char in str_input:
        char_v = one_hot(chr2index[char], len(index2char)).unsqueeze(0)
        char_vector.append(char_v)

    # concatenate the rows into a (max_length, 69) matrix
    return th.cat(char_vector, dim=0)

# string comparison
def strings_equal(str_a, str_b):
    # per position: 1 if the characters match, 0 otherwise
    vect = (str_a * str_b).sum(1)

    # multiply the matches of every position together
    x = vect[0]
    for i in range(1, vect.shape[0]):
        x = x * vect[i]

    # 1 if the two strings match completely, 0 otherwise
    str_match = x
    return str_match

def values2string(input_values):
    s = ""
    for v in input_values:
        s += index2char[int(v)] 
    return s

Build encrypted programs that have decentralized governance:

* create owners of the database (virtual workers)
* take the vector representation of each key and value and share it amongst the owners
* share the query as well
* the owners have joint governance over the encrypted data
* people can query the database without revealing what they're querying for to any of the shareholders
* store arbitrary string key-value pairs

In [278]:
class EncryptedDB():

    def __init__(self, *owners, max_key_len=8, max_val_len=8):
        self.max_key_len = max_key_len
        self.max_val_len = max_val_len

        # secret-shared keys and values, stored as numerical representations
        self.keys = list()
        self.values = list()
        self.owners = owners

    def add_entry(self, key, value):
        # encode the key as a one-hot matrix and share it amongst the owners
        key = strings_2_one_hot_matrix(key, max_length=self.max_key_len)
        key = key.share(*self.owners)
        self.keys.append(key)

        # encode the value as character indices and share it amongst the owners
        value = string2values(value, max_length=self.max_val_len)
        value = value.share(*self.owners)
        self.values.append(value)

    def query(self, query_str):
        # encode the query the same way as the keys, then share it
        query_matrix = strings_2_one_hot_matrix(query_str, max_length=self.max_key_len)
        query_matrix = query_matrix.share(*self.owners)

        key_matches = list()
        # iterate over all the keys and figure out whether any of them match
        for key in self.keys:
            key_match = strings_equal(key, query_matrix)
            key_matches.append(key_match)

        # mask out every value whose key doesn't match, then sum
        result = self.values[0] * key_matches[0]
        for i in range(len(self.values) - 1):
            result += self.values[i+1] * key_matches[i+1]

        # decrypt the result
        result = result.get()

        return values2string(result).replace(".", "")

In [261]:
# initialize the class
db = EncryptedDB(bob, alice, secure_worker)

In [262]:
# add data to our db
db.add_entry("key1", "value1")
db.add_entry("key2", "value2")
db.add_entry("key3", "value3")
db.add_entry("key4", "value4")

In [263]:
db.keys

[(Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:57428958219 -> bob:10433452106]
 	-> (Wrapper)>[PointerTensor | me:26292433325 -> alice:15871960511]
 	-> (Wrapper)>[PointerTensor | me:54912356791 -> secure_worker:35564673677]
 	*crypto provider: me*, (Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:98191072014 -> bob:22244936844]
 	-> (Wrapper)>[PointerTensor | me:89837130567 -> alice:17993940160]
 	-> (Wrapper)>[PointerTensor | me:24745177277 -> secure_worker:57109363037]
 	*crypto provider: me*, (Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:7292816458 -> bob:31879441613]
 	-> (Wrapper)>[PointerTensor | me:78762171570 -> alice:38695869827]
 	-> (Wrapper)>[PointerTensor | me:96340048263 -> secure_worker:89838365384]
 	*crypto provider: me*, (Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:63953894912 -> bob:86555557159]
 	-> (Wrapper)>[PointerTensor | me:87785111562 -> alice:11809124870]
 	-> (Wrapper)

In [264]:
db.values

[(Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:97617523265 -> bob:37152636885]
 	-> (Wrapper)>[PointerTensor | me:34673684484 -> alice:34391138252]
 	-> (Wrapper)>[PointerTensor | me:8367808354 -> secure_worker:72344132210]
 	*crypto provider: me*, (Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:4238605886 -> bob:73994714885]
 	-> (Wrapper)>[PointerTensor | me:38569134292 -> alice:70338330025]
 	-> (Wrapper)>[PointerTensor | me:76303293969 -> secure_worker:77516578464]
 	*crypto provider: me*, (Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:7642171230 -> bob:141783828]
 	-> (Wrapper)>[PointerTensor | me:43265235065 -> alice:19425832542]
 	-> (Wrapper)>[PointerTensor | me:5458820934 -> secure_worker:91832385113]
 	*crypto provider: me*, (Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:61233873576 -> bob:66878695013]
 	-> (Wrapper)>[PointerTensor | me:4717786436 -> alice:26594208796]
 	-> (Wrapper)>[Poin

In [265]:
db.query('key1')

'value1'

In [279]:
# initialize the class
db_demo = EncryptedDB(bob, alice, secure_worker,  max_val_len=123)
# add data to our db
db_demo.add_entry("Bob", "(123) 456 789")
db_demo.add_entry("bill", "(635) 524 145")
db_demo.add_entry("Sam", "(968) 865 845")
db_demo.add_entry("key", "really big json value")

In [280]:
db_demo.query('Bob')

'(123) 456 789'

In [281]:
db_demo.query('bill')

'(635) 524 145'

# Lesson: Encrypted Deep Learning in PyTorch

### Train a Model

* predict with an encrypted neural network on encrypted data
* encrypt the model and the dataset, then run the prediction while both are encrypted

In [282]:
from torch import nn
from torch import optim
import torch.nn.functional as F

# A Toy Dataset
data = th.tensor([[0,0],[0,1],[1,0],[1,1.]], requires_grad=True)
target = th.tensor([[0],[0],[1],[1.]], requires_grad=True)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2, 20)
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x

# A Toy Model
model = Net()

def train():
    # Training Logic
    opt = optim.SGD(params=model.parameters(),lr=0.1)
    for iter in range(20):

        # 1) erase previous gradients (if they exist)
        opt.zero_grad()

        # 2) make a prediction
        pred = model(data)

        # 3) calculate how much we missed
        loss = ((pred - target)**2).sum()

        # 4) figure out which weights caused us to miss
        loss.backward()

        # 5) change those weights
        opt.step()

        # 6) print our progress
        print(loss.data)
        
train()

tensor(2.6329)
tensor(15.9837)
tensor(15.9501)
tensor(1.1034)
tensor(0.9866)
tensor(0.9554)
tensor(0.9267)
tensor(0.8951)
tensor(0.8602)
tensor(0.8212)
tensor(0.7775)
tensor(0.7285)
tensor(0.6745)
tensor(0.6291)
tensor(0.5797)
tensor(0.5340)
tensor(0.4969)
tensor(0.4419)
tensor(0.3720)
tensor(0.2825)


In [283]:
# predict model on the data
model(data)

tensor([[0.2125],
        [0.1941],
        [1.0298],
        [0.6200]], grad_fn=<AddmmBackward>)

## Encrypt the Model and Data

In [284]:
# encrypt the model
encrypted_model = model.fix_precision().share(alice, bob, crypto_provider=secure_worker)

In [285]:
list(encrypted_model.parameters())

[Parameter containing:
 Parameter>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:29163722827 -> alice:30123048733]
 	-> (Wrapper)>[PointerTensor | me:87668546365 -> bob:7504800186]
 	*crypto provider: secure_worker*, Parameter containing:
 Parameter>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:27151148522 -> alice:33955072103]
 	-> (Wrapper)>[PointerTensor | me:99552871649 -> bob:19378474524]
 	*crypto provider: secure_worker*, Parameter containing:
 Parameter>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:82058695091 -> alice:91061005839]
 	-> (Wrapper)>[PointerTensor | me:21990064968 -> bob:79264288605]
 	*crypto provider: secure_worker*, Parameter containing:
 Parameter>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:73644405741 -> alice:52426651070]
 	-> (Wrapper)>[PointerTensor | me:51744562958 -> bob:303802184

In [286]:
# encrypt the data
encrypted_data = data.fix_precision().share(alice, bob, crypto_provider=secure_worker)

In [287]:
encrypted_data

(Wrapper)>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:11997727310 -> alice:71349418894]
	-> (Wrapper)>[PointerTensor | me:9141575990 -> bob:76757096518]
	*crypto provider: secure_worker*

In [288]:
# encrypt prediction
encrypted_prediction = encrypted_model(encrypted_data)

In [289]:
# get the prediction
encrypted_prediction.get().float_precision()

tensor([[0.2130],
        [0.1950],
        [1.0290],
        [0.6190]])
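The decrypted prediction differs from the plaintext one around the third decimal place. A rough plain-Python sketch shows where such differences come from, assuming 3 fractional decimal digits (chosen to match the precision visible in the output above, not taken from PySyft's source): every input and weight is rounded to a multiple of 0.001, and each product carries an extra factor of 1000 that has to be dropped again. `to_fixed` and `fixed_dot` are hypothetical helpers.

```python
PRECISION_FRACTIONAL = 3
SCALE = 10 ** PRECISION_FRACTIONAL

def to_fixed(value):
    # round a float to the nearest multiple of 1 / SCALE, stored as an integer
    return round(value * SCALE)

def fixed_dot(xs, ws):
    # products of two fixed-precision numbers are scaled by SCALE ** 2,
    # so floor-divide by SCALE once before converting back to a float
    acc = sum(to_fixed(x) * to_fixed(w) for x, w in zip(xs, ws))
    return (acc // SCALE) / SCALE

xs, ws = [0.12345, 1.0], [0.5, 0.2718]
exact = sum(x * w for x, w in zip(xs, ws))
print(exact)              # about 0.333525
print(fixed_dot(xs, ws))  # 0.333
```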

Encrypted prediction enables a new application called **encrypted machine learning as a service**:
* If Alice is the data owner and Bob is the model owner, Bob can let Alice use his model for prediction without Bob ever seeing Alice's data and without Alice ever seeing Bob's model.
* Bob's IP is protected because he doesn't have to reveal his parameters, while Alice's privacy is protected.
* For example, an image classifier could classify an image without anyone else ever being able to view the image. Only the data owner decrypts the output.

# Lesson: Encrypted Deep Learning in Keras


## Step 1: Public Training

Welcome to this tutorial! In the following notebooks you will learn how to provide private predictions. By private predictions, we mean that the data is constantly encrypted throughout the entire process. At no point is the user sharing raw data, only encrypted (that is, secret shared) data. In order to provide these private predictions, Syft Keras uses a library called [TF Encrypted](https://github.com/tf-encrypted/tf-encrypted) under the hood. TF Encrypted combines cutting-edge cryptographic and machine learning techniques, but you don't have to worry about this and can focus on your machine learning application.

You can start serving private predictions with only three steps:
- **Step 1**: train your model with normal Keras.
- **Step 2**: secure and serve your machine learning model (server).
- **Step 3**: query the secured model to receive private predictions (client). 

Alright, let's go through these three steps so you can deploy impactful machine learning services without sacrificing user privacy or model security.

Huge shoutout to the Dropout Labs ([@dropoutlabs](https://twitter.com/dropoutlabs)) and TF Encrypted ([@tf_encrypted](https://twitter.com/tf_encrypted)) teams for their great work which makes this demo possible, especially: Jason Mancuso ([@jvmancuso](https://twitter.com/jvmancuso)), Yann Dupis ([@YannDupis](https://twitter.com/YannDupis)), and Morten Dahl ([@mortendahlcs](https://github.com/mortendahlcs)). 

_Demo Ref: https://github.com/OpenMined/PySyft/tree/dev/examples/tutorials_

## Train Your Model in Keras

To use privacy-preserving machine learning techniques for your projects you should not have to learn a new machine learning framework. If you have basic [Keras](https://keras.io/) knowledge, you can start using these techniques with Syft Keras. If you have never used Keras before, you can learn a bit more about it through the [Keras documentation](https://keras.io). 

Before serving private predictions, the first step is to train your model with normal Keras. As an example, we will train a model to classify handwritten digits. To train this model we will use the canonical [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

We borrow [this example](https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py) from the reference Keras repository.  To train your classification model, you just run the cell below.

In [63]:
from __future__ import print_function
import tensorflow.keras as keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, AveragePooling2D
from tensorflow.keras.layers import Activation

batch_size = 128
num_classes = 10
epochs = 2

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()

model.add(Conv2D(10, (3, 3), input_shape=input_shape))
model.add(AveragePooling2D((2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(AveragePooling2D((2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(AveragePooling2D((2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Instructions for updating:
Colocations handled automatically by placer.


Instructions for updating:
Colocations handled automatically by placer.


Train on 60000 samples, validate on 10000 samples
Instructions for updating:
Use tf.cast instead.


Instructions for updating:
Use tf.cast instead.


Epoch 1/2
Epoch 2/2
Test loss: 0.1548735132828355
Test accuracy: 0.9517


In [64]:
## Save your model's weights for future private prediction
model.save('short-conv-mnist.h5')

## Step 2: Load and Serve the Model

Now that you have a trained model with normal Keras, you are ready to serve some private predictions. We can do that using Syft Keras.

To secure and serve this model, we will need three TFEWorkers (servers). This is because TF Encrypted under the hood uses an encryption technique called [multi-party computation (MPC)](https://en.wikipedia.org/wiki/Secure_multi-party_computation). The idea is to split the model weights and input data into shares, then send a share of each value to the different servers. The key property is that if you look at the share on one server, it reveals nothing about the original value (input data or model weights).
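The sketch below makes the "reveals nothing" property concrete using generic three-party additive secret sharing of a fixed-point weight. The modulus, the 3-digit precision and the helper names are assumptions for illustration. TF Encrypted's actual encoding and the SecureNN protocol differ in their details.

```python
import random

Q = 2**61 - 1                        # assumed public modulus
BASE, PRECISION_FRACTIONAL = 10, 3   # assumed fixed-point parameters

def encode(x):
    """Fixed-point encode a float into the field; negatives wrap around Q."""
    return int(round(x * BASE ** PRECISION_FRACTIONAL)) % Q

def decode(n):
    """Map the upper half of the field back to negatives, then scale down."""
    n = n if n <= Q // 2 else n - Q
    return n / BASE ** PRECISION_FRACTIONAL

def share3(value):
    """Split a field element into three additive shares."""
    s0, s1 = random.randrange(Q), random.randrange(Q)
    return [s0, s1, (value - s0 - s1) % Q]

def reconstruct(shares):
    return sum(shares) % Q

weight = -0.25  # hypothetical model weight
shares = share3(encode(weight))
servers = {"alice": shares[0], "bob": shares[1], "carol": shares[2]}
print(decode(reconstruct(list(servers.values()))))  # -0.25

# Any two shares are consistent with every possible secret: for each candidate
# value there is a third share that completes alice's and bob's shares to it.
for candidate in (0.0, 0.5, 123.456):
    completing = (encode(candidate) - servers["alice"] - servers["bob"]) % Q
    assert decode(reconstruct([servers["alice"], servers["bob"], completing])) == candidate
```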

We'll define a Syft Keras model as we did above. However, there is a trick: before instantiating this model, we'll run `hook = sy.KerasHook(tf.keras)`. This adds three important new methods to the Keras `Sequential` class:
 - `share`: will secure your model via secret sharing; by default, it will use the SecureNN protocol from TF Encrypted to secret share your model between each of the three TFEWorkers. Most importantly, this will add the capability of providing predictions on encrypted data.
 - `serve`: this function will launch a serving queue, so that the TFEWorkers can accept prediction requests on the secured model from external clients.
 - `shutdown_workers`: once you are done providing private predictions, you can shut down your model by running this function. If you've opted to manage each worker manually, it will direct you to shut down the server processes yourself.

If you want to learn more about MPC, you can read this excellent [blog](https://mortendahl.github.io/2017/04/17/private-deep-learning-with-mpc/).

In [65]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import AveragePooling2D, Conv2D, Dense, Activation, Flatten, ReLU

import syft as sy
hook = sy.KerasHook(tf.keras)

## Model

As you can see, we define almost the exact same model as before, except we provide a `batch_input_shape`. This allows TF Encrypted to better optimize the secure computations via predefined tensor shapes. For this MNIST demo, we'll send input data with the shape of (1, 28, 28, 1). 
We also return the logit instead of softmax because this operation is complex to perform using MPC, and we don't need it to serve prediction requests.
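Skipping softmax costs nothing for classification. Softmax preserves the ordering of the logits, so the largest logit and the largest probability point to the same class. A client that decrypts the logits could still apply softmax locally in plaintext if it wants probabilities. The sketch below checks this on hypothetical logits for the 10 MNIST classes.

```python
import math

def softmax(logits):
    """Plaintext softmax; subtracting the max keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical decrypted logits for one image (10 classes).
logits = [-1.2, 0.3, 4.1, 0.0, -2.5, 1.7, 0.9, -0.4, 2.2, -3.0]
probs = softmax(logits)

print(argmax(logits))  # 2
assert argmax(logits) == argmax(probs)  # same predicted class either way
```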

In [66]:
num_classes = 10
input_shape = (1, 28, 28, 1)

model = Sequential()

model.add(Conv2D(10, (3, 3), batch_input_shape=input_shape))
model.add(AveragePooling2D((2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(AveragePooling2D((2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(AveragePooling2D((2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(num_classes, name="logit"))

### Load Pre-trained Weights

With `load_weights` you can easily load the weights you have saved previously after training your model.

In [67]:
pre_trained_weights = 'short-conv-mnist.h5'
model.load_weights(pre_trained_weights)

## Step 3: Set Up Your Worker Connectors

Let's now connect to the TFEWorkers (`alice`, `bob`, and `carol`) required by TF Encrypted to perform private predictions. For each TFEWorker, you just have to specify a host.

These workers run a [TensorFlow server](https://www.tensorflow.org/api_docs/python/tf/distribute/Server), which you can either manage manually (`AUTO = False`) or ask the workers to manage for you (`AUTO = True`). If you choose to manage them manually, you will be instructed to execute a terminal command on each worker's host device after calling `model.share()` below. If all workers are hosted on a single device (e.g. `localhost`), you can choose to have Syft automatically manage the workers' TensorFlow servers.

In [68]:
AUTO = False

alice = sy.TFEWorker(host='localhost:4000', auto_managed=AUTO)
bob = sy.TFEWorker(host='localhost:4001', auto_managed=AUTO)
carol = sy.TFEWorker(host='localhost:4002', auto_managed=AUTO)

## Step 4: Split the Model Into Shares

Thanks to `sy.KerasHook(tf.keras)` you can call the `share` method to transform your model into a TF Encrypted Keras model.

If you chose to manage the servers manually above, this step will not complete until all of them have been launched. Note that your firewall may ask you to allow Python to accept incoming connections.

In [69]:
model.share(alice, bob, carol)

INFO:tf_encrypted:If not done already, please launch the following command in a terminal on host 'localhost:4000':
'python -m tf_encrypted.player --config /tmp/tfe.config server0'
This can be done automatically in a local subprocess by setting `auto_managed=True` when instantiating a TFEWorker.
INFO:tf_encrypted:If not done already, please launch the following command in a terminal on host 'localhost:4001':
'python -m tf_encrypted.player --config /tmp/tfe.config server1'
This can be done automatically in a local subprocess by setting `auto_managed=True` when instantiating a TFEWorker.
INFO:tf_encrypted:If not done already, please launch the following command in a terminal on host 'localhost:4002':
'python -m tf_encrypted.player --config /tmp/tfe.config server2'
This can be done automatically in a local subprocess by setting `auto_managed=True` when instantiating a TFEWorker.
INFO:tf_encrypted:Starting session on target 'grpc://localhost:4000' using config graph_options {
}



## Step 5: Launch 3 Servers

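If you set `AUTO = False`, run each of the following commands in its own terminal on the corresponding host:

```
python -m tf_encrypted.player --config /tmp/tfe.config server0
python -m tf_encrypted.player --config /tmp/tfe.config server1
python -m tf_encrypted.player --config /tmp/tfe.config server2
```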
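You may want to confirm that the three player processes are listening before relying on them. A plain TCP check on each host from Step 3 can help. `is_listening` is a hypothetical standard-library helper, not part of Syft or TF Encrypted. A successful connection only shows that something accepts connections on that port, not that it is a healthy TF Encrypted player.

```python
import socket

def is_listening(host_port: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds within timeout seconds."""
    host, port = host_port.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False

# The same hosts passed to sy.TFEWorker in Step 3.
worker_hosts = ["localhost:4000", "localhost:4001", "localhost:4002"]
for host in worker_hosts:
    print(host, "up" if is_listening(host) else "not reachable")
```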

## Step 6: Serve the Model

Perfect! By calling `model.serve`, your model is now ready to provide private predictions. Use `num_requests` to limit the number of prediction requests served by the model. If it is not specified, the model will be served until interrupted.

In [70]:
model.serve(num_requests=3)

Served encrypted prediction 1 to client.
Served encrypted prediction 2 to client.
Served encrypted prediction 3 to client.


## Step 7: Run the Client

At this point, open and run the companion notebook: Section 4b - Encrypted Keras Client.

## Step 8: Shutdown the Servers

Once the request limit above has been reached, the model will no longer be available for serving requests, but it is still secret shared between the three workers. You can shut down the workers by executing the cell below.

**Congratulations** on finishing Part 12: Secure Classification with Syft Keras and TFE!

In [None]:
model.shutdown_workers()

if not AUTO:
    process_ids = !ps aux | grep '[p]ython -m tf_encrypted.player --config /tmp/tfe.config' | awk '{print $2}'
    for process_id in process_ids:
        !kill {process_id}
        print("Process ID {id} has been killed.".format(id=process_id))
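The `!` shell magics above only work inside Jupyter/IPython. In a plain Python script, roughly the same cleanup can be sketched with the standard library, assuming a POSIX system where `ps aux` is available. `find_player_pids` and `kill_players` are hypothetical helpers. Unlike the `grep` above, they match on `tf_encrypted.player --config /tmp/tfe.config`, so players launched via `python3` are also found.

```python
import os
import signal
import subprocess

PLAYER_PATTERN = "tf_encrypted.player --config /tmp/tfe.config"

def find_player_pids(ps_output: str, pattern: str = PLAYER_PATTERN) -> list:
    """Return the PID column (2nd field) of `ps aux` lines containing pattern."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        if pattern in line:
            pids.append(int(line.split()[1]))
    return pids

def kill_players(pattern: str = PLAYER_PATTERN) -> list:
    """Send SIGTERM to every running process whose command line contains pattern."""
    ps_output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    # Never signal the current process, even if its own command line matches.
    pids = [pid for pid in find_player_pids(ps_output, pattern) if pid != os.getpid()]
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        print("Process ID {} has been killed.".format(pid))
    return pids
```

When `AUTO` is `False`, you would call `kill_players()` after `model.shutdown_workers()`.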