## Mid-level benchmarks

### Private Information Retrieval (PIR)

One application of HE is to retrieve a data point from a database held elsewhere, without the database holder knowing which point is being requested. If Alice wants to query Bob's database, she simply encrypts an array full of zeros, except for a "1" in the position of the desired data point, and sends this ciphertext to Bob. He then performs homomorphic multiplication and addition, and sends the ciphertext back to Alice, who decrypts it to retrieve the data point she requested.
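
The selection step can be mimicked in the clear to see why it works. The sketch below is plain Python with no encryption, and `pir_select` is a hypothetical helper rather than part of pysheep. In the real protocol the selector entries would be ciphertexts, and the products and the sum would be homomorphic operations carried out by Bob.

```python
def pir_select(database, selector):
    """Return sum_i database[i] * selector[i].

    With a one-hot selector (a single 1, zeros elsewhere) this picks out
    exactly one database entry without branching on the selector.
    """
    if len(database) != len(selector):
        raise ValueError("database and selector must have the same length")
    return sum(d * s for d, s in zip(database, selector))


database = [123, 456]
print(pir_select(database, [1, 0]))  # picks the first entry
print(pir_select(database, [0, 1]))  # picks the second entry
```

Because the computation is the same whichever position holds the 1, the party evaluating it learns nothing about the query from the operations performed.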

To demonstrate this on a very small scale, let's look at a "database" containing only two values, 123 and 456.

In [1]:
import os
import sys

# Locate pysheep: use $SHEEP_HOME if it is set, otherwise fall back to ~/SHEEP/pysheep
if "SHEEP_HOME" in os.environ:
    SHEEP_HOME = os.environ["SHEEP_HOME"]
else:
    SHEEP_HOME = os.path.join(os.environ["HOME"], "SHEEP", "pysheep")
sys.path.append(SHEEP_HOME)


from pysheep.mid_level_benchmarks import generate_pir_circuit, \
    generate_variance_circuit, generate_bitonic_sort_circuit, generate_gaussian_inputs
from pysheep import benchmark_utils
from pysheep import common_utils
from pysheep import sheep_client

First, let's set up the SHEEP client (assuming the SHEEP server is running). We choose the "HElib_Fp" context and set the input type to 16-bit integers.

In [2]:
sheep_client.new_job() 
sheep_client.set_context("HElib_Fp")
sheep_client.set_input_type("int16_t")

{'content': '', 'status_code': 200}

Now let's generate the circuit file for this simple PIR case with two values in the "database":

In [3]:
circuit_file=generate_pir_circuit(2,[2])

Let's give this to the SHEEP server, and then see what inputs it expects:

In [4]:
sheep_client.set_circuit(circuit_file)
sheep_client.get_inputs()

{'content': ['d_0_0_0', 'd_0_1_0', 's_0_0', 's_0_1'], 'status_code': 200}

"d_a_b_c" are the "database" values, and "s_x_y" are the "selector" values.  We can set d_0_0_0 and d_0_1_0 to "123" and "456" respectively.  Then, to select d_0_0_0 (which will hopefully be the value "123") from the database, we should set s_0_0 to 1 and s_0_1 to zero.

In [5]:
sheep_client.set_inputs({"d_0_0_0": [123], "d_0_1_0": [456], "s_0_0": [1], "s_0_1": [0]})

{'content': '', 'status_code': 200}

In [6]:
sheep_client.run_job()
results = sheep_client.get_results()

In [7]:
results


{'content': {'cleartext check': {'is_correct': True},
  'outputs': {'e_0_0': ['123']},
  'timings': {'c_0_0_0': '1326.900000',
   'c_0_1_0': '1265.000000',
   'decryption': '319.100000',
   'e_0_0': '67.000000',
   'encryption': '3646.700000',
   'evaluation': '2863.700000'}},
 'status_code': 200}

So we successfully got the output value "123", i.e. the first entry in the "database".

### PIR with a more complex circuit

In [8]:
sheep_client.new_job()
sheep_client.set_input_type("int16_t")
sheep_client.set_context("HElib_Fp")

{'content': '', 'status_code': 200}

Assuming we will normally want to query a database containing N>2 items, one might think that we need to encrypt and send N "s"-values to identify the data point that we want. However, we can be smarter than that by arranging the data in a binary tree and having the "s"-values dictate how we navigate the tree to locate the desired data point. Only two "s"-values are then needed per layer of the tree.
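
The tree navigation can be sketched in the clear as follows. This is a minimal illustration, not the circuit that `generate_pir_circuit` produces: `tree_select` is a hypothetical helper, and the layout (each layer keeps either the first or the second half of the remaining values) is an assumption about how the branches are arranged.

```python
def tree_select(database, selectors):
    """Select one entry using one (s0, s1) selector pair per tree layer.

    Assumed layout: at each layer the remaining values are split into a
    first half (branch 0) and a second half (branch 1). Exactly one of
    s0, s1 is 1, so each output is a copy of the chosen half's entry.
    """
    if len(database) != 2 ** len(selectors):
        raise ValueError("need len(database) == 2 ** number of layers")
    current = list(database)
    for s0, s1 in selectors:
        half = len(current) // 2
        first, second = current[:half], current[half:]
        # Only additions and multiplications: the operations HE provides.
        current = [a * s0 + b * s1 for a, b in zip(first, second)]
    return current[0]


database = list(range(32))        # hypothetical database holding 0..31
rightmost = [(0, 1)] * 5          # take branch 1 at each of the 5 layers
print(tree_select(database, rightmost))
```

For 32 entries this uses 10 selector values (two per layer over five layers) instead of the 32 a flat one-hot selector would need.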

Let's generate a circuit file corresponding to a database containing 32 values, arranged in a tree with 5 layers and 2 choices per layer ($2^5 = 32$).

In [9]:
circuit_file=generate_pir_circuit(32,[2,2,2,2,2])
circuit_file

'/frontend/benchmark_inputs/mid_level/circuits/circuit-pir-32_2_2_2_2_2.sheep'

Let's now fill this database with the values 0 to 31:

In [10]:
sheep_client.set_circuit(circuit_file)
inputlist=sheep_client.get_inputs()['content']
data = [(x, [i]) for (i, x) in enumerate(inputlist) if x.startswith('d_')]

To start with, we'll set all the "s" (selector) inputs to zero:

In [11]:
data += [(x, [0]) for x in inputlist if x.startswith('s_')]
data = dict(data)
data

{'d_0_0_0': [0],
 'd_0_0_1': [1],
 'd_0_0_10': [10],
 'd_0_0_11': [11],
 'd_0_0_12': [12],
 'd_0_0_13': [13],
 'd_0_0_14': [14],
 'd_0_0_15': [15],
 'd_0_0_2': [2],
 'd_0_0_3': [3],
 'd_0_0_4': [4],
 'd_0_0_5': [5],
 'd_0_0_6': [6],
 'd_0_0_7': [7],
 'd_0_0_8': [8],
 'd_0_0_9': [9],
 'd_0_1_0': [16],
 'd_0_1_1': [17],
 'd_0_1_10': [26],
 'd_0_1_11': [27],
 'd_0_1_12': [28],
 'd_0_1_13': [29],
 'd_0_1_14': [30],
 'd_0_1_15': [31],
 'd_0_1_2': [18],
 'd_0_1_3': [19],
 'd_0_1_4': [20],
 'd_0_1_5': [21],
 'd_0_1_6': [22],
 'd_0_1_7': [23],
 'd_0_1_8': [24],
 'd_0_1_9': [25],
 's_0_0': [0],
 's_0_1': [0],
 's_1_0': [0],
 's_1_1': [0],
 's_2_0': [0],
 's_2_1': [0],
 's_3_0': [0],
 's_3_1': [0],
 's_4_0': [0],
 's_4_1': [0]}

Now we need to set certain selector variables to "1" to navigate down the tree. Suppose we want the last element of the database (which should have the value "31"). We then need to take the right-hand branch at every layer:

In [12]:
data['s_0_1'] = [1]
data['s_1_1'] = [1]
data['s_2_1'] = [1]
data['s_3_1'] = [1]
data['s_4_1'] = [1]
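
For an arbitrary target, the selector values could be derived from the binary digits of the target's position. The helper below is a hypothetical sketch, not a pysheep function. It assumes the `s_<layer>_<branch>` naming returned by `get_inputs()` and assumes that layer 0 corresponds to the most significant bit of the position; that ordering is a guess for layers beyond the first.

```python
def selector_inputs(index, num_layers):
    """Build selector inputs that walk a binary tree to position `index`.

    Assumption: layer 0 decides the most significant bit of `index`,
    and branch 1 is the "right-hand" branch.
    """
    if not 0 <= index < 2 ** num_layers:
        raise ValueError("index out of range for this number of layers")
    inputs = {}
    for layer in range(num_layers):
        bit = (index >> (num_layers - 1 - layer)) & 1
        inputs[f"s_{layer}_0"] = [1 - bit]
        inputs[f"s_{layer}_1"] = [bit]
    return inputs


# Selecting the last of 32 entries sets every s_<layer>_1 to 1.
print(selector_inputs(31, 5))
```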

In [13]:
sheep_client.set_inputs(data)

{'content': '', 'status_code': 200}

In [14]:
sheep_client.set_timeout(120)
sheep_client.set_parameters({"Levels": 30})

{'content': '', 'status_code': 200}

In [15]:
sheep_client.run_job()

{'content': '', 'status_code': 200}

In [16]:
results = sheep_client.get_results()

In [17]:
results

{'content': {'cleartext check': {'is_correct': True},
  'outputs': {'e_4_0': ['31']},
  'timings': {'c_0_0_0': '3506.700000',
   'c_0_0_1': '3382.600000',
   'c_0_0_10': '3440.900000',
   'c_0_0_11': '3532.100000',
   'c_0_0_12': '3479.900000',
   'c_0_0_13': '3493.700000',
   'c_0_0_14': '3426.700000',
   'c_0_0_15': '3415.500000',
   'c_0_0_2': '3420.400000',
   'c_0_0_3': '3382.200000',
   'c_0_0_4': '3418.900000',
   'c_0_0_5': '3459.200000',
   'c_0_0_6': '3407.700000',
   'c_0_0_7': '3373.500000',
   'c_0_0_8': '3371.900000',
   'c_0_0_9': '3576.000000',
   'c_0_1_0': '3377.400000',
   'c_0_1_1': '3471.700000',
   'c_0_1_10': '3442.200000',
   'c_0_1_11': '3382.500000',
   'c_0_1_12': '3375.800000',
   'c_0_1_13': '3439.200000',
   'c_0_1_14': '3373.400000',
   'c_0_1_15': '3392.700000',
   'c_0_1_2': '3407.000000',
   'c_0_1_3': '3371.600000',
   'c_0_1_4': '3414.200000',
   'c_0_1_5': '3366.900000',
   'c_0_1_6': '3366.800000',
   'c_0_1_7': '3432.700000',
   'c_0_1_8': '3385.6

## Calculating mean and variance of a set of inputs

One may wish to calculate statistical properties of a set of encrypted inputs, such as the mean or the standard deviation. Currently the contexts implemented in SHEEP do not have "Divide" operations, so calculating these exact values using only homomorphic operations on the ciphertexts is not possible.

However, the client will necessarily know "N", the number of inputs, so can perform division in the clear on the decrypted results of the homomorphic calculations.

We therefore only need homomorphic addition and multiplication. Simply summing the inputs $x_i$ gives us $N\bar{x}$.
Meanwhile $\sum_{i=0}^{N-1}(Nx_i - N\bar{x})^2$ is $N^3$ times the variance.
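
The factor of $N^3$ follows in two steps, taking the variance to be the population variance $\sigma^2 = \frac{1}{N}\sum_{i=0}^{N-1}(x_i - \bar{x})^2$ (an assumption about which definition is intended):

$$
\sum_{i=0}^{N-1}\left(Nx_i - N\bar{x}\right)^2
= N^2 \sum_{i=0}^{N-1}\left(x_i - \bar{x}\right)^2
= N^2 \cdot N\sigma^2
= N^3\sigma^2 .
$$

Every term on the left-hand side is an integer whenever the inputs are integers, which is why this form suits a context with no division.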



In [41]:
# Reset the SHEEP server settings; this time we'll use uint32_t and SEAL
sheep_client.new_job()
sheep_client.set_input_type("uint32_t")
sheep_client.set_context("SEAL")

{'content': '', 'status_code': 200}

Let's generate a circuit to calculate $N \times \mathrm{mean}$ and $N^3 \times \mathrm{variance}$ of a set of 10 inputs:

In [42]:
num_inputs = 10
circuit_file = generate_variance_circuit(num_inputs)
sheep_client.set_circuit(circuit_file)
print(sheep_client.get_circuit()["content"]["circuit"])

CONST_INPUTS N
INPUTS x_0 x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9
OUTPUTS Nxbar varianceN3
 x_0 x_1 ADD y_0
 y_0 x_2 ADD y_1
 y_1 x_3 ADD y_2
 y_2 x_4 ADD y_3
 y_3 x_5 ADD y_4
 y_4 x_6 ADD y_5
 y_5 x_7 ADD y_6
 y_6 x_8 ADD y_7
 y_7 x_9 ADD y_8
 y_8 ALIAS Nxbar
 x_0 N MULTBYCONST Nx_0
 Nxbar Nx_0 SUBTRACT v_0
 v_0 ALIAS vv_0
 v_0 vv_0 MULTIPLY s_0
 x_1 N MULTBYCONST Nx_1
 Nxbar Nx_1 SUBTRACT v_1
 v_1 ALIAS vv_1
 v_1 vv_1 MULTIPLY s_1
 x_2 N MULTBYCONST Nx_2
 Nxbar Nx_2 SUBTRACT v_2
 v_2 ALIAS vv_2
 v_2 vv_2 MULTIPLY s_2
 x_3 N MULTBYCONST Nx_3
 Nxbar Nx_3 SUBTRACT v_3
 v_3 ALIAS vv_3
 v_3 vv_3 MULTIPLY s_3
 x_4 N MULTBYCONST Nx_4
 Nxbar Nx_4 SUBTRACT v_4
 v_4 ALIAS vv_4
 v_4 vv_4 MULTIPLY s_4
 x_5 N MULTBYCONST Nx_5
 Nxbar Nx_5 SUBTRACT v_5
 v_5 ALIAS vv_5
 v_5 vv_5 MULTIPLY s_5
 x_6 N MULTBYCONST Nx_6
 Nxbar Nx_6 SUBTRACT v_6
 v_6 ALIAS vv_6
 v_6 vv_6 MULTIPLY s_6
 x_7 N MULTBYCONST Nx_7
 Nxbar Nx_7 SUBTRACT v_7
 v_7 ALIAS vv_7
 v_7 vv_7 MULTIPLY s_7
 x_8 N MULTBYCONST Nx_8
 Nxbar Nx_8 SUB

To generate the inputs, let's use a Gaussian with $\mu=50$ and $\sigma = 10$ (all input values rounded to integers):

In [43]:
input_vals = generate_gaussian_inputs(num_inputs,50,10)
input_vals

{'x_0': [34],
 'x_1': [56],
 'x_2': [55],
 'x_3': [43],
 'x_4': [48],
 'x_5': [54],
 'x_6': [71],
 'x_7': [56],
 'x_8': [48],
 'x_9': [34]}

In [44]:
sheep_client.set_inputs(input_vals)
sheep_client.set_const_inputs({"N": 10})

{'content': '', 'status_code': 200}

In [45]:
sheep_client.run_job()
results = sheep_client.get_results()['content']

In [46]:
results["outputs"]

{'Nxbar': ['499'], 'varianceN3': ['30368']}

In the clear, we can then divide these by $N$ and $N^3$ respectively:

In [47]:
print(float(results["outputs"]["Nxbar"][0])/num_inputs , float(results["outputs"]["varianceN3"][0])/pow(num_inputs,3))

49.9 30.368
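
As a cross-check, the same two quantities can be computed in the clear from the ten inputs listed above, using only Python integers. For these inputs the sum is 499, matching the homomorphic result, but the sum of squares is 112290 rather than 30368. The difference, 81922, equals $2 \times 40961$, which would be consistent with the encrypted value having wrapped around a plaintext modulus of 40961. That modulus is an assumption made here for illustration; it was not read from the server parameters.

```python
inputs = [34, 56, 55, 43, 48, 54, 71, 56, 48, 34]
n = len(inputs)

# Same arithmetic as the circuit: Nxbar is the plain sum, and each term
# of the variance output is (Nxbar - N * x_i) squared.
n_xbar = sum(inputs)
n3_variance = sum((n_xbar - n * x) ** 2 for x in inputs)

reported = 30368          # 'varianceN3' value returned by the job above
assumed_modulus = 40961   # ASSUMPTION: not taken from the server settings

print(n_xbar, n3_variance)
print(n3_variance % assumed_modulus == reported)
print(n_xbar / n, n3_variance / n ** 3)
```

If wrap-around is indeed the cause, the variance obtained from the decrypted output would be unreliable for these inputs, and a larger plaintext modulus (or smaller inputs) would be needed.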


## Bitonic sort

Sorting a list of inputs is another non-trivial operation that can be performed using a combination of straightforward homomorphic operations, namely "Select" and "Compare".
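
To show how a sort can be built from just these two operations, here is a cleartext sketch of a bitonic sorting network. It is not the SHEEP circuit itself. `compare` and `select` are stand-ins whose argument order and semantics are assumptions, and the input length must be a power of two.

```python
def compare(a, b):
    """Stand-in for a Compare gate: 1 if a > b, else 0 (assumed semantics)."""
    return 1 if a > b else 0


def select(s, a, b):
    """Stand-in for a Select gate: a if s == 1, else b (assumed semantics)."""
    return s * a + (1 - s) * b


def bitonic_sort(values):
    """Sort a power-of-two-length list with a fixed compare/select network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    gt = compare(out[i], out[partner])
                    low = select(gt, out[partner], out[i])
                    high = select(gt, out[i], out[partner])
                    # The direction depends only on public indices.
                    if (i & k) == 0:
                        out[i], out[partner] = low, high
                    else:
                        out[i], out[partner] = high, low
            j //= 2
        k *= 2
    return out


print(bitonic_sort([4, 8, 1, 3]))
```

The pattern of comparisons is fixed in advance and never depends on the data, which is what lets a network like this be written down as a circuit.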


In [2]:
sheep_client.new_job()
sheep_client.set_input_type("int8_t")
sheep_client.set_context("HElib_F2")
circuit_file = os.path.join(SHEEP_HOME,"benchmark_inputs","mid_level","circuits","circuit-bitonic-sort-4.sheep")
sheep_client.set_circuit(circuit_file)

{'content': '', 'status_code': 200}

What inputs does this circuit expect?

In [3]:
sheep_client.get_inputs()

{'content': ['i0', 'i1', 'i2', 'i3'], 'status_code': 200}

In [4]:
sheep_client.set_inputs({"i0":[4],"i1":[8],"i2":[1],"i3": [3]})

{'content': '', 'status_code': 200}

In [5]:
sheep_client.set_timeout(120)
sheep_client.run_job()
results = sheep_client.get_results()['content']

In [6]:
results["outputs"]

{'w31': ['1'], 'w32': ['3'], 'w36': ['4'], 'w37': ['8']}

In [7]:
sheep_client.upload_results("bitonic_sort_4")

{'content': 'uploaded OK', 'status_code': 200}

In [8]:
sheep_client.set_context("TFHE")
sheep_client.run_job()
results_tfhe = sheep_client.get_results()['content']

In [9]:
results_tfhe["outputs"]

{'w31': ['1'], 'w32': ['3'], 'w36': ['4'], 'w37': ['8']}

In [10]:
sheep_client.upload_results("bitonic_sort_4")

{'content': 'uploaded OK', 'status_code': 200}

In [11]:
import pandas as pd
from pysheep.database import session, BenchmarkMeasurement, Timing

In [15]:
rows = session.query(BenchmarkMeasurement).filter_by(circuit_name="bitonic_sort_4").all()

In [21]:
circuit = sheep_client.get_circuit()["content"]["circuit"]
timing_results = results["timings"]

In [22]:
from pysheep.benchmark_utils import timing_per_gate_type
timingdict = timing_per_gate_type(timing_results, circuit)

In [23]:
timingdict

{'ALIAS': 9098438.3, 'COMPARE': 9044573.299999999, 'SELECT': 27303934.0}
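
One way such a per-gate breakdown could be computed is sketched below. This is not the pysheep implementation of `timing_per_gate_type`, whose source is not shown here. `sum_timings_by_gate` is a hypothetical helper that assumes each gate line has the form `inputs... GATE output` and that the timings dictionary is keyed by output wire name, as the printed results suggest.

```python
from collections import defaultdict

HEADER_KEYWORDS = ("INPUTS", "CONST_INPUTS", "OUTPUTS")


def sum_timings_by_gate(timings, circuit_text):
    """Sum per-wire timings by the gate type that produced each wire."""
    gate_of_wire = {}
    for line in circuit_text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] in HEADER_KEYWORDS:
            continue  # skip blank lines and the header declarations
        # Assumed line layout: <inputs...> <GATE> <output wire>
        gate_of_wire[tokens[-1]] = tokens[-2]

    totals = defaultdict(float)
    for wire, elapsed in timings.items():
        gate = gate_of_wire.get(wire)
        if gate is not None:  # ignores keys such as 'encryption'
            totals[gate] += float(elapsed)
    return dict(totals)


example_circuit = """INPUTS a b
OUTPUTS d
 a b ADD c
 c a MULTIPLY d
"""
example_timings = {"c": "1.5", "d": "2.0", "evaluation": "3.5"}
print(sum_timings_by_gate(example_timings, example_circuit))
```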