## Mid-level benchmarks

### Private Information Retrieval (PIR)

One application of HE is to retrieve a data point from a database held by someone else, without the database holder learning which point was requested. If Alice wants to query Bob's database, she encrypts an array of zeros containing a single "1" at the position of the desired data point, and sends the encrypted array to Bob. Bob homomorphically multiplies each database entry by the corresponding encrypted selector value and adds up the products. Because only one selector is 1, the resulting ciphertext encrypts the requested entry. Bob sends it back to Alice, who decrypts it to obtain the data point she asked for.
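
The arithmetic Bob performs can be sketched in plain Python. This is a plaintext simulation only: nothing is encrypted, and pir_select is a hypothetical helper, not part of pysheep. In the real protocol each selector value would be a ciphertext, and the multiplications and additions would be homomorphic.

```python
def pir_select(database, selector):
    """Return sum(d_i * s_i), the inner product Bob computes.

    With a one-hot selector (a single 1, zeros elsewhere) every product
    except one is zero, so the sum equals the requested entry.
    """
    if len(database) != len(selector):
        raise ValueError("database and selector must have the same length")
    return sum(d * s for d, s in zip(database, selector))


database = [123, 456]
print(pir_select(database, [1, 0]))  # 123
print(pir_select(database, [0, 1]))  # 456
```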

To demonstrate this on a very small scale, let's look at a "database" containing only two values, 123 and 456. We start by importing the pysheep modules we need:

```python
import os
import sys

# Locate the pysheep package: use $SHEEP_HOME if it is set, otherwise ~/SHEEP/pysheep
if "SHEEP_HOME" in os.environ:
    SHEEP_HOME = os.environ["SHEEP_HOME"]
else:
    SHEEP_HOME = os.path.join(os.environ["HOME"], "SHEEP", "pysheep")
sys.path.append(SHEEP_HOME)

from pysheep.common.database import BenchmarkMeasurement, session, build_filter
from pysheep.benchmarks.mid_level_benchmarks import generate_pir_circuit, \
    generate_variance_circuit, generate_bitonic_sort_circuit, generate_gaussian_inputs
from pysheep.benchmarks import benchmark_utils
from pysheep.common import common_utils
from pysheep.interface import sheep_client
```

Next, let's set up the SHEEP client (this assumes the SHEEP server is already running). We choose the "HElib_Fp" context and 8-bit signed integers (int8_t) as the input type.

```python
sheep_client.new_job()
sheep_client.set_context("HElib_Fp")
sheep_client.set_input_type("int8_t")
```

Now let's generate the circuit file for this simple PIR case with two values in the "database":

```python
circuit_file = generate_pir_circuit(2, [2])
```

Let's give this circuit to the SHEEP server and see which inputs it expects:

```python
sheep_client.set_circuit(circuit_file)
sheep_client.get_inputs()
```

"d_a_b_c" are the "database" values and "s_x_y" are the "selector" values. We set d_0_0_0 and d_0_1_0 to 123 and 456 respectively. Then, to select d_0_0_0 (which should return 123) from the database, we set s_0_0 to 1 and s_0_1 to 0.

```python
sheep_client.set_inputs({"d_0_0_0": 123, "d_0_1_0": 456, "s_0_0": 1, "s_0_1": 0})
```

```python
sheep_client.run_job()
results = sheep_client.get_results()
```

```python
results
```


So we successfully got the output value "123", i.e. the first entry in the "database".

### PIR with a more complex circuit

```python
# Start a new job for the larger example, this time with the HElib_F2 context
sheep_client.new_job()
sheep_client.set_input_type("int8_t")
sheep_client.set_context("HElib_F2")
```

In practice we will usually want to query a database containing N > 2 items, and one might think that we then need to encrypt and send N "s" (selector) values to identify the data point we want. We can do better by arranging the data as a tree and letting the selector values dictate how we navigate it to reach the desired data point, so that each layer only needs one selector per branch.
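
The idea can be checked with a plaintext sketch (nothing is encrypted). It assumes, hypothetically, that each layer has its own one-hot selector vector and that the first layer chooses the most significant digit of the item's index; the circuit produced by generate_pir_circuit may be wired differently. The helper names selectors_for_index and tree_select are illustrative, not pysheep functions.

```python
from math import prod


def selectors_for_index(index, branching):
    """Return one one-hot selector list per layer that picks leaf `index`.

    Assumption of this sketch: layer 0 chooses the most significant
    mixed-radix digit of the index.
    """
    if not 0 <= index < prod(branching):
        raise ValueError("index out of range for this tree")
    digits = []
    for base in reversed(branching):
        index, digit = divmod(index, base)
        digits.append(digit)
    digits.reverse()
    return [[int(c == d) for c in range(base)]
            for d, base in zip(digits, branching)]


def tree_select(database, selectors):
    """Collapse the database one layer at a time using only * and +."""
    if len(database) != prod(len(s) for s in selectors):
        raise ValueError("database size must equal the product of branchings")
    level = list(database)
    for sel in reversed(selectors):  # deepest layer first
        base = len(sel)
        # Each group of `base` siblings is replaced by the selected sibling
        level = [sum(s * x for s, x in zip(sel, level[i:i + base]))
                 for i in range(0, len(level), base)]
    return level[0]


branching = [2, 2, 2, 2, 2]
database = list(range(32))
sel = selectors_for_index(31, branching)
print(tree_select(database, sel))  # 31
print(sum(len(s) for s in sel))    # 10
```

Here the database is stored in tree order, so the last leaf holds 31. The number of selector values grows with the sum of the branching factors (10 here) rather than their product (32).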

Let's generate a circuit file for a database containing 32 values, arranged in a tree with 5 layers and 2 choices per layer ($2^5 = 32$). This needs $5 \times 2 = 10$ selector values rather than 32.

```python
circuit_file = generate_pir_circuit(32, [2, 2, 2, 2, 2])
circuit_file
```

Let's now fill this database with the values 0 to 31:

```python
sheep_client.set_circuit(circuit_file)
inputlist = sheep_client.get_inputs()['content']
# Each database ("d_") input gets its position in the input list as its value
data = [(x, i) for (i, x) in enumerate(inputlist) if x.startswith('d_')]
```

To start with, we'll set all the "s" (selector) inputs to zero:

```python
# All selector ("s_") inputs start at 0
data += [(x, 0) for x in inputlist if x.startswith('s_')]
data = dict(data)
data
```

Now we need to set some selector variables to 1 to navigate down the tree. Suppose we want the last element of the database (it should hold the value 25 rather than 31, because the elements were filled with their index in alphabetical order). To reach it, we take the right-hand branch at every layer:

```python
# Choose branch 1 (the right-hand branch) at each of the 5 layers
data['s_0_1'] = 1
data['s_1_1'] = 1
data['s_2_1'] = 1
data['s_3_1'] = 1
data['s_4_1'] = 1
```

```python
sheep_client.set_inputs(data)
```

```python
sheep_client.run_job()
```

```python
results = sheep_client.get_results()
```

```python
results
```
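
The value 25 mentioned above comes from the order in which the "d_" inputs were numbered, not from the tree itself. One plausible reading, assuming (hypothetically) that the item indices appear in the input names as decimal strings and the names are sorted as text: lexicographic order is not numeric order, so the numerically last item lands at position 25. The real names returned by get_inputs() may be structured differently, so treat this as an illustration only.

```python
# Hypothetical illustration: the indices 0..31 written as decimal strings
# and sorted as text (lexicographically), not as numbers.
names = sorted(str(i) for i in range(32))

print(names[:6])          # ['0', '1', '10', '11', '12', '13']
print(names.index("31"))  # 25
```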

### Calculating mean and variance of a set of inputs

One may wish to calculate statistical properties of a set of encrypted inputs, such as the mean or standard deviation. The contexts currently implemented in SHEEP have no "Divide" operation, so these values cannot be computed exactly using homomorphic operations on the ciphertexts alone.

However, the client necessarily knows N, the number of inputs, so it can perform the division in the clear on the decrypted results of the homomorphic calculation.

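We therefore only need homomorphic addition and multiplication. Simply summing the inputs $x_i$ gives $N\bar{x}$. Each term $Nx_i - N\bar{x}$ can then be formed from an input scaled by the known constant $N$ and that sum, and summing the squares of these terms gives $N^3$ times the variance:

$$\sum_{i=1}^{N}\left(Nx_i - N\bar{x}\right)^2 = N^2\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2 = N^3\,\mathrm{Var}(x), \qquad \mathrm{Var}(x) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2,$$

where $\mathrm{Var}(x)$ is the population variance.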
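
A plaintext check of these identities (nothing is encrypted here) may help. The sketch computes the two quantities using only addition, subtraction and multiplication, then divides by $N$ and $N^3$ in the clear and compares the results with Python's statistics.mean and statistics.pvariance (population variance). The function name and the ten input values are illustrative, not taken from SHEEP.

```python
from fractions import Fraction
import statistics


def n_mean_and_n3_variance(xs):
    """Return (N * mean, N^3 * population variance) using only +, - and *.

    N is known in the clear, so scaling by N is multiplication by a
    constant; no division is needed at this stage.
    """
    n = len(xs)
    n_xbar = sum(xs)                                 # N * mean
    n3_var = sum((n * x - n_xbar) ** 2 for x in xs)  # N^3 * variance
    return n_xbar, n3_var


xs = [42, 57, 50, 61, 38, 49, 55, 47, 52, 44]  # illustrative integer inputs
n = len(xs)
n_xbar, n3_var = n_mean_and_n3_variance(xs)

# The only divisions happen afterwards, in the clear (exact with Fraction)
mean = Fraction(n_xbar, n)
variance = Fraction(n3_var, n ** 3)
print(float(mean), float(variance))
```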



```python
# Reset the SHEEP server settings; this time we'll use uint32_t and HElib_Fp
sheep_client.new_job()
sheep_client.set_input_type("uint32_t")
sheep_client.set_context("HElib_Fp")
```

Let's generate a circuit to calculate $N \times \text{mean}$ and $N^3 \times \text{variance}$ for a set of 10 inputs:

```python
num_inputs = 10
circuit_file = generate_variance_circuit(num_inputs)
sheep_client.set_circuit(circuit_file)
```

To generate the inputs, let's sample from a Gaussian with $\mu = 50$ and $\sigma = 10$, rounding each value to an integer:

```python
input_vals = generate_gaussian_inputs(num_inputs, 50, 10)
sheep_client.set_inputs(input_vals)
```

```python
sheep_client.run_job()
results = sheep_client.get_results()['content']
```

```python
results["outputs"]
```

In the clear, we can then divide these by $N$ and $N^3$ respectively:

```python
outputs = results["outputs"]
mean = float(outputs["Nxbar"]) / num_inputs
variance = float(outputs["varianceN3"]) / num_inputs ** 3
print(mean, variance)
```

### Bitonic sort

Sorting a list of inputs is another non-trivial operation that can be built from a combination of simple homomorphic operations, namely "Select" and "Compare".
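
Bitonic sort is a sorting network: its sequence of compare-and-swap steps is fixed in advance and does not depend on the data, which is what lets it be written as a circuit. The plaintext sketch below (nothing is encrypted, and the function names are illustrative rather than pysheep's) builds each compare-and-swap from a comparison bit and an arithmetic select, so no step branches on the values being sorted. It assumes the input length is a power of two, as in the 4-input circuit used here.

```python
def compare(a, b):
    """Return 1 if a > b else 0 (stands in for a homomorphic Compare)."""
    return int(a > b)


def select(bit, x, y):
    """Return x if bit == 1 else y, using only arithmetic (like Select)."""
    return bit * x + (1 - bit) * y


def compare_and_swap(values, i, j, ascending):
    """Order values[i] and values[j] without branching on their contents."""
    a, b = values[i], values[j]
    swap = compare(a, b) if ascending else compare(b, a)
    values[i] = select(swap, b, a)
    values[j] = select(swap, a, b)


def bitonic_sort(values):
    """Sort a power-of-two-length list in ascending order."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    values = list(values)
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            for i in range(n):
                partner = i ^ j
                # These tests depend only on positions, never on the data
                if partner > i:
                    compare_and_swap(values, i, partner, ascending=(i & k) == 0)
            j //= 2
        k *= 2
    return values


print(bitonic_sort([4, 8, 1, 3]))  # [1, 3, 4, 8]
```

Every comparison and select is evaluated whatever the inputs are, so with encrypted values the server never learns which pairs were swapped. The price is a fixed $O(n \log^2 n)$ comparisons.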


```python
sheep_client.new_job()
sheep_client.set_input_type("int8_t")
sheep_client.set_context("HElib_F2")
# Use the pre-generated bitonic sort circuit for 4 inputs
circuit_file = os.path.join(SHEEP_HOME, "benchmark_inputs", "mid_level", "circuits",
                            "circuit-bitonic-sort-4.sheep")
sheep_client.set_circuit(circuit_file)
```

What inputs does this circuit expect?

```python
sheep_client.get_inputs()
```

```python
sheep_client.set_inputs({"i0": 4, "i1": 8, "i2": 1, "i3": 3})
```

```python
sheep_client.run_job()
results = sheep_client.get_results()['content']
```

```python
results["outputs"]
```

We can also run the same circuit with the TFHE context and compare the outputs:

```python
sheep_client.set_context("TFHE")
sheep_client.run_job()
results_tfhe = sheep_client.get_results()['content']
```

```python
results_tfhe["outputs"]
```

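Benchmark measurements are stored in a database, so we can load the rows for the bitonic sort circuit into a pandas DataFrame:

```python
import pandas as pd

# Load all stored measurements for the bitonic-sort circuit
query = session.query(BenchmarkMeasurement).filter(
    BenchmarkMeasurement.circuit_name == "bitonic-sort")
all_rows = pd.read_sql(query.statement, session.bind)
```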

```python
all_rows[["context_name", "input_bitwidth", "circuit_name", "execution_time", "num_inputs"]]
```