# ImPACT MPC - Secure Multiparty Computation

This Jupyter Notebook provides an example of how to use ImPACT's Secure Multiparty Computation (MPC) components. MPC allows groups of people to collaboratively evaluate algebraic functions, but without ever sharing their private data.

## What Secure Multiparty Computation (MPC) is

Imagine Alice, Bob, and Charlie are three researchers in Wake, Durham, and Orange Counties (respectively). These three people have determined how many bicycle accidents there are in each of their respective counties, and they would like to know what the sum total of bike wrecks is across the three-county Triangle area. The only problem is, each of the researchers considers their number to be private. Alice has no desire whatsoever to reveal the number of accidents she found in Wake County, while Charlie is similarly circumspect about Orange.

There are two potential solutions to this Social Science Standoff: agreeing on a Trusted Third Party (TTP), or using some Really Cool Math. The TTP has, historically, been the way this is done. All three researchers agree to trust someone and they share their individual numbers with that honest broker. The TTP, in turn, adds the numbers up and tells everyone what the sum is. This works as long as everyone can agree on a Trusted Third Party to work with and as long as that third party remains faithful to the duties of the position (i.e., "actually keeps a secret") and doesn't become compromised, hacked, or subpoenaed. Really Cool Math, on the other hand, is the basis for Multiparty Computation (MPC) and lets the three researchers create a Virtual Trusted Third Party (VTTP) that is immune to some of the weaknesses of the traditional TTP approach.

## How to use MPC

Fundamentally, MPC is a process for a group of people (A, B, C...) to evaluate a function _func_ (A, B, C...). Everyone will be able to tell what the function evaluates to, but each party will only know what their own input was. Person A will only know the value A, person B will only know the value of B, and so forth.
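
To give a feel for the Really Cool Math, here is a toy sketch of additive secret sharing, the basic idea behind summing private inputs. It is not how SPDZ/2 itself is implemented (the real protocol adds authentication and precomputed data). The modulus `P`, the helper names `share` and `reconstruct`, and the input values are all assumptions made for illustration.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus (an assumption, not SPDZ/2's parameter)

def share(value, n_parties=3):
    """Split value into n_parties random-looking shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recombine shares; only the sum of all of them means anything."""
    return sum(shares) % P

# Hypothetical private inputs for Alice, Bob, and Charlie.
inputs = [120, 45, 30]

# Each researcher splits their input and hands one share to each party.
dealt = [share(x) for x in inputs]

# Party i adds up the i-th share it received from every researcher.
partial_sums = [sum(column) % P for column in zip(*dealt)]

# Publishing the partial sums reveals the total, but no individual input.
total = reconstruct(partial_sums)
print(total)  # 195
```

Any single share is just a random number, so no party learns another party's input from the shares it holds.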

ImPACT (currently) uses the SPDZ/2 software from the Bristol Cryptography Research Group as the MPC "engine" for doing the computation. SPDZ/2 is not easy to install and it's not all that obvious how to use it. ImPACT helps on both fronts - the prepackaged Amazon Machine Image (AMI) makes it simple to have a running instance of the software and the documentation (including this notebook) gives plenty of examples.

### Step 1: Write your function

The very first step in MPC is deciding what function you want to evaluate. In this example, we'll take the numbers from our three intrepid researchers and we'll just add them together. The function is written in a subset of Python, and it's a pretty small subset at that. You can't have any loops, the only data types available are numbers and booleans (numbers can be integers or floating point numbers, so that helps), and you have to know exactly how many inputs you'll have. Here's the code to sum our three numbers:

```
a = sint.get_input_from(0)
b = sint.get_input_from(1)
c = sint.get_input_from(2)

d = a+b+c
print_ln('##################### Result is %s', d.reveal())
```

This function ("tripleadd.mpc") takes three numbers (one each from parties 0, 1, and 2), adds them up, and prints the final result. There's nothing unexpected here, but know that we could have thrown some curveballs. For instance, we could say that Alice and Charlie (parties 0 and 2) will have their numbers taken at face value while Bob has his number doubled before it's added. It's perfectly OK to treat inputs differently (for instance, one researcher could know the number of drownings while another researcher knows the number of boats, and we could compute the drownings per boat without ever revealing either of the raw inputs).
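
When experimenting with variations like these, it can help to write the same arithmetic as an ordinary Python function first and try it on made-up numbers. The sketch below is plain Python, not SPDZ/2 code; the function names and sample values are hypothetical.

```python
def tripleadd(a, b, c):
    """Cleartext model of tripleadd.mpc: the plain sum of three inputs."""
    return a + b + c

def tripleadd_bob_doubled(a, b, c):
    """Variant in which party 1's (Bob's) input counts twice."""
    return a + 2 * b + c

def drownings_per_boat(drownings, boats):
    """Ratio of two inputs held by different researchers."""
    return drownings / boats

print(tripleadd(10, 20, 30))              # 60
print(tripleadd_bob_doubled(10, 20, 30))  # 80
print(drownings_per_boat(3, 1200))        # 0.0025
```

Knowing what the answer should be for test inputs makes it much easier to tell whether a later MPC run did what you meant.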

Use Jupyter to take a look at tripleadd.mpc :  https://ec2-18-218-33-24.us-east-2.compute.amazonaws.com:4991/edit/Programs/Source/tripleadd.mpc

### Step 2: Compile your function

With your function written, you now have to compile it so it can be run. If you're used to Old School programming then you probably know that compilation is the process of converting programs from their "human readable" form into something that the computer knows how to run but that looks, to the eye, like an absolutely random spattering of pure chaos. If all you've ever used is Python then know that Python hides this step from you, but it's there anyway.

To compile the function:




In [1]:
%%bash
# change to the SPDZ directory
cd /home/ec2-user/SPDZ-2
# compile the function. DO NOT INCLUDE THE ".mpc" FILE EXTENSION
#./compile.py tripleadd
./compile.py addition


Compiling program in /home/ec2-user/SPDZ-2/Programs
Default bit length: 32
Default security parameter: 40
Galois length: 40
Compiling file /home/ec2-user/SPDZ-2/Programs/Source/addition.mpc
Compiling basic block addition-0--0
Processing tape addition-0 with 1 blocks
Processing basic block addition-0--0, 0/1, 18 instructions
Program requires 1 rounds of communication
Program requires 1 invocations
Tape register usage: defaultdict(<function <lambda> at 0x7f81bb4a6758>, {'ci': 0, 'sg': 0, 'c': 1, 'cg': 0, 's': 3})
modp: 1 clear, 3 secret
GF2N: 0 clear, 0 secret
Re-allocating...
Compile offline data requirements...
Tape requires 1 inputs in modp from player 1, 1 inputs in modp from player 0
Tape requires prime bit length 0
Tape requires galois bit length 0
Program requires: {('modp', 'input', 1): 1, ('modp', 'input', 0): 1}
Cost: 0
Memory size: defaultdict(<function <lambda> at 0x7f81bb4a6668>, {'ci': 8192, 'sg': 8192, 'c': 8192, 'cg': 8192, 's': 8192})
Compiling basic block addition-0-mem

---
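...and that compiled the function. (As written, the cell compiles `addition`; to compile the three-party example, uncomment the `./compile.py tripleadd` line instead.) The compiler saves its results as a schedule file (.sch) and a bytecode file (.bc). The compiled code is _almost_ ready to be run now...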
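
A quick way to confirm the compile step worked is to check that the output files exist. The sketch below builds the expected paths from the file names that appear in the run log later in this notebook (`Programs/Schedules/<name>.sch` and `Programs/Bytecode/<name>-0.bc`). The helper name `compiled_artifacts` is hypothetical, and the layout may well differ in other SPDZ/2 versions.

```python
from pathlib import Path

def compiled_artifacts(spdz_dir, program):
    """Return the schedule and bytecode paths expected for a compiled program."""
    root = Path(spdz_dir) / "Programs"
    return [
        root / "Schedules" / f"{program}.sch",
        root / "Bytecode" / f"{program}-0.bc",
    ]

for path in compiled_artifacts("/home/ec2-user/SPDZ-2", "tripleadd"):
    # exists() is simply False on a machine without SPDZ/2 installed
    print(path, "found" if path.exists() else "missing")
```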

### Step 3: Your (collaborators') cooperation is requested

There are a few coordinating steps that need to be done now.

#### Step 3.1 - Share the function
Share the function among all of the researchers and have each one of them compile it.

#### Step 3.2 - Get IPs
Get the IP address of everyone's server - this can either be names ("BobLabMachine.med.unc.edu") or numeric IP addresses ("152.2.31.249").



#### Step 3.3 - Assign party numbers
1. Decide among yourselves who is Party 0, 1, and 2. Remember how SPDZ/2 is capable of handling inputs differently depending on which party they came from? That's why we have to decide who is who, even if everyone is treated exactly the same. And importantly, SPDZ/2 will only print the final output value to Party 0. That's implementation dependent, by the way, and we have a way to share the results securely.
1. Put the IP addresses of the parties in /home/ec2-user/parties , and put them in the file in order from Party 0 down through the last party.
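
As a minimal sketch, and assuming the parties file is simply one address per line in party order, it could be written from Python as shown below. The example writes to a temporary directory rather than /home/ec2-user/parties, the helper name `write_parties_file` is hypothetical, and the addresses are the ones that appear in the run log in Step 4.

```python
import tempfile
from pathlib import Path

def write_parties_file(path, addresses):
    """Write one address per line; line 1 is Party 0, line 2 is Party 1, and so on."""
    Path(path).write_text("\n".join(addresses) + "\n")

# Order matters: index 0 is Party 0, index 1 is Party 1, index 2 is Party 2.
parties = ["172.31.34.83", "172.31.46.182", "172.31.44.75"]

demo_path = Path(tempfile.mkdtemp()) / "parties"
write_parties_file(demo_path, parties)
print(demo_path.read_text())
```

Every researcher needs the same file with the same ordering, since the position in the file is what defines the party number.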



#### Step 3.4 - Create Encryption "Keys"
SPDZ/2, and in fact any MPC technique, relies on combining the secret data with carefully constructed "other information" in a way that the final output function can still be evaluated but that individual parties can't read each other's inputs. This "other information" is an encryption key of a special kind, and different parts of that key are distributed to different parties. The mechanics of how all this happens are Really Cool, but suffice it to say that it takes a huge amount of computation to produce these keys and that it's done ahead of time.

Pedantically speaking, MPC doesn't use encryption, it uses "secret sharing". The distinction is subtle, and for our purposes moot. I mention it only because parts of the system will refer to "triples" instead of "keys", and because we're in the academic world and pedantry is what we do...
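
For the curious, the "triples" are what let the parties multiply secret values. The toy sketch below shows the standard Beaver-triple trick on additive shares. It leaves out the authentication (MAC) checks that SPDZ/2 performs, it lets a single function play dealer instead of generating the triple jointly, and it uses an illustrative modulus; all names are assumptions.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus (an assumption, not SPDZ/2's parameter)

def share(value, n_parties=3):
    """Split value into additive shares mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Open a shared value by summing all of its shares."""
    return sum(shares) % P

def make_triple(n_parties=3):
    """Precompute shares of random a, b and of c = a*b (the expensive offline step)."""
    a = secrets.randbelow(P)
    b = secrets.randbelow(P)
    return share(a, n_parties), share(b, n_parties), share(a * b % P, n_parties)

def beaver_multiply(x_shares, y_shares, triple):
    """Return shares of x*y, consuming one precomputed triple."""
    a_shares, b_shares, c_shares = triple
    # The parties open d = x - a and e = y - b; a and b are random, so these leak nothing.
    d = reconstruct([(x - a) % P for x, a in zip(x_shares, a_shares)])
    e = reconstruct([(y - b) % P for y, b in zip(y_shares, b_shares)])
    # Each party computes its share locally from the public d, e and its triple shares.
    z_shares = [(c + d * b + e * a) % P
                for a, b, c in zip(a_shares, b_shares, c_shares)]
    # The public constant d*e is added by exactly one party.
    z_shares[0] = (z_shares[0] + d * e) % P
    return z_shares

product_shares = beaver_multiply(share(6), share(7), make_triple())
print(reconstruct(product_shares))  # 42
```

Each multiplication uses up one triple, which is why the triples are generated ahead of time and in bulk.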

In normal ImPACT MPC use, each researcher would go to a Python notebook that calls out to the SPDZ key generator to initiate the key generation process. It takes about 30 minutes. For demonstration and learning purposes, there is already a pre-generated set of keys installed. You should _never_ use these for real data, but it's fine to use them while you're just playing around and learning about the system.




### Step 4: Perform the computation
Finally, the good part. We've selected who will be Party 0, Party 1, and so forth. We've got their computers' IP addresses sorted, and we've created encryption keys. We're now ready to add up some numbers!

First, let's put our number into this notebook. Each party is responsible for deciding what their individual number is (as always - if it's a collaboration, everyone has to do some of the work, right?) and keeping up with it. Maybe you do a database query and decide there are 1729 bike crashes in your county. Maybe a whole ton of statistics completes and you decide that 1729 was the optimum year, in the pre-colonial period at least, to grow pumpkins in Maryland. The sky is the limit here, along with your funding agency and what you can get past your department head, but the point is to come up with the number you'll use as input into the function (tripleadd, remember?).

Run one of the following two cells - either this first one which isn't much to look at, or the second one showing how to use some other Really Cool Work (tm) at Renci.

In [2]:
# Through some incredibly complicated machinations, compute the secret result that makes up your
# contribution to the collaboration. 

mySecretNumber = 1729

Or, as alluded to above, use the following cells to run an example of using ICEES, an interface to the NIH-sponsored Data Translator work at Renci for exploratory data analysis.

In [58]:
import urllib.parse
import urllib.request
import ssl
import json

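# ICEES endpoint for defining a cohort from the 2010 patient data
url = 'https://icees.renci.org/1.0.0/patient/2010/cohort'
# Cohort definition: average daily PM2.5 exposure greater than 1
requestData = '{"AvgDailyPM2.5Exposure":{"operator":">", "value":1}}'
requestBytes = bytes(requestData, "utf-8")
# Supplying a request body makes urllib send a POST
req = urllib.request.Request(url, requestBytes)

# Certificate checking is switched off here for the demo; don't do this with real data
myssl = ssl.create_default_context()
myssl.check_hostname = False
myssl.verify_mode = ssl.CERT_NONE

# Send the request and keep the raw response bytes
with urllib.request.urlopen(req, context=myssl) as response:
    httpResult = response.read()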

In [59]:
print("First, here's the 'raw' data that came back from the ICEES server. It's a JSON formatted string.")

print("")
print (httpResult)
print ("")

First, here's the 'raw' data that came back from the ICEES server. It's a JSON formatted string.

b'{"version": "1.0.0", "return value": {"cohort_id": "COHORT:56", "size": 23093}, "terms and conditions": "The Translator Integrated Clinical and Environmental Exposures Service (ICEES) is providing you with Data that have been de-identified in accordance with 45 C.F.R. \\u00a7\\u00a7 164.514(a) and (b) and that UNC Health Care System (UNCHCS) is permitted to provide under 45 C.F.R. \\u00a7 164.502(d)(2). Recipient agrees to notify UNCHCS via NC TraCS in the event that Recipient receives any identifiable data in error and to take such measures to return the identifiable data and/or destroy it at the direction of UNCHCS.\\n\\nRestrictions on Recipient\\u2019s Use of Data. Recipient further agrees to use the data exclusively for the purposes and functionalities provided by the ICEES: cohort discovery; feature-rich cohort discovery; hypothesis-driven queries; and exploratory queries. Recipien

In [60]:
iceesResults = json.loads(httpResult)["return value"]
cohortId = iceesResults["cohort_id"]
cohortCount = iceesResults["size"]

print ("In cohort " + cohortId + " there were " + str(cohortCount) + " individuals.")
mySecretNumber = cohortCount

In cohort COHORT:56 there were 23093 individuals.


Then specify your party number and the name of the function to evaluate:

In [4]:
myPartyNumber = 0
#mpcFunction = "tripleadd"
mpcFunction = "addition"

Now, we'll use a "Jupyter magic" (that's its real name!) to run a shell command from inside this notebook. The idea is that we'll use mySecretNumber as the "standard input" (what we normally would type at the keyboard) to the shell command and we'll capture the "standard output" and "standard error" (stuff usually printed by the command) into variables where we can look at them.

In [6]:
%%bash -s $mySecretNumber $myPartyNumber $mpcFunction --out mpcOut --err mpcErr
cd /home/ec2-user/SPDZ-2
echo $1 | /home/ec2-user/SPDZ-2/Player-Online.x -ip /home/ec2-user/parties $2 $3
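
Outside a notebook, the same pattern (feed a value on standard input, capture standard output and standard error) can be written with Python's `subprocess` module. The sketch below substitutes a harmless stand-in command so that it runs anywhere; the helper name `run_with_input` is hypothetical, and the commented-out list shows how the `Player-Online.x` call from the cell above would presumably be spelled.

```python
import subprocess
import sys

def run_with_input(command, stdin_text):
    """Run command, send stdin_text to it, and return (stdout, stderr)."""
    result = subprocess.run(command, input=stdin_text,
                            capture_output=True, text=True)
    return result.stdout, result.stderr

# Stand-in for the MPC player: doubles the number it reads from stdin.
stand_in = [sys.executable, "-c", "print(int(input()) * 2)"]

# On the ImPACT instance the command would presumably be (an assumption):
# ["/home/ec2-user/SPDZ-2/Player-Online.x", "-ip", "/home/ec2-user/parties",
#  "0", "tripleadd"], run from the /home/ec2-user/SPDZ-2 directory.

mySecretNumber = 1729
mpcOut, mpcErr = run_with_input(stand_in, f"{mySecretNumber}\n")
print(mpcOut)
```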

Now we can print the output from the command. In our case, we wrote our function so that it printed the output with some nice formatting. It's a convenience for debugging - if we have a complicated formula then we can print all sorts of intermediate results to see what is going on. Just because the function can't have any loops doesn't mean it can't be fairly complicated. If you wanted to do a Taylor series expansion, for instance, you'd have to "unroll" the loop and type in enough terms by hand to come close enough to converging. "No loops" winds up meaning "I know ahead of time exactly how many times I'm going to do this". This is just like, coincidentally, the SAFE component of ImPACT.
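
To make "unrolling" concrete, here is a plain-Python sketch of a Taylor approximation of e^x with exactly four terms written out by hand. It is ordinary Python rather than an .mpc program, and the function name is hypothetical; the point is only that the number of operations is fixed in advance.

```python
import math

def exp_taylor_4_terms(x):
    """Approximate e**x with the first four Taylor terms, written out without a loop."""
    term0 = 1.0
    term1 = x
    term2 = x * x / 2.0
    term3 = x * x * x / 6.0
    return term0 + term1 + term2 + term3

approx = exp_taylor_4_terms(0.5)
print(approx, math.exp(0.5))
```

More terms mean a closer approximation, but every extra term is something you type in yourself.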

In [7]:
print (mpcOut)

##################### Result is 3458
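
If you want the result as a number rather than a line of text, a small helper can pull it out of the captured output. This is a sketch that assumes the output format produced by the `print_ln` call in our function; the helper name `parse_result` is hypothetical.

```python
import re

def parse_result(output):
    """Return the integer following 'Result is' in the program output, or None."""
    match = re.search(r"Result is (-?\d+)", output)
    return int(match.group(1)) if match else None

sample_output = "##################### Result is 3458\n"
print(parse_result(sample_output))  # 3458
```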



While my own code never produces errors, yours theoretically could. The standard error messages are available in the "mpcErr" variable. There will always be a lot of messages, even for a successful run, and any errors aren't always obvious.

In [57]:
print (mpcErr)

Got list of 3 players from file: 
    172.31.34.83
    172.31.46.182
    172.31.44.75
ServerSocket is bound on port 5000
loading params from: Player-Data/3-128-40/Params-Data
MAC Key p = 9713710702708426579646134761687417105
MAC Key 2 = 0x821139c82b
Opening file Programs/Schedules/tripleadd.sch
Number of threads I will run in parallel = 1
Number of program sequences I need to load = 1
Loading program 0 from Programs/Bytecode/tripleadd-0.bc
tripleadd-0 needs more secret gf2n memory, resizing to 8192
tripleadd-0 needs more clear gf2n memory, resizing to 8192
tripleadd-0 needs more secret gfp memory, resizing to 8192
tripleadd-0 needs more clear gfp memory, resizing to 8192
tripleadd-0 needs more clear integer memory, resizing to 8192
Cost of first tape:
  Type p
             0 =          0        Triples à           0
             0 =          0        Squares à           0
             0 =          0           Bits à           0
             0 =          0       Inverses à           0
 

## In Conclusion
We've seen a very simple way to use Secure Multiparty Computation to enable collaboration without sharing data. There is a good deal more to ImPACT's use of MPC and encryption methods than we've covered here. In particular, there is an art to writing efficient functions with all kinds of dirty tricks. Avoiding loops is mandatory, but there is "syntactic sugar" for faking them. Speed goes down linearly with the number of parties, but there is a "client-server" mode that handles huge numbers of users quite well. The algorithm supports "cheater detection" - we've set it up for a 66% chance of detecting malfeasance, but that can be set arbitrarily high (99.999...%). There are even times when you should use a 1-element array instead of just a variable. For much more information, see the SPDZ/2 documentation itself.
