HPMPC implements multiple MPC protocols and provides a high-level C++ interface to define functions and use cases. Out of the box, the framework supports computation in the boolean and arithmetic domain, mixed circuits, and fixed point arithmetic. Neural network models can be imported from PyTorch as part of PIGEON (Private Inference of Neural Networks).
More extensive documentation can be found here.
TLDR instructions can be found here.
You can use the provided Dockerfile or set up the project manually. The only dependency is OpenSSL. Neural networks and other functions with matrix operations also require the Eigen library.
# Install dependencies:
sudo apt install libssl-dev libeigen3-dev
# Run with Docker
docker build -t hpmpc .
# Run each command in a different terminal or on different machines
docker run -it --network host --cap-add=NET_ADMIN --name p0 hpmpc
docker run -it --network host --cap-add=NET_ADMIN --name p1 hpmpc
docker run -it --network host --cap-add=NET_ADMIN --name p2 hpmpc
docker run -it --network host --cap-add=NET_ADMIN --name p3 hpmpc
You can run the following commands to compile and execute a program with an MPC protocol locally.
# Compile executables for protocol Trio (5) for all parties and unit tests for basic primitives (function 54)
make -j PARTY=all FUNCTION_IDENTIFIER=54 PROTOCOL=5
# Run the MPC protocol locally
scripts/run.sh -p all -n 3 # Run three parties locally
After setting up the framework on each node, you can run the following commands to run the MPC protocol in a distributed setup. Replace `<party_id>` with, e.g., `0` to compile an executable for party 0.
make -j PARTY=<party_id>
# Run the MPC protocol on a distributed setup. For 2PC and 3PC protocols, the -c or -d flags are not required.
scripts/run.sh -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3>
GPU acceleration for matrix multiplication and convolutions requires an NVIDIA GPU, the NVCC compiler, and a copy of the CUTLASS library. To obtain your GPU architecture (sm_xx), refer to this overview.
# Dependencies for GPU acceleration
git clone https://github.com/NVIDIA/cutlass.git
# Compile standalone executable for GPU acceleration
cd core/cuda
# Replace with your GPU architecture, nvcc path, and CUTLASS path:
make -j arch=sm_89 CUDA_PATH=/usr/local/cuda CUTLASS_PATH=/home/user/cutlass
cd ../..
# Compile executables for protocol Quad (12) for all parties and unit tests for matrix multiplication (function 57) with GPU acceleration (USE_CUDA_GEMM=2)
make -j PARTY=all FUNCTION_IDENTIFIER=57 PROTOCOL=12 USE_CUDA_GEMM=2
SplitRoles compiles multiple executables per player to perform load balancing. A protocol can be run with SplitRoles using the following commands. More information on SplitRoles can be found in the section Scaling MPC to Billions of Gates per Second.
make -j PARTY=<party_id> SPLITROLES=1 # Compile multiple executables for a 3PC protocol with Split-Roles
scripts/run.sh -s 1 -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2>
SplitRoles supports multi-GPU setups. To run a protocol with multiple GPUs, you can run the following commands.
make -j USE_CUDA_GEMM=2 # USE_CUDA_GEMM 1/2/4 work as well
scripts/run.sh -p <party_id> -s 1 -g 6 # Utilize 6 GPUs for the computation
The framework uses a modular architecture with the following components.
| Software Component | Description |
|---|---|
| Core | Implements communication between parties, cryptographic primitives, and techniques for hardware acceleration. Uses Bitslicing, vectorization, GPU acceleration, and hardware instructions for cryptographic primitives to accelerate local computation required by the MPC protocols. |
| Protocols | Implements MPC protocols and protocol-specific primitives. Each protocol utilizes high-level operations provided by Core for commonly used operations such as sampling shared random numbers or exchanging messages. |
| Datatypes | Implements different datatypes that serve as a high-level interface to compute on MPC shares generically with overloaded operators. |
| Programs | Implements high-level functions, routines, and use cases using the custom datatypes. Implements several MPC-generic functions such as matrix multiplication and comparisons. |
| NN | Implements a templated neural network inference engine that performs the forward pass of a CNN by relying on high-level MPC-generic functions provided by Programs. Models and datasets can be exported from PyTorch. |
- New functions can be added to `programs/` by using the operations supported by `Datatypes` (a minimal sketch follows this list).
- New MPC protocols can be added to `protocols/` by using the networking and cryptographic utilities provided by `Core`.
- New neural network model architectures can be added to `nn/PIGEON/architectures/` by using our PyTorch-like interface to define model architectures.
- Model parameters and datasets can be exported from PyTorch using `nn/Pygeon/`.
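For illustration, a new function in `programs/` can express its computation directly with overloaded operators on shares. The following is a minimal, hypothetical sketch; the template parameter and function signature are assumptions for illustration, not the framework's exact API.

```cpp
// Minimal, hypothetical sketch of a function in programs/ computing on
// secret shares via overloaded operators. Names are illustrative only.
template <typename Share>
void weighted_sum_demo(Share x, Share y, Share z)
{
    // Each '*' and '+' dispatches to the protocol-specific MPC primitive.
    Share result = x * y + z;
    // ... reveal/output 'result' according to the protocol (omitted) ...
}
```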
The framework offers multiple tweaks to accelerate MPC computation. The following are the most important settings that can be adjusted by setting the respective flags when compiling with `make` or by permanently changing the entries in `config.h`.
| Configuration Type | Options | Description |
|---|---|---|
| Concurrency | `DATTYPE`, `PROCESS_NUM` | `DATTYPE` defines the register length to vectorize all integer and boolean variables to fully utilize the register. `PROCESS_NUM` sets the number of processes to use for parallel computation. |
| Hardware Acceleration | `RANDOM_ALGORITHM`, `USE_SSL_AES`, `ARM`, `USE_CUDA_GEMM` | Different approaches for efficiently implementing cryptographic primitives on various hardware architectures. Matrix operations can be accelerated using CUDA. |
| Tweaks | `SEND_BUFFER`, `RECV_BUFFER`, `VERIFY_BUFFER` | Setting buffer sizes for communication and SHA hashing to verify messages can accelerate workloads. The default settings should provide a good starting point for most workloads. |
| Preprocessing | `PRE` | Some protocols support a preprocessing phase that can be enabled to accelerate the online phase. |
| SplitRoles | `SPLITROLES` | With the `SPLITROLES` flag, the framework compiles n! executables for an n-PC protocol, each with a different player assignment. This allows load balancing the communication and computation between the nodes. `SPLITROLES=1` compiles all (6) executables for a 3PC protocol, `SPLITROLES=2` compiles all (24) executables for a 3PC protocol in a setting with four nodes, and `SPLITROLES=3` compiles all (24) executables for a 4PC protocol. |
For nodes equipped with a 32-core AVX-512 CPU and a CUDA-enabled GPU, the following example may compile an optimized executable for a distributed setup. Note that this example inherently vectorizes the computation `PROCESS_NUM x DATTYPE/BITLENGTH x SPLITROLES_FACTOR` times (here, with `BITLENGTH=32` and the SplitRoles factor of 24 for `SPLITROLES=3`: 32 x 512/32 x 24 = 12288 parallel instances).
make -j PARTY=<party_id> FUNCTION_IDENTIFIER=<function_id> PROTOCOL=12 DATTYPE=512 PROCESS_NUM=32 RANDOM_ALGORITHM=2 USE_SSL_AES=0 ARM=0 USE_CUDA_GEMM=2 SEND_BUFFER=10000 RECV_BUFFER=10000 VERIFY_BUFFER=1 PRE=1 SPLITROLES=3
Out of the box, the framework supports multiple MPC protocols. For some protocols, only basic primitives such as secret sharing, addition, and multiplication are currently implemented. Other protocols support additional primitives to fully support mixed circuits and fixed point arithmetic.
A protocol can be selected with the `PROTOCOL` flag when compiling.
| Protocol | Adversary Model | Preprocessing | Supported Primitives |
|---|---|---|---|
| `1` Sharemind (3PC) | Semi-Honest | ✘ | Basic |
| `2` Replicated (3PC) | Semi-Honest | ✘ | Basic |
| `3` ASTRA (3PC) | Semi-Honest | ✔ | Basic |
| `4` ABY2 Dummy (2PC) | Semi-Honest | ✔ | Basic |
| `5` Trio (3PC) | Semi-Honest | ✔ | All |
| `6` Trusted Third Party (3PC) | Semi-Honest | ✘ | All |
| `7` Trusted Third Party (4PC) | Semi-Honest | ✘ | All |
| `8` Tetrad (4PC) | Malicious | ✔ | Basic |
| `9` Fantastic Four (4PC) | Malicious | ✘ | Basic |
| `10` Quad (4PC) | Malicious | ✘ | All |
| `11` Quad: Het (4PC) | Malicious | ✔ | All |
| `12` Quad (4PC) | Malicious | ✔ | All |
Trio, ASTRA, Quad, ABY2, and Tetrad support a preprocessing phase. The preprocessing phase can be enabled in `config.h` or by setting `PRE=1` when compiling. Setting `PRE=0` interleaves the preprocessing and online phases.
New protocols can be added by implementing them in `protocols/` and adding a protocol ID to `protocols/Protocols.h`.
Out of the box, the framework provides multiple high-level functions that operate on Additive and Boolean shares. `programs/functions/` contains unit tests and benchmarks for these functions. An overview of which ID corresponds to which function can be found in `protocol_executer.hpp`.
In the following, we provide a brief overview of the functions that are currently implemented.
| Category | Functions |
|---|---|
| Basic Primitives | Secret Sharing, Reconstruction, Addition, Multiplication, Division, etc. |
| Fixed Point Arithmetic | Fixed Point Addition, Multiplication, Truncation, Division, etc. |
| Matrix Operations | Matrix Multiplication, Dot Product, etc. |
| Multi-input Operations | Multi-input Multiplication, Multi-input Scalar Products, etc. |
| Comparisons | EQZ, LTZ, MAX, MIN, Argmax, etc. |
| Use Cases (Benchmarking) | Set Intersection, Auction, AES, Logistic Regression, etc. |
| Neural Networks | Forward Pass of CNN/ResNet, ReLU, Softmax, Pooling, Batchnorm, etc. |
To implement a custom program, these functions can be used as building blocks. `programs/tutorials/` contains tutorials on how to use different functions. New functions can be added by first implementing the function in `programs/functions/` and then adding a `FUNCTION_IDENTIFIER` to `protocol_executer.hpp`; a minimal example follows below. The tutorial `programs/tutorials/YourFirstProgram.hpp` should get you started after following the other tutorials.
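Registering a new function might then look like the following snippet in `protocol_executer.hpp`. The identifier value `150` and the header path are placeholders, not identifiers reserved by the framework.

```cpp
// Hypothetical registration of a new FUNCTION_IDENTIFIER in protocol_executer.hpp.
// The value 150 and the header path are placeholders for your own function.
#elif FUNCTION_IDENTIFIER == 150
#include "programs/functions/your_function.hpp"
```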
Scaling MPC requires a high degree of parallelism to overcome network latency bottlenecks.
HPMPC's architecture is designed to utilize hardware resources proportionally to the degree of parallelism required by the MPC workload.
By increasing load balancing, register sizes, or number of processes, the framework executes multiple instances of the same function in parallel.
For instance, by setting `DATTYPE=512` and `PROCESS_NUM=32`, each arithmetic operation on 32-bit integers is executed 512 times in parallel by using 32 processes on 16 packed integers per register. Similarly, a boolean operation is executed 512x32=16384 times in parallel with 32 processes and 512-bit registers due to Bitslicing. For mixed circuits, HPMPC automatically groups blocks of arithmetic shares before share conversion to handle these different degrees of parallelism. The degree of parallelism for operations can be calculated as follows (boolean operations have a `BITLENGTH` of 1):

`PROCESS_NUM x DATTYPE/BITLENGTH x SPLITROLES_FACTOR`
The following examples illustrate the concept of parallelism in HPMPC.

- Setting `SPLITROLES=1`, `PROCESS_NUM=4`, and `DATTYPE=256` to compile a program computing 10 AES blocks (boolean circuit) will actually compute `6x4x256x10=61440` AES blocks in parallel by fully utilizing the available hardware resources.
- Setting `SPLITROLES=0`, `PROCESS_NUM=1`, and `DATTYPE=1` will compute 10 AES blocks on a single core without vectorization.
- Setting `SPLITROLES=1`, `PROCESS_NUM=4`, `DATTYPE=256`, and `NUM_INPUTS=1` to compile a program computing a single neural network inference (mixed circuit) will evaluate `6x4x256/32=192` samples in parallel, thus effectively using a batch size of 192.
HPMPC can execute bytecode generated by the MP-SPDZ compiler; most instructions of MP-SPDZ 0.3.8 are supported. Note that some MP-SPDZ programs may show significant performance improvements when executed with HPMPC, while others may show a performance decrease when workarounds are required to support MP-SPDZ bytecode with HPMPC functions.
- Install MP-SPDZ
- Required setup to run HP-MPC with MP-SPDZ as frontend
- Define the input used for computation
- Add/Run your own functions (.mpc) files using HP-MPC
You need to install MP-SPDZ 0.3.8 to compile your `<filename>.mpc` files.
wget https://github.com/data61/MP-SPDZ/releases/download/v0.3.8/mp-spdz-0.3.8.tar.xz
tar xvf mp-spdz-0.3.8.tar.xz
Some MP-SPDZ programs require PyTorch or NumPy. To install them, you can use `requirements.txt`:
pip install -r ./MP-SPDZ/requirements.txt
In the HPMPC main directory, create two directories in `MP-SPDZ/`: `Schedules` for the schedule files and `Bytecodes` for the respective bytecode files.
mkdir -p "./MP-SPDZ/Schedules" "./MP-SPDZ/Bytecodes"
In order to compile the `.mpc` files in `MP-SPDZ/Functions/`, assuming MP-SPDZ is installed at `$MPSPDZ`, copy the desired `<file>.mpc` into `$MPSPDZ/Programs/Source/` and compile it with the MP-SPDZ compiler using the bit length you intend to use.
cp "./MP-SPDZ/Functions/<file.mpc>" "$MPSPDZ"/Programs/Source/
- For arithmetic programs using Additive_Shares, use:
cd "$MPSPDZ" && ./compile.py -K LTZ,EQZ -R "<BITLENGTH>" "<file>"
where `BITLENGTH` is the integer bit length you want to use for the computation.
- For boolean programs using XOR_Shares, use:
cd "$MPSPDZ" && ./compile.py -K LTZ,EQZ -B "<bit-length>" "<file>"
where `<bit-length>` can be anything, except when operating on int types (`cint`, `int`), in which case `<bit-length>` must be <= 64.
NOTE:
- `-D/--dead-code-elimination` might decrease the size of the bytecode.
- `-O/--optimize-hard` might even slow down execution, as LTZ/EQZ are replaced by a bit-decomposition approach using random secret bits that are not yet properly supported.
- `--budget=<num> -l/--flow-optimization` prevents the compiler from completely unrolling every loop, which results in faster compilation and smaller bytecode but might slow down execution.
To execute the compiled MP-SPDZ programs with HPMPC, move them to the respective directories in `HPMPC`. For your own functions, you can use the filename `custom.sch` for easier setup.
mv "$MPSDZ/Programs/Schedules/*" "./MP-SPDZ/Schedules/"
mv "$MPSDZ/Programs/Bytecode/*" "./MP-SPDZ/Bytecodes/"
Make sure to use the correct `FUNCTION_IDENTIFIER` and `BITLENGTH`. The following example executes the `tutorial.mpc` file locally with `BITLENGTH=32`.
make -j PARTY=all PROTOCOL=5 FUNCTION_IDENTIFIER=500 BITLENGTH=32
scripts/run.sh -p all -n 3
We provide multiple example functions in `MP-SPDZ/Functions/`. Mappings of `.mpc` files to `FUNCTION_IDENTIFIER`s can be found in `programs/functions/mpspdz.hpp`. Note that many functions require specifying a number of operations when compiling the bytecode with the MP-SPDZ compiler or need input files to be present in `MP-SPDZ/Input/` when executing the program.
| `FUNCTION_IDENTIFIER` | `.mpc` file |
|---|---|
| `500` | `tutorial.mpc` |
| `501` | `custom.mpc` (can be used for your own functions) |
| `502` | `add.mpc` |
| `503` | `mul.mpc` |
| `504` | `mul_fix.mpc` (make sure that the precision is set correctly) |
| `505` | `int_test.mpc`/`int_test_32.mpc` (depending on `BITLENGTH` (64 or 32)); can be used to test public integer operations |
| `506`-`534` | Various functions used for benchmarks (see here). |
Input will be read from the files in `MP-SPDZ/Input/`:

- Public input will be read from `PUB-INPUT`.
- Private input will be read from `INPUT-P<player_number>-0-<vec>`, where:
  - `<player_number>` is the number associated with a specific player.
  - `<vec>` is always `0`, except for SIMD circuits, where:
    - it is between [`0`-`DATTYPE/BITLENGTH`], and
    - for all numbers between [`0`-`DATTYPE/BITLENGTH`] there must exist an input file (otherwise there are not enough numbers to store in a SIMD register).
An example of the formatting can be seen in `Input-P0-0-0`, which is used for:

- private input from party `0`,
- from the main thread (thread `0`),
- for the first number of the vectorization (`0`).
As with other `.mpc` files, copy the bytecode file and schedule file into the correct directories (`./MP-SPDZ/Schedules/` and `./MP-SPDZ/Bytecodes/`, respectively).
Make sure that for both MP-SPDZ and HPMPC you are using the same bitlength for compilation.
Rename the schedule file to `custom.sch` and compile with `FUNCTION_IDENTIFIER=501`:
mv "./MP-SPDZ/Schedules/<file>.sch" "./MP-SPDZ/Schedules/custom.sch"
make -j PARTY=<party_id> PROTOCOL=<protocol_id> FUNCTION_IDENTIFIER=501 BITLENGTH=<bit-length>
With `FUNCTION_IDENTIFIER` set to `501`, the virtual machine will search for a file `custom.sch` in `./MP-SPDZ/Schedules/`. NOTE: bytecode files do not have to be renamed, as their names are referenced in the respective schedule file.
All currently supported functions are listed in `programs/functions/mpspdz.hpp`; you will notice that the only thing that changes between them is the path of the `<schedule-file>`.
To add a new `FUNCTION_IDENTIFIER`:

- Create a new header file in `programs/`; you may use `programs/mp-spdz_interpreter_template.hpp` as a starting point.
- Choose a number `<your-num>` for the `FUNCTION_IDENTIFIER` and make sure it does not exist yet (see `protocol_executer.hpp`).
- Make sure that the correct header file is included in `protocol_executer.hpp`. You can do so by adding the following lines:

#elif FUNCTION_IDENTIFIER == <your-identifier>
#include "programs/<your header file>.hpp"
- Define the function for a given `FUNCTION_IDENTIFIER`: when using the template, make sure to replace the `FUNCTION_IDENTIFIER`, the function name, and the path to the `<schedule-file>`.
- Add the instruction and its opcode in `MP-SPDZ/lib/Constants.hpp` to the `IR::Opcode` enum class, and also to `IR::valid_opcodes`.
- To read the parameters from the bytecode file, add a case to the switch statement in the `IR::Program::load_program([...])` function in `MP-SPDZ/lib/Program.hpp`. You may use:
  - `read_int(fd)` to read a 32-bit integer,
  - `read_long(fd)` to read a 64-bit integer,
  - `fd` (std::ifstream) directly if more or fewer bytes are required (keep in mind the bytecode uses big-endian).
To add the parameters to the parameter list of the current instruction, you may use `inst.add_reg(<num>)`, where:

- `inst` is the current instruction (see the `Instruction` class),
- `<num>` is of type `int`,

or use `inst.add_immediate(<num>)` for a constant 64-bit integer that some instructions may require.
The program also expects this function to update the greatest compile-time address that the compiler tries to access, since the size of the registers is set only once and only a few instructions check whether the registers have enough memory (a combined sketch follows this list). Use:
- `update_max_reg(<type>, <address>, <opcode>)` to update the maximum register address, where:
  - `<type>` is the type of the register this instruction tries to access,
  - `<address>` is the maximum address the instruction tries to access,
  - `<opcode>` can be used for debugging.
- `m.update_max_mem(<type>, <address>)` to update the maximum memory address, where:
  - `<type>` is the type of the memory cell this instruction tries to access,
  - `<address>` is the maximum memory address the instruction tries to access.
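Putting these pieces together, a new case in `IR::Program::load_program([...])` might look roughly like the following sketch. The opcode name `MYADD`, the field order, and the register-type tag are assumptions for illustration; only `read_int`, `add_reg`, and `update_max_reg` are taken from the descriptions above.

```cpp
// Hypothetical case in IR::Program::load_program([...]) for a made-up
// instruction MYADD with one destination and two source registers.
case Opcode::MYADD: {
    const int dest = read_int(fd); // bytecode fields are big-endian 32-bit ints
    const int src1 = read_int(fd);
    const int src2 = read_int(fd);
    inst.add_reg(dest);
    inst.add_reg(src1);
    inst.add_reg(src2);
    // Make sure the secret register file is large enough at runtime;
    // the register-type tag used here is a placeholder.
    update_max_reg(Type::SINT, dest + 1, Opcode::MYADD);
    update_max_reg(Type::SINT, src1 + 1, Opcode::MYADD);
    update_max_reg(Type::SINT, src2 + 1, Opcode::MYADD);
    break;
}
```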
- To add functionality, add the opcode to the switch statement in `IR::Instruction::execute()` (`MP-SPDZ/lib/Program.hpp`); for more complex instructions, consider adding a new function to `IR::Program`. A hypothetical sketch follows this list.
- Registers can be accessed via `p.<type>_register[<address>]`, where `<type>` is:
  - `s` for secret `Additive_Share`s,
  - `c` for clear integers of length `BITLENGTH`,
  - `i` for 64-bit integers,
  - `sb` for boolean registers (one cell holds 64 `XOR_Share`s),
  - `cb` for clear bit registers, represented by 64-bit integers (one cell can hold 64 bits; may be vectorized with SIMD, but this is not guaranteed, depending on the `BITLENGTH`).
- Memory can be accessed via `m.<type>_mem[<address>]`, where `<type>` is the same as for registers, except that 64-bit integers use `ci` instead of `i` (I do not know why I did this).
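Executing the hypothetical `MYADD` from above in `IR::Instruction::execute()` could then look like this sketch; how operands are stored and accessed (`regs[...]`) is an assumption for illustration.

```cpp
// Hypothetical case in IR::Instruction::execute() for the made-up MYADD:
// destination = source1 + source2 on secret (s) registers.
case Opcode::MYADD:
    p.s_register[regs[0]] = p.s_register[regs[1]] + p.s_register[regs[2]];
    break;
```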
You may also look at this commit, which adds `INPUTPERSONAL` (`0xf5`) and `FIXINPUT` (`0xe8`).
You can use/change the clang-format file in MP-SPDZ/
clang-format --style=file:MP-SPDZ/.clang-format -i MP-SPDZ/lib/**/*.hpp MP-SPDZ/lib/**/*.cpp
PIGEON adds support for private inference of neural networks and introduces the following submodules to the framework.
- FlexNN: A templated neural network inference engine to perform the forward pass of a CNN.
- Pygeon: Python scripts for exporting models and datasets from PyTorch to the inference engine.
All protocols that are fully supported by HPMPC can be used with PIGEON. To get started with PIGEON, initialize the submodules to set up FlexNN and Pygeon.
git submodule update --init --recursive
A full end-to-end example can be executed as follows. To only benchmark the inference without real data, set `MODELOWNER` and `DATAOWNER` to `-1` and skip steps 1 and 5.
1. Use Pygeon to train a model in PyTorch and export its test labels, test images, and model parameters to `.bin` files using the provided scripts. Alternatively, download the provided pre-trained models.

cd nn/Pygeon
# Option 1: Train a model and export it to PyGEON
python main.py --action train --export_model --export_dataset --transform standard --model VGG16 --num_classes 10 --dataset_name CIFAR-10 --modelpath ./models/alexnet_cifar --num_epochs 30 --lr 0.01 --criterion CrossEntropyLoss --optimizer Adam
# Option 2: Download a pretrained VGG16 model and CIFAR10 dataset
python download_pretrained.py single_model datasets
# Option 3: Follow steps from PyGEON README to use pretrained PyTorch models on ImageNet
cd ../..
2. If it does not exist yet, add your model architecture to `nn/PIGEON/architectures/` (a hypothetical sketch follows these steps).
. -
If it does not exist yet, add a
FUNCTION_IDENTIFIER
for your model architecture and dataset dimensions inPrograms/functions/NN.hpp
. -
Specify the
MODELOWNER
andDATAOWNER
config options when compiling.# Example for MODELOWNER=P_0 and DATAOWNER=P_1 make -j PARTY=<party_id> FUNCTION_IDENTIFIER=<function_id> DATAOWNER=P_0 MODELOWNER=P_1
5. Specify the path of your model, images, and labels by exporting the environment variables `MODEL_DIR`, `DATA_DIR`, `MODEL_FILE`, `SAMPLES_FILE`, and `LABELS_FILE`.

# Set environment variables for the party holding the model parameters (adjust paths if needed)
export MODEL_DIR=nn/Pygeon/models/pretrained
export MODEL_FILE=vgg16_cifar_standard.bin
# Set environment variables for the party holding the dataset (adjust paths if needed)
export DATA_DIR=nn/Pygeon/data/datasets
export SAMPLES_FILE=CIFAR-10_standard_test_images.bin
export LABELS_FILE=CIFAR-10_standard_test_labels.bin
6. Run the program.

scripts/run.sh -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3>
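For step 2, the model architecture is defined with the PyTorch-like interface mentioned earlier. The sketch below is purely illustrative: all class names, layer types, and constructor parameters are assumptions rather than the actual PIGEON/FlexNN API; consult the existing files in `nn/PIGEON/architectures/` for the real interface.

```cpp
// Purely illustrative sketch of a CNN architecture definition in
// nn/PIGEON/architectures/. All names below are assumptions, not the
// actual PIGEON/FlexNN API; see the existing architectures for reference.
template <typename T>
class SmallCNN : public NeuralNetwork<T>
{
  public:
    explicit SmallCNN(int num_classes)
    {
        // Conv -> ReLU -> MaxPool -> Flatten -> Fully Connected
        this->add(new Conv2d<T>(/*in_channels=*/3, /*out_channels=*/16, /*kernel_size=*/3));
        this->add(new ReLU<T>());
        this->add(new MaxPool2d<T>(/*kernel_size=*/2));
        this->add(new Flatten<T>());
        this->add(new Linear<T>(/*in_features=*/16 * 15 * 15, /*out_features=*/num_classes));
    }
};
```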
PIGEON provides several options to modify the inference. The following are the most important settings that can be adjusted by setting the respective flags when compiling.
| Configuration Type | Options | Description |
|---|---|---|
| Bits | `BITLENGTH`, `FRACTIONAL` | The number of bits used for the total bitlength and the fractional part, respectively. |
| Truncation | `TRUNC_APPROACH`, `TRUNC_THEN_MULT`, `TRUNC_DELAYED` | There are multiple approaches to truncation. The default approach is to truncate probabilistically after each multiplication. The different approaches allow switching between several truncation strategies. |
| ReLU | `REDUCED_BITLENGTH_m`, `REDUCED_BITLENGTH_k` | ReLU can be evaluated probabilistically by reducing its bitwidth to save communication and computation. The default setting is to evaluate ReLU with the same bitwidth as the rest of the computation. |
| Secrecy | `PUBLIC_WEIGHTS`, `COMPUTE_ARGMAX` | The weights can be public or private. The final argmax computation may not be required if parties should learn the probabilities of each class. |
| Optimizations | `ONLINE_OPTIMIZED`, `BANDWIDTH_OPTIMIZED` | All layers requiring sign-bit extraction, such as ReLU, Maxpooling, and Argmax, can be evaluated with different types of adders. These have different trade-offs in terms of online/preprocessing communication as well as total round complexity and communication complexity. |
| Other Optimizations | `SPLITROLES`, `BUFFER_SIZE`, `VECTORIZE` | All default optimizations of HPMPC, such as `SPLITROLES`, different buffers, and vectorization, can be used with PIGEON. The parties automatically utilize the concurrency to perform inference on multiple independent samples from the dataset in parallel. To benchmark the inference without real data, `MODELOWNER` and `DATAOWNER` can be set to `-1`. |
To automate benchmarks and tests of various functions and protocols, users can define `.conf` files in the `measurements/configs` directory. The following is an example of a configuration file that runs a function with different numbers of inputs and different protocols.
PROTOCOL=8,9,12
NUM_INPUTS=10000,100000,1000000
FUNCTION_IDENTIFIER=1
DATTYPE=32
BITLENGTH=32
The `run_config.py` script compiles and executes all combinations specified in the `.conf` file. Outputs are stored as `.log` files in the `measurements/logs/` directory.
python3 measurements/run_config.py -p <party_id> measurements/configs/<config_file>.conf
Results in `.log` files can be parsed with the `measurements/parse_logs.py` script. The parsed result contains information such as communication, runtime, throughput, and, if applicable, the number of unit tests passed or the accuracy achieved.
python3 measurements/parse_logs.py measurements/logs/<log_file>.log
To simulate real-world network settings, you can specify a JSON file with network configurations. Examples based on real-world measurements can be found in `measurements/network_shaping`.
{
"name": "CMAN",
"latencies": [
[2.318, 1.244, 1.432],
[2.394, 1.088, 2.020],
[1.232, 1.091, 1.883],
[1.418, 2.054, 1.892]
],
"bandwidth": [
[137, 1532, 417],
[139, 1144, 312],
[1550, 1023, 602],
[444, 389, 609]
]
}
Each row in `latencies` and `bandwidth` corresponds to a party; the values are in milliseconds and Mbps, respectively. The third row, for instance, is parsed as party 2 having a latency of 1.232 ms to party 0, 1.091 ms to party 1, and 1.883 ms to party 3. The bandwidth is parsed in the same way.
To apply the latencies and bandwidths from a config file, run the following script.
./measurements/network_shaping/shape_network.sh -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3> -l 2 -f measurements/network_shaping/<config_file>.json
The `-l 2` flag divides the applied latencies by 2 to avoid the round-trip time between two parties being added twice. This option should be used for all provided JSON files and whenever the latencies were measured with the ping utility.
The resulting network shaping can be verified by running the following script on all nodes simultaneously. The script sends and receives data between all parties in parallel; its results may thus deviate from pair-wise measurements but may represent MPC communication more accurately. Note that some deviation between network shaping and verification is expected.
./scripts/measure_connection.sh -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3>
The framework utilizes different hardware acceleration techniques for a range of hardware architectures.
In case of timeouts, change the `BASE_PORT` or make sure that all previous executions have been terminated by executing `pkill -9 -f run-P` on all nodes.
In case of compile errors, please note the following requirements and supported bitlengths for different `DATTYPE` values.
| Register Size | Requirements | Supported BITLENGTH | Config Option |
|---|---|---|---|
| 512 | AVX512 | 16, 32, 64 | `DATTYPE=512` |
| 256 | AVX2 | 16, 32, (64 with AVX512) | `DATTYPE=256` |
| 128 | SSE | 16, 32, (64 with AVX512) | `DATTYPE=128` |
| 64 | None | 64 | `DATTYPE=64` |
| 32 | None | 32 | `DATTYPE=32` |
| 16 | None | 16 | `DATTYPE=16` |
| 8 | None | 8 (does not support all arithmetic instructions) | `DATTYPE=8` |
| 1 | None | 16, 32, 64 (use only for boolean circuits) | `DATTYPE=1` |
To benefit from Hardware Acceleration, the following config options are important.
| Config Option | Requirements | Description |
|---|---|---|
| `RANDOM_ALGORITHM=2` | AES-NI or VAES | Use the AES-NI or VAES instruction set for AES. If not available, set `USE_SSL_AES=1` or `RANDOM_ALGORITHM=1`. |
| `USE_CUDA_GEMM>0` | CUDA, CUTLASS | Use CUDA for matrix multiplications and convolutions. In case your CUDA-enabled GPU does not support datatypes such as UINT8, you can comment out the respective forward declarations in `core/cuda/conv_cutlass_int.cu` and `core/cuda/gemm_cutlass_int.cu`. |
| `ARM=1` | ARM CPU | For ARM CPUs, setting `ARM=1` may improve performance of SHA hashing. |
Internal g++ or clang errors might be fixed by updating the compiler to a newer version.
If reading input files fails, adding `-lstdc++fs` to the Makefile compile flags may resolve the issue.
If you encounter issues regarding the accuracy of neural network inference, the following options may increase accuracy.
- Increase the `BITLENGTH`.
- Increase or reduce the number of `FRACTIONAL` bits.
- Adjust the truncation strategy to `TRUNC_APPROACH=1` (Reduced Slack) or `TRUNC_APPROACH=2` (Exact Truncation), along with `TRUNC_THEN_MULT=1` and `TRUNC_DELAYED=1`. Note that truncation approaches 1 and 2 require setting `TRUNC_DELAYED=1`.
- Inspect the terminal output for any errors regarding reading the model or dataset. PIGEON uses dummy data or model parameters if the files are not found. Make sure that `MODELOWNER` and `DATAOWNER` are set during compilation and that the respective environment variables point to existing files.
sudo apt install libssl-dev libeigen3-dev
git submodule update --init --recursive
pip install torch torchvision gdown # if not already installed
python3 measurements/run_config.py measurements/configs/unit_tests/
python3 measurements/run_config.py measurements/configs/unit_tests/ -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3>
python3 measurements/parse_logs.py measurements/logs/ # results are stored as `.csv` in measurements/logs/
cd nn/Pygeon
python download_pretrained.py single_model datasets
export MODEL_DIR=nn/Pygeon/models/pretrained
export MODEL_FILE=vgg16_cifar_standard.bin
export DATA_DIR=nn/Pygeon/data/datasets
export SAMPLES_FILE=CIFAR-10_standard_test_images.bin
export LABELS_FILE=CIFAR-10_standard_test_labels.bin
cd ../..
make -j PARTY=all FUNCTION_IDENTIFIER=74 PROTOCOL=5 MODELOWNER=P_0 DATAOWNER=P_1 NUM_INPUTS=40 BITLENGTH=32 DATTYPE=32
scripts/run.sh -p all -n 3
make -j PARTY=<party_id> FUNCTION_IDENTIFIER=74 PROTOCOL=5 MODELOWNER=P_0 DATAOWNER=P_1
scripts/run.sh -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3>
Run an AND gate benchmark with different protocols and numbers of processes on a local/distributed setup
# use DATTYPE=256 or DATTYPE=128 or DATTYPE=64 for CPUs without AVX/SSE support.
#Local Setup
python3 measurements/run_config.py -p all measurements/configs/benchmarks/Multiprocessing.conf --override NUM_INPUTS=1000000 DATTYPE=512
#Distributed Setup, 3PC
python3 measurements/run_config.py -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3> measurements/configs/benchmarks/Multiprocessing.conf --override NUM_INPUTS=1000000 DATTYPE=512 PROTOCOL=1,2,3,5,6
#Distributed Setup, 4PC
python3 measurements/run_config.py -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3> measurements/configs/benchmarks/Multiprocessing.conf --override NUM_INPUTS=1000000 DATTYPE=512 PROTOCOL=7,8,9,10,11,12
Run the LeNet benchmark locally with SplitRoles
# use DATTYPE=256 or DATTYPE=128 or DATTYPE=64 for CPUs without AVX/SSE support.
# 3PC
python3 measurements/run_config.py -s 1 -p all measurements/configs/benchmarks/lenet.conf --override PROTOCOL=5 PROCESS_NUM=4
# 4PC
python3 measurements/run_config.py -s 3 -p all measurements/configs/benchmarks/lenet.conf --override PROTOCOL=12 PROCESS_NUM=1
Run various neural network models in a distributed setting on ImageNet with 3 iterations per run and SPLITROLES (Requires server-grade hardware)
# use DATTYPE=256 or DATTYPE=128 or DATTYPE=64 for CPUs without AVX/SSE support.
# 3PC
python3 measurements/run_config.py -s 1 -i 3 -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3> measurements/configs/benchmarks/imagenetmodels.conf --override PROTOCOL=5 PROCESS_NUM=4
# 4PC
python3 measurements/run_config.py -s 3 -i 3 -p <party_id> -a <ip_address_party_0> -b <ip_address_party_1> -c <ip_address_party_2> -d <ip_address_party_3> measurements/configs/benchmarks/imagenetmodels.conf --override PROTOCOL=12 PROCESS_NUM=12
python3 measurements/parse_logs.py measurements/logs/ # results are stored as `.csv` in measurements/logs/
Our framework utilizes the following third-party implementations.
- Architecture-specific headers for vectorization and Bitslicing adapted from USUBA, MIT LICENSE.
- AES-NI implementation adapted from AES-Brute-Force, Apache 2.0 LICENSE
- SHA-256 implementation adapted from SHA-Intrinsics, No License.
- CUDA GEMM and Convolution implementation adapted from Cutlass, LICENSE and Piranha, MIT LICENSE.
- Neural Network Inference engine adapted from SimpleNN, MIT LICENSE.