Source code of the PProx proxy and experimentation scripts from the Middleware'21 paper: PProx: Efficient Privacy for Recommendation-as-a-Service. If you use this code, please cite the paper:
Rosinosky, G., Da Silva, S., Ben Mokhtar, S., Négru, D., Réveillère, L., & Rivière, E. (2021, November). PProx: efficient privacy for recommendation-as-a-service. In Proceedings of the 22nd International Middleware Conference (pp. 14-26).
ACM Middleware 2021 presentation video: https://www.youtube.com/watch?v=WTJfoH6jLd4
We present PProx, a system preventing recommendation-as-a-service (RaaS) providers from accessing sensitive data about the users of applications leveraging their services. PProx does not impact recommendation accuracy, is compatible with arbitrary recommendation algorithms, and has minimal deployment requirements. Its design combines two proxying layers running directly inside SGX enclaves on the RaaS provider side. These layers transparently pseudonymize users and items and hide the links between the two, and PProx privacy guarantees are robust even to the corruption of one of these enclaves. We integrated PProx with the Harness recommendation engine and evaluated it on a 27-node cluster. Our results indicate its ability to withstand a high number of requests with low end-to-end latency, scaling out horizontally to match increasing recommendation workloads.
In this repository, we provide the PProx source code and helper scripts for deployment on Docker Compose and on Kubernetes using Helm charts. We also provide load-testing scripts for tests with Harness and the Universal Recommender as the recommendation system (macro-benchmarks), and with a "stub" installation consisting of an Nginx server returning a fixed recommendation result (micro-benchmarks).
The repository is organized as follows:
- PrivateRecSys-proxy/: source code of PProx
- harness-recommender/: Dockerfile and source code of helper scripts used for Harness/Universal Recommender initialization
- stub/: Dockerfile and source code of the Nginx based stub, stub proxy and stub client
- manager-scripts/: initialization scripts and Jupyter notebooks for load tests
- charts/: Helm charts for the deployment of Harness/Universal Recommender coupled to PProx, stub coupled to PProx, and injector used for load testing
- paper/plots: results datasets and generation scripts
- ManagerKubernetesExperimentations/: chart installing the modules required to orchestrate the load tests and to log and store their results (Jupyter, Fluentd, MongoDB)
PProx has been tested on Kubernetes v1.15.9-beta.0. You need a working cluster with SGX-capable nodes for PProx to work. See details-install.md for more details on how to install an on-premises Kubernetes cluster using Kubespray.
Load tests can be run against a Harness installation, or against a "stub" installation that uses Nginx to emulate a recommendation system. We provide Helm charts for deploying the different tested modules and for executing the load-testing scenarios.
In the following, we assume that the user relies on an on-premises Kubernetes cluster (in the current state, only hostPath-based volumes are supported by the experiment orchestration scripts) and has a dedicated "manager" node, used for orchestration and for storing results. The user should have SSH access to this node and use it for all the operations below.
The stub stack is used for micro-benchmarks to emulate a recommendation system with a web server. It is available in the charts/stub-recsys directory and is composed of the following sub-charts:
Component | Component version | Chart | Chart version | Source code | Origin |
---|---|---|---|---|---|
proxy-sgx | - | proxy-sgx | - | charts/stub-recsys/charts/proxy-sgx | |
stub-recsys | - | stub-recsys | - | charts/stub-recsys/charts/stub-recsys | |
The Harness/Universal Recommender chart is available in the charts/private-recsys directory and is composed of the following sub-charts:
Component | Component version | Chart | Chart version | Source code | Origin |
---|---|---|---|---|---|
Mongo Database | 4.0.12-debian-9-r7 | mongodb | 5.16.4 | https://github.com/helm/charts/tree/master/stable/mongodb | https://charts.helm.sh/stable |
ElasticSearch | 5.6.16 | elasticsearch | 7.1.1 | https://github.com/elastic/helm-charts/tree/master/elasticsearch | https://helm.elastic.co |
Spark | 2.3.3 | actionml/spark | latest | charts/private-recsys/charts/spark | |
harness | develop-2020-03-11 | harness | - | charts/private-recsys/charts/harness | |
proxy-sgx | - | proxy-sgx | - | charts/private-recsys/charts/proxy-sgx | |
For both configurations (stub and Harness), an injector chart based on the loadtest 3.0.8 library is used to emulate users at multiple levels of requests per second (RPS).
To orchestrate experiments, we use a management stack deployed on the manager node:
Component | Component version | Chart | Chart version | Source code | Origin |
---|---|---|---|---|---|
fluentd | v1.7-debian-1 | fluentd | - | ManagerKubernetesExperimentations/chart/charts/fluentd | |
jupyter | latest | jupyter | - | ManagerKubernetesExperimentations/chart/charts/jupyter | |
Mongo Database | 4.0.12-debian-9-r7 | mongodb | 4.10.1 | https://github.com/helm/charts/tree/master/stable/mongodb | https://charts.helm.sh/stable |
mongo-express | 0.49 | mongo-express | - | ManagerKubernetesExperimentations/chart/charts/mongo-express | |
Each SGX node must have the SGX driver installed. Run the following command and reboot the node:
curl -fsSL https://raw.githubusercontent.com/SconeDocs/SH/master/install_sgx_driver.sh | bash
# sudo service aesmd stop  # seems not needed anymore
If you use Ansible (as with the Kubespray setup described in the last part), you can use the following one-liner to install the driver on every node (you can replace all with a comma-separated list of nodes):
ansible -i inventory/cluster_nuc/hosts.yml -m shell -a 'curl -fsSL https://raw.githubusercontent.com/SconeDocs/SH/master/install_sgx_driver.sh | bash' all
# ansible -i inventory/cluster_nuc/hosts.yml -m shell -a 'sudo service aesmd stop' --become -K all  # seems not needed anymore
Helper initialization scripts are located in manager-scripts/init. Change into this directory.
We need the cluster to be aware of SGX resources. We use a device-plugin DaemonSet developed by Vaucher et al., available at https://github.com/sebva/sgx-device-plugin. Deploy it on the cluster from the manager node using:
kubectl apply -f ./sgx_deviceplugin.yaml
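Optionally, you can check that SGX resources are now advertised by the nodes. This is only a quick sanity check; the exact resource name depends on the device plugin (typically an intel.com/sgx extended resource):

# Quick sanity check: look for SGX-related entries in the node descriptions.
kubectl describe nodes | grep -i sgx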
Still from the manager node, you now need to set up node labelling: each node is given a role (proxy, injector, manager, etc.) for each module needed by the experiments. Please carefully prepare a script using this template and execute it (an example sketch of such a script is given after the list below). The following labels should be defined:
- manager (1 node, preferably with SSD)
- pprox (an even number of nodes, with SGX support)
- harness (used for Harness and stub modules, any needed number of nodes)
- mongo (1 node, preferably with SSD)
- elastic (1, 3, 5 or 7 nodes, preferably with SSD)
- injector (any needed number of nodes)
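The following is a minimal sketch of what such a labelling script could look like. The node names and the exact label keys/values are assumptions; refer to the template in manager-scripts/init for the authoritative scheme:

# Sketch only: adapt node names and label keys/values to your cluster and to the template.
kubectl label nodes node-0 role=manager
kubectl label nodes node-1 node-2 role=pprox
kubectl label nodes node-3 role=harness
kubectl label nodes node-4 role=mongo
kubectl label nodes node-5 role=elastic
kubectl label nodes node-6 role=injector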
Appropriate directories for the Mongo database used for experiment results should be set in manager-scripts/init/persistentVolume-mongoxp.yaml, and the location of the source code repository should be set in manager-scripts/init/persistentVolume-jupyter.yaml. These are set to /home/ubuntu/xp/volumes/mongo and /home/ubuntu/xp/PrivateRecSys respectively in the current files.
The mongo directory should be created beforehand with 777 permissions (with chmod -R). Every script directory that the user intends to modify through Jupyter should be set to 777 as well, or modifications will not be possible (more specifically the work/manager-scripts/test and work/manager-scripts/xp directories used for experiment launches).
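For instance, assuming the default paths mentioned above (adapt them to your own setup), the directories can be prepared as follows:

# Assumes the default locations mentioned above; adjust paths as needed.
mkdir -p /home/ubuntu/xp/volumes/mongo
chmod -R 777 /home/ubuntu/xp/volumes/mongo
chmod -R 777 /home/ubuntu/xp/PrivateRecSys/manager-scripts/test /home/ubuntu/xp/PrivateRecSys/manager-scripts/xp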
Now, let us install all the manager dependencies. The only needed parameter is the address of the manager's Ingress (it will be the entry point for accessing the manager's Mongo and Jupyter). Modify values.yaml and set hostAddress to a valid DNS name pointing to the manager node (by default, 10-0-0-33.nip.io). Note that in our case we use the online nip.io service, which maps such a DNS name to a local IP address.
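As a quick check (assuming, for example, that the manager node's IP is 10.0.0.33), you can verify that the nip.io name resolves to that address:

# nip.io resolves dash-separated IP names back to the IP itself.
dig +short 10-0-0-33.nip.io   # expected output: 10.0.0.33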
Launch the following scripts (sudo might help):
./init-cluster.sh
./init-manager.sh
A registry available on localhost:32000 (using kube-registry-proxy) should have been deployed by the initialization scripts. You can now build all Docker images and push them (by default to the Docker repository):
docker-compose -f ./docker-compose-build.yaml build
docker-compose -f ./docker-compose-build.yaml push
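Optionally, you can list the images present in the local registry through the standard Docker Registry HTTP API v2 (this assumes the registry is reachable on localhost:32000 from the node where you run the command):

# List the repositories stored in the local registry.
curl -s http://localhost:32000/v2/_catalog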
For tests with Harness, you need to download the MovieLens dataset that will be used for the experiments. A helper script is available for this purpose:
cd manager-scripts/datasets
python3 ./generate-dataset-json.py
Wait for a few minutes for the dataset to be generated.
You can now access the different modules using a browser and the fixed Ingress DNS entry (replace the 10-0-0-33 address with your own).
Two notebooks are provided for launching load-test scenarios: one for Harness (manager-scripts/test/debug-harness-install.ipynb) and one for the stub (manager-scripts/test/debug-stub-install.ipynb). They can be found in Jupyter under the work/manager-scripts/test directory.
Both launch notebooks are organized the same way: global parameter settings, helper functions, and experiment launches.
The user should set hostAddress in the first cell to the actual DNS name of the manager node (the same one used to access Jupyter, i.e. 10-0-0-33.nip.io).
Initialization cells should then be launched sequentially, and the desired test cells should be launched manually.
Note: for the Harness scenario, please use 3 replicas for Elastic, as the version used fails to deploy successfully about half of the time with 1 replica or an even number of replicas.
In each experiment launch cell, the parameters are organized as maps of maps and are used to generate the Helm install commands. Sub-chart parameters are generated by replacing "!" with ".". Loops over the parameters make it possible to chain multiple experiments.
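Purely as an illustration (the release name used here is an assumption; the notebooks build the actual command), a parameter map such as {"proxy-sgx": {"replicaCount": "2", "enabledEncryption": "1"}} would translate into a Helm invocation of the form:

# Illustrative Helm command generated from a map of maps ("!" replaced by ".").
helm install my-release ./charts/private-recsys \
  --set proxy-sgx.replicaCount=2 \
  --set proxy-sgx.enabledEncryption=1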
For more details, please check the corresponding charts.
For each launched experiment, an additional entry containing all the parameters used is added to the experiment metadata collection of the privaterecsys database in the experiment Mongo database.
The metadata entry contains the following fields:

Parameter | Role |
---|---|
stub | "1" if the SUT (System Under Test) is the stub, "0" if it is Harness |
start_time | Start timestamp, used together with name to name experiments |
params_recsys | PProx and SUT parameter map |
params_injector | Injector parameter map |
name | Name of the experiment |
status | Initial status of the experiment |
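To inspect the metadata of the most recent experiment, you can for example query the collection with the mongo shell (connection host, port, and credentials are not shown and depend on your deployment):

# Example query against the experiment metadata collection (adapt connection options).
mongo privaterecsys --eval 'db.experiment.find().sort({start_time: -1}).limit(1).pretty()'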
The params_recsys map accepts the following parameters:

Parameter | Role |
---|---|
hostAddress | Manager name |
harness!replicaCount | Number of Harness or stub replicas |
proxy-sgx!replicaCount | Number of PProx replicas per layer |
proxy-sgx!proxyType1 | First proxy layer ("1" = UA, "2" = IA) |
proxy-sgx!proxyType2 | Second proxy layer ("1" = UA, "2" = IA) |
proxy-sgx!targetPort | Port of the SUT |
proxy-sgx!enabledEncryption | Encryption enabled in the proxies |
proxy-sgx!bufferShuffling | Size of the shuffling buffer (1 = disabled) |
proxy-sgx!responseTimeout | Delay before PProx sends a response anyway |
proxy-sgx!debug | PProx debug mode |
proxy-sgx!image!repository | Image used (can be localhost:32000/sgx-proxy or localhost:32000/sgx-proxy-nosgx) |
name | Name of the experiment |
timeout | Chart deployment timeout |
Harness-specific parameters | |
elastic!replicas | Number of Elastic replicas |
elastic!esJavaOpts | Java options; set -Xmx and -Xms to the maximum memory available on the Elastic nodes |
elastic!resources!requests!cpu | Elastic CPU requests |
elastic!resources!requests!memory | Elastic memory requests |
elastic!resources!limits!cpu | Elastic CPU limits |
elastic!resources!limits!memory | Elastic memory limits |
mongo!replicaSet!enabled | Use of a replica set in Mongo; please set to "false" |
mongo!clusterDomain | Name of the cluster (set to your cluster name) |
The params_injector map accepts the following parameters:

Parameter | Role |
---|---|
harness | "true" if the SUT is Harness, "false" if it is the stub |
harnessReplicas | Number of Harness replicas |
duration | Duration of the injection |
clients | Number of parallel clients (equivalent to the injected RPS) |
replicaCount | Number of injectors |
targetHost | Target host (can be PProx or the SUT) |
concurrency | Concurrency (see the loadtest documentation for more information) |
timeout | Chart deployment timeout |
asyncFluentd | "true" if results are sent to Fluentd during the experiment, "false" if the injector waits until the end before sending them. Set to "false" in all experiments. |
enabledEncryption | Encrypted entries injected if "1", raw entries if "0". Use together with proxy-sgx!enabledEncryption |
randomShift | Not used |
name | Name of the experiment |
Load-testing results are aggregated in the private-recsys database in Mongo, and then archived to another database (the name depends on the test, see the code for details).
A statistics notebook is available at manager-scripts/test/debug-stats.ipynb. It reads from a target database (private-recsys by default) and produces some general statistics and plots.
The experiments run for the Middleware 2021 paper are available in the work/manager-scripts/xp directory. You can find there the following notebooks, which follow the same organization and parameter maps as the install test notebooks:
- micro1-9.ipynb for the microbenchmark (stub) load test launch scripts
- macro10-12.ipynb for the macrobenchmark (Harness) load test launch scripts
The statistics notebook can also be used with the results of these experiments; the corresponding Mongo database and the desired parameters should be set.
An export notebook, raw-latencies/export-raw-RTT.ipynb, can be used to generate CSV files from the experiment results. These CSV files can then be used with the Gnuplot scripts in paper/plots/ (for instance paper/plots/only_harness_cut/plot_all.sh).
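For example, after exporting the CSV files, the Harness plots can be regenerated from the corresponding plot directory (assuming the exported CSVs are placed where the script expects them):

# Regenerate the paper plots from the exported CSV files.
cd paper/plots/only_harness_cut
./plot_all.sh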