# Secure Data Disclosure: Server side

This notebook showcases how a data owner could set up the server and make their data available to certain users. It explains the different steps required.

# Start the server

## Create a docker volume
The first step is to create a Docker volume for MongoDB, which will hold all the "admin" data of the server. Docker volumes are persistent storage spaces that are managed by Docker and can be mounted in containers. This must be done only once. The server itself uses bind mounts, so no volume needs to be created for it.

In a terminal, run `docker volume create mongodata`. The command should print `mongodata`.
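
The same step could also be scripted with the Docker SDK for Python (the `docker` package used later in this notebook). The following is only a minimal sketch: `ensure_volume` is a hypothetical helper name, and it assumes `client` is a `docker.DockerClient` connected to a running Docker daemon.

```python
def ensure_volume(client, name="mongodata"):
    """Create the named Docker volume unless it already exists.

    `client` is assumed to be a docker.DockerClient.
    Returns True if the volume was created, False if it was already there.
    """
    existing = {volume.name for volume in client.volumes.list()}
    if name in existing:
        return False
    client.volumes.create(name=name)
    return True

# Usage (requires a running Docker daemon):
#   import docker
#   ensure_volume(docker.DockerClient())
```

Checking the existing volumes first makes it easy to tell whether this is the first run or a later one.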

## Start server
The second step is to start the server. To do so, the config file `configs/example_config.yaml` has to be adapted. The data owner must make sure to set the develop mode to False and to specify the database type and ports. For this notebook, we keep the defaults and use a MongoDB on port 27017. Note: if the configuration file is modified, the `docker-compose` file has to be modified accordingly. This is out of scope for this notebook.

In a terminal, run `docker compose up`. This will start the server and the MongoDB, each running in its own Docker container. In addition, it will also start a client session container for demonstration purposes (more on that later).

To check that all containers are indeed running, run `docker ps`. You should be able to see a container for the server (`lomas_server_dev`), for the client (`lomas_client_dev`) and one for the mongo database (`mongodb`).
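
The same check could be done programmatically once the Docker SDK for Python, introduced in the next section, is available. This is a hedged sketch rather than part of the original setup: `missing_containers` is a hypothetical helper, `client` is assumed to be a `docker.DockerClient`, and the container names are the ones listed above.

```python
EXPECTED_CONTAINERS = ("lomas_server_dev", "lomas_client_dev", "mongodb")

def missing_containers(client, expected=EXPECTED_CONTAINERS):
    """Return the expected container names that are not currently running."""
    # containers.list() returns only running containers by default.
    running = {container.name for container in client.containers.list()}
    return [name for name in expected if name not in running]

# Usage (requires a running Docker daemon):
#   import docker
#   print(missing_containers(docker.DockerClient()))
```

An empty list means that all three containers are up.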

## Access the server to administrate the mongoDB

To interact with the mongoDB, we first need to access the server Docker container from where we will run the commands. To do that from inside this Jupyter Notebook, we will need to use the Docker client library. Let's first install it.

In [1]:
!pip install docker



We can now import the library, create the client allowing us to interact with Docker, and finally, access the server container.

In [2]:
import docker
client = docker.DockerClient()
server_container = client.containers.get("lomas_server_dev")

To execute commands inside that Docker container, you can use the `exec_run` method, which returns an `ExecResult` object from which you can retrieve the output of the command, as in the following example:

In [3]:
response = server_container.exec_run("ls")
print(response.output.decode('utf-8'))

__init__.py
admin_database
app.py
constants.py
dataset_store
dp_queries
mongodb_admin.py
private_dataset
setup.py
utils
uvicorn_serve.py
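
Besides `output`, the `ExecResult` returned by `exec_run` also carries an `exit_code`. A minimal sketch of a wrapper that fails loudly when a command does not succeed could look as follows; `exec_checked` is a hypothetical name and is not used in the rest of this notebook.

```python
def exec_checked(container, command):
    """Run `command` in `container` and return its decoded output.

    Raises RuntimeError if the command exits with a non-zero status.
    """
    result = container.exec_run(command)
    output = result.output.decode("utf-8")
    if result.exit_code != 0:
        raise RuntimeError(f"{command!r} exited with {result.exit_code}: {output}")
    return output

# Usage with the container obtained above:
#   print(exec_checked(server_container, "ls"))
```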



Now, you are ready to interact with the database and add users.

# Prepare the database

## Visualise all options
You can visualise all the options offered by the database by running the command `python mongodb_admin.py --help`. We will go through each of them in the rest of the notebook.

We prepare the function `run` to have a cleaner output of the commands in the notebook.

In [4]:
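from ast import literal_eval

def run(command, to_dict=False):
    """Run a command in the server container, then print or parse its output."""
    response = server_container.exec_run(command)
    # Decode the raw bytes and replace single quotes with double quotes.
    output = response.output.decode('utf-8').replace("'", '"')
    # With to_dict=True, parse non-empty output as a Python literal and return it.
    if to_dict and output:
        return literal_eval(output)
    # Otherwise print the text (the function then returns None).
    print(output)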
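
The helper parses with `ast.literal_eval` because the admin script prints Python literals such as `True` and `False`, which a JSON parser does not accept. The short sketch below illustrates the difference on a sample string written for this example (it mimics a fragment of the metadata printed later in this notebook).

```python
import json
from ast import literal_eval

# Sample text in the style of the admin script output (illustrative only).
sample = '{"row_privacy": True, "censor_dims": False, "max_ids": 1}'

# literal_eval understands Python literals, including True and False.
parsed = literal_eval(sample)

# json.loads expects lowercase true/false and rejects the same text.
try:
    json.loads(sample)
    json_accepts_it = True
except json.JSONDecodeError:
    json_accepts_it = False
```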

In [5]:
run("python mongodb_admin.py --help")

usage: mongodb_admin.py [-h]
                        {add_user,add_user_with_budget,del_user,add_dataset_to_user,del_dataset_to_user,set_budget_field,set_may_query,show_user,create_users_collection,add_dataset,add_datasets,drop_collection,show_collection}
                        ...

MongoDB administration script for the database

options:
  -h, --help            show this help message and exit

subcommands:
  {add_user,add_user_with_budget,del_user,add_dataset_to_user,del_dataset_to_user,set_budget_field,set_may_query,show_user,create_users_collection,add_dataset,add_datasets,drop_collection,show_collection}
                        user database administration operations
    add_user            add user to users collection
    add_user_with_budget
                        add user with budget to users collection
    del_user            delete user from users collection
    add_dataset_to_user
                        add dataset with initialized budget values for a user
    del_dataset_

And finally, let's delete all existing data from the database to start clean:

In [6]:
run("python mongodb_admin.py drop_collection --collection datasets")
run("python mongodb_admin.py drop_collection --collection metadata")
run("python mongodb_admin.py drop_collection --collection users")

Deleted collection datasets.

Deleted collection metadata.

Deleted collection users.



## Datasets (add and drop)

We first need to set the dataset meta-information. For each dataset, two pieces of information are required:
- the type of database in which the dataset is stored
- a path to the metadata of the dataset (stored as a yaml file).

Metadata are required to later run queries on a dataset. This secure server expects the metadata to follow the [SmartnoiseSQL dictionary format](https://docs.smartnoise.org/sql/metadata.html#dictionary-format), which describes, among other things, all the available columns, their types and their bound values (see the Smartnoise page for more details). The metadata is also expected to be stored in a `yaml` file.

This information (dataset name, dataset type and metadata path) is stored in the `datasets` collection. Then, for each dataset, its metadata is fetched from its `yaml` file and stored in a collection named `metadata`.

We then check that there is indeed no data in the dataset and metadata collections yet:

In [7]:
run("python mongodb_admin.py show_collection --collection datasets", to_dict = True)

[]

In [8]:
run("python mongodb_admin.py show_collection --collection metadata", to_dict = True)

[]

We can add **one dataset** with its name, database type and path to its metadata file:

In [9]:
run("python mongodb_admin.py add_dataset -d PENGUIN -db REMOTE_HTTP_DB -db_url https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv -mp ../data/collections/metadata/penguin_metadata.yaml -m_db LOCAL_DB")

Added dataset PENGUIN with database REMOTE_HTTP_DB and associated metadata.



We can now see the dataset and metadata collections with the PENGUIN dataset:

In [10]:
run("python mongodb_admin.py show_collection --collection datasets", to_dict = True)

[{'dataset_name': 'PENGUIN',
  'database_type': 'REMOTE_HTTP_DB',
  'dataset_url': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv'}]

In [11]:
run("python mongodb_admin.py show_collection --collection metadata", to_dict = True)

[{'PENGUIN': {'': {'Schema': {'Table': {'max_ids': 1,
      'row_privacy': True,
      'censor_dims': False,
      'species': {'type': 'string',
       'cardinality': 3,
       'categories': ['Adelie', 'Chinstrap', 'Gentoo']},
      'island': {'type': 'string',
       'cardinality': 3,
       'categories': ['Torgersen', 'Biscoe', 'Dream']},
      'bill_length_mm': {'type': 'float', 'lower': 30.0, 'upper': 65.0},
      'bill_depth_mm': {'type': 'float', 'lower': 13.0, 'upper': 23.0},
      'flipper_length_mm': {'type': 'float', 'lower': 150.0, 'upper': 250.0},
      'body_mass_g': {'type': 'float', 'lower': 2000.0, 'upper': 7000.0},
      'sex': {'type': 'string',
       'cardinality': 2,
       'categories': ['MALE', 'FEMALE']}}}},
   'engine': 'csv'}}]
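
To make the nesting of a metadata document easier to read, the sketch below extracts the bounds of the numeric columns. The dictionary is a shortened copy of the output above, and `numeric_bounds` is a hypothetical helper, not part of the server code.

```python
# Shortened copy of the PENGUIN metadata document shown above.
penguin_metadata = {
    "PENGUIN": {
        "": {
            "Schema": {
                "Table": {
                    "max_ids": 1,
                    "row_privacy": True,
                    "censor_dims": False,
                    "species": {
                        "type": "string",
                        "cardinality": 3,
                        "categories": ["Adelie", "Chinstrap", "Gentoo"],
                    },
                    "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
                    "body_mass_g": {"type": "float", "lower": 2000.0, "upper": 7000.0},
                }
            }
        },
        "engine": "csv",
    }
}

def numeric_bounds(metadata, dataset_name):
    """Map each float column of a dataset to its (lower, upper) bounds."""
    table = metadata[dataset_name][""]["Schema"]["Table"]
    return {
        name: (spec["lower"], spec["upper"])
        for name, spec in table.items()
        # Table-level settings such as max_ids are plain values, not dicts.
        if isinstance(spec, dict) and spec.get("type") == "float"
    }
```

For the shortened document, `numeric_bounds(penguin_metadata, "PENGUIN")` returns the bounds of `bill_length_mm` and `body_mass_g`.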

Alternatively, we can provide a path to a yaml file that contains all this information, to add **multiple datasets** in one command:

In [12]:
run("python mongodb_admin.py add_datasets --path ../data/collections/dataset_collection.yaml -c")

Cleaning done. 

Added datasets collection from yaml at ../data/collections/dataset_collection.yaml. 
Added metadata of IRIS dataset. 
Added metadata of PENGUIN dataset. 
Added metadata of TITANIC dataset. 
Added metadata of FSO_INCOME_SYNTHETIC dataset. 



The argument *-c* or *--clean* allows you to clear the current dataset collection before adding your collection.

By default, *add_datasets* only adds the new datasets found in the collection provided.

In [13]:
run("python mongodb_admin.py add_datasets --path ../data/collections/dataset_collection.yaml")

Metadata already exist. Use the command -om to overwrite with new values.
Metadata already exist. Use the command -om to overwrite with new values.
Metadata already exist. Use the command -om to overwrite with new values.
Metadata already exist. Use the command -om to overwrite with new values.



Arguments:

*-od* / *--overwrite_datasets*: Overwrite the values for **existing datasets** with the values provided in the yaml.

*-om* / *--overwrite_metadata*: Overwrite the values for **existing metadata** with the values provided in the yaml.

In [14]:
# Add new datasets/metadata, update existing datasets
run("python mongodb_admin.py add_datasets --path ../data/collections/dataset_collection.yaml -od")

Existing datasets updated with valuesfrom yaml at ../data/collections/dataset_collection.yaml. 
Metadata already exist. Use the command -om to overwrite with new values.
Metadata already exist. Use the command -om to overwrite with new values.
Metadata already exist. Use the command -om to overwrite with new values.
Metadata already exist. Use the command -om to overwrite with new values.



In [15]:
# Add new datasets/metadata, update existing metadata
run("python mongodb_admin.py add_datasets --path ../data/collections/dataset_collection.yaml -om")

Metadata updated for dataset : IRIS.
Metadata updated for dataset : PENGUIN.
Metadata updated for dataset : TITANIC.
Metadata updated for dataset : FSO_INCOME_SYNTHETIC.



In [16]:
# Add new datasets/metadata, update existing datasets & metadata
run("python mongodb_admin.py add_datasets --path ../data/collections/dataset_collection.yaml -od -om")

Existing datasets updated with valuesfrom yaml at ../data/collections/dataset_collection.yaml. 
Metadata updated for dataset : IRIS.
Metadata updated for dataset : PENGUIN.
Metadata updated for dataset : TITANIC.
Metadata updated for dataset : FSO_INCOME_SYNTHETIC.
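
The flag combinations used above can also be assembled programmatically. The sketch below is one possible way to do it; `add_datasets_command` is a hypothetical helper that only builds the command string from the flags documented in this section.

```python
import shlex

def add_datasets_command(path, clean=False, overwrite_datasets=False,
                         overwrite_metadata=False):
    """Build the add_datasets command line for the chosen options."""
    parts = ["python", "mongodb_admin.py", "add_datasets", "--path", path]
    if clean:
        parts.append("-c")   # clear the current collection first
    if overwrite_datasets:
        parts.append("-od")  # overwrite existing datasets
    if overwrite_metadata:
        parts.append("-om")  # overwrite existing metadata
    # shlex.join quotes any argument that contains spaces or special characters.
    return shlex.join(parts)

command = add_datasets_command(
    "../data/collections/dataset_collection.yaml",
    overwrite_datasets=True,
    overwrite_metadata=True,
)
```

The resulting string can then be passed to the `run` helper defined earlier.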



Let's see the whole datasets collection:

In [17]:
run("python mongodb_admin.py show_collection --collection 'datasets'", to_dict = True)

[{'dataset_name': 'PENGUIN',
  'database_type': 'REMOTE_HTTP_DB',
  'dataset_url': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv'}]

Finally, let's have a look at the stored metadata:

In [18]:
run("python mongodb_admin.py show_collection --collection metadata", to_dict=True)

[{'PENGUIN': {'': {'Schema': {'Table': {'max_ids': 1,
      'row_privacy': True,
      'censor_dims': False,
      'species': {'type': 'string',
       'cardinality': 3,
       'categories': ['Adelie', 'Chinstrap', 'Gentoo']},
      'island': {'type': 'string',
       'cardinality': 3,
       'categories': ['Torgersen', 'Biscoe', 'Dream']},
      'bill_length_mm': {'type': 'float', 'lower': 30.0, 'upper': 65.0},
      'bill_depth_mm': {'type': 'float', 'lower': 13.0, 'upper': 23.0},
      'flipper_length_mm': {'type': 'float', 'lower': 150.0, 'upper': 250.0},
      'body_mass_g': {'type': 'float', 'lower': 2000.0, 'upper': 7000.0},
      'sex': {'type': 'string',
       'cardinality': 2,
       'categories': ['MALE', 'FEMALE']}}}},
   'engine': 'csv'}}]

## Users

### Add user
Let's see which users are already loaded:

In [19]:
run("python mongodb_admin.py show_collection --collection users", to_dict=True)

[]

And now let's add a few users.

In [20]:
run("python mongodb_admin.py add_user_with_budget --user 'Mrs. Daisy' --dataset 'IRIS' --epsilon 10.0 --delta 0.001")

Added access to user Mrs. Daisy with dataset IRIS, budget epsilon 10.0 and delta 0.001.



In [21]:
run("python mongodb_admin.py add_user_with_budget --user 'Mr. Coldheart' --dataset 'PENGUIN' --epsilon 10.0 --delta 0.001")

Added access to user Mr. Coldheart with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.



In [22]:
run("python mongodb_admin.py add_user_with_budget --user 'Lord McFreeze' --dataset 'PENGUIN' --epsilon 10.0 --delta 0.001")

Added access to user Lord McFreeze with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.



Users must all have different names, otherwise an error is raised and nothing is done:

In [23]:
run("python mongodb_admin.py add_user_with_budget --user 'Lord McFreeze' --dataset 'IRIS' --epsilon 10.0 --delta 0.001")

Traceback (most recent call last):
  File "/code/mongodb_admin.py", line 691, in <module>
    args.func(args)
  File "/code/mongodb_admin.py", line 16, in wrap_function
    return function(db, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/mongodb_admin.py", line 48, in add_user_with_budget
    raise ValueError("Cannot add user because already exists. ")
ValueError: Cannot add user because already exists. 



If you want to give an existing user access to another dataset, use the `add_dataset_to_user` command.

In [24]:
run("python mongodb_admin.py add_dataset_to_user --user 'Lord McFreeze' --dataset 'IRIS' --epsilon 5.0 --delta 0.005")

Added access to dataset IRIS to user Lord McFreeze with budget epsilon 5.0 and delta 0.005.
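
Since `add_user_with_budget` fails for an existing user while `add_dataset_to_user` is meant for that case, the choice between the two could be automated. The sketch below is an assumption-laden illustration: `grant_access_command` is a hypothetical helper, and `existing_users` is assumed to be a set of user names gathered beforehand (for example from `show_collection --collection users`).

```python
import shlex

def grant_access_command(existing_users, user, dataset, epsilon, delta):
    """Pick the right subcommand depending on whether `user` already exists."""
    if user in existing_users:
        subcommand = "add_dataset_to_user"
    else:
        subcommand = "add_user_with_budget"
    # shlex.join quotes names that contain spaces, such as 'Lord McFreeze'.
    return shlex.join([
        "python", "mongodb_admin.py", subcommand,
        "--user", user, "--dataset", dataset,
        "--epsilon", str(epsilon), "--delta", str(delta),
    ])

command = grant_access_command({"Lord McFreeze"}, "Lord McFreeze", "IRIS", 5.0, 0.005)
```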



Alternatively, you can create a user without any assigned dataset and then add datasets in a separate command.

In [25]:
run("python mongodb_admin.py add_user --user 'Madame Frostina'")

Added user Madame Frostina.



Let's see the default parameters after the user creation:

In [26]:
run("python mongodb_admin.py show_user --user 'Madame Frostina'")

{"user_name": "Madame Frostina", "may_query": True, "datasets_list": []}



Let's give her access to a dataset with a budget:

In [27]:
run("python mongodb_admin.py add_dataset_to_user --user 'Madame Frostina' --dataset 'IRIS' --epsilon 5.0 --delta 0.005")

Added access to dataset IRIS to user Madame Frostina with budget epsilon 5.0 and delta 0.005.



In [28]:
run("python mongodb_admin.py add_dataset_to_user --user 'Madame Frostina' --dataset 'PENGUIN' --epsilon 5.0 --delta 0.005")

Added access to dataset PENGUIN to user Madame Frostina with budget epsilon 5.0 and delta 0.005.



Now let's see the user Madame Frostina details to check all is in order:

In [29]:
run("python mongodb_admin.py show_user --user 'Madame Frostina'")

{"user_name": "Madame Frostina", "may_query": True, "datasets_list": [{"dataset_name": "IRIS", "initial_epsilon": 5.0, "initial_delta": 0.005, "total_spent_epsilon": 0.0, "total_spent_delta": 0.0}, {"dataset_name": "PENGUIN", "initial_epsilon": 5.0, "initial_delta": 0.005, "total_spent_epsilon": 0.0, "total_spent_delta": 0.0}]}
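
The user document stores both the initial and the spent budget for each dataset. Assuming that the remaining budget is simply the initial value minus the spent value (an assumption made for this sketch, not something this notebook states), it could be computed as follows. `remaining_budget` is a hypothetical helper and the sample string is a shortened copy of the output above.

```python
from ast import literal_eval

# Shortened copy of the show_user output above.
show_user_output = (
    '{"user_name": "Madame Frostina", "may_query": True, "datasets_list": '
    '[{"dataset_name": "IRIS", "initial_epsilon": 5.0, "initial_delta": 0.005, '
    '"total_spent_epsilon": 0.0, "total_spent_delta": 0.0}]}'
)

def remaining_budget(user, dataset_name):
    """Return (epsilon, delta) left for a dataset, as initial minus spent."""
    for entry in user["datasets_list"]:
        if entry["dataset_name"] == dataset_name:
            return (
                entry["initial_epsilon"] - entry["total_spent_epsilon"],
                entry["initial_delta"] - entry["total_spent_delta"],
            )
    raise KeyError(f"user has no access to dataset {dataset_name!r}")

user = literal_eval(show_user_output)
epsilon_left, delta_left = remaining_budget(user, "IRIS")
```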



We can also modify the total budget of an existing user:

In [30]:
run("python mongodb_admin.py add_user_with_budget --user 'Dr. Antartica' --dataset 'PENGUIN' --epsilon 10.0 --delta 0.001")

Added access to user Dr. Antartica with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.



In [31]:
run("python mongodb_admin.py set_budget_field --user 'Dr. Antartica' --dataset 'PENGUIN' --field initial_epsilon --value 20.0")

Set budget of Dr. Antartica for dataset PENGUIN of initial_epsilon to 20.0.



Let's see the current state of the database:

In [32]:
run("python mongodb_admin.py show_collection --collection users", to_dict=True)

[{'user_name': 'Mrs. Daisy',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Mr. Coldheart',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Lord McFreeze',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'IRIS',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Madame Frostina',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
  

Do not hesitate to re-run this command after every other command to ensure that everything runs as expected.

### Remove user
You have just heard that the penguin named Coldheart might have malicious intentions, and you decide to suspend his access until an investigation has been carried out. To ensure that he is not allowed to run any more queries, run the following command:

In [33]:
run("python mongodb_admin.py set_may_query --user 'Mr. Coldheart' --value False")

Set user Mr. Coldheart may query to True.



Now, he won't be able to run any query (unless you re-run the command with `--value True`).
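
The confirmation message printed above reads `True` although `False` was passed, so it may be worth checking the stored flag directly. The sketch below lists the users that are currently blocked; `blocked_users` is a hypothetical helper and the sample documents are illustrative, not actual server output.

```python
def blocked_users(users):
    """Return the names of users whose may_query flag is False."""
    return [user["user_name"] for user in users if not user["may_query"]]

# Illustrative documents only (not actual output from the server).
sample_users = [
    {"user_name": "Mrs. Daisy", "may_query": True, "datasets_list": []},
    {"user_name": "Mr. Coldheart", "may_query": False, "datasets_list": []},
]

blocked = blocked_users(sample_users)
```

In the notebook, the list of user documents could come from `run("python mongodb_admin.py show_collection --collection users", to_dict=True)`.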

A few days have passed and the investigation reveals that he was aiming to do unethical research. You can remove his access to the dataset with:

In [34]:
run("python mongodb_admin.py del_dataset_to_user --user 'Mr. Coldheart' --dataset 'PENGUIN'")

Remove access to dataset PENGUIN from user Mr. Coldheart.



Or delete him completely from the database:

In [35]:
run("python mongodb_admin.py del_user --user 'Mr. Coldheart'")

Deleted user Mr. Coldheart.



Let's see the resulting users:

In [None]:
run("python mongodb_admin.py show_collection --collection users", to_dict=True)

### Change budget
You also change your mind about the budget allowed to Lord McFreeze and give him a bit more on the penguin dataset.

In [None]:
run("python mongodb_admin.py set_budget_field --user 'Lord McFreeze' --dataset 'PENGUIN' --field initial_epsilon --value 15.0")

In [None]:
run("python mongodb_admin.py set_budget_field --user 'Lord McFreeze' --dataset 'PENGUIN' --field initial_delta --value 0.005")

Let's check all our changes by looking at the state of the database:

In [None]:
run("python mongodb_admin.py show_collection --collection users", to_dict = True)

### Finally, everything can be loaded from a file directly

Let's delete the existing user collection first:

In [None]:
run("python mongodb_admin.py drop_collection --collection users")

It is now empty:

In [None]:
run("python mongodb_admin.py show_collection --collection users", to_dict = True)

We add the data based on a yaml file:

In [None]:
run("python mongodb_admin.py create_users_collection --path ../data/collections/user_collection.yaml")

By default, *create_users_collection* will only add new users to the database.

In [None]:
run("python mongodb_admin.py create_users_collection --path ../data/collections/user_collection.yaml")

If you want to clean the current users collection and replace it, you can use the argument *--clean*. 

In [None]:
run("python mongodb_admin.py create_users_collection --path ../data/collections/user_collection.yaml --clean")

If you want to add new users and update the existing ones in your collection, you can use the argument *--overwrite*. This will make sure to add new users if they do not exist and replace values from existing users with the collection provided.

In [None]:
run("python mongodb_admin.py create_users_collection --path ../data/collections/user_collection.yaml --overwrite")
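
The three behaviours described above (default, *--clean* and *--overwrite*) can be summarised with a small in-memory model. This is only a sketch of the semantics as described in the text, not the script's implementation: `apply_user_collection` is a hypothetical function, and letting `clean` take precedence when both options are set is an assumption.

```python
def apply_user_collection(current, incoming, clean=False, overwrite=False):
    """Model the resulting users collection.

    Both `current` and `incoming` map a user name to its document.
    """
    if clean:
        # --clean: drop everything and keep only the provided collection.
        return dict(incoming)
    result = dict(current)
    for name, document in incoming.items():
        # Default: only add users that do not exist yet.
        # --overwrite: also replace the documents of existing users.
        if name not in result or overwrite:
            result[name] = document
    return result

current = {"Mrs. Daisy": {"may_query": True}, "Old User": {"may_query": True}}
incoming = {"Mrs. Daisy": {"may_query": False}, "New User": {"may_query": True}}

default_result = apply_user_collection(current, incoming)
overwrite_result = apply_user_collection(current, incoming, overwrite=True)
clean_result = apply_user_collection(current, incoming, clean=True)
```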

And let's see the resulting collection:

In [None]:
run("python mongodb_admin.py show_collection --collection users", to_dict = True)

## Archives of queries

In [None]:
run("python mongodb_admin.py show_collection --collection queries_archives", to_dict = True)

## Stop the server: do not do it now!
To tear down the service, first press `ctrl+C` in the terminal where you ran `docker compose up`. Wait for the command to finish executing and then run `docker compose down`. This will also delete all the containers, but the volume will stay in place.