# Secure Data Disclosure: Server side

This notebook shows how a data owner can set up the server and make their data available to specific users. It walks through each of the necessary steps.

## Start the server

### Create a docker volume
The first step is to create a Docker volume for MongoDB. Docker volumes are persistent storage spaces that are managed by Docker and can be mounted in containers. The volume is created with `docker volume create mongodata`. This only needs to be done once. The server itself uses bind mounts, so no volume is needed for it.

In a terminal, run `docker volume create mongodata`. The output should be `mongodata`.
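
If you prefer to script this step, the same result can be sketched with the Docker SDK for Python (the `docker` package, installed with `pip install docker`). The helper name `ensure_volume` is hypothetical and not part of the project. It only creates the volume when no volume with that name exists, so it is safe to run more than once.

```python
def ensure_volume(client, name="mongodata"):
    """Create the named Docker volume unless it already exists.

    `client` is expected to be a docker.DockerClient, or any object that
    exposes the same `volumes.list()` and `volumes.create()` methods.
    Returns True if the volume was created, False if it was already there.
    """
    existing = {volume.name for volume in client.volumes.list()}
    if name in existing:
        return False
    client.volumes.create(name=name)
    return True

# Usage (needs a running Docker daemon and the `docker` package):
#   import docker
#   ensure_volume(docker.DockerClient())
```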

### Start server
The second step is to start the server. Before doing so, adapt the config file `configs/example_config.yaml`: the data owner must set the develop mode to False and specify the database type and ports. For this notebook, we keep the defaults and use a MongoDB on port 27017. Note: if the configuration file is modified, the `docker-compose` file has to be modified accordingly. This is out of scope for this notebook.

In a terminal run `docker compose up`. This will start the server and the mongodb, each running in its own Docker container. 

To check that both containers are indeed running, run `docker ps`. You should be able to see a container for the server: `sdd_server_dev` and one for the mongo database `mongodb`.
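
The same check can be sketched in Python with the Docker SDK (the `docker` package). The helper name `missing_containers` is hypothetical. The container names are the ones given above. The function returns the expected names that are not currently running, so an empty list means both containers are up.

```python
def missing_containers(client, expected=("sdd_server_dev", "mongodb")):
    """Return the expected container names that are not currently running.

    `client` is expected to be a docker.DockerClient, or any object that
    exposes the same `containers.list()` method.
    """
    # containers.list() returns only running containers by default
    running = {container.name for container in client.containers.list()}
    return [name for name in expected if name not in running]

# Usage (needs a running Docker daemon and the `docker` package):
#   import docker
#   print(missing_containers(docker.DockerClient()))
```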

### Access the server to administer MongoDB

To interact with MongoDB, we first need to access the server Docker container, from which we will run the commands. To do that from inside this Jupyter Notebook, we need the Docker client library. Let's first install it.

In [1]:
!pip install docker



We can now import the library, create the client allowing us to interact with Docker, and finally, access the server container.

In [29]:
import docker
client = docker.DockerClient()
server_container = client.containers.get("sdd_server_dev") # where "sdd_server_dev" is the name of the running server container

To execute commands inside that Docker container, use the `exec_run` method. It returns an `ExecResult` object from which you can retrieve the exit code and the output of the command, as in the following example:

In [30]:
response = server_container.exec_run("ls")
response

ExecResult(exit_code=0, output=b'__init__.py\n__pycache__\napp.py\ndatabase\ndp_queries\nglobals.py\nmetadata\nmongodb_admin.py\nsdd_poc_server_package.egg-info\nserver_notebook.ipynb\nsetup.py\nutils\nuvicorn_serve.py\n')

In [31]:
print(response.output.decode('utf-8'))

__init__.py
__pycache__
app.py
database
dp_queries
globals.py
metadata
mongodb_admin.py
sdd_poc_server_package.egg-info
server_notebook.ipynb
setup.py
utils
uvicorn_serve.py
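
As a small sketch of how the two fields of `ExecResult` can be used together, the hypothetical helper below (`exec_lines` is not part of the project) checks `exit_code` before decoding `output`, and returns the result as a list of lines.

```python
def exec_lines(container, command):
    """Run `command` in the container and return its output as a list of lines.

    `container` is expected to be a Docker SDK container object such as
    `server_container`, or any object with a compatible `exec_run` method.
    """
    result = container.exec_run(command)
    if result.exit_code != 0:
        raise RuntimeError(f"{command!r} exited with code {result.exit_code}")
    return result.output.decode("utf-8").splitlines()

# Usage in this notebook, after `server_container` has been created:
#   exec_lines(server_container, "ls")
```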



Now, you are ready to interact with the database and add users.

## Prepare the database and user's access rights

### Visualise all options
You can visualise all the options offered by the database administration script by running the command `python mongodb_admin.py --help`. We will go through each of them in the rest of the notebook.

In [32]:
!python3 mongodb_admin.py --help

usage: MongoDB administration script for the SDD POC Server
       [-h]
       {add_user,add_user_with_budget,del_user,add_dataset_to_user,del_dataset_to_user,set_budget_field,set_may_query,show_user,add_metadata,del_metadata,drop_collection,show_collection,create_ex_users}
       ...

options:
  -h, --help            show this help message and exit

subcommands:
  {add_user,add_user_with_budget,del_user,add_dataset_to_user,del_dataset_to_user,set_budget_field,set_may_query,show_user,add_metadata,del_metadata,drop_collection,show_collection,create_ex_users}
                        database administration operations
    add_user            add user to users collection
    add_user_with_budget
                        add user to users collection
    del_user            delete user from users collection
    add_dataset_to_user
                        add dataset with initialized budget values for a user
                        in users collection
    del_dataset_to_user

We define a helper function, `run_command`, to get cleaner output from the commands in the notebook.

In [33]:
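from ast import literal_eval

def run_command(command, to_dict=False):
    """Run a command in the server container and return its cleaned-up output."""
    response = server_container.exec_run(command)
    # Normalise quotes and flatten the output onto a single line
    output = response.output.decode('utf-8').replace("'", '"').replace("\n", '')
    if to_dict:
        # Parse the printed Python literal (list or dict) back into objects
        output = literal_eval(output)
    return output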
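
To see what this post-processing does without needing a container, the sketch below applies the same steps to a hand-written byte string. The function name `parse_output` and the sample bytes are made up for illustration.

```python
from ast import literal_eval

def parse_output(raw, to_dict=False):
    """Apply the same post-processing as run_command to raw output bytes."""
    text = raw.decode("utf-8").replace("'", '"').replace("\n", "")
    return literal_eval(text) if to_dict else text

# Made-up sample resembling what a script might print
sample = b"[{'user_name': 'Bob',\n 'may_query': True}]\n"

print(parse_output(sample))                # flattened single-line string
print(parse_output(sample, to_dict=True))  # parsed Python list of dicts
```

Note that replacing every single quote also alters apostrophes inside values, so a user name containing an apostrophe would not survive this step unchanged.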

### Add metadata

To later run queries on a dataset, some metadata is required. This secure server expects the metadata in the [SmartnoiseSQL dictionary format](https://docs.smartnoise.org/sql/metadata.html#dictionary-format), which describes, among other things, the available columns, their types, and their bound values (see the Smartnoise page for more details).

You should therefore prepare the metadata and save it in a `yaml` file. When running the following command, provide the path to this metadata `yaml` file.

In [34]:
run_command("python mongodb_admin.py add_metadata --metadata_path metadata/penguin_metadata.yaml --dataset PENGUIN")

'Added metadata of PENGUIN dataset.'

In [35]:
run_command("python mongodb_admin.py add_metadata --metadata_path metadata/iris_metadata.yaml --dataset IRIS")

'Added metadata of IRIS dataset.'

Finally, let's have a look at the stored metadata:

In [36]:
run_command("python mongodb_admin.py show_collection --collection metadata", to_dict=True)

[{'PENGUIN': {'': {'Schema': {'Table': {'max_ids': 1,
      'row_privacy': True,
      'species': {'type': 'string'},
      'island': {'type': 'string'},
      'bill_length_mm': {'type': 'float', 'lower': 30.0, 'upper': 65.0},
      'bill_depth_mm': {'type': 'float', 'lower': 13.0, 'upper': 23.0},
      'flipper_length_mm': {'type': 'float', 'lower': 150.0, 'upper': 250.0},
      'body_mass_g': {'type': 'float', 'lower': 2000.0, 'upper': 7000.0},
      'sex': {'type': 'string'}}}},
   'engine': 'csv'}},
 {'IRIS': {'': {'Schema': {'Table': {'max_ids': 1,
      'petal_length': {'type': 'float', 'lower': 0.5, 'upper': 10.0},
      'petal_width': {'type': 'float', 'lower': 0.05, 'upper': 5.0},
      'row_privacy': True,
      'sepal_length': {'type': 'float', 'lower': 2.0, 'upper': 10.0},
      'sepal_width': {'type': 'float', 'lower': 1.0, 'upper': 6.0},
      'species': {'type': 'string'}}}},
   'engine': 'csv'}}]
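
Before adding metadata, it can be useful to sanity-check the bounds of the numeric columns. The sketch below is a hypothetical check, not something the server is known to perform: `find_bound_problems` takes the content of the `Table` entry, in the shape shown in the output above, and reports numeric columns whose `lower` and `upper` are missing or inverted.

```python
def find_bound_problems(table):
    """Return messages for numeric columns with missing or inverted bounds."""
    problems = []
    for column, spec in table.items():
        # Table-level settings such as max_ids or row_privacy are not dicts
        if not isinstance(spec, dict):
            continue
        if spec.get("type") in ("int", "float"):
            lower, upper = spec.get("lower"), spec.get("upper")
            if lower is None or upper is None:
                problems.append(f"{column}: missing lower/upper bound")
            elif lower >= upper:
                problems.append(f"{column}: lower ({lower}) is not below upper ({upper})")
    return problems

# Excerpt of the PENGUIN table metadata shown above
penguin_table = {
    "max_ids": 1,
    "row_privacy": True,
    "species": {"type": "string"},
    "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
    "bill_depth_mm": {"type": "float", "lower": 13.0, "upper": 23.0},
}

print(find_bound_problems(penguin_table))
```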

### Add user
Let's add a few users.

In [37]:
run_command("python mongodb_admin.py add_user_with_budget --user 'Dr. Antartica' --dataset 'PENGUIN' \
--epsilon 10.0 --delta 0.001")

'Added access to user Dr. Antartica with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.'

In [11]:
run_command("python mongodb_admin.py set_budget_field --user 'Dr. Antartica' --dataset 'PENGUIN' \
--field initial_epsilon --value 15.0")

'Set budget of Dr. Antartica for dataset PENGUIN of initial_epsilon to 15.0.'

In [12]:
run_command("python mongodb_admin.py add_user_with_budget --user 'Mrs. Daisy' --dataset 'IRIS' \
--epsilon 10.0 --delta 0.001")

'Added access to user Mrs. Daisy with dataset IRIS, budget epsilon 10.0 and delta 0.001.'

In [13]:
run_command("python mongodb_admin.py add_user_with_budget --user 'Mr. Coldheart' --dataset 'PENGUIN' \
--epsilon 10.0 --delta 0.001")

'Added access to user Mr. Coldheart with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.'

In [14]:
run_command("python mongodb_admin.py add_user_with_budget --user 'Lord McFreeze' --dataset 'PENGUIN' \
--epsilon 10.0 --delta 0.001")

'Added access to user Lord McFreeze with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.'
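
User names here contain spaces and dots, so the quoting inside the command string matters. As an alternative sketch, the command can be built as a list of arguments, which `exec_run` also accepts, so that no shell-style quoting is needed. The helper name `add_user_with_budget_cmd` is hypothetical. The flags are the ones used in the cells above.

```python
def add_user_with_budget_cmd(user, dataset, epsilon, delta):
    """Build the add_user_with_budget command as a list of arguments."""
    return [
        "python", "mongodb_admin.py", "add_user_with_budget",
        "--user", user,
        "--dataset", dataset,
        "--epsilon", str(epsilon),
        "--delta", str(delta),
    ]

print(add_user_with_budget_cmd("Dr. Antartica", "PENGUIN", 10.0, 0.001))

# Usage in this notebook, since run_command passes its argument to exec_run:
#   run_command(add_user_with_budget_cmd("Dr. Antartica", "PENGUIN", 10.0, 0.001))
```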

User names must be unique. Adding a user whose name already exists raises an error and leaves the database unchanged:

In [15]:
run_command("python mongodb_admin.py add_user_with_budget --user 'Lord McFreeze' --dataset 'IRIS' \
--epsilon 10.0 --delta 0.001")

'Traceback (most recent call last):  File "mongodb_admin.py", line 377, in <module>    args.func(args)  File "mongodb_admin.py", line 47, in add_user_with_budget    raise ValueError("Cannot add user because already exists. ")ValueError: Cannot add user because already exists. '
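
To avoid the traceback, you can check whether a name is already taken before adding it. The sketch below assumes the list of user documents returned by `show_collection --collection users` with `to_dict=True`, where each document has a `user_name` key. The helper name `user_exists` is hypothetical.

```python
def user_exists(users, name):
    """Return True if a user document with exactly this name is in the list."""
    return any(user.get("user_name") == name for user in users)

# Hand-written sample in the shape of the users collection
sample_users = [
    {"user_name": "Lord McFreeze", "may_query": True, "datasets_list": []},
    {"user_name": "Mrs. Daisy", "may_query": True, "datasets_list": []},
]

print(user_exists(sample_users, "Lord McFreeze"))
print(user_exists(sample_users, "Madame Frostina"))

# Usage in this notebook:
#   users = run_command("python mongodb_admin.py show_collection --collection users", to_dict=True)
#   user_exists(users, "Lord McFreeze")
```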

If you want to give an existing user access to another dataset, use the `add_dataset_to_user` command.

In [16]:
run_command("python mongodb_admin.py add_dataset_to_user --user 'Lord McFreeze' --dataset 'IRIS' \
--epsilon 5.0 --delta 0.005")

'Added access to dataset IRIS to user Lord McFreeze with budget epsilon 5.0 and delta 0.005.'

Alternatively, you can create a user without any assigned dataset and then add datasets in separate commands.

In [17]:
run_command("python mongodb_admin.py add_user --user 'Madame Frostina'")

'Added user Madame Frostina.'

In [18]:
run_command("python mongodb_admin.py add_dataset_to_user --user 'Madame Frostina' --dataset 'IRIS' \
--epsilon 5.0 --delta 0.005")

'Added access to dataset IRIS to user Madame Frostina with budget epsilon 5.0 and delta 0.005.'

In [19]:
run_command("python mongodb_admin.py add_dataset_to_user --user 'Madame Frostina' --dataset 'PENGUIN' \
--epsilon 5.0 --delta 0.005")

'Added access to dataset PENGUIN to user Madame Frostina with budget epsilon 5.0 and delta 0.005.'

Let's see the current state of the database:

In [20]:
run_command("python mongodb_admin.py show_collection --collection users", to_dict=True)

[{'user_name': 'Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Bob',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Dr. Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 15.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 13.95,
    'total_spent_delta': 0.0002249947499999294}]},
 {'user_name': 'Mrs. Daisy',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon'

Feel free to re-run this command after any other command to check that it had the expected effect.
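
The stored fields also let you work out how much budget a user has left. The sketch below assumes that the remaining budget is simply the initial value minus the total spent, which is an assumption about the server's accounting rather than something stated here. The helper name `remaining_budget` is hypothetical. The sample document is copied from the output above.

```python
def remaining_budget(users, user_name, dataset_name):
    """Return (remaining_epsilon, remaining_delta), or None if not found."""
    for user in users:
        if user["user_name"] != user_name:
            continue
        for dataset in user["datasets_list"]:
            if dataset["dataset_name"] == dataset_name:
                return (
                    dataset["initial_epsilon"] - dataset["total_spent_epsilon"],
                    dataset["initial_delta"] - dataset["total_spent_delta"],
                )
    return None

# Document for 'Dr. Antartica' as shown in the users collection
sample_users = [
    {"user_name": "Dr. Antartica",
     "may_query": True,
     "datasets_list": [{"dataset_name": "PENGUIN",
                        "initial_epsilon": 15.0,
                        "initial_delta": 0.001,
                        "total_spent_epsilon": 13.95,
                        "total_spent_delta": 0.0002249947499999294}]},
]

print(remaining_budget(sample_users, "Dr. Antartica", "PENGUIN"))
```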

### Remove user
You have just heard that the penguin named Coldheart might have malicious intentions, and you decide to suspend his access until an investigation has been carried out. To ensure that he cannot run any more queries, run the following command:

In [21]:
run_command("python mongodb_admin.py set_may_query --user 'Mr. Coldheart' --value False")

'Set user Mr. Coldheart may query.'

Now, he won't be able to run any query (unless you re-run the command with `--value True`).

A few days have passed and the investigation reveals that he was aiming to do unethical research. You can remove his access to the dataset with:

In [22]:
run_command("python mongodb_admin.py del_dataset_to_user --user 'Mr. Coldheart' --dataset 'PENGUIN'")

'Remove access to dataset PENGUIN from user Mr. Coldheart.'

Or delete him completely from the database:

In [23]:
run_command("python mongodb_admin.py del_user --user 'Mr. Coldheart'")

'Deleted user Mr. Coldheart.'

Let's see the resulting users:

In [24]:
run_command("python mongodb_admin.py show_collection --collection users", to_dict=True)

[{'user_name': 'Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Bob',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Dr. Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 15.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 13.95,
    'total_spent_delta': 0.0002249947499999294}]},
 {'user_name': 'Mrs. Daisy',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon'

### Change budget
You also change your mind about the budget granted to Lord McFreeze and decide to give him a bit more on the penguin dataset.

In [25]:
run_command("python mongodb_admin.py set_budget_field --user 'Lord McFreeze' --dataset 'PENGUIN' \
--field initial_epsilon --value 15.0")

'Set budget of Lord McFreeze for dataset PENGUIN of initial_epsilon to 15.0.'

In [26]:
run_command("python mongodb_admin.py set_budget_field --user 'Lord McFreeze' --dataset 'PENGUIN' \
--field initial_delta --value 0.005")

'Set budget of Lord McFreeze for dataset PENGUIN of initial_delta to 0.005.'
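
When several budget fields change at once, the commands can be generated from a dictionary. The sketch below is hypothetical (`budget_update_cmds` and `KNOWN_BUDGET_FIELDS` are not part of the project). It guards against typos by accepting only the field names visible in the users collection. Only `initial_epsilon` and `initial_delta` are shown being set in this notebook, so whether the script accepts the other two is an assumption.

```python
# Budget field names visible in the users collection (assumed to be the full set)
KNOWN_BUDGET_FIELDS = {
    "initial_epsilon", "initial_delta",
    "total_spent_epsilon", "total_spent_delta",
}

def budget_update_cmds(user, dataset, updates):
    """Build one set_budget_field command (argument list) per field to update."""
    unknown = set(updates) - KNOWN_BUDGET_FIELDS
    if unknown:
        raise ValueError(f"Unknown budget field(s): {sorted(unknown)}")
    return [
        ["python", "mongodb_admin.py", "set_budget_field",
         "--user", user, "--dataset", dataset,
         "--field", field, "--value", str(value)]
        for field, value in updates.items()
    ]

for cmd in budget_update_cmds("Lord McFreeze", "PENGUIN",
                              {"initial_epsilon": 15.0, "initial_delta": 0.005}):
    print(cmd)
    # In this notebook, each list could be passed to run_command(cmd)
```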

Let's check all our changes by looking at the state of the database:

In [27]:
run_command("python mongodb_admin.py show_collection --collection users", to_dict=True)

[{'user_name': 'Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Bob',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0004,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Dr. Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 15.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 13.95,
    'total_spent_delta': 0.0002249947499999294}]},
 {'user_name': 'Mrs. Daisy',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon'

## Stop the server: do not do it now!
To tear down the service, first press `Ctrl+C` in the terminal where you ran `docker compose up`. Wait for the command to finish, then run `docker compose down`. This also deletes the containers, but the volume stays in place.