# FL Advanced Features 

This notebook walks you through advanced features of FL, such as:
1. Configuring privacy for clients 
1. Bringing your own aggregator to FL
1. Bringing your own privacy 
1. Running a realistic FL experiment


## Prerequisites
Before you continue, make sure you are comfortable with FL and that you have:
- Run the [Provisioning Notebook](Provisioning.ipynb) and started the server.
- Run the [Client Notebook](Client.ipynb) checks for each client (at least 1 client).
- Run through the [Admin Notebook](Admin.ipynb) 


### Resources
We encourage you to watch the free GTC 2021 talks covering the Clara Train SDK:
- [Clara Train 4.0 - 201 Federated Learning [SE3208]](https://gtc21.event.nvidia.com/media/Clara%20Train%204.0%20-%20201%20Federated%20Learning%20%5BSE3208%5D/1_m48t6b3y)
- [Federated Learning for Medical AI [S32530]](https://gtc21.event.nvidia.com/media/Federated%20Learning%20for%20Medical%20AI%20%5BS32530%5D/1_z26u15uk)


# Let's get started

We will continue using the project1 we created in the previous notebooks. 
Let's start by installing `tree` so we can look at directory structures. 

In [None]:
MMAR_DIR="/claraDevDay/FL/project1/"
!apt-get install -y tree


#### Recommended JupyterLab setup 
The Admin tool runs in an interactive shell, so unfortunately we can't run it from notebook cells. 
Instead, we recommend you open a terminal and arrange your workspace as in the image below. 
You can also open multiple terminals if you are interested in seeing the output of the clients and the server. 
Below we have this notebook on the left, the server and 2 clients on the top right, and the admin shell on the bottom right.
<br>![fl](screenShots/JLabLayout.png)<br>

## 1. Make sure server and clients are connected 
In the terminal, go to the admin folder inside your project and run:
```
./fl_admin.sh
``` 
When prompted, log in by typing `admin@admin.com`. 



Check the server and client status by typing:
```
> check_status server
> check_status client
```



**Reminder**<br>
Before training you should:

1. `> set_run_number <123>`
2. `> upload_folder ../../../adminMMAR_privacy`
3. `> deploy adminMMAR_privacy server`
4. `> deploy adminMMAR_privacy client` 


Then, to start training, you should:

5. `> start server`
6. `> start client`
7. `> check_status server` and/or `> check_status client`   
8. `> cat server log.txt` to view the server log 


## 2. Privacy and Model Protection 
To mitigate the risk of recovering the training data from the trained model, 
also commonly known as reverse engineering or model inversion, 
we provide a configurable client-side privacy control based on the differential privacy (DP) technique. 
Each client can have its own privacy policy, which the admin client can update during training. 

The DP protection consists of two major components: selective parameter update and the sparse vector technique (SVT).

For selective parameter update, the client sends only part of the model weights/updates, 
instead of all of them, to limit the amount of information shared. 
This is achieved by:
1. uploading only the fraction of the model weights/updates whose absolute values are greater than a predefined threshold or percentile of the absolute update values; 
2. further clipping the values of the shared model weights to a fixed range. 

The sparse vector technique operates on a random fraction of the weights/updates x. It first adds random noise to the absolute value, abs(x)+Lap(s), 
and then shares the clipped noisy value clip(x+Lap(s), γ) if and only if the thresholding condition is satisfied. 
Here abs(x) denotes the absolute value of x, Lap(s) denotes a random variable sampled from the Laplace distribution, 
τ is a predefined threshold, and clip(x, γ) denotes clipping the value of x to the range [-γ, γ]. 

For details, please refer to [W. Li et al., “Privacy-preserving Federated Brain Tumour Segmentation”, arXiv preprint arXiv:1910.00962 (2019)](https://arxiv.org/abs/1910.00962).
The experimental results show that there is a trade-off between model performance and privacy protection costs.
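
The selective parameter update described above can be sketched in a few lines of NumPy. This is only an illustration of the idea (keep the largest-magnitude updates, then clip them), not the Clara implementation; the function name `selective_update` and its parameters `percentile` and `gamma` are assumptions made for this example.

```python
import numpy as np

def selective_update(delta, percentile=90.0, gamma=0.01):
    """Illustrative sketch: share only large updates, clipped to [-gamma, gamma].

    `percentile` and `gamma` are hypothetical parameter names for this example.
    """
    delta = np.asarray(delta, dtype=float)
    # Cutoff below which an update is considered too small to share.
    cutoff = np.percentile(np.abs(delta), percentile)
    mask = np.abs(delta) >= cutoff
    # Entries that are not shared are set to zero; shared ones are clipped.
    shared = np.where(mask, np.clip(delta, -gamma, gamma), 0.0)
    return shared, mask

updates = np.array([0.001, -0.5, 0.02, 0.0003, 0.3, -0.004, 0.0, 0.05, -0.07, 0.009])
shared, mask = selective_update(updates, percentile=80.0, gamma=0.1)
print("shared entries:", int(mask.sum()), "of", updates.size)
print(shared)
```

Raising `percentile` shares fewer entries and lowering `gamma` clips them harder, which is where the trade-off between privacy and model performance comes from.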


## 2.1 Use built-in privacy algorithms in Clara SDK
You can try enabling privacy on each client, so that it drops a percentage of the weights, using the privacy section in the `client.json`. 
For this check files in  


You can use one or more of the following options:
1. Use the percentile protocol (`PercentilePrivacy`) by editing your client `config_fed_client.json`. 
You should add the section below to the `outbound_filters`:
    ```
      {
          "name": "PercentilePrivacy",
          "args": {
              "gamma": 0.1
          }
      },
    ``` 
Alternatively, you can copy `config_fed_client_w_PercentileProtocol.json` over your client `config_fed_client.json` with:

In [None]:
clientNo="client1"
runNo="1"
! cp $MMAR_DIR/../adminMMAR_privacy/config/config_fed_client_w_PercentileProtocol.json $MMAR_DIR/$clientNo/run_$runNo/mmar_$clientNo/config/config_fed_client.json

In [None]:
!tree $MMAR_DIR/$clientNo/run_$runNo
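
If you prefer to edit the config programmatically instead of copying a whole file, the filter entry can be appended with a small helper. This is a sketch only: the helper name `add_outbound_filter` is hypothetical, and it assumes that `outbound_filters` is a list under a top-level `client` key, which you should verify against your own `config_fed_client.json` before using it.

```python
import json

def add_outbound_filter(config, filter_entry):
    """Append filter_entry to the outbound filters unless one with that name exists.

    Assumption: the list lives at config["client"]["outbound_filters"].
    """
    filters = config.setdefault("client", {}).setdefault("outbound_filters", [])
    if not any(f.get("name") == filter_entry["name"] for f in filters):
        filters.append(filter_entry)
    return config

# Stand-in for the dictionary you would get from json.load() on the real file.
config = {"client": {"outbound_filters": []}}
percentile_filter = {"name": "PercentilePrivacy", "args": {"gamma": 0.1}}
add_outbound_filter(config, percentile_filter)
add_outbound_filter(config, percentile_filter)  # second call is a no-op
print(json.dumps(config, indent=2))
```

Checking for an existing entry by name keeps the helper safe to rerun from a notebook cell.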

**You should now try rerunning an FL experiment with** 
```
> start server
> start client
```
You should see the training go through successfully. 
You should then train a realistic model and see how the accuracy changes as you change the privacy parameters.  


2. Use the SVT protocol (`SVTPrivacy`) by editing your client `config_fed_client.json`. 
You should add the section below to the `outbound_filters`:

```
  {
      "name": "SVTPrivacy",
      "args": {
        "fraction":0.1,
        "epsilon":0.1,
        "noise_var":0.1,
        "gamma":1e-5,
        "tau":1e-6
      }
  }
```
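
To make the sparse vector technique more concrete, here is a simplified NumPy sketch of the steps described in section 2: pick a random fraction of the updates, add Laplace noise to their absolute values, and share a clipped noisy value only for entries that pass the threshold. It is not the Clara `SVTPrivacy` implementation and it does no privacy budget accounting. The names `fraction`, `gamma`, and `tau` mirror the config keys for readability only, and `noise_scale` and `seed` are assumptions of this example.

```python
import numpy as np

def svt_sketch(delta, fraction=0.1, noise_scale=0.1, gamma=1e-5, tau=1e-6, seed=0):
    """Simplified illustration of SVT-style sharing (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    delta = np.asarray(delta, dtype=float).ravel()
    # 1. Work on a random fraction of the updates.
    n_candidates = max(1, int(fraction * delta.size))
    candidates = rng.choice(delta.size, size=n_candidates, replace=False)
    # 2. Add Laplace noise to the absolute values and apply the threshold.
    noisy_abs = np.abs(delta[candidates]) + rng.laplace(scale=noise_scale, size=n_candidates)
    accepted = candidates[noisy_abs >= tau]
    # 3. Share clipped noisy values for the accepted entries only.
    shared = np.zeros_like(delta)
    noise = rng.laplace(scale=noise_scale, size=accepted.size)
    shared[accepted] = np.clip(delta[accepted] + noise, -gamma, gamma)
    return shared, accepted

delta = np.linspace(-1.0, 1.0, 100)
shared, accepted = svt_sketch(delta, fraction=0.2, gamma=0.05, tau=0.1)
print("accepted", accepted.size, "of", delta.size, "entries")
```

Every shared value lies in `[-gamma, gamma]`, and entries that were not selected or did not pass the threshold are left at zero.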


In [None]:
clientNo="client3"
runNo="1"
!cp $MMAR_DIR/../adminMMAR_privacy/config/config_fed_client_w_SVT.json \
  $MMAR_DIR/$clientNo/run_$runNo/mmar_$clientNo/config/config_fed_client.json

In [None]:
!tree $MMAR_DIR/$clientNo/run_$runNo

**You should now try rerunning an FL experiment with** 
```
> start server
> start client
```
You should see the training go through successfully. 
You should then train a realistic model and see how the accuracy changes as you change the privacy parameters.  


# 3. Model Aggregator
Model aggregation happens on the server, as specified in the `config_fed_server.json` file. Clara Train comes with a built-in aggregator. 
## 3.1 Built-in Aggregator
This aggregator is based on the algorithm in [Federated Learning for Breast Density Classification: A Real-World Implementation](https://arxiv.org/abs/2009.01871).
The ModelAggregator computes a weighted sum of the model gradients from each client, 
where the default weights are based on the number of training iterations that the client executed in this round of FL. 
The user can further adjust the client’s weights by adding additional custom weights in the arguments of this component.

You can adjust this by changing the arguments in the aggregator section of the `config_fed_server.json`:
```
"aggregator":
  {
    "name": "InTimeAccumulateWeightedAggregator",
    "args": {
      "aggregation_weights":
          {
            "client0": 1,
            "client1": 1.5,
            "client2": 0.8
          }
    }
  },
``` 
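
The weighting described above can be illustrated with a short NumPy sketch. It is not the `InTimeAccumulateWeightedAggregator` code: the function name `aggregate` is hypothetical, and multiplying the iteration count by the custom weight and then normalizing by the total weight is an assumption made for this example.

```python
import numpy as np

def aggregate(updates, n_iterations, aggregation_weights=None):
    """Weighted average of client updates (illustrative sketch).

    updates: dict of client name -> array-like model update
    n_iterations: dict of client name -> iterations run this round
    aggregation_weights: optional dict of client name -> custom weight
    """
    aggregation_weights = aggregation_weights or {}
    total = None
    weight_sum = 0.0
    for client, update in updates.items():
        # Default weight is the iteration count, scaled by any custom weight.
        weight = n_iterations[client] * aggregation_weights.get(client, 1.0)
        contribution = weight * np.asarray(update, dtype=float)
        total = contribution if total is None else total + contribution
        weight_sum += weight
    return total / weight_sum

updates = {"client0": [1.0, 1.0], "client1": [3.0, 3.0]}
n_iterations = {"client0": 10, "client1": 10}
result = aggregate(updates, n_iterations, {"client0": 1, "client1": 1.5})
print(result)
```

With equal iteration counts, the custom weight of 1.5 pulls the result toward `client1`'s update.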
## 3.2 Bring your own Aggregator to FL
An example of writing your own aggregator is shown in [custom_aggregator.py](adminMMAR/custom/custom_aggregator.py).
This file is already in the custom folder that we have been using. 
Therefore, in the `config_fed_server.json` you simply need to change the component's `name` tag to `path` and point it to your code, as below:  

```
"aggregator":
  {
    "path": "BYO_aggregator.MyJustInTimeAggregator",
    "args": {
    }
  },
```
This change has already been made in the `config_fed_server.json`. 
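
For intuition about what a `path` entry means, here is a sketch of the general Python mechanism for turning a dotted `module.ClassName` string plus `args` into an object. It is not Clara's component loader, the helper name `build_component` is hypothetical, and a standard-library class stands in for the custom aggregator so that the example runs anywhere.

```python
import importlib

def build_component(spec):
    """Instantiate the class named by spec["path"] with spec["args"] (sketch)."""
    # Split "package.module.ClassName" into module path and class name.
    module_name, _, class_name = spec["path"].rpartition(".")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**spec.get("args", {}))

# Stand-in spec: a stdlib class is used instead of a real aggregator.
demo_spec = {"path": "fractions.Fraction", "args": {"numerator": 3, "denominator": 4}}
demo = build_component(demo_spec)
print(type(demo).__name__, demo)
```

A `path` therefore only works if the module it names can be imported on the machine that loads the config, which is why the custom folder has to be present there.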

In [None]:
# runNo="10"
# ! cp $MMAR_DIR/../adminMMAR_privacy/config/config_fed_server.json $MMAR_DIR/server/run_$runNo/mmar_server/config/config_fed_server.json

In [None]:
# !tree $MMAR_DIR/server/run_$runNo


**You should now try rerunning an FL experiment with** 
```
> start server
> start client
```


# 4. Security 
As you might have noticed from this notebook, there are a couple of security issues with the current setup.

## 4.1 Disallow BYOC  
The first issue is that BYOC code is pushed by the admin, in the custom folder, to all clients. 
This is a potential threat, as the admin could write malicious code that would then run on the client.

The solution is to use rule groups. 
There you can set `BYOC` and/or `allow custom datalist` to false.
This disallows uploading the custom folder of the MMAR. 
If the lead researcher tries to do so anyway, the upload is rejected with this error: 
```
Error: Authorization Error: the MMAR contains custom code, which is not allowed on site "org1-a" and reject any uploads
``` 
You can test this by starting client3, which has this strict rule, and trying to upload the MMAR.
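
Conceptually, the rule behaves like the check sketched below: if a site does not allow BYOC and the MMAR has code in its `custom` folder, the upload is refused. This is an illustration only and not how Clara implements authorization; the function name `check_byoc_allowed`, the `allow_byoc` flag, and the test for `*.py` files are all assumptions of this example.

```python
from pathlib import Path
import tempfile

def check_byoc_allowed(mmar_dir, site_name, allow_byoc):
    """Raise PermissionError if the MMAR has custom code and the site forbids it."""
    custom_dir = Path(mmar_dir) / "custom"
    # Treat any Python file under custom/ as custom code (assumption).
    has_custom_code = custom_dir.is_dir() and any(custom_dir.rglob("*.py"))
    if has_custom_code and not allow_byoc:
        raise PermissionError(
            f'the MMAR contains custom code, which is not allowed on site "{site_name}"'
        )
    return True

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway MMAR folder that contains one custom file.
    custom = Path(tmp) / "custom"
    custom.mkdir()
    (custom / "custom_aggregator.py").write_text("# custom code\n")
    try:
        check_byoc_allowed(tmp, "org1-a", allow_byoc=False)
        rejected = False
    except PermissionError as err:
        rejected = True
        print("Error: Authorization Error:", err)
    allowed = check_byoc_allowed(tmp, "org1-a", allow_byoc=True)
```
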

For clients who are more strict, the workflow should be:
1. The lead researcher sends the custom folder to the client via email.
2. The client reviews the code and verifies that it does no harm.
3. The client places the custom code in `/local/custom`. 

## 4.2 Roles and Rights
In all experiments until now, we have logged in as the super user `admin@admin.com`, who has the rights to do everything. 
You should now try to log in as a restricted user who has limited rights.

In the admin terminal you should: 
1. `cd project1/leadIT/startup`
2. `./fl_admin.sh`
3. log in as `leadIT@org1.com`
   
In the admin terminal you should: 
1. `cd project1/leadIT/startup`
2. `./fl_admin.sh`
3. log in as `siteresearcher@org2.com`

## 4.3 Disallow all Roles and Rights
In some extreme cases, a client may want to be the only one with access to its site, so that the lead researcher is not allowed to upload any MMAR files. 
This guarantees total security; however, it adds a constant communication overhead, since MMARs must be moved into a run folder for every run. 

Nevertheless, this option is available through the steps below (see image):
1. Create a new rights group (for example: `only4secure`).
2. Make sure other groups don't have any access.
3. Give this group access to self deploy and train.
<br><img src="screenShots/Authorization4Sevure.png" alt="Drawing" style="height: 300px;"/><br>

In our example, we have set up `client4` to be super secure, so that the only user allowed to upload is `leadIT@secure.com`.

To test that this actually works:
1. cd to client4 and start the client:
    1. `cd /claraDevDay/FL/project1/client4/startup`
    2. `./start.sh`
2. cd to leadIT and log in
    In the admin terminal you should: 
    1. `cd project1/leadIT/startup`
    2. `./fl_admin.sh`
    3. log in as `leadIT@secure.com`
3. Try to deploy the MMAR to client4 using `> deploy adminMMAR client client4`. 
4. You should get the message below: 
```
Error: Authorization Error: you are not authorized to deploy MMAR to "client4"
Done [15666 usecs] 2020-10-26 18:55:39.807613
```


# Next steps:
You can now move on to more advanced features of FL, such as:
- [Bring Your Own Trainer Notebook](FLBYOTrainer.ipynb)
- [Homomorphic Encryption Notebook](Homomorphic_Encryption.ipynb)


# Exercise:
### 1. Redo steps above the proper way
In the steps above we cut some corners, such as copying files and configs between the admin and the server or clients. 
All of these steps should be run from within the admin console. 
As stated in the Admin notebook, the correct way to do this is to: 
1. Set a run number. 
2. Upload files from the admin transfer folder to the server staging area. 
3. Run `> deploy server` to copy files from staging to the server run folder.
4. Run `> deploy client clientname` to copy files from staging to the client. 
5. Start training on the server and clients. 
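
As a checklist for this exercise, the sketch below prints the admin console commands in order for a given run number, MMAR folder, and list of clients. It only assembles strings from commands that appear earlier in this notebook; it does not talk to the admin tool, and the helper name `admin_commands` is hypothetical.

```python
def admin_commands(run_number, mmar_folder, clients):
    """Return the admin console commands for one run, in order (sketch)."""
    # The deploy commands use the folder name, not the full upload path.
    mmar_name = mmar_folder.rstrip("/").split("/")[-1]
    commands = [
        f"set_run_number {run_number}",
        f"upload_folder {mmar_folder}",
        f"deploy {mmar_name} server",
    ]
    commands += [f"deploy {mmar_name} client {client}" for client in clients]
    commands += ["start server", "start client", "check_status server", "check_status client"]
    return commands

cmds = admin_commands(123, "../../../adminMMAR_privacy", ["client1", "client3"])
for cmd in cmds:
    print(">", cmd)
```

You would type each printed line into the admin shell yourself, checking the status output before moving on to the next step.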


# Security:
As you might have noticed from this notebook, there are a couple of issues with the current setup. 
Both issues below can be restricted when provisioning the project. 
Check out Step 4 of the [Provisioning](Provisioning.ipynb) notebook.   

1. BYOC code is pushed by the admin, in the custom folder, to all clients. 
This is a potential threat, as the admin could write malicious code. 
If this option is restricted, the BYOC code needs to be emailed or otherwise passed to the clients, 
who would then check the code and place it in the custom folder so that it is executed.    
2. Client privacy: the method is set in the client config file. 
However, the admin can push a config that overrides what the client does or wants. 

