# **NetManager**: Create a Federated Learning Network using MedFL


@Author : [MEDomics consortium](https://github.com/medomics/)

@Email : medomics.info@gmail.com


### Introduction

The `NetManager` module within `MedFL` generates federated learning networks. It takes a CSV file containing a dataset as input. From this file, the module creates the nodes of the network, assigns a dataset to each node, and generates the federated dataset for each node. These federated datasets are then passed to the next package, `Learning Manager`.

<img src='../Images/MEDfl_Diagramm.png' />

The NetManager workflow involves five primary steps:

1. **Network creation**
2. **DataSets storage**
3. **Nodes Creation**
4. **FLsetup Creation**
5. **Federated DataSet Creation**
   
<img src='../Images/NetManager_Diagramm.png' width="70%" style="display : block ; margin :0 auto " />

In [1]:
import sys
sys.path.append('../..')

import os
os.environ['PYTHONPATH'] = '../..'

imports

In [2]:
import mysql.connector
import pandas as pd
from sqlalchemy import create_engine,text

## MedFL Imports 
from Medfl.NetManager.node import Node
from Medfl.NetManager.network import Network
from Medfl.NetManager.dataset import DataSet
from Medfl.NetManager.flsetup import FLsetup

# Utils
from Medfl.LearningManager.utils import *


### 0. DB Preparation

DB Creation: 
Check the `1_DB-checkpoint` tutorial to learn more about this step

In [3]:
# DB Creation 
# !python ../../scripts/create_db.py

# Clean the DB
empty_db()

### 1. Network Creation




<div style="display : flex">
<div style="width : 65%">

**Automatically** | **Manually**
------------------|---------------------------
 This method is used when the hospital names and their datasets are not known in advance, so creation is based on variables in the dataset. For example, in our proof of concept with the *eICU* dataset, two variables can be used: `site_hospital` and `site_region`. <br> Note that hospitals can participate for <b>training</b> or for <b>testing</b> only (this may change to support both modes). <br> The set of hospitals (nodes) forms a network, and in MEDfl terminology a network together with its additional information is referred to as an <b>FLsetup (Federated Learning Setup)</b>. <br> After an <b>FLsetup</b> is created, it is stored in the DB with a unique ID, a name, a description, and a creation date.| The user can also create an FLsetup manually: create a network, add each node separately, then upload a dataset to each node.


   
</div>

<img width="270px"  height="250px" style="display:block ; margin : 0 auto" src='../Images/NetworkCreation.png' />
</div>

##### 1.1 Auto Network Creation

In [4]:
# Create a network "Net_1"
Net_1 = Network(name="Auto_Net")
Net_1.create_network()

Net_1.name

'Auto_Net'

### 2. Data Storage
<h4>MasterDataSet Creation</h4>

Using the method `create_master_dataset`, we create a new table, **MasterDataset**, and upload the data at `path_to_csv` to it (`path_to_csv` is optional and defaults to the file specified as `path_to_master_csv` in `Medfl.LearningManager.params.yaml`).
The MasterDataSet serves two purposes within the network:

Creation method | **Auto-Creation of Nodes** | **Manual Creation of Nodes**
----------------|----------------------------|-----------------------------
Purpose | When automatically creating nodes, the MasterDataSet is used to divide the data across the train and test nodes. | When creating nodes manually, the MasterDataSet acts as a reference for verifying that each node's dataset is compatible with it.
  
To create the MasterDataSet, three main steps are executed:
1. **Create a MasterDataSet Table on the DB** 
2. **Read The CSV DATA file**
3. **Copy the Data from the CSV file to the DB**
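The three steps above can be sketched with the standard library alone. This is only an illustration under stated assumptions: it uses an in-memory SQLite database instead of the MySQL database MedFL works with, the helper `create_master_table` is hypothetical and not part of the MedFL API, and the sample rows are made up.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the master CSV file (not real eICU data).
SAMPLE_CSV = """id,site_region,age,event_death
1,Midwest,71,0
2,South,60,0
3,West,68,1
"""


def create_master_table(conn, csv_text, table="MasterDataset"):
    """Create `table` from the CSV header and copy every CSV row into it."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # read the CSV header to learn the column names
    columns = ", ".join(f'"{name}"' for name in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")  # create the table
    placeholders = ", ".join("?" for _ in header)
    # Copy the remaining CSV rows into the table.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()
    return header


conn = sqlite3.connect(":memory:")
master_columns = create_master_table(conn, SAMPLE_CSV)
row_count = conn.execute("SELECT COUNT(*) FROM MasterDataset").fetchone()[0]
print(master_columns, row_count)
```

In the tutorial itself, `Net_1.create_master_dataset()` performs these steps against the project database.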

We will use the CSV file `sapsii_score_knnimputed_eicu.csv` as the MasterDataSet.

In [5]:
# Read the CSV file
data = pd.read_csv(r'D:\ESI\3CS\PFE\last_year\Code\MEDfl\notebooks\data\eicu_sapsii_data.csv')

data.head()

Unnamed: 0,patientunitstayid,hadm_id,subject_id,hospitalid,deceased,age,stay_length,organdonor,bicarbonate_min,bicarbonate_max,...,uo,aids,hem,mets,admissiontype,sepsis3,unitvisitnumber,unitvisitnumberrevised,num_admiss,num_hosp
0,141650,129307,002-17293,73,0,71,5787,0,23.0,28.0,...,,,,,6.0,0,1,1,1,1
1,141777,129401,002-44088,73,0,60,1559,0,27.0,28.0,...,,0.0,0.0,0.0,6.0,0,1,1,1,1
2,141907,129494,002-60303,63,0,68,1135,0,23.0,23.0,...,2380.0,,,,6.0,0,1,1,1,1
3,141939,129522,002-66368,69,0,56,448,0,,,...,50.0,0.0,0.0,0.0,6.0,0,1,1,1,1
4,141978,129552,002-25450,67,0,77,1104,0,23.0,30.0,...,,0.0,0.0,0.0,0.0,0,1,1,2,2


In [4]:
# Create a MasterDataSet from Net_1
Net_1.create_master_dataset()

# Check if the Network has a masterDataSet Table ( 1: Table exists ; 0: Table doesn't exist)
Net_1.mtable_exists


### 3. Federated Learning Setup Creation (FLsetup Creation)

In [7]:
# auto FLsetup creation
autoFl  = FLsetup(name = "Flsetup_1", description = "The first fl setup",network = Net_1)
autoFl.create()

# List all setups 
FLsetup.list_allsetups()

Unnamed: 0,FLsetupId,name,description,creation_date,NetId,column_name
0,1,Flsetup_1,The first fl setup,2023-12-31 22:44:38,1,


### 4. Nodes Creation:

<div style="display : flex">
<div style="width : 65%">

Each node within the network represents an FL client, and the MEDfl package provides two distinct approaches for node creation (**Auto Creation** and **Manual Creation**):

**Auto Creation** | **Manual Creation**
------------------|----------------------
In this method, nodes are generated automatically within the Network based on specified values from a particular column in the masterDataSet, as designated by the User. Following node creation, each node's DataSets are automatically assigned from the MasterDataSet.|In this method, nodes are manually created, and DataSets are separately and manually assigned to each individual node


   
</div>

<img width="270px"  height="250px" style="display:block ; margin : 0 auto" src='../Images/nodecreation.png' />
</div>





Let's start with auto creation and see how the process goes:

The user should create a parameters dictionary that contains:
<ul> 
    <li>The name of the column used to create the nodes, which is the main element of the <b>automatic method</b></li> 
    <li>The lists of train and test nodes</li>
    </ul> 
and pass it to the <code>create_nodes_from_master_dataset</code> method of the <code>FLsetup</code> class.<br>
Each node is also an object and is stored in the database.
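To make the idea concrete, here is a minimal sketch of how rows could be grouped into train and test nodes from such a dictionary. It does not reproduce the MedFL implementation: the function `split_into_nodes`, the dictionary-based node representation, and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for the MasterDataset table.
rows = [
    {"id": 1, "site_region": "Midwest"},
    {"id": 2, "site_region": "South"},
    {"id": 3, "site_region": "West"},
    {"id": 4, "site_region": "Midwest"},
    {"id": 5, "site_region": "Northeast"},
]

params_dict = {
    "column_name": "site_region",
    "train_nodes": ["Midwest", "South"],
    "test_nodes": ["West", "Northeast"],
}


def split_into_nodes(rows, params):
    """Group rows by the chosen column and tag each group as train (1) or test (0)."""
    column = params["column_name"]
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[column]].append(row)

    nodes = []
    for name in params["train_nodes"] + params["test_nodes"]:
        nodes.append({
            "name": name,
            "train": 1 if name in params["train_nodes"] else 0,
            "rows": grouped.get(name, []),
        })
    return nodes


nodes = split_into_nodes(rows, params_dict)
print([(n["name"], n["train"], len(n["rows"])) for n in nodes])
```

Each distinct value of the chosen column becomes one node, and the node's rows are exactly the master rows carrying that value.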
   

In [8]:
params_dict = {'column_name' : 'site_region','train_nodes' : ["Midwest","South"] , 'test_nodes' : ['West','Northeast'] }

eicu_nodes = autoFl.create_nodes_from_master_dataset(params_dict = params_dict )

[node.name  for node in eicu_nodes]  

1
1
0
0


['Midwest', 'South', 'West', 'Northeast']

In [9]:
# List all setups 
FLsetup.list_allsetups()

Unnamed: 0,FLsetupId,name,description,creation_date,NetId,column_name
0,1,Flsetup_1,The first fl setup,2023-12-31 22:44:38,1,site_region


### 5. Creating a Federated Dataset

In this phase, we will divide the data from each node into the following segments:

<div style="display : flex">
<div style="width : 50%">

1. **Train Loader:** This segment is utilized for training purposes at each node.
2. **Validation Loader:** Used for validation during the training phase.
3. **Test Loaders:** Utilized to test the model within the test nodes.
4. **Holdout Data:** This dataset is reserved for the final testing of the model after the Federated Learning (FL) process.
   
</div>

<img width="30%" style="display:block ; margin : 0 auto" src='../Images/FlDatasetDiagramm.png' />
</div>
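As a rough illustration of how one train node's data could be divided, the sketch below shuffles the rows and carves out holdout and validation parts by fraction. The function `split_train_node`, the fixed seed, the example fractions (0.2 and 0.1), and the choice to compute both fractions from the full row count are assumptions; MedFL may split the data differently.

```python
import random


def split_train_node(rows, test_frac=0.2, val_frac=0.1, seed=0):
    """Shuffle rows, then split them into train, validation, and holdout parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_holdout = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    holdout = shuffled[:n_holdout]
    val = shuffled[n_holdout:n_holdout + n_val]
    train = shuffled[n_holdout + n_val:]
    return train, val, holdout


train, val, holdout = split_train_node(range(100))
print(len(train), len(val), len(holdout))
```

The three parts are disjoint and together cover every row of the node.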


To generate an FL DataSet, use the method `create_federated_dataset`, which accepts the following arguments (only `output` is required):

1. `output`: The required argument indicating the output feature of our dataset.
2. `fit_encode`: An array of features to be encoded (typically string type features encoded to integers). By default, it's an empty array.
3. `to_drop`: An array of features to be removed from the dataset. By default, it's an empty array.
4. `fill_strategy`: A strategy for handling missing values in the dataset. Default set to 'mean'.
5. `test_frac`: The fraction of the dataset allocated for testing. Default set to 0.2.
6. `val_frac`: The fraction of the dataset allocated for validation. Default set to 0.1.
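The effect of `output`, `fit_encode`, `to_drop`, and a mean fill strategy can be sketched on a few made-up rows. This is a hedged, standard-library illustration: the helper `preprocess` is hypothetical, the integer codes are assigned in sorted order by assumption, and the actual MedFL preprocessing may differ.

```python
from statistics import mean

# Hypothetical rows; None marks a missing value.
rows = [
    {"id": 1, "site_region": "Midwest", "age": 71.0, "event_death": 0},
    {"id": 2, "site_region": "South", "age": None, "event_death": 1},
    {"id": 3, "site_region": "Midwest", "age": 61.0, "event_death": 0},
]


def preprocess(rows, output, fit_encode=(), to_drop=()):
    """Return (features, targets) after encoding, mean-filling, and dropping columns."""
    # Map each distinct string to an integer, per encoded column.
    codes = {
        col: {v: i for i, v in enumerate(sorted({r[col] for r in rows}))}
        for col in fit_encode
    }
    kept = [c for c in rows[0] if c not in to_drop]
    # Column means are computed over the non-missing values only.
    means = {}
    for col in kept:
        if col in fit_encode:
            continue
        means[col] = mean(r[col] for r in rows if r[col] is not None)
    features = [
        [
            codes[c][r[c]] if c in fit_encode
            else (means[c] if r[c] is None else r[c])
            for c in kept
        ]
        for r in rows
    ]
    targets = [r[output] for r in rows]
    return features, targets


X, y = preprocess(rows, output="event_death",
                  fit_encode=["site_region"], to_drop=["event_death", "id"])
print(X, y)
```

As in the tutorial's own call, the output column appears in `to_drop` so that it is kept as the target but excluded from the features.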






In [10]:
# Create a Federated DataSet for the autoFL
fl_dataset = autoFl.create_federated_dataset(
    output="event_death", 
    fit_encode=["site_hospital", "site_region"], 
    to_drop=[ "event_death" , "id"], 
 )

In [11]:
# Get the Federated DataSet of the auto FL 
data = autoFl.get_flDataSet()
data

Unnamed: 0,FedId,FLsetupId,FLpipeId,name
0,1,1,,Flsetup_1_Feddataset


### 1.2 Manual Creation of a Network

The manual setup of FLsetup using MEDfl differs slightly from the automated method but operates within the same scope. The main distinction lies in creating all objects (FLsetup, network, training nodes, and test nodes) manually.

To accomplish this, we will undertake the following steps:

- Construct a network.
- Generate a Master Dataset, utilizing the master_dataset table to ensure uniform dataset formats across all nodes (following horizontal federated learning principles).
- Establish training and testing nodes within the network.
- Upload datasets to each respective node.
- Create the FLsetup object to streamline organization and simplify the storage process.
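The role of the master dataset as a format reference can be illustrated with a small column check. This is only a sketch of the idea: the helper `check_compatible` and the example schema are hypothetical, and the check MedFL performs when a dataset is uploaded may be stricter or different.

```python
import csv
import io

# Hypothetical master schema (column names of the master_dataset table).
MASTER_COLUMNS = ["id", "site_region", "age", "event_death"]


def check_compatible(csv_text, master_columns):
    """Return (missing, extra) column names relative to the master schema."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [c for c in master_columns if c not in header]
    extra = [c for c in header if c not in master_columns]
    return missing, extra


good = "id,site_region,age,event_death\n1,West,50,0\n"
bad = "id,age,weight\n1,50,80\n"
good_result = check_compatible(good, MASTER_COLUMNS)
bad_result = check_compatible(bad, MASTER_COLUMNS)
print(good_result)
print(bad_result)
```

A node dataset is compatible in this sketch only when both lists are empty, which is what horizontal federated learning requires: every node shares the same feature columns.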

In [5]:
# Initiate the network object
network_man = Network(name="man_network")
# Create the network and store it 
network_man.create_network()

network_man.name

'man_network'

In [6]:
# List all networks on the DB 
Network.list_allnetworks()

Unnamed: 0,NetId,NetName
0,1,man_network


In [7]:
network_man.create_master_dataset()



### 2. Nodes Creation

To create a node manually, define both its name and its type: `train = 1` indicates a train node, and `train = 0` indicates a test node.


In [8]:
# Create 3 nodes: two train nodes and one test node
hospital_1 = Node(name="hospital_1", train=1)
hospital_2 = Node(name="hospital_2", train=1)
hospital_3 = Node(name="hospital_3", train=0)

In [9]:
# Assign the 3 nodes to the man_network 
network_man.add_node(hospital_1)
network_man.add_node(hospital_2)
network_man.add_node(hospital_3) 

1
1
0


In [10]:
# List all created nodes on the db 
Node.list_allnodes()

Unnamed: 0,NodeId,NodeName,train,NetId
0,1,hospital_1,1,1
1,2,hospital_2,1,1
2,3,hospital_3,0,1


### 3. Upload DataSets to Nodes

To upload a dataset to a node, provide a name for the dataset and the path of the CSV file containing the data.

In [12]:
# Define the path of the files 
Ds_1 = 'D:\\ESI\\3CS\\PFE\\last_year\\Code\\MEDfl\\notebooks\\eicu_test_1.csv'
Ds_2 = 'D:\\ESI\\3CS\\PFE\\last_year\\Code\\MEDfl\\notebooks\\eicu_test_2.csv'
Ds_3 = 'D:\\ESI\\3CS\\PFE\\last_year\\Code\\MEDfl\\notebooks\\eicu_test_3.csv'

# Upload the datasets
hospital_1.upload_dataset(dataset_name="hospital_1_dataset", path_to_csv=Ds_1)
hospital_2.upload_dataset(dataset_name="hospital_2_dataset", path_to_csv=Ds_2)
hospital_3.upload_dataset(dataset_name="hospital_3_dataset", path_to_csv=Ds_3)

### 4. Create the FLsetup 

Now, let's create the Federated Learning setup of the manual network `man_network`

In [17]:
# Create FLsetup for man_network 
fl_setup = FLsetup(name = "Manual_Flsetup", description = "The first manual fl setup",network = network_man)
fl_setup.create()

In [18]:
# List all created Setups 
FLsetup.list_allsetups()

Unnamed: 0,FLsetupId,name,description,creation_date,NetId,column_name
0,1,Flsetup_1,The first fl setup,2023-12-31 22:44:38,1,site_region
1,2,Manual_Flsetup,The first manual fl setup,2024-01-07 17:11:09,2,


### 5. Create the Federated DataSet

Finally, we create the federated dataset of `man_network` and pass it to the `Learning Manager` package.

**When creating an FL dataset for a manual network, you always need to drop the two columns `DataSetName` and `NodeId`.**

In [19]:
# Create FLDataSet
fl_dataset = fl_setup.create_federated_dataset(
    output="event_death", 
    fit_encode=["site_hospital", "site_region"], 
    to_drop=[ "event_death" , "id" , "DataSetName" , "NodeId"], 
 )

In [20]:
# Get the Federated DataSet of the manual FL 
data = fl_setup.get_flDataSet()
data

Unnamed: 0,FedId,FLsetupId,FLpipeId,name
0,2,2,,Manual_Flsetup_Feddataset


# THE END 

<img src='../Images/netMan.png' width="50%"  />

We have now completed the workflow of the first subpackage of `MedFL`, the `NetManager` subpackage: we started with a CSV file of data and successfully generated our federated dataset.

Throughout this process, we employed two distinct methods to create the FedDataset: automated creation and manual creation.