# Scikit-Learn SVM with NVFLARE


## Prepare data

In this section, we download the data, split it, and save the results to the local disk.

### Download data

In [1]:
from utils.prepare_data import download_data

The `download_data` function downloads one of two datasets from Scikit-learn: Iris or Cancer.
* The file is saved to the output directory.
* The file is in CSV format (comma-separated).
* The header row is removed.
* The default dataset is Iris.
* The file name is the dataset name (for example, `iris.csv`).


In [2]:
output_dir="/tmp/nvflare/sklearn/data"
download_data(output_dir)

Verify that the file was downloaded:


In [3]:
!ls {output_dir}

iris.csv


### Split Data
* **Split Method**


Split the data into different datasets, one for each client.
Several split methods are available so that we can test our algorithms in different scenarios. Here we simply pick the uniform split from the following:
* Uniform
* Linear
* Square
* Exponential
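
To make the difference between these methods concrete, the sketch below computes how many samples each site would receive. The weighting rule (site `i` gets a weight of 1, `i`, `i²`, or `eⁱ`) and the helper name `split_sizes` are assumptions made for illustration; the actual logic in `utils.prepare_data_split` may differ.

```python
import math


def split_sizes(total: int, site_num: int, method: str = "uniform") -> list:
    """Return the number of samples per site (hypothetical helper)."""
    # Assumed relative weight of site i (1-based) under each split method.
    weight_fns = {
        "uniform": lambda i: 1.0,
        "linear": lambda i: float(i),
        "square": lambda i: float(i * i),
        "exponential": lambda i: math.exp(i),
    }
    if method not in weight_fns:
        raise ValueError(f"unknown split method: {method}")
    weights = [weight_fns[method](i) for i in range(1, site_num + 1)]
    weight_sum = sum(weights)
    sizes = [int(total * w / weight_sum) for w in weights]
    # Give any rounding remainder to the last site so the sizes sum to `total`.
    sizes[-1] += total - sum(sizes)
    return sizes
```

For example, `split_sizes(105, 2, "linear")` returns `[35, 70]`, while the uniform split of the same 105 samples gives `[52, 53]`.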



* **Data Store Method**

There are two approaches to storing the split data:
* STORE_DATA:

Similar to a real application, we split the full dataset into different directories (sites), and each client reads the data of one site.

```
   /tmp/nvflare/sklearn/data/site-1/iris.csv
   /tmp/nvflare/sklearn/data/site-2/iris.csv
   /tmp/nvflare/sklearn/data/valid/iris.csv
```

* STORE_INDEX: 

Simulate the split by assigning a data index range to each site, without splitting the original file. The data loader reads from the original file, but only the rows within the site's index range.
  For example, the index assignment for the data split is captured in a JSON file:
 ``` 
  {
     "data_path": "/tmp/nvflare/sklearn/data/iris.csv",
     "data_index": {
         "site-1": {"start": 100, "end": 300},
         "site-2": {"start": 301, "end": 600}
     }
  }
 ```
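
The STORE_INDEX idea can be sketched as a small loader that reads only one site's rows from the shared file. This is a minimal sketch, not the example's actual data loader: the function name `load_site_rows` is hypothetical, and it assumes 0-based row indices with an inclusive `end` (suggested by site-2 starting right after site-1 ends).

```python
import csv
import json


def load_site_rows(split_config_path: str, site_name: str) -> list:
    """Read only the rows assigned to `site_name` from the shared data file."""
    with open(split_config_path) as f:
        config = json.load(f)
    index_range = config["data_index"][site_name]
    # Assumption: indices are 0-based and `end` is inclusive.
    start, end = index_range["start"], index_range["end"]
    rows = []
    with open(config["data_path"], newline="") as f:
        for row_idx, row in enumerate(csv.reader(f)):
            if row_idx > end:
                break  # past this site's range, no need to read further
            if row_idx >= start:
                rows.append(row)
    return rows
```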

Here we choose the STORE_DATA approach.

In [7]:
from utils.prepare_data_split import split_data, SplitMethod

In [8]:
input_path = "/tmp/nvflare/sklearn/data/iris.csv"
output_dir = "/tmp/nvflare/sklearn/data"
site_num = 2
valid_frac = 0.3
split_method: SplitMethod = SplitMethod.UNIFORM

In [9]:

split_data(input_path, output_dir, site_num, valid_frac, split_method=split_method)

In [10]:
!ls -l {output_dir}

total 20
-rw-rw-r-- 1 chester chester  316 Dec 16 22:15 data_split.json
-rw-rw-r-- 1 chester chester 3000 Dec 16 21:13 iris.csv
drwxrwxr-x 2 chester chester 4096 Dec 16 21:16 site-1
drwxrwxr-x 2 chester chester 4096 Dec 16 21:16 site-2
drwxrwxr-x 2 chester chester 4096 Dec 16 21:16 valid
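
The `site-1`, `site-2`, and `valid` directories in the listing above could be produced by logic along the following lines. This is only a sketch of the STORE_DATA approach with a uniform split: `store_split` is a hypothetical helper, and taking the validation rows from the start of the file and the site rows as contiguous blocks are assumptions. It does not write `data_split.json`.

```python
import csv
import os


def store_split(rows, output_dir, filename, site_num, valid_frac):
    """Write a validation file and one file per site (hypothetical helper)."""
    # Assumption: the first `valid_frac` of the rows form the validation set.
    valid_size = int(len(rows) * valid_frac)
    parts = {"valid": rows[:valid_size]}
    train_rows = rows[valid_size:]
    site_size = len(train_rows) // site_num
    for i in range(site_num):
        start = i * site_size
        # The last site also takes any leftover rows.
        end = len(train_rows) if i == site_num - 1 else start + site_size
        parts[f"site-{i + 1}"] = train_rows[start:end]
    for name, part in parts.items():
        os.makedirs(os.path.join(output_dir, name), exist_ok=True)
        with open(os.path.join(output_dir, name, filename), "w", newline="") as f:
            csv.writer(f).writerows(part)
    # Report how many rows went to each directory.
    return {name: len(part) for name, part in parts.items()}
```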


In [11]:
! head -n 10 {output_dir}/site-1/iris.csv

0.0,4.8,3.0,1.4,0.3
0.0,5.1,3.8,1.6,0.2
0.0,4.6,3.2,1.4,0.2
0.0,5.3,3.7,1.5,0.2
0.0,5.0,3.3,1.4,0.2
1.0,7.0,3.2,4.7,1.4
1.0,6.4,3.2,4.5,1.5
1.0,6.9,3.1,4.9,1.5
1.0,5.5,2.3,4.0,1.3
1.0,6.5,2.8,4.6,1.5
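
Judging from the rows above, each line appears to hold the class label in the first column followed by the four feature values. Under that assumption, a client could load its site file as sketched below; `load_features_and_labels` is a hypothetical name, not part of the example's utilities.

```python
import csv


def load_features_and_labels(csv_path):
    """Split a header-less CSV into features and labels (label in column 0)."""
    features, labels = [], []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            values = [float(v) for v in row]
            # Assumption: the first column is the class label.
            labels.append(int(values[0]))
            features.append(values[1:])
    return features, labels
```

The returned lists could then be passed to an estimator, for example `sklearn.svm.SVC().fit(features, labels)`.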


## Config Jobs

ToDo 

## Running Job

### FL Simulator

In [12]:
! nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 job_configs/sklearn_svm_base

2022-12-16 22:15:24,816 - SimulatorRunner - INFO - Create the Simulator Server.
2022-12-16 22:15:24,865 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server localhost on Port 58905
2022-12-16 22:15:24,867 - SimulatorServer - INFO - starting insecure server at localhost:57749
2022-12-16 22:15:24,869 - SimulatorRunner - INFO - Deploy the Apps.
2022-12-16 22:15:24,871 - SimulatorRunner - INFO - Create the simulate clients.
2022-12-16 22:15:25,055 - ClientManager - INFO - Client: New client site-1@127.0.0.1 joined. Sent token: 14fe139d-bcf0-4b22-adbf-23d5353bf5f1.  Total clients: 1
2022-12-16 22:15:25,055 - FederatedClient - INFO - Successfully registered client:site-1 for project simulator_server. Token:14fe139d-bcf0-4b22-adbf-23d5353bf5f1 SSID:
2022-12-16 22:15:25,239 - ClientManager - INFO - Client: New client site-2@127.0.0.1 joined. Sent token: 3663a46a-9e69-4787-a03d-14fec5ddf369.  Total clients: 2
2022-12-16 22:15:25,239 - FederatedClient - INFO - Successfully registered cli

In [10]:
!ls -l  /tmp/nvflare/simulate_job/app_site-1

total 16
-rw-rw-r-- 1 chester chester    0 Dec 16 21:55 audit.log
drwxrwxr-x 2 chester chester 4096 Dec 16 21:55 config
drwxrwxr-x 3 chester chester 4096 Dec 16 21:55 custom
-rw-rw-r-- 1 chester chester   40 Dec 16 21:55 events.out.tfevents.1671256551.RTX.21091.0
-rw-rw-r-- 1 chester chester 1454 Dec 16 21:55 log.txt


In [2]:
!ls -l  /tmp/nvflare/sklearn/model

total 0
