# Data Frame Federated Statistics 

In this example, we show how to generate federated statistics for data that can be represented as a Pandas DataFrame.
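To make the idea concrete, here is a minimal conceptual sketch of how global statistics can be derived from per-site summaries without pooling the raw rows. It is not NVFLARE's implementation; the function names `local_summary` and `global_summary` and the toy `age` values are hypothetical and exist only for illustration.

```python
import pandas as pd


def local_summary(df: pd.DataFrame, column: str) -> dict:
    """Compute the aggregates one site would share instead of its raw rows."""
    values = df[column]
    return {
        "count": int(values.count()),
        "sum": float(values.sum()),
        "min": float(values.min()),
        "max": float(values.max()),
    }


def global_summary(local_summaries: list) -> dict:
    """Combine per-site aggregates into global statistics."""
    total_count = sum(s["count"] for s in local_summaries)
    total_sum = sum(s["sum"] for s in local_summaries)
    return {
        "count": total_count,
        "mean": total_sum / total_count,
        "min": min(s["min"] for s in local_summaries),
        "max": max(s["max"] for s in local_summaries),
    }


# Hypothetical toy data standing in for two clients
site_1 = pd.DataFrame({"age": [25, 38, 52]})
site_2 = pd.DataFrame({"age": [41, 30]})

result = global_summary([local_summary(site_1, "age"), local_summary(site_2, "age")])
print(result)
```

The global mean here matches the mean of the pooled data because counts and sums add up across sites; statistics such as medians or histograms need more care and are handled by the job run later in this notebook.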

## Setup NVFLARE

Follow the [Getting Started](https://nvflare.readthedocs.io/en/main/getting_started.html) guide to set up a virtual environment and install NVFLARE.

You can also follow this [Getting Started](../../../../getting_started.ipynb) notebook for the setup.



## Install NVFLARE (from PyPI)

In [None]:
!python -m pip install nvflare


## Install from Source



In [None]:
 !pwd

In [None]:
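import os
# Each "!" line runs in its own subshell, so "!cd" and "!export" do not persist; set the variable from Python.
# Assumption: the repository root is three levels above this notebook; adjust the path if not.
os.environ["NVFLARE_HOME"] = os.path.abspath("../../..")
!echo $NVFLARE_HOME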

In [None]:
!echo $NVFLARE_HOME


In [None]:
!pip install -e $NVFLARE_HOME

## Install requirements
Let's first install the required packages.


In [None]:
!pip install -r df_stats/requirements.txt


## Prepare data

In this example, we use the UCI (University of California, Irvine) [adult dataset](https://archive.ics.uci.edu/ml/datasets/adult).
The original dataset already contains "training" and "test" datasets. Here we simply assume that the training and test datasets belong to different clients, so we assign each of them to its own client.

Now we use a data utility script to download the UCI datasets into a separate directory for each client under /tmp/nvflare/df_stats/data.



In [9]:
!df_stats/prepare_data.sh


prepare data for data directory /tmp/nvflare/df_stats/data

wget download to /tmp/nvflare/df_stats/data/site-1/data.csv
100% [......................................................] 3974305 / 3974305
wget download to /tmp/nvflare/df_stats/data/site-2/data.csv
100% [......................................................] 2003153 / 2003153
done with prepare data
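Before running the job, it can help to look at what each site received. The sketch below is one way to do that with pandas. The helper name `peek_site_data` is hypothetical, and `header=None` is an assumption that the downloaded file is plain comma-separated text without a header row; adjust both the path and the options to match what `prepare_data.sh` actually produced.

```python
import pandas as pd


def peek_site_data(csv_path, n_rows: int = 5) -> pd.DataFrame:
    """Load one site's CSV and return its first rows for a quick sanity check."""
    # Assumption: no header row, and fields may carry a space after each comma.
    df = pd.read_csv(csv_path, header=None, skipinitialspace=True)
    print(f"{csv_path}: {df.shape[0]} rows x {df.shape[1]} columns")
    return df.head(n_rows)
```

For example, `peek_site_data("/tmp/nvflare/df_stats/data/site-1/data.csv")` would show the first five rows of the first client's data.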


## Run job in FL Simulator

With the FL Simulator, we can run the example with a single CLI command.



In [None]:
! nvflare simulator df_stats/jobs/df_stats -w /tmp/nvflare/df_stats -n 2 -t 2



The results are stored in the workspace "/tmp/nvflare/df_stats":
```
/tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json
```

In [None]:
! cat /tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json
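The raw JSON can be long, so an outline of its keys is often easier to read than the `cat` output. The sketch below makes no assumption about the schema of `adults_stats.json`; it only walks nested dictionaries to a limited depth. The helper name `outline` is hypothetical.

```python
import json
import os


def outline(node, depth: int = 0, max_depth: int = 3) -> list:
    """Return the dictionary keys of a nested structure, indented by nesting level."""
    lines = []
    if isinstance(node, dict) and depth < max_depth:
        for key, value in node.items():
            lines.append("  " * depth + str(key))
            lines.extend(outline(value, depth + 1, max_depth))
    return lines


stats_path = "/tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json"
if os.path.exists(stats_path):  # only print when the simulator run has produced the file
    with open(stats_path) as f:
        print("\n".join(outline(json.load(f))))
```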

## Visualization
We can easily visualize the results via the visualization notebook. Before we do that, we need to copy the data to the notebook directory.


In [None]:
! cp /tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json df_stats/demo/.

Now we can visualize via the [visualization notebook](df_stats/demo/visualization.ipynb).

We are not quite done yet. What if you prefer to use the Python API instead of the CLI to run jobs? Let's do that in this section.

## Run Job using Simulator API
This should give the same result as running the `nvflare simulator` command from the CLI.

In [None]:
from nvflare.private.fed.app.simulator.simulator_runner import SimulatorRunner

# Same job folder, workspace, client count, and thread count as the CLI run above
runner = SimulatorRunner(job_folder="df_stats/jobs/df_stats", workspace="/tmp/nvflare/df_stats", n_clients=2, threads=2)
runner.run()

## Run Job using FLARE API in POC mode

So far, we have been using the Simulator to simulate the federated job run. With the [FLARE API](../../../tutorial/flare_api.ipynb), you can interact directly with an NVFLARE system in production or POC mode.
In such cases, we first have to set up and deploy the federated system. Since we are running on a local machine, we will mimic this deployment via Proof of Concept (POC) mode.
Please refer to [setup POC mode](../../../tutorial/setup_poc.ipynb) and the [detailed POC commands](https://nvflare.readthedocs.io/en/latest/user_guide/poc_command.html#poc-command) to see how this is done.

For now, we assume the NVFLARE system is already running in POC mode, started with the following commands:

```
nvflare poc --prepare -n 2 

nvflare poc --start -ex admin
```

You can double-check whether FLARE is running with the following command from a **terminal**:



In [None]:
! ps -eaf | grep nvflare

If you determine that the FLARE POC system is not running, you can open a terminal and start the FLARE system in POC mode.
At this point, we assume you have already set up POC and started NVFLARE, and we are going to use the default **workspace=/tmp/nvflare/poc**. We will first check the system status.

In [11]:
import os
from nvflare.fuel.flare_api.flare_api import new_insecure_session

workspace = "/tmp/nvflare/poc"
admin_dir = os.path.join(workspace, "admin")
sess = new_insecure_session(admin_dir)
print(sess.get_system_info())


SystemInfo
server_info:
status: started, start_time: Mon Mar 13 20:17:35 2023
client_info:
site-1(last_connect_time: Mon Mar 13 20:17:50 2023)
site-2(last_connect_time: Mon Mar 13 20:17:43 2023)
job_info:
JobInfo:
  job_id: 20ba29c5-253e-4d33-a085-a24b72e71885
  app_name: df_stats


**Submit Job**

In [12]:
examples_dir = os.path.join(admin_dir, "transfer")
job_folder = os.path.join(examples_dir, "advanced/federated-statistics/df_stats/jobs/df_stats")
job_id = sess.submit_job(job_folder)
print(job_id + " was submitted")

d438e85b-2937-4c3a-a6f2-2a4958b7914b was submitted


**Monitoring Job**

You can choose your own monitoring output. Here is one callback function that displays the job information.

In [13]:
from nvflare.fuel.flare_api.flare_api import Session


def status_monitor_cb(session: Session, job_id: str, job_meta, *cb_args, **cb_kwargs) -> bool:
    # Print the full job metadata on the first three polls and on every tenth poll;
    # print a dot on all other polls while the job is running.
    counter = cb_kwargs["cb_run_counter"]
    if job_meta["status"] == "RUNNING":
        if counter["count"] < 3 or counter["count"] % 10 == 0:
            print(job_meta)
        else:
            print(".", end="")
    else:
        print("\n" + str(job_meta))

    counter["count"] += 1
    return True  # always continue monitoring
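As a variation, the sketch below shows a callback that gives up after a fixed number of polls. It reuses the callback signature shown above. Two things are assumptions rather than confirmed behavior: that returning `False` makes `monitor_job` stop polling, and that the extra keyword `max_polls` (a hypothetical name) is passed through to the callback in the same way as `cb_run_counter`.

```python
def limited_monitor_cb(session, job_id, job_meta, *cb_args, **cb_kwargs) -> bool:
    """Print one line per poll and ask to stop once max_polls polls have happened."""
    counter = cb_kwargs["cb_run_counter"]
    counter["count"] += 1
    print(f"poll {counter['count']}: job {job_id} is {job_meta['status']}")
    # Assumption: a False return value ends monitoring early.
    return counter["count"] < cb_kwargs.get("max_polls", 30)
```

Under those assumptions it could be used as `sess.monitor_job(job_id, cb=limited_monitor_cb, cb_run_counter={"count": 0}, max_polls=10)`.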


In [14]:
sess.get_job_meta(job_id)

{'name': 'df_stats',
 'job_folder_name': 'df_stats',
 'resource_spec': {},
 'deploy_map': {'df_stats': ['@ALL']},
 'min_clients': 1,
 'submitter_name': 'admin',
 'submitter_org': 'global',
 'submitter_role': 'super',
 'job_id': 'd438e85b-2937-4c3a-a6f2-2a4958b7914b',
 'submit_time': 1678763877.7957163,
 'submit_time_iso': '2023-03-13T20:17:57.795716-07:00',
 'start_time': '2023-03-13 20:17:58.573500',
 'duration': 'N/A',
 'status': 'RUNNING',
 'job_deploy_detail': ['server: OK', 'site-1: OK', 'site-2: OK'],
 'schedule_count': 1,
 'last_schedule_time': 1678763878.5102963,
 'schedule_history': ['2023-03-13 20:17:58: scheduled']}

In [15]:
sess.monitor_job(job_id, cb=status_monitor_cb, cb_run_counter={"count":0})

{'name': 'df_stats', 'job_folder_name': 'df_stats', 'resource_spec': {}, 'deploy_map': {'df_stats': ['@ALL']}, 'min_clients': 1, 'submitter_name': 'admin', 'submitter_org': 'global', 'submitter_role': 'super', 'job_id': 'd438e85b-2937-4c3a-a6f2-2a4958b7914b', 'submit_time': 1678763877.7957163, 'submit_time_iso': '2023-03-13T20:17:57.795716-07:00', 'start_time': '2023-03-13 20:17:58.573500', 'duration': 'N/A', 'status': 'RUNNING', 'job_deploy_detail': ['server: OK', 'site-1: OK', 'site-2: OK'], 'schedule_count': 1, 'last_schedule_time': 1678763878.5102963, 'schedule_history': ['2023-03-13 20:17:58: scheduled']}
{'name': 'df_stats', 'job_folder_name': 'df_stats', 'resource_spec': {}, 'deploy_map': {'df_stats': ['@ALL']}, 'min_clients': 1, 'submitter_name': 'admin', 'submitter_org': 'global', 'submitter_role': 'super', 'job_id': 'd438e85b-2937-4c3a-a6f2-2a4958b7914b', 'submit_time': 1678763877.7957163, 'submit_time_iso': '2023-03-13T20:17:57.795716-07:00', 'start_time': '2023-03-13 20:17:

<MonitorReturnCode.JOB_FINISHED: 0>

In [16]:
import json

def format_json(data) -> str:
    # Return the formatted string so that print() displays it instead of "None"
    return json.dumps(data, sort_keys=True, indent=4, separators=(',', ': '))


list_jobs_output = sess.list_jobs()
print(format_json(list_jobs_output))

[
    {
        "duration": "0:00:17.934815",
        "job_id": "d438e85b-2937-4c3a-a6f2-2a4958b7914b",
        "job_name": "df_stats",
        "status": "FINISHED:COMPLETED",
        "submit_time": "2023-03-13T20:17:57.795716-07:00"
    },
    {
        "duration": "0:07:38.695158",
        "job_id": "20ba29c5-253e-4d33-a085-a24b72e71885",
        "job_name": "df_stats",
        "status": "RUNNING",
        "submit_time": "2023-03-13T20:10:43.635269-07:00"
    },
    {
        "duration": "0:00:41.948711",
        "job_id": "5c8bb1cc-1332-4775-bf41-787772952fea",
        "job_name": "image_stats",
        "status": "FINISHED:COMPLETED",
        "submit_time": "2023-03-13T19:58:25.565562-07:00"
    }
]
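Judging by the output above, the job list is a plain list of dictionaries, so it can be filtered with ordinary Python. The sketch below selects jobs by status prefix. The helper name `jobs_with_status` is hypothetical, and `sample_jobs` is a trimmed copy of the entries printed above rather than a live query.

```python
def jobs_with_status(jobs: list, prefix: str) -> list:
    """Return the jobs whose status string starts with the given prefix."""
    return [job for job in jobs if job.get("status", "").startswith(prefix)]


# Trimmed copy of the job list shown above
sample_jobs = [
    {"job_id": "d438e85b-2937-4c3a-a6f2-2a4958b7914b", "job_name": "df_stats", "status": "FINISHED:COMPLETED"},
    {"job_id": "20ba29c5-253e-4d33-a085-a24b72e71885", "job_name": "df_stats", "status": "RUNNING"},
    {"job_id": "5c8bb1cc-1332-4775-bf41-787772952fea", "job_name": "image_stats", "status": "FINISHED:COMPLETED"},
]

for job in jobs_with_status(sample_jobs, "FINISHED"):
    print(job["job_id"], job["job_name"])
```

In the notebook, passing `list_jobs_output` instead of `sample_jobs` applies the same filter to the live result.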


In [17]:
list_jobs_output_detailed = sess.list_jobs(detailed=True)
print(format_json(list_jobs_output_detailed))

[
    {
        "deploy_map": {
            "df_stats": [
                "@ALL"
            ]
        },
        "duration": "0:00:17.934815",
        "job_deploy_detail": [
            "server: OK",
            "site-1: OK",
            "site-2: OK"
        ],
        "job_folder_name": "df_stats",
        "job_id": "d438e85b-2937-4c3a-a6f2-2a4958b7914b",
        "last_schedule_time": 1678763878.5102963,
        "min_clients": 1,
        "name": "df_stats",
        "resource_spec": {},
        "schedule_count": 1,
        "schedule_history": [
            "2023-03-13 20:17:58: scheduled"
        ],
        "start_time": "2023-03-13 20:17:58.573500",
        "status": "FINISHED:COMPLETED",
        "submit_time": 1678763877.7957163,
        "submit_time_iso": "2023-03-13T20:17:57.795716-07:00",
        "submitter_name": "admin",
        "submitter_org": "global",
        "submitter_role": "super"
    },
    {
        "deploy_map": {
            "df_stats": [
                "@ALL"
    

**Download the Result from the FL Server**

In [18]:
result_dir = sess.download_job_result(job_id)


In [19]:
! tree {result_dir}

/tmp/nvflare/poc/admin/transfer/d438e85b-2937-4c3a-a6f2-2a4958b7914b
├── job
│   └── df_stats
│       ├── df_stats
│       │   ├── config
│       │   │   ├── config_fed_client.json
│       │   │   └── config_fed_server.json
│       │   └── custom
│       │       ├── df_statistics.py
│       │       ├── __pycache__
│       │       │   └── df_statistics.cpython-38.pyc
│       │       └── tests
│       │           ├── df_statistics_test.py
│       │           └── __pycache__
│       │               ├── df_statistics_test.cpython-38-pytest-7.2.0.pyc
│       │               ├── df_statistics_test.cpython-38-pytest-7.2.1.pyc
│       │               └── simulate_stats_job_test.cpython-38-pytest-7.2.0.pyc
│       └── meta.json
└── workspace
    ├── app_server
    │   ├── config
    │   │   ├── config_fed_client.json
    │   │   └── config_fed_server.js

Now we can copy the adults_stats.json to the demo folder for visualization

In [20]:
! cp  {result_dir}/workspace/statistics/adults_stats.json df_stats/demo/.
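If the statistics file sits at a different location inside the downloaded folder, a recursive search avoids hard-coding the path. The sketch below uses only the standard library; the helper name `find_result_files` is hypothetical.

```python
from pathlib import Path


def find_result_files(result_dir, file_name: str = "adults_stats.json") -> list:
    """Return every file called file_name anywhere under result_dir, sorted by path."""
    return sorted(Path(result_dir).rglob(file_name))
```

Calling `find_result_files(result_dir)` in the notebook lists the candidate paths, and an empty list means the job did not write the file into the downloaded result.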

Now we can visualize via the [visualization notebook](df_stats/demo/visualization.ipynb) as before


## Stop POC 
You can stop POC from a terminal with:
```
nvflare poc --stop 
```


## Cleanup
If you would like to remove the temporary folders and the POC workspace, clean up as follows:
* remove downloaded result folder 
* clean up POC workspace

In [21]:
! rm -r {result_dir}

In [22]:
!nvflare poc --clean

/tmp/nvflare/poc is removed


## We are done!
Congratulations, you have just completed the federated statistics calculation for data represented as a DataFrame.
