# Federated Hierarchical Statistics 

In this example, we show how to generate federated hierarchical statistics for data that can be represented as a pandas DataFrame.

## Set Up NVFLARE

Follow the [Getting Started](https://nvflare.readthedocs.io/en/main/getting_started.html) guide to set up a virtual environment, get the latest NVFLARE source, build it, and install NVFLARE.

## Install requirements
First, install the required packages:

In [None]:
%pip install -r requirements.txt


## Prepare data

In this example, we use synthetic, anonymous student score datasets generated for students belonging to 7 different universities.

Run the script `prepare_data.sh`, which generates 7 different datasets, each with a random number of entries between 1000 and 2000. Each entry in the datasets has three columns: `Pass`, `Fail` and `Percentage`. `Pass`/`Fail` indicates whether the student passed or failed the exam, and `Percentage` is the overall percentage of marks scored by the student.



In [8]:
from utils.prepare_data import prepare_data

prepare_data()

Preparing data at data directory `/tmp/nvflare/data/hierarchical_stats/`...

CSV file `university-1.csv` is generated with 1001 entries for client `university-1` at /tmp/nvflare/data/hierarchical_stats/university-1.
CSV file `university-2.csv` is generated with 1730 entries for client `university-2` at /tmp/nvflare/data/hierarchical_stats/university-2.
CSV file `university-3.csv` is generated with 1263 entries for client `university-3` at /tmp/nvflare/data/hierarchical_stats/university-3.
CSV file `university-4.csv` is generated with 1037 entries for client `university-4` at /tmp/nvflare/data/hierarchical_stats/university-4.
CSV file `university-5.csv` is generated with 1497 entries for client `university-5` at /tmp/nvflare/data/hierarchical_stats/university-5.
CSV file `university-6.csv` is generated with 1454 entries for client `university-6` at /tmp/nvflare/data/hierarchical_stats/university-6.
CSV file `university-7.csv` is generated with 1271 entries for client `university-7` at /
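
To make the dataset layout described above concrete, here is a minimal sketch of a generator for one such file, using only the Python standard library. It is not the code in `utils.prepare_data`. The function names `make_student_rows` and `rows_to_csv`, the 0/1 encoding of `Pass` and `Fail`, and the 50% pass mark are all assumptions made for illustration.

```python
import csv
import io
import random


def make_student_rows(rng, min_rows=1000, max_rows=2000, pass_mark=50.0):
    """Return a list of synthetic rows with Pass, Fail and Percentage columns.

    Assumptions (not taken from the example's own code): Pass and Fail are
    complementary 0/1 flags, and a student passes at or above `pass_mark`.
    """
    n_rows = rng.randint(min_rows, max_rows)  # inclusive on both ends
    rows = []
    for _ in range(n_rows):
        percentage = round(rng.uniform(0.0, 100.0), 2)
        passed = int(percentage >= pass_mark)
        rows.append({"Pass": passed, "Fail": 1 - passed, "Percentage": percentage})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Pass", "Fail", "Percentage"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# A seeded generator keeps the sketch reproducible.
rows = make_student_rows(random.Random(0))
csv_text = rows_to_csv(rows)
print(len(rows), "rows; header:", csv_text.splitlines()[0])
```

The real datasets may encode the columns differently, so inspect one of the generated CSV files before relying on this layout.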

## Run job in FL Simulator

**Run Job using Simulator API**


In [None]:
from nvflare.private.fed.app.simulator.simulator_runner import SimulatorRunner

# One simulated client per university, each running in its own thread.
runner = SimulatorRunner(
    job_folder="jobs/hierarchical_stats",
    workspace="/tmp/nvflare/hierarchical_stats",
    clients="university-1,university-2,university-3,university-4,university-5,university-6,university-7",
    n_clients=7,
    threads=7,
)
runner.run()

**Run Job using Simulator CLI**

From a **terminal**, one can also run the following equivalent CLI command:

```
cd NVFlare/examples/advanced/hierarchical_stats
nvflare simulator hierarchical_stats/jobs/hierarchical_stats -w /tmp/nvflare/hierarchical_stats/ -n 7 -t 7 -c university-1,university-2,university-3,university-4,university-5,university-6,university-7

```

This assumes that NVFLARE was installed from a **terminal**. Running `pip install` from a notebook cell with a shell command (`!` or `%%bash`) may or may not work, depending on which Python kernel is selected. Also, running `%pip install` from a notebook cell doesn't register the console_scripts in the PATH.
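
One way to check whether the `nvflare` command is reachable from the current environment is to look it up on the PATH. The sketch below uses the standard library's `shutil.which`; the helper name `find_cli` is hypothetical and not part of NVFLARE.

```python
import shutil


def find_cli(name="nvflare"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


cli_path = find_cli()
if cli_path is None:
    print("nvflare was not found on PATH; install it from a terminal first.")
else:
    print("nvflare CLI found at", cli_path)
```

If the command is not found from inside the notebook, the Simulator API cell above may still work, because it only needs the `nvflare` Python package to be importable.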


## Examine the result




The results are stored in the workspace under `/tmp/nvflare`:
```
/tmp/nvflare/hierarchical_stats/server/simulate_job/statistics/hierarchical_stats.json
```

In [None]:
cat /tmp/nvflare/hierarchical_stats/server/simulate_job/statistics/hierarchical_stats.json
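
The raw JSON can be long, so it may help to print only its key structure first. The following sketch makes no assumption about the schema of `hierarchical_stats.json`. It walks whatever nested dictionaries and lists it finds, down to a fixed depth. The helper name `outline` is hypothetical.

```python
import json
import os


def outline(node, max_depth=3, depth=0):
    """Return indented lines naming the keys of nested dicts, down to max_depth."""
    lines = []
    if depth >= max_depth:
        return lines
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append("  " * depth + str(key))
            lines.extend(outline(value, max_depth, depth + 1))
    elif isinstance(node, list) and node:
        # Describe a list by its length and the outline of its first element.
        lines.append("  " * depth + f"[list of {len(node)}]")
        lines.extend(outline(node[0], max_depth, depth + 1))
    return lines


result_path = "/tmp/nvflare/hierarchical_stats/server/simulate_job/statistics/hierarchical_stats.json"
if os.path.exists(result_path):
    with open(result_path) as f:
        print("\n".join(outline(json.load(f))))
else:
    print("Result file not found; run the job first.")
```

Increase `max_depth` to see deeper levels of the hierarchy once the top-level layout is clear.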

## Visualization
We can easily visualize the results via the visualization notebook. Before we do that, we need to copy the data to the notebook directory:


In [None]:
! cp /tmp/nvflare/hierarchical_stats/server/simulate_job/statistics/hierarchical_stats.json demo/.

Now we can visualize the results via the [visualization notebook](demo/visualization.ipynb).

## We are done!
Congratulations, you just completed the federated hierarchical statistics calculation with data represented as a DataFrame!
