# Data Frame Federated Statistics 

In this example, we show how to generate federated statistics for data that can be represented as a pandas DataFrame.
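
To make the idea of federated statistics concrete, here is a minimal sketch of how a global mean and variance can be derived from per-client aggregates without sharing raw rows. It illustrates the general technique only and is not NVFLARE's implementation; the names `LocalStats`, `local_stats`, and `global_mean_and_variance`, as well as the sample values, are hypothetical, and plain lists stand in for DataFrame columns.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalStats:
    """Per-client summary; only these aggregates leave the client."""
    count: int
    total: float
    total_sq: float


def local_stats(values: List[float]) -> LocalStats:
    # Computed on the client from its own data (hypothetical helper).
    return LocalStats(
        count=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def global_mean_and_variance(parts: List[LocalStats]) -> Tuple[int, float, float]:
    # Computed on the server from the clients' aggregates only.
    n = sum(p.count for p in parts)
    s = sum(p.total for p in parts)
    sq = sum(p.total_sq for p in parts)
    mean = s / n
    variance = sq / n - mean * mean  # population variance
    return n, mean, variance


# Made-up sample values, one list per client.
site_1_ages = [39.0, 50.0, 38.0, 53.0]
site_2_ages = [25.0, 38.0, 28.0]

n, mean, var = global_mean_and_variance(
    [local_stats(site_1_ages), local_stats(site_2_ages)]
)
print(n, mean, var)
```

The result matches what you would get by pooling both lists and computing the statistics centrally, which is the property that makes this kind of aggregation useful.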

## Set Up NVFLARE

Follow [Getting Started](https://nvflare.readthedocs.io/en/main/getting_started.html) to set up a virtual environment and install NVFLARE.

You can also follow this [notebook](../../nvflare_setup.ipynb) to get set up.

Just to quickly recap the NVFLARE installation:

**NVFLARE Installation**

In [None]:
%pip install 'nvflare>=2.3.0rc5'

In [None]:
! nvflare -V

## Install requirements
First, install the required packages:

In [None]:
%pip install -r df_stats/requirements.txt


## Prepare data

In this example, we use the UCI (University of California, Irvine) [adult dataset](https://archive.ics.uci.edu/ml/datasets/adult).
The original dataset already contains "training" and "test" datasets. Here we simply assume that the "training" and "test" datasets belong to different clients, so we assign them to two separate clients.

Now we use the data utility to download the UCI datasets into a separate directory for each client under /tmp/nvflare/data/.



In [None]:
!df_stats/prepare_data.sh


## Run job in FL Simulator

With the FL Simulator, we can run the example with a single CLI command:



In [None]:
! nvflare simulator df_stats/jobs/df_stats -w /tmp/nvflare/df_stats -n 2 -t 2



The results are stored in the workspace "/tmp/nvflare/df_stats":
```
/tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json
```

In [None]:
cat /tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json
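
If you would rather inspect the result from Python than with `cat`, the following sketch lists the nested keys of the JSON file. It makes no assumption about the layout of the statistics output beyond it being valid JSON; the helpers `key_paths` and `summarize_stats_file` are hypothetical names introduced here for illustration.

```python
import json
from pathlib import Path
from typing import Any, List


def key_paths(node: Any, prefix: str = "", max_depth: int = 3) -> List[str]:
    """List dotted key paths of nested dicts, descending at most max_depth levels."""
    if not isinstance(node, dict) or not node or max_depth == 0:
        return [prefix] if prefix else []
    paths: List[str] = []
    for key, child in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        paths.extend(key_paths(child, path, max_depth - 1))
    return paths


def summarize_stats_file(path: str, max_depth: int = 3) -> List[str]:
    # Load the JSON document and return its key paths.
    with Path(path).open() as f:
        return key_paths(json.load(f), max_depth=max_depth)


stats_path = "/tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json"
if Path(stats_path).exists():  # only present after the simulator run above
    for p in summarize_stats_file(stats_path):
        print(p)
```

Raising `max_depth` shows more of the hierarchy; lowering it gives a quick overview of the top-level entries.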

## Visualization
We can easily visualize the results with the visualization notebook. Before we do that, we need to copy the data to the notebook directory:


In [None]:
! cp /tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json df_stats/demo/.

Now we can visualize the results with the [visualization notebook](df_stats/demo/visualization.ipynb).

We are not quite done yet. What if you prefer to use the Python API instead of the CLI to run jobs? Let's do that in this section.

## Run Job using Simulator API
This should be equivalent to running the job from the command line with `nvflare simulator`.

In [None]:
from nvflare.private.fed.app.simulator.simulator_runner import SimulatorRunner
runner = SimulatorRunner(
    job_folder="df_stats/jobs/df_stats",
    workspace="/tmp/nvflare/df_stats",
    n_clients=2,
    threads=2,
)
runner.run()

## We are done!
Congratulations, you have just completed the federated statistics calculation for data represented as a data frame.
