# Federated Statistics with image data

## Calculate Image Histogram

In this example, we compute local and global image statistics (histograms) while the data stays private at each client site.
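The idea behind a federated histogram can be shown with a toy example. This is a minimal sketch, not NVFlare's implementation. It assumes every site bins its pixel values into the same fixed, equal-width bins over [0, 256) (8-bit images). With shared bins, the server can build the global histogram by adding the sites' per-bin counts. The function names `local_histogram` and `global_histogram` are hypothetical.

```python
from collections.abc import Iterable


def local_histogram(pixels: Iterable[int], num_bins: int = 4, lo: int = 0, hi: int = 256) -> list[int]:
    """Site-side step: count pixel values into num_bins equal-width bins over [lo, hi)."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for p in pixels:
        if lo <= p < hi:  # values outside the agreed range are ignored in this sketch
            idx = min(int((p - lo) // width), num_bins - 1)
            counts[idx] += 1
    return counts


def global_histogram(local_counts: list[list[int]]) -> list[int]:
    """Server-side step: element-wise sum of the per-site bin counts (only counts are shared)."""
    return [sum(bins) for bins in zip(*local_counts)]
```

For instance, take two sites with pixels `[0, 10, 70, 200, 255]` and `[64, 128, 129, 250]`. They produce local counts `[2, 1, 0, 2]` and `[0, 1, 2, 1]`, which sum to the global histogram `[2, 2, 2, 3]`. Agreeing on the bin edges up front is what makes the element-wise sum valid.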

## Install requirements

In [None]:
%pip install -r image_stats/requirements.txt

## Download data

As an example, we use the dataset from the ["COVID-19 Radiography Database"](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database).
It contains PNG image files in four classes: `COVID`, `Lung_Opacity`, `Normal`, and `Viral Pneumonia`.
First, create a temporary directory. Then download the data and extract it to `/tmp/nvflare/image_stats/data/`.

In [2]:
%%bash 

# prepare the directory

if [ ! -d /tmp/nvflare/image_stats/data ]; then
  mkdir -p /tmp/nvflare/image_stats/data
fi


Download and unzip the data (you may need to log in to Kaggle or use an API key). After extracting the zip file, check that the `COVID-19_Radiography_Dataset` directory is in `/tmp/nvflare/image_stats/data/`.
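To check the extracted layout from Python, a small helper can count the image files under each class folder. This sketch assumes each class is a subdirectory of `COVID-19_Radiography_Dataset`. It counts `.png` files recursively, so if your copy of the archive also stores mask images inside the class folders, those are included in the counts. `count_images_per_class` is a hypothetical name, not part of this example's code.

```python
from pathlib import Path


def count_images_per_class(dataset_dir: str, ext: str = ".png") -> dict[str, int]:
    """Map each class subfolder name to the number of `ext` files found under it (recursively)."""
    root = Path(dataset_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {root}")
    counts = {}
    # Only directories are treated as classes; loose files (e.g. metadata sheets) are skipped.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.rglob(f"*{ext}") if f.is_file())
    return counts
```

For example, `count_images_per_class("/tmp/nvflare/image_stats/data/COVID-19_Radiography_Dataset")` returns a dictionary that maps each class name to its file count.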

In [4]:
ls -l /tmp/nvflare/image_stats/data/.

total 0
drwxr-xr-x@ 12 kevlu  staff  384 Jan 30 21:55 COVID-19_Radiography_Dataset/



## Prepare data

Next, create the data lists that simulate different clients with varying amounts and types of images.
The downloaded archive contains subfolders for four classes: `COVID`, `Lung_Opacity`, `Normal`, and `Viral Pneumonia`.
Here we assume that each class of images belongs to a different site.
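Conceptually, this step maps one class folder to one simulated site and writes one data list per site. The sketch below is a hypothetical re-implementation for illustration, not the code in `image_stats/utils/prepare_data.py`. The JSON layout (`{"data": [{"image": ...}]}`) and the helper name `write_site_datalists` are assumptions. The `site-<n>_<class>.json` naming follows the file names that `prepare_data` reports.

```python
import json
from pathlib import Path


def write_site_datalists(dataset_dir: str, output_dir: str, ext: str = ".png") -> dict[str, int]:
    """Hypothetical sketch: write one JSON data list per class folder, treating each class as one site."""
    root, out = Path(dataset_dir), Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Sorting makes the class-to-site assignment deterministic (site-1, site-2, ...).
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    summary = {}
    for site_idx, class_dir in enumerate(class_dirs, start=1):
        images = sorted(str(f) for f in class_dir.rglob(f"*{ext}") if f.is_file())
        out_file = out / f"site-{site_idx}_{class_dir.name}.json"
        # Assumed layout: {"data": [{"image": <path>}, ...]}
        out_file.write_text(json.dumps({"data": [{"image": img} for img in images]}, indent=2))
        summary[out_file.name] = len(images)
    return summary
```

Keeping each class on its own site deliberately makes the sites' data distributions very different. This is useful for seeing how local histograms differ from the global one.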




In [5]:
from image_stats.utils.prepare_data import prepare_data

prepare_data(input_dir="/tmp/nvflare/image_stats/data",
             input_ext=".png",
             output_dir="/tmp/nvflare/image_stats/data")



Created 4 data lists for ['COVID', 'Lung_Opacity', 'Normal', 'Viral Pneumonia'].
Saved 3616 entries at /tmp/nvflare/image_stats/data/site-1_COVID.json
Saved 6012 entries at /tmp/nvflare/image_stats/data/site-2_Lung_Opacity.json
Saved 10192 entries at /tmp/nvflare/image_stats/data/site-3_Normal.json
Saved 1345 entries at /tmp/nvflare/image_stats/data/site-4_Viral Pneumonia.json


## Run Job in FL Simulator

**Run Job with Simulator API**



In [None]:
from nvflare.private.fed.app.simulator.simulator_runner import SimulatorRunner
runner = SimulatorRunner(job_folder="image_stats/jobs/image_stats", workspace="/tmp/nvflare/workspace/image_stats",
                         n_clients=4, threads=4)
runner.run()

**Run Job using Simulator CLI**

From a **terminal**, you can also run the equivalent CLI command:

```
nvflare simulator image_stats/jobs/image_stats -w /tmp/nvflare/workspace/image_stats -n 4 -t 4
```

This assumes that NVFlare was installed from a **terminal**. Installing it with `pip install` from a notebook cell through a shell command (`!` or `%%bash`) may or may not work, depending on which Python kernel is selected. Also, `%pip install` from a notebook cell does not register the console scripts (such as `nvflare`) on the `PATH`.
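If you want to launch the CLI from Python without running into the `PATH` problem described above, one option is to check for the `nvflare` script with `shutil.which` before calling it through `subprocess`. This sketch reuses only the flags shown in the command above (`-w`, `-n`, `-t`). The helper names are hypothetical.

```python
import shutil
import subprocess


def simulator_command(job_folder: str, workspace: str, n_clients: int, threads: int) -> list[str]:
    """Build the `nvflare simulator` invocation shown above as an argument list."""
    return ["nvflare", "simulator", job_folder,
            "-w", workspace, "-n", str(n_clients), "-t", str(threads)]


def run_simulator_cli(cmd: list[str]) -> int:
    """Run the command if its executable is on PATH and return the process exit code."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"'{cmd[0]}' was not found on PATH; install NVFlare from a terminal first.")
    return subprocess.run(cmd, check=False).returncode
```

For example: `run_simulator_cli(simulator_command("image_stats/jobs/image_stats", "/tmp/nvflare/workspace/image_stats", 4, 4))`.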


## Examine the result



The results are stored in the workspace `/tmp/nvflare/workspace/image_stats`.

In [7]:
! ls -al /tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_statistics.json

-rw-r--r--  1 kevlu  wheel  38015 Jan 30 22:27 /tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_statistics.json
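To get a first look at the result without assuming its schema, you can load the JSON file and list its nested keys. The helper below is a hypothetical sketch. It prints only the dictionary key hierarchy up to a chosen depth, so it works whatever structure `image_statistics.json` has.

```python
import json
from pathlib import Path


def outline(obj, max_depth: int = 2, _depth: int = 0, _prefix: str = "") -> list[str]:
    """List nested dict keys up to max_depth, indenting two spaces per level (lists are not descended)."""
    lines = []
    if isinstance(obj, dict) and _depth < max_depth:
        for key, value in obj.items():
            lines.append(f"{_prefix}{key}")
            lines.extend(outline(value, max_depth, _depth + 1, _prefix + "  "))
    return lines


def outline_file(path: str, max_depth: int = 2) -> list[str]:
    """Load a JSON file and return its key outline."""
    with Path(path).open() as f:
        return outline(json.load(f), max_depth)
```

For example: `print("\n".join(outline_file("/tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_statistics.json")))`.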


## Visualization
We can easily visualize the results with the visualization notebook. Before we do that, we need to copy the statistics file to the notebook directory:


In [8]:
! cp /tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_statistics.json image_stats/demo/.

Now we can visualize the results with the [visualization notebook](image_stats/demo/visualization.ipynb).

We are not quite done yet. What if you prefer to define and run jobs with the Python API instead of the CLI? Let's do that in this section.

The file [image_stats_job.py](image_stats/job_api/image_stats_job.py) uses `StatsJob` to generate the job configuration in a Pythonic way. With the default arguments, the job is exported to `/tmp/nvflare/jobs/stats_df` and then run with a work_dir of `/tmp/nvflare/jobs/stats_df/work_dir`.
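Before running the script, you can check that the modules it needs are importable in the current kernel. The traceback from the run in this notebook shows that `image_statistics.py` imports `monai`. This minimal sketch therefore looks for `nvflare` and `monai` with `importlib.util.find_spec`. The helper name is hypothetical, and the module list is not exhaustive.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["nvflare", "monai"])` returns an empty list when both packages are installed in the kernel's environment.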

In [11]:
! python3 image_stats/job_api/image_stats_job.py

Traceback (most recent call last):
  File "/Users/kevlu/workspace/repos/NVFlare/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/Chapter-2_develop_federated_learning_applications/02.1_federated_statistics/federated_statistics_with_image_data/image_stats/job_api/image_stats_job.py", line 16, in <module>
    from image_statistics import ImageStatistics
  File "/Users/kevlu/workspace/repos/NVFlare/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/Chapter-2_develop_federated_learning_applications/02.1_federated_statistics/federated_statistics_with_image_data/image_stats/job_api/image_statistics.py", line 20, in <module>
    from monai.data import ITKReader, load_decathlon_datalist
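ModuleNotFoundError: No module named 'monai'

The run above failed because `monai`, which `image_statistics.py` imports, was not installed in the notebook's Python environment. To get past this error, install `monai` in the kernel's environment (for example, with `%pip install monai`) and rerun the cell.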


## We are done!
Congratulations! You have just completed the federated image histogram calculation.
