# Federated Statistics with image data

## Calculate Image Histogram

In this example, we compute local and global image statistics while keeping the data private at each client site.
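A federated histogram can be illustrated in a few lines of plain Python. Each site bins its own pixel intensities against bin edges that all sites agree on. Only the per-bin counts leave the site, and the server adds them bin by bin. This is an illustrative sketch, not the NVFlare implementation. The function names, the site names, and the four-bin edges are assumptions chosen to keep the example short.

```python
import bisect
from typing import Dict, List, Sequence


def local_histogram(pixels: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin; bins are [edges[i], edges[i+1]) except the last, which is closed."""
    counts = [0] * (len(edges) - 1)
    for p in pixels:
        if p < edges[0] or p > edges[-1]:
            continue  # out-of-range values are ignored
        idx = bisect.bisect_right(edges, p) - 1
        if idx == len(counts):  # p equals the last edge: put it in the last bin
            idx -= 1
        counts[idx] += 1
    return counts


def global_histogram(site_counts: Dict[str, List[int]]) -> List[int]:
    """Server side: add the per-site counts bin by bin (all sites must share the same edges)."""
    return [sum(bins) for bins in zip(*site_counts.values())]


# Both hypothetical sites use the same four bins over the 8-bit intensity range.
edges = [0, 64, 128, 192, 256]
site_counts = {
    "site-a": local_histogram([0, 10, 70, 255], edges),     # computed locally at site A
    "site-b": local_histogram([64, 128, 200, 256], edges),  # computed locally at site B
}
print(site_counts)                    # {'site-a': [2, 1, 0, 1], 'site-b': [0, 1, 1, 2]}
print(global_histogram(site_counts))  # [2, 2, 1, 3]
```

The global sum is only meaningful because every site bins against the same edges. Counts from site-specific edges could not be added this way.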

## Install requirements

In [None]:
%pip install -r code/requirements.txt

## Download data

As an example, we use the dataset from the ["COVID-19 Radiography Database"](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database).
It contains PNG image files in four different classes: `COVID`, `Lung_Opacity`, `Normal`, and `Viral Pneumonia`.
First, create a temporary directory. Then download the data and extract it to `/tmp/nvflare/image_stats/data/`.

In [2]:
%%bash

# prepare the data directory (mkdir -p does nothing if it already exists)
mkdir -p /tmp/nvflare/image_stats/data


Download and unzip the data (you may need to log in to Kaggle or use a Kaggle API key). After extracting the zip file, check that the `COVID-19_Radiography_Dataset` directory is at the following location:

In [None]:
! ls -l /tmp/nvflare/image_stats/data/.
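You can also confirm that the extraction worked by counting the PNG files under each class folder. This sketch assumes the class folders sit directly under `COVID-19_Radiography_Dataset`. `count_images` is a hypothetical helper, not part of the example code.

```python
from pathlib import Path
from typing import Dict


def count_images(dataset_dir: Path, ext: str = ".png") -> Dict[str, int]:
    """Count files ending in `ext` below each immediate subfolder (one subfolder per class)."""
    counts = {}
    for class_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        # rglob searches recursively, so nested folders such as images/ are included
        counts[class_dir.name] = sum(1 for f in class_dir.rglob(f"*{ext}") if f.is_file())
    return counts


root = Path("/tmp/nvflare/image_stats/data/COVID-19_Radiography_Dataset")
if root.is_dir():
    for name, n in count_images(root).items():
        print(f"{name}: {n} images")
else:
    print(f"{root} not found; check the download and extraction steps")
```

Because `rglob` searches recursively, any other PNG files below a class folder are counted too. This includes mask images, if your copy of the dataset ships them in a subfolder.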


## Prepare data

Next, create the data lists that simulate different clients, each holding a different number and type of images.
The downloaded archive contains subfolders for four different classes: `COVID`, `Lung_Opacity`, `Normal`, and `Viral Pneumonia`.
Here we assume that each image class corresponds to a different site.

In [None]:
from code.image_stats.utils.prepare_data import prepare_data

prepare_data(input_dir="/tmp/nvflare/image_stats/data",
             input_ext=".png",
             output_dir="/tmp/nvflare/image_stats/data")
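Conceptually, this step turns each class folder into the private data of one simulated site. The sketch below illustrates that idea only and is not the actual `prepare_data` implementation. The helper name `write_site_lists`, the list-file naming (`site-<class>.txt`), and the one-path-per-line format are all assumptions.

```python
from pathlib import Path
from typing import Dict


def write_site_lists(dataset_dir: Path, output_dir: Path, ext: str = ".png") -> Dict[str, Path]:
    """Write one list file per class folder; each class stands in for one site's private data."""
    output_dir.mkdir(parents=True, exist_ok=True)
    lists = {}
    for class_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        site = class_dir.name.lower().replace(" ", "_")  # "Viral Pneumonia" -> "viral_pneumonia"
        files = sorted(str(f) for f in class_dir.rglob(f"*{ext}") if f.is_file())
        out_file = output_dir / f"site-{site}.txt"
        out_file.write_text("".join(line + "\n" for line in files))  # one path per line
        lists[site] = out_file
    return lists
```

Keep `output_dir` outside `dataset_dir`. Otherwise the list folder itself would be picked up as a class on a second run.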


## Run Job with FL Simulator

The file [image_stats_job.py](code/image_stats_job.py) uses `StatsJob` to define the job configuration in Python. With the default arguments, the job is exported to `/tmp/nvflare/jobs/image_stats` and then run in the FL simulator via `simulator_run()`, using `/tmp/nvflare/workspace/image_stats` as the work directory (`work_dir`).
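The server can only add local histograms that use identical bins. For that reason, a statistics job has to fix the bin count and value range up front in its configuration. The sketch below shows one plausible shape for such a configuration and how shared bin edges follow from it. Several details are assumptions for illustration, not taken from `image_stats_job.py`:

* the dictionary layout
* the `"*"` wildcard
* the 20-bin, 0 to 256 values
* the feature name `"intensity"`
* the `bin_edges` helper

```python
from typing import Dict, List

# Hypothetical statistics configuration; "*" is assumed to mean "applies to every feature".
# The keys and values actually used by image_stats_job.py may differ.
statistic_configs = {
    "count": {},
    "histogram": {"*": {"bins": 20, "range": [0, 256]}},
}


def bin_edges(configs: Dict, feature: str) -> List[float]:
    """Return the shared bin edges for a feature, falling back to the "*" entry."""
    hist_cfg = configs["histogram"]
    cfg = hist_cfg[feature] if feature in hist_cfg else hist_cfg["*"]
    lo, hi = cfg["range"]
    bins = cfg["bins"]
    # Multiply before dividing so integer ranges give exact edges (e.g. the last edge is hi).
    return [lo + (hi - lo) * i / bins for i in range(bins + 1)]


edges = bin_edges(statistic_configs, "intensity")
print(len(edges), edges[0], edges[-1])  # 21 0.0 256.0
```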

In [None]:
! python3 code/image_stats_job.py

## Examine the result


The results are stored on the server in the simulator workspace (`/tmp/nvflare/workspace/image_stats`). You can list the output file with the following command:

In [None]:
! ls -al /tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_statistics.json
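The output is a JSON file, so you can also inspect it directly from Python. This document does not describe the file's exact nesting, so the sketch makes no assumptions about it. It simply prints the key hierarchy up to three levels deep. `outline` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, List

RESULT = Path("/tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_statistics.json")


def outline(node: Any, depth: int = 3, indent: str = "") -> List[str]:
    """List nested dict keys up to `depth` levels so the layout of the file is visible."""
    lines = []
    if depth == 0 or not isinstance(node, dict):
        return lines
    for key, value in node.items():
        lines.append(f"{indent}{key}")
        lines.extend(outline(value, depth - 1, indent + "  "))
    return lines


if RESULT.is_file():
    with RESULT.open() as f:
        print("\n".join(outline(json.load(f))))
```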

## Visualization

We can easily visualize the results with the visualization notebook. Before doing so, copy the results file into the notebook directory:

In [8]:
! cp /tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_statistics.json image_stats/demo/.

Now we can visualize the results in the [visualization notebook](image_stats/demo/visualization.ipynb).
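For a quick look at a single histogram without opening the notebook, you can render the counts as text bars. This minimal sketch assumes you have already extracted a list of bin edges and matching counts from `image_statistics.json`; how they are nested in that file is not covered here. `text_histogram` is a hypothetical helper, and the numbers below are placeholders, not results from this example.

```python
from typing import List, Sequence


def text_histogram(edges: Sequence[float], counts: Sequence[int], width: int = 40) -> List[str]:
    """Render one bar per bin, scaled so the largest count spans `width` characters."""
    peak = max(counts, default=0)
    lines = []
    for lo, hi, n in zip(edges, edges[1:], counts):
        bar = "#" * (round(n / peak * width) if peak else 0)
        lines.append(f"[{lo:7.1f}, {hi:7.1f}) {bar} {n}")
    return lines


# Placeholder numbers for illustration only.
for line in text_histogram([0, 64, 128, 192, 256], [2, 2, 1, 3], width=30):
    print(line)
```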

## We are done!
Congratulations! You have just completed the federated image histogram calculation.
