# Federated Statistics with image data

In this example, we compute local and global image statistics while the data stays private at each client site.
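To make the local/global distinction concrete, here is a minimal sketch (not NVFlare's implementation) of how a global histogram can be formed from per-site histograms when every site uses the same bin edges. Only bin counts leave each site; the images themselves stay local. The `merge_histograms` helper, the site names, and the counts are made up for illustration.

```python
from typing import Dict, List


def merge_histograms(local_counts: Dict[str, List[int]]) -> List[int]:
    """Sum per-site bin counts element-wise into a global histogram."""
    sizes = {len(counts) for counts in local_counts.values()}
    if len(sizes) != 1:
        raise ValueError("all sites must use the same number of bins")
    num_bins = sizes.pop()
    return [sum(counts[i] for counts in local_counts.values()) for i in range(num_bins)]


# Hypothetical 4-bin histograms reported by two sites.
local_counts = {
    "site-1": [5, 0, 2, 1],
    "site-2": [1, 3, 0, 4],
}
global_counts = merge_histograms(local_counts)
print(global_counts)
```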

## Define target statistics configuration

For image statistics, we are only interested in the image count and the histogram of image intensity, so we ignore all other statistical measures.

```python

statistic_configs = {"count": {}, "histogram": {"*": {"bins": 20, "range": [0, 256]}}}
```
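As a rough illustration of what this configuration asks for, `"bins": 20` with `"range": [0, 256]` splits the intensity range into 20 equal-width bins of width 12.8. The `"*"` key is read here as applying to all features. The `bin_edges` helper below is a hypothetical, standard-library-only sketch, not part of NVFlare.

```python
from typing import List, Tuple


def bin_edges(num_bins: int, value_range: Tuple[float, float]) -> List[Tuple[float, float]]:
    """Return (low, high) edges of equal-width bins covering value_range."""
    low, high = value_range
    width = (high - low) / num_bins
    return [(low + i * width, low + (i + 1) * width) for i in range(num_bins)]


statistic_configs = {"count": {}, "histogram": {"*": {"bins": 20, "range": [0, 256]}}}
hist_cfg = statistic_configs["histogram"]["*"]
edges = bin_edges(hist_cfg["bins"], tuple(hist_cfg["range"]))
print(len(edges), edges[0], edges[-1])
```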

## Define the local statistics generator

Based on the above target statistics configuration, we can define the local statistics generator. To do this, we need to write a class that implements the `Statistics` interface:

```python

class Statistics(InitFinalComponent, ABC):

    def initialize(self, fl_ctx: FLContext): ...
    def pre_run(self, statistics: List[str], num_of_bins: Optional[Dict[str, Optional[int]]], bin_ranges: Optional[Dict[str, Optional[List[float]]]]): ...
    def features(self) -> Dict[str, List[Feature]]: ...
    def count(self, dataset_name: str, feature_name: str) -> int: ...
    def sum(self, dataset_name: str, feature_name: str) -> float: ...
    def mean(self, dataset_name: str, feature_name: str) -> float: ...
    def stddev(self, dataset_name: str, feature_name: str) -> float: ...
    def variance_with_mean(self, dataset_name: str, feature_name: str, global_mean: float, global_count: float) -> float: ...
    def histogram(self, dataset_name: str, feature_name: str, num_of_bins: int, global_min_value: float, global_max_value: float) -> Histogram: ...
    def max_value(self, dataset_name: str, feature_name: str) -> float: ...
    def min_value(self, dataset_name: str, feature_name: str) -> float: ...
    def failure_count(self, dataset_name: str, feature_name: str) -> int: ...
    def quantiles(self, dataset_name: str, feature_name: str, percentiles: List) -> Dict: ...
    def finalize(self, fl_ctx: FLContext): ...

```

Since we are only interested in two metrics, count and histogram, we can skip the other metrics and implement only `count` and `histogram`. Here is the skeleton code for this generator:

```python

class ImageStatistics(Statistics):

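    def __init__(self, data_root: str):
        super().__init__()
        # Root directory of the image data, passed in when the job is defined.
        self.data_root = data_root
        # Number of images that failed to load; reported by failure_count().
        self.failure_images = 0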
 
    def initialize(self, fl_ctx: FLContext):
        self.fl_ctx = fl_ctx
        self.client_name = fl_ctx.get_identity_name()
        
        # call load data function 

    def _load_data_list(self, client_name, fl_ctx: FLContext) -> bool:
        pass


    def pre_run(
        self,
        statistics: List[str],
        num_of_bins: Optional[Dict[str, Optional[int]]],
        bin_ranges: Optional[Dict[str, Optional[List[float]]]],
    ):
        return {}

    def features(self) -> Dict[str, List[Feature]]:
        return {"train": [Feature("intensity", DataType.FLOAT)]}

    def count(self, dataset_name: str, feature_name: str) -> int:

        # return number of images loaded
        pass
            

    def failure_count(self, dataset_name: str, feature_name: str) -> int:

        return self.failure_images

    def histogram(
        self, dataset_name: str, feature_name: str, num_of_bins: int, global_min_value: float, global_max_value: float
    ) -> Histogram:
        # do histogram calculation: 
        return Histogram(HistogramType.STANDARD, histogram_bins)
```

Here, `FLContext` is the context of the current job workflow, and "identity" refers to the site identity, so `get_identity_name()` returns the site name.
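The `histogram` body is left out of the skeleton. As a hedged sketch of the calculation it stands for, the standard-library function below counts pixel intensities into equal-width bins over a given global range. The function name, the `(low, high, count)` tuple layout, and the toy pixel values are assumptions for illustration; the real implementation in `code/src/image_statistics.py` may differ.

```python
from typing import Iterable, List, Tuple


def intensity_histogram(
    pixels: Iterable[float], num_of_bins: int, global_min_value: float, global_max_value: float
) -> List[Tuple[float, float, int]]:
    """Count pixel values into equal-width bins; returns (low, high, count) per bin."""
    width = (global_max_value - global_min_value) / num_of_bins
    counts = [0] * num_of_bins
    for value in pixels:
        if value < global_min_value or value > global_max_value:
            continue  # ignore values outside the requested range
        # The maximum value falls into the last bin instead of a nonexistent extra one.
        index = min(int((value - global_min_value) / width), num_of_bins - 1)
        counts[index] += 1
    return [
        (global_min_value + i * width, global_min_value + (i + 1) * width, counts[i])
        for i in range(num_of_bins)
    ]


# Toy "image": a flat list of 8-bit intensities.
pixels = [0, 10, 64, 100, 128, 200, 255, 255]
bins = intensity_histogram(pixels, num_of_bins=4, global_min_value=0, global_max_value=256)
print([count for _, _, count in bins])
```

Because every site bins over the same global range, the per-site counts line up bin by bin and can be summed on the server.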

You can take a look at the full implementation:

In [None]:
! cat code/src/image_statistics.py

## Define Job Configuration
 

```python

statistic_configs = {"count": {}, "histogram": {"*": {"bins": 20, "range": [0, 256]}}}

# define local stats generator
stats_generator = ImageStatistics(data_root_dir)

job = StatsJob(
    job_name="stats_image",
    statistic_configs=statistic_configs,
    stats_generator=stats_generator,
    output_path=output_path,
)
```

## Install requirements

In [None]:
%pip install -r code/requirements.txt

## Download data

As an example, we use the dataset from the ["COVID-19 Radiography Database"](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database).
It contains PNG image files in four different classes: `COVID`, `Lung_Opacity`, `Normal`, and `Viral Pneumonia`.
First, we create a temporary directory; then we download the dataset and move it to `/tmp/nvflare/image_stats/data/`.




In [None]:
! pip install kagglehub

In [1]:
%%bash 

# prepare the directory

if [ ! -d /tmp/nvflare/image_stats/data ]; then
  mkdir -p /tmp/nvflare/image_stats/data
fi


In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("tawsifurrahman/covid19-radiography-database")

print("Path to dataset files:", path)



The cell above downloads and unzips the data (you may need to log in to Kaggle or use an API key). Next, move the extracted data and check that the `COVID-19_Radiography_Dataset` directory is present at the following location.

In [None]:
! mv {path} /tmp/nvflare/image_stats/data/

! tree /tmp/nvflare/image_stats/data


## Prepare data

Next, create the data lists simulating different clients with varying amounts and types of images. 
The downloaded archive contains subfolders for four different classes: `COVID`, `Lung_Opacity`, `Normal`, and `Viral Pneumonia`.
Here we assume each class of image corresponds to a different site.

In [None]:
! code/data/prepare_data.sh
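
For orientation, the sketch below shows one way such per-site data lists could be built: each class folder is assigned to one simulated site and its PNG files are listed. The `build_data_lists` helper, the `site-N` naming, and the returned dictionary layout are assumptions for illustration; `prepare_data.sh` may organize its output differently.

```python
import os
from typing import Dict, List, Sequence

CLASSES = ("COVID", "Lung_Opacity", "Normal", "Viral Pneumonia")


def build_data_lists(data_root: str, classes: Sequence[str] = CLASSES) -> Dict[str, List[str]]:
    """Map site-1..site-N to the sorted PNG paths found under each class folder."""
    data_lists = {}
    for i, class_name in enumerate(classes, start=1):
        found = []
        # Walk the class folder recursively, since images may sit in a subfolder.
        for folder, _, files in os.walk(os.path.join(data_root, class_name)):
            found.extend(os.path.join(folder, f) for f in files if f.lower().endswith(".png"))
        data_lists[f"site-{i}"] = sorted(found)
    return data_lists
```

Call it with the directory that contains the four class folders; a class folder that is missing simply yields an empty list for its site.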

## Run Job with FL Simulator

The file [image_stats_job.py](code/image_stats_job.py) uses `StatsJob` to generate a job configuration in a Pythonic way. With the default arguments, the job is exported to `/tmp/nvflare/jobs/image_stats` and then run in the FL simulator via `simulator_run()`, with a work_dir of `/tmp/nvflare/workspace/image_stats`.

In [None]:
%cd code

! python3 image_stats_job.py

%cd -


## Examine the result


The results are stored on the server in the workspace at `/tmp/nvflare/workspace/image_stats` and can be located with the following command:

In [None]:
! ls -al /tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_stats.json
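
To look inside the file from Python rather than just listing it, a minimal sketch is to load the JSON and print its top-level keys. The `summarize_stats` helper is hypothetical and assumes nothing about the layout of the file beyond it being valid JSON.

```python
import json
from typing import Any, List


def summarize_stats(path: str) -> List[str]:
    """Load a statistics JSON file and return its top-level keys (empty if not an object)."""
    with open(path, "r", encoding="utf-8") as f:
        stats: Any = json.load(f)
    return sorted(stats) if isinstance(stats, dict) else []


result_path = "/tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_stats.json"
# Uncomment after the job has finished and the file exists:
# print(summarize_stats(result_path))
```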
         

## Visualization
We can easily visualize the results with the visualization notebook. Before we do that, we need to copy the results file to the notebook directory:


In [None]:
! cp /tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/image_stats.json code/image_stats/demo/.

Now we can visualize the results via the [visualization notebook](code/image_stats/demo/visualization.ipynb).

## We are done!
Congratulations, you have just completed the federated image histogram statistics calculation.
