# Federated Statistics Introduction

In a federated learning setting, data stays private at each site, so gathering statistics on that data requires care to preserve privacy. This section provides two examples that show how this can be done:

   * [Federated Statistics with tabular data](./federated_statistics_with_tabular_data/federated_statistics_with_tabular_data.ipynb) demonstrates how to create federated statistics for data that can be represented as Pandas DataFrames, using NumPy to calculate the local statistics.
   * [Federated Statistics with image data](./federated_statistics_with_image_data/federated_statistics_with_image_data.ipynb) shows how to compute local and global image statistics while the data stays private at each client site.

NVIDIA FLARE provides built-in federated statistics operators that generate global statistics from local, client-side statistics.

At each client site, there can be one or more datasets (for example, "train" and "test" datasets), and each dataset may have many features. Statistics are calculated locally for each numeric feature in each dataset and then combined to produce global statistics. The output contains the complete statistics for all datasets at all client sites, together with the global aggregates. The result can be visualized with the visualization utility in the notebook.
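
To make the combining step concrete, here is a minimal sketch in plain Python. It is not NVIDIA FLARE's implementation: the nested dictionary layout, the site names, the `aggregate` helper, and the numbers are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical local results: site -> dataset -> feature -> {"count", "sum"}.
# The values are invented for illustration only.
local_stats = {
    "site-1": {"train": {"age": {"count": 4, "sum": 120.0}}},
    "site-2": {"train": {"age": {"count": 6, "sum": 240.0}}},
}


def aggregate(local_stats):
    """Add up count and sum per (dataset, feature) across all sites."""
    totals = defaultdict(lambda: defaultdict(lambda: {"count": 0, "sum": 0.0}))
    for datasets in local_stats.values():
        for dataset_name, features in datasets.items():
            for feature_name, stats in features.items():
                totals[dataset_name][feature_name]["count"] += stats["count"]
                totals[dataset_name][feature_name]["sum"] += stats["sum"]
    # Convert the nested defaultdicts to plain dicts before returning.
    return {ds: dict(feats) for ds, feats in totals.items()}


global_stats = aggregate(local_stats)
print(global_stats)
```

Only the per-site aggregates (`count` and `sum`) enter the computation; the raw records never leave the site.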

The statistics commonly used for numerical features are count, sum, mean, std_dev, and histogram.
The max and min are not included because they may violate data privacy, and the median is not included because of its algorithmic complexity. If sum and count are selected, the mean is calculated from them. Note that count is always required because it is used to enforce the data privacy policy. Only numerical features are supported; categorical features are not. Any non-numerical features that are returned are removed.
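
The rules above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not NVIDIA FLARE code: `MIN_COUNT`, `numeric_features`, and `mean_from_totals` are hypothetical names, and the threshold value and sample row are invented.

```python
from numbers import Number

# Hypothetical privacy threshold; not an NVIDIA FLARE default.
MIN_COUNT = 10


def numeric_features(sample_row):
    """Keep only features whose sample value is a number (bool excluded)."""
    return [
        name
        for name, value in sample_row.items()
        if isinstance(value, Number) and not isinstance(value, bool)
    ]


def mean_from_totals(total_count, total_sum, min_count=MIN_COUNT):
    """Return sum / count, or None if the count is too small to release."""
    if total_count < min_count:
        return None
    return total_sum / total_count


# Invented sample row: two numeric features and one categorical feature.
row = {"age": 39, "income": 52000.0, "city": "Lyon"}
kept = numeric_features(row)

mean_age = mean_from_totals(total_count=10, total_sum=360.0)
suppressed = mean_from_totals(total_count=3, total_sum=90.0)
print(kept, mean_age, suppressed)
```

The sketch shows why count is needed even when only the mean is of interest: it is both the denominator and the quantity a count-based privacy check would inspect.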

A client only needs to implement the selected methods of the `Statistics` class from `statistics_spec`.
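
As a rough picture of what implementing only the selected methods might look like, the following stand-in class computes count, sum, and histogram for in-memory data. It deliberately does not import or subclass anything from NVIDIA FLARE: the class name `LocalStatistics`, its method signatures, and the idea of passing a shared histogram range are assumptions for illustration, so check `statistics_spec` for the real interface.

```python
import math


class LocalStatistics:
    """Hypothetical stand-in; NOT NVIDIA FLARE's Statistics interface."""

    def __init__(self, data):
        # data: dataset name -> feature name -> list of numeric values
        self.data = data

    def count(self, dataset_name, feature_name):
        return len(self.data[dataset_name][feature_name])

    def sum(self, dataset_name, feature_name):
        return math.fsum(self.data[dataset_name][feature_name])

    def histogram(self, dataset_name, feature_name, num_bins, low, high):
        """Count values in equal-width bins over the range [low, high].

        Using the same range and bin count at every site (an assumption
        of this sketch) lets the bin counts be added across sites.
        """
        counts = [0] * num_bins
        width = (high - low) / num_bins
        for value in self.data[dataset_name][feature_name]:
            if low <= value <= high:
                # Clamp so that value == high falls into the last bin.
                index = min(int((value - low) / width), num_bins - 1)
                counts[index] += 1
        return counts


# Invented data for a single site.
stats = LocalStatistics({"train": {"age": [1, 2, 3, 10]}})
n = stats.count("train", "age")
total = stats.sum("train", "age")
hist = stats.histogram("train", "age", num_bins=2, low=0, high=10)
print(n, total, hist)
```

Statistics that were not selected, such as std_dev here, are simply left unimplemented in this sketch.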

For additional details, including information on Hierarchical Statistics and Privacy Filters, see the [Federated Statistics example](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/federated-statistics/README.md) in the examples section.

For the next notebook, proceed to [Federated Statistics with tabular data](./federated_statistics_with_tabular_data/federated_statistics_with_tabular_data.ipynb).