# NVFLARE Federated Statistics Visualization

#### Dependencies

To run the examples, you might need to install the following dependencies:
* numpy
* pandas
* wget
* matplotlib
* jupyter
* notebook

These are listed in requirements.txt.
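Before opening the notebook, it can help to confirm that these packages are importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the sketch assumes that each package's import name matches the name in the list above.

```python
import importlib.util

# Import names are assumed to match the package names listed above.
REQUIRED = ["numpy", "pandas", "wget", "matplotlib", "jupyter", "notebook"]


def missing_modules(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Any name reported as missing can then be installed from requirements.txt.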

## Tabular Data Statistics Visualization
In this example, we demonstrate how to visualize the statistics computed for tabular data. The visualization requires the json, pandas, and matplotlib modules as well as the NVFLARE visualization utilities.

In [None]:
import json
import pandas as pd

from nvflare.app_opt.statistics.visualization.statistics_visualization import Visualization

First, copy the resulting JSON file to the demo directory. In this example, the file is called adults_stats.json. Then load the JSON file:

In [None]:
with open('adults_stats.json', 'r') as f:
    data = json.load(f)
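Once the file is loaded, it can be useful to look at how the result is nested before plotting anything. The sketch below walks a nested dictionary and lists the keys found at each level. The helper `describe_nesting` and the miniature `sample` dictionary are hypothetical illustrations; the layout of the real adults_stats.json may differ.

```python
def describe_nesting(obj, max_depth=4):
    """Return one sorted key list per nesting level, following the first key at each level."""
    levels = []
    current = obj
    while isinstance(current, dict) and current and len(levels) < max_depth:
        keys = sorted(current)
        levels.append(keys)
        # Descend into the first key only; sibling branches are assumed to look alike.
        current = current[keys[0]]
    return levels


# Hypothetical miniature of a statistics result; the real file may be laid out differently.
sample = {
    "Age": {"count": {"site-1": {"train": 100}, "site-2": {"train": 80}}},
    "fnlwgt": {"count": {"site-1": {"train": 100}, "site-2": {"train": 80}}},
}

for depth, keys in enumerate(describe_nesting(sample)):
    print(depth, keys)
```

Calling `describe_nesting(data)` on the loaded result shows which keys appear at each level of the file.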

Initialize the visualization utilities:

In [None]:
vis = Visualization()

### Overall Statistics

vis.show_stats() shows the statistics for each feature, at each site, for each dataset.

In [None]:

vis.show_stats(data = data)

### Select features using white_list_features
You can optionally show only specified features via the white_list_features argument. In the following, we select three features instead of all the features.

In [None]:
vis.show_stats(data = data, white_list_features= ['Age', 'fnlwgt', 'Hours per week'])
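To make the selection behavior concrete, here is a small sketch of how a whitelist filter of this kind could work. It is not NVFLARE's implementation: the helper `select_features` is hypothetical, and treating an empty whitelist as "keep everything" is an assumption.

```python
def select_features(all_features, white_list_features=None):
    """Keep whitelisted features in their original order; an empty whitelist keeps everything."""
    if not white_list_features:
        return list(all_features)
    wanted = set(white_list_features)
    return [name for name in all_features if name in wanted]


# Hypothetical feature names for illustration.
features = ["Age", "Education", "fnlwgt", "Hours per week"]

print(select_features(features, ["Age", "fnlwgt", "Hours per week"]))
print(select_features(features))
```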

### Histogram Visualization
We can use vis.show_histograms() to visualize the histograms. Before doing so, we need to adjust an IPython display setting so that the graphs use the full width of the cell.

In [None]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

The following command displays histograms for the numeric features. The result shows both the main plot and the subplots.

In [None]:
vis.show_histograms(data = data)

### Display Options
As with the other statistics, we can use white_list_features to select only a few features for histogram display. We can also use display_format="percent" so that all datasets and sites are displayed on the same scale. You can set the following options:

* display_format: "percent" or "sample_count"
* white_list_features: feature names
* plot_type : "both" or "main" or "subplot"
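The reason a percent scale makes sites comparable can be shown with a little arithmetic. The sketch below converts raw bin counts to percentages of the total. The helper `to_percent` and the bin counts are hypothetical, and whether NVFLARE normalizes in exactly this way is an assumption.

```python
def to_percent(bin_counts):
    """Convert raw histogram bin counts to percentages of the total sample count."""
    total = sum(bin_counts)
    if total == 0:
        return [0.0 for _ in bin_counts]
    return [100.0 * count / total for count in bin_counts]


# Hypothetical bin counts for the same feature at two sites of very different size.
site_1 = [10, 30, 60]      # 100 samples in total
site_2 = [100, 300, 600]   # 1000 samples in total

print(to_percent(site_1))
print(to_percent(site_2))
```

With raw counts, the larger site would dominate the plot. After conversion, both sites have the same shape on the same axis.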

#### show percent display format with selected features
In the following, we display only the feature "Age" in the "percent" display_format, with the default "both" plot_type.

In [None]:
vis.show_histograms(data = data, display_format = "percent", white_list_features= ['Age'])

#### display main plot_type with selected features
In this example, we display two features in the "sample_counts" display_format, with the "main" plot_type.

In [None]:
vis.show_histograms(data, "sample_counts", ['Age', 'Hours per week' ], plot_type="main")

#### selected features with subplot plot_type
In the next example, we display the same two features in the "sample_counts" display_format, with the "subplot" plot_type.

In [None]:
vis.show_histograms(data, "sample_counts", ['Age', 'Hours per week' ], plot_type="subplot")

### Tip: Avoid repeated calculation
If you intend to plot the histogram main plot and subplots separately, calling show_histograms() repeatedly with different plot_type values is not efficient, because each call recalculates the same set of DataFrames. To avoid the duplicated calculation, you can compute the DataFrames once with get_histogram_dataframes() and pass them to show_dataframe_plots(), as shown below. If you intend to show both plots together, use show_histograms() instead.

In [None]:
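# Compute the histogram DataFrames once
feature_dfs = vis.get_histogram_dataframes(data, display_format="percent")

# Reuse the same DataFrames for each plot type
vis.show_dataframe_plots(feature_dfs, plot_type="main")
vis.show_dataframe_plots(feature_dfs, plot_type="subplot")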