# Histogram diagnostic - Single histogram/PDF plot

This diagnostic visualizes the histogram or probability density function (PDF) of a variable over a specified region.

In this notebook we demonstrate how to:
1. Compute a histogram using the `Histogram` class
2. Plot it using the `PlotHistogram` class
3. Optionally include reference data for comparison

In [None]:
%reload_ext autoreload
%autoreload 2

## Import the classes

In [None]:
from aqua_diagnostics.histogram import Histogram, PlotHistogram

## Set up data dictionaries

We define:
- `dataset_dict`: Configuration for the model data
- `obs_dict`: Configuration for reference/observational data (optional)
- `common_dict`: Common parameters for both datasets

In [None]:
dataset_dict = {
    'catalog': 'climatedt-phase1',
    'model': 'ICON',
    'exp': 'historical-1990',
    'source': 'lra-r100-monthly'
}

obs_dict = {
    'catalog': 'obs',
    'model': 'ERA5',
    'exp': 'era5',
    'source': 'monthly'
}

common_dict = {
    'startdate': '1990-01-01',
    'enddate': '1999-12-31',
    'bins': 100,              # Number of bins for histogram
    'weighted': True,         # Use latitudinal weights
    'loglevel': 'INFO'
}
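
The dictionaries above are later combined with `**` unpacking when the `Histogram` objects are created. As a minimal sketch of how that merging behaves in plain Python, the snippet below uses a hypothetical stand-in function, `describe`, which is not part of `aqua_diagnostics`. It shows that the keys of both dictionaries arrive as keyword arguments, and that a key present in both dictionaries is rejected at call time.

```python
def describe(catalog, model, exp, source, startdate, enddate, **options):
    """Hypothetical stand-in for a constructor that takes keyword arguments."""
    return f"{catalog}/{model}/{exp}/{source} {startdate}..{enddate} {sorted(options)}"


dataset = {
    'catalog': 'climatedt-phase1',
    'model': 'ICON',
    'exp': 'historical-1990',
    'source': 'lra-r100-monthly',
}
common = {'startdate': '1990-01-01', 'enddate': '1999-12-31', 'bins': 100}

# Both dictionaries are unpacked into a single set of keyword arguments
summary = describe(**dataset, **common)

# A key that appears in both dictionaries raises a TypeError
try:
    describe(**dataset, **{**common, 'model': 'IFS'})
    duplicate_rejected = False
except TypeError:
    duplicate_rejected = True
```

Keeping dataset-specific keys and shared keys in separate dictionaries therefore only works as long as the two sets of keys do not overlap.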

## Compute histograms

We create `Histogram` objects for both model and reference data.
The `run()` method:
1. Retrieves the data from the catalog
2. Computes the histogram with `density=True` to get a PDF
3. Saves the result to a netCDF file

We'll analyze `tprate` (Total Precipitation Rate) in mm/day.
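
To make the combination of `weighted=True` and `density=True` concrete, here is a minimal pure-Python sketch of a weighted histogram normalized to a PDF. It assumes cosine-of-latitude weights, which is a common choice for regular latitude-longitude grids; the weighting and binning actually used inside `Histogram` may differ. The function `weighted_pdf` and the toy data are illustrative only.

```python
import math


def weighted_pdf(values, weights, bins, vmin, vmax):
    """Weighted histogram normalized so that it integrates to 1 over [vmin, vmax]."""
    width = (vmax - vmin) / bins
    sums = [0.0] * bins
    for v, w in zip(values, weights):
        if vmin <= v <= vmax:
            # The last bin is closed on the right so that vmax is included
            i = min(int((v - vmin) / width), bins - 1)
            sums[i] += w
    total = sum(sums)
    # Dividing by (total weight * bin width) turns weighted counts into a density
    return [s / (total * width) for s in sums]


# Toy data: one value per latitude, weighted by cos(latitude) (an assumption)
lats = [-60.0, -30.0, 0.0, 30.0, 60.0]
values = [1.0, 4.0, 9.0, 4.0, 1.0]  # e.g. precipitation in mm/day
weights = [math.cos(math.radians(lat)) for lat in lats]

pdf = weighted_pdf(values, weights, bins=5, vmin=0.0, vmax=10.0)
area = sum(p * 2.0 for p in pdf)  # bin width is (10 - 0) / 5 = 2
```

With this normalization the area under the histogram is 1, so distributions computed from grids of different size or resolution can be compared directly. NumPy's `np.histogram` applies the same normalization when called with `weights=...` and `density=True`.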

In [None]:
# Create Histogram objects
hist_dataset = Histogram(**dataset_dict, **common_dict)
hist_obs = Histogram(**obs_dict, **common_dict)

# Configuration for the variable
run_dict = {
    'var': 'tprate',
    'units': 'mm/day',
    'density': True  # Get PDF instead of counts
}

# Compute histograms
hist_dataset.run(**run_dict)
hist_obs.run(**run_dict)

## Plot the histogram/PDF

Now we use `PlotHistogram` to visualize the computed histograms.
It takes a list of model histograms and an optional reference histogram, so we can plot:
- A single model, with or without reference data
- Multiple models together, with or without reference data

In [None]:
# Set up the plot with model data and reference
plot = PlotHistogram(
    data=[hist_dataset.histogram_data],  # Note: needs to be a list
    ref_data=hist_obs.histogram_data,    # Reference data
    loglevel='INFO'
)

# Generate and save the plot
plot.run(
    ylogscale=True,    # Logarithmic scale for y-axis (probability density)
    xlogscale=False,   # Linear scale for x-axis (precipitation)
    smooth=False       # No smoothing applied
)

## Optional: Custom plot with more control

For more control over labels, axis limits, and smoothing, call the `plot()` method directly:

In [None]:
import matplotlib.pyplot as plt

# Create plot with custom settings
fig, ax = plot.plot(
    data_labels=['ICON historical-1990'],
    ref_label='ERA5',
    title='PDF of Total Precipitation Rate',
    xlogscale=False,
    ylogscale=True,
    xmin=0,
    xmax=50,  # mm/day
    smooth=True,
    smooth_window=5
)

# Show the plot
plt.show()
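
The smoothing method behind `smooth=True` and `smooth_window=5` is not described in this notebook. As a rough illustration of what a window of 5 bins could mean, the sketch below assumes a simple centered running mean whose window shrinks near the edges; `running_mean` and the sample values are hypothetical and not taken from `PlotHistogram`.

```python
def running_mean(y, window):
    """Centered running mean; the window shrinks near the edges of the series."""
    half = window // 2
    out = []
    for i in range(len(y)):
        lo = max(0, i - half)
        hi = min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


# Made-up noisy PDF values, one per bin
noisy_pdf = [0.30, 0.10, 0.22, 0.08, 0.12, 0.02, 0.06, 0.01]
smoothed = running_mean(noisy_pdf, window=5)
```

A wider window suppresses more bin-to-bin noise but also flattens real features such as the sharp peak near zero precipitation, so it is worth comparing the smoothed curve against the unsmoothed one.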