# Comparing two datasets
In this part, we compare the set of tracks for the year 1996 used above with IBTrACS, which we use as the reference.
Note first that every plot shown above can overlay several sets of tracks, which lets you compare several sources, models, trackers, etc. Beyond that, this part shows the specific functions for matching tracks and computing detection scores.

In [None]:
import huracanpy
import matplotlib.pyplot as plt

## Load tracks

In [None]:
# Load IBTrACS
ib = huracanpy.load(source="ibtracs")

# Here we subset the 1996 tracks with xarray's where method:
ib_1996 = ib.where(ib.time.dt.year == 1996, drop=True)
ib_1996

In [None]:
# Load one year of ERA5 tracks (example file shipped with huracanpy)
ERA5 = huracanpy.load(huracanpy.example_year_file)

## Superimposing several sets on one plot
Any of the plots shown above can overlay several sets of tracks, which makes it possible to compare several sources, models, trackers, etc. Here we show only one example: histograms of the lifetime maximum intensity (LMI) of both datasets.

In [None]:
# Compute LMI (lifetime maximum intensity) for both sets
LMI_wind_ib = ib_1996.wind.groupby(ib_1996.track_id).max()
LMI_wind_ib = LMI_wind_ib / 1.94384  # Convert kn to m/s (1 m/s = 1.94384 kn)
LMI_wind_ERA5 = ERA5.wind10.groupby(ERA5.track_id).max()
# Use the same 5 m/s bins (10 to 65 m/s) for both histograms
bins = list(range(10, 70, 5))
# Plot both histograms
LMI_wind_ib.plot.hist(bins=bins, color="k", label="IBTrACS")
LMI_wind_ERA5.plot.hist(bins=bins, label="ERA5", alpha=0.8)
# Labels
plt.legend()
plt.xlabel("Lifetime maximum wind speed / m/s")
plt.ylabel("Number of tracks")
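The LMI computation above boils down to a group-by-track maximum followed by a unit conversion. The following is a minimal pure-Python sketch of that idea on a few made-up points. The `records` list and the `lifetime_max` helper are hypothetical names used only for illustration and are not part of huracanpy.

```python
from collections import defaultdict

KNOTS_PER_MS = 1.94384  # 1 m/s is approximately 1.94384 kn

# Hypothetical records: (track_id, wind speed in knots)
records = [
    ("A", 35.0), ("A", 60.0), ("A", 45.0),
    ("B", 25.0), ("B", 30.0),
]


def lifetime_max(records):
    """Return {track_id: maximum value over all points of that track}."""
    lmi = defaultdict(lambda: float("-inf"))
    for track_id, value in records:
        lmi[track_id] = max(lmi[track_id], value)
    return dict(lmi)


# Lifetime maximum wind per track, first in knots, then converted to m/s
lmi_kn = lifetime_max(records)
lmi_ms = {track_id: value / KNOTS_PER_MS for track_id, value in lmi_kn.items()}
```

Converting after taking the maximum, as in the cell above, gives the same result as converting every point first, because dividing by a positive constant preserves the ordering.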

## Matching tracks

In [None]:
matches = huracanpy.assess.match([ERA5, ib_1996], names=["ERA5", "IBTrACS"])
# Each row is a pair of matched tracks, with both ids, the number of matching time steps,
# and the mean distance between the two tracks over their matching period.
matches
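To make the idea of matching concrete, here is a simplified pure-Python sketch, not huracanpy's actual implementation. It assumes that two tracks match when they have at least `min_overlap` common time steps at which they lie within `max_dist_km` of each other. The function names, the default thresholds, and the toy tracks are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_pairs(tracks_a, tracks_b, max_dist_km=300.0, min_overlap=1):
    """Match tracks given as {track_id: {time: (lon, lat)}}.

    Returns rows of (id_a, id_b, number of close time steps, mean distance in km).
    """
    rows = []
    for id_a, pts_a in tracks_a.items():
        for id_b, pts_b in tracks_b.items():
            # Distances at the time steps present in both tracks
            dists = [
                haversine_km(*pts_a[t], *pts_b[t])
                for t in sorted(pts_a.keys() & pts_b.keys())
            ]
            # Keep only the time steps where the two tracks are close enough
            close = [d for d in dists if d <= max_dist_km]
            if len(close) >= min_overlap:
                rows.append((id_a, id_b, len(close), sum(close) / len(close)))
    return rows


# Toy tracks: keys are hours since an arbitrary start, values are (lon, lat)
tracks_era5 = {
    "e1": {0: (-50.0, 20.0), 6: (-51.0, 21.0), 12: (-52.0, 22.0)},
    "e2": {0: (120.0, 10.0), 6: (121.0, 11.0)},
}
tracks_ib = {
    "i1": {6: (-51.0, 21.0), 12: (-52.5, 22.0), 18: (-53.0, 23.0)},
    "i2": {0: (150.0, 30.0)},
}
rows = match_pairs(tracks_era5, tracks_ib)
```

In this toy example only `e1` and `i1` match, over the two time steps they share. The nested loop is quadratic in the number of tracks, which is acceptable for a sketch but not how one would process large datasets.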

## Computing scores

In [None]:
# Probability of detection (POD): proportion of observed tracks that are found in ERA5.
huracanpy.assess.pod(matches, ref=ib_1996, ref_name="IBTrACS")

In [None]:
# False alarm rate (FAR): proportion of detected tracks that were not observed.
huracanpy.assess.far(matches, detected=ERA5, detected_name="ERA5")
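Both scores follow directly from the definitions in the comments above. The sketch below computes them from a list of matched id pairs. It is an illustration under the stated definitions rather than huracanpy's code, and `pod_sketch`, `far_sketch`, and the toy ids are hypothetical.

```python
def pod_sketch(matched_pairs, ref_ids):
    """Proportion of reference tracks that appear in at least one match."""
    ref = set(ref_ids)
    matched_ref = {ref_id for _, ref_id in matched_pairs}
    return len(matched_ref & ref) / len(ref)


def far_sketch(matched_pairs, detected_ids):
    """Proportion of detected tracks that appear in no match."""
    detected = set(detected_ids)
    matched_det = {det_id for det_id, _ in matched_pairs}
    return 1 - len(matched_det & detected) / len(detected)


# Toy example: pairs are (detected id, reference id)
pairs = [("e1", "i1"), ("e2", "i2"), ("e3", "i2")]
detected_ids = ["e1", "e2", "e3", "e4"]
ref_ids = ["i1", "i2", "i3", "i4", "i5"]

pod_value = pod_sketch(pairs, ref_ids)       # 2 of 5 reference tracks are matched
far_value = far_sketch(pairs, detected_ids)  # 1 of 4 detected tracks is unmatched
```

Using sets means a reference track matched by two detected tracks (here `i2`) is counted only once in the POD.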

## Venn diagrams
Venn diagrams are a convenient way to show the overlap between two datasets.

In [None]:
huracanpy.plot.venn([ERA5, ib_1996], matches, labels=["ERA5", "IBTrACS"])
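The three regions of a two-set Venn diagram can be counted from the same ingredients. The sketch below assumes one-to-one matches, so that the overlap is simply the number of matched pairs. `venn_counts` and the toy ids are hypothetical and only illustrate what the diagram represents.

```python
def venn_counts(matched_pairs, detected_ids, ref_ids):
    """Return (only detected, in both, only reference), assuming one-to-one matches."""
    matched_det = {det_id for det_id, _ in matched_pairs}
    matched_ref = {ref_id for _, ref_id in matched_pairs}
    only_detected = len(set(detected_ids) - matched_det)
    only_ref = len(set(ref_ids) - matched_ref)
    both = len(matched_pairs)
    return only_detected, both, only_ref


# Toy example: pairs are (detected id, reference id)
counts = venn_counts(
    [("e1", "i1"), ("e2", "i2")],
    detected_ids=["e1", "e2", "e3"],
    ref_ids=["i1", "i2", "i3", "i4"],
)
```

When one track matches several tracks in the other set, the overlap is no longer a single number, and a convention is needed to decide how to count it.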