# Comparing two datasets
In this part, we compare the set of 1996 tracks (used in the [previous example](set_of_tracks.ipynb)) to IBTrACS, which we use as the reference.
Below, we show functions specific to comparing datasets: matching tracks and computing detection scores.

In [None]:
import huracanpy
import matplotlib.pyplot as plt

## Load tracks
### Load IBTrACS and subset the 1996 tracks with xarray's `where` method

In [None]:
ib = huracanpy.load(source="ibtracs")
ib_1996 = ib.where(ib.time.dt.year == 1996, drop=True)
ib_1996
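For readers less familiar with xarray, the sketch below shows what `where(..., drop=True)` does on a small synthetic dataset. The layout (a single `record` dimension with `track_id`, `time` and `wind` stored as variables along it) is an assumption meant to mimic a loaded track file, and all values are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset mimicking a track file (assumed layout, made-up values):
# one "record" dimension, with track_id, time and wind stored along it
times = pd.to_datetime(["1995-12-30", "1996-01-02", "1996-06-15", "1997-01-01"])
ds = xr.Dataset(
    {
        "track_id": ("record", np.array([1, 1, 2, 3])),
        "wind": ("record", np.array([20.0, 25.0, 40.0, 30.0])),
        "time": ("record", times),
    }
)

# Keep only the records whose time falls in 1996; drop=True removes the
# non-matching records instead of filling them with NaN
ds_1996 = ds.where(ds.time.dt.year == 1996, drop=True)
print(ds_1996.track_id.values)
```

Because `where` may need to insert NaN, integer variables such as `track_id` come back as floats.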

### Load one year of ERA5 tracks

In [None]:
era5 = huracanpy.load(huracanpy.example_year_file)

## Superimposing several sets on one plot
Everything shown in the previous examples also works with several sets superimposed on the same plot, which lets you compare several sources, models, trackers, etc. Here we show one example.

### Compute lifetime maximum intensity (LMI) for both sets

In [None]:
lmi_wind_ib = ib_1996.wind.groupby(ib_1996.track_id).max()
# Convert knots to m/s (1 kn = 1852 m / 3600 s ≈ 0.514444 m/s)
lmi_wind_ib = lmi_wind_ib * 0.514444
lmi_wind_era5 = era5.wind10.groupby(era5.track_id).max()
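The lifetime maximum intensity is simply the largest wind value within each track. A minimal sketch of the same `groupby(...).max()` pattern on made-up data, with the knot-to-m/s conversion written as a named constant:

```python
import numpy as np
import xarray as xr

KNOTS_TO_MS = 0.514444  # 1 knot = 1852 m / 3600 s ≈ 0.514444 m/s

# Made-up wind observations (in knots), one value per record
wind_kn = xr.DataArray(np.array([35.0, 60.0, 45.0, 20.0, 30.0]), dims="record", name="wind")
# Track each record belongs to; the name becomes the output dimension
track_id = xr.DataArray(np.array([1, 1, 1, 2, 2]), dims="record", name="track_id")

# Lifetime maximum intensity: the maximum wind within each track
lmi_kn = wind_kn.groupby(track_id).max()
lmi_ms = lmi_kn * KNOTS_TO_MS
print(lmi_ms.values)
```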

### Plot both histograms

In [None]:
bins = range(10, 65 + 1, 5)
lmi_wind_ib.plot.hist(bins=bins, color="k", label="IBTrACS", alpha=0.8)
lmi_wind_era5.plot.hist(bins=bins, label="ERA5", alpha=0.8)
plt.legend()
plt.xlabel("Lifetime maximum wind speed / m/s")
plt.ylabel("Number of tracks")

## Matching tracks
Use `huracanpy.assess.match` to find matching tracks.
The result is a `pandas.DataFrame` where each row is a pair of matching tracks, containing both track ids, the number of matching time steps, and the mean distance between the two tracks over their matching period.
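To make the idea of matching concrete, here is a simplified sketch with pandas: two tracks are paired when their points lie within a distance threshold at the same time steps, and each pair is summarised by the number of such time steps and the mean distance. This is not necessarily how `huracanpy.assess.match` works; the input layout (`track_id`, `time`, `lon`, `lat` columns), the function names `haversine_km` and `match_tracks`, and the 300 km threshold are all hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0
MAX_DIST_KM = 300.0  # hypothetical threshold, not taken from huracanpy


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def match_tracks(tracks_a, tracks_b, max_dist_km=MAX_DIST_KM):
    """Pair tracks that are within max_dist_km of each other at common times.

    Each input is a DataFrame with columns track_id, time, lon, lat.
    """
    # Pair every point of A with every point of B at the same time step
    pairs = tracks_a.merge(tracks_b, on="time", suffixes=("_a", "_b"))
    pairs["dist"] = haversine_km(pairs.lon_a, pairs.lat_a, pairs.lon_b, pairs.lat_b)
    close = pairs[pairs.dist <= max_dist_km]
    # One row per pair of tracks: number of close time steps and mean distance
    return (
        close.groupby(["track_id_a", "track_id_b"])
        .agg(n_steps=("time", "count"), mean_dist=("dist", "mean"))
        .reset_index()
    )


# Made-up example: track 10 follows AL01 closely, AL02 is far away
t = pd.to_datetime(["1996-08-01 00:00", "1996-08-01 06:00", "1996-08-01 12:00"])
a = pd.DataFrame({"track_id": [10, 10, 10], "time": t,
                  "lon": [-50.0, -51.0, -52.0], "lat": [15.0, 15.5, 16.0]})
b = pd.DataFrame({"track_id": ["AL01", "AL01", "AL02"], "time": t,
                  "lon": [-50.5, -51.2, -20.0], "lat": [15.1, 15.6, 10.0]})
result = match_tracks(a, b)
print(result)
```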

In [None]:
matches = huracanpy.assess.match([era5, ib_1996], names=["ERA5", "IBTrACS"])
matches

## Computing scores
### Probability of detection (POD)
Proportion of observed (IBTrACS) tracks that are matched by at least one ERA5 track.
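The POD can be computed from the table of matched pairs: count the distinct reference tracks that appear in at least one pair and divide by the total number of reference tracks. A minimal sketch, assuming the pairs table has one id column per dataset (the column names `id_ERA5` and `id_IBTrACS`, the helper `probability_of_detection` and all ids are hypothetical):

```python
import pandas as pd

# Toy stand-in for the matched pairs (column names assumed, ids made up)
matches = pd.DataFrame({"id_ERA5": [1, 2, 3], "id_IBTrACS": ["A", "B", "B"]})
ref_ids = ["A", "B", "C", "D", "E"]  # all reference (IBTrACS) tracks


def probability_of_detection(matches, ref_ids, ref_col):
    """Fraction of reference tracks matched by at least one detected track."""
    n_matched = matches[ref_col].nunique()  # count each reference track once
    return n_matched / len(set(ref_ids))


pod = probability_of_detection(matches, ref_ids, "id_IBTrACS")
print(pod)  # 2 of the 5 reference tracks are matched -> 0.4
```

Counting distinct ids matters because a reference track can be matched by more than one detected track, as `B` is here.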

In [None]:
huracanpy.assess.pod(matches, ref=ib_1996, ref_name="IBTrACS")

### False alarm rate (FAR)
Proportion of detected (ERA5) tracks that do not match any observed (IBTrACS) track.
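Symmetrically, the FAR counts the detected tracks that never appear in a matched pair. A minimal sketch on made-up ids, again assuming one id column per dataset (the column names and the helper `false_alarm_rate` are hypothetical):

```python
import pandas as pd

# Toy stand-in for the matched pairs (column names assumed, ids made up)
matches = pd.DataFrame({"id_ERA5": [1, 2, 3], "id_IBTrACS": ["A", "B", "B"]})
detected_ids = [1, 2, 3, 4]  # all detected (ERA5) tracks


def false_alarm_rate(matches, detected_ids, detected_col):
    """Fraction of detected tracks that match no reference track."""
    n_detected = len(set(detected_ids))
    n_matched = matches[detected_col].nunique()
    return (n_detected - n_matched) / n_detected


far = false_alarm_rate(matches, detected_ids, "id_ERA5")
print(far)  # only track 4 is unmatched -> 0.25
```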

In [None]:
huracanpy.assess.far(matches, detected=era5, detected_name="ERA5")

## Venn diagrams
Venn diagrams are a convenient way to show the overlap between two datasets.
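The three regions of such a diagram can be counted from the same ingredients: the tracks of each dataset and the list of matched pairs. A minimal sketch on made-up ids; counting the overlap as the number of matched pairs is an assumption, and `huracanpy.plot.venn` may count it differently (for example when one track matches several others).

```python
# Made-up ids: every track in each dataset, plus the matched pairs
era5_ids = {1, 2, 3, 4}
ib_ids = {"A", "B", "C", "D", "E"}
matched_pairs = [(1, "A"), (2, "B"), (3, "B")]

matched_era5 = {e for e, _ in matched_pairs}
matched_ib = {i for _, i in matched_pairs}

only_era5 = len(era5_ids - matched_era5)  # detected but never observed
only_ib = len(ib_ids - matched_ib)        # observed but never detected
both = len(matched_pairs)                 # overlap, counted here as pairs
print(only_era5, both, only_ib)
```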

In [None]:
huracanpy.plot.venn([era5, ib_1996], matches, labels=["ERA5", "IBTrACS"])