In [None]:
import huracanpy

# Embedded IBTrACS Subsets
IBTrACS can be loaded in two ways using HuracanPy's `load` function:
* Online: You can load the latest version of any IBTrACS subset using the `load` function, provided you are connected to the internet.
* Offline: Your installation of HuracanPy embeds parts of the IBTrACS database, which can be loaded even without an internet connection.


Two offline subsets are currently available:
* "wmo" (default) contains the data provided in the "wmo" columns, which correspond to the data provided by the center
      responsible for the area of a given point (see https://community.wmo.int/en/tropical-cyclone-regional-bodies).
      Note that within this dataset, wind units are not homogeneous: they are provided as collected from the
      meteorological agencies, which use different time-averaging periods for wind extrema.
* "usa" contains the data provided in the "usa" columns, which are provided by the NHC or the JTWC.

Loading either subset raises a warning to remind you that these datasets are offline versions that come with caveats and have undergone some post-processing.
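
If you want to inspect or log that warning rather than let it print, Python's standard `warnings` module can record it. The sketch below is only illustrative: `load_offline_stub` is a hypothetical stand-in for the offline `huracanpy.load` call, and the `UserWarning` category is an assumption, since the category HuracanPy actually uses may differ.

```python
import warnings


def load_offline_stub():
    """Hypothetical stand-in for an offline load call that emits a warning."""
    # Assumption: the real warning category and message may differ.
    warnings.warn("offline IBTrACS subset: check the caveats", UserWarning)
    return {"subset": "wmo"}


# Record warnings instead of printing them, so they can be inspected or logged.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is filtered out
    data = load_offline_stub()

for w in caught:
    print(w.category.__name__, "-", w.message)
```

Replacing `warnings.simplefilter("always")` with `warnings.simplefilter("ignore")` (and dropping `record=True`) would silence the warning instead, which is only advisable once you have read the caveats.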

In [None]:
# WMO subset
ib_wmo = huracanpy.load(source="ibtracs", ibtracs_online=False, ibtracs_subset="wmo")
huracanpy.plot.tracks(ib_wmo.lon, ib_wmo.lat, intensity_var=ib_wmo.wind)

In [None]:
# USA subset
ib_usa = huracanpy.load(source="ibtracs", ibtracs_online=False, ibtracs_subset="usa")
huracanpy.plot.tracks(ib_usa.lon, ib_usa.lat, intensity_var=ib_usa.wind)

Both subsets currently cover the period 1980-2022. The WMO subset contains more tracks, because
the USA columns do not contain data for every track.

In [None]:
print("WMO")
print(ib_wmo.time.values.min(), ib_wmo.time.values.max())
print(ib_wmo.track_id.hrcn.nunique(), "tracks", len(ib_wmo.record), "points\n")

print("USA")
print(ib_usa.time.values.min(), ib_usa.time.values.max())
print(ib_usa.track_id.hrcn.nunique(), "tracks", len(ib_usa.record), "points")

One of the main differences between these two subsets is how winds are reported. The WMO subset provides the maximum winds as reported by each WMO agency, which is inhomogeneous: US agencies report 1-minute sustained winds, CMA reports 2-minute sustained winds, and most other centers report 10-minute sustained winds. The USA subset comes only from US agencies (NHC and JTWC), so its winds are consistently 1-minute sustained.
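
To compare winds reported with different averaging periods, a common approach is to rescale them with a constant factor. The sketch below is not part of HuracanPy: `ten_min_to_one_min` is a hypothetical helper, and the default factor of 0.88 (10-minute wind ≈ 0.88 × 1-minute wind) is an assumption based on a traditionally quoted value. Other values, around 0.93, are also in use, so choose the factor that suits your study.

```python
def ten_min_to_one_min(wind_10min, factor=0.88):
    """Approximate a 1-minute sustained wind from a 10-minute mean wind.

    `factor` is the assumed ratio of 10-minute to 1-minute wind speed.
    The default 0.88 is illustrative, not a value taken from IBTrACS.
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    return wind_10min / factor


# Rescale a few 10-minute winds (units are preserved, e.g. knots).
winds_10min = [35.0, 64.0, 100.0]
winds_1min = [ten_min_to_one_min(w) for w in winds_10min]
print([round(w, 1) for w in winds_1min])
```

Because the factor is at most 1, the rescaled 1-minute winds are never lower than the 10-minute winds they are derived from.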

In [None]:
# Add basin data to ib_wmo
ib_wmo = ib_wmo.hrcn.add_basin()
# Match tracks between ib_wmo and ib_usa, then compute each track's lifetime maximum intensity (LMI)
m = huracanpy.assess.match([ib_wmo, ib_usa], names=["wmo", "usa"])
max_winds = m.join(
    ib_wmo[["wind"]].groupby(ib_wmo.track_id).max().to_dataframe(), on="id_wmo"
).join(
    ib_usa[["wind"]].groupby(ib_usa.track_id).max().to_dataframe(),
    on="id_usa",
    lsuffix="_wmo",
    rsuffix="_usa",
)

In [None]:
# Add basin with separate groupby
max_winds = max_winds.join(
    ib_wmo[["basin"]].groupby(ib_wmo.track_id).first().to_dataframe(), on="id_wmo"
)

In [None]:
# Plot difference between WMO and USA winds in each basin
import seaborn as sns

p = sns.displot(
    data=max_winds,
    x="wind_wmo",
    y="wind_usa",
    col="basin",
    col_wrap=3,
)
for ax in p.axes.flatten():
    ax.plot([0, 175], [0, 175], color="k", linestyle="--")