# Overview
In the following example, we walk through how to use the `clouddrift` library to select and plot a subset of the HURDAT2 dataset. HURDAT2 contains storm track data, along with other measurements such as pressure and wind speed, for storms recorded from 1852 to 2022 in both the Pacific and Atlantic oceans.

To get things in motion, let's import the `adapters` module from clouddrift and load the dataset into the `ds` variable.


We convert the ragged array into an xarray `Dataset` so that we can use the selection and indexing utilities of that data structure, which the `subset` function relies on.

In [None]:
from clouddrift import adapters
ds = adapters.hurdat2.to_raggedarray().to_xarray()
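
A ragged array stores every track back to back along a single flat `obs` dimension, with a `rowsize` variable recording how many observations belong to each row. The minimal NumPy sketch below uses hypothetical toy data (not HURDAT2) and a hypothetical helper `row` to show how the row sizes locate each row's observations; the same cumulative-sum trick appears in the plotting code later on.

```python
import numpy as np

# Hypothetical toy ragged array: three tracks with 3, 2 and 4 observations,
# stored back to back in a single flat "obs" array
rowsize = np.array([3, 2, 4])
lon = np.array([-60.0, -61.0, -62.0,          # track 0
                -40.0, -41.5,                 # track 1
                -70.0, -71.0, -72.0, -73.0])  # track 2

# Prepending 0 and taking the cumulative sum gives the row boundaries:
# row i spans the half-open index range [bounds[i], bounds[i + 1])
bounds = np.cumsum(np.concatenate(([0], rowsize)))  # → [0 3 5 9]


def row(values, i):
    """Hypothetical helper: return row i of a flat obs-dimension array."""
    return values[bounds[i]:bounds[i + 1]]


second_track = row(lon, 1)  # the two observations of track 1
```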

Now let's say that we'd like to select a specific subset of this dataset. We can use the `subset` utility function from the `ragged` module, which contains a library of helpful utility functions for working with the `RaggedArray` data structure. As an example, say you wanted the portions of storm tracks that lie within a box in the Atlantic basin (10°N to 50°N, 80°W to 20°W) and were recorded between August 1 and October 1, 2020. You can use the `subset` function by first defining the criteria:

In [None]:
# import some helpful libraries
import numpy as np
from datetime import datetime, timezone

# define the criteria
# Each dataset variable is mapped to an inclusive (start, end) range
criteria = dict(
    lat=(10, 50),
    lon=(-80, -20),
    time=(
        # build the timestamps in UTC; a naive datetime would use the local time zone
        np.float64(datetime(2020, 8, 1, tzinfo=timezone.utc).timestamp()),
        np.float64(datetime(2020, 10, 1, tzinfo=timezone.utc).timestamp()),
    ),
)
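
To make the inclusive range semantics concrete, here is a small NumPy sketch of how a `(start, end)` criterion can be turned into a boolean mask over observations. It assumes, as the criteria above imply, that time is stored as float seconds since the Unix epoch; the observation times are hypothetical, and this illustrates the idea rather than reproducing clouddrift's internals.

```python
from datetime import datetime, timezone

import numpy as np

# Time bounds built explicitly in UTC
start = datetime(2020, 8, 1, tzinfo=timezone.utc).timestamp()  # 1596240000.0
end = datetime(2020, 10, 1, tzinfo=timezone.utc).timestamp()   # 1601510400.0

# Hypothetical observation times (seconds since the Unix epoch):
# one second early, exactly start, Sep 1, exactly end, one second late
time = np.array([1596239999.0, start, 1598918400.0, end, 1601510401.0])

# An inclusive criterion keeps an observation when start <= t <= end,
# so both endpoints are retained
mask = (time >= start) & (time <= end)
```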

Let's import the function and apply the criteria to the dataset.

We also need to pass the name of the row dimension through `row_dim_name`; in the HURDAT2 dataset it is `traj`.

In [None]:
from clouddrift.ragged import subset

subset_ds = subset(ds, criteria, row_dim_name="traj")
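
Conceptually, once an observation mask is known, the row sizes have to be recomputed and rows left without any observations can be dropped. The sketch below is a simplified NumPy illustration of that bookkeeping under that assumption, with a hypothetical helper `subset_rows` and toy data; it is not clouddrift's implementation of `subset`.

```python
import numpy as np


def subset_rows(rowsize, mask):
    """Hypothetical, simplified sketch of subsetting a ragged array.

    Given the row sizes and a boolean mask over the flat obs dimension,
    return the new row sizes of the surviving rows and a boolean mask
    marking which rows still have at least one observation.
    """
    # Label every observation with the index of the row it belongs to
    row_of_obs = np.repeat(np.arange(rowsize.size), rowsize)
    # Count how many observations each row keeps
    new_rowsize = np.bincount(row_of_obs[mask], minlength=rowsize.size)
    keep_rows = new_rowsize > 0
    return new_rowsize[keep_rows], keep_rows


rowsize = np.array([3, 2, 4])
mask = np.array([True, False, True,         # row 0 keeps 2 observations
                 False, False,              # row 1 keeps none and is dropped
                 False, True, True, True])  # row 2 keeps 3 observations
new_rowsize, keep_rows = subset_rows(rowsize, mask)
```

The flat observation variables themselves would then be filtered with the same mask, for example `lon[mask]`.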

Now that we have the subset we want, let's plot the remaining tracks to see the storms of the 2020 hurricane season.

In [None]:
# Import some helpful plotting libraries
import cartopy.crs as ccrs  # cartopy projects our data onto different map projections
import matplotlib.pyplot as plt  # the standard Python plotting library

# Create map axes with a Plate Carrée projection and draw the coastlines to see what it looks like
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()

Now let's take our subset `subset_ds` and process it to plot each track.

In [None]:
# Prepend 0 to the row sizes and take the cumulative sum: track i occupies
# observations obs_ranges[i] up to (but not including) obs_ranges[i + 1]
obs_ranges = np.cumsum(np.array([0, *subset_ds["rowsize"].to_numpy()]))

# Draw on fresh map axes: with Jupyter's inline backend the figure from the
# previous cell is not reused, and the Geodetic transform needs cartopy GeoAxes
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()

# Go through each track; the end of one track is the start of the next
for start, end in zip(obs_ranges[:-1], obs_ranges[1:]):
    selected = slice(start, end)  # selects this track's segment of an obs variable

    # Select the track's segment of the longitude and latitude variables and plot it!
    ax.plot(
        subset_ds["lon"].isel(obs=selected),
        subset_ds["lat"].isel(obs=selected),
        linestyle="-",
        transform=ccrs.Geodetic(),  # lon/lat are geographic coordinates
    )
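
As an alternative to slicing inside the loop, `np.split` can break every flat observation array into per-track pieces in one call. The sketch below uses hypothetical toy arrays; with the real data one would presumably pass `subset_ds["lon"].to_numpy()`, `subset_ds["lat"].to_numpy()` and `subset_ds["rowsize"].to_numpy()` instead, which is an assumption about how you would wire it in rather than part of the original example.

```python
import numpy as np

# Hypothetical flat coordinates for two tracks of 2 and 3 observations
rowsize = np.array([2, 3])
lon = np.array([-60.0, -61.0, -40.0, -41.0, -42.0])
lat = np.array([15.0, 16.0, 20.0, 21.0, 22.0])

# Split points are the start indices of every row except the first
split_at = np.cumsum(rowsize)[:-1]
tracks = list(zip(np.split(lon, split_at), np.split(lat, split_at)))

for track_lon, track_lat in tracks:
    # With cartopy GeoAxes `ax`, each track could be drawn with
    # ax.plot(track_lon, track_lat, transform=ccrs.Geodetic())
    print(track_lon.size, track_lat.size)
```

The trade-off is memory: `np.split` materializes views for every track up front, which is convenient for plotting but offers no advantage if you only need a few tracks.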