# Subsetting Data

## Using existing xarray functions
Tracks are loaded as an `xarray.Dataset`, which has many built-in methods for subsetting data.
For indexing, for example, see [xarray indexing](https://docs.xarray.dev/en/stable/user-guide/indexing.html).
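As a quick illustration of positional indexing, the sketch below uses a small made-up `xarray.Dataset` rather than loaded tracks. It assumes the track points lie along a single dimension, called `record` here; check the dimension name of your own dataset before reusing it.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a tracks dataset: all points along one "record"
# dimension (the dimension name and values are assumptions for this sketch)
ds = xr.Dataset(
    {
        "lon": ("record", np.array([10.0, 55.0, 62.0, 70.0, 65.0])),
        "track_id": ("record", np.array([0, 0, 1, 1, 1])),
    }
)

# Positional indexing with a slice: the first three points
first_three = ds.isel(record=slice(0, 3))
print(first_three.lon.values)

# Positional indexing with a list of integer positions
picked = ds.isel(record=[0, 4])
print(picked.lon.values)
```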

For selecting data by a condition on its values, use
[xarray.Dataset.where](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.where.html)
with the argument `drop=True`, so that points failing the condition are removed rather than
masked with NaN. For example:

In [None]:
import huracanpy

tracks = huracanpy.load(huracanpy.example_csv_file)

# Select all points with longitude > 60
print(tracks.lon, "\n")
tracks_subset = tracks.where(tracks.lon > 60, drop=True)
print(tracks_subset.lon)
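The difference `drop=True` makes can be seen on a small synthetic dataset (made-up values, not the example file). Without it, `where` keeps every point and fills the ones that fail the condition with NaN, which also promotes integer variables such as `track_id` to floats; with it, the failing points are removed.

```python
import numpy as np
import xarray as xr

# Synthetic points along a "record" dimension (values invented for the sketch)
ds = xr.Dataset(
    {
        "lon": ("record", np.array([10.0, 55.0, 62.0, 70.0])),
        "track_id": ("record", np.array([0, 0, 1, 1])),
    }
)

# Without drop: same size as before, failing points become NaN
masked = ds.where(ds.lon > 60)
print(masked.lon.values)
print(masked.track_id.dtype)  # integers are promoted to float to hold NaN

# With drop=True: only the points meeting the condition remain
dropped = ds.where(ds.lon > 60, drop=True)
print(dropped.lon.values)
```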

## Selecting times
Generally, the `time` array is loaded as an
[np.datetime64](https://numpy.org/doc/stable/reference/arrays.datetime.html)
array. This means it can't be compared directly with a standard `datetime.datetime` object:

In [None]:
import datetime

# Try to select a subset of times based on datetime
print(tracks.time)
tracks_subset = tracks.where(tracks.time > datetime.datetime(1980, 1, 10), drop=True)

However, the same comparison works with a `np.datetime64` object instead; only the syntax is slightly different:

In [None]:
import numpy as np

tracks_subset = tracks.where(tracks.time > np.datetime64("1980-01-10"), drop=True)
print(tracks_subset.time)
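If you already have a `datetime.datetime` object, converting it with `np.datetime64` should give a value that compares cleanly with the time array. The sketch below uses a plain numpy array standing in for `tracks.time`; the dates are invented for illustration.

```python
import datetime

import numpy as np

# A datetime64[ns] array, standing in for the loaded time variable
times = np.array(
    ["1980-01-08T00:00", "1980-01-10T06:00", "1980-01-12T12:00"],
    dtype="datetime64[ns]",
)

# Convert the standard-library datetime to datetime64 before comparing
threshold = np.datetime64(datetime.datetime(1980, 1, 10))
mask = times > threshold
print(mask)

# The converted value is the same instant as the string form used above
print(threshold == np.datetime64("1980-01-10"))
```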

Note that this isn't always the case. If the tracks are loaded with a non-standard
calendar, the times are stored as [cftime](https://unidata.github.io/cftime/) objects,
which xarray does not convert to `datetime64`.
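One way to see why: numpy's `datetime64` uses the proleptic Gregorian calendar, so it cannot hold dates that only exist in other calendars. In a `360_day` calendar every month has 30 days, making 30 February a valid date there. The helper below is hypothetical, written only for this illustration.

```python
import numpy as np


def fits_datetime64(date_string):
    """Return True if numpy's datetime64 can represent date_string.

    Hypothetical helper for illustration only.
    """
    try:
        np.datetime64(date_string)
    except ValueError:
        # numpy rejects days that don't exist in the Gregorian calendar
        return False
    return True


print(fits_datetime64("1980-02-28"))  # exists in both calendars
print(fits_datetime64("1980-02-30"))  # valid only in a 360_day calendar
```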

In [None]:
# These tracks don't actually use a 360_day calendar. The argument is passed here
# only to show how the times load with a non-standard calendar
tracks = huracanpy.load(
    huracanpy.example_TRACK_file, source="track", track_calendar="360_day"
)
print(tracks.time)

In this case, neither the `datetime` nor the `datetime64` comparison works, and you
have to compare with a `cftime.datetime` object that uses the same calendar:

In [None]:
tracks_subset = tracks.where(tracks.time > datetime.datetime(1980, 1, 10), drop=True)

In [None]:
tracks_subset = tracks.where(tracks.time > np.datetime64("1980-01-10"), drop=True)

In [None]:
import cftime

tracks_subset = tracks.where(
    tracks.time > cftime.datetime(1980, 1, 10, calendar="360_day"), drop=True
)
print(tracks_subset.time)
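To avoid hard-coding the calendar, a small helper can pick the comparison type from the data itself. `time_threshold` below is hypothetical, not part of huracanpy; it assumes it is given a numpy array such as `tracks.time.values`, and that a non-`datetime64` array holds `cftime.datetime` objects, whose `calendar` attribute it reuses.

```python
import numpy as np


def time_threshold(time_values, year, month, day):
    """Build a value comparable with time_values (hypothetical helper)."""
    if np.issubdtype(time_values.dtype, np.datetime64):
        # Standard calendar: the times are datetime64, so compare with datetime64
        return np.datetime64(f"{year:04d}-{month:02d}-{day:02d}")

    # Assumed non-standard calendar: an object array of cftime datetimes.
    # Reuse the calendar of the first element for the comparison value.
    import cftime

    calendar = time_values.flat[0].calendar
    return cftime.datetime(year, month, day, calendar=calendar)


times = np.array(["1980-01-08", "1980-01-12"], dtype="datetime64[ns]")
print(times > time_threshold(times, 1980, 1, 10))
```

With loaded tracks, the same call would be used inside the condition, e.g. `tracks.where(tracks.time > time_threshold(tracks.time.values, 1980, 1, 10), drop=True)`, whichever calendar the times use.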

## Subsetting by track
To apply a criterion to each track as a whole, rather than to individual points, use
[huracanpy.trackswhere](../api/_autosummary/huracanpy.trackswhere.rst):

In [None]:
# Add storm category by pressure to each track and filter those that don't reach
# category 2
tracks = huracanpy.load(huracanpy.example_csv_file)

tracks["category"] = huracanpy.tc.get_pressure_cat(tracks.slp, slp_units="Pa")

# Show the categories for each storm
# Storms 0 and 2 reach category 2, and storm 1 only reaches category 1
for track_id, track in tracks.groupby("track_id"):
    print("track", track_id, "category", int(track.category.max()))

# Subset the tracks by category threshold which will remove track 1
track_subset = huracanpy.trackswhere(
    tracks, tracks.track_id, lambda track: track.category.max() >= 2
)

# Confirm that track 1 has been filtered out
print("\n", "tracks remaining -", set(track_subset.track_id.data))
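Conceptually, a per-track filter reduces each track to a single value, here its maximum category, and then drops every point belonging to a track that fails. The sketch below expresses that idea with plain xarray on a made-up dataset; it is not necessarily how `trackswhere` is implemented.

```python
import numpy as np
import xarray as xr

# Synthetic points for three tracks (categories invented for the sketch)
ds = xr.Dataset(
    {
        "track_id": ("record", np.array([0, 0, 1, 1, 1, 2, 2])),
        "category": ("record", np.array([1, 2, 0, 1, 1, 2, 3])),
    }
)

# One value per track: the maximum category reached
max_cat = ds.category.groupby(ds.track_id).max()

# IDs of the tracks meeting the criterion
keep = max_cat.track_id.values[max_cat.values >= 2]

# Drop every point whose track is not in the kept set
subset = ds.where(ds.track_id.isin(keep), drop=True)
print(sorted(set(subset.track_id.values.astype(int).tolist())))
```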