# Subsetting Data

## Using existing xarray functions
Tracks are loaded as an `xarray.Dataset`, which has many built-in methods for subsetting data.
For indexing, see [xarray indexing](https://docs.xarray.dev/en/stable/user-guide/indexing.html).

For more specific selection of data, the best method is to use
[xarray.Dataset.where](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.where.html)
with the argument `drop=True`, for example:

In [1]:
import huracanpy

tracks = huracanpy.load(huracanpy.example_csv_file)

# Select all points with longitude > 60
print(tracks.lon, "\n")
tracks_subset = tracks.where(tracks.lon > 60, drop=True)
print(tracks_subset.lon)

<xarray.DataArray 'lon' (record: 99)>
array([120.5 , 119.  , 119.  , 119.25, 119.5 , 118.75, 118.5 , 118.25,
       118.25, 118.25, 118.75, 119.25, 119.25, 119.75, 120.  , 120.  ,
       119.5 , 119.25, 118.25, 117.5 , 117.  , 117.  , 116.75, 116.75,
       117.5 , 119.25, 121.  , 123.5 , 127.5 , 130.25, 131.25, 149.5 ,
       151.5 , 154.  , 156.25, 158.5 , 159.5 , 160.  , 160.  , 160.  ,
       158.25, 156.  , 154.25, 153.25, 152.75, 152.5 , 152.5 , 153.  ,
       153.75, 154.75, 156.  ,  55.25,  54.25,  52.75,  54.  ,  55.  ,
        52.  ,  51.  ,  52.  ,  51.5 ,  50.75,  50.25,  50.5 ,  50.75,
        49.75,  50.  ,  50.5 ,  51.  ,  51.75,  52.25,  53.  ,  53.25,
        52.75,  54.5 ,  53.  ,  53.75,  53.5 ,  53.25,  53.25,  52.75,
        52.5 ,  52.25,  52.5 ,  53.25,  54.25,  55.5 ,  56.75,  58.25,
        59.5 ,  59.25,  58.75,  58.25,  57.75,  57.5 ,  57.25,  57.5 ,
        58.5 ,  60.25,  62.25])
Coordinates:
  * record   (record) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 9
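
Several conditions can be combined in a single `where` call. The following is a minimal sketch on a small synthetic dataset (the variable names and values are invented for illustration and are not the contents of the example file); it assumes only standard `xarray` and `numpy` behaviour.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a tracks dataset: one entry per point along "record"
tracks = xr.Dataset(
    {
        "track_id": ("record", np.array([0, 0, 0, 1, 1, 1])),
        "lon": ("record", np.array([120.5, 119.0, 149.5, 55.25, 60.25, 62.25])),
        "lat": ("record", np.array([-15.0, -16.0, -20.0, -12.0, -14.0, -18.0])),
    }
)

# Combine conditions with & (and) or | (or).
# Each comparison needs its own parentheses because & binds tighter than > and <.
in_box = (tracks.lon > 60) & (tracks.lon < 130) & (tracks.lat < -13)

# drop=True removes the points where the condition is False
subset = tracks.where(in_box, drop=True)
print(subset.lon.values)
```

Building the boolean mask first and passing it to `where` keeps the selection readable and lets the same mask be reused or inspected.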

## Selecting times
Generally, the `time` array is loaded as an
[np.datetime64](https://numpy.org/doc/stable/reference/arrays.datetime.html)
array. This means that comparing it with a standard-library `datetime.datetime` object does not work:

In [2]:
import datetime

# Try to select a subset of times based on datetime
print(tracks.time)
tracks_subset = tracks.where(tracks.time > datetime.datetime(1980, 1, 10), drop=True)

<xarray.DataArray 'time' (record: 99)>
array(['1980-01-06T06:00:00.000000000', '1980-01-06T12:00:00.000000000',
       '1980-01-06T18:00:00.000000000', '1980-01-07T00:00:00.000000000',
       '1980-01-07T06:00:00.000000000', '1980-01-07T12:00:00.000000000',
       '1980-01-07T18:00:00.000000000', '1980-01-08T00:00:00.000000000',
       '1980-01-08T06:00:00.000000000', '1980-01-08T12:00:00.000000000',
       '1980-01-08T18:00:00.000000000', '1980-01-09T00:00:00.000000000',
       '1980-01-09T06:00:00.000000000', '1980-01-09T12:00:00.000000000',
       '1980-01-09T18:00:00.000000000', '1980-01-10T00:00:00.000000000',
       '1980-01-10T06:00:00.000000000', '1980-01-10T12:00:00.000000000',
       '1980-01-10T18:00:00.000000000', '1980-01-11T00:00:00.000000000',
       '1980-01-11T06:00:00.000000000', '1980-01-11T12:00:00.000000000',
       '1980-01-11T18:00:00.000000000', '1980-01-12T00:00:00.000000000',
       '1980-01-12T06:00:00.000000000', '1980-01-12T12:00:00.000000000',
       '1980

TypeError: '>' not supported between instances of 'int' and 'datetime.datetime'

However, the same comparison works with `np.datetime64`; the syntax is only slightly different:

In [3]:
import numpy as np

tracks_subset = tracks.where(tracks.time > np.datetime64("1980-01-10"), drop=True)
print(tracks_subset.time)

<xarray.DataArray 'time' (record: 72)>
array(['1980-01-10T06:00:00.000000000', '1980-01-10T12:00:00.000000000',
       '1980-01-10T18:00:00.000000000', '1980-01-11T00:00:00.000000000',
       '1980-01-11T06:00:00.000000000', '1980-01-11T12:00:00.000000000',
       '1980-01-11T18:00:00.000000000', '1980-01-12T00:00:00.000000000',
       '1980-01-12T06:00:00.000000000', '1980-01-12T12:00:00.000000000',
       '1980-01-12T18:00:00.000000000', '1980-01-13T00:00:00.000000000',
       '1980-01-13T06:00:00.000000000', '1980-01-13T12:00:00.000000000',
       '1980-01-13T18:00:00.000000000', '1980-01-10T06:00:00.000000000',
       '1980-01-10T12:00:00.000000000', '1980-01-10T18:00:00.000000000',
       '1980-01-11T00:00:00.000000000', '1980-01-11T06:00:00.000000000',
       '1980-01-11T12:00:00.000000000', '1980-01-11T18:00:00.000000000',
       '1980-01-12T00:00:00.000000000', '1980-01-12T06:00:00.000000000',
       '1980-01-17T06:00:00.000000000', '1980-01-17T18:00:00.000000000',
       '1980
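
If the bounds already exist as `datetime.datetime` objects, one option is to convert them with `np.datetime64` before comparing. The sketch below uses a small synthetic dataset (values invented for illustration) and selects a window between two times.

```python
import datetime

import numpy as np
import xarray as xr

# Synthetic daily times from 1980-01-08 to 1980-01-12, stored as datetime64[ns]
times = np.arange("1980-01-08", "1980-01-13", dtype="datetime64[D]").astype(
    "datetime64[ns]"
)
tracks = xr.Dataset({"time": ("record", times)})

# np.datetime64 accepts either a datetime.datetime object or an ISO string
start = np.datetime64(datetime.datetime(1980, 1, 9))
end = np.datetime64("1980-01-11")

# Keep only the points inside the window (both bounds inclusive)
in_window = (tracks.time >= start) & (tracks.time <= end)
subset = tracks.where(in_window, drop=True)
print(subset.time.values)
```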

Note that times are not always loaded as `datetime64`. If the tracks are loaded with a different
calendar, the times are stored as [cftime](https://unidata.github.io/cftime/)
objects, which xarray does not convert to `datetime64`.

In [7]:
# The tracks don't actually use a 360_day calendar. I'm just passing this as an argument
# to show an example of it loading this way
tracks = huracanpy.load(huracanpy.example_TRACK_file, tracker="track", calendar="360_day")
print(tracks.time)

<xarray.DataArray 'time' (record: 46)>
array([cftime.datetime(2022, 1, 13, 18, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 14, 0, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 14, 6, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 14, 12, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 14, 18, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 15, 0, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 15, 6, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 15, 12, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 15, 18, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 16, 0, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 16, 6, 0, 0, 0, calendar='360_day'

In this case, neither the `datetime` nor the `datetime64` comparison works; you
have to compare with a `cftime.datetime` object that uses the same calendar:

In [9]:
tracks_subset = tracks.where(tracks.time > datetime.datetime(1980, 1, 10), drop=True)

TypeError: cannot compare cftime.datetime(2022, 1, 13, 18, 0, 0, 0, calendar='360_day', has_year_zero=True) and datetime.datetime(1980, 1, 10, 0, 0) (different calendars)

In [8]:
tracks_subset = tracks.where(tracks.time > np.datetime64("1980-01-10"), drop=True)

TypeError: '>' not supported between instances of 'cftime._cftime.datetime' and 'datetime.date'

In [12]:
import cftime

tracks_subset = tracks.where(tracks.time > cftime.datetime(1980, 1, 10, calendar="360_day"), drop=True)
print(tracks_subset.time)

<xarray.DataArray 'time' (record: 46)>
array([cftime.datetime(2022, 1, 13, 18, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 14, 0, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 14, 6, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 14, 12, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 14, 18, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 15, 0, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 15, 6, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 15, 12, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 15, 18, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 16, 0, 0, 0, 0, calendar='360_day', has_year_zero=True),
       cftime.datetime(2022, 1, 16, 6, 0, 0, 0, calendar='360_day'

In [14]:
tracks = huracanpy.load(huracanpy.example_csv_file)
huracanpy.extract.where(tracks, time=lambda x: x > datetime.datetime(1980, 1, 10))