# Examples of applying filters and categorising tracks

Import essential libraries

In [1]:
from pathlib import Path

from octant.core import TrackRun

Define the common data directory

In [2]:
sample_dir = Path('.') / 'sample_data'

Data are usually organised in a hierarchical directory structure. Here, the parameters that define the relevant subdirectories are set.

In [3]:
dataset = 'era5'
period = 'test'
run_id = 0

Construct the full path

In [4]:
track_res_dir = sample_dir / dataset / f'run{run_id:03d}' / period
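
As a quick check of how these pieces combine, the sketch below rebuilds the same path with only the standard library and prints it. It uses the values defined above; the directory does not need to exist for this to run.

```python
from pathlib import Path

sample_dir = Path('.') / 'sample_data'
dataset, period, run_id = 'era5', 'test', 0

# `:03d` zero-pads the run number to three digits, e.g. 0 -> "000"
track_res_dir = sample_dir / dataset / f'run{run_id:03d}' / period
print(track_res_dir.as_posix())  # sample_data/era5/run000/test
```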

Now load the cyclone tracks themselves

In [5]:
tr = TrackRun(track_res_dir)
tr

| octant.core.TrackRun | |
|---|---|
| Number of tracks | 671 |
| Data columns | lon, lat, vo, time, area, vortex_type, cat |
| Sources | sample_data/era5/run000/test |


### Classify the tracks

To label each of the tracks within `tr` according to a set of filters or criteria, use the `classify()` method.

Below are two examples: a simple one, and a more advanced one that uses a function with multiple arguments.

### Simple functions as filters

As its argument, `classify()` takes a list of tuples in the form of
```
[
    (<labelA>, [<func1>, <func2>, ..., <funcN>]),
    (<labelB>, [<func1>, <func2>, ..., <funcN>]),
    ...
    (<labelZ>, [<func1>, <func2>, ..., <funcN>]),
]
```

where `labelA` is assigned to a track if the track satisfies **all** the conditions given by `[<func1>, <func2>, ..., <funcN>]`, which is a list of one or more functions.
Each of these functions must take exactly one argument, an `OctantTrack`.
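
To make the "all conditions" rule concrete, here is a minimal sketch that does not use octant at all. A `SimpleNamespace` stands in for an `OctantTrack` (an assumption for illustration only), and the attribute values are made up.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a single track; not a real OctantTrack
toy_track = SimpleNamespace(lifetime_h=40.0, gen_lys_dist_km=120.0)

# Each filter takes exactly one argument (the track) and returns True or False
funcs = [
    lambda ot: ot.lifetime_h >= 36,
    lambda ot: ot.gen_lys_dist_km > 300.0,
]

# A label applies only if every function in its list returns True
satisfied = all(func(toy_track) for func in funcs)
print(satisfied)  # False: the track lives long enough but does not travel far enough
```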

For example, it is possible to classify tracks by their lifetime, maximum vorticity, and distance travelled:

In [6]:
conditions = [
    ('long_lived', [lambda ot: ot.lifetime_h >= 6]),
    ('far_travelled_and_very_long_lived', [lambda ot: ot.lifetime_h >= 36,
                                           lambda ot: ot.gen_lys_dist_km > 300.0]),
    ('strong', [lambda ot: ot.max_vort > 1e-3])
]

In [7]:
tr.classify(conditions)

In [8]:
tr

| octant.core.TrackRun | | | |
|---|---|---|---|
| Categories | | 671 | in total |
| | of which | 247 | long_lived |
| | of which | 18 | far_travelled_and_very_long_lived |
| | of which | 3 | strong |
| Data columns | lon, lat, vo, time, area, vortex_type, cat | | |
| Sources | sample_data/era5/run000/test | | |


**NB** By default, the categories are "inclusive": in this example, the "long_lived" subset includes the tracks that are "far_travelled_and_very_long_lived", and both of them include the "strong" subset.

This is how the numbers change if the categorisation is non-inclusive (all the categories are separate):

In [9]:
tr.classify(conditions, inclusive=False)

In [10]:
tr

| octant.core.TrackRun | | | |
|---|---|---|---|
| Categories | | 671 | in total |
| | of which | 228 | long_lived |
| | of which | 15 | far_travelled_and_very_long_lived |
| | of which | 5 | strong |
| Data columns | lon, lat, vo, time, area, vortex_type, cat | | |
| Sources | sample_data/era5/run000/test | | |


In this case, the categories no longer overlap: "long_lived" (228) excludes the 15 tracks labelled "far_travelled_and_very_long_lived" and the 5 labelled "strong".
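
The difference between the two modes can be illustrated with a toy model. This is a sketch under stated assumptions, not octant's implementation: `ToyTrack` and `toy_classify` are hypothetical names, the five tracks are invented, and the non-inclusive rule used here ("a track keeps the last label whose own conditions it meets") is only one reading that is consistent with the counts shown above.

```python
from dataclasses import dataclass


@dataclass
class ToyTrack:  # hypothetical stand-in for OctantTrack
    lifetime_h: float
    gen_lys_dist_km: float
    max_vort: float


def toy_classify(tracks, conditions, inclusive=True):
    """Count tracks per label (toy model, not octant's code)."""
    counts = {label: 0 for label, _ in conditions}
    if inclusive:
        # Each category must also satisfy the conditions of all previous ones
        accumulated = []
        for label, funcs in conditions:
            accumulated = accumulated + list(funcs)
            counts[label] = sum(all(f(t) for f in accumulated) for t in tracks)
    else:
        # Assumption: each track keeps only the last label it qualifies for
        for t in tracks:
            last = None
            for label, funcs in conditions:
                if all(f(t) for f in funcs):
                    last = label
            if last is not None:
                counts[last] += 1
    return counts


conditions = [
    ('long_lived', [lambda ot: ot.lifetime_h >= 6]),
    ('far_travelled_and_very_long_lived', [lambda ot: ot.lifetime_h >= 36,
                                           lambda ot: ot.gen_lys_dist_km > 300.0]),
    ('strong', [lambda ot: ot.max_vort > 1e-3]),
]

tracks = [
    ToyTrack(3, 50, 2e-3),    # strong only
    ToyTrack(10, 100, 1e-4),  # long-lived only
    ToyTrack(40, 500, 1e-4),  # long-lived and far-travelled
    ToyTrack(48, 400, 2e-3),  # meets every condition
    ToyTrack(12, 100, 5e-3),  # long-lived and strong, but not far-travelled
]

inclusive_counts = toy_classify(tracks, conditions, inclusive=True)
exclusive_counts = toy_classify(tracks, conditions, inclusive=False)
print(inclusive_counts)  # long_lived: 4, far_travelled_and_very_long_lived: 2, strong: 1
print(exclusive_counts)  # long_lived: 1, far_travelled_and_very_long_lived: 1, strong: 3
```

As in the real output, the inclusive counts shrink from one category to the next, while in the non-inclusive case "strong" can grow because it no longer has to satisfy the earlier conditions.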

### More complex functions as filters

It is also possible to categorise tracks by their proximity to the coast (land) or to other masked points in an array with geographical coordinates.
For convenience, the `octant.misc` module contains the `check_by_mask()` function, which checks whether a cyclone track stays close to land points or domain boundaries for long enough. This function is essentially a wrapper around the `octant.utils.mask_tracks()` function.

In [11]:
import xarray as xr

from octant.misc import check_by_mask

First, reload the `TrackRun` so that the classification starts from a fresh, uncategorised set of tracks.

In [12]:
tr = TrackRun(track_res_dir)

Load the land-sea mask array from the ERA5 dataset:

In [13]:
lsm = xr.open_dataarray(sample_dir / dataset / 'lsm.nc')
lsm = lsm.squeeze()  # remove the singleton (length-1) time dimension

Importantly, the `classify()` method expects functions that take only one argument of type `OctantTrack`, so to use `check_by_mask()`, we need to fix its other arguments by constructing a partial function with `functools.partial` from the standard library.
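
As a toy illustration of the technique, the sketch below applies `functools.partial` to a made-up function. `stays_within` and the `SimpleNamespace` track are hypothetical and unrelated to octant's API; only `partial` itself is real.

```python
from functools import partial
from types import SimpleNamespace


def stays_within(ot, max_dist_km, min_hours=0.0):
    """Hypothetical multi-argument filter: short-range but long enough."""
    return ot.gen_lys_dist_km <= max_dist_km and ot.lifetime_h >= min_hours


# Fix `max_dist_km` ahead of time and leave `min_hours` at its default;
# the result is a callable that needs only the track itself
short_range = partial(stays_within, max_dist_km=300.0)

toy_track = SimpleNamespace(gen_lys_dist_km=120.0, lifetime_h=12.0)
result = short_range(toy_track)
print(result)  # True
```

The real cells below follow the same pattern, with `check_by_mask` in place of the toy function.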

In [14]:
from functools import partial

In [15]:
land_mask_fun = partial(check_by_mask, trackrun=tr, lsm=lsm, rad=75.)  # and leave `mask_thresh=` to default

This new function has all the additional arguments already supplied and takes only an `OctantTrack`, which is exactly what `classify()` needs.
It is then passed as the second filtering function in the list of conditions:

In [16]:
new_conditions = [
    ('good_candidates', [lambda ot: ot.lifetime_h >= 6, land_mask_fun]),
    ('pmc', [lambda ot: ((ot.vortex_type != 0).sum() / ot.shape[0] < 0.2) and (ot.gen_lys_dist_km > 300.0)]),
]
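
The `pmc` condition packs two tests into one lambda: the fraction of track points with a non-zero `vortex_type` must be below 0.2, and the distance travelled must exceed 300 km. The sketch below spells out the same arithmetic on plain Python lists. `is_pmc_like` and its inputs are hypothetical, and what the individual `vortex_type` values mean is not assumed here.

```python
def is_pmc_like(vortex_type, gen_lys_dist_km, max_frac=0.2, min_dist_km=300.0):
    """Toy version of the 'pmc' test on a plain list of per-point vortex types."""
    # Fraction of track points whose vortex type is non-zero
    frac_nonzero = sum(v != 0 for v in vortex_type) / len(vortex_type)
    return frac_nonzero < max_frac and gen_lys_dist_km > min_dist_km


mostly_zero = is_pmc_like([0, 0, 0, 0, 0, 1, 0, 0, 0, 0], 450.0)  # fraction 0.1
too_many_flagged = is_pmc_like([0, 1, 1, 0], 450.0)               # fraction 0.5
too_short = is_pmc_like([0] * 10, 120.0)                          # only 120 km

print(mostly_zero, too_many_flagged, too_short)  # True False False
```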

In [17]:
tr.classify(new_conditions, inclusive=True)

In [18]:
tr

| octant.core.TrackRun | | | |
|---|---|---|---|
| Categories | | 671 | in total |
| | of which | 101 | good_candidates |
| | of which | 36 | pmc |
| Data columns | lon, lat, vo, time, area, vortex_type, cat | | |
| Sources | sample_data/era5/run000/test | | |
