# Loading tracks with `huracanpy.load`

One of the main motivations for HuracanPy was to provide a common tool to load the tracks that come from different sources with various incompatible formats.

HuracanPy provides the `load` function, which loads tracks either from a file on your computer or from a database (currently, only IBTrACS).
Additionally, HuracanPy embeds small data samples from various formats for examples and testing. 

In [None]:
import huracanpy

## Loading tracks from files

To load tracks from a file, the basic syntax is `huracanpy.load(filepath, source="type-of-file")`. Below we describe the supported formats and the additional options associated with each.

### CSV

A CSV is a compact and simple way of storing track data. Each row corresponds to a point, identified by its position in space and time.
If your tracks are stored in CSV (including
if they were output by TempestExtremes' StitchNodes), you can specify the
`source="csv"` argument, or, if your filename ends with *csv*, the format will be detected
automatically.

`huracanpy.load` reads most of the CSV file as is and returns it as an
`xarray.Dataset`. A few extra modifications may be applied
to make sure the output has the variables `track_id`, `time`, `lon`, and `lat`.
For example, in the file used here, the time variable is constructed from
`year`, `month`, `day`, and `hour`.
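
To make the time construction concrete, here is a minimal standard-library sketch of combining `year`, `month`, `day`, and `hour` columns into one timestamp per row. It is an illustration only, not HuracanPy's implementation: the sample rows and the helper name `build_time` are made up for the example.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows standing in for a CSV track file (illustrative only)
sample = """track_id,year,month,day,hour,lon,lat
0,1980,1,6,6,120.5,-14.25
0,1980,1,6,12,119.5,-14.5
1,1980,1,7,0,57.5,-12.5
"""


def build_time(row):
    """Combine the year/month/day/hour columns of one row into a datetime."""
    return datetime(
        int(row["year"]), int(row["month"]), int(row["day"]), int(row["hour"])
    )


rows = list(csv.DictReader(io.StringIO(sample)))
times = [build_time(row) for row in rows]
print(times[0].isoformat())  # 1980-01-06T06:00:00
```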

In [None]:
!head {huracanpy.example_csv_file} # HuracanPy embeds an example csv file. Here is the content of the file.

In [None]:
file = huracanpy.example_csv_file # Replace with your file name if necessary (including the .csv extension)
huracanpy.load(file, source = "csv") # Load the file

Advanced: You can pass arguments to `pd.read_csv` through `load`.

### NetCDF

Similar to CSV, NetCDF data can largely be loaded as is. NetCDF has the disadvantage of
not being readable like a CSV, but the advantage that it can better store metadata about
variables.

`huracanpy.load` only recognizes NetCDF files if their name ends with `.nc`. 

HuracanPy assumes that NetCDF files follow the [CF convention](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_contiguous_ragged_array_representation_of_trajectories).
This allows the load function to identify the TRACK_ID and extend it along the data
dimension.
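
In the contiguous ragged array representation, each track is stored as a block of consecutive points, and a count variable records how many points each track has. Extending the track ID along the data dimension then amounts to repeating each ID by its count. The sketch below illustrates the idea with plain Python lists; the names `track_ids` and `row_sizes` are placeholders, not the variable names used by HuracanPy or by any particular file.

```python
from itertools import chain, repeat

# Hypothetical per-track variables, one entry per track
track_ids = [0, 1, 2]
row_sizes = [3, 1, 2]  # number of consecutive points belonging to each track

# Repeat each track ID by its point count to get one ID per data point
track_id_per_point = list(
    chain.from_iterable(repeat(tid, n) for tid, n in zip(track_ids, row_sizes))
)
print(track_id_per_point)  # [0, 0, 0, 1, 2, 2]
```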

As with CSV data, some variables are renamed. In the example, the positions
are `longitude` and `latitude` in the NetCDF file, but are renamed to `lon` and `lat`.

NB: This supports loading NetCDF files from TRACK, CHAZ or MIT-Open.

In [None]:
!ncdump -h {huracanpy.example_TRACK_netcdf_file} | head -n 30 # HuracanPy embeds an example netcdf file. Here is the content of the file.

In [None]:
file = huracanpy.example_TRACK_netcdf_file # Replace with your file name if necessary (including the .nc extension)
huracanpy.load(file) # Load the file

### TRACK textual format

TRACK is a cyclone tracker that outputs text files containing track data. Note that TRACK files don't contain the variable names; instead, these are usually described in the filename. Currently, `huracanpy.load` doesn't try to infer the variable names from the filename. Instead, any extra variables are named `feature_n`, where n runs from 0 to the number of variables minus 1. TRACK also associates extra coordinates with some of these features; these are loaded as `feature_n_longitude` and `feature_n_latitude`.
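
As a rough illustration of this naming scheme, the sketch below generates placeholder names for a given number of unnamed variables. The function `default_feature_names` and its `with_coordinates` argument are hypothetical and not part of HuracanPy; which features actually carry their own coordinates depends on the file, and the order of the returned list says nothing about the file layout.

```python
def default_feature_names(n_variables, with_coordinates=()):
    """Return placeholder names for unnamed variables (illustrative only).

    with_coordinates: indices of variables that carry their own position.
    """
    names = []
    for n in range(n_variables):
        names.append(f"feature_{n}")
        if n in with_coordinates:
            # Extra coordinates are named after the feature they belong to
            names.append(f"feature_{n}_longitude")
            names.append(f"feature_{n}_latitude")
    return names


print(default_feature_names(3, with_coordinates={1}))
# ['feature_0', 'feature_1', 'feature_1_longitude', 'feature_1_latitude', 'feature_2']
```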

In [None]:
!head {huracanpy.example_TRACK_file} # HuracanPy embeds an example TRACK file. Here is the content of the file.

In [None]:
file = huracanpy.example_TRACK_file # Replace with your file name if necessary 
huracanpy.load(file, source="TRACK")

If you want to load the variables by name, pass a list of variable names to
`huracanpy.load`. The longitudes and latitudes associated with a feature are then
named after that feature.

In [None]:
file = huracanpy.example_TRACK_file 
variable_names = [
    *[f"vorticity_{n}hPa" for n in [850, 700, 600, 500, 400, 300, 200]],
    "mslp",
    "vmax_925hPa",
    "vmax_10m",
]
huracanpy.load(
    file, source="TRACK", variable_names=variable_names
)

### TempestExtremes/GFDL textual format

TempestExtremes and GFDL also have their own textual format. Note, however, that TempestExtremes' `StitchNodes` can output CSV, and we recommend that option.

*Variable names:* These files can be read with HuracanPy by specifying `source="te"`. Because the files themselves do not embed variable names, you may pass them with `variable_names`.

*Tracks from unstructured grid:* By default, HuracanPy assumes that your file comes from tracking structured data, and hence has two grid indices `i` and `j`. If this is not the case (i.e. the file comes from tracking unstructured data), you need to specify `tempest_extremes_unstructured=True` so that only one index `i` is read.

*Line starting keyword:* Finally, if the line starting keyword is not "`start`", you can specify it with `tempest_extremes_header_str`.
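
To show what these three options correspond to in the file, here is a rough standard-library sketch of reading this kind of text format. It is not HuracanPy's parser: the function `parse_te` and its arguments `unstructured` and `header_str` are hypothetical stand-ins for `tempest_extremes_unstructured` and `tempest_extremes_header_str`, and the sample text is made up. The layout is also an assumption about typical `StitchNodes` output: one header line per track beginning with the keyword, then one line per point holding the grid indices, longitude, latitude, the extra variables, and finally year, month, day, and hour.

```python
# Made-up sample: two tracks from structured data (grid indices i and j)
sample = """start 2 1980 1 6 6
 482 220 120.5 -14.25 99870.0 14.5 1980 1 6 6
 481 219 119.5 -14.50 99650.0 16.0 1980 1 6 12
start 1 1980 1 7 0
 230 310 57.5 -12.50 100100.0 12.0 1980 1 7 0
"""


def parse_te(text, variable_names, unstructured=False, header_str="start"):
    """Parse track text into a list of point dictionaries (illustrative only)."""
    n_index = 1 if unstructured else 2  # unstructured grids have a single index
    points = []
    track_id = -1
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == header_str:
            track_id += 1  # each header line opens a new track
            continue
        lon, lat = (float(v) for v in fields[n_index : n_index + 2])
        values = [float(v) for v in fields[n_index + 2 : -4]]
        year, month, day, hour = (int(v) for v in fields[-4:])
        point = {
            "track_id": track_id,
            "lon": lon,
            "lat": lat,
            "time": (year, month, day, hour),
        }
        # Attach the user-supplied names to the extra variables, in order
        point.update(zip(variable_names, values))
        points.append(point)
    return points


points = parse_te(sample, variable_names=["slp", "wind10"])
print(len(points), points[0]["slp"])
```

For a file from unstructured data, each point line would carry one index instead of two, which is why the sketch shifts the column offsets when `unstructured=True`.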

In [None]:
!head {huracanpy.example_TE_file} # HuracanPy embeds an example GFDL format file. Here is the content of the file.

In [None]:
file = huracanpy.example_TE_file # Replace with your file name if necessary 
huracanpy.load(file, source = "te")

In [None]:
# Providing names
file = huracanpy.example_TE_file 
variable_names = ["slp", "wind10"]
huracanpy.load(file, source = "te", variable_names = variable_names)

## Loading IBTrACS
The [International Best Track Archive for Climate Stewardship (IBTrACS)](https://www.ncei.noaa.gov/products/international-best-track-archive) is a reference observational dataset.

HuracanPy embeds two subsets of IBTrACS for offline use, and can also retrieve the latest online version. 
They can be loaded with `huracanpy.load(source="ibtracs")` without specifying a filename.

NB: A warning will be raised when you load the data to remind you of the main caveats.

### Offline subsets
By default, HuracanPy uses the offline option. Two subsets of IBTrACS are embedded for offline use:
* "WMO": Data with the wmo_* variables, i.e. the data as reported by the WMO agency responsible for each basin. (Default)
* "JTWC": Data with the usa_* variables, i.e. the data as recorded by the USA/Joint Typhoon Warning Center.

NB: These offline files are updated manually by the developers. As such, they may not correspond to the latest versions. If you want the latest version and/or more columns, use the online option below.

In [None]:
huracanpy.load(source = "ibtracs", ibtracs_subset="wmo") # WMO subset

In [None]:
huracanpy.load(source = "ibtracs", ibtracs_subset="jtwc") # JTWC subset

### Online subsets
You can download the latest IBTrACS subsets from NOAA's storage by setting `ibtracs_online=True`. In this case the `ibtracs_subset` refers to the official IBTrACS subsets:
* **ACTIVE**: TCs currently active
* **ALL**: Entire IBTrACS database
* Specific basins: **EP**, **NA**, **NI**, **SA**, **SI**, **SP**, **WP**
* **last3years**: self-explanatory
* **since1980**: Entire IBTrACS database since 1980 (advent of satellite era,
  considered reliable from then on)

Example: `huracanpy.load(source="ibtracs", ibtracs_subset="ALL", ibtracs_online=True)`.

Note that this will fail if your machine is not connected to the internet. The HuracanPy developers decline all responsibility for any breach in security resulting from using this online option.

`huracanpy` won't load locally saved copies of IBTrACS. We recommend downloading once with `ibtracs_online=True`, subsetting, and then saving a copy as CSV or NetCDF with `ibtracs.save`. Also note that the NetCDF files provided by IBTrACS are not (currently) compatible with `huracanpy` because their format is different.