# Working with different datasets
`huracanpy` can load track data from various formats. For testing, a few example files
are bundled with `huracanpy`:

In [None]:
import huracanpy

print(huracanpy.example_csv_file.split("/")[-1])
print(huracanpy.example_TRACK_netcdf_file.split("/")[-1])
print(huracanpy.example_TRACK_file.split("/")[-1])

## CSV

CSV is a convenient format for storing track data. If your tracks are stored as CSV
(including output from TempestExtremes' StitchNodes), you can pass the
`source="csv"` argument; if your filename ends with *csv*, the format is detected
automatically.

`huracanpy.load` reads most of the CSV file as is and returns it as an
`xarray.Dataset`. A few modifications may be applied
to make sure the output has the variables `track_id`, `time`, `lon`, and `lat`.
For example, in the file used here, the time variable is constructed from
`year`, `month`, `day`, and `hour`.
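
To make the time construction concrete, here is a minimal standard-library sketch of the idea. It is not `huracanpy`'s implementation: the inline CSV text, its values, and the `parse_row` helper are all made up for illustration, and only the column names come from the description above.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV content; only the column names follow the text above.
raw = """track_id,year,month,day,hour,lon,lat
0,1980,1,6,6,-53.25,-20.5
0,1980,1,6,12,-52.0,-21.0
1,1980,2,14,0,120.5,12.25
"""


def parse_row(row):
    """Combine the separate date columns into a single ``time`` value."""
    time = datetime(
        int(row["year"]), int(row["month"]), int(row["day"]), int(row["hour"])
    )
    return {
        "track_id": int(row["track_id"]),
        "time": time,
        "lon": float(row["lon"]),
        "lat": float(row["lat"]),
    }


points = [parse_row(row) for row in csv.DictReader(io.StringIO(raw))]
print(points[0]["time"])  # 1980-01-06 06:00:00
```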



In [None]:
huracanpy.load(huracanpy.example_csv_file)

## NetCDF

Similar to CSV, NetCDF data can largely be loaded as is. NetCDF has the disadvantage of
not being human-readable like a CSV, but the advantage that it can better store metadata
about variables.

The only assumption about the NetCDF file is that it follows the CF convention for the
[contiguous ragged array representation of trajectories](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_contiguous_ragged_array_representation_of_trajectories).

This allows the load function to identify the TRACK_ID and extend it along the data
dimension. Like loading CSV data, some variables are renamed. In the example, the positions
are called `longitude` and `latitude` in the NetCDF file, but are renamed to `lon` and `lat`.
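
In the contiguous ragged array representation, all observations are stored back to back along one dimension, and a separate variable holds the number of observations in each trajectory. Extending the track ID along the data dimension therefore amounts to repeating each ID by its count. The sketch below illustrates this with plain Python lists; the `expand_track_ids` helper and the example values are hypothetical, and this is not the code `huracanpy` uses.

```python
def expand_track_ids(track_ids, counts):
    """Repeat each track ID once per observation belonging to that track."""
    if len(track_ids) != len(counts):
        raise ValueError("track_ids and counts must have the same length")
    expanded = []
    for track_id, count in zip(track_ids, counts):
        expanded.extend([track_id] * count)
    return expanded


# Hypothetical example: three tracks with 2, 3 and 1 observations.
track_ids = [10, 11, 12]
counts = [2, 3, 1]
per_point_ids = expand_track_ids(track_ids, counts)
print(per_point_ids)  # [10, 10, 11, 11, 11, 12]
```

For array data, `numpy.repeat(track_ids, counts)` gives the same result.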

In [None]:
huracanpy.load(huracanpy.example_TRACK_netcdf_file)

## TRACK

Note that TRACK files don't contain the variable names; instead, they are usually
described in the filename. Currently `huracanpy.load` doesn't try to infer the variable
names from the filename. Instead, any extra variables are named `feature_n`, where
n runs from 0 to the number of variables minus 1. TRACK also associates extra coordinates
with some of these features; these are loaded as `feature_n_longitude` and
`feature_n_latitude`.

In [None]:
huracanpy.load(huracanpy.example_TRACK_file, source="TRACK")

If you want to load the variables by name, pass a list of variable names to
`huracanpy.load`. The associated longitudes/latitudes are then labelled with the
respective feature names.
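
The naming scheme can be sketched as a small function. This is an illustration only: `track_variable_names` is a hypothetical helper, which features carry extra coordinates is supplied by hand here, the order of the returned names is arbitrary, and the `<name>_longitude`/`<name>_latitude` pattern for user-supplied names is an assumption based on the description above.

```python
def track_variable_names(n_features, with_coords, variable_names=None):
    """List the variable names for ``n_features`` extra TRACK variables.

    ``with_coords`` holds the indices of features that have their own
    longitude/latitude. Without ``variable_names``, the default
    ``feature_n`` names are used.
    """
    if variable_names is None:
        variable_names = [f"feature_{n}" for n in range(n_features)]
    elif len(variable_names) != n_features:
        raise ValueError("Expected one name per feature")

    names = []
    for n, name in enumerate(variable_names):
        names.append(name)
        if n in with_coords:
            # Assumed pattern: coordinates take the feature name as a prefix
            names.extend([f"{name}_longitude", f"{name}_latitude"])
    return names


print(track_variable_names(2, with_coords={0}))
# ['feature_0', 'feature_0_longitude', 'feature_0_latitude', 'feature_1']
print(track_variable_names(2, with_coords={0}, variable_names=["vmax_10m", "mslp"]))
# ['vmax_10m', 'vmax_10m_longitude', 'vmax_10m_latitude', 'mslp']
```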

In [None]:
variable_names = [
    *[f"vorticity_{n}hPa" for n in [850, 700, 600, 500, 400, 300, 200]],
    "mslp",
    "vmax_925hPa",
    "vmax_10m",
]
huracanpy.load(
    huracanpy.example_TRACK_file, source="TRACK", variable_names=variable_names
)

## IBTrACS
`huracanpy` includes a subset of the IBTrACS dataset for offline use.

In [None]:
# ibtracs_subset is "wmo" or "usa", which corresponds to the slp/variables used
huracanpy.load(source="ibtracs", ibtracs_subset="wmo", ibtracs_online=False)

You can download the full IBTrACS dataset by setting `ibtracs_online=True`. In this case,
`ibtracs_subset` refers to one of the official IBTrACS subsets.

`huracanpy` won't load locally saved copies of IBTrACS. We recommend downloading
once with `ibtracs_online=True`, subsetting, and then saving a copy as CSV or NetCDF with
`huracanpy.save`. Also note that the NetCDF files provided by IBTrACS are not (currently)
compatible with `huracanpy` because the format is different.
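
The download-once workflow can be sketched as a generic caching helper. Everything here is hypothetical: `load_with_cache` is not part of `huracanpy`, and the three callables are stand-ins so that the example runs without a network connection.

```python
import tempfile
from pathlib import Path


def load_with_cache(cache_file, download, save, load):
    """Download once, then reuse the locally saved copy on later calls."""
    cache_file = Path(cache_file)
    if not cache_file.exists():
        tracks = download()  # slow step: fetch the data online
        save(tracks, str(cache_file))  # keep a local copy for next time
    return load(str(cache_file))


# Stand-in callables that record how often the download happens
calls = []


def fake_download():
    calls.append("download")
    return "lon,lat\n0.0,10.0\n"


def fake_save(data, filename):
    Path(filename).write_text(data)


def fake_load(filename):
    return Path(filename).read_text()


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "ibtracs_subset.csv"
    first = load_with_cache(cache, fake_download, fake_save, fake_load)
    second = load_with_cache(cache, fake_download, fake_save, fake_load)

print(calls)  # ['download']
```

With `huracanpy`, the stand-ins would presumably be replaced by a function calling `huracanpy.load(source="ibtracs", ibtracs_online=True)` (plus any subsetting), `huracanpy.save`, and `huracanpy.load`.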

In [None]:
# Not running this code for the documentation since it downloads the file when run
# huracanpy.load(source="ibtracs", ibtracs_subset="ALL", ibtracs_online=True)

## Saving data

`huracanpy.save` supports saving data as CSV or NetCDF; the format is detected from the
file extension.

You can also use
[xarray.Dataset.to_netcdf](https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_netcdf.html)
to save NetCDF files, but they must then be loaded with
[xarray.open_dataset](https://docs.xarray.dev/en/latest/generated/xarray.open_dataset.html).
`huracanpy.save` does call `to_netcdf`, but it also takes some additional steps to make
sure the resulting NetCDF file follows the CF convention used for loading.
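
Choosing the output format from the file extension can be sketched as follows. The `FORMATS` mapping and `detect_format` helper are hypothetical; which extensions `huracanpy.save` actually recognises, and whether it ignores case, is not specified here.

```python
from pathlib import Path

# Hypothetical mapping from file extension to output format
FORMATS = {".csv": "csv", ".nc": "netcdf"}


def detect_format(filename):
    """Return the output format implied by the extension of ``filename``."""
    suffix = Path(filename).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"Cannot infer format from extension {suffix!r}"
        ) from None


print(detect_format("saved_data.csv"))  # csv
print(detect_format("saved_data.nc"))  # netcdf
```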

In [None]:
tracks = huracanpy.load(huracanpy.example_csv_file)
huracanpy.save(tracks, "saved_data.csv")
huracanpy.save(tracks, "saved_data.nc")

In [None]:
!head -5 saved_data.csv

In [None]:
!ncdump -h saved_data.nc