# Working with different datasets
`huracanpy` can load track data from various formats. For testing, a few example
files are bundled with `huracanpy`.

In [4]:
import huracanpy

print(huracanpy.example_csv_file.split("/")[-1])
print(huracanpy.example_TRACK_netcdf_file.split("/")[-1])
print(huracanpy.example_TRACK_file.split("/")[-1])

sample.csv
tr_trs_pos.2day_addT63vor_addmslp_add925wind_add10mwind.tcident.new.nc
tr_trs_pos.2day_addT63vor_addmslp_add925wind_add10mwind.tcident.new


## CSV

CSV is a convenient way of storing track data. If your tracks are stored in CSV
(including output from TempestExtremes' StitchNodes), you can specify the
`tracker="csv"` argument or, if your filename ends with *csv*, the format will be
detected automatically.

`huracanpy.load` reads most of the CSV file as is and returns an
`xarray.Dataset`. A few extra modifications may be applied
to make sure the output has the variables `track_id`, `time`, `lon`, and `lat`.
For example, in the file used here, the time variable is constructed from
`year`, `month`, `day`, and `hour`.
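
As a rough illustration of what "constructing the time variable" means, the sketch below builds timestamps from separate `year`, `month`, `day`, and `hour` columns using only the standard library. It is not `huracanpy`'s implementation; the rows are the first four records of the example file (as listed in the `saved_data.csv` output further down), trimmed to the columns needed here.

```python
import csv
import io
from datetime import datetime

# First four records of the example file, trimmed to the relevant columns
text = """track_id,year,month,day,hour,lon,lat
0,1980,1,6,6,120.5,-14.25
0,1980,1,6,12,119.0,-14.75
0,1980,1,6,18,119.0,-15.0
0,1980,1,7,0,119.25,-15.0
"""

rows = list(csv.DictReader(io.StringIO(text)))

# Combine the four integer columns into one timestamp per record
times = [
    datetime(int(r["year"]), int(r["month"]), int(r["day"]), int(r["hour"]))
    for r in rows
]

for t in times:
    print(t)
```

The resulting timestamps match the `time` column that appears when the loaded data is saved back to CSV.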



In [8]:
huracanpy.load(huracanpy.example_csv_file)

## NetCDF

Similar to CSV, NetCDF data can largely be loaded as is. NetCDF has the disadvantage of
not being readable like a CSV, but the advantage that it can better store metadata about
variables.

The only assumption about the NetCDF file is that it follows the CF convention for
the contiguous ragged array representation of trajectories:

http://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_contiguous_ragged_array_representation_of_trajectories

This allows the load function to identify the TRACK_ID and extend it along the data
dimension. Unlike loading CSV data, variables are not currently renamed. In the example
here, the track ID is upper case (`TRACK_ID`) and the positions are `longitude` and
`latitude`, because that is how they are named in the file.
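
To make "extending the track ID along the data dimension" concrete, here is a minimal sketch of the idea behind the contiguous ragged array layout: one ID per trajectory plus a count of records per trajectory is expanded into one ID per record. The values are hypothetical and this is not the code `huracanpy` runs internally.

```python
# Hypothetical values: three tracks with 2, 3, and 1 records respectively
track_ids = [10, 11, 12]  # one entry per trajectory
row_sizes = [2, 3, 1]     # number of records belonging to each trajectory

# Repeat each track ID once for every record in that trajectory
track_id_per_record = [
    tid for tid, size in zip(track_ids, row_sizes) for _ in range(size)
]

print(track_id_per_record)  # [10, 10, 11, 11, 11, 12]
```

With NumPy arrays the same expansion can be written as `np.repeat(track_ids, row_sizes)`.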

In [7]:
huracanpy.load(huracanpy.example_TRACK_netcdf_file)

## TRACK

Note that TRACK files don't contain the variable names; instead, they are usually
described in the filename. Currently `huracanpy.load` doesn't try to infer the variable
names from the filename. Instead, any extra variables will be named `feature_n`, where
n runs from 0 to the number of variables minus 1. TRACK also associates extra coordinates
with some of these features; these will be loaded as `feature_n_longitude` and
`feature_n_latitude`.

In [9]:
huracanpy.load(huracanpy.example_TRACK_file, tracker="TRACK")

If you want to load the variables by name, pass a list of variable names to
`huracanpy.load`. Any associated longitudes/latitudes are then attached to the
respective variable names.

In [12]:
variable_names = [
    *[f"vorticity_{n}hPa" for n in [850, 700, 600, 500, 400, 300, 200]],
    "mslp",
    "vmax_925hPa",
    "vmax_10m",
]
huracanpy.load(
    huracanpy.example_TRACK_file, tracker="TRACK", variable_names=variable_names
)
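
For reference, the list built above expands to ten names. The sketch below prints them next to the default `feature_n` names, assuming the names are applied in file order (an assumption made for illustration, not something stated by the library).

```python
variable_names = [
    *[f"vorticity_{n}hPa" for n in [850, 700, 600, 500, 400, 300, 200]],
    "mslp",
    "vmax_925hPa",
    "vmax_10m",
]

# Default names used when no variable_names are passed: feature_0 ... feature_9
default_names = [f"feature_{n}" for n in range(len(variable_names))]

# Assumed pairing: the n-th default name is replaced by the n-th supplied name
for default, name in zip(default_names, variable_names):
    print(f"{default} -> {name}")
```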

## IBTrACS
`huracanpy` includes a subset of the IBTrACS dataset for offline use.

In [8]:
# ibtracs_subset is "wmo" or "usa", which determines the wind/slp variables used
huracanpy.load(tracker="ibtracs", ibtracs_subset="wmo", ibtracs_online=False)

It was last updated on the 24th May 2024, based on the IBTrACS file at that date.
It contains only data from 1980 up to the last year with no provisional tracks. All spur tracks were removed. Only 6-hourly time steps were kept.
Be aware that wind and pressure data are provided as they are in IBTrACS, which means in particular that wind speeds are in knots and averaged over different time periods.
For more information, see the IBTrACS column documentation at https://www.ncei.noaa.gov/sites/default/files/2021-07/IBTrACS_v04_column_documentation.pdf


You can download the full IBTrACS dataset by setting `ibtracs_online=True`. In this case
the subset refers to the official IBTrACS subsets.

`huracanpy` won't load locally saved copies of IBTrACS. We recommend downloading
once with `ibtracs_online=True`, subsetting, and then saving a copy as CSV or NetCDF
with `huracanpy.save`. Also note that the NetCDF files provided by IBTrACS are not
(currently) compatible with `huracanpy` because the format is different.

In [31]:
# Not run for the documentation, since it downloads the file when executed
# huracanpy.load(tracker="ibtracs", ibtracs_subset="ALL", ibtracs_online=True)



## Saving data

In [13]:
tracks = huracanpy.load(huracanpy.example_csv_file)
huracanpy.save(tracks, "saved_data.csv")
huracanpy.save(tracks, "saved_data.nc")

In [18]:
!head -5 saved_data.csv

track_id,year,month,day,hour,i,j,lon,lat,slp,zs,wind10,time
0,1980,1,6,6,482,417,120.5,-14.25,99876.38,-10.7118,14.64815,1980-01-06 06:00:00
0,1980,1,6,12,476,419,119.0,-14.75,99811.0,-16.10522,13.98848,1980-01-06 12:00:00
0,1980,1,6,18,476,420,119.0,-15.0,99536.94,-40.20874,13.69575,1980-01-06 18:00:00
0,1980,1,7,0,477,420,119.25,-15.0,99414.56,-50.43206,17.97812,1980-01-07 00:00:00


In [20]:
!ncdump -h saved_data.nc

netcdf saved_data {
dimensions:
	record = 99 ;
	trajectory = 3 ;
variables:
	int64 record(record) ;
	int64 track_id(trajectory) ;
	int64 year(record) ;
	int64 month(record) ;
	int64 day(record) ;
	int64 hour(record) ;
	int64 i(record) ;
	int64 j(record) ;
	double lon(record) ;
		lon:_FillValue = NaN ;
	double lat(record) ;
		lat:_FillValue = NaN ;
	double slp(record) ;
		slp:_FillValue = NaN ;
	double zs(record) ;
		zs:_FillValue = NaN ;
	double wind10(record) ;
		wind10:_FillValue = NaN ;
	int64 time(record) ;
		time:units = "hours since 1980-01-06 06:00:00" ;
		time:calendar = "proleptic_gregorian" ;
	int64 rowSize(trajectory) ;
		rowSize:sample_dimension = "record" ;
}
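
The header shows that `time` is stored as integers with the unit `hours since 1980-01-06 06:00:00`. As a sketch of how such values map back to timestamps, the example below decodes the offsets that would correspond to the first four CSV rows shown above (0, 6, 12, and 18 hours, assuming the records are stored in the same order). Libraries such as `xarray` normally perform this decoding automatically when the file is opened.

```python
from datetime import datetime, timedelta

# Reference time taken from the time:units attribute in the header
reference = datetime(1980, 1, 6, 6)

# Offsets in hours for the first four records (assumed from the CSV listing)
stored = [0, 6, 12, 18]

# Decode each integer offset into an absolute timestamp
decoded = [reference + timedelta(hours=h) for h in stored]

for t in decoded:
    print(t)
```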
