# Demo Workflow

In [None]:
import os
import warnings

import intake
import shapely.wkt
import pandas as pd
import xarray as xr

import model_catalogs as mc

In [None]:
# suppress warnings when building "complete" model files in dependencies
warnings.filterwarnings('ignore')

## Source catalog

The source catalog is useful for getting baseline metadata about a model. It is entirely hard-wired and does not access the actual model files.

In [None]:
source_cat = mc.setup_source_catalog()

See what model options are available with:

In [None]:
list(source_cat)

### Complete source catalog files

If the "complete" versions of the model files, which contain the boundary information for each model, are not yet available, a message will print and you can create them with:

`source_cat = mc.complete_source_catalog()`

If you already know you want to rerun the "complete" model files, you can force all of them to rerun by calling `source_cat = mc.complete_source_catalog()` in the first place. Alternatively, delete the "complete" catalog directory.

In [None]:
%%time
source_cat = mc.complete_source_catalog()

### Examine metadata of models in source catalog

Now the source catalog has all hard-wired information about the models plus the domain boundary information. This is available in `source_cat`. 

#### Domain boundaries

In [None]:
P = shapely.wkt.loads(source_cat['CBOFS'].metadata['geospatial_bounds'])
P
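Once the boundary is a `shapely` geometry, it can be used for simple spatial checks, such as whether a location of interest falls inside a model domain. The following is a minimal sketch with a made-up rectangular polygon, not a real model boundary; in practice the WKT string would come from the `geospatial_bounds` metadata shown above.

```python
import shapely.wkt
from shapely.geometry import Point

# Hypothetical rectangular boundary in (longitude, latitude); not a real model domain.
wkt = "POLYGON ((-77 36, -75 36, -75 40, -77 40, -77 36))"
boundary = shapely.wkt.loads(wkt)

# Bounding box as (min_lon, min_lat, max_lon, max_lat)
min_lon, min_lat, max_lon, max_lat = boundary.bounds

# Check whether points of interest fall inside the domain
inside = boundary.contains(Point(-76.0, 38.0))
outside = boundary.contains(Point(-70.0, 38.0))
```

Here `inside` is true and `outside` is false, so the same check could be looped over several model boundaries to shortlist candidate models for a location.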

#### Variables

A mapping of the variables relevant to NOAA applications has been written into each source catalog. The mapping is from a standard CF convention variable name to the model dataset variable name. The possible variables are:

        eastward_sea_water_velocity
        eastward_wind
        northward_sea_water_velocity
        northward_wind
        sea_floor_depth
        sea_ice_area_fraction
        sea_ice_thickness
        sea_surface_height_above_mean_sea_level
        sea_water_temperature
        sea_water_practical_salinity
        time
        upward_sea_water_velocity

Examine the variable mapping for a given model:

In [None]:
source_cat['CBOFS']['forecast'].metadata['standard_names']
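To illustrate how such a mapping can be used, the sketch below looks up a model variable name from a CF standard name. The dictionary contents and the helper `model_varname` are hypothetical and only stand in for the `standard_names` metadata; the real variable names differ by model.

```python
# Hypothetical mapping from CF standard names to one model's variable names.
standard_names = {
    "sea_water_temperature": "temp",
    "sea_water_practical_salinity": "salt",
    "eastward_sea_water_velocity": "u",
    "northward_sea_water_velocity": "v",
    "time": "ocean_time",
}


def model_varname(mapping, standard_name):
    """Return the model variable name for a CF standard name, or None if unmapped."""
    return mapping.get(standard_name)


salt_name = model_varname(standard_names, "sea_water_practical_salinity")
ice_name = model_varname(standard_names, "sea_ice_thickness")
```

Returning `None` for an unmapped name (as for `sea_ice_thickness` here) is one way to handle models that do not provide every variable in the list.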

## Find availability of model output

You can query a specific model for the availability of its forecast and hindcast timings, as below. This step is not required, but it helps a user choose which model to use and see what is possible. The date range information is saved in an "updated" version of the model catalog file.

This takes 10 seconds to 1 minute if the model availability has not been checked recently enough, as determined by a "stale" parameter. Currently a forecast is stale after 4 hours and a hindcast is stale after 1 day.

In [None]:
%%time
cat = mc.find_availability(model='CIOFS')

In [None]:
intake.open_catalog(cat.path)

The second call is fast because the "updated" version of the model catalog is still fresh rather than stale.

In [None]:
%%time
cat = mc.find_availability(model='CIOFS')
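The staleness rule described above can be sketched as a simple age comparison. This is only an illustration of the idea, not the `model_catalogs` implementation: the thresholds come from the text, while `STALE_AFTER` and `is_stale` are hypothetical names.

```python
from datetime import datetime, timedelta

# Thresholds taken from the text; the dictionary and function are illustrative only.
STALE_AFTER = {
    "forecast": timedelta(hours=4),
    "hindcast": timedelta(days=1),
}


def is_stale(timing, last_checked, now):
    """Return True if the last availability check is older than the threshold."""
    return now - last_checked > STALE_AFTER[timing]


now = datetime(2022, 6, 1, 12, 0)
last_checked = datetime(2022, 6, 1, 6, 0)  # six hours before `now`

forecast_stale = is_stale("forecast", last_checked, now)  # 6 hours exceeds 4 hours
hindcast_stale = is_stale("hindcast", last_checked, now)  # 6 hours is under 1 day
```

With a check made six hours ago, the forecast availability would be refreshed while the hindcast availability would be reused.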

Timing metadata is now available:

In [None]:
print('forecast: ', cat['forecast'].metadata['start_datetime'], ' to ', cat['forecast'].metadata['end_datetime'])
print('hindcast: ', cat['hindcast'].metadata['start_datetime'], ' to ', cat['hindcast'].metadata['end_datetime'])
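These date ranges make it possible to work out which timing covers a requested date range. The sketch below shows one way that check might look. The availability dates are invented, the helper `timings_covering` is hypothetical, and it assumes the metadata values are ISO 8601 strings, which may not match the actual catalog format.

```python
from datetime import datetime

# Hypothetical availability ranges; real values come from the catalog metadata.
availability = {
    "forecast": ("2022-05-30T00:00:00", "2022-06-03T00:00:00"),
    "hindcast": ("2019-01-01T00:00:00", "2022-05-29T00:00:00"),
}


def timings_covering(availability, start_date, end_date):
    """Return the timings whose available range contains the requested range."""
    start = datetime.fromisoformat(start_date)
    end = datetime.fromisoformat(end_date)
    matches = []
    for timing, (avail_start, avail_end) in availability.items():
        covers_start = datetime.fromisoformat(avail_start) <= start
        covers_end = end <= datetime.fromisoformat(avail_end)
        if covers_start and covers_end:
            matches.append(timing)
    return matches


old_request = timings_covering(availability, "2020-01-01", "2020-01-01")
recent_request = timings_covering(availability, "2022-06-01", "2022-06-01")
missing_request = timings_covering(availability, "2010-01-01", "2010-01-01")
```

In this made-up case the 2020 request is covered only by the hindcast, the 2022 request only by the forecast, and the 2010 request by neither.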

## Request model output for desired date range

Use this step to read in the model output. Since `find_availability` was previously run, the model knows when it is available and can decide which source to use for the user-defined date range, as shown here. This uses the catalog file found from running `find_availability()`.

In [None]:
%%time
start_date = '2020-01-01'
cat = mc.add_url_path(cat, start_date=start_date, end_date=start_date)
cat

An alternative approach is to skip the `find_availability` step, use the catalog directly from the source catalog, and specify the source timing to use along with the date range. This saves a little time for a power user who already has a good guess as to what will work, since `find_availability()` does not have to run first.

In [None]:
%%time

source_cat = mc.setup_source_catalog()

today = pd.Timestamp.today()

model = "LMHOFS"
cat2 = mc.add_url_path(source_cat[model], timing="nowcast", start_date=today, end_date=today)
cat2

## Read in model output

In [None]:
%%time
ds = cat['CIOFS'].to_dask()
ds

## Other topics

### NOAA OFS models: how to use filetypes besides default 3D "fields"

All NOAA OFS model configurations are available with 3D fields filetypes. However, for some models there are other filetypes:
* `regular_grid`: model output interpolated to rectilinear grid
* `2ds`: only surface model output, variable names changed

You can see what model configurations are available, specifically including filetypes, by looking at the source catalog:

In [None]:
list(source_cat)

In [None]:
start_date = pd.Timestamp.today()
cat3 = mc.add_url_path(source_cat['TBOFS_REGULARGRID'], timing='forecast', start_date=start_date, end_date=start_date)
ds3 = cat3['TBOFS_REGULARGRID'].to_dask()
ds3

### Access variables and axes

Metadata has been added to the model Datasets to make certain variables and axes easier to access. Variable access is set up by adding standard variable names to the datasets when they are opened, based on the variables available. Axis access is set up by first adding some attributes and then using `cf-xarray`. See the following examples.

Filter an xarray Dataset by the attribute of `standard_name` (returns an xarray Dataset):

In [None]:
ds.filter_by_attrs(standard_name='sea_water_practical_salinity')

Alternatively, you could back out the variable name and use it directly (returns an xarray DataArray):

In [None]:
varname = cat['CIOFS'].metadata['standard_names']['sea_water_practical_salinity']
ds[varname]
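The two access patterns above can be tried without downloading any model output. The sketch below builds a small synthetic Dataset; the variable names `salt` and `temp`, the dimensions, and the values are all made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic dataset; the variable names "salt" and "temp" are hypothetical.
ds_demo = xr.Dataset(
    {
        "salt": (
            ("time", "x"),
            np.full((2, 3), 35.0),
            {"standard_name": "sea_water_practical_salinity"},
        ),
        "temp": (
            ("time", "x"),
            np.full((2, 3), 10.0),
            {"standard_name": "sea_water_temperature"},
        ),
    },
    coords={"time": [0, 1]},
)

# Dataset containing only the variables whose standard_name attribute matches
salinity_ds = ds_demo.filter_by_attrs(standard_name="sea_water_practical_salinity")

# Recover the variable name, then index the Dataset to get a DataArray
demo_varname = list(salinity_ds.data_vars)[0]
salinity_da = ds_demo[demo_varname]
```

`filter_by_attrs` keeps the result as a Dataset, which is convenient when several variables might share an attribute, while indexing by name yields a single DataArray.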

Use `cf-xarray` to refer to the time axis without knowing the time variable name:

In [None]:
ds.cf['T']