# Building ERDDAP Datasets

This notebook documents the process of creating XML fragments
for nowcast system run results files
for inclusion in `/results/erddap-datasets/datasets.xml`
which is symlinked to `/opt/tomcat/content/erddap/datasets.xml`
on the `skookum` ERDDAP server instance.

The contents are a combination of:

* instructions for using the
`GenerateDatasetsXml.sh` and `DasDds.sh` tools found in the
`/opt/tomcat/webapps/erddap/WEB-INF/` directory
* instructions for forcing the server to update the datasets collection
via the `/results/erddap/flags/` directory
* code and metadata to transform the output of `GenerateDatasetsXml.sh`
into XML fragments that are ready for inclusion in `/results/erddap-datasets/datasets.xml`

In [1]:
from collections import OrderedDict

from lxml import etree

**NOTE**

The next cell mounts the `/results` filesystem on `skookum` locally.
It is intended for use if when this notebook is run on a laptop 
or other non-Waterhole machine that has `sshfs` installed 
and a mount point for `/results` available in its root filesystem.

Don't execute the cell if that doesn't describe your situation.

In [21]:
!sshfs skookum:/results /results

# Metadata for All Datasets

The `metadata` dictionary below contains information for dataset
attribute tags whose values need to be changed,
or that need to be added for all datasets.

The keys are the dataset attribute names.

The values are dicts containing a required `text` item
and perhaps an optional `after` item.

The value associated with the `text` key is the text content
for the attribute tag.

When present,
the value associated with the `after` key is the name
of the dataset attribute after which a new attribute tag
containing the `text` value is to be inserted.

In [2]:
metadata = OrderedDict([
    ('infoUrl', {
        'text': 
            'https://salishsea-meopar-docs.readthedocs.io/en/latest/results_server/index.html#salish-sea-model-results',
    }),
    ('institution', {
        'text': 'UBC EOAS', 
        'after': 'infoUrl',
    }),
    ('institution_fullname', {
        'text': 'Earth, Ocean & Atmospheric Sciences, University of British Columbia',
        'after': 'institution',
    }),
    ('license', {
        'text': '''The Salish Sea MEOPAR NEMO model results are copyright
by the Salish Sea MEOPAR Project Contributors and The University of British Columbia.

They are licensed under the Apache License, Version 2.0. https://www.apache.org/licenses/LICENSE-2.0''',
    }),
    ('project', {
        'text':'Salish Sea MEOPAR NEMO Model',
        'after': 'title',
    }),
    ('creator_name', {
        'text': 'Salish Sea MEOPAR Project Contributors',
        'after': 'project',
    }),
    ('creator_email', {
        'text': 'sallen@eoas.ubc.ca',
        'after': 'creator_name',
    }),
    ('creator_url', {
        'text': 'https://salishsea-meopar-docs.readthedocs.io/',
        'after': 'creator_email',
    }),
    ('acknowledgement', {
        'text': 'MEOPAR, ONC, Compute Canada',
        'after': 'creator_url',
    }),
    ('drawLandMask', {
        'text': 'over',
        'after': 'acknowledgement',
    }),
])

# Dataset Attributes

The `datasets` dictionary below provides the content
for the dataset `title` and `summary` attributes.

The `title` attribute content appears in the the datasets list table
(among other places).
It should be `<`80 characters long,
and note that only the 1st 40 characters will appear in the table.

The `summary` attribute content appears
(among other places)
when a user hovers the cursor over the `?` icon beside the `title`
content in the datasets list table.
The text that is inserted into the `summary` attribute tag
by code later in this notebook is the
`title` content followed by the `summary` content,
separated by a blank line.

The keys of the `datasets` dict are the `datasetID` strings that
are used in many places by the ERDDAP server.
They are structured as follows:

* `ubc` to indicate that the dataset was produced at UBC
* `SS` to indicate that the dataset is a product of the Salish Sea NEMO model
* a few letters to indicate the model runs that produce the dataset:

  * `n` to indicate that the dataset is from a nowcast run,
  * `f` for forecast,
  * `f2` for forecast2 (aka preliminary forecast),
  * `g` for nowcast-green,
  * `a` for atmospheric forcing,
* a description of the dataset variables; e.g. `PointAtkinsonSSH` or `3DuVelocity`
* the time interval of values in the dataset; e.g. `15m`, `1h`, `1d`
* the dataset version; e.g. `V16-10`, or `V1`

Versioning was changed to a [CalVer](http://calver.org/) type scheme in Oct-2016.
Thereafter versions are of the form `Vyy-mm` and indicate the year and month when the dataset entered production.

So:

* `ubcSSnPointAtkinsonSSH15mV1` is the version 1 dataset of 15 minute averaged sea surface height values at Point Atkinson from `PointAtkinson.nc` output files

* `ubcSSn3DwVelocity1hV2` is the version 2 dataset of 1 hr averaged vertical (w) velocity values over the entire domain from `SalishSea_1h_*_grid_W.nc` output files

* `ubcSSnSurfaceTracers1dV1` is the version 1 dataset of daily averaged surface tracer values over the entire domain from `SalishSea_1d_*_grid_T.nc` output files

* `ubcSSnBathymetry2V16-07`  is the version 16-07 dataset of longitude, latitude, and bathymetry of the Salish Sea NEMO model grid that came into use in Jul-2016.
  The corresponding NEMO-generated mesh mask variables are in the `ubcSSn2DMeshMaskDbo2V16-07` (y, x variables),
  and the `ubcSSn3DMeshMaskDbo2V16-07` (z, y, x variables) datasets.

The dataset version part of the `datasetID` is used to indicate changes in the variables
contained in the dataset.
For example,
the transition from the `ubcSSn3DwVelocity1hV1` to the `ubcSSn3DwVelocity1hV2` dataset
occurred on 24-Jan-2016 when we started to output vertical eddy viscosity and diffusivity
values at the `w` grid points.

All dataset ids end with their version identifier and their `summary` ends with a notation about the variables
that they contain; e.g.
```
v1: wVelocity variable
```
When the a dataset version is incremented a line describing the change is added
to the end of its `summary`; e.g.
```
v1: wVelocity variable
v2: Added eddy viscosity & diffusivity variables ve_eddy_visc & ve_eddy_diff
```

## Bathymetries

In [3]:
datasets = {
    'ubcSSnBathymetry2V1' :{
        'type': 'geolocation bathymetry',
        'title': 'Salish Sea NEMO Model Grid, Geo-location and Bathymetry, v1',
        'summary':'''Longitude, latitude, and bathymetry of the Salish Sea NEMO model grid.
The bathymetry values are those calculated by NEMO from the input bathymetry file.
NEMO modifies the input bathymetry to remove isolated holes, and too-small partial steps.
The model grid includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.

v1: longitude, latitude and bathymetry variables
''',
        'fileNameRegex': '.*SalishSea2_NEMO_bathy\.nc$'
        
    },
    
    'ubcSSnBathymetry2V16-07' :{
        'type': 'geolocation bathymetry',
        'title': 'Salish Sea NEMO Model Grid, Geo-location and Bathymetry, v16-07',
        'summary':'''Longitude, latitude, and bathymetry of the Salish Sea NEMO model grid.
The bathymetry values are those calculated by NEMO from the input bathymetry file.
NEMO modifies the input bathymetry to remove isolated holes, and too-small partial steps.
The model grid includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.

v1: longitude, latitude and bathymetry variables
v16-07: same variables,
        bathymetry uniformly deepened by 1 grid level,
        smoothed at Juan de Fuca & Johnstone Strait open boundaries,
        Fraser River lengthened,
        bathymetry deepened near mouth of Fraser River
''',
        'fileNameRegex': '.*downbyone2_NEMO_bathy\.nc$'
        
    },
    
    'ubcSSnBathymetryV17-02' :{
        'type': 'geolocation bathymetry',
        'title': 'Salish Sea NEMO Model Grid, Geo-location and Bathymetry, v17-02',
        'keywords': '''bathymetry, depth, nav_lat, nav_lon, ocean, oceans,
Oceans > Bathymetry/Seafloor Topography > Bathymetry,
Salish Sea, sea floor, sea_floor_depth, seafloor, topography''',
        'summary':'''Longitude, latitude, and bathymetry of the Salish Sea NEMO model grid.
The bathymetry values are those calculated by NEMO from the input bathymetry file.
NEMO modifies the input bathymetry to remove isolated holes, and too-small partial steps;
See the ubcSSn2DMeshMaskV17-02 dataset for the complete details of the calculation grid.
The model grid includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.

v1: longitude, latitude and bathymetry variables
v16-07: same variables,
        bathymetry uniformly deepened by 1 grid level,
        smoothed at Juan de Fuca & Johnstone Strait open boundaries,
        Fraser River lengthened,
        bathymetry deepened near mouth of Fraser River
v17-02: same variables,
        Bathymetry composed from 3 datasets:
        * USGS Digital elevation model (DEM) of Cascadia, latitude 39N-53N, longitude 116W-133W, Open-File Report 99-369, https://pubs.er.usgs.gov/publication/ofr99369
        * NOAA British Columbia, 3 arc-second MSL DEM, https://www.ngdc.noaa.gov/dem/squareCellGrid/download/4956
        * CHS Multibeam data and all point cloud data for the Salish Sea. 
        Smoothed at Juan de Fuca & Johnstone Strait open boundaries.
        Proxy channel for Fraser River upstream of confluence with the Pitt River.
        Adjustments by Michael Dunphy to make increase resolution of Fraser River channels downstream of confluence with the Pitt River.
''',
        'fileNameRegex': '.*bathymetry_201702\.nc$'
        
    },
}

## Mesh Masks

In [4]:
datasets.update({
    'ubcSSn2DMeshMask2V1': {
        'type': 'geolocation bathymetry',
        'title': 'Salish Sea NEMO Model Grid, 2D Mesh Mask, v1',
        'summary':'''NEMO grid variable value for the u-v plane of the 
Salish Sea NEMO model Arakawa-C grid.
The values are those calculated by NEMO from the input coordinates and bathymetry files.
The variable names are those used by NEMO-3.4,
see the NEMO-3.4 book (http://www.nemo-ocean.eu/Media/Files/NEMO_book_V3_4.pdf) for details,
or the long_name attributes of the variables for succinct descriptions of the variables.
The model grid includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.

v1: e1t, e2t, e1u, e2u, e1v, e2v, e1f, e2f, glamt, gphit, glamu, gphiu, glamv, gphiv, 
    tmaskutil, umaskutil, vmaskutil, fmaskutil, ff, mbathy variables
''',
        'fileNameRegex': '.*mesh_mask_SalishSea2\.nc$',
    },

    'ubcSSn2DMeshMask2V16-07': {
        'type': 'geolocation bathymetry',
        'title': 'Salish Sea NEMO Model Grid, 2D Mesh Mask, v16-07',
        'summary':'''NEMO grid variable value for the u-v plane of the 
Salish Sea NEMO model Arakawa-C grid.
The values are those calculated by NEMO from the input coordinates and bathymetry files.
The variable names are those used by NEMO-3.6,
see the NEMO-3.6 book (http://www.nemo-ocean.eu/Media/Files/NEMO_book_V3_6.pdf) for details,
or the long_name attributes of the variables for succinct descriptions of the variables.
The model grid includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.

v1: e1t, e2t, e1u, e2u, e1v, e2v, e1f, e2f, glamt, gphit, glamu, gphiu, glamv, gphiv, 
    tmaskutil, umaskutil, vmaskutil, fmaskutil, ff, mbathy variables
v16-07: e1t, e2t, e1u, e2u, e1v, e2v, e1f, e2f, glamt, gphit, glamu, gphiu, glamv, gphiv, 
        glamf, gphif, tmaskutil, umaskutil, vmaskutil, fmaskutil, ff, mbathy variables
''',
        'fileNameRegex': '.*mesh_mask_downbyone2\.nc$',
    },

    'ubcSSn3DMeshMask2V1': {
        'type': 'geolocation bathymetry',
        'title': 'Salish Sea NEMO Model Grid, 3D Mesh Mask, v1',
        'summary':'''NEMO grid variable value for the Salish Sea NEMO model Arakawa-C grid.
The values are those calculated by NEMO from the input coordinates and bathymetry files.
The variable names are those used by NEMO-3.4,
see the NEMO-3.4 book (http://www.nemo-ocean.eu/Media/Files/NEMO_book_V3_4.pdf) for details,
or the long_name attributes of the variables for succinct descriptions of the variables.
The model grid includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.

v1: e3t, e3u, e3v, e3w, gdept, gdepu, gdepv, gdepw, tmask, umask, vmask, fmask variables
''',
        'fileNameRegex': '.*mesh_mask_SalishSea2\.nc$'
    },
    'ubcSSn3DMeshMask2V16-07': {
        'type': 'geolocation bathymetry',
        'title': 'Salish Sea NEMO Model Grid, 3D Mesh Mask, v16-07',
        'summary':'''NEMO grid variable value for the Salish Sea NEMO model Arakawa-C grid.
The values are those calculated by NEMO from the input coordinates and bathymetry files.
The variable names are those used by NEMO-3.6,
see the NEMO-3.6 book (http://www.nemo-ocean.eu/Media/Files/NEMO_book_V3_6.pdf) for details,
or the long_name attributes of the variables for succinct descriptions of the variables.
The model grid includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.

v1: e3t_0, e3u_0, e3v_0, e3w_0, gdept_0, gdepu, gdepv, gdepw_0, tmask, umask, vmask, fmask variables
v16-07: e3t, e3u, e3v, e3w, gdept, gdepu, gdepv, gdepw, tmask, umask, vmask, fmask variables
''',
        'fileNameRegex': '.*mesh_mask_downbyone2\.nc$'
    },
})

datasets['ubcSSn2DMeshMaskV17-02'] = datasets['ubcSSn2DMeshMask2V16-07']
datasets['ubcSSn2DMeshMaskV17-02'].update({
    'title': datasets['ubcSSn2DMeshMask2V16-07']['title'].replace(', v16-07', ', v17-02'),
    'keywords': '''
''',
    'summary': datasets['ubcSSn2DMeshMask2V16-07']['summary'] + '''
v17-02: same variables as v16-07''',
    'fileNameRegex': '.*mesh_mask_201702\.nc$',
})

datasets['ubcSSn3DMeshMaskV17-02'] = datasets['ubcSSn3DMeshMask2V16-07']
datasets['ubcSSn3DMeshMaskV17-02'].update({
    'title': datasets['ubcSSn3DMeshMask2V16-07']['title'].replace(', v16-07', ', v17-02'),
    'keywords': '''
''',
    'summary': datasets['ubcSSn3DMeshMask2V16-07']['summary'] + '''
v17-02: tmask, umask, vmask, fmask, e3t_0, e3u_0, e3v_0, e3w_0, gdept_0, gdepu, gdepv, gdepw_0 variables''',
    'fileNameRegex': '.*mesh_mask_201702\.nc$',
})

## Surface Atmospheric Forcing Fields

In [5]:
datasets.update({
    'ubcSSaSurfaceAtmosphereFieldsV1': {
        'type': 'surface fields',
        'title': 'HRDPS, Salish Sea, Atmospheric Forcing Fields, Hourly, v1',
        'summary': '''2d hourly atmospheric field values from the
Environment Canada HRDPS atmospheric forcing model that are used to force the Salish Sea NEMO model.
The model grid includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
Geo-location data for the atmospheric forcing grid are available in the ubcSSaAtmosphereGridV1 dataset.
Atmospheric field values are interpolated on to the Salish Sea NEMO model grid on-the-fly by NEMO.

v1: atmospheric pressure, precipitation rate, 2m specific humidity, 2m air temperature,
short-wave radiation flux, long-wave radiation flux, 10m u wind component, 10m v wind component variables
''',
        'fileNameRegex': '.*ops_y\d{4}m\d{2}d\d{2}\.nc$',
    },
})

## Nowcast
### Velocity Fields

In [6]:
datasets.update({
    'ubcSSn3DuVelocity1hV1': {
        'type': '3d fields',
        'title': 'Nowcast, Salish Sea, 3d u Velocity Field, Hourly, v1',
        'summary': '''3d zonal (u) component velocity field values averaged over 1 hour intervals
from Salish Sea NEMO model nowcast runs. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: uVelocity variable
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_grid_U\.nc$',
    },
    
    'ubcSSn3DvVelocity1hV1': {
        'type': '3d fields',
        'title': 'Nowcast, Salish Sea, 3d v Velocity Field, Hourly, v1',
        'summary': '''3d meridional (v) component velocity field values averaged over 1 hour intervals
from Salish Sea NEMO model nowcast runs. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: vVelocity variable
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_grid_V\.nc$',
    },
    
    'ubcSSn3DwVelocity1hV1': {
        'type': '3d fields',
        'title': 'Nowcast, Salish Sea, 3d w Velocity Field, Hourly, v1',
        'summary': '''3d vertical (w) component velocity field values averaged over 1 hour intervals
from Salish Sea NEMO model nowcast runs. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: wVelocity variable
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_grid_W\.nc$',
    },
})

datasets['ubcSSn3DwVelocity1hV2'] = datasets['ubcSSn3DwVelocity1hV1']
datasets['ubcSSn3DwVelocity1hV2'].update({
    'title': datasets['ubcSSn3DwVelocity1hV1']['title'].replace(', v1', ', v2'),
    'summary': datasets['ubcSSn3DwVelocity1hV1']['summary'] + '''
v2: Added eddy viscosity & diffusivity variables ve_eddy_visc & ve_eddy_diff''',
})

datasets['ubcSSn3DuVelocity1hV16-10'] = datasets['ubcSSn3DuVelocity1hV1']
datasets['ubcSSn3DuVelocity1hV16-10'].update({
    'title': datasets['ubcSSn3DuVelocity1hV1']['title'].replace(', v1', ', v16-10'),
    'summary': datasets['ubcSSn3DuVelocity1hV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSg3DuGridFields1hV17-02'] = datasets['ubcSSn3DuVelocity1hV16-10']
datasets['ubcSSg3DuGridFields1hV17-02'].update({
    'title': 'Green, Salish Sea, 3d u Grid Variable Fields, Hourly, v17-02',
    'keywords': '''circulation, current, currents, depthu, u grid, ocean, oceans,
Oceans > Ocean Circulation > Ocean Currents,
sea, sea_water_x_velocity, seawater, time_counter, u velocity component, velocity along x-axis, vozocrtx''',
    'summary': '''Variable values at the 3d zonal (u) component velocity grid points averaged over 1 hour intervals
from Salish Sea NEMO model runs with physics and biology. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetryV17-02 dataset.

v1: uVelocity variable
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.
v17-02: NEMO-3.6; ubcSSnBathymetryV17-02 bathymetry; see infoUrl link for full details.
''',
})

datasets['ubcSSn3DvVelocity1hV16-10'] = datasets['ubcSSn3DvVelocity1hV1']
datasets['ubcSSn3DvVelocity1hV16-10'].update({
    'title': datasets['ubcSSn3DvVelocity1hV1']['title'].replace(', v1', ', v16-10'),
    'summary': datasets['ubcSSn3DvVelocity1hV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSg3DvGridFields1hV17-02'] = datasets['ubcSSn3DvVelocity1hV16-10']
datasets['ubcSSg3DvGridFields1hV17-02'].update({
    'title': 'Green, Salish Sea, 3d v Grid Variable Fields, Hourly, v17-02',
    'keywords': '''circulation, current, currents, depthv, v grid, ocean, oceans,
Oceans > Ocean Circulation > Ocean Currents,
sea, sea_water_y_velocity, seawater, time_counter, v velocity component, velocity along y-axis, vomecrty''',
    'summary': '''Variable values at the 3d meridional (v) component velocity grid points averaged over 1 hour intervals
from Salish Sea NEMO model runs with physics and biology. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetryV17-02 dataset.

v1: vVelocity variable
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.
v17-02: NEMO-3.6; ubcSSnBathymetryV17-02 bathymetry; see infoUrl link for full details.
''',
})

datasets['ubcSSn3DwVelocity1hV16-10'] = datasets['ubcSSn3DwVelocity1hV2']
datasets['ubcSSn3DwVelocity1hV16-10'].update({
    'title': datasets['ubcSSn3DwVelocity1hV2']['title'].replace(', v2', ', v16-10'),
    'summary': datasets['ubcSSn3DwVelocity1hV2']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.
        Renamed eddy viscosity & diffusivity variables to vert_eddy_visc & vert_eddy_diff
        Added turbulent kinetic energy dissipation rate variable.'''
})

datasets['ubcSSg3DwGridFields1hV17-02'] = datasets['ubcSSn3DwVelocity1hV16-10']
datasets['ubcSSg3DwGridFields1hV17-02'].update({
    'title': 'Green, Salish Sea, 3d w Grid Variable Fields, Hourly, v17-02',
    'keywords': '''circulation, currents, depthw, vertical eddy diffusivity, downwelling, w grid, ocean, oceans,
Oceans > Ocean Circulation > Diffusion,
Oceans > Ocean Circulation > Ocean Currents,
Oceans > Ocean Circulation > Upwelling/Downwelling,
sea, seawater, time_counter, turbulent kinetic energy dissipation rate, upward, upward_sea_water_velocity, upwelling, 
w velocity component, velocity along z-axis, vert_eddy_diff, vert_eddy_visc, vertical eddy viscosity, 
vovecrtz''',
    'summary': '''Variable values at the 3d vertical (w) component velocity grid points averaged over 1 hour intervals
from Salish Sea NEMO model runs with physics and biology. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetryV17-02 dataset.

v1: wVelocity variable
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.
        Renamed eddy viscosity & diffusivity variables to vert_eddy_visc & vert_eddy_diff
        Added turbulent kinetic energy dissipation rate variable.
v17-02: NEMO-3.6; ubcSSnBathymetryV17-02 bathymetry; see infoUrl link for full details.
''',
})

### Physics Tracer Variable Fields

In [7]:
datasets.update({
    'ubcSSn3DTracerFields1hV1': {
        'type': '3d fields',
        'title': 'Nowcast, Salish Sea, 3d Tracer Fields, Hourly, v1',
        'summary': '''3d salinity and water temperature field values averaged over 1 hour intervals
from Salish Sea NEMO model nowcast runs. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: salinity (practical) and temperature variables
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_grid_T\.nc$',
    },
    
    'ubcSSnSurfaceTracerFields1hV1': {
        'type': 'surface fields',
        'title': 'Nowcast, Salish Sea, Surface Tracer Fields, Hourly, v1',
        'summary': '''2d sea surface height and rainfall rate field values averaged over 1 hour intervals
from Salish Sea NEMO model nowcast runs. The values are calculated for the surface of the model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: sea surface height and rainfall rate variables
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_grid_T\.nc$',
    },
})

datasets['ubcSSn3DTracerFields1hV16-10'] = datasets['ubcSSn3DTracerFields1hV1']
datasets['ubcSSn3DTracerFields1hV16-10'].update({
    'title': datasets['ubcSSn3DTracerFields1hV1']['title'].replace(', v1', ', v16-10'),
    'summary': datasets['ubcSSn3DTracerFields1hV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.
        Changed salinity variable to reference salinity.
        Changed temperature variable to conservative temperature.
        Added squared buoyancy frequency variable.''',
})

datasets['ubcSSg3DTracerFields1hV17-02'] = datasets['ubcSSn3DTracerFields1hV16-10']
datasets['ubcSSg3DTracerFields1hV17-02'].update({
    'title': datasets['ubcSSn3DTracerFields1hV16-10']['title']
                .replace('Nowcast, ', 'Green, ')
                .replace(', v16-10', ', v17-02'),
    'keywords': '''buoy_n2, density, deptht, squared buoyancy frequency, grid, ocean, oceans,
Oceans > Ocean Temperature > Conservative Temperature,
Oceans > Salinity/Density > Salinity,
reference salinity, sea water, sea_water_conservative_temperature, sea_water_reference_salinity, seawater, 
conservative temperature, time_counter, vosaline, votemper, water''',
    'summary':  '''3d salinity, water temperature, and squared buoyancy frequency field values averaged over 1 hour intervals
from Salish Sea NEMO model runs with physics and biology. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V17-02 dataset.

v1: salinity (practical) and temperature variables
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.
        Changed salinity variable to reference salinity.
        Changed temperature variable to conservative temperature.
        Added squared buoyancy frequency variable.
v17-02: NEMO-3.6; ubcSSnBathymetryV17-02 bathymetry; see infoUrl link for full details.''',
})

datasets['ubcSSnSurfaceTracerFields1hV16-10'] = datasets['ubcSSnSurfaceTracerFields1hV1']
datasets['ubcSSnSurfaceTracerFields1hV16-10'].update({
    'title': datasets['ubcSSnSurfaceTracerFields1hV1']['title'].replace(', v1', ', v16-10'),
    'summary': datasets['ubcSSnSurfaceTracerFields1hV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.
        Deleted rainfall rate variable.''',
})

datasets['ubcSSgSurfaceTracerFields1hV17-02'] = datasets['ubcSSnSurfaceTracerFields1hV16-10']
datasets['ubcSSgSurfaceTracerFields1hV17-02'].update({
    'title': datasets['ubcSSnSurfaceTracerFields1hV16-10']['title']
                .replace('Nowcast, ', 'Green, ')
                .replace(', v16-10', ', v17-02'),
    'keywords': '''deptht, grid, ocean, oceans, 
Oceans > Ocean Circulation > Ocean Mixed Layer, 
Oceans > Sea Surface Topography > Sea Surface Height, 
mixed_depth, mixed_layer_depth, ocean_mixed_layer_thickness_defined_by_sigma_theta, 
mixed layer depth (dsigma = 0.01 wrt 10m), 
sea surface height, sea_surface_height_above_geoid, sea water, seawater, ssh, 
sossheig, time_counter, water''',
    'summary':   '''2d sea surface height and mixed layer depth field values averaged over 1 hour intervals
from Salish Sea NEMO model runs with physics and biology. The values are calculated for the surface of the model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V17-02 dataset.

v1: sea surface height and rainfall rate variables
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.
        Deleted rainfall rate variable.
v17-02: NEMO-3.6; ubcSSnBathymetryV17-02 bathymetry; see infoUrl link for full details.
        Added mixed layer depth variable (dsigma = 0.01 wrt 10m)''',
})

### Biology Variable Fields

In [8]:
datasets.update({
    'ubcSSg3DBiologyFields1hV17-02': {
        'type': '3d fields',
        'title': 'Green, Salish Sea, 3d Biology Fields, Hourly, v17-02',
        'keywords': '''ammonia, ammonium, aquatic, biogenic, biogenic_silicon, biological,
Biological Classification > Protists > Diatoms,
biosphere,
Biosphere > Aquatic Ecosystems > Plankton > Zooplankton,
chemistry, ciliates, concentration, deptht, detritus, diatoms, dissolved, dissolved_organic_nitrogen, ecosystems,
flagellates, Fraser_tracer, Fraser River, marine, mesodinium rubrum, mesozooplankton, microzooplankton,
mole_concentration_of_ammonium_in_sea_water, fraser_river_turbidity_tracer, 
mole_concentration_of_diatoms_expressed_as_nitrogen_in_sea_water, 
mole_concentration_of_flagellates_expressed_as_nitrogen_in_sea_water, 
mole_concentration_of_mesodinium_rubrum_expressed_as_nitrogen_in_sea_water, 
mole_concentration_of_mesozooplankton_expressed_as_nitrogen_in_sea_water, 
mole_concentration_of_microzooplankton_expressed_as_nitrogen_in_sea_water, 
mole_concentration_of_nitrate_in_sea_water, mole_concentration_of_organic_detritus_expressed_as_nitrogen_in_sea_water, 
mole_concentration_of_organic_detritus_expressed_as_silicon_in_sea_water, 
mole_concentration_of_particulate_organic_matter_expressed_as_nitrogen_in_sea_water, 
mole_concentration_of_silicate_in_sea_water, n02, nh4, nitrate, nitrogen, no3, ocean, oceans,
Oceans > Marine Sediments > Sediment Chemistry,
Oceans > Marine Sediments > Sediment Composition,
Oceans > Marine Sediments > Suspended Solids,
Oceans > Ocean Chemistry > Ammonia,
Oceans > Ocean Chemistry > Nitrate,
Oceans > Ocean Chemistry > Nitrogen,
Oceans > Ocean Chemistry > Organic Matter,
Oceans > Ocean Chemistry > Silicate,
organic, particulate, particulate_organic_nitrogen, plankton, protists, river, sea, seawater, sediment, sediments, 
silicate, silicon, solids, suspended, time_counter, tracer, turbidity, water, zooplankton
''',
    'summary': '''3d SMELT biological model field values averaged over 1 hour intervals
from Salish Sea NEMO model runs with physics and biology. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V17-02 dataset.

v17-02: Micromolar concentrations of ammonium, biogenic silicon, ciliates (mesodinium rubrum),
        diatoms, dissolved organic nitrogen, flagellates (nanophytoplankton), mesozooplankton, microzooplankton,
        nitrate, particulate organic nitrogen, silicon.
        Fraser River water turbidity tracer.
        NEMO-3.6; ubcSSnBathymetryV17-02 bathymetry; see infoUrl link for full details.
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_ptrc_T\.nc$',
    }
})

### Single Point Sea Surface Heights

In [59]:
datasets.update({
    'ubcSSnPointAtkinsonSSH15mV1': {
        'type': 'tide gauge',
        'title': 'Nowcast, Point Atkinson, Sea Surface Height, 15min, v1',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Point Atkinson tide gauge station on the north side of English Bay,
near Vancouver, British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*PointAtkinson\.nc$',
    },
    
    'ubcSSnBoundaryBaySSH15mV16-10': {
        'type': 'tide gauge',
        'title': 'Nowcast, Boundary Bay, Sea Surface Height, 15min, v16-10',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Point Atkinson tide gauge station on the north side of English Bay,
near Vancouver, British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V16-07 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*BoundaryBay\.nc$',
    },
    
    'ubcSSnCampbellRiverSSH15mV1': {
        'type': 'tide gauge',
        'title': 'Nowcast, Campbell River, Sea Surface Height, 15min, v1',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Campbell River tide gauge station at the north end of the Strait of Georgia,
near Campbell River, British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*CampbellRiver\.nc$',
    },

    'ubcSSnCherryPointSSH15mV1': {
        'type': 'tide gauge',
        'title': 'Nowcast, Cherry Point, WA, Sea Surface Height, 15min, v1',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Cherry Point, WA tide gauge station in the southern Strait of Georgia,
near Birch Bay, Washington.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*CherryPoint\.nc$',
    },

    'ubcSSnFridayHarborSSH15mV1': {
        'type': 'tide gauge',
        'title': 'Nowcast, Friday Harbor, WA, Sea Surface Height, 15min, v1',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Friday Harbor, WA tide gauge station at San Juan Island in Haro Strait,
near Friday Harbor, Washington.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*FridayHarbor\.nc$',
    },

    'ubcSSnNanaimoSSH15mV1': {
        'type': 'tide gauge',
        'title': 'Nowcast, Nanaimo, Sea Surface Height, 15min, v1',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Nanaimo tide gauge station on the west side of the central Strait of Georgia,
near Nanaimo, British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*Nanaimo\.nc$',
    },

    'ubcSSnNeahBaySSH15mV1': {
        'type': 'tide gauge',
        'title': 'Nowcast, Neah Bay, WA, Sea Surface Height, 15min, v1',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Neah Bay, WA tide gauge station on the south side of the west end of the Juan de Fuca Strait,
near Neah Bay, Washington.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*NeahBay\.nc$',
    },

    'ubcSSnVictoriaSSH15mV1': {
        'type': 'tide gauge',
        'title': 'Nowcast, Victoria, Sea Surface Height, 15min, v1',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Victoria tide gauge station on the north side of the east end of the Juan de Fuca Strait,
in the Victoria Inner Harbour, near Victoria, British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*Victoria\.nc$',
    },

    'ubcSSnSandHeadsSSH15mV1': {
        'type': 'tide gauge',
        'title': 'Nowcast, Sand Heads, Sea Surface Height, 15min, v1',
        'summary': '''Sea surface height values averaged over 15 minute intervals from
Salish Sea NEMO model nowcast runs. The values are calculated at the model grid point
closest to the Sand Heads light station on the east side of the central Strait of Georgia,
near Steveston, British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V1 dataset.

v1: ssh variable''',
        'fileNameRegex': '.*Sandheads\.nc$',
    },
})

datasets['ubcSSnPointAtkinsonSSH10mV16-10'] = datasets['ubcSSnPointAtkinsonSSH15mV1']
datasets['ubcSSnPointAtkinsonSSH10mV16-10'].update({
    'title': datasets['ubcSSnPointAtkinsonSSH15mV1']['title']
        .replace(', v1', ', v16-10')
        .replace(', 15min,', ', 10min,'),
    'summary': datasets['ubcSSnPointAtkinsonSSH15mV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V16-07')
        .replace(' 15 minute ', ' 10 minute ') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSgPointAtkinsonSSH10mV17-02'] = datasets['ubcSSnPointAtkinsonSSH10mV16-10']
datasets['ubcSSgPointAtkinsonSSH10mV17-02'].update({
    'title': datasets['ubcSSnPointAtkinsonSSH10mV16-10']['title']
        .replace('Nowcast,', 'Green,')
        .replace(', v16-10', ', v17-02')
        .replace(', 15min,', ', 10min,'),
    'keywords': '''10min, earth, geodetics, sea surface height above geoid, geoid, ocean, oceans,
Oceans > Sea Surface Topography > Sea Surface Height,
sea, sea_surface_height_above_geoid,
sossheig, ssh, surface, tides, time_counter, topography''',
    'summary': '''**WARNING** This is a provisional dataset. Use for review & testing only.
    
''' +
    datasets['ubcSSnPointAtkinsonSSH10mV16-10']['summary']
        .replace('ubcSSnBathymetry2V16-10', 'ubcSSnBathymetryV17-02')
        .replace('Salish Sea NEMO model nowcast runs.', 'Salish Sea NEMO model runs with physics and biology.') + '''
v17-02: NEMO-3.6; ubcSSnBathymetry2V17-02 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSnCampbellRiverSSH10mV16-10'] = datasets['ubcSSnCampbellRiverSSH15mV1']
datasets['ubcSSnCampbellRiverSSH10mV16-10'].update({
    'title': datasets['ubcSSnCampbellRiverSSH15mV1']['title']
        .replace(', v1', ', v16-10')
        .replace(', 15min,', ', 10min,'),
    'summary': datasets['ubcSSnCampbellRiverSSH15mV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07')
        .replace(' 15 minute ', ' 10 minute ') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSnCherryPointSSH10mV16-10'] = datasets['ubcSSnCherryPointSSH15mV1']
datasets['ubcSSnCherryPointSSH10mV16-10'].update({
    'title': datasets['ubcSSnCherryPointSSH15mV1']['title']
        .replace(', v1', ', v16-10')
        .replace(', 15min,', ', 10min,'),
    'summary': datasets['ubcSSnCherryPointSSH15mV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07')
        .replace(' 15 minute ', ' 10 minute ') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSnFridayHarborSSH10mV16-10'] = datasets['ubcSSnFridayHarborSSH15mV1']
datasets['ubcSSnFridayHarborSSH10mV16-10'].update({
    'title': datasets['ubcSSnFridayHarborSSH15mV1']['title']
        .replace(', v1', ', v16-10')
        .replace(', 15min,', ', 10min,'),
    'summary': datasets['ubcSSnFridayHarborSSH15mV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07')
        .replace(' 15 minute ', ' 10 minute ') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSnNanaimoSSH10mV16-10'] = datasets['ubcSSnNanaimoSSH15mV1']
datasets['ubcSSnNanaimoSSH10mV16-10'].update({
    'title': datasets['ubcSSnNanaimoSSH15mV1']['title']
        .replace(', v1', ', v16-10')
        .replace(', 15min,', ', 10min,'),
    'summary': datasets['ubcSSnNanaimoSSH15mV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07')
        .replace(' 15 minute ', ' 10 minute ') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSnNeahBaySSH10mV16-10'] = datasets['ubcSSnNeahBaySSH15mV1']
datasets['ubcSSnNeahBaySSH10mV16-10'].update({
    'title': datasets['ubcSSnNeahBaySSH15mV1']['title']
        .replace(', v1', ', v16-10')
        .replace(', 15min,', ', 10min,'),
    'summary': datasets['ubcSSnNeahBaySSH15mV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07')
        .replace(' 15 minute ', ' 10 minute ') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSnVictoriaSSH10mV16-10'] = datasets['ubcSSnVictoriaSSH15mV1']
datasets['ubcSSnVictoriaSSH10mV16-10'].update({
    'title': datasets['ubcSSnVictoriaSSH15mV1']['title']
        .replace(', v1', ', v16-10')
        .replace(', 15min,', ', 10min,'),
    'summary': datasets['ubcSSnVictoriaSSH15mV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07')
        .replace(' 15 minute ', ' 10 minute ') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

datasets['ubcSSnSandHeadsSSH10mV16-10'] = datasets['ubcSSnSandHeadsSSH15mV1']
datasets['ubcSSnSandHeadsSSH10mV16-10'].update({
    'title': datasets['ubcSSnSandHeadsSSH15mV1']['title']
        .replace(', v1', ', v16-10')
        .replace(', 15min,', ', 10min,'),
    'summary': datasets['ubcSSnSandHeadsSSH15mV1']['summary']
        .replace('ubcSSnBathymetry2V1', 'ubcSSnBathymetry2V1-07')
        .replace(' 15 minute ', ' 10 minute ') + '''
v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry; see infoUrl link for full details.'''
})

## Forecast
### Velocity Fields

In [10]:
datasets.update({
    'ubcSSf3DuVelocity1hV16-10': {
        'type': '3d fields',
        'title': 'Forecast, Salish Sea, 3d u Velocity Field, Hourly, v1',
        'summary': '''3d zonal (u) component velocity field values averaged over 1 hour intervals
from the most recent Salish Sea NEMO model forecast run. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V16-07 dataset.

v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry
        uVelocity variable
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_grid_U\.nc$',
    },
    
    'ubcSSf3DvVelocity1hV16-10': {
        'type': '3d fields',
        'title': 'Forecast, Salish Sea, 3d v Velocity Field, Hourly, v1',
        'summary': '''3d meridional (v) component velocity field values averaged over 1 hour intervals
from the most recent Salish Sea NEMO model forecast run. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V16-07 dataset.

v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry
        vVelocity variable
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_grid_V\.nc$',
    },
    
    'ubcSSf3DwVelocity1hV16-10': {
        'type': '3d fields',
        'title': 'Forecast, Salish Sea, 3d w Velocity Field, Hourly, v1',
        'summary': '''3d vertical (w) component velocity field values averaged over 1 hour intervals
from the most recent Salish Sea NEMO model forecast run. The values are calculated for the entire model grid
that includes the Juan de Fuca Strait, the Strait of Georgia, Puget Sound,
and Johnstone Strait on the coasts of Washington State and British Columbia.
The time values are the centre of the intervals over which the calculated model results are averaged.
Geo-location and depth data for the Salish Sea NEMO model grid are available in the ubcSSnBathymetry2V16-07 dataset.

v16-10: NEMO-3.6; ubcSSnBathymetry2V16-07 bathymetry.
        wVelocity, ve_eddy_visc (eddy viscosity), ve_eddy_diff (eddy diffusivity)
        and turbulent kinetic energy dissipation rate variables.
''',
        'fileNameRegex': '.*SalishSea_1h_\d{8}_\d{8}_grid_W\.nc$',
    },
})

## Dataset Variables Renaming

The `dataset_vars` dictionary below is used to rename
variables from the often cryptic NEMO names to the names
that appear in the ERDDAP generated files and web content.

The keys are the NEMO variable names to replace.

The values are dicts that map the variable names to use in ERDDAP
to the `destinationName` attribute name.

In [43]:
dataset_vars = {
    'Bathymetry': {'destinationName': 'bathymetry'},
    'nav_lat': {'destinationName': 'latitude'},
    'nav_lon': {'destinationName': 'longitude'},
    'sossheig': {'destinationName': 'ssh'},
    'vosaline': {'destinationName': 'salinity'},
    'votemper': {'destinationName': 'temperature'},
    'vozocrtx': {'destinationName': 'uVelocity'},
    'vomecrty': {'destinationName': 'vVelocity'},
    'vovecrtz': {'destinationName': 'wVelocity'},
    't': {'destinationName': 'time'},
}

## Colour Bar Ranges for Variables

The `var_colour_ranges` dictionary below is used to set the `colorBarMinimum` and `colorBarMaximum`
attribute values for dataset variables.

The keys are the dataset variable `destinationName` values.

In [6]:
var_colour_ranges = {
    'salinity': {'colorBarMinimum': '0.0', 'colorBarMaximum': '34.0'},
    'temperature': {'colorBarMinimum': '4.0', 'colorBarMaximum': '20.0'},
    'uVelocity': {'colorBarMinimum': '-8.0', 'colorBarMaximum': '8.0'},
    'vVelocity': {'colorBarMinimum': '-8.0', 'colorBarMaximum': '8.0'},
    'ssh': {'colorBarMinimum': '-4.0', 'colorBarMaximum': '4.0'},
    'mixed_layer_depth': {'colorBarMinimum': '0', 'colorBarMaximum': '30.0'},
    'biogenic_silicon': {'colorBarMinimum': '0', 'colorBarMaximum': '70.0'},
    'nitrate': {'colorBarMinimum': '0', 'colorBarMaximum': '40.0'},
    'silicon': {'colorBarMinimum': '0', 'colorBarMaximum': '70.0'},
    'diatoms': {'colorBarMinimum': '0', 'colorBarMaximum': '20.0'},
    'longitude': {'colorBarMinimum': '-127.0', 'colorBarMaximum': '-121.0'},
    'latitude': {'colorBarMinimum': '46.0', 'colorBarMaximum': '52.0'},
    'bathymetry': {'colorBarMinimum': '0.0', 'colorBarMaximum': '450.0'},
}

## IOOS Categories for Variables

The `ioos_categories` dictionary below is used to set the `ioos_category`
attribute values for dataset variables
(see [ERDDAP docs IOOS category section](https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#ioos_category)).

The keys are the dataset variable `destinationName` values.

In [7]:
ioos_categories = {
    'time': 'time',
    'gridX': 'location',
    'gridY': 'location',
    'gridZ': 'location',
    'latitude': 'location',
    'longitude': 'location',
    'bathymetry': 'bathymetry',
    'salinity': 'salinity',
    'temperature': 'temperature',
    'uVelocity': 'currents',
    'vVelocity': 'currents',
    'wVelocity': 'currents',
    'ssh': 'sea_level',
    'mixed_layer_depth': 'physical_oceanography',
    'dissipation': 'physical_oceanography',
    'vert_eddy_diff': 'physical_oceanography',
    'vert_eddy_visc': 'physical_oceanography',
    'fraser_river_tracer': 'physical_oceanography',
    'ammonium': 'dissolved_nutrients',
    'biogenic_silicon': 'dissolved_nutrients',
    'dissolved_organic_nitrogen': 'dissolved_nutrients',
    'nitrate': 'dissolved_nutrients',
    'particulate_organic_nitrogen': 'dissolved_nutrients',
    'silicon': 'dissolved_nutrients',
    'ciliates': 'biology',
    'diatoms': 'biology',
    'flagellates': 'biology',
    'mesozooplankton': 'biology',
    'microzooplankton': 'biology',
}

# Convenience Functions

A few convenient functions to reduce code repetition:

In [8]:
def print_tree(root):
    """Display an XML tree fragment with indentation.
    """
    print(etree.tostring(root, pretty_print=True).decode('ascii'))

In [9]:
def find_att(root, att):
    """Return the dataset attribute element named att
    or raise a ValueError exception if it cannot be found.
    """
    e = root.find('.//att[@name="{}"]'.format(att))
    if e is None:
        raise ValueError('{} attribute element not found'.format(att))
    return e

In [10]:
def replace_yx_with_lonlat(root):
    new_axes = {
        'y': {'sourceName': 'nav_lon', 'destinationName': 'longitude'},
        'x': {'sourceName': 'nav_lat', 'destinationName': 'latitude'},
    }
    for axis in root.findall('.//axisVariable'):
        if axis.find('.//sourceName').text in new_axes:
            key = axis.find('.//sourceName').text
            new_axis = etree.Element('axisVariable')
            etree.SubElement(new_axis, 'sourceName').text = new_axes[key]['sourceName']
            etree.SubElement(new_axis, 'destinationName').text = new_axes[key]['destinationName']
            axis.getparent().replace(axis, new_axis)

In [11]:
def update_xml(root, datasetID, metadata, datasets, dataset_vars):
    root.attrib['datasetID'] = datasetID
    root.find('.//fileNameRegex').text = datasets[datasetID]['fileNameRegex']
        
    title = datasets[datasetID]['title']
    if 'keywords' in datasets[datasetID]:
        keywords = find_att(root, 'keywords')
        keywords.text = datasets[datasetID]['keywords']
    summary = find_att(root, 'summary')
    summary.text = f'{title}\n\n{datasets[datasetID]["summary"]}'
    e = etree.Element('att', name='title')
    e.text = title
    summary.addnext(e)

    for att, info in metadata.items():
        e = etree.Element('att', name=att)
        e.text = info['text']
        try:
            root.find(f'''.//att[@name="{info['after']}"]'''.format()).addnext(e)
        except KeyError:
            find_att(root, att).text = info['text']
            
    attrs = root.find('addAttributes')
    etree.SubElement(attrs, 'att', name='NCO').text = 'null'
    if not 'Bathymetry' in datasetID:
        etree.SubElement(attrs, 'att', name='history').text = 'null'
        etree.SubElement(attrs, 'att', name='name').text = 'null'

    for axis_name in root.findall('.//axisVariable/destinationName'):
        attrs = axis_name.getparent().find('addAttributes')
        etree.SubElement(attrs, 'att', name='coverage_content_type').text = 'modelResult'
        
        if axis_name.text == 'time':
            etree.SubElement(attrs, 'att', name='comment').text = (
                'time values are the centre of the intervals over which the calculated model results are averaged')
        
        if axis_name.text in ('x', 'y', 'z'):
            axis_name.text = f'grid{axis_name.text.upper()}'
        
        if axis_name.text in ioos_categories:
            etree.SubElement(attrs, 'att', name='ioos_category').text = ioos_categories[axis_name.text]
            
    if datasets[datasetID]['type'] == 'tide gauge':
        replace_yx_with_lonlat(root)
        
    for var_name in root.findall('.//dataVariable/destinationName'):
        if var_name.text in dataset_vars:
            var_name.text = dataset_vars[var_name.text]['destinationName']

        if var_name.text in var_colour_ranges:
            for att_name in ('colorBarMinimum', 'colorBarMaximum'):
                cb_att = var_name.getparent().find(f'addAttributes/att[@name="{att_name}"]')
                if cb_att is not None:
                    cb_att.text = var_colour_ranges[var_name.text][att_name]
                else:
                    attrs = var_name.getparent().find('addAttributes')
                    etree.SubElement(attrs, 'att', name=att_name, type='double').text = (
                        var_colour_ranges[var_name.text][att_name])

        attrs = var_name.getparent().find('addAttributes')
        etree.SubElement(attrs, 'att', name='coverage_content_type').text = 'modelResult'
        etree.SubElement(attrs, 'att', name='cell_measures').text = 'null'
        etree.SubElement(attrs, 'att', name='cell_methods').text = 'null'
        etree.SubElement(attrs, 'att', name='interval_operation').text = 'null'
        etree.SubElement(attrs, 'att', name='interval_write').text = 'null'
        etree.SubElement(attrs, 'att', name='online_operation').text = 'null'
        
        if var_name.text in ioos_categories:
            etree.SubElement(attrs, 'att', name='ioos_category').text = ioos_categories[var_name.text]

# Generate Initial Dataset XML Fragment

Now we're ready to produce a dataset!!!

Use the `/opt/tomcat/webapps/erddap/WEB-INF/GenerateDatasetsXml.sh` script
generate the initial version of an XML fragment for a dataset:
```
$ cd /opt/tomcat/webapps/erddap/WEB-INF/
$ bash GenerateDatasetsXml.sh EDDGridFromNcFiles /results/SalishSea/nowcast/
```
The `EDDGridFromNcFiles` and `/results/SalishSea/nowcast/` arguments
tell the script which `EDDType` and what parent directory to use,
avoiding having to type those in answer to prompts.
Answer the remaining prompts,
for example:
```
File name regex (e.g., ".*\.nc") (default="")
? .*SalishSea_1h_\d{8}_\d{8}_grid_W\.nc$

Full file name of one file (default="")
? /results/SalishSea/nowcast/28jan16/SalishSea_1h_20160128_20160128_grid_W.nc

ReloadEveryNMinutes (e.g., 10080) (default="")
? 10080
```
Other examples of file name regex are:

* `.*PointAtkinson.nc$`
* `.*SalishSea_1d_\d{8}_\d{8}_grid_W\.nc$`

The output is written to `/results/erddap/logs/GenerateDatasetsXml.out`

Dataset ids and file name regexs from datasets dict:

In [12]:
for dataset in sorted(datasets):
    print(dataset, datasets[dataset]['fileNameRegex'])

ubcSSn2DMeshMask2V1 .*mesh_mask_SalishSea2\.nc$
ubcSSn2DMeshMask2V16-07 .*mesh_mask_201702\.nc$
ubcSSn2DMeshMaskV17-02 .*mesh_mask_201702\.nc$
ubcSSn3DMeshMask2V1 .*mesh_mask_SalishSea2\.nc$
ubcSSn3DMeshMask2V16-07 .*mesh_mask_201702\.nc$
ubcSSn3DMeshMaskV17-02 .*mesh_mask_201702\.nc$
ubcSSnBathymetry2V1 .*SalishSea2_NEMO_bathy\.nc$
ubcSSnBathymetry2V16-07 .*downbyone2_NEMO_bathy\.nc$
ubcSSnBathymetryV17-02 .*bathymetry_201702\.nc$


# Finalize Dataset XML Fragment

Now, we:

* set the `datasetID` we want to use
* parse the output of `GenerateDatasetsXml.sh` into an XML tree data structure
* set the `datasetID` dataset attribute value
* re-set the `fileNameRegex` dataset attribute value because it looses its `\` characters during parsing(?)
* edit and add dataset attributes from the `metadata` dict
* set the `title` and `summary` dataset attributes from the `datasets` dict
* set the names of the grid `x` and `y` axis variables
* rename data variables as specified in the `dataset_vars` dict

In [64]:
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('/results/erddap/logs/GenerateDatasetsXml.out', parser)
root = tree.getroot()

datasetID = 'ubcSSn3DMeshMaskV17-02'

update_xml(root, datasetID, metadata, datasets, dataset_vars)

Inspect the resulting dataset XML fragment below and edit the dicts and
code cell above until it is what is required for the dataset:

In [14]:
print_tree(root)

<dataset type="EDDGridFromNcFiles" datasetID="ubcSSn3DMeshMaskV17-02" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <updateEveryNMillis>10000</updateEveryNMillis>
  <fileDir>/home/dlatorne/</fileDir>
  <recursive>true</recursive>
  <fileNameRegex>.*mesh_mask_201702\.nc$</fileNameRegex>
  <metadataFrom>last</metadataFrom>
  <matchAxisNDigits>20</matchAxisNDigits>
  <fileTableInMemory>false</fileTableInMemory>
  <accessibleViaFiles>false</accessibleViaFiles>
  <!-- sourceAttributes>
        <att name="Conventions">CF-1.6</att>
        <att name="file_name">NEMO-forcing/grid/mesh_mask201702.nc</att>
        <att name="history">[2017-04-13 22:37] ncks -4 -L4 -O mesh_mask.nc mesh_mask201702.nc
[2017-05-14 11:59] Added metadata to variable in preparation for creation of ERDDAP datasets.</att>
        <att name="institution">Dept of Earth, Ocean &amp; Atmospheric Sciences, University of British Columbia</att>
        <att name="references">https://salishsea.eos.ubc.ca/er

Extra processing step are required for some types of datasets.
See:

* [Surface Field Datasets](#Surface-Field-Datasets)
* [Single Point Sea Surface Height Datasets](#Single-Point-Sea-Surface-Height-Datasets)
* [Model Grid Geo-location and Bathymetry Datasets](#Model-Grid-Geo-location-and-Bathymetry-Datasets)
* [EC HDRPS Atmospheric Forcing Datasets](#EC-HDRPS-Atmospheric-Forcing-Datasets)

Store the XML fragment for the dataset:

In [26]:
with open('/results/erddap-datasets/fragments/{}.xml'.format(datasetID), 'wb') as f:
    f.write(etree.tostring(root, pretty_print=True))

Edit `/results/erddap-datasets/datasets.xml` to include the
XML fragment for the dataset that was stored by the above cell.

That file is symlinked to `/opt/tomcat/content/erddap/datasets.xml`.

Create a flag file to signal the ERDDAP server process to load the dataset:
```
$ cd /results/erddap/flag/
$ touch <datasetID>
```

If the dataset does not appear on https://salishsea.eos.ubc.ca/erddap/info/,
check `/results/erddap/logs/log.txt` for error messages from the dataset load process
(they may not be at the end of the file because ERDDAP is pretty chatty).

Once the dataset has been successfully loaded and you are happy with the metadata
that ERDDAP is providing for it,
commit the changes in `/results/erddap-datasets/` and push them to Bitbucket.

## Surface Field Datasets

The `/opt/tomcat/webapps/erddap/WEB-INF/GenerateDatasetsXml.sh` script produces an XML
fragment that uses all of the dimensions that it finds in the sample file it parses,
and includes only the variables that have all of those dimensions.
To produce an XML fragment for surface fields we need to do some additional work:

* Delete the depth axis
* Delete all of the `dataVariable` elements
* Add `dataVariable` elements for the surface variables

In [25]:
attrs = root.find('addAttributes')
etree.SubElement(attrs, 'att', name='description').text = 'ocean surface T grid variables'
    
for axis in root.findall('.//axisVariable'):
    if axis.find('.//destinationName').text == 'depth':
        axis.getparent().remove(axis)
        break

for var in root.findall('.//dataVariable'):
    var.getparent().remove(var)
    
var = etree.SubElement(root, 'dataVariable')
etree.SubElement(var, 'sourceName').text = 'sossheig'
etree.SubElement(var, 'destinationName').text = 'ssh'
etree.SubElement(var, 'dataType').text = 'float'
attrs = etree.SubElement(var, 'addAttributes')
etree.SubElement(attrs, 'att', name='_ChunkSize').text = 'null'
etree.SubElement(attrs, 'att', name='coordinates').text = 'null'
etree.SubElement(attrs, 'att', name='ioos_category').text = ioos_categories['ssh']
    
var = etree.SubElement(root, 'dataVariable')
etree.SubElement(var, 'sourceName').text = 'mixed_depth'
etree.SubElement(var, 'destinationName').text = 'mixed_layer_depth'
etree.SubElement(var, 'dataType').text = 'float'
attrs = etree.SubElement(var, 'addAttributes')
etree.SubElement(attrs, 'att', name='_ChunkSize').text = 'null'
etree.SubElement(attrs, 'att', name='coordinates').text = 'null'
etree.SubElement(attrs, 'att', name='ioos_category').text = ioos_categories['mixed_layer_depth']
        
for var_name in root.findall('.//dataVariable/destinationName'):
    if var_name.text in dataset_vars:
        var_name.text = dataset_vars[var_name.text]['destinationName']

    if var_name.text in var_colour_ranges:
        for att_name in ('colorBarMinimum', 'colorBarMaximum'):
            cb_att = var_name.getparent().find(f'addAttributes/att[@name="{att_name}"]')
            if cb_att is not None:
                cb_att.text = var_colour_ranges[var_name.text][att_name]
            else:
                attrs = var_name.getparent().find('addAttributes')
                etree.SubElement(attrs, 'att', name=att_name, type='double').text = (
                    var_colour_ranges[var_name.text][att_name])

        attrs = var_name.getparent().find('addAttributes')
        etree.SubElement(attrs, 'att', name='coverage_content_type').text = 'modelResult'
        etree.SubElement(attrs, 'att', name='cell_measures').text = 'null'
        etree.SubElement(attrs, 'att', name='cell_methods').text = 'null'
        etree.SubElement(attrs, 'att', name='interval_operation').text = 'null'
        etree.SubElement(attrs, 'att', name='interval_write').text = 'null'
        etree.SubElement(attrs, 'att', name='online_operation').text = 'null'
        
        if var_name.text in ioos_categories:
            etree.SubElement(attrs, 'att', name='ioos_category').text = ioos_categories[var_name.text]

In [26]:
print_tree(root)

<dataset type="EDDGridFromNcFiles" datasetID="ubcSSgSurfaceTracerFields1hV17-02" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <updateEveryNMillis>10000</updateEveryNMillis>
  <fileDir>/results/SalishSea/hindcast/</fileDir>
  <recursive>true</recursive>
  <fileNameRegex>.*SalishSea_1h_\d{8}_\d{8}_grid_T\.nc$</fileNameRegex>
  <metadataFrom>last</metadataFrom>
  <matchAxisNDigits>20</matchAxisNDigits>
  <fileTableInMemory>false</fileTableInMemory>
  <accessibleViaFiles>false</accessibleViaFiles>
  <!-- sourceAttributes>
        <att name="Conventions">CF-1.6</att>
        <att name="description">ocean T grid variables</att>
        <att name="history">Wed Apr 26 21:34:53 2017: ncks -4 -L4 -O SalishSea_1h_20140912_20140912_grid_T.nc SalishSea_1h_20140912_20140912_grid_T.nc</att>
        <att name="name">SalishSea_1h_20140912_20140912</att>
        <att name="NCO">&quot;4.5.2&quot;</att>
        <att name="timeStamp">2017-Apr-27 03:53:38 GMT</att>
        <att name="

In [27]:
with open('/results/erddap-datasets/fragments/{}.xml'.format(datasetID), 'wb') as f:
    f.write(etree.tostring(root, pretty_print=True))

## Single Point Sea Surface Height Datasets

The `/opt/tomcat/webapps/erddap/WEB-INF/GenerateDatasetsXml.sh` script produces an XML
fragment that includes:

* a `nvertex` axis variable
* `bound_lon` and `bound_lat` data varibles

that are or not value in the ERDDAP dataset.

To produce a final XML fragment we need to delete those variables.

In [229]:
for axis in root.findall('.//axisVariable'):
    if axis.find('.//destinationName').text == 'nvertex':
        axis.getparent().remove(axis)
        break

var = etree.SubElement(root, 'axisVariable')
etree.SubElement(var, 'sourceName').text = 'time_counter'
etree.SubElement(var, 'destinationName').text = 'time'
etree.SubElement(var, 'dataType').text = 'float'
attrs = etree.SubElement(var, 'addAttributes')
etree.SubElement(attrs, 'att', name='axis').text = 'T'
etree.SubElement(attrs, 'att', name='standard_name').text = 'time'
etree.SubElement(attrs, 'att', name='long_name').text = 'Time axis'
etree.SubElement(attrs, 'att', name='calendar').text = 'gregorian'
etree.SubElement(attrs, 'att', name='units').text = 'seconds since 1900-01-01 00:00:00'
etree.SubElement(attrs, 'att', name='time_origin').text = '1900-01-01 00:00:00'
etree.SubElement(attrs, 'att', name='ioos_category').text = ioos_categories['time']

for var in root.findall('.//dataVariable'):
    if var.find('.//destinationName').text in ('bounds_lon', 'bounds_lat'):
        var.getparent().remove(var)

In [230]:
print_tree(root)

<dataset type="EDDGridFromNcFiles" datasetID="ubcSSgPointAtkinsonSSH10mV17-02" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <updateEveryNMillis>10000</updateEveryNMillis>
  <fileDir>/results/SalishSea/hindcast/</fileDir>
  <recursive>true</recursive>
  <fileNameRegex>.*PointAtkinson\.nc$</fileNameRegex>
  <metadataFrom>last</metadataFrom>
  <matchAxisNDigits>20</matchAxisNDigits>
  <fileTableInMemory>false</fileTableInMemory>
  <accessibleViaFiles>false</accessibleViaFiles>
  <!-- sourceAttributes>
        <att name="Conventions">CF-1.5</att>
        <att name="description">10min interval SSH for tides</att>
        <att name="name">PointAtkinson</att>
        <att name="production">An IPSL model</att>
        <att name="timeStamp">2017-Mar-02 18:28:41 MST</att>
        <att name="title">10min interval SSH for tides</att>
    </sourceAttributes -->
  <addAttributes>
    <att name="cdm_data_type">Grid</att>
    <att name="coverage_content_type">modelResult</att>
 

In [231]:
with open('/results/erddap-datasets/fragments/{}.xml'.format(datasetID), 'wb') as f:
    f.write(etree.tostring(root, pretty_print=True))

## Model Grid Geo-location and Bathymetry Datasets

### Bathymetry

Special processing for bathymetry datasets:

In [133]:
root.find('.//recursive').text = 'false'
root.find(f'addAttributes/att[@name="acknowledgement"]').text = '''MEOPAR, ONC, Compute Canada, CHS, NOAA, USGS

This product has been produced by the University of British Columbia based on Canadian Hydrographic Service charts
and/or data, pursuant to CHS Direct User Licence No. 2016-0504-1260-U

The incorporation of data sourced from CHS in this product shall not be construed as constituting an endorsement of
CHS of this product.

This product does not meet the requirements of Charts and Nautical Publications Regulations,
1995 under the Canadian Shipping Act, 2001. Official charts and publications, corrected and up-to-data,
must be used to meet the requirements of those regulations.'''
attrs = root.find('.//addAttributes')
etree.SubElement(attrs, 'att', name='references').text = 'https://bitbucket.org/salishsea/nemo-forcing/raw/tip/grid/bathymetry_201702.nc'
etree.SubElement(attrs, 'att', name='comment').text = 'null'

for var in root.findall('.//dataVariable'):
    for var_name in ('latitude', 'longitude'):
        if var.find('.//destinationName').text == var_name:
            attrs = etree.SubElement(var, 'addAttributes')
            etree.SubElement(attrs, 'att', name='standard_name').text = var_name
            etree.SubElement(attrs, 'att', name='long_name').text = var_name.title()

In [134]:
print_tree(root)

<dataset type="EDDGridFromNcFiles" datasetID="ubcSSnBathymetryV17-02" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <updateEveryNMillis>10000</updateEveryNMillis>
  <fileDir>/results/nowcast-sys/NEMO-forcing/grid/</fileDir>
  <recursive>false</recursive>
  <fileNameRegex>.*bathymetry_201702\.nc$</fileNameRegex>
  <metadataFrom>last</metadataFrom>
  <matchAxisNDigits>20</matchAxisNDigits>
  <fileTableInMemory>false</fileTableInMemory>
  <accessibleViaFiles>false</accessibleViaFiles>
  <!-- sourceAttributes>
        <att name="_NCProperties">version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17</att>
        <att name="comment">Bathymetry processed from Michaels New Full River Bathymetry</att>
        <att name="Conventions">CF-1.6</att>
        <att name="history">[2017-02-28 18:55:52] Created netCDF4 zlib=True dataset.</att>
        <att name="institution">Dept of Earth, Ocean &amp; Atmospheric Sciences, University of British Columbia</att>
        <att name="ref

In [135]:
with open('/results/erddap-datasets/fragments/{}.xml'.format(datasetID), 'wb') as f:
    f.write(etree.tostring(root, pretty_print=True))

### Mesh Mask

Special processing for model grid geo-location datasets
(generated from mesh mask files):

In [65]:
root.find('.//recursive').text = 'false'
for attr in root.findall('.//addAttributes/att[@name="institution"]'):
    if attr.text == '???':
        attr.getparent().remove(attr)
for axis in root.findall('.//axisVariable/destinationName'):
    if axis.text == 't':
        axis.text = 'time'
        attrs = axis.getparent().find('addAttributes')
        attrs.find('att[@name="long_name"]').text = 'Time'
        etree.SubElement(attrs, 'att', name='standard_name').text = 'time'
        etree.SubElement(attrs, 'att', name='calendar').text = 'gregorian'
        etree.SubElement(attrs, 'att', name='time_origin').text = '2014-09-12 00:30:00'
        etree.SubElement(attrs, 'att', name='units').text = 'seconds since 1900-01-01 00:00:00'
        etree.SubElement(attrs, 'att', name='calendar').text = 'gregorian'

for attrs in root.findall('.//dataVariable/addAttributes'):
    etree.SubElement(attrs, 'att', name='ioos_category').text = 'grid_parameter'
    for attr in attrs.findall('att[@name="colorBarMaximum"]'):
        attr.getparent().remove(attr)
    for attr in attrs.findall('att[@name="colorBarMinimum"]'):
        attr.getparent().remove(attr)

In [66]:
print_tree(root)

<dataset type="EDDGridFromNcFiles" datasetID="ubcSSn3DMeshMaskV17-02" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <updateEveryNMillis>10000</updateEveryNMillis>
  <fileDir>/home/dlatorne/</fileDir>
  <recursive>false</recursive>
  <fileNameRegex>.*mesh_mask_201702\.nc$</fileNameRegex>
  <metadataFrom>last</metadataFrom>
  <matchAxisNDigits>20</matchAxisNDigits>
  <fileTableInMemory>false</fileTableInMemory>
  <accessibleViaFiles>false</accessibleViaFiles>
  <!-- sourceAttributes>
        <att name="Conventions">CF-1.6</att>
        <att name="file_name">NEMO-forcing/grid/mesh_mask201702.nc</att>
        <att name="history">[2017-04-13 22:37] ncks -4 -L4 -O mesh_mask.nc mesh_mask201702.nc
[2017-05-14 11:59] Added metadata to variable in preparation for creation of ERDDAP datasets.</att>
        <att name="institution">Dept of Earth, Ocean &amp; Atmospheric Sciences, University of British Columbia</att>
        <att name="references">https://salishsea.eos.ubc.ca/e

In [67]:
with open('/results/erddap-datasets/fragments/{}.xml'.format(datasetID), 'wb') as f:
    f.write(etree.tostring(root, pretty_print=True))

## EC HDRPS Atmospheric Forcing Datasets

### Atmospheric Forcing Grid Geo-location Dataset

Use the `/opt/tomcat/webapps/erddap/WEB-INF/GenerateDatasetsXml.sh` script
generate the initial version of an XML fragment for the dataset:
```
$ cd /opt/tomcat/webapps/erddap/WEB-INF/
$ bash GenerateDatasetsXml.sh EDDGridFromNcFiles /results/forcing/atmospheric/GEM2.5/operational/ ops_y\d{4}m\d{2}d\d{2}.nc$ /results/forcing/atmospheric/GEM2.5/operational/ops_y2016m03d07.nc 10080
```

Like the model grid geo-location and bathymetry dataset,
the atmospheric forcing grid dataset requires a lot of hand editing.
Here is the finished dataset:

In [12]:
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('/results/erddap-datasets/fragments/ubcSSaAtmosphereGridV1.xml', parser)
root = tree.getroot()

print_tree(root)

<dataset type="EDDGridFromNcFiles" datasetID="ubcSSaAtmosphereGridV1" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <updateEveryNMillis>10000</updateEveryNMillis>
  <fileDir>/results/forcing/atmospheric/GEM2.5/operational/</fileDir>
  <recursive>false</recursive>
  <fileNameRegex>ops_y2016m03d07.nc$</fileNameRegex>
  <metadataFrom>last</metadataFrom>
  <matchAxisNDigits>20</matchAxisNDigits>
  <fileTableInMemory>false</fileTableInMemory>
  <accessibleViaFiles>false</accessibleViaFiles>
  <!-- sourceAttributes>
      <att name="Conventions">CF-1.0</att>
      <att name="GRIB2_grid_template" type="int">20</att>
      <att name="History">Mon Mar  7 10:07:34 2016: ncks -4 -L4 -O /results/forcing/atmospheric/GEM2.5/operational/ops_y2016m03d07.nc /results/forcing/atmospheric/GEM2.5/operational/ops_y2016m03d07.nc
  created by wgrib2</att>
      <att name="NCO">4.4.2</att>
  </sourceAttributes -->
  <addAttributes>
    <att name="cdm_data_type">Grid</att>
    <att name="c

### Atmospheric Forcing Model Fields

* Change the value of the `recursive` element to `false` so that the `/results/forcing/atmospheric/GEM2.5/operational/fcst/` directory is excluded
* Add Environment Canada acknowledgement and terms & conditions of use to `license` element
* Add Environment Canada to `acknowledgement` element

In [45]:
root.find('.//recursive').text = 'false'
find_att(root, 'license').text += '''

This dataset is derived from a product of the Environment Canada HRDPS (High Resolution Deterministic Prediction System)
model. The Terms and conditions of use of Meteorological Data from Environment Canada are available at
http://dd.weather.gc.ca/doc/LICENCE_GENERAL.txt.</att>'''
find_att(root, 'acknowledgement').text += ', Environment Canada'

In [47]:
for axis in root.findall('.//axisVariable'):
    axis_name = axis.find('.//sourceName').text
    if 'time' not in axis_name:
        attrs = axis.find('.//addAttributes')
        etree.SubElement(attrs, 'att', name='grid_spacing').text = 'null'
        etree.SubElement(attrs, 'att', name='units').text = 'null'
        etree.SubElement(attrs, 'att', name='long_name').text = axis_name.upper()
        etree.SubElement(attrs, 'att', name='standard_name').text = axis_name

In [48]:
print_tree(root)

<dataset type="EDDGridFromNcFiles" datasetID="ubcSSaSurfaceAtmosphereFieldsV1" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <updateEveryNMillis>10000</updateEveryNMillis>
  <fileDir>/results/forcing/atmospheric/GEM2.5/operational/</fileDir>
  <recursive>false</recursive>
  <fileNameRegex>.*ops_y\d{4}m\d{2}d\d{2}\.nc$</fileNameRegex>
  <metadataFrom>last</metadataFrom>
  <matchAxisNDigits>20</matchAxisNDigits>
  <fileTableInMemory>false</fileTableInMemory>
  <accessibleViaFiles>false</accessibleViaFiles>
  <!-- sourceAttributes>
        <att name="Conventions">CF-1.0</att>
        <att name="GRIB2_grid_template" type="int">20</att>
        <att name="History">Thu Mar 10 10:11:37 2016: ncks -4 -L4 -O /results/forcing/atmospheric/GEM2.5/operational/ops_y2016m03d10.nc /results/forcing/atmospheric/GEM2.5/operational/ops_y2016m03d10.nc
created by wgrib2</att>
        <att name="NCO">4.4.2</att>
    </sourceAttributes -->
  <addAttributes>
    <att name="cdm_data_type">

In [49]:
with open('/results/erddap-datasets/fragments/{}.xml'.format(datasetID), 'wb') as f:
    f.write(etree.tostring(root, pretty_print=True))