# SSC Diatoms Extraction for Nudging Atlantis

This notebook explores using `dask.distributed` to extract diatom and lon/lat fields
from day-averaged SalishSeaCast hindcast files to create long time series files that
can be processed to generate "nudging" fields for Atlantis.

The packages required by this notebook are defined in the `conda` environment file `analysis-doug/notebooks/dask-expts/environment.yaml`
along with instructions on how to create an isolated environment containing the latest versions of those packages.
The `analysis-doug/notebooks/dask-expts/requirements.txt` file contains a complete list of all the packages
(top level and dependencies) and their versions that were used most recently for work in this notebook.

To run Jupyter Lab headless on port 8843 on `salish`,
launch `jupyter lab` there with:

    jupyter lab --no-browser --ip $(hostname -f) --port=8843

and set up an `ssh` tunnel from my local machine's port 4343 to the Jupyter Lab server on port 8843 on `salish` with:

    ssh -N -L 4343:salish:8843 salish

and finally launch the lab UI in my local browser with:

    http://localhost:4343/
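The three steps above only work if the two port numbers stay consistent. A minimal sketch that assembles the same commands from those numbers, e.g. for copying into a terminal or passing to `subprocess.run`; the helper names are hypothetical, not part of Jupyter or `ssh`:

```python
import shlex

# Hypothetical helpers that rebuild the commands shown above from the port numbers
REMOTE_HOST = "salish"
REMOTE_PORT = 8843  # port Jupyter Lab listens on, on salish
LOCAL_PORT = 4343  # port on the local machine that the browser connects to


def jupyter_lab_command(port: int) -> str:
    """Return the shell command that starts a headless Jupyter Lab server on `port`.

    `$(hostname -f)` is left unexpanded for the remote shell to fill in.
    """
    return f"jupyter lab --no-browser --ip $(hostname -f) --port={port}"


def ssh_tunnel_command(local_port: int, remote_host: str, remote_port: int) -> list[str]:
    """Return ssh arguments that forward localhost:local_port to remote_host:remote_port."""
    # -N: run no remote command, only forward the port
    # -L local_port:host:host_port, where host is resolved on the remote side
    return ["ssh", "-N", "-L", f"{local_port}:{remote_host}:{remote_port}", remote_host]


print(jupyter_lab_command(REMOTE_PORT))
print(shlex.join(ssh_tunnel_command(LOCAL_PORT, REMOTE_HOST, REMOTE_PORT)))
print(f"http://localhost:{LOCAL_PORT}/")
```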

## Setup

Python imports:

In [1]:
from pathlib import Path
import sys

import arrow
import dask.distributed
import netCDF4
import numpy
import xarray

Python and library versions:

In [2]:
print(f"Python {sys.version=}")
print(f"{xarray.__version__=}")
print(f"{dask.distributed.__version__=}")
print(f"{netCDF4.__version__=}")
print(f"{arrow.__version__=}")

Python sys.version='3.10.2 | packaged by conda-forge | (main, Jan 14 2022, 08:02:09) [GCC 9.4.0]'
xarray.__version__='0.20.2'
dask.distributed.__version__='2022.01.0'
netCDF4.__version__='1.5.8'
arrow.__version__='1.2.1'


## Source Datasets

The datasets that we want to extract the diatoms field from are stored in
`/results2/SalishSea/nowcast-green.201905/`.
Each day's run results are stored in a sub-directory with the name pattern `ddmmmyy/`;
e.g. `01jan07/`.
The model output file containing the day-averaged diatoms field has the name pattern
`SalishSea_1d_yyyymmdd_yyyymmdd_ptrc_T.nc`;
e.g. `SalishSea_1d_20070101_20070101_ptrc_T.nc`.
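Both name patterns can also be generated with the standard library alone, as an alternative to the `arrow` formatting used in the notebook. A sketch; `ptrc_T_path` is a hypothetical helper name:

```python
from datetime import date
from pathlib import Path


def ptrc_T_path(results_archive, day):
    """Return the path of the day-averaged ptrc_T file for `day`.

    Uses the directory and file name patterns described above; `%b` gives an
    English month abbreviation only under the default C (or an English) locale.
    """
    ddmmmyy = day.strftime("%d%b%y").lower()  # e.g. 01jan07
    yyyymmdd = day.strftime("%Y%m%d")  # e.g. 20070101
    return Path(results_archive) / ddmmmyy / f"SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"


print(ptrc_T_path("/results2/SalishSea/nowcast-green.201905/", date(2007, 1, 1)))
# /results2/SalishSea/nowcast-green.201905/01jan07/SalishSea_1d_20070101_20070101_ptrc_T.nc
```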

Let's take a look at one of those datasets:

In [None]:
results_archive = Path("/results2/SalishSea/nowcast-green.201905/")
start_date = arrow.get("2007-01-01")
ddmmmyy = start_date.format("DDMMMYY").lower()
yyyymmdd = start_date.format("YYYYMMDD")

ds_path = results_archive/ddmmmyy/f"SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"
ds = xarray.open_dataset(ds_path)

ds

Reopen the dataset, dropping the variables that we don't need:


In [6]:
keep_vars = {"diatoms"}
drop_vars = {v for v in ds.data_vars} - keep_vars
drop_vars.update({"time_centered", "nav_lon", "nav_lat"})
ds = xarray.open_dataset(ds_path, drop_variables=drop_vars)

ds

## Build Extraction Dataset

Build a new dataset from the source dataset(s)
containing only the coordinates, variables, and dataset attributes that we want.

### Coordinates

In [7]:
time = xarray.DataArray(
    name="time",
    data=ds.time_counter.data,
    coords={
        "time": ds.time_counter.data,
    },
    attrs={
        "standard_name": "time",
        "long_name": "Time Axis",
        "comment": (
            "time values are UTC at the centre of the intervals over which the calculated model results are averaged; "
            "e.g. the field average values for 01 January 2007 have a time value of 2007-01-01 12:00:00Z"
        ),
        # time_origin and units are provided by encoding when dataset is written to netCDF file
    }
)

time
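The convention in the `comment` attribute puts each time stamp at start + (end - start) / 2 of its averaging interval. A small standard-library illustration for the day-averaged case (`interval_centre` is a hypothetical helper, not used by the notebook):

```python
from datetime import datetime, timedelta, timezone


def interval_centre(start, end):
    """Return the midpoint of the averaging interval that runs from start to end."""
    return start + (end - start) / 2


day_start = datetime(2007, 1, 1, tzinfo=timezone.utc)
print(interval_centre(day_start, day_start + timedelta(days=1)))  # 2007-01-01 12:00:00+00:00
```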

In [8]:
depths = xarray.DataArray(
    name="depth",
    data=ds.deptht.data,
    coords={
        "depth": ds.deptht.data,
    },
    attrs={
        # deptht holds the depths of the model levels, not the bathymetry, so CF "depth" applies
        "standard_name": "depth",
        "long_name": "Depth",
        "units": "metres"
    }
)

depths

In [9]:
grid_y = xarray.DataArray(
    name="gridY",
    data=ds.y.data,
    coords={
        "gridY": ds.y.data,
    },
    attrs={
        "standard_name": "y",
        "long_name": "Grid Y",
        "units": "count",
        "comment": "gridY values are grid indices in the model y-direction",
    }
)

grid_y

In [10]:
grid_x = xarray.DataArray(
    name="gridX",
    data=ds.x.data,
    coords={
        "gridX": ds.x.data,
    },
    attrs={
        "standard_name": "x",
        "long_name": "Grid X",
        "units": "count",
        "comment": "gridX values are grid indices in the model x-direction",
    }
)

grid_x

### Variables

In [11]:
diatoms = xarray.DataArray(
    name="diatoms",
    data=ds.diatoms.data,
    coords={
        "time": time,
        "depth": depths,
        "gridY": grid_y,
        "gridX": grid_x,
    },
    attrs={
        "standard_name": "mole_concentration_of_diatoms_expressed_as_nitrogen_in_sea_water",
        "long_name": "Diatoms Concentration",
        "units": "mmol m-3"
    }
)

diatoms

The longitude and latitude fields (`nav_lon` and `nav_lat`) in the SalishSeaCast results datasets
are a bit weird due to the land processor elimination optimization that we use in NEMO to avoid
spending computational effort on grid cells that contain only land.
That optimization results in the longitude and latitude values being set to -1 in those cells.
So, instead we'll grab the lon/lat fields from the model grid bathymetry dataset
on ERDDAP [https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02.html](https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02.html):

In [12]:
grid_url = "https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02"
grid_ds = xarray.open_dataset(grid_url)

grid_ds
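A sketch of how the affected cells could be spotted in `nav_lon`/`nav_lat`, assuming the -1 fill value described above (the function name and the toy arrays are hypothetical). It also shows why masking alone is not enough: the real coordinates of those cells are simply absent, which is why the lon/lat fields come from the bathymetry dataset instead.

```python
import numpy

SENTINEL = -1.0  # value described above for cells on eliminated land processors


def land_eliminated_mask(nav_lon, nav_lat, sentinel=SENTINEL):
    """Return a boolean array that is True where both lon and lat hold the sentinel."""
    return (nav_lon == sentinel) & (nav_lat == sentinel)


# Toy 3x4 lon/lat fields (hypothetical values) with one eliminated 2x2 land tile
nav_lon = numpy.full((3, 4), -123.5)
nav_lat = numpy.full((3, 4), 49.0)
nav_lon[:2, 2:] = SENTINEL
nav_lat[:2, 2:] = SENTINEL

mask = land_eliminated_mask(nav_lon, nav_lat)
print(int(mask.sum()))  # 4

# Masking hides the bogus values, but the true coordinates of those cells
# are still missing, so a complete lon/lat grid has to come from elsewhere
masked_lon = numpy.where(mask, numpy.nan, nav_lon)
```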

In [13]:
lons = xarray.DataArray(
    name="longitude",
    data=grid_ds.longitude.data,
    coords={
        "gridY": grid_y,
        "gridX": grid_x,
    },
    attrs={
        "standard_name": "longitude",
        "long_name": "Longitude",
        "units": "degrees_east"
    }
)

lons

In [14]:
lats = xarray.DataArray(
    name="latitude",
    data=grid_ds.latitude.data,
    coords={
        "gridY": grid_y,
        "gridX": grid_x,
    },
    attrs={
        "standard_name": "latitude",
        "long_name": "Latitude",
        "units": "degrees_north"
    }
)

lats

In [15]:
extracted_ds = xarray.Dataset(
    coords={
        "time": time,
        "depth": depths,
        "gridY": grid_y,
        "gridX": grid_x,
    },
    data_vars={
        "longitude": lons,
        "latitude": lats,
        "diatoms": diatoms,
    },
    attrs={
        "name": "SalishSeaCast_day_avg_diatoms_20070101_20070101",
        "description": "Day-averaged diatoms biomass extracted from SalishSeaCast v201905 hindcast",
        "history": f"{arrow.now('PST').format('YYYY-MM-DD HH:mm')}: Generated by analysis-doug/notebooks/dask-expts/atlantis_nudge_diatoms.ipynb",
    },
)

extracted_ds

## Store the Extracted Dataset as a netCDF File

The main thing here is that we need to set up a nested `dict` of encoding information
to tell `xarray` and the underlying `netCDF4` library how to store each variable
in the netCDF4 file: its data type, its chunk sizes, and, for `time`, its units.
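The `time` entry of the encoding `dict` asks xarray to write time as numbers of days since 2007-01-01 12:00:00, stored as `numpy.single`. A minimal sketch of that arithmetic (xarray does the actual conversion; `encode_days_since` is a hypothetical helper), including a check on the time resolution that float32 leaves:

```python
from datetime import datetime, timedelta

import numpy

ONE_DAY = timedelta(days=1)


def encode_days_since(times, origin):
    """Return float32 values of `times` in units of "days since origin" (CF-style)."""
    return numpy.array([(t - origin) / ONE_DAY for t in times], dtype=numpy.single)


origin = datetime(2007, 1, 1, 12)  # matches "days since 2007-01-01 12:00:00"
times = [origin + n * ONE_DAY for n in range(3)]  # three day-average time stamps
encoded = encode_days_since(times, origin)
print(encoded)  # [0. 1. 2.]

# float32 keeps about 7 significant digits: a few thousand days after the
# origin (e.g. early 2015) adjacent values are 2**-12 day, about 21 s, apart
print(numpy.spacing(numpy.single(2921.5)) * 86400)  # resolution in seconds
```

That resolution is ample for day- or hour-averaged fields, so float32 storage of `time` looks safe here.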

In [16]:
nc_path = Path("/tmp/SalishSeaCast_day_avg_diatoms_20070101_20070101.nc")

encoding = {
    "time": {
        "dtype": numpy.single,
        "units": "days since 2007-01-01 12:00:00",
        "chunksizes": [1],
    },
    "depth": {"dtype": numpy.single, "chunksizes": [depths.size]},
    "gridY": {"dtype": int, "chunksizes": [grid_y.size]},
    "gridX": {"dtype": int, "chunksizes": [grid_x.size]},
    "diatoms": {"dtype": numpy.single, "chunksizes": [1] + [c_array.size for c_name, c_array in diatoms.coords.items() if c_name != "time"]},
    "longitude": {"dtype": numpy.single, "chunksizes": [c_array.size for c_array in lons.coords.values()]},
    "latitude": {"dtype": numpy.single, "chunksizes": [c_array.size for c_array in lats.coords.values()]},
}

In [17]:
extracted_ds.to_netcdf(nc_path, format="NETCDF4_CLASSIC", encoding=encoding, unlimited_dims="time")

In [18]:
ls -lh /tmp/SalishSeaCast_day_avg_diatoms_20070101_20070101.nc

-rw-rw-r-- 1 doug doug 58M Jan 20 12:06 /tmp/SalishSeaCast_day_avg_diatoms_20070101_20070101.nc


In [19]:
!ncdump -cs /tmp/SalishSeaCast_day_avg_diatoms_20070101_20070101.nc

netcdf SalishSeaCast_day_avg_diatoms_20070101_20070101 {
dimensions:
	time = UNLIMITED ; // (1 currently)
	gridY = 898 ;
	gridX = 398 ;
	depth = 40 ;
variables:
	int gridY(gridY) ;
		gridY:standard_name = "y" ;
		gridY:long_name = "Grid Y" ;
		gridY:units = "count" ;
		gridY:comment = "gridY values are grid indices in the model y-direction" ;
		gridY:_Storage = "chunked" ;
		gridY:_ChunkSizes = 898 ;
		gridY:_Endianness = "little" ;
	int gridX(gridX) ;
		gridX:standard_name = "x" ;
		gridX:long_name = "Grid X" ;
		gridX:units = "count" ;
		gridX:comment = "gridX values are grid indices in the model x-direction" ;
		gridX:_Storage = "chunked" ;
		gridX:_ChunkSizes = 398 ;
		gridX:_Endianness = "little" ;
	float longitude(gridY, gridX) ;
		longitude:_FillValue = NaNf ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude" ;
		longitude:units = "degrees_east" ;
		longitude:_Storage = "chunked" ;
		longitude:_ChunkSizes = 898, 398 ;
		l

In [20]:
ds.close()
grid_ds.close()

## Scaling and Refactoring

Now we want to scale the processing to pull diatoms fields from an arbitrary range of dates
of SalishSeaCast hindcast files.
We'll use `dask` and `dask.distributed` to spread the work of processing many days' datasets
over multiple processors/cores.

We also want to abstract some repetitious code above into functions.

### Source Dataset Paths

In [None]:
results_archive = Path("/results2/SalishSea/nowcast-green.201905/")
start_date = arrow.get("2007-01-01")
end_date = arrow.get("2007-01-31")

In [15]:
results_archive = Path("/results/SalishSea/nowcast-green.201812/")
start_date = arrow.get("2015-01-01")
end_date = arrow.get("2015-01-10")

In [8]:
def ddmmmyy(arrow_date):
    """Return an Arrow date as a string formatted as lower-cased `ddmmmyy`."""
    return arrow_date.format("DDMMMYY").lower()

In [9]:
def yyyymmdd(arrow_date):
    """Return an Arrow date as a string of digits formatted as `yyyymmdd`."""
    return arrow_date.format("YYYYMMDD")

In [None]:
date_range = arrow.Arrow.range("days", start_date, end_date)
ds_paths = [results_archive/ddmmmyy(day)/f"SalishSea_1d_{yyyymmdd(day)}_{yyyymmdd(day)}_ptrc_T.nc" for day in date_range]

ds_paths

In [16]:
date_range = arrow.Arrow.range("days", start_date, end_date)
ds_paths = [results_archive/ddmmmyy(day)/f"SalishSea_1h_{yyyymmdd(day)}_{yyyymmdd(day)}_ptrc_T.nc" for day in date_range]

ds_paths

[PosixPath('/results/SalishSea/nowcast-green.201812/01jan15/SalishSea_1h_20150101_20150101_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/02jan15/SalishSea_1h_20150102_20150102_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/03jan15/SalishSea_1h_20150103_20150103_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/04jan15/SalishSea_1h_20150104_20150104_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/05jan15/SalishSea_1h_20150105_20150105_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/06jan15/SalishSea_1h_20150106_20150106_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/07jan15/SalishSea_1h_20150107_20150107_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/08jan15/SalishSea_1h_20150108_20150108_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/09jan15/SalishSea_1h_20150109_20150109_ptrc_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/10jan15/Sali
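`arrow.Arrow.range` includes `end_date`, which is why the list above has 10 paths for 1 to 10 January. The same inclusive range with only the standard library, as a sketch (`day_range` is hypothetical, and `%b` assumes the default C or an English locale):

```python
from datetime import date, timedelta


def day_range(start, end):
    """Return every date from start to end, inclusive of both ends."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


days = day_range(date(2015, 1, 1), date(2015, 1, 10))
# Results directory names follow the lower-cased ddmmmyy pattern
dirs = [day.strftime("%d%b%y").lower() for day in days]
print(len(days), dirs[0], dirs[-1])  # 10 01jan15 10jan15
```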

### Start `dask` Cluster

In [11]:
client = dask.distributed.Client(
    n_workers=4, threads_per_worker=10, processes=True)
client

Client
Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Status: running, Using processes: True, Workers: 4, Total threads: 40, Total memory: 62.66 GiB
Scheduler comm: tcp://127.0.0.1:40605
Workers (each with 10 threads and 15.66 GiB memory):
  comm tcp://127.0.0.1:39447, dashboard http://127.0.0.1:42129/status, nanny tcp://127.0.0.1:41383, local dir worker-gaahdqb_
  comm tcp://127.0.0.1:45887, dashboard http://127.0.0.1:32803/status, nanny tcp://127.0.0.1:42271, local dir worker-jpcz2v4x
  comm tcp://127.0.0.1:42213, dashboard http://127.0.0.1:34103/status, nanny tcp://127.0.0.1:38721, local dir worker-fd3__z71
  comm tcp://127.0.0.1:38609, dashboard http://127.0.0.1:43877/status, nanny tcp://127.0.0.1:37263, local dir worker-n361wo8q
Worker local directories are under /media/doug/warehouse/MEOPAR/analysis-doug/notebooks/dask-expts/dask-worker-space/


### Load Dataset Collection Metadata

In [12]:
chunks = {"time_counter": 1, "deptht": 40, "y": 898, "x": 398}

In [13]:
def open_reduced_dataset(ds_paths, chunks, keep_vars):
    """Open ds_paths as one dask-backed dataset containing only the keep_vars variables."""
    # Use 1st dataset path to calculate the set of variables to drop
    with xarray.open_dataset(ds_paths[0], chunks=chunks) as ds:
        drop_vars = {var for var in ds.data_vars} - keep_vars
        drop_vars.update({"time_centered", "nav_lon", "nav_lat"})
    # Open the full list of paths with only the variables of interest;
    # open_mfdataset reads each file's metadata and builds a lazy dask task graph,
    # but the data values are not read until they are computed (e.g. by to_netcdf())
    return xarray.open_mfdataset(ds_paths, chunks=chunks, drop_variables=drop_vars)

In [111]:
ds.close()

chunks = {"time_counter": 3, "deptht": 3*40, "y": 3*898, "x": 3*398}

In [112]:
%%time
ds = open_reduced_dataset(ds_paths, chunks, {"diatoms"})

ds

CPU times: user 180 ms, sys: 1.59 ms, total: 181 ms
Wall time: 317 ms


|       | Array               | Chunk             |
|-------|---------------------|-------------------|
| Bytes | 12.78 GiB           | 163.61 MiB        |
| Shape | (240, 40, 898, 398) | (3, 40, 898, 398) |
| Count | 170 Tasks           | 80 Chunks         |
| Type  | float32             | numpy.ndarray     |
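The sizes in the table can be reproduced by hand. This sketch assumes, consistent with the (3, 40, 898, 398) chunk shape shown, that a requested chunk length larger than its dimension (such as `3*40` for `deptht`) is capped at the dimension length; `effective_chunks` is a hypothetical helper, not a dask function:

```python
import math

# Dimension sizes of the 10-day, hourly diatoms array shown above
sizes = {"time_counter": 240, "deptht": 40, "y": 898, "x": 398}
requested = {"time_counter": 3, "deptht": 3 * 40, "y": 3 * 898, "x": 3 * 398}
ITEMSIZE = 4  # bytes per float32 value


def effective_chunks(requested, sizes):
    """Cap each requested chunk length at the length of its dimension."""
    return {dim: min(requested[dim], size) for dim, size in sizes.items()}


chunks = effective_chunks(requested, sizes)
n_chunks = math.prod(math.ceil(sizes[dim] / chunks[dim]) for dim in sizes)
chunk_mib = math.prod(chunks.values()) * ITEMSIZE / 2**20
array_gib = math.prod(sizes.values()) * ITEMSIZE / 2**30

print(tuple(chunks.values()))  # (3, 40, 898, 398)
print(n_chunks)  # 80
print(f"{chunk_mib:.2f} MiB per chunk, {array_gib:.2f} GiB total")  # 163.61 MiB per chunk, 12.78 GiB total
```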


### Coordinates

In [113]:
def calc_coord_array(name, source_data_var, attrs):
    """Return a 1-D coordinate DataArray called name, indexed by its own values."""
    return xarray.DataArray(
        name=name,
        data=source_data_var.data,
        coords={name: source_data_var.data},
        attrs=attrs
    )

In [114]:
time = calc_coord_array(
    "time", ds.time_counter,
    attrs={
        "standard_name": "time",
        "long_name": "Time Axis",
        "comment": (
            "time values are UTC at the centre of the intervals over which the calculated model results are averaged; "
            "e.g. the field average values for 01 January 2007 have a time value of 2007-01-01 12:00:00Z"
        ),
        # time_origin and units are provided by encoding when dataset is written to netCDF file
    }
)

time

In [115]:
depths = calc_coord_array(
    "depth", ds.deptht,
    attrs={
        # deptht holds the depths of the model levels, not the bathymetry, so CF "depth" applies
        "standard_name": "depth",
        "long_name": "Depth",
        "units": "metres"
    }
)

depths

In [116]:
grid_y = calc_coord_array(
    "gridY", ds.y,
    attrs={
        "standard_name": "y",
        "long_name": "Grid Y",
        "units": "count",
        "comment": "gridY values are grid indices in the model y-direction",
    }
)

grid_y

In [117]:
grid_x = calc_coord_array(
    "gridX", ds.x,
    attrs={
        "standard_name": "x",
        "long_name": "Grid X",
        "units": "count",
        "comment": "gridX values are grid indices in the model x-direction",
    }
)

grid_x

### Variables

In [118]:
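def calc_var_array(name, source_data_var, coords, attrs):
    """Return a DataArray called name with the values of source_data_var and the given coords and attrs."""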
    return xarray.DataArray(
        name=name,
        data=source_data_var.data,
        coords=coords,
        attrs=attrs
    )

In [119]:
diatoms = calc_var_array(
    "diatoms", ds.diatoms,
    coords={
        "time": time,
        "depth": depths,
        "gridY": grid_y,
        "gridX": grid_x,
    },
    attrs={
        "standard_name": "mole_concentration_of_diatoms_expressed_as_nitrogen_in_sea_water",
        "long_name": "Diatoms Concentration",
        "units": "mmol m-3"
    }
)

diatoms

|       | Array               | Chunk             |
|-------|---------------------|-------------------|
| Bytes | 12.78 GiB           | 163.61 MiB        |
| Shape | (240, 40, 898, 398) | (3, 40, 898, 398) |
| Count | 170 Tasks           | 80 Chunks         |
| Type  | float32             | numpy.ndarray     |


In [120]:
grid_url = "https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02"
grid_ds = xarray.open_dataset(grid_url)

grid_ds

In [121]:
lons = calc_var_array(
    "longitude", grid_ds.longitude,
    coords={
        "gridY": grid_y,
        "gridX": grid_x,
    },
    attrs={
        "standard_name": "longitude",
        "long_name": "Longitude",
        "units": "degrees_east"
    }
)

lons

In [122]:
lats = calc_var_array(
    "latitude", grid_ds.latitude,
    coords={
        "gridY": grid_y,
        "gridX": grid_x,
    },
    attrs={
        "standard_name": "latitude",
        "long_name": "Latitude",
        "units": "degrees_north"
    }
)

lats

### Extraction Dataset

In [123]:
extracted_ds_name = f"SalishSeaCast_day_avg_diatoms_{yyyymmdd(start_date)}_{yyyymmdd(end_date)}"
extracted_ds = xarray.Dataset(
    coords={
        "time": time,
        "depth": depths,
        "gridY": grid_y,
        "gridX": grid_x,
    },
    data_vars={
        "longitude": lons,
        "latitude": lats,
        "diatoms": diatoms,
    },
    attrs={
        "name": extracted_ds_name,
        "description": "Day-averaged diatoms biomass extracted from SalishSeaCast v201905 hindcast",
        "history": f"{arrow.now('PST').format('YYYY-MM-DD HH:mm')}: Generated by analysis-doug/notebooks/dask-expts/atlantis_nudge_diatoms.ipynb",
    },
)

extracted_ds

|       | Array               | Chunk             |
|-------|---------------------|-------------------|
| Bytes | 12.78 GiB           | 163.61 MiB        |
| Shape | (240, 40, 898, 398) | (3, 40, 898, 398) |
| Count | 170 Tasks           | 80 Chunks         |
| Type  | float32             | numpy.ndarray     |


### Encoding for netCDF Storage

In [124]:
def calc_coord_encoding(ds, coord):
    """Return the netCDF encoding for coordinate coord of ds (match/case needs Python >= 3.10)."""
    match coord:
        case "time":
            return {
                "dtype": numpy.single,
                "units": "days since 2007-01-01 12:00:00",
                "chunksizes": [1],
            }
        case "depth":
            return {
                "dtype": numpy.single,
                "chunksizes": [ds.coords[coord].size]
            }
        case _:
            return {
                "dtype": int,
                "chunksizes": [ds.coords[coord].size]
            }

In [125]:
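def calc_var_encoding(var, coord_names):
    """Return float32 encoding for var, chunked at full size except 1 along time.

    Chunk sizes follow the order of coord_names, skipping names that are not
    coordinates of var; "time", when present, is assumed to be first.
    """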
    chunksizes = []
    for c_name in coord_names:
        try:
            chunksizes.append(var.coords[c_name].size)
        except KeyError:
            pass
    if "time" in var.coords:
        chunksizes[0] = 1
    return {
        "dtype": numpy.single,
        "chunksizes": chunksizes,
    }

In [126]:
encoding = {}
for coord in extracted_ds.coords:
    encoding[coord] = calc_coord_encoding(extracted_ds, coord)
for v_name, v_array in extracted_ds.data_vars.items():
    encoding[v_name] = calc_var_encoding(v_array, ("time", "depth", "gridY", "gridX"))

encoding

{'gridY': {'dtype': int, 'chunksizes': [898]},
 'gridX': {'dtype': int, 'chunksizes': [398]},
 'time': {'dtype': numpy.float32,
  'units': 'days since 2007-01-01 12:00:00',
  'chunksizes': [1]},
 'depth': {'dtype': numpy.float32, 'chunksizes': [40]},
 'longitude': {'dtype': numpy.float32, 'chunksizes': [898, 398]},
 'latitude': {'dtype': numpy.float32, 'chunksizes': [898, 398]},
 'diatoms': {'dtype': numpy.float32, 'chunksizes': [1, 40, 898, 398]}}
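`calc_var_encoding` builds chunk sizes by probing `var.coords` in a fixed order. An alternative sketch keys off the variable's own dimension order instead, which xarray exposes as `DataArray.dims` (with lengths in `Dataset.sizes`). The helper below is hypothetical and works on plain tuples and dicts so it can be checked on its own:

```python
def netcdf_chunksizes(dims, sizes, single_step_dims=("time",)):
    """Return netCDF chunk sizes: 1 along single_step_dims, full length elsewhere.

    dims: the variable's dimension names in order (e.g. DataArray.dims)
    sizes: mapping of dimension name to length (e.g. Dataset.sizes)
    """
    return [1 if dim in single_step_dims else sizes[dim] for dim in dims]


sizes = {"time": 240, "depth": 40, "gridY": 898, "gridX": 398}
print(netcdf_chunksizes(("time", "depth", "gridY", "gridX"), sizes))  # [1, 40, 898, 398]
print(netcdf_chunksizes(("gridY", "gridX"), sizes))  # [898, 398]
```

Because the list follows the variable's dimensions, it always lines up with them, and a dimension without a coordinate variable is not silently skipped.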

### Store Extraction as netCDF

In [127]:
%%time
nc_path = Path("/tmp/", f"{extracted_ds_name}.nc")
extracted_ds.to_netcdf(nc_path, format="NETCDF4_CLASSIC", encoding=encoding, unlimited_dims="time")

CPU times: user 3.28 s, sys: 1.19 s, total: 4.47 s
Wall time: 26.3 s


In [42]:
ls -lh {nc_path}

-rw-rw-r-- 1 doug doug 13G Feb 11 14:13 /tmp/SalishSeaCast_day_avg_diatoms_20150101_20150110.nc


In [93]:
!ncdump -cs {nc_path}

netcdf SalishSeaCast_day_avg_diatoms_20070101_20070131 {
dimensions:
	time = UNLIMITED ; // (31 currently)
	gridY = 898 ;
	gridX = 398 ;
	depth = 40 ;
variables:
	int gridY(gridY) ;
		gridY:standard_name = "y" ;
		gridY:long_name = "Grid Y" ;
		gridY:units = "count" ;
		gridY:comment = "gridY values are grid indices in the model y-direction" ;
		gridY:_Storage = "chunked" ;
		gridY:_ChunkSizes = 898 ;
		gridY:_Endianness = "little" ;
	int gridX(gridX) ;
		gridX:standard_name = "x" ;
		gridX:long_name = "Grid X" ;
		gridX:units = "count" ;
		gridX:comment = "gridX values are grid indices in the model x-direction" ;
		gridX:_Storage = "chunked" ;
		gridX:_ChunkSizes = 398 ;
		gridX:_Endianness = "little" ;
	float longitude(gridY, gridX) ;
		longitude:_FillValue = NaNf ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude" ;
		longitude:units = "degrees_east" ;
		longitude:_Storage = "chunked" ;
		longitude:_ChunkSizes = 898, 398 ;
		

## Move to Python Module

The next step is to put the code in this section into a Python module for further
testing and profiling without the overhead and other complications of working
with lots of datasets in the notebook context.
Please see `atlantis_nudge_diatoms.py`.

The move to a module is also an early step in the design of the planned
Reshapr package for generalized processing of this type.

## Close Datasets

In [128]:
ds.close()
grid_ds.close()

## Shut Down `dask` Cluster

In [129]:
client.close()