# Riffing on Becca's "re-tiding" problem

Experiment to assess the reconstruction of hourly baroclinic currents
from hourly barotropic currents and daily baroclinic currents like
the products available from CIOPS-West.

Lots of intermediate displays of data arrays to keep tabs on how the
dask task graph develops.

Static rendering of this notebook is better on nbviewer than on GitHub
because nbviewer renders the fancy numpy/xarray reprs as intended.

The conda environment description for running this notebook is in the
`environment.yaml` file in the same directory as the notebook.

In [1]:
import xarray as xr
from pathlib import Path


I'm just going to use 2 days for quick testing,
so I'm going to use a glob pattern to collect the file paths.
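The `[12]` in the pattern used below is a glob character class, not a regular expression. Here is a minimal sketch of how it selects directories, assuming a throwaway directory tree that only mimics the layout of the results archive (the temporary directory and the third day are made up for illustration):

```python
import tempfile
from pathlib import Path

# Hypothetical directory layout that mimics the results tree for 3 days.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for day in ("01mar19", "02mar19", "03mar19"):
        (root / day).mkdir()
        date = f"201903{day[:2]}"
        (root / day / f"SalishSea_1h_{date}_{date}_carp_T.nc").touch()

    # "[12]" matches exactly one character that is either "1" or "2",
    # so only the 01mar19 and 02mar19 directories are selected.
    matched = sorted(root.glob("0[12]mar19/SalishSea_1h_*_carp_T.nc"))
    matched_days = [p.parent.name for p in matched]

print(matched_days)
```

Sorting the result matters because `Path.glob()` does not promise any particular order, and the files need to be concatenated in time order.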

Also,
added `"nav_lon"`, `"nav_lat"`, and `"time_centered"` to collection of variables to drop.
I'm not sure why xarray doesn't make `y` and `x` coordinates of the data arrays;
perhaps because they are not defined as variables in the `.nc` files so there is
no metadata for them.
They are certainly 2 of the dimensions of the arrays -
check out `e3t.coords` and `e3t.dims` and see the discussion and figure about
[coordinates and dimensions](https://xarray.pydata.org/en/stable/user-guide/data-structures.html#dataset)
in the xarray docs.
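To make the dimensions versus coordinates distinction concrete, here is a small sketch with a toy array standing in for `e3t`. The shape and the `deptht` values are made up; the point is only that a dimension shows up in `.coords` when it has values attached to it.

```python
import numpy as np
import xarray as xr

# Toy stand-in for e3t: only "deptht" gets coordinate values, the way a
# dimension does when the file defines a variable with the same name.
toy = xr.DataArray(
    np.ones((3, 4, 5)),
    dims=("deptht", "y", "x"),
    coords={"deptht": [0.5, 1.5, 2.5]},
)

dims = toy.dims                    # all three dimension names
coord_names = sorted(toy.coords)   # only the names that carry values

# Dimensions without coordinates can still be indexed by position.
column = toy.isel(y=2, x=3)

print(dims, coord_names, column.shape)
```

So `y` and `x` remain usable with `.isel()` even though there is nothing to select on by label.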

In [2]:
drop_vars = (
    "nav_lon", "bounds_lon", "nav_lat", "bounds_lat", "area", "deptht_bounds", "PAR",
    "time_centered", "time_centered_bounds", "time_counter_bounds", "dissolved_oxygen",
    "sigma_theta", "Fraser_tracer", "dissolved_inorganic_carbon", "total_alkalnity",
)

path = Path("/results/SalishSea/nowcast-green.201812/")
files = sorted(path.glob("0[12]mar19/SalishSea_1h_*_carp_T.nc"))

files


[PosixPath('/results/SalishSea/nowcast-green.201812/01mar19/SalishSea_1h_20190301_20190301_carp_T.nc'),
 PosixPath('/results/SalishSea/nowcast-green.201812/02mar19/SalishSea_1h_20190302_20190302_carp_T.nc')]

In [3]:
mydata = xr.open_mfdataset(files, drop_variables=drop_vars)
e3t = mydata['e3t']

e3t

|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 2.56 GiB | 1.28 GiB |
| Shape | (48, 40, 898, 398) | (24, 40, 898, 398) |
| Count | 6 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


In [4]:
# convert e3t to e3u (this cell) and to e3v (next cell)
e3t_xshift = e3t.shift(x=-1, fill_value=0)
e3u = e3t_xshift + e3t
e3u = e3u * 0.5
e3u = e3u.rename({'deptht': 'depthu'})

e3u

|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 2.56 GiB | 1.28 GiB |
| Shape | (48, 40, 898, 398) | (24, 40, 898, 398) |
| Count | 21 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |
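A toy version of the shift-and-average step may help to see what `shift(x=-1, fill_value=0)` does at the edge of the grid. This is only a sketch on one row of made-up thicknesses, not the model grid:

```python
import numpy as np
import xarray as xr

# One row of made-up cell thicknesses along x.
e3t_toy = xr.DataArray(np.array([1.0, 2.0, 3.0, 4.0]), dims=("x",))

# shift(x=-1) moves every value one position toward lower x, so position i
# holds the value from i + 1; the vacated last position gets fill_value.
shifted = e3t_toy.shift(x=-1, fill_value=0)

# Average of each cell with its neighbour at x + 1.
e3u_toy = 0.5 * (shifted + e3t_toy)

print(shifted.values, e3u_toy.values)
```

Note that the last value along `x` ends up as half of the last `e3t` value because it is averaged with the fill value, which is worth keeping in mind if that edge column matters.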


In [5]:
# same averaging as for e3u, but along y to get e3v
e3t_yshift = e3t.shift(y=-1, fill_value=0)
e3v = e3t_yshift + e3t
e3v = e3v * 0.5
e3v = e3v.rename({'deptht': 'depthv'})

e3v

|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 2.56 GiB | 1.28 GiB |
| Shape | (48, 40, 898, 398) | (24, 40, 898, 398) |
| Count | 21 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


Added `"nav_lon"`, `"nav_lat"`, and `"time_centered"` to collection of variables to drop.

In [6]:
drop_vars = (
    "nav_lon", "bounds_lon", "nav_lat", "bounds_lat", "area", "depthu_bounds",
    "time_centered", "time_centered_bounds", "time_counter_bounds",
)

files = sorted(path.glob("0[12]mar19/SalishSea_1h_*_grid_U.nc"))

mydata = xr.open_mfdataset(files, drop_variables=drop_vars)
u = mydata['vozocrtx']

u


|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 2.56 GiB | 1.28 GiB |
| Shape | (48, 40, 898, 398) | (24, 40, 898, 398) |
| Count | 6 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


Added `"nav_lon"`, `"nav_lat"`, and `"time_centered"` to collection of variables to drop,
and fixed typo in `"depthv_bounds"`
(it was `"depthu_bounds"`).

In [7]:
drop_vars = (
    "nav_lon", "bounds_lon", "nav_lat", "bounds_lat", "area", "depthv_bounds",
    "time_centered", "time_centered_bounds", "time_counter_bounds",
)

files = sorted(path.glob("0[12]mar19/SalishSea_1h_*_grid_V.nc"))

mydata = xr.open_mfdataset(files, drop_variables=drop_vars)
v = mydata['vomecrty']

v


|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 2.56 GiB | 1.28 GiB |
| Shape | (48, 40, 898, 398) | (24, 40, 898, 398) |
| Count | 6 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


In [8]:
# calculate barotropic component of u (thickness-weighted depth average)
ut_h = (u * e3u).sum(dim='depthu') / e3u.sum(dim='depthu')

ut_h

|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 65.44 MiB | 32.72 MiB |
| Shape | (48, 898, 398) | (24, 898, 398) |
| Count | 47 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


In [9]:
# calculate barotropic component of v (thickness-weighted depth average)
vt_h = (v * e3v).sum(dim='depthv') / e3v.sum(dim='depthv')

vt_h

|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 65.44 MiB | 32.72 MiB |
| Shape | (48, 898, 398) | (24, 898, 398) |
| Count | 47 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |
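The barotropic calculation for `ut_h` and `vt_h` is a thickness-weighted depth average. A minimal sketch on a single water column with made-up velocities and thicknesses shows how it differs from a plain mean over levels:

```python
import numpy as np

# Made-up values for a single water column with 3 depth levels.
u_col = np.array([0.30, 0.10, -0.20])   # velocity at each level (m/s)
e3u_col = np.array([1.0, 2.0, 5.0])     # thickness of each level (m)

# Thickness-weighted depth average: sum(u * e3u) / sum(e3u)
ut_col = (u_col * e3u_col).sum() / e3u_col.sum()

# Plain mean over levels, for comparison; it ignores how thick each level is.
plain_mean = u_col.mean()

print(ut_col, plain_mean)
```

The thick bottom level dominates the weighted average, so the two numbers even have opposite signs here.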


`u`, `v`, and `e3*` variables get re-used below,
so, to be safe, force the calculations of `ut_h` and `vt_h`.

The default dask threads scheduler gave me the best performance here
when I ran the notebook inside VSCode.
I'm a little surprised at that.
For longer collections of days I would refactor this notebook into
a script and experiment with `.load(scheduler="processes", num_workers=n)`
for various values of `n`.
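A rough sketch of what such a timing experiment could look like is below. It uses a made-up dask array as a stand-in workload and sticks to the threads scheduler so that it runs anywhere; the idea is that the `scheduler` string would be swapped for `"processes"` in a script. The array size, chunking, and worker counts are all arbitrary assumptions.

```python
import time

import dask.array as da

# Made-up workload standing in for the depth-averaging task graph.
data = da.random.random((2000, 2000), chunks=(500, 500))
work = (data * data).sum()

timings = {}
for n in (1, 2, 4):
    start = time.perf_counter()
    # The same keyword arguments can be passed to DataArray.load().
    work.compute(scheduler="threads", num_workers=n)
    timings[n] = time.perf_counter() - start

for n, seconds in timings.items():
    print(f"threads, num_workers={n}: {seconds:.3f} s")
```

Timings from a toy workload like this say little about the real files, where reading from disk is likely a large part of the cost.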

In [10]:
ut_h.load()

ut_h



In [11]:
vt_h.load()

vt_h

Added `"nav_lon"`, `"nav_lat"`, and `"time_centered"` to collection of variables to drop.

In [12]:
# Now get the required data from the daily files
drop_vars = (
    "nav_lon", "bounds_lon", "nav_lat", "bounds_lat", "area", "deptht_bounds", "PAR",
    "time_centered", "time_centered_bounds", "time_counter_bounds", "dissolved_oxygen",
    "sigma_theta", "Fraser_tracer", "dissolved_inorganic_carbon", "total_alkalnity",
)

files = sorted(path.glob("0[12]mar19/SalishSea_1d_*_carp_T.nc"))

mydata = xr.open_mfdataset(files, drop_variables=drop_vars)
e3t = mydata['e3t']

e3t


|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 109.07 MiB | 54.54 MiB |
| Shape | (2, 40, 898, 398) | (1, 40, 898, 398) |
| Count | 6 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


Added `"nav_lon"`, `"nav_lat"`, and `"time_centered"` to collection of variables to drop.

In [13]:
drop_vars = (
    "nav_lon", "bounds_lon", "nav_lat", "bounds_lat", "area", "depthu_bounds",
    "time_centered", "time_centered_bounds", "time_counter_bounds",
)

files = sorted(path.glob("0[12]mar19/SalishSea_1d_*_grid_U.nc"))

mydata = xr.open_mfdataset(files, drop_variables=drop_vars)
u = mydata['vozocrtx']

u


|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 109.07 MiB | 54.54 MiB |
| Shape | (2, 40, 898, 398) | (1, 40, 898, 398) |
| Count | 6 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


Added `"nav_lon"`, `"nav_lat"`, and `"time_centered"` to collection of variables to drop,
and fixed typo in `"depthv_bounds"`
(it was `"depthu_bounds"`).

In [14]:
drop_vars = (
    "nav_lon", "bounds_lon", "nav_lat", "bounds_lat", "area", "depthv_bounds",
    "time_centered", "time_centered_bounds", "time_counter_bounds",
)

files = sorted(path.glob("0[12]mar19/SalishSea_1d_*_grid_V.nc"))

mydata = xr.open_mfdataset(files, drop_variables=drop_vars)
v = mydata['vomecrty']

v


|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 109.07 MiB | 54.54 MiB |
| Shape | (2, 40, 898, 398) | (1, 40, 898, 398) |
| Count | 6 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


In [15]:
# convert e3t to e3u (this cell) and to e3v (next cell)
e3t_xshift = e3t.shift(x=-1, fill_value=0)
e3u = e3t_xshift + e3t
e3u = e3u * 0.5
e3u = e3u.rename({'deptht': 'depthu'})

e3u

|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 109.07 MiB | 54.54 MiB |
| Shape | (2, 40, 898, 398) | (1, 40, 898, 398) |
| Count | 21 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


In [16]:
# same averaging as for e3u, but along y to get e3v
e3t_yshift = e3t.shift(y=-1, fill_value=0)
e3v = e3t_yshift + e3t
e3v = e3v * 0.5
e3v = e3v.rename({'deptht': 'depthv'})

e3v

|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 109.07 MiB | 54.54 MiB |
| Shape | (2, 40, 898, 398) | (1, 40, 898, 398) |
| Count | 21 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


In [17]:
# calculate barotropic component of daily u (thickness-weighted depth average)
ut_d = (u * e3u).sum(dim='depthu') / e3u.sum(dim='depthu')

ut_d

|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 2.73 MiB | 1.36 MiB |
| Shape | (2, 898, 398) | (1, 898, 398) |
| Count | 47 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |


In [18]:
# subtract barotropic component from u to get baroclinic component
uc_d = u - ut_d

uc_d


|  | Array | Chunk |
| --- | --- | --- |
| Bytes | 109.07 MiB | 54.54 MiB |
| Shape | (2, 40, 898, 398) | (1, 40, 898, 398) |
| Count | 53 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |
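One way to sanity check a baroclinic array like `uc_d` is that its thickness-weighted depth average should vanish in every water column. Here is a sketch with plain numpy on two made-up columns; numpy broadcasts the depth-averaged values across the depth axis by position, which is what xarray does by dimension name in the cell above.

```python
import numpy as np

# Made-up velocities and thicknesses, shape (depth, x) = (3, 2).
u_toy = np.array([[0.30, 0.00], [0.10, 0.20], [-0.20, 0.40]])
e3u_toy = np.array([[1.0, 1.0], [2.0, 2.0], [5.0, 1.0]])

# Barotropic part: one value per column, shape (2,).
ut_toy = (u_toy * e3u_toy).sum(axis=0) / e3u_toy.sum(axis=0)

# Baroclinic part: ut_toy is broadcast across the depth axis.
uc_toy = u_toy - ut_toy

# The thickness-weighted depth average of the baroclinic part should be ~0.
residual = (uc_toy * e3u_toy).sum(axis=0) / e3u_toy.sum(axis=0)

print(ut_toy, residual)
```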


Again, force the calculation of `uc_d` so that we can play with a
numpy array below instead of a dask task graph.
Maybe not strictly necessary,
but helpful for development purposes.

In [19]:
uc_d.load()

uc_d



In [20]:
ut_h.shape, uc_d.shape

((48, 898, 398), (2, 40, 898, 398))

Here's the interpolation bit to get from daily to hourly baroclinic.

Figuring out how to use `DataArray.resample()` involves looking at both
the [xarray docs](https://xarray.pydata.org/en/stable/generated/xarray.DataArray.resample.html)
and the [pandas docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html).

`time_counter="1H"` means use the `time_counter` coordinate as the time index
and resample the array to a 1 hour frequency.

`loffset="30min"` shifts the time-base from being on the hour to being on the half-hour
so that the `time_counter` values in the result match those in `ut_h`.

`.interpolate("linear")` causes the resampling to be done by linear interpolation.
That uses the interpolation functions from `scipy`,
making `scipy` an implicit dependency
(i.e. it must be in the conda env).

Because my example is using just 2 days of model output,
the result has 25 `time_counter` values.
They run from `12:30:00` on the first day to `11:30:00` on the second
because the day-averaged array we're interpolating in starts at `12:00:00`
on the first day,
and we have asked for a `"30min"` offset.
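To see what the linear interpolation does numerically, here is a sketch for a single grid point that uses `np.interp` in place of `DataArray.resample()`. The two daily values are made up, times are expressed as hours since 00:00 on the first day, and the target times are my own choice of half-hour stamps lying between the two daily stamps, so the number of points here is not meant to reproduce what `resample()` returns.

```python
import numpy as np

# Made-up daily-average baroclinic values at one grid point (m/s),
# stamped at 12:00 on day 1 and 12:00 on day 2.
t_daily = np.array([12.0, 36.0])
uc_daily = np.array([0.10, 0.34])

# Hourly target times on the half-hour, from 12:30 on day 1
# to 11:30 on day 2.
t_hourly = np.arange(12.5, 36.0, 1.0)

# Straight-line interpolation between the two daily values.
uc_hourly = np.interp(t_hourly, t_daily, uc_daily)

print(len(uc_hourly), uc_hourly[0], uc_hourly[-1])
```

Each hourly value moves 1/24 of the way from the first daily value to the second, so the interpolated series carries no tidal signal of its own.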

In [21]:
uc_h_interp = uc_d.resample(time_counter="1H", loffset="30min").interpolate("linear")

uc_h_interp

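Rather remarkably,
xarray just handles aligning the `time_counter` values
when we add the interpolated hourly baroclinic array to the hourly barotropic one!
It aligns the arrays on their shared coordinate labels before doing the arithmetic,
and broadcasts the barotropic values across the `depthu` dimension
that only the baroclinic array has.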

In [22]:
u_new = ut_h + uc_h_interp

u_new
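A small sketch of that alignment behaviour, using made-up arrays whose `time_counter` values only partly overlap. With xarray's default settings, arithmetic keeps only the labels that both arrays share, so it is worth checking the length of the result rather than assuming it matches the longer input.

```python
import numpy as np
import xarray as xr

hours_bt = np.array(
    ["2019-03-01T11:30", "2019-03-01T12:30", "2019-03-01T13:30"],
    dtype="datetime64[ns]",
)
hours_bc = hours_bt[1:]  # the baroclinic toy array starts one hour later

barotropic = xr.DataArray(
    np.array([1.0, 2.0, 3.0]),
    dims=("time_counter",),
    coords={"time_counter": hours_bt},
)
baroclinic = xr.DataArray(
    np.array([[0.1, -0.1], [0.2, -0.2]]),
    dims=("time_counter", "depthu"),
    coords={"time_counter": hours_bc},
)

# Only the 12:30 and 13:30 labels are shared, so 11:30 is dropped;
# the barotropic values are broadcast across depthu.
total = barotropic + baroclinic
total_values = total.transpose("time_counter", "depthu").values

print(dict(total.sizes))
print(total_values)
```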