# Memory Management

Useful links on dask/zarr/lazy loading:
* [pangeo discussion](https://discourse.pangeo.io/t/processing-large-too-large-for-memory-xarray-datasets-and-writing-to-netcdf/1724/2)

## Set Up

In [1]:
import numba
import numpy as np
import pandas as pd
import holoviews as hv
import xarray as xr
import line_profiler

import clearwater_riverine as cwr

In [2]:
root = './data/sumwere_test_cases/plan28_testTSM'
ras_filepath = f'{root}/clearWaterTestCases.p28.hdf'
initial_condition_path = f'{root}/cwr_initial_conditions_waterTemp_p28.csv'
boundary_condition_path = f'{root}/cwr_boundary_conditions_waterTemp_p28.csv'

In [3]:
%%time
transport_model = cwr.ClearwaterRiverine(ras_filepath, 0.001, verbose=True)

Populating Model Mesh...
Calculating Required Parameters...
CPU times: total: 12.7 s
Wall time: 13.7 s


## Look at the size of the empty array

In [4]:
transport_model.mesh.nbytes * 1e-9

6.6169479440000005

In [5]:
transport_model.mesh.dims

Frozen({'node': 549, 'time': 259201, 'nface': 444, 'nmax_face': 8, 'nedge': 915, '2': 2})

In [6]:
size_dict = {}
for var_name, var_data in transport_model.mesh.variables.items():
    size_dict[var_name] = var_data.nbytes * 1e-9

In [7]:
sorted_dict = dict(sorted(size_dict.items(), key=lambda item: item[1]))

print(sorted_dict)

{'mesh2d': 4e-09, 'faces_surface_area': 1.776e-06, 'face_x': 3.552e-06, 'face_y': 3.552e-06, 'edges_face1': 3.66e-06, 'edges_face2': 3.66e-06, 'edge_length': 3.66e-06, 'node_x': 4.3920000000000005e-06, 'node_y': 4.3920000000000005e-06, 'edge_nodes': 7.32e-06, 'edge_face_connectivity': 7.32e-06, 'face_to_face_dist': 7.32e-06, 'face_nodes': 1.4208e-05, 'time': 0.0020736080000000002, 'dt': 0.0020736080000000002, 'water_surface_elev': 0.460340976, 'volume': 0.460340976, 'edge_velocity': 0.94867566, 'face_flow': 0.94867566, 'advection_coeff': 0.94867566, 'edge_vertical_area': 0.94867566, 'coeff_to_diffusion': 1.89735132}


`edge_velocity`, `face_flow`, `advection_coeff`, `edge_vertical_area`, and `coeff_to_diffusion` are by far the largest variables. They all share the largest dimensions, (time, nedge).

In [8]:
dims_dict = {}
for var_name, var_data in transport_model.mesh.variables.items():
    dims_dict[var_name] = var_data.dims

In [9]:
for key in sorted_dict.keys():
    print(key, dims_dict[key])

mesh2d ()
faces_surface_area ('nface',)
face_x ('nface',)
face_y ('nface',)
edges_face1 ('nedge',)
edges_face2 ('nedge',)
edge_length ('nedge',)
node_x ('node',)
node_y ('node',)
edge_nodes ('nedge', '2')
edge_face_connectivity ('nedge', '2')
face_to_face_dist ('nedge',)
face_nodes ('nface', 'nmax_face')
time ('time',)
dt ('time',)
water_surface_elev ('time', 'nface')
volume ('time', 'nface')
edge_velocity ('time', 'nedge')
face_flow ('time', 'nedge')
advection_coeff ('time', 'nedge')
edge_vertical_area ('time', 'nedge')
coeff_to_diffusion ('time', 'nedge')


It's interesting to note that the coeff_to_diffusion variable is twice as large as the other variables of the same dimensions. Why could this be?

In [10]:
dtype_dict = {}
for var_name, var_data in transport_model.mesh.variables.items():
    dtype_dict[var_name] = var_data.dtype

In [11]:
for key in sorted_dict.keys():
    print(key,  dtype_dict[key], dims_dict[key])

mesh2d int32 ()
faces_surface_area float32 ('nface',)
face_x float64 ('nface',)
face_y float64 ('nface',)
edges_face1 int32 ('nedge',)
edges_face2 int32 ('nedge',)
edge_length float32 ('nedge',)
node_x float64 ('node',)
node_y float64 ('node',)
edge_nodes int32 ('nedge', '2')
edge_face_connectivity int32 ('nedge', '2')
face_to_face_dist float64 ('nedge',)
face_nodes int32 ('nface', 'nmax_face')
time datetime64[ns] ('time',)
dt float64 ('time',)
water_surface_elev float32 ('time', 'nface')
volume float32 ('time', 'nface')
edge_velocity float32 ('time', 'nedge')
face_flow float32 ('time', 'nedge')
advection_coeff float32 ('time', 'nedge')
edge_vertical_area float32 ('time', 'nedge')
coeff_to_diffusion float64 ('time', 'nedge')


`coeff_to_diffusion` is float64, while the other (time, nedge) variables are float32. Each value takes 8 bytes instead of 4, which is why it is twice as large.
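As a quick check of that explanation, the sizes reported above can be reproduced from the dimension lengths and the item size of each dtype. This is only arithmetic on the numbers already printed; `size_gb` is a helper name made up for this sketch.

```python
# Dimension lengths copied from `transport_model.mesh.dims` above.
n_time, n_edge, n_face = 259201, 915, 444


def size_gb(n_rows, n_cols, itemsize):
    """Size in GB (1e9 bytes) of a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize * 1e-9


edge_float32 = size_gb(n_time, n_edge, 4)  # e.g. edge_velocity
edge_float64 = size_gb(n_time, n_edge, 8)  # coeff_to_diffusion
face_float32 = size_gb(n_time, n_face, 4)  # e.g. volume

print(edge_float32, edge_float64, face_float32)
```

The three results match the `nbytes` values of 0.94867566, 1.89735132, and 0.460340976 GB listed earlier.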

We may want to investigate whether to convert this variable to float32 (how much would that impact the mass balance?), but we'll explore lazy loading and writing to disk first.
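For a rough feel of what that conversion would cost, here is a minimal sketch on synthetic data. The array is a random stand-in, not the model's `coeff_to_diffusion`, and the value range is an assumption; it only shows the memory saving and the rounding error introduced by the cast. It says nothing about how that error would propagate into the mass balance. On the real mesh the equivalent cast would be `DataArray.astype("float32")`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a block of a (time, nedge) float64 variable.
# The value range is assumed for illustration only.
coeff64 = rng.uniform(1e-4, 1e-3, size=(1000, 915))
coeff32 = coeff64.astype(np.float32)

# float32 stores 4 bytes per value instead of 8.
memory_ratio = coeff32.nbytes / coeff64.nbytes

# Worst-case relative rounding error introduced by the cast.
max_rel_error = np.max(np.abs(coeff32.astype(np.float64) - coeff64) / coeff64)

print(f"memory ratio: {memory_ratio}")
print(f"max relative error: {max_rel_error:.2e}")
```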

## Save to Zarr

In [12]:
for key in ['face_area_elevation_info',
    'face_area_elevation_values',
    'face_normalunitvector_and_length',
    'face_cell_indexes_df',
    'face_volume_elevation_info',
    'face_volume_elevation_values',
    'boundary_data']:
    del transport_model.mesh.attrs[key]

### Leverage Dask (Writing)

Xarray's documentation on [parallel computing with Dask](https://docs.xarray.dev/en/stable/user-guide/dask.html#reading-and-writing-data) includes information on reading and writing data. Let's see what 'lazy writing' does:

#### Zarr

In [14]:
%%time
ds = transport_model.mesh.to_zarr(
    'temp/output_lazy.zarr',
    compute=False
)

CPU times: total: 30.5 s
Wall time: 24.2 s


In [26]:
ds

Delayed('_finalize_store-7b7308ac-e6ff-4791-b235-156bc589d143')

In [21]:
%%time
for i in range(1000):
    test = ds.coeff_to_diffusion.isel(time=0)
print(test)

Delayed('isel-2cce8327-4874-4607-bb4e-75f6d077b2bb')
CPU times: total: 109 ms
Wall time: 105 ms


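It's really fast, but it doesn't return any data. `ds` is a Dask `Delayed` object representing the pending write, not a Dataset, so attribute access and `isel` only create more delayed tasks.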
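The behaviour of a `Delayed` object can be reproduced in isolation. The sketch below wraps an arbitrary stand-in value (the integer is an assumption; it is not what `to_zarr` actually returns) and shows that chaining attributes and method calls succeeds immediately, while any failure only surfaces once `compute()` runs the graph.

```python
from dask import delayed

# Stand-in for the object returned by `to_zarr(..., compute=False)`.
# The wrapped value is arbitrary and chosen only for illustration.
pending = delayed(0)

# Attribute access and method calls on a Delayed only extend the task graph;
# nothing is evaluated here, so this line is fast and never fails.
selection = pending.coeff_to_diffusion.isel(time=0)
print(selection)

# The graph is evaluated only on compute(), and only then does it become
# clear that the wrapped object has no `coeff_to_diffusion` attribute.
try:
    selection.compute()
    error = None
except AttributeError as exc:
    error = exc
print(repr(error))
```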

In [17]:
%%time
results = ds.compute()

CPU times: total: 15.6 ms
Wall time: 3.2 ms


In [18]:
results

#### NetCDF

In [85]:
for key in ['volume_calculation_required',
           'face_area_calculation_required']:
    del transport_model.mesh.attrs[key]


In [87]:
%%time
delayed_nc =  transport_model.mesh.to_netcdf(
    'temp/output_lazy_nc5.nc',
    compute=False
)

CPU times: total: 719 ms
Wall time: 1.58 s


In [89]:
%%time
for i in range(1000):
    test = delayed_nc.coeff_to_diffusion.isel(time=0)
print(test)

Delayed('isel-79521d8f-1363-41de-b75a-075f26615e3d')
CPU times: total: 250 ms
Wall time: 675 ms


In [91]:
from dask.diagnostics import ProgressBar

In [92]:
%%time
with ProgressBar():
    results_nc = delayed_nc.compute()

[########################################] | 100% Completed | 28.56 s
CPU times: total: 26 s
Wall time: 30.6 s


In [94]:
results_nc

In [95]:
delayed_nc

Delayed('_finalize_store-b072a9ab-099c-4b16-b593-1bfdd05a9a9c')

In [34]:
transport_model.mesh

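The output has been written, but the mesh still appears to be loaded in memory, which is not the behavior I expected or wanted. Writing to disk does not change `transport_model.mesh` itself, so to work lazily the data has to be reopened from disk.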

## Load from Disk

In [47]:
output_lazy = xr.open_zarr('temp/output_lazy.zarr')

Confirm it is lazy loaded:

In [48]:
output_lazy

[Dataset repr: every array is Dask-backed. Distinct array layouts shown:]

| Shape | Chunk shape | Data type | Array bytes | Chunk bytes | Chunks |
|---|---|---|---|---|---|
| (444,) | (444,) | float64 | 3.47 kiB | 3.47 kiB | 1 |
| (549,) | (549,) | float64 | 4.29 kiB | 4.29 kiB | 1 |
| (259201, 915) | (8101, 58) | float32 | 904.73 MiB | 1.79 MiB | 512 |
| (259201, 915) | (8101, 58) | float64 | 1.77 GiB | 3.58 MiB | 512 |
| (259201,) | (32401,) | float64 | 1.98 MiB | 253.13 kiB | 8 |
| (915, 2) | (915, 2) | int32 | 7.15 kiB | 7.15 kiB | 1 |
| (915,) | (915,) | float32 | 3.57 kiB | 3.57 kiB | 1 |
| (915,) | (915,) | int32 | 3.57 kiB | 3.57 kiB | 1 |
| (444, 8) | (444, 8) | float64 | 27.75 kiB | 27.75 kiB | 1 |
| (915,) | (915,) | float64 | 7.15 kiB | 7.15 kiB | 1 |
| (444,) | (444,) | float32 | 1.73 kiB | 1.73 kiB | 1 |
| (259201, 444) | (16201, 28) | float32 | 439.02 MiB | 1.73 MiB | 256 |

Below, we compare the time to select the values for one time step from `output_lazy`, first lazily and then actually computing the values. The lazy selection is much faster because it only builds the Dask graph. Computing the values appears to be particularly slow because of the default chunking scheme, which also chunks over space (we don't actually want that).
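To see why chunking over space hurts, here is a back-of-the-envelope sketch using the shape and chunk shape that `output_lazy` reports for the float64 (time, nedge) array. It assumes every chunk touched has to be read and decoded in full, so it is an upper bound rather than a measurement; `chunks_for_one_timestep` is a helper name invented for this example.

```python
import math


def chunks_for_one_timestep(shape, chunks, itemsize):
    """Return (chunks touched, bytes decoded) when reading one time step.

    Assumes each touched chunk is decoded in full.
    """
    _, n_space = shape
    chunk_time, chunk_space = chunks
    n_chunks = math.ceil(n_space / chunk_space)
    bytes_decoded = n_chunks * chunk_time * chunk_space * itemsize
    return n_chunks, bytes_decoded


# Shape and default chunk shape of the float64 (time, nedge) variable.
n_chunks, bytes_decoded = chunks_for_one_timestep((259201, 915), (8101, 58), 8)

# Bytes actually wanted: one row of 915 float64 values.
bytes_needed = 915 * 8

print(n_chunks, bytes_decoded, bytes_needed, bytes_decoded // bytes_needed)
```

Under these assumptions, one time step touches 16 chunks and decodes roughly 60 MB to obtain about 7 kB of values, and the loop below repeats that for every step unless some cache intervenes.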

### Lazy

In [51]:
%%time
for i in range(1000):
    test = output_lazy.coeff_to_diffusion.isel(time=i)
print(test)

<xarray.DataArray 'coeff_to_diffusion' (nedge: 915)>
dask.array<getitem, shape=(915,), dtype=float64, chunksize=(58,), chunktype=numpy.ndarray>
Coordinates:
    time     datetime64[ns] 2022-05-13T00:16:39
Dimensions without coordinates: nedge
CPU times: total: 484 ms
Wall time: 496 ms


### Loaded
Extremely slow!

In [52]:
%%time
for i in range(1000):
    test = output_lazy.coeff_to_diffusion.isel(time=i).values
print(test)

[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 3.27785524e-04 7.81648068e-04 7.97152426e-04 7.95916002e-04
 3.28834134e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 2.65362328e-04 7.56626995e-04
 2.82070062e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 2.55365409e-04 7.56930374e-04 9.50877188e-04
 2.03335962e-04 0.00000000e+00 3.73600383e-04 3.48241723e-04
 7.85126071e-04 3.46855845e-04 0.00000000e+00 0.00000000e+00
 3.45408074e-04 7.86248595e-04 2.94555506e-04 0.00000000e+00
 0.00000000e+00 3.00009086e-04 0.00000000e+00 0.00000000e+00
 0.00000000e+00 3.34100227e-04 3.31305413e-04 7.90870562e-04
 7.89616257e-04 3.30340261e-04 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.000000

## Baseline Comparison (in memory)

It's much faster to just use the mesh stored in memory.

In [53]:
%%time
for i in range(1000):
    test = transport_model.mesh.coeff_to_diffusion.isel(time=i).values

CPU times: total: 3.11 s
Wall time: 3.15 s


## Improve with Chunking?

Chunk along time only, because for most use cases with this model we need all of the spatial information, but only at a given time step. This is worse than the baseline of reading from memory, but far better than the default chunking scheme.

In [54]:
for var_name, var_data in transport_model.mesh.variables.items():
    if set(var_data.dims) == {'time', 'nedge'}:
        print(var_name)
        transport_model.mesh[var_name] = transport_model.mesh[var_name].chunk({'time': 1000})

edge_velocity
face_flow
advection_coeff
edge_vertical_area
coeff_to_diffusion


In [55]:
transport_model.mesh.to_zarr(
    'temp/output_chunks.zarr',
)

<xarray.backends.zarr.ZarrStore at 0x159e1173ae0>

In [56]:
output_chunks = xr.open_zarr('temp/output_chunks.zarr')

### Lazy

In [60]:
%%time
for i in range(1000):
    t = output_chunks.isel(time=i)

CPU times: total: 2.77 s
Wall time: 2.8 s


### Loaded
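Note that in this instance, loading 1000 time slices takes basically the same time as loading 1 time slice (due to the chunking scheme). One caveat: `output_chunks` is a Dataset, and `Dataset.values` is the dict-style method rather than an array property, so the `.values` cells below may only be timing lazy indexing. Selecting a variable first, as in the `coeff_to_diffusion` cells above, would force a load. Also note that `slice(0, 999)` selects 999 time steps, and the second cell repeats that selection on every pass of the loop.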

In [70]:
%%time
for i in range(1000):
    t = output_chunks.isel(time=i).values

CPU times: total: 3.03 s
Wall time: 3.19 s


In [68]:
%%time
for i in range(1000):
    t = output_chunks.isel(time=slice(0,999)).values

CPU times: total: 2.34 s
Wall time: 2.34 s
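If loading a whole chunk costs about the same as loading one time step, one option is to load a block of time steps once and then loop over it in memory. The sketch below shows that access pattern on a small NumPy stand-in; `iter_time_blocks` and the array are hypothetical, and with the Zarr store the load line would instead select a variable and call something like `.isel(time=block).values`.

```python
import numpy as np


def iter_time_blocks(n_time, block_size):
    """Yield slices covering range(n_time) in blocks of block_size steps."""
    for start in range(0, n_time, block_size):
        yield slice(start, min(start + block_size, n_time))


# Small in-memory stand-in for a (time, nedge) variable.
data = np.arange(2500 * 4, dtype=np.float32).reshape(2500, 4)

steps_seen = 0
for block in iter_time_blocks(data.shape[0], 1000):
    loaded = data[block]  # one load per block instead of one per time step
    for row in loaded:    # per-step work now runs on in-memory data
        steps_seen += 1

print(steps_seen)
```

Matching the block size to the chunk size along time (1000 here) means each stored chunk is read once.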
