# Testing Memory Usage of Dask

## ESGF Data

* http://esgf3.dkrz.de/thredds/fileServer/cmip6/CMIP/MPI-M/MPI-ESM1-2-HR/historical/r1i1p1f1/day/tas/gn/v20190710/tas_day_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_19000101-19041231.nc

## xarray

* http://xarray.pydata.org/en/stable/user-guide/dask.html
* http://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/

## Dask

* https://blog.dask.org/2021/03/11/dask_memory_usage
* https://docs.dask.org/en/latest/array-chunks.html
* https://docs.dask.org/en/latest/dataframe-best-practices.html#repartition-to-reduce-overhead
* https://coiled.io/tackling-unmanaged-memory-with-dask/
* https://docs.dask.org/en/latest/diagnostics-distributed.html

## Bugs

* https://github.com/pydata/xarray/issues/3781
* https://github.com/pydata/xarray/issues/3401

## Memory profiler

* https://pypi.org/project/dask_memusage/
* https://pypi.org/project/filprofiler/
* https://pypi.org/project/memory-profiler/

In [None]:
# https://pythonspeed.com/products/filmemoryprofiler/#profiling-in-jupyter
#%load_ext filprofiler

In [None]:
import xarray as xr

#xr.set_options(display_style='html')
%matplotlib inline
#%config InlineBackend.figure_format = 'retina' 

from bokeh.plotting import show
from bokeh.io import output_notebook

output_notebook()

In [None]:
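import dask

# https://docs.dask.org/en/latest/setup/single-machine.html
# The single-threaded scheduler runs tasks one at a time in the calling thread.
# Swap in one of the commented alternatives to compare schedulers.
dask.config.set(
    scheduler='single-threaded'
    # scheduler='threads'
    # scheduler='processes'
)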

In [None]:
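# http://xarray.pydata.org/en/stable/user-guide/dask.html
# Passing `chunks` makes xarray load the variables lazily as dask arrays.
ds = xr.open_dataset(
    "data/tas_day_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_19000101-19041231.nc",
    # chunks={"time": "auto"}  # let dask pick the chunk length along time
    chunks={"time": 1}  # one chunk per time step
)
ds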

In [None]:
ds = ds.unify_chunks()
ds.chunks
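
The chunk layout shown by `ds.chunks` determines how many pieces dask has to schedule and how large each one is in memory. A rough back-of-the-envelope sketch follows. The grid size (192 x 384) and the 4-byte float32 item size are assumptions made for illustration and should be replaced with the values reported by `ds.sizes` and `ds.tas.dtype`; the helper names `chunk_nbytes` and `n_chunks` are hypothetical. The number of days also assumes a standard Gregorian calendar for the 1900-1904 range in the file name.

```python
from datetime import date
from math import ceil, prod


def chunk_nbytes(chunk_shape, itemsize=4):
    """Bytes held by one chunk; itemsize=4 assumes float32 data."""
    return prod(chunk_shape) * itemsize


def n_chunks(dim_len, chunk_len):
    """Number of chunks needed to cover a dimension of length dim_len."""
    return ceil(dim_len / chunk_len)


# Daily steps from 1900-01-01 to 1904-12-31 (assumes a Gregorian calendar).
n_days = (date(1904, 12, 31) - date(1900, 1, 1)).days + 1

# ASSUMED grid size, for illustration only; read the real one from ds.sizes.
ASSUMED_LAT, ASSUMED_LON = 192, 384

per_chunk = chunk_nbytes((1, ASSUMED_LAT, ASSUMED_LON))
print(f"{n_chunks(n_days, 1)} chunks of {per_chunk / 1e6:.2f} MB each")
```

With `chunks={"time": 1}` each chunk is small, but there is one per time step, so scheduler overhead rather than chunk size may dominate; trying a larger chunk length in `n_chunks` and `chunk_nbytes` shows the trade-off.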

In [None]:
# ds.tas.sel(time='1900-01-01').squeeze().plot()

In [None]:
delayed_obj = ds.to_netcdf("out.nc", compute=False)

In [None]:
# https://docs.dask.org/en/latest/diagnostics-local.html
from dask.diagnostics import ProgressBar, ResourceProfiler

# ResourceProfiler samples memory and CPU while the graph runs (requires psutil).
with ProgressBar(), ResourceProfiler() as rprof:
    delayed_obj.compute()


In [None]:
show(rprof.visualize())
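
Besides the plot, the profiler samples can be summarized numerically. The sketch below assumes that each entry in `rprof.results` is a record with `time`, `mem`, and `cpu` fields, with `mem` reported in MB; check this against the installed dask version. The `Sample` tuple, the helper names `peak_memory` and `memory_growth`, and the demo values are hypothetical stand-ins so the block runs without dask.

```python
from collections import namedtuple

# Stand-in for the profiler's records; ASSUMED to match the (time, mem, cpu) layout.
Sample = namedtuple("Sample", ["time", "mem", "cpu"])


def peak_memory(results):
    """Largest memory sample in the list (same unit as the samples)."""
    if not results:
        raise ValueError("no profiler samples recorded")
    return max(r.mem for r in results)


def memory_growth(results):
    """Peak memory minus the first sample, i.e. growth during the run."""
    if not results:
        raise ValueError("no profiler samples recorded")
    return peak_memory(results) - results[0].mem


# Made-up demo values, not measurements from this notebook.
demo = [Sample(0.0, 100.0, 5.0), Sample(1.0, 250.5, 80.0), Sample(2.0, 180.0, 40.0)]
print(f"peak: {peak_memory(demo)} MB, growth: {memory_growth(demo)} MB")
```

In the notebook, calling `peak_memory(rprof.results)` after the profiled `compute()` would give a single number to compare across schedulers and chunk sizes.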