# Example: Opening large local datasets

```{note}
The online laboratory has only been tested in recent Firefox and Chrome browsers. Some features may not (yet) be supported in Safari browsers.
```

```{caution}
In the online laboratory, changes to notebooks and local files are only saved in your web browser's storage and not persisted to disk.

Please download copies of any files that you don't want to lose.

Your files from an old session will usually be kept if you close or refresh this page, unless your browser's storage for `lab.climet.eu` is cleared, e.g.
- manually by clearing the browser's site data
- automatically when too much data is stored
- automatically when you close a private browsing context
- if you have set up your browser to clear site data, e.g. when the browser is closed
```

In [1]:
import sys
sys.path.insert(0, "..")

In [2]:
import utils

## Motivation

The online laboratory operates within a memory-constrained environment. Therefore, downloading large datasets into the lab is often not possible.

If the data is stored remotely, e.g. because it exceeds the size of your machine's working memory or even its file storage, [`02-remote.ipynb`](02-remote.ipynb) shows you how to open the remote data and stream it in as needed.

However, if you already have the dataset stored in your local filesystem, mounting the local file into the online laboratory is the preferred option. This approach is explored in this notebook.

Note that you only need to use this approach when running notebooks in the online laboratory on <https://lab.climet.eu>. If you are running notebooks locally, you can simply `open()` the local file directly.
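A notebook that should work in both places could branch on where it is running. The sketch below assumes that the online laboratory runs on Pyodide (as the `[pyodide]` log lines in this notebook suggest), which reports `sys.platform == "emscripten"`; `running_in_browser` is a hypothetical helper and not part of `utils`.

```python
import sys


def running_in_browser() -> bool:
    """Return True when running on Pyodide (Python compiled to WebAssembly).

    Assumption: the online laboratory runs on Pyodide, which reports
    sys.platform == "emscripten"; a local CPython install does not.
    """
    return sys.platform == "emscripten"


if running_in_browser():
    print("Online laboratory: mount the local file first")
else:
    print("Local notebook: open() the local file directly")
```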

## Mounting a local file into the laboratory

Mounting a local file might seem similar to uploading it. However, there are two important differences:

1. Mounting does not copy any data and does not read the file into memory, thus allowing arbitrarily large files to be made accessible.
2. A mounted file never leaves your machine and is not uploaded to any server. This is especially important if your data contains sensitive information.

Keep in mind that large files can still only be read if the algorithm that processes them supports streaming or chunking, i.e. if it does not try to load all of the data into memory at once.
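To illustrate what streaming means here, the following standard-library-only sketch computes a SHA-256 checksum of a file while holding at most `chunk_size` bytes in memory at a time. The function name `sha256_in_chunks` and the 1 MiB default chunk size are illustrative choices, not part of the laboratory's tooling.

```python
import hashlib
import os
import tempfile


def sha256_in_chunks(path, chunk_size=2**20):
    """Hash a file while holding at most ``chunk_size`` bytes in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read() returns b"" at the end of the file, which ends the loop
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


# Usage sketch: a small temporary file stands in for a large mounted dataset
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789" * 1000)
demo_digest = sha256_in_chunks(tmp.name, chunk_size=64)
os.remove(tmp.name)
```

For a mounted file you would call `sha256_in_chunks(upload_path)`; the memory used stays bounded by the chunk size, no matter how large the file is.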

In [3]:
upload_path = await utils.mount_user_local_file()
upload_path

[pyodide]: Loading ipyfilite, ipywidgets, jupyterlab_widgets, widgetsnbextension
[pyodide]: Loaded ipyfilite, ipywidgets, jupyterlab_widgets, widgetsnbextension



PosixPath('/uploads/18d1e70d-7d05-40ed-91ea-81958220885a/03-t2m.nc')
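The mounted file is returned as a regular `pathlib` path, so its metadata can be inspected without reading any of its data. This sketch assumes that the mount reports the file size through `stat()` like an ordinary file; `describe_file` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def describe_file(path):
    """Summarise a file from its metadata only; no file contents are read."""
    path = Path(path)
    size_mib = path.stat().st_size / 2**20
    return f"{path.name}: {size_mib:.1f} MiB ({path.suffix or 'no extension'})"


# Usage sketch with a 1 MiB placeholder file; in the lab: describe_file(upload_path)
with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as tmp:
    tmp.write(b"\0" * 2**20)
summary = describe_file(tmp.name)
os.remove(tmp.name)
```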

## Loading the file into `xarray`

In [4]:
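# Import the I/O backends explicitly so that the online laboratory loads them
# and the GRIB, netCDF and Zarr backends are available to xarray
import cfgrib
import netCDF4
import zarr

import xarray as xr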

[pyodide]: Loading asciitree, attrs, cffi, cfgrib, cftime, click, eccodes, findlibs, netCDF4, numcodecs, numpy, pandas, pycparser, python-dateutil, pytz, six, tzdata, xarray, zarr
[pyodide]: Loaded asciitree, attrs, cffi, cfgrib, cftime, click, eccodes, findlibs, netCDF4, numcodecs, numpy, pandas, pycparser, python-dateutil, pytz, six, tzdata, xarray, zarr
[pyodide]: Loading pyarrow, pyodide-unix-timezones
[pyodide]: Loaded pyarrow, pyodide-unix-timezones
[pyodide]: Loading cloudpickle
[pyodide]: Loaded cloudpickle
[pyodide]: Loading PyYAML, dask, fsspec, locket, partd, toolz
[pyodide]: Loaded PyYAML, dask, fsspec, locket, partd, toolz
[pyodide]: Loading msgpack
[pyodide]: Loaded msgpack
[pyodide]: Memory usage has grown to 154.8MiB (from 49.9MiB) for this notebook
[pyodide]: Loaded 78 new dynamic libraries (84 total for this notebook)

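`xarray` normally picks a suitable backend automatically. If that ever fails for a mounted file, you can pass `engine=` explicitly. The sketch below maps file extensions to the `xarray` engine names of the backends imported above (`netcdf4`, `cfgrib`, `zarr`); the mapping and the name `guess_engine` are assumptions for illustration, and returning `None` leaves the choice to `xarray`'s automatic detection.

```python
from pathlib import Path

# Hypothetical mapping from file extensions to xarray backend engine names
ENGINE_BY_SUFFIX = {
    ".nc": "netcdf4",
    ".nc4": "netcdf4",
    ".grib": "cfgrib",
    ".grib2": "cfgrib",
    ".grb": "cfgrib",
    ".grb2": "cfgrib",
    ".zarr": "zarr",
}


def guess_engine(path):
    """Return an xarray engine name for ``path``, or None to let xarray decide."""
    return ENGINE_BY_SUFFIX.get(Path(path).suffix.lower())
```

In the lab this could be used as `xr.open_dataset(upload_path, engine=guess_engine(upload_path))`.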

Finally, we can load the data into `xarray` as usual.

When opening a GRIB dataset, `cfgrib` looks for or creates an index file for the dataset. Since we have mounted the local GRIB file as read-only, however, `cfgrib` is unable to create the index file at its usual location and will fail with a cryptic error. You can either disable the generation of an index file using

```python
xr.open_dataset(dataset_path, backend_kwargs=dict(indexpath=""))
```

or provide an explicit index path in a writable location instead, e.g.

```python
from pathlib import Path

xr.open_dataset(dataset_path, backend_kwargs=dict(
    indexpath=f"./{Path(dataset_path).name}.{{short_hash}}.idx",
))
```

The `utils.open_dataset(..)` helper function uses the first strategy and automatically disables the generation of an index file.
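Both strategies can be wrapped in a small helper. This is a hypothetical sketch, not the actual implementation of `utils.open_dataset(..)`: passing `index_dir=None` disables the index file as in the first strategy, while any other value places the index file in that (assumed writable) directory as in the second.

```python
from pathlib import Path


def cfgrib_backend_kwargs(dataset_path, index_dir=None):
    """Build backend_kwargs for opening a read-only GRIB file with cfgrib.

    index_dir=None disables the index file (indexpath=""); otherwise the
    index file is written to the writable directory index_dir.
    """
    if index_dir is None:
        return dict(indexpath="")
    name = Path(dataset_path).name
    # cfgrib fills in {short_hash} itself, so it must stay a literal placeholder
    return dict(indexpath=str(Path(index_dir) / f"{name}.{{short_hash}}.idx"))
```

For example: `xr.open_dataset(dataset_path, engine="cfgrib", backend_kwargs=cfgrib_backend_kwargs(dataset_path, index_dir="."))`.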

In [5]:
ds = xr.open_dataset(upload_path)
ds

[pyodide]: Loading h5netcdf, h5py
[pyodide]: Loaded h5netcdf, h5py
[pyodide]: Loading scipy
[pyodide]: Loaded scipy
[pyodide]: Loading Pint, flexcache, flexparser, platformdirs, typing_extensions
[pyodide]: Loaded Pint, flexcache, flexparser, platformdirs, typing_extensions
[pyodide]: Loading future, uncertainties
[pyodide]: Loaded future, uncertainties
[pyodide]: Loading Jinja2, MarkupSafe
[pyodide]: Loaded Jinja2, MarkupSafe


[pyodide]: Memory usage has grown to 185.8MiB (from 154.8MiB) for this notebook
[pyodide]: Loaded 39 new dynamic libraries (123 total for this notebook)
