![](http://xarray.pydata.org/en/stable/_static/dataset-diagram-logo.png)
# XARRAY BACKEND API TUTORIAL
Aureliana Barghini ([B-Open](https://www.bopen.eu/))


Notebooks available at: 

Xarray backend documentation available at: http://xarray.pydata.org/en/stable/internals/how-to-add-new-backend.html
<br/>

## Introduction 
Xarray can read many file formats. The `engine` argument of `xr.open_dataset` selects which reader to use:
```python
import xarray as xr
xr.open_dataset("my_file.grib", engine="cfgrib")
```

For each available engine there is an underlying backend that reads the data and packs it into a dataset.

[Internal Backends](http://xarray.pydata.org/en/stable/user-guide/io.html):

- netcdf4 - netCDF4
- scipy - netCDF3
- zarr - Zarr
- pydap - DAP
- ...

External backends use the new backend API (xarray >= v0.18.0), which makes it possible to add support for a new backend without any change to Xarray:
- [cfgrib](https://github.com/ecmwf/cfgrib) - GRIB
- [tiledb](https://pythonrepo.com/repo/TileDB-Inc-TileDB-xarray) - TileDB
- [rioxarray](https://corteva.github.io/rioxarray/stable/) - GeoTIFF, JPEG-2000, ESRI-hdr, etc (via GDAL)
- [xarray-sentinel](https://github.com/bopen/xarray-sentinel) - Sentinel-1 SAFE
- ...
<br/>
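
To check which engines are usable in a given environment, `xr.backends.list_engines()` returns the backends Xarray has discovered, both internal ones and external ones registered as plugins. The snippet below is a small sketch; the names it prints depend on which optional packages are installed.

```python
import xarray as xr

# Mapping of engine name -> backend entrypoint object, for every backend
# whose dependencies are importable in the current environment.
engines = xr.backends.list_engines()

for name in sorted(engines):
    print(name)
```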

## Why use the Xarray backend API


- Your users don't need to learn a new interface (that is: they can use `xr.open_dataset`)

- With little extra effort you get lazy loading with Dask:<br/> 
  you only implement a function that reads blocks, and Xarray manages the lazy loading for you

- It's easy to implement: you don't need to integrate any code in Xarray

<br/>

## Backend without lazy loading

#### BackendEntrypoint
Implement a subclass of `BackendEntrypoint` that exposes an `open_dataset` method:

   ```python
    from xarray.backends import BackendEntrypoint

    class MyBackendEntrypoint(BackendEntrypoint):
        def open_dataset(
            self,
            filename_or_obj,
            *,
            drop_variables=None,
        ):
            
            return my_open_dataset(filename_or_obj, drop_variables=drop_variables)

   ```
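
`my_open_dataset` above is left undefined. As a minimal sketch, assuming a hypothetical plain-text format with one number per line, the helper and the entrypoint could fit together as follows. The file format, the variable name `values`, and the dimension name `index` are illustrative assumptions, not part of Xarray.

```python
import os
import tempfile

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


def my_open_dataset(filename_or_obj, *, drop_variables=None):
    # Hypothetical reader: a plain-text file with one number per line.
    values = np.loadtxt(filename_or_obj, ndmin=1)
    ds = xr.Dataset({"values": ("index", values)})
    # xr.open_dataset always forwards drop_variables, so honour it here.
    if drop_variables is not None:
        ds = ds.drop_vars(drop_variables, errors="ignore")
    return ds


class MyBackendEntrypoint(BackendEntrypoint):
    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        return my_open_dataset(filename_or_obj, drop_variables=drop_variables)


# Write a tiny sample file and open it through the backend class.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.txt")
    with open(path, "w") as f:
        f.write("1.0\n2.5\n4.0\n")
    ds = xr.open_dataset(path, engine=MyBackendEntrypoint)
    total = float(ds["values"].sum())
```

Because the helper reads everything eagerly, the dataset stays usable after the temporary file is gone.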

#### BackendEntrypoint integration
Declare this class as an external plugin in your `setup.py`:

```python
    setuptools.setup(
        ...
        entry_points={
            'xarray.backends': ['engine_name=package.module:MyBackendEntrypoint'],
        },
    )

```
Alternatively, pass the class directly to `xr.open_dataset`:

```python
    xr.open_dataset(..., engine=MyBackendEntrypoint)
```
<br/>

## EXAMPLE: Binary Backend

### Sample files
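
The cell that created the sample files is not reproduced here. A minimal sketch that produces two files with the names used in the examples, `foo.bin` and `foo_float.bin`, could look like this. The contents (1000 consecutive numbers) are an assumption chosen for illustration.

```python
import numpy as np

# Assumed contents: 1000 consecutive values, written as raw bytes with no header.
np.arange(1000, dtype=np.int64).tofile("foo.bin")
np.arange(1000, dtype=np.float64).tofile("foo_float.bin")
```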

### BinaryBackend Entrypoint
An example backend that opens flat binary files:

```python
import numpy as np
import xarray as xr


class BinaryBackend(xr.backends.BackendEntrypoint):
    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
        # backend specific parameter
        dtype=np.int64,
    ):
        # Open in binary mode and read the whole file into memory
        with open(filename_or_obj, "rb") as f:
            arr = np.fromfile(f, dtype)

        # One-dimensional variable with a coordinate spaced in steps of 10
        var = xr.Variable(dims=("x",), data=arr)
        coords = {"x": np.arange(arr.size) * 10}
        return xr.Dataset({"foo": var}, coords=coords)
```

### It Works! 

#### But it may be memory demanding

```python
arr = xr.open_dataarray("foo.bin", engine=BinaryBackend)
arr
```

```python
arr = xr.open_dataarray("foo_float.bin", engine=BinaryBackend, dtype=np.float64)
arr
```

```python
arr.sel(x=slice(0, 100))
```
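
The memory cost comes from `np.fromfile` reading the entire file, even when only a slice is selected afterwards. As a rough illustration of the block-reading function mentioned earlier, the sketch below reads only a range of elements using the `count` and `offset` arguments of `np.fromfile` (`offset` needs NumPy 1.17 or later). The helper name `read_block` is hypothetical, and this is not the lazy-loading backend itself.

```python
import os
import tempfile

import numpy as np


def read_block(path, start, stop, dtype=np.int64):
    """Read elements [start, stop) of a flat binary file without loading the rest."""
    itemsize = np.dtype(dtype).itemsize
    # offset is given in bytes, count in number of items
    return np.fromfile(path, dtype=dtype, count=stop - start, offset=start * itemsize)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.bin")
    np.arange(1000, dtype=np.int64).tofile(path)
    block = read_block(path, 10, 15)
    print(block)
```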