# TileDB Backend for xarray

## About this Example

### What it shows

This example shows basic usage of the TileDB backend for opening TileDB arrays in xarray.

### Set-up Requirements
This example requires `tiledb-cf` to be installed and uses the `tiledb`, `xarray`, and `numpy` libraries. 


In [None]:
import tiledb
import xarray as xr
import numpy as np

In [None]:
# Set names for the output generated by the example.
output_dir = "output/tiledb-xarray-basics"
uri1 = f"{output_dir}/example1"
uri2 = f"{output_dir}/example2"
uri3 = f"{output_dir}/example3"

In [None]:
# Reset output folder
import os
import shutil

shutil.rmtree(output_dir, ignore_errors=True)
os.makedirs(output_dir, exist_ok=True)  # also creates the parent "output" folder

## Example 1. Opening a dense array

The TileDB-xarray backend supports opening dense arrays in xarray. Integer TileDB dimensions whose domain starts at `0` are treated as NetCDF-like dimensions. Dimensions whose domain starts at a different value, or whose domain is not an integer type, are treated like NetCDF coordinates.
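The rule above can be sketched as a small predicate. This is an illustration only, assuming the rule is exactly as stated here; `classify_tiledb_dim` is a hypothetical helper that is not part of `tiledb-cf`, and the checks the backend actually performs may differ.

```python
import numpy as np


def classify_tiledb_dim(domain_start, dtype):
    """Hypothetical helper: guess how a TileDB dimension appears in xarray.

    Assumes the rule stated in the text: integer dimensions whose domain
    starts at 0 act as plain dimensions; everything else is a coordinate.
    """
    is_integer = np.issubdtype(np.dtype(dtype), np.integer)
    if is_integer and domain_start == 0:
        return "dimension"
    return "coordinate"


# An integer dimension with a domain starting at 0.
print(classify_tiledb_dim(0, np.uint64))
# An integer dimension with a domain starting at 1.
print(classify_tiledb_dim(1, np.uint64))
# A datetime dimension (non-integer domain).
print(classify_tiledb_dim(np.datetime64("2000-01-01"), "datetime64[D]"))
```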

In this example, we create an array with the following properties:

Dimensions:

| Name | Domain   | Data Type |
|:----:|:---------|:----------|
| x    | (0, 99)  | uint64    |
| y    | (0, 249) | uint64    |
| time | (1, 36)  | uint64    |

Attributes:

| Name    | Data Type | Details                        |
|:-------:|:----------|:-------------------------------|
| ripple1 | float64   | sin(t * (x^2 + y^2)) / (t + 1) |
| ripple2 | float64   | cos(t * (x^2 + y^2)) / (t + 1) |

Here, xarray will open `x` and `y` as dimensions, `time` as a coordinate, and `ripple1` and `ripple2` as variables.

To assign xarray attributes (metadata) to variables and coordinates, we add the prefix `__tiledb_attr.{attr_name}.` or `__tiledb_dim.{dim_name}.` before the TileDB metadata key.
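As a minimal sketch of this naming convention, the key strings can be built with two small functions. `attr_meta_key` and `dim_meta_key` are hypothetical helpers introduced only for illustration, and a plain dictionary stands in for `array.meta`.

```python
def attr_meta_key(attr_name, key):
    """Hypothetical helper: metadata key for a TileDB attribute."""
    return f"__tiledb_attr.{attr_name}.{key}"


def dim_meta_key(dim_name, key):
    """Hypothetical helper: metadata key for a TileDB dimension."""
    return f"__tiledb_dim.{dim_name}.{key}"


# A plain dict standing in for `array.meta` (assumption for illustration).
array_meta = {
    attr_meta_key("ripple1", "description"): "sin(t * (x^2 + y^2)) / (t + 1)",
    dim_meta_key("time", "description"): "time in seconds",
    "description": "Small example dense array",  # unprefixed key
}

for name in array_meta:
    print(name)
```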

In [None]:
x_size = 100
y_size = 250
t_size = 36
schema = tiledb.ArraySchema(
    domain=tiledb.Domain(
        tiledb.Dim("x", domain=(0, x_size - 1), dtype=np.uint64),
        tiledb.Dim("y", domain=(0, y_size - 1), dtype=np.uint64),
        tiledb.Dim("time", domain=(1, t_size), dtype=np.uint64),
    ),
    attrs=(
        tiledb.Attr("ripple1", np.float64),
        tiledb.Attr("ripple2", np.float64),
    ),
)
tiledb.Array.create(uri1, schema)
with tiledb.open(uri1, mode="w") as array:
    # np.fromfunction passes zero-based index arrays, so `t` below runs
    # from 0 to t_size - 1 rather than over the domain values 1 to t_size.
    array[:, :, :] = {
        "ripple1": np.fromfunction(
            lambda x, y, t: np.sin(t * (x**2 + y**2)) / (t + 1),
            (x_size, y_size, t_size),
        ),
        "ripple2": np.fromfunction(
            lambda x, y, t: np.cos(t * (x**2 + y**2)) / (t + 1),
            (x_size, y_size, t_size),
        ),
    }
    array.meta["__tiledb_attr.ripple1.description"] = "sin(t * (x^2 + y^2)) / (t + 1)"
    array.meta["__tiledb_attr.ripple2.description"] = "cos(t * (x^2 + y^2)) / (t + 1)"
    array.meta["__tiledb_dim.time.description"] = "time in seconds"
    array.meta["description"] = "Small example dense array"

The TileDB array is opened with xarray using the `tiledb` engine, which lets xarray access the data through its standard lazy loading. Once the dataset is open, we can access and slice the data using standard xarray capabilities.

In [None]:
ds = xr.open_dataset(uri1, engine="tiledb")
ds
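Slicing works the same way as on any other xarray dataset. The sketch below uses a small in-memory stand-in with the same layout as Example 1, so it runs without a TileDB array on disk; `demo` and its sizes are assumptions made for illustration.

```python
import numpy as np
import xarray as xr

# In-memory stand-in with the same layout as Example 1 (smaller sizes).
demo = xr.Dataset(
    data_vars={
        "ripple1": (("x", "y", "time"), np.zeros((4, 5, 3))),
    },
    coords={"time": [1, 2, 3]},
)

# Positional selection on `x`, a dimension without coordinate labels.
first_rows = demo["ripple1"].isel(x=slice(0, 2))

# Label-based selection on the `time` coordinate.
at_time_2 = demo["ripple1"].sel(time=2)

print(first_rows.shape)
print(at_time_2.dims)
```

Here `isel` selects by integer position, while `sel` selects by coordinate label, which is why it applies to `time` but not to `x` or `y`.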

## Example 2. Handling Coordinates

This is a basic example of how to handle "coordinates" (an xarray variable and dimension with the same name).

In TileDB, an attribute and a dimension in the same array cannot share a name. To handle this, the suffixes `.data` and `.index` are stripped from the names of TileDB attributes and dimensions. In this example, we create a coordinate for a one-dimensional TileDB array with the following properties:

Dimensions:

| Name | Domain   | Data Type |
|:----:|:---------|:----------|
| x    | (0, 63)  | int64     |


Attributes:

| Name   | Data Type | Details                              |
|:------:|:----------|:-------------------------------------|
| x.data | float64   | evenly spaced grid points in [-1, 1] |
| y      | float64   | exp(-x / 2)                          |

Here, xarray will combine the TileDB dimension `x` and TileDB attribute `x.data` into a coordinate `x`. The attribute `y` will be opened as a variable.
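The name matching can be sketched as a plain string operation. `xarray_name` is a hypothetical helper that only mirrors the suffix rule described above; it is not the backend's own code.

```python
def xarray_name(tiledb_name):
    """Hypothetical helper: drop a trailing `.data` or `.index` suffix."""
    for suffix in (".data", ".index"):
        if tiledb_name.endswith(suffix):
            return tiledb_name[: -len(suffix)]
    return tiledb_name


# The dimension `x` and the attribute `x.data` map to the same name,
# which is what lets them be combined into one coordinate.
for name in ("x", "x.data", "y"):
    print(name, "->", xarray_name(name))
```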

In [None]:
schema = tiledb.ArraySchema(
    domain=tiledb.Domain(tiledb.Dim("x", domain=(0, 63), dtype=np.int64)),
    attrs=[
        tiledb.Attr("x.data", np.float64),
        tiledb.Attr("y", np.float64),
    ]
)
tiledb.Array.create(uri2, schema)
x_values = np.linspace(-1.0, 1.0, 64)
with tiledb.open(uri2, mode="w") as array:
    array[:] = {
        "x.data": x_values,
        "y": np.exp(- x_values / 2.0)
    }

In [None]:
ds2 = xr.open_dataset(uri2, engine="tiledb")
ds2

In [None]:
ds2.plot.scatter(x="x", y="y")

## Example 3. Mapping a Dense Datetime Dimension to a Coordinate

TileDB dense arrays support datetime dimensions. When opening a dense TileDB array with datetime dimensions in xarray, each datetime dimension is mapped to an xarray coordinate.



In this example, we create an array with the following properties:

Dimensions:

| Name | Domain                   | Data Type |
|:----:|:-------------------------|:----------|
| date | (2000-01-01, 2000-01-16) | Day       |


Attributes:

| Name         | Data Type | Details                         |
|:------------:|:----------|:--------------------------------|
| random_value | float64   | uniform random values in [0, 1) |

Here, xarray will open the TileDB dimension `date` as a coordinate and the attribute `random_value` as a variable.
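The labels implied by this inclusive, day-resolution domain can be sketched with NumPy alone. This only illustrates the expected date values and is not how the backend builds the coordinate.

```python
import numpy as np

start = np.datetime64("2000-01-01", "D")
stop = np.datetime64("2000-01-16", "D")

# The TileDB domain is inclusive, so extend the half-open range by one day.
dates = np.arange(start, stop + np.timedelta64(1, "D"))

print(len(dates))
print(dates[0], dates[-1])
```

The domain covers 16 days, which matches the 16 values written to `random_value` in the next cell.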

In [None]:
schema = tiledb.ArraySchema(
    domain=tiledb.Domain(
        tiledb.Dim(
            name="date",
            domain=(np.datetime64("2000-01-01"), np.datetime64("2000-01-16")),
            tile=np.timedelta64(16, "D"),
            dtype=np.datetime64("", "D"),
        ),
    ),
    attrs=[tiledb.Attr(name="random_value", dtype=np.float64)],
)
tiledb.Array.create(uri3, schema)
with tiledb.DenseArray(uri3, mode="w") as array:
    array[:] = {"random_value": np.random.random((16,))}

In [None]:
ds3 = xr.open_dataset(uri3, engine="tiledb")
ds3

In [None]:
ds3.random_value.plot()