# Introduction to Sparse TileDB Arrays

## About this tutorial

This tutorial is a simple introduction to creating, writing, and reading sparse TileDB arrays.

## Resources

* [TileDB Embedded Docs: API Usage](https://docs.tiledb.com/main/solutions/tiledb-embedded/api-usage)
* [TileDB-Py API Docs](https://tiledb-inc-tiledb.readthedocs-hosted.com/projects/tiledb-py/en/stable/python-api.html#)
* [TileDB-Py Examples](https://github.com/TileDB-Inc/TileDB-Py/tree/dev/examples)

In [1]:
import tiledb
import numpy as np

sparse_array_uri = "arrays/sparse"

In [2]:
import shutil

# Clean up any previous runs; ignore the error if the array does not exist yet
shutil.rmtree(sparse_array_uri, ignore_errors=True)

<a id="sparse"></a>
## Sparse arrays

A TileDB sparse array does not require a value for every cell. Before writing any data, first define the schema of the array. The only difference from the dense case is that you now pass `sparse=True` to `tiledb.ArraySchema` (the default is `False`):

In [3]:
rows = tiledb.Dim(name="rows", domain=(1, 4), tile=4, dtype=np.int32)
cols = tiledb.Dim(name="cols", domain=(1, 4), tile=4, dtype=np.int32)

dom = tiledb.Domain(rows, cols)
attr_a = tiledb.Attr(name="a", dtype=np.int32)
attr_b = tiledb.Attr(name="b", dtype=np.float64)

schema = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[attr_a, attr_b])
print(schema)

ArraySchema(
  domain=Domain(*[
    Dim(name='rows', domain=(1, 4), tile='4', dtype='int32'),
    Dim(name='cols', domain=(1, 4), tile='4', dtype='int32'),
  ]),
  attrs=[
    Attr(name='a', dtype='int32', var=False, nullable=False),
    Attr(name='b', dtype='float64', var=False, nullable=False),
  ],
  cell_order='row-major',
  tile_order='row-major',
  capacity=10000,
  sparse=True,
  allows_duplicates=False,
  coords_filters=FilterList([ZstdFilter(level=-1)]),
)



The next step is to create the (empty) array on disk, and then open it and write data to it.

In [4]:
tiledb.SparseArray.create(sparse_array_uri, schema)

As in the dense case, we can try reading before any data has been written:

In [5]:
with tiledb.open(sparse_array_uri, mode="r") as array:
    print(f"Non-empty domain: {array.nonempty_domain()}")
    data = array[:, :]
for name, values in data.items():
    print(f"{name}: {values}")

Non-empty domain: None
a: []
b: []
rows: []
cols: []


Let's write three values per attribute to three cells, whose coordinates are given by the lists `I` (rows) and `J` (columns):

In [6]:
with tiledb.open(sparse_array_uri, mode="w") as array:
    I, J = [1, 2, 2], [1, 4, 3]
    array[I, J] = {"a": np.array([1,2,3]), "b": np.array([-1.5, 0.0, 0.5])}

That is it! You have now also created a TileDB sparse array.

Read all data from a sparse array in exactly the same way as from a dense array:

In [7]:
with tiledb.open(sparse_array_uri, mode="r") as A:
    data = A[:,:]
for name, values in data.items():
    print(f"{name}: {values}")

a: [1 3 2]
b: [-1.5  0.5  0. ]
rows: [1 2 2]
cols: [1 3 4]


Notice that this looks different from the dense case, where `data` contained only the attribute values. For sparse arrays, data is returned in coordinate form: one array per dimension and one per attribute, all in the same order, so that position `i` in each array describes the same cell:

In [8]:
print("Attribute 'a':")
for i, coord in enumerate(zip(data["rows"], data["cols"])):
    print(f"Cell ({coord[0]}, {coord[1]}) has data {data['a'][i]}")
print("\nAttribute 'b':")
for i, coord in enumerate(zip(data["rows"], data["cols"])):
    print(f"Cell ({coord[0]}, {coord[1]}) has data {data['b'][i]}")

Attribute 'a':
Cell (1, 1) has data 1
Cell (2, 3) has data 3
Cell (2, 4) has data 2

Attribute 'b':
Cell (1, 1) has data -1.5
Cell (2, 3) has data 0.5
Cell (2, 4) has data 0.0
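
Coordinate form is compact, but it can help to see where the cells sit in the 4x4 domain. The following is a minimal sketch in plain Python, not part of the TileDB API: it copies the values printed above into lists and places attribute `a` into a grid, using `None` for cells that were never written. The helper name `to_grid` is hypothetical.

```python
# Values copied from the read result above (coordinate form).
rows = [1, 2, 2]
cols = [1, 3, 4]
a = [1, 3, 2]


def to_grid(rows, cols, values, n_rows=4, n_cols=4, origin=1):
    """Place coordinate-form values into a dense grid (hypothetical helper).

    Cells that were never written stay None. `origin` is the first index
    of each dimension; the domain in this tutorial starts at 1.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, values):
        grid[r - origin][c - origin] = v
    return grid


grid = to_grid(rows, cols, a)
for line in grid:
    print(line)
```

Only three of the sixteen cells hold a value, which is why the sparse array returns coordinates alongside the attributes instead of a full grid.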


As with the dense array, we can read slices of the data:

In [9]:
with tiledb.open(sparse_array_uri, mode="r") as array:
    data = array.multi_index[1:3, 1:3]
    for name, values in data.items():
        print(f"{name}: {values}")

rows: [1 2]
cols: [1 3]
a: [1 3]
b: [-1.5  0.5]
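
Note that `multi_index` treats both ends of a slice as inclusive, so `1:3` covers indices 1, 2, and 3. The sketch below is plain Python, not TileDB code: it applies an inclusive range filter to the three cells written earlier and reproduces the result above. The helper `select_inclusive` is a hypothetical name used only for illustration.

```python
# The three cells written earlier, in coordinate form.
rows = [1, 2, 2]
cols = [1, 3, 4]
a = [1, 3, 2]


def select_inclusive(rows, cols, values, row_range, col_range):
    """Keep cells whose coordinates fall in closed ranges (hypothetical helper)."""
    (r_lo, r_hi), (c_lo, c_hi) = row_range, col_range
    return [
        (r, c, v)
        for r, c, v in zip(rows, cols, values)
        if r_lo <= r <= r_hi and c_lo <= c <= c_hi
    ]


selected = select_inclusive(rows, cols, a, (1, 3), (1, 3))
print(selected)  # [(1, 1, 1), (2, 3, 3)]
```

Cell (2, 4) is excluded because column 4 lies outside the requested range. With half-open bounds, cell (2, 3) would be excluded as well.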


Try different queries on the sparse array:

In this array, we did not allow duplicates. This means that writing a new value to an existing cell replaces the old value, just like in the dense case. Try writing a new value for `a` and `b` at cell (2, 3), then open the array in read mode and view the output.
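
To build intuition for what to expect, here is a toy model in plain Python of "the latest write wins" when duplicates are not allowed. It is a sketch under simplifying assumptions, not a description of how TileDB stores data: each write is kept as a separate batch, and a read merges the batches in write order so that a later value for the same coordinates replaces the earlier one. The function `read_latest` and the new values (30 and 9.5) are made up for illustration.

```python
# Each write is a batch mapping (row, col) -> attribute values.
first_write = {
    (1, 1): {"a": 1, "b": -1.5},
    (2, 4): {"a": 2, "b": 0.0},
    (2, 3): {"a": 3, "b": 0.5},
}
# Hypothetical second write that targets the existing cell (2, 3).
second_write = {(2, 3): {"a": 30, "b": 9.5}}


def read_latest(writes):
    """Merge batches in write order; later values replace earlier ones."""
    merged = {}
    for batch in writes:
        merged.update(batch)
    # Sort by (row, col) to mimic the row-major order of the results above.
    return dict(sorted(merged.items()))


result = read_latest([first_write, second_write])
for coord, values in result.items():
    print(coord, values)
```

In this model the array still holds three cells after the second write, and only the values at (2, 3) differ from the first read.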