![igvflogo](images/igvf-logo.png)

# Why use TileDB?
With [TileDB](https://tiledb.io/) you can quickly query array-structured data with rectangular slices, update existing arrays with new or changed data, and tune the physical data layout to maximize compression and read performance.

# What is AnnData?
AnnData is a Python package for handling annotated data matrices in memory and on disk, and its H5AD file format is widely used for single-cell genomics data. In this tutorial we use a matrix file (IGVFFI0475WSGO) from the [IGVF Project](https://data.igvf.org/matrix-files/IGVFFI0475WSGO/). For more information about AnnData, see the [anndata documentation](https://anndata.readthedocs.io/en/stable/).

# Installation and configuration
We will use the `tiledb` and `tiledbsoma` Python packages, together with `anndata` for reading H5AD files.

```python
%pip install tiledb tiledbsoma anndata
```

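```python
import anndata as ad
import tiledb
import tiledbsoma
import tiledbsoma.io
from tiledbsoma import SOMAError

# Print tiledbsoma, TileDB core, Python, and OS versions
tiledbsoma.show_package_versions()

# Use anonymous (unsigned) S3 requests so the public IGVF bucket can be read
# without AWS credentials; TileDB config values are strings
cfg = tiledb.Config({"vfs.s3.no_sign_request": "true"})
vfs = tiledb.VFS(config=cfg)
```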

```
tiledbsoma.__version__              1.17.1
TileDB core version (libtiledbsoma) 2.28.1
python version                      3.11.9.final.0
OS version                          Darwin 24.6.0
```


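# Open the H5AD file with TileDB VFS and AnnData
Using TileDB’s VFS, you can read the H5AD file directly from S3 and load it into memory with the AnnData package. Note that `read_h5ad` loads the full matrix and all layers into memory, so the machine needs enough RAM for the whole dataset: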

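```python
# Public S3 location of the IGVF matrix file
H5AD_URI = "s3://igvf-public-data/2025/03/27/c70f866e-68ba-4ec9-9c81-441d7e3552cc/IGVFFI0475WSGO.h5ad"
# To read a local copy instead, set H5AD_URI = "IGVFFI0475WSGO.h5ad"

# vfs.open returns a file-like handle that anndata can read from
with vfs.open(H5AD_URI) as h5ad:
    adata = ad.read_h5ad(h5ad)
```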

# Explore the AnnData object
AnnData is a rich container, and we won't go into detail here. Below we look at a few basic properties of the object.

```python
adata
```

```
AnnData object with n_obs × n_vars = 1837535 × 78298
    layers: 'ambiguous', 'mature', 'nascent'
```

```python
adata.obs.head()
```

```
AACCTATAAACCTATAAAGCGGCA_SS-PKR-129
AACCTATAAACCTATAAATCTCGC_SS-PKR-129
AACCTATAAACCTATAACAGTGGT_SS-PKR-129
AACCTATAAACCTATAACGGTAAT_SS-PKR-129
AACCTATAAACCTATAAGACCAGG_SS-PKR-129
```


```python
adata.var.head()
```

```
ENSMUSG00000102693.2
ENSMUSG00000064842.3
ENSMUSG00000051951.6
ENSMUSG00000102851.2
ENSMUSG00000103377.2
```


# Ingest AnnData into a SOMA experiment
A SOMA experiment can be created on a local filesystem, in an S3 bucket, or in TileDB Cloud (which requires a [TileDB Cloud](https://cloud.tiledb.com) account). Here we ingest `adata` under a measurement named `RNA`.

```python
EXPERIMENT_URI = "my-single-cell-soma-experiment"  # can also be an s3:// or tiledb:// URI
try:
    tiledbsoma.io.from_anndata(experiment_uri=EXPERIMENT_URI, measurement_name="RNA", anndata=adata)
    with tiledbsoma.open(EXPERIMENT_URI) as exp:
        print(exp.ms["RNA"].var.domain)  # domain of the var dataframe
        print(exp.ms["RNA"].X["data"].shape)  # shape of the X layer named "data"
except SOMAError:
    # Assumes the error means an experiment already exists at EXPERIMENT_URI;
    # SOMAError can also signal other ingestion failures
    print(f"Experiment {EXPERIMENT_URI} already exists. Delete (or deregister if using TileDB Cloud) the experiment before continuing.")
```

```
Experiment my-single-cell-soma-experiment already exists. Delete (or deregister if using TileDB Cloud) the experiment before continuing.
[2025-09-10 12:18:03.427] [Process: 78213] [error] [1757531234104572000-Global] Group: Cannot create group; Group 'file:///Users/otto/Documents/repos/igvf/igvf-data-usage-examples/my-single-cell-soma-experiment' already exists
```

