# Load 'Visium' spatial transcriptomics datasets

This library provides an interface to load 10x Genomics (Visium) 'spatial gene expression' datasets.
Datasets are adapted from the [10x Genomics datasets page](https://www.10xgenomics.com/resources/datasets?query=&page=1&configure%5BhitsPerPage%5D=50&configure%5BmaxValuesPerFacet%5D=1000&refinementList%5Bproduct.name%5D%5B0%5D=Spatial%20Gene%20Expression).

This library uses the popular Hugging Face `datasets` format for sharing the datasets. It is recommended to cross-reference the `datasets` [documentation](https://huggingface.co/) when using this library.

In [None]:
from st_visium_datasets import setup_logging

setup_logging()

## List available dataset names

In Hugging Face nomenclature, this project provides a single dataset called `visium`, with different possible [configurations](https://huggingface.co/docs/datasets/load_hub#configurations).

The default config for `visium` is `all`: it contains an aggregation of all existing configs.

To get a list of all available configs, use `list_visium_datasets`.

Config names are generally in the format: `<species>_<anatomical_entity>`

In [None]:
from st_visium_datasets import list_visium_datasets

dataset_names = list_visium_datasets()
dataset_names
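Since config names generally follow `<species>_<anatomical_entity>`, the returned list can be grouped by species with plain string handling. The sketch below is only an illustration and is not part of `st_visium_datasets`: the helper `group_by_species` and the sample names are hypothetical, and it assumes the species is the part before the first underscore (names without an underscore, such as `all`, end up under their own key).

```python
from collections import defaultdict


def group_by_species(config_names):
    """Group config names by the text before the first underscore."""
    groups = defaultdict(list)
    for name in config_names:
        # partition() returns the whole name as `species` when no underscore is present
        species, _, _ = name.partition("_")
        groups[species].append(name)
    return dict(groups)


# Hypothetical names for illustration; use list_visium_datasets() for the real ones.
example_names = ["all", "human", "human_heart", "mouse_brain", "mouse_kidney"]
grouped = group_by_species(example_names)
print(grouped["mouse"])
```

In practice, the list returned by `list_visium_datasets()` would be passed in place of `example_names`.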

## Simple stats

Two important pieces of information about each dataset are the number of spots under tissue and the number of genes detected. `st_visium_datasets` provides this information directly for each dataset config name.

In [None]:
from st_visium_datasets import gen_visium_dataset_stat

gen_visium_dataset_stat("human")  # returns a dict

To view stats for all available datasets, you can use:

In [None]:
from st_visium_datasets import gen_visium_dataset_stat_table

print(gen_visium_dataset_stat_table())

## Load a 'visium' dataset

Before you take the time to download a dataset, it’s often helpful to quickly get some general information about a dataset. A dataset’s information is stored inside DatasetInfo and can include information such as the dataset description, features, and dataset size.

Use the `load_visium_dataset_builder` function to load a dataset builder and inspect a dataset’s attributes without committing to downloading it.

Note: `load_visium_dataset_builder` has exactly the same signature as `datasets.load_dataset_builder` (except for the `path` argument, which is implicitly set to `visium`).

In [None]:
from st_visium_datasets import load_visium_dataset_builder

ds_builder = load_visium_dataset_builder("human")

In [None]:
# Inspect dataset description
ds_builder.info.description

In [None]:
# Inspect dataset features
for k, v in ds_builder.info.features.items():
    print(f"- {k}: {v}")

If you’re happy with the dataset, then load it with `load_visium_dataset` (again, same API as `datasets.load_dataset`).

To speed up data download and loading, multiple processes can be used through the `num_proc` argument.

In [None]:
from st_visium_datasets import load_visium_dataset

num_proc = 2
ds = load_visium_dataset("human", num_proc=num_proc)
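The value of `num_proc` is left to the caller. One way to pick it, sketched below under the assumption that one process per available CPU core (optionally capped) is acceptable on your machine, is to derive it from `os.cpu_count()`. The helper `pick_num_proc` is hypothetical and not part of `st_visium_datasets`.

```python
import os


def pick_num_proc(cap=None):
    """Return a process count between 1 and the number of CPU cores."""
    # os.cpu_count() can return None when the count cannot be determined
    cores = os.cpu_count() or 1
    if cap is not None:
        cores = min(cores, cap)
    # Never return less than one process, even for a cap of 0 or below
    return max(1, cores)


num_proc = pick_num_proc(cap=4)
print(num_proc)
```

The result can then be passed as `num_proc=num_proc` to `load_visium_dataset`. Capping the value is a cautious default, since each process also uses memory and network bandwidth.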