Notebooks and analysis for the Open Problems in Single-Cell Analysis NeurIPS 2021 competition. More information is available here.
The current versions of the data are publicly available on S3. To download the data, first install the AWS CLI on your computer: https://aws.amazon.com/cli/
You can download the data to your local computer with the following command (note the dataset size is roughly 1.2 GiB):
aws s3 sync s3://openproblems-bio/public/explore /tmp/public/ --no-sign-request
You’ll find the following files:
explore
├── LICENSE.txt
├── README.txt
├── cite/cite_adt_processed_training.h5ad
├── cite/cite_gex_processed_training.h5ad
├── multiome/multiome_atac_processed_training.h5ad
└── multiome/multiome_gex_processed_training.h5ad
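A quick way to confirm that the download completed is to check for each file in the listing above. This is a minimal sketch, assuming the files were synced into `/tmp/public/` by the command above (pass a different `root` otherwise); `find_missing_files` is a hypothetical helper, not part of any library.

```python
from pathlib import Path

# Relative paths of the files listed above, as they appear under the sync target.
EXPECTED_FILES = [
    "LICENSE.txt",
    "README.txt",
    "cite/cite_adt_processed_training.h5ad",
    "cite/cite_gex_processed_training.h5ad",
    "multiome/multiome_atac_processed_training.h5ad",
    "multiome/multiome_gex_processed_training.h5ad",
]


def find_missing_files(root="/tmp/public"):
    """Return the expected files that are not present under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

An empty list means every file from the listing was found; any entries returned suggest re-running the `aws s3 sync` command, which only transfers files that are missing or changed.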
These are all AnnData h5ad files, as described in the following section.
The training data is stored in AnnData h5ad files; more information on AnnData objects can be found here. You can load these files with the anndata.read_h5ad() function. The easiest way to get started is to spin up a free Jupyter Server on Saturn Cloud.
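# Jupyter notebook syntax; from a regular shell, run `pip install anndata` instead
!pip install anndata
import anndata as ad
# Paths are relative to the directory the data was synced into (e.g. /tmp/public/)
adata_gex = ad.read_h5ad("cite/cite_gex_processed_training.h5ad")
adata_adt = ad.read_h5ad("cite/cite_adt_processed_training.h5ad")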
You can find code examples for exploring the data in our data exploration notebooks.
- CellGAN/: Model architecture (CellGAN) for the Modality Prediction task. This model adapts Wasserstein GAN (WGAN) approaches to convert one data modality into another through adversarial training. The figure illustrates GEX to ATAC as an example; the other supported conversions are listed below:
mod1      mod2
"GEX"     "ATAC"
"ATAC"    "GEX"
"GEX"     "ADT"
"ADT"     "GEX"
The explanations above are taken from the official competition site, where more detail can be found here!
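For reference, the generic Wasserstein GAN objective that CellGAN is described as adapting can be written for a modality conversion as shown below. Here $G$ is the generator mapping a mod1 profile $x$ to a predicted mod2 profile, and the critic $D$ is restricted to 1-Lipschitz functions. The exact losses, architecture, and way of enforcing the Lipschitz constraint (for example weight clipping or a gradient penalty) used in CellGAN are not stated here, so treat this as the standard form rather than this repository's implementation.

```latex
\min_{G}\;\max_{\lVert D \rVert_{L} \le 1}\;
\mathbb{E}_{y \sim P_{\text{mod2}}}\big[D(y)\big]
\;-\;
\mathbb{E}_{x \sim P_{\text{mod1}}}\big[D\big(G(x)\big)\big]
```

By Kantorovich-Rubinstein duality, the critic's maximum equals the Wasserstein-1 distance between the distribution of real mod2 profiles and the distribution of $G$'s predictions, which the generator then minimizes.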