In this example, we use the 10x Genomics Multiome data provided by the NeurIPS 2021 Multimodal Single-Cell Data Integration competition. The data can be downloaded from NCBI GEO under accession number [GSE194122](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122).

To run this example, make sure that pandas, numpy, scanpy, scikit-learn, and datatable are installed in the runtime environment. datatable is used here because it reads and writes large files quickly.
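
Before starting, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of the original example. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the packages this example relies on
# (scikit-learn is imported as "sklearn").
REQUIRED = ["pandas", "numpy", "scanpy", "datatable", "sklearn"]


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Any package reported as missing can be installed with the package manager you normally use.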

Please download and decompress the h5ad file before running the code below. The command assumes that `wget` and `pigz` are available.
```bash
wget -O - "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE194nnn/GSE194122/suppl/GSE194122%5Fopenproblems%5Fneurips2021%5Fmultiome%5FBMMC%5Fprocessed.h5ad.gz" | pigz -d > GSE194122_openproblems_neurips2021_multiome_BMMC_processed.h5ad
```
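
If `pigz` is not available, the archive can also be decompressed from Python. This is a minimal sketch using only the standard library; the function name `gunzip` is hypothetical, and it assumes the `.gz` file has already been downloaded to the working directory. It streams the data in chunks, so the file is never held in memory all at once.

```python
import gzip
import shutil


def gunzip(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress a .gz file to dst_path in fixed-size chunks."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path


# Example usage (assumes the archive was saved under this name):
# gunzip("GSE194122_openproblems_neurips2021_multiome_BMMC_processed.h5ad.gz",
#        "GSE194122_openproblems_neurips2021_multiome_BMMC_processed.h5ad")
```

This is single-threaded, so it will likely be slower than `pigz` on a large file.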

In [6]:
import pandas as pd
import numpy as np
import scanpy as sc

import datatable as dt

from sklearn.preprocessing import OneHotEncoder


In [7]:
adata = sc.read_h5ad('GSE194122_openproblems_neurips2021_multiome_BMMC_processed.h5ad')

### Use only the transcriptomic data

In [10]:
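# Keep only the gene-expression (GEX) features of the multiome object
genes = adata.var_names[adata.var['feature_types'] == 'GEX']
adata_gex = adata[:, genes].copy()
# Use the raw counts stored in the 'counts' layer as the working matrix
adata_gex.X = adata_gex.layers['counts'].copy()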

In [14]:
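# Normalize each cell to 10,000 total counts, then log-transform
sc.pp.normalize_total(adata_gex, target_sum=1e4)
sc.pp.log1p(adata_gex)
# Select the top 2,000 highly variable genes across batches and keep only those
sc.pp.highly_variable_genes(adata_gex, n_top_genes=2000, batch_key='batch', subset=True)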

Normalized count data: X.
Extracted 2000 highly variable genes.
Logarithmized X.
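
To make the normalization step concrete, here is a minimal NumPy sketch of the arithmetic behind total-count normalization followed by `log1p`. It only mirrors the calculation on a small dense toy matrix; it is not scanpy's implementation, and the function name `normalize_log1p` and the toy values are made up for illustration.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against division by zero for empty cells; their rows stay all-zero.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)


# Toy count matrix: 2 cells x 3 genes (illustrative values only).
toy = np.array([[1, 3, 0],
                [0, 0, 10]])
normalized = normalize_log1p(toy)

# Undoing the log shows that every non-empty row sums to target_sum.
print(np.expm1(normalized).sum(axis=1))
```

Because each cell is rescaled to the same total before the log transform, differences in sequencing depth between cells are reduced before highly variable genes are selected.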


In [16]:
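# Write the raw counts of the selected genes (cells x genes); datatable speeds up the export
dt.Frame(pd.DataFrame(adata_gex.layers['counts'].toarray(), columns=adata_gex.var_names)).to_csv('multiome_neurips21_counts.txt')

# Save cell and gene names so that the rows and columns of the count matrix can be identified
adata_gex.obs_names.to_frame().to_csv('multiome_neurips21_cell.csv', index=False)
adata_gex.var_names.to_frame().to_csv('multiome_neurips21_gene.csv', index=False)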

In [12]:
# One-hot encode batch labels; `sparse_output` replaces `sparse` (renamed in scikit-learn 1.2)
batch = adata_gex.obs['batch'].to_numpy().reshape(-1, 1)
enc = OneHotEncoder(sparse_output=False).fit(batch)
pd.DataFrame(enc.transform(batch), columns=enc.categories_[0]).to_csv('multiome_neurips21_uwv.txt', index=False)
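
To show what the resulting batch matrix looks like, here is a minimal NumPy sketch of one-hot encoding. It is an illustration of the encoding only, not a replacement for `OneHotEncoder`; the function name `one_hot` and the batch labels are hypothetical.

```python
import numpy as np


def one_hot(labels):
    """Return (categories, matrix) with matrix[i, j] == 1 iff labels[i] == categories[j]."""
    categories, codes = np.unique(np.asarray(labels), return_inverse=True)
    # Row k of the identity matrix is the one-hot vector for category k.
    matrix = np.eye(len(categories))[codes.reshape(-1)]
    return categories, matrix


# Hypothetical batch labels, for illustration only.
cats, mat = one_hot(["s1d1", "s1d2", "s1d1", "s2d1"])
print(cats)
print(mat)
```

Each row contains exactly one 1, in the column of that cell's batch, and the columns follow the sorted order of the unique labels.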