Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
If you use cell2location please cite our paper:
Kleshchevnikov, V., Shmatko, A., Dann, E. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01139-4 https://www.nature.com/articles/s41587-021-01139-4
Please note that cell2location requires 2 user-provided hyperparameters (N_cells_per_location and detection_alpha). For detailed guidance on setting these hyperparameters and their impact, see the flow diagram and the note. Many real datasets (especially human) show within-slide variability in RNA detection sensitivity, so you should try both recommended settings of the detection_alpha parameter: detection_alpha=200 for low within-slide technical variability and detection_alpha=20 for high within-slide technical variability.
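As a rough intuition for the two settings, the sketch below assumes that detection_alpha acts as the shape parameter of a Gamma prior with a fixed mean on per-location detection sensitivity. This is an assumption made for the illustration, and the helper name gamma_cv is hypothetical (it is not part of the cell2location API). Under that assumption the prior's coefficient of variation is 1/sqrt(alpha), so a larger value allows less within-slide variation.

```python
import math


def gamma_cv(alpha: float) -> float:
    """Coefficient of variation of a Gamma distribution with shape `alpha`."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / math.sqrt(alpha)


# compare the two recommended settings (assumed to be Gamma shape parameters)
for alpha in (200, 20):
    print(f"detection_alpha={alpha}: prior CV of about {gamma_cv(alpha):.2f}")
```

Under this reading, detection_alpha=20 tolerates roughly three times more relative variation in detection sensitivity between locations than detection_alpha=200.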
Cell2location is a principled Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. This is achieved by estimating which combination of cell types in which cell abundance could have given the mRNA counts in the spatial data, while modelling technical effects (platform/technology effect, contaminating RNA, unexplained variance).
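The idea of explaining observed counts as a combination of reference signatures can be illustrated with a toy example. The sketch below is not the cell2location model: it has no Bayesian priors and no technical effects, and it only recovers non-negative cell abundances for a single location using multiplicative least-squares updates. The signature values, the counts and the function name estimate_abundance are made up for illustration.

```python
def estimate_abundance(counts, signatures, n_iter=200):
    """Non-negative least squares via multiplicative updates.

    counts:     per-gene counts observed at one location
    signatures: one list of per-gene expression values per cell type
    """
    n_types = len(signatures)
    w = [1.0] * n_types  # start from one cell of each type
    # signatures @ counts, fixed across iterations
    num = [sum(g * d for g, d in zip(sig, counts)) for sig in signatures]
    # Gram matrix signatures @ signatures.T
    gram = [[sum(a * b for a, b in zip(s1, s2)) for s2 in signatures]
            for s1 in signatures]
    for _ in range(n_iter):
        den = [sum(gram[f][k] * w[k] for k in range(n_types))
               for f in range(n_types)]
        w = [w[f] * num[f] / den[f] if den[f] > 0 else 0.0
             for f in range(n_types)]
    return w


# two hypothetical cell types measured over three genes
signatures = [[10.0, 1.0, 0.0], [0.0, 2.0, 8.0]]
# counts generated by 3 cells of the first type plus 1 cell of the second
counts = [30.0, 5.0, 8.0]
abundance = estimate_abundance(counts, signatures)
print([round(a, 3) for a in abundance])
```

The multiplicative form keeps every abundance non-negative without explicit constraints, which is why it is used here instead of an ordinary least-squares solve.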
Overview of the spatial mapping approach and the workflow enabled by cell2location. From left to right: Single-cell RNA-seq and spatial transcriptomics profiles are generated from the same tissue (1). Cell2location takes scRNA-seq derived cell type reference signatures and spatial transcriptomics data as input (2, 3). The model then decomposes spatially resolved multi-cell RNA counts matrices into the reference signatures, thereby establishing a spatial mapping of cell types (4).
The tutorial covering the estimation of expression signatures of reference cell types, spatial mapping with cell2location and the downstream analysis can be found here and tried on Google Colab: https://cell2location.readthedocs.io/en/latest/
Please report bugs via https://github.com/BayraktarLab/cell2location/issues and ask any usage questions about cell2location, scvi-tools or Visium data in scverse community discourse.
The cell2location package is implemented in a general way (using https://pyro.ai/ and https://scvi-tools.org/) to support multiple related models for spatial mapping, estimation of reference cell type signatures and downstream analysis.
We suggest using a separate conda environment for installing cell2location.
Create a conda environment and install the cell2location package:
conda create -y -n cell2loc_env python=3.10
conda activate cell2loc_env
pip install cell2location[tutorials]
Finally, to use this environment in a Jupyter notebook, add a Jupyter kernel for this environment:
conda activate cell2loc_env
python -m ipykernel install --user --name=cell2loc_env --display-name='Environment (cell2loc_env)'
If you do not have conda please install Miniconda first:
cd /path/to/software
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# use prefix /path/to/software/miniconda3
Before installing cell2location and its dependencies, you may need to make sure that you are creating a fully isolated conda environment. To do so, tell Python NOT to use the user site for installing packages by running this line before creating the conda environment, and again before activating the conda environment in every new terminal session:
export PYTHONNOUSERSITE="literallyanyletters"
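To confirm that the setting took effect, the interpreter itself can be asked whether the user site is switched off. The sketch below uses only the standard library. The two function names are hypothetical helpers, not part of cell2location.

```python
import site
import sys


def user_site_is_disabled() -> bool:
    """Return True when Python was started with the user site switched off."""
    # set by the -s flag or by any non-empty PYTHONNOUSERSITE value
    return bool(sys.flags.no_user_site)


def user_site_on_path() -> bool:
    """Return True if the per-user site-packages directory is on sys.path."""
    return site.getusersitepackages() in sys.path


print("user site disabled:", user_site_is_disabled())
print("user site on sys.path:", user_site_on_path())
```

Run this inside the activated environment. If the first line prints False, packages installed in the user site can still shadow those in the conda environment.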
User documentation is available at https://cell2location.readthedocs.io/en/latest/.
The cell2location architecture is designed to simplify building extended versions of the model that account for additional technical and biological information. We plan to provide a tutorial showing how to add new model classes, but please get in touch if you would like to contribute or build on top of our package.
We thank all paper authors for their contributions: Vitalii Kleshchevnikov, Artem Shmatko, Emma Dann, Alexander Aivazidis, Hamish W King, Tong Li, Artem Lomakin, Veronika Kedlian, Mika Sarkin Jain, Jun Sung Park, Lauma Ramona, Liz Tuck, Anna Arutyunyan, Roser Vento-Tormo, Moritz Gerstung, Louisa James, Oliver Stegle, Omer Ali Bayraktar
We also thank Pyro developers (Fritz Obermeyer, Martin Jankowiak), Krzysztof Polanski, Luz Garcia Alonso, Carlos Talavera-Lopez, Ni Huang for feedback on the package, Martin Prete for dockerising cell2location and other software support.
See https://github.com/BayraktarLab/cell2location/discussions
Future developments of cell2location are focused on 1) scalability to 100k-mln+ locations using amortised inference of cell abundance (same ideas as used in VAE), 2) extending cell2location to related spatial analysis tasks that require modification of the model (such as using cell type hierarchy information), and 3) incorporating features presented by more recently proposed methods (such as CAR spatial proximity modelling). We are also experimenting with Numpyro and JAX (https://github.com/vitkl/cell2location_numpyro).
export PYTHONNOUSERSITE="True"
conda create -y -n cell2location_cuda118_torch22 python=3.10
conda activate cell2location_cuda118_torch22
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install scvi-tools==1.1.2
pip install git+https://github.com/BayraktarLab/cell2location.git#egg=cell2location[tutorials,dev]
python -m ipykernel install --user --name=cell2location_cuda118_torch22 --display-name='Environment (cell2location_cuda118_torch22)'
Issues with package version mismatches often originate from the Python user site, rather than the conda environment, being used to install a subset of packages.
If this happens, make sure the conda environment is fully isolated by running this line before creating the environment and again before activating it in every new terminal session:
export PYTHONNOUSERSITE="True"
Keeping info on distinct sections in a csv file (Google Sheet).
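import pandas as pd
import scanpy as sc
from glob import glob
from re import sub

# `sp_data_folder` must already hold the path (ending in '/') to the folder with the Visium outputs
sample_annot = pd.read_csv('./sample_annot.csv')
# match each folder to its Sample_ID by stripping the 'WTSI_' prefix and the genome suffix
sample_annot['path'] = pd.Series(
    glob(f'{sp_data_folder}*'),
    index=[sub('^.+WTSI_', '', sub('_GRCh38-2020-A$', '', i)) for i in glob(f'{sp_data_folder}*')]
)[sample_annot['Sample_ID']].values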
import os
sample_annot['file'] = [os.path.basename(i) for i in sample_annot['path']]
sample_annot['Sample_ID'].unique()
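The nested sub() calls above derive a sample ID from each folder name. A minimal standalone sketch of that step is shown here. The folder names and the helper name folder_to_sample_id are hypothetical and only illustrate the pattern.

```python
from re import sub


def folder_to_sample_id(folder: str) -> str:
    """Strip everything up to 'WTSI_' and the trailing genome suffix."""
    without_suffix = sub('_GRCh38-2020-A$', '', folder)
    return sub('^.+WTSI_', '', without_suffix)


# hypothetical folder names, for illustration only
folders = [
    '/data/visium/WTSI_sampleA_GRCh38-2020-A',
    '/data/visium/WTSI_sampleB_GRCh38-2020-A',
]
print([folder_to_sample_id(f) for f in folders])
```

Note that '^.+WTSI_' needs at least one character before 'WTSI_', so a bare folder name that starts with 'WTSI_' is left unchanged. The pattern relies on the full path being passed in.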
Reading and concatenating samples.
def read_and_qc(sample_name, file, path=sp_data_folder,
                count_file='filtered_feature_bc_matrix.h5'):
    """
    Read one Visium file and add minimum metadata and QC metrics to adata.obs
    NOTE: var_names is ENSEMBL ID as it should be, you can always plot with sc.pl.scatter(gene_symbols='SYMBOL')
    """
    adata = sc.read_visium(path + str(file) + '/',
                           count_file=count_file,
                           load_images=True)
    adata.obs['sample'] = sample_name
    adata.var['SYMBOL'] = adata.var_names
    adata.var.rename(columns={'gene_ids': 'ENSEMBL'}, inplace=True)
    adata.var_names = adata.var['ENSEMBL']
    adata.var.drop(columns='ENSEMBL', inplace=True)
    # just in case there are non-unique ENSEMBL IDs
    adata.var_names_make_unique()

    # Calculate QC metrics
    sc.pp.calculate_qc_metrics(adata, inplace=True)
    # mitochondrial genes are prefixed 'MT-' in human and 'mt-' in mouse
    adata.var['mt'] = [gene.upper().startswith('MT-') for gene in adata.var['SYMBOL']]
    adata.obs['mt_frac'] = adata[:, adata.var['mt'].tolist()].X.sum(1).A.squeeze()/adata.obs['total_counts']

    # add sample name to obs names
    adata.obs["sample"] = [str(i) for i in adata.obs['sample']]
    adata.obs_names = 's' + adata.obs["sample"] \
                      + '_' + adata.obs_names
    adata.obs.index.name = 'spot_id'

    # store the image data under the sample name instead of the original library ID
    library_id = list(adata.uns['spatial'].keys())[0]
    adata.uns['spatial'][sample_name] = adata.uns['spatial'].pop(library_id)
    print(adata.uns['spatial'].keys())

    return adata

def read_all_and_qc(
    sample_annot, Sample_ID_col, file_col, sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
):
    """
    Read and concatenate all Visium files.
    """
    # work on a copy so that repeated calls see the caller's original index
    sample_annot = sample_annot.copy()
    samples = sample_annot[Sample_ID_col].tolist()
    files = sample_annot[file_col].tolist()

    # read first sample
    adata = read_and_qc(
        samples[0], files[0],
        path=sp_data_folder, count_file=count_file
    )

    # read the remaining samples, pairing each sample with its own file
    slides = {}
    for s, f in zip(samples[1:], files[1:]):
        slides[str(s)] = read_and_qc(s, f, path=sp_data_folder, count_file=count_file)

    # combine individual samples
    adata = adata.concatenate(
        list(slides.values()),
        batch_key="sample",
        uns_merge="unique",
        batch_categories=samples,
        index_unique=None
    )

    # copy the sample annotations to every spot of the matching sample
    sample_annot.index = sample_annot[Sample_ID_col]
    for c in sample_annot.columns:
        sample_annot[c] = sample_annot[c].astype(str)
    adata.obs[sample_annot.columns] = sample_annot.reindex(index=adata.obs['sample']).values

    return adata
adata = read_all_and_qc(
sample_annot=sample_annot,
Sample_ID_col='Sample_ID',
file_col='file',
sp_data_folder=sp_data_folder,
count_file='filtered_feature_bc_matrix.h5',
)
adata_incl_nontissue = read_all_and_qc(
sample_annot=sample_annot,
Sample_ID_col='Sample_ID',
file_col='file',
sp_data_folder=sp_data_folder,
count_file='raw_feature_bc_matrix.h5',
)
Since anndata version 0.9.0 (released on 2023-04-11), the function AnnData.concatenate() has been deprecated in favour of anndata.concat(), as per the official release notes. Here is the updated code snippet of read_all_and_qc:
from anndata import concat

def read_all_and_qc(
    sample_annot, Sample_ID_col, file_col, sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
):
    """
    Read and concatenate all Visium files.
    """
    # work on a copy so that the caller's data frame keeps its original index
    sample_annot = sample_annot.copy()
    samples = sample_annot[Sample_ID_col].tolist()
    files = sample_annot[file_col].tolist()

    # read all samples and store them in a list
    adatas = []
    for s, f in zip(samples, files):
        adata_i = read_and_qc(s, f, path=sp_data_folder)
        adatas.append(adata_i)

    # combine individual samples
    adata = concat(
        adatas,
        merge="unique",
        uns_merge="unique",
        label="batch",
        keys=samples,
        index_unique=None
    )

    # copy the sample annotations to every spot of the matching sample
    sample_annot.index = sample_annot[Sample_ID_col]
    for c in sample_annot.columns:
        sample_annot[c] = sample_annot[c].astype(str)
    adata.obs[sample_annot.columns] = sample_annot.reindex(index=adata.obs['sample']).values

    return adata
adata = read_all_and_qc(
sample_annot=sample_annot,
Sample_ID_col='Sample_ID',
file_col='file',
sp_data_folder=sp_data_folder,
count_file='filtered_feature_bc_matrix.h5',
)
cell2location.models.Cell2location.setup_anndata(
adata=adata_vis,
batch_key="batch")