merge docu branch
gaddamshreya1 committed Nov 11, 2021
2 parents 2ce1df5 + 317c417 commit beaf176
Showing 13 changed files with 1,747 additions and 1,597 deletions.
17 changes: 10 additions & 7 deletions README.md
@@ -2,7 +2,7 @@

[![PyPI version](https://badge.fury.io/py/tangram-sc.svg)](https://badge.fury.io/py/tangram-sc)

Tangram is a Python package, written in [PyTorch](https://pytorch.org/) and based on [scanpy](https://scanpy.readthedocs.io/en/stable/), for mapping single-cell (or single-nucleus) gene expression data onto spatial gene expression data. The single-cell dataset and the spatial dataset should be collected from the same anatomical region/tissue type, ideally from a biological replicate, and need to share a set of genes. Tangram aligns the single-cell data in space by fitting gene expression on the shared genes. The best way to familiarize yourself with Tangram is to check out [our tutorial](https://github.com/broadinstitute/Tangram/blob/master/tangram_tutorial.ipynb). [![colab tutorial](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SVLUIZR6Da6VUyvX_2RkgVxbPn8f62ge?usp=sharing)
Tangram is a Python package, written in [PyTorch](https://pytorch.org/) and based on [scanpy](https://scanpy.readthedocs.io/en/stable/), for mapping single-cell (or single-nucleus) gene expression data onto spatial gene expression data. The single-cell dataset and the spatial dataset should be collected from the same anatomical region/tissue type, ideally from a biological replicate, and need to share a set of genes. Tangram aligns the single-cell data in space by fitting gene expression on the shared genes. The best way to familiarize yourself with Tangram is to check out [our tutorial](https://github.com/broadinstitute/Tangram/blob/master/example/1_tutorial_tangram.ipynb). [![colab tutorial](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1gDmtiRN45OwCMu4n6l1uygQ_jIGe7NgJ)

![Tangram_overview](https://raw.githubusercontent.com/broadinstitute/Tangram/master/figures/tangram_overview.png)
Tangram has been tested on various types of transcriptomic data (10Xv3, Smart-seq2 and SHARE-seq for single-cell data; MERFISH, Visium, Slide-seq, smFISH and STARmap for spatial data). In our [preprint](https://www.biorxiv.org/content/10.1101/2020.08.29.272831v1), we used Tangram to reveal spatial maps of cell types and gene expression at single-cell resolution in the adult mouse brain. More recently, we have applied our method to different tissue types, including human lung, human kidney, developmental mouse brain and metastatic breast cancer.
@@ -21,11 +21,16 @@ Tangram has been tested on various types of transcriptomic data (10Xv3, Smart-se

To install Tangram, make sure you have [PyTorch](https://pytorch.org/) and [scanpy](https://scanpy.readthedocs.io/en/stable/) installed. If you need more details on the dependencies, look at the `environment.yml` file.

* Set up a conda environment for Tangram:
```
conda env create -f environment.yml
```
* Install tangram-sc from the shell:
```
conda activate tangram-env
pip install tangram-sc
```
* import tangram
* To start using Tangram, import tangram in your Jupyter notebooks and/or scripts:
```
import tangram as tg
```
@@ -52,7 +57,7 @@ The returned AnnData,`ad_map`, is a cell-by-voxel structure where `ad_map.X[i, j

The returned `ad_ge` is a voxel-by-gene AnnData, similar to the spatial data `ad_sp`, but in which gene expression has been projected from the single cells. This allows one to extend gene throughput, or to correct for dropouts, if the single-cell data have higher quality (or more genes) than the spatial data. It can also be used to transfer cell types onto space.

For more details on how to use Tangram check out [our tutorial](https://github.com/broadinstitute/Tangram/blob/master/tangram_tutorial.ipynb). [![colab tutorial](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SVLUIZR6Da6VUyvX_2RkgVxbPn8f62ge?usp=sharing)
For more details on how to use Tangram check out [our tutorial](https://github.com/broadinstitute/Tangram/blob/master/example/1_tutorial_tangram.ipynb). [![colab tutorial](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SVLUIZR6Da6VUyvX_2RkgVxbPn8f62ge?usp=sharing)

***
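The `ad_ge` projection described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not Tangram's own projection code: `M` stands in for the cell-by-voxel mapping in `ad_map.X` (each row a cell's probability distribution over voxels), `S` stands in for the cell-by-gene single-cell matrix, the values are toy numbers, and `project_expression` is a hypothetical helper. Each voxel's predicted expression is the probability-weighted sum of the expression of all cells.

```python
import numpy as np


def project_expression(M: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Project single-cell expression into space (hypothetical helper).

    M: cells x voxels mapping; row i is cell i's probability over voxels.
    S: cells x genes single-cell expression.
    Returns a voxels x genes matrix of predicted expression.
    """
    if not np.allclose(M.sum(axis=1), 1.0):
        raise ValueError("each row of M should sum to 1")
    # Voxel j receives sum_i M[i, j] * S[i, :].
    return M.T @ S


# Toy data: 3 cells, 2 voxels, 4 genes.
S = np.array([[5.0, 0.0, 1.0, 2.0],
              [0.0, 3.0, 0.0, 1.0],
              [2.0, 2.0, 2.0, 0.0]])
M = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])

G_pred = project_expression(M, S)
print(G_pred.shape)  # (2, 4)
```

Because every row of `M` sums to 1, the total expression of each gene is the same in `S` and in the projected matrix, which makes a quick sanity check on a mapping.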

@@ -111,9 +116,6 @@ You do not need to segment cells in your histology for mapping on spatial transc
#### I run out of memory when I map: what should I do?
Split your spatial data into several parts and map each part separately. If that is not sufficient, you will need to downsample your single-cell data as well.

#### How to use Tangram with Squidpy?
For a tutorial, please refer to the example [here](https://github.com/broadinstitute/Tangram/blob/master/tutorial_sq_tangram.ipynb). For environment setup, please use squidpy=1.1.0 and refer to this [yml file](https://github.com/broadinstitute/Tangram/blob/master/environment.yml).

***
## How to cite Tangram
Tangram has been released in the following publication
@@ -127,6 +129,7 @@ If you have questions, please contact the authors of the method:
PyPI maintainers:
- Tommaso Biancalani - <biancalt@gene.com>
- Ziqing Lu - <luz21@gene.com>
- Shreya Gaddam - <gaddams@gene.com>

The artwork has been curated by:
- Anna Hupalowska <ahupalow@broadinstitute.org>
@@ -0,0 +1,6 @@
tangram.plot\_utils.plot\_cell\_annotation\_sc
==============================================

.. currentmodule:: tangram.plot_utils

.. autofunction:: plot_cell_annotation_sc
6 changes: 6 additions & 0 deletions docs/source/classes/tangram.plot_utils.plot_genes_sc.rst
@@ -0,0 +1,6 @@
tangram.plot\_utils.plot\_genes\_sc
===================================

.. currentmodule:: tangram.plot_utils

.. autofunction:: plot_genes_sc
4 changes: 4 additions & 0 deletions docs/source/classes/tangram.plot_utils.rst
@@ -27,10 +27,14 @@

plot_cell_annotation

plot_cell_annotation_sc

plot_gene_sparsity

plot_genes

plot_genes_sc

plot_test_scores

plot_training_scores
7 changes: 6 additions & 1 deletion docs/source/getting_started.rst
@@ -17,8 +17,13 @@ Cell Level
**************************
To install Tangram, make sure you have `PyTorch <https://pytorch.org/>`_ and `scanpy <https://scanpy.readthedocs.io/en/stable/>`_ installed. If you need more details on the dependencies, look at the `environment.yml <https://github.com/broadinstitute/Tangram/blob/master/environment.yml>`_ file.

Create a conda environment for Tangram::

conda env create --file environment.yml

Install tangram-sc from the shell::

conda activate tangram-env
pip install tangram-sc
Import tangram::
6 changes: 5 additions & 1 deletion docs/source/news.rst
@@ -3,4 +3,8 @@ Tangram News

- On Jan 28th 2021, Sten Linnarsson gave a `talk <https://www.youtube.com/watch?v=0mxIe2AsSKs>`_ at the WWNDev Forum and demonstrated their mappings of the developmental mouse brain using Tangram.

- On Mar 9th 2021, Nicholas Eagles wrote a `blog post <http://research.libd.org/rstatsclub/2021/03/09/lessons-learned-applying-tangram-on-visium-data/#.YPsZphNKhb->`_ about applying Tangram on Visium data.

- The Tangram method has been used by our colleagues at Harvard and the Broad Institute to map cell types in the developmental mouse brain; see Fig. 2 (`Nature (2021) <https://www.nature.com/articles/s41586-021-03670-5>`_).

- Tangram is now officially a part of `Squidpy <https://squidpy.readthedocs.io/en/stable/index.html>`_
4 changes: 2 additions & 2 deletions docs/source/working.rst
@@ -2,8 +2,8 @@ Tangram Under the Hood
===========================

Tangram instantiates a `Mapper` object passing the following arguments:
* _S_: single cell matrix with shape cell-by-gene. Note that genes is the number of training genes.
* _G_: spatial data matrix with shape voxels-by-genes. Voxel can contain multiple cells.
| _S_: single-cell matrix with shape cell-by-gene. Note that the genes are the training genes only.
| _G_: spatial data matrix with shape voxels-by-genes. A voxel can contain multiple cells.
Then, Tangram searches for a mapping matrix *M*, with shape cells-by-voxels, where the element *M\_ij* signifies the probability of cell *i* being in spot *j*. Tangram computes the matrix *M* by minimizing the following loss:

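The loss itself is cut off in this diff. As a hedged illustration of one ingredient only, the sketch below assumes the mapping is obtained with a row-wise softmax (so each cell's row is a probability distribution over voxels) and scores a mapping by the per-gene cosine similarity between the predicted expression `M.T @ S` and the measured `G`. The helper names `softmax_rows` and `gene_cosine_similarity` are hypothetical, the data are random toy arrays, and any other terms of the actual Tangram objective (for example, those involving the density priors computed in `pp_adatas`) are not reproduced here.

```python
import numpy as np


def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax: each cell's scores become probabilities over voxels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def gene_cosine_similarity(M: np.ndarray, S: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Per-gene cosine similarity between predicted (M.T @ S) and measured G.

    M: cells x voxels, S: cells x genes, G: voxels x genes.
    Each gene's spatial profile is compared across voxels; 1.0 means the
    predicted profile is proportional to the measured one.
    """
    G_pred = M.T @ S
    num = (G_pred * G).sum(axis=0)
    den = np.linalg.norm(G_pred, axis=0) * np.linalg.norm(G, axis=0)
    return num / np.maximum(den, 1e-12)  # guard against all-zero genes


rng = np.random.default_rng(0)
S = rng.random((6, 3))                     # 6 cells x 3 training genes
G = rng.random((4, 3))                     # 4 voxels x the same 3 genes
M = softmax_rows(rng.normal(size=(6, 4)))  # 6 cells x 4 voxels
loss = -gene_cosine_similarity(M, S, G).mean()  # lower is better
print(round(float(loss), 3))
```

In an actual fit, the softmax logits would be the trainable parameters and this term would be minimized by gradient descent (PyTorch in Tangram's case); here the mapping is fixed so that the scoring step can be inspected on its own.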
3 changes: 2 additions & 1 deletion environment.yml
@@ -1,5 +1,6 @@
name: tangram-env
dependencies:
- python=3.8.5
- python>=3.8.5
- pip=20.2.2
- pytorch=1.4.0
- scipy=1.5.2
9 changes: 3 additions & 6 deletions tangram/mapping_utils.py
@@ -78,21 +78,18 @@ def pp_adatas(adata_sc, adata_sp, genes=None):
)

# Calculate uniform density prior as 1/number_of_spots
rna_count_per_spot = adata_sp.X.sum(axis=1)
adata_sp.obs["uniform_density"] = np.ones(adata_sp.X.shape[0]) / adata_sp.X.shape[0]
logging.info(
f"uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata."
)

# Calculate rna_count_based density prior as % of rna molecule count
rna_count_per_spot = adata_sp.X.sum(axis=1)
adata_sp.obs["rna_count_based_density"] = rna_count_per_spot / np.sum(
rna_count_per_spot
)
rna_count_per_spot = np.array(adata_sp.X.sum(axis=1)).squeeze()
adata_sp.obs["rna_count_based_density"] = rna_count_per_spot / np.sum(rna_count_per_spot)
logging.info(
f"rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata."
)


def adata_to_cluster_expression(adata, cluster_label, scale=True, add_density=True):
"""
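The `pp_adatas` hunk above now converts the per-spot row sum with `np.array(...).squeeze()` before computing the RNA-count-based density prior. A likely reason, sketched here under the assumption that `adata_sp.X` may be a SciPy sparse matrix: for sparse input, `X.sum(axis=1)` returns a 2-D `numpy.matrix` of shape `(n_spots, 1)` rather than a 1-D array, which is awkward to store as an `obs` column. The hypothetical helper `density_priors` computes both priors for dense or sparse input. It uses `np.asarray(...).ravel()`, which, unlike `squeeze()`, still yields a 1-D array when there is only one spot.

```python
import numpy as np
import scipy.sparse as sp


def density_priors(X):
    """Return (uniform, rna_count_based) density priors for a spots x genes matrix.

    X may be a dense NumPy array or a SciPy sparse matrix (hypothetical helper).
    """
    n_spots = X.shape[0]
    # Uniform prior: every spot gets 1 / number_of_spots.
    uniform = np.ones(n_spots) / n_spots
    # Sparse X.sum(axis=1) is a 2-D np.matrix of shape (n_spots, 1);
    # np.asarray(...).ravel() gives a flat 1-D array for dense and sparse X alike.
    counts = np.asarray(X.sum(axis=1)).ravel()
    # RNA-count-based prior: each spot's share of the total molecule count.
    rna_count_based = counts / counts.sum()
    return uniform, rna_count_based


X_dense = np.array([[1.0, 0.0, 3.0],
                    [0.0, 0.0, 4.0],
                    [2.0, 0.0, 0.0]])
X_sparse = sp.csr_matrix(X_dense)

print(X_sparse.sum(axis=1).shape)  # (3, 1): 2-D, not 1-D
uniform, rna = density_priors(X_sparse)
print(rna)  # [0.4 0.4 0.2]
```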
68 changes: 58 additions & 10 deletions tangram/plot_utils.py
@@ -172,20 +172,41 @@ def construct_obs_plot(df_plot, adata, perc=0, suffix=None):
adata.obs = pd.concat([adata.obs, df_plot], axis=1)


def plot_cell_annotation_sc(adata_sp, annotation_list, perc=0):

def plot_cell_annotation_sc(
adata_sp,
annotation_list,
x="x",
y="y",
spot_size=None,
scale_factor=0.1,
perc=0,
ax=None
):

# remove previous df_plot in obs
adata_sp.obs.drop(annotation_list, inplace=True, errors="ignore", axis=1)

# construct df_plot
df = adata_sp.obsm["tangram_ct_pred"][annotation_list]
construct_obs_plot(df, adata_sp, perc=perc)


#non visium data
if 'spatial' not in adata_sp.obsm.keys():
#add spatial coordinates to obsm of spatial data
coords = [[cx, cy] for cx, cy in zip(adata_sp.obs[x].values, adata_sp.obs[y].values)]
adata_sp.obsm['spatial'] = np.array(coords)

if 'spatial' not in adata_sp.uns.keys() and spot_size is None and scale_factor is None:
raise ValueError("Spot Size and Scale Factor cannot be None when ad_sp.uns['spatial'] does not exist")

#REVIEW
if 'spatial' in adata_sp.uns.keys() and spot_size is not None and scale_factor is not None:
raise ValueError("Spot Size and Scale Factor should be None when ad_sp.uns['spatial'] exists")

sc.pl.spatial(
adata_sp, color=annotation_list, cmap="viridis", show=False, frameon=False,
adata_sp, color=annotation_list, cmap="viridis", show=False, frameon=False, spot_size=spot_size, scale_factor=scale_factor, ax=ax
)

# remove df_plot in obs
adata_sp.obs.drop(annotation_list, inplace=True, errors="ignore", axis=1)
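For non-Visium data the new code builds `obsm['spatial']` from two `obs` columns (`x` and `y` by default), since `sc.pl.spatial` reads coordinates from `obsm['spatial']`. As far as we know, scanpy also requires `spot_size` whenever `uns['spatial']` is missing, which is what the new `ValueError` checks anticipate. Below is a minimal pandas sketch of the coordinate step that selects both columns at once instead of zipping them. It uses a hypothetical helper `coords_from_obs` and toy data.

```python
import numpy as np
import pandas as pd


def coords_from_obs(obs: pd.DataFrame, x: str = "x", y: str = "y") -> np.ndarray:
    """Return an (n_spots, 2) float array of coordinates from two obs columns.

    The result has the layout sc.pl.spatial expects in adata.obsm['spatial'].
    """
    return obs[[x, y]].to_numpy(dtype=float)


# Toy obs table with per-spot coordinates stored as columns.
obs = pd.DataFrame({"x": [0.0, 10.0, 20.0], "y": [5.0, 5.0, 15.0]},
                   index=["spot1", "spot2", "spot3"])
spatial = coords_from_obs(obs)

# Same result as building the array row by row from zipped columns.
reference = np.array([[cx, cy] for cx, cy in zip(obs["x"].values, obs["y"].values)])
assert np.array_equal(spatial, reference)
```

With an AnnData object `adata_sp`, the result would be stored as `adata_sp.obsm['spatial'] = coords_from_obs(adata_sp.obs)`.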


@@ -289,7 +310,18 @@ def plot_cell_annotation(
fig.suptitle(annotation)


def plot_genes_sc(genes, adata_measured, adata_predicted, cmap="inferno", perc=0):
def plot_genes_sc(
genes,
adata_measured,
adata_predicted,
x="x",
y="y",
spot_size=None,
scale_factor=0.1,
cmap="inferno",
perc=0,
return_figure=False
):

# remove df_plot in obs
adata_measured.obs.drop(
@@ -350,11 +382,24 @@ def plot_genes_sc(genes, adata_measured, adata_predicted, cmap="inferno", perc=0

fig = plt.figure(figsize=(7, len(genes) * 3.5))
gs = GridSpec(len(genes), 2, figure=fig)

#non visium data
if 'spatial' not in adata_measured.obsm.keys():
#add spatial coordinates to obsm of spatial data
coords = [[cx, cy] for cx, cy in zip(adata_measured.obs[x].values, adata_measured.obs[y].values)]
adata_measured.obsm['spatial'] = np.array(coords)
coords = [[cx, cy] for cx, cy in zip(adata_predicted.obs[x].values, adata_predicted.obs[y].values)]
adata_predicted.obsm['spatial'] = np.array(coords)

if ("spatial" not in adata_measured.uns.keys()) and (spot_size is None and scale_factor is None):
raise ValueError("Spot Size and Scale Factor cannot be None when ad_sp.uns['spatial'] does not exist")

for ix, gene in enumerate(genes):

ax_m = fig.add_subplot(gs[ix, 0])
sc.pl.spatial(
adata_measured,
spot_size=spot_size,
scale_factor=scale_factor,
color=["{} (measured)".format(gene)],
frameon=False,
ax=ax_m,
@@ -364,13 +409,15 @@ def plot_genes_sc(genes, adata_measured, adata_predicted, cmap="inferno", perc=0
ax_p = fig.add_subplot(gs[ix, 1])
sc.pl.spatial(
adata_predicted,
spot_size=spot_size,
scale_factor=scale_factor,
color=["{} (predicted)".format(gene)],
frameon=False,
ax=ax_p,
show=False,
cmap=cmap,
)

# sc.pl.spatial(adata_measured, color=['{} (measured)'.format(gene) for gene in genes], frameon=False)
# sc.pl.spatial(adata_predicted, color=['{} (predicted)'.format(gene) for gene in genes], frameon=False)

@@ -387,6 +434,8 @@ def plot_genes_sc(genes, adata_measured, adata_predicted, cmap="inferno", perc=0
errors="ignore",
axis=1,
)
if return_figure:
return fig


def plot_genes(
@@ -631,8 +680,7 @@ def plot_auc(df_all_genes, test_genes=None):
textstr = 'auc_score={}'.format(np.round(metric_dict['auc_score'], 3))
props = dict(boxstyle='round', facecolor='wheat', alpha=0.3)
# place a text box in upper left in axes coords
plt.text(0.03, 0.1, textstr, fontsize=11,
verticalalignment='top', bbox=props);
plt.text(0.03, 0.1, textstr, fontsize=11, verticalalignment='top', bbox=props);


# Colors used in the manuscript for deterministic assignment.
20 changes: 10 additions & 10 deletions tangram/utils.py
@@ -73,13 +73,13 @@ def get_matched_genes(prior_genes_names, sn_genes_names, excluded_genes=None):
prior_genes_names (sequence): List of gene names in the spatial data.
sn_genes_names (sequence): List of gene names in the single nuclei data.
excluded_genes (sequence): Optional. List of genes to be excluded. These genes are excluded even if present in both datasets.
If None, no genes are excluded. Default is None.
Returns:
A tuple (mask_prior_indices, mask_sn_indices, selected_genes), with:
mask_prior_indices (list): List of indices for the selected genes in 'prior_genes_names'.
mask_sn_indices (list): List of indices for the selected genes in 'sn_genes_names'.
selected_genes (list): List of names of the selected genes.
For each i, selected_genes[i] = prior_genes_names[mask_prior_indices[i]] = sn_genes_names[mask_sn_indices[i]].
"""
prior_genes_names = np.array(prior_genes_names)
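The `get_matched_genes` docstring fixes a contract: for each i, `selected_genes[i]` equals both `prior_genes_names[mask_prior_indices[i]]` and `sn_genes_names[mask_sn_indices[i]]`. Below is a plain-Python sketch that satisfies this contract. It assumes unique gene names and keeps the order of the spatial gene list. The real function may order genes differently, and `matched_genes_sketch` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def matched_genes_sketch(
    prior_genes_names: Sequence[str],
    sn_genes_names: Sequence[str],
    excluded_genes: Optional[Iterable[str]] = None,
) -> Tuple[List[int], List[int], List[str]]:
    """Hypothetical re-implementation of the get_matched_genes contract."""
    excluded = set(excluded_genes) if excluded_genes is not None else set()
    # Position of each single-nucleus gene; assumes gene names are unique.
    sn_position = {name: j for j, name in enumerate(sn_genes_names)}
    mask_prior_indices: List[int] = []
    mask_sn_indices: List[int] = []
    selected_genes: List[str] = []
    # Walk the spatial gene list so the output follows its order.
    for i, name in enumerate(prior_genes_names):
        if name in sn_position and name not in excluded:
            mask_prior_indices.append(i)
            mask_sn_indices.append(sn_position[name])
            selected_genes.append(name)
    return mask_prior_indices, mask_sn_indices, selected_genes


prior = ["Gad1", "Slc17a7", "Pvalb", "Sst"]
sn = ["Sst", "Gad1", "Vip", "Pvalb"]
mp, ms, sel = matched_genes_sketch(prior, sn, excluded_genes=["Pvalb"])
print(sel)  # ['Gad1', 'Sst']
```

`Pvalb` is present in both lists but is dropped because it is excluded, and `Slc17a7` and `Vip` are dropped because each appears in only one dataset.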
@@ -115,8 +115,8 @@ def one_hot_encoding(l, keep_aggregate=False):
Returns:
A DataFrame with a one-hot-encoded column for each unique value in the sequence, and an additional
column with the input list if 'keep_aggregate' is True.
The number of rows is equal to len(l).
"""
df_enriched = pd.DataFrame({"cl": l})
for i in l.unique():
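Only the start of the `one_hot_encoding` body appears in this hunk. Following its docstring (one 0/1 column per unique value, plus the original labels in a `cl` column when `keep_aggregate` is True), a pandas sketch might look like the one below. The loop body is an assumption, and `one_hot_sketch` is a hypothetical name.

```python
import pandas as pd


def one_hot_sketch(l: pd.Series, keep_aggregate: bool = False) -> pd.DataFrame:
    """One 0/1 column per unique value of `l`; optionally keep the labels in 'cl'."""
    df = pd.DataFrame({"cl": l})
    for value in l.unique():
        # 1 where the row's label equals `value`, 0 elsewhere.
        df[value] = (df["cl"] == value).astype(int)
    if not keep_aggregate:
        df = df.drop(columns="cl")
    return df


labels = pd.Series(["Astro", "Neuron", "Astro", "Oligo"])
encoded = one_hot_sketch(labels)
print(encoded)
```

Columns appear in order of first occurrence because `Series.unique()` preserves that order, and every row sums to 1.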
@@ -137,7 +137,7 @@ def project_cell_annotations(
adata_sp (AnnData): spatial data used to save the mapping result.
annotation (str): Optional. Cell annotations matrix with shape (number_cells, number_annotations). Default is 'cell_type'.
threshold (float): Optional. Only used with adata_map.obs['F_out'] from 'constrained' mode mapping.
Cells with a probability below this threshold are dropped. Default is 0.5.
Returns:
None.
Update spatial Anndata by creating `obsm` `tangram_ct_pred` field with a dataframe with spatial prediction for each annotation (number_spots, number_annotations)
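`project_cell_annotations` stores a (number_spots, number_annotations) table in `obsm['tangram_ct_pred']`. Below is a NumPy/pandas sketch of one way to compute such a table. It assumes a cells-by-spots mapping matrix and one label per cell: one-hot encode the labels and multiply by the transposed mapping. If per-cell `F_out` values are given, it first keeps only the cells whose `F_out` exceeds `threshold`. The helper `project_annotations_sketch` and the toy data are hypothetical; the Tangram source is the reference for the exact behavior.

```python
from typing import Optional, Sequence

import numpy as np
import pandas as pd


def project_annotations_sketch(
    M: np.ndarray,
    labels: Sequence[str],
    spot_names: Sequence[str],
    f_out: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> pd.DataFrame:
    """Spots x annotations table of summed mapping probabilities (hypothetical).

    M: cells x spots mapping probabilities; labels: one annotation per cell;
    f_out: optional per-cell filter values, as in 'constrained' mode.
    """
    one_hot = pd.get_dummies(pd.Series(labels)).astype(float)  # cells x annotations
    if f_out is not None:
        keep = np.asarray(f_out) > threshold  # drop cells at or below threshold
        M = M[keep]
        one_hot = one_hot[keep]
    scores = M.T @ one_hot.to_numpy()  # spots x annotations
    return pd.DataFrame(scores, index=list(spot_names), columns=one_hot.columns)


M = np.array([[0.7, 0.3],
              [0.1, 0.9],
              [0.5, 0.5]])
labels = ["Astro", "Neuron", "Astro"]
pred = project_annotations_sketch(M, labels, spot_names=["spot1", "spot2"])
print(pred)
```

Each column sums to the number of (kept) cells with that label, because every cell's probabilities over spots sum to 1.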
@@ -797,10 +797,10 @@ def df_to_cell_types(df, cell_types):
Args:
df (DataFrame): Columns correspond to cell types. Each row in the DataFrame corresponds to a voxel and
specifies the known number of cells in that voxel for each cell type (int).
The additional column 'centroids' specifies the coordinates of the cells in the voxel (sequence of (x,y) pairs).
cell_types (sequence): Sequence of cell type names to be considered for deconvolution.
Columns in 'df' not included in 'cell_types' are ignored for assignment.
Returns:
A dictionary <cell type name> -> <list of (x,y) coordinates for the cell type>
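The docstring describes `df_to_cell_types` as turning per-voxel cell-type counts plus a `centroids` column into per-cell-type coordinate lists. The sketch below assumes one simple deterministic way to do this: walk through `cell_types` in order and assign each voxel's centroids in sequence. The library may assign centroids differently, and `df_to_cell_types_sketch` and the toy data are hypothetical. As in the docstring, columns not listed in `cell_types` are ignored.

```python
from typing import Dict, List, Sequence, Tuple

import pandas as pd

Coord = Tuple[float, float]


def df_to_cell_types_sketch(df: pd.DataFrame, cell_types: Sequence[str]) -> Dict[str, List[Coord]]:
    """Assign each voxel's centroids to cell types in a fixed order (hypothetical)."""
    result: Dict[str, List[Coord]] = {ct: [] for ct in cell_types}
    for _, row in df.iterrows():
        centroids = list(row["centroids"])
        start = 0
        for ct in cell_types:
            n = int(row[ct])
            # Slicing silently stops at the end if counts exceed the centroids.
            result[ct].extend(tuple(c) for c in centroids[start:start + n])
            start += n
    return result


df = pd.DataFrame({
    "Astro": [1, 0],
    "Neuron": [2, 1],
    "centroids": [[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], [(5.0, 5.0)]],
})
print(df_to_cell_types_sketch(df, ["Astro", "Neuron"]))
```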