Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prior density #55

Closed
wangjiawen2013 opened this issue Apr 11, 2022 · 4 comments
Closed

prior density #55

wangjiawen2013 opened this issue Apr 11, 2022 · 4 comments

Comments

@wangjiawen2013
Copy link

Hi,
Thanks for you great work.
Now I'm trying do perform tangram with squidpy. I am confused about the density_prior function. It's said in the help document:
density_prior (str, ndarray or None): Spatial density of spots, when is a string, value can be 'rna_count_based' or 'uniform', when is a ndarray, shape = (numbe r_spots,). This array should satisfy the constraints sum() == 1. If None, the density term is ignored. Default value is 'rna_count_based'.
what is "density_prior" and how the rna_count_based density are computed (using the raw counts, normalized counts or scaled data) ? I got lots of negative values. In my case, the scaled data is my adata.X, and the log normalized data is in adata.raw and the raw counts were not stored in my anndata object, so I think the negative values could be caused by the lack of raw counts. And what does "spatial density" mean ? Is it the cell number in a spot ?

@wangjiawen2013
Copy link
Author

I have checked some chunk of code of tangram such as mapping utils.py. I am confused about the algorithm to compute prior density based on rna count. Because in the code, the adata.X was used. Seurat and Scanpy advice to scale the data and the scaled data is always stored in adata.X. This is what most seurat and scanpy user's do. In this case, the prior desity cannot be computed based on adata.X and that's why I got negative values. Furthermore, if the logarithmized normalized data are stored in adata.X, tangram still cannot get proper prior density, because the data has been already normalized and the sum of each voxel will all approximately equal ! And that's why the rna density almost the same in the results of tangram tutorial (https://tangram-sc.readthedocs.io/en/latest/tutorial_sq_link.html#). The following the how the density is computed according to tangram source code:
def pp_adatas(adata_sc, adata_sp, genes=None):
"""
Pre-process AnnDatas so that they can be mapped. Specifically:
- Remove genes that all entries are zero
- Find the intersection between adata_sc, adata_sp and given marker gene list, save the intersected markers in two adatas
- Calculate density priors and save it with adata_sp

Args:
    adata_sc (AnnData): single cell data
    adata_sp (AnnData): spatial expression data
    genes (List): Optional. List of genes to use. If `None`, all genes are used.

Returns:
    update adata_sc by creating `uns` `training_genes` `overlap_genes` fields 
    update adata_sp by creating `uns` `training_genes` `overlap_genes` fields and creating `obs` `rna_count_based_density` & `uniform_density` field
"""

# put all var index to lower case to align
adata_sc.var.index = [g.lower() for g in adata_sc.var.index]
adata_sp.var.index = [g.lower() for g in adata_sp.var.index]

adata_sc.var_names_make_unique()
adata_sp.var_names_make_unique()

# remove all-zero-valued genes
sc.pp.filter_genes(adata_sc, min_cells=1)
sc.pp.filter_genes(adata_sp, min_cells=1)

if genes is None:
    # Use all genes
    genes = [g.lower() for g in adata_sc.var.index]
else:
    genes = list(g.lower() for g in genes)

# Refine `marker_genes` so that they are shared by both adatas
genes = list(set(genes) & set(adata_sc.var.index) & set(adata_sp.var.index))
# logging.info(f"{len(genes)} shared marker genes.")

adata_sc.uns["training_genes"] = genes
adata_sp.uns["training_genes"] = genes
logging.info(
    "{} training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.".format(
        len(genes)
    )
)

# Find overlap genes between two AnnDatas
overlap_genes = list(set(adata_sc.var.index) & set(adata_sp.var.index))
# logging.info(f"{len(overlap_genes)} shared genes.")

adata_sc.uns["overlap_genes"] = overlap_genes
adata_sp.uns["overlap_genes"] = overlap_genes
logging.info(
    "{} overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.".format(
        len(overlap_genes)
    )
)

# Calculate uniform density prior as 1/number_of_spots
adata_sp.obs["uniform_density"] = np.ones(adata_sp.X.shape[0]) / adata_sp.X.shape[0]
logging.info(
    f"uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata."
)

# Calculate rna_count_based density prior as % of rna molecule count
rna_count_per_spot = np.array(adata_sp.X.sum(axis=1)).squeeze()
adata_sp.obs["rna_count_based_density"] = rna_count_per_spot / np.sum(rna_count_per_spot)
logging.info(
    f"rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata."
)

@wangjiawen2013
Copy link
Author

Besides, I am not sure if tangram's cluster expression works on scaled data. probably it works, if the single cell data and spatial transcriptome both use the scaled data.

@lewlin
Copy link
Collaborator

lewlin commented May 15, 2022

Sorry for the super late response - since I changed job my time to respond messages is very small.

The "density_prior" is your best estimate for the cell density within spatial voxels. For example, if you segment cells on histology for Visium, you know exactly how many cells you have per voxel (and that's your density). Note that you don't need the absolute number of cells, but simply the density (so overall multiplied by an arbitrary factor). If you work with MERFISH, density is uniform as you have one cell per voxel.

Typically, on Visium, we cannot really segment cells. So we assume that the cell density is proportional to the number of RNA molecules (works great, really). Use the rna_count_based to pass the RNA counts as a proxy for cell density.

As per negative values, let me ask somebody to review the code.

@gaddamshreya
Copy link
Collaborator

Hi @wangjiawen2013 Thank you for using Tangram! You're right about the source of the negative values i.e., lack of raw counts.
@Hejin0701 and I reviewed your issue and it is recommended to use raw counts for spatial data in Tangram.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants