Interpreting project_genes #66

Closed
kguion1 opened this issue Jul 7, 2022 · 4 comments

Comments

@kguion1

kguion1 commented Jul 7, 2022

Hi, thank you for all your help with running Tangram!

For interpreting the project_genes results, I noticed that it returns a "spot-by-gene AnnData containing spatial gene expression from the single cell data." I am confused about how each spot has one measurement per gene, because I thought that the goal of deconvolution was to get to single cell resolution (which would be multiple cells per spot). If there were two cells in one spot and one cell had very high expression of a marker gene and the other cell had very low expression of the same gene, would project_genes return a "medium" expression of that particular gene?

I did look at the figure from plot_genes_sc and noticed that there is a difference between the predicted and measured expression, but I am confused about what difference I should be looking for and what exactly it represents. One of the figures is below.

Finally, I would like to use the single cell resolution to do some neighborhood analysis -- looking at which individual genes have spatial patterns, any pairs / groups of genes that have spatial patterns together, and then repeat with cell type / cluster (which cells have spatial patterns, any pairs / groups of cells that have spatial patterns together). I was looking through the squidpy tutorial and noticed that there was not any deconvolution in the pipeline. How would you recommend approaching this?

image

@hongruhu

I have a similar question. The documentation says that "the returned ad_ge is a spot-by-gene AnnData, similar to spatial data ad_sp, but where gene expression has been projected from the single cells.", but if we assume each spot is "bulk", then why wouldn't ad_ge have dimensions of cell types by spots by genes?

Is there a way to get a cell-type-specific spot-by-gene AnnData? Thanks

@Hejin0701
Collaborator

Hejin0701 commented Aug 13, 2022

Hi @kguion1, thank you for using Tangram and raising the issue. Below are my comments on your questions:

project_genes gives a spot-by-gene AnnData. The value of a gene at each spot is the sum, over all cells, of each cell's expression of that gene times the probability of the cell being mapped to that spot. In your example, if two cells are mapped to one spot with probabilities p1 and p2, and they express the marker gene at levels g1 and g2 respectively, project_genes returns the weighted sum of their expression at that spot (p1·g1 + p2·g2).
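To make the weighted sum concrete, here is a minimal NumPy sketch with made-up numbers. The arrays `mapping` and `expression` are hypothetical stand-ins for the cells-by-spots mapping probabilities and the cells-by-genes expression matrix; they are not Tangram objects, and the sketch only illustrates the arithmetic described above.

```python
import numpy as np

# Hypothetical toy data: 2 cells, 3 spots, 1 marker gene.
# Rows are cells, columns are spots; each entry is the probability
# that the cell is mapped to that spot.
mapping = np.array([
    [0.7, 0.2, 0.1],  # cell 1
    [0.6, 0.1, 0.3],  # cell 2
])

# Cells-by-genes expression: cell 1 is high (g1 = 10), cell 2 is low (g2 = 1).
expression = np.array([[10.0], [1.0]])

# Spot-by-gene projection: transpose of the mapping times the expression.
# For the first spot this is p1*g1 + p2*g2 = 0.7*10 + 0.6*1.
projected = mapping.T @ expression

print(projected[:, 0])
```

So a spot that receives both a high-expressing and a low-expressing cell ends up with an intermediate value, weighted by how confidently each cell is placed there.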

Based on the figure, Tangram does not give a good prediction for this specific gene: the predicted spatial pattern of expression differs from the measured one. With plot_genes_sc, we normally look at how well the predicted expression matches the measured expression. If the measured distribution is sparse while Tangram predicts more uniform expression, this can mean that Tangram is correcting a low-quality gene, because Visium can suffer from measurement dropout. In your case, however, the measured distribution is not sparse and the prediction does not align well with it, so the predicted result is probably not accurate. You may want to change the training genes or denoise the single-cell/spatial data to see whether that improves the prediction.

Cell-cell communication is also something we would like to investigate and add to Tangram. We are still developing a function to learn colocalization relationships. In the current Tangram version, if you have Visium data, Tangram only gives you ad_map, where each row contains the probabilities of one cell being mapped to each spot. You can take the dot product of two rows to get a colocalization score between two cells, which quantifies how probable it is that the two cells colocalize in the same spot. From the colocalization matrix between all cells, you should then be able to do some cell-cell interaction analysis.
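The row dot product can be computed for all pairs of cells at once. The sketch below assumes a small hypothetical cells-by-spots array `mapping` in place of the matrix stored in ad_map; it is an illustration of the idea, not a Tangram function.

```python
import numpy as np

# Hypothetical stand-in for the cells-by-spots mapping matrix (3 cells, 4 spots).
mapping = np.array([
    [0.8, 0.2, 0.0, 0.0],  # cell 0
    [0.7, 0.3, 0.0, 0.0],  # cell 1: mostly the same spots as cell 0
    [0.0, 0.0, 0.5, 0.5],  # cell 2: different spots
])

# Entry (i, j) is the dot product of row i and row j, i.e. the
# colocalization score between cell i and cell j.
coloc = mapping @ mapping.T

print(coloc)
```

Cells 0 and 1 share spots and get a high score, while cells 0 and 2 never share a spot and get zero. Note that the result is cells by cells, so for a large single-cell dataset it may be more practical to aggregate rows by cell type first.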

Hope that helps!

@Hejin0701
Collaborator

Hi @Hongru-Hu ,

Do you mean that you would like to have a 3D matrix which is spots by cell type by gene? We don't have a function to return that matrix directly in Tangram.

The ad_map returned from map_cells_to_space is a cells-by-spots matrix. If you want cell type by spots, you can simply sum all the rows belonging to the same cell type to obtain the cell-type-by-spots distribution. Furthermore, ad_ge is obtained by multiplying the transpose of ad_map with ad_sc (the single-cell data: cells by genes). If you would like to obtain the gene expression contributed by a specific cell type in every spot, extract the rows belonging to that cell type from both ad_map and ad_sc, take the transpose of the ad_map subset, and multiply it with the ad_sc subset.
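Both steps can be sketched with plain NumPy arrays. Everything here is illustrative: `mapping`, `expression`, and `cell_types` are hypothetical stand-ins for the ad_map matrix, the ad_sc expression matrix, and a per-cell annotation, and `project_cell_type` is a helper name made up for this example rather than part of Tangram.

```python
import numpy as np

# Hypothetical toy data: 4 cells, 2 spots, 3 genes.
mapping = np.array([      # cells by spots (mapping probabilities)
    [0.9, 0.1],
    [0.4, 0.6],
    [0.5, 0.5],
    [0.2, 0.8],
])
expression = np.array([   # cells by genes
    [5.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
    [0.0, 3.0, 2.0],
    [1.0, 2.0, 6.0],
])
cell_types = np.array(["A", "A", "B", "B"])  # one label per cell

# Cell type by spots: sum the mapping rows that belong to each cell type.
type_names = np.unique(cell_types)
type_by_spot = np.stack(
    [mapping[cell_types == t].sum(axis=0) for t in type_names]
)

def project_cell_type(mapping, expression, cell_types, cell_type):
    """Spot-by-gene expression contributed by a single cell type."""
    mask = cell_types == cell_type
    # Keep only that type's rows in both matrices, then transpose and multiply.
    return mapping[mask].T @ expression[mask]

per_type = {
    str(t): project_cell_type(mapping, expression, cell_types, t)
    for t in type_names
}

# The per-type contributions add up to the full projection.
total = mapping.T @ expression
print(np.allclose(sum(per_type.values()), total))
```

Stacking the per-type matrices would give the spots by cell type by gene array asked about above, at the cost of one spot-by-gene matrix per cell type.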

Let me know if you have any further questions. Thank you!

@hongruhu

@Hejin0701 I followed the steps and the results look pretty good! Thanks for your reply.
