Performance on Xenium/Merfish(Vizgen) - Single cell integration #97

mortunco · 2023-07-05T16:51:38Z

Hello,

First of all, thank you very much for the detailed tutorial. It made my life much easier.

I am trying to use Tangram to integrate recent Xenium data with single-cell (SC) data.

The Xenium data has 160k cells and 313 genes; the SC data has 8k cells and 20k genes (after filtering).

After filtering low-quality cells, and with or without normalisation, I trained the model on 90% of the Xenium genes and held out the remaining 10% to test model performance.
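The 90/10 gene hold-out described above can be sketched roughly as follows. This is an illustration only, not the code used in this post: `split_genes`, the seed, and the placeholder gene names are hypothetical, and only the 313-gene panel size comes from the numbers above.

```python
import numpy as np

def split_genes(gene_names, test_fraction=0.1, seed=0):
    """Randomly hold out a fraction of genes for testing (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(gene_names))
    # At least one test gene, otherwise round to the nearest integer.
    n_test = max(1, int(round(test_fraction * len(shuffled))))
    test_genes = sorted(shuffled[:n_test].tolist())
    train_genes = sorted(shuffled[n_test:].tolist())
    return train_genes, test_genes

# Placeholder names standing in for the 313-gene Xenium panel.
panel = [f"gene_{i}" for i in range(313)]
train_genes, test_genes = split_genes(panel)
```

The training genes would then be the ones handed to Tangram, for example through the `genes` argument of `tg.pp_adatas` as in the tutorial, while the held-out genes are only used for evaluation.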

import tangram as tg

# Train: map single cells onto the Xenium cells.
ad_map = tg.map_cells_to_space(
    sc_adata,
    xen_adata_copy,
    mode="cells",
    # density_prior="rna_count_based",  # this is for Visium
    density_prior="uniform",  # this is for MERFISH
    num_epochs=1000,
    device="cuda:0",
)

# Predict: project the single-cell expression onto space.
ad_ge = tg.project_genes(adata_map=ad_map, adata_sc=sc_adata)

# Compare predicted and measured expression on the held-out genes.
pred = ad_ge[:, test_genes].to_df()
truth = xen_adata[:, test_genes].to_df()

# Further comparison and diagnostic plots follow.

Below are the best-performing and worst-performing genes, ranked by their difference from the truth dataset.

[Figure: without depth normalisation]

[Figure: with depth normalisation]

I am trying to understand why Tangram is having problems.

1. Am I doing something wrong?
2. I am a little suspicious about the section of your paper where it says that the number of cells in the SC data is greater than the number of cells/voxels/spots in the spatial data. In this case we have 160k cells from Xenium versus 6-7k cells from the SC data. Can this be a problem?
3. To calculate model prediction accuracy, we compute the RMSE (test_pred_rmse=np.sqrt(((pred-truth) ** 2).mean())), where pred and truth are expression matrices with the same cell and gene order. We then compute the same for a baseline, as shown below. Finally, we compute the measure (1 - test_pred_rmse/test_baseline_rmse).

baseline=truth.mean(axis=0)
test_baseline_rmse=np.sqrt(((pred-baseline.values) ** 2).mean())
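For reference, a minimal NumPy sketch of this kind of skill score is given here. It rests on one assumption that differs from the snippet above: the baseline error is measured between the truth and its per-gene mean (truth - baseline), not between pred and the baseline. The function names are hypothetical, and the toy matrix is made up purely to show the two limiting cases.

```python
import numpy as np

def rmse_per_gene(a, b):
    """Per-gene RMSE between two cells x genes matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(((a - b) ** 2).mean(axis=0))

def skill_score(pred, truth):
    """1 - RMSE(pred) / RMSE(mean baseline), computed per gene.

    Assumption: the baseline predicts each gene's mean measured expression
    for every cell. Genes with constant measured expression have a baseline
    RMSE of zero and would need to be filtered out beforehand.
    """
    truth = np.asarray(truth, dtype=float)
    baseline = np.broadcast_to(truth.mean(axis=0), truth.shape)
    return 1.0 - rmse_per_gene(pred, truth) / rmse_per_gene(baseline, truth)

# Toy data: 3 cells x 2 genes.
toy_truth = np.array([[0.0, 2.0], [2.0, 2.0], [4.0, 8.0]])
perfect = skill_score(toy_truth, toy_truth)
mean_only = skill_score(np.broadcast_to(toy_truth.mean(axis=0), toy_truth.shape), toy_truth)
```

Under this definition a score of 1 means a perfect prediction, 0 means no better than predicting the per-gene mean, and negative values mean worse than the mean.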

I am open to any guidance,

Best regards,

Tunc.


HelloWorldLTY · 2023-10-03T23:44:06Z

Hi, I believe using count-based data for training is more appropriate, for two reasons: 1. since there are missing genes in the spatial data, we cannot directly normalize the raw-count spatial data; 2. Tangram does not model a specific distribution for the input data.

But I think Xenium has a very large number of spots, and it is hard for me to fit Tangram on my GPU node. Do you use the CPU version to train your model? Thanks.

Hejin0701 · 2023-10-25T18:25:01Z

Hi @mortunco, one restriction of Tangram is that the cell type compositions of the scRNA-seq and spatial data need to be similar. For Q2, how does the cell type composition compare between the SC and spatial data? Another question: how does the gene prediction behave overall? Could you plot cos_sim vs. sparsity per gene, as in the tutorial?
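As a rough illustration of the requested diagnostic, the per-gene cosine similarity and sparsity can be computed directly from the predicted and measured matrices. This is a plain NumPy sketch, not Tangram's own implementation; the function name and toy matrices are hypothetical, and sparsity is assumed here to mean the fraction of cells with zero measured expression.

```python
import numpy as np

def gene_cosine_and_sparsity(pred, truth):
    """Per-gene cosine similarity and measured sparsity (cells x genes inputs)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    dot = (pred * truth).sum(axis=0)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(truth, axis=0)
    # Genes that are all-zero in either matrix get NaN instead of a division error.
    cos_sim = np.divide(dot, norms, out=np.full(dot.shape, np.nan), where=norms > 0)
    # Assumed definition: fraction of cells with zero measured expression.
    sparsity = (truth == 0).mean(axis=0)
    return cos_sim, sparsity

# Toy data: 3 cells x 2 genes.
toy_truth = np.array([[1.0, 0.0], [0.0, 0.0], [1.0, 2.0]])
toy_pred = np.array([[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
cos_sim, sparsity = gene_cosine_and_sparsity(toy_pred, toy_truth)
```

Plotting `cos_sim` against `sparsity` (one point per held-out gene) shows whether the poorly predicted genes are mainly the sparse ones.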
