### Data visualization for deconvolution results for seq-based Spatial Transcriptomics datasets

Last updated: 10/03/2022
Author: Yang-Joon Kim

Seq-based spatial transcriptomics datasets may require deconvolution to resolve the cell-type composition of each minimal spatial measurement unit (a spot, bead, or pixel, depending on the platform). The deconvolution result has the same structure as a count matrix, except that instead of Cells x Genes it is Spots x Cell-Types, where each entry is the normalized proportion of a cell type in a spot.

Thus, we can store this Spots x Cell-Types matrix in an AnnData object purely for data visualization purposes:
- adata.X : Spots x Cell-types
- adata.obs : Spots metadata
- adata.var : cell-type metadata (for example, classes of cell ontologies, T cells, B cells, etc.)
- adata.obsm["spatial"] : spatial coordinates, an array with one (x, y) pair per spot. The cells below copy these from the "X_spatial" key of the source objects.

Then, we can run a differential expression (DE) analysis on this object, where the cell-type proportions are interpreted as if they were normalized counts for each "gene".
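
To make the idea concrete, here is a rough sketch in plain pandas that compares the mean proportion of each cell type between two groups of spots. The values and group labels are hypothetical, and a difference of means is only a stand-in for a proper test; scanpy's `sc.tl.rank_genes_groups` is one option for running a statistical comparison on the AnnData itself.

```python
import pandas as pd

# Hypothetical proportions for four spots and their region labels
props = pd.DataFrame(
    {"Tumor": [0.9, 0.8, 0.1, 0.2], "Astro": [0.1, 0.2, 0.9, 0.8]},
    index=["s1", "s2", "s3", "s4"],
)
condition = pd.Series(["tumor", "tumor", "healthy", "healthy"], index=props.index)

# Mean proportion of each cell type within each condition
group_means = props.groupby(condition).mean()

# Positive values mean the cell type is more abundant in tumor spots
enrichment = group_means.loc["tumor"] - group_means.loc["healthy"]
print(enrichment.sort_values(ascending=False))
```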

In [20]:
import numpy as np
import pandas as pd
import anndata as ad
import scanpy as sc
# import squidpy as sq
# import os


In [2]:
df_deconv = pd.read_csv("/mnt/ibm_lg/spatial-seq/SlideSeq/data/deconvolution/TSP14_liver_slideseq_run1_postQC_RCTD/RCTD/results_df.csv")
df_deconv

Unnamed: 0.1,Unnamed: 0,spot_class,first_type,second_type,first_class,second_class,min_score,singlet_score,conv_all,conv_doublet
0,GAGAACCCCTTCCC,doublet_uncertain,hepatocyte,endothelial cell,False,False,1039.736068,1089.177303,True,True
1,TTAGCTGACGAATT,doublet_certain,hepatocyte,endothelial cell of hepatic sinusoid,False,False,990.514712,1079.264097,True,True
2,ATAATACGACTTGA,doublet_uncertain,hepatocyte,endothelial cell,False,False,989.307464,1017.758316,True,True
3,CGACGGTTAAGACG,doublet_uncertain,hepatocyte,fibroblast,False,False,1013.319616,1064.101606,True,True
4,TCATGCCTAGCACT,doublet_certain,hepatocyte,endothelial cell of hepatic sinusoid,False,False,1081.016320,1106.569472,True,True
...,...,...,...,...,...,...,...,...,...,...
53581,AACTCACAAGGAGC,singlet,hepatocyte,endothelial cell of hepatic sinusoid,False,False,150.919860,158.783137,True,True
53582,GATTAGGGAAATAC,singlet,hepatocyte,macrophage,False,False,138.284512,140.685767,True,True
53583,AGTGCCGAGGAAAG,singlet,hepatocyte,neutrophil,False,False,134.685434,135.763488,True,True
53584,CCTAATGTATAGGG,singlet,hepatocyte,endothelial cell,False,False,116.946792,116.928451,True,True
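
The `Unnamed: 0` and `Unnamed: 0.1` columns above look like index columns that were written out when the table was saved, although that is an assumption about how the file was produced. The sketch below uses a small made-up CSV string to show how `index_col=0` turns such a column into the index; the name `barcode` is a hypothetical label.

```python
import io
import pandas as pd

# Hypothetical CSV text: the first header field is empty, as happens
# when a DataFrame is written together with its index
csv_text = (
    ",spot_class,first_type\n"
    "GAGAACCCCTTCCC,doublet_uncertain,hepatocyte\n"
    "TTAGCTGACGAATT,doublet_certain,hepatocyte\n"
)

raw = pd.read_csv(io.StringIO(csv_text))                 # barcodes land in "Unnamed: 0"
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)  # barcodes become the index
clean.index.name = "barcode"  # hypothetical name, chosen for readability

print(raw.columns.tolist())
print(clean.index.tolist())
```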


In [4]:
df_deconv[df_deconv["spot_class"]=="singlet"]

Unnamed: 0.1,Unnamed: 0,spot_class,first_type,second_type,first_class,second_class,min_score,singlet_score,conv_all,conv_doublet
6,GCAGGTGAACGCAT,singlet,hepatocyte,intrahepatic cholangiocyte,False,False,996.516856,1008.655593,True,True
19,TTCAGACAGTTCAC,singlet,hepatocyte,endothelial cell,False,False,951.861416,969.180047,True,True
25,TACCTGGTTCTTTT,singlet,hepatocyte,fibroblast,False,False,1102.182014,1120.346439,True,True
27,CCGAGCACGTGCGC,singlet,hepatocyte,monocyte,False,False,946.558674,971.059118,True,True
28,CCAGCCCGTTCCAT,singlet,hepatocyte,endothelial cell,False,False,950.949007,971.355950,True,True
...,...,...,...,...,...,...,...,...,...,...
53581,AACTCACAAGGAGC,singlet,hepatocyte,endothelial cell of hepatic sinusoid,False,False,150.919860,158.783137,True,True
53582,GATTAGGGAAATAC,singlet,hepatocyte,macrophage,False,False,138.284512,140.685767,True,True
53583,AGTGCCGAGGAAAG,singlet,hepatocyte,neutrophil,False,False,134.685434,135.763488,True,True
53584,CCTAATGTATAGGG,singlet,hepatocyte,endothelial cell,False,False,116.946792,116.928451,True,True


In [6]:
df_deconv[df_deconv["spot_class"]=="doublet_uncertain"]

Unnamed: 0.1,Unnamed: 0,spot_class,first_type,second_type,first_class,second_class,min_score,singlet_score,conv_all,conv_doublet
0,GAGAACCCCTTCCC,doublet_uncertain,hepatocyte,endothelial cell,False,False,1039.736068,1089.177303,True,True
2,ATAATACGACTTGA,doublet_uncertain,hepatocyte,endothelial cell,False,False,989.307464,1017.758316,True,True
3,CGACGGTTAAGACG,doublet_uncertain,hepatocyte,fibroblast,False,False,1013.319616,1064.101606,True,True
5,TGCGATTTGAGATT,doublet_uncertain,hepatocyte,endothelial cell of hepatic sinusoid,False,False,1032.079272,1090.574031,True,True
9,CTAGCTCGTGAACA,doublet_uncertain,hepatocyte,endothelial cell,False,False,1069.784518,1150.788819,True,True
...,...,...,...,...,...,...,...,...,...,...
52828,AAAATAATCCATTG,doublet_uncertain,hepatocyte,fibroblast,False,False,193.180644,248.441964,True,True
52968,ACGTGGAATTTAGC,doublet_uncertain,hepatocyte,endothelial cell,False,False,180.619590,209.707058,True,True
53173,TGGCCATGAGTCCG,doublet_uncertain,hepatocyte,fibroblast,False,False,170.104469,201.572394,True,True
53202,CACCTACTATACAG,doublet_uncertain,plasma cell,hepatocyte,False,False,143.103403,169.848571,True,True


In [5]:
df_deconv[df_deconv["spot_class"]=="doublet_certain"]

Unnamed: 0.1,Unnamed: 0,spot_class,first_type,second_type,first_class,second_class,min_score,singlet_score,conv_all,conv_doublet
1,TTAGCTGACGAATT,doublet_certain,hepatocyte,endothelial cell of hepatic sinusoid,False,False,990.514712,1079.264097,True,True
4,TCATGCCTAGCACT,doublet_certain,hepatocyte,endothelial cell of hepatic sinusoid,False,False,1081.016320,1106.569472,True,True
7,ATAGCCTGGTGCCA,doublet_certain,hepatocyte,endothelial cell of hepatic sinusoid,False,False,938.720447,979.342556,True,True
8,AACAATTTGTAGGC,doublet_certain,hepatocyte,endothelial cell of hepatic sinusoid,False,False,939.193437,965.715740,True,True
15,CAGATCGTTGCAGC,doublet_certain,hepatocyte,fibroblast,False,False,1097.843012,1153.720541,True,True
...,...,...,...,...,...,...,...,...,...,...
51420,TTGTACCAACACCA,doublet_certain,hepatocyte,fibroblast,False,False,169.420589,194.623059,True,True
51701,CTTCTTTTAAATAC,doublet_certain,hepatocyte,fibroblast,False,False,193.145801,240.920272,True,True
52472,TCAAACAAGACTTT,doublet_certain,hepatocyte,fibroblast,False,False,156.860265,186.438100,True,True
53043,TAATAGAGACAACC,doublet_certain,hepatocyte,endothelial cell of hepatic sinusoid,False,False,174.146103,203.307716,True,True
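
Rather than filtering one `spot_class` at a time, the classes can also be summarized in a single step. The sketch below uses a small made-up table with the same column names as `df_deconv`; the rows are illustrative only.

```python
import pandas as pd

# Hypothetical rows with the same column names as df_deconv
toy = pd.DataFrame({
    "spot_class": ["singlet", "doublet_certain", "singlet", "doublet_uncertain"],
    "first_type": ["hepatocyte", "hepatocyte", "hepatocyte", "plasma cell"],
})

# Number of spots in each class
class_counts = toy["spot_class"].value_counts()

# Cross-tabulate the class against the first assigned cell type
type_by_class = pd.crosstab(toy["spot_class"], toy["first_type"])

print(class_counts)
print(type_by_class)
```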


### Read the result file from the Deconvolution

In [40]:
df_result = pd.read_csv("/mnt/ibm_lg/spatial-seq/SlideSeq/data/deconvolution/TSP14_liver_slideseq_run1_postQC_RCTD/RCTD/result.csv",
                        index_col="Unnamed: 0")
df_result

Unnamed: 0,endothelial.cell,endothelial.cell.of.hepatic.sinusoid,erythrocyte,fibroblast,hepatocyte,intrahepatic.cholangiocyte,liver.dendritic.cell,macrophage,monocyte,neutrophil,nk.cell,plasma.cell,t.cell
GAGAACCCCTTCCC,0.018548,0.018317,0.000003,0.008469,0.896195,0.003591,0.000003,0.018051,0.000003,0.000003,0.036814,0.000003,0.000003
TTAGCTGACGAATT,0.000124,0.064968,0.004694,0.017610,0.899758,0.000002,0.000002,0.000002,0.011090,0.001741,0.000002,0.000002,0.000002
ATAATACGACTTGA,0.024079,0.014830,0.000003,0.021054,0.940007,0.000009,0.000003,0.000003,0.000003,0.000003,0.000003,0.000003,0.000003
CGACGGTTAAGACG,0.023888,0.024635,0.000003,0.025867,0.905598,0.019991,0.000003,0.000003,0.000003,0.000003,0.000003,0.000003,0.000003
TCATGCCTAGCACT,0.009518,0.047754,0.004553,0.000003,0.885557,0.027129,0.000003,0.000003,0.000003,0.003535,0.021938,0.000003,0.000003
...,...,...,...,...,...,...,...,...,...,...,...,...,...
AACTCACAAGGAGC,0.055637,0.180253,0.081723,0.000019,0.682154,0.000083,0.000019,0.000019,0.000019,0.000019,0.000019,0.000019,0.000019
GATTAGGGAAATAC,0.000014,0.021220,0.000014,0.000014,0.873193,0.000270,0.000014,0.105189,0.000014,0.000014,0.000014,0.000014,0.000014
AGTGCCGAGGAAAG,0.068116,0.000015,0.000015,0.000034,0.853761,0.005820,0.000015,0.000015,0.000015,0.072147,0.000015,0.000015,0.000015
CCTAATGTATAGGG,0.000017,0.000017,0.000017,0.000017,0.999797,0.000017,0.000017,0.000017,0.000017,0.000017,0.000017,0.000017,0.000017
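
The column names here use dots where the `first_type` labels in `results_df.csv` use spaces (for example `endothelial.cell` versus `endothelial cell`). If the two tables need to agree on cell-type labels, one possible approach is to map the dots back to spaces. This sketch assumes that none of the original labels contain a literal dot, and it uses a made-up two-column table.

```python
import pandas as pd

# Hypothetical slice of the proportion table with dotted column names
toy = pd.DataFrame(
    [[0.1, 0.9], [0.3, 0.7]],
    index=["GAGAACCCCTTCCC", "TTAGCTGACGAATT"],
    columns=["endothelial.cell", "hepatocyte"],
)

# Replace dots with spaces (assumes the original labels had no dots)
renamed = toy.rename(columns=lambda name: name.replace(".", " "))
print(renamed.columns.tolist())
```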


In [19]:
# check that the cell-type proportions in each row sum to 1
df_result.sum(axis=1)


cell_id
GAGAACCCCTTCCC    1.0
TTAGCTGACGAATT    1.0
ATAATACGACTTGA    1.0
CGACGGTTAAGACG    1.0
TCATGCCTAGCACT    1.0
                 ... 
AACTCACAAGGAGC    1.0
GATTAGGGAAATAC    1.0
AGTGCCGAGGAAAG    1.0
CCTAATGTATAGGG    1.0
AACCAGAATCTACC    1.0
Length: 53586, dtype: float64
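
The printed sums are rounded for display, so a value shown as `1.0` may still differ slightly from 1. A tolerance-based check is one way to confirm the normalization programmatically. The sketch below uses a made-up two-row table, and the tolerance of `1e-6` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical proportions; the second row sums to 1 only approximately
toy = pd.DataFrame({"a": [0.1, 0.3333333], "b": [0.9, 0.6666667]})

row_sums = toy.sum(axis=1)

# True if every row sums to 1 within the chosen tolerance
all_normalized = bool(np.allclose(row_sums, 1.0, atol=1e-6))

# Largest absolute deviation from 1 across all rows
max_deviation = float((row_sums - 1.0).abs().max())

print(all_normalized, max_deviation)
```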

In [46]:
df_result.iloc[:,:]

Unnamed: 0,endothelial.cell,endothelial.cell.of.hepatic.sinusoid,erythrocyte,fibroblast,hepatocyte,intrahepatic.cholangiocyte,liver.dendritic.cell,macrophage,monocyte,neutrophil,nk.cell,plasma.cell,t.cell
GAGAACCCCTTCCC,0.018548,0.018317,0.000003,0.008469,0.896195,0.003591,0.000003,0.018051,0.000003,0.000003,0.036814,0.000003,0.000003
TTAGCTGACGAATT,0.000124,0.064968,0.004694,0.017610,0.899758,0.000002,0.000002,0.000002,0.011090,0.001741,0.000002,0.000002,0.000002
ATAATACGACTTGA,0.024079,0.014830,0.000003,0.021054,0.940007,0.000009,0.000003,0.000003,0.000003,0.000003,0.000003,0.000003,0.000003
CGACGGTTAAGACG,0.023888,0.024635,0.000003,0.025867,0.905598,0.019991,0.000003,0.000003,0.000003,0.000003,0.000003,0.000003,0.000003
TCATGCCTAGCACT,0.009518,0.047754,0.004553,0.000003,0.885557,0.027129,0.000003,0.000003,0.000003,0.003535,0.021938,0.000003,0.000003
...,...,...,...,...,...,...,...,...,...,...,...,...,...
AACTCACAAGGAGC,0.055637,0.180253,0.081723,0.000019,0.682154,0.000083,0.000019,0.000019,0.000019,0.000019,0.000019,0.000019,0.000019
GATTAGGGAAATAC,0.000014,0.021220,0.000014,0.000014,0.873193,0.000270,0.000014,0.105189,0.000014,0.000014,0.000014,0.000014,0.000014
AGTGCCGAGGAAAG,0.068116,0.000015,0.000015,0.000034,0.853761,0.005820,0.000015,0.000015,0.000015,0.072147,0.000015,0.000015,0.000015
CCTAATGTATAGGG,0.000017,0.000017,0.000017,0.000017,0.999797,0.000017,0.000017,0.000017,0.000017,0.000017,0.000017,0.000017,0.000017


In [61]:
np.shape(df_result.to_numpy())

(53586, 13)

In [63]:
df_np_array = df_result.to_numpy()
df_np_array

array([[1.85476894e-02, 1.83165311e-02, 2.57051855e-06, ...,
        3.68144194e-02, 2.57052148e-06, 2.57051827e-06],
       [1.24493044e-04, 6.49682128e-02, 4.69362555e-03, ...,
        2.48239413e-06, 2.48239556e-06, 2.48239424e-06],
       [2.40791524e-02, 1.48302529e-02, 2.65074964e-06, ...,
        2.65074780e-06, 2.65075083e-06, 2.65074780e-06],
       ...,
       [6.81155296e-02, 1.53764335e-05, 1.53764333e-05, ...,
        1.53764331e-05, 1.53764337e-05, 1.53764336e-05],
       [1.68874142e-05, 1.68820249e-05, 1.68900702e-05, ...,
        1.68874126e-05, 1.68874145e-05, 1.68874125e-05],
       [2.34176818e-05, 2.34176815e-05, 2.34176817e-05, ...,
        2.34176815e-05, 2.34176805e-05, 2.70687887e-05]])

In [72]:
# create an AnnData from the above dataframe
adata_deconv = ad.AnnData(X=df_np_array)
adata_deconv.obs_names = df_result.index
adata_deconv.var_names = df_result.columns
adata_deconv

AnnData object with n_obs × n_vars = 53586 × 13

In [77]:
# Load the original anndata for spatial coordinates
# Note that we should match the cell_id indices here
adata = sc.read_h5ad("/mnt/ibm_lg/spatial-seq/SlideSeq/data/annotated_data/TabulaSapiens/slideseq_TSP14_liver_run1_min_counts_30_min_genes_30_umap_moranI.h5ad")
adata

AnnData object with n_obs × n_vars = 61171 × 18505
    obs: 'sample', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts', 'n_genes', 'leiden'
    var: 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'hvg', 'leiden', 'leiden_colors', 'moranI', 'neighbors', 'pca', 'sample_colors', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'spatial'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

In [78]:
# subset to the "beads" that are in the deconvolution object,
# indexing by name so the rows follow the order of adata_deconv
adata_subset = adata[adata_deconv.obs_names]
adata_subset

View of AnnData object with n_obs × n_vars = 53586 × 18505
    obs: 'sample', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts', 'n_genes', 'leiden'
    var: 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'hvg', 'leiden', 'leiden_colors', 'moranI', 'neighbors', 'pca', 'sample_colors', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'spatial'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
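
When subsetting one object to the barcodes of another, a boolean `isin` mask keeps the rows in the order of the object being subset, whereas label-based indexing follows the order of the labels passed in. The difference matters for the next step, because `obsm` arrays are copied by position. The sketch below shows both behaviors on a made-up pandas table with hypothetical barcodes.

```python
import pandas as pd

# Hypothetical barcodes: the full table and the deconvolution result
# list the shared barcodes in different orders
full = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]}, index=["A", "B", "C", "D"])
deconv_names = pd.Index(["C", "A", "D"])

by_mask = full[full.index.isin(deconv_names)]  # keeps the order of `full`
by_label = full.loc[deconv_names]              # follows `deconv_names`

print(by_mask.index.tolist())
print(by_label.index.tolist())
```

Comparing the two indices, for example with `by_label.index.equals(deconv_names)`, is a cheap check before copying any positional arrays.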

In [80]:
adata_subset.obs

Unnamed: 0,sample,n_genes_by_counts,total_counts,total_counts_mt,pct_counts_mt,n_counts,n_genes,leiden
GAGAACCCCTTCCC,TSP14_liver,995,3141.0,165.0,5.253104,3141.0,995,35
TTAGCTGACGAATT,TSP14_liver,972,3108.0,197.0,6.338481,3108.0,972,38
ATAATACGACTTGA,TSP14_liver,880,3149.0,179.0,5.684344,3149.0,880,28
CGACGGTTAAGACG,TSP14_liver,969,3140.0,128.0,4.076433,3140.0,969,29
TCATGCCTAGCACT,TSP14_liver,1006,3160.0,212.0,6.708860,3160.0,1006,25
...,...,...,...,...,...,...,...,...
AACTCACAAGGAGC,TSP14_liver,65,104.0,8.0,7.692308,104.0,65,5
GATTAGGGAAATAC,TSP14_liver,68,123.0,13.0,10.569106,123.0,68,1
AGTGCCGAGGAAAG,TSP14_liver,71,112.0,12.0,10.714286,112.0,71,59
CCTAATGTATAGGG,TSP14_liver,68,100.0,2.0,2.000000,100.0,68,4


In [82]:
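# Add the spatial and UMAP coordinates to the deconvolution AnnData
# (obsm arrays are copied by position, so both objects must list the beads in the same order)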
adata_deconv.obsm["X_spatial"] = adata_subset.obsm["X_spatial"]
adata_deconv.obsm["X_umap"] = adata_subset.obsm["X_umap"]

In [84]:
# save the anndata with "beads x cell-types" with spatial info
adata_deconv.write_h5ad("/mnt/ibm_lg/spatial-seq/SlideSeq/data/deconvolution/TSP14_liver_slideseq_run1_postQC_RCTD/RCTD/TSP14_liver_slideseq_run1_postQC_RCTD_ref_TabulaSapiens_beadsbycelltypes.h5ad")


## 10x Visium - Glioblastoma example



In [87]:
df = pd.read_csv("/mnt/ibm_lg/spatial-seq/visium_pilot/data/annotated_data/glioblastoma/RCTD/RCTD_result.txt",
                 index_col="Unnamed: 0")
df

Unnamed: 0,Astro,BEC,Ependymal,Micro,Micro.act,Neuro,Oligo,OPC,Tumor
AAACAAGTATCTCCCA-1,0.207091,0.004291,0.077254,0.070371,0.000127,0.168775,0.370663,0.080311,0.021117
AAACAATCTACTAGCA-1,0.000062,0.000956,0.000062,0.000062,0.077177,0.000062,0.000062,0.000062,0.921496
AAACCCGAACGAAATC-1,0.332563,0.132248,0.032484,0.077203,0.046570,0.093914,0.119573,0.029270,0.136175
AAACCGTTCGTCCAGG-1,0.307401,0.019239,0.059387,0.148735,0.000473,0.198468,0.170345,0.000473,0.095479
AAACGAAGAACATACC-1,0.001747,0.005418,0.000100,0.000100,0.192610,0.000100,0.000100,0.000100,0.799725
...,...,...,...,...,...,...,...,...,...
TTGTGTTTCCCGAAAG-1,0.353325,0.043547,0.000218,0.093311,0.023685,0.020668,0.373863,0.068820,0.022563
TTGTTAGCAAATTCGA-1,0.000097,0.026893,0.000097,0.000097,0.203176,0.000097,0.000097,0.000097,0.769350
TTGTTCAGTGTGCTAC-1,0.000100,0.006322,0.001318,0.000100,0.123598,0.000100,0.000100,0.000100,0.868265
TTGTTGTGTGTCAAGA-1,0.253042,0.022081,0.000268,0.116351,0.083797,0.000268,0.266210,0.077424,0.180559


In [88]:
# check if the cell-type proportions were normalized to 1
np.sum(df, axis=1)

AAACAAGTATCTCCCA-1    1.0
AAACAATCTACTAGCA-1    1.0
AAACCCGAACGAAATC-1    1.0
AAACCGTTCGTCCAGG-1    1.0
AAACGAAGAACATACC-1    1.0
                     ... 
TTGTGTTTCCCGAAAG-1    1.0
TTGTTAGCAAATTCGA-1    1.0
TTGTTCAGTGTGCTAC-1    1.0
TTGTTGTGTGTCAAGA-1    1.0
TTGTTTGTGTAAATTC-1    1.0
Length: 2157, dtype: float64

In [89]:
# create an AnnData from the above dataframe
adata = ad.AnnData(X=df.to_numpy())
adata.obs_names = df.index
adata.var_names = df.columns
adata

AnnData object with n_obs × n_vars = 2157 × 9
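
For a quick categorical overlay, each spot can also be labeled with its most abundant cell type. The sketch below uses a made-up proportion table; the names `dominant_type` and `dominant_fraction` are hypothetical, and the resulting series could be stored as columns of `adata.obs` for plotting.

```python
import pandas as pd

# Hypothetical proportions for two spots and three cell types
toy = pd.DataFrame(
    {"Astro": [0.6, 0.05], "Oligo": [0.3, 0.05], "Tumor": [0.1, 0.9]},
    index=["spot_1", "spot_2"],
)

# Label of the cell type with the largest proportion in each spot
dominant_type = toy.idxmax(axis=1)

# The proportion of that cell type, useful as a confidence-like value
dominant_fraction = toy.max(axis=1)

print(dominant_type)
print(dominant_fraction)
```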

In [94]:
adata_visium = sc.read_h5ad("/mnt/ibm_lg/spatial-seq/visium_pilot/data/annotated_data/glioblastoma/CMS75-GBMA7_brain_visium_Run1_1/CMS75-GBMA7_brain_visium_Run1_1_visium_filtered_annotated_umap.h5ad")
adata_visium

AnnData object with n_obs × n_vars = 2158 × 32285
    obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'condition', 'coarse', 'fine', 'leiden'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'mean', 'std'
    uns: 'condition_colors', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'spatial', 't-test', 'umap', 'wilcoxon'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'spatial'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances'

In [95]:
# subset to the spots in the deconvolution object, indexing by name
# so the rows follow the order of adata
adata_visium = adata_visium[adata.obs_names]
adata_visium

View of AnnData object with n_obs × n_vars = 2157 × 32285
    obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'condition', 'coarse', 'fine', 'leiden'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'mean', 'std'
    uns: 'condition_colors', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'spatial', 't-test', 'umap', 'wilcoxon'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'spatial'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances'

In [98]:
adata_visium.obs["condition"]

AAACAAGTATCTCCCA-1    healthy
AAACAATCTACTAGCA-1    healthy
AAACCCGAACGAAATC-1    healthy
AAACCGTTCGTCCAGG-1    healthy
AAACGAAGAACATACC-1      tumor
                       ...   
TTGTGTTTCCCGAAAG-1    healthy
TTGTTAGCAAATTCGA-1      tumor
TTGTTCAGTGTGCTAC-1      tumor
TTGTTGTGTGTCAAGA-1      tumor
TTGTTTGTGTAAATTC-1      tumor
Name: condition, Length: 2157, dtype: category
Categories (2, object): ['healthy', 'tumor']

In [99]:
# Add the spatial and UMAP coordinates and the condition labels to the deconvolution AnnData
adata.obsm["X_spatial"] = adata_visium.obsm["X_spatial"]
adata.obsm["X_umap"] = adata_visium.obsm["X_umap"]
adata.obs["he_annotation"] = adata_visium.obs["condition"]

In [100]:
adata.write_h5ad("/mnt/ibm_lg/spatial-seq/visium_pilot/data/annotated_data/glioblastoma/RCTD/visium_glioblastoma_run1_1_RCTD_result.h5ad")