# Spatial transcriptomics unveils the in situ cellular and molecular hallmarks of the lung in fatal COVID-19

# Differential analysis of cell populations

**Author:** Carlos A. Garcia-Prieto

* This notebook shows how to evaluate changes in cell populations across biological conditions by applying [scCODA](https://github.com/theislab/scCODA), a compositional data analysis method, to the estimated cell-type abundances. The scCODA model determines which of these changes are statistically credible.
* We followed the scCODA [import and visualization](https://sccoda.readthedocs.io/en/latest/Data_import_and_visualization.html) and [compositional analysis](https://sccoda.readthedocs.io/en/latest/getting_started.html) tutorials.
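Why a compositional model is needed: per-sample cell-type abundances are only informative relative to one another, so a real expansion of one population lowers the proportions of all the others even when their absolute numbers do not change. The toy numbers below are hypothetical and only illustrate this closure effect; they are not taken from the study data.

```python
import numpy as np

def proportions(x):
    """Close a vector of abundances so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

# Hypothetical absolute abundances of three cell types in one sample
before = np.array([100.0, 100.0, 100.0])
# Only the first cell type expands; the other two are unchanged in absolute terms
after = np.array([400.0, 100.0, 100.0])

p_before = proportions(before)
p_after = proportions(after)

print(np.round(p_before, 3))  # [0.333 0.333 0.333]
print(np.round(p_after, 3))   # [0.667 0.167 0.167]
```

A separate test per cell type on these proportions would report the second and third cell types as depleted. scCODA instead models all cell types jointly, relative to a reference cell type.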

## Import modules

In [1]:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import anndata as ad
from sccoda.util import cell_composition_data as dat
from sccoda.util import data_visualization as viz
import scanpy as sc
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
#!pip install colorcet
import colorcet as cc
import matplotlib as mpl
from matplotlib import cm
from matplotlib.colors import ListedColormap,LinearSegmentedColormap
import pickle as pkl
from sccoda.util import comp_ana as mod
import sccoda.datasets as scd
pd.set_option('display.max_rows',50)
pd.set_option('display.max_columns',50)



## Import exported cell type abundances estimated with cell2location
To export the estimated cell-type abundances to a pandas DataFrame for import into scCODA, we used the total abundance of each cell type (Figure 2A) or lineage (Figure 1D) per sample, with samples as rows and cell types or lineages as columns, plus columns describing the disease-condition group.

To import data from a pandas DataFrame in which each row represents a sample, it is sufficient to specify the names of the metadata (covariate) columns; all remaining columns are treated as cell types.

In [2]:
#Read data into pandas from csv
counts_folder = "/gpfs/scratch/bsc59/MN4/bsc59/bsc59829/Spatial/COVID/Jupyterlab/HLCA_publication/"
cell_counts = pd.read_csv(f"{counts_folder}proportions_c2l_finest_format_scCODA_order_significance.csv", index_col = 0) #For cell type abundance (Figure 2A)
#cell_counts = pd.read_csv(f"{counts_folder}proportions_c2l_finest_format_scCODA_order_significance_lineages.csv", index_col = 0) #For lineage abundance (Figure 1D)

In [3]:
#Import and select covariate columns
data = dat.from_pandas(cell_counts, covariate_columns=["Sample","Condition","Condition2"])
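A minimal sketch of the table layout that `dat.from_pandas` expects: one row per sample, one column per cell type, plus the metadata columns listed in `covariate_columns`. The sample IDs, the `Condition` labels, the cell-type subset, and all values below are hypothetical. The pandas-only split mirrors, as we understand it, how `from_pandas` separates covariates from cell-type abundances; it is an illustration, not the scCODA implementation.

```python
import pandas as pd

# Hypothetical per-sample abundance table (rows = samples, columns = cell types + metadata)
cell_counts = pd.DataFrame(
    {
        "Sample": ["S1", "S2", "S3"],
        "Condition": ["Control", "COVID-19", "COVID-19"],
        "Condition2": ["Control", "COVID-19_acute", "COVID-19_acute"],
        "AT1": [120.0, 40.0, 35.0],
        "AT2": [300.0, 210.0, 190.0],
        "Alveolar macrophages": [80.0, 150.0, 170.0],
    }
)

covariate_columns = ["Sample", "Condition", "Condition2"]

# Every column not listed as a covariate is treated as a cell type
covariates = cell_counts[covariate_columns]
counts = cell_counts.drop(columns=covariate_columns)

print(list(counts.columns))  # ['AT1', 'AT2', 'Alveolar macrophages']
print(counts.shape)          # (3, 3)
```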

## Create color palette for plotting

In [4]:
#Use the same color palette in all figures for all 45 cell types:
col_dict = {'AT1': '#8c3bff',
 'AT2': '#018700',
 'AT0': '#d60000',
 'AT2 proliferating': '#00acc6',
 'Basal resting': '#0000dd',
 'Club (non-nasal)': '#95b577',
 'Deuterosomal': '#790000',
 'Multiciliated (non-nasal)': '#5900a3',
 'pre-TB secretory': '#93b8b5',
 'Suprabasal': '#bde6bf',
 'EC aerocyte capillary': '#0774d8',
 'EC general capillary': '#004b00',
 'EC arterial': '#fdf490',
 'EC venous pulmonary': '#8e7900',
 'EC venous systemic': '#ff7266',
 'Lymphatic EC differentiating': '#790000', 
 'Lymphatic EC mature': '#5d7e66',
 'NK cells': '#9e4b00',
 'Interstitial Mph perivascular': '#edb8b8',
 'Monocyte-derived Mph': '#a57bb8',
 'Non-classical monocytes': '#9c3b4f',
 'Alveolar Mph CCL3+': '#ff7ed1',
 'Alveolar Mph MT-positive': '#6b004f',
 'CD4 T cells': '#00fdcf',
 'CD8 T cells': '#a17569',
 'Alveolar macrophages': '#573b00',
 'Alveolar Mph proliferating': '#645474', 
 'B cells': '#005659',
 'Classical monocytes': '#bcb6ff',     
 'DC1': '#bf03b8',
 'DC2': '#645474',
 'Mast cells': '#9ae4ff',
 'Migratory DCs': '#eb0077',
 'Plasma cells': '#00af89',
 'Plasmacytoid DCs': '#8287ff',
 'T cells proliferating': '#db6d01',
 'Peribronchial fibroblasts': '#cac300',
 'Alveolar fibroblasts': '#ffa52f',
 'Pericytes': '#708297',
 'Subpleural fibroblasts': '#fdbfff',
 'Adventitial fibroblasts': '#97ff00',
 'Mesothelium': '#e452ff', 
 'Myofibroblasts': '#03c600',
 'SM activated stress response': '#5d363b',
 'Smooth muscle': '#380000'}
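As far as we can tell from the scCODA source, `viz.stacked_barplot` picks colors by the position of each cell type in the data, not by name, so `col_dict` yields the intended colors only if its order matches the column order of the abundance table. A minimal sketch, using a hypothetical three-entry subset of the dictionary and hypothetical column order, that checks the keys and builds a `matplotlib.colors.ListedColormap` in column order:

```python
import matplotlib.colors as mcolors

# Hypothetical subset of the color dictionary and of the abundance-table columns
col_dict = {"AT1": "#8c3bff", "AT2": "#018700", "AT0": "#d60000"}
cell_type_columns = ["AT2", "AT1", "AT0"]  # order as it appears in the CSV

# Fail early if a cell type has no color, or a color has no matching cell type
missing = set(cell_type_columns) - set(col_dict)
extra = set(col_dict) - set(cell_type_columns)
assert not missing and not extra, f"missing={missing}, extra={extra}"

# Build the colormap in column order so that index i maps exactly to column i
ordered_colors = [col_dict[ct] for ct in cell_type_columns]
cmap_by_column = mcolors.ListedColormap(ordered_colors, name="celltype_by_column")

print(mcolors.to_hex(cmap_by_column(0)))  # #018700 (AT2, the first column)
```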

## Compositional data visualization
### Stacked barplot

In [5]:
#Create color palette for lineage and cell types
pal_celltype = sns.color_palette(col_dict.values(), n_colors=len(col_dict)) #One color per cell type (45); a larger n_colors makes seaborn cycle the palette
pal_celltype_hex = list(map(mpl.colors.rgb2hex, pal_celltype))

pal_lineage = sns.color_palette(['#97ff00','#d60000','#ffa52f','#005659'], n_colors=4) 
pal_lineage_hex = list(map(mpl.colors.rgb2hex, pal_lineage))

In [6]:
#Create the colormap
cm = LinearSegmentedColormap.from_list('my_list_celltype', pal_celltype, N=45)
#cm = LinearSegmentedColormap.from_list('my_list_lineage', pal_lineage, N=4) #For lineage abundances (Figure 1D)
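`LinearSegmentedColormap.from_list` interpolates between the listed colors, and an integer lookup `cmap(i)` samples one of `N` evenly spaced points along that gradient. The listed colors are therefore reproduced exactly only when `N` equals the number of colors. seaborn's `color_palette` cycles its input when `n_colors` exceeds the length of the list, which adds extra anchors. The sketch below shows both cases with three hypothetical colors.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, to_rgba

colors = ["#ff0000", "#00ff00", "#0000ff"]  # hypothetical red, green, blue

# N matches the number of colors: integer lookups return the listed colors
exact = LinearSegmentedColormap.from_list("exact", colors, N=len(colors))
for i, c in enumerate(colors):
    assert np.allclose(exact(i), to_rgba(c))

# Palette cycled to 4 entries (red, green, blue, red) but sampled at N=3:
# lookup 1 falls halfway between green and blue, lookup 2 returns the cycled red
cycled = colors + colors[:1]
mismatched = LinearSegmentedColormap.from_list("mismatched", cycled, N=3)
print(np.round(mismatched(1), 2))  # a green-blue mix, not pure green
```

With the 45 cell types used here, this means `N` should equal the number of entries in `col_dict`.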

<div class="alert alert-info">
<b>Paper Figure!</b>
Panel Figure 2A (cell types) and Figure 1D (lineage)
</div>

In [7]:
# Stacked barplot for the levels of "Condition2"
viz.stacked_barplot(data, feature_name="Condition2", figsize=[12,12], cmap=cm)
#plt.show()
plt.savefig(f"{counts_folder}scCODA_stacked_boxplots_finest_Condition2_Paper.png",dpi=300, format="png",pad_inches=0.2,bbox_inches="tight")
plt.close()
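As far as we understand the scCODA implementation, `viz.stacked_barplot` with `feature_name` sums the abundances of all samples in each level of that covariate and shows each cell type as a percentage of the level total. A pandas sketch of that aggregation on hypothetical numbers, which can help when checking the plotted values:

```python
import pandas as pd

# Hypothetical abundance table with a condition column
df = pd.DataFrame(
    {
        "Condition2": ["Control", "Control", "COVID-19_acute"],
        "AT1": [60.0, 40.0, 30.0],
        "Alveolar macrophages": [40.0, 60.0, 70.0],
    }
)

# Sum abundances per condition level, then express each level as percentages
totals = df.groupby("Condition2").sum()
percent = totals.div(totals.sum(axis=1), axis=0) * 100

print(percent)
```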

### Grouped box plots by condition

In [8]:
#Create color palette with the same colors in all figures for each condition
pal2 = sns.color_palette(['#ff7c00','#023eff','#1ac938'], n_colors=3) 
pal2_hex = list(map(mpl.colors.rgb2hex, pal2))

In [9]:
#Create the colormap
cm2 = LinearSegmentedColormap.from_list('my_list2', pal2, N=3)

<div class="alert alert-info">
<b>Paper Figure!</b>
Panel Figure 2A (cell types)
</div>

In [10]:
#Plot grouped boxplots
viz.boxplots(
    data,
    feature_name="Condition2",
    plot_facets=False,
    y_scale="relative",
    add_dots=False,
    figsize=[8,4], 
    cmap=pal2 
)
plt.savefig(f"{counts_folder}scCODA_boxplots_finest_Condition2_Paper.png",dpi=300, format="png",pad_inches=0.2,bbox_inches="tight")
plt.close()
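With `y_scale="relative"`, the boxes summarize each sample's relative abundance of a cell type, so every sample contributes one value per cell type. A minimal sketch, on hypothetical data, of the per-sample proportions and the long format that a grouped boxplot consumes; it is not scCODA's own plotting code.

```python
import pandas as pd

# Hypothetical per-sample abundances for two cell types
df = pd.DataFrame(
    {
        "Sample": ["S1", "S2", "S3", "S4"],
        "Condition2": ["Control", "Control", "COVID-19_acute", "COVID-19_acute"],
        "AT1": [60.0, 50.0, 20.0, 10.0],
        "Alveolar macrophages": [40.0, 50.0, 80.0, 90.0],
    }
)
cell_types = ["AT1", "Alveolar macrophages"]

# Per-sample relative abundance: each sample's cell-type values sum to 1
rel = df[cell_types].div(df[cell_types].sum(axis=1), axis=0)
rel = pd.concat([df[["Sample", "Condition2"]], rel], axis=1)

# Long format: one row per (sample, cell type)
long = rel.melt(id_vars=["Sample", "Condition2"], var_name="Cell type", value_name="Proportion")
medians = long.groupby(["Cell type", "Condition2"])["Proportion"].median()
print(medians)
```

Passing `long` to `sns.boxplot(data=long, x="Cell type", y="Proportion", hue="Condition2")` should give a comparable plot, although scCODA's own styling differs.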

# Compositional data analysis
## Model setup and inference
### Finding a reference cell type
The scCODA model requires one cell type to be set as the reference category: a cell type whose abundance is assumed to be unchanged by the covariates. Effects on all other cell types are estimated relative to it. We used automatic reference cell type selection. For the model formula, we used only the disease-condition covariate (`Condition2`) of our dataset.
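A minimal numpy sketch of automatic reference selection as we understand it from the scCODA documentation and source. Cell types that are absent from more than a small fraction of samples are excluded, and among the remaining ones the cell type with the lowest dispersion of relative abundance across samples is chosen. The 5% absence threshold, the `<=` comparison, and the variance-to-mean dispersion measure are our assumptions about scCODA's behavior; `pick_reference` is a hypothetical helper and the data are made up.

```python
import numpy as np

def pick_reference(counts, cell_types, absence_threshold=0.05):
    """Hypothetical sketch of a low-dispersion reference cell type choice.

    counts: array of shape (n_samples, n_cell_types) with abundances.
    absence_threshold: maximum fraction of samples in which a candidate may be
    zero (assumed default).
    """
    counts = np.asarray(counts, dtype=float)
    # Fraction of samples in which each cell type is absent
    frac_zero = (counts == 0).mean(axis=0)
    candidates = np.where(frac_zero <= absence_threshold)[0]
    if candidates.size == 0:
        raise ValueError("No cell type is present in enough samples")

    # Relative abundance per sample, then dispersion (variance / mean) per cell type
    rel = counts / counts.sum(axis=1, keepdims=True)
    dispersion = rel.var(axis=0) / rel.mean(axis=0)

    # The least variable candidate becomes the reference
    best = candidates[np.argmin(dispersion[candidates])]
    return cell_types[best]

# Hypothetical abundances: 4 samples x 3 cell types (B is absent in one sample)
counts = np.array([
    [50, 10, 40],
    [45, 10, 45],
    [55, 10, 35],
    [50,  0, 50],
])
print(pick_reference(counts, ["A", "B", "C"]))  # A
```

On the real data, scCODA's own selection returned Migratory DCs, as shown in the output of the model setup cell.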

In [11]:
model_cond = mod.CompositionalAnalysis(data, formula="C(Condition2, Treatment('Control'))", reference_cell_type="automatic") #Control samples as control group (Ctl vs acute DAD & Ctl vs proliferative DAD)
#model_cond = mod.CompositionalAnalysis(data, formula="C(Condition2, Treatment('COVID-19_acute'))", reference_cell_type="automatic") #Acute DAD samples as control group (Acute DAD vs proliferative DAD)

Automatic reference selection! Reference cell type set to Migratory DCs
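The formula `C(Condition2, Treatment('Control'))` uses patsy's treatment (dummy) coding. The level named in `Treatment(...)` becomes the baseline absorbed into the intercept, and every other level gets its own indicator column, so each estimated effect compares that level with the baseline. A pandas sketch of the resulting indicator columns follows. `treatment_coding` is a hypothetical helper; 'Control' and 'COVID-19_acute' appear in this notebook, while 'COVID-19_proliferative' is a hypothetical label for the third level.

```python
import pandas as pd

def treatment_coding(values, reference):
    """Indicator columns for every level except the reference (treatment coding)."""
    levels = [reference] + sorted(set(values) - {reference})
    cat = pd.Categorical(values, categories=levels)
    # drop_first removes the reference level, which is the first category
    return pd.get_dummies(cat, drop_first=True).astype(int)

# Hypothetical per-sample condition labels
condition = pd.Series(["Control", "COVID-19_acute", "COVID-19_proliferative", "Control"])

design = treatment_coding(condition, "Control")
print(design)
```

Switching to `Treatment('COVID-19_acute')`, as in the commented line of the model setup cell, makes acute DAD the baseline, so the proliferative-versus-acute comparison is estimated directly.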




In [12]:
#Run scCODA model
results = model_cond.sample_hmc()

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20000/20000 [01:29<00:00, 223.29it/s]


MCMC sampling finished. (113.877 sec)
Acceptance rate: 56.1%


### Save results

In [13]:
#Save intercept estimates
results.intercept_df.to_csv(f"{counts_folder}scCODA_results_intercept_MigratoryDCs_ControlasControlGroup_Paper.csv", index=True)
#results.intercept_df.to_csv(f"{counts_folder}scCODA_results_intercept_MigratoryDCs_AcuteasControlGroup_Paper.csv", index=True)

In [14]:
#Save credible-effect flags (True/False for each covariate and cell type; FDR < 0.05)
results.credible_effects().to_csv(f"{counts_folder}scCODA_results_credible_effect_MigratoryDCs_ControlasControlGroup_Paper.csv", index=True)
#results.credible_effects().to_csv(f"{counts_folder}scCODA_results_credible_effect_MigratoryDCs_AcuteasControlGroup_Paper.csv", index=True)
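scCODA declares an effect credible when its posterior inclusion probability exceeds a threshold chosen to control the estimated false discovery rate (0.05 here). The sketch below implements the standard direct-posterior-probability rule, which we assume is close to what scCODA does: rank effects by inclusion probability and keep the largest top-ranked set whose average (1 - probability) does not exceed the target FDR. `credible_by_fdr` is a hypothetical helper and the probabilities are made up.

```python
import numpy as np

def credible_by_fdr(inclusion_prob, est_fdr=0.05):
    """Flag effects whose inclusion probability passes an FDR-controlling threshold.

    Keeps the largest set of top-ranked effects whose mean (1 - probability),
    i.e. the expected proportion of false discoveries, is at most est_fdr.
    """
    p = np.asarray(inclusion_prob, dtype=float)
    order = np.argsort(-p)  # most probable effects first
    # Expected FDR when the top k effects are called credible, for k = 1..n
    expected_fdr = np.cumsum(1.0 - p[order]) / np.arange(1, p.size + 1)
    passing = np.where(expected_fdr <= est_fdr)[0]
    n_credible = passing.max() + 1 if passing.size else 0
    flags = np.zeros(p.size, dtype=bool)
    flags[order[:n_credible]] = True
    return flags

print(credible_by_fdr([0.99, 0.2, 0.97, 0.9, 0.5]))  # [ True False  True  True False]
```

Lowering `est_fdr` raises the implied threshold and yields fewer credible effects. In scCODA the level can be changed after sampling; the tutorials use `results.set_fdr(est_fdr=...)` for this.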