### Cell-Cell Communication using CellphoneDB

1. Use the updated AnnData object, which contains mappings from hubs to cell types for all hubs and all cell types. Then run each subject separately (subjects 14 and 16: counts .h5ad and .tsv metadata files).
2. Run both method 2 (statistical analysis) and method 3 (DE analysis).
3. Interpret the results meaningfully:
   * What is the biology of the cell-cell interactions?
   * How do CD8+ effector and stem cell interactions differ between GVHD grades?
4. Reference Siyu's code for circuit plots.
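
Step 1 asks for one run per subject. The following is a minimal sketch of splitting a spot-to-celltype table by subject, assuming the subject name is the last hyphen-separated token of each spot barcode (as in the barcodes shown later in this notebook). The helper name `split_meta_by_subject` and the toy rows are hypothetical, and writing the per-subject files is left out.

```python
import pandas as pd


def split_meta_by_subject(meta: pd.DataFrame, spot_col: str = "spot") -> dict:
    """Return {subject: sub-table}, reading the subject from the barcode suffix."""
    # Assumption: barcodes look like "<spot id>-1-<subject>", e.g. "...-1-SLV14".
    subject = meta[spot_col].str.rsplit("-", n=1).str[-1]
    return {name: grp.reset_index(drop=True) for name, grp in meta.groupby(subject)}


# Toy metadata for illustration only; the SLV16 row is made up.
toy_meta = pd.DataFrame(
    {
        "spot": [
            "s_016um_00242_00130-1-SLV14",
            "s_016um_00280_00313-1-SLV14",
            "s_016um_00100_00200-1-SLV16",
        ],
        "stem_cd8_cd4_celltypes": ["nonstem", "CD8+ Effector T cells", "stem"],
    }
)

per_subject = split_meta_by_subject(toy_meta)
for name, table in per_subject.items():
    print(name, len(table))
```

Each sub-table could then be saved as its own .tsv and paired with the matching subset of the counts file.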

In [1]:
import pandas as pd
import sys
import os

pd.set_option('display.max_columns', 100)
os.chdir('/home/jz3553/CellphoneDB/')

In [2]:
print(sys.version)

3.8.20 (default, Oct  3 2024, 15:24:27) 
[GCC 11.2.0]


In [3]:
import sys
import os
import numpy as np
import pandas as pd
import scanpy as sc
import anndata

import matplotlib.pyplot as plt

#### Formatting Inputs 

##### Understanding anndata structure 
1. Cell types are stored as columns in adata.obs[["celltype_A", ... "celltype_N"]], a DataFrame in which each barcode has a deconvolved proportion between 0 and 1 for each cell type.
2. For cell-cell communication, each spot can be treated as a single cell.
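
To make the structure concrete, here is a small sketch of reading the per-spot proportions, using a toy table with values copied from the adata.obs preview in this notebook. Picking the dominant cell type with `idxmax` is only an illustration of how the proportions can be read; this notebook labels spots by hub instead.

```python
import pandas as pd

# Toy proportions (three cell-type columns, two spots) copied from the obs preview.
props = pd.DataFrame(
    {
        "Plasma cells": [3.3e-05, 0.135085],
        "Myeloid": [0.172055, 0.000194],
        "Fibroblast": [0.416307, 0.000803],
    },
    index=["s_016um_00258_00092-1-SLV11", "s_016um_00283_00339-1-SLV11"],
)

# Column with the largest proportion per spot, and that proportion.
dominant = props.idxmax(axis=1)
max_prop = props.max(axis=1)
print(pd.DataFrame({"dominant": dominant, "proportion": max_prop}))
```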

In [5]:
# # Load the integrated AnnData object from an h5ad file
# adata_integrate = sc.read_h5ad("/home/jz3553/data/adata_integrate.h5ad")
# # Check if it loaded correctly
# print(adata_integrate)

AnnData object with n_obs × n_vars = 232882 × 1133
    obs: 'in_tissue', 'array_row', 'array_col', 'pxl_row_in_fullres', 'pxl_col_in_fullres', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts', 'patient', 'sample_type', 'grade', 'leiden', 'pheno_louvain', 'hub', 'Plasma cells', 'Myeloid', 'Fibroblast', 'B cells', 'NK cells', 'Instestinal_Epithelial cells', 'Enterocyte', 'Stomach_epithelial_cells', 'Stomach_stem_cells', 'Endothelial cell', 'Mast cell', 'CD8+ Effector T  cells', 'CD4+ Effector T cells', 'CD4+ Central Memory T cells', 'CD8+ Cytotoxic Unconventional T cells', 'CD8+ Proliferating T cells', 'CD4+ Regulatory T cells', 'CD8+ Homeostatic Unconventional T cells', 'CD8+ Tissue Resident Memory T cells', 'CD8+ Transitioning  Resident T cells', '

In [6]:
# adata_integrate.obs[['Plasma cells', 'Myeloid', 'Fibroblast']].head()

Unnamed: 0,Plasma cells,Myeloid,Fibroblast
s_016um_00258_00092-1-SLV11,3.3e-05,0.172055,0.416307
s_016um_00245_00212-1-SLV11,0.001375,0.000574,0.002225
s_016um_00283_00339-1-SLV11,0.135085,0.000194,0.000803
s_016um_00191_00066-1-SLV11,3.2e-05,3.5e-05,1.4e-05
s_016um_00244_00289-1-SLV11,9.9e-05,4.9e-05,0.001554


##### Mapping from hubs to celltypes of interest
1. Here, we map hub numbers to stem cells, CD8+ effector T cells, and CD4+ effector T cells.
2. All other hubs are mapped to nonstem.
3. Nonstem spots can be filtered out to save compute, so that CellphoneDB runs only on the cell types of interest. That filter is commented out in the cells below, so nonstem spots are kept in this run.

In [14]:
# hub_stem_list = [4, 5, 8, 30, 32]

# def map_stem_categories(hub):
#     if hub in hub_stem_list:
#         return 'stem'
#     elif hub in [31]:
#         return 'CD8+ Effector T cells'
#     elif hub in [17]:
#         return 'CD4+ Effector T cells'
#     else:
#         return 'nonstem'

# adata_integrate.obs['stem_cd8_cd4_celltypes'] = adata_integrate.obs['hub'].apply(map_stem_categories)

In [15]:
# adata_integrate.obs[['sample', 'hub', 'SampleID_Grade', 'stem_cd8_cd4_celltypes']].head()

Unnamed: 0,sample,hub,SampleID_Grade,stem_cd8_cd4_celltypes
s_016um_00258_00092-1-SLV11,SLV11,13,SLV11_Severe,nonstem
s_016um_00245_00212-1-SLV11,SLV11,10,SLV11_Severe,nonstem
s_016um_00283_00339-1-SLV11,SLV11,0,SLV11_Severe,nonstem
s_016um_00191_00066-1-SLV11,SLV11,3,SLV11_Severe,nonstem
s_016um_00244_00289-1-SLV11,SLV11,5,SLV11_Severe,stem


In [36]:
# subjects_of_interest = ["SLV14", "SLV16", "SLV17", "SLV18"]
# mask = (
#     adata_integrate.obs["sample"].isin(subjects_of_interest)
#     # & (adata_integrate.obs["stem_cd8_cd4_celltypes"] != "nonstem")
# )

# adata_filtered = adata_integrate[mask].copy()
# adata_filtered.write_h5ad("/home/jz3553/CellphoneDB/data/adata_filtered.h5ad")

In [37]:
# adata_filtered.obs[['SampleID_Grade', 'stem_cd8_cd4_celltypes']].head()

Unnamed: 0,SampleID_Grade,stem_cd8_cd4_celltypes
s_016um_00242_00130-1-SLV14,SLV14_Severe,nonstem
s_016um_00230_00364-1-SLV14,SLV14_Severe,nonstem
s_016um_00280_00313-1-SLV14,SLV14_Severe,CD8+ Effector T cells
s_016um_00184_00346-1-SLV14,SLV14_Severe,stem
s_016um_00283_00339-1-SLV14,SLV14_Severe,nonstem


In [39]:
# df = adata_filtered.obs[['stem_cd8_cd4_celltypes']].copy()
# # df.rename(
# #     columns={
# #         'sample': 'barcode_sample', 
# #         'stem_cd8_cd4_celltypes': 'cell_type'
# #     },
# #     inplace=True
# # )
# df.index.name = "spot"
# df.reset_index(inplace=True)
# df.to_csv("/home/jz3553/CellphoneDB/data/spot_to_celltype_map.tsv", sep="\t", index=False)

#### Running Method 1 

In [4]:
cpdb_file_path = '/home/jz3553/CellphoneDB/v5.0.0/cellphonedb.zip'

# input files 
meta_file_path = "/home/jz3553/CellphoneDB/data/spot_to_celltype_map.tsv"
counts_file_path = "/home/jz3553/CellphoneDB/data/adata_filtered.h5ad"

# output path 
out_path = '/home/jz3553/CellphoneDB/results/method1'

#### Input File Formats 
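1. The metadata .tsv file should be formatted as follows, mapping each spot barcode to a cell type.
2. The counts file is the .h5ad file loaded below; its obs index should contain the same barcodes as the metadata file.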

In [5]:
metadata = pd.read_csv(meta_file_path, sep = '\t')
metadata.head(3)

Unnamed: 0,spot,stem_cd8_cd4_celltypes
0,s_016um_00242_00130-1-SLV14,nonstem
1,s_016um_00230_00364-1-SLV14,nonstem
2,s_016um_00280_00313-1-SLV14,CD8+ Effector T cells


In [6]:
import anndata

adata = anndata.read_h5ad(counts_file_path)
adata.shape

(98770, 1133)

In [7]:
# list.sort() sorts in place and returns None, so compare sorted copies instead
sorted(adata.obs.index) == sorted(metadata['spot'])

True
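
When the two barcode lists do not match, a single boolean does not say which side is off. A minimal sketch of a more informative check follows; the function name `compare_barcodes` and the toy barcodes are hypothetical, and in this notebook the inputs would be adata.obs.index and metadata['spot'].

```python
def compare_barcodes(counts_barcodes, meta_barcodes):
    """Report barcodes present on one side only."""
    counts_set, meta_set = set(counts_barcodes), set(meta_barcodes)
    return {
        "missing_from_meta": sorted(counts_set - meta_set),
        "missing_from_counts": sorted(meta_set - counts_set),
    }


# Toy example: one barcode is absent from the metadata.
report = compare_barcodes(
    ["spot_a-1-SLV14", "spot_b-1-SLV14", "spot_c-1-SLV14"],
    ["spot_a-1-SLV14", "spot_b-1-SLV14"],
)
print(report)
```

Two empty lists mean the barcode sets agree; set comparison ignores order and duplicates, so duplicated barcodes would need a separate check.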

In [8]:
from cellphonedb.src.core.methods import cpdb_analysis_method

cpdb_results = cpdb_analysis_method.call(
    cpdb_file_path = cpdb_file_path,           # mandatory: CellphoneDB database zip file.
    meta_file_path = meta_file_path,           # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,       # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
    counts_data = 'hgnc_symbol',               # defines the gene annotation in counts matrix.
    # microenvs_file_path = microenvs_file_path, # optional (default: None): defines cells per microenvironment.
    score_interactions = True,                 # optional: whether to score interactions or not. 
    output_path = out_path,                    # path to save results.
    separator = '|',                           # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    threads = 5,                               # number of threads to use in the analysis.
    threshold = 0.1,                           # defines the min % of cells expressing a gene for this to be employed in the analysis.
    result_precision = 3,                      # Sets the rounding for the mean values in significant_means.
    debug = False,                             # Saves all intermediate tables employed during the analysis in pkl format.
    output_suffix = None                       # Replaces the timestamp in the output files with a user-defined string (default: None).
)

[ ][CORE][21/02/25-18:42:12][INFO] [Non Statistical Method] Threshold:0.1 Precision:3
Reading user files...
The following user files were loaded successfully:
/home/jz3553/CellphoneDB/data/adata_filtered.h5ad
/home/jz3553/CellphoneDB/data/spot_to_celltype_map.tsv
[ ][CORE][21/02/25-18:42:34][INFO] Running Basic Analysis
[ ][CORE][21/02/25-18:42:34][INFO] Building results
[ ][CORE][21/02/25-18:42:34][INFO] Scoring interactions: Filtering genes per cell type..


100%|██████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  4.24it/s]

[ ][CORE][21/02/25-18:42:35][INFO] Scoring interactions: Calculating mean expression of each gene per group/cell type..



100%|██████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 17.92it/s]

[ ][CORE][21/02/25-18:42:36][INFO] Scoring interactions: Calculating scores for all interactions and cell types..



100%|████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 48.67it/s]


Saved means_result to /home/jz3553/CellphoneDB/results/method1/simple_analysis_means_result_02_21_2025_184236.txt
Saved deconvoluted to /home/jz3553/CellphoneDB/results/method1/simple_analysis_deconvoluted_02_21_2025_184236.txt
Saved deconvoluted_percents to /home/jz3553/CellphoneDB/results/method1/simple_analysis_deconvoluted_percents_02_21_2025_184236.txt
Saved interaction_scores to /home/jz3553/CellphoneDB/results/method1/simple_analysis_interaction_scores_02_21_2025_184236.txt


In [9]:
print(cpdb_results.keys())

dict_keys(['means_result', 'deconvoluted', 'deconvoluted_percents', 'interaction_scores'])


In [10]:
cpdb_results['means_result'].head(2)

Unnamed: 0,id_cp_interaction,interacting_pair,partner_a,partner_b,gene_a,gene_b,secreted,receptor_a,receptor_b,annotation_strategy,is_integrin,directionality,classification,CD4+ Effector T cells|CD4+ Effector T cells,CD4+ Effector T cells|CD8+ Effector T cells,CD4+ Effector T cells|nonstem,CD4+ Effector T cells|stem,CD8+ Effector T cells|CD4+ Effector T cells,CD8+ Effector T cells|CD8+ Effector T cells,CD8+ Effector T cells|nonstem,CD8+ Effector T cells|stem,nonstem|CD4+ Effector T cells,nonstem|CD8+ Effector T cells,nonstem|nonstem,nonstem|stem,stem|CD4+ Effector T cells,stem|CD8+ Effector T cells,stem|nonstem,stem|stem
50,CPI-SS00C24ED50,CDH5_CDH5,simple:P33151,simple:P33151,CDH5,CDH5,False,False,False,curated,False,Adhesion-Adhesion,Adhesion by Cadherin,0.009,0.009,0.011,0.009,0.009,0.008,0.011,0.009,0.011,0.011,0.014,0.012,0.009,0.009,0.012,0.009
65,CPI-SS0A2D7895C,CEACAM1_CEACAM6,simple:P13688,simple:P40199,CEACAM1,CEACAM6,True,False,False,curated,False,Adhesion-Adhesion,Adhesion by CEAM,0.195,0.138,0.285,0.113,0.151,0.094,0.241,0.069,0.27,0.212,0.36,0.187,0.134,0.076,0.224,0.052
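
For the CD8+ effector and stem question, one simple first look is to rank interactions by their mean for a single cell-type pair column. The sketch below uses a toy table holding the two rows shown above; `rank_pair` is a hypothetical helper, and since this run is the non-statistical method, the ranking says nothing about significance.

```python
import pandas as pd

# Toy subset of means_result: two interactions, two pair columns (values from the output above).
means = pd.DataFrame(
    {
        "interacting_pair": ["CDH5_CDH5", "CEACAM1_CEACAM6"],
        "CD8+ Effector T cells|stem": [0.009, 0.069],
        "stem|CD8+ Effector T cells": [0.009, 0.076],
    }
)


def rank_pair(means_df: pd.DataFrame, pair_col: str, top_n: int = 10) -> pd.DataFrame:
    """Return the top_n interactions sorted by mean expression for one pair column."""
    return (
        means_df[["interacting_pair", pair_col]]
        .sort_values(pair_col, ascending=False)
        .head(top_n)
        .reset_index(drop=True)
    )


top = rank_pair(means, "CD8+ Effector T cells|stem")
print(top)
```

On the real output, the same call would take cpdb_results['means_result'] in place of the toy table.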


In [11]:
cpdb_results['interaction_scores'].head(2)

Unnamed: 0,id_cp_interaction,interacting_pair,partner_a,partner_b,gene_a,gene_b,secreted,receptor_a,receptor_b,annotation_strategy,is_integrin,directionality,classification,CD4+ Effector T cells|CD4+ Effector T cells,CD4+ Effector T cells|CD8+ Effector T cells,CD4+ Effector T cells|nonstem,CD4+ Effector T cells|stem,CD8+ Effector T cells|CD4+ Effector T cells,CD8+ Effector T cells|CD8+ Effector T cells,CD8+ Effector T cells|nonstem,CD8+ Effector T cells|stem,nonstem|CD4+ Effector T cells,nonstem|CD8+ Effector T cells,nonstem|nonstem,nonstem|stem,stem|CD4+ Effector T cells,stem|CD8+ Effector T cells,stem|nonstem,stem|stem
50,CPI-SS00C24ED50,CDH5_CDH5,simple:P33151,simple:P33151,CDH5,CDH5,False,False,False,curated,False,Adhesion-Adhesion,Adhesion by Cadherin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
65,CPI-SS0A2D7895C,CEACAM1_CEACAM6,simple:P13688,simple:P40199,CEACAM1,CEACAM6,True,False,False,curated,False,Adhesion-Adhesion,Adhesion by CEAM,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0


In [12]:
scores = cpdb_results['interaction_scores']
print(np.unique(scores.filter(like='|').to_numpy()))  # distinct score values across all cell-type pair columns

In [50]:
cpdb_results['deconvoluted'].head(2)

multidata_id,gene_name,uniprot,is_complex,protein_name,complex_name,id_cp_interaction,gene,CD4+ Effector T cells,CD8+ Effector T cells,nonstem,stem
1346,CDH5,P33151,False,CADH5_HUMAN,,CPI-SS00C24ED50,CDH5,0.009,0.008,0.014,0.009
340,CEACAM1,P13688,False,CEAM1_HUMAN,,CPI-SS0A2D7895C,CEACAM1,0.17,0.081,0.318,0.046


In [51]:
cpdb_results['deconvoluted_percents'].head(2)

multidata_id,gene_name,uniprot,is_complex,protein_name,complex_name,id_cp_interaction,gene,CD4+ Effector T cells,CD8+ Effector T cells,nonstem,stem
1346,CDH5,P33151,False,CADH5_HUMAN,,CPI-SS00C24ED50,CDH5,0.009,0.008,0.013,0.009
340,CEACAM1,P13688,False,CEAM1_HUMAN,,CPI-SS0A2D7895C,CEACAM1,0.082,0.042,0.117,0.034
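
The percents above can be read against the threshold = 0.1 setting used in the call. The sketch below assumes that deconvoluted_percents holds the fraction of spots per group expressing each gene and that a gene is kept for a group when that fraction reaches the threshold. This is a reading of the parameter comment, not something confirmed against the CellphoneDB source, and whether the comparison is strict is also an assumption.

```python
import pandas as pd

# Fractions copied from the deconvoluted_percents preview above.
percents = pd.DataFrame(
    {
        "CD4+ Effector T cells": [0.009, 0.082],
        "CD8+ Effector T cells": [0.008, 0.042],
        "nonstem": [0.013, 0.117],
        "stem": [0.009, 0.034],
    },
    index=["CDH5", "CEACAM1"],
)

threshold = 0.1  # same value as in the cpdb_analysis_method.call above

# Assumed rule: a gene counts as expressed in a group when its fraction is >= threshold.
passes = percents >= threshold
groups_passing = {gene: list(passes.columns[row]) for gene, row in passes.iterrows()}
print(groups_passing)
```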
