In this tutorial, we use single-cell data from pancreatic ductal adenocarcinomas (PDAC) to find subgroups of patients by clustering the Wasserstein distances between samples. We then rank the cells and genes based on the clustering results.
- You can download the Anndata (h5ad) file for this tutorial from here and place it in the Datasets folder.
import PILOT as pl
import scanpy as sc
adata = sc.read_h5ad('Datasets/PDAC.h5ad')
Use the following parameters to configure PILOT for your analysis (Setting Parameters):
- adata: Pass your loaded Anndata object to PILOT.
- emb_matrix: Provide the name of the variable in the obsm level that holds the dimension reduction (PCA representation).
- clusters_col: Specify the name of the column in the observation level of your Anndata that corresponds to cell types or clusters.
- sample_col: Indicate the column name in the observation level of your Anndata that contains information about samples or patients.
- status: Provide the column name that represents the status or disease (e.g., "control" or "case").
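Before running PILOT, it can help to confirm that the names you plan to pass actually exist in your Anndata object. The helper below is a minimal sketch and not part of PILOT: `missing_pilot_inputs` is a hypothetical name, and it only inspects plain collections of keys, so it should work with `adata.obs.columns` and `adata.obsm.keys()`.

```python
def missing_pilot_inputs(obs_columns, obsm_keys, emb_matrix,
                         clusters_col, sample_col, status):
    """Return the requested names that are absent.

    Hypothetical helper for illustration; not a PILOT function.
    """
    obs_columns = set(obs_columns)
    obsm_keys = set(obsm_keys)
    missing = []
    # The embedding must be stored under adata.obsm
    if emb_matrix not in obsm_keys:
        missing.append(f"obsm['{emb_matrix}']")
    # The remaining names must be columns of adata.obs
    for col in (clusters_col, sample_col, status):
        if col not in obs_columns:
            missing.append(f"obs['{col}']")
    return missing


# Toy stand-ins for adata.obs.columns and adata.obsm.keys()
example_obs_columns = ["cell_types", "sampleID"]
example_obsm_keys = ["X_pca", "X_umap"]

example_missing = missing_pilot_inputs(
    example_obs_columns,
    example_obsm_keys,
    emb_matrix="X_pca",
    clusters_col="cell_types",
    sample_col="sampleID",
    status="status",
)
print(example_missing)  # ["obs['status']"]
```

With a real object, you would call `missing_pilot_inputs(adata.obs.columns, adata.obsm.keys(), 'X_pca', 'cell_types', 'sampleID', 'status')` and expect an empty list.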
pl.tl.wasserstein_distance(
adata,
emb_matrix='X_pca',
clusters_col='cell_types',
sample_col='sampleID',
status='status'
)
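To give some intuition for a Wasserstein distance between samples, the sketch below solves a small optimal transport problem between two samples, each represented as a vector of cell-type proportions. It is an illustration under assumptions, not PILOT's implementation: the proportions and the cost matrix are invented toy values, `emd` is a hypothetical helper, and the linear program is solved exactly with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def emd(p, q, cost):
    """Exact optimal transport cost between histograms p and q.

    Hypothetical helper for illustration; not a PILOT function.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape
    # The transport plan T (n x m) is flattened row by row.
    # Row sums: sum_j T[i, j] = p[i]
    row_constraints = np.kron(np.eye(n), np.ones((1, m)))
    # Column sums: sum_i T[i, j] = q[j]
    col_constraints = np.kron(np.ones((1, n)), np.eye(m))
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_constraints, col_constraints]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun


# Toy data (invented): proportions of three cell types in two samples
sample_a = np.array([0.6, 0.3, 0.1])
sample_b = np.array([0.2, 0.3, 0.5])

# Toy ground cost between cell types (symmetric, zero diagonal)
cost = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
])

distance_ab = emd(sample_a, sample_b, cost)
print(round(distance_ab, 3))
```

Computing this value for every pair of samples gives a sample-by-sample distance matrix, which is the kind of object that can then be clustered.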
pl.pl.heatmaps(adata)
pl.pl.trajectory(adata, colors = ['red','Blue'])
pl.pl.select_best_sil(adata)
proportion_df=pl.pl.clustering_emd(adata, res = adata.uns['best_res'])
# Give the predicted cluster labels descriptive group names
proportion_df.loc[proportion_df['Predicted_Labels'] == '0', 'Predicted_Labels'] = 'Tumor 1'
proportion_df.loc[proportion_df['Predicted_Labels'] == '1', 'Predicted_Labels'] = 'Normal'
proportion_df.loc[proportion_df['Predicted_Labels'] == '2', 'Predicted_Labels'] = 'Tumor 2'
The adjusted p-value threshold you choose (pval_thr) controls how statistically significant a cell type's difference between the two patient subgroups must be for it to be reported.
pl.pl.cell_type_diff_two_sub_patient_groups(
proportion_df,
proportion_df.columns[0:-2],
group1 = 'Tumor 2',
group2 = 'Tumor 1',
pval_thr = 0.05,
figsize = (15, 4)
)
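The `pval_thr` cutoff above is compared against adjusted p-values. This tutorial does not state which correction is applied, so the sketch below shows one common choice, the Benjamini-Hochberg procedure, purely as an illustration. `benjamini_hochberg` is a hypothetical helper and the p-values are invented.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (illustrative helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Invented raw p-values, one per cell type
raw_pvals = [0.01, 0.04, 0.03, 0.005]
adj_pvals = benjamini_hochberg(raw_pvals)

# Keep only the entries that pass the chosen threshold
pval_thr = 0.05
significant = adj_pvals < pval_thr
print(adj_pvals, significant)
```

A stricter threshold such as 0.01 would report fewer cell types as significant, at the cost of possibly missing weaker differences.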
pl.pl.plot_cell_types_distributions(
proportion_df,
cell_types = ['Stellate cell','Ductal cell type 2','Ductal cell type 1'],
figsize = (17,8),
label_order = ['Normal', 'Tumor 1', 'Tumor 2'],
label_colors = ['#FF7F00','#BCD2EE','#1874CD']
)
pl.pl.plot_cell_types_distributions(
proportion_df,
cell_types = list(proportion_df.columns[0:-2]),
figsize = (17,8),
label_order = ['Normal', 'Tumor 1', 'Tumor 2'],
label_colors = ['#FF7F00','#BCD2EE','#1874CD']
)
#pl.tl.install_r_packages()
- You can run this function on your own data; it automatically extracts the gene expression for each cell type (2000 highly variable genes by default). If you want all or more genes, change the "highly_variable_genes_" or "n_top_genes" parameters.
cell_type = "Ductal cell type 1"
pl.tl.compute_diff_expressions(
adata,
cell_type,
proportion_df,
fc_thr = 0.5,
pval_thr = 0.05,
group1 = 'Tumor 1',
group2 = 'Tumor 2',
sample_col = 'sampleID',
col_cell = 'cell_types',
)
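The `fc_thr` and `pval_thr` arguments act as cutoffs on the differential expression results. As a rough illustration of that kind of filtering (not PILOT's internal code), the sketch below keeps genes whose absolute log fold change and adjusted p-value pass both thresholds. The table, its column names (`gene`, `log_fc`, `adj_pval`), the gene names, and the helper `filter_de_genes` are all assumptions made for this example.

```python
import pandas as pd


def filter_de_genes(de_table, fc_thr, pval_thr):
    """Keep rows passing both thresholds (illustrative helper)."""
    passes_fc = de_table["log_fc"].abs() >= fc_thr
    passes_pval = de_table["adj_pval"] < pval_thr
    return de_table[passes_fc & passes_pval].reset_index(drop=True)


# Invented differential expression results
toy_de = pd.DataFrame({
    "gene": ["geneA", "geneB", "geneC", "geneD"],
    "log_fc": [1.2, -0.8, 0.3, -0.1],
    "adj_pval": [0.001, 0.03, 0.01, 0.5],
})

selected = filter_de_genes(toy_de, fc_thr=0.5, pval_thr=0.05)
print(selected["gene"].tolist())  # ['geneA', 'geneB']
```

Lowering `fc_thr` lets genes with smaller effect sizes through, which may explain why a different value is chosen for each cell type.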
pl.pl.gene_annotation_cell_type_subgroup(
cell_type = cell_type,
group = 'Tumor 1',
num_gos = 10
)
pl.pl.gene_annotation_cell_type_subgroup(
cell_type = cell_type,
group = 'Tumor 2',
num_gos = 10
)
cell_type = "Stellate cell"
pl.tl.compute_diff_expressions(
adata,
cell_type,
proportion_df,
fc_thr = 0.1,
pval_thr = 0.05,
group1 = 'Tumor 1',
group2 = 'Tumor 2',
sample_col = 'sampleID',
col_cell = 'cell_types'
)
pl.pl.gene_annotation_cell_type_subgroup(
cell_type = cell_type,
group = 'Tumor 1',
num_gos = 10
)
pl.pl.gene_annotation_cell_type_subgroup(
cell_type = cell_type,
group = 'Tumor 2',
num_gos = 10
)
cell_type = "Ductal cell type 2"
pl.tl.compute_diff_expressions(
adata,
cell_type,
proportion_df,
fc_thr = 0.020,
pval_thr = 0.05,
group1 = 'Tumor 1',
group2 = 'Tumor 2',
sample_col = 'sampleID',
col_cell = 'cell_types',
)
pl.pl.gene_annotation_cell_type_subgroup(
cell_type = cell_type,
group = 'Tumor 1',
num_gos = 10
)
pl.pl.gene_annotation_cell_type_subgroup(
cell_type = cell_type,
group = 'Tumor 2',
num_gos = 10
)
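Note: the pseudobulk differential expression step relies on R packages, which you can install with conda: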
- conda install conda-forge::r-tidyverse
- conda install bioconda::bioconductor-singlecellexperiment
- conda install bioconda::bioconductor-deseq2
pl.pl.get_pseudobulk_DE(adata, proportion_df, cell_type = 'Stellate cell', fc_thr = [3.0, 4.0, 0.5])
pl.pl.get_pseudobulk_DE(adata, proportion_df, cell_type = 'Ductal cell type 1', fc_thr = [5.0, 4.0, 0.5])
pl.pl.get_pseudobulk_DE(adata, proportion_df, cell_type = 'Ductal cell type 2', fc_thr = [2.0, 2.0, 0.5])
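Pseudobulk analysis generally means summing single-cell counts within each sample before testing for differential expression. The sketch below illustrates only that aggregation step with pandas on invented counts. It is not what `get_pseudobulk_DE` runs internally, and the gene names, sample names, and the helper `pseudobulk_counts` are hypothetical.

```python
import pandas as pd


def pseudobulk_counts(counts, sample_ids):
    """Sum per-cell counts (rows = cells, columns = genes) per sample."""
    return counts.groupby(sample_ids).sum()


# Invented raw counts for four cells and two genes
toy_counts = pd.DataFrame(
    {"geneA": [1, 0, 3, 2], "geneB": [0, 5, 1, 1]},
    index=["cell1", "cell2", "cell3", "cell4"],
)

# Invented sample assignment for each cell
toy_samples = pd.Series(
    ["P1", "P1", "P2", "P2"], index=toy_counts.index, name="sampleID"
)

bulk = pseudobulk_counts(toy_counts, toy_samples)
print(bulk)
```

The resulting table has one row per sample, which is the shape that bulk differential expression tools such as DESeq2 expect after transposing to genes by samples.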