In [1]:
# Note: this notebook is adapted from https://github.com/jranek/delve_benchmark/blob/main/notebooks/RPE_notebook.ipynb
%matplotlib inline

In [2]:
# Uncomment and run this cell if you're on Colab or Kaggle
# !pip install schub phate statannotations

# Delve Feature Selection for inferring RPE cell cycle trajectories

This tutorial shows how to use DELVE {cite}`delve2024` to perform feature selection for **inferring RPE cell cycle trajectories**.

To run this notebook, first download the `RPE` dataset from this <a href="https://github.com/jranek/delve/blob/main/data/adata_RPE.h5ad">link</a> and place the downloaded file in an appropriate directory so that the script below can read it successfully.

In [3]:
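import os
import os.path as osp
from pathlib import Path

import pandas as pd
import scanpy as sc
import schub

# `__file__` is not defined inside a notebook, so osp.abspath("__file__") simply resolves against
# the current working directory; adata_directory is therefore ../../_data relative to it.
# Change this to the directory where you saved adata_RPE.h5ad.
adata_directory = osp.join(osp.dirname(osp.abspath("__file__")), "../../_data")
adata_path = osp.join(Path(adata_directory).resolve(), "adata_RPE.h5ad")
adata = sc.read_h5ad(adata_path)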

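Because the data path usually has to be adjusted by hand, a small search over candidate folders can make the loading cell less brittle. The sketch below is only an illustration: `find_h5ad` is a hypothetical helper (not part of scanpy or schub), and the candidate folders are assumptions you should replace with your own.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_h5ad(filename: str, candidate_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first existing `<dir>/<filename>` among candidate_dirs, or None.

    Hypothetical helper for illustration only; it is not part of scanpy or schub.
    """
    for directory in candidate_dirs:
        # expanduser() handles "~", resolve() makes relative paths such as ../../_data absolute
        path = (Path(directory).expanduser() / filename).resolve()
        if path.is_file():
            return path
    return None


# Search the working directory first, then the tutorial's default data folder (assumed layout)
path = find_h5ad("adata_RPE.h5ad", [".", "../../_data"])
if path is None:
    print("adata_RPE.h5ad not found; download it and add its folder to the candidate list")
else:
    print(f"found dataset at {path}")
```

When a path is found, it can be passed to `sc.read_h5ad(path)` in the same way as in the loading cell.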


## Perform DELVE feature selection

In [4]:
n_selected = 30  # number of top-ranked features to keep
knn = 10  # number of nearest neighbors used to build the kNN graph
trial = 0

# Run DELVE on the feature matrix in adata.X; results are stored in adata.var and adata.uns["delve"]
schub.pp.delve(adata, knn=knn, use_rep="X", n_clusters=5, num_subsamples=1000, random_state=10)
# retrieve the `delta_mean` result stored by DELVE
delta_mean = adata.uns["delve"]["delta_mean"]

performing subsampling: 100%|█████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.75s/it]
clustering features and performing feature-wise permutation testing: 100%|██████████████| 10/10 [00:18<00:00,  1.82s/it]
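The second progress bar refers to a feature-wise permutation test. The exact test statistic used by schub is not shown in this notebook, so the sketch below only illustrates the general technique under our own assumptions: cells are assigned to a fixed set of hypothetical neighborhoods, the statistic for a feature is the variance of its per-neighborhood means, and the null distribution is built by shuffling the feature's values across cells. `between_group_variance` and `permutation_pvalue` are hypothetical helpers, not schub functions.

```python
import random
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def between_group_variance(values: Sequence[float], groups: Sequence[int]) -> float:
    """Test statistic (assumed): variance of the per-neighborhood means of one feature."""
    by_group: Dict[int, List[float]] = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return pvariance([mean(vs) for vs in by_group.values()])


def permutation_pvalue(values: Sequence[float], groups: Sequence[int],
                       n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided p-value: fraction of shuffles at least as extreme as the observed data."""
    rng = random.Random(seed)
    observed = between_group_variance(values, groups)
    shuffled = list(values)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between the feature and the neighborhoods
        if between_group_variance(shuffled, groups) >= observed:
            n_extreme += 1
    # add-one correction counts the observed data as one permutation, so p is never 0
    return (n_extreme + 1) / (n_perm + 1)


# Toy data: 30 cells in 3 hypothetical neighborhoods of 10 cells each
groups = [g for g in range(3) for _ in range(10)]
dynamic = [10.0 * g + 0.1 * i for g in range(3) for i in range(10)]  # shifts between neighborhoods
static = [float(i) for _ in range(3) for i in range(10)]  # identical values in every neighborhood

print(permutation_pvalue(dynamic, groups))  # small, about 1 / 1001
print(permutation_pvalue(static, groups))   # 1.0: every shuffle is at least as extreme
```

A feature whose values track the neighborhood structure gets a small p-value, while a feature that looks the same everywhere gets a p-value near 1, which is the intuition behind labeling features as dynamic or static.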


In [5]:
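# adata.var now holds the DELVE results for each feature:
#   `delve`: the Laplacian score (the smaller the value, the better the feature)
#   `delve_cluster_id`: the module label assigned to the feature (e.g. `static`)
#   `delve_cluster_permutation_pval`: the p-value from DELVE's feature-wise permutation test
adata.var.head()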

| feature | delve | delve_cluster_id | delve_cluster_permutation_pval |
|---|---|---|---|
| Int_MeanEdge_AKT_cell | 0.850076 | static | 0.807692 |
| Int_MeanEdge_BP1_cell | 0.937886 | static | 0.999001 |
| Int_MeanEdge_Bcl2_cell | 0.93774 | static | 0.999001 |
| Int_MeanEdge_CDK2_cell | 0.709052 | static | 0.717283 |
| Int_MeanEdge_CDK4_cell | 0.855912 | static | 0.999001 |
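To make "smaller is better" concrete, here is a minimal sketch of the classic Laplacian score on a graph with weight matrix $W$, degree matrix $D$ and Laplacian $L = D - W$: for a feature $f$ with its degree-weighted mean removed ($\tilde{f}$), the score is $\tilde{f}^\top L \tilde{f} / \tilde{f}^\top D \tilde{f}$. The sketch assumes a symmetric 0/1 kNN graph built from toy coordinates; schub builds its graph from DELVE's selected dynamic features and may weight edges differently, so these numbers will not match `adata.var['delve']`. `knn_graph` and `laplacian_score` are hypothetical helpers.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_graph(points: Sequence[Point], k: int) -> List[List[float]]:
    """Symmetric 0/1 adjacency: i and j are linked if either is among the other's k nearest."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            weights[i][j] = weights[j][i] = 1.0
    return weights


def laplacian_score(feature: Sequence[float], weights: List[List[float]]) -> float:
    """Classic Laplacian score; smaller means the feature varies smoothly over the graph."""
    n = len(feature)
    degrees = [sum(row) for row in weights]
    # subtract the degree-weighted mean so constant shifts do not affect the score
    mu = sum(d * f for d, f in zip(degrees, feature)) / sum(degrees)
    centered = [f - mu for f in feature]
    # f^T L f with L = D - W equals 1/2 * sum_ij W_ij (f_i - f_j)^2
    smoothness = 0.5 * sum(weights[i][j] * (centered[i] - centered[j]) ** 2
                           for i in range(n) for j in range(n))
    spread = sum(d * f ** 2 for d, f in zip(degrees, centered))  # f^T D f
    return smoothness / spread if spread > 0 else math.inf  # constant feature: worst score


# Toy trajectory: 20 cells ordered along a 1-D pseudotime axis
cells = [(float(t),) for t in range(20)]
graph = knn_graph(cells, k=2)
smooth = [float(t) for t in range(20)]     # changes gradually along the trajectory
jumpy = [float(t % 2) for t in range(20)]  # flips between neighboring cells

print(laplacian_score(smooth, graph) < laplacian_score(jumpy, graph))  # True
```

A feature that changes gradually between neighboring cells, as a cell cycle marker does along the trajectory, receives a low score, whereas a feature that disagrees between neighbors receives a high one.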


## Selected cell cycle-related features

In [6]:
# select the 30 features with the smallest Laplacian scores (the most important ones)
selected_genes = adata.var['delve'].nsmallest(n_selected).index.tolist()
print(selected_genes)

['Int_Med_cycA_nuc', 'Int_Med_cycB1_cyto', 'Int_Med_cycB1_ring', 'Int_Med_cycB1_cell', 'Int_Med_Skp2_nuc', 'Int_Med_pRB_nuc', 'Int_Std_PCNA_nuc', 'Int_Intg_DNA_nuc', 'Int_MeanEdge_cycB1_cell', 'Int_Med_CDK2_nuc', 'Int_Med_pH2AX_nuc', 'Int_Med_cycB1_nuc', 'Int_Med_p21_nuc', 'Int_Med_E2F1_nuc', 'Int_Med_cycA_cyto', 'Int_Med_cycA_ring', 'Int_Med_RB_nuc', 'AreaShape_Area_nuc', 'Int_Med_pp65_nuc', 'Int_Med_cycA_cell', 'Int_Med_cycD1_nuc', 'Int_Med_GSK3b_nuc', 'Int_Med_pp53_nuc', 'Int_Med_Cdh1_nuc', 'Int_Med_Cdt1_nuc', 'Int_Med_pCHK1_nuc', 'Int_Med_p38_nuc', 'Int_Med_p27_nuc', 'Int_Med_cMyc_nuc', 'Int_Med_pp38_nuc']


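In the list above, nearly all of the top-ranked features are well-known cell cycle markers: cyclins A, B1 and D1, PCNA, CDK2, RB and phospho-RB, E2F1, Cdt1, Cdh1, Skp2, p21 and p27, along with DNA content (`Int_Intg_DNA_nuc`) and nuclear area (`AreaShape_Area_nuc`).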
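To see which markers dominate the selection, the feature names can be grouped by marker. This sketch assumes the naming convention `Int_<statistic>_<marker>_<compartment>` visible in the list above (e.g. `Int_Med_cycA_nuc`); `marker_of` is a hypothetical helper, and features that do not follow the pattern, such as `AreaShape_Area_nuc`, are skipped.

```python
from collections import Counter
from typing import Optional

# Copied from the printed output above; inside the notebook you can reuse `selected_genes` directly
selected_genes = ['Int_Med_cycA_nuc', 'Int_Med_cycB1_cyto', 'Int_Med_cycB1_ring', 'Int_Med_cycB1_cell',
                  'Int_Med_Skp2_nuc', 'Int_Med_pRB_nuc', 'Int_Std_PCNA_nuc', 'Int_Intg_DNA_nuc',
                  'Int_MeanEdge_cycB1_cell', 'Int_Med_CDK2_nuc', 'Int_Med_pH2AX_nuc', 'Int_Med_cycB1_nuc',
                  'Int_Med_p21_nuc', 'Int_Med_E2F1_nuc', 'Int_Med_cycA_cyto', 'Int_Med_cycA_ring',
                  'Int_Med_RB_nuc', 'AreaShape_Area_nuc', 'Int_Med_pp65_nuc', 'Int_Med_cycA_cell',
                  'Int_Med_cycD1_nuc', 'Int_Med_GSK3b_nuc', 'Int_Med_pp53_nuc', 'Int_Med_Cdh1_nuc',
                  'Int_Med_Cdt1_nuc', 'Int_Med_pCHK1_nuc', 'Int_Med_p38_nuc', 'Int_Med_p27_nuc',
                  'Int_Med_cMyc_nuc', 'Int_Med_pp38_nuc']


def marker_of(feature: str) -> Optional[str]:
    """Hypothetical parser assuming names of the form Int_<statistic>_<marker>_<compartment>."""
    parts = feature.split("_")
    if parts[0] == "Int" and len(parts) == 4:
        return parts[2]
    return None  # e.g. AreaShape_Area_nuc is a morphology feature, not a marker intensity


counts = Counter(m for m in map(marker_of, selected_genes) if m is not None)
print(counts.most_common(2))  # [('cycB1', 5), ('cycA', 4)]
print(len(counts))            # 22 distinct markers among the 29 intensity features
```

Cyclin B1 and cyclin A together account for nine of the 30 selected features, each measured in several cellular compartments.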