## Palantir pseudotime workflow

- last updated: 4/1/2024
- author: Yang-Joon Kim

Here, we adapt the example notebook from Palantir (Setty et al., 2019) to compute pseudotime from the RNA modality of our single-cell Multiome data.

- Source: https://github.com/dpeerlab/Palantir/blob/master/notebooks/Palantir_sample_notebook.ipynb

## Introduction - Setty et al.
Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic process where stem cells differentiate to terminally differentiated cells by a series of steps through a low dimensional phenotypic manifold. Palantir effectively captures the continuity in cell states and the stochasticity in cell fate determination.
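Palantir turns this stochastic view into cell-fate probabilities by treating the terminal states as absorbing states of a Markov chain on the cell graph. Below is a minimal NumPy sketch of that absorbing-chain calculation on a hypothetical 4-state toy chain (two transient "progenitor" states, two terminal states). It is not Palantir's code; the helper name `absorption_probabilities` and the transition matrix are illustrative assumptions.

```python
import numpy as np


def absorption_probabilities(P, transient, absorbing):
    """Probability that a walk started in each transient state ends in each absorbing state."""
    P = np.asarray(P, dtype=float)
    Q = P[np.ix_(transient, transient)]  # transient -> transient transitions
    R = P[np.ix_(transient, absorbing)]  # transient -> absorbing transitions
    # B = (I - Q)^-1 R, computed with a linear solve instead of an explicit inverse
    return np.linalg.solve(np.eye(len(transient)) - Q, R)


# Hypothetical toy chain: states 0 and 1 are transient, states 2 and 3 are terminal.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.0, 0.2, 0.6],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
B = absorption_probabilities(P, transient=[0, 1], absorbing=[2, 3])
print(B)
```

Each row of `B` sums to 1 and reads as a fate distribution over the terminal states; here both transient states end up split 1:3 between states 2 and 3, because every path must pass through state 1.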

In [1]:
import palantir
import scanpy as sc
import pandas as pd
import os

from scipy.sparse import issparse

# Plotting
import matplotlib
import matplotlib.pyplot as plt

# warnings
import warnings
from numba.core.errors import NumbaDeprecationWarning

warnings.filterwarnings(action="ignore", category=NumbaDeprecationWarning)
warnings.filterwarnings(
    action="ignore", module="scanpy", message="No data for colormapping"
)

# Inline plotting
%matplotlib inline

In [2]:
# load the data (adata)
adata = sc.read_h5ad("/hpc/projects/data.science/yangjoon.kim/zebrahub_multiome/data/processed_data/01_Signac_processed/integrated_RNA_ATAC_counts_RNA.h5ad")
adata

AnnData object with n_obs × n_vars = 95196 × 32057
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ATAC', 'nFeature_ATAC', 'nucleosome_signal', 'nucleosome_percentile', 'TSS.enrichment', 'TSS.percentile', 'nCount_SCT', 'nFeature_SCT', 'global_annotation', 'prediction.score.Lateral_Mesoderm', 'prediction.score.Neural_Crest', 'prediction.score.Somites', 'prediction.score.Epidermal', 'prediction.score.Neural_Anterior', 'prediction.score.Neural_Posterior', 'prediction.score.Endoderm', 'prediction.score.PSM', 'prediction.score.Differentiating_Neurons', 'prediction.score.Adaxial_Cells', 'prediction.score.NMPs', 'prediction.score.Notochord', 'prediction.score.Muscle', 'prediction.score.unassigned', 'prediction.score.max', 'nCount_peaks_bulk', 'nFeature_peaks_bulk', 'nCount_peaks_celltype', 'nFeature_peaks_celltype', 'nCount_peaks_merged', 'nFeature_peaks_merged', 'SCT.weight', 'peaks_merged.weight', 'nCount_Gene.Activity', 'nFeature_Gene.Activity', 'nCount_peaks_integrated', 'nFe

### NOTES:

- Log-normalized the data with a pseudocount of 0.1 instead of 1.
- Imported the PCs computed by Seurat's rPCA integration.

In [3]:
# re-do the log-normalization
sc.pp.normalize_total(adata, target_sum=1e4)
palantir.preprocess.log_transform(adata)
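To see what the two preprocessing calls do, here is a small NumPy sketch on a toy count matrix. It assumes the shifted form log2(x + c) - log2(c), which keeps zero counts at exactly zero; check `palantir.preprocess.log_transform` for the exact base and formula. The helpers `scale_to_target` and `shifted_log2` are hypothetical names, not library functions.

```python
import numpy as np


def scale_to_target(counts, target_sum=1e4):
    """Scale each cell (row) so its counts sum to target_sum, like a total-count normalization."""
    totals = counts.sum(axis=1, keepdims=True)
    return counts / totals * target_sum


def shifted_log2(x, pseudo_count=0.1):
    """Assumed log transform: log2(x + c) - log2(c), so that 0 maps to 0."""
    return np.log2(x + pseudo_count) - np.log2(pseudo_count)


counts = np.array([[0.0, 1.0, 3.0],
                   [2.0, 2.0, 0.0]])
norm = scale_to_target(counts)
log_c01 = shifted_log2(norm, 0.1)  # pseudocount used in this notebook
log_c1 = shifted_log2(norm, 1.0)   # the more common pseudocount of 1
print(log_c01.round(2))
print(log_c1.round(2))
```

Because log2(x + c) - log2(c) = log2(1 + x / c), a smaller pseudocount stretches all nonzero values upward relative to zero, which increases the separation between detected and undetected genes.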

In [4]:
integrated_pca = pd.read_csv("/hpc/projects/data.science/yangjoon.kim/zebrahub_multiome/data/processed_data/01_Signac_processed/integrated_pca.csv", index_col=0)
integrated_pca

Unnamed: 0,PC_1,PC_2,PC_3,PC_4,PC_5,PC_6,PC_7,PC_8,PC_9,PC_10,...,PC_91,PC_92,PC_93,PC_94,PC_95,PC_96,PC_97,PC_98,PC_99,PC_100
AAACAGCCACCTAAGC-1_1,2.337219,-1.429061,-5.863204,2.233877,7.256290,2.174646,2.644288,-0.268576,1.116962,-1.540614,...,-2.423818,0.746383,2.348913,0.599667,0.268230,0.037639,1.093597,2.292499,-2.180902,-0.609504
AAACAGCCAGGGAGGA-1_1,5.925528,-4.112149,-3.745973,-2.587663,-0.824526,-0.770181,-0.629575,-2.431229,-0.577661,-2.289982,...,-0.596900,-0.580806,-0.151273,-2.579358,0.484555,4.244384,1.265782,1.041933,-0.790714,1.459041
AAACAGCCATAGACCC-1_1,3.573711,6.779139,-1.335192,-7.599583,3.383914,-2.221623,-0.410902,2.574602,2.494892,1.434072,...,-1.369190,-1.072793,0.729661,-2.383013,-2.462715,0.829521,-1.447832,-1.037105,-0.019790,-0.119438
AAACATGCAAACTCAT-1_1,-2.198315,-2.919139,4.460779,-3.449388,0.942443,2.305146,-0.609150,-0.384336,3.204518,-2.120470,...,-2.744978,0.271759,0.368134,0.654299,0.698463,3.075106,-1.206539,-0.504473,-1.483571,0.275104
AAACATGCAAGGACCA-1_1,-6.611312,4.038513,-0.699466,2.057417,-0.530705,-0.074898,0.746187,-0.692077,-0.832365,0.885615,...,-0.152070,0.280815,0.164693,0.946766,-0.844613,0.023811,0.588854,0.407269,0.394212,0.313771
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TTTGTGTTCCCTCAGT-1_7,0.441136,4.210098,-1.878576,-7.065087,2.285191,-0.018840,-0.017662,1.791621,-0.079471,0.505175,...,-1.099142,0.409754,-1.444433,0.163811,-0.333010,1.128468,-0.088681,0.777200,0.695411,-0.442525
TTTGTTGGTACCTTAC-1_7,8.450310,-4.444844,-3.560358,1.049780,-11.561294,1.416145,2.944198,5.288898,-7.119284,0.224083,...,2.083931,-0.448638,-0.363396,1.299509,2.904005,-0.070208,-0.433305,-2.984334,0.827288,0.377326
TTTGTTGGTATTGAGT-1_7,-4.002701,-0.805743,3.497905,0.699492,-0.366578,0.712642,0.655097,-0.752034,0.187395,-2.081523,...,0.383833,0.009061,-0.016094,-0.579264,-0.809333,1.313239,-0.065441,-1.210706,-0.035018,1.377053
TTTGTTGGTGCGCGTA-1_7,-2.591925,2.249332,-3.339347,2.140311,-1.152693,-0.275836,-1.398287,2.606338,0.233157,-1.435872,...,-1.807821,-0.664764,0.581497,-0.440832,1.136641,-0.377381,1.800268,-0.233631,-1.426484,1.090966


In [7]:
# adata.obsm["X_integrated_pca"] = integrated_pca.to_numpy()
# adata


In [5]:
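# reorder rows to match adata.obs_names (raises KeyError if a cell is missing)
adata.obsm["X_pca"] = integrated_pca.loc[adata.obs_names].to_numpy()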
adata

AnnData object with n_obs × n_vars = 95196 × 32057
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ATAC', 'nFeature_ATAC', 'nucleosome_signal', 'nucleosome_percentile', 'TSS.enrichment', 'TSS.percentile', 'nCount_SCT', 'nFeature_SCT', 'global_annotation', 'prediction.score.Lateral_Mesoderm', 'prediction.score.Neural_Crest', 'prediction.score.Somites', 'prediction.score.Epidermal', 'prediction.score.Neural_Anterior', 'prediction.score.Neural_Posterior', 'prediction.score.Endoderm', 'prediction.score.PSM', 'prediction.score.Differentiating_Neurons', 'prediction.score.Adaxial_Cells', 'prediction.score.NMPs', 'prediction.score.Notochord', 'prediction.score.Muscle', 'prediction.score.unassigned', 'prediction.score.max', 'nCount_peaks_bulk', 'nFeature_peaks_bulk', 'nCount_peaks_celltype', 'nFeature_peaks_celltype', 'nCount_peaks_merged', 'nFeature_peaks_merged', 'SCT.weight', 'peaks_merged.weight', 'nCount_Gene.Activity', 'nFeature_Gene.Activity', 'nCount_peaks_integrated', 'nFe
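Writing a plain NumPy array into `adata.obsm` relies on the rows of `integrated_pca` being in the same order as `adata.obs_names`. A hedged sketch of a small check that reorders an embedding by cell barcode and fails loudly when cells are missing; `align_embedding` is a hypothetical helper, not part of Palantir or scanpy.

```python
import numpy as np
import pandas as pd


def align_embedding(emb: pd.DataFrame, obs_names) -> np.ndarray:
    """Return the rows of emb ordered like obs_names; raise if any cell is missing."""
    obs_names = pd.Index(obs_names)
    missing = obs_names.difference(emb.index)
    if len(missing) > 0:
        raise KeyError(f"{len(missing)} cells have no embedding, e.g. {list(missing[:3])}")
    # cells present in emb but not in obs_names are silently dropped here
    return emb.loc[obs_names].to_numpy()


# Toy embedding whose row order differs from the target cell order
emb = pd.DataFrame({"PC_1": [1.0, 2.0, 3.0], "PC_2": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
aligned = align_embedding(emb, ["c", "a", "b"])
print(aligned)
```

In this notebook the call would look like `adata.obsm["X_pca"] = align_embedding(integrated_pca, adata.obs_names)`.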

## Compute the diffusion maps

- Palantir next computes diffusion maps of the data as an estimate of the low-dimensional phenotypic manifold.
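A simplified NumPy sketch of a diffusion map with an adaptive Gaussian kernel on a kNN graph, for intuition only. Assumptions: each cell's bandwidth is its distance to the (knn // 3)-th neighbor, the graph is symmetrized as W + W^T, and no alpha normalization is applied; Palantir's own kernel details may differ, and real data would use a kNN index rather than dense distances.

```python
import numpy as np


def diffusion_map(X, n_components=5, knn=10):
    n = X.shape[0]
    # dense pairwise Euclidean distances (fine for a toy example)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)  # column 0 is the cell itself
    # adaptive bandwidth: distance to the (knn // 3)-th nearest neighbor
    sigma = d[np.arange(n), order[:, max(knn // 3, 1)]]
    # Gaussian affinities restricted to each cell's knn nearest neighbors
    rows = np.repeat(np.arange(n), knn)
    cols = order[:, 1:knn + 1].ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d[rows, cols] ** 2 / sigma[rows] ** 2)
    W = W + W.T  # symmetrize the kNN graph
    deg = W.sum(axis=1)
    # eigendecompose the symmetric conjugate D^-1/2 W D^-1/2 of T = D^-1 W
    S = W / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    top = np.argsort(evals)[::-1][:n_components]
    # map back to right eigenvectors of the Markov matrix T
    return evals[top], evecs[:, top] / np.sqrt(deg)[:, None]


# Toy "trajectory": 60 cells along a line with a little noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 60)
X = np.c_[t, 0.01 * rng.normal(size=60)]
evals, psi = diffusion_map(X, n_components=4, knn=10)
```

The leading eigenvalue is 1 with a constant eigenvector (the trivial component); the next components order the cells along the trajectory.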

In [15]:
# sc.pp.neighbors(adata, n_pcs=30, use_rep="X_integrated_pca")

In [6]:
# Run diffusion maps
dm_res = palantir.utils.run_diffusion_maps(adata, n_components=10)

In [7]:
ms_data = palantir.utils.determine_multiscale_space(adata)
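As described in the Palantir paper, the multiscale space drops the trivial first component and rescales each remaining eigenvector by λ/(1 − λ), which equals the sum of λ^t over all diffusion steps t ≥ 1. A hedged sketch of that rescaling; `multiscale_space` is a hypothetical helper, and the number of components is passed explicitly rather than reproducing Palantir's eigengap-based choice.

```python
import numpy as np


def multiscale_space(evals, evecs, n_eigs):
    """Rescale diffusion components 1..n_eigs-1 by lambda / (1 - lambda)."""
    evals = np.asarray(evals, dtype=float)
    evecs = np.asarray(evecs, dtype=float)
    use = np.arange(1, n_eigs)  # skip component 0 (eigenvalue 1, constant vector)
    lam = evals[use]
    return evecs[:, use] * (lam / (1.0 - lam))


evals = np.array([1.0, 0.9, 0.5, 0.1])
evecs = np.ones((3, 4))  # 3 toy cells, 4 diffusion components
ms = multiscale_space(evals, evecs, n_eigs=3)
print(ms)
```

Slowly decaying components (λ close to 1) are up-weighted most, so the distances in this space emphasize the large-scale structure of the manifold.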

### MAGIC imputation



In [None]:
imputed_X = palantir.utils.run_magic_imputation(adata)
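MAGIC imputes expression by diffusing it over the cell-cell graph: each cell's profile is replaced by a weighted average over its graph neighborhood after t random-walk steps, X_imputed = T^t X. A minimal NumPy sketch of that idea under stated assumptions; `magic_impute`, the toy affinity matrix, and t = 3 are illustrative choices, not Palantir's implementation or defaults.

```python
import numpy as np


def magic_impute(W, X, t=3):
    """Smooth X (cells x genes) with t steps of the random walk defined by affinities W."""
    W = np.asarray(W, dtype=float)
    T = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    return np.linalg.matrix_power(T, t) @ X


# Two cells that are each other's neighbors, with complementary dropouts
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
X = np.array([[0.0, 2.0],
              [4.0, 0.0]])
X_imp = magic_impute(W, X, t=3)
print(X_imp)
```

Because every row of T sums to 1, a gene that is constant across cells stays unchanged, while zeros in one cell are filled in from its neighbors.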

In [12]:
help(palantir.utils.run_diffusion_maps)

Help on function run_diffusion_maps in module palantir.utils:

run_diffusion_maps(data_df, n_components=10, knn=30, alpha=0)
    Run Diffusion maps using the adaptive anisotropic kernel
    
    :param data_df: PCA projections of the data or adjacency matrix
    :param n_components: Number of diffusion components
    :param knn: Number of nearest neighbors for graph construction
    :param alpha: Normalization parameter for the diffusion operator
    :return: Diffusion components, corresponding eigen values and the diffusion operator
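The `alpha` parameter above is described only as a normalization parameter for the diffusion operator. Assuming it follows the standard Coifman–Lafon density normalization, W_alpha = D^-alpha W D^-alpha followed by row normalization, the sketch below shows its effect: alpha = 0 gives the plain random walk, while alpha = 1 removes the influence of sampling density. `markov_operator` is a hypothetical helper.

```python
import numpy as np


def markov_operator(W, alpha=0.0):
    """Density-normalize affinities W by degree**alpha, then row-normalize."""
    W = np.asarray(W, dtype=float)
    q = W.sum(axis=1)  # degree (density estimate) of each cell
    W_alpha = W / np.outer(q ** alpha, q ** alpha)
    return W_alpha / W_alpha.sum(axis=1, keepdims=True)


W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
T0 = markov_operator(W, alpha=0.0)  # plain random walk
T1 = markov_operator(W, alpha=1.0)  # density-corrected walk
```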

