# Pseudo-Time Analysis Pipeline

## Introduction
This notebook implements a pipeline for analyzing pseudo-time in cellular graph datasets. The pipeline includes the following steps:

1. **Dataset Initialization**
2. **Subgraph Sampling**
3. **Embedding Preparation**
4. **Pseudo-Time Analysis**
5. **Visualization and Output**

All settings are controlled through the `Config` class, defined in the Configuration section.

In [None]:
from adapters.space_gm_adapter import CustomSubgraphSampler
from spacegm import CellularGraphDataset



## Configuration
Modify the parameters in the `Config` class to customize the pipeline for your dataset and analysis needs. The paths are absolute and specific to the original environment, so update them before running.

In [None]:
class Config:
    def __init__(self):
        # Paths
        self.data_root = "/root/autodl-tmp/Data/Space-Gm/Processed_Dataset/UPMC"
        self.output_dir = "/root/TIC/data/embedding_analysis/pseudotime_analysis/tumor_test"
        self.model_path = "/root/autodl-tmp/Data/Space-Gm/Processed_Dataset/UPMC/model/graph_level/GIN-primary_outcome-0/model_save_6.pt"
        self.device = 'cuda:0'  # set to 'cpu' if no GPU is available

        # Dataset parameters
        self.dataset_kwargs = {
            'raw_folder_name': 'graph',
            'processed_folder_name': 'tg_graph',
            'node_features': ["cell_type", "SIZE", "biomarker_expression", "neighborhood_composition", "center_coord"],
            'edge_features': ["edge_type", "distance"],
            'cell_type_mapping': None,
            'cell_type_freq': None,
            'biomarkers': ["ASMA", "PANCK", "VIMENTIN", "PODOPLANIN"],
            'subgraph_size': 3,
            'subgraph_source': 'chunk_save',
            'subgraph_allow_distant_edge': True,
            'subgraph_radius_limit': 55 * 3 + 35,
            'biomarker_expression_process_method': "linear",
            'biomarker_expression_lower_bound': 0,
            'biomarker_expression_upper_bound': 18,
            'neighborhood_size': 10,
        }

        # Sampler parameters
        self.sampler_kwargs = {
            'total_samples': 1000,
            'cell_type': 9,
            'region_id': None,
            'batch_size': 64,
            'num_workers': 8,
            'include_node_info': True,
            'random_seed': 42,
        }

        # Pseudo-time analysis parameters
        self.embedding_keys = [
            "expression_vectors",
            "composition_vectors",
            "node_embeddings",
            "graph_embeddings",
            "composition_vectors+expression_vectors",
            "node_embeddings+expression_vectors",
            "graph_embeddings+composition_vectors"
        ]
        self.start_nodes = [0, 1]
        # Keep in sync with dataset_kwargs['biomarkers'] above.
        self.biomarkers = ["ASMA", "PANCK", "VIMENTIN", "PODOPLANIN"]
        self.show_plots = True
        self.num_bins = 100
        self.use_bins = True
        self.plotting_transform = 'smooth'

## Step 1: Dataset Initialization
Load and preprocess the cellular graph dataset.

In [None]:
def initialize_dataset(root_path, dataset_kwargs):
    return CellularGraphDataset(root_path, **dataset_kwargs)

config = Config()
dataset = initialize_dataset(config.data_root, config.dataset_kwargs)
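
Before constructing the dataset, it can help to confirm that the expected folders exist. The sketch below assumes that `raw_folder_name` and `processed_folder_name` name subdirectories of `data_root`; `missing_dataset_dirs` is a hypothetical helper and not part of `spacegm`.

```python
import os
import tempfile


def missing_dataset_dirs(root_path, dataset_kwargs):
    """Return expected dataset subdirectories that do not exist under root_path."""
    keys = ("raw_folder_name", "processed_folder_name")
    expected = [
        os.path.join(root_path, dataset_kwargs[k]) for k in keys if k in dataset_kwargs
    ]
    return [path for path in expected if not os.path.isdir(path)]


# Demonstration on a temporary directory that only contains the raw folder.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "graph"))
    demo_missing = missing_dataset_dirs(
        demo_root,
        {"raw_folder_name": "graph", "processed_folder_name": "tg_graph"},
    )
    demo_missing_names = [os.path.basename(p) for p in demo_missing]
print(demo_missing_names)
```

A missing processed folder may not be an error if the dataset class builds it on first use, so treat the result as a hint rather than a hard failure.
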

## Step 2: Subgraph Sampling
Sample subgraphs that match the conditions in `config.sampler_kwargs`, such as cell type or region.

In [None]:
def initialize_sampler(dataset, sampler_kwargs):
    return CustomSubgraphSampler(dataset, **sampler_kwargs)

sampler = initialize_sampler(dataset, config.sampler_kwargs)
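
The internals of `CustomSubgraphSampler` are not shown in this notebook. As a rough illustration of what filtering by `cell_type` and `region_id` with a fixed `random_seed` could look like, the sketch below draws a reproducible sample from a toy node table. The function name and the node fields are assumptions made for this example only.

```python
import random


def sample_center_nodes(nodes, total_samples, cell_type=None, region_id=None, random_seed=42):
    """Pick up to total_samples nodes that match the filters, reproducibly."""
    candidates = [
        n for n in nodes
        if (cell_type is None or n["cell_type"] == cell_type)
        and (region_id is None or n["region_id"] == region_id)
    ]
    rng = random.Random(random_seed)  # local RNG: global random state is untouched
    k = min(total_samples, len(candidates))
    return rng.sample(candidates, k)


# Toy node table (hypothetical fields, for illustration only).
toy_nodes = [
    {"region_id": f"R{i % 2}", "cell_id": i, "cell_type": 9 if i % 3 == 0 else 1}
    for i in range(30)
]
picked = sample_center_nodes(toy_nodes, total_samples=5, cell_type=9)
picked_again = sample_center_nodes(toy_nodes, total_samples=5, cell_type=9)
print([n["cell_id"] for n in picked])
```

Using a local `random.Random` instance keeps the draw repeatable for a given seed without affecting other code that relies on the global generator.
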

## Step 3: Embedding Preparation
Prepare and concatenate embeddings for the sampled subgraphs.

In [None]:
from scripts.pseudotime_analysis import prepare_embeddings

sampler = prepare_embeddings(dataset, sampler, config.model_path, config.device, config.embedding_keys)
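
Several entries in `config.embedding_keys` join two base names with `+`. Assuming such a key means "concatenate the two vectors for each sample", the idea can be sketched as follows. `build_embedding` and the toy store are hypothetical and do not reflect how `prepare_embeddings` is actually implemented.

```python
from itertools import chain


def build_embedding(store, key):
    """Return per-sample vectors for key; 'a+b' concatenates a and b per sample."""
    parts = key.split("+")
    missing = [p for p in parts if p not in store]
    if missing:
        raise KeyError(f"Missing base embeddings: {missing}")
    lengths = {len(store[p]) for p in parts}
    if len(lengths) != 1:
        raise ValueError("All base embeddings must cover the same samples")
    # zip(*...) groups the rows belonging to one sample; chain flattens them.
    return [list(chain.from_iterable(rows)) for rows in zip(*(store[p] for p in parts))]


# Toy store with two samples (hypothetical values).
toy_store = {
    "expression_vectors": [[0.1, 0.2], [0.3, 0.4]],
    "composition_vectors": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
}
combined = build_embedding(toy_store, "composition_vectors+expression_vectors")
print(combined[0])
```
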

## Step 4: Pseudo-Time Analysis
Perform dimensionality reduction, clustering, and compute pseudo-time for selected start nodes.

In [None]:
from scripts.pseudotime_analysis import perform_pseudo_time_analysis_pipeline

perform_pseudo_time_analysis_pipeline(config, sampler)
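
The pipeline function hides how pseudo-time is computed. One simple way to think about it, which may differ from what `perform_pseudo_time_analysis_pipeline` does, is as the shortest-path distance from a start point over a nearest-neighbor graph built on the embeddings. The sketch below implements that idea with Dijkstra's algorithm. Here the start is a sample index; in this notebook `start_nodes` may refer to something else, such as cluster IDs.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph weighted by Euclidean distance."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def pseudotime_from(graph, start):
    """Shortest-path distance from start (Dijkstra), rescaled to [0, 1]."""
    dist = {node: math.inf for node in graph}
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    finite = [d for d in dist.values() if math.isfinite(d)]
    scale = max(finite) or 1.0  # avoid dividing by zero for a single node
    return {node: d / scale for node, d in dist.items()}


# Four points on a line: pseudo-time should grow with distance from point 0.
toy_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
toy_pseudotime = pseudotime_from(knn_graph(toy_points, k=1), start=0)
print(toy_pseudotime)
```

Nodes that cannot be reached from the start keep an infinite value, which makes disconnected components easy to spot.
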

## Step 5: Visualization and Output
Visualize biomarker trends across pseudo-time and save results.
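
No code cell is given for this step. As a minimal sketch of how a biomarker trend could be summarized before plotting, the example below averages values in equal-width pseudo-time bins (in the spirit of `num_bins` and `use_bins`) and then applies a moving average. Both helpers are hypothetical, and the moving average is only one possible smoothing; it may differ from what `plotting_transform = 'smooth'` does in the pipeline.

```python
def binned_trend(pseudotime, values, num_bins=100):
    """Average values within equal-width pseudo-time bins; empty bins give None."""
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all times are equal
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for t, v in zip(pseudotime, values):
        idx = min(int((t - lo) / width), num_bins - 1)  # clamp t == hi into last bin
        sums[idx] += v
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def moving_average(series, window=3):
    """Centered moving average that skips None entries."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = [x for x in series[max(0, i - half): i + half + 1] if x is not None]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out


# Toy data (hypothetical values for a single biomarker).
demo_pt = [0.0, 0.1, 0.4, 0.6, 0.9, 1.0]
demo_asma = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
trend = binned_trend(demo_pt, demo_asma, num_bins=2)
smoothed = moving_average(trend, window=3)
print(trend, smoothed)
```
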