# 1. Postprocessing

In [21]:
import sys
import os
import importlib
import pickle
import veloev as vev

# You should modify the data path to your local data folder.
os.chdir("/mnt/e/project/Benchmark_velocity/veloev/demo/demo")

To begin, define the `benchmark_info` configuration dictionary. This dictionary controls the inputs and parameters for the evaluation pipeline. The required keys are:

- `methods` (list): A list of method names. These names must correspond to the velocity layer keys in `adata.layers` and to the method identifiers in the filenames.

- `methods_map` (dict): A dictionary used for visualization. It maps the internal method names (keys) to their display labels (values).

- `datasets_name` (list): A list of dataset names. Each name must match a corresponding folder name in your data directory.

- `tasks` (list): A list of tasks to execute. The package supports eight tasks: `directional`, `temporal`, `directional_temporal`, `negative_control`, `seq_depth_directional`, `seq_depth_temporal`, `seq_depth_directional_temporal`, and `simulation`.

- `k_fold` (list): The number of cross-validation folds to run for each respective dataset.

- `cluster_key` (list): For each dataset, the column name in `adata.obs` containing cluster annotations. Required for directional tasks; set to `None` otherwise.

- `time_key` (list): For each dataset, the column name in `adata.obs` containing latent time or pseudotime. Required for temporal tasks; set to `None` otherwise.

- `cell_type_transitions` (list): For each dataset, the ground-truth transitions between cell types for directional tasks, given as `(source, target)` tuples.

- `time_transitions` (list): For each dataset, the ground-truth time transitions for temporal tasks, given as `(from, to)` tuples.

If you are using our toy example (available via this [link](https://drive.google.com/drive/folders/1GWvnG897EhheAcX-oKjMq_z5d9Mrd4Ln?usp=sharing)), you can use the configuration below directly. For custom datasets, please adapt the dictionary to follow this template.

In [22]:
benchmark_info = {
    'methods': ['scvelo_stc','sdevelo','unitvelo_uni','velocyto'],
    'methods_map': {'scvelo_stc': 'scVelo (stc)',
                    'sdevelo': 'SDEvelo',
                    'unitvelo_uni': 'UniTVelo (uni)',
                    'velocyto': 'Velocyto'},
    'datasets_name': ['01_bone_marrow', 
                    '07_fucci_u2os',
                    '11_pbmc68k'],
    'tasks': ['directional',
                'temporal',
                'negative_control'],
    'k_fold': [3, 3, 5],
    'cluster_key': ['clusters', None,'celltype'],
    'time_key': [None, 'dtime',None],
    'cell_type_transitions':[[("HSC_1", "Ery_1"), ("HSC_1", "HSC_2"), ("Ery_1", "Ery_2")],
                                None,
                                None],
    'time_transitions': [None,
                         [(0, 1), (1, 2), (2, 3), (3, 4)],
                         None]}

# If you use the full data, set k_fold to 0.
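
In the configuration above, the per-dataset lists appear to be aligned by position with `datasets_name`. The following is a minimal sketch of a manual consistency check under that assumption. The helper `check_alignment` is hypothetical and is not part of `veloev`; the small `example` dictionary is illustrative only.

```python
def check_alignment(info):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    n_datasets = len(info["datasets_name"])

    # Assumption: each of these lists holds one entry per dataset.
    per_dataset_keys = ["k_fold", "cluster_key", "time_key",
                        "cell_type_transitions", "time_transitions"]
    for key in per_dataset_keys:
        if len(info[key]) != n_datasets:
            problems.append(
                f"{key}: expected {n_datasets} entries, got {len(info[key])}"
            )

    # Every method should have a display label in methods_map.
    missing = [m for m in info["methods"] if m not in info["methods_map"]]
    if missing:
        problems.append(f"methods_map: no display label for {missing}")
    return problems


# Illustrative two-dataset configuration (hypothetical names).
example = {
    "methods": ["method_a", "method_b"],
    "methods_map": {"method_a": "Method A", "method_b": "Method B"},
    "datasets_name": ["dataset_1", "dataset_2"],
    "k_fold": [3, 0],
    "cluster_key": ["clusters", None],
    "time_key": [None, "dtime"],
    "cell_type_transitions": [[("A", "B")], None],
    "time_transitions": [None, [(0, 1), (1, 2)]],
}

print(check_alignment(example))
```

A check like this only catches length mismatches and missing labels; it does not verify that the keys exist in `adata.obs` or that the transitions are biologically meaningful.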

Next, `vev.pp.check_save_summarize_info` validates the contents of `benchmark_info`, saves it to `benchmark_info.pkl`, and summarizes the benchmark configuration.

In [23]:
vev.pp.check_save_summarize_info(benchmark_info)


🔹 STEP 1: VALIDATION CHECK
✅ Optional Check: 'methods_map' correctly covers all methods.
✅ Validation Successful: Configuration is valid.
💾 File Saved: benchmark_info.pkl

🔹 STEP 2: SUMMARY REPORT
• Total Methods:  4
  └─ scvelo_stc (scVelo (stc)), sdevelo (SDEvelo), unitvelo_uni (UniTVelo (uni)), velocyto (Velocyto)
• Total Datasets: 3

📋 Dataset Summary:


| Dataset Task | Dataset Names | Count |
|---|---|---|
| directional | 01_bone_marrow | 1 |
| temporal | 07_fucci_u2os | 1 |
| negative_control | 11_pbmc68k | 1 |


In [24]:
with open("benchmark_info.pkl", "rb") as f:
    benchmark_info = pickle.load(f)

You are now ready to begin post-processing.

In [25]:
vev.pp.run_postprocessing(benchmark_info, base_dir='./', n_jobs=20)

🚀 Starting post-processing for 3 datasets...

Processing 11_pbmc68k: 100%|██████████| 3/3 [03:17<00:00, 65.88s/dataset]

✅ Post-processing completed.





In the next tutorial, we will cover the evaluation process!