SCOPE combines:
- Random Forest classifiers for cell fate prediction
- Label spreading on k-nearest neighbor graphs
- Conformal prediction for uncertainty quantification
- Iterative learning with progressive recruitment of unlabeled cells
# Add SCOPE to your Python path
export PYTHONPATH="/path/to/SCOPE:$PYTHONPATH"
# Or add in your script
import sys
sys.path.append('/path/to/SCOPE')import anndata as ad
from main import SCOPE
# Load your data (must have Palantir pseudotime already computed)
data = ad.read_h5ad('your_data.h5ad')
# Create terminal_state_cluster column
# Set known terminal states to their labels, unknown cells to 'TBD'
data.obs['terminal_state_cluster'] = ...
# Initialize SCOPE
scope = SCOPE(
data=data,
feature_key='imputed_hvg', # Key in data.obsm with features
latent_key='vae_latent_space', # Key in data.obsm with latent space
alpha=0.1, # Label spreading parameter
iter_graph=100, # Label spreading iterations
initial_trees=100 # Initial RF trees
)
# Run SCOPE pipeline
scope.prepare_data() \
.initialize_conformal_result() \
.build_graph() \
.initialize_classifiers() \
.run_all() \
.save_results('output_dir')Your AnnData object must contain:
terminal_state_cluster: Cell fate labels- Known terminal states: 'Monocyte', 'Neutrophil', etc.
- Unknown cells: 'TBD' (To Be Determined)
palantir_pseudotime: Pseudotime values from Palantir
- Feature matrix (e.g.,
imputed_hvg): Cell × genes matrix for classification- OR use
.Xdirectly if no imputation available (setuse_X=True)
- OR use
- Latent space (e.g.,
vae_latent_space): Low-dimensional embedding for graph construction
scope = SCOPE(
data, # AnnData object
feature_key='imputed_hvg', # Feature matrix key (ignored if use_X=True)
latent_key='vae_latent_space', # Latent space key
alpha=0.1, # Label spreading alpha
omit_tail=0, # Prediction set tail omission
iter_graph=100, # Label spreading iterations
initial_trees=100, # Initial RF trees
trees_per_iteration=50, # Trees added per iteration
n_neighbors=10, # k-NN graph neighbors
use_X=False # If True, use data.X instead of data.obsm[feature_key]
)Prepares data for SCOPE analysis:
- Identifies TBD (unlabeled) cells
- Creates dummy labels for terminal states
- Sorts cells by pseudotime
- Initializes visit tracking
Returns: self (for method chaining)
Initializes conformal prediction structures:
- Sets recruitment size (√n_cells)
- Creates result dictionaries for qhat, prediction sets, sizes
Returns: self (for method chaining)
Builds k-nearest neighbor graph for label spreading:
- Constructs graph from latent space
- Uses specified number of neighbors
- Euclidean distance metric
Returns: self (for method chaining)
Creates random forest classifiers:
- One classifier per terminal state
- Initialized with specified number of trees
- Binary classification setup
Returns: self (for method chaining)
Runs one iteration of SCOPE prediction:
- Updates classifiers (adds trees if not first iteration)
- Selects test cells based on pseudotime
- Trains random forest classifiers
- Applies label spreading on graph
- Computes conformal prediction sets
- Updates cell labels
Returns: bool - True if successful, False if no cells left
Use case: When you want control over each iteration
scope.prepare_data() \
.initialize_conformal_result() \
.build_graph() \
.initialize_classifiers()
# Run iterations manually
while scope.run_scope():
print(f"Completed iteration {scope.iteration}")
# Do custom analysis between iterationsRuns SCOPE until all cells are processed:
- Repeatedly calls
run_scope()until complete - Tracks total execution time
- Prints summary
Returns: self (for method chaining)
Saves SCOPE results to disk:
data_complete_results.h5ad: Annotated data with predictionsconformal_result.pkl: Conformal prediction results
Parameters:
output_dir(str): Directory to save results
After running SCOPE, your data object contains:
visit: 1 if cell has been labeled, 0 if still TBDiteration_recruited: Which iteration cell was labeled (-1 for initial)terminal_state_cluster: Updated with predictions
dummy_label: Probabilistic labels (cells × terminal states)
- Stored per iteration in
.varmor.uns
conformal_result = {
'recruitment_size': int, # Cells recruited per iteration
'qhat': [float, ...], # Threshold per iteration
'size': DataFrame, # Prediction set sizes
'prediction_set': DataFrame, # Prediction sets per cell
'prob_test': DataFrame # Test probabilities
}-
alpha(default: 0.1): Controls label smoothness- Lower values: More spreading, smoother labels
- Higher values: Less spreading, preserve initial predictions
- Range: [0, 1]
-
iter_graph(default: 100): Label spreading iterations- More iterations: Better convergence
- Typical range: 50-200
-
initial_trees(default: 100): Starting number of trees- More trees: Better initial predictions, slower
- Typical range: 50-200
-
trees_per_iteration(default: 50): Trees added per iteration- Gradual increase in model complexity
- Typical range: 20-100
n_neighbors(default: 10): k-NN graph neighbors- More neighbors: Smoother predictions, higher computation
- Typical range: 5-30
omit_tail(default: 0): Probability mass to omit from prediction sets- Removes unlikely predictions
- Range: [0, 0.5]
# See: analysis/Setty/SCOPE-Setty-refactored.py
data = ad.read_h5ad('palantir_bone_marrow.h5ad')
data.obs['terminal_state_cluster'] = data.obs['cluster']
scope = SCOPE(
data=data,
feature_key='imputed_hvg',
latent_key='vae_latent_space'
)# See: analysis/Persad/RNA/SCOPE-Persad-RNA-refactored.py
data = ad.read_h5ad('cd34_multiome_rna.h5ad')
# Set terminal states: DC, Ery, Mono
# Set others to TBD
scope = SCOPE(
data=data,
feature_key='imputed_hvg',
latent_key='vae_latent_space'
)# See: analysis/Weinreb/SCOPE-Larry-refactored.py
data = ad.read_h5ad('larry_dataset.h5ad')
# Use day 6 terminal states as labeled
# Rest as TBD
# Option 1: Use data.X directly (no imputation)
scope = SCOPE(
data=data,
latent_key='X_pca',
use_X=True # Use data.X instead of data.obsm
)
# Option 2: If you have feature matrix in obsm
scope = SCOPE(
data=data,
feature_key='highly_variable_genes',
latent_key='X_pca',
use_X=False
)1. Prepare Data
↓
- Load AnnData with Palantir results
- Create terminal_state_cluster column
- Mark known terminal states
- Mark unknown cells as 'TBD'
2. Initialize SCOPE
↓
- Set feature and latent space keys
- Configure parameters (alpha, trees, etc.)
3. Prepare Data
↓
scope.prepare_data()
- Sort by pseudotime
- Create dummy labels
- Initialize tracking
4. Initialize Conformal Result
↓
scope.initialize_conformal_result()
- Set recruitment size
- Create result structures
5. Build Graph
↓
scope.build_graph()
- Construct k-NN graph from latent space
6. Initialize Classifiers
↓
scope.initialize_classifiers()
- Create RF classifiers for each terminal state
7. Run SCOPE
↓
scope.run_all() # or scope.run_scope() for single iteration
For each iteration:
- Update classifiers (add trees)
- Select test cells by pseudotime
- Train RF on labeled cells
- Apply label spreading
- Compute conformal prediction sets
- Update labels
- Recruit new cells
8. Save Results
↓
scope.save_results('output_dir')
- Save annotated data
- Save conformal results
-
Start with defaults: The default parameters work well for most datasets
-
Pseudotime matters: Ensure Palantir pseudotime is well-calibrated
-
Terminal states: Be conservative - only mark cells you're confident about
-
Monitor progress: Use
run_scope()in a loop to track each iteration -
Feature selection: Use highly variable genes or imputed features
-
Latent space: VAE or PCA embeddings work well for graph construction
If you use SCOPE in your research, please cite:
[Citation information to be added]
[License information to be added]
For questions or issues, please [contact information or link to issues page]