# Tangelo Velocity Tutorial

This notebook demonstrates how to use **Tangelo Velocity** for multi-modal single-cell RNA velocity estimation with spatial transcriptomics and ATAC-seq data.

## Overview

Tangelo Velocity is a novel computational method that integrates:
- **Spatial transcriptomics** (spatial coordinates)
- **RNA velocity** (spliced/unspliced RNA)
- **ATAC-seq data** (chromatin accessibility)

These inputs are combined using:
- **Dual GraphSAGE architecture** for spatial and expression relationships
- **Regulatory networks** informed by chromatin accessibility
- **ODE modeling** with TorchODE for cell-specific dynamics
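
As a point of reference for the ODE component, the sketch below uses the standard RNA velocity kinetics: unspliced RNA `u` is transcribed at rate `alpha` and spliced at rate `beta`, and spliced RNA `s` is degraded at rate `gamma`. This textbook form is an assumption here; the equations Tangelo Velocity actually integrates with TorchODE may differ, and the function names `rna_velocity` and `simulate` are illustrative only.

```python
def rna_velocity(u, s, beta, gamma):
    """Spliced-RNA velocity ds/dt = beta * u - gamma * s (standard kinetics)."""
    return beta * u - gamma * s


def simulate(alpha, beta, gamma, u0=0.0, s0=0.0, dt=0.01, n_steps=5000):
    """Forward-Euler integration of
        du/dt = alpha - beta * u
        ds/dt = beta * u - gamma * s
    starting from (u0, s0). Returns the final (u, s)."""
    u, s = u0, s0
    for _ in range(n_steps):
        du = alpha - beta * u
        ds = beta * u - gamma * s
        u, s = u + dt * du, s + dt * ds
    return u, s


# Toy rates (assumed values, not fitted to any data)
u_end, s_end = simulate(alpha=2.0, beta=1.0, gamma=0.5)
v_end = rna_velocity(u_end, s_end, beta=1.0, gamma=0.5)
```

With constant rates the system settles at `u = alpha / beta` and `s = alpha / gamma`, where the velocity `beta * u - gamma * s` vanishes; cells away from that equilibrium are the ones that carry a directional signal.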

## Setup and Imports

In [None]:
# Core imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Single-cell analysis
import scanpy as sc
import muon as mu
import anndata as ad

# Tangelo Velocity
import tangelo_velocity as tv

# Configure scanpy
sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=100, facecolor='white', figsize=(6, 6))

# Configure matplotlib
plt.rcParams['figure.figsize'] = (8, 6)
plt.rcParams['figure.dpi'] = 100

print(f"Tangelo Velocity version: {tv.__version__}")
print(f"Scanpy version: {sc.__version__}")
print(f"Muon version: {mu.__version__}")

## 1. Data Loading and Preparation

Tangelo Velocity expects a **MuData object** with the following structure:

```
MuData object
  obs: 'x_pixel', 'y_pixel', 'x_position', 'y_position'
  2 modalities:
    rna: n_cells x n_genes
      layers: 'spliced', 'unspliced', 'open_chromatin'
    atac: n_cells x n_peaks
      layers: 'counts', 'X_tfidf'
```
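
Before training, it can save time to confirm that an object actually carries these fields. The helper below is a minimal sketch: the key names are copied from the structure listed here, but whether Tangelo Velocity treats every one of them as mandatory is an assumption, and `find_missing_fields` is a hypothetical name, not part of the package. It relies only on the `.obs`, `.mod`, and `.layers` attributes that a MuData object exposes.

```python
from types import SimpleNamespace

# Key names taken from the expected structure; treating all of them as
# required is an assumption of this sketch.
REQUIRED_OBS = ("x_pixel", "y_pixel", "x_position", "y_position")
REQUIRED_LAYERS = {
    "rna": ("spliced", "unspliced", "open_chromatin"),
    "atac": ("counts", "X_tfidf"),
}


def find_missing_fields(mdata):
    """Return descriptions of missing fields (an empty list means all present)."""
    problems = []
    for col in REQUIRED_OBS:
        if col not in mdata.obs.columns:
            problems.append(f"obs column '{col}'")
    for mod_name, layers in REQUIRED_LAYERS.items():
        if mod_name not in mdata.mod:
            problems.append(f"modality '{mod_name}'")
            continue
        for layer in layers:
            if layer not in mdata.mod[mod_name].layers:
                problems.append(f"layer '{layer}' in modality '{mod_name}'")
    return problems


# Stand-in object so the sketch runs without muon installed; a real MuData
# can be passed to find_missing_fields() in the same way.
demo = SimpleNamespace(
    obs=SimpleNamespace(columns=["x_pixel", "y_pixel", "x_position", "y_position"]),
    mod={"rna": SimpleNamespace(layers={"spliced": None, "unspliced": None})},
)
missing = find_missing_fields(demo)
for item in missing:
    print("Missing:", item)
```

Returning a list instead of raising on the first problem lets you see every gap in one pass.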

In [None]:
# Load your multi-modal data.
# Replace this placeholder with the path to your own .h5mu file.
data_path = "your_multimodal_data.h5mu"

# To load real data, uncomment the following line:
# adata = mu.read_h5mu(data_path)

# For this tutorial, we only print what the data should look like
print("Expected data structure:")
print("""
MuData object with n_obs × n_vars = 2399 × 191211
  obs:    'x_pixel', 'y_pixel', 'x_position', 'y_position'
  2 modalities
    atac:    2399 x 144347
      layers:    'counts', 'X_tfidf'
      obsm:    'X_lsi'
    rna:    2399 x 20322
      layers:    'spliced', 'unspliced', 'M_u', 'M_s', 'open_chromatin'
      obsm:    'X_pca'
      obsp:    'moments_con'
""")

# If you don't have data ready, run the next cell: it defines
# create_synthetic_mudata() and calls it to build a synthetic dataset.

In [None]:
def create_synthetic_mudata(n_cells=1000, n_genes=2000, n_peaks=5000):
    """
    Create synthetic multi-modal data for demonstration.
    In practice, replace this with your actual data loading.
    """
    print(f"Creating synthetic data: {n_cells} cells, {n_genes} genes, {n_peaks} peaks")
    
    # Create spatial coordinates (simulate 2D tissue)
    np.random.seed(42)
    x_pixel = np.random.uniform(0, 100, n_cells)
    y_pixel = np.random.uniform(0, 100, n_cells)
    
    # Create RNA data
    spliced = np.random.negative_binomial(20, 0.3, (n_cells, n_genes)).astype(np.float32)
    unspliced = np.random.negative_binomial(10, 0.4, (n_cells, n_genes)).astype(np.float32)
    
    # Simulate open chromatin (binary accessibility)
    open_chromatin = np.random.binomial(1, 0.2, (n_cells, n_genes)).astype(np.float32)
    
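    # Stand-ins for scVelo's neighbor-smoothed moments: the raw counts plus
    # small Gaussian noise, cast to float32 to match the other layers
    M_s = (spliced + np.random.normal(0, 0.1, spliced.shape)).astype(np.float32)
    M_u = (unspliced + np.random.normal(0, 0.1, unspliced.shape)).astype(np.float32)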
    
    # Create RNA AnnData
    rna_adata = ad.AnnData(X=spliced)
    rna_adata.layers['spliced'] = spliced
    rna_adata.layers['unspliced'] = unspliced
    rna_adata.layers['open_chromatin'] = open_chromatin
    rna_adata.layers['M_s'] = M_s
    rna_adata.layers['M_u'] = M_u
    
    # Add gene names
    rna_adata.var_names = [f"Gene_{i}" for i in range(n_genes)]
    
    # Create ATAC data  
    atac_counts = np.random.negative_binomial(5, 0.1, (n_cells, n_peaks)).astype(np.float32)
    
    # Simulate TF-IDF transformation
    from sklearn.feature_extraction.text import TfidfTransformer
    tfidf = TfidfTransformer()
    atac_tfidf = tfidf.fit_transform(atac_counts).toarray().astype(np.float32)
    
    # Create ATAC AnnData
    atac_adata = ad.AnnData(X=atac_counts)
    atac_adata.layers['counts'] = atac_counts
    atac_adata.layers['X_tfidf'] = atac_tfidf
    
    # Add peak names
    atac_adata.var_names = [f"Peak_{i}" for i in range(n_peaks)]
    
    # Create MuData object
    mdata = mu.MuData({'rna': rna_adata, 'atac': atac_adata})
    
    # Add spatial coordinates
    mdata.obs['x_pixel'] = x_pixel
    mdata.obs['y_pixel'] = y_pixel
    mdata.obs['x_position'] = x_pixel / 100  # Normalized coordinates
    mdata.obs['y_position'] = y_pixel / 100
    
    # Add cell metadata
    mdata.obs['cell_type'] = np.random.choice(['TypeA', 'TypeB', 'TypeC'], n_cells)
    
    print(f"Created MuData object: {mdata}")
    return mdata

# Create synthetic data for demonstration
adata = create_synthetic_mudata(n_cells=1000, n_genes=500, n_peaks=1000)

In [None]:
# Inspect the data structure
print("=== Data Overview ===")
print(adata)
print("\n=== RNA modality ===")
print(adata['rna'])
print("\n=== ATAC modality ===")
print(adata['atac'])
print("\n=== Spatial coordinates ===")
print(adata.obs[['x_pixel', 'y_pixel', 'cell_type']].head())

## 2. Basic Velocity Estimation

The simplest way to estimate velocities with Tangelo Velocity:

In [None]:
# Basic velocity estimation (Stage 3 - Integrated Model)
print("=== Basic Velocity Estimation ===")

# One-line velocity estimation
tv.estimate_velocity(adata, stage=3)

# Check results
print(f"\nVelocity computed: {'velocity' in adata['rna'].layers}")
if 'velocity' in adata['rna'].layers:
    print(f"Velocity shape: {adata['rna'].layers['velocity'].shape}")
    print(f"Mean velocity magnitude: {np.linalg.norm(adata['rna'].layers['velocity'], axis=1).mean():.4f}")

## 3. Advanced API Usage

For more control over the estimation process:

In [None]:
print("=== Advanced Velocity Estimation ===")

# Create custom configuration
config = tv.TangeloConfig(
    development_stage=3,
    graph=tv.config.GraphConfig(
        n_neighbors_spatial=8,
        n_neighbors_expression=15,
        use_node2vec=False,  # Disable for speed in demo
    ),
    encoder=tv.config.EncoderConfig(
        latent_dim=32,  # Smaller for demo
        hidden_dims=(128, 64, 32),
        fusion_method="sum",  # Simpler fusion for demo
    ),
    training=tv.config.TrainingConfig(
        n_epochs=20,  # Reduced for demo
        learning_rate=1e-3,
        batch_size=256,
    )
)

print("Configuration created:")
print(f"  - Stage: {config.development_stage}")
print(f"  - Spatial neighbors: {config.graph.n_neighbors_spatial}")
print(f"  - Expression neighbors: {config.graph.n_neighbors_expression}")
print(f"  - Latent dimension: {config.encoder.latent_dim}")
print(f"  - Training epochs: {config.training.n_epochs}")

In [None]:
# Initialize model with custom configuration
model = tv.TangeloVelocity(config=config)

# Fit the model
print("Fitting Tangelo Velocity model...")
model.fit(adata.copy())  # Use copy to preserve original

print("\nModel training completed!")
print(f"Model fitted: {model.is_fitted}")
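
The `n_neighbors_spatial` and `n_neighbors_expression` settings control how many neighbors each cell is connected to in the two graphs. How Tangelo Velocity builds those graphs internally is not shown in this tutorial; the following brute-force NumPy sketch only illustrates what a k-nearest-neighbor graph over spatial coordinates looks like. The function name `knn_indices` is made up for this example.

```python
import numpy as np


def knn_indices(coords, k):
    """Return an (n_cells, k) array with each cell's k nearest neighbors.

    Brute-force Euclidean distances; the cell itself is excluded.
    Memory grows as n_cells**2, so this is only suitable for small demos.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(200, 2))  # toy spatial coordinates
neighbors = knn_indices(xy, k=8)
print(neighbors.shape)
```

A larger `k` smooths over more of the tissue at the cost of blurring sharp spatial boundaries, which is the trade-off to keep in mind when tuning the two neighbor counts.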

## 4. Extracting Model Components

Tangelo Velocity provides access to learned representations and parameters:

In [None]:
print("=== Model Component Extraction ===")

# Get latent representations
latent_reps = model.get_latent_representations()
print("\nLatent Representations:")
for name, tensor in latent_reps.items():
    print(f"  {name}: {tensor.shape}")

# Get ODE parameters
ode_params = model.get_ode_parameters()
print("\nODE Parameters:")
for name, tensor in ode_params.items():
    print(f"  {name}: {tensor.shape} (mean: {tensor.mean().item():.4f})")

# Get interaction network
interaction_matrix = model.get_interaction_network()
print(f"\nInteraction Network: {interaction_matrix.shape}")
print(f"  Non-zero interactions: {(interaction_matrix != 0).sum().item()}")
print(f"  Interaction strength range: [{interaction_matrix.min().item():.4f}, {interaction_matrix.max().item():.4f}]")
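
The spatial and expression latents come from the GraphSAGE encoders mentioned in the Overview. For intuition only, here is a minimal NumPy sketch of one generic GraphSAGE-style layer with mean aggregation. It is not Tangelo Velocity's implementation: the weights are random, the graph is a toy, and `sage_mean_layer` is a hypothetical name.

```python
import numpy as np


def sage_mean_layer(h, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer with mean aggregation and ReLU.

    h:         (n_nodes, d_in) node features
    neighbors: (n_nodes, k) integer indices of each node's neighbors
    w_self, w_neigh: (d_in, d_out) weights for the node and its neighborhood
    """
    neigh_mean = h[neighbors].mean(axis=1)       # average over the k neighbors
    out = h @ w_self + neigh_mean @ w_neigh      # combine self and neighborhood
    return np.maximum(out, 0.0)                  # ReLU


rng = np.random.default_rng(0)
n_cells, d_in, d_out, k = 50, 16, 8, 5
h = rng.normal(size=(n_cells, d_in))
neighbors = rng.integers(0, n_cells, size=(n_cells, k))  # toy random graph
w_self = 0.1 * rng.normal(size=(d_in, d_out))
w_neigh = 0.1 * rng.normal(size=(d_in, d_out))
latent = sage_mean_layer(h, neighbors, w_self, w_neigh)
print(latent.shape)
```

Using separate weights for the node and for its neighborhood mean is equivalent to concatenating the two vectors and applying a single weight matrix.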

In [None]:
# Visualize latent representations
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Convert to numpy for plotting
spatial_latent = latent_reps['spatial_latent'].detach().cpu().numpy()
expression_latent = latent_reps['expression_latent'].detach().cpu().numpy()
combined_latent = latent_reps['combined_latent'].detach().cpu().numpy()

# PCA for visualization
from sklearn.decomposition import PCA
pca = PCA(n_components=2)

# Plot spatial latent space
spatial_pca = pca.fit_transform(spatial_latent)
axes[0].scatter(spatial_pca[:, 0], spatial_pca[:, 1], c=adata.obs['x_pixel'], cmap='viridis', s=20)
axes[0].set_title('Spatial Latent Space\n(colored by x-position)')
axes[0].set_xlabel('PC1')
axes[0].set_ylabel('PC2')

# Plot expression latent space  
expr_pca = pca.fit_transform(expression_latent)
cell_types = adata.obs['cell_type'].astype('category')
for ct in cell_types.cat.categories:
    mask = (cell_types == ct).to_numpy()  # plain boolean array for NumPy indexing
    axes[1].scatter(expr_pca[mask, 0], expr_pca[mask, 1], label=ct, s=20)
axes[1].set_title('Expression Latent Space\n(colored by cell type)')
axes[1].set_xlabel('PC1')
axes[1].set_ylabel('PC2')
axes[1].legend()

# Plot combined latent space
combined_pca = pca.fit_transform(combined_latent)
axes[2].scatter(combined_pca[:, 0], combined_pca[:, 1], c=adata.obs['y_pixel'], cmap='plasma', s=20)
axes[2].set_title('Combined Latent Space\n(colored by y-position)')
axes[2].set_xlabel('PC1')
axes[2].set_ylabel('PC2')

plt.tight_layout()
plt.show()

## 5. Downstream Analysis

Compute velocity graph and embedding for standard single-cell velocity analysis:

In [None]:
print("=== Downstream Velocity Analysis ===")

# Compute velocity graph
model.compute_velocity_graph(adata, n_neighbors=30)
print("Velocity graph computed")

# Compute UMAP embedding for visualization
sc.pp.neighbors(adata['rna'], n_neighbors=15)
sc.tl.umap(adata['rna'])
print("UMAP embedding computed")

# Compute velocity embedding
model.compute_velocity_embedding(adata, basis="umap")
print("Velocity embedding computed")

print(f"\nAvailable embeddings: {list(adata['rna'].obsm.keys())}")
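
A velocity graph scores, for each cell, how well its velocity vector points toward each of its neighbors. The sketch below shows the cosine-similarity idea popularized by scVelo on a three-cell toy example; whether `compute_velocity_graph` uses exactly this scoring is an assumption, and `transition_cosines` is a hypothetical helper.

```python
import numpy as np


def transition_cosines(x, v, neighbors, eps=1e-12):
    """Cosine between each cell's velocity and the displacement to each neighbor.

    x:         (n_cells, n_genes) expression state
    v:         (n_cells, n_genes) velocity
    neighbors: (n_cells, k) neighbor indices
    Returns an (n_cells, k) array with values in [-1, 1].
    """
    delta = x[neighbors] - x[:, None, :]                  # (n_cells, k, n_genes)
    dot = (delta * v[:, None, :]).sum(axis=-1)
    norm = np.linalg.norm(delta, axis=-1) * np.linalg.norm(v, axis=1)[:, None]
    return dot / np.maximum(norm, eps)


# Three cells on a line, all moving in the +x direction
x = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
v = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
neighbors = np.array([[1, 2], [0, 2], [0, 1]])
cosines = transition_cosines(x, v, neighbors)
print(cosines)
```

Positive entries mark neighbors that lie ahead of a cell along its velocity, negative entries mark neighbors it is moving away from.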

In [None]:
# Visualize velocity results
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

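# UMAP colored by cell type
# cell_type is stored in the MuData-level obs, so copy it to the RNA modality first
adata['rna'].obs['cell_type'] = adata.obs['cell_type'].values
sc.pl.umap(adata['rna'], color='cell_type', ax=axes[0,0], show=False, frameon=False)
axes[0,0].set_title('UMAP - Cell Types')

# UMAP with velocity streamlines (if available).
# Stream plots come from scVelo, not scanpy; this assumes the results are
# stored under the key names scVelo expects.
try:
    import scvelo as scv
    scv.pl.velocity_embedding_stream(adata['rna'], basis='umap', ax=axes[0,1], show=False)
    axes[0,1].set_title('Velocity Stream Plot')
except Exception as e:
    # Fallback if scVelo or the velocity embedding is not available
    print(f"Velocity stream plot skipped: {e}")
    sc.pl.umap(adata['rna'], ax=axes[0,1], show=False, frameon=False)
    axes[0,1].set_title('UMAP (velocity embedding unavailable)')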

# Spatial plot colored by cell type
axes[1,0].scatter(adata.obs['x_pixel'], adata.obs['y_pixel'], 
                 c=[plt.cm.tab10(i) for i in adata.obs['cell_type'].astype('category').cat.codes], 
                 s=20, alpha=0.7)
axes[1,0].set_xlabel('X pixel')
axes[1,0].set_ylabel('Y pixel')
axes[1,0].set_title('Spatial Distribution - Cell Types')

# ODE parameter distribution
if {'tangelo_beta', 'tangelo_gamma'} <= set(adata['rna'].obs.columns):
    axes[1,1].hist(adata['rna'].obs['tangelo_beta'], bins=30, alpha=0.7, label='Splicing rate (β)')
    axes[1,1].hist(adata['rna'].obs['tangelo_gamma'], bins=30, alpha=0.7, label='Degradation rate (γ)')
    axes[1,1].set_xlabel('Parameter value')
    axes[1,1].set_ylabel('Frequency')
    axes[1,1].set_title('ODE Parameter Distributions')
    axes[1,1].legend()
else:
    axes[1,1].text(0.5, 0.5, 'ODE parameters\nnot available', 
                  ha='center', va='center', transform=axes[1,1].transAxes)
    axes[1,1].set_title('ODE Parameters')

plt.tight_layout()
plt.show()

## 6. Stage Comparison

Compare velocity estimates across different development stages:

In [None]:
print("=== Stage Comparison ===")

# Compare stages 1, 2, and 3 (reduced epochs for demo)
stage_results = {}

for stage in [1, 2, 3]:
    print(f"\n--- Stage {stage} ---")
    
    # Create stage-specific config with reduced epochs
    stage_config = tv.config.get_stage_config(stage)
    stage_config.training.n_epochs = 10  # Reduced for demo
    
    # Train model
    stage_model = tv.TangeloVelocity(config=stage_config)
    stage_results[stage] = stage_model.fit(adata.copy(), copy=True)
    
    # Compute basic statistics
    velocity = stage_results[stage]['rna'].layers['velocity']
    velocity_magnitude = np.linalg.norm(velocity, axis=1)
    
    print(f"  Mean velocity magnitude: {velocity_magnitude.mean():.4f}")
    print(f"  Velocity magnitude std: {velocity_magnitude.std():.4f}")
    print(f"  Max velocity magnitude: {velocity_magnitude.max():.4f}")

In [None]:
# Visualize stage comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for i, stage in enumerate([1, 2, 3]):
    velocity = stage_results[stage]['rna'].layers['velocity']
    velocity_magnitude = np.linalg.norm(velocity, axis=1)
    
    # Histogram of velocity magnitudes
    axes[i].hist(velocity_magnitude, bins=30, alpha=0.7, density=True)
    axes[i].set_xlabel('Velocity Magnitude')
    axes[i].set_ylabel('Density')
    axes[i].set_title(f'Stage {stage}\nVelocity Distribution')
    axes[i].axvline(velocity_magnitude.mean(), color='red', linestyle='--', 
                   label=f'Mean: {velocity_magnitude.mean():.3f}')
    axes[i].legend()

plt.tight_layout()
plt.show()

# Summary comparison
print("\n=== Stage Comparison Summary ===")
stages = [1, 2, 3]
# Compute each stage's per-cell velocity magnitudes once and reuse them
magnitudes = {
    s: np.linalg.norm(stage_results[s]['rna'].layers['velocity'], axis=1)
    for s in stages
}
comparison_df = pd.DataFrame({
    'Stage': stages,
    'Mean_Velocity': [magnitudes[s].mean() for s in stages],
    'Std_Velocity': [magnitudes[s].std() for s in stages],
    'Max_Velocity': [magnitudes[s].max() for s in stages],
})

print(comparison_df.round(4))
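
Magnitude statistics alone do not tell you whether two stages agree on the direction of change. One simple complement, sketched here on toy arrays, is the per-cell cosine similarity between two velocity matrices of the same shape. `cellwise_cosine` is a hypothetical helper, not part of Tangelo Velocity; with real results you would pass the `velocity` layers of two stages.

```python
import numpy as np


def cellwise_cosine(v_a, v_b, eps=1e-12):
    """Cosine similarity between matching rows of two (n_cells, n_genes) arrays."""
    num = (v_a * v_b).sum(axis=1)
    den = np.linalg.norm(v_a, axis=1) * np.linalg.norm(v_b, axis=1)
    return num / np.maximum(den, eps)  # eps guards against all-zero rows


rng = np.random.default_rng(0)
v_stage_a = rng.normal(size=(20, 5))                          # toy velocities
v_stage_b = v_stage_a + 0.1 * rng.normal(size=v_stage_a.shape)  # slightly perturbed copy

agreement = cellwise_cosine(v_stage_a, v_stage_b)
print(f"Median directional agreement: {np.median(agreement):.3f}")
```

Values near 1 mean the two estimates point the same way for that cell, values near 0 mean they are unrelated, and negative values mean they disagree on direction.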

## 7. Configuration Management

Save and load configurations for reproducible experiments:

In [None]:
print("=== Configuration Management ===")

# Create and save a custom configuration
custom_config = tv.TangeloConfig(
    development_stage=3,
    graph=tv.config.GraphConfig(
        n_neighbors_spatial=10,
        n_neighbors_expression=20,
        use_node2vec=True,
        node2vec_dim=64,
    ),
    encoder=tv.config.EncoderConfig(
        latent_dim=64,
        hidden_dims=(512, 256, 128),
        fusion_method="attention",
    ),
    training=tv.config.TrainingConfig(
        n_epochs=200,
        learning_rate=5e-4,
        batch_size=1024,
    )
)

# Save configuration
config_path = Path("custom_tangelo_config.yaml")
custom_config.save_yaml(config_path)
print(f"Configuration saved to: {config_path}")

# Load configuration
loaded_config = tv.TangeloConfig.from_yaml(config_path)
print("Configuration loaded successfully")
print(f"  - Stage: {loaded_config.development_stage}")
print(f"  - Spatial neighbors: {loaded_config.graph.n_neighbors_spatial}")
print(f"  - Use Node2Vec: {loaded_config.graph.use_node2vec}")
print(f"  - Latent dim: {loaded_config.encoder.latent_dim}")
print(f"  - Fusion method: {loaded_config.encoder.fusion_method}")
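
When checking that a saved configuration reloads faithfully, it helps to compare the whole object rather than a few fields. The sketch below illustrates the idea with hypothetical dataclasses standing in for the real configuration classes (it does not use or describe `tv.TangeloConfig` internals), and it serializes to JSON instead of YAML only to stay within the standard library.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DemoEncoderConfig:  # hypothetical stand-in, not tv.config.EncoderConfig
    latent_dim: int = 64
    hidden_dims: tuple = (512, 256, 128)
    fusion_method: str = "attention"


@dataclass
class DemoConfig:  # hypothetical stand-in, not tv.TangeloConfig
    development_stage: int = 3
    encoder: DemoEncoderConfig = field(default_factory=DemoEncoderConfig)


def dumps(cfg):
    """Serialize the nested config to a JSON string."""
    return json.dumps(asdict(cfg), indent=2)


def loads(text):
    """Rebuild the nested config from a JSON string."""
    raw = json.loads(text)
    enc = raw.pop("encoder")
    enc["hidden_dims"] = tuple(enc["hidden_dims"])  # JSON has no tuple type
    return DemoConfig(encoder=DemoEncoderConfig(**enc), **raw)


original = DemoConfig(development_stage=2)
restored = loads(dumps(original))
print("Round trip preserved config:", restored == original)
```

The explicit tuple conversion is the part worth noting: JSON returns sequences as lists, so without it an equality check on the reloaded object would fail even though the values match.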

## 8. Analysis and Metrics

Evaluate velocity quality and perform trajectory analysis:

In [None]:
print("=== Velocity Metrics and Analysis ===")

# Use the Stage 3 (integrated model) result from the stage comparison above
best_result = stage_results[3]

# Initialize velocity metrics
try:
    metrics = tv.analysis.VelocityMetrics(best_result)
    summary = metrics.summary()
    
    print("Velocity Quality Metrics:")
    for metric, value in summary.items():
        print(f"  {metric}: {value:.4f}")
        
except Exception as e:
    print(f"Metrics computation not available: {e}")
    
    # Compute basic metrics manually
    velocity = best_result['rna'].layers['velocity']
    velocity_magnitude = np.linalg.norm(velocity, axis=1)
    
    print("Basic Velocity Statistics:")
    print(f"  Mean magnitude: {velocity_magnitude.mean():.4f}")
    print(f"  Std magnitude: {velocity_magnitude.std():.4f}")
    print(f"  Min magnitude: {velocity_magnitude.min():.4f}")
    print(f"  Max magnitude: {velocity_magnitude.max():.4f}")
    print(f"  Non-zero velocities: {(velocity_magnitude > 1e-6).sum()}/{len(velocity_magnitude)}")

## 9. Saving Results

Save your velocity estimation results:

In [None]:
print("=== Saving Results ===")

# Save the final result
output_path = Path("tangelo_velocity_results.h5mu")
best_result.write_h5mu(output_path)
print(f"Results saved to: {output_path}")

# Save velocity matrix separately
velocity_path = Path("velocity_matrix.csv")
velocity_df = pd.DataFrame(
    best_result['rna'].layers['velocity'],
    index=best_result['rna'].obs_names,
    columns=best_result['rna'].var_names
)
velocity_df.to_csv(velocity_path)
print(f"Velocity matrix saved to: {velocity_path}")

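# Save cell-level metadata (the MuData-level obs table).
# Per-cell ODE parameters such as 'tangelo_beta' are read from the RNA
# modality's obs earlier in this tutorial; export best_result['rna'].obs too if you need them.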
metadata_path = Path("cell_metadata.csv")
metadata_df = best_result.obs.copy()
metadata_df.to_csv(metadata_path)
print(f"Cell metadata saved to: {metadata_path}")

print("\nAll results saved successfully!")
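
A dense CSV of an n_cells × n_genes matrix grows quickly and stores every number as text. As an alternative for larger datasets, the following sketch writes a toy velocity matrix together with its cell and gene labels to a compressed NumPy archive and reads it back. The array contents and file name are placeholders; with real results you would pass the `velocity` layer, `obs_names`, and `var_names` instead.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)
velocity = rng.normal(size=(100, 50)).astype(np.float32)  # toy stand-in matrix
cells = np.array([f"cell_{i}" for i in range(velocity.shape[0])])
genes = np.array([f"Gene_{i}" for i in range(velocity.shape[1])])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "velocity_matrix.npz"

    # Store the matrix and its labels in one compressed archive
    np.savez_compressed(path, velocity=velocity, cells=cells, genes=genes)

    # Reload and pull the arrays into memory before the file is closed
    with np.load(path) as archive:
        restored = archive["velocity"]
        restored_genes = archive["genes"]

print(restored.shape, restored.dtype)
```

Unlike a CSV round trip, the binary archive keeps the original dtype and exact values.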

## Summary

This tutorial demonstrated:

1. **Data preparation** for Tangelo Velocity (MuData with spatial, RNA, and ATAC modalities)
2. **Basic velocity estimation** with the one-line interface
3. **Advanced API usage** with custom configurations
4. **Model component extraction** (latent representations, ODE parameters, interaction networks)
5. **Downstream analysis** (velocity graphs, embeddings, visualization)
6. **Stage comparison** across different development stages
7. **Configuration management** for reproducible experiments
8. **Quality metrics** and analysis tools
9. **Result saving** and export

## Next Steps

- **Use your own data**: Replace the synthetic data with your multi-modal dataset
- **Experiment with stages**: Try different development stages (1-4) based on your needs
- **Tune parameters**: Adjust graph construction, encoder architecture, and training parameters
- **Advanced analysis**: Explore perturbation analysis and trajectory modeling
- **Visualization**: Create publication-ready plots with the plotting module

For more information, see the [Tangelo Velocity documentation](https://github.com/yourusername/tangelo-velocity).