# MuVIcell Tutorial: Multi-View Integration for Sample-Aggregated Single-Cell Data

This notebook demonstrates how to use the MuVIcell package to integrate and analyze sample-aggregated single-cell data with MuVI (Multi-View Integration).

## Overview

MuVIcell provides a streamlined workflow for:
1. **Generating/Loading** multi-view data in muon format (samples x features)
2. **Preprocessing** data for MuVI analysis
3. **Running MuVI** to identify latent factors using `muvi.tl.from_mdata`
4. **Analyzing** and interpreting factors
5. **Visualizing** results

Note: Each row represents a **sample** (not an individual cell), and each view contains **cell-type-aggregated data per sample**.
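
As a minimal sketch of what "sample-aggregated" means, the snippet below averages cell-level values per sample for one cell type, giving a samples x features matrix. The toy data, the `aggregate_view` helper, and the use of the mean as the aggregation function are illustrative assumptions and not part of MuVIcell.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cell-level data: 12 cells x 3 genes, with a sample and cell type per cell
n_cells, n_genes = 12, 3
expr = rng.poisson(2.0, size=(n_cells, n_genes)).astype(float)
sample_ids = np.repeat(["s1", "s2", "s3"], 4)
cell_types = np.tile(["T", "B"], 6)


def aggregate_view(expr, sample_ids, cell_types, cell_type):
    """Average the cells of one cell type within each sample (hypothetical helper)."""
    samples = np.unique(sample_ids)
    rows = []
    for s in samples:
        mask = (sample_ids == s) & (cell_types == cell_type)
        rows.append(expr[mask].mean(axis=0))
    return samples, np.vstack(rows)


# One view per cell type: rows are samples, columns are genes
samples, view_T = aggregate_view(expr, sample_ids, cell_types, "T")
print(view_T.shape)
```

Repeating the call for each cell type would yield one view per cell type, all sharing the same sample rows.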

In [None]:
import numpy as np
import pandas as pd
import muon as mu
import warnings
# Suppress all warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Import muvicell modules
from muvicell import synthetic, preprocessing, analysis, visualization

# Import MuVI directly to show compatibility
try:
    import muvi
    import muvi.tl
    MUVI_AVAILABLE = True
    print("MuVI is available")
except ImportError:
    MUVI_AVAILABLE = False
    print("MuVI not available - using mock implementation")

# Set random seed for reproducibility
np.random.seed(42)

## 1. Generate Synthetic Multi-View Data

Generate synthetic multi-view data with 200 samples and 3 views of 5, 10, and 15 features:

In [None]:
# Generate 200 samples across three views of increasing size and sparsity
mdata = synthetic.generate_synthetic_data(
    n_samples=200,  # Note: samples, not cells
    view_configs={
        'view1': {'n_vars': 5, 'sparsity': 0.3},
        'view2': {'n_vars': 10, 'sparsity': 0.4},
        'view3': {'n_vars': 15, 'sparsity': 0.5}
    }
)

print(f"Generated data shape: {mdata.shape}")
print(f"Views: {list(mdata.mod.keys())}")
for view_name, view_data in mdata.mod.items():
    print(f"  {view_name}: {view_data.shape}")
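
For intuition, a comparable set of views can be sketched in plain NumPy. This is not how `synthetic.generate_synthetic_data` is implemented. In particular, reading `sparsity` as the expected fraction of zero entries and drawing values from a standard normal are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 200
view_configs = {
    "view1": {"n_vars": 5, "sparsity": 0.3},
    "view2": {"n_vars": 10, "sparsity": 0.4},
    "view3": {"n_vars": 15, "sparsity": 0.5},
}

views = {}
for name, cfg in view_configs.items():
    X = rng.normal(size=(n_samples, cfg["n_vars"]))
    # Assumption: "sparsity" is the expected fraction of entries set to zero
    X[rng.random(X.shape) < cfg["sparsity"]] = 0.0
    views[name] = X

for name, X in views.items():
    print(name, X.shape, f"zeros: {np.mean(X == 0):.2f}")
```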

## 2. Add Latent Factor Structure

Add latent factor structure (5 factors) and shared sample metadata to the synthetic data:

In [None]:
# Add latent structure with shared metadata
mdata_structured = synthetic.add_latent_structure(
    mdata, 
    n_latent_factors=5
)

print(f"Sample metadata columns: {list(mdata_structured.obs.columns)}")
print(f"Unique cell types: {mdata_structured.obs['cell_type'].unique()}")
print(f"Unique conditions: {mdata_structured.obs['condition'].unique()}")
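
The idea of shared latent structure can be sketched as a linear factor model: every view is generated from the same factor scores `Z` with its own loadings `W`, plus noise. The Gaussian draws and the noise level below are assumptions; `synthetic.add_latent_structure` may work differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_factors = 200, 5
n_vars = {"view1": 5, "view2": 10, "view3": 15}

# Factor scores are shared by all views
Z = rng.normal(size=(n_samples, n_factors))

signal, views = {}, {}
for name, p in n_vars.items():
    W = rng.normal(size=(p, n_factors))  # view-specific loadings
    signal[name] = Z @ W.T               # noise-free low-rank signal
    views[name] = signal[name] + 0.1 * rng.normal(size=(n_samples, p))

# The noise-free signal of a view has rank at most n_factors
print(np.linalg.matrix_rank(signal["view3"]))
```

Because `Z` is shared, a factor model fitted jointly on all views can recover structure that no single view shows on its own.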

## 3. Preprocess Data for MuVI

Apply the preprocessing pipeline. Filtering and highly variable gene (HVG) selection are disabled because the synthetic data needs neither:

In [None]:
# Preprocess for MuVI analysis
mdata_processed = preprocessing.preprocess_for_muvi(
    mdata_structured,
    filter_cells=False,  # Don't filter synthetic data
    filter_genes=False,  # Don't filter synthetic data
    normalize=True,
    find_hvg=False,      # Skip HVG for synthetic data
    subset_hvg=False
)

print(f"Preprocessed data shape: {mdata_processed.shape}")
print("Data ready for MuVI analysis")
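
What `normalize=True` does internally is not shown in this notebook. One common choice before fitting a factor model is per-feature standardization, sketched below. The `standardize` helper is hypothetical and is not taken from MuVIcell.

```python
import numpy as np


def standardize(X):
    """Center each feature and scale it to unit variance (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features become all zeros instead of NaN
    return (X - mean) / std


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 10))
X[:, 0] = 2.0  # a constant feature, to exercise the zero-variance guard
X_scaled = standardize(X)

print(X_scaled.mean(axis=0).round(3))
print(X_scaled.std(axis=0).round(3))
```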

## 4. Run MuVI Analysis

Run MuVI through `muvi.tl.from_mdata`. If MuVI is not installed, the cell falls back to MuVIcell's mock model:

In [None]:
# Build the model with muvi.tl.from_mdata when MuVI is installed
if MUVI_AVAILABLE:
    model = muvi.tl.from_mdata(
        mdata_processed,
        n_factors=10,
        nmf=False,
        device='cpu'
    )
    
    # Fit the model
    model.fit()
    
    print(f"MuVI model fitted with {model.n_factors} factors")
else:
    # Use mock implementation for demonstration
    from muvicell.muvi_runner import _create_mock_muvi_model
    model = _create_mock_muvi_model(mdata_processed, n_factors=10)
    model.fit()
    print("Mock MuVI model fitted with 10 factors")

# Access the updated mdata with MuVI results
mdata_muvi = model.mdata
print(f"Factor scores shape: {mdata_muvi.obsm['X_muvi'].shape}")
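
To make the shape and meaning of the factor scores concrete, the sketch below uses PCA (via SVD) on the concatenated views as a stand-in. PCA is not MuVI, which fits a different model. The example only illustrates how a samples x factors score matrix relates to a factors x features loading matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
views = {
    name: rng.normal(size=(200, p))
    for name, p in [("view1", 5), ("view2", 10), ("view3", 15)]
}

# Center each view and concatenate along the feature axis: (200, 30)
X = np.hstack([v - v.mean(axis=0) for v in views.values()])

# PCA via SVD, used here only as a stand-in for a factor model
U, s, Vt = np.linalg.svd(X, full_matrices=False)
n_factors = 10
scores = U[:, :n_factors] * s[:n_factors]  # samples x factors
loadings = Vt[:n_factors]                  # factors x features

print(scores.shape, loadings.shape)
```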

## 5. Characterize Factors

Identify the top genes contributing to each factor (up to 3 per factor, using a loading threshold of 0.05):

In [None]:
# Characterize factors by top genes
factor_genes = analysis.characterize_factors(
    model,
    top_genes_per_factor=3,
    loading_threshold=0.05
)

print("Top genes for each factor:")
for view_name, df in factor_genes.items():
    if len(df) > 0:
        print(f"\n{view_name}:")
        print(df.head(10))
    else:
        print(f"\n{view_name}: No significant genes found")
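
Selecting top features from a loading matrix can be sketched as follows. The `top_features` helper is hypothetical, and ranking by absolute loading is an assumption; `analysis.characterize_factors` may rank or threshold differently.

```python
import numpy as np


def top_features(loadings, names, top_n=3, threshold=0.05):
    """Per factor, return up to top_n (name, loading) pairs with |loading| > threshold."""
    result = {}
    for k, row in enumerate(loadings):
        order = np.argsort(-np.abs(row))[:top_n]  # largest absolute loadings first
        result[k] = [
            (names[j], float(row[j])) for j in order if abs(row[j]) > threshold
        ]
    return result


# Toy loadings: 2 factors x 5 features
loadings = np.array([
    [0.90, -0.02, 0.40, -0.70, 0.01],
    [0.01, 0.03, -0.04, 0.02, 0.00],
])
names = ["g0", "g1", "g2", "g3", "g4"]

top = top_features(loadings, names)
for factor, genes in top.items():
    print(factor, genes)
```

The second factor has no loading above the threshold, so its list is empty. This corresponds to the "No significant genes found" branch in the cell above.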

## 6. Factor Analysis

Compute correlations between factors, then test each factor for association with the sample metadata:

In [None]:
# Calculate factor correlations
factor_correlations = analysis.calculate_factor_correlations(model)
print("Factor correlation matrix:")
print(factor_correlations.round(3))

# Identify factor-metadata associations
associations = analysis.identify_factor_associations(
    model,
    categorical_test='kruskal'
)
print("\nFactor-metadata associations:")
print(associations.head(10))
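
Both steps can be sketched in NumPy. The correlation matrix comes from `np.corrcoef` on the score columns. The Kruskal-Wallis H statistic is computed by hand here, without tie correction. The `kruskal_h` helper is hypothetical. In practice `scipy.stats.kruskal` handles ties and also returns a p-value.

```python
import numpy as np


def kruskal_h(values, groups):
    """Kruskal-Wallis H statistic, assuming no tied values (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    n = values.size
    # Rank all values jointly from 1 to n
    ranks = np.empty(n)
    ranks[np.argsort(values)] = np.arange(1, n + 1)
    # Sum of squared rank sums, each divided by its group size
    total = 0.0
    for g in np.unique(groups):
        r = ranks[groups == g]
        total += r.sum() ** 2 / r.size
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 3))

# Factor-factor correlations: columns are factors, so rowvar=False
corr = np.corrcoef(scores, rowvar=False)
print(corr.round(3))

# H for one factor against a three-level categorical variable
h = kruskal_h([1, 2, 3, 4, 5, 6], ["a", "a", "b", "b", "c", "c"])
print(round(h, 4))
```

Under the null hypothesis, H approximately follows a chi-squared distribution with (number of groups - 1) degrees of freedom, which is how the p-value is obtained.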

## 7. Sample Clustering

Cluster samples based on factor scores:

In [None]:
# Cluster samples on their factor scores (the function name says "cells", but each row is a sample)
clusters = analysis.cluster_cells_by_factors(
    model,
    n_clusters=3,
    factors_to_use=None  # Use all factors
)

# Add clusters to sample metadata
mdata_muvi.obs['factor_clusters'] = clusters
print(f"Created {len(np.unique(clusters))} sample clusters")
print(f"Cluster distribution: {np.bincount(clusters)}")
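
This notebook does not state which algorithm `cluster_cells_by_factors` uses. As one plausible approach, the sketch below runs a minimal k-means on toy factor scores. The `kmeans` helper and its farthest-point initialization are assumptions chosen to keep the example short and deterministic.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=50):
    """Minimal Lloyd's algorithm with farthest-point initialization (hypothetical helper)."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        # Squared distance of every point to its nearest chosen center
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assign each point to the nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers, keeping the old one if a cluster is empty
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy factor scores: three well-separated groups of 20 samples each
rng = np.random.default_rng(0)
offsets = np.array([[0, 0], [10, 0], [0, 10]], dtype=float)
scores = np.vstack([rng.normal(size=(20, 2)) + o for o in offsets])

clusters, _ = kmeans(scores, n_clusters=3)
print(np.bincount(clusters))
```

The labels are non-negative integers, which is what `np.bincount` in the cell above requires.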

## 8. Visualizations

Create plots with plotnine:

In [None]:
# 1. Variance explained plot
p1 = visualization.plot_variance_explained(model, max_factors=10)
print("Variance explained by factors:")
p1.show()
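
One common definition of variance explained per factor is the R² of the rank-one reconstruction from that factor alone. The sketch below computes it for PCA factors on toy data. This is an illustration only, and the quantity that MuVI and `plot_variance_explained` report may be defined differently.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)  # center features so total sum of squares is the variance

# PCA factors via SVD (stand-in for a fitted factor model)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

total = np.sum(X ** 2)
r2 = np.array([
    # R^2 of reconstructing X from factor k alone
    1.0 - np.sum((X - s[k] * np.outer(U[:, k], Vt[k])) ** 2) / total
    for k in range(len(s))
])

print(r2.round(3))
```

For orthogonal factors such as these, the per-factor values add up to the total variance explained. For correlated factors they generally do not.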

In [None]:
# 2. Factor scores colored by cell type
p2 = visualization.plot_factor_scores(
    model,
    factors=[0, 1],
    color_by='cell_type'
)
print("Factor scores by cell type:")
p2.show()

In [None]:
# 3. Factor loadings heatmap
p3 = visualization.plot_factor_loadings(
    model,
    view='view1',
    factors=[0, 1, 2],
    top_genes=5
)
print("Factor loadings for view1:")
p3.show()

In [None]:
# 4. Factor associations heatmap
p4 = visualization.plot_factor_associations(
    model,
    associations_df=associations
)
print("Factor-metadata associations:")
p4.show()

In [None]:
# 5. Factor comparison across cell types
p5 = visualization.plot_factor_comparison(
    model,
    factors=[0, 1, 2],
    group_by='cell_type',
    plot_type='boxplot'
)
print("Factor activity by cell type:")
p5.show()

## 9. Summary

Final analysis summary:

In [None]:
# Summary statistics
print("=== MuVIcell Analysis Summary ===")
print(f"Samples analyzed: {mdata_muvi.n_obs}")
print(f"Total features across views: {sum(v.n_vars for v in mdata_muvi.mod.values())}")
print(f"MuVI factors: {mdata_muvi.obsm['X_muvi'].shape[1]}")
print(f"Sample clusters: {len(np.unique(clusters))}")

# Show factor-metadata associations with p_value < 0.05
significant_assoc = associations[associations['p_value'] < 0.05]
print(f"\nSignificant factor-metadata associations: {len(significant_assoc)}")
if len(significant_assoc) > 0:
    print(significant_assoc[['factor', 'metadata', 'p_value', 'effect_size']].head())

print("\n✅ MuVIcell analysis completed successfully!")
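
Testing every factor against every metadata column produces many p-values, so a multiple-testing correction may be worth applying before thresholding at 0.05. The notebook does not say whether `p_value` is already adjusted. The `bh_adjust` helper below is a hypothetical sketch of the Benjamini-Hochberg procedure that could be applied to that column.

```python
import numpy as np


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.5]
adjusted = bh_adjust(p_values)
print(adjusted.round(4))
```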