# MuVIcell Basic Usage Example

This notebook demonstrates the basic functionality of the MuVIcell package for analyzing multicellular coordination and cell-type-specific features.

## Installation

First, make sure you have installed MuVIcell with the notebook dependencies:

```bash
# With uv (recommended); the quotes keep shells such as zsh from expanding the brackets
uv add "muvicell[notebooks]"

# Or with pip
pip install "muvicell[notebooks]"
```

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Import MuVIcell components
from muvicell import MuVIcellAnalyzer, load_data, visualize_results
from muvicell.utils import generate_sample_data, validate_data, calculate_correlation_matrix

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("MuVIcell imported successfully!")

## 1. Generate Sample Data

Let's start by generating some sample data to work with:

In [None]:
# Generate sample data
data = generate_sample_data(
    n_samples=200,
    n_features=8,
    n_cell_types=4,
    random_state=42
)

print(f"Generated data shape: {data.shape}")
print(f"Columns: {list(data.columns)}")
print(f"Cell types: {data['cell_type'].unique()}")

# Display first few rows
data.head()
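
To try the rest of the workflow without the package, a stand-in table can be built directly with NumPy and pandas. The sketch below assumes the layout implied by the cells in this notebook: a `cell_type` column plus numeric columns named `feature_1`, `feature_2`, and so on. The helper `make_toy_data` and the `type_0`, `type_1`, ... labels are hypothetical and are not MuVIcell's implementation.

```python
import numpy as np
import pandas as pd


def make_toy_data(n_samples=200, n_features=8, n_cell_types=4, random_state=42):
    """Hypothetical stand-in for a sample-data generator (not MuVIcell's code)."""
    rng = np.random.default_rng(random_state)
    # Assign each row a cell-type index, then give every cell type its own
    # feature means so that the groups are distinguishable.
    type_idx = rng.integers(0, n_cell_types, size=n_samples)
    type_means = rng.normal(0.0, 2.0, size=(n_cell_types, n_features))
    noise = rng.normal(0.0, 1.0, size=(n_samples, n_features))
    values = type_means[type_idx] + noise

    columns = [f"feature_{i + 1}" for i in range(n_features)]
    df = pd.DataFrame(values, columns=columns)
    df.insert(0, "cell_type", [f"type_{i}" for i in type_idx])
    return df


toy = make_toy_data()
print(toy.shape)  # (200, 9): eight features plus the cell_type column
```

Fixing `random_state` makes the table reproducible, which is useful when comparing preprocessing settings later on.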

## 2. Data Validation

Before analysis, let's validate our data:

In [None]:
# Validate the data
validation_report = validate_data(data, required_columns=['cell_type'])

print("Validation Report:")
print(f"  Shape: {validation_report['shape']}")
print(f"  Duplicate rows: {validation_report['duplicate_rows']}")
print(f"  Validation passed: {validation_report['validation_passed']}")

if validation_report['issues']:
    print("  Issues found:")
    for issue in validation_report['issues']:
        print(f"    - {issue}")
else:
    print("  No issues found!")
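
As a rough illustration of what such a report could contain, the sketch below builds a dictionary with the same keys that are printed above (`shape`, `duplicate_rows`, `validation_passed`, `issues`) using plain pandas. The helper `basic_validation` is hypothetical, and the specific checks it runs (missing columns, missing values, duplicate rows) are assumptions rather than a description of what `validate_data` does.

```python
import pandas as pd


def basic_validation(df, required_columns=()):
    """Hypothetical validation helper; the checks are illustrative assumptions."""
    issues = []

    missing_cols = [col for col in required_columns if col not in df.columns]
    if missing_cols:
        issues.append(f"Missing required columns: {missing_cols}")

    n_missing_values = int(df.isna().sum().sum())
    if n_missing_values:
        issues.append(f"{n_missing_values} missing values")

    duplicate_rows = int(df.duplicated().sum())
    if duplicate_rows:
        issues.append(f"{duplicate_rows} duplicate rows")

    return {
        "shape": df.shape,
        "duplicate_rows": duplicate_rows,
        "issues": issues,
        "validation_passed": not issues,
    }


example = pd.DataFrame(
    {"cell_type": ["A", "A", "B"], "feature_1": [1.0, 1.0, None]}
)
report = basic_validation(example, required_columns=["cell_type", "sample_id"])
print(report["validation_passed"])
for issue in report["issues"]:
    print(" -", issue)
```

Here the toy table fails on all three checks: `sample_id` is absent, one value is missing, and one row is duplicated.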

## 3. Initialize MuVIcell Analyzer

Now let's create a MuVIcell analyzer instance and load our data:

In [None]:
# Initialize the analyzer
analyzer = MuVIcellAnalyzer(data=data)

# Get summary
summary = analyzer.get_summary()
print("Analyzer Summary:")
for key, value in summary.items():
    print(f"  {key}: {value}")

## 4. Data Preprocessing

Let's preprocess the data by normalizing the features and removing outliers:

In [None]:
# Preprocess the data
processed_data = analyzer.preprocess_data(
    normalize=True,
    remove_outliers=True,
    outlier_threshold=3.0
)

print(f"Original data shape: {data.shape}")
print(f"Processed data shape: {processed_data.shape}")

# Compare distributions before and after preprocessing
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

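# Original data: overlay the three features on one axis.
# (DataFrame.hist clears the figure when several columns share a single ax,
# so DataFrame.plot.hist is used instead.)
data[['feature_1', 'feature_2', 'feature_3']].plot.hist(bins=20, ax=axes[0], alpha=0.7)
axes[0].set_title('Original Data Distribution')

# Processed data
processed_data[['feature_1', 'feature_2', 'feature_3']].plot.hist(bins=20, ax=axes[1], alpha=0.7)
axes[1].set_title('Normalized Data Distribution')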

plt.tight_layout()
plt.show()
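
The notebook does not spell out how `normalize=True` and `outlier_threshold=3.0` are applied internally. One common reading is z-score standardization followed by dropping rows whose absolute z-score exceeds the threshold. The sketch below implements that reading in plain pandas; the helper `zscore_and_filter` is hypothetical, and both the choice of z-scores and the order of the two steps are assumptions.

```python
import numpy as np
import pandas as pd


def zscore_and_filter(df, threshold=3.0):
    """Hypothetical sketch: z-score numeric columns, then drop outlier rows."""
    numeric_cols = df.select_dtypes(include="number").columns
    out = df.copy()
    # Standardize each numeric column to mean 0 and standard deviation 1.
    centered = df[numeric_cols] - df[numeric_cols].mean()
    out[numeric_cols] = centered / df[numeric_cols].std(ddof=0)
    # Keep only rows where every z-score lies within the threshold.
    keep = (out[numeric_cols].abs() <= threshold).all(axis=1)
    return out.loc[keep].reset_index(drop=True)


rng = np.random.default_rng(0)
demo = pd.DataFrame({"cell_type": ["A"] * 50, "feature_1": rng.normal(size=50)})
demo.loc[0, "feature_1"] = 100.0  # plant one obvious outlier

cleaned = zscore_and_filter(demo)
print(len(demo), len(cleaned))  # 50 49
```

Because rows are removed after standardization, the remaining values no longer have exactly zero mean and unit variance. If that matters, the statistics can be recomputed after filtering.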

## 5. Cell-Type Feature Analysis

Now let's analyze cell-type-specific features:

In [None]:
# Define feature columns to analyze
feature_cols = ['feature_1', 'feature_2', 'feature_3', 'feature_4']

# Compute the mean of each feature within each cell type
mean_results = analyzer.analyze_cell_features(
    cell_type_col='cell_type',
    feature_cols=feature_cols,
    method='mean'
)

print("Mean feature values by cell type:")
for cell_type, features in mean_results.items():
    print(f"\n{cell_type}:")
    for feature, value in features.items():
        print(f"  {feature}: {value:.3f}")
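
The nested structure iterated above, `{cell_type: {feature: value}}`, is the same shape that a pandas group-by produces when converted to a dictionary. The following sketch shows that equivalent computation on a small hand-made table. It is an illustration in plain pandas, not the code behind `analyze_cell_features`.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "cell_type": ["A", "A", "B", "B"],
        "feature_1": [1.0, 3.0, 10.0, 20.0],
        "feature_2": [0.5, 1.5, 2.0, 4.0],
    }
)

# Per-cell-type means, converted to {cell_type: {feature: value}}.
means = toy.groupby("cell_type")[["feature_1", "feature_2"]].mean()
nested = means.to_dict(orient="index")

for cell_type, features in nested.items():
    print(f"\n{cell_type}:")
    for feature, value in features.items():
        print(f"  {feature}: {value:.3f}")
```

Swapping `.mean()` for `.median()` or `.std()` gives other per-group summaries with the same structure.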

In [None]:
# Create visualization of cell-type features
fig = visualize_results(
    data=processed_data,
    x_col='cell_type',
    y_col='feature_1',
    plot_type='box',
    figsize=(10, 6)
)
plt.title('Feature 1 Distribution by Cell Type')
plt.show()

## 6. Correlation Analysis

Let's examine correlations between features:

In [None]:
# Calculate correlation matrix
from muvicell.utils import plot_correlation_heatmap

corr_matrix = calculate_correlation_matrix(processed_data)

# Plot correlation heatmap
fig = plot_correlation_heatmap(corr_matrix, figsize=(10, 8))
plt.show()

print("\nCorrelation Matrix:")
print(corr_matrix.round(3))
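
For reference, a feature correlation matrix can also be computed with pandas alone. The sketch below assumes Pearson correlation over the numeric columns only and leaves out the `cell_type` labels. Whether `calculate_correlation_matrix` makes the same choices is not stated in this notebook.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "cell_type": ["A", "A", "B", "B"],
        "feature_1": [1.0, 2.0, 3.0, 4.0],
        "feature_2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with feature_1
        "feature_3": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with feature_1
    }
)

# Drop non-numeric columns such as cell_type before correlating.
corr = toy.select_dtypes(include="number").corr(method="pearson")
print(corr.round(3))
```

Passing `method="spearman"` instead gives rank correlations, which are less sensitive to outliers.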

## 7. Multi-Feature Visualization

Let's create some more complex visualizations:

In [None]:
# Scatter plot with cell type coloring
fig = visualize_results(
    data=processed_data,
    x_col='feature_1',
    y_col='feature_2',
    hue_col='cell_type',
    plot_type='scatter',
    figsize=(10, 8)
)
plt.title('Feature 1 vs Feature 2 by Cell Type')
plt.show()

In [None]:
# Create a comprehensive feature comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
axes = axes.ravel()

for i, feature in enumerate(feature_cols):
    sns.violinplot(data=processed_data, x='cell_type', y=feature, ax=axes[i])
    axes[i].set_title(f'{feature.replace("_", " ").title()} by Cell Type')
    axes[i].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

## 8. Summary and Export

Finally, let's print a summary of the analysis and, optionally, export the processed data:

In [None]:
# Get final summary
final_summary = analyzer.get_summary()

print("Final Analysis Summary:")
print("=" * 40)
for key, value in final_summary.items():
    if isinstance(value, list) and len(value) > 5:
        print(f"{key}: {len(value)} items")
    else:
        print(f"{key}: {value}")

# Export results (optional)
# processed_data.to_csv('muvicell_processed_data.csv', index=False)
# print("\nProcessed data exported to 'muvicell_processed_data.csv'")

## Next Steps

This notebook demonstrated the basic functionality of MuVIcell. For more advanced features, see the `advanced_features.ipynb` notebook, which covers:

- Custom analysis pipelines
- Advanced visualization techniques
- Integration with other bioinformatics tools
- Working with real biological datasets

For more information, visit the [MuVIcell GitHub repository](https://github.com/HartmannLab/MuVIcell).