# Comprehensive Spatial Analysis: Comparative & Cell-Cell Communication

This notebook performs:
1. **Comparative Analysis** - Cell type composition, spatial organization, and dynamics across disease stages
2. **Cell-Cell Communication (CCI)** - Ligand-receptor interactions and condition-specific signaling
3. **Deep Research Report** - Publication-quality summary of all findings

## Setup

In [None]:
import os, sys, warnings
warnings.filterwarnings('ignore')

# Working directory is project root
work_dir = "/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx"
os.chdir(work_dir)
sys.path.insert(0, work_dir)

from spatialagent.agent import SpatialAgent, make_llm

## Initialize Agent

In [None]:
# Configure paths
data_path = "./data/"
save_path = "./experiments/comprehensive_cci/"

# Clean up previous run results
import shutil
if os.path.exists(save_path):
    shutil.rmtree(save_path)
    print(f"Cleaned up previous results in {save_path}")
os.makedirs(save_path, exist_ok=True)

# Initialize LLM and Agent
llm = make_llm("claude-sonnet-4-5-20250929")

agent = SpatialAgent(
    llm=llm,
    save_path=save_path,
    data_path=data_path,
    auto_interpret_figures=True,
)

print(f"Agent ready with {len(agent.tool_registry.tools)} tools")
print(f"Skills available: {agent.skill_manager.list_skills()}")

---
## Part 1: Comparative Analysis

Comprehensive comparison of cell types, neighborhoods, and spatial patterns across disease stages.

In [None]:
result = agent.run(
    f"""
    # Comprehensive Comparative Analysis of Mouse Colon Colitis
    
    ## Dataset
    Path: /home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/data/dataset_ccc/adata_test.h5ad
    
    This is a MERFISH spatial transcriptomics dataset of mouse colon tissue during DSS-induced colitis.
    
    ### Key Variables:
    - `Tier1`, `Tier2`, `Tier3`: Cell type annotations at three resolution levels (use Tier2 for most analyses; it sits between the broad Tier1 and the finest Tier3 labels)
    - `Leiden_neigh`: Neighborhood/niche clusters
    - `Sample_type`: Disease stage (Healthy, DSS3, DSS9, DSS21)
    - `sample`: Sample ID (multiple samples per stage)
    - `x`, `y`: Spatial coordinates
    
    ### Disease Progression:
    - Healthy: Normal colon
    - DSS3: Early inflammation (Day 3)
    - DSS9: Peak inflammation (Day 9) 
    - DSS21: Recovery phase (Day 21)
    
    ## Task: Comprehensive Comparative Analysis
    
    Perform thorough comparative analysis focusing on how the cellular microenvironment changes during colitis progression.
    
    ### Required Analyses:
    
    **1. Dataset Overview**
    - Load data and summarize basic statistics
    - Show cell counts per condition and per cell type
    - Create overview visualizations
    
    **2. Cell Type Composition Analysis**
    - Calculate cell type proportions for each disease stage
    - Create stacked bar plots comparing composition across stages
    - Identify cell types with largest changes (especially at DSS9)
    - Statistical comparison of proportions
    
    **3. Spatial Distribution Analysis**
    - Create spatial scatter plots colored by cell type for each stage
    - Compare spatial organization between Healthy vs DSS9 vs DSS21
    - Quantify spatial clustering/dispersion of key cell types
    
    **4. Neighborhood/Niche Analysis**
    - Analyze cell type composition within each neighborhood (Leiden_neigh)
    - Create heatmaps of neighborhood x cell type composition per stage
    - Identify neighborhoods most affected by inflammation
    - Track neighborhood composition changes across disease progression
    
    **5. Differential Analysis**
    - Compare Healthy vs DSS9 (peak inflammation)
    - Compare DSS9 vs DSS21 (recovery)
    - Identify signature changes at each transition
    
    **6. Summary Statistics**
    - Create comprehensive tables of all comparisons
    - Save all intermediate results to {save_path}
    
    Generate all figures with clear titles and save them. Be thorough and generate many visualizations.
    """,
    config={"thread_id": "comprehensive_analysis"}
)
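The composition step (item 2 above) comes down to a stage x cell-type contingency table normalized by row. This is a minimal pandas sketch, assuming the labels live in `adata.obs` under `Sample_type` and `Tier2`; the toy labels are invented, and the pseudocounted log2 fold change is one simple way to rank the largest shifts, not necessarily what the agent will do:

```python
import numpy as np
import pandas as pd


def stage_composition(obs: pd.DataFrame, stage_key: str = "Sample_type",
                      cell_type_key: str = "Tier2") -> pd.DataFrame:
    """Stage x cell-type table of proportions; each row sums to 1."""
    counts = pd.crosstab(obs[stage_key], obs[cell_type_key])
    return counts.div(counts.sum(axis=1), axis=0)


def log2_fold_change(props: pd.DataFrame, case: str, control: str,
                     pseudocount: float = 1e-3) -> pd.Series:
    """log2((p_case + eps) / (p_control + eps)) per cell type, largest |change| first."""
    lfc = np.log2((props.loc[case] + pseudocount) / (props.loc[control] + pseudocount))
    return lfc.reindex(lfc.abs().sort_values(ascending=False).index)


# Toy stand-in for adata.obs (invented labels, for illustration only)
obs = pd.DataFrame({
    "Sample_type": ["Healthy"] * 6 + ["DSS9"] * 6,
    "Tier2": ["Epithelial"] * 4 + ["Immune"] * 2 + ["Epithelial"] * 2 + ["Immune"] * 4,
})
props = stage_composition(obs)
lfc = log2_fold_change(props, case="DSS9", control="Healthy")
print(props.round(3))
print(lfc.round(3))
```

For the statistical comparison, computing proportions per `sample` rather than pooling all cells of a stage is usually safer, because cells from the same tissue section are not independent replicates.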

In [None]:
# Continue with more detailed comparative analysis
result = agent.run(
    """
    Continue with more detailed comparative analysis:
    
    **7. Immune Cell Focus**
    - Analyze immune cell subtypes specifically
    - Track immune infiltration patterns across stages
    - Visualize immune cell spatial distribution at DSS9 vs Healthy
    
    **8. Epithelial Cell Analysis**
    - Analyze epithelial cell changes (damage/recovery)
    - Compare epithelial proportions and spatial organization
    - Identify epithelial-associated neighborhoods
    
    **9. Stromal/Fibroblast Analysis**
    - Track fibroblast expansion during inflammation
    - Analyze fibroblast spatial patterns
    
    **10. Cross-Stage Correlation**
    - Correlate cell type proportions across stages
    - Identify coordinated changes
    - Create correlation heatmaps
    
    Generate publication-quality figures for each analysis.
    """,
    config={"thread_id": "comprehensive_analysis"}
)
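For questions about spatial clustering or dispersion of a cell type (immune infiltration, fibroblast patterns), one classical summary is the Clark-Evans aggregation index: the mean observed nearest-neighbour distance divided by the value expected under complete spatial randomness. This is a swapped-in technique, not necessarily what the agent will choose. The sketch assumes coordinates come from `adata.obs[['x', 'y']]` filtered to one cell type and one sample, and that the section area is known; the bounding-box area used below is an assumption, and no edge correction is applied:

```python
import numpy as np


def clark_evans_ratio(xy: np.ndarray, area: float) -> float:
    """Clark-Evans aggregation index R for a 2-D point pattern.

    R = mean observed nearest-neighbour distance / (0.5 / sqrt(n / area)).
    R < 1 suggests clustering, R ~ 1 spatial randomness, R > 1 regular spacing.
    No edge correction is applied, so R is slightly inflated near borders.
    """
    n = len(xy)
    if n < 2:
        raise ValueError("need at least two points")
    # Brute-force pairwise distances (O(n^2) memory): fine for a sketch; a KD-tree
    # such as scipy.spatial.cKDTree scales better for whole MERFISH sections.
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's zero distance to itself
    observed = dist.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)
    return float(observed / expected)


rng = np.random.default_rng(0)
area = 100.0 * 100.0                                 # assumed 100 x 100 section
uniform = rng.uniform(0, 100, size=(400, 2))         # random scatter
clumped = rng.normal(50, 2, size=(400, 2))           # one tight cluster
r_uniform = clark_evans_ratio(uniform, area)
r_clumped = clark_evans_ratio(clumped, area)
print(round(r_uniform, 2), round(r_clumped, 2))
```

Comparing R for the same cell type in Healthy and DSS9 sections gives a single number per sample that can be tested across stages.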

---
## Part 2: Cell-Cell Communication Analysis

Comprehensive ligand-receptor interaction analysis across disease stages.

In [None]:
result = agent.run(
    f"""
    # Cell-Cell Communication (CCI) Analysis
    
    Now perform comprehensive cell-cell communication analysis to understand how cellular signaling changes during colitis.
    
    ## CCI Analysis Plan:
    
    **1. Run LIANA for Ligand-Receptor Analysis**
    Use the `infer_cell_cell_interactions` tool with:
    - adata_path: /home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/data/dataset_ccc/adata_test.h5ad
    - batch_key: 'sample'
    - condition_key: 'Sample_type'
    - cell_type_key: 'Tier2' (or Tier1 for broader categories)
    - organism: 'mouse'
    - save_path: {save_path}
    
    This will:
    - Run LIANA per sample to find ligand-receptor pairs
    - Build interaction tensor
    - Perform tensor factorization to find context-specific patterns
    
    **2. Analyze CCI Results**
    After running the tool, analyze the results:
    - Load the factor.pkl and liana_results.csv
    - Identify top ligand-receptor pairs
    - Compare interactions across conditions
    
    Start by running the infer_cell_cell_interactions tool.
    """,
    config={"thread_id": "comprehensive_analysis"}
)
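The `infer_cell_cell_interactions` tool is described as running LIANA per sample and then building an interaction tensor. The sketch below illustrates what that stacking means: long-format per-sample rows become a 4-D array with axes (context, LR pair, sender, receiver). It is not the tool's internal code. The columns `source`, `target`, `ligand_complex` and `receptor_complex` follow LIANA's output naming; the `score` column is hypothetical, since LIANA's `magnitude_rank` is lower-is-stronger and would first need converting to a higher-is-stronger score (for example `1 - magnitude_rank`):

```python
import numpy as np
import pandas as pd


def build_interaction_tensor(df: pd.DataFrame, score_col: str = "score",
                             fill: float = 0.0):
    """Stack long-format per-sample LR scores into a 4-D array.

    Axes: (context, LR pair, sender, receiver). Combinations that never appear
    in `df` are set to `fill`. Returns the array plus the labels of each axis.
    """
    lr = df["ligand_complex"] + "^" + df["receptor_complex"]
    contexts = sorted(df["sample"].unique())
    lrs = sorted(lr.unique())
    cells = sorted(set(df["source"]) | set(df["target"]))  # shared sender/receiver axis

    # Integer position of every row along each axis
    ctx_pos = df["sample"].map({c: i for i, c in enumerate(contexts)}).to_numpy()
    lr_pos = lr.map({p: i for i, p in enumerate(lrs)}).to_numpy()
    cell_index = {c: i for i, c in enumerate(cells)}
    snd_pos = df["source"].map(cell_index).to_numpy()
    rcv_pos = df["target"].map(cell_index).to_numpy()

    tensor = np.full((len(contexts), len(lrs), len(cells), len(cells)), fill)
    tensor[ctx_pos, lr_pos, snd_pos, rcv_pos] = df[score_col].to_numpy()
    return tensor, contexts, lrs, cells


# Toy per-sample results (invented values)
liana_like = pd.DataFrame({
    "sample": ["s1", "s1", "s2"],
    "source": ["Fibroblast", "Immune", "Fibroblast"],
    "target": ["Epithelial", "Epithelial", "Immune"],
    "ligand_complex": ["Tnf", "Il1b", "Tnf"],
    "receptor_complex": ["Tnfrsf1a", "Il1r1", "Tnfrsf1a"],
    "score": [0.9, 0.7, 0.4],
})
tensor, contexts, lrs, cells = build_interaction_tensor(liana_like)
print(tensor.shape, lrs, cells)
```

A zero fill treats "not detected" and "not measured in this sample" the same way, which is one of the choices a real tensor-building step has to make explicitly.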

In [None]:
# Analyze CCI results in detail
result = agent.run(
    f"""
    # Detailed CCI Analysis
    
    Now analyze the cell-cell interaction results in detail:
    
    **3. Top Interactions Analysis**
    - Load liana_results.csv from {save_path}/cci_analysis/
    - Identify the top 50 ligand-receptor pairs by magnitude_rank (lower rank = stronger interaction)
    - Group by sender-receiver cell type pairs
    - Create visualizations of top interactions
    
    **4. Condition-Specific Interactions**
    - Compare interactions between Healthy vs DSS9
    - Identify interactions that are:
      * Gained at DSS9 (inflammation-specific)
      * Lost at DSS9 (suppressed during inflammation)
      * Recovered at DSS21
    - Create heatmaps showing interaction strength per condition
    
    **5. Cell Type Communication Patterns**
    - Which cell types are the main "senders" at each stage?
    - Which cell types are the main "receivers"?
    - Create network diagrams or chord plots of communication
    
    **6. Immune-Epithelial Crosstalk**
    - Focus on immune cell → epithelial interactions
    - Focus on fibroblast → immune interactions
    - Identify inflammatory signaling pathways
    
    Generate comprehensive visualizations for each analysis.
    """,
    config={"thread_id": "comprehensive_analysis"}
)
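The gained/lost/recovered categories in item 4 need an explicit rule for when an interaction counts as "present". This is one hedged way to encode it. It assumes a `condition` column was added to the LIANA table by mapping each `sample` to its `Sample_type`, and it treats an interaction as present in a condition if its best (lowest) `magnitude_rank` across that condition's samples is at or below a cutoff; the 0.05 cutoff is arbitrary, and the toy rows are invented:

```python
import pandas as pd

KEY = ["source", "target", "ligand_complex", "receptor_complex"]


def classify_transitions(df: pd.DataFrame, threshold: float = 0.05,
                         rank_col: str = "magnitude_rank") -> pd.Series:
    """Label each interaction by how its presence changes Healthy -> DSS9 -> DSS21."""
    # Best (lowest) rank of each interaction within each condition
    best = (df.groupby(KEY + ["condition"])[rank_col].min()
              .unstack("condition")
              .reindex(columns=["Healthy", "DSS9", "DSS21"]))
    present = best.le(threshold)  # missing (NaN) compares False, i.e. absent
    healthy, dss9, dss21 = present["Healthy"], present["DSS9"], present["DSS21"]

    labels = pd.Series("other", index=present.index)
    labels.loc[~healthy & dss9] = "gained_at_DSS9"
    labels.loc[healthy & ~dss9 & ~dss21] = "lost_at_DSS9"
    labels.loc[healthy & ~dss9 & dss21] = "lost_then_recovered_at_DSS21"
    return labels


toy = pd.DataFrame({
    "condition": ["Healthy", "DSS9", "DSS21", "DSS9", "Healthy", "DSS21"],
    "source": ["Fibroblast"] * 3 + ["Immune"] + ["Epithelial"] * 2,
    "target": ["Epithelial"] * 6,
    "ligand_complex": ["Wnt2b"] * 3 + ["Il1b"] + ["Egf"] * 2,
    "receptor_complex": ["Fzd7"] * 3 + ["Il1r1"] + ["Egfr"] * 2,
    "magnitude_rank": [0.01, 0.5, 0.02, 0.001, 0.01, 0.3],
})
labels = classify_transitions(toy)
print(labels)
```

Taking the minimum rank across samples is lenient; requiring the interaction to pass the cutoff in most samples of a condition is a stricter alternative.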

In [None]:
# Tensor factorization analysis
result = agent.run(
    f"""
    # Tensor Factorization Analysis
    
    Analyze the tensor factorization results from the CCI analysis:
    
    **7. Load and Analyze Tensor Factors**
    - Load factor.pkl from {save_path}/cci_analysis/
    - The tensor object contains factorization results with factors for:
      * Contexts (samples/conditions)
      * Ligand-Receptor Pairs
      * Sender Cells
      * Receiver Cells
    
    **8. Condition Factor Analysis**
    - Extract condition loadings from the tensor
    - Identify which factors are associated with each disease stage
    - Visualize condition-factor associations
    
    **9. LR Pair Factor Analysis**
    - For each factor, identify top ligand-receptor pairs
    - Characterize the biological meaning of each factor
    - Create factor-specific LR heatmaps
    
    **10. Cell Type Factor Analysis**
    - Identify which cell types load on each factor (as senders and receivers)
    - Visualize sender-receiver patterns per factor
    
    Use the tensor.factors dictionary and tensor.get_top_factor_elements() if available.
    """,
    config={"thread_id": "comprehensive_analysis"}
)
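If `factor.pkl` holds a cell2cell interaction tensor, `tensor.factors` maps each dimension name to a DataFrame of loadings with one column per factor. The helpers below are hypothetical, and the dimension keys (`'Contexts'`, `'Ligand-Receptor Pairs'`) and column names (`'Factor 1'`, ...) are assumptions based on cell2cell defaults; check `tensor.factors.keys()` on the real object. The loadings and the sample-to-stage mapping are invented:

```python
import pandas as pd


def top_elements(factors: dict, order: str, factor: str, n: int = 10) -> pd.Series:
    """Top-n loadings of one factor along one tensor dimension, highest first."""
    return factors[order][factor].nlargest(n)


def loadings_by_stage(factors: dict, context_to_stage: dict,
                      order: str = "Contexts") -> pd.DataFrame:
    """Mean context loading of every factor within each disease stage."""
    ctx = factors[order]
    return ctx.groupby(ctx.index.map(context_to_stage)).mean()


# Invented loadings shaped like cell2cell `tensor.factors` entries
factors = {
    "Contexts": pd.DataFrame(
        {"Factor 1": [0.9, 0.8, 0.1, 0.2], "Factor 2": [0.1, 0.2, 0.9, 0.3]},
        index=["Healthy_1", "Healthy_2", "DSS9_1", "DSS21_1"]),
    "Ligand-Receptor Pairs": pd.DataFrame(
        {"Factor 1": [0.7, 0.1, 0.2], "Factor 2": [0.05, 0.9, 0.4]},
        index=["Egf^Egfr", "Il1b^Il1r1", "Tnf^Tnfrsf1a"]),
}
# Hypothetical sample -> stage mapping; in practice derive it from adata.obs
stage_of = {"Healthy_1": "Healthy", "Healthy_2": "Healthy",
            "DSS9_1": "DSS9", "DSS21_1": "DSS21"}

stage_loadings = loadings_by_stage(factors, stage_of)
print(stage_loadings)
print(stage_loadings.idxmax(axis=1))  # factor with the highest mean loading per stage
print(top_elements(factors, "Ligand-Receptor Pairs", "Factor 2", n=2))
```

The same `top_elements` call on the sender and receiver dimensions gives the cell types driving each factor.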

In [None]:
# More CCI visualizations
result = agent.run(
    f"""
    # Additional CCI Visualizations
    
    Create more comprehensive visualizations of cell-cell communication:
    
    **11. Interaction Strength Heatmaps**
    - Create cell type x cell type interaction strength matrices
    - One heatmap per condition (Healthy, DSS3, DSS9, DSS21)
    - Show total interaction counts between cell type pairs
    
    **12. Top LR Pairs Visualization**
    - Bar plots of top 20 LR pairs overall
    - Dot plots showing LR pair expression across conditions
    - Highlight inflammation-associated pairs
    
    **13. Pathway-Level Analysis**
    - Group LR pairs by biological function/pathway if possible
    - Identify enriched signaling pathways at DSS9
    - Compare pathway activity across stages
    
    **14. Spatial Context of Interactions**
    - Map top interactions to spatial neighborhoods
    - Which neighborhoods have the most active signaling?
    - Visualize interaction hotspots spatially
    
    Save all figures with descriptive names.
    """,
    config={"thread_id": "comprehensive_analysis"}
)
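The sender/receiver questions and the per-condition interaction matrices both reduce to counting interactions per sender-receiver pair. This is a minimal pandas sketch. It assumes a `condition` column mapped from `sample` to `Sample_type` in the LIANA table and counts an interaction when its `magnitude_rank` is at or below an arbitrary 0.05 cutoff. Reindexing every matrix to one shared cell-type list keeps the per-condition heatmaps directly comparable; the toy rows are invented:

```python
import pandas as pd


def interaction_count_matrix(df: pd.DataFrame, cell_types: list, threshold: float = 0.05,
                             rank_col: str = "magnitude_rank") -> pd.DataFrame:
    """Sender x receiver counts of LR interactions whose rank passes `threshold`."""
    hits = df[df[rank_col] <= threshold]
    counts = pd.crosstab(hits["source"], hits["target"])
    # Same rows/columns for every condition so heatmaps line up
    return counts.reindex(index=cell_types, columns=cell_types, fill_value=0)


def sender_receiver_totals(mat: pd.DataFrame) -> pd.DataFrame:
    """Outgoing (row sums) and incoming (column sums) interaction counts per cell type."""
    return pd.DataFrame({"outgoing": mat.sum(axis=1), "incoming": mat.sum(axis=0)})


toy = pd.DataFrame({
    "condition": ["Healthy", "Healthy", "DSS9", "DSS9", "DSS9"],
    "source": ["Fibroblast", "Epithelial", "Immune", "Immune", "Fibroblast"],
    "target": ["Epithelial", "Epithelial", "Epithelial", "Fibroblast", "Immune"],
    "magnitude_rank": [0.01, 0.2, 0.001, 0.02, 0.04],
})
cell_types = ["Epithelial", "Fibroblast", "Immune"]
matrices = {cond: interaction_count_matrix(sub, cell_types)
            for cond, sub in toy.groupby("condition")}
totals_dss9 = sender_receiver_totals(matrices["DSS9"])
print(matrices["DSS9"])
print(totals_dss9)
```

Each matrix can be passed directly to a heatmap function such as `seaborn.heatmap`, and the `outgoing`/`incoming` columns rank the main senders and receivers per stage.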

---
## Part 3: Summary Statistics and Tables

In [None]:
result = agent.run(
    f"""
    # Generate Summary Statistics
    
    Create comprehensive summary tables and statistics:
    
    **1. Cell Type Summary**
    Use summarize_celltypes tool with:
    - adata_path: /home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/data/dataset_ccc/adata_test.h5ad
    - cell_type_key: 'Tier2'
    - save_path: {save_path}
    
    **2. Condition Summary**
    Use summarize_conditions tool with:
    - adata_path: /home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/data/dataset_ccc/adata_test.h5ad
    - condition_key: 'Sample_type'
    - cell_type_key: 'Tier2'
    - save_path: {save_path}
    
    **3. Tissue Region Summary**
    Use summarize_tissue_regions tool with:
    - adata_path: /home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/data/dataset_ccc/adata_test.h5ad
    - region_key: 'Leiden_neigh'
    - cell_type_key: 'Tier2'
    - save_path: {save_path}
    
    Run all three summarization tools.
    """,
    config={"thread_id": "comprehensive_analysis"}
)

---
## Part 4: Deep Research Report Generation

In [None]:
result = agent.run(
    f"""
    # Generate Deep Research Report
    
    Now generate a comprehensive publication-quality research report using the generate_deep_research_report tool.
    
    Use these parameters:
    - user_query: "How does the cellular microenvironment and cell-cell communication change during DSS-induced colitis progression in mouse colon? Focus on comparing healthy tissue, peak inflammation (DSS9), and recovery (DSS21) stages."
    - data_info: "MERFISH spatial transcriptomics of mouse colon tissue. 50,000 cells across 20 samples from 4 disease stages: Healthy (Day 0), DSS3 (Day 3 - early inflammation), DSS9 (Day 9 - peak inflammation), DSS21 (Day 21 - recovery). Cell types annotated at 3 resolution levels (Tier1-3). Spatial neighborhoods identified via Leiden clustering."
    - save_path: {save_path}
    
    This tool will:
    1. Auto-discover all figures we generated
    2. Use vision LLM to interpret each figure
    3. Read all CSV data tables
    4. Generate a comprehensive markdown report with:
       - Executive summary
       - Introduction and background
       - Methods documentation
       - Results with figure references
       - Discussion and interpretation
       - Conclusions
    
    Run the generate_deep_research_report tool now.
    """,
    config={"thread_id": "comprehensive_analysis"}
)

---
## Part 5: Key Findings Summary

In [None]:
result = agent.run(
    """
    # Final Summary
    
    Provide a concise summary of all key findings from our comprehensive analysis:
    
    1. **Cell Composition Changes**: What are the main cell type changes during colitis?
    
    2. **Spatial Reorganization**: How does spatial organization change at DSS9?
    
    3. **Neighborhood Dynamics**: Which neighborhoods are most affected?
    
    4. **Key Cell-Cell Interactions**: 
       - Top 5 interactions gained during inflammation
       - Top 5 interactions lost during inflammation
       - Key immune-epithelial crosstalk changes
    
    5. **Recovery Patterns**: What recovers by DSS21 and what doesn't?
    
    6. **Biological Insights**: What do these findings tell us about colitis pathophysiology?
    
    Format this as a clear bullet-point summary that could be used in a paper abstract.
    """,
    config={"thread_id": "comprehensive_analysis"}
)

In [None]:
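# List all generated outputs as an indented tree (os is imported in the setup cell)
print("Generated outputs:")
print("=" * 50)
for root, dirs, files in os.walk(save_path):
    dirs.sort()  # walk subfolders in a stable order
    # Depth relative to save_path (0 for save_path itself); relpath stays correct
    # even though save_path ends with a trailing slash
    rel = os.path.relpath(root, save_path)
    level = 0 if rel == "." else rel.count(os.sep) + 1
    indent = " " * 2 * level
    # normpath strips the trailing slash so basename is not empty for the top folder
    print(f"{indent}{os.path.basename(os.path.normpath(root))}/")
    subindent = " " * 2 * (level + 1)
    for file in sorted(files):
        size = os.path.getsize(os.path.join(root, file))
        print(f"{subindent}{file} ({size / 1024:.1f} KB)")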

---
## View Generated Report

In [None]:
# Display the generated report
report_path = os.path.join(save_path, "deep_research_report.md")
if os.path.exists(report_path):
    with open(report_path, 'r') as f:
        report_content = f.read()
    from IPython.display import Markdown, display
    display(Markdown(report_content[:20000]))  # Show first 20k chars
else:
    print(f"Report not found at {report_path}")