# ND2 Image Analysis Pipeline - Example Usage

This notebook demonstrates how to use the ND2 Image Analysis Pipeline for processing multi-channel microscopy images.

## Features Demonstrated:
- Loading and configuring the pipeline
- Processing ND2 files with custom settings
- Creating visualizations
- Analyzing results
- Exporting reports

## 1. Setup and Imports

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import pipeline components
from data_models import GroupConfig, VisualizationConfig
from processing_pipeline import ND2Pipeline, visualize_single_file
from visualization import ND2Visualizer
from config import DEFAULT_GROUPS

# Set up plotting
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

print("ND2 Analysis Pipeline - Example Usage")
print("=====================================")

## 2. Configure Your Analysis

In [None]:
# Define your treatment groups and mouse IDs
my_groups = {
    "Control": ["N00"],
    "Treatment_A": ["X12", "W97", "W88", "X85"],
    "Treatment_B": ["X84", "X3", "W95", "W87"],
    "Treatment_C": ["X83", "W108", "W102", "W82"],
    "High_Dose": ["X82", "W105", "W81", "W80"]
}

# Custom thresholds (optional)
my_thresholds = {
    '3d': {
        'channel_1': 2500.0,  # Green channel
        'channel_2': 2500.0,  # Red channel
        'channel_3': 300.0    # Blue channel
    },
    '2d': {
        'channel_1': 800.0,
        'channel_2': 600.0,
        'channel_3': 200.0
    }
}

# Create configuration
config = GroupConfig(groups=my_groups, thresholds=my_thresholds)

print(f"Configuration created with {len(my_groups)} treatment groups:")
for group, mice in my_groups.items():
    print(f"  {group}: {len(mice)} mice")

# Save configuration for future use
config.to_json("my_config.json")
print("\nConfiguration saved to my_config.json")
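
Before running the pipeline, it can help to confirm that no mouse ID is assigned to more than one group, since a duplicated ID would make group membership ambiguous. The sketch below is a standalone check on a plain dictionary shaped like `my_groups`; the helper name `find_duplicate_mouse_ids` is hypothetical and not part of the pipeline, and whether `GroupConfig` already performs such validation is not known here.

```python
from collections import defaultdict


def find_duplicate_mouse_ids(groups):
    """Return {mouse_id: [group, ...]} for every ID listed more than once."""
    seen = defaultdict(list)
    for group, mice in groups.items():
        for mouse_id in mice:
            seen[mouse_id].append(group)
    return {mouse_id: names for mouse_id, names in seen.items() if len(names) > 1}


# Illustrative groups (not real data): "X12" is deliberately listed twice
example_groups = {
    "Control": ["N00"],
    "Treatment_A": ["X12", "W97"],
    "Treatment_B": ["X12", "X84"],
}

duplicates = find_duplicate_mouse_ids(example_groups)
if duplicates:
    for mouse_id, names in duplicates.items():
        print(f"Mouse {mouse_id} appears in: {', '.join(names)}")
else:
    print("No duplicate mouse IDs found")
```

Calling `find_duplicate_mouse_ids(my_groups)` on the configuration above should return an empty dictionary if every mouse belongs to exactly one group.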

## 3. Set Input/Output Paths

In [None]:
# Set your data directories
input_directory = r"C:\path\to\your\nd2\files"  # UPDATE THIS PATH
output_directory = r"C:\path\to\output\results"  # UPDATE THIS PATH

# Check if directories exist
if os.path.exists(input_directory):
    print(f"✓ Input directory found: {input_directory}")
    
    # Count ND2 files
    nd2_files = [f for f in os.listdir(input_directory) if f.lower().endswith('.nd2')]
    print(f"  Found {len(nd2_files)} ND2 files")
    
    if nd2_files:
        print(f"  Example files: {nd2_files[:3]}")
else:
    print(f"⚠ Input directory not found: {input_directory}")
    print("Please update the input_directory path above")

# Create output directory if it doesn't exist
os.makedirs(output_directory, exist_ok=True)
print(f"✓ Output directory: {output_directory}")

## 4. Run the Analysis Pipeline

In [None]:
# Create pipeline instance
pipeline = ND2Pipeline(config)

# Run analysis
print("Starting ND2 analysis...")
print("This may take several minutes depending on the number of files.")

try:
    results = pipeline.process_directory(
        input_dir=input_directory,
        output_dir=output_directory,
        is_3d=True,  # Set to False for 2D data
        n_jobs=4,    # Adjust based on your CPU cores
        scale_bar_um=50,  # Scale bar size for visualizations
        create_visualizations=True
    )
    
    print("\n✓ Analysis completed successfully!")
    
except Exception as e:
    print(f"\n✗ Analysis failed: {e}")
    print("Check the error message and file paths above.")
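
The `n_jobs=4` value above is only a starting point. One way to choose it, sketched below, is to derive it from the number of CPU cores while leaving a core free for the rest of the system. The helper `suggest_n_jobs` and its `reserve` and `cap` parameters are hypothetical names introduced for illustration; the pipeline itself may handle this differently.

```python
import os


def suggest_n_jobs(reserve=1, cap=None):
    """Suggest a worker count: available cores minus `reserve`, at least 1.

    `cap` optionally limits the result, which is useful when each worker
    holds a large image stack in memory.
    """
    available = os.cpu_count() or 1  # cpu_count() can return None
    n_jobs = max(1, available - reserve)
    if cap is not None:
        n_jobs = max(1, min(n_jobs, cap))
    return n_jobs


print(f"Suggested n_jobs: {suggest_n_jobs()}")
print(f"Suggested n_jobs (memory-limited, cap=2): {suggest_n_jobs(cap=2)}")
```

The result can then be passed as `n_jobs=suggest_n_jobs(cap=4)` in the `process_directory` call.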

## 5. Explore the Results

In [None]:
# Display detailed summary with custom aggregations (without rounding)
if 'results' in locals():
    detailed_summary = results.raw_data.groupby(['Group', 'MouseID']).agg({
        'Channel_1_area': 'mean',
        'Channel_2_area': 'mean',
        'Channel_3_area': 'mean',
        'Channel_2_per_Channel_3_area': 'mean',
        'Channel_1_per_Channel_3_area': 'mean',
        'Channel_1_mean_intensity': 'mean',
        'Channel_2_mean_intensity': 'mean',
        'Channel_3_mean_intensity': 'mean'
    })

    print("Detailed Summary by Group and Mouse (with full precision):")
    print(detailed_summary)
else:
    print("No results available. Run the analysis cell above first.")

## 6. Representative Images Analysis

In [None]:
# Show representative images for each group
if 'results' in locals():
    print("REPRESENTATIVE IMAGES")
    print("=====================")
    
    for group, filenames in results.representative_images.items():
        print(f"\n{group}:")
        for i, filename in enumerate(filenames, 1):
            print(f"  {i}. {filename}")
    
    # Create a summary DataFrame of representative images
    repr_data = []
    for group, filenames in results.representative_images.items():
        for rank, filename in enumerate(filenames, 1):
            # Find the corresponding data
            file_data = results.raw_data[results.raw_data['Filename'] == filename]
            if len(file_data) > 0:
                repr_data.append({
                    'Group': group,
                    'Rank': rank,
                    'Filename': filename,
                    'MouseID': file_data['MouseID'].iloc[0],
                    'Channel_2_area': file_data['Channel_2_area'].iloc[0]
                })
    
    if repr_data:
        repr_df = pd.DataFrame(repr_data)
        print("\nREPRESENTATIVE IMAGES SUMMARY")
        display(repr_df)

## 7. Data Visualization and Analysis

In [None]:
# Plot key metrics by group
if 'results' in locals():
    df = results.raw_data
    
    # Channel areas comparison
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # One boxplot panel per metric: (column, title, y-axis label)
    panels = [
        ('Channel_1_area', 'Channel 1 Area by Group', 'Area (%)'),
        ('Channel_2_area', 'Channel 2 Area by Group', 'Area (%)'),
        ('Channel_3_area', 'Channel 3 Area by Group', 'Area (%)'),
        ('Channel_2_per_Channel_3_area', 'Channel 2/Channel 3 Ratio by Group', 'Ratio (%)'),
    ]
    for ax, (column, title, ylabel) in zip(axes.flat, panels):
        df.boxplot(column=column, by='Group', ax=ax)
        ax.set_title(title)
        ax.set_xlabel('Group')
        ax.set_ylabel(ylabel)
    
    # pandas adds a "Boxplot grouped by Group" figure title; clear it
    fig.suptitle('')
    plt.tight_layout()
    plt.show()
    
    # Basic statistics
    print("\nBASIC STATISTICS")
    print("=================")
    print("Sample sizes:")
    print(df['Group'].value_counts().sort_index())
    
    print("\nChannel 2 Area - Group Means:")
    group_means = df.groupby('Group')['Channel_2_area'].mean().sort_values(ascending=False)
    for group, mean_val in group_means.items():
        print(f"  {group}: {mean_val:.2f}%")

## 8. Visualize Individual Files

In [None]:
# Visualize a specific file (example)
if 'results' in locals() and len(results.raw_data) > 0:
    # Get the first representative image from the first group
    first_group = list(results.representative_images.keys())[0]
    first_repr_file = results.representative_images[first_group][0]
    
    print(f"Visualizing representative image: {first_repr_file}")
    print(f"From group: {first_group}")
    
    # Find the full path
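    input_file_path = os.path.join(input_directory, first_repr_file)
    # Drop the .nd2 extension so the output is not named "<name>.nd2.png"
    file_stem = os.path.splitext(first_repr_file)[0]
    output_viz_path = os.path.join(output_directory, f"example_visualization_{file_stem}.png")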
    
    if os.path.exists(input_file_path):
        success = visualize_single_file(
            filepath=input_file_path,
            output_path=output_viz_path,
            is_3d=True,
            scale_bar_um=50
        )
        
        if success:
            print(f"✓ Visualization saved: {output_viz_path}")
            
            # Display the image if possible
            from IPython.display import Image, display
            try:
                display(Image(filename=output_viz_path))
            except Exception:
                print("Cannot display image in notebook, but file was saved successfully.")
        else:
            print("✗ Visualization failed")
    else:
        print(f"File not found: {input_file_path}")
        # Only list files if the input directory itself exists
        if os.path.isdir(input_directory):
            print("Available files in input directory:")
            for f in os.listdir(input_directory)[:5]:
                print(f"  {f}")

## 9. Export Additional Reports

In [None]:
# Export summary data to CSV for external analysis
if 'results' in locals():
    # Export raw data
    csv_path = os.path.join(output_directory, "analysis_data.csv")
    results.raw_data.to_csv(csv_path, index=False)
    print(f"✓ Raw data exported to: {csv_path}")
    
    # Create custom summary
    custom_summary = results.raw_data.groupby('Group').agg({
        'Channel_1_area': ['count', 'mean', 'std'],
        'Channel_2_area': ['mean', 'std'],
        'Channel_3_area': ['mean', 'std'],
        'Channel_2_per_Channel_3_area': ['mean', 'std']
    }).round(3)
    
    # Flatten column names
    custom_summary.columns = ['_'.join(col).strip() for col in custom_summary.columns.values]
    
    summary_path = os.path.join(output_directory, "custom_summary.csv")
    custom_summary.to_csv(summary_path)
    print(f"✓ Custom summary exported to: {summary_path}")
    
    print("\nCUSTOM SUMMARY")
    print("===============")
    display(custom_summary)

## 10. Check Output Files

In [None]:
# List all output files
print("OUTPUT FILES GENERATED")
print("======================")

if os.path.exists(output_directory):
    max_files = 20  # Cap the listing to avoid clutter
    total_files = sum(len(fs) for _, _, fs in os.walk(output_directory))
    shown = 0
    
    for root, dirs, files in os.walk(output_directory):
        if shown >= max_files:
            break
        level = root.replace(output_directory, '').count(os.sep)
        indent = ' ' * 2 * level
        print(f"{indent}{os.path.basename(root)}/")
        
        subindent = ' ' * 2 * (level + 1)
        # Show only as many files as the remaining budget allows
        for file in files[:max_files - shown]:
            file_path = os.path.join(root, file)
            size_mb = os.path.getsize(file_path) / (1024 * 1024)
            print(f"{subindent}{file} ({size_mb:.1f} MB)")
            shown += 1
    
    if total_files > shown:
        print(f"... and {total_files - shown} more files")
            
    print(f"\n✓ All results saved to: {output_directory}")
    print("\n🔍 Key files to check:")
    print("  📊 analysis_results.xlsx - Main Excel report")
    print("  📁 representative_images/ - Most representative images")
    print("  📁 comparison_plots/ - Statistical comparison plots")
    print("  📄 processed_data.json - Machine-readable results")
else:
    print("❌ Output directory not found")

## Summary

This notebook demonstrated:

1. **Configuration**: Setting up treatment groups and analysis parameters
2. **Processing**: Running the automated analysis pipeline
3. **Results**: Exploring the generated data and statistics
4. **Visualization**: Plotting group comparisons and rendering individual images
5. **Export**: Saving results in multiple formats

### Next Steps:

- Open the Excel report (`analysis_results.xlsx`) for detailed results
- Check the representative images folder for the most characteristic images
- Use the comparison plots for presentations
- Import the CSV data into your preferred statistical software

### Command Line Usage:

You can also run the analysis from the command line:

```bash
python main.py --input "path/to/nd2/files" --output "path/to/results" --config "my_config.json" --dimension 3d --jobs 4 --scale-bar 50
```
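
For readers wondering how these flags relate to the notebook parameters, the following is a minimal sketch of an argument parser that accepts the same options. It is not the actual `main.py`, which is not shown here. The defaults, the `2d` choice, and the mapping to `process_directory` keyword arguments are assumptions based on the cells above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the command above."""
    parser = argparse.ArgumentParser(description="ND2 analysis (illustrative sketch)")
    parser.add_argument("--input", required=True, help="Directory containing ND2 files")
    parser.add_argument("--output", required=True, help="Directory for results")
    parser.add_argument("--config", default=None, help="Path to a JSON configuration file")
    parser.add_argument("--dimension", choices=["2d", "3d"], default="3d")
    parser.add_argument("--jobs", type=int, default=4, help="Number of parallel jobs")
    parser.add_argument("--scale-bar", type=float, default=50, help="Scale bar size in µm")
    return parser


# Parse an explicit list so the sketch also runs inside a notebook
args = build_parser().parse_args([
    "--input", "path/to/nd2/files",
    "--output", "path/to/results",
    "--config", "my_config.json",
    "--dimension", "3d",
    "--jobs", "4",
    "--scale-bar", "50",
])

# Assumed mapping onto the keyword arguments used earlier in this notebook
kwargs = {
    "input_dir": args.input,
    "output_dir": args.output,
    "is_3d": args.dimension == "3d",
    "n_jobs": args.jobs,
    "scale_bar_um": args.scale_bar,
}
print(kwargs)
```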

### Troubleshooting:

- **No files found**: Check the input directory path and ensure ND2 files are present
- **Mouse ID errors**: Verify filename format and marker position (default: "C1" for 3D)
- **Memory issues**: Reduce the number of parallel jobs (`n_jobs`)
- **Visualization errors**: Check scale bar settings and image dimensions
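
When diagnosing mouse ID errors, a quick standalone check is to see which filenames contain a configured mouse ID as a whole token. The sketch below does not reproduce the pipeline's own parsing, which relies on a marker position not detailed in this notebook. The helper `match_mouse_id` and the example filenames are hypothetical.

```python
import re


def match_mouse_id(filename, known_ids):
    """Return the single known ID found as a whole token in `filename`, else None."""
    stem = filename.rsplit(".", 1)[0]
    # Split on anything that is not a letter or digit (underscores, dashes, spaces)
    tokens = re.split(r"[^A-Za-z0-9]+", stem)
    matches = [token for token in tokens if token in known_ids]
    return matches[0] if len(matches) == 1 else None


known_ids = {"N00", "X12", "X3", "W97"}

# Hypothetical filenames for illustration only
for name in ["X12_C1_slice01.nd2", "X3-C1.nd2", "X31_C1.nd2", "unlabeled.nd2"]:
    mouse_id = match_mouse_id(name, known_ids)
    status = mouse_id if mouse_id else "no unambiguous match"
    print(f"{name}: {status}")
```

Matching whole tokens rather than substrings avoids confusing short IDs such as `X3` with longer ones such as `X31`.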