# Classification Plot Examples

This notebook demonstrates the features and options of the `classification_plot` function for visualizing cell classification results.

## Setup and Data Loading

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

from omero_screen_plots import classification_plot
from omero_screen_plots.colors import COLOR
from omero_screen_plots.utils import save_fig

# Setup output directory
path = Path("../images")
path.mkdir(parents=True, exist_ok=True)

# Load sample data
df = pd.read_csv("data/sample_plate_data.csv")

# Conditions and classes to plot; these must match values in the
# 'condition' and 'classifier' columns of the sample data
conditions = ['control', 'cond01', 'cond02', 'cond03']
classes = ["normal", "micronuclei", "collapsed"]  # From 'classifier' column

print("Selected conditions:", conditions)
print("Available cell lines:", df['cell_line'].unique())
print("Data shape:", df.shape)
print("\nClassification classes:", sorted(df['classifier'].unique()))
print("\nUsing classifier column for morphology classification:", classes)

## 1. Basic Stacked Classification Plot

The default display mode shows stacked bars with error bars representing standard deviation across biological replicates.

In [None]:
# Basic stacked bar plot with error bars
fig, ax = classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",  # Dynamic class column
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",  # Default mode
    title="Cell Morphology Classification - Stacked View",
    save=True,
    path=path,
    file_format="pdf"
)
print("Stacked plot shows mean percentages with error bars (standard deviation)")
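
The numbers behind a stacked bar can be sketched with plain pandas. The snippet below is an illustration under stated assumptions, not the library's implementation: it uses a small made-up table, and the replicate column name `plate_id` is hypothetical. It computes the percentage of each class per replicate, then the mean and sample standard deviation (pandas default, `ddof=1`) across replicates. Whether `classification_plot` uses the same convention is not stated in this notebook.

```python
import pandas as pd

# Toy data: one row per cell. `plate_id` is a hypothetical replicate column.
toy = pd.DataFrame({
    "plate_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "condition": ["control"] * 8,
    "classifier": ["normal", "normal", "normal", "micronuclei",
                   "normal", "normal", "micronuclei", "micronuclei"],
})

# Count cells per (replicate, condition, class).
counts = (
    toy.groupby(["plate_id", "condition", "classifier"])
    .size()
    .rename("n")
    .reset_index()
)

# Convert counts to percentages within each (replicate, condition) pair.
totals = counts.groupby(["plate_id", "condition"])["n"].transform("sum")
counts["percent"] = 100 * counts["n"] / totals

# Mean and sample standard deviation across replicates.
summary = (
    counts.groupby(["condition", "classifier"])["percent"]
    .agg(["mean", "std"])
    .reset_index()
)
print(summary)
```

Note that a class absent from one replicate would be missing from `counts` rather than counted as 0%, so real data may need reindexing before averaging.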

## 2. Individual Triplicates Display

Show individual biological replicates as separate bars to visualize replicate-to-replicate variation.

In [None]:
# Individual triplicates without grouping
fig, ax = classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="triplicates",  # Show individual repeats
    group_size=1,  # No grouping (default)
    title="Cell Morphology Classification - Individual Triplicates",
    save=True,
    path=path,
    file_format="pdf"
)
print("Triplicates plot shows individual biological replicates with grouping boxes")

## 3. Grouped Triplicates Display

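Setting `group_size` above 1 places conditions in visually separated groups, which makes related treatments easier to compare. `within_group_spacing` and `between_group_gap` control the gaps inside and between groups.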

In [None]:
# Grouped triplicates - group conditions in pairs
fig, ax = classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="triplicates",
    group_size=2,  # Group conditions in pairs
    within_group_spacing=0.2,  # Tight spacing within groups
    between_group_gap=0.4,  # Larger gap between groups
    title="Cell Morphology Classification - Grouped Triplicates",
    fig_size=(12, 7),  # Wider figure for grouped layout
    save=True,
    path=path,
    file_format="pdf"
)
print("Grouped layout: [control, cond01] gap [cond02, cond03]")
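
One way to picture how `group_size`, `within_group_spacing` and `between_group_gap` could translate into x positions is sketched below. The helper `grouped_positions` and the slot width `unit` are hypothetical and exist only for illustration; the library's actual layout arithmetic may differ.

```python
def grouped_positions(n_conditions, group_size=2,
                      within_group_spacing=0.2, between_group_gap=0.4,
                      unit=1.0):
    """Return one x position per condition (illustrative layout only)."""
    positions = []
    x = 0.0
    for i in range(n_conditions):
        if i > 0:
            # A new group starts whenever the index is a multiple of group_size.
            starts_new_group = (i % group_size) == 0
            gap = between_group_gap if starts_new_group else within_group_spacing
            x += unit + gap
        positions.append(round(x, 3))
    return positions


# Four conditions in pairs: [control, cond01] gap [cond02, cond03]
print(grouped_positions(4, group_size=2))
# [0.0, 1.2, 2.6, 3.8]
```

In this sketch the step between `cond01` and `cond02` is larger than the step inside each pair, which is what produces the visual grouping.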

## 4. Custom Colors

Override default colors with custom color schemes for different classification types.

In [None]:
# Custom colors for morphology classification
morphology_colors = [COLOR.GREY.value, COLOR.LIGHT_GREEN.value, COLOR.OLIVE.value]

fig, ax = classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01'],  # Fewer conditions for cleaner display
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    colors=morphology_colors,  # Custom color scheme
    title="Cell Morphology Classification - Custom Colors",
    fig_size=(8, 7),
    save=True,
    path=path,
    file_format="pdf"
)
print("Custom colors applied: Grey (normal), Light Green (micronuclei), Olive (collapsed)")
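
As the printed message indicates, colors are matched to classes by position. A small check like the one below can catch a mismatched list before plotting. The helper `pair_classes_with_colors` is hypothetical and not part of `omero_screen_plots`; the hex values are example colors.

```python
def pair_classes_with_colors(classes, colors):
    """Pair each class with the color at the same list position."""
    if len(classes) != len(colors):
        raise ValueError(
            f"Expected {len(classes)} colors, got {len(colors)}"
        )
    return dict(zip(classes, colors))


classes = ["normal", "micronuclei", "collapsed"]
colors = ["#4CAF50", "#FF9800", "#F44336"]  # green, orange, red

mapping = pair_classes_with_colors(classes, colors)
for name, color in mapping.items():
    print(f"{name}: {color}")
```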

## 5. Alternative Color Schemes

Compare different color schemes for the same classification data.

In [None]:
# Side-by-side comparison of different color schemes
fig, axes = plt.subplots(1, 3, figsize=(7, 3))

# Default colors (automatic)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01', 'cond02'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    show_legend=False,
    axes=axes[0]
)
axes[0].set_title("Default Colors", fontsize=12)

# Custom color scheme 1
colors1 = [COLOR.GREY.value, COLOR.LIGHT_GREEN.value, COLOR.OLIVE.value]
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01', 'cond02'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    colors=colors1,
    show_legend=False,
    axes=axes[1]
)
axes[1].set_title("Grey-Green Scheme", fontsize=12)

# Custom color scheme 2
colors2 = ['#4CAF50', '#FF9800', '#F44336']  # Green, Orange, Red
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01', 'cond02'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    colors=colors2,
    axes=axes[2]
)
axes[2].set_title("Traffic Light Scheme", fontsize=12)

fig.suptitle("Color Scheme Comparison", fontsize=14, fontweight='bold')
fig.tight_layout()
save_fig(fig, path, "classification_color_schemes", fig_extension="pdf")
print("Color scheme comparison created")

## 6. Legend and Layout Customization

Control legend positioning and plot layout options.

In [None]:
# Custom legend positioning and layout
fig, axes = plt.subplots(2, 2, figsize=(6, 5))

legend_positions = [
    {"bbox": (1.05, 1.0), "title": "Right (default)"},
    {"bbox": (0.5, 1.15), "title": "Top center"},
    {"bbox": (-0.1, 1.0), "title": "Left"},
    {"show": False, "title": "No legend"}
]

for ax, legend_config in zip(axes.flat, legend_positions):
    classification_plot(
        df=df,
        classes=["normal", "micronuclei", "collapsed"],
        conditions=['control', 'cond01'],
        condition_col="condition",
        class_col="classifier",
        selector_col="cell_line",
        selector_val="MCF10A",
        display_mode="stacked",
        show_legend=legend_config.get("show", True),
        legend_bbox=legend_config.get("bbox", (1.05, 1.0)),
        axes=ax
    )
    ax.set_title(legend_config["title"], fontsize=10)

fig.suptitle("Legend Positioning Options", fontsize=14)
fig.tight_layout()
save_fig(fig, path, "classification_legend_options", fig_extension="pdf")
print("Legend positioning examples created")

## 7. Bar Width and Spacing Customization

Fine-tune the visual appearance with bar width and spacing parameters.

In [None]:
# Compare different bar widths
fig, axes = plt.subplots(1, 3, figsize=(7, 3))

bar_configs = [
    {"bar_width": 0.5, "title": "Narrow bars (0.5)"},
    {"bar_width": 0.75, "title": "Standard bars (0.75)"},
    {"bar_width": 1.0, "title": "Wide bars (1.0)"}
]

for ax, config in zip(axes, bar_configs):
    classification_plot(
        df=df,
        classes=["normal", "micronuclei", "collapsed"],
        conditions=conditions,
        condition_col="condition",
        class_col="classifier",
        selector_col="cell_line",
        selector_val="MCF10A",
        display_mode="stacked",
        bar_width=config["bar_width"],
        show_legend=False,  # Hide legend for cleaner comparison
        axes=ax
    )
    ax.set_title(config["title"], fontsize=10)

fig.suptitle("Bar Width Comparison", fontsize=14)
fig.tight_layout()
save_fig(fig, path, "classification_bar_width", fig_extension="pdf")
print("Bar width comparison created")

## 8. Triplicates Spacing Customization

Control the spacing and offset of individual triplicates in grouped displays.

In [None]:
# Compare different triplicate spacing settings
fig, axes = plt.subplots(2, 2, figsize=(6, 5))

spacing_configs = [
    {"offset": 0.1, "within": 0.2, "between": 0.4, "title": "Tight spacing"},
    {"offset": 0.18, "within": 0.2, "between": 0.4, "title": "Standard spacing"},
    {"offset": 0.25, "within": 0.3, "between": 0.6, "title": "Loose spacing"},
    {"offset": 0.18, "within": 0.1, "between": 0.8, "title": "Wide group gaps"}
]

for ax, config in zip(axes.flat, spacing_configs):
    classification_plot(
        df=df,
        classes=["normal", "micronuclei", "collapsed"],
        conditions=conditions,
        condition_col="condition",
        class_col="classifier",
        selector_col="cell_line",
        selector_val="MCF10A",
        display_mode="triplicates",
        group_size=2,  # Enable grouping to see spacing effects
        repeat_offset=config["offset"],
        within_group_spacing=config["within"],
        between_group_gap=config["between"],
        show_legend=False,
        axes=ax
    )
    ax.set_title(config["title"], fontsize=10)

fig.suptitle("Triplicates Spacing Options", fontsize=14)
fig.tight_layout()
save_fig(fig, path, "classification_spacing_options", fig_extension="pdf")
print("Triplicates spacing comparison created")
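
The role of `repeat_offset` can be pictured as the horizontal distance between neighbouring replicate bars of one condition. The sketch below assumes the replicates are centred on the condition's x position; the helper `replicate_positions` is hypothetical and the library may place bars differently.

```python
def replicate_positions(center, n_repeats=3, repeat_offset=0.18):
    """Spread n_repeats bars symmetrically around a condition's center."""
    middle = (n_repeats - 1) / 2
    return [center + (k - middle) * repeat_offset for k in range(n_repeats)]


# Three replicates around x = 0 with the "Standard spacing" offset
print(replicate_positions(0.0, n_repeats=3, repeat_offset=0.18))
# [-0.18, 0.0, 0.18]
```

Under this assumption a larger `repeat_offset` widens the whole triplicate, so it usually has to be balanced against `bar_width` and the group spacing parameters.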

## 9. Y-axis Limits and Scaling

Control the y-axis range to focus on specific percentage ranges or accommodate different data scales.

In [None]:
# Compare different y-axis limits
fig, axes = plt.subplots(1, 3, figsize=(5, 3))

y_limit_configs = [
    {"ylim": (0, 100), "title": "Full scale (0-100%)"},
    {"ylim": (0, 80), "title": "Focused (0-80%)"},
    {"ylim": (70, 100), "title": "Zoomed (70-100%)"}
]

for ax, config in zip(axes, y_limit_configs):
    classification_plot(
        df=df,
        classes=["normal", "micronuclei", "collapsed"],
        conditions=['control', 'cond01'],
        condition_col="condition",
        class_col="classifier",
        selector_col="cell_line",
        selector_val="MCF10A",
        display_mode="stacked",
        y_lim=config["ylim"],
        show_legend=False,
        axes=ax
    )
    ax.set_title(config["title"], fontsize=10)

fig.suptitle("Y-axis Scaling Options", fontsize=14)
fig.tight_layout()
save_fig(fig, path, "classification_y_limits", fig_extension="pdf")
print("Y-axis scaling comparison created")
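
Because the bars are stacked, a restricted `y_lim` clips the lower segments rather than rescaling them. The sketch below works this out for made-up mean percentages and assumes the first class sits at the bottom of the stack; `visible_segments` is a hypothetical helper, not a library function.

```python
def visible_segments(percentages, y_lim):
    """Return the height of each stacked segment visible inside y_lim."""
    lower, upper = y_lim
    visible = {}
    bottom = 0.0
    for name, value in percentages.items():
        top = bottom + value
        # Overlap between the segment [bottom, top] and the window [lower, upper].
        visible[name] = max(0.0, min(top, upper) - max(bottom, lower))
        bottom = top
    return visible


# Made-up mean percentages, for illustration only
means = {"normal": 85.0, "micronuclei": 10.0, "collapsed": 5.0}

print(visible_segments(means, (70, 100)))
# {'normal': 15.0, 'micronuclei': 10.0, 'collapsed': 5.0}
print(visible_segments(means, (0, 80)))
# {'normal': 80.0, 'micronuclei': 0.0, 'collapsed': 0.0}
```

With these example values, the 70-100% window keeps the minority classes fully visible, while the 0-80% window hides them entirely.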

## 10. Display Mode Comparison

Direct comparison between stacked and triplicates display modes.

In [None]:
# Side-by-side comparison of display modes
fig, axes = plt.subplots(1, 3, figsize=(7, 3))

# Stacked mode
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    colors=[COLOR.GREY.value, COLOR.LIGHT_GREEN.value, COLOR.OLIVE.value],
    show_legend=False,
    axes=axes[0]
)
axes[0].set_title("Stacked (Mean ± SD)", fontsize=12, fontweight='bold')

# Triplicates mode - no grouping
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="triplicates",
    group_size=1,
    colors=[COLOR.GREY.value, COLOR.LIGHT_GREEN.value, COLOR.OLIVE.value],
    show_legend=False,
    axes=axes[1]
)
axes[1].set_title("Individual Triplicates", fontsize=12, fontweight='bold')

# Triplicates mode - with grouping
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="triplicates",
    group_size=2,
    colors=[COLOR.GREY.value, COLOR.LIGHT_GREEN.value, COLOR.OLIVE.value],
    axes=axes[2]
)
axes[2].set_title("Grouped Triplicates", fontsize=12, fontweight='bold')

fig.suptitle("Display Mode Comparison", fontsize=14, fontweight='bold')
fig.tight_layout()
save_fig(fig, path, "classification_display_modes", fig_extension="pdf")
print("Display mode comparison created")

## 11. Comprehensive Multi-Panel Analysis

Create a publication-ready figure combining different views and styling options.

In [None]:
# Create a comprehensive multi-panel analysis
fig = plt.figure(figsize=(5, 10))

# Top row: Different display modes
ax1 = plt.subplot(3, 3, 1)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    colors=[COLOR.GREY.value, COLOR.LIGHT_GREEN.value, COLOR.OLIVE.value],
    show_legend=False,
    axes=ax1
)
ax1.set_title("A) Stacked Mode", fontsize=7, fontweight='bold')

ax2 = plt.subplot(3, 3, 2)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="triplicates",
    group_size=1,
    colors=[COLOR.GREY.value, COLOR.LIGHT_GREEN.value, COLOR.OLIVE.value],
    show_legend=False,
    axes=ax2
)
ax2.set_title("B) Individual Triplicates", fontsize=7, fontweight='bold')

ax3 = plt.subplot(3, 3, 3)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=conditions,
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="triplicates",
    group_size=2,
    colors=[COLOR.GREY.value, COLOR.LIGHT_GREEN.value, COLOR.OLIVE.value],
    axes=ax3
)
ax3.set_title("C) Grouped Triplicates", fontsize=7, fontweight='bold')

# Middle row: Color schemes
ax4 = plt.subplot(3, 3, 4)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01', 'cond02'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    colors=['#4CAF50', '#FF9800', '#F44336'],  # Green, Orange, Red
    show_legend=False,
    axes=ax4
)
ax4.set_title("D) Traffic Light Colors", fontsize=7, fontweight='bold')

ax5 = plt.subplot(3, 3, 5)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01', 'cond02'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    colors=['#2E86AB', '#A23B72', '#F18F01'],  # Blue, Purple, Orange
    show_legend=False,
    axes=ax5
)
ax5.set_title("E) Custom Palette", fontsize=7, fontweight='bold')

ax6 = plt.subplot(3, 3, 6)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01', 'cond02'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    # Default colors (no colors parameter)
    show_legend=False,
    axes=ax6
)
ax6.set_title("F) Default Colors", fontsize=7, fontweight='bold')

# Bottom row: Styling options
ax7 = plt.subplot(3, 3, 7)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    bar_width=0.5,  # Narrow bars
    show_legend=False,
    axes=ax7
)
ax7.set_title("G) Narrow Bars", fontsize=7, fontweight='bold')

ax8 = plt.subplot(3, 3, 8)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    y_lim=(70, 100),  # Focused y-axis
    show_legend=False,
    axes=ax8
)
ax8.set_title("H) Focused Y-axis", fontsize=7, fontweight='bold')

ax9 = plt.subplot(3, 3, 9)
classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01'],
    condition_col="condition",
    class_col="classifier",
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked",
    bar_width=1.0,  # Wide bars
    axes=ax9
)
ax9.set_title("I) Wide Bars + Legend", fontsize=7, fontweight='bold')

fig.suptitle("Comprehensive Classification Plot Analysis", fontsize=10, fontweight='bold', position=(0.35, 1))
fig.tight_layout()
save_fig(fig, path, "classification_comprehensive_analysis", fig_extension="pdf")
print("Comprehensive analysis figure created")

## Summary

The `classification_plot` function provides flexible visualization of cell classification data. Its main options are summarized below.

### Key Features:

1. **Dynamic Classification Column**: Use `class_col` parameter to specify any classification column (e.g., "classifier", "cell_cycle", "phenotype")

2. **Two Display Modes**:
   - **`display_mode="stacked"`**: Stacked bars with error bars (mean ± SD across biological replicates)
   - **`display_mode="triplicates"`**: Individual biological replicates with grouping boxes

3. **Flexible Grouping** (triplicates mode):
   - `group_size=1`: No grouping, sequential layout
   - `group_size>1`: Group conditions with spacing gaps

4. **Customizable Styling**:
   - **Colors**: Custom color schemes for different classification types
   - **Bar width**: Control bar thickness (`bar_width`)
   - **Spacing**: Fine-tune triplicate positioning (`repeat_offset`, `within_group_spacing`, `between_group_gap`)
   - **Y-axis**: Custom limits and scaling (`y_lim`)
   - **Legend**: Positioning and visibility (`show_legend`, `legend_bbox`)

### Example Usage:

```python
# Basic morphology classification plot
fig, ax = classification_plot(
    df=df,
    classes=["normal", "micronuclei", "collapsed"],
    conditions=['control', 'cond01', 'cond02'],  # Must match values in condition_col
    condition_col="condition",
    class_col="classifier",  # Dynamic class column
    selector_col="cell_line",
    selector_val="MCF10A",
    display_mode="stacked"  # or "triplicates"
)
```

### Common Use Cases:

1. **Morphology Classification**: Compare normal vs abnormal cell phenotypes across treatments
2. **Cell Cycle Analysis**: Visualize phase distributions (using `class_col="cell_cycle"`)
3. **Drug Response**: Quantify classification changes in dose-response studies
4. **Quality Control**: Monitor classification consistency across experimental plates
5. **Multi-condition Studies**: Compare treatment effects while showing variation across replicates

### Design Principles:

- **Stacked mode**: Best for showing overall distribution patterns and treatment effects, with standard-deviation error bars indicating variability
- **Triplicates mode**: Best for assessing replicate consistency and identifying outlier experiments
- **Grouping**: Useful for comparing related treatments or dose series
- **Color coding**: Consistent colors help distinguish classification categories across figures

### Statistical Considerations:

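- Error bars in stacked mode represent the standard deviation across biological replicates.
- Triplicates mode plots each replicate separately, so replicate-to-replicate variation is visible directly.
- The function performs no automated statistical testing, because comparisons across multiple classes are complex.
- Values are plotted on a percentage scale (0-100%), which keeps interpretation intuitive.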