# Combination Plot Examples

This notebook demonstrates combination plots (combplots) for visualizing cell cycle data from OMERO screens. Each figure combines DNA content histograms and DNA content vs EdU scatter plots with either a feature scatter plot or a stacked bar chart of cell cycle phase proportions, so that cell cycle progression and DNA content distribution can be compared across conditions in one view.

In [None]:
import pandas as pd
from pathlib import Path

# Import the combplot functions
from omero_screen_plots.combplot import comb_plot, combplot_simple

# Create output directory
path = Path("../images")
path.mkdir(parents=True, exist_ok=True)

In [None]:
# Load sample data
df = pd.read_csv("../data/sample_plate_data.csv")

# Define conditions to analyze
conditions = [
    "palb:0.0 c604:0",
    "palb:0.0 c604:1",
    "palb:0.75 c604:0",
    "palb:0.75 c604:1",
]

# Preview the data structure
print(f"Data shape: {df.shape}")
print(f"Available columns: {list(df.columns)}")
print(f"Unique cell cycle phases: {df['cell_cycle'].unique()}")
print(f"Unique conditions: {df['condition'].unique()[:4]}...")  # Show first 4

## 1. Combined Plot with Feature Analysis (`comb_plot`)

This plot type creates a comprehensive visualization with three rows:
1. **Top row**: DNA content histograms for each condition
2. **Middle row**: Scatter plots of DNA content vs EdU intensity (S-phase marker)
3. **Bottom row**: Scatter plots of DNA content vs a custom feature

This is particularly useful when you want to analyze how a specific cellular feature correlates with cell cycle progression.

### Function Arguments for `comb_plot`

#### Required Arguments
- `df`: DataFrame containing cell cycle data
- `conditions`: List of condition names to plot
- `feature_col`: Column name for the feature to analyze in the third row
- `feature_y_lim`: Threshold value for feature categorization (used for coloring)
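
To illustrate what a threshold such as `feature_y_lim` means for the data, the sketch below splits cells into "high" and "low" groups with plain pandas and NumPy. It is not the internal implementation of `comb_plot`; the toy intensity values and the `p21_status` column name are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data standing in for per-cell p21 measurements (values are made up)
toy = pd.DataFrame({"intensity_mean_p21_nucleus": [1200.0, 4800.0, 5200.0, 9100.0]})

feature_y_lim = 5000  # threshold value used in the p21 example of this notebook

# Label each cell relative to the threshold (hypothetical column name)
toy["p21_status"] = np.where(
    toy["intensity_mean_p21_nucleus"] > feature_y_lim, "high", "low"
)
print(toy)
```

Checking how many cells fall on each side of a candidate threshold in this way can help when choosing a sensible `feature_y_lim` for a new feature.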

#### Data Selection
- `condition_col`: Column name for experimental condition (default: "condition")
- `selector_col`: Column name for selector, e.g., cell line (default: "cell_line")
- `selector_val`: Value to filter selector_col by (e.g., specific cell line name)

#### Visualization Options
- `title`: Plot title (default: auto-generated)
- `cell_number`: Number of cells to randomly sample per condition (default: None = use all)
- `colors`: List of colors for plotting (default: matplotlib style colors)
- `fig_size`: Tuple of figure width and height, in the units set by `size_units` (default: 10 cm wide, 7 cm high)
- `size_units`: Units for `fig_size`; "cm" by default, any other value is interpreted as inches
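
Matplotlib itself sizes figures in inches, so a size given in centimetres has to be converted at some point. The helper below is a minimal sketch of that conversion; `to_inches` is a hypothetical name and not part of `omero_screen_plots`.

```python
CM_PER_INCH = 2.54


def to_inches(fig_size, size_units="cm"):
    """Return (width, height) in inches; any unit other than 'cm' is taken as inches."""
    if size_units == "cm":
        return tuple(value / CM_PER_INCH for value in fig_size)
    return tuple(fig_size)


width_in, height_in = to_inches((10, 7))
print(f"{width_in:.2f} x {height_in:.2f} in")  # prints "3.94 x 2.76 in"
```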


#### Saving Options
- `save`: Whether to save the figure (default: True)
- `path`: Directory path to save the figure (required when save=True)

In [None]:
# Example: Analyze p21 intensity as a feature
comb_plot(
    df=df,
    conditions=conditions,
    feature_col="intensity_mean_p21_nucleus",  # Analyze p21 (cell cycle inhibitor)
    feature_y_lim=5000,  # Threshold for p21 high vs low
    condition_col="condition",
    selector_col="cell_line", 
    selector_val="MCF10A",
    title="Cell Cycle Analysis with p21 Expression",
    cell_number=5000,  # Sample 5000 cells per condition for performance
    fig_size=(10, 7), # default is 10 cm wide and 7 cm high
    size_units="cm", # default is cm, else inches
    save=True,
    path=path,
)

## 2. Simple Combined Plot with Stacked Bar Chart (`combplot_simple`)

This plot type creates a more compact visualization that includes:
1. **Top row**: DNA content histograms for each condition
2. **Middle row**: Scatter plots of DNA content vs EdU intensity
3. **Right column**: Stacked bar chart showing cell cycle phase proportions

This is ideal when you want to see both single-cell distributions and population-level statistics in one figure.
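
The population-level numbers behind a stacked bar chart of this kind are per-condition phase fractions. A rough sketch of computing them with `pandas.crosstab` follows, assuming the `condition` and `cell_cycle` columns previewed earlier. The toy rows are invented, and the library may compute its values differently (for example per replicate).

```python
import pandas as pd

# Invented example: four control cells and two treated cells
toy = pd.DataFrame({
    "condition": ["ctrl", "ctrl", "ctrl", "ctrl", "drug", "drug"],
    "cell_cycle": ["G1", "G1", "S", "G2/M", "G1", "G1"],
})

# Rows: conditions, columns: phases, values: percentage of cells per condition
phase_pct = pd.crosstab(toy["condition"], toy["cell_cycle"], normalize="index") * 100
print(phase_pct.round(1))
```

Calling `phase_pct.plot(kind="bar", stacked=True)` would draw a comparable chart with pandas' own plotting, which is useful for a quick sanity check of the proportions.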

### Function Arguments for `combplot_simple`

#### Required Arguments
- `df`: DataFrame containing cell cycle data
- `conditions`: List of condition names to plot

#### Data Selection
- `condition_col`: Column name for experimental condition (default: "condition")
- `selector_col`: Column name for selector (default: "cell_line")
- `selector_val`: Value to filter selector_col by

#### Visualization Options
- `title`: Plot title (default: auto-generated)
- `cell_number`: Number of cells to randomly sample per condition (default: None = use all)
- `colors`: List of colors for plotting (default: matplotlib style colors)
- `fig_size`: Tuple of figure width and height, in the units set by `size_units`
- `size_units`: Units for `fig_size`; "cm" by default, any other value is interpreted as inches
- `H3`: Whether to use H3 pS10 data to separate G2 and M phases (default: False)

#### Saving Options
- `save`: Whether to save the figure (default: True)
- `path`: Directory path to save the figure (required when save=True)

Note: The figure size is automatically calculated based on the number of conditions.

In [None]:
# Example: Simple combplot with population statistics
combplot_simple(
    df=df,
    conditions=conditions,
    condition_col="condition",
    selector_col="cell_line",
    selector_val="MCF10A",
    title="Combined Cell Cycle Analysis",
    cell_number=5000,  # Sample for scatter plots
    # colors=[...],  # optional list of colors; omit to use the defaults
    fig_size=(10, 4),  # 10 cm wide, 4 cm high
    size_units="cm",  # "cm" is the default; any other value means inches
    H3=False,  # Standard G2/M combined phases
    save=True,
    path=path,
)

## 3. Working with Multiple Conditions

Both plot types work well with larger numbers of conditions. Here's an example with more conditions:

In [None]:
# Extended conditions list
conditions_extended = [
    "palb:0.0 c604:0",
    "palb:0.0 c604:1",
    "palb:0.375 c604:0",
    "palb:0.375 c604:1",
    "palb:0.75 c604:0",
    "palb:0.75 c604:1",
]

# Create a wider combplot_simple for more conditions
combplot_simple(
    df=df,
    conditions=conditions_extended,
    condition_col="condition",
    selector_col="cell_line",
    selector_val="MCF10A",
    title="Dose Response Analysis - Extended Conditions",
    cell_number=3000,  # Fewer cells per condition for performance
    fig_size=(20, 5),  # wider figure (20 cm x 5 cm) to fit six conditions
    save=True,
    path=path,
)

## 4. Analyzing Different Features

The `comb_plot` function allows you to analyze any numerical feature in your dataset. Here are some examples of features you might want to analyze:

In [None]:
# Example: Analyze nuclear area
comb_plot(
    df=df,
    conditions=conditions[:2],  # Just first two conditions for clarity
    feature_col="area_nucleus",
    feature_y_lim=200,  # Threshold for normal vs large nuclei
    condition_col="condition",
    selector_col="cell_line",
    selector_val="MCF10A",
    title="Nuclear Size Analysis",
    cell_number=2000,
    save=True,
    path=path,
)

# Example: Analyze tubulin intensity
comb_plot(
    df=df,
    conditions=conditions[:2],
    feature_col="intensity_mean_Tub_nucleus",
    feature_y_lim=10000,  # Threshold for tubulin levels
    condition_col="condition",
    selector_col="cell_line", 
    selector_val="MCF10A",
    title="Tubulin Expression Analysis",
    cell_number=2000,
    save=True,
    path=path,
)

## 5. Best Practices and Tips

### When to use each plot type:

**Use `comb_plot` when:**
- You have a specific feature of interest to correlate with cell cycle
- You want to understand how a biomarker changes across cell cycle phases
- You need to identify subpopulations based on feature expression

**Use `combplot_simple` when:**
- You want a compact overview of cell cycle distributions
- You need to compare population-level statistics across conditions
- You're creating figures for publication that need both single-cell and population data

### Performance considerations:
- Use `cell_number` parameter to downsample large datasets
- Scatter plots are saved as PNG to avoid vector graphics issues with many points
- Consider splitting very large condition sets into multiple figures
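
Downsampling per condition can also be done up front with pandas, which is handy for checking how many cells each condition contributes. This sketch caps every condition at `n_cells` rows. The helper name `downsample_per_condition` and the fixed seed are assumptions, and the plotting functions may sample differently internally.

```python
import pandas as pd


def downsample_per_condition(df, condition_col="condition", n_cells=5000, seed=0):
    """Keep at most n_cells randomly chosen rows for each condition."""
    parts = [
        group.sample(n=min(len(group), n_cells), random_state=seed)
        for _, group in df.groupby(condition_col)
    ]
    return pd.concat(parts, ignore_index=True)


# Toy data: condition "a" has 10 cells, condition "b" only 3
toy = pd.DataFrame({"condition": ["a"] * 10 + ["b"] * 3, "value": range(13)})
small = downsample_per_condition(toy, n_cells=5)
print(small["condition"].value_counts().to_dict())
```

Conditions with fewer cells than the cap are kept whole, so sparse conditions are not dropped or oversampled.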

### Data requirements:
- Must have normalized DNA content (`integrated_int_DAPI_norm`)
- Must have EdU intensity data (`intensity_mean_EdU_nucleus_norm`)
- For `comb_plot`: need the specified feature column
- Cell cycle phase assignments should be pre-calculated (the `cell_cycle` column)
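
A quick check before plotting can catch missing columns early. The sketch below uses the column names listed above plus the `cell_cycle` and `condition` columns from the data preview; `missing_columns` is a hypothetical helper, not a library function.

```python
import pandas as pd

REQUIRED_COLUMNS = [
    "integrated_int_DAPI_norm",
    "intensity_mean_EdU_nucleus_norm",
    "cell_cycle",
    "condition",
]


def missing_columns(df, feature_col=None):
    """Return the required columns (plus an optional feature column) absent from df."""
    required = REQUIRED_COLUMNS + ([feature_col] if feature_col else [])
    return [col for col in required if col not in df.columns]


# Toy frame that lacks the EdU column and the requested feature column
toy = pd.DataFrame(columns=["integrated_int_DAPI_norm", "cell_cycle", "condition"])
print(missing_columns(toy, feature_col="area_nucleus"))
# prints ['intensity_mean_EdU_nucleus_norm', 'area_nucleus']
```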

## 6. Customizing Colors

Both functions accept a custom color list. The color order corresponds to cell cycle phases:

In [None]:
# Custom color palette
custom_colors = [
    "#FF6B6B",  # Sub-G1 (red)
    "#4ECDC4",  # G1 (teal)
    "#95E1D3",  # S (light green)
    "#C7CEEA",  # G2/M (lavender)
    "#FFD93D",  # Polyploid (yellow)
    "#6C757D",  # Additional colors if needed
]

# Apply custom colors
combplot_simple(
    df=df,
    conditions=conditions[:3],
    condition_col="condition",
    selector_col="cell_line",
    selector_val="MCF10A",
    title="Custom Color Scheme Example",
    colors=custom_colors,
    cell_number=3000,
    save=True,
    path=path,
)