# MAP Analysis Pipeline

This notebook provides a comprehensive workflow for running MAP (Molecular ALS Phenotype) analysis on imaging data. It combines model training with detailed explanations of each step in the pipeline.

## Overview

The MAP analysis framework enables classification of cell lines based on morphological imaging features. This notebook demonstrates:

1. **Data Loading**: Using `ImageScreenMultiAntibody` to load multi-marker imaging data
2. **Preprocessing**: Quality control and feature engineering steps
3. **Model Training**: Training classification models with leave-one-out cross-validation
4. **Evaluation**: Generating predictions and assessing model performance
5. **Post-hoc Analysis**: Adjusting for technical covariates and visualizing results

## 1. Setup and Configuration

In [4]:
# ---- Analysis Parameters ----
SCREEN = "20250216_AWALS37_Full_screen_n96"
ANALYSIS = "binary"  # Analysis type: binary_loocv, multiclass, etc.
MARKER = "all"  # "all" or specific marker name to filter
ANTIBODY = "FUS/EEA1"  # Can be single antibody or list of antibodies

# ---- Color Palette for Visualizations ----
PALETTE = {
    "WT": "#9A9A9A",
    "FUS": "#B24745",
    "C9orf72": "#6A6599",
    "sporadic": "#79AF97",
    "SOD1": "#00A1D5",
    "TDP43": "#DF8F44"
}

In [2]:
import os
import random
import numpy as np
import torch

# ---- Set seeds for reproducibility ----
SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

print("Random seeds set for reproducibility")

Random seeds set for reproducibility


## 2. Load Analysis Parameters

MAP analysis parameters are stored in JSON configuration files that specify:

- **Screen information**: Dataset paths and metadata
- **Preprocessing steps**: Feature selection, transformations, and quality control
- **Model configuration**: Architecture, training parameters, and evaluation strategy
- **Analysis settings**: Cross-validation scheme, response variables, and grouping factors

Parameters are loaded as a dict and passed to `maps` classes to specify the analysis configuration. Parameters can be changed within your Python script. For example, in the block below, we manually override the `screen` and `antibodies` fields and, when a specific marker is requested, extend the drop pattern in the `preprocess` field.
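
The override pattern can be tried in isolation with the minimal sketch below. The parameter dict, base pattern, and feature names are hypothetical stand-ins rather than values from the real parameter file, and the sketch assumes `feature_str` is treated as a regular expression matched against feature names. Under that assumption, appending a marker pattern adds every feature whose name contains the marker to the drop set.

```python
import copy
import re

# Hypothetical parameter dict; the real JSON file has more fields.
base_params = {
    "screen": None,
    "antibodies": None,
    "preprocess": {"drop_feature_types": {"feature_str": "^.*Sum.*$"}},
}


def override_params(params, screen, antibodies, marker="all"):
    """Return a modified copy so the loaded defaults stay untouched."""
    out = copy.deepcopy(params)
    out["screen"] = screen
    out["antibodies"] = antibodies
    if marker != "all":
        # re.escape guards against markers containing regex metacharacters
        pattern = out["preprocess"]["drop_feature_types"]["feature_str"]
        pattern += f"|^.*{re.escape(marker)}.*$"
        out["preprocess"]["drop_feature_types"]["feature_str"] = pattern
    return out


params_demo = override_params(base_params, "demo_screen", "FUS/EEA1", marker="Alexa_647")
drop_re = re.compile(params_demo["preprocess"]["drop_feature_types"]["feature_str"])

# Hypothetical feature names used only to exercise the pattern
features = [
    "Intensity_Nucleus_Region_Alexa_488_Mean",
    "Intensity_Cell_Region_Alexa_647_Sum",
    "Cell_Region_Alexa_647_Profile_1/5",
]
dropped = [name for name in features if drop_re.search(name)]
print(dropped)
```

Working on a deep copy keeps the defaults loaded from disk intact, which makes it easier to run several configurations in one session.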

In [5]:
from maps.screens import ImageScreenMultiAntibody
import json
from pathlib import Path

# --- Initialize parameters ---
pdir = Path("/home/kkumbier/als/scripts/pipelines/params")
with open(pdir / f"{ANALYSIS}.json", "r") as f:
    params = json.load(f)

params["screen"] = SCREEN
params["antibodies"] = ANTIBODY

# Update marker if specified
if MARKER != "all":
    fstr = params["preprocess"]["drop_feature_types"]["feature_str"]
    fstr += f"|^.*{MARKER}.*$"
    params["preprocess"]["drop_feature_types"]["feature_str"] = fstr

# Display configuration
print("Analysis Configuration:")
print(json.dumps(params, indent=4))

Analysis Configuration:
{
    "name": "binary",
    "description": "Binary classification analysis. Models trained on classify a single ALS genetic background vs health. Genetic-specific models applied to all ALS genetic backgrounds in eval set.",
    "screen": "20250216_AWALS37_Full_screen_n96",
    "antibodies": "FUS/EEA1",
    "root": "/awlab/projects/2024_ALS/Experiments",
    "data_file": "Objects_Population - Nuclei Selected.txt",
    "eval_dir": "Evaluation1",
    "result_dir": "/home/kkumbier/als/analysis_results",
    "preprocess": {
        "drop_na_features": {
            "na_prop": 0.1
        },
        "drop_sample_by_feature": {
            "drop_key": [
                {
                    "CellLines": [
                        "C9014",
                        "NS048",
                        "FTD37"
                    ]
                },
                {
                    "Mutations": [
                        "TDP43"
                    ]
                }
    

## 3. Initialize Screen and Load Data


The `ImageScreenMultiAntibody` class is designed to handle data I/O for multiple antibody markers simultaneously, allowing for multi-modal analysis. Each marker set is processed independently during preprocessing but can be integrated during model training. 

The `ImageScreenMultiAntibody` class provides utilities for:

- **Listing available markers**: Scan the dataset to identify all available antibody combinations
- **Loading marker data**: Load imaging features for specified antibody sets
- **Managing metadata**: Track cell line information, mutation status, and experimental conditions

Data and metadata for multi-antibody screens are stored as Python dictionaries, keyed by marker set names. This structure allows flexible analysis of single or multiple markers. *Note:* previous versions of `maps` used the `ImageScreen` class. While this class can still be used for single marker set analysis, we recommend using `ImageScreenMultiAntibody` with a single marker set to keep the data and metadata formats consistent.
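
As a rough illustration of the dictionary layout described above, the sketch below keys small stand-in tables by marker set name. The helper names (`as_list`, `rows_per_marker`) and the toy rows are hypothetical and are not part of `maps`; real tables are data frames with thousands of features.

```python
def as_list(antibodies):
    """Accept a single marker set name or a list of names."""
    if isinstance(antibodies, (list, tuple)):
        return list(antibodies)
    return [antibodies]


# Hypothetical stand-ins for per-marker feature tables (one dict per cell)
screen_data = {
    "FUS/EEA1": [
        {"ID": "well_1", "Number_of_Spots": 41.0},
        {"ID": "well_1", "Number_of_Spots": 13.0},
    ],
    "p62/LC3": [
        {"ID": "well_2", "Number_of_Spots": 7.0},
    ],
}


def rows_per_marker(data, antibodies):
    """Count rows for each requested marker set."""
    return {ab: len(data[ab]) for ab in as_list(antibodies)}


print(rows_per_marker(screen_data, "FUS/EEA1"))
print(rows_per_marker(screen_data, ["FUS/EEA1", "p62/LC3"]))
```

Normalizing the antibody argument to a list once means downstream code can use the same loop for single-marker and multi-marker analyses.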

In [7]:
# Initialize screen class
screen = ImageScreenMultiAntibody(params)

# Display available antibody combinations
print("Available antibody combinations in dataset:")
available_antibodies = screen.loader.list_antibodies().unique()
for ab in available_antibodies:
    print(f"  - {ab}")

Available antibody combinations in dataset:
  - COX IV/Galectin3/atubulin
  - Rab1/CHMP2B
  - pTDP43/HMOX1
  - CD63/SEC16A
  - LAMP/TDP43-C
  - HSP70/SOD1
  - FUS/EEA1
  - TDP43_abcam/G3BP1
  - p62/LC3


In [8]:
# Load data for specified antibody/antibodies
print(f"\nLoading data for: {params['antibodies']}")
screen.load(antibody=params["antibodies"])

# Display loaded data structure
print("\nLoaded data structure:")
print(f"Data type: {type(screen.data)}")
if isinstance(screen.data, dict):
    for ab, data in screen.data.items():
        print(f"  {ab}: {data.shape if hasattr(data, 'shape') else type(data)}")


Loading data for: FUS/EEA1

Loaded data structure:
Data type: <class 'dict'>
  FUS/EEA1: (236027, 4723)



## 4. Data Preprocessing

Preprocessing transforms raw imaging features into analysis-ready data. The preprocessing pipeline typically includes:

### Quality Control Steps:
- **`drop_sample_by_feature`**: Remove cell lines with abnormal characteristics (e.g., unusually low cell counts)
- **`drop_cells_by_feature_qt`**: Filter outlier cells based on quantile thresholds for size metrics
  - Removes cells below the 5th or above the 95th percentile of nuclear/cell region size
  - Helps screen out segmentation artifacts and debris
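
The quantile filter described above can be sketched with the standard library alone. This is an illustration of the idea, not the `maps` implementation: the function name, the `Nucleus_Area` feature, and the inclusive handling of the boundary values are all assumptions.

```python
import statistics


def drop_cells_by_quantile(rows, feature, lower=5, upper=95):
    """Keep rows whose `feature` lies between the lower and upper percentiles."""
    values = [row[feature] for row in rows]
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower - 1], cuts[upper - 1]
    return [row for row in rows if lo <= row[feature] <= hi]


# Hypothetical size feature with values 0, 1, ..., 100
cells = [{"Nucleus_Area": float(a)} for a in range(101)]
kept = drop_cells_by_quantile(cells, "Nucleus_Area")
print(len(cells), "->", len(kept))
```

In practice the thresholds would be computed per size metric, and a cell would be dropped if it falls outside the range for any of them.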

### Feature Engineering:
- **`select_feature_types`**: Filter to specific feature categories (e.g., intensity features only)
- **`drop_feature_types`**: Remove unwanted feature types (e.g., segmentation channels)

### Sampling:
- **`subsample_rows_by_id`**: Balanced sampling of cells per well
  - **Critical step** to prevent training biases
  - Ensures equal representation of each cell line
  - Without this, over-represented cell lines can dominate the model
  - Some models (e.g., MultiAntibodyClassifier) handle class balancing internally, so this step is not required when using them.
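
A minimal sketch of balanced subsampling, assuming each row carries an `ID` column and that every group is sampled down to the size of the smallest group. The function name and this sampling rule are assumptions made for illustration; the actual `subsample_rows_by_id` step may use a fixed sample size or a different grouping key.

```python
import random
from collections import Counter, defaultdict


def subsample_rows_by_id_sketch(rows, id_key="ID", seed=42):
    """Sample the same number of rows from every ID group."""
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[id_key]].append(row)
    n = min(len(group) for group in by_id.values())
    sampled = []
    for _, group in sorted(by_id.items()):
        sampled.extend(rng.sample(group, n))
    return sampled


# Hypothetical wells with very different cell counts
rows = [{"ID": "well_A", "cell": i} for i in range(10)]
rows += [{"ID": "well_B", "cell": i} for i in range(3)]

balanced = subsample_rows_by_id_sketch(rows)
group_sizes = Counter(row["ID"] for row in balanced)
print(group_sizes)
```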

**Note**: The exact preprocessing steps should be tailored to your specific dataset. For example, cell lines flagged in QC may differ across experiments.

In [9]:
print("Processing data...")
screen.preprocess()
assert screen.data is not None, "Data loading or preprocessing failed"

# Display processed data information
print("\nProcessed data summary:")
for ab in params["antibodies"] if isinstance(params["antibodies"], list) else [params["antibodies"]]:
    print(f"\nMarker set: {ab}")
    print(f"Data shape: {screen.data[ab].shape}")
    print(f"Features: {screen.data[ab].columns[:10]}...")  # Show first 10 features
    
    # Show example of transformed data
    if hasattr(screen.data[ab], 'head'):
        print(f"\nSample data:")
        display(screen.data[ab].head())

Processing data...
Preprocessing complete

Processed data summary:

Marker set: FUS/EEA1
Data shape: (115816, 315)
Features: ['Total_Spot_Area', 'Relative_Spot_Intensity', 'Number_of_Spots', 'Number_of_Spots_per_Area_of_Cell', 'Total_Spot_Area_(2)', 'Relative_Spot_Intensity_(2)', 'Number_of_Spots_(2)', 'Number_of_Spots_per_Area_of_Cell_(2)', 'Spot1_overlap_spot2_ROI_Border_Distance_[µm]', 'Spot1_overlap_spot2_Overlap_[%]']...

Sample data:


Total_Spot_Area,Relative_Spot_Intensity,Number_of_Spots,Number_of_Spots_per_Area_of_Cell,Total_Spot_Area_(2),Relative_Spot_Intensity_(2),Number_of_Spots_(2),Number_of_Spots_per_Area_of_Cell_(2),Spot1_overlap_spot2_ROI_Border_Distance_[µm],Spot1_overlap_spot2_Overlap_[%],Spot2_overlap_spot1_ROI_Border_Distance_[µm],Spot2_overlap_spot1_Overlap_[%],Intensity_Nucleus_Region_Alexa_488_Mean,Intensity_Nucleus_Region_Alexa_488_StdDev,Intensity_Nucleus_Region_Alexa_488_Median,Intensity_Nucleus_Region_Alexa_488_Maximum,Intensity_Nucleus_Region_Alexa_488_Minimum,Intensity_Nucleus_Region_Alexa_488_CV_[%],Intensity_Nucleus_Region_Alexa_488_Quantile_90%,Intensity_Nucleus_Region_Alexa_488_Contrast,Nucleus_Region_Alexa_488_Symmetry_02_SER-Spot,Nucleus_Region_Alexa_488_Symmetry_03_SER-Spot,Nucleus_Region_Alexa_488_Symmetry_04_SER-Spot,Nucleus_Region_Alexa_488_Symmetry_05_SER-Spot,Nucleus_Region_Alexa_488_Symmetry_12_SER-Spot,Nucleus_Region_Alexa_488_Symmetry_13_SER-Spot,Nucleus_Region_Alexa_488_Symmetry_14_SER-Spot,Nucleus_Region_Alexa_488_Symmetry_15_SER-Spot,Nucleus_Region_Alexa_488_Threshold_Compactness_30%_SER-Spot,Nucleus_Region_Alexa_488_Threshold_Compactness_40%_SER-Spot,Nucleus_Region_Alexa_488_Threshold_Compactness_50%_SER-Spot,Nucleus_Region_Alexa_488_Threshold_Compactness_60%_SER-Spot,Nucleus_Region_Alexa_488_Axial_Small_Length_SER-Spot,Nucleus_Region_Alexa_488_Axial_Length_Ratio_SER-Spot,Nucleus_Region_Alexa_488_Radial_Mean_SER-Spot,Nucleus_Region_Alexa_488_Radial_Relative_Deviation_SER-Spot,Nucleus_Region_Alexa_488_Radial_Mean_Ratio_SER-Spot,…,Cell_Region_Alexa_647_Profile_1/5_SER-Spot,Cell_Region_Alexa_647_Profile_2/5_SER-Spot,Cell_Region_Alexa_647_Profile_3/5_SER-Spot,Cell_Region_Alexa_647_Profile_4/5_SER-Spot,Cell_Region_Alexa_647_Profile_5/5_SER-Spot,Cell_Region_Alexa_647_SER_Spot_2_px,Intensity_Membrane_Region_Alexa_647_Mean,Intensity_Membrane_Region_Alexa_647_StdDev,Intensity_Membrane_Region_Alexa_647_Median,Intensity_Membrane_Region_Alexa_647_Maximum,Intensity_Membrane_Region_Alexa_647_Minimum,Intensity_Membrane_Region_Alexa_647_CV_[%],Intensity_Membrane_Region_Alexa_647_Quantile_90%,Intensity_Membrane_Region_Alexa_647_Contrast,Membrane_Region_Alexa_647_Symmetry_02_SER-Spot,Membrane_Region_Alexa_647_Symmetry_03_SER-Spot,Membrane_Region_Alexa_647_Symmetry_04_SER-Spot,Membrane_Region_Alexa_647_Symmetry_05_SER-Spot,Membrane_Region_Alexa_647_Symmetry_12_SER-Spot,Membrane_Region_Alexa_647_Symmetry_13_SER-Spot,Membrane_Region_Alexa_647_Symmetry_14_SER-Spot,Membrane_Region_Alexa_647_Symmetry_15_SER-Spot,Membrane_Region_Alexa_647_Threshold_Compactness_30%_SER-Spot,Membrane_Region_Alexa_647_Threshold_Compactness_40%_SER-Spot,Membrane_Region_Alexa_647_Threshold_Compactness_50%_SER-Spot,Membrane_Region_Alexa_647_Threshold_Compactness_60%_SER-Spot,Membrane_Region_Alexa_647_Axial_Small_Length_SER-Spot,Membrane_Region_Alexa_647_Axial_Length_Ratio_SER-Spot,Membrane_Region_Alexa_647_Radial_Mean_SER-Spot,Membrane_Region_Alexa_647_Radial_Relative_Deviation_SER-Spot,Membrane_Region_Alexa_647_Radial_Mean_Ratio_SER-Spot,Membrane_Region_Alexa_647_Profile_1/5_SER-Spot,Membrane_Region_Alexa_647_Profile_2/5_SER-Spot,Membrane_Region_Alexa_647_Profile_4/5_SER-Spot,Membrane_Region_Alexa_647_Profile_5/5_SER-Spot,Membrane_Region_Alexa_647_SER_Spot_2_px,ID
f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,…,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,str
512.0,0.149164,41.0,0.006391,188.0,0.08494,16.0,0.002494,2.7e-05,0.0,0.001764,0.0,10400.900391,3543.330078,11336.0,17466.0,1410.0,34.067501,14177.0,0.808195,0.313642,0.046911,0.234682,0.099687,0.436055,0.047939,0.279426,0.094696,0.468053,0.535269,0.611724,0.800864,5.98029,0.556865,8.39812,0.266757,1.22766,…,0.003343,0.003031,0.00699,0.001993,0.001788,0.003135,1016.820007,1129.319946,788.0,27328.0,661.0,111.064003,1256.0,0.10059,0.212519,0.116577,0.235368,0.291751,0.220795,0.319191,0.355421,0.233324,0.670847,0.781579,0.81224,0.980669,26.782,0.760706,28.182699,0.481481,0.774853,0.002758,0.004845,0.001993,0.001788,0.003628,"""2024042020-1-14"""
349.0,0.435725,13.0,0.012393,70.0,0.139137,5.0,0.004766,0.0,1.71429,0.0,8.33333,13254.5,4941.77002,13780.0,24550.0,1339.0,37.283798,19429.0,0.823481,0.684141,0.169644,0.536281,0.155983,0.790972,0.1417,0.612462,0.174325,0.66465,0.759584,0.773102,0.882557,4.21088,0.283199,10.4871,0.292833,1.27339,…,0.00521,0.008293,0.031321,0.002485,0.001832,0.00536,1139.569946,969.562012,830.0,10576.0,682.0,85.0812,1766.0,0.20256,0.844228,0.305572,0.473147,0.216928,0.879301,0.080754,0.574882,0.271605,0.913502,1.02333,1.33985,1.4472,6.68956,0.211631,21.1597,0.407151,0.883491,0.006014,0.005057,0.002485,0.001832,0.005913,"""2024042020-1-14"""
688.0,0.305056,59.0,0.012661,227.0,0.11754,17.0,0.003648,0.0,10.9955,0.0,32.888901,23440.800781,7535.009766,23525.0,38543.0,4856.0,32.144901,33355.0,0.823963,0.513305,0.140395,0.246605,0.232122,0.636819,0.18323,0.334424,0.262484,0.60567,0.644529,0.699411,0.746779,5.09798,0.409261,9.10249,0.305293,1.23878,…,0.004804,0.003856,0.009317,0.006153,0.003562,0.004313,1112.459961,926.325012,832.0,13606.0,657.0,83.268097,1550.0,0.021207,0.249385,0.14285,0.237429,0.137547,0.344329,0.254618,0.3041,0.305506,0.528053,0.655374,0.753148,1.30635,22.473301,0.584209,28.4489,0.475604,0.843254,0.004473,0.005786,0.006153,0.003562,0.005004,"""2024042020-1-14"""
396.0,0.414577,10.0,0.009891,43.0,0.136631,3.0,0.002967,2.7e-05,0.0,0.001764,0.0,17405.699219,5809.810059,18576.0,28899.0,3153.0,33.378899,23902.0,0.820276,0.409822,0.002809,0.315248,0.058833,0.517627,0.003025,0.311395,0.06049,0.561493,0.613542,0.647016,0.679393,5.80487,0.508376,8.80714,0.240144,1.19951,…,0.006879,0.002675,0.002362,0.005182,0.001492,0.00414,1365.199951,1995.930054,868.0,22654.0,713.0,146.201004,1501.0,0.149828,0.355892,0.192464,0.25675,0.495027,0.413549,0.275776,0.20587,0.473326,1.25331,1.77245,2.50663,2.50663,9.07609,0.559442,12.3416,0.36662,0.808496,0.004771,0.00151,0.005182,0.001492,0.004629,"""2024042020-1-14"""
748.0,0.348293,64.0,0.013649,192.0,0.176876,13.0,0.002772,0.0,2.93333,0.0,11.5789,21588.300781,7086.779785,22797.0,34266.0,2810.0,32.8269,29848.0,0.858209,0.462546,0.165044,0.383288,0.248984,0.563666,0.132431,0.410202,0.224595,0.528943,0.621959,0.696286,0.798623,5.65664,0.468261,9.24212,0.203756,1.23241,…,0.002496,0.002903,0.015643,0.000204,0.001138,0.003003,846.544006,534.426025,772.0,9948.0,662.0,63.130299,847.0,-0.192966,0.878802,0.727683,0.829698,0.746359,0.931813,0.747422,0.86358,0.747939,0.795413,0.87511,0.884692,1.02029,8.71173,0.147244,38.924599,0.424549,1.15596,0.002844,0.000996,0.000204,0.001138,0.001941,"""2024042020-1-14"""


## 5. Model Training

The `MAP` class provides a unified interface for model training and evaluation:

- **Initialization**: The MAP class is initialized with a `Screen` object
- **Configuration**: Training parameters (model type, fitting strategy) are read from the screen's parameter file
- **Fitting**: The `fit()` method executes the complete training workflow

### Model Types:
- **Single-marker models**: Traditional ML models (logistic regression, random forest, etc.)
- **Multi-marker models**: PyTorch-based models that integrate data across markers
  - Allows flexible integration strategies (concatenation, attention mechanisms, etc.)
  - Suitable for learning marker interactions and multi-modal representations

### Training Strategy:
- **Leave-one-out cross-validation (LOOCV)**: Each mutation group is held out in turn
- **Sample split**: Cell lines are divided into two sets with equal class representation. Models are trained on one set and evaluated on the other; the process is then repeated with the training and evaluation sets swapped.
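
Both strategies can be sketched in a few lines. The helpers below are hypothetical and only illustrate how the splits are formed; they are not the fitting code in `maps`, and the cell line names are made up.

```python
import random
from collections import Counter, defaultdict


def leave_one_group_out(groups):
    """Yield (remaining groups, held-out group) for each group in turn."""
    for held_out in groups:
        yield [g for g in groups if g != held_out], held_out


def split_cell_lines(line_to_class, seed=42):
    """Split cell lines into two sets with equal counts per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for line, cls in sorted(line_to_class.items()):
        by_class[cls].append(line)
    set_a, set_b = [], []
    for cls in sorted(by_class):
        lines = by_class[cls]
        rng.shuffle(lines)
        half = len(lines) // 2
        set_a.extend(lines[:half])
        set_b.extend(lines[half:])
    return set_a, set_b


# Hypothetical cell lines: four WT and four FUS
line_to_class = {f"WT_{i}": "WT" for i in range(1, 5)}
line_to_class.update({f"FUS_{i}": "FUS" for i in range(1, 5)})

set_a, set_b = split_cell_lines(line_to_class)
# Train on A and evaluate on B, then swap the roles
folds = [(set_a, set_b), (set_b, set_a)]
class_counts_a = Counter(line_to_class[line] for line in set_a)
class_counts_b = Counter(line_to_class[line] for line in set_b)
loo_folds = list(leave_one_group_out(["FUS", "SOD1", "C9orf72"]))
```

Splitting at the cell line level, rather than the cell level, keeps cells from the same line out of both the training and evaluation sets at once.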

In [10]:
from maps.analyses import MAP

print("Initializing MAP analysis...")
map_analysis = MAP(screen)

print("Training model...")
print("This may take several minutes depending on dataset size and model complexity.\n")
map_analysis.fit()

print("\nModel training complete!")

Initializing MAP analysis...
Training model...
This may take several minutes depending on dataset size and model complexity.

Training SOD1...
--- Replicate 1/3 ---


TypeError: create_multiantibody_dataloader() got an unexpected keyword argument 'select_samples'

## 6. Examine Predictions

The fitted MAP object contains predictions for each mutation group. For single-marker models, predictions are organized by mutation type. For multi-marker models, predictions may be stored differently depending on the model architecture.

### Prediction Structure:
- **Cell-level predictions**: Individual MAP scores for each cell
- **Metadata**: Cell line, mutation status, well information
- **Model outputs**: Predicted probabilities or class labels

These predictions can be aggregated and analyzed at various levels (cell, well, cell line) for downstream analysis.
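
As a small illustration of aggregation, the sketch below averages cell-level scores at the well and cell line levels. The column names follow those used elsewhere in this notebook (`Ypred`, `Well`, `CellLines`), but the rows are made up, and using the mean as the summary statistic is an assumption.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(rows, keys, score="Ypred"):
    """Average `score` over rows that share the same values for `keys`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[k] for k in keys)].append(row[score])
    return {group: mean(values) for group, values in buckets.items()}


# Hypothetical cell-level predictions
cell_predictions = [
    {"CellLines": "line_1", "Well": "A1", "Ypred": 0.25},
    {"CellLines": "line_1", "Well": "A1", "Ypred": 0.75},
    {"CellLines": "line_1", "Well": "A2", "Ypred": 1.0},
    {"CellLines": "line_2", "Well": "B1", "Ypred": 0.5},
]

by_well = aggregate_scores(cell_predictions, ["CellLines", "Well"])
by_line = aggregate_scores(cell_predictions, ["CellLines"])
print(by_well)
print(by_line)
```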

In [None]:
# Display prediction structure
print("Prediction structure:")
print(f"Type: {type(map_analysis.fitted)}")

# For single-marker LOOCV models
if isinstance(map_analysis.fitted, dict) and "predicted" not in map_analysis.fitted:
    print(f"\nMutation groups analyzed: {list(map_analysis.fitted.keys())}")
    
    # Show example predictions for first mutation group
    first_mut = list(map_analysis.fitted.keys())[0]
    print(f"\nExample predictions for {first_mut}:")
    if "predicted" in map_analysis.fitted[first_mut]:
        display(map_analysis.fitted[first_mut]["predicted"].head())
        
        # Show model parameters
        if hasattr(map_analysis.model, 'model') and hasattr(map_analysis.model.model, 'params'):
            print(f"\nModel parameters:")
            print(map_analysis.model.model.params)

# For multi-marker models
elif isinstance(map_analysis.fitted, dict) and "predicted" in map_analysis.fitted:
    print("\nMulti-marker model predictions:")
    display(map_analysis.fitted["predicted"].head(24))

## 7. Post-hoc Analysis: Count Adjustment

MAP scores can be influenced by technical factors such as cell count per well. To adjust for these effects:

### Count Adjustment Process:
1. **Aggregate to well-level**: Group cell-level predictions by well
2. **Fit size model**: Estimate MAP scores as a function of cell count
3. **Adjust predictions**: Remove the estimated count effect from raw MAP scores

This adjustment helps isolate biological signal from technical variation, improving the reliability of downstream comparisons.

### When to Apply:
- When cell counts vary substantially across wells
- When quality control reveals count-dependent bias in MAP scores
- Before making quantitative comparisons between cell lines or conditions

In [None]:
import pandas as pd
import polars as pl
from maps.utils import group_predicted, fit_size_model, adjust_map_scores

# Only applicable for LOOCV single-marker models
if isinstance(map_analysis.fitted, dict) and "predicted" not in map_analysis.fitted:
    adjusted = {}
    groups = ["CellLines", "Mutations", "Well"] 
    
    print("Performing count adjustment for each mutation group...\n")
    
    for k, v in map_analysis.fitted.items():
        print(f"Processing {k}...")
        
        # Group predictions by well
        grouped_pred = group_predicted(v["predicted"], groups, "Ypred")
        
        # Merge with cell counts
        counts = screen.metadata.select(["Well", "NCells"]).to_pandas()
        df = pd.merge(grouped_pred, counts, on="Well") 
        
        # Fit size adjustment model
        model, X = fit_size_model(df)
        
        # Apply adjustment
        adjusted[k] = adjust_map_scores(df, X, model)
        adjusted[k]["Group"] = k
    
    print("\nCount adjustment complete!")
    print(f"\nExample adjusted predictions for {list(adjusted.keys())[0]}:")
    display(adjusted[list(adjusted.keys())[0]].head())
else:
    print("Count adjustment is typically applied to LOOCV single-marker models.")
    print("For multi-marker models, adjustments may be handled differently.")

## 8. Visualization

Visualize MAP scores and count adjustment effects:

### Visualization Types:
- **Grouped plots**: MAP scores by cell line, colored by mutation status
- **Adjustment plots**: Before/after comparison showing count correction
- **Distribution plots**: Cell line variability and statistical differences

These visualizations help assess:
- Model performance and discrimination ability
- Impact of count adjustment
- Biological patterns and outliers

In [None]:
from maps.figures import plot_grouped, plot_map_adjustment

# Only create visualizations for LOOCV single-marker models
if isinstance(map_analysis.fitted, dict) and "predicted" not in map_analysis.fitted:
    for k, v in map_analysis.fitted.items():
        print(f"\n=== Visualizations for {k} ===")
        
        # Filter to relevant mutations
        raw_pred = v["predicted"] \
            .filter(pl.col("Mutations").is_in([k, "WT"]))
        
        adj_pred = pl.DataFrame(adjusted[k]) \
            .filter(pl.col("Mutations").is_in([k, "WT"]))
        
        # Plot raw MAP scores
        print(f"\nRaw MAP scores (before count adjustment):")
        plot_grouped(
            df=raw_pred, 
            y="Ypred",
            x="CellLines",
            hue="Mutations",
            ylab="MAP score",
            palette=PALETTE
        )
        
        # Plot count adjustment model
        print(f"\nCount adjustment model:")
        grouped_pred = group_predicted(raw_pred, groups, "Ypred")
        counts = screen.metadata.select(["Well", "NCells"]).to_pandas()
        df = pd.merge(grouped_pred, counts, on="Well") 
        model, X = fit_size_model(df)
        
        plot_map_adjustment(
            df=df, 
            model=model, 
            X=X, 
            sporadics=False
        )  
        
        # Plot adjusted MAP scores
        print(f"\nAdjusted MAP scores (after count adjustment):")
        plot_grouped(
            df=adj_pred, 
            y="Score",
            x="CellLines",
            hue="Mutations",
            ylab="Adjusted MAP score",
            palette=PALETTE
        )
else:
    print("Visualizations shown are for LOOCV single-marker models.")
    print("For multi-marker models, create custom visualizations based on model outputs.")

## 9. Save Results

Save the trained model and predictions for downstream analysis or deployment.

### Saved Components:
- **MAP analysis object**: Complete fitted model with all metadata
- **Parameters**: Configuration used for this analysis
- **Predictions**: Can be saved separately for easier access

Results are organized by screen name and analysis type for easy retrieval.
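
The naming scheme and a save/load round trip can be checked with the sketch below. The `result_path` helper and the placeholder payload are hypothetical; a real payload holds the fitted `MAP` object, so the `maps` package must be importable when the file is loaded. Only unpickle files from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path


def result_path(result_dir, screen, analysis, antibody, marker):
    """Build <result_dir>/<screen>/<analysis>-<antibodies>-<marker>.pkl."""
    antibodies = antibody if isinstance(antibody, list) else [antibody]
    ab_string = "_".join(antibodies).replace("/", "-")
    return Path(result_dir) / screen / f"{analysis}-{ab_string}-{marker}.pkl"


with tempfile.TemporaryDirectory() as tmp:
    path = result_path(tmp, "demo_screen", "binary", "FUS/EEA1", "all")
    path.parent.mkdir(parents=True, exist_ok=True)

    # Placeholder payload standing in for the analysis object and parameters
    payload = {"analysis": {"fitted": "placeholder"}, "params": {"screen": "demo_screen"}}
    with open(path, "wb") as f:
        pickle.dump(payload, f)

    with open(path, "rb") as f:
        restored = pickle.load(f)

print(path.name)
```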

In [None]:
import pickle

# Create antibody string for filename
if isinstance(ANTIBODY, list):
    ab_string = "_".join(ANTIBODY).replace("/", "-")
else:
    ab_string = ANTIBODY.replace("/", "-")

# Set output directory
output_dir = Path(params.get("result_dir", "./results")) / params.get("screen")
output_dir.mkdir(parents=True, exist_ok=True)

# Save analysis object and parameters
output_file = output_dir / f"{ANALYSIS}-{ab_string}-{MARKER}.pkl"
with open(output_file, "wb") as f:
    pickle.dump({"analysis": map_analysis, "params": params}, f)

print(f"Results saved to: {output_file}")

# Optionally save predictions as CSV for easier access
if isinstance(map_analysis.fitted, dict) and "predicted" in map_analysis.fitted:
    pred_file = output_dir / f"{ANALYSIS}-{ab_string}-{MARKER}-predictions.csv"
    map_analysis.fitted["predicted"].write_csv(pred_file)
    print(f"Predictions saved to: {pred_file}")

## Summary

This notebook walked through the complete MAP analysis pipeline:

1. ✓ Configured analysis parameters and set random seeds
2. ✓ Loaded multi-antibody imaging data
3. ✓ Preprocessed features with quality control and feature selection
4. ✓ Trained classification models with cross-validation
5. ✓ Generated and examined predictions
6. ✓ Adjusted for technical covariates (cell count)
7. ✓ Visualized results and model performance
8. ✓ Saved models and predictions for future use

### Next Steps:

- **Post-hoc marker analysis**: Identify specific markers driving classification (see `posthoc_markers.ipynb`)
- **iMAP analysis**: Aggregate predictions across markers for improved performance (see `posthoc_imaps.ipynb`)
- **Model comparison**: Test different architectures or hyperparameters
- **External validation**: Apply trained models to held-out test sets

### Key Considerations:

- **Preprocessing is critical**: Customize steps based on your QC findings
- **Balanced sampling matters**: Use `subsample_rows_by_id` to prevent bias unless the model balances classes itself (e.g., MultiAntibodyClassifier)
- **Count adjustment helps**: Apply when cell counts vary substantially
- **Reproducibility**: Set random seeds and document all parameter choices