# DUSP1 Confirmation and Visualization Notebook

This notebook processes experimental data from Big-FISH and CellProperties CSV files to classify and visualize DUSP1 smiFISH spots. The analysis is modular, with dedicated classes for data loading, signal-to-noise (SNR) classification, measurement extraction, filtering, and display.

---

### **Input**
- Big-FISH CSV files (`spots`, `clusters`)
- CellProperties CSV files (`cell_props`, `cell_results`)

---

### **Workflow Overview**

1. **Load Experimental Data**
   - Use `DUSP1AnalysisManager` to identify and load datasets from HDF5 files or local CSVs.
   - Extract file paths from a log directory if not directly provided.

2. **Classify Spots by Signal Quality (SNR Analysis)**
   - Use `SNRAnalysis` to perform:
     - **Weighted SNR thresholding** using Big-FISH `'snr'` values with a percentile cutoff (e.g., 20th percentile within the range 2–5).
     - **Absolute thresholding** using a fixed cutoff on the Big-FISH `'snr'` value.
     - **MG SNR calculation**:  
       A more localized method that accounts for subcellular context. It computes:
       ```
       MG_SNR = (signal - mean) / std
       ```
       where `mean` and `std` are taken from the nuclear or cytoplasmic region of the same cell, depending on where the spot lies. Because the background is measured within the spot's own cell and compartment, the score reflects signal variation within individual cells.

   - Adds boolean flags and comparison columns (`MG_pass`, `Abs_pass`, `Weighted_pass`) to aid classification.

3. **Spot and Cell Measurement Extraction**
   - Use `DUSP1Measurement` to:
     - Quantify spot-level and cell-level metrics.
     - Support optional filtering using SNR results.
     - Append `unique_cell_id` for downstream aggregation and visualization.

4. **Filter and Save Processed Data**
   - Apply quality filters to retain only confident spots and cells.
   - Final outputs include:
     - `FinalSpots`
     - `FinalClusters`
     - `FinalCellProps`
     - `SSITcellresults`

5. **Visualization**
   - Use `DUSP1DisplayManager` to inspect and validate each analysis step.
   - View raw data, spot overlays, SNR distributions, and per-cell metrics.
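
The MG SNR rule from step 2 can be sketched with pandas. This is a minimal illustration, not the `SNRAnalysis` implementation: the helper `add_mg_snr` and the column names `signal`, `is_nuc`, `nuc_mean`, `nuc_std`, `cyto_mean`, and `cyto_std` are hypothetical, and treating the threshold as inclusive is an assumption.

```python
import numpy as np
import pandas as pd


def add_mg_snr(spots: pd.DataFrame, cells: pd.DataFrame, mg_threshold: float = 3.0) -> pd.DataFrame:
    """Attach per-cell background statistics to each spot and compute MG_SNR.

    Column names are hypothetical placeholders, not the pipeline's schema.
    """
    # Bring the per-cell nuclear and cytoplasmic statistics onto every spot row.
    merged = spots.merge(cells, on="unique_cell_id", how="left")

    # Use the nuclear background for nuclear spots, cytoplasmic otherwise.
    in_nuc = merged["is_nuc"].astype(bool)
    mean = np.where(in_nuc, merged["nuc_mean"], merged["cyto_mean"])
    std = np.where(in_nuc, merged["nuc_std"], merged["cyto_std"])

    # MG_SNR = (signal - mean) / std, evaluated row by row.
    merged["MG_SNR"] = (merged["signal"] - mean) / std
    merged["MG_pass"] = merged["MG_SNR"] >= mg_threshold  # inclusive cutoff is an assumption
    return merged


# Toy data: two cells, three spots.
cells = pd.DataFrame({
    "unique_cell_id": [1, 2],
    "nuc_mean": [100.0, 120.0], "nuc_std": [10.0, 20.0],
    "cyto_mean": [50.0, 60.0], "cyto_std": [5.0, 10.0],
})
spots = pd.DataFrame({
    "unique_cell_id": [1, 1, 2],
    "is_nuc": [True, False, False],
    "signal": [150.0, 60.0, 100.0],
})
result = add_mg_snr(spots, cells)
print(result[["unique_cell_id", "is_nuc", "MG_SNR", "MG_pass"]])
```

A region with zero standard deviation would produce infinite or NaN scores in this sketch, so real data may need a guard for that case.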

---

### **Core Classes**

- **`DUSP1AnalysisManager`**  
  Loads HDF5 datasets or existing CSVs. Manages file paths and FOV indexing. Exports intermediate and final data to CSV.

- **`SNRAnalysis`**  
  Performs three types of signal quality assessment:
  - **Weighted**: percentile-based filtering on Big-FISH `'snr'`.
  - **Absolute**: fixed-threshold filtering on Big-FISH `'snr'`.
  - **MG SNR**: cell-aware SNR using cytoplasmic or nuclear mean and std from CellProperties.  
  Adds classification flags and comparison columns.

- **`DUSP1Measurement`**  
  Aggregates spot and cluster information to the cell level. Handles integration of filtered data for downstream analysis.

- **`DUSP1DisplayManager`**  
  Offers visual inspection tools across the full pipeline. Supports 2D/3D napari visualizations, crop display, and removal overlays.
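
For comparison with MG SNR, the two Big-FISH-based filters can be sketched as below. This is one possible reading of the description, not the class's code. It assumes the percentile is computed over the `'snr'` values that fall inside the 2–5 window, that spots below the window fail, and that spots above it pass. The function `add_snr_flags` and its parameter names are hypothetical.

```python
import numpy as np
import pandas as pd


def add_snr_flags(spots, abs_threshold=5.0, low=2.0, high=5.0, percentile=20.0):
    """Add Abs_pass and Weighted_pass columns based on the Big-FISH 'snr' column."""
    out = spots.copy()
    snr = out["snr"]

    # Absolute filter: a single fixed cutoff on 'snr'.
    out["Abs_pass"] = snr >= abs_threshold

    # Weighted filter (assumed reading): percentile of the values inside [low, high].
    in_window = snr.between(low, high)  # inclusive on both ends
    if in_window.any():
        cutoff = float(np.percentile(snr[in_window], percentile))
    else:
        cutoff = low  # nothing in the window, fall back to the lower bound

    # Above the window always passes; inside the window must reach the cutoff.
    out["Weighted_pass"] = (snr > high) | (in_window & (snr >= cutoff))
    return out, cutoff


spots = pd.DataFrame({"snr": [1.0, 2.0, 3.0, 4.0, 5.0, 8.0]})
flagged, cutoff = add_snr_flags(spots)
print(round(cutoff, 3))
print(flagged)
```

With this toy input the window holds the values 2, 3, 4, and 5, so the 20th percentile under NumPy's default linear interpolation is about 2.6.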

In [None]:
import h5py
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import dask.array as da
import os
import sys
import logging
import seaborn as sns
import datetime

# Today's date
today = datetime.date.today()
# Format date as 'Mar21' (for example)
date_str = today.strftime("%b%d")

logging.getLogger('matplotlib.font_manager').disabled = True
numba_logger = logging.getLogger('numba')
numba_logger.setLevel(logging.WARNING)

matplotlib_logger = logging.getLogger('matplotlib')
matplotlib_logger.setLevel(logging.WARNING)

src_path = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))
print(src_path)
sys.path.append(src_path)

from src.Analysis_DUSP1_v2 import (
    DUSP1AnalysisManager,
    SNRAnalysis,
    DUSP1Measurement,
    DUSP1_filtering,
    DUSP1DisplayManager,
)

**`DUSP1AnalysisManager`** 
   - Manages HDF5 file access.
   - Extracts file paths from a log directory (if no direct locations are provided).
   - Provides methods to select an analysis (by name) and load datasets from HDF5 files.
   - Saves datasets as CSV.

In [None]:
loc = None
log_location = r'/Volumes/share/Users/Eric/GR_DUSP1_reruns'
save_dir = r'/Volumes/share/Users/Eric/DUSP1_May2025'
os.makedirs(save_dir, exist_ok=True)

# Define filtering method and thresholds
method = 'mg'            # options: 'mg', 'absolute', 'mg_abs', 'weighted', 'rf', 'none'
abs_threshold = 5          # only used if method is 'absolute' or 'mg_abs'
mg_threshold = 3           # only used for MG_SNR filtering
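
A small guard can catch a mistyped `method` before a long run starts. The helper below is hypothetical and not part of `src.Analysis_DUSP1_v2`. The allowed names come from the comment in the cell above, and the mapping of which threshold each method needs is inferred from the same comments, so treat it as an assumption.

```python
VALID_METHODS = ("mg", "absolute", "mg_abs", "weighted", "rf", "none")


def check_filter_config(method, abs_threshold=None, mg_threshold=None):
    """Return the thresholds that apply to `method`, or raise on a bad configuration."""
    if method not in VALID_METHODS:
        raise ValueError(f"Unknown method {method!r}; expected one of {VALID_METHODS}")

    # Assumed mapping: which thresholds each method actually consumes.
    needed = {}
    if method in ("absolute", "mg_abs"):
        needed["abs_threshold"] = abs_threshold
    if method in ("mg", "mg_abs"):
        needed["mg_threshold"] = mg_threshold

    missing = [name for name, value in needed.items() if value is None]
    if missing:
        raise ValueError(f"Method {method!r} requires: {', '.join(missing)}")
    return needed


print(check_filter_config("mg", abs_threshold=5, mg_threshold=3))
```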

**`DUSP1 Replica D 3hr 100nM time-sweep R1`**

In [None]:
# 1. Create an instance of the DUSP1AnalysisManager class.
am = DUSP1AnalysisManager(location=loc, log_location=log_location, mac=True) 
am.select_analysis('DUSP1_D_Final')

# Load the datasets
spots_df = am.select_datasets("spotresults", dtype="dataframe")
clusters_df = am.select_datasets("clusterresults", dtype="dataframe")
props_df = am.select_datasets("cell_properties", dtype="dataframe")

print("Data loaded and moving to SNRAnalysis...")
# 2. Create an instance of the SNRAnalysis class.
snr_analysis = SNRAnalysis(spots_df, props_df, clusters_df, abs_threshold=abs_threshold, mg_threshold=mg_threshold)

merged_spots_df, merged_clusters_df, merged_cellprops_df = snr_analysis.get_results()

print("SNR analysis complete, data merged and moving to DUSP1Measurement...")
# 3. Create an instance of the DUSP1Measurement class.
dusp = DUSP1Measurement(merged_spots_df, merged_clusters_df, merged_cellprops_df)

# Process the data with a chosen threshold method
cell_level_results = dusp.measure(abs_threshold=abs_threshold, mg_threshold=mg_threshold)

# Add replica-level prefixes so 'unique_cell_id', 'unique_spot_id', and 'unique_cluster_id'
# stay unique when replicas are combined.
# Get the number of digits in the max unique_cell_id
max_id = merged_cellprops_df['unique_cell_id'].max()
num_digits = len(str(max_id))

# Put the replica digit in front of the existing ID. The digit is specific to each
# experiment (e.g., repD: 1, repE: 2). Multiplying the digit by a power of ten keeps the
# prefix correct for digits other than 1 (20 ** num_digits would not give 200000).
rep_digit = 1
prefix = rep_digit * 10 ** num_digits  # e.g., if max_id = 30245 → prefix = 100000

# Apply prefix to all related DataFrames
merged_spots_df['unique_cell_id'] += prefix
merged_clusters_df['unique_cell_id'] += prefix
merged_cellprops_df['unique_cell_id'] += prefix
cell_level_results['unique_cell_id'] += prefix

# Repeat for unique_spot_id and unique_cluster_id
max_spot_id = merged_spots_df['unique_spot_id'].max()
spot_prefix = rep_digit * 10 ** len(str(max_spot_id))
merged_spots_df['unique_spot_id'] += spot_prefix

max_cluster_id = merged_clusters_df['unique_cluster_id'].max()
cluster_prefix = rep_digit * 10 ** len(str(max_cluster_id))
merged_clusters_df['unique_cluster_id'] += cluster_prefix

# Save the intermediate results
rep_string = 'DUSP1_D_Final'
intermediate_dir = save_dir
os.makedirs(intermediate_dir, exist_ok=True)
# cell_level_results.to_csv(os.path.join(intermediate_dir, f"{rep_string}_cellresults_MG{mg_threshold}_Abs{abs_threshold}_{date_str}.csv"), index=False)
# merged_spots_df.to_csv(os.path.join(intermediate_dir, f"{rep_string}_Spots_MG{mg_threshold}_Abs{abs_threshold}_{date_str}.csv"), index=False)
# merged_clusters_df.to_csv(os.path.join(intermediate_dir, f"{rep_string}_Clusters_MG{mg_threshold}_Abs{abs_threshold}_{date_str}.csv"), index=False)
# merged_cellprops_df.to_csv(os.path.join(intermediate_dir, f"{rep_string}_CellProps_MG{mg_threshold}_Abs{abs_threshold}_{date_str}.csv"), index=False)
cell_level_results.to_csv(os.path.join(intermediate_dir, f"{rep_string}_cellresults_MG{mg_threshold}_{date_str}.csv"), index=False)
merged_spots_df.to_csv(os.path.join(intermediate_dir, f"{rep_string}_Spots_MG{mg_threshold}_{date_str}.csv"), index=False)
merged_clusters_df.to_csv(os.path.join(intermediate_dir, f"{rep_string}_Clusters_MG{mg_threshold}_{date_str}.csv"), index=False)
merged_cellprops_df.to_csv(os.path.join(intermediate_dir, f"{rep_string}_CellProps_MG{mg_threshold}_{date_str}.csv"), index=False)

print("Intermediate results saved, moving to filtering...")
# Initialize filtering object
filterer = DUSP1_filtering(method=method)

# Apply filtering and measurement
filtered_spots, filtered_clusters, filtered_cellprops, SSITcellresults, removed_spots = filterer.apply_all(
    spots=merged_spots_df,
    clusters=merged_clusters_df,
    cellprops=merged_cellprops_df
)

print("Filtering complete, saving results...")
# Save all results to CSV
output_dir = save_dir
os.makedirs(output_dir, exist_ok=True)
# SSITcellresults.to_csv(os.path.join(output_dir, f"{rep_string}_SSITcellresults_MG{mg_threshold}_Abs{abs_threshold}_{date_str}.csv"), index=False)
# filtered_spots.to_csv(os.path.join(output_dir, f"{rep_string}_FinalSpots_MG{mg_threshold}_Abs{abs_threshold}_{date_str}.csv"), index=False)
# filtered_clusters.to_csv(os.path.join(output_dir, f"{rep_string}_FinalClusters_MG{mg_threshold}_Abs{abs_threshold}_{date_str}.csv"), index=False)
# filtered_cellprops.to_csv(os.path.join(output_dir, f"{rep_string}_FinalCellProps_MG{mg_threshold}_Abs{abs_threshold}_{date_str}.csv"), index=False)
SSITcellresults.to_csv(os.path.join(output_dir, f"{rep_string}_SSITcellresults_MG{mg_threshold}_{date_str}.csv"), index=False)
filtered_spots.to_csv(os.path.join(output_dir, f"{rep_string}_FinalSpots_MG{mg_threshold}_{date_str}.csv"), index=False)
filtered_clusters.to_csv(os.path.join(output_dir, f"{rep_string}_FinalClusters_MG{mg_threshold}_{date_str}.csv"), index=False)
filtered_cellprops.to_csv(os.path.join(output_dir, f"{rep_string}_FinalCellProps_MG{mg_threshold}_{date_str}.csv"), index=False)

print("Results saved, moving to display...")
# 4. Create an instance of the DUSP1DisplayManager class.
display_manager = DUSP1DisplayManager(am, 
                                      cell_level_results=SSITcellresults,
                                      spots=filtered_spots,
                                      clusters=filtered_clusters,
                                      cellprops=filtered_cellprops,
                                      removed_spots=removed_spots)
# Run the main display function.
display_manager.main_display()
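
The replica-prefix arithmetic used in the cell above can be factored into a helper when several replicas are processed in turn. This is a hypothetical sketch: `add_replica_prefix` and `rep_digit` are not names from the pipeline, and it assumes integer ID columns and a single replica digit such as 1 for replica D or 2 for replica E, as the comment in the cell suggests.

```python
import pandas as pd


def add_replica_prefix(ids: pd.Series, rep_digit: int) -> pd.Series:
    """Prepend a replica digit to integer IDs, e.g. 30245 -> 130245 for rep_digit=1."""
    # Number of digits in the largest ID decides how far the digit is shifted left.
    num_digits = len(str(int(ids.max())))
    return ids + rep_digit * 10 ** num_digits


cell_ids = pd.Series([1, 245, 30245])
rep_d = add_replica_prefix(cell_ids, rep_digit=1)
rep_e = add_replica_prefix(cell_ids, rep_digit=2)
print(rep_d.tolist())  # [100001, 100245, 130245]
print(rep_e.tolist())  # [200001, 200245, 230245]
```

When the same ID appears in several tables, the offset should be computed once from a single table and then added to every table, as the cell above does for `unique_cell_id`. Calling the helper separately on each table could produce different offsets if their maxima differ in digit count.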