# DUSP1 Analysis and Visualization Notebook

This notebook demonstrates how to use the new analysis manager code from `Analysis_DUSP1.py`.

In this notebook, we will:
1. Load the processed CSV files (spots, clusters, and cell properties).
2. Instantiate the measurement manager (DUSP1Measurement) and compute cell-level metrics,
   with optional SNR filtering.
3. Create a DisplayManager instance to visualize gating overlays and cell crops.
4. (Optional) Use the new expression grouping and visualization functions.

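Make sure that the directory containing `src/Analysis_DUSP1.py` is on the Python path; the first code cell appends the directory two levels above the working directory to `sys.path` and imports from `src.Analysis_DUSP1`.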

In [None]:
import h5py
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import dask.array as da
import os
import sys
import logging
import seaborn as sns

logging.getLogger('matplotlib.font_manager').disabled = True
numba_logger = logging.getLogger('numba')
numba_logger.setLevel(logging.WARNING)

matplotlib_logger = logging.getLogger('matplotlib')
matplotlib_logger.setLevel(logging.WARNING)

src_path = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))
print(src_path)
sys.path.append(src_path)

from src.Analysis_DUSP1 import DUSP1AnalysisManager, DUSP1Measurement, SNRAnalysis, DisplayManager, GR_Confirmation

# Use the log file to search for analyses

In [None]:
loc = None
log_location = r'/Volumes/share/Users/Eric/GR_DUSP1_reruns'

In [None]:
am = DUSP1AnalysisManager(location=loc, log_location=log_location, mac=True) 

In [None]:
# List all completed analyses found in the log
all_analysis_names = am.list_analysis_names()

# DUSP1 Experiment Analysis List

### DUSP1 100nM Dex 3hr Time-sweep
- Replica D: `Analysis_DUSP1_D_NoThreshold_2025-02-21`
- Replica E: `Analysis_DUSP1_E_NoThreshold_2025-02-21`
- Replica F: `Analysis_DUSP1_F_NoThreshold_2025-02-22`
- Replica M: `Analysis_DUSP1_M_NoThreshold_2025-02-22`
- Replica N: `Analysis_DUSP1_N_NoThreshold_2025-02-22`

### DUSP1 75min Concentration-sweep
- Replica G: `Analysis_DUSP1_G_NoThreshold_2025-02-22`
- Replica H: `Analysis_DUSP1_H_NoThreshold_2025-02-22`
- Replica I: `Analysis_DUSP1_I_NoThreshold_2025-02-22`

### DUSP1 0.3, 1, 10nM Dex 3hr Time-sweep
- Replica J: `Analysis_DUSP1_J_NoThreshold_2025-02-22`
- Replica K: `Analysis_DUSP1_K_NoThreshold_2025-02-22`
- Replica L: `Analysis_DUSP1_L_NoThreshold_2025-02-22`

### DUSP1 TPL
- Replica O: `Analysis_DUSP1_O_NoThreshold_2025-02-22`
- Replica P: `Analysis_DUSP1_P_NoThreshold_2025-02-22`

# GR Experiment Analysis List

### GR 1, 10, 100nM Dex 3hr Time-Sweep
- Replica A: `Analysis_GR_IC_A_ER020725_2025-02-07`
- Replica B: `Analysis_GR_IC_B_ReRun021025_2025-02-10`
- Replica C: `Analysis_GR_IC_C_ER020725_2025-02-08`

### GR 1, 10, 100nM Dex 3hr Time-Sweep - No Illumination Correction
- Replica A: `Analysis_GR_noIC_A_ER021725_2025-02-18`
- Replica B: `Analysis_GR_noIC_B_ER021725_2025-02-18`
- Replica C: `Analysis_GR_noIC_C_ER021725_2025-02-18`

## Example workflow

In [None]:
# Select a specific analysis found at log_location,
# e.g. DUSP1 100nM Dex 3hr Time-sweep Replica 1 (Replica D)
am.select_analysis('DUSP1_D_NoThreshold')
print('locations with this dataset:', am.location)

In [None]:
# Load datasets
spots_df = am.select_datasets("spotresults", dtype="dataframe")
clusters_df = am.select_datasets("clusterresults", dtype="dataframe")
props_df = am.select_datasets("cell_properties", dtype="dataframe")

print("Spots shape:", spots_df.shape)
print("Clusters shape:", clusters_df.shape)
print("Cell properties shape:", props_df.shape)

## Step 2: Compute Cell-Level Metrics with Different SNR Filtering Methods

We create three DUSP1Measurement objects (or re-use one with different filtering options)
to compare the following methods:
- Weighted: uses weighted thresholding based on 'snr'.
- Absolute: keeps spots with snr >= 4.
- MG: computes MG_SNR and keeps spots with MG_SNR >= 4.

Note: Adjust the snr_threshold for MG if needed.
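As a rough illustration of the absolute method only, the sketch below keeps spots with `snr >= 4` on a toy table and counts the surviving spots per cell. The `snr` column name comes from the description above; the `cell_id` column and all values are made up for the example, and the actual logic inside `DUSP1Measurement.measure` may differ.

```python
import pandas as pd

# Toy spots table. 'snr' matches the column named above;
# 'cell_id' and the values are hypothetical.
spots = pd.DataFrame({
    "cell_id": [1, 1, 1, 2, 2],
    "snr": [2.5, 4.0, 7.1, 3.9, 5.2],
})

snr_threshold = 4

# Absolute filtering: keep spots whose SNR meets or exceeds the threshold
kept = spots[spots["snr"] >= snr_threshold]

# Number of retained spots per cell
spots_per_cell = kept.groupby("cell_id").size()
print(spots_per_cell)
```

A cell whose spots all fall below the threshold drops out of `spots_per_cell` entirely, so a cell-level table would need those cells filled back in with a count of zero.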

In [None]:
measurement_weighted = DUSP1Measurement(spots_df, clusters_df, props_df)
results_weighted = measurement_weighted.measure(snr_filter_method="weighted")
print("Results (Weighted Filtering):")
print(results_weighted.head())

In [None]:
# Absolute filtering:
measurement_absolute = DUSP1Measurement(spots_df, clusters_df, props_df)
results_absolute = measurement_absolute.measure(snr_filter_method="absolute", snr_threshold=4)
print("Results (Absolute Filtering, threshold=4):")
print(results_absolute.head())

In [None]:
# MG filtering:
measurement_mg = DUSP1Measurement(spots_df, clusters_df, props_df)
results_mg = measurement_mg.measure(snr_filter_method="mg", snr_threshold=4)
print("Results (MG Filtering, threshold=4):")
print(results_mg.head())
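The per-replica cells that follow call `filter_DUSP1`, which is not defined in this notebook. The sketch below is a hypothetical stand-in, assuming the helper simply repeats the example workflow (select the analysis, load the three tables, run `measure`). The function name, its parameters, and the default filtering method are assumptions, and the meaning of the second positional argument passed for the TPL replicas is not shown, so it is not reproduced here.

```python
def filter_dusp1_sketch(am, measurement_cls, analysis_name,
                        snr_filter_method="weighted", **measure_kwargs):
    """Hypothetical stand-in for the undefined `filter_DUSP1` helper.

    am              : analysis manager exposing select_analysis / select_datasets
    measurement_cls : class with the DUSP1Measurement constructor and .measure()
    analysis_name   : e.g. 'DUSP1_E_NoThreshold'
    """
    # Point the manager at the requested analysis
    am.select_analysis(analysis_name)

    # Load the three processed tables, as in the example workflow
    spots = am.select_datasets("spotresults", dtype="dataframe")
    clusters = am.select_datasets("clusterresults", dtype="dataframe")
    props = am.select_datasets("cell_properties", dtype="dataframe")

    # Compute cell-level metrics with the chosen SNR filter (assumed default)
    measurement = measurement_cls(spots, clusters, props)
    return measurement.measure(snr_filter_method=snr_filter_method,
                               **measure_kwargs)
```

Under these assumptions, a call such as `filter_dusp1_sketch(am, DUSP1Measurement, 'DUSP1_D_NoThreshold')` would also be one way to produce `DUSP1_RepD`, which the concatenation cell expects but which is never assigned in this notebook.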

    DUSP1 100nM Dex 3hr Time-sweep Replica 2

In [None]:
DUSP1_RepE = filter_DUSP1('DUSP1_E_NoThreshold')

    DUSP1 100nM Dex 3hr Time-sweep Replica 3

In [None]:
DUSP1_RepF = filter_DUSP1('DUSP1_F_NoThreshold')                        

    DUSP1 100nM Dex 3hr Time-sweep Replica 4 (partial)

In [None]:
DUSP1_RepM = filter_DUSP1('DUSP1_M_NoThreshold')                        

    DUSP1 100nM Dex 3hr Time-sweep Replica 5 (partial)

In [None]:
DUSP1_RepN = filter_DUSP1('DUSP1_N_NoThreshold')

    DUSP1 75min Concentration-sweep Replica 1

In [None]:
DUSP1_RepG = filter_DUSP1('DUSP1_G_NoThreshold')                        

    DUSP1 75min Concentration-sweep Replica 2

In [None]:
DUSP1_RepH = filter_DUSP1('DUSP1_H_NoThreshold')                         

    DUSP1 75min Concentration-sweep Replica 3

In [None]:
DUSP1_RepI = filter_DUSP1('DUSP1_I_NoThreshold')

    DUSP1 0.3, 1, 10nM Dex 3hr Time-sweep Replica 1

In [None]:
DUSP1_RepJ = filter_DUSP1('DUSP1_J_NoThreshold')


    DUSP1 0.3, 1, 10nM Dex 3hr Time-sweep Replica 2

In [None]:
DUSP1_RepK = filter_DUSP1('DUSP1_K_NoThreshold')

    DUSP1 0.3, 1, 10nM Dex 3hr Time-sweep Replica 3

In [None]:
DUSP1_RepL = filter_DUSP1('DUSP1_L_NoThreshold')

    DUSP1 100nM Dex & 5µM TPL Time-sweep Replica 1

In [None]:
DUSP1_RepO_TPL = filter_DUSP1('DUSP1_O_NoThreshold', True)

    DUSP1 100nM Dex & 5µM TPL Time-sweep Replica 2

In [None]:
DUSP1_RepP_TPL = filter_DUSP1('DUSP1_P_NoThreshold', True)

In [None]:
# label Dex-only frames as TPL = False
for df in [DUSP1_RepD, DUSP1_RepE, DUSP1_RepF, DUSP1_RepM, DUSP1_RepN,
           DUSP1_RepG, DUSP1_RepH, DUSP1_RepI, DUSP1_RepJ, DUSP1_RepK, DUSP1_RepL]:
    df['is_TPL'] = False

# Label TPL frames as TPL = True
for df in [DUSP1_RepO_TPL, DUSP1_RepP_TPL]:
    df['is_TPL'] = True

# 1) Concatenate Dex-only DUSP1 data
DUSP1_ALL = pd.concat(
    [DUSP1_RepD, DUSP1_RepE, DUSP1_RepF, DUSP1_RepM, DUSP1_RepN,
     DUSP1_RepG, DUSP1_RepH, DUSP1_RepI, DUSP1_RepJ, DUSP1_RepK, DUSP1_RepL],
    ignore_index=True
)
DUSP1_ALL['unique_cell_id'] = np.arange(len(DUSP1_ALL))

# 2) Concatenate TPL data
DUSP1_TPL = pd.concat(
    [DUSP1_RepO_TPL, DUSP1_RepP_TPL],
    ignore_index=True
)
DUSP1_TPL['unique_cell_id'] = np.arange(len(DUSP1_TPL))

# 3) Create a combined DataFrame (Dex + TPL), preserving the is_TPL flag
DUSP1_Dex_TPL_ALL = pd.concat([DUSP1_ALL, DUSP1_TPL], 
                           ignore_index=True, 
                           sort=False)
# Assign a new unique_cell_id across the combined data
DUSP1_Dex_TPL_ALL['unique_cell_id'] = np.arange(len(DUSP1_Dex_TPL_ALL))

In [None]:
from datetime import datetime

# SAVE THE PRE-GATED DATAFRAMES
# ============================
current_date = datetime.now().strftime("%b%d%y")

# (Optional) save each DataFrame to disk
DUSP1_ALL.to_csv(f'DUSP1_ALL_{current_date}_NoThreshold.csv', index=False)
DUSP1_TPL.to_csv(f'DUSP1_TPL_{current_date}_NoThreshold.csv', index=False)
DUSP1_Dex_TPL_ALL.to_csv(f'DUSP1_Dex_TPL_ALL_{current_date}_NoThreshold.csv', index=False)

## GR_ALL & DUSP1_ALL final dataframe preparation for SSIT

1) Fit a Polynomial (2nd-degree) using (nuc_area_px, cyto_area_px) from DUSP1_ALL.

2) Estimate Cytoplasm Area in GR_ALL:
- Creates `CalcCytoArea` by evaluating the fitted polynomial at each row’s `nuc_area`.

3) Gate both data sets on the 25%–75% range of nuclear area.

4) Compute “Normalized” GR (`normGRnuc`, `normGRcyt`) in GR_ALL:
- Scales nuclear/cyt intensities (5%→95% range) into integer bins [0,30].

5) Plot Histograms for the normalized nuclear/cyt GR (using custom colors).

6) Save the updated, gated data sets to CSV.
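The GR-side code for these steps is not included in this notebook, so the following is a minimal sketch on synthetic data of how the polynomial fit, the `CalcCytoArea` estimate, the nuclear-area gate, and the [0,30] normalization could fit together. The intensity column name `nucGRint`, the synthetic values, and the choice to clip and round during normalization are assumptions, not the original implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins for DUSP1_ALL and GR_ALL (all values are made up)
nuc = rng.uniform(2000, 12000, size=500)
cyto = 1.5e-5 * nuc**2 + 2.0 * nuc + 3000 + rng.normal(0, 200, size=500)
dusp = pd.DataFrame({"nuc_area_px": nuc, "cyto_area_px": cyto})
gr = pd.DataFrame({
    "nuc_area": rng.uniform(2000, 12000, size=300),
    "nucGRint": rng.lognormal(7, 0.4, size=300),  # hypothetical column name
})

# Fit a second-degree polynomial: cyto_area_px as a function of nuc_area_px
coeffs = np.polyfit(dusp["nuc_area_px"], dusp["cyto_area_px"], deg=2)

# Estimate cytoplasm area in the GR table from its nuclear area
gr["CalcCytoArea"] = np.polyval(coeffs, gr["nuc_area"])

# Gate on the 25th-75th percentile range of nuclear area
lower, upper = gr["nuc_area"].quantile([0.25, 0.75])
gr_gated = gr[gr["nuc_area"].between(lower, upper)].copy()

# Map the 5th-95th percentile intensity range onto integer bins 0..30
# (values outside the range are clipped here; this is an assumption)
lo, hi = gr_gated["nucGRint"].quantile([0.05, 0.95])
scaled = ((gr_gated["nucGRint"] - lo) / (hi - lo)).clip(0, 1)
gr_gated["normGRnuc"] = np.round(scaled * 30).astype(int)

print(gr_gated[["nuc_area", "CalcCytoArea", "normGRnuc"]].head())
```

The same scaling would be repeated for the cytoplasmic intensity to obtain `normGRcyt`. Gating on the interquartile range keeps roughly half of the cells by construction.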

In [None]:
# 1) READ INPUT DATA
# =========================
# DUSP1_ALL from above or load from disk:
if 'DUSP1_ALL' in locals():
    df_dusp = DUSP1_ALL
else:
    # Adjust the file name to match the dated CSV saved in the previous cell
    df_dusp = pd.read_csv('DUSP1_ALL.csv')

# GR_ALL from above or load from disk:
if 'GR_ALL' in locals():
    df_gr = GR_ALL
else:
    df_gr = pd.read_csv('GR_ALL_pregate.csv')


# 2) FIT POLYNOMIAL TO (NUC, CYTO) FROM DUSP1_ALL
# =========================
# We'll use only the rows that have valid nuc_area_px and cyto_area_px.
num_cells = df_dusp.shape[0]
df_dusp_nonmissing = df_dusp.dropna(subset=['nuc_area_px', 'cyto_area_px']).copy()
print(f'Cells removed because of NaN areas in DUSP1: {num_cells - df_dusp_nonmissing.shape[0]}')


# 4) GATE DUSP1 DATAFRAME ON [25%, 75%] NUCLEAR AREA
# =========================
# We'll define a helper function for gating.
def gate_on_nuc_area(df, nuc_col):
    """Return a copy of df gated to [25th, 75th percentile] of nuc_col."""
    lower = df[nuc_col].quantile(0.25)
    upper = df[nuc_col].quantile(0.75)
    return df[(df[nuc_col] >= lower) & (df[nuc_col] <= upper)].copy()

# Gate DUSP1_ALL on nuc_area_px
print('+++ Gating Nuc Area +++')
# Count cells entering the gate (after NaN removal) so the difference is non-negative
num_cells_before_gate = df_dusp_nonmissing.shape[0]
df_dusp_gated = gate_on_nuc_area(df_dusp_nonmissing, 'nuc_area_px')
print(f'Cells removed because of nuc_area_px gating: {num_cells_before_gate - df_dusp_gated.shape[0]}')
print(f'Cells remaining after nuc_area_px gating: {df_dusp_gated.shape[0]}')

# SAVE THE GATED DATAFRAMES

In [None]:
from datetime import datetime

# SAVE THE GATED DATAFRAMES
# =========================
# Gated DUSP1 (unchanged except row filtering)
# Get the current date
current_date = datetime.now().strftime("%b%d%y")

# Save the gated DUSP1 dataframe with the current date in the filename
df_dusp_gated.to_csv(f"DUSP1_ALL_gated_{current_date}_NoThreshold.csv", index=False)
print(f"Saved gated DUSP1 to 'DUSP1_ALL_gated_{current_date}_NoThreshold.csv'")


# Load in gated data

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

df_dusp_gated = pd.read_csv('DUSP1_ALL_gated_Feb2425_NoThreshold.csv')
df_gr_gated = pd.read_csv('GR_ALL_gated_with_CytoArea_and_normGR_Feb2125.csv')

In [None]:
# Make a copy of the DUSP1 data
DUSP1_data = df_dusp_gated.copy()

# Experiment 1: 100 nM Dex time sweep with 12 timepoints
# (the 11 listed below plus the 0 min baseline added at plot time)
df_expt1 = DUSP1_data[DUSP1_data['replica'].isin(['D', 'E', 'F', 'M', 'N'])]
expt1_timepoints = [10, 20, 30, 40, 50, 60, 75, 90, 120, 150, 180]
expt1_concs = [100]

# Experiment 3: Time and concentration sweep
df_expt3 = DUSP1_data[DUSP1_data['replica'].isin(['J', 'K', 'L'])]
expt3_concs = [0.3, 1, 10, 100]
expt3_timepoints = [30, 50, 75, 90, 120, 180]

# Calculate means for each replica
replica_means = DUSP1_data.groupby(['dex_conc', 'time', 'replica']).agg({
    'num_nuc_spots': 'mean',
    'num_cyto_spots': 'mean'
}).reset_index()

# Calculate the mean and standard deviation of the replica means
summary_stats = replica_means.groupby(['dex_conc', 'time']).agg({
    'num_nuc_spots': ['mean', 'std'],
    'num_cyto_spots': ['mean', 'std']
}).reset_index()

# Rename columns for easier access
summary_stats.columns = ['dex_conc', 'time', 'mean_nuc_count', 'std_nuc_count', 'mean_cyto_count', 'std_cyto_count']

# Calculate overall mean and standard deviation for each concentration and time point
overall_stats = DUSP1_data.groupby(['dex_conc', 'time']).agg({
    'num_nuc_spots': ['mean', 'std'],
    'num_cyto_spots': ['mean', 'std']
}).reset_index()

# Rename columns for easier access
overall_stats.columns = ['dex_conc', 'time', 'overall_mean_nuc', 'overall_std_nuc', 'overall_mean_cyto', 'overall_std_cyto']

# Extract 0 min data (shared baseline from dex_conc == 0)
zero_min_summary = summary_stats[summary_stats['time'] == 0]
zero_min_overall = overall_stats[overall_stats['time'] == 0]

# Set Style
sns.set_theme(style="ticks", palette="colorblind", context="poster", font='times new roman')

# Define the color palette for nuclear and cytoplasmic spot counts
colors_nuc_cyto = sns.color_palette("colorblind", 2)  # Two colors: one for Nuclear, one for Cytoplasmic

# Loop through the time-sweep experiments (Experiment 2 is plotted in a separate cell)
experiments = {
    "Experiment 1: 100 nM Time Sweep": (expt1_concs, expt1_timepoints),
    # "Experiment 2: 75 min Concentration Sweep": (expt2_concs, expt2_timepoints),
    "Experiment 3: Time and Concentration Sweep": (expt3_concs, expt3_timepoints),
}

for expt_name, (concs, timepoints) in experiments.items():
    for conc in concs:
        # Filter data for plotting
        subset_summary = summary_stats[(summary_stats['dex_conc'] == conc) & (summary_stats['time'].isin(timepoints))]
        subset_overall = overall_stats[(overall_stats['dex_conc'] == conc) & (overall_stats['time'].isin(timepoints))]

        # Add 0 min time point to all subsets if not already present
        if 0 not in subset_summary['time'].values:
            subset_summary = pd.concat([zero_min_summary, subset_summary], ignore_index=True)
        if 0 not in subset_overall['time'].values:
            subset_overall = pd.concat([zero_min_overall, subset_overall], ignore_index=True)

        plt.figure(figsize=(10, 5))

        # Plot Nuclear mRNA Count Mean with Error Bars
        plt.errorbar(subset_summary['time'], subset_summary['mean_nuc_count'],
                     yerr=subset_summary['std_nuc_count'], fmt='-o', color=colors_nuc_cyto[0], capsize=5,
                     label='Nuclear mRNA Count Replicas')

        # Filling between std deviations for overall data - Nuclear
        plt.fill_between(subset_overall['time'],
                         subset_overall['overall_mean_nuc'] - subset_overall['overall_std_nuc'],
                         subset_overall['overall_mean_nuc'] + subset_overall['overall_std_nuc'],
                         color=colors_nuc_cyto[0], alpha=0.2, label='Total Data Spread - Nuclear')

        # Plot Cytoplasmic mRNA Count Mean with Error Bars
        plt.errorbar(subset_summary['time'], subset_summary['mean_cyto_count'],
                     yerr=subset_summary['std_cyto_count'], fmt='-o', color=colors_nuc_cyto[1], capsize=5,
                     label='Cytoplasmic mRNA Count Replicas')

        # Filling between std deviations for overall data - Cytoplasmic
        plt.fill_between(subset_overall['time'],
                         subset_overall['overall_mean_cyto'] - subset_overall['overall_std_cyto'],
                         subset_overall['overall_mean_cyto'] + subset_overall['overall_std_cyto'],
                         color=colors_nuc_cyto[1], alpha=0.2, label='Total Data Spread - Cytoplasmic')

        # Customize the plot
        plt.title(f"{expt_name} - {conc} nM Dex", fontsize=18, fontweight='bold')
        plt.xlabel('Time (min)', fontsize=14)
        plt.ylabel('mRNA Spot Count', fontsize=14)
        plt.grid(True)
        plt.legend(loc='upper left', fontsize=12, frameon=False, bbox_to_anchor=(1, 1))


        # Show the plot
        plt.show()
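The error bars and the shaded band in these plots come from two different standard deviations: the spread of the per-replica means versus the spread across all pooled cells. The toy example below, with made-up counts, shows that the two are computed differently and generally do not agree.

```python
import pandas as pd

# Toy cell-level counts for one (dex_conc, time) condition; values are made up
cells = pd.DataFrame({
    "replica": ["D", "D", "D", "E", "E", "E"],
    "num_nuc_spots": [10, 12, 14, 20, 22, 24],
})

# Error bars: standard deviation of the per-replica means
replica_means = cells.groupby("replica")["num_nuc_spots"].mean()
between_replica_std = replica_means.std()

# Shaded band: standard deviation over all cells pooled together
pooled_std = cells["num_nuc_spots"].std()

print(f"std of replica means: {between_replica_std:.3f}")
print(f"pooled cell-level std: {pooled_std:.3f}")
```

Because pandas uses the sample standard deviation (`ddof=1`) by default, a condition measured in only one replica yields `NaN` for the replica-level spread, and its error bar will be missing from the plot.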


In [None]:
# Make a copy of the DUSP1 data
DUSP1_data = df_dusp_gated.copy()

# Experiment 2: 75 min concentration sweep (Replicas G, H, I)
df_expt2 = DUSP1_data[DUSP1_data['replica'].isin(['G', 'H', 'I'])].copy()

# Set Seaborn style
sns.set_theme(style="ticks", palette="colorblind", context="poster", font='times new roman')

# Melt DataFrame for Seaborn Box Plot
melted_expt2_data = df_expt2.melt(id_vars=['dex_conc'], value_vars=['num_nuc_spots', 'num_cyto_spots'],
                                  var_name='Spot_Type', value_name='Spot_Count')

# Update labels for readability
melted_expt2_data['Spot_Type'] = melted_expt2_data['Spot_Type'].replace({
    'num_nuc_spots': 'Nuclear Spots',
    'num_cyto_spots': 'Cytoplasmic Spots'
})

# Create figure with two subplots
# (independent y-axes, so the separate y-limits set below apply to each panel)
fig, axes = plt.subplots(1, 2, figsize=(14, 6), sharey=False)

# Nuclear Spots Plot
ax1 = axes[0]
nuc_data = melted_expt2_data[melted_expt2_data['Spot_Type'] == 'Nuclear Spots']
sns.boxplot(x='dex_conc', y='Spot_Count', data=nuc_data, linewidth=2, width=0.6, showfliers=False, notch=True, ax=ax1)
sns.stripplot(x='dex_conc', y='Spot_Count', data=nuc_data, dodge=True, jitter=True, size=3, alpha=0.4, ax=ax1, marker='o', edgecolor='black', color='gray')

ax1.set_xlabel("Dexamethasone Concentration (nM)", fontsize=14, fontweight='bold')
ax1.set_ylabel("mRNA Spot Count", fontsize=14, fontweight='bold')
ax1.set_title("Nuclear DUSP1 Spot Count", fontsize=16, fontweight='bold')
ax1.grid(True, linestyle="--", linewidth=0.5)

# Cytoplasmic Spots Plot
ax2 = axes[1]
cyto_data = melted_expt2_data[melted_expt2_data['Spot_Type'] == 'Cytoplasmic Spots']
sns.boxplot(x='dex_conc', y='Spot_Count', data=cyto_data, linewidth=2, width=0.6, showfliers=False, notch=True, ax=ax2)
sns.stripplot(x='dex_conc', y='Spot_Count', data=cyto_data, dodge=True, jitter=True, size=2, alpha=0.4, ax=ax2, marker='o', edgecolor='black', color='gray')

ax2.set_xlabel("Dexamethasone Concentration (nM)", fontsize=14, fontweight='bold')
ax2.set_ylabel("")  # Remove redundant label
ax2.set_title("Cytoplasmic DUSP1 Spot Count", fontsize=16, fontweight='bold')
ax2.grid(True, linestyle="--", linewidth=0.5)

# Set Y-limits 
ax1.set_ylim(0, 300)
ax2.set_ylim(0, 800)

# Adjust layout
plt.tight_layout()

# Show the plot
plt.show()


In [None]:
## Cytoplasmic and Nuclear mRNA Spot Counts at 0 min

# Make a copy of the DUSP1 data
DUSP1_data = df_dusp_gated.copy()

# Subset data for 0 min time points
df_0min = DUSP1_data[DUSP1_data['time'] == 0]

# Plot distribution of nuclear and cytoplasmic mRNA spots across replicas for 0 min time point
fig, axes = plt.subplots(1, 2, figsize=(14, 6), sharey=True)

sns.boxplot(data=df_0min, x='replica', y='num_nuc_spots', ax=axes[0])
axes[0].set_title('Nuclear mRNA Spot Counts at 0 min')
axes[0].set_ylabel('Nuclear Spot Count')
axes[0].set_xlabel('Replica')

sns.boxplot(data=df_0min, x='replica', y='num_cyto_spots', ax=axes[1])
axes[1].set_title('Cytoplasmic mRNA Spot Counts at 0 min')
axes[1].set_ylabel('Cytoplasmic Spot Count')
axes[1].set_xlabel('Replica')

plt.tight_layout()
plt.show()