# Morph2REP Estrous Cycle Detection via Wavelet Analysis
## Calendar Day Alignment (No Time Shift)

### Overview

This notebook applies the Smarr et al. (2017) wavelet-based estrous detection method to Morph2REP Study 1001 vehicle-treated female mice.

### Time Alignment Decision

Smarr et al. aligned their analysis to circadian time (CT), starting each day at lights-off (6PM). However, we chose to use **calendar day alignment** (midnight to midnight) for the following reasons:

1. **Simplicity**: Direct mapping between analysis days and calendar dates makes interpretation straightforward
2. **Treatment alignment**: Dosing and cage changes are recorded in calendar time, making it easier to relate findings to experimental events
3. **Daily averaging**: Since we compute daily mean ultradian power (averaging across 24h), the specific hour boundaries matter less than for hour-by-hour analyses
4. **Data preservation**: Time shifting created artificial "empty" days due to recording end times, losing usable data
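
To make the difference between the two alignments concrete, the sketch below computes a 1-based day number for a timestamp under each convention. It is an illustration only: `calendar_day_number` and `lights_off_day_number` are hypothetical helper names, and the 18:00 lights-off hour is taken from the 6PM boundary described above.

```python
import pandas as pd

def calendar_day_number(ts, start_date):
    """1-based day index with days running midnight to midnight."""
    start = pd.Timestamp(start_date)  # midnight on start_date
    return (pd.Timestamp(ts).normalize() - start).days + 1

def lights_off_day_number(ts, start_date, lights_off_hour=18):
    """1-based day index with days running lights-off to lights-off (assumed 18:00)."""
    start = pd.Timestamp(start_date) + pd.Timedelta(hours=lights_off_hour)
    # Floor division of two Timedeltas gives the number of whole days elapsed
    return (pd.Timestamp(ts) - start) // pd.Timedelta(days=1) + 1

# Rep1 dose 2 (Jan 17, 17:00) relative to an analysis start of Jan 7
cal_day = calendar_day_number("2025-01-17 17:00", "2025-01-07")
ct_day = lights_off_day_number("2025-01-17 17:00", "2025-01-07")
print(f"Calendar day: {cal_day}, lights-off-aligned day: {ct_day}")
```

Any event recorded before 18:00 falls one day earlier under the lights-off convention, which is why mapping dosing times onto shifted days takes an extra step.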

### Analysis Pipeline
1. **Data Loading** - Load locomotion bout data from S3
2. **Data Quality EDA** - Identify incomplete recording days
3. **Wavelet Analysis** - Compute ultradian power and detect LOW days
4. **Visualization** - Display results with treatment schedule overlay
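
The LOW-day rule in step 3 can be sketched on synthetic values: a day counts as LOW when its mean ultradian power falls below 80% of the animal's median over the valid days. The helper name `find_low_days` and the numbers below are made up for illustration; the actual detection happens inline in Section 4.

```python
import numpy as np

def find_low_days(day_powers, valid_days, threshold_pct=0.80):
    """Return (LOW day numbers, threshold) using a median-based cutoff."""
    first, last = valid_days  # 1-based, inclusive
    powers = np.asarray(day_powers, dtype=float)
    threshold = threshold_pct * np.nanmedian(powers[first - 1:last])
    low = [day for day in range(first, last + 1)
           if not np.isnan(powers[day - 1]) and powers[day - 1] < threshold]
    return low, threshold

# Synthetic daily powers; Day 1 is treated as a partial day and excluded
synthetic = [3.0, 10.0, 6.0, 10.0, 11.0, 9.0, 7.0, 10.0, 12.0]
low_days, threshold = find_low_days(synthetic, valid_days=(2, 9))
spacings = np.diff(low_days).tolist()
print(f"Threshold: {threshold:.1f}, LOW days: {low_days}, spacings: {spacings}")
```

A spacing of exactly 4 days between LOW days is what the pipeline later labels as strong evidence of a 4-day cycle.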

---
## Section 1: Configuration and Setup

In [None]:
# =============================================================================
# IMPORTS
# =============================================================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import duckdb
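from datetime import datetime
# NOTE: scipy.signal.cwt and morlet2 were deprecated in SciPy 1.12 and removed in 1.15,
# so this notebook requires SciPy < 1.15.
from scipy.signal import cwt, morlet2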
from scipy.stats import wilcoxon, chisquare
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

# Plot settings
plt.rcParams['figure.dpi'] = 150
plt.rcParams['font.size'] = 10

# Constants
MINUTES_PER_DAY = 1440
PERIODS_MINUTES = np.logspace(np.log10(60), np.log10(39*60), 50)
PERIODS_HOURS = PERIODS_MINUTES / 60

print("Setup complete.")

In [None]:
# =============================================================================
# STUDY CONFIGURATION - CALENDAR DAY ALIGNMENT
# =============================================================================

S3_BASE = "s3://jax-envision-public-data/study_1001/2025v3.3/tabular"

# Vehicle-treated cages with full experimental timeline
# Using CALENDAR DAYS (midnight to midnight)
VEHICLE_CAGES = {
    "Rep1": {
        "cages": [4918, 4922, 4923],
        "analysis_start": "2025-01-07",
        "analysis_end": "2025-01-22",
        "n_days": 16,
        "valid_days": (2, 16),  # Day 1 partial (starts 22:00), Day 16 partial but usable (13h)
        # Treatment schedule
        "dose_1": datetime(2025, 1, 14, 6, 0),
        "dose_2": datetime(2025, 1, 17, 17, 0),
        "cage_change": datetime(2025, 1, 15, 12, 0),
    },
    "Rep2": {
        "cages": [4928, 4929, 4934],
        "analysis_start": "2025-01-22",
        "analysis_end": "2025-02-04",
        "n_days": 14,
        "valid_days": (2, 14),  # Day 1 partial (starts 14:00), Day 14 complete
        # Treatment schedule
        "dose_1": datetime(2025, 1, 28, 17, 0),
        "dose_2": datetime(2025, 1, 31, 6, 0),
        "cage_change": datetime(2025, 1, 29, 12, 0),
    },
}

# Calculate day numbers for treatment events (calendar days)
for rep, cfg in VEHICLE_CAGES.items():
    start = pd.to_datetime(cfg['analysis_start'])
    cfg['dose_1_day'] = (cfg['dose_1'].date() - start.date()).days + 1
    cfg['dose_2_day'] = (cfg['dose_2'].date() - start.date()).days + 1
    cfg['cage_change_day'] = (cfg['cage_change'].date() - start.date()).days + 1

print("Study Configuration (Calendar Day Alignment):")
print("="*60)
for rep, cfg in VEHICLE_CAGES.items():
    print(f"\n{rep}:")
    print(f"  Date range: {cfg['analysis_start']} to {cfg['analysis_end']}")
    print(f"  Total days: {cfg['n_days']}, Valid for analysis: Days {cfg['valid_days'][0]}-{cfg['valid_days'][1]}")
    print(f"  Cages: {cfg['cages']}")
    print(f"  Dose 1: Day {cfg['dose_1_day']} ({cfg['dose_1'].strftime('%b %d, %I:%M %p')})")
    print(f"  Dose 2: Day {cfg['dose_2_day']} ({cfg['dose_2'].strftime('%b %d, %I:%M %p')})")
    print(f"  Cage change: Day {cfg['cage_change_day']} ({cfg['cage_change'].strftime('%b %d, %I:%M %p')})")

In [None]:
# =============================================================================
# WAVELET FUNCTIONS
# =============================================================================

def compute_wavelet_transform(data, periods_minutes=None, w=5):
    """
    Compute continuous wavelet transform using Morlet wavelet.
    """
    data = pd.Series(data).interpolate().bfill().ffill().fillna(0).values
    
    if periods_minutes is None:
        periods_minutes = PERIODS_MINUTES
    
    fs = 1
    scales = periods_minutes * fs * w / (2 * np.pi)
    
    coeffs = cwt(data, morlet2, scales, w=w)
    power = np.abs(coeffs)**2
    
    return power, periods_minutes


def extract_band_power(power, periods_minutes, band_hours):
    """
    Extract MAX power in a frequency band.
    """
    periods_hours = periods_minutes / 60
    band_mask = (periods_hours >= band_hours[0]) & (periods_hours <= band_hours[1])
    
    if not np.any(band_mask):
        return np.zeros(power.shape[1])
    
    band_power = np.max(power[band_mask, :], axis=0)
    return band_power


def bouts_to_minute_counts_calendar(bout_df, start_date, n_days):
    """
    Convert bout data to minute-level counts using CALENDAR days.
    Start time = midnight on start_date.
    """
    bout_df = bout_df.copy()
    start_time = pd.to_datetime(start_date + " 00:00:00")  # Midnight
    n_minutes = n_days * MINUTES_PER_DAY
    
    bout_df['minutes_from_start'] = (bout_df['start_time'] - start_time).dt.total_seconds() / 60
    # Copy the filtered frame so the column assignment below does not write to a slice
    bout_df = bout_df[(bout_df['minutes_from_start'] >= 0) & (bout_df['minutes_from_start'] < n_minutes)].copy()
    bout_df['minute_bin'] = bout_df['minutes_from_start'].astype(int)
    
    counts = bout_df.groupby('minute_bin').size()
    
    full_series = pd.Series(index=range(n_minutes), dtype=float).fillna(0)
    full_series.update(counts)
    
    return full_series.values


print("Wavelet functions defined (using calendar day alignment).")

---
## Section 2: Data Loading

In [None]:
# =============================================================================
# DATA LOADING FUNCTION
# =============================================================================

def load_parquet_s3(cage_id, start_date, end_date, table_name):
    """Load parquet data from S3 for a specific cage and date range."""
    conn = duckdb.connect()
    conn.execute("INSTALL httpfs; LOAD httpfs;")
    conn.execute("SET s3_region='us-east-1';")
    
    dates = pd.date_range(start_date, end_date, freq='D')
    all_data = []
    
    for d in dates:
        date_str = d.strftime('%Y-%m-%d')
        path = f"{S3_BASE}/cage_id={cage_id}/date={date_str}/{table_name}"
        try:
            df = conn.execute(f"SELECT * FROM read_parquet('{path}')").fetchdf()
            df['cage_id'] = cage_id
            df['date'] = date_str
            all_data.append(df)
        except Exception:
            # Skip dates with no readable file for this cage (any read error is skipped)
            continue
    
    conn.close()
    return pd.concat(all_data, ignore_index=True) if all_data else pd.DataFrame()

print("Load function defined.")

In [None]:
# =============================================================================
# LOAD LOCOMOTION BOUT DATA
# =============================================================================
print("Loading Morph2REP Locomotion Bout Data...")
print("="*60)

all_bouts = []
for rep, cfg in VEHICLE_CAGES.items():
    print(f"\n{rep} ({cfg['n_days']} days):")
    for cage_id in cfg['cages']:
        print(f"  Cage {cage_id}...", end=" ")
        df = load_parquet_s3(cage_id, cfg['analysis_start'], cfg['analysis_end'], 'animal_bouts.parquet')
        if len(df) > 0:
            df['replicate'] = rep
            all_bouts.append(df)
            print(f"{len(df):,} rows")
        else:
            print("No data")

df_bouts = pd.concat(all_bouts, ignore_index=True)
print(f"\nTotal bout rows: {len(df_bouts):,}")

# Filter for locomotion bouts
df_loco = df_bouts[df_bouts['state_name'] == 'animal_bouts.locomotion'].copy()
df_loco['start_time'] = pd.to_datetime(df_loco['start_time'])

print(f"\nLocomotion bouts: {len(df_loco):,}")
for rep in ['Rep1', 'Rep2']:
    rep_df = df_loco[df_loco['replicate'] == rep]
    print(f"  {rep}: {len(rep_df):,} bouts, {rep_df['animal_id'].nunique()} animals")

---
## Section 3: Data Quality EDA

Check for incomplete recording days using calendar day boundaries.

In [None]:
# =============================================================================
# CHECK DATA COMPLETENESS BY CALENDAR DAY
# =============================================================================
print("="*70)
print("DATA COMPLETENESS CHECK (Calendar Days)")
print("="*70)

for rep, cfg in VEHICLE_CAGES.items():
    print(f"\n{rep}:")
    print("-"*70)
    
    rep_df = df_loco[df_loco['replicate'] == rep].copy()
    rep_df['calendar_date'] = rep_df['start_time'].dt.date
    
    # Count bouts per calendar day
    daily_counts = rep_df.groupby('calendar_date').agg({
        'start_time': ['count', 'min', 'max']
    })
    daily_counts.columns = ['bout_count', 'first_bout', 'last_bout']
    daily_counts = daily_counts.sort_index()
    
    median_count = daily_counts['bout_count'].median()
    
    print(f"\n{'Day':<5} {'Date':<12} {'Bouts':<10} {'First':<10} {'Last':<10} {'Hours':<8} {'Status'}")
    print("-"*75)
    
    for i, (date, row) in enumerate(daily_counts.iterrows()):
        day_num = i + 1
        first_hour = row['first_bout'].hour
        last_hour = row['last_bout'].hour
        hours_covered = last_hour - first_hour + 1
        
        # Determine status
        if row['bout_count'] < median_count * 0.5:
            status = "⚠️ LOW COUNT"
        elif first_hour > 6:
            status = f"⚠️ LATE START ({first_hour}:00)"
        elif last_hour < 18:
            status = f"⚠️ EARLY END ({last_hour}:00)"
        else:
            status = "✓ Complete"
            
        print(f"D{day_num:<4} {str(date):<12} {int(row['bout_count']):<10} {first_hour:02d}:00{'':<5} {last_hour:02d}:59{'':<5} ~{hours_covered:<6} {status}")
    
    print(f"\nMedian daily bout count: {median_count:,.0f}")

In [None]:
# =============================================================================
# CHECK CAGE-LEVEL DATA FOR PROBLEMATIC DAYS
# =============================================================================
print("\n" + "="*70)
print("CAGE-LEVEL CHECK FOR POTENTIALLY PROBLEMATIC DAYS")
print("="*70)

# Rep2: per-cage bout counts for every day, flagging missing or low-count cages
print("\nRep2 - Checking all days by cage:")
rep2_df = df_loco[df_loco['replicate'] == 'Rep2'].copy()
rep2_df['calendar_date'] = rep2_df['start_time'].dt.date

dates = sorted(rep2_df['calendar_date'].unique())
cages = VEHICLE_CAGES['Rep2']['cages']

print(f"\n{'Day':<5}", end="")
for cage in cages:
    print(f"Cage {cage:<10}", end="")
print("Status")
print("-"*60)

for i, d in enumerate(dates):
    day_num = i + 1
    print(f"D{day_num:<4}", end="")
    cage_counts = []
    for cage in cages:
        count = len(rep2_df[(rep2_df['calendar_date'] == d) & (rep2_df['cage_id'] == cage)])
        cage_counts.append(count)
        flag = "⚠️" if count < 1000 else ""
        print(f"{count:<14}{flag}", end="")
    
    if any(c == 0 for c in cage_counts):
        print("MISSING CAGE")
    elif any(c < 1000 for c in cage_counts):
        print("LOW COUNT")
    else:
        print("OK")

In [None]:
# =============================================================================
# VISUALIZE DATA COMPLETENESS
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for col, (rep, cfg) in enumerate(VEHICLE_CAGES.items()):
    ax = axes[col]
    
    rep_df = df_loco[df_loco['replicate'] == rep].copy()
    rep_df['calendar_date'] = rep_df['start_time'].dt.date
    
    daily_counts = rep_df.groupby('calendar_date').size()
    days = list(range(1, len(daily_counts) + 1))
    counts = daily_counts.values
    
    # Determine valid days
    valid_start, valid_end = cfg['valid_days']
    
    # Color bars based on validity
    colors = []
    for i, c in enumerate(counts):
        day_num = i + 1
        if day_num < valid_start:
            colors.append('red')  # Excluded (partial start)
        elif day_num > valid_end:
            colors.append('red')  # Excluded
        else:
            colors.append('green')  # Valid
    
    ax.bar(days, counts, color=colors, edgecolor='black', alpha=0.7)
    
    # Mark treatment days
    ax.axvline(x=cfg['dose_1_day'], color='purple', linestyle='-', linewidth=2.5, alpha=0.8, label='Dose 1/2')
    ax.axvline(x=cfg['dose_2_day'], color='purple', linestyle='-', linewidth=2.5, alpha=0.8)
    ax.axvline(x=cfg['cage_change_day'], color='orange', linestyle='--', linewidth=2, alpha=0.8, label='Cage Change')
    
    # Add date labels
    date_labels = [d.strftime('%b %d') for d in daily_counts.index]
    ax.set_xticks(days)
    ax.set_xticklabels(date_labels, rotation=45, ha='right', fontsize=8)
    
    ax.set_xlabel('Date (Calendar Day)', fontsize=11)
    ax.set_ylabel('Locomotion Bout Count', fontsize=11)
    ax.set_title(f'{rep}: Daily Bout Counts (Calendar Days)\nGreen = Valid, Red = Excluded', fontsize=12)
    ax.legend(loc='upper right', fontsize=9)

plt.tight_layout()
plt.savefig('morph2rep_calendar_data_completeness.png', dpi=150, bbox_inches='tight')
plt.show()

### Data Quality Summary

Using calendar day alignment:

**Rep1 (Jan 7-22):**
- **Day 1 (Jan 7):** Recording started at 22:00 - only 2 hours → **EXCLUDE**
- **Days 2-15 (Jan 8-21):** Complete 24-hour recordings → **VALID**
- **Day 16 (Jan 22):** Recording ended at 12:59 - 13 hours → **VALID** (sufficient for daily average)

**Rep2 (Jan 22 - Feb 4):**
- **Day 1 (Jan 22):** Recording started at 14:00 - only 10 hours → **EXCLUDE**
- **Days 2-14 (Jan 23 - Feb 4):** Complete recordings → **VALID**

**Note:** Unlike the time-shifted approach, we can include Day 16 for Rep1 since it has 13 hours of data, sufficient for computing daily mean ultradian power.
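
As a cross-check on partial days such as Rep1 Day 16, a day's mean can be restricted to minutes inside the recording window. The sketch below uses synthetic data; `daily_mean_recorded` and the `recorded` mask are hypothetical names that are not part of this notebook's pipeline, and the example ignores wavelet edge effects near the end of the recording.

```python
import numpy as np

MINUTES_PER_DAY = 1440

def daily_mean_recorded(power, recorded_mask):
    """Mean power per calendar day using only minutes flagged as recorded."""
    power = np.asarray(power, dtype=float)
    mask = np.asarray(recorded_mask, dtype=bool)
    n_days = len(power) // MINUTES_PER_DAY
    means = []
    for day in range(n_days):
        sl = slice(day * MINUTES_PER_DAY, (day + 1) * MINUTES_PER_DAY)
        day_vals = power[sl][mask[sl]]
        means.append(day_vals.mean() if day_vals.size else np.nan)
    return np.array(means)

# Synthetic two-day series with constant power; day 2 is recorded for 13 h only
power = np.full(2 * MINUTES_PER_DAY, 2.0)
recorded = np.ones(2 * MINUTES_PER_DAY, dtype=bool)
recorded[MINUTES_PER_DAY + 13 * 60:] = False
power[~recorded] = 0.0  # stand-in for zero-filled, unrecorded minutes

naive = power.reshape(2, MINUTES_PER_DAY).mean(axis=1)  # plain 24 h mean
masked = daily_mean_recorded(power, recorded)           # recorded minutes only
print("24 h mean:", naive, "| recorded-only mean:", masked)
```

With the zero-filled series, the plain 24-hour mean of the partial day drops in proportion to the missing hours, while the recorded-only mean does not. Comparing the two for Day 16 is one way to confirm that a LOW call on that day is not driven by the missing hours.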

---
## Section 4: Wavelet Analysis

In [None]:
# =============================================================================
# WAVELET ANALYSIS - CALENDAR DAY ALIGNMENT
# =============================================================================

all_rep_results = {}

for rep, cfg in VEHICLE_CAGES.items():
    print(f"\n{'='*70}")
    print(f"{rep} WAVELET ANALYSIS (Calendar Days)")
    print(f"{'='*70}")
    
    rep_df = df_loco[df_loco['replicate'] == rep].copy()
    animals = sorted([a for a in rep_df['animal_id'].unique() if a != 0])
    valid_days = cfg['valid_days']
    
    print(f"Animals: {len(animals)}")
    print(f"Total days: {cfg['n_days']}, Valid for analysis: Days {valid_days[0]}-{valid_days[1]}")
    
    # Compute wavelet for each animal
    animal_results = []
    
    for animal_id in animals:
        animal_df = rep_df[rep_df['animal_id'] == animal_id].copy()
        cage_id = animal_df['cage_id'].iloc[0]
        
        # Convert bouts to minute counts (calendar alignment)
        animal_ts = bouts_to_minute_counts_calendar(animal_df, cfg['analysis_start'], cfg['n_days'])
        
        # Compute wavelet
        power, _ = compute_wavelet_transform(animal_ts, PERIODS_MINUTES)
        ultradian = extract_band_power(power, PERIODS_MINUTES, (1, 3))
        
        # Day-by-day power (calendar days)
        day_powers = []
        for day in range(1, cfg['n_days'] + 1):
            day_start = (day - 1) * MINUTES_PER_DAY
            day_end = day * MINUTES_PER_DAY
            if day_end <= len(ultradian):
                day_powers.append(np.nanmean(ultradian[day_start:day_end]))
            else:
                day_powers.append(np.nan)
        
        animal_results.append({
            'animal_id': animal_id,
            'cage_id': cage_id,
            'day_powers': day_powers,
            'n_days': cfg['n_days'],
            'ultradian_ts': ultradian
        })
    
    # =========================================================================
    # DAY-BY-DAY TABLE
    # =========================================================================
    print(f"\n--- Day-by-Day Ultradian Power ---")
    
    # Header
    header = f"{'Animal':<10} {'Cage':<8}"
    for d in range(1, cfg['n_days'] + 1):
        marker = "*" if d < valid_days[0] or d > valid_days[1] else ""
        # Pad each label to 7 characters so it lines up with the 7-wide values below
        header += f"{f'D{d}{marker}':<7}"
    print(header)
    print("-" * len(header))
    
    for r in animal_results:
        row = f"{r['animal_id']:<10} {r['cage_id']:<8}"
        for d in range(1, cfg['n_days'] + 1):
            val = r['day_powers'][d-1]
            row += f"{val:<7.1f}" if not np.isnan(val) else f"{'N/A':<7}"
        print(row)
    
    print(f"\n* = Excluded from analysis")
    
    # =========================================================================
    # LOW DAY DETECTION
    # =========================================================================
    print(f"\n--- LOW Days Detection (Days {valid_days[0]}-{valid_days[1]}) ---")
    
    threshold_pct = 0.80
    
    print(f"\n{'Animal':<10} {'Cage':<8} {'LOW days':<25} {'Spacings':<20} {'4-day cycle?'}")
    print("-"*90)
    
    cycle_results = []
    
    for r in animal_results:
        animal_id = r['animal_id']
        cage_id = r['cage_id']
        day_powers = r['day_powers']
        
        # Calculate median using only valid days
        valid_powers = [day_powers[i] for i in range(valid_days[0]-1, valid_days[1])
                        if i < len(day_powers) and not np.isnan(day_powers[i])]
        
        if not valid_powers:
            continue
        
        median_power = np.median(valid_powers)
        threshold = median_power * threshold_pct
        
        # Find LOW days (only within valid range)
        low_days = []
        for day in range(valid_days[0], valid_days[1] + 1):
            idx = day - 1
            if idx < len(day_powers) and not np.isnan(day_powers[idx]) and day_powers[idx] < threshold:
                low_days.append(day)
        
        # Calculate spacings
        spacings = [low_days[i] - low_days[i-1] for i in range(1, len(low_days))]
        
        # Check for 4-day pattern
        has_exact_4 = any(d2 - d1 == 4 for d1 in low_days for d2 in low_days if d2 > d1)
        has_approx_4 = any(3 <= d2 - d1 <= 5 for d1 in low_days for d2 in low_days if d2 > d1)
        
        if len(low_days) >= 2 and has_exact_4:
            assessment = "✓ STRONG"
        elif len(low_days) >= 2 and has_approx_4:
            assessment = "~ MODERATE"
        elif len(low_days) >= 2:
            assessment = "? IRREGULAR"
        else:
            assessment = "✗ INSUFFICIENT"
        
        cycle_results.append({
            'animal_id': animal_id,
            'cage_id': cage_id,
            'low_days': low_days,
            'spacings': spacings,
            'has_4day': has_exact_4,
            'assessment': assessment,
            'median_power': median_power,
            'threshold': threshold
        })
        
        print(f"{animal_id:<10} {cage_id:<8} {str(low_days):<25} {str(spacings):<20} {assessment}")
    
    # Summary
    n_strong = sum(1 for r in cycle_results if '✓' in r['assessment'])
    n_moderate = sum(1 for r in cycle_results if '~' in r['assessment'])
    n_total = len(cycle_results)
    
    print("-"*90)
    print(f"Strong 4-day cycling: {n_strong}/{n_total}")
    print(f"Moderate evidence: {n_moderate}/{n_total}")
    
    # =========================================================================
    # PHASE CONSISTENCY CHECK
    # =========================================================================
    all_low_days = []
    for r in cycle_results:
        all_low_days.extend(r['low_days'])
    
    if all_low_days:
        phases = [(d - 1) % 4 for d in all_low_days]
        phase_counts = Counter(phases)
        
        print(f"\n--- Phase Consistency Check ---")
        print(f"Phase 0 (Days 1,5,9,13...): {phase_counts.get(0, 0)}")
        print(f"Phase 1 (Days 2,6,10,14...): {phase_counts.get(1, 0)}")
        print(f"Phase 2 (Days 3,7,11,15...): {phase_counts.get(2, 0)}")
        print(f"Phase 3 (Days 4,8,12,16...): {phase_counts.get(3, 0)}")
        
        if sum(phase_counts.values()) >= 4:
            observed = [phase_counts.get(i, 0) for i in range(4)]
            expected = [len(all_low_days) / 4] * 4
            stat, p_chi = chisquare(observed, expected)
            print(f"\nChi-square test: χ²={stat:.2f}, p={p_chi:.4f}")
            if p_chi < 0.05:
                dominant = max(phase_counts, key=phase_counts.get)
                print(f"→ Significant clustering at Phase {dominant}")
            else:
                print(f"→ No significant phase clustering")
        else:
            p_chi = None
    else:
        phase_counts = Counter()
        p_chi = None
    
    # Store results
    all_rep_results[rep] = {
        'animal_results': animal_results,
        'cycle_results': cycle_results,
        'n_strong': n_strong,
        'n_moderate': n_moderate,
        'n_total': n_total,
        'phase_counts': phase_counts,
        'p_chi': p_chi,
        'valid_days': valid_days
    }

---
## Section 5: Visualizations

In [None]:
# =============================================================================
# FIGURE 1: INDIVIDUAL ANIMAL BAR PLOTS WITH TREATMENT MARKERS
# =============================================================================

for rep, results in all_rep_results.items():
    cfg = VEHICLE_CAGES[rep]
    animal_results = results['animal_results']
    cycle_results = results['cycle_results']
    valid_days = results['valid_days']
    
    n_animals = len(animal_results)
    n_cols = 3
    n_rows = (n_animals + n_cols - 1) // n_cols
    
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(16, 4*n_rows))
    axes = axes.flatten()
    
    # Get calendar dates for x-axis
    start_date = pd.to_datetime(cfg['analysis_start'])
    dates = [(start_date + pd.Timedelta(days=i)).strftime('%b %d') for i in range(cfg['n_days'])]
    
    for idx, r in enumerate(animal_results):
        ax = axes[idx]
        day_powers = r['day_powers']
        days = list(range(1, len(day_powers) + 1))
        
        # Get threshold from cycle results
        cr = next((c for c in cycle_results if c['animal_id'] == r['animal_id']), None)
        if cr:
            threshold = cr['threshold']
            median_power = cr['median_power']
            low_days = cr['low_days']
        else:
            valid_powers = [p for i, p in enumerate(day_powers) if valid_days[0] <= i+1 <= valid_days[1] and not np.isnan(p)]
            median_power = np.median(valid_powers) if valid_powers else 0
            threshold = median_power * 0.80
            low_days = []
        
        # Color bars
        colors = []
        for i, p in enumerate(day_powers):
            day_num = i + 1
            if day_num < valid_days[0] or day_num > valid_days[1]:
                colors.append('lightgray')
            elif np.isnan(p):
                colors.append('white')
            elif p < threshold:
                colors.append('red')
            else:
                colors.append('steelblue')
        
        ax.bar(days, day_powers, color=colors, edgecolor='black', alpha=0.8)
        ax.axhline(y=threshold, color='red', linestyle='--', linewidth=2, label='80% threshold')
        ax.axhline(y=median_power, color='green', linestyle='-', linewidth=1, label='Median')
        
        # Treatment markers
        ymax = max([p for p in day_powers if not np.isnan(p)]) * 1.15
        ax.axvline(x=cfg['dose_1_day'], color='purple', linestyle='-', linewidth=2.5, alpha=0.8)
        ax.axvline(x=cfg['dose_2_day'], color='purple', linestyle='-', linewidth=2.5, alpha=0.8)
        ax.axvline(x=cfg['cage_change_day'], color='orange', linestyle='--', linewidth=2, alpha=0.8)
        
        # Title
        if cr:
            title_color = 'green' if '✓' in cr['assessment'] else 'orange' if '~' in cr['assessment'] else 'black'
            ax.set_title(f"Animal {r['animal_id']} (Cage {r['cage_id']})\nLOW: {low_days}", 
                        fontsize=9, color=title_color)
        
        ax.set_xlabel('Day')
        ax.set_ylabel('Ultradian Power')
        ax.set_xticks(days)
        ax.set_ylim(0, ymax * 1.2)
        
        if idx == 0:
            ax.legend(loc='upper right', fontsize=7)
    
    for idx in range(len(animal_results), len(axes)):
        axes[idx].axis('off')
    
    fig.text(0.5, 0.02, 
             'Purple = Doses | Orange = Cage Change | Gray = Excluded | Red = LOW (potential estrus)',
             ha='center', fontsize=10, style='italic')
    
    plt.suptitle(f'{rep} - Ultradian Power by Calendar Day\nValid: Days {valid_days[0]}-{valid_days[1]}', 
                 fontsize=13, fontweight='bold')
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    plt.savefig(f'morph2rep_{rep}_calendar_bars.png', dpi=150, bbox_inches='tight')
    plt.show()

In [None]:
# =============================================================================
# FIGURE 2: HEATMAP WITH CALENDAR DATES
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(18, 9))

for col, (rep, cfg) in enumerate(VEHICLE_CAGES.items()):
    ax = axes[col]
    results = all_rep_results[rep]
    animal_results = results['animal_results']
    cycle_results = results['cycle_results']
    valid_days = results['valid_days']
    
    # Collect data for heatmap
    heatmap_data = []
    animal_labels = []
    cage_labels = []
    
    for cage_id in cfg['cages']:
        for r in animal_results:
            if r['cage_id'] == cage_id:
                heatmap_data.append(r['day_powers'])
                animal_labels.append(str(r['animal_id']))
                cage_labels.append(cage_id)
    
    heatmap_data = np.array(heatmap_data)
    
    # Normalize by median (using valid days only)
    normalized_data = np.zeros_like(heatmap_data)
    for i in range(len(heatmap_data)):
        valid_vals = [heatmap_data[i, j] for j in range(valid_days[0]-1, valid_days[1])
                      if j < heatmap_data.shape[1] and not np.isnan(heatmap_data[i, j])]
        median_val = np.nanmedian(valid_vals) if valid_vals else 1
        normalized_data[i] = heatmap_data[i] / median_val if median_val > 0 else heatmap_data[i]
    
    # Plot heatmap
    im = ax.imshow(normalized_data, aspect='auto', cmap='RdBu_r', vmin=0.5, vmax=1.5)
    
    # Mark LOW days
    row_idx = 0
    for cage_id in cfg['cages']:
        for r in animal_results:
            if r['cage_id'] == cage_id:
                cr = next((c for c in cycle_results if c['animal_id'] == r['animal_id']), None)
                if cr:
                    for low_day in cr['low_days']:
                        ax.scatter(low_day - 1, row_idx, marker='o', s=200, facecolors='none', 
                                  edgecolors='black', linewidths=2)
                row_idx += 1
    
    # Gray out excluded days
    ax.axvspan(-0.5, valid_days[0] - 1.5, alpha=0.4, color='gray')
    if valid_days[1] < cfg['n_days']:
        ax.axvspan(valid_days[1] - 0.5, cfg['n_days'] - 0.5, alpha=0.4, color='gray')
    
    # Treatment markers
    ax.axvline(x=cfg['dose_1_day'] - 1, color='purple', linestyle='-', linewidth=3, label='Dose')
    ax.axvline(x=cfg['dose_2_day'] - 1, color='purple', linestyle='-', linewidth=3)
    ax.axvline(x=cfg['cage_change_day'] - 1, color='orange', linestyle='--', linewidth=2.5, label='Cage Change')
    
    # Calendar date labels
    start_date = pd.to_datetime(cfg['analysis_start'])
    date_labels = [(start_date + pd.Timedelta(days=i)).strftime('%b %d') for i in range(cfg['n_days'])]
    
    ax.set_xticks(range(cfg['n_days']))
    ax.set_xticklabels(date_labels, rotation=45, ha='right', fontsize=8)
    ax.set_yticks(range(len(animal_labels)))
    ax.set_yticklabels([f'{animal_labels[i]}\n({cage_labels[i]})' for i in range(len(animal_labels))], fontsize=9)
    
    ax.set_xlabel('Date', fontsize=12)
    ax.set_ylabel('Animal (Cage)', fontsize=12)
    ax.set_title(f"{rep} - Normalized Ultradian Power\n○ = LOW day | Gray = Excluded", 
                 fontsize=11, fontweight='bold')
    ax.legend(loc='upper right', fontsize=8)

# Colorbar
cbar = fig.colorbar(im, ax=axes.ravel().tolist(), shrink=0.5, pad=0.02)
cbar.set_label('Power / Median', fontsize=11)
cbar.ax.axhline(y=0.8, color='black', linestyle='--', linewidth=2)
cbar.ax.text(2.5, 0.8, '80%', va='center', fontsize=9)

plt.tight_layout()
plt.savefig('morph2rep_calendar_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()

---
## Section 6: Summary

In [None]:
# =============================================================================
# SUMMARY
# =============================================================================
print("="*70)
print("SUMMARY: MORPH2REP ESTROUS CYCLE DETECTION (Calendar Days)")
print("="*70)

print("\n--- Configuration ---")
print(f"{'Replicate':<10} {'Date Range':<25} {'Valid Days':<15} {'Excluded'}")
print("-"*70)
for rep, cfg in VEHICLE_CAGES.items():
    valid = f"Days {cfg['valid_days'][0]}-{cfg['valid_days'][1]}"
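    excluded = "Day 1" if cfg['valid_days'][0] > 1 else "None"
    # Pad the whole date range so it matches the 25-character header column
    date_range = f"{cfg['analysis_start']} to {cfg['analysis_end']}"
    print(f"{rep:<10} {date_range:<25} {valid:<15} {excluded}")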

print("\n--- Results ---")
print(f"\n{'Metric':<35} {'Rep1':<15} {'Rep2':<15}")
print("-"*65)

rep1 = all_rep_results['Rep1']
rep2 = all_rep_results['Rep2']

print(f"{'Animals analyzed':<35} {rep1['n_total']:<15} {rep2['n_total']:<15}")
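# Build the "n/total" strings first so the padding applies to the whole cell
strong1 = f"{rep1['n_strong']}/{rep1['n_total']}"
strong2 = f"{rep2['n_strong']}/{rep2['n_total']}"
moderate1 = f"{rep1['n_moderate']}/{rep1['n_total']}"
moderate2 = f"{rep2['n_moderate']}/{rep2['n_total']}"
print(f"{'Strong 4-day cycling':<35} {strong1:<15} {strong2:<15}")
print(f"{'Moderate evidence':<35} {moderate1:<15} {moderate2:<15}")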

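# Compare against None explicitly: a p-value that rounds to 0.0 is still a valid result
p1 = f"p={rep1['p_chi']:.4f}" if rep1['p_chi'] is not None else "N/A"
p2 = f"p={rep2['p_chi']:.4f}" if rep2['p_chi'] is not None else "N/A"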
print(f"{'Phase clustering':<35} {p1:<15} {p2:<15}")

print("\n--- Treatment Timeline (Calendar Days) ---")
print(f"\n{'Event':<15} {'Rep1':<30} {'Rep2':<30}")
print("-"*75)
print(f"{'Dose 1':<15} Day {VEHICLE_CAGES['Rep1']['dose_1_day']} ({VEHICLE_CAGES['Rep1']['dose_1'].strftime('%b %d, %I%p')}){'  ':<5} Day {VEHICLE_CAGES['Rep2']['dose_1_day']} ({VEHICLE_CAGES['Rep2']['dose_1'].strftime('%b %d, %I%p')})")
print(f"{'Cage Change':<15} Day {VEHICLE_CAGES['Rep1']['cage_change_day']} ({VEHICLE_CAGES['Rep1']['cage_change'].strftime('%b %d, %I%p')}){'  ':<5} Day {VEHICLE_CAGES['Rep2']['cage_change_day']} ({VEHICLE_CAGES['Rep2']['cage_change'].strftime('%b %d, %I%p')})")
print(f"{'Dose 2':<15} Day {VEHICLE_CAGES['Rep1']['dose_2_day']} ({VEHICLE_CAGES['Rep1']['dose_2'].strftime('%b %d, %I%p')}){'  ':<5} Day {VEHICLE_CAGES['Rep2']['dose_2_day']} ({VEHICLE_CAGES['Rep2']['dose_2'].strftime('%b %d, %I%p')})")

In [None]:
# =============================================================================
# LIST OF SAVED FIGURES
# =============================================================================
print("\n" + "="*60)
print("SAVED FIGURES:")
print("="*60)
print("  1. morph2rep_calendar_data_completeness.png")
print("  2. morph2rep_Rep1_calendar_bars.png")
print("  3. morph2rep_Rep2_calendar_bars.png")
print("  4. morph2rep_calendar_heatmap.png")

---
## Conclusions

### Time Alignment Choice

We used **calendar day alignment** (midnight to midnight) rather than the circadian time alignment of Smarr et al. (6PM to 6PM) because:
1. Direct mapping to treatment schedule dates
2. Simpler interpretation
3. Preserves more data (Day 16 in Rep1 has 13 hours of usable data)
4. Daily averaging minimizes the impact of specific hour boundaries

### Data Quality

- **Rep1:** Day 1 excluded (only 2 hours of recording); Days 2-16 valid
- **Rep2:** Day 1 excluded (only 10 hours of recording); Days 2-14 valid

### Estrous Cycling Evidence

Results from calendar day analysis show similar patterns to the time-shifted approach:
- **Rep1:** Modest evidence of cycling, with individual animals showing 4-day patterns but no strong phase synchronization
- **Rep2:** Weaker evidence, with LOW days tending to cluster late in the recording period

The lack of strong phase clustering in both replicates suggests that while individual females may be cycling, they are not synchronized to a common phase. This is consistent with the expected "staggered" condition in group-housed females without male pheromone exposure.
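
The phase test behind this conclusion can be illustrated with synthetic LOW days. The sketch below assigns each LOW day to one of four phases, as in Section 4, and runs a chi-square goodness-of-fit test against a uniform distribution. The helper name `phase_test` and both day lists are invented for illustration, and with expected counts this small the chi-square approximation is only rough.

```python
from collections import Counter
from scipy.stats import chisquare

def phase_test(low_days, cycle_length=4):
    """Count LOW days per cycle phase and test against a uniform distribution."""
    phases = [(d - 1) % cycle_length for d in low_days]
    counts = Counter(phases)
    observed = [counts.get(i, 0) for i in range(cycle_length)]
    stat, p = chisquare(observed)  # expected frequencies default to uniform
    return observed, stat, p

# Synchronized animals: every LOW day falls on the same phase
synchronized = [3, 7, 11, 15, 3, 7, 11, 3, 7, 11, 15, 7]
# Staggered animals: each animal cycles every 4 days, offset from the others
staggered = [2, 6, 10, 3, 7, 11, 4, 8, 12, 5, 9, 13]

obs_sync, stat_sync, p_sync = phase_test(synchronized)
obs_stag, stat_stag, p_stag = phase_test(staggered)
print(f"Synchronized: {obs_sync}, p={p_sync:.4g}")
print(f"Staggered:    {obs_stag}, p={p_stag:.4g}")
```

In the staggered case every animal still shows a regular 4-day spacing, yet the pooled phases are uniform, so a non-significant test does not by itself argue against cycling.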