# Noise Model Development Dashboard

**Goal:** Characterize and model the noise structure in prtecan fluorescence data to improve synthetic data generation and fitting methods.

---

## 🎯 Key Research Questions

1. **Systematic Bias:** Why is the y1 residual at the lowest pH always negative (sometimes beyond 3σ)?
2. **Adjacent Correlation:** Why do adjacent residuals alternate positive/negative?
3. **X-value Uncertainty:** Are pH values systematically wrong (per-well or plate-wide)?
4. **Method Comparison:** Which performs better: Generalized Least Squares (GLS) or PyMC?

---

## 📚 Notebook Organization

### 🔬 [01_noise_characterization.ipynb](./01_noise_characterization.ipynb)
**Exploratory analysis of real prtecan data**

- Residual distributions by label and pH range
- Covariance and correlation structure
- Systematic bias detection
- Adjacent point correlation analysis
- X-value error hypothesis testing
- **Outputs:** Covariance matrices, bias parameters, summary statistics
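
The covariance and adjacent-correlation checks listed above can be sketched roughly as follows. This is only an illustration, assuming residuals are arranged as a `(n_wells, n_points)` array with one row per well and one column per pH step. The function names `residual_covariance` and `lag1_autocorrelation` are hypothetical and are not the helpers in `dev/noise_models.py`.

```python
import numpy as np


def residual_covariance(residuals):
    """Covariance and correlation between pH points, estimated across wells.

    residuals: array of shape (n_wells, n_points), one row per well (assumed layout).
    """
    cov = np.cov(residuals, rowvar=False)
    corr = np.corrcoef(residuals, rowvar=False)
    return cov, corr


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of each well's residual sequence.

    Values well below zero indicate that adjacent residuals tend to
    alternate in sign; values near zero are consistent with white noise.
    """
    centered = residuals - residuals.mean(axis=1, keepdims=True)
    num = (centered[:, :-1] * centered[:, 1:]).sum(axis=1)
    den = (centered**2).sum(axis=1)
    return num / den


# Toy data (not real prtecan residuals): an alternating pattern vs white noise.
rng = np.random.default_rng(0)
n_wells, n_points = 50, 8
signs = np.where(np.arange(n_points) % 2 == 0, 1.0, -1.0)
alternating = signs + 0.1 * rng.standard_normal((n_wells, n_points))
white = rng.standard_normal((n_wells, n_points))

cov, corr = residual_covariance(white)
r_alt = lag1_autocorrelation(alternating)
r_white = lag1_autocorrelation(white)
```

For short sequences the lag-1 estimate is biased slightly negative even for white noise, so any threshold for "alternating" is best calibrated against simulated white-noise residuals of the same length.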

### 🧪 [02_synthetic_data_generator.ipynb](./02_synthetic_data_generator.ipynb)
**Build realistic synthetic data with characterized noise**

- Import noise parameters from notebook 01
- Implement noise model components:
  - Base heteroscedastic noise
  - Label-dependent bias
  - Correlated noise structure
  - X-value uncertainty simulation
- Validation: compare synthetic vs real residual patterns
- **Outputs:** Updated `make_dataset()` function
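
One way the four noise components could be combined is sketched below. It assumes a single-site titration curve and an AR(1)-style correlation between neighbouring points. The names `titration_signal` and `make_noisy_label` and all parameter defaults are hypothetical; this is not the project's `make_dataset()` function.

```python
import numpy as np


def titration_signal(ph, pka, s_acid, s_base):
    """Single-site titration curve (assumed model for illustration)."""
    frac_base = 1.0 / (1.0 + 10.0 ** (pka - ph))
    return s_acid + (s_base - s_acid) * frac_base


def make_noisy_label(ph_nominal, rng, *, pka=7.0, s_acid=1000.0, s_base=200.0,
                     sigma_floor=5.0, sigma_rel=0.02, label_bias=0.0,
                     rho=-0.3, ph_sigma=0.05):
    """Simulate one label with the four noise components listed above."""
    ph_nominal = np.asarray(ph_nominal, dtype=float)
    n = ph_nominal.size

    # X-value uncertainty: the true pH differs from the nominal (recorded) pH.
    ph_true = ph_nominal + rng.normal(0.0, ph_sigma, size=n)
    y_clean = titration_signal(ph_true, pka, s_acid, s_base)

    # Base heteroscedastic noise: constant floor plus a signal-proportional term.
    sigma = np.sqrt(sigma_floor**2 + (sigma_rel * y_clean) ** 2)

    # Correlated noise: AR(1)-style correlation rho**|i-j| between points.
    # A negative rho makes adjacent errors tend to alternate in sign.
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    chol = np.linalg.cholesky(rho**lags)
    noise = sigma * (chol @ rng.standard_normal(n))

    # Label-dependent bias: a constant offset applied to this label.
    return y_clean + label_bias + noise, sigma


rng = np.random.default_rng(1)
ph = np.linspace(5.0, 9.0, 7)
y_obs, y_sigma = make_noisy_label(ph, rng, label_bias=-3.0)
```

Keeping each component behind its own parameter makes it possible to switch them on one at a time when comparing synthetic and real residual patterns.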

### 📊 [03_fitting_method_comparison.ipynb](./03_fitting_method_comparison.ipynb)
**Compare GLS vs PyMC on synthetic and real data**

- Method implementations (GLS with covariance, PyMC hierarchical)
- Performance metrics: bias, RMSE, coverage, computational cost
- Robustness to outliers
- Apply to real prtecan data
- **Outputs:** Method recommendations, diagnostic plots
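
As a reminder of what "GLS with covariance" involves, here is a minimal sketch for a linear model with a known residual covariance matrix. The function name `gls_fit` is hypothetical. The actual titration fits are nonlinear, but the same Cholesky whitening step can be applied to the residual vector of a nonlinear least-squares problem.

```python
import numpy as np


def gls_fit(design, y, cov):
    """Generalized least squares: beta = (X^T C^-1 X)^-1 X^T C^-1 y.

    Implemented by whitening with the Cholesky factor of C, which turns
    the problem into ordinary least squares on the transformed data.
    """
    chol = np.linalg.cholesky(cov)          # C = L L^T
    xw = np.linalg.solve(chol, design)      # L^-1 X
    yw = np.linalg.solve(chol, y)           # L^-1 y
    beta, *_ = np.linalg.lstsq(xw, yw, rcond=None)
    beta_cov = np.linalg.inv(xw.T @ xw)     # (X^T C^-1 X)^-1
    return beta, beta_cov


# Toy example: straight line with negatively correlated neighbouring errors.
x = np.linspace(0.0, 1.0, 6)
design = np.column_stack([np.ones_like(x), x])
lags = np.abs(np.subtract.outer(np.arange(6), np.arange(6)))
cov = 0.5 * (-0.3) ** lags
y = design @ np.array([2.0, -1.0])          # noise-free data for clarity
beta, beta_cov = gls_fit(design, y, cov)
```

The diagonal of `beta_cov` gives the parameter variances, which is what the coverage metric in the comparison would be checked against.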

---

## 🛠️ Utilities

### [dev/noise_models.py](./dev/noise_models.py)
Reusable functions for all notebooks:
- `compute_residual_covariance()` - Covariance matrices by label
- `analyze_label_bias()` - Detect systematic bias (Issue 1)
- `detect_adjacent_correlation()` - Test adjacent correlation (Issue 2)
- `estimate_x_shift_statistics()` - Detect pH shifts (Issue 3)
- `simulate_correlated_noise()` - Generate correlated noise
- `export_noise_parameters()` - Save parameters for synthetic data
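
The hand-off of noise parameters from notebook 01 to notebook 02 could be done with a plain JSON file, as in the sketch below. The helpers `save_params` and `load_params` and the keys `cov_y1` and `bias_y1` are hypothetical stand-ins; the signature and file format of `export_noise_parameters()` are not documented here.

```python
import json
import os
import tempfile
from pathlib import Path

import numpy as np


def save_params(path, params):
    """Write a dict of scalars/arrays to JSON (arrays become nested lists)."""
    serializable = {key: np.asarray(value).tolist() for key, value in params.items()}
    Path(path).write_text(json.dumps(serializable, indent=2))


def load_params(path):
    """Read the JSON file back, converting every entry to a NumPy array."""
    raw = json.loads(Path(path).read_text())
    return {key: np.asarray(value) for key, value in raw.items()}


# Round trip with placeholder values (not measured parameters).
with tempfile.TemporaryDirectory() as tmpdir:
    params_file = os.path.join(tmpdir, "noise_params.json")
    save_params(params_file, {"cov_y1": np.eye(2), "bias_y1": -1.5})
    loaded = load_params(params_file)
```

A text format like JSON keeps the exported parameters easy to inspect and diff between runs.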

---

## 📈 Current Status

Update this section as you progress through the analysis.

## Quick Setup

In [None]:
from pathlib import Path

# Check that notebook files exist
notebooks = [
    "01_noise_characterization.ipynb",
    "02_synthetic_data_generator.ipynb",
    "03_fitting_method_comparison.ipynb",
]

for nb in notebooks:
    if Path(nb).exists():
        print(f"✅ {nb}")
    else:
        print(f"❌ {nb} - not found")

# Check utility module
if Path("dev/noise_models.py").exists():
    print("✅ dev/noise_models.py")
else:
    print("❌ dev/noise_models.py - not found")

## Key Findings Summary

### Issue 1: Systematic Bias
- [ ] Analysis complete
- **Finding:** _Update after running 01_noise_characterization.ipynb_
- **Impact on synthetic data:** _TBD_

### Issue 2: Adjacent Correlation
- [ ] Analysis complete
- **Finding:** _Update after running 01_noise_characterization.ipynb_
- **Impact on synthetic data:** _TBD_

### Issue 3: X-value Uncertainty
- [ ] Analysis complete
- **Finding:** _Update after running 01_noise_characterization.ipynb_
- **Impact on synthetic data:** _TBD_

### Method Comparison (GLS vs PyMC)
- [ ] Analysis complete
- **Recommendation:** _Update after running 03_fitting_method_comparison.ipynb_

---

## Decision Log

| Date | Decision | Rationale |
|------|----------|----------|
| 2025-12-22 | Split notebook into 3 focused notebooks | Better organization, easier to navigate |
| 2025-12-22 | Created dev/noise_models.py | Reusable functions, reduce duplication |
| | | |