# Synthetic Data Generation with Known Ground Truth

This notebook demonstrates how to generate synthetic spectroscopy data with a known ground truth and controlled noise characteristics.

**Use cases:**
- Testing fitting algorithms with known ground truth
- Validating analysis pipelines
- Exploring noise effects on parameter recovery
- Creating test datasets for method development
- Educational demonstrations

In [None]:
import os
import numpy as np
import trspecfit

## 1. Set Up Model

Define the "ground truth" model that will generate your synthetic data.

In [None]:
# Create parent project
project = trspecfit.Project(path=os.getcwd())

# Create file instance with axes
file = trspecfit.File(
    parent_project=project,
    energy=np.arange(0, 20, 0.05),
    time=np.arange(-10, 100, 0.25)
)

In [None]:
# Load energy-resolved model
file.load_model(
    model_yaml='models_energy.yaml',
    model_info=['single_peak']
)

In [None]:
# Examine the energy model
file.describe_model()

In [None]:
# Add time dependence to make it 2D
file.add_time_dependence(
    model_yaml="models_time.yaml",
    model_info=['MonoExpPosIRF'],
    par_name="GLP_01_x0"
)

# Tip: Copy this cell to add more time dependencies for other parameters

In [None]:
# Examine the complete 2D model
file.describe_model()

## 2. Visualize Ground Truth Model

Before adding noise, visualize what the "clean" data looks like.

In [None]:
# Plot 1D spectrum at a specific time point
file.model_active.plot_1D(t_ind=0, plot_ind=True)

In [None]:
# Generate and plot full 2D dataset
file.model_active.create_value2D()
file.model_active.plot_2D()

## 3. Configure Simulator

Choose detection mode and noise characteristics.

### Detection Modes:

**Analog Detection** (`detection='analog'`):
- Continuous signal measurement
- Specify `noise_level` (relative to signal, 0-1)
- Choose `noise_type`: `'gaussian'` or `'poisson'`

**Photon Counting** (`detection='photon_counting'`):
- Discrete photon events
- Specify `counts_per_delay` (or `count_rate` + `integration_time`)
- Poisson (shot) noise, the physically correct model for counting statistics; relative noise grows as counts decrease
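To make the two noise models concrete, here is a minimal NumPy sketch, independent of trspecfit, of what each detection mode does to a clean spectrum. It assumes that the analog `noise_level` sets a Gaussian sigma equal to `noise_level` times the peak signal, and that photon counting rescales each spectrum so its expected total equals `counts_per_delay` before drawing Poisson counts. trspecfit's `Simulator` may normalize differently; the peak shape and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded, like seed=42 in the Simulator call

# Toy clean spectrum: one Gaussian peak on the notebook's energy grid
energy = np.arange(0, 20, 0.05)
clean = np.exp(-0.5 * ((energy - 10.0) / 0.8) ** 2)

# Analog detection (assumption): additive Gaussian noise,
# sigma = noise_level * peak signal
noise_level = 0.05
analog = clean + rng.normal(0.0, noise_level * clean.max(), size=clean.shape)

# Photon counting (assumption): rescale so the expected total is
# counts_per_delay, then draw integer counts from a Poisson distribution
counts_per_delay = 1e5
expected = clean / clean.sum() * counts_per_delay
counts = rng.poisson(expected)
```

Both arrays share the same clean signal; only the noise statistics differ. For Poisson noise the variance equals the expected count, so bright regions have larger absolute but smaller relative noise.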

In [None]:
# Create simulator with photon counting detection
sim = trspecfit.Simulator(
    model=file.model_active,
    detection='photon_counting',
    counts_per_delay=1E5,  # Expected counts per time delay; higher counts give higher SNR
    seed=42  # For reproducibility (None for random)
)

**Alternative: Analog detection**

```python
sim = trspecfit.Simulator(
    model=file.model_active,
    detection='analog',
    noise_level=0.05,  # 5% noise relative to signal
    noise_type='gaussian',
    seed=42
)
```

## 4. Generate Synthetic Data

Generate noisy data at different dimensionalities.

### 1D Simulation (Single Time Point)

Useful for testing energy-resolved fitting.

In [None]:
# Simulate 1D dataset at specific time index
sim_data_1d = sim.simulate_1D(t_ind=0)

# Visualize clean vs noisy
sim.plot_comparison(t_ind=0, dim=1)

### 2D Simulation (Full Time Series)

Single realization of the full 2D dataset.

In [None]:
# Simulate full 2D dataset
sim_data_2d = sim.simulate_2D()

# Visualize clean (ground truth) vs noisy data
sim.plot_comparison(dim=2)
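For intuition about what a 2D simulation contains, the sketch below builds a toy stand-in for the ground-truth model in plain NumPy: a Gaussian peak whose position jumps at t = 0 and relaxes mono-exponentially, followed by one photon-counting realization. The peak shape, the parameter values, the omission of the IRF convolution, and the (time, energy) array layout are all assumptions for illustration; this is not trspecfit's `single_peak` or `MonoExpPosIRF` implementation.

```python
import numpy as np

# Same axes as the File created above
energy = np.arange(0, 20, 0.05)
time = np.arange(-10, 100, 0.25)

# Hypothetical ground-truth parameters (not read from models_*.yaml)
x0, sigma = 10.0, 0.8   # peak position and width (energy units)
shift, tau = 1.5, 20.0  # transient peak shift and decay constant (time units)

# Peak position vs. time: unshifted before t=0, mono-exponential decay after
# (the IRF convolution of the real model is omitted here)
t_pos = np.clip(time, 0, None)
x0_t = x0 + np.where(time >= 0, shift * np.exp(-t_pos / tau), 0.0)

# Broadcast to a (n_time, n_energy) map: one spectrum per delay row
clean_2d = np.exp(-0.5 * ((energy[None, :] - x0_t[:, None]) / sigma) ** 2)

# One photon-counting realization with a fixed expected total per delay
rng = np.random.default_rng(42)
counts_per_delay = 1e5
expected = clean_2d / clean_2d.sum(axis=1, keepdims=True) * counts_per_delay
noisy_2d = rng.poisson(expected)
```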

### Multiple Realizations

Generate multiple independent noisy realizations for statistical analysis or testing.

In [None]:
# Simulate N independent realizations
clean_data, noisy_data_list, noise_list = sim.simulate_N(N=20)

print(f"Generated {len(noisy_data_list)} realizations")
print(f"Each shape: {noisy_data_list[0].shape}")
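A useful sanity check on multiple realizations is that the noise actually has the statistics you asked for. The sketch below (plain NumPy with a hypothetical toy spectrum, not the `simulate_N` internals) draws N=20 Poisson realizations of one clean spectrum and checks that, pixel by pixel, the variance across realizations matches the expected counts, as it should for Poisson noise. The printed ratio should be close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy clean spectrum (hypothetical), scaled to 1e5 expected counts in total
energy = np.arange(0, 20, 0.05)
shape = np.exp(-0.5 * ((energy - 10.0) / 0.8) ** 2)
expected = shape / shape.sum() * 1e5

# N independent realizations: same clean signal, fresh Poisson noise each time
N = 20
noisy = rng.poisson(expected, size=(N, expected.size))
noise = noisy - expected  # noise-only arrays, analogous to noise_list

# Poisson check: variance across realizations should equal the expected counts.
# Average the ratio over pixels with enough counts to keep the estimate stable.
mask = expected > 10
ratio = (noise[:, mask].var(axis=0, ddof=1) / expected[mask]).mean()
print(f"variance / mean across realizations: {ratio:.2f}")
```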

## 5. Save and Export Data

Save simulated data for later use or fitting tests.

In [None]:
# Save all realizations to HDF5 file
sim.save_data(N_data=noisy_data_list)

print("\nData saved to 'simulated_data/' directory")
print("Files contain:")
print("  - Clean (ground truth) data")
print("  - Noisy realizations")
print("  - Noise-only data")
print("  - Energy and time axes")
print("  - Model parameters (ground truth)")
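If you also want a copy that can be read without trspecfit, NumPy's `.npz` format is one lightweight option. The sketch below is a hypothetical alternative to `sim.save_data`, not its file layout: placeholder arrays stand in for the simulator output, and the ground-truth parameters (hypothetical names and values) are stored as a JSON string in the same file so they always travel with the data.

```python
import json
import os
import tempfile

import numpy as np

rng = np.random.default_rng(42)
energy = np.arange(0, 20, 0.05)
time = np.arange(-10, 100, 0.25)

# Placeholders for the simulator output: clean map and 5 noisy realizations
clean = np.full((time.size, energy.size), 50.0)
noisy = rng.poisson(clean, size=(5, *clean.shape))
noise = noisy - clean

# Hypothetical ground truth and simulation settings to keep with the data
truth = {"x0": 10.0, "tau": 20.0, "counts_per_delay": 1e5, "seed": 42}

path = os.path.join(tempfile.mkdtemp(), "sim_run.npz")
np.savez_compressed(
    path,
    energy=energy, time=time,
    clean=clean, noisy=noisy, noise=noise,
    truth=json.dumps(truth),  # stored as a 0-d string array
)

# Reload and check that data and ground truth round-trip together
with np.load(path) as f:
    loaded_noisy = f["noisy"]
    loaded_truth = json.loads(f["truth"].item())
print(loaded_noisy.shape, loaded_truth["tau"])
```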

## Tips for Synthetic Data Generation

**Noise Levels:**
- Match your experimental conditions
- Test multiple noise levels to understand robustness
- Photon counting: lower `counts_per_delay` means higher noise (relative Poisson noise scales as 1/sqrt(counts))
- Analog: a `noise_level` of 0.05-0.10 (an SNR of roughly 10-20) is typical of good-quality data
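The photon-counting rule of thumb follows from Poisson statistics: a pixel with expected count mu has standard deviation sqrt(mu), so its relative noise is 1/sqrt(mu). The short sketch below tabulates this and checks it empirically. It refers to counts per pixel, so how it maps onto `counts_per_delay` depends on how those counts are spread across the energy axis. About 400 counts per pixel gives 5% relative noise, comparable to an analog `noise_level` of 0.05 if that parameter is interpreted as a relative standard deviation.

```python
import numpy as np

def poisson_relative_noise(mu):
    """Relative noise (std / mean) of a Poisson count with expected value mu."""
    return 1.0 / np.sqrt(mu)

for mu in (10, 100, 400, 1_000, 10_000):
    print(f"expected counts {mu:>6}: relative noise {poisson_relative_noise(mu):.3f}")

# Empirical check at 400 expected counts, where the relative noise is 5%
rng = np.random.default_rng(1)
samples = rng.poisson(400.0, size=100_000)
empirical = samples.std() / samples.mean()
print(f"empirical relative noise at 400 counts: {empirical:.3f}")
```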

**Parameter Ranges:**
- Use realistic parameter values from your experiments
- Test edge cases (very fast/slow dynamics, weak/strong signals)
- Ensure parameters produce physically meaningful spectra

**Multiple Realizations:**
- N=10-20 is usually sufficient for testing fitting algorithms
- N=50+ for robust statistical analysis (e.g., bias, spread, and confidence-interval coverage)
- Each realization is independent: same clean signal, fresh noise draw
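One way to see why statistical analysis needs more realizations than a quick test: any fraction estimated from N realizations, such as how often a 95% confidence interval contains the true value, carries a binomial standard error of sqrt(p(1 - p) / N). The sketch below tabulates that for a few values of N.

```python
import math

def binomial_se(p, n):
    """Standard error of a fraction p estimated from n independent realizations."""
    return math.sqrt(p * (1 - p) / n)

for n in (10, 20, 50, 200):
    print(f"N={n:>3}: coverage 0.95 +/- {binomial_se(0.95, n):.3f}")
```

For example, with N=20 an observed coverage of 0.90 lies only about one standard error below 0.95, whereas with N=200 the same shortfall is more than three standard errors.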

**Validation Strategy:**
- Always fit simulated data to verify parameter recovery
- Compare fitted parameters to ground truth
- Check whether confidence intervals contain the true values
- Test with different initial guesses
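The recovery and confidence-interval checks above can be rehearsed on a problem small enough to solve in closed form. The sketch below assumes a peak of known shape with only its amplitude unknown, and analog Gaussian noise of known sigma, so the least-squares estimate and its standard error are exact. It then measures the bias of the estimates and how often the 95% interval contains the true amplitude. In a real workflow, the closed-form step would be replaced by fitting each simulated dataset with your actual model.

```python
import numpy as np

rng = np.random.default_rng(7)

energy = np.arange(0, 20, 0.05)
g = np.exp(-0.5 * ((energy - 10.0) / 0.8) ** 2)  # known peak shape (hypothetical)
A_true = 1.0                                      # ground-truth amplitude
s = 0.05 * A_true                                 # analog noise sigma (5% of peak)

N = 200
hits = 0
estimates = []
for _ in range(N):
    y = A_true * g + rng.normal(0.0, s, size=g.size)  # one noisy realization
    A_hat = g @ y / (g @ g)                           # least-squares amplitude
    se = s / np.sqrt(g @ g)                           # standard error (sigma known)
    lo, hi = A_hat - 1.96 * se, A_hat + 1.96 * se     # 95% confidence interval
    hits += int(lo <= A_true <= hi)
    estimates.append(A_hat)

coverage = hits / N
bias = np.mean(estimates) - A_true
print(f"coverage of 95% CI: {coverage:.2f}, bias: {bias:+.4f}")
```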

**File Organization:**
- Save different noise levels in separate files
- Include metadata about simulation parameters
- Keep ground truth parameters with simulated data