# Advanced Association Methods

This notebook demonstrates the advanced pixel-to-photon association methods:

1. **Mystic** - Constrained optimization using the mystic framework
2. **ML** - Machine learning trained on constrained associations

Both methods implement the verified EMPIR algorithm:
- ToT-weighted center of gravity
- Euclidean distance metric
- Time window filtering
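
As a rough illustration of the first two ingredients, the sketch below computes a ToT-weighted center of gravity for a pixel cluster and its Euclidean distance to a photon position. The `(x, y, tot)` tuple layout and the function names are assumptions made for this example; they are not taken from `neutron_event_analyzer`.

```python
import math


def tot_weighted_cog(pixels):
    """Return the ToT-weighted center of gravity (x, y) of a pixel cluster.

    `pixels` is a list of (x, y, tot) tuples. This layout is an assumption
    made for the sketch, not the library's internal format.
    """
    total = sum(tot for _, _, tot in pixels)
    if total <= 0:
        raise ValueError("cluster has no positive ToT")
    cx = sum(x * tot for x, _, tot in pixels) / total
    cy = sum(y * tot for _, y, tot in pixels) / total
    return cx, cy


def cog_distance(pixels, photon_xy):
    """Euclidean distance between the cluster CoG and a photon position."""
    cx, cy = tot_weighted_cog(pixels)
    return math.hypot(cx - photon_xy[0], cy - photon_xy[1])


cluster = [(128, 130, 3.0), (129, 130, 1.0)]
print(tot_weighted_cog(cluster))                # (128.25, 130.0)
print(cog_distance(cluster, (128.25, 131.0)))   # 1.0
```

Pixels with a larger ToT pull the center toward themselves, which is why the CoG above sits closer to x = 128 than to x = 129.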

## Setup

In [1]:
import neutron_event_analyzer as nea
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Load Data

Load a dataset with pixels, photons, and events. Adjust the path to your data folder.

In [2]:
data_path = "/work/nuclear/PTB/archive/pencilbeam0/fast_neutrons"

assoc = nea.Analyse(
    data_path,
    pixels=True,
    photons=True,
    events=True,
    limit=1000
)


📂 Auto-loaded association results: 1,000 rows
📥 Loading raw data for re-analysis...



Loaded 154347 pixels in total.


Check the loaded data:

In [3]:
assoc.pixels_df.shape, assoc.photons_df.shape, assoc.events_df.shape

((1000, 5), (8397, 4), (7321, 6))

---
## Method 1: Simple Association (Baseline)

The default method uses a forward time-window with ToT-weighted center of gravity matching.
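
The forward time-window step can be pictured as a range lookup on sorted arrival times. This is a minimal sketch, assuming the window opens at the photon's time of arrival and extends `window` time units forward; the function name, argument names, and the inclusive bounds are hypothetical and may differ from what `method='simple'` does internally.

```python
from bisect import bisect_left, bisect_right


def pixels_in_forward_window(pixel_toas, photon_toa, window):
    """Return indices of pixels with photon_toa <= toa <= photon_toa + window.

    `pixel_toas` must be sorted in ascending order. Both bounds are
    inclusive, which is an assumption of this sketch.
    """
    lo = bisect_left(pixel_toas, photon_toa)
    hi = bisect_right(pixel_toas, photon_toa + window)
    return list(range(lo, hi))


# Toy arrival times in arbitrary integer time units
pixel_toas = [0, 10, 20, 50, 90]
print(pixels_in_forward_window(pixel_toas, 10, 30))  # [1, 2]
```

Pixels selected this way would then be combined into a ToT-weighted center of gravity and compared with the photon position.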

In [4]:
assoc_simple = nea.Analyse(data_path, pixels=True, photons=True, events=True, limit=1000)
assoc_simple.associate(method='simple')


📂 Auto-loaded association results: 1,000 rows
📥 Loading raw data for re-analysis...



Loaded 154347 pixels in total.



2026-01-24 13:30:47,252 - Before grouping: 8364 photons with non-NaN assoc_x
2026-01-24 13:30:47,259 - After grouping: 8364 photons with non-NaN assoc_event_id


✅ Matched 8364 of 8397 photons (99.6%)
✅ Saved 1000 rows to /work/nuclear/PTB/archive/pencilbeam0/fast_neutrons/AssociatedResults/associated_data.csv
   File size: 0.10 MB
   Columns: 18
💾 Auto-saved results to: /work/nuclear/PTB/archive/pencilbeam0/fast_neutrons/AssociatedResults/associated_data.csv


| Group | Pixel → Photon: Pixels | Pixel → Photon: Pix % | Pixel → Photon: Photons | Pixel → Photon: Phot % | Photon → Event: Photons | Photon → Event: Phot % | Photon → Event: Events | Photon → Event: Evt % | CoM Quality (px2ph): Exact % | CoM Quality (px2ph): Good % | CoM Quality (ph2ev): Exact % | CoM Quality (ph2ev): Good % |
|-------|------|------|------|------|------|------|------|------|------|------|------|------|
| fast_neutrons | 541 / 1,000 | 54.1% | 63 / 8,397 | 0.8% | 8,364 / 8,397 | 99.6% | 7,305 / 7,305 | 100.0% | 6.3% | 79.4% | 99.8% | 0.1% |


In [5]:
assoc_simple.associated_df.head()

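| Unnamed: 0 | px/x | px/y | px/toa | px/tot | px/tof | ph/id | ph/x | ph/y | ph/toa | ph/cog | ev/id | ev/x | ev/y | ev/toa | ev/n | ev/psd | ev/cog | ph/n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 125.0 | 127.0 | 0.006246 | 2.0 | 7.71875e-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 |
| 1 | 126.0 | 127.0 | 0.006246 | 1.0 | 7.71875e-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 |
| 2 | 128.0 | 130.0 | 0.006246 | 3.0 | 7.71875e-07 | 8331.0 | 129.08 | 130.71 | 0.006246 | 0.350566 | 3780.0 | 129.08 | 130.71 | 0.006246 | 1.0 | 0.0 | 0.0 | 12 |
| 3 | 128.0 | 133.0 | 0.006246 | 1.0 | 7.71875e-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 |
| 4 | 129.0 | 133.0 | 0.006246 | 1.0 | 7.71875e-07 | 8333.0 | 130.04 | 133.83 | 0.006246 | 0.465105 | 4081.0 | 130.04 | 133.83 | 0.006246 | 1.0 | 0.0 | 0.0 | 10 |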


---
## Method 2: Mystic Optimization

The mystic method formulates association as a constrained optimization problem:

- **Objective**: Minimize CoG distance + time penalty
- **Constraints**: Minimum pixel count, spatial/temporal bounds
- **Solver**: Powell's method or differential evolution

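This method is slower but can find better associations in ambiguous cases. In the run below it matched one more pixel than the baseline (542 vs. 541), while its px2ph "Good %" score was lower (74.6% vs. 79.4%).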
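
To make the objective and constraints above concrete, here is a toy version for a single photon. It is a sketch under stated assumptions: the cost terms, the `time_weight` factor, and the bounds are illustrative, and an exhaustive search over pixel subsets stands in for Powell's method or differential evolution, so it is only practical for a handful of candidate pixels.

```python
import math
from itertools import combinations


def cost(subset, photon, time_weight):
    """CoG distance plus a weighted mean |dt| penalty for one pixel subset."""
    total = sum(p["tot"] for p in subset)
    cx = sum(p["x"] * p["tot"] for p in subset) / total
    cy = sum(p["y"] * p["tot"] for p in subset) / total
    dist = math.hypot(cx - photon["x"], cy - photon["y"])
    mean_dt = sum(abs(p["toa"] - photon["toa"]) for p in subset) / len(subset)
    return dist + time_weight * mean_dt


def best_subset(pixels, photon, min_pixels=2, max_radius=5.0,
                max_dt=10.0, time_weight=0.1):
    """Exhaustively search pixel subsets that satisfy the constraints."""
    # Spatial and temporal bounds: discard pixels that are too far away.
    candidates = [
        p for p in pixels
        if math.hypot(p["x"] - photon["x"], p["y"] - photon["y"]) <= max_radius
        and abs(p["toa"] - photon["toa"]) <= max_dt
    ]
    best, best_cost = None, math.inf
    # Minimum pixel count: only consider subsets of size >= min_pixels.
    for k in range(min_pixels, len(candidates) + 1):
        for subset in combinations(candidates, k):
            c = cost(subset, photon, time_weight)
            if c < best_cost:
                best, best_cost = subset, c
    return best, best_cost


photon = {"x": 11.0, "y": 10.0, "toa": 0.0}
pixels = [
    {"x": 10, "y": 10, "tot": 1.0, "toa": 0.0},
    {"x": 12, "y": 10, "tot": 1.0, "toa": 0.0},
    {"x": 50, "y": 50, "tot": 5.0, "toa": 0.0},    # outside max_radius
    {"x": 11, "y": 10, "tot": 1.0, "toa": 100.0},  # outside max_dt
]
best, best_cost = best_subset(pixels, photon)
print(len(best), best_cost)
```

When fewer than `min_pixels` candidates survive the bounds, the function returns `(None, math.inf)`, which a caller could treat as a signal to fall back to a simpler rule.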

In [6]:
assoc_mystic = nea.Analyse(data_path, pixels=True, photons=True, events=True, limit=1000)
assoc_mystic.associate(method='mystic')


📂 Auto-loaded association results: 1,000 rows
📥 Loading raw data for re-analysis...



Loaded 154347 pixels in total.



✅ Pixel-Photon Association (mystic optimization):
   Pixels:  542 / 1,000 matched (54.2%)
   Photons: 63 / 8,397 matched (0.8%)
   Optimization: 63 success, 0 fallback, 0 failed



✅ Photon-Event Association (mystic optimization):
   Photons: 8,363 / 8,397 matched (99.6%)
   Events:  7,311 / 7,321 matched (99.9%)
   Optimization: 994 success, 6317 fallback, 0 failed
✅ Saved 1000 rows to /work/nuclear/PTB/archive/pencilbeam0/fast_neutrons/AssociatedResults/associated_data.csv
   File size: 0.10 MB
   Columns: 18
💾 Auto-saved results to: /work/nuclear/PTB/archive/pencilbeam0/fast_neutrons/AssociatedResults/associated_data.csv


| Group | Pixel → Photon: Pixels | Pixel → Photon: Pix % | Pixel → Photon: Photons | Pixel → Photon: Phot % | Photon → Event: Photons | Photon → Event: Phot % | Photon → Event: Events | Photon → Event: Evt % | CoM Quality (px2ph): Exact % | CoM Quality (px2ph): Good % | CoM Quality (ph2ev): Exact % | CoM Quality (ph2ev): Good % |
|-------|------|------|------|------|------|------|------|------|------|------|------|------|
| fast_neutrons | 542 / 1,000 | 54.2% | 63 / 8,397 | 0.8% | 8,364 / 8,397 | 99.6% | 7,305 / 7,305 | 100.0% | 6.3% | 74.6% | 99.8% | 0.1% |


In [7]:
assoc_mystic.associated_df.head()

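| Unnamed: 0 | px/x | px/y | px/toa | px/tot | px/tof | ph/id | ph/x | ph/y | ph/toa | ph/cog | ev/id | ev/x | ev/y | ev/toa | ev/n | ev/psd | ev/cog | ph/n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 125.0 | 127.0 | 0.006246 | 2.0 | 7.71875e-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 |
| 1 | 126.0 | 127.0 | 0.006246 | 1.0 | 7.71875e-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 |
| 2 | 128.0 | 130.0 | 0.006246 | 3.0 | 7.71875e-07 | 8328.0 | 129.08 | 130.71 | 0.006246 | 0.350566 | 7262.0 | 129.08 | 130.71 | 0.006246 | 1.0 | 0.0 | 0.0 | 12 |
| 3 | 128.0 | 133.0 | 0.006246 | 1.0 | 7.71875e-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 |
| 4 | 129.0 | 133.0 | 0.006246 | 1.0 | 7.71875e-07 | 8333.0 | 130.04 | 133.83 | 0.006246 | 0.465105 | 7261.0 | 130.04 | 133.83 | 0.006246 | 1.0 | 0.0 | 0.0 | 10 |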


---
## Method 3: ML-Based Association

The ML method uses a trained classifier to predict pixel-to-photon associations.
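
A classifier of this kind usually scores candidate pixel-photon pairs from a small feature vector. The sketch below builds one for a single pair. The feature names are borrowed from the Feature Importance cell later in this notebook, but the normalization by `max_dist` and `max_dt` is an assumption, as is the dict-based input layout.

```python
import math


def pair_features(pixel, photon, max_dist=10.0, max_dt=1e-6):
    """Build an illustrative feature dict for one pixel-photon candidate pair.

    `max_dist` (pixels) and `max_dt` (same time unit as `toa`) are
    hypothetical normalization scales, not values used by the library.
    """
    dx = pixel["x"] - photon["x"]
    dy = pixel["y"] - photon["y"]
    dt = pixel["toa"] - photon["toa"]
    return {
        "dist_norm": math.hypot(dx, dy) / max_dist,
        "dx_norm": dx / max_dist,
        "dy_norm": dy / max_dist,
        "dt_norm": dt / max_dt,
        "abs_dt": abs(dt),
    }


pixel = {"x": 13.0, "y": 14.0, "toa": 2.0}
photon = {"x": 10.0, "y": 10.0, "toa": 1.0}
features = pair_features(pixel, photon, max_dist=10.0, max_dt=2.0)
print(features)
```

Each labeled pair (associated or not, according to the rule-based method used for training) would contribute one such row to the training set.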

### Step 1: Train the Model

Train on labels produced by one of the rule-based methods (`simple`, `mystic`, or `kdtree`):

In [8]:
# Train on labels from the simple method; set method='mystic' to train on mystic-constrained associations
trainer = nea.Analyse(data_path, pixels=True, photons=True, events=True, limit=5000)

model = trainer.train_association_model(
    method='simple',      # Use simple/mystic/kdtree for training labels
    model_type='rf',      # rf, gb, mlp, or torch
    n_samples=10000
)


📂 Auto-loaded association results: 1,000 rows
📥 Loading raw data for re-analysis...



Loaded 154347 pixels in total.
Generating training data using 'simple' method...



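ValueError: No training data generated. Check your data and parameters.

This run ended in the error above, so `model` was never created and the remaining cells (marked `In [None]`) are shown unexecuted.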

Available model types:

| Type | Description |
|------|-------------|
| `rf` | Random Forest (fast, good default) |
| `gb` | Gradient Boosting (slower, often better) |
| `mlp` | Neural Network (sklearn MLP) |
| `torch` | PyTorch Neural Network |

### Step 2: Use the Trained Model

In [None]:
assoc_ml = nea.Analyse(data_path, pixels=True, photons=True, events=True, limit=1000)
assoc_ml._ml_association_model = model
assoc_ml.associate(method='ml')

In [None]:
assoc_ml.associated_df.head()

The ML method adds an `ml_confidence` column with prediction confidence:

In [None]:
assoc_ml.associated_df['ml_confidence'].describe()

### Step 3: Save/Load Models

Save the model for later use:

In [None]:
import joblib

# Save
joblib.dump(model, 'association_model.joblib')

# Load and use
loaded_model = joblib.load('association_model.joblib')

---
## Compare Methods

Compare association statistics across methods:

In [None]:
def get_stats(analyser):
    df = analyser.associated_df
    return {
        'pixels_matched': (df['ph/id'] >= 0).sum(),
        'pixels_total': len(df),
        'match_rate': (df['ph/id'] >= 0).mean() * 100,
        'cog_mean': df['ph/cog'].mean(),
        'cog_std': df['ph/cog'].std()
    }

pd.DataFrame({
    'Simple': get_stats(assoc_simple),
    'Mystic': get_stats(assoc_mystic),
    'ML': get_stats(assoc_ml)
}).T

---
## Visualize Results

Compare CoG quality distributions:

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

for ax, (name, a) in zip(axes, [('Simple', assoc_simple), ('Mystic', assoc_mystic), ('ML', assoc_ml)]):
    cog = a.associated_df['ph/cog'].dropna()
    ax.hist(cog, bins=50, edgecolor='white', alpha=0.7)
    ax.axvline(cog.median(), color='red', linestyle='--', label=f'median={cog.median():.2f}')
    ax.set_xlabel('CoG Distance (px)')
    ax.set_title(name)
    ax.legend()

plt.tight_layout()

---
## Feature Importance (ML)

For models that expose `feature_importances_` (such as the Random Forest), inspect which features drive the predictions:

In [None]:
feature_names = ['dist_norm', 'dx_norm', 'dy_norm', 'dt_norm', 'tot_norm',
                 'dist_centroid', 'tot_rel', 'cluster_size', 'pixel_order', 'abs_dt']

if hasattr(model, 'feature_importances_'):
    importance = pd.Series(model.feature_importances_, index=feature_names).sort_values()
    importance.plot.barh(figsize=(8, 4))
    plt.xlabel('Feature Importance')
    plt.tight_layout()

---
## Tips

**When to use each method:**

| Method | Use Case |
|--------|----------|
| `simple` | Default choice, fast and accurate |
| `kdtree` | Large datasets, same accuracy as simple |
| `mystic` | Ambiguous associations, training data generation |
| `ml` | Fast inference after training, large datasets |

**Training recommendations:**
- Use `mystic` method for training labels (higher quality)
- Train on representative data (similar conditions to inference)
- Use at least 10,000 samples for good generalization