# Segment-Level Aggregate Analysis

This tutorial shows how to compute blink properties and signal features for each 30-second segment of a raw `.fif` file.
Inline comments explain each line of code so you can adapt the steps to your own data.

## Input and output overview

This notebook expects an `mne.io.Raw` instance loaded from a FIF file. It produces the following tables:
- **blink_df**: blink events per segment with columns `seg_id`, `blink_id`, `start_blink`, `max_blink`, `end_blink`, `outer_start`, `outer_end`, `left_zero`, `right_zero`.
- **blink_props**: properties for each blink derived from the raw signal.
- **agg_props**: mean of every numeric blink property per segment.
- **df_features**: frequency, energy and waveform features per segment.
- **df_combined**: `df_features` merged with `agg_props`, one row per segment.
- **df_segments**: time- and frequency-domain signal features for each EEG/EOG/EAR channel and segment.


```python
from pathlib import Path  # filesystem path management
import mne  # reading EEG/EOG/EAR data from .fif files
import pandas as pd  # table handling

# preprocessing helper: slices raw into 30 s segments and refines blink markers
from pyear.utils.raw_preprocessing import prepare_refined_segments
# build DataFrame of blinks from segments
from pyear.blink_events import generate_blink_dataframe
# per-blink property extraction
from pyear.pyblinkers.segment_blink_properties import compute_segment_blink_properties
# high level feature aggregation
from pyear.pipeline import extract_features
# time-domain energy and complexity metrics
from pyear.energy_complexity import compute_time_domain_features
# frequency-domain metrics for a single segment
from pyear.frequency_domain.segment_features import compute_frequency_domain_features
```


### Synthetic data for development

The `unitest.fixtures` package includes helpers that generate a small mock recording and a refined blink list, so you can exercise the pipeline without real data.

```python
from unitest.fixtures.mock_raw_generation import generate_mock_raw
from unitest.fixtures.mock_ear_generation import _generate_refined_ear

# Create a Raw object with blink annotations
synthetic_raw = generate_mock_raw()
# Segments and refined blink annotations
synthetic_segments, synthetic_refined = prepare_refined_segments(synthetic_raw, channel="EOG")

# Or directly obtain a refined blink list
synthetic_blinks, syn_sfreq, syn_epoch_len, syn_n_epochs = _generate_refined_ear()
```

## 1. Load the raw recording

```python
fif_path = Path('path/to/your_file.fif')  # path to your .fif data
raw = mne.io.read_raw_fif(fif_path, preload=False)  # read without loading all data into memory
print(f'Sampling rate: {raw.info["sfreq"]} Hz')  # display sampling frequency
```

## 2. Segment the recording and refine blink annotations

```python
segments, refined_blinks = prepare_refined_segments(raw, channel='EEG-E8')  # 30 s slices and refined blink timings
print(f'Generated {len(segments)} segments')  # confirm segment count
```
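Conceptually, cutting a recording into 30 s segments maps each segment to a fixed range of sample indices. The sketch below shows that arithmetic with NumPy only. `segment_bounds` and `split_signal` are hypothetical helpers, and they assume a trailing partial segment is dropped, which may not match what `prepare_refined_segments` actually does.

```python
import numpy as np


def segment_bounds(n_times: int, sfreq: float, epoch_len: float = 30.0) -> list[tuple[int, int]]:
    """Return (start, stop) sample indices for each full-length segment.

    Hypothetical helper: a trailing partial segment is dropped (assumption).
    """
    samples_per_seg = int(round(epoch_len * sfreq))  # samples in one segment
    n_segments = n_times // samples_per_seg  # keep complete segments only
    return [(i * samples_per_seg, (i + 1) * samples_per_seg) for i in range(n_segments)]


def split_signal(signal: np.ndarray, sfreq: float, epoch_len: float = 30.0) -> list[np.ndarray]:
    """Cut a 1D signal into consecutive, non-overlapping segments."""
    return [signal[start:stop] for start, stop in segment_bounds(signal.size, sfreq, epoch_len)]


# 100 s of data at 100 Hz: three full 30 s segments, the last 10 s are dropped
sfreq_demo = 100.0
ramp = np.arange(int(100 * sfreq_demo), dtype=float)
pieces = split_signal(ramp, sfreq_demo)
print(len(pieces), pieces[0].shape)
```

With a real recording, `segment_bounds(raw.n_times, raw.info['sfreq'])` gives the number of full 30 s windows to compare against `len(segments)`; a mismatch would suggest the library handles the tail or the segment length differently.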

## 3. Build a blink table from the segments

```python
blink_df = generate_blink_dataframe(segments, channel='EEG-E8')  # convert annotations to a DataFrame
blink_df.head()  # preview
```

`blink_df` columns:
- `seg_id`: index of the segment.
- `blink_id`: order of the blink in that segment.
- `start_blink`, `max_blink`, `end_blink`: sample indices for start, peak and end.
- `outer_start`, `outer_end`: outer search bounds (sample indices) used during detection.
- `left_zero`, `right_zero`: sample indices of the zero crossings on either side of the peak.
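Zero-crossing bounds of this kind are commonly defined, on a baseline-centred signal, as the last sample at or below zero before the blink peak and the first such sample after it. The sketch below implements that assumed definition with NumPy; `find_zero_crossings` is a hypothetical helper and may differ from `pyear` in edge handling (here it falls back to the array edges when no crossing exists).

```python
import numpy as np


def find_zero_crossings(signal: np.ndarray, peak: int) -> tuple[int, int]:
    """Return (left_zero, right_zero) around the sample index `peak`.

    left_zero: last sample before the peak where the signal is <= 0.
    right_zero: first sample after the peak where the signal is <= 0.
    Falls back to the array edges when no crossing exists (assumption).
    """
    left = np.flatnonzero(signal[:peak] <= 0)
    right = np.flatnonzero(signal[peak + 1:] <= 0)
    left_zero = int(left[-1]) if left.size else 0
    right_zero = int(right[0]) + peak + 1 if right.size else signal.size - 1
    return left_zero, right_zero


# Toy blink: a positive bump centred at sample 50 on a slightly negative baseline
t = np.arange(100)
sig = np.exp(-((t - 50) / 8.0) ** 2) - 0.05
peak = int(np.argmax(sig))
print(peak, find_zero_crossings(sig, peak))
```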

## 4. Compute blink properties for each blink

```python
params = {  # parameters used by the blink property extraction
    'base_fraction': 0.5,
    'shut_amp_fraction': 0.9,
    'p_avr_threshold': 3,
    'z_thresholds': [[0.9, 0.98], [2.0, 5.0]]
}
blink_props = compute_segment_blink_properties(
    segments, blink_df, params, channel='EEG-E8', run_fit=False
)  # one row per blink with many properties
blink_props.head()
```

`blink_props` extends each blink with properties such as `closing_time_base`, `reopening_time_base`, `time_shut_base`, `peak_time_blink` and `inter_blink_max_amp`.
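To make these names concrete, here is a simplified sketch of three timing measures for a single blink under assumed definitions: closing time as peak minus left zero crossing, reopening time as right zero crossing minus peak, and time shut as the number of samples at or above `shut_amp_fraction` of the peak amplitude. The `*_base` properties from `compute_segment_blink_properties` may use other reference points (for example base lines derived with `base_fraction`), so `blink_timings` is a hypothetical illustration, not the library's formula.

```python
import numpy as np


def blink_timings(signal: np.ndarray, left_zero: int, peak: int, right_zero: int,
                  sfreq: float, shut_amp_fraction: float = 0.9) -> dict:
    """Simplified timing measures in seconds for one blink (hypothetical helper)."""
    closing = (peak - left_zero) / sfreq  # left zero crossing -> peak
    reopening = (right_zero - peak) / sfreq  # peak -> right zero crossing
    window = signal[left_zero:right_zero + 1]  # samples belonging to this blink
    # count samples at or above shut_amp_fraction of the peak amplitude
    shut_samples = np.count_nonzero(window >= shut_amp_fraction * signal[peak])
    return {"closing_time": closing, "reopening_time": reopening,
            "time_shut": shut_samples / sfreq}


# Toy triangular blink at 100 Hz: rises over 8 samples, falls over 16
tri = np.concatenate([np.arange(0, 9) / 8, (24 - np.arange(9, 25)) / 16])
peak = int(np.argmax(tri))
print(blink_timings(tri, left_zero=0, peak=peak, right_zero=tri.size - 1, sfreq=100.0))
```

The slow reopening phase relative to the closing phase is visible directly in the two durations, which is the kind of asymmetry these per-blink properties are meant to capture.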

## 5. Aggregate blink properties per segment

```python
agg_props = (
    blink_props.groupby('seg_id').mean(numeric_only=True)  # average across blinks
    .add_suffix('_mean')
    .reset_index()
)
agg_props.head()
```

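`agg_props` averages the numeric columns of `blink_props` for each `seg_id` and appends `_mean` to the column names. Identifier and sample-index columns such as `blink_id` or `start_blink` are numeric too, so they are averaged as well; drop them before grouping if those means are not meaningful for your analysis.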
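A mean alone hides how many blinks contributed to each segment. The sketch below adds a blink count and a standard deviation using pandas named aggregation. The `toy_props` frame and its values are invented for illustration; with real data, group `blink_props` and pick the property columns you need.

```python
import pandas as pd

# Invented stand-in for blink_props: two blinks in segment 0, one in segment 2
toy_props = pd.DataFrame({
    "seg_id": [0, 0, 2],
    "blink_id": [0, 1, 0],
    "closing_time_base": [0.10, 0.14, 0.12],
})

agg_extra = (
    toy_props.groupby("seg_id")
    .agg(
        blink_count=("blink_id", "size"),  # number of blinks in the segment
        closing_time_base_mean=("closing_time_base", "mean"),
        closing_time_base_std=("closing_time_base", "std"),  # NaN when only one blink
    )
    .reset_index()
)
print(agg_extra)
```

Segments with no blinks at all do not appear here, because `groupby` only produces groups that exist in the data.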

## 6. Extract aggregated features with the pipeline

```python
sfreq = raw.info['sfreq']  # sampling frequency in Hz
epoch_len = 30.0  # segment length in seconds
n_epochs = len(segments)  # total segments
selected = ['waveform', 'frequency', 'energy']  # feature groups to compute
df_features = extract_features(
    refined_blinks, sfreq, epoch_len, n_epochs,
    features=selected, raw_segments=segments
)  # DataFrame indexed by epoch
df_features.head()
```

`df_features` holds frequency, energy and waveform metrics, one row per segment, indexed by `epoch` (the segment number).
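As a rough illustration of what an energy metric and a frequency metric measure on one segment, the sketch below computes the sum of squares of the mean-removed signal and the dominant frequency of its one-sided FFT power spectrum. `simple_segment_features` is hypothetical; the metrics `extract_features` returns may be defined quite differently.

```python
import numpy as np


def simple_segment_features(signal: np.ndarray, sfreq: float) -> dict:
    """Illustrative energy and spectral metrics; not pyear's definitions."""
    centred = signal - signal.mean()  # remove the DC offset
    energy = float(np.sum(centred ** 2))  # total signal energy (sum of squares)
    spectrum = np.abs(np.fft.rfft(centred)) ** 2  # one-sided power spectrum
    freqs = np.fft.rfftfreq(centred.size, d=1.0 / sfreq)  # bin frequencies in Hz
    peak_freq = float(freqs[np.argmax(spectrum)])  # dominant frequency
    return {"energy": energy, "peak_frequency": peak_freq}


sfreq_demo = 100.0
t = np.arange(int(30 * sfreq_demo)) / sfreq_demo  # one 30 s segment
sine = np.sin(2 * np.pi * 0.5 * t)  # 0.5 Hz sinusoid
print(simple_segment_features(sine, sfreq_demo))
```

A 30 s segment gives a frequency resolution of 1/30 Hz, which is why slow blink-related rhythms can still be resolved at the segment level.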

## 7. Combine blink properties with other features

```python
df_combined = pd.merge(
    df_features.reset_index(), agg_props,
    left_on='epoch', right_on='seg_id', how='left'
)  # join on segment index
df_combined.head()
```

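`df_combined` joins `agg_props` to `df_features` so each row summarizes one segment. Because the merge is a left join, segments without any detected blink are kept, with NaN in the blink-property columns. The result contains both `epoch` and `seg_id`, which hold the same value wherever a match exists.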
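The small pandas sketch below, with invented values, reproduces that left-join behaviour and shows one way to tidy the result: drop the redundant `seg_id` key and fill a count column with 0 while leaving averages as NaN. The `blink_count` column is hypothetical (the `agg_props` built above only contains `_mean` columns).

```python
import pandas as pd

# Invented stand-ins: features for three epochs, blink summaries for epochs 0 and 2 only
features = pd.DataFrame({"energy": [1.0, 2.0, 3.0]},
                        index=pd.Index([0, 1, 2], name="epoch"))
blinks = pd.DataFrame({"seg_id": [0, 2], "blink_count": [4, 1],
                       "closing_time_base_mean": [0.11, 0.13]})

combined = pd.merge(
    features.reset_index(), blinks,
    left_on="epoch", right_on="seg_id", how="left",
).drop(columns="seg_id")  # 'epoch' already identifies the segment

# No blinks means a count of zero; a mean over zero blinks stays undefined (NaN)
combined["blink_count"] = combined["blink_count"].fillna(0).astype(int)
print(combined)
```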

## 8. Signal features for all EEG/EOG/EAR channels

> **Warning**: make sure all channels use a consistent reference before comparing their features.

```python
channels = [ch for ch in raw.ch_names if ch.startswith(('EEG', 'EOG', 'EAR'))]  # select channel types
records = []  # container for results
for ch in channels:  # iterate channels
    for idx, seg in enumerate(segments):  # each segment
        signal = seg.get_data(picks=ch)[0]  # 1D signal
        time_feats = compute_time_domain_features(signal, sfreq)  # energy + complexity
        freq_feats = compute_frequency_domain_features([], signal, sfreq)  # spectral metrics
        record = {'channel': ch, 'segment_index': idx}  # base info
        record.update(time_feats)
        record.update(freq_feats)
        records.append(record)
df_segments = pd.DataFrame(records)  # final table
df_segments.head()
```

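`df_segments` is a long-format table with one row per channel and segment, identified by the `channel` and `segment_index` columns and followed by the time- and frequency-domain metrics.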
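To compare channels side by side, or to join channel metrics onto a per-segment table, the long format can be pivoted to one row per segment. The sketch below uses an invented table shaped like `df_segments`; the metric names `energy` and `peak_frequency` are placeholders for whatever columns the feature functions actually return.

```python
import pandas as pd

# Invented long-format table shaped like df_segments; metric names are placeholders
toy_segments = pd.DataFrame({
    "channel": ["EEG-E8", "EEG-E8", "EOG", "EOG"],
    "segment_index": [0, 1, 0, 1],
    "energy": [1.5, 2.0, 0.7, 0.9],
    "peak_frequency": [0.4, 0.5, 0.3, 0.3],
})

# One row per segment; columns become a (metric, channel) MultiIndex
wide = toy_segments.pivot(index="segment_index", columns="channel")
wide.columns = [f"{ch}_{metric}" for metric, ch in wide.columns]  # e.g. 'EOG_energy'
print(wide)
```

`DataFrame.pivot` raises a `ValueError` if a (`segment_index`, `channel`) pair occurs more than once, which also acts as a quick check that each channel was processed exactly once per segment.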