# DREAMT Dataset Exploration

This notebook demonstrates how to load and visualize IMU and PPG signals from the DREAMT dataset.

**DREAMT**: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology

## Dataset Overview

- 100 participants with sleep apnea
- Wearable E4 device signals:
  - **BVP** (64 Hz): Blood Volume Pulse from PPG sensor
  - **ACC_X, ACC_Y, ACC_Z** (32 Hz): Triaxial accelerometry (IMU)
  - **EDA** (4 Hz): Electrodermal Activity
  - **TEMP** (4 Hz): Skin Temperature
  - **HR** (1 Hz): Heart Rate
  - **IBI**: Inter-beat Interval
- Sleep stage labels: W (Wake), N1, N2, N3 (NREM), R (REM)
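Because the channels are sampled at different rates, one 30-second scoring epoch holds a different number of samples per channel. The quick check below uses the native rates listed above. IBI is left out because it is event-based rather than sampled at a fixed rate. These counts apply to the native streams. Judging by how the cells below compute duration, the merged files have one row per sample at the chosen `RESOLUTION`.

```python
# Native E4 sampling rates from the overview above (Hz)
SAMPLING_RATES = {
    "BVP": 64,
    "ACC_X": 32, "ACC_Y": 32, "ACC_Z": 32,
    "EDA": 4,
    "TEMP": 4,
    "HR": 1,
}

EPOCH_SEC = 30  # standard sleep-scoring epoch length in seconds


def samples_per_epoch(rate_hz: float, epoch_sec: float = EPOCH_SEC) -> int:
    """Number of samples a channel contributes to one scoring epoch."""
    return int(round(rate_hz * epoch_sec))


for channel, rate in SAMPLING_RATES.items():
    print(f"{channel:>6}: {rate:>3} Hz -> {samples_per_epoch(rate):>5} samples / epoch")
```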


## 1. Setup and Imports


In [None]:
import sys
from pathlib import Path

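# Add the project's src/ directory to the import path
# (assumes this notebook sits one level below the project root, e.g. in notebooks/)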
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import our custom modules
from data.loader import DREAMTLoader
from data.preprocessing import normalize_signal, bandpass_filter, detect_ppg_peaks
from visualization.signals import SignalVisualizer
from visualization.sleep import plot_hypnogram, plot_stage_distribution, compute_sleep_metrics
from utils.helpers import format_duration, get_sampling_rate

# Configure matplotlib
%matplotlib inline
plt.rcParams['figure.dpi'] = 100
plt.rcParams['figure.figsize'] = [14, 6]

print("✓ Imports successful!")


## 2. Configure Data Path

Update the `DATA_DIR` variable to point to where you downloaded the DREAMT dataset.


In [None]:
# ========================================
# UPDATE THIS PATH to your DREAMT data location
# ========================================
DATA_DIR = project_root / 'data' / 'dreamt'

# Alternative: Use an absolute path
# DATA_DIR = Path('/path/to/dreamt-2.1.0')

# Resolution to use: '64Hz' or '100Hz'
RESOLUTION = '64Hz'

print(f"Data directory: {DATA_DIR}")
print(f"Resolution: {RESOLUTION}")
print(f"Data exists: {DATA_DIR.exists()}")


## 3. Load the Data

Initialize the data loader and explore available participants.


In [None]:
# Initialize the loader
try:
    loader = DREAMTLoader(DATA_DIR, resolution=RESOLUTION)
    print(f"Loader initialized: {loader}")
    print(f"\nAvailable participants ({len(loader.participants)}):")
    print(loader.participants[:10], "..." if len(loader.participants) > 10 else "")
except FileNotFoundError as e:
    print(f"⚠️ Error: {e}")
    print("\nPlease update DATA_DIR to point to your DREAMT dataset location.")
    print("Expected folder structure:")
    print("  dreamt/")
    print("  ├── data_64Hz/")
    print("  │   ├── P001.csv")
    print("  │   ├── P002.csv")
    print("  │   └── ...")
    print("  └── data_100Hz/")
    print("      └── ...")
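If the loader cannot find the data, you can check the folder layout by hand. The sketch below assumes the `data_<RESOLUTION>/<participant>.csv` layout printed above. `list_participant_files` is a hypothetical helper and is not part of `DREAMTLoader`.

```python
import tempfile
from pathlib import Path


def list_participant_files(data_dir, resolution="64Hz"):
    """Return {participant_id: csv_path} for data_dir/data_<resolution>/*.csv.

    Hypothetical helper: assumes one CSV per participant, named after its ID.
    """
    folder = Path(data_dir) / f"data_{resolution}"
    if not folder.is_dir():
        raise FileNotFoundError(f"Expected folder not found: {folder}")
    return {p.stem: p for p in sorted(folder.glob("*.csv"))}


# Demo on a throwaway directory that mimics the expected layout
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data_64Hz").mkdir()
    for pid in ["P002", "P001"]:
        (Path(tmp) / "data_64Hz" / f"{pid}.csv").write_text("")
    files = list_participant_files(tmp)
    print(list(files))  # ['P001', 'P002']
```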


## 4. Load a Single Participant

Let's explore the data from one participant.


In [None]:
# Select a participant
participant_id = loader.participants[0]  # First participant

# Load data
df = loader.load_participant(participant_id)

print(f"Loaded participant: {participant_id}")
print(f"Data shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
# Sampling rate for the selected resolution
fs = get_sampling_rate(RESOLUTION)
print(f"\nSampling rate: {fs} Hz")

# Calculate recording duration (assumes uniform sampling at fs)
duration_sec = len(df) / fs
print(f"Recording duration: {format_duration(duration_sec)}")
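The duration is the row count divided by the sampling rate. For example, an 8-hour night at 64 Hz has 8 × 3600 × 64 = 1,843,200 rows. Below is a minimal stand-in for the formatting step. `hms` is a hypothetical name, and the project's `format_duration` may format its output differently.

```python
def hms(duration_sec: float) -> str:
    """Format a duration in seconds as 'Hh MMm SSs' (hypothetical helper)."""
    total = int(round(duration_sec))
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


example_fs = 64                      # 64 Hz resolution
n_samples = 8 * 3600 * example_fs    # rows in an 8-hour recording
print(n_samples)                     # 1843200
print(hms(n_samples / example_fs))   # 8h 00m 00s
```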


In [None]:
# Preview the data
df.head(10)


## 5. Extract IMU and PPG Signals


In [None]:
# Extract signals
imu_data = loader.get_imu_signals(df)
ppg_data = loader.get_ppg_signals(df, include_derived=True)
sleep_stages = loader.get_sleep_stages(df)

# Get time vector
time = loader.get_time_vector(df)

print("IMU columns:", list(imu_data.columns))
print("PPG columns:", list(ppg_data.columns))
print(f"Number of samples: {len(time)}")
print(f"\nUnique sleep stages: {sleep_stages.unique()}")


## 6. Visualize IMU Signals

The accelerometer measures acceleration along three axes (X, Y, Z), including gravity, so the signal reflects both wrist orientation and movement. During sleep, we expect little movement except around stage transitions and brief arousals.
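A common way to summarize movement is to take the vector magnitude sqrt(x² + y² + z²), which removes the dependence on wrist orientation, and then compute its standard deviation within each 30-second epoch. The sketch below runs on synthetic data. It assumes the three axes are equal-length arrays at a common rate `fs`, and it applies no unit conversion. `epoch_activity` is a hypothetical helper.

```python
import numpy as np


def epoch_activity(acc_x, acc_y, acc_z, fs, epoch_sec=30):
    """Per-epoch standard deviation of the acceleration magnitude.

    Hypothetical helper; trailing samples that do not fill a whole epoch are dropped.
    """
    mag = np.sqrt(np.asarray(acc_x, float) ** 2
                  + np.asarray(acc_y, float) ** 2
                  + np.asarray(acc_z, float) ** 2)
    n = int(epoch_sec * fs)
    n_epochs = len(mag) // n
    return mag[: n_epochs * n].reshape(n_epochs, n).std(axis=1)


# Synthetic example: two still epochs followed by one epoch with movement
fs_demo = 32
rng = np.random.default_rng(0)
n = 30 * fs_demo
z = np.full(3 * n, 64.0)            # constant value standing in for gravity (arbitrary units)
z[2 * n:] += rng.normal(0, 10, n)   # add movement in the third epoch
x = np.zeros(3 * n)
y = np.zeros(3 * n)
activity = epoch_activity(x, y, z, fs_demo)
print(activity.round(2))
```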


In [None]:
# Initialize visualizer
viz = SignalVisualizer()

# Plot full night IMU data
fig = viz.plot_imu_signals(
    time,
    imu_data['ACC_X'].values,
    imu_data['ACC_Y'].values,
    imu_data['ACC_Z'].values,
    title=f"IMU Accelerometer Signals - {participant_id}",
    show_magnitude=True,
    sleep_stages=sleep_stages.values,
    time_unit='hours'
)
plt.show()


## 7. Visualize PPG (Blood Volume Pulse) Signal

The PPG signal captures blood volume changes and can be used to derive heart rate and heart rate variability.
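Before peak detection, PPG is often band-pass filtered to the plausible heart-rate band. The sketch below uses SciPy's Butterworth design with zero-phase `filtfilt`. The 0.5 to 4 Hz band (about 30 to 240 bpm) and the 3rd-order filter are illustrative choices. This is not necessarily what the project's `bandpass_filter` does, and `bandpass_ppg` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def bandpass_ppg(x, fs, low_hz=0.5, high_hz=4.0, order=3):
    """Zero-phase Butterworth band-pass filter (illustrative defaults)."""
    b, a = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs)
    return filtfilt(b, a, np.asarray(x, dtype=float))


# Synthetic PPG-like signal: 1.2 Hz pulse (72 bpm) + slow baseline drift + noise
fs_demo = 64
t = np.arange(0, 30, 1 / fs_demo)
rng = np.random.default_rng(1)
raw = (np.sin(2 * np.pi * 1.2 * t)
       + 2.0 * np.sin(2 * np.pi * 0.05 * t)
       + 0.2 * rng.normal(size=t.size))
filtered = bandpass_ppg(raw, fs_demo)

# The dominant frequency after filtering should be near the 1.2 Hz pulse
spectrum = np.abs(np.fft.rfft(filtered))
freqs = np.fft.rfftfreq(filtered.size, d=1 / fs_demo)
print(f"Dominant frequency: {freqs[np.argmax(spectrum)]:.2f} Hz")
```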


In [None]:
# Plot full night PPG data
hr_data = ppg_data['HR'].values if 'HR' in ppg_data.columns else None

fig = viz.plot_ppg_signal(
    time,
    ppg_data['BVP'].values,
    title=f"PPG Blood Volume Pulse - {participant_id}",
    hr=hr_data,
    sleep_stages=sleep_stages.values,
    time_unit='hours'
)
plt.show()


### 7.1 PPG Detail View with Peak Detection

Let's look at a short segment of PPG and detect heartbeat peaks.
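A common approach to PPG beat detection is a local-maximum search with a minimum spacing between peaks. The sketch below uses `scipy.signal.find_peaks` and sets the minimum distance from an assumed maximum heart rate of 180 bpm. `detect_peaks_sketch` is a hypothetical name, and the project's `detect_ppg_peaks` may use a different method.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_peaks_sketch(bvp, fs, max_hr_bpm=180):
    """Return (peak_indices, peak_values) for a BVP segment (hypothetical helper).

    Peaks must be at least 60 / max_hr_bpm seconds apart and rise above the
    segment's median, which suppresses small notches between beats.
    """
    bvp = np.asarray(bvp, dtype=float)
    min_distance = max(1, int(fs * 60 / max_hr_bpm))  # minimum samples between beats
    peaks, _ = find_peaks(bvp, distance=min_distance, height=np.median(bvp))
    return peaks, bvp[peaks]


# Synthetic 10 s segment at 64 Hz with a 1 Hz (60 bpm) pulse
fs_demo = 64
t = np.arange(0, 10, 1 / fs_demo)
bvp_demo = np.sin(2 * np.pi * 1.0 * t)
peaks_demo, values_demo = detect_peaks_sketch(bvp_demo, fs_demo)
print(len(peaks_demo))  # 10
```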


In [None]:
# Get a 10-second segment of PPG, starting 2 hours into the recording
# (clamped so the segment still fits in shorter recordings)
segment_duration = int(10 * fs)  # 10 seconds, in samples
segment_start = max(0, min(int(2 * 3600 * fs), len(ppg_data) - segment_duration))

bvp_segment = ppg_data['BVP'].values[segment_start:segment_start + segment_duration]
time_segment = np.asarray(time)[segment_start:segment_start + segment_duration]

# Detect peaks
peaks, peak_values = detect_ppg_peaks(bvp_segment, fs)

# Plot
fig, ax = plt.subplots(figsize=(14, 4))
ax.plot(time_segment, bvp_segment, color='#C44569', linewidth=1, label='BVP')
ax.scatter(time_segment[peaks], peak_values, color='#2ECC71', s=50, zorder=5, label='Peaks')
ax.set_xlabel('Time (seconds)')
ax.set_ylabel('BVP (a.u.)')
ax.set_title('PPG Signal with Detected Heartbeat Peaks')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate instantaneous heart rate
if len(peaks) > 1:
    ibi = np.diff(peaks) / fs * 1000  # Inter-beat interval in ms
    hr_inst = 60000 / ibi  # Heart rate in bpm
    print(f"Number of detected beats: {len(peaks)}")
    print(f"Average IBI: {np.mean(ibi):.1f} ms")
    print(f"Instantaneous HR: {np.mean(hr_inst):.1f} ± {np.std(hr_inst):.1f} bpm")
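Inter-beat intervals like those computed above also give simple time-domain HRV measures. The sketch below computes two standard ones: SDNN, the standard deviation of the intervals, and RMSSD, the root mean square of successive differences. Over a 10-second window these values are only illustrative, because HRV is usually reported over windows of several minutes. `time_domain_hrv` is a hypothetical helper.

```python
import numpy as np


def time_domain_hrv(ibi_ms):
    """Return SDNN and RMSSD (both in ms) from inter-beat intervals in ms."""
    ibi_ms = np.asarray(ibi_ms, dtype=float)
    if ibi_ms.size < 2:
        raise ValueError("Need at least two intervals")
    sdnn = np.std(ibi_ms, ddof=1)                    # sample standard deviation of intervals
    rmssd = np.sqrt(np.mean(np.diff(ibi_ms) ** 2))   # RMS of successive differences
    return {"SDNN_ms": float(sdnn), "RMSSD_ms": float(rmssd)}


# Example with hand-picked intervals (ms)
hrv = time_domain_hrv([800, 820, 790, 810])
print(hrv)
```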


## 8. Combined View: IMU + PPG


In [None]:
# Plot combined view
fig = viz.plot_combined_signals(
    time,
    imu_data['ACC_X'].values,
    imu_data['ACC_Y'].values,
    imu_data['ACC_Z'].values,
    ppg_data['BVP'].values,
    sleep_stages=sleep_stages.values,
    title=f"Combined IMU + PPG Signals - {participant_id}",
    time_unit='hours'
)
plt.show()


## 9. Sleep Stage Analysis

Now let's analyze the sleep stages associated with this recording.


In [None]:
# Split the recording into 30-second epochs (one DataFrame per epoch)
epochs = loader.get_epoch_data(df)
print(f"Number of 30-second epochs: {len(epochs)}")

# Get stage labels per epoch (mode of each epoch)
epoch_stages = []
for epoch in epochs:
    if 'Sleep_Stage' in epoch.columns:
        mode_stage = epoch['Sleep_Stage'].mode()
        epoch_stages.append(mode_stage.iloc[0] if len(mode_stage) > 0 else 'Missing')
    else:
        epoch_stages.append('Missing')

epoch_stages = np.array(epoch_stages)
print(f"Stage labels extracted: {len(epoch_stages)}")
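The loop above can also be written as a pandas `groupby` on an epoch index, `sample_index // (30 * fs)`. The sketch below assumes a DataFrame with a `Sleep_Stage` column sampled uniformly at `fs`. Ties are broken as in the loop, by taking the first value of `mode()`, which pandas returns sorted. A trailing partial epoch is kept here, which may differ from `get_epoch_data`. `epoch_labels` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def epoch_labels(df, fs, epoch_sec=30, col="Sleep_Stage"):
    """Majority stage label per epoch; 'Missing' when an epoch has no labels.

    Hypothetical helper; assumes rows are uniformly sampled at fs.
    """
    epoch_idx = np.arange(len(df)) // int(epoch_sec * fs)

    def majority(s):
        m = s.mode()  # NaN/None are dropped, so an all-missing epoch gives an empty result
        return m.iloc[0] if len(m) > 0 else "Missing"

    return df[col].groupby(epoch_idx).agg(majority).to_numpy()


# Toy example: fs = 1 Hz with 3-sample "epochs" to keep it short
toy = pd.DataFrame({"Sleep_Stage": ["W", "W", "N1", "N2", "N2", "N2", None, None, None]})
labels = epoch_labels(toy, fs=1, epoch_sec=3)
print(labels)
```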


In [None]:
# Plot hypnogram
fig = plot_hypnogram(
    epoch_stages,
    epoch_duration=30.0,
    title=f"Hypnogram - {participant_id}",
    show_hours=True
)
plt.show()


In [None]:
# Plot stage distribution
fig = plot_stage_distribution(
    epoch_stages,
    epoch_duration=30.0,
    title=f"Sleep Stage Distribution - {participant_id}"
)
plt.show()


In [None]:
# Compute sleep metrics
metrics = compute_sleep_metrics(epoch_stages, epoch_duration=30.0)

print("Sleep Metrics")
print("=" * 50)
print(f"Total Recording Time: {metrics['total_recording_time_min']:.1f} min")
print(f"Total Sleep Time: {metrics['total_sleep_time_min']:.1f} min")
print(f"Sleep Onset Latency: {metrics['sleep_onset_latency_min']:.1f} min")
print(f"Wake After Sleep Onset: {metrics['wake_after_sleep_onset_min']:.1f} min")
print(f"Sleep Efficiency: {metrics['sleep_efficiency_pct']:.1f}%")
rem_latency = metrics.get('rem_latency_min')
print(f"REM Latency: {rem_latency:.1f} min" if rem_latency is not None else "REM Latency: N/A")
print("\nTime in Each Stage:")
for stage in ['W', 'N1', 'N2', 'N3', 'R']:
    key = f"{stage}_minutes"
    if key in metrics:
        print(f"  {stage}: {metrics[key]:.1f} min")
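For reference, the standard quantities behind these metrics can be computed directly from the epoch labels. The sketch below follows common conventions, which are assumptions here. Sleep onset is the first non-wake sleep epoch, and REM latency is measured from sleep onset. WASO counts every wake epoch after onset, including a final awakening. Labels other than N1/N2/N3/R count toward recording time but not toward sleep. The project's `compute_sleep_metrics` may use different conventions, and `sleep_metrics_sketch` is a hypothetical name.

```python
import numpy as np

SLEEP_STAGES = {"N1", "N2", "N3", "R"}


def sleep_metrics_sketch(stages, epoch_duration=30.0):
    """Basic sleep metrics (minutes / percent) from per-epoch stage labels."""
    stages = np.asarray(stages)
    epoch_min = epoch_duration / 60.0
    is_sleep = np.isin(stages, list(SLEEP_STAGES))

    trt = float(len(stages) * epoch_min)          # total recording time
    tst = float(is_sleep.sum() * epoch_min)       # total sleep time
    metrics = {"TRT_min": trt, "TST_min": tst,
               "SE_pct": 100.0 * tst / trt if trt > 0 else float("nan")}

    if is_sleep.any():
        onset = int(np.argmax(is_sleep))          # index of first sleep epoch
        metrics["SOL_min"] = float(onset * epoch_min)
        metrics["WASO_min"] = float((stages[onset:] == "W").sum() * epoch_min)
        rem = np.flatnonzero(stages[onset:] == "R")
        metrics["REM_latency_min"] = float(rem[0] * epoch_min) if rem.size else None
    return metrics


m = sleep_metrics_sketch(["W", "W", "N1", "N2", "W", "N2", "R", "R"])
print(m)
```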


## 10. Next Steps

Now that you can load and visualize the data, you can:

1. **Feature Extraction**: Extract time-domain and frequency-domain features from IMU and PPG
2. **Signal Processing**: Apply filtering, artifact removal, and normalization
3. **Machine Learning**: Build models to predict sleep stages from wearable signals
4. **Cross-Participant Analysis**: Compare patterns across the 100 participants
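As a starting point for feature extraction, the sketch below computes a few per-epoch features for one channel. The time-domain features are the mean and standard deviation. The frequency-domain features are the dominant frequency and the power in a chosen band, both from Welch's method (`scipy.signal.welch`). The band and feature set are illustrative assumptions, not the DREAMT_FE feature set, and `epoch_features` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import welch


def epoch_features(x, fs, epoch_sec=30, band=(0.5, 4.0)):
    """Per-epoch time- and frequency-domain features for a 1-D signal.

    Returns an array of shape (n_epochs, 4): mean, std, dominant frequency (Hz),
    and band power (Welch PSD integrated over `band`). Partial epochs are dropped.
    """
    x = np.asarray(x, dtype=float)
    n = int(epoch_sec * fs)
    n_epochs = len(x) // n
    feats = np.empty((n_epochs, 4))
    for i in range(n_epochs):
        seg = x[i * n:(i + 1) * n]
        freqs, psd = welch(seg, fs=fs, nperseg=min(len(seg), int(8 * fs)))
        in_band = (freqs >= band[0]) & (freqs <= band[1])
        freq_step = freqs[1] - freqs[0]
        feats[i] = [seg.mean(), seg.std(),
                    freqs[np.argmax(psd)],
                    psd[in_band].sum() * freq_step]
    return feats


# Synthetic BVP-like signal: two 30 s epochs at 64 Hz with a 1.25 Hz pulse
fs_demo = 64
t = np.arange(0, 60, 1 / fs_demo)
features = epoch_features(np.sin(2 * np.pi * 1.25 * t), fs_demo)
print(features.shape)  # (2, 4)
```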

### Useful References

- [DREAMT Paper](https://proceedings.mlr.press/v248/wang24a.html)
- [PhysioNet Dataset Page](https://physionet.org/content/dreamt/)
- [DREAMT_FE Feature Extraction](https://github.com/WillKeWang/DREAMT_FE)
