# 🧠 Tutorial: BCI Competition IV Dataset 4 Exploration
## Working with ECoG Brain Signal Data

---

**Learning Objectives:**
- Download and load the BCI Competition IV Dataset 4 using braindecode
- Explore ECoG (Electrocorticography) signal structure
- Understand channel configurations and sampling rates
- Visualize neural signals in time and frequency domains
- Analyze signal characteristics across different brain channels

---


## 📥 Step 1: Data Download

We'll download the BCI Competition IV Dataset 4, which contains ECoG recordings from patients performing finger movements.

**Data Source:** [BCI Competition IV](http://www.bbci.de/competition/iv/) - Dataset 4
- **Dataset:** ECoG recordings from 3 patients
- **Task:** Finger flexion movements (5 fingers)
- **Channels:** 48 to 64 ECoG electrodes per patient, recording cortical activity


In [None]:
# Install required packages if not already installed
# Uncomment the lines below if you need to install packages
# !uv add braindecode moabb
# !uv pip install -r requirements.txt


In [None]:
# Download and load BCI Competition IV Dataset 4
import os
from pathlib import Path
from braindecode.datasets import BCICompetitionIVDataset4

# Download dataset if not already available
print("Downloading BCI Competition IV Dataset 4...")
try:
    BCICompetitionIVDataset4.download()
    print("✓ Dataset download complete!")
except Exception as e:
    print(f"⚠ Download error: {e}")
    print("Dataset may already be downloaded or there was a connection issue.")

# Check dataset location
from main import get_dataset_path
base_path, dataset_path = get_dataset_path()
print(f"\nDataset storage information:")
print(f"  - Base path: {base_path}")
print(f"  - Dataset path: {dataset_path}")
print(f"  - Directory exists: {dataset_path.exists()}")

if dataset_path.exists():
    contents = list(dataset_path.iterdir())
    if contents:
        print(f"  - Found {len(contents)} items in dataset directory")


## 📊 Step 2: Load and Inspect Dataset Structure

Now we'll load the dataset and explore its structure, including the number of subjects, channels, and recording characteristics.


In [None]:
# Load dataset for subject 1 (you can change this to [1, 2, 3] for all subjects)
subject_ids = 1  # Can be 1, 2, 3, or [1, 2, 3] for all subjects

print(f"Loading dataset for subject(s): {subject_ids}")
dataset = BCICompetitionIVDataset4(subject_ids=subject_ids)

print("\n✓ Dataset loaded successfully!")
print(f"  - Number of recordings: {len(dataset.datasets)}")
print(f"  - Dataset type: {type(dataset).__name__}")


### 🔍 Quick Data Inspection


In [None]:
# Explore the first recording
if len(dataset.datasets) > 0:
    first_recording = dataset.datasets[0]
    print(f"First recording type: {type(first_recording).__name__}")
    print(f"First recording description:\n{first_recording.description}")
    
    # Get raw data
    raw = first_recording.raw
    print("\n📊 Raw Data Information:")
    print(f"  - Number of channels: {len(raw.ch_names)}")
    print(f"  - Sampling frequency: {raw.info['sfreq']} Hz")
    print(f"  - Duration: {raw.times[-1]:.2f} seconds")
    print(f"  - Number of time points: {len(raw.times)}")
    print(f"  - Channel names (first 10): {raw.ch_names[:10]}")
else:
    print("⚠ No recordings found in dataset")


In [None]:
# Extract every channel in the recording as a (n_channels, n_times) array, plus the matching time axis
data, times = raw[:, :]

print(f"📈 Data Shape: {data.shape}")
print(f"  - Channels: {data.shape[0]}")
print(f"  - Time points: {data.shape[1]}")
print(f"  - Time range: {times[0]:.2f} to {times[-1]:.2f} seconds")

print("\n📊 Data Statistics:")
print(f"  - Mean: {data.mean():.4f}")
print(f"  - Std: {data.std():.4f}")
print(f"  - Min: {data.min():.4f}")
print(f"  - Max: {data.max():.4f}")
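
The duration and time-point counts printed above are tied together by the sampling rate. Here is a minimal sketch of that arithmetic, assuming a 1000 Hz rate and a made-up sample count. The `demo_` names are illustrative and not part of the notebook. Because the time axis starts at 0, the last time stamp is one sample short of the full span.

```python
import numpy as np

demo_sfreq = 1000.0   # assumed sampling rate in Hz
demo_n_times = 5000   # assumed number of samples (illustrative)

# Time axis that starts at 0: sample k occurs at k / sfreq seconds
demo_times = np.arange(demo_n_times) / demo_sfreq

last_time_stamp = demo_times[-1]            # (n_times - 1) / sfreq
full_duration = demo_n_times / demo_sfreq   # span covered by all samples

print(f"last time stamp: {last_time_stamp:.3f} s")
print(f"full duration:   {full_duration:.3f} s")
```

The same check on the real recording would compare `raw.times[-1]` with `len(raw.times) / raw.info['sfreq']`.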


## 📏 Step 3: Data Preparation

Let's prepare the data for analysis by extracting key information and organizing it into a more convenient format.


In [None]:
import numpy as np
import pandas as pd

# Create a summary DataFrame for easier analysis
channel_info = []
for i, ch_name in enumerate(raw.ch_names):
    channel_data = data[i, :]
    channel_info.append({
        'channel_index': i,
        'channel_name': ch_name,
        'mean': channel_data.mean(),
        'std': channel_data.std(),
        'min': channel_data.min(),
        'max': channel_data.max(),
        'range': channel_data.max() - channel_data.min()
    })

df_channels = pd.DataFrame(channel_info)

print("✓ Channel information extracted")
print("\nChannel Statistics Summary:")
print(df_channels.describe())


In [None]:
# Display first few channels
print("First 10 channels:")
df_channels.head(10)
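
The per-channel loop above can also be written with NumPy reductions along the time axis. The sketch below uses a small synthetic array as a stand-in for `data` (the name `demo` and its shape are assumptions for illustration) and checks that the vectorized result matches the loop.

```python
import numpy as np

rng = np.random.default_rng(0)
demo = rng.normal(size=(4, 1000))  # synthetic stand-in with shape (channels, time)

# Reduce along axis=1 (time) to get one value per channel
stats = {
    "mean": demo.mean(axis=1),
    "std": demo.std(axis=1),
    "min": demo.min(axis=1),
    "max": demo.max(axis=1),
}
stats["range"] = stats["max"] - stats["min"]

# Same standard deviation computed channel by channel, for comparison
loop_std = np.array([demo[i, :].std() for i in range(demo.shape[0])])
print(np.allclose(stats["std"], loop_std))
```

Passing a dictionary like `stats` to `pd.DataFrame` gives a table with the same statistic columns as `df_channels`, without the channel index and name columns.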


## 📊 Step 4: Data Visualization

Let's create comprehensive visualizations to understand the ECoG signals in both time and frequency domains.


### 4.1 Setup Visualization Libraries


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import signal

# Set style for prettier plots
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (15, 10)
plt.rcParams['figure.dpi'] = 100

print("✓ Visualization libraries ready")


### 4.2 Time Series Visualization - Sample Channels


In [None]:
# Plot first 10 seconds of multiple channels
n_channels_to_plot = min(10, len(raw.ch_names))
time_mask = times <= 10.0  # First 10 seconds

fig, ax = plt.subplots(figsize=(16, 8))

for i in range(n_channels_to_plot):
    # Z-score each channel, then shift it up by 2 units per channel so the traces stack
    channel_data = data[i, time_mask]
    channel_data_norm = (channel_data - channel_data.mean()) / (channel_data.std() + 1e-8)
    ax.plot(times[time_mask], channel_data_norm + i * 2, 
            label=raw.ch_names[i], alpha=0.7, linewidth=1)

ax.set_xlabel('Time (seconds)', fontsize=12)
ax.set_ylabel('Channel (normalized amplitude)', fontsize=12)
ax.set_title(f'ECoG Signals - First {n_channels_to_plot} Channels (First 10 seconds)', 
             fontsize=14, fontweight='bold')
ax.legend(loc='upper right', fontsize=8, ncol=2)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"✓ Displayed {n_channels_to_plot} channels over first 10 seconds")
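
The normalization used in the plot is a per-channel z-score followed by a vertical offset. The sketch below shows the effect on synthetic channels with very different scales. The array `demo` and the scale values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
scales = np.array([[1.0], [50.0], [0.01]])    # assumed, very different amplitudes
offsets = np.array([[0.0], [200.0], [-5.0]])  # assumed baselines
demo = rng.normal(size=(3, 2000)) * scales + offsets

# Z-score each channel; eps guards against division by zero on flat channels
eps = 1e-8
z = (demo - demo.mean(axis=1, keepdims=True)) / (demo.std(axis=1, keepdims=True) + eps)

# Shift channel i up by i * spacing so the traces do not sit on top of each other
spacing = 2.0
stacked = z + spacing * np.arange(demo.shape[0])[:, None]

print(z.mean(axis=1).round(6), z.std(axis=1).round(4))
```

After z-scoring, most samples fall within roughly three units of the channel's baseline, so a spacing of 2 still lets neighbouring traces overlap. A larger spacing separates them more cleanly at the cost of vertical resolution.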


### 4.3 Power Spectral Density Analysis


In [None]:
# Compute and plot power spectral density for a sample channel
sample_channel_idx = 0
sample_channel_data = data[sample_channel_idx, :]

# Compute power spectral density using Welch's method
freqs, psd = signal.welch(sample_channel_data, fs=raw.info['sfreq'], nperseg=1024)

fig, ax = plt.subplots(figsize=(14, 6))
ax.semilogy(freqs, psd, linewidth=2, color='steelblue')
ax.set_xlabel('Frequency (Hz)', fontsize=12)
ax.set_ylabel('Power Spectral Density', fontsize=12)
ax.set_title(f'Power Spectral Density - Channel: {raw.ch_names[sample_channel_idx]}', 
             fontsize=14, fontweight='bold')
ax.set_xlim(0, 100)  # Focus on 0-100 Hz range (most relevant for neural signals)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Find the peak frequency within the 1-100 Hz band
band_mask = (freqs >= 1) & (freqs <= 100)
peak_freq = freqs[band_mask][np.argmax(psd[band_mask])]
print(f"✓ Peak frequency: {peak_freq:.2f} Hz")
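
To see what the peak search reports, it helps to run the same Welch settings on a signal whose dominant frequency is known. The sketch below assumes a 1000 Hz sampling rate and a synthetic 40 Hz sine in noise. Both are illustrative choices, not properties of the dataset.

```python
import numpy as np
from scipy import signal

fs = 1000.0                      # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)     # 10 seconds of samples
rng = np.random.default_rng(2)
demo_sig = np.sin(2 * np.pi * 40.0 * t) + 0.5 * rng.normal(size=t.size)

nperseg = 1024
demo_freqs, demo_psd = signal.welch(demo_sig, fs=fs, nperseg=nperseg)

# Spacing between frequency bins is fs / nperseg
resolution = fs / nperseg

# Restrict the search to 1-100 Hz, as in the cell above
band = (demo_freqs >= 1) & (demo_freqs <= 100)
demo_peak = demo_freqs[band][np.argmax(demo_psd[band])]

print(f"bin spacing: {resolution:.3f} Hz, detected peak: {demo_peak:.2f} Hz")
```

The detected peak can only land on a bin centre, so it is accurate to within one bin spacing. On real neural recordings, power usually falls off with frequency, so the maximum in a 1-100 Hz band often sits near the lower edge rather than marking a distinct rhythm.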


### 4.4 Channel Variability Analysis


In [None]:
# Plot standard deviation across all channels
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Bar plot of channel standard deviations
axes[0].bar(range(len(raw.ch_names)), df_channels['std'], 
            alpha=0.7, color='coral', edgecolor='black')
axes[0].set_xlabel('Channel Index', fontsize=12)
axes[0].set_ylabel('Standard Deviation', fontsize=12)
axes[0].set_title('Signal Variability Across Channels', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')

# Histogram of channel statistics
axes[1].hist(df_channels['std'], bins=30, color='seagreen', alpha=0.7, edgecolor='black')
axes[1].set_xlabel('Standard Deviation', fontsize=12)
axes[1].set_ylabel('Number of Channels', fontsize=12)
axes[1].set_title('Distribution of Channel Variability', fontsize=14, fontweight='bold')
axes[1].axvline(df_channels['std'].median(), color='red', linestyle='--', linewidth=2,
                label=f'Median: {df_channels["std"].median():.4f}')
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("✓ Channel variability analysis complete")
print(f"  - Most variable channel: {df_channels.loc[df_channels['std'].idxmax(), 'channel_name']} (std: {df_channels['std'].max():.4f})")
print(f"  - Least variable channel: {df_channels.loc[df_channels['std'].idxmin(), 'channel_name']} (std: {df_channels['std'].min():.4f})")


### 4.5 Signal Amplitude Distribution


In [None]:
# Plot amplitude distributions for multiple channels
n_channels_for_dist = min(5, len(raw.ch_names))

fig, axes = plt.subplots(n_channels_for_dist, 1, figsize=(14, 3*n_channels_for_dist), squeeze=False)
axes = axes.ravel()  # keep axes indexable even when only one channel is plotted

for i in range(n_channels_for_dist):
    channel_data = data[i, :]
    axes[i].hist(channel_data, bins=50, alpha=0.7, color='mediumpurple', edgecolor='black')
    axes[i].set_title(f'Channel {i}: {raw.ch_names[i]}', fontsize=12, fontweight='bold')
    axes[i].set_xlabel('Amplitude', fontsize=11)
    axes[i].set_ylabel('Count', fontsize=11)
    axes[i].axvline(channel_data.mean(), color='red', linestyle='--', linewidth=2,
                    label=f'Mean: {channel_data.mean():.4f}')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print(f"✓ Displayed amplitude distributions for {n_channels_for_dist} channels")


### 4.6 Correlation Between Channels


In [None]:
# Compute correlation matrix for a subset of channels (for performance)
n_channels_corr = min(20, len(raw.ch_names))
channel_data_subset = data[:n_channels_corr, :]

# Sample every 100th time point for faster computation
sample_indices = np.arange(0, channel_data_subset.shape[1], 100)
corr_data = channel_data_subset[:, sample_indices]

# Compute correlation (np.corrcoef treats each row, i.e. each channel, as one variable)
correlation_matrix = np.corrcoef(corr_data)

# Plot correlation heatmap
fig, ax = plt.subplots(figsize=(12, 10))
im = ax.imshow(correlation_matrix, cmap='coolwarm', aspect='auto', vmin=-1, vmax=1)
ax.set_xticks(range(n_channels_corr))
ax.set_yticks(range(n_channels_corr))
ax.set_xticklabels([raw.ch_names[i] for i in range(n_channels_corr)], rotation=45, ha='right')
ax.set_yticklabels([raw.ch_names[i] for i in range(n_channels_corr)])
ax.set_title(f'Channel Correlation Matrix (First {n_channels_corr} channels)', 
             fontsize=14, fontweight='bold')
plt.colorbar(im, ax=ax, label='Correlation Coefficient')
plt.tight_layout()
plt.show()

print(f"✓ Correlation analysis complete for {n_channels_corr} channels")
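
The sketch below illustrates how `np.corrcoef` reads a (channels, time) array and what subsampling does to the estimate. The three synthetic channels are made up for illustration: one base signal, a noisy copy of it, and an independent channel.

```python
import numpy as np

rng = np.random.default_rng(3)
base = rng.normal(size=5000)
demo = np.vstack([
    base,                                  # channel 0
    base + 0.1 * rng.normal(size=5000),    # channel 1: nearly a copy of channel 0
    rng.normal(size=5000),                 # channel 2: independent of the others
])

# Rows are treated as variables by default, so no transpose is needed
corr_full = np.corrcoef(demo)

# Same computation on every 100th sample: faster, but a noisier estimate
corr_sub = np.corrcoef(demo[:, ::100])

print(corr_full.round(2))
print(corr_sub.round(2))
```

With only 50 samples left after subsampling, the off-diagonal values can drift noticeably from the full-data estimate, which is the trade-off made for speed in the cell above.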


## 💾 Step 5: Save Processed Data

Let's save the processed channel information for future use.


In [None]:
# Save channel information to CSV
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)

output_file = output_dir / "bci_channel_info.csv"
df_channels.to_csv(output_file, index=False)

print(f"✓ Channel information saved to: {output_file}")
print(f"  - Total channels: {len(df_channels)}")
print(f"  - Columns: {list(df_channels.columns)}")

# Also save a summary
summary = {
    'subject_id': subject_ids,
    'num_channels': len(raw.ch_names),
    'sampling_freq': raw.info['sfreq'],
    'duration_seconds': times[-1],
    'num_timepoints': len(times),
    'data_mean': data.mean(),
    'data_std': data.std(),
    'data_min': data.min(),
    'data_max': data.max()
}

summary_df = pd.DataFrame([summary])
summary_file = output_dir / "bci_dataset_summary.csv"
summary_df.to_csv(summary_file, index=False)

print(f"✓ Dataset summary saved to: {summary_file}")


---

## 🎯 Ready for Time Series Forecasting!

Your ECoG data is now explored and ready for forecasting:

**Options:**
- Use individual channels for univariate time series forecasting
- Aggregate multiple channels (mean/median) for composite signals
- Apply TimesFM or other forecasting models
- Use a lighter alternative (e.g. linear regression) on Apple Silicon if TimesFM is not an option there
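
As a rough illustration of two of these options, the sketch below averages several channels into a composite signal and fits a linear regression on lagged values to predict the next sample. It runs on synthetic data. The lag count, the train/test split, and all variable names are assumptions for illustration, and this is not necessarily what `main.py` does.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(2000) / 1000.0
# Synthetic stand-in: 8 channels sharing a 5 Hz rhythm plus independent noise
demo = np.sin(2 * np.pi * 5 * t)[None, :] + 0.1 * rng.normal(size=(8, t.size))

# Composite signal: mean across channels at each time point
composite = demo.mean(axis=0)

# Lagged design matrix: row j holds composite[j : j + n_lags],
# and the target is the next sample, composite[j + n_lags]
n_lags = 20
X = np.lib.stride_tricks.sliding_window_view(composite, n_lags)[:-1]
y = composite[n_lags:]

# Chronological split: fit on the first 80%, evaluate on the rest
split = int(0.8 * len(y))
A_train = np.column_stack([X[:split], np.ones(split)])
coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)

A_test = np.column_stack([X[split:], np.ones(len(y) - split)])
pred = A_test @ coef

mse = np.mean((pred - y[split:]) ** 2)
# Persistence baseline: predict that the next sample equals the latest one
baseline_mse = np.mean((y[split:] - X[split:, -1]) ** 2)
print(f"regression MSE: {mse:.5f}, persistence MSE: {baseline_mse:.5f}")
```

Comparing against the persistence baseline is a quick sanity check: a forecasting model that cannot beat "repeat the last value" is not yet extracting useful structure from the signal.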

**Next steps:**
- Run `main.py` for the complete forecasting pipeline
- Explore different channel combinations
- Analyze frequency domain features
- Build predictive models for finger movement decoding

---

## 🎉 Exploration Complete!

**What we accomplished:**
1. ✅ Downloaded and loaded BCI Competition IV Dataset 4
2. ✅ Explored dataset structure and channel information
3. ✅ Visualized ECoG signals in time domain
4. ✅ Analyzed power spectral density
5. ✅ Examined channel variability and correlations
6. ✅ Saved processed data for future use

**Key Findings:**
- Dataset contains multi-channel ECoG recordings
- Signals are sampled at 1000 Hz (confirm with `raw.info['sfreq']`)
- Channels show varying levels of activity
- Ready for time series forecasting analysis

---
