# EEG and Sensor Data Exploratory Analysis

This notebook provides an interactive exploration of EEG and sensor data collected from an Emotiv device.

## Dataset Overview

The dataset contains:
- **EEG Data**: Raw electroencephalography signals from 14 channels
- **Mental State Metrics**: Attention, engagement, excitement, stress, relaxation, etc.
- **Motion Data**: Head orientation (quaternions), accelerometer, and magnetometer readings
- **Power Spectral Data**: Frequency band powers (theta, alpha, beta, gamma)
- **Device Quality**: Signal quality and contact information

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")

In [None]:
# Import our custom analysis modules
from data_loader import DataLoader, load_session_data
from eeg_analysis import EEGAnalyzer
from mental_state_analysis import MentalStateAnalyzer
from motion_analysis import MotionAnalyzer
from comprehensive_analysis import ComprehensiveAnalyzer

print("Analysis modules imported successfully!")

## 1. Data Loading and Overview

In [None]:
# Load the data
data_directory = "../collected_data"
loader, data = load_session_data(data_directory)

# Get data information
data_info = loader.get_data_info()

print("Dataset Information:")
print("=" * 40)
for data_type, info in data_info.items():
    print(f"\n{data_type.upper()} Data:")
    print(f"  Shape: {info['shape']}")
    print(f"  Duration: {info.get('duration', 'N/A')} seconds")
    print(f"  Sampling Rate: {info.get('sampling_rate', 'N/A')} Hz")
    print(f"  Columns: {info['columns'][:5]}{'...' if len(info['columns']) > 5 else ''}")

In [None]:
# Display first few rows of each dataset
for data_type, df in data.items():
    if not df.empty:
        print(f"\n{data_type.upper()} Data Sample:")
        print("-" * 30)
        display(df.head())

## 2. EEG Signal Analysis

In [None]:
# Initialize EEG analyzer
if 'eeg' in data and not data['eeg'].empty:
    eeg_analyzer = EEGAnalyzer(loader)
    
    print(f"EEG Channels: {eeg_analyzer.channels}")
    print(f"Sampling Rate: {eeg_analyzer.sampling_rate} Hz")
    
    # Analyze first channel
    if eeg_analyzer.channels:
        sample_channel = eeg_analyzer.channels[0]
        print(f"\nAnalyzing sample channel: {sample_channel}")
        
        # Get raw and filtered data
        raw_data = eeg_analyzer.eeg_data[sample_channel]
        filtered_data = eeg_analyzer.preprocess_signal(raw_data)
        
        print(f"Raw data stats: Mean={raw_data.mean():.2f}, Std={raw_data.std():.2f}")
        print(f"Filtered data stats: Mean={filtered_data.mean():.2f}, Std={filtered_data.std():.2f}")
else:
    print("No EEG data available")
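
`preprocess_signal` is defined in `eeg_analysis.py`, which is not shown in this notebook. As a rough illustration of what such a step might do, the sketch below applies a zero-phase Butterworth band-pass filter to a synthetic signal. The helper name `bandpass_filter`, the 1 to 45 Hz band, the filter order, and the 128 Hz sampling rate are all assumptions, not values taken from the analysis modules.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_filter(signal, fs=128.0, low=1.0, high=45.0, order=4):
    """Zero-phase Butterworth band-pass filter (hypothetical helper).

    fs, low, high and order are illustrative defaults only.
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # sosfiltfilt runs the filter forward and backward, cancelling the phase shift
    return sosfiltfilt(sos, np.asarray(signal, dtype=float))


# Synthetic check: a 10 Hz rhythm riding on a large DC offset
fs = 128.0
t = np.arange(0, 10, 1 / fs)
raw = 4200.0 + 20.0 * np.sin(2 * np.pi * 10 * t)
filtered = bandpass_filter(raw, fs=fs)
print(f"raw mean={raw.mean():.1f}, filtered mean={filtered.mean():.3f}")
```

A band-pass step of this kind would explain a filtered mean near zero next to a large raw mean in the statistics printed above, since the offset sits below the lower cutoff.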

In [None]:
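# Visualize EEG signals (reuses raw_data, filtered_data and sample_channel from the previous cell)
if 'eeg' in data and not data['eeg'].empty and eeg_analyzer.channels: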
    # Plot raw vs filtered signal for sample channel
    fig, axes = plt.subplots(2, 1, figsize=(15, 8))
    
    # Limit to first 10 seconds for clarity
    sample_duration = pd.Timedelta(seconds=10)
    end_time = raw_data.index[0] + sample_duration
    
    raw_sample = raw_data[raw_data.index <= end_time]
    filtered_sample = filtered_data[filtered_data.index <= end_time]
    
    # Raw signal
    axes[0].plot(raw_sample.index, raw_sample.values, 'b-', alpha=0.7)
    axes[0].set_title(f'Raw EEG Signal - {sample_channel}')
    axes[0].set_ylabel('Amplitude (µV)')
    axes[0].grid(True, alpha=0.3)
    
    # Filtered signal
    axes[1].plot(filtered_sample.index, filtered_sample.values, 'r-', alpha=0.7)
    axes[1].set_title(f'Filtered EEG Signal - {sample_channel}')
    axes[1].set_ylabel('Amplitude (µV)')
    axes[1].set_xlabel('Time')
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

In [None]:
# Frequency analysis (reuses filtered_data and sample_channel from the EEG analyzer cell)
if 'eeg' in data and not data['eeg'].empty and eeg_analyzer.channels:
    # Calculate frequency bands for sample channel
    band_powers = eeg_analyzer.extract_band_power(filtered_data)
    
    print("Frequency Band Powers:")
    for band, power in band_powers.items():
        print(f"  {band.capitalize()}: {power:.2e} µV²/Hz")
    
    # Plot frequency bands
    bands = list(band_powers.keys())
    powers = list(band_powers.values())
    
    plt.figure(figsize=(10, 6))
    bars = plt.bar(bands, powers, color=['purple', 'blue', 'green', 'orange', 'red'])
    plt.title(f'Frequency Band Powers - {sample_channel}')
    plt.ylabel('Power (µV²/Hz)')
    plt.yscale('log')
    
    # Add value labels on bars
    for bar, power in zip(bars, powers):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height,
                f'{power:.1e}', ha='center', va='bottom')
    
    plt.show()
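
The internals of `extract_band_power` are not shown here. One common approach, sketched below under stated assumptions, is to estimate the power spectral density with Welch's method and integrate it over each band. The helper name `band_powers_welch`, the band edges, and the 128 Hz sampling rate are illustrative choices. Integrating a density in µV²/Hz over frequency gives a power in µV², so check which of the two quantities the analyzer actually reports.

```python
import numpy as np
from scipy.signal import welch

# Conventional band edges in Hz; the limits used by EEGAnalyzer are an assumption
BANDS = {
    "delta": (0.5, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 45),
}


def band_powers_welch(signal, fs=128.0, bands=BANDS):
    """Integrate the Welch PSD over each band (hypothetical helper)."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))  # 2 s segments, 0.5 Hz bins
    df = freqs[1] - freqs[0]
    result = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs < hi)
        result[name] = float(psd[mask].sum() * df)  # rectangle-rule integral
    return result


# Synthetic check: a 10 Hz sine in white noise should be dominated by alpha
fs = 128.0
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(0)
signal = 10.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)
powers = band_powers_welch(signal, fs=fs)
print(max(powers, key=powers.get))
```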

## 3. Mental State Metrics Exploration

In [None]:
# Initialize mental state analyzer
if 'met' in data and not data['met'].empty:
    mental_analyzer = MentalStateAnalyzer(loader)
    
    print(f"Mental State Metrics: {mental_analyzer.metrics}")
    
    # Get basic statistics
    basic_stats = mental_analyzer.get_basic_statistics()
    
    print("\nBasic Statistics:")
    stats_df = pd.DataFrame(basic_stats).T
    display(stats_df[['mean', 'std', 'min', 'max']].round(3))
else:
    print("No mental state data available")

In [None]:
# Visualize mental state metrics over time
if 'met' in data and not data['met'].empty:
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    axes = axes.flatten()
    
    metrics_to_plot = mental_analyzer.metrics[:4]  # First 4 metrics
    
    for i, metric in enumerate(metrics_to_plot):
        if metric in mental_analyzer.mental_data.columns:
            data_series = mental_analyzer.mental_data[metric].dropna()
            
            axes[i].plot(data_series.index, data_series.values, linewidth=1.5)
            axes[i].set_title(f'{metric.capitalize()} Over Time')
            axes[i].set_ylabel('Value')
            axes[i].grid(True, alpha=0.3)
            
            # Add rolling average
            rolling_avg = data_series.rolling('30s').mean()  # time-based window; requires a DatetimeIndex
            axes[i].plot(rolling_avg.index, rolling_avg.values, 
                        color='red', linewidth=2, alpha=0.7, label='30s average')
            axes[i].legend()
    
    plt.tight_layout()
    plt.show()
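
The 30-second rolling average above uses a time-based window, which behaves differently from a fixed sample count: the window covers the interval `(t - 30 s, t]`, so the first values average fewer points instead of being `NaN`. The small example below shows this on a synthetic one-sample-per-second series; the timestamps and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic series: one sample per second, values 0, 1, 2, ...
index = pd.date_range("2024-01-01", periods=120, freq="s")
series = pd.Series(np.arange(120, dtype=float), index=index)

# Time-based window: needs a sorted DatetimeIndex, min_periods defaults to 1
rolling_avg = series.rolling("30s").mean()

print(rolling_avg.iloc[[0, 29, 30]])
```

Because the window is defined in time rather than in rows, irregular sampling or gaps in the metric stream do not change the averaging horizon.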

In [None]:
# Mental state correlations
if 'met' in data and not data['met'].empty:
    correlation_matrix = mental_analyzer.compute_correlations()
    
    plt.figure(figsize=(10, 8))
    mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
    sns.heatmap(correlation_matrix, mask=mask, annot=True, cmap='RdBu_r', center=0,
                square=True, linewidths=0.5, cbar_kws={"shrink": .8})
    plt.title('Mental State Metrics Correlation Matrix')
    plt.show()

## 4. Motion Data Analysis

In [None]:
# Initialize motion analyzer
if 'mot' in data and not data['mot'].empty:
    motion_analyzer = MotionAnalyzer(loader)
    
    print(f"Motion Sensors: {motion_analyzer.sensors}")
    
    # Calculate head orientation
    head_angles = motion_analyzer.calculate_head_orientation()
    
    if not head_angles.empty:
        print("\nHead Orientation Statistics:")
        display(head_angles.describe().round(2))
    
    # Calculate acceleration magnitude
    acc_magnitude = motion_analyzer.calculate_acceleration_magnitude()
    
    if not acc_magnitude.empty:
        print(f"\nAcceleration Magnitude: Mean={acc_magnitude.mean():.3f}, Std={acc_magnitude.std():.3f}")
else:
    print("No motion data available")
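
`calculate_head_orientation` and `calculate_acceleration_magnitude` are implemented in `motion_analysis.py`, which is not shown. The sketch below illustrates one standard way to obtain roll, pitch and yaw from unit quaternions, and the Euclidean norm of a three-axis accelerometer reading. The function names, the `(w, x, y, z)` component order, and the ZYX angle convention are assumptions; the device and the analyzer may use a different convention.

```python
import numpy as np


def quaternion_to_euler(q0, q1, q2, q3):
    """Convert unit quaternions (w, x, y, z) to roll, pitch, yaw in degrees.

    Uses the aerospace ZYX convention (an assumption, see the text above).
    """
    q0, q1, q2, q3 = (np.asarray(v, dtype=float) for v in (q0, q1, q2, q3))
    roll = np.arctan2(2 * (q0 * q1 + q2 * q3), 1 - 2 * (q1**2 + q2**2))
    # Clip guards against rounding pushing the argument slightly outside [-1, 1]
    pitch = np.arcsin(np.clip(2 * (q0 * q2 - q3 * q1), -1.0, 1.0))
    yaw = np.arctan2(2 * (q0 * q3 + q1 * q2), 1 - 2 * (q2**2 + q3**2))
    return np.degrees(roll), np.degrees(pitch), np.degrees(yaw)


def acceleration_magnitude(ax, ay, az):
    """Euclidean norm of the three accelerometer axes."""
    return np.sqrt(np.asarray(ax) ** 2 + np.asarray(ay) ** 2 + np.asarray(az) ** 2)


# A quarter turn about the vertical axis: q = (cos 45 deg, 0, 0, sin 45 deg)
s = np.sqrt(0.5)
roll, pitch, yaw = quaternion_to_euler(s, 0.0, 0.0, s)
print(roll, pitch, yaw)
```

Both functions accept scalars or arrays, so they can be applied to whole columns of the motion frame at once.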

In [None]:
# Visualize motion data
if 'mot' in data and not data['mot'].empty and not head_angles.empty:
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Head orientation angles
    for i, angle in enumerate(['roll', 'pitch', 'yaw']):
        if angle in head_angles.columns:
            row, col = i // 2, i % 2
            axes[row, col].plot(head_angles.index, head_angles[angle], linewidth=1.5)
            axes[row, col].set_title(f'Head {angle.capitalize()} Angle')
            axes[row, col].set_ylabel('Degrees')
            axes[row, col].grid(True, alpha=0.3)
    
    # Acceleration magnitude
    if not acc_magnitude.empty:
        axes[1, 1].plot(acc_magnitude.index, acc_magnitude.values, 
                       color='red', linewidth=1.5)
        axes[1, 1].set_title('Acceleration Magnitude')
        axes[1, 1].set_ylabel('Magnitude')
        axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

In [None]:
# Detect head movements
if 'mot' in data and not data['mot'].empty:
    movements = motion_analyzer.detect_head_movements()
    
    print("Detected Movements:")
    for movement_type, movement_data in movements.items():
        print(f"  {movement_type.replace('_', ' ').title()}: {len(movement_data)} events")
        if len(movement_data) > 0 and 'magnitude' in movement_data.columns:
            print(f"    Average magnitude: {movement_data['magnitude'].mean():.2f}")
            print(f"    Max magnitude: {movement_data['magnitude'].max():.2f}")

## 5. Integrated Analysis and Correlations

In [None]:
# Create comprehensive analyzer
comprehensive_analyzer = ComprehensiveAnalyzer(data_directory)

print(f"Available analyzers: {list(comprehensive_analyzer.analyzers.keys())}")

In [None]:
# Synchronize all data types
synchronized_data = comprehensive_analyzer.synchronize_all_data()

if not synchronized_data.empty:
    print(f"Synchronized data shape: {synchronized_data.shape}")
    print(f"Time range: {synchronized_data.index.min()} to {synchronized_data.index.max()}")
    print(f"Duration: {(synchronized_data.index.max() - synchronized_data.index.min()).total_seconds():.1f} seconds")
    
    # Show column types
    print("\nData types available:")
    for prefix in ['eeg', 'met_', 'mot_', 'pow_', 'dev_']:
        cols = [col for col in synchronized_data.columns if col.startswith(prefix)]
        if cols:
            print(f"  {prefix}: {len(cols)} columns")
else:
    print("Could not synchronize data")
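
How `synchronize_all_data` aligns streams with different sampling rates is not shown in this notebook. A minimal sketch of one possible approach follows: resample every stream onto a shared time grid, prefix the columns, and interpolate the slower streams. The helper name `synchronize`, the 125 ms grid, and the synthetic streams are assumptions for illustration, not the module's actual behavior.

```python
import numpy as np
import pandas as pd


def synchronize(streams, freq="125ms"):
    """Resample each DataFrame onto a shared time grid and join them.

    streams maps a prefix (e.g. "eeg", "met") to a DataFrame with a
    DatetimeIndex. The grid spacing is an arbitrary illustrative choice.
    """
    resampled = [
        df.resample(freq).mean().add_prefix(f"{prefix}_")
        for prefix, df in streams.items()
    ]
    # Outer join keeps every grid point; slower streams are filled by interpolation
    merged = pd.concat(resampled, axis=1).interpolate(method="time")
    return merged.dropna()


# Synthetic streams: 4 s of "EEG" at 128 Hz and a metric at 2 Hz
start = pd.Timestamp("2024-01-01")
eeg = pd.DataFrame(
    {"AF3": np.sin(np.arange(512) / 10)},
    index=pd.to_timedelta(np.arange(512) / 128, unit="s") + start,
)
met = pd.DataFrame(
    {"attention": np.linspace(0.2, 0.8, 8)},
    index=pd.date_range(start, periods=8, freq="500ms"),
)
synced = synchronize({"eeg": eeg, "met": met})
print(synced.shape, list(synced.columns))
```

Averaging within each grid bin downsamples the fast stream, so this layout suits slow trends and correlations rather than waveform-level analysis.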

In [None]:
# Analyze EEG-Mental state correlations
eeg_mental_correlations = comprehensive_analyzer.analyze_eeg_mental_correlations()

if 'eeg_mental_correlation' in eeg_mental_correlations:
    corr_matrix = eeg_mental_correlations['eeg_mental_correlation']
    
    print("EEG-Mental State Correlations:")
    
    # Find strongest correlations
    strongest_corrs = []
    for eeg_col in corr_matrix.index:
        for mental_col in corr_matrix.columns:
            corr_val = corr_matrix.loc[eeg_col, mental_col]
            if not pd.isna(corr_val) and abs(corr_val) > 0.1:  # Magnitude cutoff (|r| > 0.1), not a statistical significance test
                strongest_corrs.append((eeg_col, mental_col, corr_val))
    
    # Sort by absolute correlation value
    strongest_corrs.sort(key=lambda x: abs(x[2]), reverse=True)
    
    print("\nTop 5 EEG-Mental correlations:")
    for i, (eeg_col, mental_col, corr) in enumerate(strongest_corrs[:5]):
        print(f"  {i+1}. {eeg_col} - {mental_col}: {corr:.3f}")
    
    # Visualize correlation matrix
    if corr_matrix.shape[0] > 0 and corr_matrix.shape[1] > 0:
        plt.figure(figsize=(12, 8))
        sns.heatmap(corr_matrix.astype(float), annot=False, cmap='RdBu_r', center=0,
                   cbar_kws={"shrink": .8})
        plt.title('EEG Channels vs Mental State Metrics Correlation')
        plt.xlabel('Mental State Metrics')
        plt.ylabel('EEG Channels')
        plt.xticks(rotation=45)
        plt.yticks(rotation=0)
        plt.tight_layout()
        plt.show()
else:
    print("No EEG-Mental correlations available")
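
The correlation matrix above comes from `analyze_eeg_mental_correlations`, whose implementation is not shown. Assuming the synchronized frame uses column prefixes such as `eeg_` and `met_`, a cross-correlation block and a ranked list of pairs can be sketched as below. The helper names, prefixes, and synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def cross_correlation(df, row_prefix="eeg_", col_prefix="met_"):
    """Pearson correlations between two column groups of one frame."""
    rows = [c for c in df.columns if c.startswith(row_prefix)]
    cols = [c for c in df.columns if c.startswith(col_prefix)]
    # Full correlation matrix, then keep only the rows-by-cols block
    return df[rows + cols].corr().loc[rows, cols]


def top_pairs(corr_matrix, n=5):
    """Rank (row, column) pairs by absolute correlation."""
    pairs = corr_matrix.stack().dropna()
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)


# Synthetic frame: met_focus tracks eeg_A, eeg_B is unrelated noise
rng = np.random.default_rng(1)
x = rng.normal(size=500)
frame = pd.DataFrame({
    "eeg_A": x,
    "eeg_B": rng.normal(size=500),
    "met_focus": 2 * x + 0.1 * rng.normal(size=500),
})
corr = cross_correlation(frame)
ranked = top_pairs(corr)
print(ranked)
```

A ranking by magnitude says nothing about statistical significance; for that, a per-pair test such as `scipy.stats.pearsonr` with a multiple-comparison correction would be a more defensible follow-up.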

In [None]:
# Analyze motion artifact impact
motion_impact = comprehensive_analyzer.analyze_motion_artifact_impact()

if 'motion_artifact_impact' in motion_impact:
    impact_data = motion_impact['motion_artifact_impact']
    
    print("Motion Artifact Impact on EEG:")
    
    impact_df = pd.DataFrame(impact_data).T
    if not impact_df.empty:
        print("\nMotion-EEG correlations and variability ratios:")
        display(impact_df.round(3))
        
        # Plot motion impact
        fig, axes = plt.subplots(1, 2, figsize=(15, 5))
        
        # Motion correlations
        if 'motion_correlation' in impact_df.columns:
            motion_corrs = impact_df['motion_correlation'].dropna()
            axes[0].bar(range(len(motion_corrs)), motion_corrs.values)
            axes[0].set_title('Motion-EEG Correlations')
            axes[0].set_xlabel('EEG Channels')
            axes[0].set_ylabel('Correlation')
            axes[0].set_xticks(range(len(motion_corrs)))
            axes[0].set_xticklabels(motion_corrs.index, rotation=45)
        
        # Variability ratios
        if 'variability_ratio' in impact_df.columns:
            var_ratios = impact_df['variability_ratio'].dropna()
            axes[1].bar(range(len(var_ratios)), var_ratios.values, color='orange')
            axes[1].set_title('Signal Variability During Motion')
            axes[1].set_xlabel('EEG Channels')
            axes[1].set_ylabel('High/Low Motion Variability Ratio')
            axes[1].set_xticks(range(len(var_ratios)))
            axes[1].set_xticklabels(var_ratios.index, rotation=45)
            axes[1].axhline(y=1, color='red', linestyle='--', alpha=0.7, label='No difference')
            axes[1].legend()
        
        plt.tight_layout()
        plt.show()
else:
    print("No motion artifact analysis available")
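
The `variability_ratio` column is produced by `analyze_motion_artifact_impact`, which is not shown here. One plausible definition, sketched below, divides the standard deviation of an EEG channel during high-motion samples by its standard deviation during low-motion samples. The median split used to separate the two groups and the synthetic data are assumptions, not the module's documented method.

```python
import numpy as np
import pandas as pd


def variability_ratio(eeg, motion):
    """Std of an EEG channel in high-motion samples over its std in low-motion samples.

    High and low motion are separated by a median split (assumed criterion).
    """
    high = motion > motion.median()
    return float(eeg[high].std() / eeg[~high].std())


# Synthetic data: the second half has strong motion and 5x noisier EEG
rng = np.random.default_rng(2)
motion = pd.Series(
    np.r_[np.full(1000, 0.1), np.full(1000, 2.0)] + rng.normal(0, 0.01, 2000)
)
eeg = pd.Series(rng.normal(0, 1, 2000) * np.r_[np.ones(1000), np.full(1000, 5.0)])
ratio = variability_ratio(eeg, motion)
print(f"variability ratio: {ratio:.2f}")
```

Under this definition a ratio near 1 means motion makes little difference, matching the dashed reference line in the plot, while values well above 1 point to channels that are likely contaminated by movement.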

## Summary and Next Steps

This notebook walked through an exploratory analysis of the EEG and sensor data. Points to review in the outputs above:

1. **Data Quality**: Review the signal quality metrics and sampling rates
2. **EEG Patterns**: Examine frequency band distributions and any artifacts
3. **Mental States**: Look for patterns in attention, engagement, and other metrics
4. **Motion Impact**: Understand how head movement affects signal quality
5. **Correlations**: Identify relationships between different data types

### Recommended Next Steps:

1. **Generate Full Reports**: Run the comprehensive analysis for detailed reports
2. **Signal Processing**: Apply advanced filtering techniques for cleaner signals
3. **Machine Learning**: Build models to predict mental states from EEG
4. **Real-time Analysis**: Implement streaming analysis for live feedback
5. **Artifact Removal**: Develop better motion artifact correction methods

In [None]:
# Show how to generate the full analysis reports (this cell only prints instructions)
print("To generate comprehensive analysis reports, run:")
print("python run_analysis.py")
print("\nOr use the comprehensive analyzer:")
print("output_files = comprehensive_analyzer.run_full_analysis()")
print("\nThis will create HTML visualizations and text reports in the 'output' folder.")