# Magnetic Signal Data Preprocessing Pipeline

## Overview
This notebook implements a preprocessing pipeline for magnetic field data recorded from Type 1 Diabetes (T1DM) patients, normal subjects, and additional subjects. It reads raw TDMS files containing voltage measurements from magnetic sensors positioned at several anatomical locations (head, hand, liver), plus a background reference, and converts them into band-pass filtered signals downsampled to 25 Hz for analysis.

## Scientific Context
The preprocessing supports research into magnetic field signatures generated by ATP synthase activity in mitochondria. The fundamental hypothesis is that glucose metabolism affects ATP production, which generates detectable magnetic fields that vary with diabetic state and metabolic activity.

## Data Sources
- **Normal Subjects**: Baseline magnetic field measurements from healthy individuals
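- **T1DM Clamp Subjects**: Magnetic field data from Type 1 Diabetes patients during insulin clamp procedures
- **Additional Subjects**: Further recordings stored under `RawData/Additional Subjects/`, processed with the same pipeline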
- **Sensor Configuration**: Dual-channel magnetic sensors positioned at:
  - Head (left/right channels)
  - Hand (dual channels)
  - Liver (dual channels)
  - Background (dual channels for noise reference)

## Processing Pipeline

### 1. Raw Data Loading
- Reads TDMS files containing voltage measurements from magnetic sensors
- Applies GMT+2 timezone correction for sensor timestamps
- Handles multiple patient datasets with different subdirectories

### 2. Unit Conversion
- Converts raw voltage signals to magnetic field strength (nanoTesla)
- Uses calibrated conversion factor: 20 nT per 1V
- Maintains temporal alignment across all channels
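The 20 nT/V factor follows directly from the sensor sensitivity of 50 mV/nT (1000 mV / 50 mV/nT = 20 nT). A short worked check, using the constants from the settings cell, also shows that the ±250 nT linearity and saturation limit corresponds to 12.5 V of sensor output, and that the ±15 V output limit corresponds to 300 nT:

```python
# Sensor constants copied from the settings cell
SENSITIVITY_MV_PER_NT = 50   # mV/nT
MAX_AC_LINEARITY_NT = 250    # nT, same value as SENSOR_SATURATION
VOLTAGE_LIMIT_V = 15         # V

# 50 mV per nT means 1 V (1000 mV) corresponds to 1000 / 50 = 20 nT
conversion_factor_nt_per_v = 1000 / SENSITIVITY_MV_PER_NT


def volts_to_nt(volts: float) -> float:
    """Convert a sensor output voltage to magnetic field strength in nT."""
    return volts * conversion_factor_nt_per_v


# Sensor output at the edge of the +/-250 nT linear range
v_at_linearity_limit = MAX_AC_LINEARITY_NT / conversion_factor_nt_per_v
# Field strength corresponding to the +/-15 V output limit
nt_at_voltage_limit = volts_to_nt(VOLTAGE_LIMIT_V)

print(conversion_factor_nt_per_v, v_at_linearity_limit, nt_at_voltage_limit)  # 20.0 12.5 300.0
```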

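### 3. Band-Pass (Anti-Aliasing) Filtering
- Applies a 0.05-10 Hz band-pass built from two Butterworth stages: a 6th-order low-pass at 10 Hz and a high-pass at 0.05 Hz
- The high-pass is designed in second-order-section (SOS) form, and its order is reduced from 6 to 4 because its normalized cutoff (0.05 / 2500 = 0.00002) is very small
- The low-pass removes high-frequency noise and prevents aliasing when downsampling; the high-pass removes slow baseline drift
- Uses zero-phase filtering (`filtfilt` for the low-pass, `sosfiltfilt` for the high-pass) to preserve signal timing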
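The combined response of these two stages can be checked numerically. The sketch below designs both stages in SOS form (the notebook designs the low-pass in `(b, a)` form, which has the same ideal response) and evaluates them with SciPy's `sosfreqz`. Because zero-phase filtering runs each filter forward and backward, the effective magnitude is |H|², so the 10 Hz corner lands at about -6 dB rather than -3 dB. The probe frequencies are arbitrary choices for illustration:

```python
import numpy as np
from scipy.signal import butter, sosfreqz

FS = 5000        # Hz, raw sampling rate
LOWCUT = 0.05    # Hz, high-pass corner
HIGHCUT = 10     # Hz, low-pass corner

# Low-pass: 6th order. High-pass: 4th order, as the pipeline uses when the
# normalized cutoff (0.05 / 2500) falls below its 0.001 stability threshold.
sos_low = butter(6, HIGHCUT, btype="low", fs=FS, output="sos")
sos_high = butter(4, LOWCUT, btype="high", fs=FS, output="sos")

freqs = np.array([0.01, 1.0, 10.0, 50.0, 100.0])  # Hz, probe frequencies
_, h_low = sosfreqz(sos_low, worN=freqs, fs=FS)
_, h_high = sosfreqz(sos_high, worN=freqs, fs=FS)

# Forward-backward (zero-phase) filtering squares each single-pass magnitude
gain = (np.abs(h_low) * np.abs(h_high)) ** 2
for f, g in zip(freqs, gain):
    print(f"{f:7.2f} Hz  gain {g:.3g}  ({20 * np.log10(g):.1f} dB)")
```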

### 4. Saturation Handling
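- Flags samples whose absolute value exceeds 250 nT as saturated (checked on the filtered signal, during downsampling)
- Excludes saturated samples from each window average; windows in which every sample is saturated are set to NaN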
- Tracks saturation statistics for quality assessment

### 5. Downsampling
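- Reduces the sampling rate from 5000 Hz to 25 Hz by averaging non-overlapping windows of 200 samples (5000 / 25); samples left over after the last complete window are discarded
- Cuts the data volume by a factor of 200 (e.g., 10,017,500 raw samples become 50,087 points), which reduces computational requirements
- At 25 Hz the Nyquist frequency is 12.5 Hz, above the 10 Hz low-pass cutoff, so the filtered band is retained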

### 6. Data Storage
- Saves processed signals in efficient Parquet format
- Updates patient metadata with processed file locations
- Enables fast loading for subsequent analysis steps

### Signal Processing
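- **Anti-aliasing**: 10 Hz low-pass, 6th-order Butterworth
- **Drift removal**: 0.05 Hz high-pass, Butterworth (order reduced to 4 for numerical stability)
- **Noise Mitigation**: The 50 Hz powerline frequency and its harmonics (100, 150 Hz) are defined as constants but not explicitly filtered in this step; they lie far above the 10 Hz low-pass cutoff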
- **Quality Control**: Saturation detection and removal

## Visualization Features
- Time series plots by channel groups
- Channel correlation analysis (within and between locations)
- Signal quality assessment with saturation flagging
- Cross-correlation analysis for sensor pair validation

## Output Files
- **Processed Signals**: `{PatientName}_downsampled_25hz.parquet`
- **Updated Metadata**: `patients.json`, with a `signal_file` entry added for each successfully processed patient
- **Quality Metrics**: Per-channel saturation statistics, printed during processing (not written to disk)
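The file-name rule (spaces become underscores, `#` is dropped) is written out twice: once in the main pipeline and once in `load_downsampled_data`. A single helper would keep the two in sync; `signal_filename` is a hypothetical name, and the example reproduces the file name reported for `Additional #1` in the log:

```python
from pathlib import Path


def signal_filename(patient_name: str, target_fs: int = 25) -> str:
    """Build the Parquet file name for a patient's downsampled signals.

    Mirrors the notebook's rule: spaces become underscores and '#' is removed.
    """
    safe_name = patient_name.replace(" ", "_").replace("#", "")
    return f"{safe_name}_downsampled_{target_fs}hz.parquet"


signals_dir = Path("ProcessedData") / "Signal_Files"
example_path = signals_dir / signal_filename("Additional #1")
print(example_path)
```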

## Usage Notes
- Set `Visualize_Signal = True` to enable plotting during processing
- Processed files are stored in `ProcessedData/Signal_Files/` directory
- Analysis can resume from processed files without reprocessing raw data
- Patients are processed one at a time, and intermediate DataFrames are released with `del` and `gc.collect()` to limit memory usage
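Resuming relies on the `signal_file` entry that the pipeline writes into `patients.json`. A minimal sketch of how already-processed patients could be skipped; `needs_processing` is a hypothetical helper, not part of the notebook, and the temporary directory stands in for `ProcessedData/Signal_Files/`:

```python
import tempfile
from pathlib import Path


def needs_processing(entry: dict, signals_dir: Path) -> bool:
    """Return True unless the entry records a signal_file that exists on disk."""
    signal_file = entry.get("signal_file")
    return signal_file is None or not (signals_dir / signal_file).exists()


# Usage with a throwaway directory standing in for the signal-file folder
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "Additional_1_downsampled_25hz.parquet").touch()
    done = needs_processing({"signal_file": "Additional_1_downsampled_25hz.parquet"}, tmp_dir)
    missing = needs_processing({"signal_file": "Additional_2_downsampled_25hz.parquet"}, tmp_dir)
    fresh = needs_processing({"tdms_file": "ML base_1.tdms"}, tmp_dir)

print(done, missing, fresh)
```

In the main loop this filter could be applied as `patients_to_process = [p for p in patients_to_process if needs_processing(patients_data[p], signals_dir)]`.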

In [1]:
# General imports:

# Disable warnings:
import warnings

warnings.filterwarnings('ignore')

# Essential imports
import pandas as pd
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pathlib import Path
import os
import gc

import json

from tqdm import tqdm
from utils import read_tdms

# Add signal processing imports for antialiasing filter
from scipy.signal import butter, filtfilt

# Plotting enhancements
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12
plt.style.use('seaborn-v0_8-whitegrid')

#### Base variables and constants:

In [2]:
# Key Settings:
Visualize_Signal = False # Signal visualization (only if required)

# Physical constants and sensors specifications:
SENSITIVITY = 50  # mV/nT
MAGNETIC_NOISE = 3  # pT/√Hz @ 1 Hz
MAX_AC_LINEARITY = 250  # nT (+/- 250 nT) - corresponds to 12.5 V at 50 mV/nT
MAX_DC_LINEARITY = 60  # nT (+/- 60 nT)
VOLTAGE_LIMIT = 15 # V (+/-15V)
CONVERSION_FACTOR = 20  # nT per 1V
SAMPLING_FREQUENCY = 5000  # Hz - expected from the experimental data
SENSOR_SATURATION = 250  # nT - saturation threshold for the sensor

# Subjects and their types
Subject = {"Normal": "Normal Subjects","Clamp": "T1DM Clamp Subjects", "Additional": "Additional Subjects"}

# Path and directories
base_dir = Path("../../../Data")

# Output directory for saving results
output_dir = base_dir / "ProcessedData"
os.makedirs(output_dir, exist_ok=True)

# Directory for saving processed/downsampled signal files (parquet format)
signals_dir = output_dir / "Signal_Files"
os.makedirs(signals_dir, exist_ok=True)

# Labels directory
labels_dir = base_dir / "RawData"
labels_filename = "FilteredLabels.xlsx"

# Patients data file
patients_file = "patients.json"

# Patient Data
with open(labels_dir / patients_file, 'r') as f:
    # Load the JSON data
    patients_data = json.load(f)

# GMT zone correction for the sensor = GMT+2
GMT = 2

# Key frequencies from background noise analysis
POWER_LINE_FREQ = 50  # Hz
POWERLINE_HARMONICS = [POWER_LINE_FREQ*i for i in range(1, 4)]  # 50, 100, 150 Hz.

# Filter parameters:
HIGHCUT_FREQ = 10  # Hz low-pass filter cutoff frequency
LOWCUT_FREQ = 0.05 # Hz high-pass filter cutoff frequency
FILTER_ORDER = 6 # for steeper roll-off

# Channel grouping
signal_channels = {
    'Head': ['Head_left', 'Head_right'],
    'Hand': ['Hand1', 'Hand2'],
    'Liver': ['Liver1', 'Liver2'],
    'Background': ['Background1', 'Background2']
}
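Each processing function below rebuilds the flat list of channel names from `signal_channels` with a nested loop. As a sketch, it could be computed once, together with a reverse lookup from channel to location (`channel_location` is a hypothetical name, e.g. for plot labels):

```python
from itertools import chain

# Channel grouping as defined in the constants cell
signal_channels = {
    'Head': ['Head_left', 'Head_right'],
    'Hand': ['Hand1', 'Hand2'],
    'Liver': ['Liver1', 'Liver2'],
    'Background': ['Background1', 'Background2']
}

# Flat, ordered list of every signal channel (the Time column is handled separately)
signal_column_names = list(chain.from_iterable(signal_channels.values()))

# Reverse lookup from channel name to its location group
channel_location = {ch: loc for loc, chans in signal_channels.items() for ch in chans}

print(signal_column_names)
print(channel_location["Liver2"])
```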

#### Define voltage to nanoTesla conversion function

In [3]:
def convert_voltage_to_nanotesla(df, signal_channels, conversion_factor):
    """
    Convert raw voltage signals to nanoTesla (nT).

    Parameters:
    - df: polars DataFrame with voltage signal data
    - signal_channels: dictionary of channel groups
    - conversion_factor: conversion factor in nT per Volt (e.g. CONVERSION_FACTOR = 20 nT/V)

    Returns:
    - df_converted: polars DataFrame with signals converted to nT
    """
    # Get all signal channel names (exclude time column)
    signal_column_names = []
    for channel_group in signal_channels.values():
        signal_column_names.extend(channel_group)

    # Apply conversion to each signal channel
    converted_data = {}

    # Keep the time column unchanged
    time_col = None
    for col in df.columns:
        if col.lower() == 'time':
            time_col = col
            break

    if time_col is not None:
        converted_data[time_col] = df[time_col].to_numpy()
        print(f"Time column '{time_col}' preserved during conversion")
    else:
        print("Warning: No time column found in input data")

    print(f"Converting voltage signals to nanoTesla using factor: {conversion_factor} nT/V")

    for channel in tqdm(signal_column_names, desc="Converting channels"):
        if channel in df.columns:
            # Extract voltage signal data
            voltage_signal = df[channel].to_numpy()

            # Convert to nanoTesla: nT = V × conversion_factor
            nanotesla_signal = voltage_signal * conversion_factor

            # Store converted signal
            converted_data[channel] = nanotesla_signal
        else:
            print(f"Warning: Channel '{channel}' not found in data")

    # Convert back to polars DataFrame
    df_converted = pl.DataFrame(converted_data)

    print(f"Voltage to nanoTesla conversion completed. Converted {len(signal_column_names)} channels.")
    print(f"Signal values are now in nanoTesla (nT) units.")

    return df_converted

#### Define the anti-aliasing filter function

In [4]:
def apply_antialiasing_filter(df, lowcut_freq=0.05, highcut_freq=10, sampling_freq=5000, filter_order=6):
    """
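    Apply a zero-phase Butterworth low-pass filter and, if lowcut_freq > 0, a zero-phase
    high-pass filter in second-order-section (SOS) form for numerical stability.
    Channels are taken from the global `signal_channels` dictionary.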

    Parameters:
    - df: polars DataFrame with signal data
    - lowcut_freq: low cutoff frequency in Hz (default: 0.05Hz)
    - highcut_freq: high cutoff frequency in Hz (default: 10Hz)
    - sampling_freq: sampling frequency in Hz (default: 5000Hz)
    - filter_order: filter order (default: 6)

    Returns:
    - df_filtered: polars DataFrame with filtered signals
    """
    # Calculate normalized cutoff frequencies
    nyq = 0.5 * sampling_freq
    high_normal = highcut_freq / nyq

    # Design Butterworth low-pass filter
    b_low, a_low = butter(filter_order, high_normal, btype='low', analog=False)

    # Design high-pass filter with improved stability
    apply_highpass = lowcut_freq > 0
    if apply_highpass:
        low_normal = lowcut_freq / nyq
        print(f"High-pass normalized frequency: {low_normal:.6f}")

        # Use lower filter order for very low frequencies to improve stability
        hp_filter_order = min(filter_order, 4) if low_normal < 0.001 else filter_order

        # Design high-pass filter with SOS (Second-Order Sections) for better numerical stability
        sos_high = butter(hp_filter_order, low_normal, btype='high', analog=False, output='sos')
        print(f"Using filter order {hp_filter_order} for high-pass filter")

    # Get all signal channel names (exclude time column)
    signal_column_names = []
    for channel_group in signal_channels.values():
        signal_column_names.extend(channel_group)

    # Apply filter to each signal channel
    filtered_data = {}

    # Keep the time column unchanged
    time_col = None
    for col in df.columns:
        if col.lower() == 'time':
            time_col = col
            break

    if time_col is not None:
        filtered_data[time_col] = df[time_col].to_numpy()
        print(f"Time column '{time_col}' preserved in filtered data")
    else:
        print("Warning: No time column found in input data")

    filter_description = f"low-pass ({highcut_freq}Hz)"
    if apply_highpass:
        filter_description = f"high-pass ({lowcut_freq}Hz) + {filter_description}"

    print(f"Applying {filter_description} filters to channels...")

    for channel in tqdm(signal_column_names, desc="Filtering channels"):
        if channel in df.columns:
            # Extract signal data
            signal = df[channel].to_numpy()

            # First apply low-pass filter
            filtered_signal = filtfilt(b_low, a_low, signal)

            # Then apply high-pass filter if needed using SOS format
            if apply_highpass:
                from scipy.signal import sosfiltfilt
                filtered_signal = sosfiltfilt(sos_high, filtered_signal)

            # Check for NaN values and report
            if np.any(np.isnan(filtered_signal)):
                print(f"Warning: NaN values detected in {channel} after filtering")
                # Option: replace NaN with interpolated values or skip this channel
                nan_count = np.sum(np.isnan(filtered_signal))
                print(f"  {nan_count} NaN values out of {len(filtered_signal)} samples")

                # Simple NaN handling: replace with median of non-NaN values
                if nan_count < len(filtered_signal) * 0.1:  # Less than 10% NaN
                    median_val = np.nanmedian(filtered_signal)
                    filtered_signal = np.where(np.isnan(filtered_signal), median_val, filtered_signal)
                    print(f"  Replaced NaN values with median: {median_val:.3f}")
                else:
                    print(f"  Too many NaN values ({nan_count}/{len(filtered_signal)}), skipping channel")
                    continue

            # Store filtered signal
            filtered_data[channel] = filtered_signal
        else:
            print(f"Warning: Channel '{channel}' not found in data")

    # Convert back to polars DataFrame
    df_filtered = pl.DataFrame(filtered_data)

    print(f"Filtering completed successfully. Processed {len(signal_column_names)} channels.")
    return df_filtered
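`apply_antialiasing_filter` designs the low-pass in `(b, a)` form and only the high-pass as SOS. A possible variant, sketched below under the assumption that SOS form is acceptable for both stages, keeps the same cutoffs and orders and applies both with `sosfiltfilt`. The synthetic test signal (1 Hz component, 50 Hz powerline, 5 nT offset) is made up for illustration, and the middle third of the record is used for the statistics to stay clear of edge effects:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_sos(signal, fs=5000, lowcut=0.05, highcut=10, lp_order=6, hp_order=4):
    """Zero-phase low-pass then high-pass, both designed as second-order sections."""
    sos_low = butter(lp_order, highcut, btype="low", fs=fs, output="sos")
    sos_high = butter(hp_order, lowcut, btype="high", fs=fs, output="sos")
    return sosfiltfilt(sos_high, sosfiltfilt(sos_low, signal))


fs = 5000
t = np.arange(60 * fs) / fs  # 60 s of synthetic data
# 1 Hz component + 50 Hz powerline + 5 nT DC offset
raw = np.sin(2 * np.pi * 1.0 * t) + np.sin(2 * np.pi * 50.0 * t) + 5.0
filtered = bandpass_sos(raw, fs=fs)

middle = filtered[len(filtered) // 3 : 2 * len(filtered) // 3]
print(f"mean {middle.mean():.4f}, std {middle.std():.4f}")
```

The mean of the middle section should be close to 0 (offset removed) and the standard deviation close to 1/√2 ≈ 0.707, the value for a unit-amplitude sine, meaning the 1 Hz component passes while the 50 Hz component is suppressed.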

#### Downsample the data using averaging

In [5]:
def downsample_data(df, original_fs=5000, target_fs=25, fix_saturated=True):
    """
    Downsample the filtered data by averaging non-overlapping windows of original_fs // target_fs samples.

    Parameters:
    - df: polars DataFrame with filtered signal data
    - original_fs: original sampling frequency in Hz (default: 5000Hz)
    - target_fs: target sampling frequency in Hz (default: 25Hz)
    - fix_saturated: bool, if True removes saturated values before averaging (default: True)

    Returns:
    - df_downsampled: polars DataFrame with downsampled signals
    """
    # Calculate downsampling factor
    downsample_factor = original_fs // target_fs
    print(f"Downsampling from {original_fs}Hz to {target_fs}Hz (factor: {downsample_factor})")

    if fix_saturated:
        print("Saturation removal enabled - will exclude saturated values from averaging")

    # Get signal column names (exclude time column)
    signal_column_names = []
    for channel_group in signal_channels.values():
        signal_column_names.extend(channel_group)

    # Calculate number of complete windows
    n_samples = df.shape[0]
    n_windows = n_samples // downsample_factor
    print(f"Processing {n_samples} samples into {n_windows} downsampled points")

    # Initialize dictionary for downsampled data
    downsampled_data = {}

    # Downsample time column if present - check for both 'time' and 'Time'
    time_col = None
    for col in df.columns:
        if col.lower() == 'time':
            time_col = col
            break

    if time_col is not None:
        time_data = df[time_col].to_numpy()
        # Use the first timestamp of each window as its downsampled time stamp
        downsampled_time = time_data[::downsample_factor][:n_windows]
        downsampled_data[time_col] = downsampled_time
        print(f"Time column '{time_col}' downsampled")
    else:
        print("Warning: No time column found for downsampling")

    # Define saturation thresholds based on physical constants
    # Use the predefined sensor saturation threshold
    saturation_threshold_nt = SENSOR_SATURATION  # Saturation threshold for the sensor
    print(f"Saturation threshold: ±{saturation_threshold_nt} nT")

    # Track saturation statistics
    total_saturated_samples = 0
    saturated_windows = 0

    # Downsample each signal channel using averaging
    print("Downsampling channels using averaging...")
    for channel in tqdm(signal_column_names, desc="Downsampling channels"):
        if channel in df.columns:
            # Extract signal data
            signal = df[channel].to_numpy()

            # Reshape for averaging (trim to complete windows)
            signal_windowed = signal[:n_windows * downsample_factor].reshape(n_windows, downsample_factor)

            if fix_saturated:
                # Apply saturation filtering before averaging
                downsampled_signal = []
                saturation_flags = []  # Track which windows had saturation issues
                channel_saturated_samples = 0
                channel_saturated_windows = 0

                for window in signal_windowed:
                    # Identify saturated samples (beyond threshold)
                    # Strict '>' treats samples exactly at the threshold as valid
                    saturated_mask = np.abs(window) > saturation_threshold_nt
                    saturated_count = np.sum(saturated_mask)

                    if saturated_count > 0:
                        channel_saturated_samples += saturated_count
                        channel_saturated_windows += 1

                        # Remove saturated values from averaging
                        valid_samples = window[~saturated_mask]

                        if len(valid_samples) > 0:
                            # Average only non-saturated samples
                            window_avg = np.mean(valid_samples)
                            saturation_flags.append(1)  # Partially saturated
                        else:
                            # If all samples are saturated, mark as invalid
                            # Use NaN to indicate unreliable data
                            window_avg = np.nan
                            saturation_flags.append(2)  # Fully saturated (unreliable)

                        downsampled_signal.append(window_avg)
                    else:
                        # No saturation, use normal averaging
                        downsampled_signal.append(np.mean(window))
                        saturation_flags.append(0)  # No saturation

                downsampled_signal = np.array(downsampled_signal)

                # Note: saturation_flags (0 = clean, 1 = partial, 2 = fully saturated) are collected but not used further or stored in the output data

                # Update statistics
                total_saturated_samples += channel_saturated_samples
                saturated_windows += channel_saturated_windows

                if channel_saturated_samples > 0:
                    saturation_percentage = (channel_saturated_samples / (n_windows * downsample_factor)) * 100
                    print(f"  {channel}: {channel_saturated_samples} saturated samples ({saturation_percentage:.2f}%) in {channel_saturated_windows} windows")
            else:
                # Standard averaging without saturation handling
                downsampled_signal = np.mean(signal_windowed, axis=1)

            # Store downsampled signal
            downsampled_data[channel] = downsampled_signal
        else:
            print(f"Warning: Channel '{channel}' not found in filtered data")

    # Print saturation summary
    if fix_saturated and total_saturated_samples > 0:
        total_samples = n_windows * downsample_factor * len(signal_column_names)
        overall_saturation_percentage = (total_saturated_samples / total_samples) * 100
        print(f"\nSaturation Summary:")
        print(f"  Total saturated samples: {total_saturated_samples}")
        print(f"  Total windows with saturation: {saturated_windows}")
        print(f"  Overall saturation rate: {overall_saturation_percentage:.2f}%")

    # Convert to polars DataFrame
    df_downsampled = pl.DataFrame(downsampled_data)

    print(f"Downsampling completed. New shape: {df_downsampled.shape}")
    print(f"Effective sampling rate: {target_fs}Hz")

    return df_downsampled
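The per-window Python loop in `downsample_data` takes roughly 1-2 s per channel on the longer recordings (see the progress bars in the output below). A vectorized sketch of the same rule, using a boolean mask over a reshaped array, is shown here. `block_average_excluding_saturation` is a hypothetical helper; like the loop, it does not count NaN inputs as saturated, so a NaN turns its window average into NaN:

```python
import numpy as np


def block_average_excluding_saturation(signal, factor=200, threshold=250.0):
    """Average non-overlapping windows of `factor` samples, skipping saturated ones.

    A sample counts as saturated when |x| > threshold, matching downsample_data.
    Returns (averages, saturated_sample_count, saturated_window_count); windows in
    which every sample is saturated become NaN. Trailing samples that do not fill
    a complete window are dropped.
    """
    n_windows = len(signal) // factor
    windows = np.asarray(signal[: n_windows * factor], dtype=float).reshape(n_windows, factor)

    saturated = np.abs(windows) > threshold
    saturated_per_window = saturated.sum(axis=1)
    valid_counts = factor - saturated_per_window
    sums = np.where(saturated, 0.0, windows).sum(axis=1)

    with np.errstate(invalid="ignore", divide="ignore"):
        averages = sums / valid_counts  # 0 / 0 gives NaN for fully saturated windows
    averages[valid_counts == 0] = np.nan

    return averages, int(saturated_per_window.sum()), int((saturated_per_window > 0).sum())


# Toy example with a 4-sample window
sig = np.array([
    1.0, 3.0, 300.0, 5.0,          # one saturated sample -> mean of 1, 3, 5
    -400.0, 260.0, 251.0, -999.0,  # every sample saturated -> NaN
    2.0, 2.0, 2.0, 2.0,            # clean window
    7.0,                           # incomplete trailing window, dropped
])
avg, n_saturated, n_windows_hit = block_average_excluding_saturation(sig, factor=4, threshold=250.0)
print(avg, n_saturated, n_windows_hit)
```

Inside the channel loop it could replace the `for window in signal_windowed` block as `downsampled_signal, n_sat, n_win = block_average_excluding_saturation(signal, downsample_factor, saturation_threshold_nt)`.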

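#### Main pipeline:
- Load the TDMS file and extract the data
- Convert the voltage signals to nanoTesla (20 nT/V)
- Apply the band-pass filter (10 Hz anti-aliasing low-pass plus 0.05 Hz high-pass)
- Downsample the data to 25 Hz by averaging, excluding saturated samples
- Save the downsampled data to a Parquet file and record its name in `patients.json`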

In [6]:

# Process all Normal, Additional and Insulin Clamp patients
# Keep patients whose name contains any of the subject-type tags
patients_to_process = [patient_name for patient_name in patients_data.keys()
                       if any(tag in patient_name for tag in ("Normal", "Clamp", "Additional"))]

print(f"Found {len(patients_to_process)} patients to process:")
for p in patients_to_process:
    print(f"  - {p}")

# Process each patient
for current_patient in patients_to_process:
    print(f"\n{'='*60}")
    print(f"PROCESSING PATIENT: {current_patient}")
    print(f"{'='*60}")

    try:
        # Get patient-specific data
        tdms_file = patients_data[current_patient]["tdms_file"]
        sub_dir = patients_data[current_patient]["sub_dir"]

        # Determine the correct path based on patient type
        if "Clamp" in current_patient:
            path = base_dir / "RawData" / Subject["Clamp"] / sub_dir / tdms_file
        elif "Normal" in current_patient:
            path = base_dir / "RawData" / Subject["Normal"] / sub_dir / tdms_file
        elif "Additional" in current_patient:
            path = base_dir / "RawData" / Subject["Additional"] / sub_dir / tdms_file
        else:
            print(f"Unknown patient type for {current_patient}, skipping...")
            continue

        print(f"Processing file: {path}")

        # Read the data from tdms file into polars dataframe
        try:
            df = read_tdms.read_tdms_file_to_dataframe(str(path), GMT)
            print(f"Successfully loaded data with {df.shape[0]} samples and {df.shape[1]} channels")
            print("Raw data is in Volts - converting to nanoTesla (nT)...")
        except Exception as e:
            print(f"Error loading TDMS file for {current_patient}: {e}")
            continue

        # Convert raw voltage signals to nanoTesla (nT)
        if df is not None:
            df_converted = convert_voltage_to_nanotesla(df, signal_channels, CONVERSION_FACTOR)
            print(f"Converted data shape: {df_converted.shape}")
        else:
            print(f"No data loaded to convert for {current_patient}")
            continue

        # Apply the anti-aliasing filter to all signal channels
        if df_converted is not None:
            print("Applying band-pass filter...")
            df_filtered = apply_antialiasing_filter(df_converted, lowcut_freq=LOWCUT_FREQ, highcut_freq=HIGHCUT_FREQ, sampling_freq=SAMPLING_FREQUENCY, filter_order=FILTER_ORDER)
            print(f"Filtered data shape: {df_filtered.shape}")
        else:
            print(f"No converted data available for filtering for {current_patient}")
            continue

        # Apply downsampling to the filtered data
        if df_filtered is not None:
            print("Downsampling filtered data to 25Hz...")
            df_downsampled = downsample_data(df_filtered, original_fs=SAMPLING_FREQUENCY, target_fs=25)
            print(f"Downsampled data shape: {df_downsampled.shape}")
        else:
            print(f"No filtered data available for downsampling for {current_patient}")
            continue

        # Save downsampled data to parquet file
        if df_downsampled is not None:
            print("Saving downsampled data to parquet file...")
            # Create output filename based on patient name
            patient_safe_name = current_patient.replace(" ", "_").replace("#", "")
            output_filename = f"{patient_safe_name}_downsampled_25hz.parquet"
            output_path = signals_dir / output_filename

            print(f"Saving downsampled data to: {output_path}")

            try:
                # Save to parquet format
                df_downsampled.write_parquet(output_path)
                print(f"Successfully saved downsampled data to {output_filename}")
                print(f"File size: {output_path.stat().st_size / (1024*1024):.2f} MB")

                # Update patients_data with signal_file field
                patients_data[current_patient]['signal_file'] = output_filename
                print(f"Updated patients_data with signal_file: {output_filename}")

            except Exception as e:
                print(f"Error saving parquet file for {current_patient}: {e}")
                continue
        else:
            print(f"No downsampled data available to save for {current_patient}")
            continue

        # Clean up memory
        del df, df_converted, df_filtered, df_downsampled
        gc.collect()

        print(f"Completed processing {current_patient}")

    except Exception as e:
        print(f"Error processing patient {current_patient}: {e}")
        continue

# Save updated patients_data back to JSON file
try:
    patients_json_path = labels_dir / patients_file
    with open(patients_json_path, 'w') as f:
        json.dump(patients_data, f, indent=2)
    print(f"\nSaved updated patients.json to: {patients_json_path}")
    print(f"Processing completed for {len(patients_to_process)} patients")
except Exception as e:
    print(f"Error saving updated patients.json: {e}")

Found 3 patients to process:
  - Additional #1
  - Additional #2
  - Additional #3

PROCESSING PATIENT: Additional #1
Processing file: ..\..\..\Data\RawData\Additional Subjects\Base\ML base_1.tdms
Reading ..\..\..\Data\RawData\Additional Subjects\Base\ML base_1.tdms...
Adjusted 'Time' column by GMT+2 hours
Successfully processed ML base_1.tdms
Successfully loaded data with 155000 samples and 9 channels
Raw data is in Volts - converting to nanoTesla (nT)...
Time column 'Time' preserved during conversion
Converting voltage signals to nanoTesla using factor: 20 nT/V


Converting channels: 100%|██████████| 8/8 [00:00<00:00, 1133.21it/s]


Voltage to nanoTesla conversion completed. Converted 8 channels.
Signal values are now in nanoTesla (nT) units.
Converted data shape: (155000, 9)
Applying band-pass filter...
High-pass normalized frequency: 0.000020
Using filter order 4 for high-pass filter
Time column 'Time' preserved in filtered data
Applying high-pass (0.05Hz) + low-pass (10Hz) filters to channels...


Filtering channels: 100%|██████████| 8/8 [00:00<00:00, 59.92it/s]


Filtering completed successfully. Processed 8 channels.
Filtered data shape: (155000, 9)
Downsampling filtered data to 25Hz...
Downsampling from 5000Hz to 25Hz (factor: 200)
Saturation removal enabled - will exclude saturated values from averaging
Processing 155000 samples into 775 downsampled points
Time column 'Time' downsampled
Saturation threshold: ±250 nT
Downsampling channels using averaging...


Downsampling channels:  50%|█████     | 4/8 [00:00<00:00, 39.02it/s]

  Liver1: 9417 saturated samples (6.08%) in 56 windows
  Liver2: 9708 saturated samples (6.26%) in 56 windows


Downsampling channels: 100%|██████████| 8/8 [00:00<00:00, 38.41it/s]



Saturation Summary:
  Total saturated samples: 19125
  Total windows with saturation: 112
  Overall saturation rate: 1.54%
Downsampling completed. New shape: (775, 9)
Effective sampling rate: 25Hz
Downsampled data shape: (775, 9)
Saving downsampled data to parquet file...
Saving downsampled data to: ..\..\..\Data\ProcessedData\Signal_Files\Additional_1_downsampled_25hz.parquet
Successfully saved downsampled data to Additional_1_downsampled_25hz.parquet
File size: 0.05 MB
Updated patients_data with signal_file: Additional_1_downsampled_25hz.parquet
Completed processing Additional #1

PROCESSING PATIENT: Additional #2
Processing file: ..\..\..\Data\RawData\Additional Subjects\GL Insulin recording\GL insulin_1.tdms
Reading ..\..\..\Data\RawData\Additional Subjects\GL Insulin recording\GL insulin_1.tdms...
Adjusted 'Time' column by GMT+2 hours
Successfully processed GL insulin_1.tdms
Successfully loaded data with 10017500 samples and 9 channels
Raw data is in Volts - converting to nanoTesla (nT)...

Converting channels: 100%|██████████| 8/8 [00:00<00:00, 29.87it/s]


Voltage to nanoTesla conversion completed. Converted 8 channels.
Signal values are now in nanoTesla (nT) units.
Converted data shape: (10017500, 9)
Applying band-pass filter...
High-pass normalized frequency: 0.000020
Using filter order 4 for high-pass filter
Time column 'Time' preserved in filtered data
Applying high-pass (0.05Hz) + low-pass (10Hz) filters to channels...


Filtering channels: 100%|██████████| 8/8 [00:04<00:00,  1.62it/s]


Filtering completed successfully. Processed 8 channels.
Filtered data shape: (10017500, 9)
Downsampling filtered data to 25Hz...
Downsampling from 5000Hz to 25Hz (factor: 200)
Saturation removal enabled - will exclude saturated values from averaging
Processing 10017500 samples into 50087 downsampled points
Time column 'Time' downsampled
Saturation threshold: ±250 nT
Downsampling channels using averaging...


Downsampling channels:  12%|█▎        | 1/8 [00:01<00:13,  1.88s/it]

  Head_left: 8555 saturated samples (0.09%) in 48 windows


Downsampling channels:  25%|██▌       | 2/8 [00:03<00:10,  1.80s/it]

  Head_right: 8551 saturated samples (0.09%) in 47 windows


Downsampling channels:  38%|███▊      | 3/8 [00:05<00:09,  1.81s/it]

  Hand1: 8550 saturated samples (0.09%) in 48 windows


Downsampling channels:  50%|█████     | 4/8 [00:07<00:07,  1.78s/it]

  Hand2: 8549 saturated samples (0.09%) in 48 windows


Downsampling channels:  62%|██████▎   | 5/8 [00:08<00:05,  1.74s/it]

  Liver1: 48449 saturated samples (0.48%) in 277 windows


Downsampling channels:  75%|███████▌  | 6/8 [00:10<00:03,  1.71s/it]

  Liver2: 48624 saturated samples (0.49%) in 268 windows


Downsampling channels: 100%|██████████| 8/8 [00:13<00:00,  1.73s/it]

  Background2: 8557 saturated samples (0.09%) in 48 windows

Saturation Summary:
  Total saturated samples: 139835
  Total windows with saturation: 784
  Overall saturation rate: 0.17%
Downsampling completed. New shape: (50087, 9)
Effective sampling rate: 25Hz
Downsampled data shape: (50087, 9)
Saving downsampled data to parquet file...
Saving downsampled data to: ..\..\..\Data\ProcessedData\Signal_Files\Additional_2_downsampled_25hz.parquet
Successfully saved downsampled data to Additional_2_downsampled_25hz.parquet
File size: 3.06 MB
Updated patients_data with signal_file: Additional_2_downsampled_25hz.parquet
Completed processing Additional #2

PROCESSING PATIENT: Additional #3
Processing file: ..\..\..\Data\RawData\Additional Subjects\June 8 2025 EMF Insulin mimic #1\morris_home_2.tdms
Reading ..\..\..\Data\RawData\Additional Subjects\June 8 2025 EMF Insulin mimic #1\morris_home_2.tdms...
Adjusted 'Time' column by GMT+2 hours
Successfully processed morris_home_2.tdms
Successfully loaded data with 6750000 samples and 9 channels
Raw data is in Volts - converting to nanoTesla (nT)...
Time column 'Time' preserved during conversion
Converting voltage signals to nanoTesla using factor: 20 nT/V


Converting channels: 100%|██████████| 8/8 [00:00<00:00, 47.69it/s]


Voltage to nanoTesla conversion completed. Converted 8 channels.
Signal values are now in nanoTesla (nT) units.
Converted data shape: (6750000, 9)
Applying band-pass filter...
High-pass normalized frequency: 0.000020
Using filter order 4 for high-pass filter
Time column 'Time' preserved in filtered data
Applying high-pass (0.05Hz) + low-pass (10Hz) filters to channels...


Filtering channels: 100%|██████████| 8/8 [00:03<00:00,  2.15it/s]


Filtering completed successfully. Processed 8 channels.
Filtered data shape: (6750000, 9)
Downsampling filtered data to 25Hz...
Downsampling from 5000Hz to 25Hz (factor: 200)
Saturation removal enabled - will exclude saturated values from averaging
Processing 6750000 samples into 33750 downsampled points
Time column 'Time' downsampled
Saturation threshold: ±250 nT
Downsampling channels using averaging...


Downsampling channels:  38%|███▊      | 3/8 [00:03<00:05,  1.09s/it]

  Hand1: 37891 saturated samples (0.56%) in 199 windows


Downsampling channels:  50%|█████     | 4/8 [00:04<00:04,  1.14s/it]

  Hand2: 35550 saturated samples (0.53%) in 194 windows


Downsampling channels:  62%|██████▎   | 5/8 [00:05<00:03,  1.12s/it]

  Liver1: 38923 saturated samples (0.58%) in 232 windows


Downsampling channels:  75%|███████▌  | 6/8 [00:06<00:02,  1.16s/it]

  Liver2: 41324 saturated samples (0.61%) in 247 windows


Downsampling channels: 100%|██████████| 8/8 [00:09<00:00,  1.13s/it]



Saturation Summary:
  Total saturated samples: 153688
  Total windows with saturation: 872
  Overall saturation rate: 0.28%
Downsampling completed. New shape: (33750, 9)
Effective sampling rate: 25Hz
Downsampled data shape: (33750, 9)
Saving downsampled data to parquet file...
Saving downsampled data to: ..\..\..\Data\ProcessedData\Signal_Files\Additional_3_downsampled_25hz.parquet
Successfully saved downsampled data to Additional_3_downsampled_25hz.parquet
File size: 2.06 MB
Updated patients_data with signal_file: Additional_3_downsampled_25hz.parquet
Completed processing Additional #3

Saved updated patients.json to: ..\..\..\Data\RawData\patients.json
Processing completed for 3 patients
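The `if`/`elif` chain that builds the raw file path repeats the same expression for each subject type. A table-driven sketch using the existing `Subject` dictionary is below. `raw_tdms_path` is a hypothetical helper, the example entry mirrors the `Additional #1` path printed in the log above, and `Pilot #1` is a made-up name used to show the unmatched case:

```python
from pathlib import Path

# Subject-type folders as defined in the notebook's constants cell
Subject = {"Normal": "Normal Subjects", "Clamp": "T1DM Clamp Subjects", "Additional": "Additional Subjects"}


def raw_tdms_path(patient_name: str, entry: dict, base_dir: Path):
    """Return the raw TDMS path for a patient, or None if no subject type matches.

    Subject types are checked in the same order as the main loop: Clamp, Normal, Additional.
    """
    for key in ("Clamp", "Normal", "Additional"):
        if key in patient_name:
            return base_dir / "RawData" / Subject[key] / entry["sub_dir"] / entry["tdms_file"]
    return None


# Entry shaped like the patients.json records used above
entry = {"sub_dir": "Base", "tdms_file": "ML base_1.tdms"}
example = raw_tdms_path("Additional #1", entry, Path("../../../Data"))
unknown = raw_tdms_path("Pilot #1", entry, Path("../../../Data"))
print(example, unknown)
```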


#### One may start the analysis from here, by loading the downsampled data from parquet file.

#### Load the downsampled data from parquet file

In [7]:
# Load the downsampled data from parquet file
def load_downsampled_data(patient_name, signals_dir):
    """
    Load downsampled data from parquet file.

    Parameters:
    - patient_name: name of the patient
    - signals_dir: directory containing the parquet signal files

    Returns:
    - df_loaded: polars DataFrame with loaded data, or None if the file is
      missing or cannot be read
    """
    # Rebuild the filename used at save time: "Insulin Clamp #1" -> "Insulin_Clamp_1"
    patient_safe_name = patient_name.replace(" ", "_").replace("#", "")
    output_filename = f"{patient_safe_name}_downsampled_25hz.parquet"
    output_path = signals_dir / output_filename

    try:
        if output_path.exists():
            df_loaded = pl.read_parquet(output_path)
            print(f"Successfully loaded data from {output_filename}")
            print(f"Loaded data shape: {df_loaded.shape}")
            return df_loaded
        else:
            print(f"File not found: {output_path}")
            return None
    except Exception as e:
        print(f"Error loading parquet file: {e}")
        return None
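`load_downsampled_data` rebuilds the filename from the patient name, so it only finds files that were saved with the same rule. The log above, for example, shows `Additional #3` saved as `Additional_3_downsampled_25hz.parquet`. The sketch below factors that rule into one place so the save and load steps cannot drift apart. `downsampled_filename` is a hypothetical helper, and the directory is a placeholder.

```python
from pathlib import Path


def downsampled_filename(patient_name: str, rate_hz: int = 25) -> str:
    """Hypothetical helper: mirror the naming rule used by load_downsampled_data."""
    # Spaces become underscores and '#' is dropped: "Insulin Clamp #1" -> "Insulin_Clamp_1"
    safe_name = patient_name.replace(" ", "_").replace("#", "")
    return f"{safe_name}_downsampled_{rate_hz}hz.parquet"


signals_dir = Path("Signal_Files")  # placeholder directory for illustration only
print(signals_dir / downsampled_filename("Insulin Clamp #1"))
```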

#### Define plotting functions for time series visualization

In [8]:
def plot_timeseries_by_channel(df, signal_channels, patient_name, time_window_hours=None, y_label='Signal (nT)'):
    """
    Plot time series for each channel group.

    Parameters:
    - df: polars DataFrame with time series data
    - signal_channels: dictionary of channel groups
    - patient_name: name of the patient for plot title
    - time_window_hours: optional time window in hours to limit the plot
    - y_label: y-axis label (default 'Signal (nT)')
    """
    if df is None:
        print("No data available for plotting")
        return

    # Convert to pandas for easier plotting
    df_pandas = df.to_pandas()

    # Find the time column (case-insensitive)
    time_col = None
    for col in df_pandas.columns:
        if col.lower() == 'time':
            time_col = col
            break

    if time_col is None:
        print("No time column found for plotting")
        return

    # Limit time window if specified
    if time_window_hours is not None:
        start_time = df_pandas[time_col].min()
        end_time = start_time + pd.Timedelta(hours=time_window_hours)
        df_pandas = df_pandas[df_pandas[time_col] <= end_time]
        print(f"Plotting first {time_window_hours} hours of data")

    # Create subplots for each channel group
    n_groups = len(signal_channels)
    fig, axes = plt.subplots(n_groups, 1, figsize=(15, 4*n_groups))

    if n_groups == 1:
        axes = [axes]

    # Plot each channel group
    for i, (group_name, channels) in enumerate(signal_channels.items()):
        ax = axes[i]

        for channel in channels:
            if channel in df_pandas.columns:
                ax.plot(df_pandas[time_col], df_pandas[channel], label=channel, alpha=0.8)
            else:
                print(f"Warning: Channel '{channel}' not found in data")

        ax.set_title(f'{group_name} Channels - {patient_name}')
        ax.set_xlabel('Time')
        ax.set_ylabel(y_label)
        ax.legend()
        ax.grid(True, alpha=0.3)

        # Format x-axis for better readability
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
        ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
        plt.setp(ax.xaxis.get_majorticklabels(), rotation=45)

    plt.tight_layout()
    plt.show()

def plot_all_channels_overlay(df, signal_channels, patient_name, time_window_hours=None, y_label='Signal (nT)'):
    """
    Plot all channels overlaid on a single plot.

    Parameters:
    - df: polars DataFrame with time series data
    - signal_channels: dictionary of channel groups
    - patient_name: name of the patient for plot title
    - time_window_hours: optional time window in hours to limit the plot
    - y_label: y-axis label (default 'Signal (nT)')
    """
    if df is None:
        print("No data available for plotting")
        return

    # Convert to pandas for easier plotting
    df_pandas = df.to_pandas()

    # Find the time column (case-insensitive)
    time_col = None
    for col in df_pandas.columns:
        if col.lower() == 'time':
            time_col = col
            break

    if time_col is None:
        print("No time column found for plotting")
        return

    # Limit time window if specified
    if time_window_hours is not None:
        start_time = df_pandas[time_col].min()
        end_time = start_time + pd.Timedelta(hours=time_window_hours)
        df_pandas = df_pandas[df_pandas[time_col] <= end_time]
        print(f"Plotting first {time_window_hours} hours of data")

    # Create single plot with all channels
    plt.figure(figsize=(15, 8))

    # Define colors for each group
    colors = ['blue', 'red', 'green', 'orange', 'purple', 'brown']

    for i, (group_name, channels) in enumerate(signal_channels.items()):
        color = colors[i % len(colors)]

        for j, channel in enumerate(channels):
            if channel in df_pandas.columns:
                # Use different line styles for channels within the same group
                linestyle = '-' if j == 0 else '--'
                plt.plot(df_pandas[time_col], df_pandas[channel],
                        label=f'{group_name}: {channel}',
                        color=color, linestyle=linestyle, alpha=0.8)
            else:
                print(f"Warning: Channel '{channel}' not found in data")

    plt.title(f'All Channels - {patient_name}')
    plt.xlabel('Time')
    plt.ylabel(y_label)
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.grid(True, alpha=0.3)

    # Format x-axis for better readability
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
    plt.gca().xaxis.set_major_locator(mdates.HourLocator(interval=1))
    plt.setp(plt.gca().xaxis.get_majorticklabels(), rotation=45)

    plt.tight_layout()
    plt.show()
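The plotting functions in this section repeat the same two steps: find the time column case-insensitively, then keep only the first `time_window_hours`. The sketch below pulls those steps into shared helpers. It assumes a pandas DataFrame with a datetime column, and `find_time_column` and `limit_time_window` are hypothetical names that are not part of the notebook.

```python
import pandas as pd


def find_time_column(df: pd.DataFrame):
    """Return the first column whose name is 'time' (case-insensitive), else None."""
    for col in df.columns:
        if col.lower() == "time":
            return col
    return None


def limit_time_window(df: pd.DataFrame, time_col: str, hours=None) -> pd.DataFrame:
    """Keep rows within `hours` of the first timestamp (all rows if hours is None)."""
    if hours is None:
        return df
    end_time = df[time_col].min() + pd.Timedelta(hours=hours)
    return df[df[time_col] <= end_time]


# Toy frame sampled every 30 minutes
demo = pd.DataFrame({
    "Time": pd.date_range("2025-01-23 08:00", periods=5, freq="30min"),
    "Hand1": [0.1, 0.2, 0.3, 0.4, 0.5],
})
tcol = find_time_column(demo)
print(tcol, len(limit_time_window(demo, tcol, hours=1)))
```

Each plotting function could then call `find_time_column(df_pandas)` and `limit_time_window(df_pandas, time_col, time_window_hours)` instead of carrying its own copy of those blocks.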

#### Load and visualize the downsampled data

In [9]:
# Select the patient to analyze
patient = "Insulin Clamp #1"  # Example patient name for Insulin Clamp subjects
# patient = "Normal #1"  # Example patient name for Normal subjects

In [10]:
# Load the data
df_loaded = load_downsampled_data(patient, signals_dir)

if df_loaded is not None:
    print(f"Available columns: {df_loaded.columns}")

    # Check if time column exists and handle accordingly
    time_col = None
    for col in df_loaded.columns:
        if 'time' in col.lower():
            time_col = col
            break

    if time_col is not None:
        print(f"Data time range: {df_loaded[time_col].min()} to {df_loaded[time_col].max()}")
        print(f"Duration: {(df_loaded[time_col].max() - df_loaded[time_col].min()).total_seconds()/3600:.2f} hours")
    else:
        print("No time column found in the data")
        print("Data shape:", df_loaded.shape)
        print("First few rows:")
        print(df_loaded.head())

# Create visualizations if data is available
if df_loaded is not None and Visualize_Signal:
    print("Creating time series visualizations...")

    # Plot by channel groups (separate subplots)
    plot_timeseries_by_channel(df_loaded, signal_channels, patient)



Successfully loaded data from Insulin_Clamp_1_downsampled_25hz.parquet
Loaded data shape: (370600, 9)
Available columns: ['Time', 'Head_left', 'Head_right', 'Hand1', 'Hand2', 'Liver1', 'Liver2', 'Background1', 'Background2']
Data time range: 2025-01-23 08:36:51.884062 to 2025-01-23 12:43:55.844062
Duration: 4.12 hours
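The printed duration can be cross-checked against the row count. On a gap-free 25 Hz grid, N rows span (N - 1) / 25 seconds between the first and last timestamps. The standard-library sketch below runs that check using the values printed above. The gap-free grid is the assumption being tested.

```python
from datetime import datetime

FS_HZ = 25  # downsampled rate reported by the pipeline
n_rows = 370600  # from "Loaded data shape: (370600, 9)"

t_start = datetime.fromisoformat("2025-01-23 08:36:51.884062")
t_end = datetime.fromisoformat("2025-01-23 12:43:55.844062")

span_s = (t_end - t_start).total_seconds()
expected_span_s = (n_rows - 1) / FS_HZ  # N samples -> N - 1 sampling intervals

print(f"timestamp span: {span_s:.2f} s ({span_s / 3600:.2f} h)")
print(f"expected from row count: {expected_span_s:.2f} s")
```

For this recording both numbers come out to 14,823.96 s (about 4.12 h), which is consistent with no missing rows after downsampling.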


#### Plot correlation between sub-channels within each group (e.g., Head_left vs Head_right)

In [11]:
def plot_channel_correlation(df, signal_channels, patient_name, time_window_hours=None):
    """
    Plot correlation between sub-channels within each group.

    Parameters:
    - df: polars DataFrame with time series data
    - signal_channels: dictionary of channel groups
    - patient_name: name of the patient for plot title
    - time_window_hours: optional time window in hours to limit the plot
    """
    if df is None:
        print("No data available for plotting")
        return

    # Convert to pandas for easier plotting
    df_pandas = df.to_pandas()

    # Find the time column (case-insensitive)
    time_col = None
    for col in df_pandas.columns:
        if col.lower() == 'time':
            time_col = col
            break

    if time_col is None:
        print("No time column found for plotting")
        return

    # Limit time window if specified
    if time_window_hours is not None:
        start_time = df_pandas[time_col].min()
        end_time = start_time + pd.Timedelta(hours=time_window_hours)
        df_pandas = df_pandas[df_pandas[time_col] <= end_time]
        print(f"Plotting first {time_window_hours} hours of data")

    # Create subplots for each channel group that has multiple channels
    groups_with_pairs = {name: channels for name, channels in signal_channels.items() if len(channels) >= 2}

    if not groups_with_pairs:
        print("No channel groups with multiple channels found for correlation plotting")
        return

    n_groups = len(groups_with_pairs)
    fig, axes = plt.subplots(n_groups, 2, figsize=(15, 4*n_groups))

    if n_groups == 1:
        axes = axes.reshape(1, -1)

    for i, (group_name, channels) in enumerate(groups_with_pairs.items()):
        # Take first two channels for correlation
        ch1, ch2 = channels[0], channels[1]

        if ch1 in df_pandas.columns and ch2 in df_pandas.columns:
            # Get data and clean it
            x_data = df_pandas[ch1].values
            y_data = df_pandas[ch2].values

            # Remove NaN and infinite values
            valid_mask = np.isfinite(x_data) & np.isfinite(y_data)
            x_clean = x_data[valid_mask]
            y_clean = y_data[valid_mask]

            if len(x_clean) == 0:
                print(f"Warning: No valid data points for {group_name} channels after cleaning")
                continue

            # Time series correlation plot
            ax1 = axes[i, 0]
            ax1.plot(df_pandas[time_col], df_pandas[ch1], label=ch1, alpha=0.8, color='blue')
            ax1.plot(df_pandas[time_col], df_pandas[ch2], label=ch2, alpha=0.8, color='red')
            ax1.set_title(f'{group_name} Channels - {patient_name}')
            ax1.set_xlabel('Time')
            ax1.set_ylabel('Signal (nT)')
            ax1.legend()
            ax1.grid(True, alpha=0.3)
            ax1.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
            ax1.xaxis.set_major_locator(mdates.HourLocator(interval=1))
            plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45)

            # Scatter plot for correlation
            ax2 = axes[i, 1]
            ax2.scatter(x_clean, y_clean, alpha=0.5, s=1)
            ax2.set_xlabel(f'{ch1} (nT)')
            ax2.set_ylabel(f'{ch2} (nT)')

            # Calculate correlation coefficient (handle case with insufficient data)
            if len(x_clean) > 1 and np.std(x_clean) > 0 and np.std(y_clean) > 0:
                correlation = np.corrcoef(x_clean, y_clean)[0, 1]
                ax2.set_title(f'{group_name} Correlation: r = {correlation:.3f}')

                # Add trend line with error handling
                try:
                    if np.std(x_clean) > 1e-10:  # Check for sufficient variance
                        z = np.polyfit(x_clean, y_clean, 1)
                        p = np.poly1d(z)
                        x_trend = np.linspace(np.min(x_clean), np.max(x_clean), 100)
                        ax2.plot(x_trend, p(x_trend), "r--", alpha=0.8, linewidth=2)
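                except (np.linalg.LinAlgError, ValueError):
                    # np.RankWarning is emitted as a warning, not raised, and NumPy 2.0 moved it
                    # to np.exceptions.RankWarning, so it is not listed here
                    print(f"Warning: Could not fit trend line for {group_name} channels")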
            else:
                ax2.set_title(f'{group_name} - Insufficient data for correlation')

            ax2.grid(True, alpha=0.3)

        else:
            print(f"Warning: Channels '{ch1}' or '{ch2}' not found in data")

    plt.tight_layout()
    plt.show()

def plot_channels_detailed_correlation(df, channel1, channel2, patient_name, time_window_hours=None):
    """
    Detailed correlation analysis between any two channels.

    Parameters:
    - df: polars DataFrame with time series data
    - channel1: name of first channel
    - channel2: name of second channel
    - patient_name: name of the patient for plot title
    - time_window_hours: optional time window in hours to limit the plot
    """
    if df is None:
        print("No data available for plotting")
        return

    # Convert to pandas for easier plotting
    df_pandas = df.to_pandas()

    # Find the time column (case-insensitive)
    time_col = None
    for col in df_pandas.columns:
        if col.lower() == 'time':
            time_col = col
            break

    if time_col is None:
        print("No time column found for plotting")
        return

    # Check if both channels exist
    if channel1 not in df_pandas.columns:
        print(f"Channel '{channel1}' not found in data. Available channels: {list(df_pandas.columns)}")
        return

    if channel2 not in df_pandas.columns:
        print(f"Channel '{channel2}' not found in data. Available channels: {list(df_pandas.columns)}")
        return

    # Limit time window if specified
    if time_window_hours is not None:
        start_time = df_pandas[time_col].min()
        end_time = start_time + pd.Timedelta(hours=time_window_hours)
        df_pandas = df_pandas[df_pandas[time_col] <= end_time]
        print(f"Plotting first {time_window_hours} hours of data")

    # Clean the data
    x_data = df_pandas[channel1].values
    y_data = df_pandas[channel2].values

    # Remove NaN and infinite values
    valid_mask = np.isfinite(x_data) & np.isfinite(y_data)

    if np.sum(valid_mask) == 0:
        print(f"Error: No valid data points found for {channel1} and {channel2}")
        return

    print(f"Using {np.sum(valid_mask)} valid data points out of {len(x_data)} total points")

    # Create figure with subplots
    fig = plt.figure(figsize=(20, 12))

    # 1. Time series overlay
    ax1 = plt.subplot(2, 3, 1)
    ax1.plot(df_pandas[time_col], df_pandas[channel1], label=channel1, alpha=0.8, color='blue')
    ax1.plot(df_pandas[time_col], df_pandas[channel2], label=channel2, alpha=0.8, color='red')
    ax1.set_title(f'{channel1} vs {channel2} Time Series - {patient_name}')
    ax1.set_xlabel('Time')
    ax1.set_ylabel('Signal (nT)')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
    plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45)

    # 2. Scatter plot with correlation
    ax2 = plt.subplot(2, 3, 2)
    x_clean = x_data[valid_mask]
    y_clean = y_data[valid_mask]

    ax2.scatter(x_clean, y_clean, alpha=0.5, s=1)
    ax2.set_xlabel(f'{channel1} (nT)')
    ax2.set_ylabel(f'{channel2} (nT)')

    # Calculate correlation with error handling
    correlation = np.nan
    if len(x_clean) > 1 and np.std(x_clean) > 0 and np.std(y_clean) > 0:
        correlation = np.corrcoef(x_clean, y_clean)[0, 1]

        # Add trend line with error handling
        try:
            if np.std(x_clean) > 1e-10:
                z = np.polyfit(x_clean, y_clean, 1)
                p = np.poly1d(z)
                x_trend = np.linspace(np.min(x_clean), np.max(x_clean), 100)
                ax2.plot(x_trend, p(x_trend), "r--", alpha=0.8, linewidth=2)
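        except (np.linalg.LinAlgError, ValueError):
            # np.RankWarning is emitted as a warning, not raised, and NumPy 2.0 moved it
            # to np.exceptions.RankWarning, so it is not listed here
            print(f"Warning: Could not fit trend line for {channel1} vs {channel2}")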

    ax2.set_title(f'{channel1} vs {channel2} Correlation: r = {correlation:.3f}')
    ax2.grid(True, alpha=0.3)

    # 3. Difference signal
    ax3 = plt.subplot(2, 3, 3)
    diff_signal = df_pandas[channel1] - df_pandas[channel2]
    valid_diff = diff_signal.dropna()

    ax3.plot(df_pandas[time_col], diff_signal, color='green', alpha=0.8)
    ax3.set_title(f'Difference Signal ({channel1} - {channel2})')
    ax3.set_xlabel('Time')
    ax3.set_ylabel('Difference (nT)')
    ax3.grid(True, alpha=0.3)
    ax3.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
    plt.setp(ax3.xaxis.get_majorticklabels(), rotation=45)

    # 4. Rolling correlation
    ax4 = plt.subplot(2, 3, 4)
    window_size = max(10, len(df_pandas) // 50)  # Ensure minimum window size
    try:
        rolling_corr = df_pandas[channel1].rolling(window=window_size).corr(df_pandas[channel2])
        ax4.plot(df_pandas[time_col], rolling_corr, color='purple', alpha=0.8)
        ax4.set_title(f'Rolling Correlation (window={window_size})')
    except Exception as e:
        ax4.text(0.5, 0.5, f'Rolling correlation failed:\n{str(e)}',
                transform=ax4.transAxes, ha='center', va='center')
        ax4.set_title('Rolling Correlation - Failed')

    ax4.set_xlabel('Time')
    ax4.set_ylabel('Correlation')
    ax4.grid(True, alpha=0.3)
    ax4.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
    plt.setp(ax4.xaxis.get_majorticklabels(), rotation=45)

    # 5. Histogram of differences
    ax5 = plt.subplot(2, 3, 5)
    if len(valid_diff) > 0:
        ax5.hist(valid_diff, bins=50, alpha=0.7, color='green', edgecolor='black')
        mean_diff = np.mean(valid_diff)
        std_diff = np.std(valid_diff)

        ax5.axvline(mean_diff, color='red', linestyle='--', label=f'Mean: {mean_diff:.3f}')
        ax5.axvline(mean_diff + std_diff, color='orange', linestyle='--', alpha=0.7, label=f'+1σ: {mean_diff + std_diff:.3f}')
        ax5.axvline(mean_diff - std_diff, color='orange', linestyle='--', alpha=0.7, label=f'-1σ: {mean_diff - std_diff:.3f}')
        ax5.legend()
    else:
        ax5.text(0.5, 0.5, 'No valid difference data', transform=ax5.transAxes, ha='center', va='center')

    ax5.set_title('Distribution of Differences')
    ax5.set_xlabel('Difference (nT)')
    ax5.set_ylabel('Frequency')
    ax5.grid(True, alpha=0.3)

    # 6. Cross-correlation
    ax6 = plt.subplot(2, 3, 6)
    # Decimate before cross-correlating to keep the direct (O(N^2)) np.correlate fast;
    # lags are therefore in decimated samples (1 lag = downsample_factor original samples)
    downsample_factor = max(1, len(x_clean) // 1000)
    max_corr_lag = None
    max_corr_val = None
    try:
        x1 = x_clean[::downsample_factor]
        x2 = y_clean[::downsample_factor]

        if len(x1) > 10 and np.std(x1) > 0 and np.std(x2) > 0:
            # Normalized so that the zero-lag value equals the Pearson correlation;
            # a positive lag means channel1 is delayed relative to channel2
            cross_corr = np.correlate(x1 - np.mean(x1), x2 - np.mean(x2), mode='full')
            cross_corr = cross_corr / (np.std(x1) * np.std(x2) * len(x1))

            lags = np.arange(-len(x1) + 1, len(x1))
            ax6.plot(lags, cross_corr, color='brown', alpha=0.8)
            ax6.axhline(y=0, color='black', linestyle='-', alpha=0.5)
            ax6.axvline(x=0, color='red', linestyle='--', alpha=0.5)

            max_corr_idx = np.argmax(np.abs(cross_corr))
            max_corr_lag = lags[max_corr_idx]
            max_corr_val = cross_corr[max_corr_idx]
        else:
            ax6.text(0.5, 0.5, 'Insufficient data for cross-correlation',
                    transform=ax6.transAxes, ha='center', va='center')

    except Exception as e:
        ax6.text(0.5, 0.5, f'Cross-correlation failed:\n{str(e)}',
                transform=ax6.transAxes, ha='center', va='center')

    ax6.set_title('Cross-Correlation')
    ax6.set_xlabel(f'Lag (samples, decimated by {downsample_factor})')
    ax6.set_ylabel('Cross-Correlation')
    ax6.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    # Print summary statistics
    print("\n" + "="*60)
    print(f"CHANNEL CORRELATION ANALYSIS - {patient_name}")
    print("="*60)
    print(f"Channels analyzed: {channel1} vs {channel2}")
    print(f"Valid data points: {np.sum(valid_mask)}/{len(x_data)} ({100*np.sum(valid_mask)/len(x_data):.1f}%)")
    print(f"Correlation coefficient: {correlation:.4f}")
    if len(valid_diff) > 0:
        print(f"Mean difference: {np.mean(valid_diff):.4f} nT")
        print(f"Std deviation of difference: {np.std(valid_diff):.4f} nT")
    if max_corr_val is not None:
        print(f"Max cross-correlation: {max_corr_val:.4f} at lag {max_corr_lag} (decimated samples)")
    else:
        print("Cross-correlation analysis not available")
    print("="*60)
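Panel 6 divides `np.correlate(..., mode='full')` by `std(x1) * std(x2) * len(x1)`. This scaling makes the zero-lag value equal to the Pearson coefficient of the (decimated) series. NumPy defines the full-mode output as `c[k] = sum_n x1[n + k] * x2[n]`, so a peak at a positive lag k means the first channel is delayed by k samples relative to the second. The sketch below checks both properties on a synthetic pulse. `normalized_xcorr` is a hypothetical helper that repeats the panel's arithmetic.

```python
import numpy as np


def normalized_xcorr(x1, x2):
    """Mean-removed, full-mode cross-correlation scaled as in panel 6."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    c = np.correlate(x1 - x1.mean(), x2 - x2.mean(), mode="full")
    c = c / (np.std(x1) * np.std(x2) * len(x1))
    lags = np.arange(-len(x1) + 1, len(x1))  # valid because len(x1) == len(x2)
    return lags, c


# x1 is x2 delayed by 2 samples
x2 = np.array([0, 0, 1, 0, 0, 0, 0, 0], dtype=float)
x1 = np.array([0, 0, 0, 0, 1, 0, 0, 0], dtype=float)

lags, c = normalized_xcorr(x1, x2)
peak_lag = lags[np.argmax(np.abs(c))]
zero_lag = c[lags == 0][0]
print(peak_lag, zero_lag, np.corrcoef(x1, x2)[0, 1])
```

In the notebook the inputs are first decimated by `downsample_factor`, so a lag of k corresponds to roughly k × downsample_factor / 25 seconds at the 25 Hz rate. The conversion is only approximate if the finite-value mask removed samples.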

#### Run the analysis and visualizations for channel correlations

In [12]:
if df_loaded is None:
    print("No data loaded. Please check the previous steps.")
elif Visualize_Signal:
    print("Data loaded successfully. Proceeding with visualizations...")

    # Plot correlation between sub-channels
    plot_channel_correlation(df_loaded, signal_channels, patient)

    # Detailed correlation analysis for specific channels
    # Head channels
    plot_channels_detailed_correlation(df_loaded, "Head_left", "Head_right", patient)

    # Hand channels
    plot_channels_detailed_correlation(df_loaded, "Hand1", "Hand2", patient)

    # Liver channels
    plot_channels_detailed_correlation(df_loaded, "Liver1", "Liver2", patient)
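The rolling-correlation window in `plot_channels_detailed_correlation` is `max(10, len(df) // 50)` samples, so its duration scales with recording length. For the 370,600-row Insulin Clamp #1 file it is 7,412 samples (about 296 s at 25 Hz). For a 33,750-row file such as Additional #3 it would be 675 samples (27 s). If a fixed physical window is preferred, one option is to derive the sample count from seconds. This is our suggestion rather than the notebook's method, and `window_from_seconds` is hypothetical.

```python
import numpy as np
import pandas as pd


def window_from_seconds(seconds: float, fs_hz: float = 25.0, minimum: int = 10) -> int:
    """Hypothetical helper: window length in samples for a duration in seconds."""
    return max(minimum, int(round(seconds * fs_hz)))


# The notebook's default rule applied to the 370,600-row recording
n_rows = 370600
default_window = max(10, n_rows // 50)
print(f"default window: {default_window} samples = {default_window / 25.0:.1f} s at 25 Hz")

# A fixed 2-second window on toy data where ch2 is an exact linear function of ch1
window = window_from_seconds(2.0)  # 50 samples at 25 Hz
ch1 = pd.Series(np.sin(np.linspace(0, 20, 500)))
ch2 = 3.0 * ch1 + 1.0
rolling_corr = ch1.rolling(window=window).corr(ch2)
print(f"{rolling_corr.isna().sum()} leading NaNs, min r = {rolling_corr.min():.6f}")
```

With a seconds-based window, recordings of different lengths are compared over the same physical time scale instead of over a fixed fraction of each recording.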

In [13]:
gc.collect()

64