### Overview - Graphing brain connectivity in schizophrenia from EEG data

EEG analysis was carried out on three versions of the data:
1. the raw EEG data,
2. the data re-referenced with the Average Reference Method, and
3. the data re-referenced with the Zero Reference Method.

This allowed us to explore how the choice of reference electrode impacts connectivity outcomes.
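
For orientation, average re-referencing subtracts the mean across channels from every channel at each time sample. The following is a minimal NumPy sketch on synthetic data. The `(n_channels, n_times)` array layout and the `average_reference` helper name are assumptions made for illustration, not the pipeline that produced the Silver tables, and the Zero Reference Method is not shown.

```python
import numpy as np

def average_reference(data):
    """Re-reference EEG data of shape (n_channels, n_times) to the channel average."""
    data = np.asarray(data, dtype=float)
    # Subtract, at every time sample, the mean taken across all channels
    return data - data.mean(axis=0, keepdims=True)

# Synthetic example: 4 hypothetical channels sharing a constant offset
rng = np.random.default_rng(0)
demo = rng.standard_normal((4, 1000)) + 5.0
demo_avg_ref = average_reference(demo)
```

After re-referencing, the channels sum to zero at every sample, which is why the choice of reference can change the apparent coupling between electrodes.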

EEG data were analyzed using three connectivity methods, Phase-Locking Value (PLV), Phase-Lag Index (PLI), and Directed Transfer Function (DTF), together with statistical indices based on graph theory.
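
To make the two phase-based measures concrete, here is a small sketch that computes PLV and PLI for a pair of signals from their Hilbert-transform phases. It uses the commonly cited definitions (PLV as the length of the mean unit phasor of the phase difference, PLI as the absolute mean sign of its sine). The function names and the synthetic 10 Hz signals are assumptions for illustration and may differ from the implementation used later in this analysis.

```python
import numpy as np
from scipy.signal import hilbert

def phase_difference(x, y):
    """Instantaneous phase difference between two 1-D signals."""
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    return phase_x - phase_y

def plv(x, y):
    """Phase-Locking Value: length of the mean unit phasor of the phase difference."""
    return np.abs(np.mean(np.exp(1j * phase_difference(x, y))))

def pli(x, y):
    """Phase-Lag Index: absolute mean sign of the sine of the phase difference."""
    return np.abs(np.mean(np.sign(np.sin(phase_difference(x, y)))))

# Synthetic example: two 10 Hz sinusoids with a constant quarter-pi lag
sfreq = 250  # Hz, hypothetical sampling rate for the demo
t = np.arange(0, 4, 1 / sfreq)
sig_a = np.sin(2 * np.pi * 10 * t)
sig_b = np.sin(2 * np.pi * 10 * t - np.pi / 4)

plv_ab = plv(sig_a, sig_b)
pli_ab = pli(sig_a, sig_b)
pli_aa = pli(sig_a, sig_a)  # zero-lag coupling, which PLI is designed to ignore
```

On real EEG the signals would normally be band-pass filtered to the frequency band of interest before the phases are extracted.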

##### In this notebook we will:
  * Perform graph analysis of EEG data, measuring connectivity with three measures:
    * Directed Transfer Function (DTF)
    * Phase-Locking Value (PLV)
    * Phase-Lag Index (PLI)
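
Once a connectivity matrix is available, graph indices can be derived from it. The text does not say which indices were used, so node degree and network density are chosen here purely as assumed examples, and the 3×3 matrix and the 0.5 threshold are made-up values.

```python
import numpy as np

def binarize(conn, threshold):
    """Turn a symmetric connectivity matrix into a binary adjacency matrix."""
    adj = (np.asarray(conn) > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # ignore self-connections
    return adj

def node_degree(adj):
    """Number of edges attached to each node."""
    return adj.sum(axis=1)

def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = adj.shape[0]
    # adj.sum() counts each undirected edge twice, matching n * (n - 1)
    return adj.sum() / (n * (n - 1))

# Hypothetical connectivity values between three channels
conn = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
])
adj = binarize(conn, threshold=0.5)
degrees = node_degree(adj)
graph_density = density(adj)
```

The threshold strongly influences the resulting graph, so in practice it is worth checking how stable the indices are across several thresholds.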

In [0]:
# Create a temporary DataFrame for data cleaning purposes

# Filter the DataFrame where `subject` == `c` (control group) and where `subject` == `s` (study group) 

# Zero Reference Method Table
df_zero_ref_C = spark.sql("""SELECT * FROM main.solution_accelerator.eeg_zero_ref_data_silver WHERE subject = 'c' """)
df_zero_ref_S = spark.sql("""SELECT * FROM main.solution_accelerator.eeg_zero_ref_data_silver WHERE subject = 's' """)

# Average Reference Method Table
df_avg_ref_C = spark.sql("""SELECT * FROM main.solution_accelerator.eeg_avg_ref_data_silver WHERE subject = 'c' """)
df_avg_ref_S = spark.sql("""SELECT * FROM main.solution_accelerator.eeg_avg_ref_data_silver WHERE subject = 's' """)

# Show the DataFrame
display(df_zero_ref_C)
display(df_zero_ref_S)

display(df_avg_ref_C)
display(df_avg_ref_S)

##### We need to convert the PySpark DataFrame to a pandas DataFrame so we can use it with the scipy, mne and numpy packages

In [0]:
# Convert our PySpark Dataframe to a Pandas Dataframe
df_zero_ref_C_pd = df_zero_ref_C.toPandas()
df_zero_ref_S_pd = df_zero_ref_S.toPandas()

df_avg_ref_C_pd = df_avg_ref_C.toPandas()
df_avg_ref_S_pd = df_avg_ref_S.toPandas()

##### Graph analysis of EEG data measuring connectivity using three connectivity measures
###### The prepared DataFrames have been grouped by `subject` type

In [0]:
%pip install mne

##### Directed Transfer Function (DTF)
Directed Transfer Function (DTF) is a frequency-domain measure derived from multivariate autoregressive (MVAR) modeling of EEG signals. It estimates the directed influence that one channel, and the brain region beneath it, exerts on another at each frequency.
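
The definition above can be sketched in plain NumPy: fit an MVAR model by least squares, form the transfer matrix H(f) as the inverse of I − Σₖ Aₖ·exp(−2πi·f·k/fs), and normalize |H_ij(f)|² by the sum over each row. This is an illustrative sketch under stated assumptions, not the implementation used in this notebook. The helper names `fit_mvar` and `dtf`, the model order, and the simulated two-channel system (channel 0 driving channel 1) are all hypothetical.

```python
import numpy as np

def fit_mvar(data, order):
    """Least-squares MVAR fit for data of shape (n_channels, n_times).

    Returns A with shape (order, n_channels, n_channels), where A[k, i, j]
    is the weight of channel j at lag k + 1 on channel i.
    """
    n_ch, n_times = data.shape
    targets = data[:, order:].T  # x(t) for t = order .. n_times - 1
    # Lagged predictors [x(t-1), ..., x(t-order)] stacked column-wise
    lagged = np.hstack(
        [data[:, order - k:n_times - k].T for k in range(1, order + 1)]
    )
    coefs, *_ = np.linalg.lstsq(lagged, targets, rcond=None)
    return coefs.reshape(order, n_ch, n_ch).transpose(0, 2, 1)

def dtf(A, freqs, sfreq):
    """Normalized squared DTF; out[f, i, j] is the influence of channel j on i."""
    order, n_ch, _ = A.shape
    out = np.empty((len(freqs), n_ch, n_ch))
    for n, f in enumerate(freqs):
        Af = np.eye(n_ch, dtype=complex)
        for k in range(order):
            Af -= A[k] * np.exp(-2j * np.pi * f * (k + 1) / sfreq)
        H = np.linalg.inv(Af)  # transfer matrix at frequency f
        power = np.abs(H) ** 2
        out[n] = power / power.sum(axis=1, keepdims=True)  # each row sums to 1
    return out

# Simulated system: channel 0 drives channel 1, not the other way round
rng = np.random.default_rng(0)
n_times = 5000
noise = rng.standard_normal((2, n_times))
x = np.zeros((2, n_times))
for t in range(1, n_times):
    x[0, t] = 0.5 * x[0, t - 1] + noise[0, t]
    x[1, t] = 0.7 * x[0, t - 1] + 0.2 * x[1, t - 1] + noise[1, t]

A = fit_mvar(x, order=2)
freqs = np.arange(1, 51)  # 1-50 Hz
dtf_values = dtf(A, freqs, sfreq=250)
```

In this simulation `dtf_values[:, 1, 0]` (channel 0 to channel 1) should be clearly larger than `dtf_values[:, 0, 1]`, reflecting the one-way coupling. On real recordings the model order needs to be chosen with care, for example with an information criterion.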

In [0]:
import pandas as pd
import mne
import numpy as np
from pyspark.sql.functions import col

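# 1. Load the data from the Silver tables

# Work on the pandas copy of the Zero Reference control-group table:
# the Spark DataFrame does not support pandas-style indexing or `.values`.
df_eeg = df_zero_ref_C_pd

# The Zero Reference table provides a time stamp for every sample

# Get the times out
times = df_eeg['data_time'].values

# Drop columns that are not electrodes
columns_to_drop = ['patient_id', 'data_time', 'subject']
df_channels = df_eeg.drop(columns=columns_to_drop)

# Get channel names as a list
channel_names = list(df_channels.columns)

# Transpose to get shape (n_channels, n_times).
# Note: if the table holds several patients, their rows are stacked together
# here; filter on `patient_id` first to analyze a single recording.
# MNE expects EEG values in volts, so rescale if the table stores microvolts.
eeg_data = df_channels.values.T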

# 2. Create an MNE Raw Object

# Define the sampling frequency (in Hz)
sampling_freq = 250 

# Create an MNE Info object
info = mne.create_info(ch_names=channel_names, sfreq=sampling_freq, ch_types='eeg')

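# Create the MNE RawArray object.
# RawArray builds its own time axis from `sfreq`; its third positional
# argument is `first_samp`, so the `times` array must not be passed here.
raw = mne.io.RawArray(eeg_data, info)

# MNE assumes evenly spaced samples; if `times` is not equidistant,
# resample the data to a regular grid before building the Raw object.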

# 3. Graph and Analyze the Data with MNE

# Plot the data
raw.plot()

# Apply a band-pass filter
raw.filter(1., 50.)

# Create epochs 
events = np.array([[100, 0, 1], [300, 0, 2], [500, 0, 1]])  # Replace with your actual events
event_id = {'Event1': 1, 'Event2': 2}
epochs = mne.Epochs(raw, events, event_id, tmin=-0.2, tmax=0.5, baseline=(None, 0))

# Compute and plot evoked response
evoked = epochs['Event1'].average()
evoked.plot()