### Overview - Data Inspection, Create Reference Channel and Create Silver Tables 

Every EEG channel records a potential difference, so each recording is made relative to some reference. Choosing that reference well is essential for accurate and meaningful EEG analysis: it gives the electrical potentials a common baseline, cancels common-mode noise, makes signals comparable across channels and recordings, and enables many analytical techniques. The reference scheme should be chosen based on the experimental requirements and the planned analysis.

##### In this notebook we will:
  * Contrast the different techniques used in EEG data analysis to re-reference the data.
    * Reference Electrode Standardization Technique (REST) Method
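      - REST re-references EEG data to a theoretical point at infinity. It uses a head (forward) model to estimate equivalent sources from the recorded potentials and then reconstructs the potentials relative to that neutral reference, so recordings made with different physical references become comparable.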
    * Average Reference Method
      - In this method, you calculate the average signal across all electrodes and subtract this average from each electrode. This method assumes that the average potential of all electrodes represents a good approximation of zero potential.
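    * Current Source Density (CSD)
      - CSD applies a spatial Laplacian to the scalp potentials, yielding a reference-independent estimate of the radial current flow at each electrode.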
  * Examine statistical metrics utilizing Databricks' integrated commands `describe` and `summary` for potential future data manipulation.
    * `describe`: Provides statistics including count, mean, standard deviation, minimum, and maximum.
    * `summary`: The `describe` statistics plus the 25th, 50th, and 75th percentiles, from which the interquartile range (IQR) can be derived.
    * Look for missing or null values, which can produce outliers or skew the data. Zero values are expected in EEG data, so they do not need wrangling.
  * Create Silver Layer Table

##### In this notebook we will apply the REST reference method and create the corresponding tables

###### Retrieve data from Bronze Tables

In [0]:
from pyspark.sql.functions import col

df_bronze_control = spark.sql("""SELECT * FROM main.solution_accelerator.eeg_data_bronze_control WHERE patient_id in ('s11', 'h13') ORDER BY index_id ASC""")
df_bronze_study = spark.sql("""SELECT * FROM main.solution_accelerator.eeg_data_bronze_study WHERE patient_id in ('s11', 'h13') ORDER BY index_id ASC""")

# Inspect the DataFrames
# display(df_bronze_control)
# display(df_bronze_study)

df_bronze_control.orderBy(col("patient_id"), col("index_id").asc()).show()
df_bronze_study.orderBy(col("patient_id"), col("index_id").asc()).show()

# Union the PySpark DataFrames
df_bronze_patients = df_bronze_control.union(df_bronze_study)

###### Verify number of rows in DataFrame equals the original raw data

In [0]:
# Count rows per patient and compare the counts against the original raw data
df_bronze_patients.groupBy('patient_id').count().show()
# display(df_bronze_patients)

###### Convert PySpark Dataframe to Pandas Dataframe

In [0]:
# Convert the combined PySpark DataFrame to a pandas DataFrame, sorted by sample index
# display(df_bronze_patients.head())

df_patients = df_bronze_patients.toPandas().sort_values(by=['index_id'], ascending=True)

# display(df_patients)
# display(df_patients.groupby('patient_id').count())


##### Contrast the different techniques used in EEG data analysis to re-reference the data.

Each patient has distinct noise and artifact frequencies that are not part of the usable data and must be identified and filtered out.

Compare the re-referenced signals produced by several methods to determine the preferred EEG reference for our specific data and use case.

Both the REST and Average Reference (AR) methods are used in EEG (Electroencephalography) to mitigate common noise sources and spatial biases. However, they differ in their approach to reference point selection and signal processing. 
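
To make the Average Reference side of this comparison concrete, here is a minimal NumPy sketch. The function name `average_reference` and the toy array are illustrative assumptions and are not part of this notebook's pipeline; in MNE the corresponding step is `raw.set_eeg_reference("average")`.

```python
import numpy as np


def average_reference(data: np.ndarray) -> np.ndarray:
    """Subtract the instantaneous mean across channels from every channel.

    `data` is assumed to have shape (n_channels, n_samples).
    """
    return data - data.mean(axis=0, keepdims=True)


# Hypothetical toy recording: 3 channels, 4 samples (arbitrary units)
toy = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 6.0, 9.0, 12.0],
])

toy_ar = average_reference(toy)
print(toy_ar)
```

After average referencing, the channels sum to zero at every sample, which is the "average potential approximates zero" assumption expressed directly in the data.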


In [0]:
# MNE-Python: library for processing, analyzing, and visualizing EEG/MEG data
%pip install mne

###### Convert the pandas DataFrame to MNE Raw objects, one per patient

In [0]:
import mne

# Sampling rate in Hz
sfreq = 250

# Channel names: every column except the patient_id and index_id columns
ch_names = [c for c in df_patients.columns if c not in ('patient_id', 'index_id')]
# print(f"ch_names:::{ch_names}")

# Unique patient IDs
pt_names = list(df_patients['patient_id'].unique())
# print(f"patient_names:::{pt_names}")

mne_raw_all = {}

for pt in pt_names:
    print("PATIENT_NAME::", pt)
    # Create an info structure needed by MNE
    info = mne.create_info(ch_names=ch_names, sfreq=sfreq, ch_types='eeg')
    df_pt_data = df_patients.loc[df_patients['patient_id'] == pt]
    df_pt_data = df_pt_data.drop(columns=['patient_id', 'index_id'])
    # print("LEN::", len(df_pt_data.index))
    # Convert Pandas Dataframe to Numpy Array for each patient
    np_pt_data = df_pt_data.to_numpy() 
    # Create the MNE Raw object
    mne_raw_pt = mne.io.RawArray(np_pt_data.T, info)
    # The MNE Raw object also provides the time axis; access it as `data, times = raw[:]`
    # Map the channels to standard 10-20 electrode positions
    mne_raw_pt_w_montage = mne_raw_pt.set_montage('standard_1020')
    mne_raw_all[pt] = mne_raw_pt_w_montage
    
    # Plot the data so we can compare graphs to reference methods later
    mne_raw_pt_w_montage.plot(scalings=dict(eeg=50), start=150, duration=100)
    mne_raw_pt_w_montage.plot_sensors(ch_type="eeg")
    spectrum = mne_raw_pt_w_montage.compute_psd().plot(average=True, picks="data", exclude="bads", amplitude=False)
    
# Now we have our MNE Raw objects and are ready for further analysis


###### REST Method
Reference Electrode Standardization Technique infinity reference
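
The idea behind REST can be sketched in a few lines of NumPy: estimate equivalent sources from average-referenced data with a lead field, then project those sources back using the lead field relative to infinity. This is a simplified illustration under stated assumptions, not MNE's implementation. The function name `rest_rereference`, the random lead field, and the toy dimensions are all hypothetical.

```python
import numpy as np


def rest_rereference(v_avg: np.ndarray, leadfield: np.ndarray) -> np.ndarray:
    """Approximate infinity-referenced potentials from average-referenced data.

    v_avg:     (n_channels, n_samples) average-referenced potentials
    leadfield: (n_channels, n_sources) lead field relative to infinity
    """
    # Lead field as it appears under an average reference
    leadfield_avg = leadfield - leadfield.mean(axis=0, keepdims=True)
    # Estimate equivalent source activity with the pseudo-inverse, then
    # project it through the infinity-referenced lead field
    sources = np.linalg.pinv(leadfield_avg) @ v_avg
    return leadfield @ sources


rng = np.random.default_rng(0)
n_channels, n_sources, n_samples = 8, 3, 5

leadfield = rng.normal(size=(n_channels, n_sources))  # hypothetical lead field
true_sources = rng.normal(size=(n_sources, n_samples))
v_inf = leadfield @ true_sources                      # toy "ground truth" potentials
v_avg = v_inf - v_inf.mean(axis=0, keepdims=True)     # what an average reference would record

v_rest = rest_rereference(v_avg, leadfield)
print(np.abs(v_rest - v_inf).max())
```

In this noiseless toy case, with fewer sources than channels, the reconstruction matches the infinity-referenced potentials up to floating-point error. With real data the result depends on how well the head model matches the subject, which is why the cell below builds a sphere model and forward solution first.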

In [0]:
import mne

####### CALCULATING THE REFERENCE ELECTRODE USING THE REST METHOD #######

# Unique patient IDs
pt_names = list(df_patients['patient_id'].unique())
# print(f"patient_names:::{pt_names}")
# print(f"mne_raw_all{mne_raw_all.keys()}")

mne_rest_all = {}

# Re-reference each patient's recording with the REST method
for pt in pt_names:
    mne_raw_pt = mne_raw_all[pt]
    # print(f"type mne_raw_pt:::{type(mne_raw_pt)}")
    if isinstance(mne_raw_pt, mne.io.RawArray):
        # Apply REST Method
        mne_raw_pt.del_proj()  # remove our average reference projector first
        sphere = mne.make_sphere_model("auto", "auto", mne_raw_pt.info)
        src = mne.setup_volume_source_space(sphere=sphere, exclude=30.0, pos=15.0)
        forward = mne.make_forward_solution(mne_raw_pt.info, trans=None, src=src, bem=sphere)
        raw_rest = mne_raw_pt.copy().set_eeg_reference("REST", forward=forward)
        # print(f"type raw_rest:::{type(raw_rest)}")
        mne_rest_all[pt] = raw_rest
        for title, _raw in zip(["Original", "REST (∞)"], [mne_raw_pt, raw_rest]):
            with mne.viz.use_browser_backend("matplotlib"):
                fig = _raw.plot(n_channels=len(mne_raw_pt.ch_names), scalings=dict(eeg=50))
            # make room for title
            fig.subplots_adjust(top=0.9)
            fig.suptitle(f"{title} reference", size="xx-large", weight="bold")


In [0]:
%sql
-- Dropping the table because we may have updated the Dataframe

DROP TABLE IF EXISTS main.solution_accelerator.eeg_rest_ref_silver;


###### Export REST Data and Create Silver Layer Table

In [0]:
# Export REST Data to Silver Table

from pyspark.sql.functions import lit

created = False

for pt in mne_rest_all:
    print(pt)
    mne_rest_pt = mne_rest_all[pt]
    # print(f"TYPE::{type(mne_rest_pt)}")
    df_rest_pd = mne_rest_pt.to_data_frame(picks=["eeg"])
    # print(f"PT::{pt} LEN:{len(df_rest_pd.index)}")
    # Add an explicit index column so the sample order, and therefore the waveform, can be restored when reading the table
    df_rest_pd['index_id'] = df_rest_pd.index
    df_rest_spark = spark.createDataFrame(df_rest_pd)
    df_rest_spark = df_rest_spark.withColumn('patient_id', lit(pt))
    # Persist the Spark DataFrame as a Delta table registered in the catalog.
    # The first patient replaces any existing table ("overwrite"); later patients are appended.
    write_mode = "append" if created else "overwrite"
    print("APPEND" if created else "CREATE")
    df_rest_spark.write.format("delta").mode(write_mode).saveAsTable("main.solution_accelerator.eeg_rest_ref_silver")
    created = True

##### Examine statistical metrics utilizing Databricks' integrated commands `describe` and `summary` for potential future data manipulation.

In [0]:
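# `df_rest_master` is not defined above; it is assumed here to be the REST silver table
# df_rest_master = spark.table("main.solution_accelerator.eeg_rest_ref_silver")
# display(df_rest_master.describe())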

In [0]:
# df_rest_master.summary()
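
The statistics these commands report, along with the null check described in the overview, can be illustrated without a Spark session. The following is a minimal NumPy sketch on a hypothetical single channel. The values are made up, and Spark computes percentiles approximately, so its `summary` output may differ slightly from an exact calculation like this one.

```python
import numpy as np

# Hypothetical channel values; NaN stands in for a missing (null) sample.
# Zeros are kept because they are expected in EEG data.
channel = np.array([0.0, 1.0, 2.0, np.nan, 3.0, 4.0, 0.0, 5.0])

# Count missing samples, then compute statistics on the valid ones only
n_missing = int(np.isnan(channel).sum())
valid = channel[~np.isnan(channel)]

q1, median, q3 = np.percentile(valid, [25, 50, 75])
stats = {
    "count": int(valid.size),
    "mean": float(valid.mean()),
    "stddev": float(valid.std(ddof=1)),  # sample standard deviation
    "min": float(valid.min()),
    "25%": float(q1),
    "50%": float(median),
    "75%": float(q3),
    "max": float(valid.max()),
    "iqr": float(q3 - q1),               # derived from the quartiles
}

print(f"missing samples: {n_missing}")
for name, value in stats.items():
    print(f"{name}: {value}")
```

The first eight entries mirror the rows that `summary` returns by default. The IQR is not reported directly and has to be derived from the 25% and 75% rows.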