<a href="https://colab.research.google.com/github/Ekliipce/Machine-Learning-for-Biomedical/blob/pre-processing/eeg/EEG_and_alcohol.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Electroencephalogram (EEG) and alcohol


## **EEG**
#### **What is EEG?**
An electroencephalogram (EEG) is a test that records the brain's electrical activity noninvasively through electrodes placed on the scalp. The electrodes are connected by wires to a computer, which records and analyzes the brain's electrical impulses. EEG is used to diagnose and manage brain-related disorders such as epilepsy, to monitor brain activity during surgery, and to conduct neuroscience research.

EEG patterns, consisting of different waves, are analyzed to understand normal or abnormal brain function. The procedure is safe, though preparation is required, and it might be slightly uncomfortable. EEG primarily detects activity in the brain's cortex with limited spatial resolution and can be affected by various factors like age and medication. Unlike MRI and CT scans that visualize brain structure, EEG captures real-time activity, making it a valuable tool in neuroscience and medicine.
<br><br>
#### **What does an EEG help diagnose?**

EEG is used primarily to diagnose conditions that affect brain activity. It is particularly useful in identifying epilepsy and other seizure disorders by capturing the electrical activity of the brain. EEG can also help diagnose or manage other conditions, such as sleep disorders, coma, encephalopathies, brain death, and certain psychiatric disorders, and it can be used to monitor depth of anesthesia. It is often used in conjunction with other diagnostic tools to provide comprehensive insights into brain health and function.
<br><br>

#### **What factors can influence the results of an EEG?**

Various factors can influence EEG results. Medications (such as sedatives, anti-epileptic drugs) can alter electrical activity in the brain, affecting the test's findings. The patient's age and overall brain development can also play a role in the results. The physical and mental state of the patient during the test, like being stressed, relaxed, asleep, or awake, can also influence the brain's electrical activity. External interference from electronic devices and not following preparatory instructions (like washing hair to ensure good electrode contact) can also impact the data quality and test outcomes.
<br><br>
#### **How reliable is EEG in diagnosing various brain disorders?**

EEG is a reliable tool for diagnosing disorders related to abnormal brain activity, like epilepsy. However, its reliability can be influenced by the technician's skill, the patient's cooperation, and the above-mentioned factors that might affect the results. While EEG provides valuable real-time data on brain function, it might not catch intermittent or infrequent abnormalities in brain activity if they don't occur during the test. Therefore, it's often used alongside other diagnostic methods, like MRI or CT scans, to provide a more complete picture of brain health and accurate diagnosis.


## **Brain and Alcohol**
The following article looks more deeply into alcoholism and human electrophysiology: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6668890/

Interestingly, the article suggests that the electrical abnormalities observed in the brains of alcoholics might not result from alcohol consumption per se, but might instead be a pre-existing condition, possibly serving as a risk marker for alcoholism. Some of these electrical characteristics, such as increased resting beta power and decreased active theta oscillations during cognitive tasks, have also been identified in individuals at high risk of developing alcoholism, even before any exposure to alcohol. The article therefore proposes that an inherent imbalance between excitation and inhibition in the central nervous system (CNS) might predispose individuals to alcoholism. This imbalance might not only contribute to the risk of developing alcoholism but also offer insights into the neurobiology of craving and relapse.
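
To make "beta power" and "theta oscillations" concrete, here is a minimal sketch of estimating band power on a single channel with Welch's method. The synthetic signal, the `band_power` helper, and the band limits (theta 4-8 Hz, beta 13-30 Hz, a common convention) are assumptions for illustration and are not taken from the article.

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, sfreq, fmin, fmax):
    """Approximate the power between fmin and fmax (Hz) from a Welch PSD."""
    freqs, psd = welch(signal, fs=sfreq, nperseg=min(len(signal), 256))
    mask = (freqs >= fmin) & (freqs <= fmax)
    # Rectangle-rule integration of the PSD over the selected band.
    return psd[mask].sum() * (freqs[1] - freqs[0])

# Synthetic channel (assumption): a 20 Hz oscillation plus weak noise, sampled at 256 Hz.
sfreq = 256
rng = np.random.default_rng(0)
t = np.arange(4 * sfreq) / sfreq
signal = np.sin(2 * np.pi * 20 * t) + 0.1 * rng.normal(size=t.size)

theta = band_power(signal, sfreq, 4, 8)
beta = band_power(signal, sfreq, 13, 30)
print(f"theta power: {theta:.4f}, beta power: {beta:.4f}")
```

Because the synthetic oscillation sits at 20 Hz, nearly all of its power falls in the beta band, which is the kind of contrast the article describes at the group level.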

## Dataset

In [1]:
%%shell
wget https://archive.ics.uci.edu/static/public/121/eeg+database.zip
unzip -q eeg+database.zip
gunzip -k eeg_full/*.gz
for file in eeg_full/*.tar; do tar -xf $file -C eeg_full; done
gunzip -k eeg_full/*/*.gz
rm eeg_full/*.tar.gz eeg_full/*.tar eeg_full/*/*.gz
mkdir train
mkdir test

--2023-10-13 15:46:42--  https://archive.ics.uci.edu/static/public/121/eeg+database.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘eeg+database.zip’

eeg+database.zip        [       <=>          ] 762.44M  64.0MB/s    in 12s     

2023-10-13 15:46:54 (61.9 MB/s) - ‘eeg+database.zip’ saved [799481741]





In [2]:
! echo 'file_name' > eeg_full.csv
! find eeg_full -type f -exec bash -c '[[ $(wc -l < "$1") -gt 4 ]]' _ {} \; -print >> eeg_full.csv
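
The `find` command above keeps only files with more than four lines, which drops the header-only trial files. A rough Python equivalent is sketched below; the `files_with_data` helper and the demo file names are hypothetical. Note that `wc -l` counts newline characters while this sketch counts lines, so the two can differ by one for a file whose last line has no trailing newline.

```python
import tempfile
from pathlib import Path

def files_with_data(root, min_lines=5):
    """Return the paths under root that contain at least min_lines lines."""
    kept = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with open(path, errors="ignore") as f:
                n_lines = sum(1 for _ in f)
            if n_lines >= min_lines:
                kept.append(str(path))
    return kept

# Demo on a throwaway directory: one header-only file and one file with samples.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "empty.rd.000").write_text("# h1\n# h2\n# h3\n")
    Path(tmp, "full.rd.001").write_text(
        "# h1\n# h2\n# h3\n# h4\n# FP1 chan 0\n0 FP1 0 -8.9\n"
    )
    kept = [Path(p).name for p in files_with_data(tmp)]
print(kept)
```

The resulting list could then be written to `eeg_full.csv` under a `file_name` column, for example with `pd.DataFrame({"file_name": files_with_data("eeg_full")}).to_csv("eeg_full.csv", index=False)`.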

In [3]:
!pip install -q -U mne


In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import mne
import os

from tqdm import tqdm
import torch
from torch.utils.data import Dataset

In [23]:
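def extract_raw_data(file):
  """Parse one trial file into (data, channel_names, info), or (None, None, None) if it holds no samples."""
  with open(file) as f:
    lines = f.readlines()

  # Files with four lines or fewer contain only the header and no trial data.
  if len(lines) <= 4:
    return None, None, None

  # Header line 0 looks like "# co2c1000367.rd": 'a' = alcoholic, 'c' = control.
  is_Alcolic = lines[0][5] == 'a'
  id_patient = int(lines[0][6:13])

  # Header line 3 is assumed to look like "# S1 obj , trial 0":
  # the stimulus type comes second and the trial number last.
  l3 = lines[3].split()
  obj = l3[1]
  trial = int(l3[-1])

  data = []
  channel_names = []
  for line in lines[4:]:
    line_split = line.split()
    if line.startswith('#'):
      # Channel header, e.g. "# FP1 chan 0".
      channel_names.append(line_split[1])
    else:
      # Sample row: the last column is the voltage in microvolts.
      data.append(line_split[-1])

  # 64 channels per trial; convert microvolts to volts, the unit MNE expects.
  data = np.array(data, dtype="float").reshape((64, -1)) * 1e-6
  info = {"his_id" : id_patient,  "is_Alcolic" : is_Alcolic,'id': id_patient,
          "trial": trial, "obj": obj}
  return data, channel_names, info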


In [6]:
! cat /content/eeg_full/co2c1000367/co2c1000367.rd.089

# co2c1000367.rd
# 120 trials, 64 chans, 416 samples 368 post_stim samples
# 3.906000 msecs uV


In [7]:
extract_raw_data('/content/eeg_full/co2c1000367/co2c1000367.rd.089')

(None, None, None)
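
The file above contains only the three header lines, which is why `extract_raw_data` returns `None`. The header still carries useful metadata: the sample period of 3.906 ms corresponds to the 256 Hz sampling rate used later in the notebook. The sketch below parses those lines; the `parse_header` helper is hypothetical, and the reading of the fourth character of the subject code (`a` for alcoholic, `c` for control) follows the check in `extract_raw_data`.

```python
header = [
    "# co2c1000367.rd",
    "# 120 trials, 64 chans, 416 samples 368 post_stim samples",
    "# 3.906000 msecs uV",
]

def parse_header(lines):
    """Extract subject code, group and sampling frequency from the header lines."""
    subject = lines[0].split()[1].split(".")[0]      # e.g. 'co2c1000367'
    group = "alcoholic" if subject[3] == "a" else "control"
    sample_period_ms = float(lines[2].split()[1])    # e.g. 3.906
    sfreq = 1000.0 / sample_period_ms                # samples per second
    return {"subject": subject, "group": group, "sfreq": sfreq}

info = parse_header(header)
print(info["subject"], info["group"], round(info["sfreq"]))
```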

In [8]:
def save_raw_files(file, save=False, train=True):
  file_name = file.split("/")[-1].replace(".", "_") + "_eeg"

  data, ch_names, info_patient = extract_raw_data(file)
  if(data is None):
    return None
  info = mne.create_info(ch_names=ch_names, sfreq=256, ch_types='eeg')

  raw = mne.io.RawArray(data=data, info=info, verbose=False)
  raw.info['subject_info'] = info_patient


  if (save):
    dir = "train" if train else "test"
    raw.save(f"{dir}/{file_name}.fif", overwrite=True, verbose=False)
  return raw



In [9]:
for dir_name, subdirs, files in tqdm(list(os.walk('/content/'))):
    # Visit every file: os.walk gives no ordering guarantee, so skipping
    # the first entry would drop an arbitrary trial.
    for file_name in files:
      if ".rd." in file_name:
        current_file = os.path.join(dir_name, file_name)
        #train = "TRAIN" in current_file
        save_raw_files(current_file, save=True)


 19%|█▉        | 25/131 [01:20<05:42,  3.23s/it]


KeyboardInterrupt: ignored

Create two directories that will contain the data as CSV files.

In [None]:
! [ -e train_csv ] || mkdir train_csv
! [ -e test_csv ] || mkdir test_csv

Generate a DataFrame from an EEG file.

In [None]:

def generate_df(file,save=False,train=True,verbose=False,frequency = 256):
  data, channel_names, info = extract_raw_data(file)
  if data is None:
    return None  # header-only file, nothing to convert
  file_name = file.split("/")[-1]
  frames = []
  for time_series,channel in zip(data, channel_names):
    # Time axis in seconds, derived from the sampling frequency.
    time = np.arange(time_series.shape[-1]) / frequency
    frames.append(pd.DataFrame({'value':time_series,'channel':channel,**info,'time':time}))
  # Concatenate once at the end instead of growing the DataFrame in the loop.
  df = pd.concat(frames, ignore_index=True)
  if(save):
    dir = "train" if train else "test"
    saving_path = f'{dir}_csv/{file_name}.csv'
    df.to_csv(saving_path)
  return df



In [None]:
df = generate_df('/content/SMNI_CMI_TRAIN/co2c0000338/co2c0000338.rd.014')
df.head()

In [None]:
df[df['channel'] == 'Y'].plot(x='time',y='value')

In [None]:
def generated_csv(content_dir:str='/content/',train=True):
  for dir_name, subdirs, files in tqdm(list(os.walk(content_dir))):

    for file_name in (files):
      current_file = os.path.join(dir_name, file_name)
      regex = r'.+\.rd\.\d+$'

      if (re.match(regex,file_name)):
        generate_df(current_file,save=True,train=train)


In [None]:
for data_dir,train in zip(['SMNI_CMI_TRAIN','SMNI_CMI_TEST'],[True,False]):
  generated_csv(data_dir,train)

In [None]:
content = list(os.walk('train'))
raw = mne.io.read_raw(f"{content[0][0]}/{content[0][2][0]}")
raw.plot()

In [None]:
raw.get_data()

In [None]:
from scipy.stats import skew, kurtosis

def extract_features(data):
  features_names = []
  features = []
  for i, channel_data in enumerate(data):
      mean = np.mean(channel_data)
      var = np.var(channel_data)
      skewness = skew(channel_data)
      kurt = kurtosis(channel_data)
      channel_features = [mean, var, skewness, kurt]
      channel_names = ['mean', 'var', 'skewness', 'kurt']

      features.extend(channel_features)
      features_names.extend(list(map(lambda x : x + f"_{i}", channel_names)))

  return features, features_names
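
As a sanity check on the feature layout, the sketch below computes the same four statistics per channel in vectorized form on a random array standing in for one trial (64 channels by 256 samples). The array `fake_trial` is made-up data, and the vectorized form is an illustrative alternative to the loop above, not a replacement for it.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Stand-in for one trial (assumption): 64 channels, 256 samples each.
rng = np.random.default_rng(0)
fake_trial = rng.normal(size=(64, 256))

# One row per channel, one column per statistic.
stats = np.column_stack([
    fake_trial.mean(axis=1),
    fake_trial.var(axis=1),
    skew(fake_trial, axis=1),
    kurtosis(fake_trial, axis=1),
])

# Row-major flattening gives mean_0, var_0, skewness_0, kurt_0, mean_1, ...
feature_vector = stats.ravel()
print(feature_vector.shape)
```

Each trial therefore yields 64 x 4 = 256 numeric features, in the same order as the names built by `extract_features`.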


In [None]:
def create_df(dir):
  df_data = []
  features_names = []
  for dir_name, subdirs, files in tqdm(list(os.walk(dir))):
    for f in files:
      if ".rd." not in f:
        continue  # not a trial file
      # dir_name already includes the root directory passed to os.walk.
      data, ch_names, info = extract_raw_data(os.path.join(dir_name, f))
      if data is None:
        continue  # skip header-only files
      features, features_names = extract_features(data)
      features.extend([info['obj'], info['is_Alcolic']])
      df_data.append(features)
  return pd.DataFrame(df_data, columns=features_names + ['obj', 'alcoholic'])

df_train = create_df("eeg_full")
#df_test = create_df("SMNI_CMI_TEST")
df_train.head()

In [None]:
import torch
from torch.utils.data import Dataset

In [None]:
df_train.shape, df_test.shape

In [None]:
def preprocess(df):
  dummies = pd.get_dummies(df['obj'])
  df = df.drop(columns="obj")
  df = pd.concat([df, dummies], axis=1)
  df['alcoholic'] = df['alcoholic'].replace({True: 1, False: 0}).astype(int)

  return df.dropna()

df_train = preprocess(df_train)
df_test = preprocess(df_test)
df_train.head()

In [None]:
plt.hist(df_train['alcoholic'], alpha=0.7, edgecolor="k")
plt.title("Class distribution (training set)")
plt.ylabel("Count")
plt.xticks([0, 1], ['Control', 'Alcoholic'])
plt.show()

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(df_train)
scaled = scaler.transform(df_train)
scaled_test = scaler.transform(df_test )

df_train_scaled = pd.DataFrame(scaled, columns=df_train.columns)
df_test_scaled = pd.DataFrame(scaled_test, columns=df_test.columns)
df_train_scaled.head()

In [None]:
from sklearn.svm import SVR


X_train, y_train = df_train_scaled.loc[:, df_train_scaled.columns != 'alcoholic'], df_train_scaled['alcoholic']
X_test, y_test = df_test_scaled.loc[:, df_test_scaled.columns != 'alcoholic'], df_test_scaled['alcoholic']
regr = SVR(C=1.0, epsilon=0.2)
regr.fit(X_train, y_train)

In [None]:
from sklearn.metrics import accuracy_score

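# SVR returns continuous values; threshold at 0.5 instead of truncating with
# astype(int), which would map every prediction below 1.0 to class 0.
y_pred = (regr.predict(X_test) >= 0.5).astype(int)
accuracy_score(y_test.astype(int), y_pred)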
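
Since the target is binary, a classifier can be used directly instead of converting a regressor's output into labels. The sketch below uses `sklearn.svm.SVC` on synthetic two-class data; the data, the split, and the choice of `SVC` are assumptions for illustration and are not the method used in this notebook.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic, well-separated two-class data (assumption, not EEG features).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-2.0, size=(100, 2)),
    rng.normal(loc=2.0, size=(100, 2)),
])
y = np.array([0] * 100 + [1] * 100)

# Simple shuffled 80/20 split.
idx = rng.permutation(len(y))
train_idx, test_idx = idx[:160], idx[160:]

clf = SVC(C=1.0, kernel="rbf")
clf.fit(X[train_idx], y[train_idx])

# predict() already returns class labels, so no thresholding is needed.
acc = accuracy_score(y[test_idx], clf.predict(X[test_idx]))
print(f"accuracy: {acc:.2f}")
```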

In [143]:
import torch
from torch.utils.data import Dataset

In [144]:
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(DEVICE)

cuda:0


In [145]:
class EEGDataset(Dataset):
  def __init__(self,eeg_dir:str,eeg_files:str):
    # generate X = ( eeg: 64x256 , s_type : 2x1)
    # generate y = ()
    self.eeg_dir = eeg_dir
    self.eeg_files = pd.read_csv(eeg_files)
    self.s_objects = ['S1','S2']
    self.s_object_table = {s_object: i for i,s_object in enumerate(self.s_objects)}
    self.num_objects = len(self.s_objects )


  def __getitem__(self, idx):

    file_name = self.eeg_files['file_name'][idx]
    data, _, info= extract_raw_data(file_name)
    if(data is None or info is None):
      return None
    object_vector = np.zeros(self.num_objects)
    obj_idx = self.s_object_table[info['obj']]
    object_vector[obj_idx] = 1
    object_tensor = torch.tensor(object_vector).to(DEVICE)

    alcoholic = torch.zeros(2,dtype=float).to(DEVICE)

    alcoholic[int(info['is_Alcolic'])] = 1
    tensor_data = torch.tensor(data).to(DEVICE)

    return (tensor_data,object_tensor),alcoholic

  def __len__(self):
    return len(self.eeg_files)

In [146]:
truth = True
alcoholic = torch.zeros(2,dtype=float).to(DEVICE)
alcoholic[int(truth)] = 1
alcoholic

tensor([0., 1.], device='cuda:0', dtype=torch.float64)

In [147]:
eeg_ds = EEGDataset('eeg_full','eeg_full.csv')
train_size = int(0.8 * len(eeg_ds))
test_size = len(eeg_ds) - train_size
train_ds,test_ds  = torch.utils.data.random_split(eeg_ds,[train_size,test_size])

In [148]:
from torch.utils.data import DataLoader
train_dataloader = DataLoader(train_ds, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_ds, batch_size=64, shuffle=True)

In [149]:
from torch import nn
import torch.nn.functional as F
class EEG_NN(nn.Module):
    def __init__(self):
      super(EEG_NN,self).__init__()

      self.conv1 = nn.Conv2d(1,64,3,dtype=torch.double)
      self.relu1 = nn.ReLU()
      self.maxpool1 = nn.MaxPool2d((2,2))

      self.conv2 = nn.Conv2d(64,32,3,dtype=torch.double)
      self.relu2 = nn.ReLU()
      self.maxpool2 = nn.MaxPool2d((2,2))

      self.conv3 = nn.Conv2d(32,16,3,dtype=torch.double)
      self.relu3 = nn.ReLU()
      self.maxpool3 = nn.MaxPool2d((2,2))

      self.flatten = nn.Flatten(1)
      self.lin1 = nn.Linear(2882,64,dtype=torch.double)
      self.lin2 = nn.Linear(64,2,dtype=torch.double)
      self.check_cuda()
    def check_cuda(self):
      if(torch.cuda.is_available()):
        print('CUDA seems to be available')
        self.to(DEVICE)

    def forward(self, x):
        x1,x2 = x
        x1 = x1.unsqueeze(1)
        # print(x1.size())
        x1 = self.conv1(x1)
        x1 = self.relu1(x1)
        x1 = self.maxpool1(x1)

        x1 = self.conv2(x1)
        x1 = self.relu2(x1)
        x1 = self.maxpool2(x1)

        x1 = self.conv3(x1)
        x1 = self.relu3(x1)
        x1 = self.maxpool3(x1)

        x1 = self.flatten(x1)
        x1 = torch.cat((x1,x2),dim=1)
        x1 = self.lin1(x1)
        x1 = self.lin2(x1)
        x1 = nn.Sigmoid()(x1)
        return x1


eeg_model = EEG_NN()

u = None
for x,y in train_dataloader:
  u = eeg_model(x)
  print(compute_num_correct_pred(u,y))

  break
u.size(), u.is_cuda

CUDA seems to be available
0


(torch.Size([64, 2]), True)
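
The input size of 2882 for `lin1` follows from the shape of the data. The sketch below redoes the arithmetic for a 64 x 256 input passed through three blocks of a 3x3 convolution (no padding, stride 1) and a 2x2 max-pool, then adds the 2 values of the one-hot stimulus vector. The helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    """Output length of a max-pool with stride equal to its kernel (floor mode)."""
    return size // kernel

h, w = 64, 256            # channels x samples of one trial
for _ in range(3):        # three conv + pool blocks
    h, w = pool_out(conv_out(h)), pool_out(conv_out(w))

flat = 16 * h * w         # 16 feature maps after the last convolution
lin1_in = flat + 2        # plus the one-hot stimulus vector (S1 / S2)
print(h, w, flat, lin1_in)
```

If the input length or the number of convolution blocks changes, rerunning this calculation gives the new value to pass to `nn.Linear`.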

In [150]:
def train_model(model,train_loader,optimizer,loss_func):
    model.train()

    for x,y in tqdm(train_loader):
        out = model(x)
        loss = loss_func(out,y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()


In [151]:
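def compute_num_correct_pred(y_prob:torch.Tensor, y_label:torch.Tensor):
  # Binarize each output at 0.5, then count the samples for which
  # both components match the one-hot label.
  y_pred = (y_prob >= 0.5).float()

  correct_predictions = torch.all(y_pred == y_label,dim=1).sum()
  return int(correct_predictions)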

In [152]:
def test(loader,net,verbose=False):
    net.eval()
    correct = 0
    with torch.no_grad():
        for x,y in tqdm(loader):
            out = net(x)
            correct += compute_num_correct_pred(out, y)
    if(verbose):
      print(f'{correct} correct predictions out of {len(loader.dataset)} samples')
    return correct / len(loader.dataset)

In [153]:
def full_train(model,train_loader,test_loader,optimizer,loss_func,n_epochs,verbose=False):
  train_accs = []
  test_accs = []
  for i in range(n_epochs):
    train_model(model,train_loader,optimizer,loss_func)
    train_acc = test(train_loader,model)
    if(verbose):
      print(f'Train accuracy {train_acc:.2f}')

    test_acc =test(test_loader,model)
    if(verbose):
      print(f'Test accuracy {test_acc:.2f}')

    train_accs.append(train_acc)
    test_accs.append(test_acc)
  history_df = pd.DataFrame({'train_accuracy':train_accs,'test_accuracy':test_accs})
  history_df.to_csv(('history.csv'))
  return history_df

In [None]:
eeg_model = EEG_NN()
learning_rate = 0.001
optimizer = torch.optim.SGD(eeg_model.parameters(), lr=learning_rate)
loss_func = nn.BCELoss()
full_train(eeg_model,train_loader=train_dataloader,test_loader=test_dataloader,optimizer=optimizer,loss_func=loss_func,n_epochs=10)

CUDA seems to be available


100%|██████████| 139/139 [02:03<00:00,  1.13it/s]
100%|██████████| 139/139 [02:21<00:00,  1.02s/it]
100%|██████████| 35/35 [00:27<00:00,  1.26it/s]
100%|██████████| 139/139 [02:01<00:00,  1.14it/s]
100%|██████████| 139/139 [01:53<00:00,  1.22it/s]
100%|██████████| 35/35 [00:27<00:00,  1.29it/s]
 63%|██████▎   | 88/139 [01:17<00:41,  1.22it/s]