<a href="https://colab.research.google.com/github/Cipe96/EEG-Recognition/blob/main/Pre_Processing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<font size=6>**EEG Recognition: Pre-Processing**</font>
<br><font size=3>*Marco Cipollina, Riccardo Era*</font>


<p style="font-size:4px;" align="justify">In this notebook we pre-process the dataset made up of only the first run (R01) of the well-known "EEG Motor Movement/Imagery" dataset. Detailed information about the original dataset is available at the following <a href="https://physionet.org/content/eegmmidb/1.0.0/">link</a>.</p>
<p style="font-size:4px;" align="justify">Besides inspecting the main elements of the data, specific analyses can also be run on individual samples and volunteers.</p>

<font size=4>**Contents:**</font>
*   [Importing libraries](#1)
*   [Download](#2)
*   [Pre-Processing](#3)

<a name="1"></a>
# **Importing libraries**

We start by installing the MNE library, which is essential for analyzing EEG data because it can read recordings stored in EDF format.
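MNE hides the details of the EDF format behind `mne.io.read_raw_edf`. To give an idea of what it parses, here is a minimal sketch that reads only the fixed 256-byte main header of an EDF file, following the field layout of the EDF specification. The per-signal headers and the data records, which MNE also handles, are left out, and `read_edf_main_header` is a hypothetical helper that is not used elsewhere in the project.

```python
import io

# Fixed 256-byte main header of an EDF file (EDF specification):
# (field name, width in bytes); every field is space-padded ASCII text.
EDF_HEADER_FIELDS = [
    ("version", 8), ("patient_id", 80), ("recording_id", 80),
    ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
    ("reserved", 44), ("n_records", 8), ("record_duration", 8),
    ("n_signals", 4),
]


def read_edf_main_header(stream):
    """Parse the main header from a binary stream positioned at byte 0."""
    header = {}
    for name, width in EDF_HEADER_FIELDS:
        header[name] = stream.read(width).decode("ascii").strip()
    # Numeric fields are stored as ASCII text: convert them
    for name in ("header_bytes", "n_records", "n_signals"):
        header[name] = int(header[name])
    header["record_duration"] = float(header["record_duration"])
    return header


# Usage on a synthetic header (made-up values, not a real recording)
def _field(value, width):
    return str(value).ljust(width).encode("ascii")


fake = b"".join([
    _field(0, 8), _field("X", 80), _field("X", 80),
    _field("01.01.09", 8), _field("00.00.00", 8), _field(256 * 65, 8),
    _field("", 44), _field(60, 8), _field(1, 8), _field(64, 4),
])
hdr = read_edf_main_header(io.BytesIO(fake))
print(hdr["n_signals"], hdr["n_records"], hdr["record_duration"])  # 64 60 1.0
```

In the synthetic example the header size is 256 × (64 + 1) bytes, because the specification adds 256 bytes of signal headers for each of the 64 signals.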

In [None]:
%%capture
# suppress the cell output
!pip install mne

We import the libraries and mount Google Drive so that the notebook can access the other project files.

In [None]:
import matplotlib.pyplot as plt
from collections import Counter
from google.colab import drive
import pandas as pd
import numpy as np
import json
import sys
import mne
import os

In [None]:
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


<a name="2"></a>
# **Download**

In [None]:
#@title Path of the project folder on Google Drive:

#@markdown If the project folder is in the root of Drive, just write its name:
PERCORSO_DRIVE = "EEG Recognition" #@param {type:"string"}

PERCORSO_DRIVE = '/content/drive/MyDrive/' + PERCORSO_DRIVE

In [None]:
sys.path.append(PERCORSO_DRIVE)                       # lets us import functions defined in other project files
from shared_utilities import download_dataset

From the file "EEG_Motor_Movement-Imagery_R01_ID.json" we read the Google Drive ID needed to download the dataset.

In [None]:
with open(PERCORSO_DRIVE + '/EEG_Motor_Movement-Imagery_R01_ID.json', 'r') as file:
  config = json.load(file)

DATASET_ID = config['DATASET_ID']
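The cell above only relies on the JSON file containing a `DATASET_ID` key. Below is a minimal sketch of that assumption, with a clearer error when the key is missing. `load_dataset_id` is a hypothetical helper, and the file content shown is assumed, with the ID taken from the download log further down.

```python
import io
import json


def load_dataset_id(stream):
    """Return the DATASET_ID stored in a JSON config stream."""
    config = json.load(stream)
    if "DATASET_ID" not in config:
        raise KeyError("the config file has no 'DATASET_ID' key")
    return config["DATASET_ID"]


# Assumed file content: a single JSON object with the DATASET_ID key
example = io.StringIO('{"DATASET_ID": "1WwuAh25Jfx-I8rY3vFGyXiI79YfLYUpH"}')
print(load_dataset_id(example))
```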

In [None]:
#@title Download settings

#@markdown Name of the dataset zip file after download:
DATASET_NAME = 'EEG_Motor_Movement-Imagery.zip' #@param {type:"string"}

download_dataset(DATASET_ID, DATASET_NAME, msg=True)

Downloading...
From (original): https://drive.google.com/uc?id=1WwuAh25Jfx-I8rY3vFGyXiI79YfLYUpH
From (redirected): https://drive.google.com/uc?id=1WwuAh25Jfx-I8rY3vFGyXiI79YfLYUpH&confirm=t&uuid=6e88ffc4-dd26-483b-9109-a60b460016cf
To: /content/EEG_Motor_Movement-Imagery.zip
100%|██████████| 76.6M/76.6M [00:05<00:00, 13.6MB/s]

File downloaded and saved as EEG_Motor_Movement-Imagery.zip!

After downloading the dataset, we unzip it and delete the .txt files and the sample_data folder that Colab creates automatically.

In [None]:
%%capture

! unzip "{DATASET_NAME}"              # extract the zip archive
! rm /content/AMSL/*.txt              # delete the .txt files
! rm -r /content/sample_data          # delete Colab's default folder
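The same three shell commands can also be written in plain Python, which is handy when running outside Colab. This is a sketch under the assumption that the folders are the ones used in the cell above. `extract_and_clean` is a hypothetical helper that is not part of the project.

```python
import glob
import os
import shutil
import zipfile


def extract_and_clean(zip_path, dest_dir, txt_dir, extra_dirs=()):
    """Extract zip_path into dest_dir, delete the *.txt files found in
    txt_dir and remove every folder listed in extra_dirs (if present)."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    for txt_file in glob.glob(os.path.join(txt_dir, "*.txt")):
        os.remove(txt_file)
    for folder in extra_dirs:
        shutil.rmtree(folder, ignore_errors=True)  # no error if missing


# Usage in this notebook (paths as in the cell above):
# extract_and_clean(DATASET_NAME, "/content", "/content/AMSL",
#                   ["/content/sample_data"])
```

`ignore_errors=True` mirrors the silent behavior of the `%%capture`d shell cell: a missing `sample_data` folder does not stop the notebook.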

<a name="3"></a>
# **Pre-processing**

In [None]:
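train = []      # training patches
label_tr = []   # training labels
val = []        # validation patches
label_val = []  # validation labels
test = []       # test patches
label_ts = []   # test labels
classe = 0      # class label = index of the subject (one recording per subject)

# Patch length --> 240 samples == 1.5 seconds (sampling rate: 160 Hz)

l_patch = 240

# Cut-off frequencies (Hz) of the 5 EEG frequency bands, plus the
# 'broadband' setting (1 Hz high-pass, no low-pass) used below

high = {'alpha' : 13, 'beta' : 30, 'delta' : 4, 'gamma' : 40, 'theta' : 8, 'broadband' : None } # upper cut-off frequencies
low = {'alpha' : 8, 'beta' : 13, 'delta' : 0.5, 'gamma' : 30, 'theta' : 4, 'broadband' : 1 } # lower cut-off frequencies

sec = int(60 / 1.5)     # number of 1.5 s patches in a 60 s recording (40)
ntr = round(sec * 0.7)  # patches per subject for the train set (28)
nvd = round(sec * 0.15) # patches per subject for the validation set (6)
nts = round(sec * 0.15) # patches per subject for the test set (6)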

# Assumption: the extracted recordings are the .edf files in /content/EEG_T0/,
# one per subject; sorting keeps the subject -> class mapping reproducible
files = sorted(f for f in os.listdir('/content/EEG_T0/') if f.endswith('.edf'))
classi = files  # one class per subject; shorten this list to process fewer subjects

for eeg in files[:len(classi)]:

  raw = mne.io.read_raw_edf(input_fname = '/content/EEG_T0/'+eeg, preload = True, verbose = 'CRITICAL')
  # Broadband filtering: 1 Hz high-pass, no low-pass
  raw.filter(l_freq = low['broadband'], h_freq = high['broadband'], n_jobs = 8, verbose = 'CRITICAL')

  # Re-reference to the common average, applied directly to the data
  # (a projector would not be applied by get_data() below)
  raw.set_eeg_reference('average', projection=False)

  # Create an ICA instance and fit it to the data; the average reference lowers
  # the rank to 63, so keep the components explaining almost all the variance
  ica = mne.preprocessing.ICA(n_components=0.999999, random_state=97, max_iter=800)
  ica.fit(raw)

  # The dataset has no EOG channels: use the frontopolar electrodes (Fp1, Fp2),
  # the closest to the eyes, as EOG proxies to find eye-movement components
  frontal = [ch for ch in raw.ch_names if ch.strip('.').lower() in ('fp1', 'fp2')]
  eog_indices, eog_scores = ica.find_bads_eog(raw, ch_name=frontal)
  ica.exclude = eog_indices  # mark the identified components for removal

  # Remove the excluded components from a copy of the recording
  raw_ica = ica.apply(raw.copy())

  rec = raw_ica.get_data()
  h=0

  while(h < l_patch*ntr and h + l_patch <= rec.shape[1]):
    train.append(rec[:,h:h+l_patch]) # train set
    label_tr.append(classe)
    h=h+l_patch

  # REMEMBER: apply early stopping LATER, using the validation set
  while(h < (ntr+nvd)*l_patch and h + l_patch <= rec.shape[1]):
    val.append(rec[:,h:h+l_patch]) # validation set
    label_val.append(classe)
    h=h+l_patch

  while(h < l_patch*(ntr+nvd+nts) and h + l_patch <= rec.shape[1]):
    test.append(rec[:,h:h+l_patch]) # test set
    label_ts.append(classe)
    h=h+l_patch

  classe = classe + 1

In [None]:
train_array = np.array(train)  # convert the list of training patches into a NumPy array
print(train_array.shape)
val_array = np.array(val)      # same for the validation patches
print(val_array.shape)
test_array = np.array(test)    # same for the test patches
print(test_array.shape)

label_tr = np.array(label_tr)
print(label_tr.shape)
label_val = np.array(label_val)
print(label_val.shape)
label_ts = np.array(label_ts)
print(label_ts.shape)
print(rec.shape)
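The windowing done inside the pre-processing loop can be checked in isolation. Below is a minimal sketch, assuming a recording already cleaned by ICA and shaped `(channels, samples)`. `split_into_patches` is a hypothetical helper that is not part of the project. It uses the integer counts 28/6/6 (that is, 40 × 0.7 / 0.15 / 0.15) instead of the loop's `while` conditions. A random array stands in for real EEG.

```python
import numpy as np


def split_into_patches(rec, l_patch=240, n_train=28, n_val=6, n_test=6):
    """Cut a (channels, samples) recording into consecutive, non-overlapping
    patches of l_patch samples: the first n_train go to train, the next n_val
    to validation, the next n_test to test. Incomplete patches are dropped."""
    n_available = rec.shape[1] // l_patch
    n_used = min(n_available, n_train + n_val + n_test)
    patches = [rec[:, i * l_patch:(i + 1) * l_patch] for i in range(n_used)]
    train = patches[:n_train]
    val = patches[n_train:n_train + n_val]
    test = patches[n_train + n_val:]
    return train, val, test


# Synthetic stand-in for one subject: 64 channels, 9600 samples (60 s at 160 Hz)
rng = np.random.default_rng(0)
rec = rng.standard_normal((64, 9600))
train, val, test = split_into_patches(rec)
print(len(train), len(val), len(test))  # 28 6 6
```

Stacking the returned lists with `np.stack` gives arrays of shape `(n_patches, 64, 240)`, the same layout as the arrays printed above. Counting patches instead of comparing sample indices against float bounds makes the split explicit, and a recording that is too short simply yields fewer patches.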

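Expected number of patches, with 9600 samples per recording (60 s at 160 Hz), 240 samples per patch and 109 subjects:

$$\frac{9600}{240} \times 109 = 40 \times 109 = 4360$$

$$4360 \times 0.7 = 3052 \quad \text{(train)}$$

$$4360 \times 0.15 = 654 \quad \text{(validation and test)}$$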
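The same arithmetic can be turned into a quick sanity check for the printed shapes. This is a sketch assuming that all 109 subjects are processed (that is, `files[:len(classi)]` covers every recording), that each recording yields at least 9600 samples and that there are 64 EEG channels. `expected_shapes` is a hypothetical helper.

```python
def expected_shapes(n_subjects=109, n_channels=64, n_samples=9600,
                    l_patch=240, train_frac=0.7, val_frac=0.15):
    """Expected (patches, channels, samples) shape of each split."""
    patches_per_subject = n_samples // l_patch          # 40
    n_train = round(patches_per_subject * train_frac)   # 28
    n_val = round(patches_per_subject * val_frac)       # 6
    n_test = patches_per_subject - n_train - n_val      # 6
    return {
        "train": (n_subjects * n_train, n_channels, l_patch),
        "val": (n_subjects * n_val, n_channels, l_patch),
        "test": (n_subjects * n_test, n_channels, l_patch),
    }


print(expected_shapes())
# {'train': (3052, 64, 240), 'val': (654, 64, 240), 'test': (654, 64, 240)}
```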