<a href="https://colab.research.google.com/github/arjunsinghrathore/Subject-Independent-Emotion-Recognition/blob/main/CHB_MIT_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 01. Overview of Datasets

by [David Luke Elliott](https://www.lancaster.ac.uk/psychology/about-us/people/david-elliott)
/ [GitHub](https://github.com/Eldave93) 

Welcome to the first notebook in my series on demonstrating the application of signal processing and machine learning classification to epileptic seizure detection!

The purpose of this notebook is to:
1. Get a basic understanding of what a seizure is.
2. See how EEG can be used to measure it.
3. Survey the datasets available for building machine learning algorithms to detect it.

## Background: Epilepsy and Electroencephalography

Epilepsy is the tendency to have unprovoked and recurrent seizures. Epileptic seizures are often accompanied by an alteration of consciousness, symptomatic of abnormal, excessive, or synchronized neuronal discharges which are either widespread or localized in nature<sup>1,2</sup>. There are over 40 types of epilepsy<sup>3</sup> and over 40 different types of seizure, of which individuals may experience several<sup>4</sup>. Clinical manifestations of epilepsy depend on several factors, such as the particular epilepsy syndrome, the patient's age, the brain area that generates seizures, and whether discharges remain local or propagate to other brain areas<sup>1</sup>. 

Whilst all seizures result from an increase in cellular excitability, the mechanisms of synchronization differ between seizures, broadly categorising them as focal or generalized epilepsies. Although atonic, tonic, clonic, tonic-clonic, myoclonic, and absence seizures were historically thought to be “primarily generalized” in nature, there is an increasing acceptance that these still originate in local microcircuits and then propagate to other areas<sup>5,6</sup>. This is representative of a larger shift towards viewing epilepsy as a dysfunction of neuronal networks rather than of single sources<sup>7</sup>. 

The diagnosis of epilepsy relies on the identification of clinical features specific to a particular epilepsy syndrome. Electroencephalography (EEG), magnetic resonance imaging (MRI) reports, and verbal descriptions of seizures are the most commonly available information to neurologists, with hospital records, seizure diaries, and videos of patient events desirable but not always available<sup>8</sup>. In the clinic, scalp EEG is commonly used as it provides a non-invasive, easy, and inexpensive method to characterise the mean electrical activity generated by the synchronous firing of open field neurons at a high temporal resolution. 

The time series gained from an EEG amplifier is a digital sample of analogue voltage recordings generated by the synchronous firing of open field neurons in the brain. The digital EEG therefore approximates the continuous time signal of neural activity through the discrete sampling of points. Sampling rates for clinical EEG typically lie between 200 and 500Hz, meaning the spectral components generated by the cortex that neurologists predominantly focus on, typically within the 1 to 30Hz range, can be estimated without aliasing<sup>12</sup>.
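To make the aliasing point concrete, here is a minimal sketch of how a pure sine wave is "folded" when it is sampled too slowly. The helper name `apparent_frequency` and the 256Hz rate are illustrative choices rather than anything taken from the cited sources.

```python
def apparent_frequency(f_signal, fs):
    """Frequency (Hz) at which a sine of f_signal Hz appears after sampling at fs Hz."""
    # fold the true frequency back into the range 0 .. fs/2
    return abs(f_signal - fs * round(f_signal / fs))

fs = 256            # sampling rate in Hz (inside the 200-500Hz clinical range)
nyquist = fs / 2    # highest frequency that can be represented without aliasing

for f in (1, 30, 100, 200):
    status = "ok" if f < nyquist else "aliased"
    print(f"{f} Hz sampled at {fs} Hz appears at {apparent_frequency(f, fs)} Hz ({status})")
```

Anything in the 1 to 30Hz range sits well below the 128Hz limit here, whereas a 200Hz component would show up as a spurious 56Hz one.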

Typically, in the UK National Health Service (NHS), patients have an approximately 30-minute scalp EEG assessment, during which the patient may be asked to hyperventilate or be exposed to photic stimulation to provoke a seizure. If epilepsy is suspected but a diagnosis is not reached, the patient may then have a longer EEG assessment. Human experts, trained to qualitatively assess EEG records for epilepsy, will look at the EEG record to identify the presence and type of epilepsy, assessing the data on a number of aspects. The spatial and temporal information is used to report seizures when the EEG appears to have seizure-like oscillations over a long duration and across a number of channels. The pattern of EEG needs to be clearly different from the background activity, with consideration given to the difference between awake and asleep background EEG. The appearance of an epileptic event compared with artefacts or rhythms also needs to be considered to avoid falsely classifying artefactual activity<sup>9</sup>. 

Manual review of EEG is time consuming, expensive, and prone to error<sup>9</sup>. Indeed, it has been found that fewer than 80% of events were identified consistently by two or more experts on a previously marked EEG record<sup>10</sup>. In developed countries, such as the UK, misdiagnosis rates are estimated to be between 20 and 30 percent, which is consequently costly to the health service<sup>11</sup>. The limitations of scalp EEG no doubt factor into these misclassifications. Scalp EEG has limited spatial sensitivity, as the signal needs to propagate through several layers of non-neural tissue, and therefore requires larger brain areas to have synchronous activity. Scalp EEG is also often contaminated with artefacts, which represent noise caused by sources other than the brain, such as ambient electromagnetic interference, eye blinks, and muscle movements. Due to these limitations, intra-cranial EEG is more often used for pre-surgical analysis to determine brain regions for surgical resection, as it is less affected by artefacts and has better spatial sensitivity<sup>9</sup>.

---

1. Giourou, E., Stavropoulou-Deli, A., Giannakopoulou, A., Kostopoulos, G. K., & Koutroumanidis, M. (2015). Introduction to Epilepsy and Related Brain Disorders. In N. S. Voros & C. P. Antonopoulos (Eds.), Cyberphysical systems for epilepsy and related brain disorders: Multi-parametric monitoring and analysis for diagnosis and optimal disease management (Chap. 2, pp. 11–38). doi:10.1007/978-3-319-20049-1

2. Krumholz, A., Wiebe, S., Gronseth, G., Shinnar, S., Levisohn, P., Ting, T., . . . French, J. (2007). Evaluating an Apparent Unprovoked First Seizure in Adults (An Evidence-Based Review). Neurology, 69(21), 1996–2007. doi:10.1212/01.wnl.0000285084.93652.43

3. Berg, A. T., Berkovic, S. F., Brodie, M. J., Buchhalter, J., Cross, J. H., Van Emde Boas, W., . . . Scheffer, I. E. (2010). Revised terminology and concepts for organization of seizures and epilepsies: Report of the ILAE Commission on Classification and Terminology, 2005-2009. Epilepsia, 51(4), 676–685. doi:10.1111/j.1528-1167.2010.02522.x

4. Blume, W. T., Lüders, H. O., Mizrahi, E., Tassinari, C., Van Emde Boas, W., & Engel J., J. (2001). Glossary of descriptive terminology for ictal semiology: Report of the ILAE Task Force on classification and terminology. Epilepsia, 42(9), 1212–1218. doi:10.1046/j.1528-1157.2001.22001.x

5. Paz, J. T., & Huguenard, J. R. (2014). Optogenetics and epilepsy: Past, present and future. Epilepsy Currents, 15(1), 34–38. doi:10.5698/1535-7597-15.1.34

6. Holmes, M. D., Brown, M., & Tucker, D. M. (2004). Are "generalized" seizures truly generalized? Evidence of localized mesial frontal and frontopolar discharges in absence. Epilepsia, 45(12), 1568–1579. doi:10.1111/j.0013-9580.2004.23204.x

7. Spencer, S. (2002). Neural Networks in human epilepsy: evidence of and implications for treatment. Epilepsia, 43(3), 219–227

8. Bidwell2015

9. Varsavsky, A., Mareels, I., & Cook, M. (2011). EEG Generation and Measurement. In Epileptic seizures and the EEG: Measurement, models, detection and prediction (Chap. 2, p. 337). doi:10.1201/b10459-3

10. Wilson, S. B., Scheuer, M. L., Plummer, C., Young, B., & Pacia, S. (2003). Seizure detection: Correlation of human experts. Clinical Neurophysiology, 114(11), 2156–2164. doi:10.1016/S1388-2457(03)00212-8

11. NICE

12. Kaplan2000

# Environment Set-up

First, let's set up our notebook environment with the packages we need. If you are following along on Google Colab, the cell below will install everything required.

In [None]:
!pip install matplotlib pandas numpy scipy seaborn mne
!pip install beautifulsoup4 requests wget
!pip install h5py tables kaggle
!pip install wfdb pyEDFlib



This creates a class called color which can be used to change the appearance of strings printed in the outputs of each cell. I like using it for nicer outputs.

In [None]:
# colours for printing outputs
class color:
   PURPLE = '\033[95m'
   CYAN = '\033[96m'
   DARKCYAN = '\033[36m'
   BLUE = '\033[94m'
   GREEN = '\033[92m'
   YELLOW = '\033[93m'
   RED = '\033[91m'
   BOLD = '\033[1m'
   UNDERLINE = '\033[4m'
   END = '\033[0m'
  
print(color.BOLD+color.UNDERLINE+'Title'+color.END)
print('Hello World')

Let's create a function that lists all the files/directories it finds in a location and saves them to a list.

In [None]:
import glob            # for file locations
import pprint          # for pretty printing
import re

pp = pprint.PrettyPrinter()

def file_list(folder_path, output=False):
    # find every path matching the pattern and sort alphabetically
    # (the local list is named `files` so it does not shadow the function)
    files = sorted(glob.glob(folder_path))
    
    # Output
    if output:
        print(str(len(files)) + " files found")
        pp.pprint(files)
    
    return files

# CHB-MIT Scalp EEG Database Pre Vs Ictal

The CHB-MIT dataset<sup>1</sup> consists of records from 23 patients, with one case (chb21) taken from the same patient (chb01) 1.5 years later. The dataset was collected by investigators at the Children’s Hospital Boston and Massachusetts Institute of Technology (MIT). The median recording length was 36 hours, with small gaps between records each hour due to hardware limitations.

The data contains 198 seizures of various types (focal, lateral, and generalised seizures). All signals were recorded at 256 samples per second with most files containing 23 EEG signals positioned using the International 10-20 system (as we will see later). 

This dataset is one of the most prominent datasets in the literature, as it provides long, continuous recordings for each patient, allowing for both patient specific and patient general models to be developed and tested.


| Subject   | Age/Gender | Seizure Events | Total Ictal Time (secs) | Total Inter-ictal Time (secs) |
|-----------|------------|----------------|------------------|----------------------|
| chb01/chb21 | 11, 13 (F) | 11 | 641  | 263461 |
| chb02       | 11 (M)     | 3  | 172  | 126751 |
| chb03       | 14 (F)     | 7  | 402  | 136366 |
| chb04       | 22 (M)     | 4  | 378  | 561414 |
| chb05       | 7 (F)      | 5  | 558  | 139813 |
| chb06       | 1.5 (F)    | 10 | 153  | 240075 |
| chb07       | 14.5 (F)   | 3  | 325  | 241044 |
| chb08       | 3.5 (M)    | 5  | 919  | 71084  |
| chb09       | 10 (F)     | 4  | 276  | 244043 |
| chb10       | 3 (M)      | 7  | 447  | 179612 |
| chb11       | 12 (F)     | 3  | 806  | 124416 |
| chb12       | 2 (F)      | 27 | 989  | 73466  |
| chb13       | 3 (F)      | 12 | 535  | 118232 |
| chb14       | 9 (F)      | 8  | 169  | 93405  |
| chb15       | 16 (M)     | 20 | 1992 | 142004 |
| chb16       | 7 (F)      | 10 | 84   | 68297  |
| chb17       | 12 (F)     | 3  | 293  | 75310  |
| chb18       | 18 (F)     | 6  | 317  | 127932 |
| chb19       | 19 (F)     | 3  | 236  | 107480 |
| chb20       | 6 (F)      | 8  | 294  | 99043  |
| chb22       | 9 (F)      | 3  | 204  | 111376 |
| chb23       | 6 (F)      | 7  | 424  | 95177  |
| chb24       | NR (NR)    | 16 | 511  | 76134  |
| **Total**   | -          | **185**| **11125**| **3515935**|

**NOTE**
- You may have noticed that the table above only totals 185 seizures rather than 198. That's because the method I use to load the data into Python does not work on a select few files. This reduces the number of seizure events in patient 12 from 40 to 27 by excluding files 27, 28, and 29.
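One practical consequence of the totals in the table is how rare ictal data is compared with inter-ictal data. The short calculation below is only a sketch using the two totals from the table; the variable names are my own.

```python
total_ictal_secs = 11125          # "Total Ictal Time" from the table
total_interictal_secs = 3515935   # "Total Inter-ictal Time" from the table

# share of all recorded seconds that are ictal
ictal_fraction = total_ictal_secs / (total_ictal_secs + total_interictal_secs)
# inter-ictal seconds available for every ictal second
ratio = total_interictal_secs / total_ictal_secs

print(f"ictal share of recording: {ictal_fraction:.2%}")
print(f"inter-ictal seconds per ictal second: {ratio:.0f}")
```

An imbalance of this size is worth keeping in mind when choosing how to sample segments and which metrics to report.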

---
1. Shoeb2009

## Data Information
The dataset is stored on PhysioNet, which has some helpful tools to access the data. We are going to use one such package (`wfdb`) to get a list of the records in the dataset.

In [None]:
import wfdb 

dbs = wfdb.get_dbs()

records_list = wfdb.io.get_record_list('chbmit', records='all')
records_list[:5]

Using the above, let's get a list of the unique directory names.

In [None]:
part_codes = sorted(list(set([record.split('/')[0] for record in records_list])))
part_codes

Each patient has an information file associated with it. Let's load one in and see what it looks like before we parse it into something more useful.

In [None]:
import os
from urllib.request import urlretrieve

def get_content(part_code):
  url = "https://physionet.org/physiobank/database/chbmit/"+part_code+'/'+part_code+'-summary.txt'
  filename = "./chbmit.txt"

  # download the summary file to a temporary local file
  urlretrieve(url,filename)

  # read the file into a list
  with open(filename, encoding='UTF-8') as f:
      # read all the document into a list of strings (each line a new string)
      content = f.readlines()

  # delete the temporary file only once it has been closed
  os.remove(filename)

  return content


In [None]:
get_content(part_codes[0])#[6]

Taking the above, the function below parses this file into a Python dictionary that we can use later. Only files that contain at least one seizure are kept. See the output for an example of what it looks like.

In [None]:
import re
part_info_dict = {}
filenames_s = []

def info_dict(content):
  
  line_nos=len(content)
  line_no=1

  channels = []
  file_name = []
  file_info_dict={}

  for line in content:

    # if there is Channel in the line...
    if re.findall(r'Channel \d+', line):
      # split the line into channel number and channel reference
      channel = line.split(': ')
      # get the channel reference and remove any new lines
      channel = channel[-1].replace("\n", "")
      # put into the channel list
      channels.append(channel)

    # if the line is the file name
    elif re.findall('File Name', line):
      # if there is already a file_name
      if file_name and file_info_dict['Seizures Window']:
        # flush the current file info to it
        part_info_dict[file_name] = file_info_dict

      # get the file name
      file_name = re.findall(r'\w+\d+_\d+|\w+\d+\w+_\d+', line)[0]

      file_info_dict = {}
      # put the channel list in the file info dict and remove duplicates
      file_info_dict['Channels'] = list(set(channels))
      # reset the rest of the options
      file_info_dict['Start Time'] = ''
      file_info_dict['End Time'] = ''
      file_info_dict['Seizures Window'] = []

    # if the line is about the file start time
    elif re.findall('File Start Time', line):
      # get the start time
      file_info_dict['Start Time'] = re.findall(r'\d+:\d+:\d+', line)[0]

    # if the line is about the file end time
    elif re.findall('File End Time', line):
      # get the end time
      file_info_dict['End Time'] = re.findall(r'\d+:\d+:\d+', line)[0]

    # if the line is about a seizure start or end time
    elif re.findall(r'Seizure Start Time|Seizure End Time|Seizure \d+ Start Time|Seizure \d+ End Time', line):
      # store the time in seconds (the last number on the line)
      file_info_dict['Seizures Window'].append(int(re.findall(r'\d+', line)[-1]))

    # if last line in the list...
    if line_no == line_nos and file_info_dict['Seizures Window']:
      # flush the file info to it
      part_info_dict[file_name] = file_info_dict

    line_no+=1
    
        
for part_code in part_codes:
  content = get_content(part_code)
  info_dict(content)
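To see what the regular expressions in the parser pick out, here is a small standalone sketch. The `sample` lines are written by hand in the style of a summary file and are an illustrative assumption, not downloaded content.

```python
import re

# hand-written lines in the style of a '-summary.txt' file (assumed sample)
sample = [
    "File Name: chb01_03.edf\n",
    "File Start Time: 13:43:04\n",
    "File End Time: 14:43:04\n",
    "Number of Seizures in File: 1\n",
    "Seizure Start Time: 2996 seconds\n",
    "Seizure End Time: 3036 seconds\n",
]

# the file ID is the first run of word characters shaped like 'chb01_03'
file_id = re.findall(r'\w+\d+_\d+', sample[0])[0]
# clock times look like hh:mm:ss
start_time = re.findall(r'\d+:\d+:\d+', sample[1])[0]
# seizure times are the last number on any 'Seizure ... Start/End Time' line
seizure_window = [int(re.findall(r'\d+', line)[-1])
                  for line in sample
                  if re.search(r'Seizure( \d+)? (Start|End) Time', line)]

print(file_id, start_time, seizure_window)
```

Note that the "Number of Seizures in File" line is not matched by the seizure pattern, so only the start and end times end up in the window list.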



In [None]:
print(color.BOLD+color.UNDERLINE+'part_info_dict'+color.END)
display(part_info_dict['chb01_03'])
print(color.UNDERLINE+'\nPart Keys'+color.END)
print(part_info_dict[list(part_info_dict.keys())[0]].keys())

As can be seen below, there is a common set of channels found in ALL patients, but there are also some channels only found in individual patients. This is because channels were sometimes swapped for others during recording. 

In [None]:
import pandas as pd     # dataframes
import re

all_channels = []

for key in part_info_dict.keys():
    all_channels.extend(part_info_dict[key]['Channels'])
    
# turn the list into a pandas series
all_channels = pd.Series(all_channels)

# count how many files each channel appears in
channel_counts = all_channels.value_counts()
channel_counts

To deal with the fact that some channels are only found in individual patients, I tend to keep only the channels found in all the patients. This makes generalising models across patients easier; however, if you are only training models to identify a particular patient's seizures, you wouldn't need to do this.

In [None]:
threshold = len(part_info_dict.keys())
channel_keeps = list(channel_counts[channel_counts >= threshold].index)
channel_keeps
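The same "keep only what every file has" idea can be sketched without pandas. The example below uses `collections.Counter` in place of `value_counts`, and the `toy_info` dictionary is made-up data that only mirrors the shape of `part_info_dict`.

```python
from collections import Counter

# hypothetical channel lists for three files (toy data, not from the dataset)
toy_info = {
    "file_a": {"Channels": ["FP1-F7", "F7-T7", "T7-P7"]},
    "file_b": {"Channels": ["FP1-F7", "F7-T7", "ECG"]},
    "file_c": {"Channels": ["FP1-F7", "F7-T7", "T7-P7"]},
}

# count the number of files each channel appears in
counts = Counter(ch for info in toy_info.values() for ch in info["Channels"])
# a channel is kept only if it appears in every file
threshold = len(toy_info)
keeps = sorted(ch for ch, n in counts.items() if n >= threshold)

print(keeps)
```

Here `T7-P7` and `ECG` are dropped because each is missing from at least one file.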

## Load Data
Let's now load in some example data. First, let's choose a file.

In [None]:
records_list_new = []

for record in records_list:
  # file ID, e.g. 'chb01/chb01_03.edf' -> 'chb01_03'
  file_id = record.split('/')[1].split('.')[0]
  # skip records that have no seizure information
  if file_id not in part_info_dict:
    continue
  window = part_info_dict[file_id]['Seizures Window']
  # keep records whose first seizure lasts at least 60 seconds
  if window[1] - window[0] >= 30*2:
    records_list_new.append(record)

In [None]:
len(records_list_new)

In [None]:
EXAMPLE_FILE = records_list_new[0]
EXAMPLE_ID = EXAMPLE_FILE.split('/')[1].split('.')[0]
EXAMPLE_ID

In [None]:
sub_freq = dict()

# count how many retained records each subject has
for record in records_list_new:
  # subject code, e.g. 'chb01/chb01_03.edf' -> 'chb01'
  sub = record.split('/')[1].split('.')[0][:-3]
  sub_freq[sub] = sub_freq.get(sub, 0) + 1

In [None]:
# sub_freq

In [None]:
records_list_new_new = []
count = 0
last = ''

# keep the first three records from every subject that has at least three
# (this relies on records_list_new being grouped by subject)
for record in records_list_new:
  sub = record.split('/')[1].split('.')[0][:-3]
  sub_f = sub_freq[sub]
  if sub_f >= 3 and count < 3: 
    records_list_new_new.append(record)
    count += 1
    last = sub
  elif sub_f >= 3 and sub != last:
    # a new subject has started, so restart the counter
    count = 0
    records_list_new_new.append(record)
    count += 1
    last = sub

In [None]:
sub_freq2 = dict()

# count how many records were selected for each subject
for record in records_list_new_new:
  sub = record.split('/')[1].split('.')[0][:-3]
  sub_freq2[sub] = sub_freq2.get(sub, 0) + 1

In [None]:
sub_freq2

In [None]:
# latest and earliest first-seizure onset (in seconds) across the selected records
# (named max_onset/min_onset to avoid shadowing the built-in max and min)
max_onset = 0
min_onset = 1000000
for record in records_list_new_new:
  temp = part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window'][0]
  if temp > max_onset:
    max_onset = temp
  if temp < min_onset:
    min_onset = temp

In [None]:
max_onset

In [None]:
min_onset

Now, using the function below, I can download the data and load it into a pandas dataframe.

In [None]:
%%time
import pandas as pd
import numpy as np
import pyedflib

def data_load(file, selected_channels=[]):

  try: 
    url = "https://physionet.org/physiobank/database/chbmit/"+file
    filename = "./chbmit.edf"

    urlretrieve(url,filename)
    # use the reader to get an EdfReader file
    f = pyedflib.EdfReader(filename)
    os.remove(filename)
    
    # get a list of the EEG channels
    if len(selected_channels) == 0:
      selected_channels = f.getSignalLabels()

    # get the names of the signals
    channel_names = f.getSignalLabels()
    # get the sampling frequencies of each signal
    channel_freq = f.getSampleFrequencies()

    # make an empty file of 0's
    sigbufs = np.zeros((f.getNSamples()[0],len(selected_channels)))
    # for each of the channels in the selected channels
    for i, channel in enumerate(selected_channels):
      # add the channel data into the array
      sigbufs[:, i] = f.readSignal(channel_names.index(channel))
    
    # turn to a pandas df and save a little space
    df = pd.DataFrame(sigbufs, columns = selected_channels).astype('float32')
    
    # get equally increasing numbers upto the length of the data depending
    # on the length of the data divided by the sampling frequency
    index_increase = np.linspace(0,
                                 len(df)/channel_freq[0],
                                 len(df), endpoint=False)

    # round down to whole seconds so every sample is labelled with its second
    seconds = np.floor(index_increase).astype('uint16')

    # make a column the timestamp
    df['Time'] = seconds

    # make the time stamp the index
    #df = df.set_index('Time')

    # name the columns as channel
    #df.columns.name = 'Channel'

    return df, channel_freq[0]

  except (OSError, ValueError):
    # the download or read failed, or a requested channel is not in the file
    return pd.DataFrame(), None


raw_data, freq = data_load(EXAMPLE_FILE, channel_keeps)
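The `Time` column built inside `data_load` labels every sample with the whole second it belongs to. The sketch below reproduces just that step on toy numbers (three seconds at 256 samples per second) so the result is easy to check.

```python
import numpy as np

fs = 256             # samples per second, as in the CHB-MIT recordings
n_samples = 3 * fs   # three seconds of toy data

# evenly spaced time stamps from 0 up to (but not including) 3 seconds
t = np.linspace(0, n_samples / fs, n_samples, endpoint=False)
# round down so each sample carries the index of the second it falls in
seconds = np.floor(t).astype('uint16')

print(np.bincount(seconds))   # number of samples labelled with each second
```

One thing to be aware of is that `uint16` tops out at 65535, so this dtype only works for recordings shorter than that many seconds.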

In [None]:
channel_keeps

In [None]:
display(raw_data)#[channel_keeps[0]])

In [None]:
from tqdm import tqdm

x_data = np.zeros((len(sub_freq2), 18, 256*20, len(channel_keeps)))
y_data = np.zeros((len(sub_freq2), 18))

countt = 0 
sub_count = 0

for record in tqdm(records_list_new_new):
  if countt >= 3:
    countt = 0
    sub_count += 1
  raw_data, freq = data_load(record, channel_keeps)
  if freq != 256:
    print('ERROR')
    break
  # start and end (in seconds) of the first seizure in this file
  window = part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window']
  # seizure: 20 second segments at the start, middle, and end of the seizure
  # (the middle segment is offset from the seizure start, not from the file start)
  start_s = window[0]
  mid_s = window[0] + int((window[1] - window[0])/2) - 10
  end_s = window[1] - 20
  # No seizure: 20 second segments at the start, middle, and end of the pre-seizure period
  start = 0
  mid = int(window[0]/2) - 10
  end = window[0] - 1 - 20

  for i, channel in enumerate(channel_keeps):
    # No seizure
    x_data[sub_count, 0 + 6*countt, :, i] = np.array(raw_data[raw_data['Time'].between(start+1,start+20)][channel].tolist())
    y_data[sub_count, 0 + 6*countt] = 0
    x_data[sub_count, 1 + 6*countt, :, i] = np.array(raw_data[raw_data['Time'].between(mid+1,mid+20)][channel].tolist())
    y_data[sub_count, 1 + 6*countt] = 0
    x_data[sub_count, 2 + 6*countt, :, i] = np.array(raw_data[raw_data['Time'].between(end+1,end+20)][channel].tolist())
    y_data[sub_count, 2 + 6*countt] = 0
    # seizure
    x_data[sub_count, 3 + 6*countt, :, i] = np.array(raw_data[raw_data['Time'].between(start_s+1,start_s+20)][channel].tolist())
    y_data[sub_count, 3 + 6*countt] = 1
    x_data[sub_count, 4 + 6*countt, :, i] = np.array(raw_data[raw_data['Time'].between(mid_s+1,mid_s+20)][channel].tolist())
    y_data[sub_count, 4 + 6*countt] = 1
    x_data[sub_count, 5 + 6*countt, :, i] = np.array(raw_data[raw_data['Time'].between(end_s+1,end_s+20)][channel].tolist())
    y_data[sub_count, 5 + 6*countt] = 1

  countt += 1
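The segment arithmetic is easier to follow on a single worked case. The helper below is hypothetical (it is not part of the notebook) and assumes the middle seizure segment is meant to be centred on the midpoint of the seizure; it returns the inclusive second labels that would be passed to `between`.

```python
def segment_bounds(seiz_start, seiz_end, length=20):
    """Hypothetical helper: inclusive (first, last) second labels of the six segments."""
    half = length // 2
    starts = {
        "pre_start": 0,
        "pre_mid": seiz_start // 2 - half,
        "pre_end": seiz_start - 1 - length,
        "ictal_start": seiz_start,
        "ictal_mid": seiz_start + (seiz_end - seiz_start) // 2 - half,
        "ictal_end": seiz_end - length,
    }
    # between() is inclusive at both ends, so s+1 .. s+length spans `length` seconds
    return {name: (s + 1, s + length) for name, s in starts.items()}

# toy seizure running from second 2996 to second 3036
bounds = segment_bounds(2996, 3036)
for name, (first, last) in bounds.items():
    print(f"{name:12s} seconds {first}-{last}")
```

Each range covers 20 second labels, which at 256 samples per second gives the 5120 samples that the third dimension of `x_data` expects.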

In [None]:
x_data

In [None]:
y_data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pickle
with open('/content/drive/MyDrive/EMOTION/VQVAE-trans/Trans-checkpoints-py/x_data_ss.pkl', 'wb') as filepath:
      pickle.dump(x_data, filepath)
with open('/content/drive/MyDrive/EMOTION/VQVAE-trans/Trans-checkpoints-py/y_data_ss.pkl', 'wb') as filepath:
      pickle.dump(y_data, filepath)

In [None]:
sub_count 

## Plot Data

Now let's plot the data. We will use the dictionary we made earlier to annotate where the seizures are in the record.

In [None]:
import mne

def mne_object(data, freq, events = None):
  # drop the 'Time' column added by data_load so that only EEG channels remain
  data = data.drop(columns=['Time'], errors='ignore')

  # create an mne info file with meta data about the EEG
  info = mne.create_info(ch_names=list(data.columns), 
                         sfreq=freq, 
                         ch_types=['eeg']*data.shape[-1])
  
  # data needs to be in volts rather than in microvolts
  data = data.apply(lambda x: x*1e-6)
  # transpose the data
  data_T = data.transpose()
  
  # create raw mne object
  raw = mne.io.RawArray(data_T, info)

  if events:
    start_times = np.array(events[::2])
    end_times = np.array(events[1::2])
    anno_length = end_times-start_times
    event_name = np.array(['Ictal']*len(anno_length))

    raw.set_annotations(mne.Annotations(start_times,
                                      anno_length,
                                      event_name))

  return raw

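mne_data = mne_object(raw_data, freq, part_info_dict[EXAMPLE_ID]['Seizures Window'])

# plot_kwargs is not defined anywhere in this notebook; the values below are
# illustrative assumptions (not the author's settings) so that the plots run
plot_kwargs = {'scalings': 'auto', 'show_scrollbars': False}

mne_data.plot(start = 50, 
              duration = 30, **plot_kwargs);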

seiz_start_time = part_info_dict[EXAMPLE_ID]['Seizures Window'][0]
mne_data.plot(start = seiz_start_time, 
              duration = 30, **plot_kwargs);
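The annotation step in `mne_object` relies on the seizure window being a flat list of alternating start and end times. Here is a minimal sketch of that slicing on toy values, using plain lists rather than numpy arrays.

```python
# flat list as stored under 'Seizures Window': start, end, start, end, ...
events = [2996, 3036, 4000, 4050]   # toy values in seconds

onsets = events[::2]     # every even position is a seizure start
ends = events[1::2]      # every odd position is a seizure end
durations = [end - onset for onset, end in zip(onsets, ends)]

print(onsets, durations)
```

The onsets and durations are what get passed on as the annotation start times and lengths.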

Before we look at random segments again, let's take a second to look at the electrode placement, as this is the first *extracranial* dataset (meaning the electrodes sit on the scalp rather than being implanted beneath it).

Scalp EEG is typically gained through placing 21-256 Ag/AgCl electrodes on the scalp, to enable the measurement of the electrical potential between spatially different electrodes. One electrode is dedicated as a *reference* during recording and another as a *ground*. The terms “ground” and “reference” are sometimes used interchangeably, although they refer to separate processes. 

**Ground Electrode**. 
A ground electrode is a common reference for the system voltage that aims to cancel out the common-mode interference that occurs from the body naturally picking up electromagnetic interference; particularly around 50/60Hz due to power lines. Unless recording takes place in a Faraday cage, this interference often needs to be filtered out during pre-processing (see next notebook) if not already conducted at time of recording by the amplifier. The ground electrode can be placed anywhere on the body, although the forehead or the ear are the most common<sup>1</sup>. 
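As a rough illustration of filtering out mains interference, the sketch below removes a 50Hz component from a toy signal with a notch filter. The signal, the quality factor `Q=30`, and the helper `amplitude_at` are all illustrative assumptions; this is not necessarily the approach taken in the next notebook.

```python
import numpy as np
from scipy import signal

fs = 256                                    # sampling rate in Hz
t = np.arange(0, 4, 1 / fs)                 # four seconds of samples
eeg_like = np.sin(2 * np.pi * 10 * t)       # toy 10Hz rhythm
mains = 0.5 * np.sin(2 * np.pi * 50 * t)    # toy 50Hz interference
noisy = eeg_like + mains

# narrow notch centred on 50Hz, applied forwards and backwards (zero phase)
b, a = signal.iirnotch(w0=50, Q=30, fs=fs)
cleaned = signal.filtfilt(b, a, noisy)

def amplitude_at(x, f_hz):
    """Amplitude of the FFT bin closest to f_hz."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return spectrum[np.argmin(np.abs(freqs - f_hz))]

print("50Hz before:", amplitude_at(noisy, 50), "after:", amplitude_at(cleaned, 50))
print("10Hz before:", amplitude_at(noisy, 10), "after:", amplitude_at(cleaned, 10))
```

The 50Hz component should shrink substantially while the 10Hz rhythm is left almost untouched.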

**Reference Electrode**. 
A reference electrode aims to remove unspecific brain activity by representing the electrical potential between an active electrode of interest and a relatively inactive reference. A reference electrode is also still affected by global voltage changes as it is collected against the signal ground. Referencing can be done either by using a physical reference electrode placed on the earlobe, using any electrode during recording and later re-referencing electrodes to the average output of all electrodes, or by measuring potential between two active electrodes (bipolar recording)<sup>2</sup>. The combination of an active electrode with a reference and a ground creates a *channel*, and the general configuration of these channels is called a *montage*.

The channels here use a bipolar montage. For ease of plotting, we will rename each channel to the first electrode in its pair.
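The two referencing schemes mentioned above can be sketched in a few lines of numpy. The electrode values are random toy numbers and the dictionary layout is my own; only the arithmetic is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy potentials (microvolts) at three electrodes, all against one common reference
electrodes = {
    "FP1": rng.normal(size=5),
    "F7": rng.normal(size=5),
    "T7": rng.normal(size=5),
}

# bipolar derivations: the difference between two active electrodes
bipolar = {
    "FP1-F7": electrodes["FP1"] - electrodes["F7"],
    "F7-T7": electrodes["F7"] - electrodes["T7"],
}

# average reference: subtract the mean over electrodes at every sample
stacked = np.vstack(list(electrodes.values()))
average_ref = stacked - stacked.mean(axis=0)

print(bipolar["FP1-F7"])
print(average_ref.sum(axis=0))   # sums to (numerically) zero at every sample
```

A channel name such as `FP1-F7` therefore describes the pair of electrodes whose difference is being recorded.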

---

1. Light2010

2. Varsavsky2011

3. Teplan2002

In [None]:
replace_dict = {}
drop_list = []
# for the channel names in the data...
for channel_name in mne_data.info['ch_names']:
    # get the name to change to
    name_change = re.findall(r'\w+',channel_name)[0].title()
    # check if it is already in the change list
    if name_change in list(replace_dict.values()):
        drop_list.append(channel_name)
    else:
        # if it is not already there, store the original name and what we want to 
        # change it to
        replace_dict[channel_name] = name_change

# drop the ones that would be repeats
mne_data.drop_channels(drop_list)
# rename the channels
mne_data.rename_channels(replace_dict)
# set the standard montage
mne_data.set_montage('standard_1020')

Now that we have set the names and montage, let's plot it.

In [None]:
import matplotlib.pyplot as plt

mne_data.plot_sensors(kind='topomap', show_names=True, to_sphere=True);
fig = mne_data.plot_sensors(kind='3d', show_names=True, show=False)
# rotate the 3D axes for a clearer view of the head
fig.gca().view_init(azim=70, elev=15)
plt.show()

EEG has more than just temporal (changes over time) information; it has spatial information as well. To demonstrate, let's look at how the signal changes over the head before and during a seizure. 

We will go into different ways of breaking a signal down in more detail in the next notebook, so don't worry if you are not familiar with the Welch method I use here. The basic takeaway is that there is generally more activity across the scalp during a seizure!

**NOTES**
- If you are familiar with the Welch method, we are just looking at the average power between 1 and 40Hz (`scaling='spectrum'` returns a power spectrum rather than a density).
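For anyone wondering where the 1 to 40Hz band comes from, the sketch below runs the same Welch settings on random toy data and prints the frequencies that the slice `4:160` covers. The toy array and random seed are assumptions for illustration only.

```python
import numpy as np
from scipy import signal

fs = 256
win = 4 * fs                            # four second segments give 0.25Hz resolution
rng = np.random.default_rng(0)
toy = rng.normal(size=(2, 30 * fs))     # two toy channels, 30 seconds each

freqs, psd = signal.welch(toy, fs, nperseg=win, scaling='spectrum')

band = freqs[4:160]                     # the bins that get averaged
print(band[0], band[-1])                # lowest and highest frequency in the band
band_power = psd[:, 4:160].mean(axis=1) # one average value per channel
print(band_power.shape)
```

With four second windows the bins are 0.25Hz apart, so bin 4 is 1Hz and bin 159 is 39.75Hz.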

In [None]:
from scipy import signal
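def ave_freq(data):
    # 4 second windows give a frequency resolution of 0.25Hz
    win = int(4 * freq)
    freqs, psd = signal.welch(data, freq, nperseg=win, scaling='spectrum')
    # bins 4 to 159 cover 1Hz to 39.75Hz
    return psd[:,4:160].mean(1)

# cast to int so the sample indices below are valid slice bounds
sfreq = int(freq)
inter_array = mne_data[:, 50*sfreq:80*sfreq][0]
ictal_array = mne_data[:, (seiz_start_time*sfreq):(seiz_start_time*sfreq)+30*sfreq][0]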
topo_df = pd.DataFrame([ave_freq(inter_array),ave_freq(ictal_array)], index=['inter', 'ictal'])

fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(15,5))

axs = axs.flatten()
for i, data_class in enumerate(topo_df.T):
    topo, cn = mne.viz.plot_topomap(topo_df.loc[data_class],
                                    mne_data.info,
                                    show=False,
                                    sensors=False,
                                    names=mne_data.info['ch_names'], 
                                    show_names=True,
                                    axes = axs[i],
                                    vmin = topo_df.values.min(),
                                    vmax = topo_df.values.max())
    axs[i].set_title(data_class)
    
fig.show()

Now, as we have done with the other datasets, let's randomly plot different parts of the data to see more examples.

In [None]:
import random

files_with_seizures = []
for file_id in part_info_dict:
    # if there is something in the seizure window
    if part_info_dict[file_id]['Seizures Window']:
        files_with_seizures.append(file_id)

sampled_file = random.sample(files_with_seizures, 1)[0]
# the directory is the first five characters (e.g. 'chb17a_03' is stored under 'chb17')
sampled_file_path = sampled_file[:5]+'/'+sampled_file+'.edf'
raw_data, freq = data_load(sampled_file_path, channel_keeps)
mne_data = mne_object(raw_data, freq, part_info_dict[sampled_file]['Seizures Window'])

print(color.BOLD+color.UNDERLINE+sampled_file+color.END)

mne_data.plot(start = 50, 
              duration = 30, **plot_kwargs);

mne_data.plot(start = part_info_dict[sampled_file]['Seizures Window'][0], 
              duration = 30, **plot_kwargs);

# CHB-MIT Scalp EEG Database Inter Vs Ictal

The CHB-MIT dataset<sup>1</sup>, consists of records from 23 patients; with one case (chb21) taken from the same patient (chb01) 1.5 years later. The dataset was collected by investigators at the Children’s Hospital Boston and Massachusetts Institute of Technology (MIT). The median length of collection was for 36 hours with small gaps between records each hour due to hardware limitations.

The data contains 198 seizures of various types (focal, lateral, and generalised seizures). All signals were recorded at 256 samples per second with most files containing 23 EEG signals positioned using the International 10-20 system (as we will see later). 

This dataset is one of the most prominent datasets in the literature, as it provides long, continuous recordings for each patient, allowing for both patient specific and patient general models to be developed and tested.


| Subject   | Age/Gender | Seizure Events | Total Ictal Time (secs) | Total Inter-ictal Time (secs) |
|-----------|------------|----------------|------------------|----------------------|
| chb01/chb21 | 11, 13 (F) | 11 | 641  | 263461 |
| chb02       | 11 (M)     | 3  | 172  | 126751 |
| chb03       | 14 (F)     | 7  | 402  | 136366 |
| chb04       | 22 (M)     | 4  | 378  | 561414 |
| chb05       | 7 (F)      | 5  | 558  | 139813 |
| chb06       | 1.5 (F)    | 10 | 153  | 240075 |
| chb07       | 14.5 (F)   | 3  | 325  | 241044 |
| chb08       | 3.5 (M)    | 5  | 919  | 71084  |
| chb09       | 10 (F)     | 4  | 276  | 244043 |
| chb10       | 3 (M)      | 7  | 447  | 179612 |
| chb11       | 12 (F)     | 3  | 806  | 124416 |
| chb12       | 2 (F)      | 27 | 989  | 73466  |
| chb13       | 3 (F)      | 12 | 535  | 118232 |
| chb14       | 9 (F)      | 8  | 169  | 93405  |
| chb15       | 16 (M)     | 20 | 1992 | 142004 |
| chb16       | 7 (F)      | 10 | 84   | 68297  |
| chb17       | 12 (F)     | 3  | 293  | 75310  |
| chb18       | 18 (F)     | 6  | 317  | 127932 |
| chb19       | 19 (F)     | 3  | 236  | 107480 |
| chb20       | 6 (F)      | 8  | 294  | 99043  |
| chb22       | 9 (F)      | 3  | 204  | 111376 |
| chb23       | 6 (F)      | 7  | 424  | 95177  |
| chb24       | NR (NR)    | 16 | 511  | 76134  |
| **Total**   | -          | **185**| **11125**| **3515935**|

**NOTE**
- You may have noticed that the table above totals only 185 seizures rather than 198. That's because the method I use to load the data into Python does not work on a select few files. Excluding files 27, 28, and 29 reduces the number of seizure events for patient 12 from 40 to 27.
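
To put the totals in perspective, the share of recorded time that is ictal can be worked out from the **Total** row. This is only a quick arithmetic sketch using the two totals from the table; the variable names are my own.

```python
# Totals taken from the "Total" row of the table (in seconds)
total_ictal_secs = 11125
total_interictal_secs = 3515935

# fraction of all recorded time that is seizure activity
ictal_fraction = total_ictal_secs / (total_ictal_secs + total_interictal_secs)
# seconds of inter-ictal data available per second of ictal data
interictal_per_ictal = total_interictal_secs / total_ictal_secs

print(f"Ictal share of recording time: {ictal_fraction:.2%}")
print(f"Inter-ictal seconds per ictal second: {interictal_per_ictal:.0f}")
```

Seizure activity makes up well under 1% of the recording time, so class imbalance is worth keeping in mind when building classifiers on this data.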

---
1. Shoeb2009

## Data Information
The dataset is stored on PhysioNet, which has some helpful tools for accessing the data. We are going to use one such package (wfdb) to get a list of the records in the dataset.

In [None]:
import wfdb 

dbs = wfdb.get_dbs()

records_list = wfdb.io.get_record_list('chbmit', records='all')
records_list[:5]

In [None]:
records_list

Using the above, let's get a list of the unique directory names.

In [None]:
part_codes = sorted(list(set([record.split('/')[0] for record in records_list])))
part_codes

Each patient has an information file associated with it. Let's load one in and see what it looks like before we parse it into something more useful.

In [None]:
import os
from urllib.request import urlretrieve

def get_content(part_code):
  url = "https://physionet.org/physiobank/database/chbmit/"+part_code+'/'+part_code+'-summary.txt'
  filename = "./chbmit.txt"

  urlretrieve(url,filename)

  # read the file into a list
  with open(filename, encoding='UTF-8') as f:
      # read all the document into a list of strings (each line a new string)
      content = f.readlines()

  # delete the temporary download once the file handle is closed
  os.remove(filename)

  return content


In [None]:
get_content(part_codes[23])#[6]

Taking the above, the function below parses this file into a Python dictionary that we can use later. See the output for an example of what it looks like.

In [None]:
import re
part_info_dict = {}

def info_dict(content):
  
  line_nos=len(content)
  line_no=1
  count = 0

  channels = []
  file_name = []
  file_info_dict={}

  for line in content:

    # if there is Channel in the line...
    if re.findall('Channel \d+', line):
      # split the line into channel number and channel reference
      channel = line.split(': ')
      # get the channel reference and remove any new lines
      channel = channel[-1].replace("\n", "")
      # put into the channel list
      channels.append(channel)

    # if the line is the file name
    elif re.findall('File Name', line):
      # if there is already a file_name
      if file_name:
        # flush the current file info to it
        if file_info_dict['Seizures Window'] or not(count):
          count = 1
          part_info_dict[file_name] = file_info_dict

      # get the file name
      file_name = re.findall('\w+\d+_\d+|\w+\d+\w+_\d+', line)[0]

      file_info_dict = {}
      # put the channel list in the file info dict and remove duplicates
      file_info_dict['Channels'] = list(set(channels))
      # reset the rest of the options
      file_info_dict['Start Time'] = ''
      file_info_dict['End Time'] = ''
      file_info_dict['Seizures Window'] = []

    # if the line is about the file start time
    elif re.findall('File Start Time', line):
      # get the start time
      file_info_dict['Start Time'] = re.findall('\d+:\d+:\d+', line)[0]

    # if the line is about the file end time
    elif re.findall('File End Time', line):
      # get the end time
      file_info_dict['End Time'] = re.findall('\d+:\d+:\d+', line)[0]

    elif re.findall('Seizure Start Time|Seizure End Time|Seizure \d+ Start Time|Seizure \d+ End Time', line):
      file_info_dict['Seizures Window'].append(int(re.findall('\d+', line)[-1]))

    # if last line in the list...
    if line_no == line_nos:
      # flush the file info to it
      if file_info_dict['Seizures Window'] or not(count):
          count = 1
          part_info_dict[file_name] = file_info_dict

    line_no+=1
    
        
for part_code in part_codes:
  content = get_content(part_code)
  info_dict(content)
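
To make the seizure-window part of the parser easier to follow, here is a minimal sketch that applies the same idea to a handful of lines. The `sample_lines` list is an illustrative stand-in written in the style of a summary file, and `seizure_pattern`, `window` and `pairs` are names I made up for this example; it is not a replacement for `info_dict`.

```python
import re

# Illustrative lines in the style of a '-summary.txt' file (assumed sample)
sample_lines = [
    "File Name: chb01_03.edf",
    "File Start Time: 13:43:04",
    "File End Time: 14:43:04",
    "Number of Seizures in File: 1",
    "Seizure Start Time: 2996 seconds",
    "Seizure End Time: 3036 seconds",
]

# matches both "Seizure Start Time" and "Seizure 1 Start Time" style lines
seizure_pattern = re.compile(r"Seizure (?:\d+ )?(?:Start|End) Time")

# the last number on each matching line is the time in seconds
window = [int(re.findall(r"\d+", line)[-1])
          for line in sample_lines if seizure_pattern.search(line)]

# the flat list alternates start, end, start, end, ... so pair them up
pairs = list(zip(window[::2], window[1::2]))
print(window, pairs)
```

The flat `[start, end, start, end, ...]` layout is what ends up under the `'Seizures Window'` key, which is why later cells index it with `[0]` and `[1]`.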

In [None]:
print(color.BOLD+color.UNDERLINE+'part_info_dict'+color.END)
display(part_info_dict['chb24_13'])
print(color.UNDERLINE+'\nPart Keys'+color.END)
print(part_info_dict[list(part_info_dict.keys())[0]].keys())

In [None]:
# part_info_dict_new = dict()

# for i in range(len(records_list)):
#   filee1 = records_list[i].split('/')[1].split('.')[0]
#   filee2 = records_list[i-1].split('/')[1].split('.')[0]
#   #print(filee)
#   try:
#     num = int(records_list[i].split('/')[1].split('.')[0][-2:])
#     num2 = int(records_list[i-1].split('/')[1].split('.')[0][-2:])
#     check1 = part_info_dict[filee1]
#     check2 = part_info_dict[filee2]
#   except:
#     continue
#   if part_info_dict[filee1]['Seizures Window'] and num > 2:
#     part_info_dict_new[filee2] = part_info_dict[filee2]
#     part_info_dict_new[filee1] = part_info_dict[filee1]

In [None]:
part_info_dict

As can be seen below there is a common set of channels found in ALL patients, but there are also some channels only found in individual patients. This is because sometimes channels were swapped during recording for others. 

In [None]:
import pandas as pd     # dataframes
import re

all_channels = []

for key in part_info_dict.keys():
    all_channels.extend(part_info_dict[key]['Channels'])
    
# turn the list into a pandas series
all_channels = pd.Series(all_channels)

# count how many files each channel appears in
channel_counts = all_channels.value_counts()
channel_counts

To deal with the fact that some channels are only found in individual patients, I tend to keep only the channels found in all patients. This makes generalising models across patients easier; however, if you are only training models to identify a particular patient's seizures, you wouldn't need to do this.

In [None]:
threshold = len(part_info_dict.keys())
channel_keeps = list(channel_counts[channel_counts >= threshold].index)
channel_keeps
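
The thresholding step can be tried on a toy example without downloading anything. This is a sketch using only the standard library rather than pandas; `toy_info` is a made-up dictionary shaped like `part_info_dict`, and the channel labels in it are placeholders.

```python
from collections import Counter

# Hypothetical stand-in for part_info_dict: three files, one odd channel
toy_info = {
    "fileA": {"Channels": ["FP1-F7", "F7-T7", "T7-P7"]},
    "fileB": {"Channels": ["FP1-F7", "F7-T7", "ECG"]},
    "fileC": {"Channels": ["FP1-F7", "F7-T7", "T7-P7"]},
}

# count the number of files each channel label appears in
counts = Counter(ch for info in toy_info.values() for ch in info["Channels"])

# keep only the channels present in every file
common = sorted(ch for ch, n in counts.items() if n >= len(toy_info))
print(common)
```

Only the labels seen in all three toy files survive, which mirrors how `channel_keeps` is built from `channel_counts`.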

## Load Data
Let's now load in some example data. First, let's choose a file.

In [None]:
records_list_new = []

for record in records_list:
  try:
    # e.g. 'chb01/chb01_03.edf' -> 'chb01_03'
    file_id = record.split('/')[1].split('.')[0]
    seizure_window = part_info_dict[file_id]['Seizures Window']
  except (IndexError, KeyError):
    # skip records that have no entry in part_info_dict
    continue
  # keep seizure-free files, and files whose first seizure lasts at least 60 seconds
  if not seizure_window or seizure_window[1] - seizure_window[0] >= 30*2:
    records_list_new.append(record)

In [None]:
len(records_list_new)

In [None]:
EXAMPLE_FILE = records_list_new[0]
EXAMPLE_ID = EXAMPLE_FILE.split('/')[1].split('.')[0]
EXAMPLE_ID

In [None]:
sub_freq = dict()

for record in records_list_new:
  if record.split('/')[1].split('.')[0][:-3] in sub_freq:
    sub_freq[record.split('/')[1].split('.')[0][:-3]] += 1
  else:
    sub_freq[record.split('/')[1].split('.')[0][:-3]] = 1

In [None]:
sub_freq

In [None]:
records_list_new_new = []
count = 0
last = ''

for record in records_list_new:
  sub_f = sub_freq[record.split('/')[1].split('.')[0][:-3]]
  sub = record.split('/')[1].split('.')[0][:-3]
  if sub_f >= 4 and count < 4: 
    records_list_new_new.append(record)
    count += 1
    last = sub
  elif sub_f >= 4 and sub != last:
    count = 0
    records_list_new_new.append(record)
    count += 1
    last = sub

In [None]:
sub_freq2 = dict()

for record in records_list_new_new:
  if record.split('/')[1].split('.')[0][:-3] in sub_freq2:
    sub_freq2[record.split('/')[1].split('.')[0][:-3]] += 1
  else:
    sub_freq2[record.split('/')[1].split('.')[0][:-3]] = 1

In [None]:
sub_freq2

In [None]:
# track the latest and earliest first-seizure onset (in seconds) across the kept records;
# named max_onset/min_onset so the built-in max() and min() are not shadowed
max_onset = 0
min_onset = 1000000
for record in records_list_new_new:
  seizure_window = part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window']
  if not seizure_window:
    continue
  temp = seizure_window[0]
  if temp > max_onset:
    max_onset = temp
  if temp < min_onset:
    min_onset = temp

In [None]:
max_onset

In [None]:
min_onset

Now using the function below I can download the data and then load it into a pandas dataframe

In [None]:
%%time
import pandas as pd
import numpy as np
import pyedflib

def data_load(file, selected_channels=[]):

  try: 
    url = "https://physionet.org/physiobank/database/chbmit/"+file
    filename = "./chbmit.edf"

    urlretrieve(url,filename)
    # use the reader to get an EdfReader file
    f = pyedflib.EdfReader(filename)
    os.remove(filename)
    
    # get a list of the EEG channels
    if len(selected_channels) == 0:
      selected_channels = f.getSignalLabels()

    # get the names of the signals
    channel_names = f.getSignalLabels()
    # get the sampling frequencies of each signal
    channel_freq = f.getSampleFrequencies()

    # make an empty file of 0's
    sigbufs = np.zeros((f.getNSamples()[0],len(selected_channels)))
    # for each of the channels in the selected channels
    for i, channel in enumerate(selected_channels):
      # add the channel data into the array
      sigbufs[:, i] = f.readSignal(channel_names.index(channel))
    
    # turn to a pandas df and save a little space
    df = pd.DataFrame(sigbufs, columns = selected_channels).astype('float32')
    
    # get equally increasing numbers upto the length of the data depending
    # on the length of the data divided by the sampling frequency
    index_increase = np.linspace(0,
                                 len(df)/channel_freq[0],
                                 len(df), endpoint=False)

    # round these to the lowest nearest decimal to get the seconds
    seconds = np.floor(index_increase).astype('uint16')

    # make a column the timestamp
    df['Time'] = seconds

    # make the time stamp the index
    #df = df.set_index('Time')

    # name the columns as channel
    #df.columns.name = 'Channel'

    return df, channel_freq[0]

  # download or file-reading problems: return an empty frame so callers can check
  except OSError:
    return pd.DataFrame(), None


raw_data, freq = data_load(EXAMPLE_FILE, channel_keeps)

In [None]:
channel_keeps

In [None]:
display(raw_data)#[channel_keeps[0]])

In [None]:
from tqdm import tqdm

x_data = np.zeros((len(sub_freq2), 18, 256*20, len(channel_keeps)))
y_data = np.zeros((len(sub_freq2), 18))

countt = 0 
countt_s = 0
sub_count = 0

for record in tqdm(records_list_new_new):
  if countt >= 4:
    countt = 0
    countt_s = 0
    sub_count += 1
  raw_data, freq = data_load(record, channel_keeps)
  if freq != 256:
    print('ERROR')
    break
  # seizure
  if part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window']:
      # seizure
      # centre a 20-second window on the seizure midpoint (absolute time in the file)
      mid_s = int((part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window'][0] + part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window'][1])/2) - 10
      start_s = part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window'][0]
      end_s = part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window'][1] - 20
      for i, channel in enumerate(channel_keeps):
        # seizure
        x_data[sub_count, 0 + 3*countt_s, :, i] = np.array(raw_data[raw_data['Time'].between(start_s+1,start_s+20)][channel].tolist())
        y_data[sub_count, 0 + 3*countt_s] = 1
        x_data[sub_count, 1 + 3*countt_s, :, i] = np.array(raw_data[raw_data['Time'].between(mid_s+1,mid_s+20)][channel].tolist())
        y_data[sub_count, 1 + 3*countt_s] = 1
        x_data[sub_count, 2 + 3*countt_s, :, i] = np.array(raw_data[raw_data['Time'].between(end_s+1,end_s+20)][channel].tolist())
        y_data[sub_count, 2 + 3*countt_s] = 1
      countt_s += 1
  # No seizure
  else:
      # No seizure
      timee = 3600
      start = 0
      end = timee - 1 - 20
      values = np.linspace(start,end,num=9, dtype = int)
      #print(values)
      # fill slots 9-17 with nine evenly spaced 20-second inter-ictal windows
      for j, value in enumerate(values):
        window = raw_data[raw_data['Time'].between(value+1, value+20)]
        x_data[sub_count, 9 + j, :, :] = window[channel_keeps].to_numpy()
        y_data[sub_count, 9 + j] = 0

  countt += 1
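
The windows above rely on the integer `Time` column: selecting seconds `start+1` through `start+20` inclusive should give exactly `256*20` samples, which is the size of the third axis of `x_data`. The sketch below checks that arithmetic on a synthetic one-minute index in plain Python; `fs`, `seconds` and `selected` are illustrative names, and no EEG data is involved.

```python
fs = 256                      # sampling rate in Hz, as in the dataset
n_samples = 60 * fs           # one synthetic minute of samples

# integer second each sample falls in (same idea as the 'Time' column)
seconds = [i // fs for i in range(n_samples)]

start = 10
# inclusive on both ends, like pandas Series.between(start+1, start+20)
selected = [i for i, s in enumerate(seconds) if start + 1 <= s <= start + 20]

print(len(selected), selected[0], selected[-1])
```

Because both bounds are inclusive, the window covers 20 whole seconds and begins one second after `start`.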

In [None]:
x_data

In [None]:
y_data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pickle
with open('/content/drive/MyDrive/EMOTION/VQVAE-trans/Trans-checkpoints-py/x_data_ss_i.pkl', 'wb') as filepath:
      pickle.dump(x_data, filepath)
with open('/content/drive/MyDrive/EMOTION/VQVAE-trans/Trans-checkpoints-py/y_data_ss_i.pkl', 'wb') as filepath:
      pickle.dump(y_data, filepath)

In [None]:
sub_count 

In [None]:
x_data.shape

## Plot Data

Now let's plot the data. We will use the dictionary we made earlier to annotate where the seizures are in the record.

In [None]:
def mne_object(data, freq, events = None):
  # create an mne info file with meta data about the EEG
  info = mne.create_info(ch_names=list(data.columns), 
                         sfreq=freq, 
                         ch_types=['eeg']*data.shape[-1])
  
  # data needs to be in volts rather than in microvolts
  data = data.apply(lambda x: x*1e-6)
  # transpose the data
  data_T = data.transpose()
  
  # create raw mne object
  raw = mne.io.RawArray(data_T, info)

  if events:
    start_times = np.array(events[::2])
    end_times = np.array(events[1::2])
    anno_length = end_times-start_times
    event_name = np.array(['Ictal']*len(anno_length))

    raw.set_annotations(mne.Annotations(start_times,
                                      anno_length,
                                      event_name))

  return raw

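# drop the helper 'Time' column so it is not treated as an EEG channel
mne_data = mne_object(raw_data.drop(columns='Time', errors='ignore'), freq, part_info_dict[EXAMPLE_ID]['Seizures Window'])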


mne_data.plot(start = 50, 
              duration = 30, **plot_kwargs);

seiz_start_time = part_info_dict[EXAMPLE_ID]['Seizures Window'][0]
mne_data.plot(start = seiz_start_time, 
              duration = 30, **plot_kwargs);
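
The annotation step in `mne_object` turns the flat seizure list into onsets and durations by slicing every other element. A small sketch of that slicing in plain Python, using made-up seizure times (the values in `events` are hypothetical):

```python
# flat [start, end, start, end] list, as stored under 'Seizures Window'
events = [327, 420, 1862, 1963]   # hypothetical times in seconds

onsets = events[::2]              # every even position is a start time
offsets = events[1::2]            # every odd position is an end time
durations = [end - start for start, end in zip(onsets, offsets)]

print(onsets, durations)
```

Onsets and durations are the pieces an annotation needs, one pair per seizure in the file.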

Before we look at random segments again, let's take a second to look at the electrode placement, since this is the first *extracranial* dataset (meaning the electrodes are not implanted under the scalp).

Scalp EEG is typically acquired by placing 21 to 256 Ag/AgCl electrodes on the scalp, which allows the electrical potential between spatially separated electrodes to be measured. One electrode is dedicated as a *reference* during recording and another as a *ground*. The terms “ground” and “reference” are sometimes used interchangeably, although they refer to separate processes. 

**Ground Electrode**. 
A ground electrode is a common reference for the system voltage that aims to cancel out the common-mode interference that occurs from the body naturally picking up electromagnetic interference; particularly around 50/60Hz due to power lines. Unless recording takes place in a Faraday cage, this interference often needs to be filtered out during pre-processing (see next notebook) if not already conducted at time of recording by the amplifier. The ground electrode can be placed anywhere on the body, although the forehead or the ear are the most common<sup>1</sup>. 

**Reference Electrode**. 
A reference electrode aims to remove unspecific brain activity by representing the electrical potential between an active electrode of interest and a relatively inactive reference. A reference electrode is still affected by global voltage changes, as it is collected against the signal ground. Referencing can be done by using a physical reference electrode placed on the earlobe, by using any electrode during recording and later re-referencing electrodes to the average output of all electrodes, or by measuring the potential between two active electrodes (bipolar recording)<sup>2</sup>. The combination of an active electrode with a reference and a ground creates a *channel*, and the general configuration of these channels is called a *montage*.

The channels here use a bipolar montage. For ease of plotting, we will rename each channel to the first electrode in its pair.

---

1. Light2010

2. Varsavsky2011

3. Teplan2002

In [None]:
replace_dict = {}
drop_list = []
# for the channel names in the data...
for channel_name in mne_data.info['ch_names']:
    # get the name to change to
    name_change = re.findall('\w+',channel_name)[0].title()
    # check if it is already in the change list
    if name_change in list(replace_dict.values()):
        drop_list.append(channel_name)
    else:
        # if it's not already there, map the original name to the name we want
        # to change it to
        replace_dict[channel_name] = name_change

# drop the ones that would be repeats
mne_data.drop_channels(drop_list)
# rename the channels
mne_data.rename_channels(replace_dict)
# set the standard montage
mne_data.set_montage('standard_1020')
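
The renaming rule can be tried on a few labels without MNE. The sketch below applies the same regular expression and `.title()` call to an illustrative list of bipolar labels; the `labels` list is made up for the example, and `replace_demo` and `drop_demo` are hypothetical names so they do not clash with the variables above.

```python
import re

# illustrative bipolar labels; the last two mimic repeated channel pairs
labels = ["FP1-F7", "F7-T7", "FP1-F3", "T8-P8-0", "T8-P8-1"]

replace_demo = {}
drop_demo = []
for label in labels:
    # first electrode of the pair, title-cased (e.g. 'FP1' -> 'Fp1')
    first = re.findall(r"\w+", label)[0].title()
    if first in replace_demo.values():
        # a channel already claimed this name, so this one would be dropped
        drop_demo.append(label)
    else:
        replace_demo[label] = first

print(replace_demo)
print(drop_demo)
```

Title-casing matters because the standard 10-20 montage names use mixed case such as `Fp1`, while the recordings use upper case.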

Now that we have set the names and montage, let's plot it.

In [None]:
mne_data.plot_sensors(kind='topomap', show_names=True, to_sphere=True);
fig = mne_data.plot_sensors(kind='3d', show_names=True, show=False)
# view_init returns None, so don't overwrite the figure handle with it
fig.gca().view_init(azim=70, elev=15)
plt.show()

EEG has more than just temporal (changes over time) information; it has spatial information as well. To demonstrate, let's look at how the signal changes over the head before and during a seizure. 

We will go into different ways of breaking a signal down in more detail in the next notebook, so don't worry if you are not familiar with the Welch method I use here. The basic takeaway is that there is generally more going on during a seizure!

**NOTES**
- If you are familiar with the Welch method, we are just looking at the average power spectral density between 1 and 40 Hz.

In [None]:
from scipy import signal
def ave_freq(data):
    win = 4 * freq
    freqs, psd = signal.welch(data, freq, nperseg=win, scaling='spectrum')
    #print(freqs[4:160])
    return psd[:,4:160].mean(1)

inter_array = mne_data[:, 50*freq:80*freq][0]
ictal_array = mne_data[:, (seiz_start_time*freq):(seiz_start_time*freq)+30*freq][0]
topo_df = pd.DataFrame([ave_freq(inter_array),ave_freq(ictal_array)], index=['inter', 'ictal'])

fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(15,5))

axs = axs.flatten()
for i, data_class in enumerate(topo_df.T):
    topo, cn = mne.viz.plot_topomap(topo_df.loc[data_class],
                                    mne_data.info,
                                    show=False,
                                    sensors=False,
                                    names=mne_data.info['ch_names'], 
                                    show_names=True,
                                    axes = axs[i],
                                    vmin = topo_df.values.min(),
                                    vmax = topo_df.values.max())
    axs[i].set_title(data_class)
    
fig.show()
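
The slice `psd[:, 4:160]` in `ave_freq` only corresponds to roughly 1-40 Hz because of the window length chosen. A short sketch of that arithmetic, assuming the 256 Hz sampling rate and the 4-second window used above (all names here are illustrative):

```python
fs = 256                   # sampling rate in Hz
nperseg = 4 * fs           # 4-second Welch window, as in ave_freq
resolution = fs / nperseg  # spacing between frequency bins in Hz

# one-sided spectrum: bins from 0 Hz up to the Nyquist frequency
bin_freqs = [k * resolution for k in range(nperseg // 2 + 1)]

band = bin_freqs[4:160]    # the same slice used on the PSD
print(resolution, band[0], band[-1], len(band))
```

If the window length or sampling rate changed, the index range would have to change with it, so selecting bins by frequency value would be more robust than hard-coding `4:160`.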

Now, as we have done with the other datasets, let's randomly plot different parts of the data to see more examples.

In [None]:
files_with_seizures = []
for file_id in part_info_dict:
    # if there is something in the seizure window
    if part_info_dict[file_id]['Seizures Window']:
        files_with_seizures.append(file_id)

sampled_file = random.sample(files_with_seizures, 1)[0]
sampled_file_path = sampled_file.split('_')[0]+'/'+sampled_file+'.edf'
raw_data, freq = data_load(sampled_file_path, channel_keeps)
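# drop the helper 'Time' column so it is not treated as an EEG channel
mne_data = mne_object(raw_data.drop(columns='Time', errors='ignore'), freq, part_info_dict[sampled_file]['Seizures Window'])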

print(color.BOLD+color.UNDERLINE+sampled_file+color.END)

mne_data.plot(start = 50, 
              duration = 30, **plot_kwargs);

mne_data.plot(start = part_info_dict[sampled_file]['Seizures Window'][0], 
              duration = 30, **plot_kwargs);

# CHB-MIT Scalp EEG Database Inter Vs Pre

The CHB-MIT dataset<sup>1</sup> consists of records from 23 patients, with one case (chb21) taken from the same patient as chb01, 1.5 years later. The dataset was collected by investigators at the Children’s Hospital Boston and the Massachusetts Institute of Technology (MIT). The median recording length was 36 hours, with small gaps between records each hour due to hardware limitations.

The data contains 198 seizures of various types (focal, lateral, and generalised seizures). All signals were recorded at 256 samples per second with most files containing 23 EEG signals positioned using the International 10-20 system (as we will see later). 

This dataset is one of the most prominent in the literature, as it provides long, continuous recordings for each patient, allowing both patient-specific and patient-general models to be developed and tested.


| Subject   | Age/Gender | Seizure Events | Total Ictal Time (secs) | Total Inter-ictal Time (secs) |
|-----------|------------|----------------|------------------|----------------------|
| chb01/chb21 | 11, 13 (F) | 11 | 641  | 263461 |
| chb02       | 11 (M)     | 3  | 172  | 126751 |
| chb03       | 14 (F)     | 7  | 402  | 136366 |
| chb04       | 22 (M)     | 4  | 378  | 561414 |
| chb05       | 7 (F)      | 5  | 558  | 139813 |
| chb06       | 1.5 (F)    | 10 | 153  | 240075 |
| chb07       | 14.5 (F)   | 3  | 325  | 241044 |
| chb08       | 3.5 (M)    | 5  | 919  | 71084  |
| chb09       | 10 (F)     | 4  | 276  | 244043 |
| chb10       | 3 (M)      | 7  | 447  | 179612 |
| chb11       | 12 (F)     | 3  | 806  | 124416 |
| chb12       | 2 (F)      | 27 | 989  | 73466  |
| chb13       | 3 (F)      | 12 | 535  | 118232 |
| chb14       | 9 (F)      | 8  | 169  | 93405  |
| chb15       | 16 (M)     | 20 | 1992 | 142004 |
| chb16       | 7 (F)      | 10 | 84   | 68297  |
| chb17       | 12 (F)     | 3  | 293  | 75310  |
| chb18       | 18 (F)     | 6  | 317  | 127932 |
| chb19       | 19 (F)     | 3  | 236  | 107480 |
| chb20       | 6 (F)      | 8  | 294  | 99043  |
| chb22       | 9 (F)      | 3  | 204  | 111376 |
| chb23       | 6 (F)      | 7  | 424  | 95177  |
| chb24       | NR (NR)    | 16 | 511  | 76134  |
| **Total**   | -          | **185**| **11125**| **3515935**|

**NOTE**
- You may have noticed that the table above totals only 185 seizures rather than 198. That's because the method I use to load the data into Python does not work on a select few files. Excluding files 27, 28, and 29 reduces the number of seizure events for patient 12 from 40 to 27.

---
1. Shoeb2009

## Data Information
The dataset is stored on PhysioNet, which has some helpful tools for accessing the data. We are going to use one such package (wfdb) to get a list of the records in the dataset.

In [None]:
import wfdb 

dbs = wfdb.get_dbs()

records_list = wfdb.io.get_record_list('chbmit', records='all')
records_list[:5]

In [None]:
records_list

Using the above, let's get a list of the unique directory names.

In [None]:
part_codes = sorted(list(set([record.split('/')[0] for record in records_list])))
part_codes

Each patient has an information file associated with it. Let's load one in and see what it looks like before we parse it into something more useful.

In [None]:
import os
from urllib.request import urlretrieve

def get_content(part_code):
  url = "https://physionet.org/physiobank/database/chbmit/"+part_code+'/'+part_code+'-summary.txt'
  filename = "./chbmit.txt"

  urlretrieve(url,filename)

  # read the file into a list
  with open(filename, encoding='UTF-8') as f:
      # read all the document into a list of strings (each line a new string)
      content = f.readlines()

  # delete the temporary download once the file handle is closed
  os.remove(filename)

  return content


In [None]:
get_content(part_codes[23])#[6]

Taking the above, the function below parses this file into a Python dictionary that we can use later. See the output for an example of what it looks like.

In [None]:
import re
part_info_dict = {}

def info_dict(content):
  
  line_nos=len(content)
  line_no=1
  count = 0

  channels = []
  file_name = []
  file_info_dict={}

  for line in content:

    # if there is Channel in the line...
    if re.findall('Channel \d+', line):
      # split the line into channel number and channel reference
      channel = line.split(': ')
      # get the channel reference and remove any new lines
      channel = channel[-1].replace("\n", "")
      # put into the channel list
      channels.append(channel)

    # if the line is the file name
    elif re.findall('File Name', line):
      # if there is already a file_name
      if file_name:
        # flush the current file info to it
        if file_info_dict['Seizures Window'] or not(count):
          count = 1
          part_info_dict[file_name] = file_info_dict

      # get the file name
      file_name = re.findall('\w+\d+_\d+|\w+\d+\w+_\d+', line)[0]

      file_info_dict = {}
      # put the channel list in the file info dict and remove duplicates
      file_info_dict['Channels'] = list(set(channels))
      # reset the rest of the options
      file_info_dict['Start Time'] = ''
      file_info_dict['End Time'] = ''
      file_info_dict['Seizures Window'] = []

    # if the line is about the file start time
    elif re.findall('File Start Time', line):
      # get the start time
      file_info_dict['Start Time'] = re.findall('\d+:\d+:\d+', line)[0]

    # if the line is about the file end time
    elif re.findall('File End Time', line):
      # get the end time
      file_info_dict['End Time'] = re.findall('\d+:\d+:\d+', line)[0]

    elif re.findall('Seizure Start Time|Seizure End Time|Seizure \d+ Start Time|Seizure \d+ End Time', line):
      file_info_dict['Seizures Window'].append(int(re.findall('\d+', line)[-1]))

    # if last line in the list...
    if line_no == line_nos:
      # flush the file info to it
      if file_info_dict['Seizures Window'] or not(count):
          count = 1
          part_info_dict[file_name] = file_info_dict

    line_no+=1
    
        
for part_code in part_codes:
  content = get_content(part_code)
  info_dict(content)

In [None]:
print(color.BOLD+color.UNDERLINE+'part_info_dict'+color.END)
display(part_info_dict['chb24_13'])
print(color.UNDERLINE+'\nPart Keys'+color.END)
print(part_info_dict[list(part_info_dict.keys())[0]].keys())

In [None]:
# part_info_dict_new = dict()

# for i in range(len(records_list)):
#   filee1 = records_list[i].split('/')[1].split('.')[0]
#   filee2 = records_list[i-1].split('/')[1].split('.')[0]
#   #print(filee)
#   try:
#     num = int(records_list[i].split('/')[1].split('.')[0][-2:])
#     num2 = int(records_list[i-1].split('/')[1].split('.')[0][-2:])
#     check1 = part_info_dict[filee1]
#     check2 = part_info_dict[filee2]
#   except:
#     continue
#   if part_info_dict[filee1]['Seizures Window'] and num > 2:
#     part_info_dict_new[filee2] = part_info_dict[filee2]
#     part_info_dict_new[filee1] = part_info_dict[filee1]

In [None]:
part_info_dict

As can be seen below there is a common set of channels found in ALL patients, but there are also some channels only found in individual patients. This is because sometimes channels were swapped during recording for others. 

In [None]:
import pandas as pd     # dataframes
import re

all_channels = []

for key in part_info_dict.keys():
    all_channels.extend(part_info_dict[key]['Channels'])
    
# turn the list into a pandas series
all_channels = pd.Series(all_channels)

# count how many files each channel appears in
channel_counts = all_channels.value_counts()
channel_counts

To deal with the fact that some channels are only found in individual patients, I tend to keep only the channels found in all patients. This makes generalising models across patients easier; however, if you are only training models to identify a particular patient's seizures, you wouldn't need to do this.

In [None]:
threshold = len(part_info_dict.keys())
channel_keeps = list(channel_counts[channel_counts >= threshold].index)
channel_keeps

## Load Data
Let's now load in some example data. First, let's choose a file.

In [None]:
records_list_new = []

for record in records_list:
  try:
    # e.g. 'chb01/chb01_03.edf' -> 'chb01_03'
    file_id = record.split('/')[1].split('.')[0]
    seizure_window = part_info_dict[file_id]['Seizures Window']
  except (IndexError, KeyError):
    # skip records that have no entry in part_info_dict
    continue
  # keep seizure-free files, and files whose first seizure lasts at least 60 seconds
  if not seizure_window or seizure_window[1] - seizure_window[0] >= 30*2:
    records_list_new.append(record)

In [None]:
len(records_list_new)

In [None]:
EXAMPLE_FILE = records_list_new[0]
EXAMPLE_ID = EXAMPLE_FILE.split('/')[1].split('.')[0]
EXAMPLE_ID

In [None]:
sub_freq = dict()

for record in records_list_new:
  if record.split('/')[1].split('.')[0][:-3] in sub_freq:
    sub_freq[record.split('/')[1].split('.')[0][:-3]] += 1
  else:
    sub_freq[record.split('/')[1].split('.')[0][:-3]] = 1

In [None]:
sub_freq

In [None]:
records_list_new_new = []
count = 0
last = ''

for record in records_list_new:
  sub_f = sub_freq[record.split('/')[1].split('.')[0][:-3]]
  sub = record.split('/')[1].split('.')[0][:-3]
  if sub_f >= 4 and count < 4: 
    records_list_new_new.append(record)
    count += 1
    last = sub
  elif sub_f >= 4 and sub != last:
    count = 0
    records_list_new_new.append(record)
    count += 1
    last = sub

In [None]:
sub_freq2 = dict()

for record in records_list_new_new:
  if record.split('/')[1].split('.')[0][:-3] in sub_freq2:
    sub_freq2[record.split('/')[1].split('.')[0][:-3]] += 1
  else:
    sub_freq2[record.split('/')[1].split('.')[0][:-3]] = 1

In [None]:
sub_freq2

In [None]:
# track the latest and earliest first-seizure onset (in seconds) across the kept records;
# named max_onset/min_onset so the built-in max() and min() are not shadowed
max_onset = 0
min_onset = 1000000
for record in records_list_new_new:
  seizure_window = part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window']
  if not seizure_window:
    continue
  temp = seizure_window[0]
  if temp > max_onset:
    max_onset = temp
  if temp < min_onset:
    min_onset = temp

In [None]:
max_onset

In [None]:
min_onset

Now using the function below I can download the data and then load it into a pandas dataframe

In [None]:
%%time
import os
from urllib.request import urlretrieve

import numpy as np
import pandas as pd
import pyedflib

def data_load(file, selected_channels=None):
  """Download one record and return (DataFrame of signals, sampling frequency)."""
  url = "https://physionet.org/physiobank/database/chbmit/"+file
  filename = "./chbmit.edf"

  try:
    urlretrieve(url, filename)
    # use the reader to get an EdfReader file
    f = pyedflib.EdfReader(filename)
    try:
      # get the names of the signals
      channel_names = f.getSignalLabels()
      # default to every channel in the file
      if selected_channels is None or len(selected_channels) == 0:
        selected_channels = channel_names
      # get the sampling frequencies of each signal
      channel_freq = f.getSampleFrequencies()

      # make an empty array of 0's
      sigbufs = np.zeros((f.getNSamples()[0], len(selected_channels)))
      # for each of the channels in the selected channels
      for i, channel in enumerate(selected_channels):
        # add the channel data into the array
        sigbufs[:, i] = f.readSignal(channel_names.index(channel))
    finally:
      # release the file handle once the signals are in memory
      f.close()

    # turn to a pandas df and save a little space
    df = pd.DataFrame(sigbufs, columns=selected_channels).astype('float32')

    # one time value per sample, from 0 up to the record length in seconds
    index_increase = np.linspace(0,
                                 len(df)/channel_freq[0],
                                 len(df), endpoint=False)

    # round down to whole seconds
    seconds = np.floor(index_increase).astype('uint16')

    # store the timestamp as a column
    df['Time'] = seconds

    return df, channel_freq[0]

  except (OSError, ValueError):
    # the download or read failed, or a requested channel is missing
    return pd.DataFrame(), None

  finally:
    # delete the temporary download whether or not loading succeeded
    if os.path.exists(filename):
      os.remove(filename)


raw_data, freq = data_load(EXAMPLE_FILE, channel_keeps)
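
The `Time` column built in `data_load` gives every sample the whole second it falls in. Here is a minimal sketch of the same idea using integer division, assuming a constant sampling frequency. The function name `second_index` and the toy 4 Hz rate are hypothetical and chosen only to keep the output readable.

```python
def second_index(n_samples, sfreq):
    """Return the whole-second timestamp of each sample index."""
    # sample k is recorded k / sfreq seconds in, so flooring gives its second
    return [k // sfreq for k in range(n_samples)]

# three seconds of a signal sampled at 4 Hz (toy values)
seconds = second_index(12, 4)
print(seconds)

# selecting seconds 1..2 inclusive keeps exactly 2 * 4 samples
selected = [k for k, s in enumerate(seconds) if 1 <= s <= 2]
print(len(selected))
```

At 256 Hz the same inclusive selection over 20 consecutive seconds yields 5120 samples, which matches the `256*20` window length used for `x_data`.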

In [None]:
channel_keeps

In [None]:
display(raw_data)

In [None]:
from tqdm import tqdm

# 18 windows per subject, each 20 s long at 256 Hz
x_data = np.zeros((len(sub_freq2), 18, 256*20, len(channel_keeps)))
y_data = np.zeros((len(sub_freq2), 18))

def get_window(df, start):
  # samples whose 'Time' lies in seconds start+1 ... start+20 (inclusive)
  return df.loc[df['Time'].between(start+1, start+20), channel_keeps].to_numpy()

countt = 0     # records processed for the current subject
countt_s = 0   # seizure records processed for the current subject
sub_count = 0  # index of the current subject

for record in tqdm(records_list_new_new):
  # each subject contributes four records, so move on after the fourth
  if countt >= 4:
    countt = 0
    countt_s = 0
    sub_count += 1
  raw_data, freq = data_load(record, channel_keeps)
  if freq != 256:
    print('ERROR')
    break
  seizure_window = part_info_dict[record.split('/')[1].split('.')[0]]['Seizures Window']
  if seizure_window:
    # Pre seizure: windows at the record start, midway, and just before onset
    onset = seizure_window[0]
    starts = [0, int(onset/2) - 10, onset - 1 - 20]
    for k, start in enumerate(starts):
      x_data[sub_count, k + 3*countt_s] = get_window(raw_data, start)
      y_data[sub_count, k + 3*countt_s] = 1
    countt_s += 1
  else:
    # No seizure: nine evenly spaced windows from the first hour, in slots 9-17
    timee = 3600
    values = np.linspace(0, timee - 1 - 20, num=9, dtype=int)
    for k, start in enumerate(values):
      x_data[sub_count, 9 + k] = get_window(raw_data, start)
      y_data[sub_count, 9 + k] = 0

  countt += 1
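
To make the window layout concrete, the sketch below recomputes the start seconds used in the loop above without downloading anything. The onset of 1000 s is a made-up example, and `preictal_starts` and `interictal_starts` are hypothetical helper names rather than functions from this notebook.

```python
def preictal_starts(onset):
    """Start seconds of the three windows taken before a seizure onset."""
    return [0, int(onset / 2) - 10, onset - 1 - 20]

def interictal_starts(record_seconds=3600, n_windows=9, win_seconds=20):
    """Evenly spaced start seconds, truncated to integers."""
    last = record_seconds - 1 - win_seconds
    return [int(i * last / (n_windows - 1)) for i in range(n_windows)]

pre = preictal_starts(1000)
inter = interictal_starts()
print(pre)
print(inter)
```

Because each window covers seconds `start+1` to `start+20`, the last pre-seizure window ends at the final whole second before the onset.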

In [None]:
x_data

In [None]:
y_data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pickle
with open('/content/drive/MyDrive/EMOTION/VQVAE-trans/Trans-checkpoints-py/x_data_ss_ip.pkl', 'wb') as filepath:
      pickle.dump(x_data, filepath)
with open('/content/drive/MyDrive/EMOTION/VQVAE-trans/Trans-checkpoints-py/y_data_ss_ip.pkl', 'wb') as filepath:
      pickle.dump(y_data, filepath)

In [None]:
sub_count 

In [None]:
x_data.shape

## Plot Data

Now let's plot the data. We will use the dictionary we made earlier to annotate where the seizures are in the record.

In [None]:
def mne_object(data, freq, events = None):
  # the per-sample 'Time' column added by data_load is not an EEG channel
  data = data.drop(columns='Time', errors='ignore')

  # create an mne info file with meta data about the EEG
  info = mne.create_info(ch_names=list(data.columns), 
                         sfreq=freq, 
                         ch_types=['eeg']*data.shape[-1])
  
  # data needs to be in volts rather than in microvolts
  data = data * 1e-6
  # transpose the data so that rows are channels
  data_T = data.transpose()
  
  # create raw mne object
  raw = mne.io.RawArray(data_T, info)

  if events:
    # events is a flat list: [start, end, start, end, ...] in seconds
    start_times = np.array(events[::2])
    end_times = np.array(events[1::2])
    anno_length = end_times-start_times
    event_name = np.array(['Ictal']*len(anno_length))

    raw.set_annotations(mne.Annotations(start_times,
                                        anno_length,
                                        event_name))

  return raw

mne_data = mne_object(raw_data, freq, part_info_dict[EXAMPLE_ID]['Seizures Window'])


mne_data.plot(start = 50, 
              duration = 30, **plot_kwargs);

seiz_start_time = part_info_dict[EXAMPLE_ID]['Seizures Window'][0]
mne_data.plot(start = seiz_start_time, 
              duration = 30, **plot_kwargs);
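
As a small illustration of how the `'Seizures Window'` list is turned into annotations, the sketch below splits a flat list of alternating start and end times into onsets and durations. The function name `to_onsets_durations` and the example times are hypothetical.

```python
def to_onsets_durations(events):
    """Split a flat [start, end, start, end, ...] list into onsets and durations."""
    starts = events[::2]   # even positions hold seizure starts (seconds)
    ends = events[1::2]    # odd positions hold seizure ends (seconds)
    durations = [end - start for start, end in zip(starts, ends)]
    return starts, durations

# made-up record containing two seizures
onsets, durations = to_onsets_durations([100, 140, 900, 960])
print(onsets, durations)
```

Onsets and durations are what `mne.Annotations` receives in `mne_object`, together with one description per event.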

Before we look at random segments again, let's take a second to look at the electrode placement, because this is the first *extracranial* dataset (meaning the electrodes sit on the scalp rather than being implanted inside the skull).

Scalp EEG is typically recorded by placing 21-256 Ag/AgCl electrodes on the scalp to measure the electrical potential between spatially separate electrodes. One electrode is dedicated as a *reference* during recording and another as a *ground*. The terms “ground” and “reference” are sometimes used interchangeably, although they serve different purposes. 

**Ground Electrode**. 
A ground electrode is a common reference for the system voltage that aims to cancel out the common-mode interference that occurs because the body naturally picks up electromagnetic interference, particularly around 50/60 Hz from power lines. Unless recording takes place in a Faraday cage, this interference often needs to be filtered out during pre-processing (see next notebook) if the amplifier has not already removed it at the time of recording. The ground electrode can be placed anywhere on the body, although the forehead and the ear are the most common sites<sup>1</sup>. 

**Reference Electrode**. 
A reference electrode aims to remove unspecific brain activity by representing the electrical potential between an active electrode of interest and a relatively inactive reference. A reference electrode is still affected by global voltage changes, as it is measured against the signal ground. Referencing can be done by using a physical reference electrode placed on the earlobe, by using any electrode during recording and later re-referencing to the average output of all electrodes, or by measuring the potential between two active electrodes (bipolar recording)<sup>2</sup>. The combination of an active electrode with a reference and a ground creates a *channel*, and the general configuration of these channels is called a *montage*.
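
To illustrate two of the referencing schemes just described, here is a minimal sketch in plain Python. The electrode values are toy numbers, and `average_reference` and `bipolar` are hypothetical helper names, not part of MNE or of this notebook.

```python
def average_reference(channels):
    """Subtract the mean across channels from every channel, sample by sample."""
    n_channels = len(channels)
    n_samples = len(next(iter(channels.values())))
    mean = [sum(sig[t] for sig in channels.values()) / n_channels
            for t in range(n_samples)]
    return {name: [v - m for v, m in zip(sig, mean)]
            for name, sig in channels.items()}

def bipolar(channels, first, second):
    """Potential difference between two active electrodes."""
    return [a - b for a, b in zip(channels[first], channels[second])]

# toy potentials in microvolts for three electrodes, two samples each
toy = {"Fp1": [10.0, 12.0], "F7": [4.0, 6.0], "T7": [1.0, 3.0]}

avg = average_reference(toy)
fp1_f7 = bipolar(toy, "Fp1", "F7")
print(avg["Fp1"], fp1_f7)
```

After average referencing, the channels sum to zero at every sample, whereas a bipolar derivation such as `Fp1-F7` keeps only the difference between two neighbouring electrodes.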

The channels here use a bipolar montage. For ease of plotting, we will rename each channel to the first electrode of its pair.

---

1. Light2010

2. Varsavsky2011

3. Teplan2002

In [None]:
import re

replace_dict = {}
drop_list = []
# for the channel names in the data...
for channel_name in mne_data.info['ch_names']:
    # get the name to change to (the first electrode of the pair)
    name_change = re.findall(r'\w+', channel_name)[0].title()
    # check if it is already in the change list
    if name_change in list(replace_dict.values()):
        drop_list.append(channel_name)
    else:
        # if it's not already there, map the original name to the new one
        replace_dict[channel_name] = name_change

# drop the ones that would be repeats
mne_data.drop_channels(drop_list)
# rename the channels
mne_data.rename_channels(replace_dict)
# set the standard montage
mne_data.set_montage('standard_1020')

Now that we have set the names and montage, let's plot the sensor positions.

In [None]:
mne_data.plot_sensors(kind='topomap', show_names=True, to_sphere=True);
fig = mne_data.plot_sensors(kind='3d', show_names=True, show=False)
# view_init returns None, so do not reassign fig
fig.gca().view_init(azim=70, elev=15)
plt.show()

EEG has more than just temporal (changes over time) information; it has spatial information as well. To demonstrate, let's look at how the signal changes over the head before and during a seizure. 

We will go into different ways of breaking a signal down in more detail in the next notebook, so don't worry if you are not familiar with the Welch method I use here. The basic takeaway is that there is generally more going on during the seizure!

**NOTES**
- If you are familiar with the Welch method, we are just looking at the average spectral power between 1 and 40 Hz.
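
As a quick check on that frequency band, the sketch below lists the frequencies covered by the `4:160` slice that the `ave_freq` function in this notebook averages over. It assumes the 256 Hz sampling frequency of these recordings, and `welch_bin_frequencies` is a hypothetical helper name.

```python
def welch_bin_frequencies(sfreq, nperseg):
    """Frequency (Hz) of each one-sided spectrum bin for a given segment length."""
    return [k * sfreq / nperseg for k in range(nperseg // 2 + 1)]

sfreq = 256           # sampling frequency assumed for these recordings
nperseg = 4 * sfreq   # four-second segments, as in `ave_freq`

freqs = welch_bin_frequencies(sfreq, nperseg)
band = freqs[4:160]   # same slice that `ave_freq` averages over
print(band[0], band[-1], len(band))
```

Four-second segments give a resolution of 0.25 Hz, so the slice runs from 1 Hz up to 39.75 Hz.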

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pickle
with open('/content/drive/MyDrive/EMOTION/encoder/DSEQAE_ecoded_dataset_mse88_nzz7680_attn444__bino_all2.pkl', 'rb') as filepath:
          x_data = pickle.load(filepath)

In [None]:
# import mne

In [None]:
x_data.shape

In [None]:
import pickle

import h5py
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as sio
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
# from cuml.manifold import TSNE
%matplotlib inline

In [None]:
def ZscoreNormalization(x):
    """Z-score normalization using the mean and standard deviation of the whole array"""
    x = (x - np.mean(x)) / np.std(x)
    return x

In [None]:
xxx = ZscoreNormalization(x_data[10])

In [None]:
plt.plot(xxx[100:200])

In [None]:
plt.plot(xxx[1200])

In [None]:
np.mean(xxx[:1000], axis = 0)

In [None]:
np.mean(xxx[1000:2000], axis = 0)

In [None]:
np.mean(x_data[0,:1000], axis = 0)

In [None]:
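from scipy import signal

def ave_freq(data):
    # 4-second segments give a frequency resolution of 0.25 Hz
    win = int(4 * freq)
    # scaling='spectrum' returns power (V**2) rather than a density (V**2/Hz)
    freqs, psd = signal.welch(data, freq, nperseg=win, scaling='spectrum')
    # bins 4..159 cover 1 Hz to 39.75 Hz
    return psd[:,4:160].mean(1)

# use an integer sampling frequency so the sample indices below are integers
sfreq = int(freq)
inter_array = mne_data[:, 50*sfreq:80*sfreq][0]
ictal_array = mne_data[:, (seiz_start_time*sfreq):(seiz_start_time*sfreq)+30*sfreq][0]
topo_df = pd.DataFrame([ave_freq(inter_array),ave_freq(ictal_array)], index=['inter', 'ictal'])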

fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(15,5))

axs = axs.flatten()
for i, data_class in enumerate(topo_df.T):
    topo, cn = mne.viz.plot_topomap(topo_df.loc[data_class],
                                    mne_data.info,
                                    show=False,
                                    sensors=False,
                                    names=mne_data.info['ch_names'], 
                                    show_names=True,
                                    axes = axs[i],
                                    vmin = topo_df.values.min(),
                                    vmax = topo_df.values.max())
    axs[i].set_title(data_class)
    
fig.show()

Now, as we have done with the other datasets, let's randomly plot different parts of the data to see more examples.

In [None]:
files_with_seizures = []
for file_id in part_info_dict:
    # if there is something in the seizure window
    if part_info_dict[file_id]['Seizures Window']:
        files_with_seizures.append(file_id)

sampled_file = random.sample(files_with_seizures, 1)[0]
sampled_file_path = sampled_file.split('_')[0]+'/'+sampled_file+'.edf'
raw_data, freq = data_load(sampled_file_path, channel_keeps)
mne_data = mne_object(raw_data, freq, part_info_dict[sampled_file]['Seizures Window'])

print(color.BOLD+color.UNDERLINE+sampled_file+color.END)

mne_data.plot(start = 50, 
              duration = 30, **plot_kwargs);

mne_data.plot(start = part_info_dict[sampled_file]['Seizures Window'][0], 
              duration = 30, **plot_kwargs);