# Getting Started with EEG Data

In 2011-2012, the brain-computer interface (BCI) research group at Colorado State University recorded EEG signals from subjects in our lab and in their homes, using three different EEG systems. One goal of this work is to determine if inexpensive EEG systems (about $7,000) are as effective as more expensive ones (about $40,000) for conducting BCI experiments in the home.

On this page, we summarize the steps you can follow to download some of the data, load it into an IPython environment, and visualize it. We also show examples of looking at P300 event-related potentials (ERPs).

# Downloading EEG Data

EEG data from multiple subjects can be downloaded from our Public BCI Data site. Let’s select the data files for the first subject in each device column, for subjects recorded in our lab.

![Data download page](http://www.cs.colostate.edu/eeg/data/json/doc/tutorial/_build/html/_images/eegDownload.png)

# Extract the downloaded data
The zip file should contain six zipped data files. Extract the files, for example using the following commands:

```
> cd ~/Download

> unzip eeg.zip
Archive:  eeg.zip
 extracting: s20-activetwo-gifford-unimpaired.json.zip
 extracting: s21-activetwo-gifford-unimpaired.json.zip
 extracting: s20-gammasys-gifford-unimpaired.json.zip
 extracting: s21-gammasys-gifford-unimpaired.json.zip
 extracting: s20-mindset-gifford-unimpaired.json.zip
 extracting: s21-mindset-gifford-unimpaired.json.zip

> rm eeg.zip

> ls -l --block-size=M *json*
-rw-r--r-- 1 ... 84M Mar 12 10:50 s20-activetwo-gifford-unimpaired.json.zip
-rw-r--r-- 1 ...  5M Mar 12 10:50 s20-gammasys-gifford-unimpaired.json.zip
-rw-r--r-- 1 ... 29M Mar 12 10:50 s20-mindset-gifford-unimpaired.json.zip
-rw-r--r-- 1 ... 80M Mar 12 10:51 s21-activetwo-gifford-unimpaired.json.zip
-rw-r--r-- 1 ...  5M Mar 12 10:51 s21-gammasys-gifford-unimpaired.json.zip
-rw-r--r-- 1 ... 28M Mar 12 10:52 s21-mindset-gifford-unimpaired.json.zip

> unzip s20-gammasys-gifford-unimpaired.json.zip
Archive:  s20-gammasys-gifford-unimpaired.json.zip
  inflating: s20-gammasys-gifford-unimpaired.json

> unzip s20-mindset-gifford-unimpaired.json.zip
Archive:  s20-mindset-gifford-unimpaired.json.zip
  inflating: s20-mindset-gifford-unimpaired.json

> unzip s20-activetwo-gifford-unimpaired.json.zip
Archive:  s20-activetwo-gifford-unimpaired.json.zip
  inflating: s20-activetwo-gifford-unimpaired.json

> rm s20*zip
```
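The same extraction can be done from Python with the standard `zipfile` module. The following is a minimal sketch, not part of the original tutorial: the function name `extract_nested` and its `pattern` argument are hypothetical, and unlike the `rm` commands above it leaves the zip files in place.

```python
import zipfile
from pathlib import Path


def extract_nested(outer_zip, dest_dir, pattern="s20-*.json.zip"):
    """Unpack outer_zip into dest_dir, then unpack each inner zip matching pattern.

    Hypothetical helper: returns the paths of the files extracted from the
    inner archives. No archive is deleted.
    """
    dest = Path(dest_dir)
    # Step 1: unpack the outer archive (eeg.zip in the commands above).
    with zipfile.ZipFile(outer_zip) as outer:
        outer.extractall(dest)

    # Step 2: unpack every inner archive whose name matches the pattern.
    extracted = []
    for inner_path in sorted(dest.glob(pattern)):
        with zipfile.ZipFile(inner_path) as inner:
            inner.extractall(dest)
            extracted.extend(dest / name for name in inner.namelist())
    return extracted


# Example call, assuming eeg.zip was saved in ~/Download:
# extract_nested(Path.home() / "Download" / "eeg.zip", Path.home() / "Download")
```

The default pattern mirrors the shell session, which unpacks only the `s20` files.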

The unzipped data can now be loaded into an IPython environment.

In [None]:
# Imports
import matplotlib.pyplot as plt
import numpy as np
import json

In [1]:
# Open the data file and parse the JSON into Python objects.
# The with-statement closes the file once parsing is done.
with open('s20-gammasys-gifford-unimpaired.json', 'r') as gammasys_data_file:
    data = json.load(gammasys_data_file)

The variable data is a list of dictionaries, each with the same keys.

In [2]:
len(data)

In [3]:
data[0].keys()
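To confirm that every element really has the same keys, a small check like the one below can help. This is a sketch under assumptions: `common_keys` is a hypothetical helper, and the toy list stands in for the real `data` (its second `protocol` value is a placeholder, not a protocol name from the dataset).

```python
def common_keys(datalist):
    """Return the sorted keys shared by all elements; raise if any element differs."""
    key_sets = [set(element) for element in datalist]
    if not key_sets:
        return []
    first = key_sets[0]
    for i, keys in enumerate(key_sets[1:], start=1):
        if keys != first:
            # Report which keys are missing or extra relative to element 0.
            raise ValueError(
                'element {} differs from element 0 in keys: {}'.format(
                    i, sorted(keys ^ first)))
    return sorted(first)


# Toy stand-in for the loaded data (values are placeholders).
toy = [{'protocol': '3minutes', 'eeg': {}},
       {'protocol': 'placeholder', 'eeg': {}}]
print(common_keys(toy))  # ['eeg', 'protocol']
```

With the real data loaded, `common_keys(data)` would either list the shared keys or point to the first element that deviates.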

Here is a handy function to show keys and their values in each data element.

In [4]:
def summarize(datalist):
    for i, element in enumerate(datalist):
        print('\nData set', i)
        # Print every field except the (large) EEG data itself
        for key in element:
            if key != 'eeg':
                print('  {}: {}'.format(key, element[key]))
        eegtrials = element['eeg']
        # Use the first trial to report the matrix dimensions
        shape = np.array(eegtrials['trial 1']).shape
        print(('  eeg: {:d} trials, each a matrix with {:d} rows'
               ' and approximately {:d} columns').format(
            len(eegtrials), shape[0], shape[1]))

In [5]:
summarize(data)

# Plotting some EEG

The first element of the data list has key-value pair `protocol: 3minutes`, meaning that this element contains 3 minutes of EEG recorded while the subject was asked to relax and look at the computer screen. Let’s take a look at 2 seconds of this data.

The EEG consists of one matrix with 9 rows and 46,342 columns. The 9 rows correspond to the 8 channels listed under `channels:  ['F3', 'F4', 'C3', 'C4', 'P3', 'P4', 'O1', 'O2']`, plus one more channel that marks stimulus onset and offset and is not used in the 3-minute protocol. The number of samples (columns) in one second is the sample rate, which for this device (`device: GAMMAsys`) is 256 samples per second (`sample rate: 256`), so 2 seconds correspond to 512 columns. Let’s plot data from all 9 channels for columns 4,000 to 4,512.
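The relationship between seconds and columns is simple arithmetic, sketched below. The helper `seconds_to_columns` is hypothetical; the sample rate and column count are the values quoted above.

```python
def seconds_to_columns(start_s, duration_s, sample_rate):
    """Convert a time window in seconds to (first, last) column indices.

    The last index is exclusive, so eeg[:, first:last] selects the window.
    """
    first = int(round(start_s * sample_rate))
    last = first + int(round(duration_s * sample_rate))
    return first, last


sample_rate = 256   # samples per second, from 'sample rate: 256'
n_columns = 46342   # columns in the 3-minute recording

# Total recording length in seconds (a little over 3 minutes).
print(n_columns / sample_rate)                 # 181.0234375

# A 2-second window always spans 2 * 256 = 512 columns.
print(seconds_to_columns(0, 2, sample_rate))   # (0, 512)
```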

In [6]:
first = data[0]
eeg = np.array(first['eeg']['trial 1'])
eeg.shape

In [7]:
# Using ending semicolon to suppress output of plotting functions.
plt.figure(1);

plt.plot(eeg[:,4000:4512].T);

plt.axis('tight');
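The plot above draws all rows on one shared axis against the sample index, so traces can overlap. One possible alternative, sketched here under assumptions, converts the x-axis to seconds and offsets each row vertically. The names `stack_traces` and `plot_stacked` are hypothetical, the `'marker'` label for the ninth row is our own, and random numbers stand in for the recording.

```python
import numpy as np


def stack_traces(eeg, sample_rate, start, stop):
    """Return (times, traces, offsets) for columns start..stop-1 of eeg.

    Each row is mean-centered and shifted down by a constant spacing so the
    rows do not overlap when drawn on a single axis.
    """
    segment = np.asarray(eeg, dtype=float)[:, start:stop]
    times = np.arange(start, stop) / sample_rate          # seconds
    centered = segment - segment.mean(axis=1, keepdims=True)
    spacing = (centered.max() - centered.min()) or 1.0    # avoid zero spacing
    offsets = -spacing * np.arange(segment.shape[0])
    return times, centered + offsets[:, None], offsets


def plot_stacked(eeg, sample_rate, start, stop, labels):
    """Draw the stacked traces with one y-tick label per row."""
    import matplotlib.pyplot as plt
    times, traces, offsets = stack_traces(eeg, sample_rate, start, stop)
    fig, ax = plt.subplots()
    ax.plot(times, traces.T)
    ax.set_yticks(offsets)
    ax.set_yticklabels(labels)
    ax.set_xlabel('Time (s)')
    ax.autoscale(axis='both', tight=True)
    return fig, ax


# Synthetic stand-in for the recording: 9 rows of random samples.
rng = np.random.default_rng(0)
demo = rng.standard_normal((9, 1000))
times, traces, offsets = stack_traces(demo, 256, 100, 612)

# With the real data loaded, a call might look like:
# labels = ['F3', 'F4', 'C3', 'C4', 'P3', 'P4', 'O1', 'O2', 'marker']
# plot_stacked(eeg, 256, 4000, 4512, labels)
```

The spacing is taken from the overall range of the selected window, which keeps the rows separated but also means a single large artifact would spread all rows apart.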
