<h2></h2>
<p></p>
<h1>Analysing BCI data</h1>
<p>
This tutorial will give some explanations of BCI data analysis. We will do it with using python mne library, but in other languages and libraries should me analogs. Some actual code you can find <a href="https://github.com/VitalyVV/meditation_project/blob/master/Untitled.ipynb">here</a>.
</p>
<h2>Why we need to analyse BCI data?</h2>
<p>The success of the experiment lies not only in the correct formulation of the experiment itself, but also in the correct analysis of the data received from BCI, so we should make proper analysis of it..</p>
<h2>Steps of analysis:</h2>
<ol>
    <li>
        <p>First of all, we should collect and load our data. If the experiment was carried out correctly and we have proper data obtained from the BCi, we need to bring data to a certain format (if this was not done automatically or if the data was taken from several sources). After that we have to upload them for further processing.
        </p>
    </li>
    <li>
        <p>After loading data we need to make preprocessing: cut unneeded parts and filter data. | Some information obtained from the BCI is often redundant, that is, they have extra parts (for example, sample indexes, accel data, time stamps or aux data). If this information is not needed for analysis, it is worth cutting it out, because extra data for analysis entail additional error and resource costs. | Also, when working with the BCI, it is necessary to take into account data contamination: it may be external stimuli such as the electrical network or information that interferes with research such as some body movements. This data must be filtered, because otherwise the data will carry a lot of interference and the error will be high.
        </p>
    </li>
    <li>
        <p>After preprocessing we should format our data in correspondence with functions, that we will use. | When working with functions and libraries, you should pay attention to the format of input and output data, because for most of the functions data should be strictly formatted.
        </p>
    </li>
    <li>
        <p>Then we preprocessed and formatted our data, we should develop some interpretation of this data. | The data received from BCI itself does not carry the meaning, if not interpreted. You need to come up with ideas on how to get something worthwhile from the data. The conclusion of the study is built on the basis of interpretation. This step should be treated very carefully.
        </p>
    </li>
    <li>
        <p>We also can make some visualisations. | Some information is easier to perceive and analyze visually. So you can see what to look for during data interpretation. This step can be carried out both before and after the interpretation of the data.
        </p>
    </li>

</ol>
<h2>Why it is enough?</h2>
<p>Performing the above steps is sufficient for data analysis. We got rid of the artifacts in the record obtained from the interface and filtered the data in a certain range, thereby obtaining pure processed data suitable for analysis. Therefore, the interpretation and visualization, if they are designed correctly, will give us with high probability useful data with which you can work in the study.</p>
<h2>More details with examples</h2>

First of all we need to import libraries:

In [22]:
import numpy as np
import mne
from pathlib import Path
from mne.preprocessing import ICA
import matplotlib.pyplot as plt
import pandas as pd
from mne.filter import filter_data 
%matplotlib inline

<h3>Collecting and Loading</h3>
<p>If you have prepaired formatted data, you can skip step of preformatting.</p>
<p>It is often better to have data in one format, so working with it will be much easier. In our study file format is following:</p>

In [23]:
filename="example_data.txt"
file = open(filename, "r").readlines()
for line in file:
    print(line[:-1])
    if line[0]=='3':
        break


%OpenBCI Raw EEG Data
%Number of channels = 8
%Sample Rate = 250.0 Hz
%First Column = SampleIndex
%Last Column = Timestamp 
%Other Columns = EEG data in microvolts followed by Accel Data (in G) interleaved with Aux Data
0, 0.00, -476.03, -1125.54, -1217.12, -1151.03, -1294.34, -1167.79, -994.50, -0.006, -0.176, 1.014, 13:15:54.702, 1553940954702
1, 0.00, -473.97, -1126.24, -1213.34, -1150.35, -1291.35, -1165.33, -991.43, 0.000, 0.000, 0.000, 13:15:54.762, 1553940954762
2, 0.00, -481.61, -1134.95, -1223.96, -1158.65, -1301.18, -1171.97, -1001.65, 0.000, 0.000, 0.000, 13:15:54.762, 1553940954762
3, 0.00, -489.50, -1140.30, -1235.65, -1163.79, -1312.16, -1180.82, -1012.47, 0.000, 0.000, 0.000, 13:15:54.762, 1553940954762


<p>Here strokes starting from "%" are comments which describe strokes of data. After comments are placed all data received from BCI in format specified in comments.
</p>
<p>In our study we have some problem with EEG data, because it is given to us in microvolts, but mne library works only with volts. That's why we need to convert units:</p>

In [24]:
def microvolts_to_volts(value):
    """
    Since openBCI writes data into micro volts and mne works with volts we
    will need to convert the data later.
    :param value: single micro volts value
    :return: same value in volts
    """
    return float(value) / 1000

# Converter of BCI file to valuable data
converter = {i: (microvolts_to_volts if i < 12 else lambda x: str(x).split(".")[1][:-1])
    for i in range(0, 13)}

<p>After formatting all data we can load data and specify all parameters. In our study we work with <a href="https://martinos.org/mne/dev/generated/mne.io.RawArray.html">mne library RawArray</a>. To build correct and readable data with it we should specify channel's names, channel's types, sample frequency and montage standart.</p>
<p>In our study we have 8 EEG channels in standart 10-20: fp2, fp1, f4, f3, c4, c3, o2, o1. And our sample rate is 250 Hz. We create instance of info (from mne library) and give all this data into it. Note that order of channels is referring to channel number on Cyton BCI board.</p>

In [25]:
ch_names = {"fp2":1, "fp1":2, "f4":3, "f3":4, "c4":5, "c3":6, "o2":7, "o1":8}

info = mne.create_info(
        ch_names=list(ch_names.keys()),
        ch_types=['eeg' for i in range(0, len(ch_names))],
        sfreq=250,
        montage='standard_1020'
    )

<p>Now we should actually load our data. Often loading functions also have some loading options, like specifying format of data, comments and etc. In our study we skip first and last 10000 rows (40 seconds) of data, because these parts are testing data, which has no meaning. In our format comments start from "%" and all data is separated by ",". So we wrote this in function attributes. Was used <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html">numpy function loadtxt.</a></p>

In [35]:
skiprows = 10000
max_rows = 50000
raw_data = np.loadtxt(filename, comments="%", delimiter=",", converters=converter, skiprows=skiprows, max_rows=max_rows).T

After loading raw data looks like N-dimensional array:

In [27]:
print(raw_data.shape)

(14, 50000)


We have 14 columns of data and many rows. We had the same picture in example_data.txt file, so we assume, that loading was completed correctly.

<h3>Preprocessing</h3>

<p>In preprocessing step we should make filtering of our data and cut everything unneeded. At first it is better to cut data, because filtering is resource-consuming process and we should cut data beforehand. In our study we need only data from our channels, so we don't need order numbers, accel and aux data:</p>

In [28]:
cut_data = raw_data[list(ch_names.values())]

After cutting sizes of our array should lessen:

In [29]:
print(cut_data.shape)

(8, 50000)


<p>Now we have only 8 needed channels, so cutting was done correctly.</p>
<p>After that we can filter data. We may have to complete complicated filtering, like cutting off artifacts. It can be done via fourier (or wavelet) transform. But in our study we decided to only cut frequencies in range [2:50] Hz, because too high and low frequencies are not related to our study. We made it through mne library, using <a href="https://martinos.org/mne/dev/generated/mne.filter.filter_data.html">filter_data function</a>:

In [30]:
filtered_data = filter_data(cut_data, 250, l_freq=2, h_freq=50)

Setting up band-pass filter from 2 - 50 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 2.00
- Lower transition bandwidth: 2.00 Hz (-6 dB cutoff frequency: 1.00 Hz)
- Upper passband edge: 50.00 Hz
- Upper transition bandwidth: 12.50 Hz (-6 dB cutoff frequency: 56.25 Hz)
- Filter length: 413 samples (1.652 sec)



After filtering sizes of our array should remain the same, but data inside of array cells should differ:

In [31]:
print(filtered_data.shape)
print("Cut data cells:")
print(cut_data)
print("Filtered data cells:")
print(filtered_data)

(8, 50000)
Cut data cells:
[[ 9.09913  8.99734  8.62867 ...  8.55276  8.21078  8.22886]
 [17.18185 17.00076 16.55194 ... 18.05818 17.65871 17.80851]
 [10.69576 10.49193 10.01953 ...  8.51838  8.12472  8.26212]
 ...
 [ 2.1225   1.93251  1.47624 ...  1.58516  1.18929  1.33554]
 [ 9.2281   9.0396   8.58175 ...  7.71549  7.31146  7.46758]
 [ 9.01844  8.81586  8.35685 ...  7.58996  7.15511  7.28522]]
Filtered data cells:
[[ 2.70616862e-15 -2.12979924e-01 -4.68692954e-01 ...  2.96358353e-01
   6.78005885e-02  1.06858966e-15]
 [ 2.42167397e-15 -3.32791796e-01 -6.20281176e-01 ...  2.09838171e-01
  -5.38631254e-02  3.60822483e-15]
 [ 3.93435284e-15 -3.66733481e-01 -6.62942234e-01 ...  2.14771353e-01
  -4.28887500e-02  6.03683770e-16]
 ...
 [ 5.27355937e-16 -3.47085746e-01 -6.34175555e-01 ...  2.07797848e-01
  -5.09210456e-02  2.22044605e-16]
 [ 2.10942375e-15 -3.41411341e-01 -6.35585419e-01 ...  2.01001598e-01
  -5.89745826e-02  1.60982339e-15]
 [ 1.01307851e-15 -3.58927575e-01 -6.49495024e-01 

We can see, that size of array remained the same, and data changed, so we assume, that filtering was completed correctly.

<h3>Formatting</h3>

<p>In formatting step we should prepare our preprocessed data to further processing. In simple words, we should match constraints of functions we would use later. In our study we work with ICA plotting and vectorizations, so firstly we need to create mne object from our data and info created before:</p>

In [32]:
mne_data = mne.io.RawArray(filtered_data, info)

Creating RawArray with float64 data, n_channels=8, n_times=50000
    Range : 0 ... 49999 =      0.000 ...   199.996 secs
Ready.


<p>After matching data into mne object we need to cut our data into epochs, becasuse it will help us in further processing. Importance of cutting data depends on your goal, but it is commonly used for //TODO uses.</p>

In [33]:
def create_epochs(raw_data, duration=1):
    """
    Chops the RawArray onto Epochs given the time duration of every epoch
    :param raw_data: mne.io.RawArray instance
    :param duration: seconds for copping
    :return: mne Epochs class
    """
    events = mne.make_fixed_length_events(raw_data, duration=duration)
    epochs = mne.Epochs(raw_data, events, preload=True)
    return epochs

data_series = create_epochs(mne_data)

200 matching events found
Applying baseline correction (mode: mean)
Not setting metadata
0 projection items activated
Loading data for 200 events and 176 original time points ...
1 bad epochs dropped


Now size of our N-dimensional array changed:

In [34]:
print(data_series.get_data().shape)

(199, 8, 176)


It became 3D array.

<h3>Interpretation</h3>

//TODO 

<h3>Visualisation</h3>

//TODO