# Neural Analysis

This activity consists of identifying and classifying biological signals in data recorded by equipment that measures people's neural signals while they played a game. The experiment was run as part of the master's research of Adam, from Warthog Robotics.

It was carried out as the second practical assignment of the Artificial Intelligence course taught by Professor Solange Rezende in the first semester of 2019 at USP São Carlos.

## Theoretical Introduction

The data consist of neural signals recorded with an electroencephalography (EEG) cap whose sensors are placed according to the 10-20 system. The image below shows how the sensors were distributed on each participant's head.

![WhatsApp%20Image%202019-05-22%20at%2022.22.29.jpeg](attachment:WhatsApp%20Image%202019-05-22%20at%2022.22.29.jpeg)

The experiment was carried out with 8 different people, each of whom provides four datasets. During the experiment, they were to play a game...

The project consists of:

- Manually identifying the characteristics that define the biological signals we want to extract, and extracting the signals with those values from our dataset.
- Clustering these signals to try to model their main characteristics.
- Feeding the data into a classifier and running it on the data to attempt automatic identification.

For the folks in my group, here is what needs to be done:

- Filter eye-blink, eye-movement, and heartbeat artifacts.
- Feed them into K-Means and other clustering algorithms to find 3 or 4 groups in the data we collected.
- Label the groups, then feed them into a supervised classifier so it can learn.
- Run the classifier on the data.
- Print the results and assess whether or not we managed to identify the biological artifacts.
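The cluster-then-classify steps listed above can be sketched roughly as follows. This is only an illustration on synthetic two-dimensional features, not the group's actual pipeline: the feature matrix, the choice of three clusters, and the use of scikit-learn's `KMeans` and `KNeighborsClassifier` are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-window signal features: three well-separated groups
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

# Step 1: unsupervised clustering to discover groups in the data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(features)

# Step 2: use the cluster labels as targets for a supervised classifier
X_train, X_test, y_train, y_test = train_test_split(
    features, cluster_labels, test_size=0.25, random_state=0, stratify=cluster_labels
)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Step 3: evaluate on held-out samples
accuracy = clf.score(X_test, y_test)
print("clusters found:", len(set(cluster_labels)))
print("held-out accuracy:", accuracy)
```

On real EEG data the groups are unlikely to be this cleanly separated, so the cluster assignments should be inspected by hand before they are trusted as training labels.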

## Obtaining the data

The data are stored as a .csv file inside the data folder. They are imported here to begin the analysis.

In [30]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

import warnings
warnings.filterwarnings('ignore')

In [24]:
df_neural = pd.read_csv("data/recordRaw-[2018.11.16-10.16.29].csv")
df_neural

Unnamed: 0,Time:512Hz,Epoch,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5,Channel 6,Channel 7,Channel 8,...,Channel 10,Channel 11,Channel 12,Channel 13,Channel 14,Channel 15,Channel 16,Event Id,Event Date,Event Duration
0,0.000000,0,-104797280.0,-31468.654297,-125853.765625,-104797280.0,-104797280.0,68638.679688,61334.183594,-107463.726562,...,-1.836734e+04,67942.195312,-104797280.0,-104797280.0,47687.062500,14813.770508,-104797280.0,,,
1,0.001953,0,-104797048.0,-31445.656250,-125829.890625,-104797048.0,-104797048.0,68646.632812,61358.648438,-107438.632812,...,-1.804248e+04,68010.796875,-104797048.0,-104797048.0,47718.019531,14846.583008,-104797048.0,,,
2,0.003906,0,-104796432.0,-31431.007812,-125838.531250,-104796432.0,-104796432.0,68623.054688,61354.398438,-107447.960938,...,-1.756045e+04,68068.171875,-104796432.0,-104796432.0,47744.925781,14887.793945,-104796432.0,,,
3,0.005859,0,-104795864.0,-31421.486328,-125844.835938,-104795864.0,-104795864.0,68608.742188,61346.441406,-107460.992188,...,-1.730254e+04,68082.914062,-104795864.0,-104795864.0,47722.855469,14871.778320,-104795864.0,,,
4,0.007812,0,-104795496.0,-31453.273438,-125840.585938,-104795496.0,-104795496.0,68610.750000,61346.246094,-107458.312500,...,-1.720850e+04,68107.429688,-104795496.0,-104795496.0,47709.132812,14869.141602,-104795496.0,,,
5,0.009766,0,-104795664.0,-31443.703125,-125829.062500,-104795664.0,-104795664.0,68618.218750,61367.289062,-107434.335938,...,-1.741314e+04,68096.195312,-104795664.0,-104795664.0,47703.128906,14857.666992,-104795664.0,,,
6,0.011719,0,-104796256.0,-31444.093750,-125847.765625,-104796256.0,-104796256.0,68627.445312,61355.667969,-107449.226562,...,-1.787114e+04,67997.804688,-104796256.0,-104796256.0,47642.777344,14825.586914,-104796256.0,,,
7,0.013672,0,-104796880.0,-31437.013672,-125837.507812,-104796880.0,-104796880.0,68644.000000,61368.851562,-107435.015625,...,-1.818594e+04,68008.890625,-104796880.0,-104796880.0,47652.003906,14829.541992,-104796880.0,,,
8,0.015625,0,-104797224.0,-31450.490234,-125844.687500,-104797224.0,-104797224.0,68615.437500,61356.253906,-107450.984375,...,-1.832632e+04,68000.789062,-104797224.0,-104797224.0,47634.718750,14823.536133,-104797224.0,,,
9,0.017578,0,-104797168.0,-31418.410156,-125816.023438,-104797168.0,-104797168.0,68646.492188,61383.109375,-107424.273438,...,-1.821846e+04,68039.601562,-104797168.0,-104797168.0,47671.781250,14836.866211,-104797168.0,,,


In [4]:
df_neural.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54368 entries, 0 to 54367
Data columns (total 21 columns):
Time:512Hz        54368 non-null float64
Epoch             54368 non-null int64
Channel 1         54368 non-null float64
Channel 2         54368 non-null float64
Channel 3         54368 non-null float64
Channel 4         54368 non-null float64
Channel 5         54368 non-null float64
Channel 6         54368 non-null float64
Channel 7         54368 non-null float64
Channel 8         54368 non-null float64
Channel 9         54368 non-null float64
Channel 10        54368 non-null float64
Channel 11        54368 non-null float64
Channel 12        54368 non-null float64
Channel 13        54368 non-null float64
Channel 14        54368 non-null float64
Channel 15        54368 non-null float64
Channel 16        54368 non-null float64
Event Id          0 non-null float64
Event Date        0 non-null float64
Event Duration    0 non-null float64
dtypes: float64(20), int64(1)
memory usage: 

## Cleaning the data

As we can see, three columns (Event Id, Event Date, and Event Duration) are null for every entry. In addition, as explained in the theoretical introduction, some sensors can be disregarded.

In [5]:
#df_neural = df_neural.drop(["Event Id", "Event Date", "Event Duration", "Channel 1","Channel 4",
                            #"Channel 5","Channel 9","Channel 12","Channel 13", "Channel 16"], axis=1)
#df_neural
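A minimal sketch of how the columns to discard could be found programmatically instead of being listed by hand. The toy frame below is hypothetical; it only mimics the pattern visible in the output above, where the event columns are entirely null and several channels carry identical readings. Treating identical channels as candidates for removal is an assumption of this example.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the recording: two channels share the same readings
toy = pd.DataFrame({
    "Channel 1": [-104797280.0, -104797048.0, -104796432.0],
    "Channel 2": [-31468.65, -31445.66, -31431.01],
    "Channel 4": [-104797280.0, -104797048.0, -104796432.0],
    "Event Id": [np.nan, np.nan, np.nan],
})

# Drop columns in which every value is null
no_empty = toy.dropna(axis=1, how="all")

# Flag channels whose full series is identical to another channel's series
duplicated_mask = no_empty.T.duplicated(keep=False)
suspect_channels = duplicated_mask[duplicated_mask].index.tolist()

cleaned = no_empty.drop(columns=suspect_channels)
print("suspect channels:", suspect_channels)
print("remaining columns:", list(cleaned.columns))
```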

## Analyzing the data

Let's analyze the data we still have to see whether we can identify any characteristics.

In [6]:
df_neural.describe()

Unnamed: 0,Time:512Hz,Epoch,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5,Channel 6,Channel 7,Channel 8,...,Channel 10,Channel 11,Channel 12,Channel 13,Channel 14,Channel 15,Channel 16,Event Id,Event Date,Event Duration
count,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,...,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,0.0,0.0,0.0
mean,53.092773,849.0,-104802500.0,-33037.728284,-124378.714743,-104802500.0,-104802500.0,67418.18938,68457.271626,-108798.978031,...,-1934815.0,66779.214954,-104802500.0,-104802500.0,34864.436411,-3141336.0,-104802500.0,,,
std,30.653973,490.463479,3496.102,647.828785,1495.156517,3496.102,3496.102,1225.346847,5031.156333,790.755097,...,13777900.0,1062.082387,3496.102,3496.102,8511.742391,17855660.0,3496.102,,,
min,0.0,0.0,-104809300.0,-34104.738281,-126418.320312,-104809300.0,-104809300.0,64641.851562,61322.757812,-110497.273438,...,-104809300.0,64160.453125,-104809300.0,-104809300.0,19392.873047,-104802200.0,-104809300.0,,,
25%,26.546387,424.0,-104805400.0,-33446.207031,-125909.152344,-104805400.0,-104805400.0,66621.136719,62372.087891,-109257.941406,...,-118419.5,66160.648438,-104805400.0,-104805400.0,27300.612305,-8516.383,-104805400.0,,,
50%,53.092773,849.0,-104802700.0,-33306.644531,-123872.324219,-104802700.0,-104802700.0,67509.015625,71092.144531,-108759.53125,...,-101598.5,66677.445312,-104802700.0,-104802700.0,37076.345703,-1128.516,-104802700.0,,,
75%,79.63916,1274.0,-104799700.0,-32779.666992,-122819.041016,-104799700.0,-104799700.0,68431.28125,73166.414062,-108321.882812,...,-41645.21,67360.378906,-104799700.0,-104799700.0,40698.268555,1223.12,-104799700.0,,,
max,106.185547,1698.0,-104795500.0,-31227.492188,-122633.164062,-104795500.0,-104795500.0,69493.507812,74430.179688,-107307.429688,...,-8205.177,68930.132812,-104795500.0,-104795500.0,47747.855469,14960.35,-104795500.0,,,


Let's start by recording what the data look like in the time domain.

In [7]:
#for i in range(1, 17):
#    plt.figure()
#    string = "Channel {}".format(i)
#    sns.lineplot(x=df_neural["Time:512Hz"], y=df_neural[string])


Now let's look at the data in the frequency domain in two ways: the Fourier transform of each channel (computed with the FFT) and a Bode-style plot of each channel on logarithmic axes.

In [8]:
#freqs = np.fft.rfftfreq(len(df_neural), d=1/512)  # frequency axis in Hz
#for i in range(1, 17):
#    plt.figure()
#    string = "Channel {}".format(i)
#    mag = np.abs(np.fft.rfft(df_neural[string]))
#    sns.lineplot(x=np.log10(freqs[1:]), y=np.log10(mag[1:]))  # log-log, skipping the 0 Hz bin

In [9]:
#freqs = np.fft.rfftfreq(len(df_neural), d=1/512)  # frequency axis in Hz
#for i in range(1, 17):
#    plt.figure()
#    string = "Channel {}".format(i)
#    sns.lineplot(x=freqs, y=np.abs(np.fft.rfft(df_neural[string])))
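The frequency-domain view described above can be checked on a signal whose content is known in advance. The sketch below is an assumption-laden illustration, not part of the original analysis: it builds a synthetic 10 Hz sine with a DC offset, sampled at 512 Hz, and uses `np.fft.rfft` together with `np.fft.rfftfreq` so that the frequency axis comes out in Hz.

```python
import numpy as np

fs = 512.0                               # sampling rate in Hz, as in the recording
t = np.arange(0, 4.0, 1.0 / fs)          # 4 seconds of samples
signal = 3.0 + np.sin(2 * np.pi * 10.0 * t)  # synthetic: DC offset plus a 10 Hz tone

# Remove the mean so the 0 Hz bin does not dominate the spectrum
centered = signal - signal.mean()

# One-sided spectrum and the matching frequency axis in Hz
spectrum = np.fft.rfft(centered)
freqs = np.fft.rfftfreq(centered.size, d=1.0 / fs)

# Scale so that a unit-amplitude sine shows up with height close to 1
amplitude = 2.0 * np.abs(spectrum) / centered.size

peak_freq = freqs[np.argmax(amplitude)]
print("peak frequency (Hz):", peak_freq)
```

Subtracting the mean matters for recordings like this one, where some channels sit at large constant offsets that would otherwise swamp everything else in the plot.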

## Extracting the data with filters

In [10]:
df_knn = np.abs(df_neural.drop(["Epoch","Event Id", "Event Date", "Event Duration"], axis=1))

In [11]:
df_knn.describe()

Unnamed: 0,Time:512Hz,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5,Channel 6,Channel 7,Channel 8,Channel 9,Channel 10,Channel 11,Channel 12,Channel 13,Channel 14,Channel 15,Channel 16
count,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0,54368.0
mean,53.092773,104802500.0,33037.728284,124378.714743,104802500.0,104802500.0,67418.18938,68457.271626,108798.978031,104802500.0,1934815.0,66779.214954,104802500.0,104802500.0,34864.436411,3145907.0,104802500.0
std,30.653973,3496.102,647.828785,1495.156517,3496.102,3496.102,1225.346847,5031.156333,790.755097,3496.102,13777900.0,1062.082387,3496.102,3496.102,8511.742391,17854860.0,3496.102
min,0.0,104795500.0,31227.492188,122633.164062,104795500.0,104795500.0,64641.851562,61322.757812,107307.429688,104795500.0,8205.177,64160.453125,104795500.0,104795500.0,19392.873047,0.2929688,104795500.0
25%,26.546387,104799700.0,32779.666992,122819.041016,104799700.0,104799700.0,66621.136719,62372.087891,108321.882812,104799700.0,41645.21,66160.648438,104799700.0,104799700.0,27300.612305,1203.101,104799700.0
50%,53.092773,104802700.0,33306.644531,123872.324219,104802700.0,104802700.0,67509.015625,71092.144531,108759.53125,104802700.0,101598.5,66677.445312,104802700.0,104802700.0,37076.345703,4622.51,104802700.0
75%,79.63916,104805400.0,33446.207031,125909.152344,104805400.0,104805400.0,68431.28125,73166.414062,109257.941406,104805400.0,118419.5,67360.378906,104805400.0,104805400.0,40698.268555,12866.2,104805400.0
max,106.185547,104809300.0,34104.738281,126418.320312,104809300.0,104809300.0,69493.507812,74430.179688,110497.273438,104809300.0,104809300.0,68930.132812,104809300.0,104809300.0,47747.855469,104802200.0,104809300.0


### Building the table with the FFT data

In [25]:
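from scipy.fft import fft  # in SciPy >= 1.4, scipy.fft is a module, so import the function from it

n = len(df_knn["Time:512Hz"])  # number of samples

# One-sided spectrum of each channel, normalized by the number of samples
knn_freq = pd.DataFrame()
for i in range(1, 17):
    string = "Channel {}".format(i)
    a = fft(df_knn[string].to_numpy())/n
    knn_freq[string] = a[:n//2]  # keep only the non-negative frequencies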

In [28]:
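k = np.arange(n)
T = n/512           # recording length in seconds (512 Hz sampling rate)
frq = k/T           # frequency of bin k in Hz: k * 512 / n
frq = frq[:n//2]    # keep the same one-sided range as the FFT table
knn_freq["Freq"] = frq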
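As a sanity check on the frequency axis, the same values can be obtained from `np.fft.fftfreq`. The short sketch below assumes the sample count reported by `df_neural.info()` (54368 rows) and the 512 Hz sampling rate from the time column's name.

```python
import numpy as np

n = 54368      # number of samples, as reported by df_neural.info()
fs = 512.0     # sampling rate in Hz

# Frequency axis built by hand, as in the cell above
k = np.arange(n)
T = n / fs
frq = (k / T)[: n // 2]

# Reference axis from NumPy; the first n // 2 entries are the non-negative bins
reference = np.fft.fftfreq(n, d=1.0 / fs)[: n // 2]

# Spacing between consecutive bins, i.e. the frequency resolution
resolution = fs / n
print("axes match:", np.allclose(frq, reference))
print("resolution (Hz):", round(resolution, 6))
```

The resolution is the step between consecutive rows of the `Freq` column, so a longer recording gives a finer frequency grid.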

In [29]:
knn_freq

Unnamed: 0,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5,Channel 6,Channel 7,Channel 8,Channel 9,Channel 10,Channel 11,Channel 12,Channel 13,Channel 14,Channel 15,Channel 16,Freq
0,(1.1994948591434458e-11+2.1320198893810413e-48j),(33037.72828441707+0j),(124378.71474316233+0j),(104802542.7467628+0j),(104802542.7467628+0j),(67418.1893795523+0j),(68457.27162641914+0j),(108798.97803050507+0j),(104802542.7467628+0j),(1934815.352828629+0j),(66779.21495432458+0j),(104802542.7467628+0j),(104802542.7467628+0j),(34864.436410851165+0j),(3145906.662186155+0j),(104802542.7467628+0j),0.000000
1,(-9.064960997290212e-18+2.1612173806753245e-16j),(-176.07543428028907+232.90983256648013j),(-13.886884357122831-991.437976903808j),(-79.20258725359042+1888.3038572171247j),(-79.20258725359042+1888.3038572171247j),(-439.4838992613158-211.0494022471998j),(-631.35772142375+3126.8444089511127j),(8.286355297140652+348.2809365076467j),(-79.20258725359042+1888.3038572171247j),(1819336.590688977+118829.89105036817j),(200.94525013236253-309.2169998342318j),(-79.20258725359042+1888.3038572171247j),(-79.20258725359042+1888.3038572171247j),(-950.9788660777475-4362.5086662090325j),(-2690729.8312729653-1609643.5685311346j),(-79.20258725359042+1888.3038572171247j),0.009417
2,(-6.130268086288384e-20+1.1359469574325748e-16j),(-7.075291480337915+234.0932534013445j),(-103.42986158984776-72.57020951567837j),(-0.5356152035433595+992.5022075493558j),(-0.5356152035433595+992.5022075493558j),(251.79336950649596-378.6114330509613j),(464.2705659033283+729.6021502885815j),(-11.47796462715438+283.1371986052237j),(-0.5356152035433595+992.5022075493558j),(1843369.9561189949+217977.2829795405j),(-153.58395003941177-470.0258950221729j),(-0.5356152035433595+992.5022075493558j),(-0.5356152035433595+992.5022075493558j),(799.3010478015091-2656.206902402201j),(1480265.0744406823+2747859.641202067j),(-0.5356152035433595+992.5022075493558j),0.018835
3,(-4.1532708957388615e-19+7.541087676330338e-17j),(-8.747251511147098+119.22325065099942j),(91.89073624165418-195.56531006620733j),(-3.6288054702093326+658.8816596679902j),(-3.6288054702093326+658.8816596679902j),(-156.54966336206897-384.0490500397242j),(-253.0347365931334+591.6290847735535j),(41.95895782181968+190.17270257986638j),(-3.6288054702093326+658.8816596679902j),(1808122.120194918+302397.14368177607j),(-85.82348821104857-107.0295137528248j),(-3.6288054702093326+658.8816596679902j),(-3.6288054702093326+658.8816596679902j),(-134.6082431915547-1917.6197318465106j),(141847.69357016092-3094938.823653918j),(-3.6288054702093326+658.8816596679902j),0.028252
4,(8.233685746111046e-20+5.616414295022977e-17j),(12.595783017463978+125.6186382136173j),(-44.64640898774679-115.40656007139322j),(0.7193955000042912+490.71865106436724j),(0.7193955000042912+490.71865106436724j),(1.8294144038889069-45.56634551118444j),(96.26529949109498+621.853162933107j),(12.511796562553647+126.15153432364339j),(0.7193955000042912+490.71865106436724j),(1788333.658700962+415394.08494943933j),(-38.01295228931842-213.28022729448332j),(0.7193955000042912+490.71865106436724j),(0.7193955000042912+490.71865106436724j),(127.01323084141926-1031.5056030254975j),(-1690651.0385020382+2555305.659634536j),(0.7193955000042912+490.71865106436724j),0.037669
5,(1.4367105047292103e-18+4.2099975319245606e-17j),(28.44222167617864+62.627326610481205j),(39.91560921283714-89.71506914305915j),(12.552860310808324+367.8368797831009j),(12.552860310808324+367.8368797831009j),(48.543273096941725-191.70798255610723j),(14.870524980991641+305.76785671444475j),(1.2688340332861001+114.0643770165331j),(12.552860310808324+367.8368797831009j),(1756768.096606258+496354.38262045477j),(-94.5806789152497-95.02038775076517j),(12.552860310808324+367.8368797831009j),(12.552860310808324+367.8368797831009j),(-91.36710580721322-1105.463699743935j),(2726234.771603818-1307739.6063596844j),(12.552860310808324+367.8368797831009j),0.047087
6,(7.969175120006826e-19+3.675721362401429e-17j),(-8.60207337319665+54.182037686351265j),(-18.79054922917738-102.2301970900374j),(6.962846149227524+321.15597851194235j),(6.962846149227524+321.15597851194235j),(-67.82338479235618-101.03450787586j),(-53.722913255752+390.8574540770182j),(7.622768099898293+83.25306099018601j),(6.962846149227524+321.15597851194235j),(1707947.9295345317+598413.4961587727j),(16.010256082617946-127.64857997305005j),(6.962846149227524+321.15597851194235j),(6.962846149227524+321.15597851194235j),(102.43812415015736-378.49890343722217j),(-2960273.9467441062-271814.2438516304j),(6.962846149227524+321.15597851194235j),0.056504
7,(1.2237319547914143e-18+3.260249966466178e-17j),(2.3856798174019698+67.45771162543086j),(1.2173216791146502-44.52435364535713j),(10.692019188778119+284.85531544473264j),(10.692019188778119+284.85531544473264j),(61.79884599719093-67.96490501134016j),(80.84083601543296+296.4895704636814j),(-11.93091298613472+70.65167748435488j),(10.692019188778119+284.85531544473264j),(1671065.4499909084+684549.9643387743j),(-57.76542706707584-143.70853233310137j),(10.692019188778119+284.85531544473264j),(10.692019188778119+284.85531544473264j),(184.7814316004799-709.4544530310499j),(2354914.2168250317+1719014.4795539582j),(10.692019188778119+284.85531544473264j),0.065921
8,(5.102551407494443e-20+2.6529109996902485e-17j),(0.731476989536937+39.85802737045822j),(10.093122751934832-76.25574894040979j),(0.4458213045615487+231.79075452398973j),(0.4458213045615487+231.79075452398973j),(-19.346909885835455-134.51737832416796j),(-30.54679918213565+212.35618526848646j),(7.907302237049439+60.21162609064382j),(0.4458213045615487+231.79075452398973j),(1610159.124531062+766462.1369651812j),(-43.34069418788257-40.99695810091885j),(0.4458213045615487+231.79075452398973j),(0.4458213045615487+231.79075452398973j),(34.35661878140148-392.15975510629886j),(-1112025.1759411083-2622316.68118887j),(0.4458213045615487+231.79075452398973j),0.075338
9,(-3.994917874786348e-19+2.4718163361307418e-17j),(0.7605340779580121+59.07196542461065j),(-7.13881528124902-44.700086857771815j),(-3.490448901313971+215.96810962110698j),(-3.490448901313971+215.96810962110698j),(-30.66647958599419-31.437267925035282j),(-6.168955184167948+274.7792884101227j),(4.536573529607689+50.272040134201895j),(-3.490448901313971+215.96810962110698j),(1554346.1832790063+851879.4367419824j),(5.8730741433008244-80.34907393043645j),(-3.490448901313971+215.96810962110698j),(-3.490448901313971+215.96810962110698j),(222.5831477212919-437.3644178463561j),(-380539.3665545692+2748819.9637953155j),(-3.490448901313971+215.96810962110698j),0.084756


### Applying the filter

In [None]:
from scipy.signal import butter, lfilter

def butter_bandpass(lowcut, highcut, fs, order=5):
    nyq = 0.5 * (fs)
    low = lowcut / nyq
    high = highcut / nyq
    b, a = butter(order, [low, high], btype='band')
    return b, a

def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    y = lfilter(b, a, data)
    return y
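For a narrow, low-frequency band such as 4 to 7 Hz at a 512 Hz sampling rate, the `(b, a)` transfer-function form of a higher-order Butterworth filter can be numerically fragile. One possible alternative, sketched here as an assumption rather than as the method used in this notebook, is to design the filter as second-order sections and apply it forward and backward with `sosfiltfilt`. The function name `bandpass_sos` and the synthetic test signal are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_sos(data, lowcut, highcut, fs, order=5):
    """Zero-phase Butterworth band-pass using second-order sections."""
    nyq = 0.5 * fs
    sos = butter(order, [lowcut / nyq, highcut / nyq], btype="band", output="sos")
    return sosfiltfilt(sos, data)

fs = 512.0
t = np.arange(0, 20.0, 1.0 / fs)

# Synthetic signal: a 5 Hz component inside the band, a 50 Hz component
# outside it, and a large constant offset like the one seen in the raw data
in_band = np.sin(2 * np.pi * 5.0 * t)
out_of_band = 0.5 * np.sin(2 * np.pi * 50.0 * t)
raw = 1000.0 + in_band + out_of_band

filtered = bandpass_sos(raw, 4.0, 7.0, fs)

# Compare against the in-band component away from the edges of the record
middle = slice(t.size // 4, 3 * t.size // 4)
rms_error = np.sqrt(np.mean((filtered[middle] - in_band[middle]) ** 2))
print("RMS error against the 5 Hz component:", rms_error)
```

Because `sosfiltfilt` runs the filter in both directions, it introduces no phase shift, which keeps events such as blinks aligned in time with the raw recording. It needs the whole signal up front, so it suits offline analysis like this one.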


In [None]:
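knn_filtered = pd.DataFrame()
# The filter operates on time-domain samples, so read from df_knn rather than
# knn_freq (which holds FFT coefficients and has no "Time:512Hz" column)
knn_filtered["Time"] = df_knn["Time:512Hz"]

for i in range(1, 17):
    string = "Channel {}".format(i)
    # Keep only the 4-7 Hz band; 512 is the sampling rate in Hz
    filtered = butter_bandpass_filter(df_knn[string], 4.0, 7.0, 512)
    knn_filtered[string] = filtered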

In [None]:
# In theory, this dataframe holds the data band-pass filtered between 4 and 7 Hz
knn_filtered