# **SCIENTIFIC DATA**

**ITBA - Master's in Data Science - 2023**

**Practical Assignment - Alen Jiménez**

- The goal of this notebook is to classify time-series data recorded by brain-wave sensors.
- The brain waves under study correspond to two behaviors: "eyes open" and "blinks".
- We attempt to build a classification model that distinguishes these two behaviors from the brain-wave data.
- The descriptive analysis of all available series is in the notebook analisis_filtros.ipynb

# Table of Contents
* [Set Up](#setup)

# Set Up <a class = 'anchor' id = 'setup'></a>

In [15]:

# Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests
from io import StringIO
import os
import sys, select
import time
import datetime
import math
from scipy import stats
from scipy.fftpack import fft
from scipy.signal import firwin, remez, kaiser_atten, kaiser_beta
from scipy.signal import butter, buttord, filtfilt, lfilter
from scipy.signal import find_peaks
from collections import Counter
#from xgboost import XGBClassifier # XGBoost classifier
#from bayes_opt import BayesianOptimization # Bayesian optimization
from sklearn import svm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
#from keras.models import Sequential
#from keras.layers import Dense


In [2]:
# Working directory

directorio_de_trabajo = 'C:/itba_datos_geograficos/ramele/tp'
os.chdir(directorio_de_trabajo)
print(f'Current working directory: {os.getcwd()}')

Current working directory: C:\itba_datos_geograficos\ramele\tp


In [3]:
# Load the data

df = pd.read_csv('datos/dffinal.csv', sep=',', decimal='.')

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60668 entries, 0 to 60667
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   target      60668 non-null  int64  
 1   ptp         60668 non-null  int64  
 2   rms         60668 non-null  float64
 3   cf          60668 non-null  float64
 4   entropy     60668 non-null  float64
 5   activity    60668 non-null  float64
 6   complexity  60668 non-null  float64
 7   morbidity   60668 non-null  float64
 8   fractal     60668 non-null  float64
 9   psd9        0 non-null      float64
 10  psd         60668 non-null  object 
dtypes: float64(8), int64(2), object(1)
memory usage: 5.1+ MB
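
The `df.info()` summary above shows that `psd9` has no non-null values and that `psd` is stored as `object`. Below is a minimal sketch of how such columns could be flagged programmatically. It uses a small made-up frame, and `toy` and `find_unusable_columns` are hypothetical names that are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def find_unusable_columns(frame: pd.DataFrame) -> list:
    """Return columns that are entirely null or not numeric (hypothetical helper)."""
    all_null = frame.columns[frame.isna().all()]
    non_numeric = frame.select_dtypes(exclude="number").columns
    # Keep the original column order and avoid duplicates
    return [c for c in frame.columns if c in all_null or c in non_numeric]


# Toy frame mimicking the structure reported by df.info() (values are made up)
toy = pd.DataFrame({
    "target": [0, 1, 0],
    "rms": [145.5, 146.9, 148.4],
    "psd9": [np.nan, np.nan, np.nan],
    "psd": ["<function psd>", "<function psd>", "<function psd>"],
})

unusable = find_unusable_columns(toy)
print(unusable)
```

On the toy frame this flags `psd9` and `psd`, the same two columns the notebook drops by hand further down.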


In [7]:
df.head()

| | target | ptp | rms | cf | entropy | activity | complexity | morbidity | fractal | psd9 | psd |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1055 | 145.506732 | 4.219736 | 7.352986 | 21172.208984 | 0.188514 | 5.499801 | 1.020167 | NaN | `<function psd at 0x000001F30118E1E0>` |
| 1 | 0 | 1072 | 146.862007 | 4.180795 | 7.359324 | 21568.449219 | 0.186804 | 5.54879 | 1.020052 | NaN | `<function psd at 0x000001F30118E1E0>` |
| 2 | 0 | 1106 | 148.430658 | 4.136612 | 7.367413 | 22031.660156 | 0.185107 | 5.592249 | 1.019937 | NaN | `<function psd at 0x000001F30118E1E0>` |
| 3 | 0 | 1138 | 150.193528 | 4.088059 | 7.367413 | 22558.095703 | 0.183148 | 5.644323 | 1.019937 | NaN | `<function psd at 0x000001F30118E1E0>` |
| 4 | 0 | 1159 | 152.089419 | 4.037099 | 7.372794 | 23131.191406 | 0.180834 | 5.718295 | 1.019937 | NaN | `<function psd at 0x000001F30118E1E0>` |


In [9]:
# Drop unusable columns: psd9 is entirely null and psd holds function reprs instead of values
df.drop(['psd9', 'psd'], axis=1, inplace=True)

In [12]:
# Feature matrix and target vector
X = df.drop(['target'], axis=1)
y = df.target

In [13]:
# 60/40 train/test split, stratified on the target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1, stratify=y)
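
Because the split uses `stratify=y`, the class proportions in the train and test partitions should match those of the full data set. The sketch below shows how this could be checked. It runs on synthetic labels, and `X_demo` and `y_demo` are made-up stand-ins for the notebook's `X` and `y`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 70% class 0, 30% class 1 (made-up data)
rng = np.random.default_rng(1)
y_demo = pd.Series([0] * 700 + [1] * 300)
X_demo = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["a", "b", "c"])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=1, stratify=y_demo
)

# Share of each class in each partition
train_share = y_tr.value_counts(normalize=True).sort_index()
test_share = y_te.value_counts(normalize=True).sort_index()
print(train_share)
print(test_share)
```

Both partitions keep the 70/30 ratio of the synthetic labels, which is the property `stratify` is meant to provide.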

In [None]:
# Fit a linear-kernel SVM on the training set and predict the test labels
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
predlabels = clf.predict(X_test)
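
The features shown by `df.head()` span very different ranges (for example, `activity` is on the order of 2e4 while `complexity` is around 0.19), and SVMs are sensitive to feature scale. One common option is to standardize the features inside a pipeline before fitting, as in the hedged sketch below. This is not what the notebook does. The data is synthetic, and `small`, `large` and `scaled_svm` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
# Two synthetic features on very different scales (made-up data)
small = rng.normal(0.0, 1.0, size=n)          # informative, small scale
large = rng.normal(20000.0, 5000.0, size=n)   # uninformative, large scale
X_demo = np.column_stack([small, large])
y_demo = (small > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=1, stratify=y_demo
)

# The scaler is fitted on the training data only, then applied before the SVM
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scaled_svm.fit(X_tr, y_tr)
score = scaled_svm.score(X_te, y_te)
print(score)
```

Keeping the scaler inside the pipeline means its statistics come only from the training partition, so nothing from the test set leaks into preprocessing.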

In [16]:
# Confusion matrix (rows: true class, columns: predicted class) and accuracy
C = confusion_matrix(y_test, predlabels)
acc = (C[0, 0] + C[1, 1]) / X_test.shape[0]
print(C)
print(acc)
target_names = ['Open', 'Blink']
report = classification_report(y_test, predlabels, target_names=target_names)
print(report)

[[10170  1966]
 [ 2305  9827]]
0.8240069226965552
              precision    recall  f1-score   support

        Open       0.82      0.84      0.83     12136
       Blink       0.83      0.81      0.82     12132

    accuracy                           0.82     24268
   macro avg       0.82      0.82      0.82     24268
weighted avg       0.82      0.82      0.82     24268
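
The figures in the report follow directly from the confusion matrix printed for the SVM. As a worked check, the sketch below recomputes accuracy, precision, recall and F1 from those four counts. Only the matrix values come from the notebook output, and the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported for the SVM (rows: true class, columns: predicted class)
cm = np.array([[10170, 1966],
               [2305, 9827]])

accuracy = np.trace(cm) / cm.sum()
precision = np.diag(cm) / cm.sum(axis=0)  # correct predictions / all predictions of that class
recall = np.diag(cm) / cm.sum(axis=1)     # correct predictions / all true members of that class
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4))
print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2))
```

The row sums (12136 and 12132) are the `support` values in the report, and the rounded precision, recall and F1 agree with the printed table.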



In [17]:
# All parameters not specified are left at their defaults
logisticRegr = LogisticRegression()
logisticRegr.fit(X_train, y_train)

# Predict labels for the whole test set (returns a NumPy array)
predlabels = logisticRegr.predict(X_test)

# Confusion matrix (rows: true class, columns: predicted class) and accuracy
C = confusion_matrix(y_test, predlabels)
acc = (C[0, 0] + C[1, 1]) / X_test.shape[0]
print(C)
print(acc)
target_names = ['Open', 'Blink']
report = classification_report(y_test, predlabels, target_names=target_names)
print(report)

[[9043 3093]
 [3345 8787]]
0.734712378440745
              precision    recall  f1-score   support

        Open       0.73      0.75      0.74     12136
       Blink       0.74      0.72      0.73     12132

    accuracy                           0.73     24268
   macro avg       0.73      0.73      0.73     24268
weighted avg       0.73      0.73      0.73     24268
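
Both models above are scored on a single train/test split. The notebook imports `cross_val_score` but never calls it. The sketch below shows how a k-fold estimate could be obtained for a scaled logistic regression. It is an assumption about a possible next step, not part of the original analysis. The data comes from `make_classification`, and `max_iter=1000` is an arbitrary illustrative setting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem with 8 features (made-up data, not the EEG features)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=1
)

# Standardize, then fit logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold stratified cross-validation with a fixed seed for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```

Reporting the mean and spread over folds gives a sense of how much a single-split accuracy figure might vary.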

