## Classification Exercise

Using the Smarket dataset, predict the market's closing direction (Up or Down) from the variables available in the data.

Try several of the classification algorithms you have already learned and see which one gives the best result.
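One way to organize such a comparison is to keep the candidate models in a dictionary and score each of them on the same cross-validation folds. The sketch below is only an illustration: it uses a synthetic stand-in dataset from `make_classification` (an assumption, so that it runs without the download), and the names `X_demo`, `y_demo`, `models`, and `cv_auc` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real features (assumption: 5 numeric predictors).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

models = {
    "LogisticRegression": LogisticRegression(),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

# Use the same folds for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_auc = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="roc_auc")
    cv_auc[name] = scores.mean()
    print(f"{name}: mean AUC = {cv_auc[name]:.3f}")
```

With the real data, `X_demo` and `y_demo` would be replaced by the feature matrix and the binary target built later in the notebook.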

In [20]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Metrics and preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

In [2]:
# Load the dataset
bd = pd.read_csv("https://raw.githubusercontent.com/neylsoncrepalde/ML_classes/master/Data/Smarket.csv")
bd.head()

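|   | Year | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Volume | Today | Direction |
|---|------|------|------|------|------|------|--------|-------|-----------|
| 0 | 2001 | 0.381 | -0.192 | -2.624 | -1.055 | 5.01 | 1.1913 | 0.959 | Up |
| 1 | 2001 | 0.959 | 0.381 | -0.192 | -2.624 | -1.055 | 1.2965 | 1.032 | Up |
| 2 | 2001 | 1.032 | 0.959 | 0.381 | -0.192 | -2.624 | 1.4112 | -0.623 | Down |
| 3 | 2001 | -0.623 | 1.032 | 0.959 | 0.381 | -0.192 | 1.276 | 0.614 | Up |
| 4 | 2001 | 0.614 | -0.623 | 1.032 | 0.959 | 0.381 | 1.2057 | 0.213 | Up |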


## Taking a look at the data

In [4]:
bd.describe()

|       | Year | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Volume | Today |
|-------|------|------|------|------|------|------|--------|-------|
| count | 1250.0 | 1250.0 | 1250.0 | 1250.0 | 1250.0 | 1250.0 | 1250.0 | 1250.0 |
| mean | 2003.016 | 0.003834 | 0.003919 | 0.001716 | 0.001636 | 0.00561 | 1.478305 | 0.003138 |
| std | 1.409018 | 1.136299 | 1.13628 | 1.138703 | 1.138774 | 1.14755 | 0.360357 | 1.136334 |
| min | 2001.0 | -4.922 | -4.922 | -4.922 | -4.922 | -4.922 | 0.35607 | -4.922 |
| 25% | 2002.0 | -0.6395 | -0.6395 | -0.64 | -0.64 | -0.64 | 1.2574 | -0.6395 |
| 50% | 2003.0 | 0.039 | 0.039 | 0.0385 | 0.0385 | 0.0385 | 1.42295 | 0.0385 |
| 75% | 2004.0 | 0.59675 | 0.59675 | 0.59675 | 0.59675 | 0.597 | 1.641675 | 0.59675 |
| max | 2005.0 | 5.733 | 5.733 | 5.733 | 5.733 | 5.733 | 3.15247 | 5.733 |


In [5]:
bd.Direction.value_counts()

Up      648
Down    602
Name: Direction, dtype: int64
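
The class counts above give a quick reference point: a classifier that always predicts the majority class already reaches a certain accuracy, and any model should be judged against it. A small worked calculation using only the counts printed above (the variable names are illustrative):

```python
# Class counts reported by value_counts() above.
n_up, n_down = 648, 602

# Accuracy of a classifier that always predicts the majority class ("Up").
baseline_accuracy = n_up / (n_up + n_down)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

This comes out at 0.5184, so the classes are close to balanced and an accuracy only slightly above 0.5 is not evidence of predictive skill.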

In [7]:
bd.isnull().sum()

Year         0
Lag1         0
Lag2         0
Lag3         0
Lag4         0
Lag5         0
Volume       0
Today        0
Direction    0
dtype: int64

In [10]:
bd1 = pd.get_dummies(bd, drop_first = True)
bd1.columns

Index(['Year', 'Lag1', 'Lag2', 'Lag3', 'Lag4', 'Lag5', 'Volume', 'Today',
       'Direction_Up'],
      dtype='object')

In [16]:
y = bd1.Direction_Up
X = bd1[['Lag1', 'Lag2', 'Lag3', 'Lag4', 'Lag5']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=123)
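
The split above is random, so days from every year end up in both the training and the test set. Because the rows are ordered in time (each lag column is the previous column shifted by one day), an alternative worth considering is a chronological split that trains on earlier years and tests on the last one. The sketch below is an assumption-laden illustration, not the notebook's method: it uses a synthetic frame with a reduced Smarket-like column layout, and the cutoff year 2005 and the names `demo`, `train_mask`, `X_train_t`, and so on are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with Smarket-like columns (the values are random).
rng = np.random.default_rng(0)
years = np.repeat(np.arange(2001, 2006), 40)  # 40 rows per year, 2001..2005
n = len(years)
demo = pd.DataFrame({
    "Year": years,
    "Lag1": rng.normal(size=n),
    "Lag2": rng.normal(size=n),
    "Direction_Up": rng.integers(0, 2, size=n),
})

# Train on earlier years, test on the final year (the cutoff is an assumption).
cutoff_year = 2005
train_mask = demo["Year"] < cutoff_year

features = ["Lag1", "Lag2"]
X_train_t = demo.loc[train_mask, features]
y_train_t = demo.loc[train_mask, "Direction_Up"]
X_test_t = demo.loc[~train_mask, features]
y_test_t = demo.loc[~train_mask, "Direction_Up"]

print(X_train_t.shape, X_test_t.shape)
```

A time-based split keeps future observations out of training, which tends to give a more honest estimate for forecasting tasks, at the cost of a test set drawn from a single period.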

## Building the models

In [17]:
logreg = LogisticRegression()

In [18]:
logreg.fit(X_train, y_train)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)

In [21]:
logreg.coef_

array([[-0.06179766, -0.11185042,  0.01879724,  0.00507131,  0.02565647]])
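
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. The snippet below simply applies `np.exp` to the coefficients printed above; the feature order is assumed to match the columns of `X`, and the variable names are illustrative.

```python
import numpy as np

# Coefficients printed above, assumed to follow the column order of X.
features = ["Lag1", "Lag2", "Lag3", "Lag4", "Lag5"]
coef = np.array([-0.06179766, -0.11185042, 0.01879724, 0.00507131, 0.02565647])

# exp(coefficient) is the multiplicative change in the odds of "Up"
# for a one-unit increase in that feature, holding the others fixed.
odds_ratios = np.exp(coef)
for name, ratio in zip(features, odds_ratios):
    print(f"{name}: odds ratio = {ratio:.3f}")
```

All five ratios are close to 1, which suggests that the lagged returns shift the odds of an "Up" day only slightly. Note that the model shown above uses the default L2 penalty, so the coefficients are also somewhat shrunk toward zero.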

In [22]:
yhat_lr = logreg.predict(X_test)
pd.crosstab(y_test, yhat_lr)

| Direction_Up \ col_0 | 0 | 1 |
|---|---|---|
| 0 | 43 | 139 |
| 1 | 41 | 152 |


In [24]:
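# Note: yhat_lr holds hard 0/1 labels, so this value equals balanced accuracy;
# passing logreg.predict_proba(X_test)[:, 1] instead would score the ranking of probabilities.
auc_lr = roc_auc_score(y_test, yhat_lr)
print('AUC for LogisticRegression: ', auc_lr)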

AUC for LogisticRegression:  0.5119142515515573
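
When `roc_auc_score` receives hard class labels instead of scores, the ROC curve has a single interior point and the area reduces to the mean of the true positive rate and the true negative rate, that is, balanced accuracy. The check below rebuilds label vectors from the confusion matrix printed above (rows are true classes, columns are predictions) and confirms that this reproduces the reported value; the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Counts from the logistic regression crosstab above.
tn, fp, fn, tp = 43, 139, 41, 152

# Rebuild true and predicted label vectors that produce exactly these counts.
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

auc_from_labels = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
manual = 0.5 * (tp / (tp + fn) + tn / (tn + fp))

print(auc_from_labels, bal_acc, manual)
```

All three numbers agree with the 0.5119 reported above. To measure how well the model ranks observations, the usual approach is to pass the predicted probability of the positive class, for example `logreg.predict_proba(X_test)[:, 1]`, as the second argument.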


## LDA model

In [30]:
logreg_lda = LinearDiscriminantAnalysis(solver="eigen")
logreg_lda.fit(X_train, y_train)

LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,
              solver='eigen', store_covariance=False, tol=0.0001)

In [31]:
logreg_lda.coef_

array([[-0.07855567, -0.1422923 ,  0.02396899,  0.00656012,  0.03250705]])

In [32]:
yhat_lda = logreg_lda.predict(X_test)
pd.crosstab(y_test, yhat_lda)

| Direction_Up \ col_0 | 0 | 1 |
|---|---|---|
| 0 | 48 | 134 |
| 1 | 53 | 140 |


In [33]:
# Computed from hard 0/1 labels, so this value equals balanced accuracy
auc_lda = roc_auc_score(y_test, yhat_lda)
print('AUC for LDA: ', auc_lda)

AUC for LDA:  0.49456243238626657


## QDA model

In [36]:
logreg_qda = QuadraticDiscriminantAnalysis()
logreg_qda.fit(X_train, y_train)

QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,
               store_covariance=False, store_covariances=None, tol=0.0001)

In [38]:
yhat_qda = logreg_qda.predict(X_test)
pd.crosstab(y_test, yhat_qda)

| Direction_Up \ col_0 | 0 | 1 |
|---|---|---|
| 0 | 35 | 147 |
| 1 | 39 | 154 |


In [47]:
# Computed from hard 0/1 labels, so this value equals balanced accuracy
auc_qda = roc_auc_score(y_test, yhat_qda)
print('AUC for QDA: ', auc_qda)

AUC for QDA:  0.4951175767237943
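
To answer the exercise's question of which model does best, the three confusion matrices can be put side by side as plain accuracies and compared with a trivial rule that always predicts "Up" on the same test set. The calculation below uses only the counts printed in the crosstabs above; the dictionary layout and names are illustrative.

```python
# Counts copied from the crosstab outputs above, in the order:
# (true Down & predicted Down, true Down & predicted Up,
#  true Up & predicted Down,   true Up & predicted Up)
confusion = {
    "LogisticRegression": (43, 139, 41, 152),
    "LDA": (48, 134, 53, 140),
    "QDA": (35, 147, 39, 154),
}

accuracy = {}
for name, (tn, fp, fn, tp) in confusion.items():
    accuracy[name] = (tn + tp) / (tn + fp + fn + tp)

# Baseline on the same test set: always predict "Up".
always_up = (41 + 152) / (43 + 139 + 41 + 152)

for name, acc in accuracy.items():
    print(f"{name}: accuracy = {acc:.3f}")
print(f"Always 'Up' baseline: {always_up:.3f}")
```

On these numbers, logistic regression (0.520) sits just above the always-"Up" baseline (0.515), while LDA (0.501) and QDA (0.504) fall below it. Together with scores near 0.5 for all three models, this suggests that the five lag variables carry very little signal under this split.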
