## Model Selection (Simple Linear Regression)

$Activity (2.5):$ Using the COVID-19 dataset, carry out the following activities:
<ol>
    <li>Report the RMSE for the different regression models (LinearRegression, DecisionTreeRegressor, RandomForestRegressor), using GridSearchCV to choose the best set of parameters for the RandomForestRegressor</li>
    <li>Write a function that selects features by order of importance (min(modelo_reg.feature_importances_)) of the best regressor, removing features one at a time until the RMSE on the test set is affected</li>
    <li>Investigate the Support Vector Regressor (sklearn.svm.SVR), varying the hyperparameters (kernel and C) automatically, and report the RMSE</li>
</ol>
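
Item 2 of the list can be read as a backward-elimination loop. The sketch below is one possible interpretation, not the notebook's own solution: the helper names `rmse` and `drop_least_important`, the 5% `tolerance` threshold, and the synthetic data from `make_regression` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rmse(model, X, y):
    """Root mean squared error of a fitted model on (X, y)."""
    return float(np.sqrt(mean_squared_error(y, model.predict(X))))


def drop_least_important(X_train, y_train, X_test, y_test, tolerance=0.05):
    """Drop the least important feature until test RMSE exceeds baseline * (1 + tolerance).

    Returns the kept column indices and the RMSE after each accepted step.
    """
    kept = list(range(X_train.shape[1]))
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_train[:, kept], y_train)
    baseline = rmse(model, X_test[:, kept], y_test)
    history = [baseline]
    while len(kept) > 1:
        # Position (within `kept`) of the feature with the smallest importance
        worst = int(np.argmin(model.feature_importances_))
        candidate = kept[:worst] + kept[worst + 1:]
        trial = RandomForestRegressor(n_estimators=50, random_state=0)
        trial.fit(X_train[:, candidate], y_train)
        trial_rmse = rmse(trial, X_test[:, candidate], y_test)
        if trial_rmse > baseline * (1 + tolerance):
            break  # removing this feature hurts the test RMSE: stop
        kept, model = candidate, trial
        history.append(trial_rmse)
    return kept, history


# Synthetic stand-in for the COVID-19 data (assumption: 225 rows, 7 features)
X, y = make_regression(n_samples=225, n_features=7, n_informative=3,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
kept, history = drop_least_important(X_tr, y_tr, X_te, y_te)
print(kept, history)
```

The loop compares every trial against the RMSE of the full feature set, so small degradations cannot accumulate unnoticed across steps.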

In [1]:
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
import itertools

### Loading the pickles

In [2]:
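def read_pickle(name):
    """Return the last object stored in the pickle file `name` as a NumPy array."""
    with open(name, 'rb') as openfile:
        while True:
            try:
                # A file may hold several pickled objects; each load overwrites the previous one
                one_instance = pickle.load(openfile)
            except EOFError:
                break  # end of file reached
    one_instance = np.asanyarray(one_instance)
    return one_instance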
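
A small round trip shows what the loader returns when a file holds more than one pickled object. This is only a sketch: the temporary file `demo.pickle` and its contents are made up for the demonstration, and `read_pickle` is repeated so the block runs on its own.

```python
import os
import pickle
import tempfile

import numpy as np


def read_pickle(name):
    with open(name, 'rb') as openfile:
        while True:
            try:
                one_instance = pickle.load(openfile)
            except EOFError:
                break
    return np.asanyarray(one_instance)


# Hypothetical file written only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.pickle')
with open(path, 'wb') as f:
    pickle.dump([1, 2, 3], f)          # first object in the stream
    pickle.dump([[4, 5], [6, 7]], f)   # second (last) object in the stream

loaded = read_pickle(path)
print(loaded.shape)  # shape of the last object only
```

Because each `pickle.load` overwrites the previous result, only the last object in the file is returned.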

In [3]:
X_train = read_pickle('X_train.pickle')
X_test = read_pickle('X_test.pickle')
y_train = read_pickle('y_train.pickle')
y_test = read_pickle('y_test.pickle')

In [4]:
print(y_train.shape, X_train.shape, y_test.shape, X_test.shape)

(180,) (180, 7) (45,) (45, 7)


## Training the Model

In [5]:
lin_reg = LinearRegression() # Instantiate the model
lin_reg.fit(X_train, y_train) # Fit y = ax + b (one coefficient per feature)

## Evaluating the Model

In [6]:
predictions = lin_reg.predict(X_test)

In [7]:
print(predictions.shape)
print(predictions)

(45,)
[-1.78261722e+00  2.45440159e+00  1.93809306e+01  4.33417000e+00
  7.03621254e+00  5.85261208e+01  1.56081032e+01  3.04782539e+00
  2.50361601e+01  1.50128695e+01  2.82037207e+00  5.48553192e+00
  1.09519975e+01  1.59979710e+01  2.37379742e+01  8.08585652e+00
  5.31186121e+00  3.20617344e+01  2.13947540e+01  2.98969520e+00
  5.51423101e+00  1.22494845e+01  7.19938843e+01  1.72698777e+01
  5.40412962e+00  2.51390688e+02  6.80034026e+00  5.09491194e+01
  6.52191797e-01  6.53968285e+00  4.19142448e+00  4.58953887e-01
  1.31122554e+02  1.27682860e+02 -4.07258394e-01  2.07442302e+00
  9.97089580e+00  1.33781535e+01 -3.64354237e-02  1.20338949e+00
  2.31708100e+00  1.24377508e+00  3.87195151e+00  4.26383484e+00
  1.68730486e+01]


In [8]:
lin_mse = mean_squared_error(y_test, predictions)
lin_rmse = np.sqrt(lin_mse)
print(lin_rmse)

12.219456416381817
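
The RMSE printed above is the square root of the mean squared difference between predictions and targets. A tiny hand-checkable sketch, using made-up values rather than the notebook's data:

```python
import numpy as np

# Made-up targets and predictions, chosen so the arithmetic is easy to follow
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_pred - y_true        # [-1, 0, 2]
mse = np.mean(errors ** 2)      # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)             # square root brings the error back to the target's units
print(rmse)
```

Since the RMSE is in the same units as the target, it can be compared directly across the regressors in this notebook.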


## Selecting the DecisionTree Regressor

In [9]:
from sklearn.tree import DecisionTreeRegressor

In [11]:
dt_reg = DecisionTreeRegressor() # Instantiate the model
dt_reg.fit(X_train, y_train) # Train (fit) the model

In [13]:
dt_reg.get_params() # Hyperparameters

{'ccp_alpha': 0.0,
 'criterion': 'squared_error',
 'max_depth': None,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'random_state': None,
 'splitter': 'best'}

## Evaluating the Model

In [14]:
predictions = dt_reg.predict(X_test)

In [15]:
print(predictions.shape)
print(predictions)

(45,)
[  4.   3.  12.   8.  16.  38.  14.   5.  11.  14.   9.   3.  19.  14.
  15.   8.   9.  28.  13.   2.   7.  14.  66.  21.  15. 182.   9.  77.
   5.   9.   4.   3.  91. 168.   9.   3.  12.  24.   5.   9.  15.   3.
   4.   2.  15.]


In [16]:
dt_mse = mean_squared_error(y_test,predictions)
dt_rmse = np.sqrt(dt_mse)
dt_rmse

13.15885844424035

## Random Forest Regressor - Training the Model

In [17]:
rf_reg = RandomForestRegressor() # Instantiate the model
rf_reg.fit(X_train,y_train)
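
The activity asks for GridSearchCV to pick the random forest's parameters. A minimal sketch of that step follows. The parameter grid, `cv=3`, and the synthetic data from `make_regression` are assumptions chosen to keep the example small and runnable, not values taken from this notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in with the same shape as the notebook's data (assumption)
X, y = make_regression(n_samples=225, n_features=7, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Illustrative grid; a real search would likely cover more values
param_grid = {'n_estimators': [25, 50], 'max_depth': [None, 5]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring='neg_root_mean_squared_error',  # scikit-learn maximizes scores, hence the sign
    cv=3,
)
search.fit(X_tr, y_tr)

# best_estimator_ is refit on the whole training split by default
best_rf = search.best_estimator_
test_rmse = float(np.sqrt(mean_squared_error(y_te, best_rf.predict(X_te))))
print(search.best_params_, test_rmse)
```

The cross-validated score in `search.best_score_` is a negated RMSE, so values closer to zero are better. The test-set RMSE is computed separately so it stays comparable with `lin_rmse` and `dt_rmse`.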