## Tutorial: Feature Selection and Hyperparameter Tuning for a Regression Model

##### This notebook walks through the following steps:
#####    1 - install mlutils
#####    2 - import the library modules and helper packages
#####    3 - load the dataset
#####    4 - test the feature selection methods
#####    5 - build the list of dictionaries describing the parameters to test during tuning
#####    6 - run the hyperparameter tuning

In [None]:
#1

!pip install mlutils

In [1]:
#2

from sklearn import datasets
from mlutils.feature_engineering import *
from mlutils.tuning_hyperparams import *
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
import pandas as pd

In [2]:
#3

# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
boston_data = datasets.load_boston()
df_boston = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
df_boston["target"] = boston_data.target
df_boston.head()

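|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | target |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|--------|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |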


## Feature Selection Stepwise OLS Linear Regression

In [3]:
#4

feature_selection_stepwise(df=df_boston, target='target', threshold_in=0.01, threshold_out=0.05, verbose=False)

['LSTAT', 'RM', 'PTRATIO', 'DIS', 'NOX', 'CHAS', 'B', 'ZN']
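The internals of `feature_selection_stepwise` are not shown in this notebook. As a rough illustration of what a bidirectional stepwise procedure driven by OLS p-values usually looks like, here is a minimal sketch. The helper names `ols_pvalues` and `stepwise_select`, the `max_iter` guard, and the synthetic data are all assumptions made for this example; they are not part of mlutils.

```python
import numpy as np
import pandas as pd
from scipy import stats


def ols_pvalues(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Two-sided t-test p-values for the coefficients of an OLS fit with intercept."""
    design = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    target = y.to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ beta
    dof = design.shape[0] - design.shape[1]          # residual degrees of freedom
    sigma2 = residuals @ residuals / dof             # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)  # coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
    return pd.Series(p_values[1:], index=X.columns)  # drop the intercept


def stepwise_select(df, target, threshold_in=0.01, threshold_out=0.05, max_iter=100):
    """Hypothetical helper: add the best candidate if p < threshold_in,
    then drop the worst included feature if p > threshold_out."""
    X, y = df.drop(columns=[target]), df[target]
    included = []
    for _ in range(max_iter):  # guard against cycling
        changed = False
        # Forward step: try each excluded feature alongside the current set.
        candidates = [c for c in X.columns if c not in included]
        if candidates:
            p_new = pd.Series(
                {c: ols_pvalues(X[included + [c]], y)[c] for c in candidates}
            )
            if p_new.min() < threshold_in:
                included.append(p_new.idxmin())
                changed = True
        # Backward step: re-check the features already in the model.
        if included:
            p_cur = ols_pvalues(X[included], y)
            if p_cur.max() > threshold_out:
                included.remove(p_cur.idxmax())
                changed = True
        if not changed:
            break
    return included


# Synthetic data (an assumption for this sketch): only x0 and x2 drive the target.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"x{i}" for i in range(5)])
demo["target"] = 3 * demo["x0"] - 2 * demo["x2"] + rng.normal(scale=0.5, size=200)

selected = stepwise_select(demo, "target")
print(selected)
```

Keeping `threshold_in` below `threshold_out`, as the call above does, makes it unlikely that a feature is added and removed in alternation.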

## Feature Selection f_regression

In [4]:
#4

feature_selection_f_regression(df=df_boston, target='target', num_feats=8)

['CRIM', 'INDUS', 'NOX', 'RM', 'RAD', 'TAX', 'PTRATIO', 'LSTAT']
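For readers without mlutils, a comparable result can probably be obtained directly with scikit-learn's `SelectKBest` and `f_regression`. The sketch below assumes that is roughly what the wrapper does, which this notebook does not confirm. `top_k_f_regression` is a hypothetical name, and the data is synthetic.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression


def top_k_f_regression(df: pd.DataFrame, target: str, num_feats: int) -> list:
    """Hypothetical helper: keep the num_feats columns with the highest F-statistic."""
    X, y = df.drop(columns=[target]), df[target]
    selector = SelectKBest(score_func=f_regression, k=num_feats).fit(X, y)
    # get_support() returns a boolean mask in the original column order.
    return list(X.columns[selector.get_support()])


# Synthetic data (an assumption for this sketch).
X_demo, y_demo = make_regression(
    n_samples=300, n_features=10, n_informative=3, noise=5.0, random_state=0
)
demo = pd.DataFrame(X_demo, columns=[f"x{i}" for i in range(10)])
demo["target"] = y_demo

top = top_k_f_regression(demo, "target", num_feats=3)
print(top)
```

Because the mask follows the column order of the input, the selected names come back in DataFrame order rather than ranked by score, which is consistent with the ordering of the list printed above.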

## Feature Selection mutual_information

In [5]:
#4

feature_selection_mutual_information(df=df_boston, target='target', num_feats=8)

['CRIM', 'INDUS', 'NOX', 'RM', 'AGE', 'TAX', 'PTRATIO', 'LSTAT']
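The two univariate methods return slightly different lists here (`RAD` in one, `AGE` in the other). One plausible reason is that the F-test only measures linear dependence, while mutual information can also pick up non-linear relationships. The sketch below illustrates that difference on synthetic data using scikit-learn's `mutual_info_regression` and `f_regression`. It is an illustration of the general behaviour, not a reproduction of the mlutils function.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_regression, mutual_info_regression

# Synthetic data (an assumption for this sketch): the target depends on x0 only,
# through a purely non-linear (quadratic) relationship.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(-1, 1, size=(500, 3)), columns=["x0", "x1", "x2"])
y = X["x0"] ** 2 + rng.normal(scale=0.05, size=500)

# Mutual information is estimated with a nearest-neighbour method, so fix the seed.
mi = pd.Series(mutual_info_regression(X, y, random_state=0), index=X.columns)

# The F-test is based on the linear correlation between each feature and the target.
f_scores, p_values = f_regression(X, y)
f_stat = pd.Series(f_scores, index=X.columns)

print(pd.DataFrame({"mutual_info": mi, "f_statistic": f_stat}))
```

In this setup `x0` should stand out clearly on the mutual information column, while its F-statistic has no reason to be large because the linear correlation between `x0` and `x0 ** 2` is close to zero on a symmetric interval.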

## Hyperparameter Tuning

In [6]:
# Keep only the columns returned by the mutual information selection (plus the target)
# and pass them to the tuner, together with the hyperparameter specifications to be
# tested for the chosen algorithm and the metric used to score each trial.

In [7]:
df = df_boston[['CRIM', 'INDUS', 'NOX', 'RM', 'AGE', 'TAX', 'PTRATIO', 'LSTAT','target']]

In [8]:
#5

param_RF = [
        {"name": "min_samples_leaf", "type": "Integer", "low": 50, "high": 75},
        {"name": "max_depth", "type": "Integer", "low": 12, "high": 24},
    ]

In [9]:
#6

tuning_hyperparams(
        df=df,
        target='target',
        parameters=param_RF,
        algorithm=RandomForestRegressor,
        metric=r2_score,
        scoring_option="maximize",
        n_trials=20,
    )

[I 2021-11-19 14:58:34,451] A new study created in memory with name: no-name-d78db035-4ca9-4f32-b135-e2a954fe2005
[I 2021-11-19 14:58:35,380] Trial 0 finished with value: 0.6302412844189328 and parameters: {'min_samples_leaf': 59, 'max_depth': 24}. Best is trial 0 with value: 0.6302412844189328.
[I 2021-11-19 14:58:36,312] Trial 1 finished with value: 0.5454987702199812 and parameters: {'min_samples_leaf': 69, 'max_depth': 19}. Best is trial 0 with value: 0.6302412844189328.
[I 2021-11-19 14:58:37,365] Trial 2 finished with value: 0.660314556326739 and parameters: {'min_samples_leaf': 54, 'max_depth': 14}. Best is trial 2 with value: 0.660314556326739.
[I 2021-11-19 14:58:38,416] Trial 3 finished with value: 0.6686598053853692 and parameters: {'min_samples_leaf': 51, 'max_depth': 23}. Best is trial 3 with value: 0.6686598053853692.
[I 2021-11-19 14:58:39,385] Trial 4 finished with value: 0.567570825410154 and par

{'min_samples_leaf': 50, 'max_depth': 24}