<a href="https://colab.research.google.com/github/JhonnyLimachi/Sigmoidal/blob/main/36_Grid_Search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img alt="Colaboratory logo" width="15%" src="https://raw.githubusercontent.com/carlosfab/escola-data-science/master/img/novo_logo_bg_claro.png">

#### **Data Science na Prática 4.0**
*by [sigmoidal.ai](https://sigmoidal.ai)*

---

# Grid Search

Grid Search is a hyperparameter optimization technique that helps us find the best hyperparameter values for our model.

<center><img src="https://editor.analyticsvidhya.com/uploads/73200GSRS-CV.png" width="60%"></center>


Basically, it tests every combination of the candidate values we supply, scores each one (with cross-validation, in scikit-learn's `GridSearchCV`), and returns the best-scoring combination for that specific dataset.

This is a very direct way to improve our models' results, though the number of fits grows with every value added to the grid.
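To make the idea concrete, here is a minimal sketch of the search loop in plain Python. The grid, the `penalty_weight` name, and `toy_score` are hypothetical stand-ins for a real model and its cross-validated score; this is not how scikit-learn implements it internally.

```python
from itertools import product

# Hypothetical grid: two hyperparameters with a few candidate values each.
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "penalty_weight": [0.0, 0.5],
}

def toy_score(C, penalty_weight):
    """Stand-in for a cross-validated score; highest at C=1, penalty_weight=0.5."""
    return -abs(C - 1) - abs(penalty_weight - 0.5)

def grid_search(grid, score_fn):
    """Try every combination in the grid and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

With 4 values for `C` and 2 for `penalty_weight`, the loop evaluates 4 × 2 = 8 combinations, which is why large grids get expensive quickly.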


In [1]:
# import the required packages
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import numpy as np

# ensure reproducibility
np.random.seed(42)

# load the file
df = pd.read_csv("http://dl.dropboxusercontent.com/s/6d91j46mkcdj4qv/heart-disease-clean.csv?dl=1")

In [2]:
# 1. choose and import a model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# 2. instantiate the model and choose the hyperparameters
model = LogisticRegression()

# 3. split the data into feature matrix and target vector
X = df.drop('num', axis=1)
y = df['num']

# 3.1 split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y)

# 3.2 standardize the training data
scaler = StandardScaler().fit(X_train)
X_train_transformed = scaler.transform(X_train)

# 4. Grid Search over C, the inverse regularization strength
parameters = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
}

# with the default settings, each candidate is scored by 5-fold cross-validation
clf = GridSearchCV(model, parameters)
clf.fit(X_train_transformed, y_train)
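One caveat about the cell above: the scaler is fit on the whole training set before cross-validation, so each validation fold has already contributed to the scaling statistics. A common alternative is to wrap the scaler and the model in a `Pipeline` and search over that, so the scaler is refit inside every fold. The sketch below is one way to do it; it uses synthetic data from `make_classification` as an assumption, standing in for the heart-disease file, and the step names `scaler` and `model` are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data standing in for the heart-disease dataset.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The scaler is refit on the training part of every cross-validation fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
parameters = {"model__C": [0.001, 0.01, 0.1, 1, 10, 100, 1000]}

search = GridSearchCV(pipe, parameters)
search.fit(X_train, y_train)

print(search.best_params_)
# The refit best pipeline scales X_test with training statistics before predicting.
print(classification_report(y_test, search.predict(X_test)))
```

Because `GridSearchCV` refits the best pipeline on the full training set by default, `search.predict` can be called directly on the raw test features.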

In [3]:
# show the best parameter
print(clf.best_params_)

{'C': 1}


In [4]:
print("Best: {} using {}".format(clf.best_score_, clf.best_params_))

Best: 0.8371014492753623 using {'C': 1}


In [5]:
# show every parameter value tested: mean score (+/- 2 standard deviations)
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']

for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))

0.692 (+/-0.061) for {'C': 0.001}
0.833 (+/-0.036) for {'C': 0.01}
0.828 (+/-0.034) for {'C': 0.1}
0.837 (+/-0.043) for {'C': 1}
0.824 (+/-0.026) for {'C': 10}
0.819 (+/-0.015) for {'C': 100}
0.819 (+/-0.015) for {'C': 1000}
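The same per-candidate summary can also be read as a table, since `cv_results_` is a dict of equal-length arrays that pandas can load directly. The sketch below assumes synthetic data from `make_classification` in place of the scaled training set, so its scores will not match the ones printed above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic data standing in for the scaled training set.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

clf = GridSearchCV(LogisticRegression(), {"C": [0.001, 0.01, 0.1, 1, 10, 100, 1000]})
clf.fit(X, y)

# One row per candidate, best-ranked first.
results = pd.DataFrame(clf.cv_results_)
columns = ["param_C", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score"))
```

Sorting by `rank_test_score` puts the candidate reported in `best_params_` on the first row.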


In [6]:
# stored results
clf.cv_results_

{'mean_fit_time': array([0.00940008, 0.0082891 , 0.00431547, 0.01123371, 0.00526276,
        0.01070724, 0.01038671]),
 'std_fit_time': array([0.00512108, 0.00779915, 0.00023394, 0.00459629, 0.00036063,
        0.00682431, 0.00456618]),
 'mean_score_time': array([0.00132918, 0.00363302, 0.00139408, 0.00137963, 0.00261626,
        0.00285764, 0.00129619]),
 'std_score_time': array([0.00018573, 0.00288303, 0.00025962, 0.00016676, 0.00271251,
        0.00303897, 0.00012748]),
 'param_C': masked_array(data=[0.001, 0.01, 0.1, 1, 10, 100, 1000],
              mask=[False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'C': 0.001},
  {'C': 0.01},
  {'C': 0.1},
  {'C': 1},
  {'C': 10},
  {'C': 100},
  {'C': 1000}],
 'split0_test_score': array([0.7173913 , 0.84782609, 0.84782609, 0.80434783, 0.80434783,
        0.80434783, 0.80434783]),
 'split1_test_score': array([0.65217391, 0.82608696, 0.82608696, 0.84782609, 0.82608696,
        0.

## **Grid Search**

In this lesson you will learn about the Grid Search technique and how to use it to identify the best parameters for your Machine Learning model.

## **References**

[Logistic Regression Model Tuning with scikit-learn](https://towardsdatascience.com/logistic-regression-model-tuning-with-scikit-learn-part-1-425142e01af5)

[Tune Hyperparameters for Classification Machine Learning Algorithms](https://machinelearningmastery.com/hyperparameters-for-classification-machine-learning-algorithms/)


[Tuning the hyper-parameters of an estimator](https://scikit-learn.org/stable/modules/grid_search.html)

[Parameter estimation using grid search with cross-validation](https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html#sphx-glr-auto-examples-model-selection-plot-grid-search-digits-py)

[Model selection: choosing estimators and their parameters](https://colab.research.google.com/drive/1MULiEX43y3MpI4mNpo68QULuJMSrAfW7#scrollTo=cdXMG8EEvnmb)