# Hyperparameter tuning

Models have two kinds of parameters:

* Trainable parameters (model parameters): values the algorithm learns from the data, e.g. the coefficients (weights) and intercept of a regression, or the splits of a decision tree (from which its feature importances are derived).

* Hyperparameters: values set before training that affect how the algorithm learns, e.g. max_depth of a decision tree, max_iter of LogisticRegression, or n_neighbors of KNN. They control the complexity of the model and how it learns.
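
The distinction can be seen directly on an estimator. The following is a minimal sketch, assuming LogisticRegression on the iris dataset as an illustrative choice (the values max_iter=1000 and C=1.0 are arbitrary): hyperparameters are available through get_params() before fitting, while learned parameters such as coef_ and intercept_ only exist after fit().

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us before fitting (illustrative values).
clf = LogisticRegression(max_iter=1000, C=1.0)
hyperparams = clf.get_params()
print(hyperparams["max_iter"], hyperparams["C"])

# Before fit() there are no learned parameters yet.
print(hasattr(clf, "coef_"))

# Model parameters: learned from the data during fit().
clf.fit(X, y)
print(clf.coef_.shape)       # one row of weights per class, one column per feature
print(clf.intercept_.shape)  # one intercept per class
```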

Scikit-learn provides classes that try combinations of hyperparameters automatically:

* GridSearchCV
* RandomizedSearchCV

In [27]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from scipy.stats import randint
import numpy as np
import pandas as pd

In [5]:
np.arange(1, 21)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [None]:
X, y = load_iris(return_X_y=True)

model = DecisionTreeClassifier(random_state=42)

params = {
    'max_depth': np.arange(1, 10),
    'min_samples_split': np.arange(2, 10),
    'criterion': ['gini', 'entropy'] 
}

grid = GridSearchCV(model, params, scoring='accuracy', verbose=1)   # scoring also accepts several metrics
                                                                    # cv=5 by default
grid.fit(X, y)

Fitting 5 folds for each of 144 candidates, totalling 720 fits
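
The 144 candidates reported in the log are the Cartesian product of the values in the grid, and each candidate is fitted once per fold. As a small sketch, ParameterGrid from sklearn.model_selection can be used to count the combinations without running the search (the grid below repeats the one from the cell above):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

params = {
    "max_depth": np.arange(1, 10),          # 9 values
    "min_samples_split": np.arange(2, 10),  # 8 values
    "criterion": ["gini", "entropy"],       # 2 values
}

n_candidates = len(ParameterGrid(params))  # 9 * 8 * 2 combinations
n_folds = 5                                # default cv of GridSearchCV

print(n_candidates)
print(n_candidates * n_folds)  # total number of fits
```

Checking this number first is a cheap way to estimate how long a grid search will take before launching it.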


In [16]:
print('best params: ', grid.best_params_)
print('best score: ', grid.best_score_)

best params:  {'criterion': 'gini', 'max_depth': np.int64(3), 'min_samples_split': np.int64(2)}
best score:  0.9733333333333334


In [17]:
# Option 1: use the grid directly; it already works like a model through predict:
grid.predict([[5.1, 3.5, 1.4, 0.2]])

array([0])

In [20]:
# Option 2: extract the best model from the grid
best_model = grid.best_estimator_
print(type(best_model))
print(best_model.predict([[5.1, 3.5, 1.4, 0.2]]))

<class 'sklearn.tree._classes.DecisionTreeClassifier'>
[0]


In [21]:
# Option 3: build a new estimator from the best params, e.g. for another experiment or to train on the whole dataset
# it has to be fitted again
model_2 = DecisionTreeClassifier(**grid.best_params_)  # ** unpacks the dict into keyword arguments
model_2.fit(X, y)  # train
print(model_2.predict([[5.1, 3.5, 1.4, 0.2]]))  # test

[0]


In [22]:
# to inspect the parameters of an existing model
print(model_2.get_params())

{'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini', 'max_depth': np.int64(3), 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': np.int64(2), 'min_weight_fraction_leaf': 0.0, 'monotonic_cst': None, 'random_state': None, 'splitter': 'best'}
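
Note that the output above shows random_state set to None: best_params_ only contains the hyperparameters that were part of the grid, so settings fixed on the original estimator are not carried over. One possible alternative, sketched here with a smaller illustrative grid and a hypothetical model_3 name, is to clone the best estimator, which returns an unfitted copy with all of its hyperparameters:

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

base = DecisionTreeClassifier(random_state=42)
grid = GridSearchCV(base, {"max_depth": [2, 3, 4]}, scoring="accuracy")  # illustrative grid
grid.fit(X, y)

# clone() copies every hyperparameter of the winning estimator, including
# those fixed outside the grid (here random_state=42), without the fitted state.
model_3 = clone(grid.best_estimator_)
model_3.fit(X, y)

print(model_3.get_params()["random_state"])
print(model_3.get_params()["max_depth"] == grid.best_params_["max_depth"])
```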


## Multiple metrics

As with cross_validate in cross-validation, GridSearchCV can evaluate multiple metrics. In that case, refit must name the metric used to select and refit the best model.

In [23]:
X, y = load_iris(return_X_y=True)

model = DecisionTreeClassifier(random_state=42)

params = {
    'max_depth': np.arange(1, 10),
    'min_samples_split': np.arange(2, 10),
    'criterion': ['gini', 'entropy'] 
}

grid = GridSearchCV(model, params, scoring=['accuracy', 'f1_macro', 'roc_auc_ovr'], refit='f1_macro', verbose=1)  # refit picks the metric used to select the best model
                                                                    # cv=5 by default
grid.fit(X, y)

Fitting 5 folds for each of 144 candidates, totalling 720 fits


In [24]:
grid.best_score_

np.float64(0.973165236323131)

In [None]:
# inspect the GridSearchCV results for every combination:
pd.DataFrame(grid.cv_results_).head()

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_criterion,param_max_depth,param_min_samples_split,params,split0_test_accuracy,split1_test_accuracy,split2_test_accuracy,split3_test_accuracy,split4_test_accuracy,mean_test_accuracy,std_test_accuracy,rank_test_accuracy,split0_test_f1_macro,split1_test_f1_macro,split2_test_f1_macro,split3_test_f1_macro,split4_test_f1_macro,mean_test_f1_macro,std_test_f1_macro,rank_test_f1_macro,split0_test_roc_auc_ovr,split1_test_roc_auc_ovr,split2_test_roc_auc_ovr,split3_test_roc_auc_ovr,split4_test_roc_auc_ovr,mean_test_roc_auc_ovr,std_test_roc_auc_ovr,rank_test_roc_auc_ovr
0,0.001718,0.000414,0.008897,0.002311,gini,1,2,"{'criterion': 'gini', 'max_depth': 1, 'min_sam...",0.666667,0.666667,0.666667,0.666667,0.666667,0.666667,0.0,129,0.555556,0.555556,0.555556,0.555556,0.555556,0.555556,0.0,129,0.833333,0.833333,0.833333,0.833333,0.833333,0.833333,0.0,129
1,0.001776,0.000285,0.010607,0.004281,gini,1,3,"{'criterion': 'gini', 'max_depth': 1, 'min_sam...",0.666667,0.666667,0.666667,0.666667,0.666667,0.666667,0.0,129,0.555556,0.555556,0.555556,0.555556,0.555556,0.555556,0.0,129,0.833333,0.833333,0.833333,0.833333,0.833333,0.833333,0.0,129
2,0.001102,0.000154,0.004993,0.000539,gini,1,4,"{'criterion': 'gini', 'max_depth': 1, 'min_sam...",0.666667,0.666667,0.666667,0.666667,0.666667,0.666667,0.0,129,0.555556,0.555556,0.555556,0.555556,0.555556,0.555556,0.0,129,0.833333,0.833333,0.833333,0.833333,0.833333,0.833333,0.0,129
3,0.001167,7.7e-05,0.005027,0.000306,gini,1,5,"{'criterion': 'gini', 'max_depth': 1, 'min_sam...",0.666667,0.666667,0.666667,0.666667,0.666667,0.666667,0.0,129,0.555556,0.555556,0.555556,0.555556,0.555556,0.555556,0.0,129,0.833333,0.833333,0.833333,0.833333,0.833333,0.833333,0.0,129
4,0.001142,0.000187,0.005508,0.000735,gini,1,6,"{'criterion': 'gini', 'max_depth': 1, 'min_sam...",0.666667,0.666667,0.666667,0.666667,0.666667,0.666667,0.0,129,0.555556,0.555556,0.555556,0.555556,0.555556,0.555556,0.0,129,0.833333,0.833333,0.833333,0.833333,0.833333,0.833333,0.0,129
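
With several metrics, cv_results_ holds one mean_test_<metric> and one rank_test_<metric> column per scorer, and best_index_ points at the winner for the refit metric. The sketch below, which assumes a reduced grid and only two metrics to keep it fast, shows one way to compare candidates across metrics:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    {"max_depth": [1, 2, 3, 4]},       # small illustrative grid
    scoring=["accuracy", "f1_macro"],
    refit="f1_macro",                  # metric that decides the best candidate
)
grid.fit(X, y)

results = pd.DataFrame(grid.cv_results_)
cols = ["params", "mean_test_accuracy", "mean_test_f1_macro", "rank_test_f1_macro"]
summary = results[cols].sort_values("rank_test_f1_macro")
print(summary)

# best_score_ is the mean f1_macro of the row selected by best_index_.
best_row = results.iloc[grid.best_index_]
print(best_row["mean_test_f1_macro"] == grid.best_score_)
```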


## RandomizedSearchCV

An alternative to GridSearchCV that reduces computation time.

Instead of trying every combination, it samples n_iter combinations at random from the given lists or distributions.

In [43]:
model = DecisionTreeClassifier(random_state=42)

params = {
    'max_depth': np.arange(1, 20),
    'min_samples_split': np.arange(2, 20),
    'criterion': ['gini', 'entropy'] 
}

grid = RandomizedSearchCV(model, params, n_iter=10, cv=10, scoring='accuracy', random_state=42, verbose=1)  # n_iter = number of sampled combinations
                                                                    # cv=10 here (the default is 5)
grid.fit(X, y)

Fitting 10 folds for each of 10 candidates, totalling 100 fits


In [45]:
print(grid.best_score_)
print(grid.best_params_)
print(grid.best_estimator_)
print(grid.best_index_)

0.96
{'min_samples_split': np.int64(13), 'max_depth': np.int64(9), 'criterion': 'entropy'}
DecisionTreeClassifier(criterion='entropy', max_depth=np.int64(9),
                       min_samples_split=np.int64(13), random_state=42)
0
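
The cell above passes arrays, which RandomizedSearchCV samples uniformly. Distributions can be passed instead, for example with scipy.stats.randint, which was imported at the top but not used. The following is a minimal sketch under that assumption; the ranges mirror the arrays above, and the variable names search and param_distributions are illustrative:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_distributions = {
    "max_depth": randint(1, 20),          # integers in [1, 20)
    "min_samples_split": randint(2, 20),  # integers in [2, 20)
    "criterion": ["gini", "entropy"],     # lists are sampled uniformly
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_distributions,
    n_iter=10,        # number of sampled combinations
    cv=5,
    scoring="accuracy",
    random_state=42,  # makes the sampling reproducible
)
search.fit(X, y)

print(len(search.cv_results_["params"]))  # number of candidates actually tried
print(search.best_params_)
print(search.best_score_)
```

Because only n_iter candidates are evaluated, the search may miss the combination a full grid would find; raising n_iter trades computation time for coverage.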
