##### Hyperparameters are model settings that are chosen before the model is fit to data, for instance n_neighbors in KNeighborsClassifier and alpha in Ridge. Hyperparameter tuning means choosing the best-performing set of hyperparameters for a model from a range of candidate values.
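To make the distinction concrete, here is a minimal sketch contrasting a hyperparameter with learned parameters. The synthetic data and the names `X_demo`, `y_demo`, and `ridge` are assumptions made for illustration; they are not part of the diabetes example.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative assumption): y depends linearly on two features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=100)

# alpha is a hyperparameter: we choose it and pass it to the constructor before fitting.
ridge = Ridge(alpha=0.5)
print(ridge.get_params()["alpha"])

# coef_ and intercept_ are learned parameters: they only exist after fit().
ridge.fit(X_demo, y_demo)
print(ridge.coef_, ridge.intercept_)
```

Changing `alpha` and refitting would give different coefficients, which is why the choice of hyperparameter values is worth tuning.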


##### GridSearchCV can be used for hyperparameter tuning. Its arguments include a regression or classification estimator, a dictionary that maps each hyperparameter name to the values to try, and a cross-validation strategy such as a KFold object. It evaluates every combination in the dictionary with cross-validation.
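The cost of an exhaustive grid search can be sketched with the standard library alone. The helper `expand_grid` and the grid `demo_grid` below are hypothetical names used only for illustration; this is not how GridSearchCV is implemented internally, just a way to count the candidates it would evaluate.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of values in param_grid as a list of dicts."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical grid: 3 alpha values x 2 max_iter values = 6 candidates.
demo_grid = {"alpha": [0.01, 0.1, 1.0], "max_iter": [1000, 5000]}
candidates = expand_grid(demo_grid)

# Each candidate is fit once per cross-validation fold.
n_splits = 6
total_fits = len(candidates) * n_splits
print(len(candidates), total_fits)
```

With the default `refit=True`, GridSearchCV also fits the best candidate one more time on the full training set, so the number of fits grows quickly as hyperparameters are added to the grid.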

###### Here, we build a lasso regression model with tuned hyperparameters to predict glucose levels in a diabetes dataframe.

In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('diabetes_clean.csv')

df

Unnamed: 0,pregnancies,glucose,diastolic,triceps,insulin,bmi,dpf,age,diabetes
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1


In [5]:
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, KFold

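# Features: every column except the target; target: glucose
X = df.drop('glucose', axis=1).values
y = df.glucose

# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=15)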

In [6]:

from sklearn.linear_model import Lasso


In [9]:
kf = KFold(n_splits=6, shuffle=True, random_state=13)
lasso = Lasso()

# Set up the parameter grid: 20 evenly spaced alpha values between 0.00001 and 1
param_grid = {
    'alpha': np.linspace(0.00001, 1, 20)
}

# Instantiate the grid search with 6-fold cross-validation
lasso_cv = GridSearchCV(lasso, param_grid=param_grid, cv=kf)


In [10]:
lasso_cv.fit(X_train, y_train)


# Report the best hyperparameters and the best mean cross-validated score (R^2 for a regressor)
print("Tuned lasso best parameter(s): {}".format(lasso_cv.best_params_))

print('Tuned lasso best score: {}'.format(lasso_cv.best_score_))

Tuned lasso best parameter(s): {'alpha': 1e-05}
Tuned lasso best score: 0.31715323432017684
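The score printed above is a mean cross-validated score on the training folds. A common follow-up is to check the refit best model on the held-out test set. The sketch below shows that step on synthetic data from `make_regression`, which is an assumption made so the example runs without `diabetes_clean.csv`; names such as `search`, `X_tr`, and `X_te` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic regression data (illustrative assumption, not the diabetes data).
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=15)

kf_demo = KFold(n_splits=6, shuffle=True, random_state=13)
search = GridSearchCV(Lasso(), param_grid={"alpha": np.linspace(0.00001, 1, 20)}, cv=kf_demo)
search.fit(X_tr, y_tr)

# best_score_: mean cross-validated R^2 of the best candidate on the training data.
cv_r2 = search.best_score_

# score(): R^2 of the refit best estimator on data the search never saw.
test_r2 = search.score(X_te, y_te)
print(search.best_params_, cv_r2, test_r2)
```

In the notebook itself the equivalent call would be `lasso_cv.score(X_test, y_test)`. Note also that the selected alpha of 1e-05 is the smallest value in the grid, which may be a hint to extend the grid toward smaller values or to try log-spaced candidates.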


## Randomized Search CV

In [None]:
diabetes_df = pd.read_csv('diabetes_clean.csv')

In [12]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LogisticRegression

# Use RandomizedSearchCV to tune a logistic regression model's hyperparameters
# to predict 'diabetes' from the same dataframe
X2 = df.drop('diabetes', axis=1).values
y2 = df.diabetes

# stratify=y2 keeps the class proportions the same in the train and test sets
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.25, random_state=15, stratify=y2)
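The effect of the `stratify` argument used in the split above can be seen on a small synthetic label vector. The 60/40 class balance and the names `y_demo`, `y_tr`, and `y_te` are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels (illustrative assumption): 60 negatives and 40 positives.
y_demo = np.array([0] * 60 + [1] * 40)
X_demo = np.arange(100).reshape(-1, 1)

# With stratify, both splits keep roughly the original 40% positive rate.
_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=15, stratify=y_demo
)
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, a random split can leave the test set with a noticeably different class balance, which makes classification scores harder to interpret.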

In [23]:
# Create the parameter space to sample from
params = {
    'solver': ['liblinear', 'saga'],
    'penalty': ['l1', 'l2'],
    'tol': np.linspace(0.0001, 1.0, 50),
    'C': np.linspace(0.1, 1.0, 50),
    'class_weight': ['balanced', {0: 0.8, 1: 0.2}]
}

# RandomizedSearchCV samples n_iter (default 10) combinations from params
logreg = LogisticRegression(max_iter=2000)
logreg_cv = RandomizedSearchCV(logreg, params, cv=kf)


logreg_cv.fit(X2_train, y2_train)


# Print the best parameters and score
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))

print('Tuned Logistic Regression Best Score: {}'.format(logreg_cv.best_score_))

Tuned Logistic Regression Parameters: {'tol': 0.7347204081632653, 'solver': 'liblinear', 'penalty': 'l1', 'class_weight': 'balanced', 'C': 0.6510204081632653}
Tuned Logistic Regression Best Score: 0.7170138888888888
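The search above does not set `random_state` on RandomizedSearchCV, so rerunning it may select different parameters. The sketch below shows one way to make a randomized search reproducible and to sample `C` from a continuous log-uniform distribution instead of a fixed list. The synthetic data from `make_classification`, the chosen ranges, and `n_iter=20` are assumptions for illustration, not the notebook's settings.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, RandomizedSearchCV

# Synthetic classification data (illustrative assumption, not the diabetes data).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Lists are sampled uniformly; scipy distributions are sampled via their rvs() method.
param_distributions = {
    "C": loguniform(1e-2, 1e1),
    "class_weight": [None, "balanced"],
}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=2000),
    param_distributions,
    n_iter=20,          # number of sampled candidates (the default is 10)
    cv=KFold(n_splits=6, shuffle=True, random_state=13),
    random_state=42,    # fixes which candidates are sampled
)
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
```

Unlike a grid search, the number of fits here is controlled by `n_iter` rather than by the size of the parameter space, so large or continuous spaces stay affordable.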
