# A06 Hyperparameter Tuning using GridSearchCV

In [33]:
#%matplotlib notebook
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

# Use the ggplot style for all plots
plt.style.use('ggplot')

## Read Data




In [34]:
df=pd.read_csv("studentexamdata.txt", header=None, names=['score1','score2','admission'])
df

| | score1 | score2 | admission |
|---|---|---|---|
| 0 | 34.623660 | 78.024693 | 0 |
| 1 | 30.286711 | 43.894998 | 0 |
| 2 | 35.847409 | 72.902198 | 0 |
| 3 | 60.182599 | 86.308552 | 1 |
| 4 | 79.032736 | 75.344376 | 1 |
| ... | ... | ... | ... |
| 95 | 83.489163 | 48.380286 | 1 |
| 96 | 42.261701 | 87.103851 | 1 |
| 97 | 99.315009 | 68.775409 | 1 |
| 98 | 55.340018 | 64.931938 | 1 |


## Create Training and Testing datasets


In [35]:
X = df[['score1','score2']]
y = df[['admission']]


from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size= 0.25, random_state=20)


## Find the average accuracy of the Logistic Regression model (with C=0.00001) using K-Fold Cross Validation



In [36]:
from sklearn.linear_model import LogisticRegression

# Instantiate the logistic regression model with C = 0.00001
# C is the inverse of regularization strength: smaller C -> stronger regularization
logistic_model = LogisticRegression(C=0.00001)

# Set up K-Fold cross-validation with 4 shuffled folds
from sklearn.model_selection import cross_val_score, KFold
kf = KFold(n_splits=4, random_state=10, shuffle=True)

# Obtain the K-Fold CV accuracy for each fold
cv_results = cross_val_score(logistic_model, X_train, y_train.values.ravel(), cv=kf, scoring='accuracy')
print("Cross validation Accuracy scores for each fold = ", cv_results)
print("Average CV Accuracy of Logistic Regression model = ", np.mean(cv_results))

Cross validation Accuracy scores for each fold =  [0.73684211 0.63157895 0.47368421 0.66666667]
Average CV Accuracy of Logistic Regression model =  0.6271929824561403
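To make explicit what `cross_val_score` does in the cell above, here is a minimal sketch that reproduces its per-fold accuracies with a manual loop over `KFold` splits. It runs on synthetic data from `make_classification`, which is an assumption standing in for `studentexamdata.txt`; the names `X_demo`, `y_demo`, `kf_demo`, `manual_scores` and `helper_scores` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic two-feature data (an assumption, not the notebook's exam scores)
X_demo, y_demo = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

model = LogisticRegression(C=0.00001)
kf_demo = KFold(n_splits=4, shuffle=True, random_state=10)

# Manual loop: fit on three folds, score accuracy on the remaining fold
manual_scores = []
for train_idx, val_idx in kf_demo.split(X_demo):
    model.fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(model.score(X_demo[val_idx], y_demo[val_idx]))

# cross_val_score performs the same fit/score cycle on a clone of the model
helper_scores = cross_val_score(model, X_demo, y_demo, cv=kf_demo, scoring="accuracy")

print("Manual fold accuracies:", np.round(manual_scores, 4))
print("cross_val_score result:", np.round(helper_scores, 4))
print("Mean accuracy:", np.mean(helper_scores))
```

Because `kf_demo` uses a fixed `random_state`, both approaches see identical splits, so the two arrays should match.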


# PART A: Implement GridSearchCV to find the best value of hyperparameter C for Logistic Regression

1. Import GridSearchCV. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV
2. Set up a parameter grid for "C", using np.linspace() to create 20 evenly spaced values ranging from 0.00001 to 10.
3. Call GridSearchCV(), passing logistic_model, the parameter grid, and setting cv equal to kf.
4. Fit the grid search object to the training data to perform a cross-validated grid search.
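Before running the search, it can help to look at the grid that step 2 produces. The sketch below prints the first few values from `np.linspace()`. Only the first value lies near 0.00001; every other candidate is roughly 0.53 or larger, so strong regularization is sampled just once. A log-spaced grid from `np.logspace()` is shown for comparison; it is an alternative that is not part of the assignment, and the names `c_linear` and `c_log` are illustrative.

```python
import numpy as np

# The grid from step 2: 20 evenly spaced values between 0.00001 and 10
c_linear = np.linspace(0.00001, 10, 20)
print("Linear grid, first three values:", c_linear[:3])

# Alternative (not required by the assignment): 20 values evenly spaced
# in log10, covering the same range from 1e-5 to 1e1
c_log = np.logspace(-5, 1, 20)
print("Log grid, first three values:   ", c_log[:3])
```

Since C acts multiplicatively on the regularization term, a log-spaced grid usually covers the small-C region more thoroughly, at the cost of fewer candidates near 10.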





In [37]:
# Import GridSearchCV
from sklearn.model_selection import GridSearchCV

# Set up the parameter grid, C is inverse of regularization strength, small C -> more regularization
param_grid = {"C": np.linspace(0.00001, 10, 20)}

# Instantiate logistic_cv
logistic_cv = GridSearchCV(logistic_model, param_grid, cv=kf)

# Fit to the training data
logistic_cv.fit(X_train,y_train.values.ravel())
print("Tuned logistic regression parameters: {}".format(logistic_cv.best_params_))
print("Tuned logistic regression score: {}".format(logistic_cv.best_score_))

Tuned logistic regression parameters: {'C': 0.5263252631578947}
Tuned logistic regression score: 0.8135964912280702


After running GridSearchCV, the best mean cross-validated accuracy is **0.8135964912280702**, obtained with **C = 0.5263252631578947**. This is a clear improvement over the average accuracy of about 0.627 obtained with C = 0.00001.
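The score reported by `best_score_` is a mean over the validation folds of the training set; the held-out test split created earlier is never used in this notebook. A minimal sketch of how a tuned model could be checked against a test split follows. It uses synthetic data from `make_classification` as a stand-in for the exam scores, and the names `X_tr`, `X_te`, `y_tr`, `y_te`, `c_grid`, `search` and `test_accuracy` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic two-feature data (an assumption, not the notebook's exam scores)
X_demo, y_demo = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=20
)

c_grid = np.linspace(0.00001, 10, 20)
search = GridSearchCV(
    LogisticRegression(),
    {"C": c_grid},
    cv=KFold(n_splits=4, shuffle=True, random_state=10),
    scoring="accuracy",
)
search.fit(X_tr, y_tr)

# With the default refit=True, best_estimator_ is retrained on all of X_tr
# using the best C, so it can be scored directly on the unseen test split
test_accuracy = search.best_estimator_.score(X_te, y_te)

print("Best C:", search.best_params_["C"])
print("Mean CV accuracy:", search.best_score_)
print("Held-out test accuracy:", test_accuracy)
```

In the notebook itself, the equivalent call would be `logistic_cv.best_estimator_.score(X_test, y_test.values.ravel())`.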

# PART B (OPTIONAL): Implement GridSearchCV to find the best combination of hyperparameters 'solver' and 'C' for Logistic Regression

1. Set up a parameter grid with "solver" = ["newton-cg", "lbfgs", "liblinear"] and "C" = 20 evenly spaced values between 0.00001 and 10.
2. Call GridSearchCV(), passing logistic_model, the parameter grid, and setting cv equal to kf.
3. Fit the grid search object to the training data to perform a cross-validated grid search.





In [32]:
# Import GridSearchCV
from sklearn.model_selection import GridSearchCV

# Set up the parameter grid for two hyperparameters C and solver
params_grid = {
    "C":np.linspace(0.00001, 10, 20),
    "solver":["newton-cg", "lbfgs", "liblinear"]
}
# Instantiate logistic_cv
logistic_cv = GridSearchCV(estimator = logistic_model,
                           param_grid=params_grid, cv=kf)

# Fit to the training data
logistic_cv.fit(X_train,y_train.values.ravel())
print("Tuned logistic regression parameters: {}".format(logistic_cv.best_params_))
print("Tuned logistic regression score: {}".format(logistic_cv.best_score_))

Tuned logistic regression parameters: {'C': 8.94736947368421, 'solver': 'liblinear'}
Tuned logistic regression score: 0.8676900584795322


After running GridSearchCV for the LogisticRegression model over both the inverse regularization strength (C) and the solver, the best mean cross-validated accuracy is **0.8676900584795322**, obtained with **C = 8.94736947368421** and **solver = liblinear**.
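`best_params_` reports only the winning combination. To see how each solver performed across the grid, the fitted search object's `cv_results_` attribute can be loaded into a DataFrame. The sketch below does this on synthetic data from `make_classification`, an assumption standing in for the exam scores; the names `results`, `top5` and `best_per_solver` are illustrative. One possible reason for solver differences is that `liblinear` also regularizes the intercept, unlike `newton-cg` and `lbfgs`, which can matter on unscaled features.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic two-feature data (an assumption, not the notebook's exam scores)
X_demo, y_demo = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

grid = {
    "C": np.linspace(0.00001, 10, 20),
    "solver": ["newton-cg", "lbfgs", "liblinear"],
}
search = GridSearchCV(
    LogisticRegression(),
    grid,
    cv=KFold(n_splits=4, shuffle=True, random_state=10),
)
search.fit(X_demo, y_demo)

# One row per (C, solver) combination: 20 * 3 = 60 rows
results = pd.DataFrame(search.cv_results_)
results["param_solver"] = results["param_solver"].astype(str)

# Five best combinations by mean validation accuracy
top5 = results.sort_values("rank_test_score")[
    ["param_C", "param_solver", "mean_test_score", "std_test_score"]
].head(5)
print(top5)

# Best mean validation accuracy reached by each solver
best_per_solver = results.groupby("param_solver")["mean_test_score"].max()
print(best_per_solver)
```

Comparing `std_test_score` alongside the mean is worthwhile here: with four folds of about 19 samples each, differences of a few percentage points between combinations may fall within fold-to-fold noise.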

Name: - Anirudhha Sankhe

Registration No.: - M1910073

Class: - B.tech Mechanical

