###  What is GridSearchCV?
GridSearchCV is a class in scikit-learn's model_selection module. It loops over a predefined grid of hyperparameter values, fits your estimator (model) on the training set for every combination using cross-validation, and records the scores. In the end, you can select the best-performing combination from the values you listed.

###  What is Grid Search?
Grid search is a tuning technique that looks for the best values of a model's hyperparameters. It is an exhaustive search over a specified set of parameter values for a model, which scikit-learn calls an estimator. Automating the search saves the time and effort of trying each combination by hand, although the cost grows with the number of combinations in the grid.
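To make the idea of an exhaustive search concrete, here is a minimal sketch of what a grid search does by hand: enumerate every combination, cross-validate each one, and keep the best. The synthetic data from make_classification and the names X_demo, y_demo, and param_grid are assumptions for illustration only; they are not the Titanic data used in the rest of this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the Titanic set used below)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# Two hyperparameters with two values each -> 2 x 2 = 4 combinations
param_grid = {'max_depth': [2, 4], 'min_samples_split': [2, 10]}

results = []
for params in ParameterGrid(param_grid):
    model = DecisionTreeClassifier(random_state=0, **params)
    # Mean accuracy over 3 cross-validation folds for this combination
    scores = cross_val_score(model, X_demo, y_demo, cv=3)
    results.append((scores.mean(), params))

# Keep the combination with the highest mean cross-validation score
best_score, best_params = max(results, key=lambda item: item[0])
print(len(results), 'combinations tried')
print('Best:', best_params, round(best_score, 3))
```

GridSearchCV wraps this loop and additionally stores per-fold scores, ranks, and timings in cv_results_.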


In [7]:
import numpy as np
import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import recall_score, precision_score, accuracy_score, r2_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

In [3]:
datasets = pd.read_csv('../../../DataSets/titanic_processed.csv')

In [4]:
datasets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 712 entries, 0 to 711
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Survived    712 non-null    int64  
 1   Pclass      712 non-null    int64  
 2   Sex         712 non-null    int64  
 3   Age         712 non-null    float64
 4   SibSp       712 non-null    int64  
 5   Parch       712 non-null    int64  
 6   Fare        712 non-null    float64
 7   Embarked_C  712 non-null    int64  
 8   Embarked_Q  712 non-null    int64  
 9   Embarked_S  712 non-null    int64  
dtypes: float64(2), int64(8)
memory usage: 55.8 KB


In [8]:
x = datasets.drop('Survived', axis = 1)
y = datasets['Survived']
# No random_state is set, so the split (and every score below) changes from run to run
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)

In [29]:
def summarize_classification(y_test, y_pred):
    """Print accuracy, R^2, recall, and precision for a set of predictions."""
    # Fraction of correct predictions, then the raw count of correct predictions
    accuracy = accuracy_score(y_test, y_pred, normalize=True)
    accuracy_number = accuracy_score(y_test, y_pred, normalize=False)
    # r2_score is a regression metric; on 0/1 class labels it is hard to interpret
    r_score = r2_score(y_test, y_pred)
    recall_scores = recall_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)

    print('Test Data:\t', len(y_test))
    print('Accuracy:\t', accuracy)
    print('r2_score:\t', r_score)
    print('recall score:\t', recall_scores)
    print('Precision Score:\t',precision)
    print('Accuracy Number:\t', accuracy_number)

In [23]:
parameters = {'max_depth':[2, 4, 5, 7, 9, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(),parameters, cv = 3, return_train_score=True)
grid_search.fit(x_train, y_train)

GridSearchCV(cv=3, estimator=DecisionTreeClassifier(),
             param_grid={'max_depth': [2, 4, 5, 7, 9, 10]},
             return_train_score=True)

In [24]:
grid_search.best_params_

{'max_depth': 4}

In [25]:
for i in range(6):
    print('Parameter:\t', grid_search.cv_results_['params'][i])
    print()
    print('Mean Test score is :\t',grid_search.cv_results_['mean_test_score'][i])
    print()
    print('Rank:\t', grid_search.cv_results_['rank_test_score'][i])

Parameter:	 {'max_depth': 2}

Mean Test score is :	 0.7907917943005662

Rank:	 3
Parameter:	 {'max_depth': 4}

Mean Test score is :	 0.7942727188341223

Rank:	 1
Parameter:	 {'max_depth': 5}

Mean Test score is :	 0.785500789009561

Rank:	 4
Parameter:	 {'max_depth': 7}

Mean Test score is :	 0.7908010767659891

Rank:	 2
Parameter:	 {'max_depth': 9}

Mean Test score is :	 0.7820012995451592

Rank:	 6
Parameter:	 {'max_depth': 10}

Mean Test score is :	 0.7837556855100715

Rank:	 5


In [26]:
decision_tree_model = DecisionTreeClassifier(max_depth=grid_search.best_params_['max_depth']).fit(x_train, y_train)

In [27]:
y_pred = decision_tree_model.predict(x_test)

In [30]:
summarize_classification(y_test, y_pred)

Test Data:	 143
Accuracy:	 0.7692307692307693
r2_score:	 0.018102372034956238
recall score:	 0.6666666666666666
Precision Score:	 0.7058823529411765
Accuracy Number:	 110
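As a side note, refitting a new tree with best_params_ is not strictly necessary. GridSearchCV defaults to refit=True, so after fit() it already holds a model retrained on the whole training set with the best parameters, available as best_estimator_. The sketch below shows this on synthetic data; X_demo, y_demo, search, and best_tree are hypothetical names, and the numbers it produces are unrelated to the Titanic results above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the Titanic set)
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {'max_depth': [2, 4, 5, 7, 9, 10]}, cv=3)
search.fit(X_tr, y_tr)

# With the default refit=True, the best model is already trained on all of X_tr
best_tree = search.best_estimator_
print('Depth of refitted tree:', best_tree.max_depth)

# search.predict delegates to best_estimator_
demo_pred = search.predict(X_te)
print('Predictions made:', len(demo_pred))
```

Either approach should give an equivalent model; using best_estimator_ simply avoids one extra fit.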


In [32]:
parameters = {'penalty':['l1','l2'],
             'C':[0.1, 0.4, 0.8, 1, 2, 5]}
grid_search = GridSearchCV(LogisticRegression(solver='liblinear'), parameters, cv = 3, return_train_score=True)
grid_search.fit(x_train, y_train)
grid_search.best_params_

{'C': 2, 'penalty': 'l1'}

In [33]:
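# Note: this grid has 12 combinations (2 penalties x 6 values of C); range(6) prints only the first six
for i in range(6):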
    print('Parameters:\t', grid_search.cv_results_['params'][i])
    print('Mean test score:\t', grid_search.cv_results_['mean_test_score'][i])
    print('Rank:\t',grid_search.cv_results_['rank_test_score'][i])

Parameters:	 {'C': 0.1, 'penalty': 'l1'}
Mean test score:	 0.7767010117887311
Rank:	 12
Parameters:	 {'C': 0.1, 'penalty': 'l2'}
Mean test score:	 0.7890374083356538
Rank:	 10
Parameters:	 {'C': 0.4, 'penalty': 'l1'}
Mean test score:	 0.7960456697298802
Rank:	 7
Parameters:	 {'C': 0.4, 'penalty': 'l2'}
Mean test score:	 0.7837649679754942
Rank:	 11
Parameters:	 {'C': 0.8, 'penalty': 'l1'}
Mean test score:	 0.7978186206256382
Rank:	 6
Parameters:	 {'C': 0.8, 'penalty': 'l2'}
Mean test score:	 0.7925461802654784
Rank:	 8
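The loop above prints only part of the grid, so the best-ranked combination does not appear in its output. One way to inspect every combination at once is to load cv_results_ into a pandas DataFrame and sort by rank. This is a sketch on synthetic data; X_demo, y_demo, demo_grid, demo_search, and results_table are hypothetical names, and the scores it prints are not the Titanic scores.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the Titanic set)
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

# Same grid shape as above: 2 penalties x 6 values of C = 12 combinations
demo_grid = {'penalty': ['l1', 'l2'], 'C': [0.1, 0.4, 0.8, 1, 2, 5]}
demo_search = GridSearchCV(LogisticRegression(solver='liblinear'), demo_grid, cv=3)
demo_search.fit(X_demo, y_demo)

# cv_results_ is a dict of equal-length arrays, so it converts directly to a table
results_table = (pd.DataFrame(demo_search.cv_results_)
                 [['params', 'mean_test_score', 'rank_test_score']]
                 .sort_values('rank_test_score'))
print(len(results_table), 'combinations')
print(results_table.head())
```

Sorting by rank_test_score puts the best combination in the first row, which should agree with best_params_ unless scores are tied.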


In [34]:
logistic_model = LogisticRegression(solver='liblinear',penalty = grid_search.best_params_['penalty'], C = grid_search.best_params_['C'])
final_model = logistic_model.fit(x_train, y_train)
y_pred = final_model.predict(x_test)

In [35]:
summarize_classification(y_test, y_pred)

Test Data:	 143
Accuracy:	 0.7972027972027972
r2_score:	 0.1371202663337494
recall score:	 0.7037037037037037
Precision Score:	 0.7450980392156863
Accuracy Number:	 114
