### Grid search is a hyperparameter tuning technique: it evaluates every combination of values in a specified grid to find the best-performing settings for a given model.

### This matters because a model's performance depends heavily on the hyperparameter values chosen.

### Often a small subset of these parameters has a large impact on the predictive or computational performance of the model, while the others can be left at their default values. It is recommended to read the docstring of the estimator class to get a finer understanding of each parameter's expected behavior, possibly by following the literature references it cites.
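
### As a minimal sketch of that advice, the snippet below lists the default hyperparameters of LogisticRegression with get_params() and prints the first line of its docstring. The variable names are illustrative and are not part of the notebook's own cells.

```python
from sklearn.linear_model import LogisticRegression

# A fresh estimator holds only default hyperparameter values
estimator = LogisticRegression()
defaults = estimator.get_params()

# Print each hyperparameter with its default, sorted by name
for name in sorted(defaults):
    print(f"{name} = {defaults[name]!r}")

# The class docstring documents what each parameter does
doc_lines = (LogisticRegression.__doc__ or "").strip().splitlines()
print(doc_lines[0])
```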

### In the first cell, basic DataFrame operations are performed and the dataset is split into independent (feature) and target variables.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression

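df = pd.read_csv('datasets/ChurnData.csv')

# Keep the feature columns plus the target; .copy() avoids a SettingWithCopyWarning below
df = df[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip', 'callcard', 'wireless', 'churn']].copy()

# Cast the target to integer class labels
df['churn'] = df['churn'].astype('int')

df.head()

# Target vector and feature matrix (every column except the last)
y = df['churn'].values
x = df[df.columns[:-1]].values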

### Standardizing the features with StandardScaler

In [2]:
scaler=preprocessing.StandardScaler().fit(x)

In [3]:
x_new=scaler.transform(x)

### Splitting into training and testing sets

In [4]:
x_train,x_test,y_train,y_test=train_test_split(x_new,y,random_state=1,test_size=0.2)
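
### The cells above fit the scaler on the full dataset before splitting, so the test rows influence the scaling statistics. A common alternative, sketched here on synthetic data from make_classification (an assumption, since ChurnData.csv is not bundled with this text), is to split first and wrap the scaler and classifier in a Pipeline so the scaler is fitted on the training rows only. All names in the snippet are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data: 200 rows, 9 features (assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=9, random_state=1)

# Split first, so the test rows stay unseen during preprocessing
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(Xd_train, yd_train)

# The scaler's statistics come from the training rows alone
train_means = pipe.named_steps["scaler"].mean_
test_accuracy = pipe.score(Xd_test, yd_test)
print(test_accuracy)
```

### The same Pipeline object can be passed as the estimator to GridSearchCV, with grid keys prefixed by the step name, for example 'clf__C'.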

### Setting up the grid search with GridSearchCV

In [15]:
from sklearn.model_selection import GridSearchCV
gsc = GridSearchCV(
        estimator=LogisticRegression(class_weight='balanced'),
        param_grid={
            'C':[0.1,1.0,10.0,100.0,1000.0],
            'solver':['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
            'max_iter':[100,1000,10000]
        },
        # Score candidates by classification accuracy under 5-fold cross-validation
        cv=5, scoring='accuracy', verbose=0, n_jobs=-1)
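
### To gauge the cost of this search before running it, the grid can be enumerated with ParameterGrid. The sketch below copies the grid from the cell above into a variable named param_grid (a name introduced here for illustration) and counts the candidates and the cross-validation fits.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the one passed to GridSearchCV
param_grid = {
    'C': [0.1, 1.0, 10.0, 100.0, 1000.0],
    'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
    'max_iter': [100, 1000, 10000],
}

n_candidates = len(ParameterGrid(param_grid))  # 5 * 5 * 3 combinations
n_fits = n_candidates * 5                      # one fit per candidate per fold (cv=5)
print(n_candidates, n_fits)  # prints: 75 375
```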

### Using the best parameters found by GridSearchCV to create the best model

In [16]:
grid_result = gsc.fit(x_train, y_train)
best_params = grid_result.best_params_
best_classifier = LogisticRegression(
    C=best_params["C"],
    solver=best_params["solver"],
    max_iter=best_params["max_iter"])

In [36]:
best_params

{'C': 0.1, 'max_iter': 100, 'solver': 'liblinear'}
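
### GridSearchCV refits the winning configuration on the whole training set by default (refit=True) and exposes it as best_estimator_. That refitted model keeps every setting of the original estimator, including class_weight='balanced', which the manually rebuilt classifier above does not pass. A minimal sketch on synthetic data (an assumption; the names with a _demo suffix are illustrative) with a reduced grid:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the churn data (assumption)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=9, weights=[0.75, 0.25], random_state=1)

gsc_demo = GridSearchCV(
    estimator=LogisticRegression(class_weight='balanced'),
    param_grid={'C': [0.1, 1.0, 10.0], 'solver': ['lbfgs', 'liblinear']},
    cv=5, scoring='accuracy')
gsc_demo.fit(X_demo, y_demo)

# best_estimator_ is already fitted and keeps class_weight='balanced'
best_demo = gsc_demo.best_estimator_
print(gsc_demo.best_params_)
print(best_demo.get_params()['class_weight'])  # prints: balanced
predictions_demo = best_demo.predict(X_demo[:5])
```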

In [18]:
best_classifier.fit(x_train,y_train)

LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

In [22]:
predicted=best_classifier.predict(x_test)

### Evaluating the model

In [23]:
from sklearn import metrics

In [24]:
report = metrics.classification_report(y_test, predicted)

In [26]:
print(report)

              precision    recall  f1-score   support

           0       0.90      0.90      0.90        31
           1       0.67      0.67      0.67         9

    accuracy                           0.85        40
   macro avg       0.78      0.78      0.78        40
weighted avg       0.85      0.85      0.85        40
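
### To see where these figures come from, the sketch below builds label vectors whose confusion matrix is consistent with the report (28 and 6 correct predictions, 3 errors in each class). The notebook does not show the actual confusion matrix, so these counts are an inference from the rounded report, and the names y_true_demo and y_pred_demo are illustrative.

```python
from sklearn.metrics import confusion_matrix

# 31 rows of class 0 followed by 9 rows of class 1, matching the support column
y_true_demo = [0] * 31 + [1] * 9
# Class 0: 28 predicted correctly, 3 wrongly flagged as churn
# Class 1: 3 missed, 6 predicted correctly
y_pred_demo = [0] * 28 + [1] * 3 + [0] * 3 + [1] * 6

cm = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = (int(v) for v in cm.ravel())

precision_1 = tp / (tp + fp)               # 6 / 9
recall_1 = tp / (tp + fn)                  # 6 / 9
accuracy_demo = (tp + tn) / len(y_true_demo)  # 34 / 40

print(cm)
print(round(precision_1, 2), round(recall_1, 2), round(accuracy_demo, 2))  # prints: 0.67 0.67 0.85
```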



In [27]:
accuracy=metrics.accuracy_score(y_test,predicted)

In [28]:
accuracy

0.85

In [37]:
# get_params() returns the current parameters of an estimator as a dict
best_classifier.get_params()