**Grid Search**

Most machine learning models have parameters that can be adjusted to change how the model learns. For instance, the logistic regression model from sklearn has a parameter C that controls regularization, which affects the complexity of the model. C is the inverse of the regularization strength, so smaller values regularize more strongly.
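
As a minimal sketch of where C lives, the snippet below reads the parameter from a freshly constructed model and then sets it explicitly. The variable names and the value 0.1 are illustrative choices, not part of the walkthrough that follows.

```python
from sklearn.linear_model import LogisticRegression

# Read the default value of C from a freshly constructed model.
default_model = LogisticRegression()
default_C = default_model.get_params()["C"]
print(default_C)

# C can also be passed explicitly to the constructor.
# 0.1 is an arbitrary illustrative value, not a recommendation.
strong_reg_model = LogisticRegression(C=0.1)
print(strong_reg_model.get_params()["C"])
```

`get_params()` returns every constructor parameter as a dictionary, which is a convenient way to see which settings could be included in a search.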

In [1]:
from sklearn import datasets
iris = datasets.load_iris()

In [2]:
X = iris['data']
y = iris['target']

In [3]:
from sklearn.linear_model import LogisticRegression

In [4]:
logit = LogisticRegression(max_iter = 10000)

In [5]:
print(logit.fit(X,y))

LogisticRegression(max_iter=10000)


In [6]:
print(logit.score(X,y))

0.9733333333333334


With the default setting of C = 1, we achieved a score of 0.973.

Let's see whether we can do better than 0.973 by implementing a grid search over different values of C.


Knowing which values to set for the searched parameters will take a combination of domain knowledge and practice. Since the default value for C is 1, we will set a range of values surrounding it.

In [7]:
C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

First we will create an empty list to store the scores.

In [8]:
scores = []

In [9]:
for choice in C:
    logit.set_params(C=choice)  # update C on the existing model
    logit.fit(X, y)  # refit on the full dataset
    scores.append(logit.score(X, y))  # record accuracy on the same data

In [10]:
print(scores)

[0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
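
A bare list of scores is hard to match back to the values of C that produced them. One possible way to make the comparison easier, sketched here with a hypothetical `results` dictionary and `best_C` variable, is to store each score under its C value:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris["data"], iris["target"]

C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
logit = LogisticRegression(max_iter=10000)

# Map each candidate C to the accuracy it achieved on the training data.
results = {}
for choice in C:
    logit.set_params(C=choice)
    logit.fit(X, y)
    results[choice] = logit.score(X, y)

for choice, score in results.items():
    print(f"C = {choice:<5} score = {score:.4f}")

# max() returns the first key with the highest score, so ties go to the
# smaller C because the candidates are listed in ascending order.
best_C = max(results, key=results.get)
print("Best C:", best_C)
```

Breaking ties toward the smaller C is a deliberate choice in this sketch: when two settings score the same, it picks the more strongly regularized model.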


We can see that the lowest values of C (0.25 and 0.5) performed worse than the default value of 1, while 0.75 matched it. As we increased C to 1.75, accuracy rose to about 0.987.

Increasing C beyond this amount does not seem to improve accuracy further: C = 2 produced the same score as C = 1.75.
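
The manual loop scores each model on the same data it was trained on. As an alternative sketch, sklearn's `GridSearchCV` can run the same search while scoring each candidate with cross-validation. The choice of `cv=5` and the name `search` are assumptions made for illustration, and the best C found this way may differ from the one suggested by the training scores.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris["data"], iris["target"]

# Same candidate values as the manual loop.
param_grid = {"C": [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]}

# cv=5 (5-fold cross-validation) is an assumed setting for this sketch.
search = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the C with the highest mean validation score
print(search.best_score_)   # that mean cross-validated accuracy
```

Because each candidate is evaluated on held-out folds, `best_score_` is usually a more realistic estimate of performance on unseen data than accuracy measured on the training set.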