### Finding the best parameters for a model using GridSearchCV ###

In this example we use the SVM classifier and its associated parameters. The same approach works for any other algorithm, for example a RandomForest searched over different numbers of estimators, maximum depths, etc.
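As a rough illustration of the RandomForest case mentioned above, the sketch below runs the same kind of search over `n_estimators` and `max_depth`. The dataset size, the grid values, and the names `X_rf`, `y_rf`, `rf_param_grid` and `rf_search` are assumptions made for this sketch, not part of the example that follows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset, used only to keep the sketch self-contained
X_rf, y_rf = make_classification(n_samples=300, n_features=20, random_state=42)

# Hypothetical grid: the values are illustrative, not tuned recommendations
rf_param_grid = {'n_estimators': [50, 100], 'max_depth': [3, None]}

rf_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=rf_param_grid,
    cv=3,
    scoring='accuracy',
)
rf_search.fit(X_rf, y_rf)

print("Best Parameters: ", rf_search.best_params_)
```

Only the estimator and the dictionary keys change; the search itself is set up exactly as for any other scikit-learn estimator.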

In [1]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# GridSearchCV is used for finding optimal parameters for an algorithm
from sklearn.model_selection import GridSearchCV

In [2]:
# Generate a binary classification dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# Split the dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [3]:
# Glimpse of the data

print(X[:1],'\n',y[:1])

[[-0.49693203 -0.33912228  0.22914552 -0.18256897 -1.04110251  0.42154608
  -1.01517921  0.76766584 -1.63381878 -0.45398114 -0.12373268  0.12313375
   0.977832    0.37006112  0.2668184   0.15330558 -0.32011852 -1.17927302
   0.45780561  0.35600629]] 
 [0]


In [4]:
# Let us say we build an SVM classification model and look for a good value of C by trial and error
# We start from the SVC defaults (C=1.0, kernel='rbf')

svc = SVC()

svc.fit(X_train,y_train)


# accuracy_score expects (y_true, y_pred)
print(f"Test set accuracy score with default values of SVC Classifier: {accuracy_score(y_test, svc.predict(X_test))*100}%")

Test set accuracy score with default values of SVC Classifier: 88.5%


In [5]:
# Here are some parameter values we choose to try
# GridSearchCV takes them as a dictionary (or a list of dictionaries)

param_grid = {'C': [0.1, 0.5, 1, 10, 100], 'kernel': ['rbf', 'poly', 'sigmoid']}
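To see how many candidates a grid like this produces, it can be expanded with scikit-learn's `ParameterGrid`. This is a small sketch; the name `candidates` is introduced here only for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 0.5, 1, 10, 100], 'kernel': ['rbf', 'poly', 'sigmoid']}

# Every combination of C and kernel becomes one candidate setting
candidates = list(ParameterGrid(param_grid))

print(len(candidates))  # 5 values of C x 3 kernels = 15
print(candidates[0])
```

With 15 candidates and 15 cross-validation folds, a search fits 15 x 15 = 225 models, so the cost grows quickly as values are added to the grid.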


In [6]:
# Set up the grid search (15-fold cross-validation, all CPU cores)
grid_search = GridSearchCV(estimator=svc, param_grid=param_grid, cv=15, scoring='accuracy', verbose=1, n_jobs=-1)

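# Fit the grid search to the data
# Note: X and y include the test rows, so the test-set accuracy reported later
# is not an independent estimate; fit on X_train, y_train to keep the test set unseen
grid_search.fit(X, y)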


Fitting 15 folds for each of 15 candidates, totalling 225 fits
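The search above is fitted on the full `X, y`. A variant that tunes on the training split only, so the test split stays unseen until the final evaluation, might look like the sketch below. The name `search_train_only` and the choice of `cv=5` are assumptions made to keep the sketch quick; its results will not necessarily match the outputs shown in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same synthetic data and split as in the cells above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {'C': [0.1, 0.5, 1, 10, 100], 'kernel': ['rbf', 'poly', 'sigmoid']}

# cv=5 is an assumption for speed; only the training rows are used for tuning
search_train_only = GridSearchCV(SVC(), param_grid=param_grid, cv=5, scoring='accuracy')
search_train_only.fit(X_train, y_train)

print("Best Parameters: ", search_train_only.best_params_)
# score() applies the refitted best estimator with the 'accuracy' scorer
print(f"Held-out accuracy: {search_train_only.score(X_test, y_test)*100:.2f}%")
```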


##### Once the grid search has been fitted, we can check the best parameters and the best score achieved with those parameters:

In [7]:
# printing the best parameters found by GridSearchCV
print("Best Parameters: ", grid_search.best_params_)

# printing the best score found by GridSearchCV
print(f"Best Score: {grid_search.best_score_*100:.2f}%")


Best Parameters:  {'C': 1, 'kernel': 'rbf'}
Best Score: 87.40%
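Beyond the single best setting, the fitted search keeps per-candidate scores in `cv_results_`. The sketch below lists the top-ranked candidates with their mean and standard deviation across folds. It uses a small stand-in dataset and grid (`X_demo`, `y_demo`, `demo_search` are hypothetical names) so it runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-in data and a reduced grid, chosen only to keep the sketch fast
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=42)
demo_search = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'kernel': ['rbf', 'poly']},
                           cv=3, scoring='accuracy')
demo_search.fit(X_demo, y_demo)

results = demo_search.cv_results_
# rank_test_score is 1 for the best candidate, so sorting by it gives a leaderboard
order = np.argsort(results['rank_test_score'])

for i in order[:3]:
    print(results['params'][i],
          f"mean={results['mean_test_score'][i]:.3f}",
          f"std={results['std_test_score'][i]:.3f}")
```

Looking at the runners-up helps judge whether the winning setting is clearly better or within fold-to-fold noise of its neighbours.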


In [8]:
# Evaluate model on test set

# predict the target on the test dataset
predict_test = grid_search.predict(X_test)

# Accuracy Score on test dataset
print(f"accuracy_score on test dataset : {accuracy_score(y_test,predict_test)*100}%")


accuracy_score on test dataset : 90.0%
