## **Hyperparameter Tuning**

Hyperparameter tuning is the process of choosing the set of
hyperparameters that gives the best performance for a machine learning
model. Hyperparameters are set before training rather than learned from
the data. This process is also called hyperparameter optimization.

* Hyperparameter Tuning Types:

      GridSearchCV
      RandomizedSearchCV
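
To make the difference between the two methods concrete, the sketch below enumerates the candidate settings each one would try for the kernel/C search space used later in this notebook. It uses only the standard library rather than scikit-learn; the names `param_grid`, `grid_candidates` and `random_candidates` are illustrative, and the sample size of 10 mirrors the default `n_iter` of RandomizedSearchCV.

```python
import itertools
import random

# The same search space used later in this notebook.
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [1, 5, 10, 20],
}

# Grid search: every combination of every listed value.
names = list(param_grid)
grid_candidates = [
    dict(zip(names, values))
    for values in itertools.product(*(param_grid[name] for name in names))
]

# Random search: a fixed-size sample drawn without replacement from the grid.
rng = random.Random(0)  # seeded only so the sketch is reproducible
random_candidates = rng.sample(grid_candidates, k=10)

print(len(grid_candidates))    # 16 combinations
print(len(random_candidates))  # 10 sampled combinations
```

With 5-fold cross-validation, the full grid costs 16 x 5 = 80 model fits, while the random sample costs 10 x 5 = 50.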

In [1]:
# importing the dependencies
import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

We will work with the breast cancer dataset that ships with scikit-learn.

In [3]:
# loading the data from sklearn
breast_cancer_dataset = sklearn.datasets.load_breast_cancer()

In [4]:
print(breast_cancer_dataset)

{'data': array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]]), 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
 

In [5]:
# load the data to DataFrame
data_frame = pd.DataFrame(breast_cancer_dataset.data, columns=breast_cancer_dataset.feature_names)

In [14]:
# adding the 'target' column to data frame
data_frame['label']=breast_cancer_dataset.target
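
As an alternative to building the DataFrame by hand, recent scikit-learn versions can return it directly. This is a sketch that assumes scikit-learn 0.23 or later with pandas installed; note that the label column is then named `target` rather than `label`, and `dataset` and `frame` are illustrative names.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns pandas objects instead of NumPy arrays.
dataset = load_breast_cancer(as_frame=True)

# .frame holds the 30 feature columns plus a 'target' column.
frame = dataset.frame

print(frame.shape)
print(frame["target"].value_counts())
```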

In [15]:
#print first five rows of dataframe
data_frame.head()

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,label
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


In [16]:
# number of rows and Columns in this dataset
data_frame.shape

(569, 31)

In [17]:
# checking for missing values
data_frame.isnull().sum()

mean radius                0
mean texture               0
mean perimeter             0
mean area                  0
mean smoothness            0
mean compactness           0
mean concavity             0
mean concave points        0
mean symmetry              0
mean fractal dimension     0
radius error               0
texture error              0
perimeter error            0
area error                 0
smoothness error           0
compactness error          0
concavity error            0
concave points error       0
symmetry error             0
fractal dimension error    0
worst radius               0
worst texture              0
worst perimeter            0
worst area                 0
worst smoothness           0
worst compactness          0
worst concavity            0
worst concave points       0
worst symmetry             0
worst fractal dimension    0
label                      0
dtype: int64

In [19]:
# checking the distribution of the target variable
data_frame['label'].value_counts()

1    357
0    212
Name: label, dtype: int64

1 -> Benign

0 -> Malignant
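
The class counts also give a useful reference point for judging accuracy scores. The sketch below computes the accuracy of a model that always predicts the majority class; the counts are copied from the output above, and the variable names are illustrative.

```python
# Class counts taken from the value_counts() output above.
counts = {1: 357, 0: 212}  # 1 = benign, 0 = malignant

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy obtained by always predicting the majority class.
baseline_accuracy = counts[majority_class] / total

print(total)                        # 569
print(majority_class)               # 1
print(round(baseline_accuracy, 3))  # 0.627
```

Any tuned model should score clearly above this 62.7% baseline to be considered useful.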

Separating the features and target

In [20]:
X = data_frame.drop(columns='label')
Y = data_frame['label']

In [21]:
print(X)

     mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0          17.99         10.38          122.80     1001.0          0.11840   
1          20.57         17.77          132.90     1326.0          0.08474   
2          19.69         21.25          130.00     1203.0          0.10960   
3          11.42         20.38           77.58      386.1          0.14250   
4          20.29         14.34          135.10     1297.0          0.10030   
..           ...           ...             ...        ...              ...   
564        21.56         22.39          142.00     1479.0          0.11100   
565        20.13         28.25          131.20     1261.0          0.09780   
566        16.60         28.08          108.30      858.1          0.08455   
567        20.60         29.33          140.10     1265.0          0.11780   
568         7.76         24.54           47.92      181.0          0.05263   

     mean compactness  mean concavity  mean concave points  mea

In [22]:
print(Y)

0      0
1      0
2      0
3      0
4      0
      ..
564    0
565    0
566    0
567    0
568    1
Name: label, Length: 569, dtype: int64


In [23]:
# converting into numpy array
X = np.asarray(X)
Y = np.asarray(Y)

## **GridSearchCV**

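GridSearchCV exhaustively evaluates every combination of the hyperparameter values we list, scores each one with cross-validation, and reports the combination with the best mean score.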
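
To show what the search does internally, here is a rough manual equivalent written with `cross_val_score`. It is a sketch, not scikit-learn's actual implementation: it uses a deliberately reduced grid (no linear kernel) so that it runs quickly, and the names `X_demo`, `y_demo`, `small_grid` and `manual_scores` are illustrative.

```python
from itertools import product

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_demo, y_demo = load_breast_cancer(return_X_y=True)

# A reduced grid, chosen only to keep the sketch fast.
small_grid = {"kernel": ["poly", "rbf"], "C": [1, 10]}

manual_scores = {}
for kernel, C in product(small_grid["kernel"], small_grid["C"]):
    # 5-fold cross-validation; the default score for a classifier is accuracy.
    fold_scores = cross_val_score(SVC(kernel=kernel, C=C), X_demo, y_demo, cv=5)
    manual_scores[(kernel, C)] = fold_scores.mean()

# The winning combination is the one with the highest mean fold accuracy.
best_combo = max(manual_scores, key=manual_scores.get)
print(best_combo, manual_scores[best_combo])
```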

In [24]:
# loading the SVC model
model = SVC()

In [25]:
# hyperparameters

parameters = {
              'kernel':['linear','poly','rbf','sigmoid'],
              'C':[1,5,10,20]
}

In [26]:
# grid search
classifier = GridSearchCV(model, parameters,cv=5)

In [27]:
# fitting the data to our model
classifier.fit(X,Y)

In [28]:
classifier.cv_results_

{'mean_fit_time': array([1.62741342e+00, 4.11114693e-03, 4.82563972e-03, 1.52976036e-02,
        5.28752327e+00, 7.32231140e-03, 1.59525871e-02, 4.91553783e-02,
        4.61100979e+00, 6.86750412e-03, 6.41126633e-03, 2.39131451e-02,
        7.54003620e+00, 7.97333717e-03, 7.70645142e-03, 2.49055862e-02]),
 'std_fit_time': array([6.50543024e-01, 1.05378488e-04, 6.40418044e-05, 3.74805818e-04,
        1.95551818e+00, 5.78401095e-04, 9.30384611e-03, 9.77383314e-03,
        8.69109813e-01, 1.80674260e-04, 1.21342493e-04, 1.90640124e-03,
        2.39671332e+00, 4.29050370e-04, 1.27021725e-03, 2.56143383e-03]),
 'mean_score_time': array([0.00142589, 0.00124598, 0.00183554, 0.00403132, 0.00273929,
        0.00232105, 0.00450072, 0.01226754, 0.00131378, 0.00187106,
        0.00270739, 0.00594907, 0.00123806, 0.00205755, 0.00305076,
        0.00590973]),
 'std_score_time': array([2.42564892e-04, 5.00454171e-05, 5.66875758e-05, 4.61297994e-05,
        2.98676152e-03, 1.34392761e-04, 2.64724196e-

In [29]:
# best parameters

best_parameters = classifier.best_params_
print(best_parameters)

{'C': 10, 'kernel': 'linear'}


In [31]:
# highest mean cross-validation accuracy
highest_accuracy = classifier.best_score_
print(highest_accuracy)

0.9525694767893185
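
Note that `best_score_` is a cross-validation average computed on the same data used to pick the hyperparameters, so it can be optimistic. One common way to check this, sketched below under the assumption of an 80/20 stratified split and a reduced grid for speed, is to tune on a training portion and score the refitted best model on held-out data. The split sizes, `random_state` and variable names are choices made for this example, not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, stratify=y_all, random_state=0
)

# Tune on the training portion only (reduced grid to keep this quick).
search = GridSearchCV(SVC(), {"kernel": ["poly", "rbf"], "C": [1, 10]}, cv=5)
search.fit(X_train, y_train)

cv_accuracy = search.best_score_              # mean accuracy over the 5 folds
test_accuracy = search.score(X_test, y_test)  # accuracy of the refitted best model

print(search.best_params_, cv_accuracy, test_accuracy)
```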


In [32]:
# loading the results through pandas dataframe
result = pd.DataFrame(classifier.cv_results_)

In [33]:
result.head()

,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_C,param_kernel,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,1.627413,0.650543,0.001426,0.000243,1,linear,"{'C': 1, 'kernel': 'linear'}",0.947368,0.929825,0.973684,0.921053,0.955752,0.945536,0.018689,4
1,0.004111,0.000105,0.001246,5e-05,1,poly,"{'C': 1, 'kernel': 'poly'}",0.842105,0.885965,0.929825,0.947368,0.938053,0.908663,0.039382,12
2,0.004826,6.4e-05,0.001836,5.7e-05,1,rbf,"{'C': 1, 'kernel': 'rbf'}",0.850877,0.894737,0.929825,0.947368,0.938053,0.912172,0.035444,11
3,0.015298,0.000375,0.004031,4.6e-05,1,sigmoid,"{'C': 1, 'kernel': 'sigmoid'}",0.54386,0.45614,0.464912,0.385965,0.451327,0.460441,0.050253,13
4,5.287523,1.955518,0.002739,0.002987,5,linear,"{'C': 5, 'kernel': 'linear'}",0.947368,0.938596,0.973684,0.929825,0.964602,0.950815,0.016216,2


In [34]:
grid_search_result = result[['param_C','param_kernel','mean_test_score']]

In [35]:
print(grid_search_result)

   param_C param_kernel  mean_test_score
0        1       linear         0.945536
1        1         poly         0.908663
2        1          rbf         0.912172
3        1      sigmoid         0.460441
4        5       linear         0.950815
5        5         poly         0.922729
6        5          rbf         0.931501
7        5      sigmoid         0.411178
8       10       linear         0.952569
9       10         poly         0.920975
10      10          rbf         0.922714
11      10      sigmoid         0.402391
12      20       linear         0.949061
13      20         poly         0.919221
14      20          rbf         0.920944
15      20      sigmoid         0.398867
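
Sorting this table makes the ranking easier to read. The sketch below rebuilds a few rows by hand (values copied from the output above) so that it runs on its own; in the notebook the same calls could be applied to `grid_search_result` directly.

```python
import pandas as pd

# A few rows copied from the grid search table above.
sample_result = pd.DataFrame({
    "param_C": [1, 1, 5, 10, 20],
    "param_kernel": ["linear", "sigmoid", "linear", "linear", "linear"],
    "mean_test_score": [0.945536, 0.460441, 0.950815, 0.952569, 0.949061],
})

# Best combinations first.
ranked = sample_result.sort_values(
    "mean_test_score", ascending=False
).reset_index(drop=True)

# Best score reached by each kernel across all values of C.
best_by_kernel = sample_result.groupby("param_kernel")["mean_test_score"].max()

print(ranked)
print(best_by_kernel)
```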


Highest Accuracy = 95.3% (mean 5-fold cross-validation accuracy)

Best Parameters = {'C': 10, 'kernel': 'linear'}

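## **RandomizedSearchCV**

RandomizedSearchCV scores only a fixed number of randomly sampled combinations (n_iter, 10 by default) instead of every combination in the grid.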

In [36]:
# loading the SVC model
model = SVC()

In [37]:
# hyperparameters

parameters = {
              'kernel':['linear','poly','rbf','sigmoid'],
              'C':[1,5,10,20]
}

In [38]:
# randomized search (samples n_iter=10 combinations by default)
classifier = RandomizedSearchCV(model, parameters, cv=5)
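
Randomized search is most useful when a hyperparameter is sampled from a continuous range rather than a short list. The sketch below assumes SciPy is available and draws `C` from a log-uniform distribution between 1 and 20; the range, `n_iter=5`, `random_state=0` and the reduced kernel list are choices made for this example only.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X_demo, y_demo = load_breast_cancer(return_X_y=True)

# Lists are sampled uniformly; distributions are sampled via their rvs() method.
param_distributions = {
    "kernel": ["poly", "rbf"],
    "C": loguniform(1, 20),
}

random_search = RandomizedSearchCV(
    SVC(),
    param_distributions,
    n_iter=5,        # number of sampled settings to evaluate
    cv=5,
    random_state=0,  # makes the sampled settings reproducible
)
random_search.fit(X_demo, y_demo)

print(random_search.best_params_)
print(len(random_search.cv_results_["params"]))  # 5 sampled settings
```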

In [39]:
# fitting the data to our model
classifier.fit(X,Y)

In [40]:
classifier.cv_results_

{'mean_fit_time': array([2.78182030e-02, 2.44094849e-02, 5.03896947e+00, 4.58340645e-03,
        4.66151237e-03, 8.04909263e+00, 2.05189571e+00, 1.43458366e-02,
        4.45446968e-03, 5.18188477e-03]),
 'std_fit_time': array([7.40610189e-03, 1.42590305e-03, 1.42515353e+00, 5.37822481e-04,
        1.42971446e-04, 2.95376742e+00, 1.31468696e+00, 7.14723888e-04,
        5.26734795e-05, 1.17462544e-03]),
 'mean_score_time': array([0.00906668, 0.00591416, 0.00112896, 0.00149689, 0.00178046,
        0.00117831, 0.00134168, 0.00368762, 0.00137448, 0.00157704]),
 'std_score_time': array([5.28850623e-03, 5.81842640e-04, 3.64508871e-05, 5.43039021e-05,
        6.60765210e-05, 1.39470133e-04, 3.19848689e-04, 6.15642754e-05,
        2.47925656e-04, 3.85171324e-04]),
 'param_kernel': masked_array(data=['sigmoid', 'sigmoid', 'linear', 'rbf', 'rbf', 'linear',
                    'linear', 'sigmoid', 'poly', 'poly'],
              mask=[False, False, False, False, False, False, False, False,
        

In [41]:
# best parameters

best_parameters = classifier.best_params_
print(best_parameters)

{'kernel': 'linear', 'C': 10}


In [42]:
# highest mean cross-validation accuracy
highest_accuracy = classifier.best_score_
print(highest_accuracy)

0.9525694767893185


In [43]:
# loading the results through pandas dataframe
result = pd.DataFrame(classifier.cv_results_)

In [44]:
result.head()

,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_kernel,param_C,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.027818,0.007406,0.009067,0.005289,sigmoid,1,"{'kernel': 'sigmoid', 'C': 1}",0.54386,0.45614,0.464912,0.385965,0.451327,0.460441,0.050253,8
1,0.024409,0.001426,0.005914,0.000582,sigmoid,10,"{'kernel': 'sigmoid', 'C': 10}",0.482456,0.403509,0.421053,0.342105,0.362832,0.402391,0.048906,9
2,5.038969,1.425154,0.001129,3.6e-05,linear,10,"{'kernel': 'linear', 'C': 10}",0.938596,0.938596,0.973684,0.947368,0.964602,0.952569,0.0142,1
3,0.004583,0.000538,0.001497,5.4e-05,rbf,20,"{'kernel': 'rbf', 'C': 20}",0.877193,0.921053,0.921053,0.947368,0.938053,0.920944,0.024105,6
4,0.004662,0.000143,0.00178,6.6e-05,rbf,1,"{'kernel': 'rbf', 'C': 1}",0.850877,0.894737,0.929825,0.947368,0.938053,0.912172,0.035444,7


In [45]:
randomized_search_result = result[['param_C','param_kernel','mean_test_score']]

In [46]:
print(randomized_search_result)

  param_C param_kernel  mean_test_score
0       1      sigmoid         0.460441
1      10      sigmoid         0.402391
2      10       linear         0.952569
3      20          rbf         0.920944
4       1          rbf         0.912172
5      20       linear         0.949061
6       1       linear         0.945536
7      20      sigmoid         0.398867
8      10         poly         0.920975
9       5         poly         0.922729


Highest Accuracy = 95.3% (mean 5-fold cross-validation accuracy)

Best Parameters = {'C': 10, 'kernel': 'linear'}

        Hasrat Ali
        Thank You:)