# **Hyperparameter Optimization**

Hello! If you've been hunting for a piece of code that clears up your confusion about Hyperparameter Optimization, I hope this notebook does the job. Welcome😉!

To keep things simple, I will use the [Kaggle Mushroom Classification](http://www.kaggle.com/uciml/mushroom-classification) dataset. Everything else is easy. Let's move.

In [3]:
import pandas as pd
df=pd.read_csv('/kaggle/input/mushroom-classification/mushrooms.csv')

Null values are always my headache. Are there any this time?

In [4]:
df.isna().sum()

class                       0
cap-shape                   0
cap-surface                 0
cap-color                   0
bruises                     0
odor                        0
gill-attachment             0
gill-spacing                0
gill-size                   0
gill-color                  0
stalk-shape                 0
stalk-root                  0
stalk-surface-above-ring    0
stalk-surface-below-ring    0
stalk-color-above-ring      0
stalk-color-below-ring      0
veil-type                   0
veil-color                  0
ring-number                 0
ring-type                   0
spore-print-color           0
population                  0
habitat                     0
dtype: int64

Cool. No null values. Let's waste no time!🏃‍♂️🏃‍♀️

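If you're wondering what the next cell does: it is just a way of converting string values into integers, because most ML algorithms don't like to play with strings (the same way I don't like coffee!). I will use LabelEncoder() to encode every column as integers. (scikit-learn intends LabelEncoder for the target column; OrdinalEncoder or OneHotEncoder is the usual choice for features, but this keeps things simple.)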

In [6]:
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
# Encode every column (features and target) in place; the encoder is re-fitted for each column
for col in df.columns:
    df[col]=le.fit_transform(df[col])
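To see what the encoding actually does, here is a tiny sketch on a made-up column (the letters and the names `toy_column` and `mapping` are my own illustration, not taken from the dataset). LabelEncoder sorts the unique labels and uses each label's position as its integer code.

```python
from sklearn.preprocessing import LabelEncoder

# Toy column of made-up letter codes, standing in for one mushroom feature
toy_column = ['x', 'b', 's', 'x', 'f']

encoder = LabelEncoder()
encoded = encoder.fit_transform(toy_column)

# classes_ holds the sorted unique labels; each label's position is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)
print(list(encoded))
```

Note that the integers carry an order ('b' < 'f' < 's' < 'x') that means nothing for mushrooms, which is one reason one-hot encoding is often preferred for features.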



We need to predict whether a mushroom is edible or not. For that, let's separate the data into x (the features) and y (the target).

In [8]:
x=df.drop('class',axis=1)
y=df['class']

Let's Split!

In [9]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=0,stratify=y)
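In case `stratify=y` looks mysterious: it asks the split to keep the class proportions the same in both parts. A minimal sketch on made-up labels (80 of one class, 20 of the other; all names here are hypothetical):

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 80 of class 0 and 20 of class 1
labels = [0] * 80 + [1] * 20
features = [[i] for i in range(100)]

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, random_state=0, stratify=labels
)

# Both parts should keep the original 80/20 class ratio
train_counts = Counter(l_train)
test_counts = Counter(l_test)
print(train_counts, test_counts)
```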


Okay, the major part starts here. Keep your eyes wide open!

We are going to use a Support Vector Machine as our predictor for now. (Keeping things simple!)

In [18]:
from sklearn.svm import SVC
svm=SVC(gamma='auto')#just initializing for ease here

Here GridSearchCV is the search strategy we will use for now: it tries every combination in the grid and scores each one with cross-validation. Later in this notebook, I will talk about RandomizedSearchCV and show you its implementation too!

In [19]:
from sklearn.model_selection import GridSearchCV
gsc=GridSearchCV(svm,{
    'kernel':['linear','rbf'],
    'C':[1,5,25,100],
},cv=4,return_train_score=False)

In [20]:
gsc.fit(x,y)

GridSearchCV(cv=4, estimator=SVC(gamma='auto'),
             param_grid={'C': [1, 5, 25, 100], 'kernel': ['linear', 'rbf']})

The best thing about GridSearchCV is that it gives me a pretty nice overview of the search through its attributes. Let's try some...

In [21]:
gsc.cv_results_

{'mean_fit_time': array([ 1.13900274,  0.38907087,  3.31196886,  0.38715851,  7.45822829,
         0.38309777, 26.3449676 ,  0.38580185]),
 'std_fit_time': array([6.21319553e-01, 2.07521385e-02, 2.65578384e+00, 1.06096026e-02,
        5.68425309e+00, 1.33488555e-02, 1.76637969e+01, 1.42218567e-02]),
 'mean_score_time': array([0.03430355, 0.07956892, 0.03127629, 0.0788998 , 0.03325409,
        0.07547158, 0.0335511 , 0.07767767]),
 'std_score_time': array([0.01087319, 0.00406143, 0.01074301, 0.00603785, 0.01211214,
        0.00245381, 0.01341908, 0.00245067]),
 'param_C': masked_array(data=[1, 1, 5, 5, 25, 25, 100, 100],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'param_kernel': masked_array(data=['linear', 'rbf', 'linear', 'rbf', 'linear', 'rbf',
                    'linear', 'rbf'],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtyp

It gave me almost every minute detail of the process. However, Mr. X, my manager, cannot read this easily. So, let's apply some DataFrame magic to it!


In [25]:
results_overview=pd.DataFrame(gsc.cv_results_)
results_overview

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | param_kernel | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.139003 | 0.62132 | 0.034304 | 0.010873 | 1 | linear | {'C': 1, 'kernel': 'linear'} | 0.747907 | 0.950763 | 0.944855 | 0.764156 | 0.85192 | 0.096083 | 7 |
| 1 | 0.389071 | 0.020752 | 0.079569 | 0.004061 | 1 | rbf | {'C': 1, 'kernel': 'rbf'} | 0.867061 | 0.998031 | 1.0 | 0.618907 | 0.871 | 0.155197 | 6 |
| 2 | 3.311969 | 2.655784 | 0.031276 | 0.010743 | 5 | linear | {'C': 5, 'kernel': 'linear'} | 0.729197 | 0.966519 | 0.944855 | 0.74643 | 0.84675 | 0.109375 | 8 |
| 3 | 0.387159 | 0.01061 | 0.0789 | 0.006038 | 5 | rbf | {'C': 5, 'kernel': 'rbf'} | 0.873461 | 1.0 | 1.0 | 0.61743 | 0.872723 | 0.156184 | 3 |
| 4 | 7.458228 | 5.684253 | 0.033254 | 0.012112 | 25 | linear | {'C': 25, 'kernel': 'linear'} | 0.85229 | 0.966519 | 0.944855 | 0.763663 | 0.881832 | 0.080592 | 2 |
| 5 | 0.383098 | 0.013349 | 0.075472 | 0.002454 | 25 | rbf | {'C': 25, 'kernel': 'rbf'} | 0.873461 | 1.0 | 1.0 | 0.61743 | 0.872723 | 0.156184 | 3 |
| 6 | 26.344968 | 17.663797 | 0.033551 | 0.013419 | 100 | linear | {'C': 100, 'kernel': 'linear'} | 0.908912 | 0.983752 | 0.944855 | 0.763663 | 0.900295 | 0.083206 | 1 |
| 7 | 0.385802 | 0.014222 | 0.077678 | 0.002451 | 100 | rbf | {'C': 100, 'kernel': 'rbf'} | 0.873461 | 1.0 | 1.0 | 0.61743 | 0.872723 | 0.156184 | 3 |


Awesome! Now Mr. X can see every detail nicely! Still, we don't need all these columns for a quick interpretation, do we? I think not. Let's trim it.

In [29]:
results_tidy=results_overview[['param_C','param_kernel','mean_test_score']]
results_tidy

| | param_C | param_kernel | mean_test_score |
|---|---|---|---|
| 0 | 1 | linear | 0.85192 |
| 1 | 1 | rbf | 0.871 |
| 2 | 5 | linear | 0.84675 |
| 3 | 5 | rbf | 0.872723 |
| 4 | 25 | linear | 0.881832 |
| 5 | 25 | rbf | 0.872723 |
| 6 | 100 | linear | 0.900295 |
| 7 | 100 | rbf | 0.872723 |


It shows that C=100 with the 'linear' kernel gave the best mean test score (about 0.90). I can now use these values as a starting point for fine-tuning my hyperparams. Mr. X would be more than happy now😍!
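Rather than scanning the rows by eye, the table can be sorted. The sketch below rebuilds the trimmed table from the scores printed above (so it runs without the dataset) and ranks it; `ranked` and `best_row` are names I made up for illustration.

```python
import pandas as pd

# Mean test scores copied from the trimmed results table above
results_tidy = pd.DataFrame({
    'param_C': [1, 1, 5, 5, 25, 25, 100, 100],
    'param_kernel': ['linear', 'rbf'] * 4,
    'mean_test_score': [0.85192, 0.871, 0.84675, 0.872723,
                        0.881832, 0.872723, 0.900295, 0.872723],
})

# Highest mean score first; the top row is the winning combination
ranked = results_tidy.sort_values('mean_test_score', ascending=False).reset_index(drop=True)
best_row = ranked.iloc[0]
print(ranked)
```

In the real notebook the same `sort_values` call should work directly on the DataFrame built from `gsc.cv_results_`.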

Wait! Do I need to go through all these steps just to see the best values? That is not THAT FUN.
We have a built-in attribute for that too!

In [30]:
gsc.best_params_

{'C': 100, 'kernel': 'linear'}

See, the values we observed manually match what *gsc.best_params_* gave me. Cool!
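Besides `best_params_`, a fitted search also exposes `best_score_` and can score unseen data with the refitted best model. Here is a minimal sketch, assuming a synthetic dataset from `make_classification` as a stand-in for the mushrooms (the names `X_demo`, `y_demo`, `search` and `test_accuracy` are mine). Unlike the cells above, which fit on the full x and y, it tunes on the training part only so the test part stays untouched for a final check.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the mushroom data (an assumption for this sketch)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

search = GridSearchCV(
    SVC(gamma='auto'),
    {'kernel': ['linear', 'rbf'], 'C': [1, 5, 25, 100]},
    cv=4,
)
search.fit(X_tr, y_tr)  # tune on the training part only

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the winning combination

# score() uses the best model, refitted on all of X_tr, on data it has never seen
test_accuracy = search.score(X_te, y_te)
print(test_accuracy)
```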

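It is all okay, but the code felt slow to run. What could be the reason? With 2 kernel types and 4 C values there are 8 possible combinations, and with cv=4 each one is trained 4 times. That makes 32 fits, plus one final refit on the best combination. The mean_fit_time column also shows the linear kernel getting much slower as C grows (about 1 second at C=1 versus about 26 seconds at C=100).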
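The number of combinations is easy to check with scikit-learn's `ParameterGrid`, which expands a grid the same way the search does. A small sketch (the variable names are mine; `n_folds` mirrors the `cv=4` used earlier):

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'kernel': ['linear', 'rbf'], 'C': [1, 5, 25, 100]}
n_folds = 4  # same as cv=4 in the GridSearchCV call

# Every kernel paired with every C value
combos = list(ParameterGrid(param_grid))
for combo in combos:
    print(combo)

# Each combination is trained once per fold during cross-validation
n_cv_fits = len(combos) * n_folds
print(len(combos), n_cv_fits)
```

Adding a third hyperparameter with, say, 5 values would multiply both numbers by 5, which is why grids get expensive quickly.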

Well, this does not look good when Mr. X needs to check a large number of C values. He says he is not fascinated by this. I nod.

But don't you worry, Mr. X, I have a faster way to do the work! Let's try RandomizedSearchCV.



In [31]:
from sklearn.model_selection import RandomizedSearchCV

In [33]:
rscv=RandomizedSearchCV(svm,{
                            'kernel':['linear','rbf'],
                            'C':[1,5,25,100],
                            },n_iter=3)

**Note:** I used a new parameter named *n_iter* and set it to 3. It tells the search to sample only 3 parameter combinations at random (that's why it is called RandomizedSearchCV) instead of trying all 8. That way we cut down the running time!

There must be a puzzle roaming around your head. Lemme solve it. You must have asked yourself, 'Hey! If it picks 3 random combinations, does that mean we can get different results every time we run the code?' Yes, unless you fix *random_state*. Try it yourself.

Still, RandomizedSearchCV is better than grid search when you have no idea what range of values to feed your hyperparams, or when the grid is too big to try in full. Just sample n times and get the result like this!
👇👇👇

In [37]:
rscv.fit(x,y)

RandomizedSearchCV(estimator=SVC(gamma='auto'), n_iter=3,
                   param_distributions={'C': [1, 5, 25, 100],
                                        'kernel': ['linear', 'rbf']})
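To peek at what a randomized search would try without fitting anything, scikit-learn's `ParameterSampler` draws the candidates on its own. The sketch below also swaps the fixed list of C values for a `scipy.stats.loguniform` distribution, which is my own assumption about a reasonable range (0.1 to 100), not something from the notebook above. Fixing `random_state` makes the draw repeatable.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# Lists are sampled uniformly; a scipy distribution lets C take any value in a range
param_distributions = {
    'kernel': ['linear', 'rbf'],
    'C': loguniform(1e-1, 1e2),  # assumed range, chosen only for illustration
}

# Same random_state, same candidates: the search becomes reproducible
first_draw = list(ParameterSampler(param_distributions, n_iter=3, random_state=42))
second_draw = list(ParameterSampler(param_distributions, n_iter=3, random_state=42))

for candidate in first_draw:
    print(candidate)
```

The same dictionary can be passed to RandomizedSearchCV as its parameter distributions, along with `random_state`.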

**Yes! It took less time than GridSearchCV, because it only fitted 3 of the 8 combinations instead of all of them. Let's see the output!**

In [38]:
rscv.best_params_

{'kernel': 'linear', 'C': 100}

Again! The output is the same here, but only because the 3 sampled combinations happened to include the best one. With a different sample, the winner could differ. The point is that RandomizedSearchCV can be a faster alternative when the search space is large.
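To make the trade-off concrete, here is a hedged side-by-side sketch on synthetic data (again `make_classification` as a stand-in; `grid`, `rand` and `space` are names I picked). Since the randomized search evaluates a subset of the same grid on the same folds, its best score can match the grid's best score but never beat it.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the mushroom data (an assumption for this sketch)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
space = {'kernel': ['linear', 'rbf'], 'C': [1, 5, 25, 100]}

# Exhaustive search: all 8 combinations
grid = GridSearchCV(SVC(gamma='auto'), space, cv=4).fit(X_demo, y_demo)

# Randomized search: only 3 combinations, fixed seed for repeatability
rand = RandomizedSearchCV(
    SVC(gamma='auto'), space, n_iter=3, cv=4, random_state=0
).fit(X_demo, y_demo)

n_grid = len(grid.cv_results_['params'])
n_rand = len(rand.cv_results_['params'])
print(n_grid, n_rand)
print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

Changing `random_state` changes which 3 combinations get tried, so the randomized winner can change from run to run.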


**THIS BRINGS US TO THE END OF LEARNING ABOUT HYPERPARAMETER OPTIMIZATION!**

**PLEASE SHARE AND FORK🙏**