# Hyperparameter Tuning

What are hyperparameters, and how are they tuned?

![](1.png)

![](2.png)

These are the hyperparameters for their respective models.

![](3.png)
![](4.png)
![](5.png)

Grid Search and Random Search Approach

![](6.png)

There are two models: a random forest classifier and a support vector classifier. The random forest classifier has three hyperparameters to tune: criterion, n_estimators, and min_samples_leaf. The support vector classifier also has three: kernel, C, and gamma.

![](7.png)

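These are the possible combinations of the hyperparameter values.

The number of possible combinations can easily grow to 100 or more. There are two approaches to managing the combinations during hyperparameter tuning: grid search, which evaluates every combination, and random search, which evaluates only a random sample of them.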

![](8.png)
![](9.png)
![](10.png)
![](11.png)

Problem statement: predict whether a person's income exceeds $50K per year based on census data.

To download the data, follow this link: https://archive.ics.uci.edu/ml/datasets/adult

In [40]:
#Compare multiple classifiers for different train and test values

In [41]:
#import libraries
import pandas as pd

In [42]:
#Read dataset
data=pd.read_csv('04+-+decisiontreeAdultIncome.csv')

In [43]:
data

Unnamed: 0,age,wc,education,marital status,race,gender,hours per week,IncomeClass
0,38,Private,HS-grad,Divorced,White,Male,40,<=50K
1,28,Private,Bachelors,Married,Black,Female,40,<=50K
2,37,Private,Masters,Married,White,Female,40,<=50K
3,31,Private,Masters,Never-married,White,Female,50,>50K
4,42,Private,Bachelors,Married,White,Male,40,>50K
...,...,...,...,...,...,...,...,...
19782,53,Private,Masters,Married,White,Male,40,>50K
19783,22,Private,Some-college,Never-married,White,Male,40,<=50K
19784,40,Private,HS-grad,Married,White,Male,40,>50K
19785,58,Private,HS-grad,Widowed,White,Female,40,<=50K


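Description of the dataset and its features

The full UCI dataset has the features listed below. The CSV loaded above keeps only some of them (age, wc, education, marital status, race, gender, hours per week) plus the IncomeClass target.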

age: continuous.

workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.

fnlwgt: continuous.

education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.

education-num: continuous.

marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.

occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.

relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.

race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.

sex: Female, Male.

capital-gain: continuous.

capital-loss: continuous.

hours-per-week: continuous.

native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.

In [44]:
#create dummy variables
data_prep=pd.get_dummies(data,drop_first=True)

In [45]:
#Create the X and Y variables

In [46]:
X=data_prep.iloc[:,:-1]

In [47]:
Y=data_prep.iloc[:,-1]
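
To see why the last column of `data_prep` can serve as the target, here is a minimal sketch on a made-up three-row frame (the `toy` data is an assumption for illustration, not the census file). With `drop_first=True`, each categorical column loses its first category, so the two-valued `IncomeClass` column collapses into a single indicator column that ends up last.

```python
import pandas as pd

# Tiny made-up frame that mimics three of the columns shown above
toy = pd.DataFrame({
    "age": [38, 28, 31],
    "gender": ["Male", "Female", "Female"],
    "IncomeClass": ["<=50K", "<=50K", ">50K"],
})

# Numeric columns are kept as-is; each categorical column becomes
# indicator columns, minus its first category (drop_first=True)
toy_prep = pd.get_dummies(toy, drop_first=True)
print(list(toy_prep.columns))

# Same slicing as in the notebook: everything but the last column is X,
# the last column is the target
X_toy = toy_prep.iloc[:, :-1]
y_toy = toy_prep.iloc[:, -1]
print(y_toy.name)
```

This slicing only works because the target is the last column of the original frame; if the column order changed, selecting the target by name would be safer.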

In [48]:
#Import Decision Tree classifier from sklearn

In [49]:
from sklearn.tree import DecisionTreeClassifier

In [50]:
dtc= DecisionTreeClassifier(random_state=1234)

In [51]:
#Import Random forest classifier from sklearn

In [52]:
from sklearn.ensemble import RandomForestClassifier

In [53]:
rfc= RandomForestClassifier(random_state=1234)

In [54]:
#Import and create Support Vector Classifier

In [55]:
from sklearn.svm import SVC

In [56]:
svc=SVC(random_state=1234)

In [57]:
#Import and create Logistic Regression Classifier

In [58]:
from sklearn.linear_model import LogisticRegression

In [59]:
lrc= LogisticRegression(random_state=1234)

In [60]:
# import gridsearchCV

In [61]:
from sklearn.model_selection import GridSearchCV

In [62]:
#Create parameter grid

In [63]:
rfc_param={'n_estimators':[10,15,20],
          'min_samples_split':[8,16],
          'min_samples_leaf':[1,2,3,4,5]}

In [64]:
#Number of possible parameter combinations
#(3*2*5=30)---> 30 candidate models

In [65]:
#Create the grid search objects
rfc_grid = GridSearchCV(estimator=rfc, 
                        param_grid=rfc_param,
                        scoring='accuracy',
                        cv=10,
                        n_jobs=-1,
                        return_train_score=True)

In [66]:
#Number of jobs=Models*fold=30*10=300
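
The arithmetic above can be checked without fitting anything. This is a small sketch using scikit-learn's `ParameterGrid`, which enumerates the combinations of a parameter grid; the variable names `cv_folds`, `n_candidates` and `n_fits` are introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as defined for the random forest above
rfc_param = {'n_estimators': [10, 15, 20],
             'min_samples_split': [8, 16],
             'min_samples_leaf': [1, 2, 3, 4, 5]}

cv_folds = 10

# ParameterGrid yields one dict per combination, so len() is the candidate count
n_candidates = len(ParameterGrid(rfc_param))

# Each candidate is fitted once per fold
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)
```

Note that with the default `refit=True`, `GridSearchCV` performs one additional fit of the best candidate on the full data after the cross-validated fits.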

In [67]:
rfc_param

{'n_estimators': [10, 15, 20],
 'min_samples_split': [8, 16],
 'min_samples_leaf': [1, 2, 3, 4, 5]}

In [68]:
rfc_grid

GridSearchCV(cv=10, estimator=RandomForestClassifier(random_state=1234),
             n_jobs=-1,
             param_grid={'min_samples_leaf': [1, 2, 3, 4, 5],
                         'min_samples_split': [8, 16],
                         'n_estimators': [10, 15, 20]},
             return_train_score=True, scoring='accuracy')

In [69]:
#Fit the data to grid search

In [70]:
rfc_grid_fit=rfc_grid.fit(X,Y)

In [71]:
#Get the result of gridsearchCV

In [72]:
cv_result_rfc=rfc_grid_fit.cv_results_

In [73]:
cv_result_rfc

{'mean_fit_time': array([0.45668049, 0.66292915, 0.81433065, 0.45361254, 0.60140274,
        0.82010536, 0.39444685, 0.57964857, 0.66115172, 0.44950752,
        0.86538608, 0.79417508, 0.38945613, 0.49168406, 0.64916961,
        0.37311504, 0.51233435, 0.68502252, 0.34607711, 0.53447428,
        0.67422717, 0.34477339, 0.48221769, 0.68746185, 0.47283773,
        0.72615461, 0.91873999, 0.42735655, 0.6708653 , 0.77842727]),
 'std_fit_time': array([0.11297166, 0.15059306, 0.09597708, 0.04941699, 0.05503439,
        0.08481615, 0.05022412, 0.04823105, 0.0960839 , 0.13203082,
        0.23014196, 0.14076624, 0.06592218, 0.07724044, 0.08649928,
        0.0666882 , 0.07205434, 0.09178141, 0.05774798, 0.07067668,
        0.09762898, 0.05930529, 0.0712308 , 0.20757168, 0.14407306,
        0.17313657, 0.14960031, 0.05085416, 0.14994126, 0.14702281]),
 'mean_score_time': array([0.01994457, 0.03131347, 0.04906604, 0.02234087, 0.0290211 ,
        0.03819802, 0.0256314 , 0.02533505, 0.03425844, 0.02

In [74]:
#Convert the results to a DataFrame

In [75]:
cv_result_rfc=pd.DataFrame.from_dict(rfc_grid_fit.cv_results_)

In [76]:
cv_result_rfc

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_min_samples_leaf,param_min_samples_split,param_n_estimators,params,split0_test_score,split1_test_score,...,split2_train_score,split3_train_score,split4_train_score,split5_train_score,split6_train_score,split7_train_score,split8_train_score,split9_train_score,mean_train_score,std_train_score
0,0.45668,0.112972,0.019945,0.005834,1,8,10,"{'min_samples_leaf': 1, 'min_samples_split': 8...",0.790803,0.792319,...,0.869834,0.872642,0.870845,0.870227,0.870788,0.870459,0.871919,0.869167,0.87092,0.001005
1,0.662929,0.150593,0.031313,0.009652,1,8,15,"{'min_samples_leaf': 1, 'min_samples_split': 8...",0.797372,0.796362,...,0.872248,0.873596,0.873821,0.871799,0.872136,0.871301,0.873716,0.87248,0.87275,0.000923
2,0.814331,0.095977,0.049066,0.016139,1,8,20,"{'min_samples_leaf': 1, 'min_samples_split': 8...",0.798888,0.791814,...,0.872642,0.875449,0.874775,0.873821,0.873315,0.872312,0.874277,0.873716,0.873969,0.001056
3,0.453613,0.049417,0.022341,0.007222,1,16,10,"{'min_samples_leaf': 1, 'min_samples_split': 1...",0.80192,0.799899,...,0.856357,0.859614,0.856974,0.857087,0.856862,0.857207,0.858723,0.85687,0.857392,0.001058
4,0.601403,0.055034,0.029021,0.008303,1,16,15,"{'min_samples_leaf': 1, 'min_samples_split': 1...",0.803436,0.798888,...,0.857143,0.859894,0.858827,0.857648,0.858434,0.858049,0.859228,0.858555,0.858762,0.001066
5,0.820105,0.084816,0.038198,0.010298,1,16,20,"{'min_samples_leaf': 1, 'min_samples_split': 1...",0.80091,0.801415,...,0.859277,0.860961,0.859558,0.857985,0.858771,0.858274,0.859734,0.860127,0.859599,0.001225
6,0.394447,0.050224,0.025631,0.007937,2,8,10,"{'min_samples_leaf': 2, 'min_samples_split': 8...",0.805457,0.798383,...,0.857536,0.8604,0.85748,0.855346,0.857087,0.855298,0.858218,0.857151,0.857639,0.001506
7,0.579649,0.048231,0.025335,0.00442,2,8,15,"{'min_samples_leaf': 2, 'min_samples_split': 8...",0.799899,0.80091,...,0.858322,0.860961,0.858322,0.857929,0.858659,0.856646,0.858948,0.857712,0.858836,0.001313
8,0.661152,0.096084,0.034258,0.011579,2,8,20,"{'min_samples_leaf': 2, 'min_samples_split': 8...",0.804952,0.803941,...,0.858715,0.860624,0.860344,0.858322,0.859164,0.857375,0.859622,0.858611,0.859509,0.001232
9,0.449508,0.132031,0.028425,0.013594,2,16,10,"{'min_samples_leaf': 2, 'min_samples_split': 1...",0.813037,0.799394,...,0.848327,0.84945,0.848832,0.847765,0.847428,0.848167,0.848897,0.848784,0.848587,0.000792


In [77]:
cv_result_rfc.shape

(30, 33)

In [80]:
#cv_result_rfc is already a DataFrame, so export it directly
cv_result_rfc.to_csv('cv_result_rfc.csv')

We exported the results to a CSV file and sorted it by rank_test_score in ascending order, so the best-ranked combination appears first.
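
The same ranking can also be produced in pandas before exporting. The sketch below is self-contained, so it uses a small synthetic dataset from `make_classification` and a reduced grid as stand-ins (both are assumptions, not the notebook's data or grid); the `sort_values` step is the part that carries over.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption; the notebook uses the census CSV)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

demo_grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=1234),
    param_grid={"n_estimators": [5, 10], "min_samples_leaf": [1, 5]},
    scoring="accuracy",
    cv=3,
    return_train_score=True,
)
demo_grid.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays; a DataFrame makes it sortable
results = pd.DataFrame(demo_grid.cv_results_)

# Keep the most useful columns and put the best-ranked candidate first
cols = ["rank_test_score", "params", "mean_test_score", "mean_train_score"]
ranked = results.sort_values("rank_test_score")[cols].reset_index(drop=True)
print(ranked)
```

Applied to the notebook's objects, the equivalent call would be `cv_result_rfc.sort_values('rank_test_score')`.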

In [81]:
#Implement the GridSearch for Logistic Regression

In [82]:
lrc_param={'C': [0.01,0.1,0.5,1,2,5,10],
            'penalty': ['l2'],
          'solver':['liblinear','lbfgs','saga']}

In [83]:
#number of combinations= 7*1*3=21

In [84]:
#Number of jobs= 21*10

In [85]:
#Create the grid search objects
lrc_grid = GridSearchCV(estimator=lrc, 
                        param_grid=lrc_param,
                        scoring='accuracy',
                        cv=10,
                        n_jobs=-1,
                        return_train_score=True)

In [86]:
#Fit the data to grid search

In [87]:
lrc_grid_fit=lrc_grid.fit(X,Y)
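
Solvers such as `lbfgs` and `saga` generally converge faster when features are on a similar scale. One possible variation, sketched here on synthetic data and not part of the original notebook, is to wrap a `StandardScaler` and the classifier in a `Pipeline` and tune the pipeline. The step names `scale` and `clf`, the reduced `C` list and `max_iter=1000` are assumptions chosen for the example; parameters of a pipeline step are addressed as `<step>__<parameter>`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption; the notebook uses the census CSV)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Scaling lives inside the pipeline, so it is re-fitted on each training fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000, random_state=1234)),
])

# Step parameters are addressed as <step name>__<parameter name>
pipe_param = {
    "clf__C": [0.01, 0.1, 1, 10],
    "clf__solver": ["liblinear", "lbfgs", "saga"],
}

pipe_grid = GridSearchCV(pipe, param_grid=pipe_param, scoring="accuracy", cv=5)
pipe_grid.fit(X_demo, y_demo)
print(pipe_grid.best_params_)
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation fold into the training fold.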

In [88]:
cv_result_lrc=lrc_grid_fit.cv_results_

In [89]:
cv_result_lrc

{'mean_fit_time': array([0.28104825, 0.69414241, 2.01503353, 0.27067676, 0.7547797 ,
        2.12504284, 0.42716384, 0.81452684, 2.61601517, 0.50085835,
        0.88180466, 2.09092236, 0.41401324, 0.5605083 , 1.74503152,
        0.38527086, 0.56552169, 1.80409346, 0.36981058, 0.57176945,
        1.63814776]),
 'std_fit_time': array([0.02768776, 0.08957076, 0.32239874, 0.0336067 , 0.06334647,
        0.37392686, 0.1516348 , 0.1749037 , 0.34654121, 0.12150061,
        0.11714403, 0.19616348, 0.07532369, 0.03307782, 0.227293  ,
        0.05832584, 0.03427507, 0.24272811, 0.0469753 , 0.03715675,
        0.29424509]),
 'mean_score_time': array([0.02094481, 0.00618327, 0.00668273, 0.00658424, 0.00698123,
        0.00598278, 0.01136973, 0.00927484, 0.0109709 , 0.0117681 ,
        0.00997198, 0.00578675, 0.00638223, 0.00538528, 0.0053853 ,
        0.00628402, 0.00614898, 0.00528724, 0.00678375, 0.00588677,
        0.005088  ]),
 'std_score_time': array([0.01505338, 0.00116371, 0.0033692 , 0.00

In [90]:
#Convert the results to a DataFrame

In [91]:
cv_result_lrc=pd.DataFrame.from_dict(lrc_grid_fit.cv_results_)

In [92]:
cv_result_lrc

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_C,param_penalty,param_solver,params,split0_test_score,split1_test_score,...,split2_train_score,split3_train_score,split4_train_score,split5_train_score,split6_train_score,split7_train_score,split8_train_score,split9_train_score,mean_train_score,std_train_score
0,0.281048,0.027688,0.020945,0.015053,0.01,l2,liblinear,"{'C': 0.01, 'penalty': 'l2', 'solver': 'liblin...",0.805457,0.791814,...,0.810085,0.812444,0.810647,0.810703,0.809805,0.810208,0.810826,0.811388,0.811049,0.000936
1,0.694142,0.089571,0.006183,0.001164,0.01,l2,lbfgs,"{'C': 0.01, 'penalty': 'l2', 'solver': 'lbfgs'}",0.808994,0.800404,...,0.811545,0.812837,0.80885,0.809692,0.812107,0.811163,0.810377,0.812342,0.811532,0.00143
2,2.015034,0.322399,0.006683,0.003369,0.01,l2,saga,"{'C': 0.01, 'penalty': 'l2', 'solver': 'saga'}",0.806468,0.790298,...,0.809243,0.811938,0.809748,0.810142,0.809355,0.81004,0.810433,0.811331,0.810633,0.001171
3,0.270677,0.033607,0.006584,0.002608,0.1,l2,liblinear,"{'C': 0.1, 'penalty': 'l2', 'solver': 'libline...",0.808994,0.798383,...,0.814522,0.816431,0.813455,0.814858,0.813567,0.813353,0.814476,0.815037,0.814811,0.001136
4,0.75478,0.063346,0.006981,0.004136,0.1,l2,lbfgs,"{'C': 0.1, 'penalty': 'l2', 'solver': 'lbfgs'}",0.806973,0.797372,...,0.812725,0.814915,0.81323,0.813511,0.813286,0.811107,0.813128,0.814644,0.813564,0.001129
5,2.125043,0.373927,0.005983,0.002041,0.1,l2,saga,"{'C': 0.1, 'penalty': 'l2', 'solver': 'saga'}",0.806973,0.792825,...,0.810029,0.812332,0.810142,0.81031,0.809524,0.809815,0.810826,0.811331,0.810774,0.000952
6,0.427164,0.151635,0.01137,0.005521,0.5,l2,liblinear,"{'C': 0.5, 'penalty': 'l2', 'solver': 'libline...",0.808489,0.798888,...,0.814128,0.815532,0.81396,0.813792,0.813567,0.813072,0.81397,0.814476,0.81439,0.00092
7,0.814527,0.174904,0.009275,0.004014,0.5,l2,lbfgs,"{'C': 0.5, 'penalty': 'l2', 'solver': 'lbfgs'}",0.808994,0.797372,...,0.814241,0.815645,0.812837,0.813062,0.813455,0.811331,0.814364,0.814588,0.813963,0.001247
8,2.616015,0.346541,0.010971,0.009483,0.5,l2,saga,"{'C': 0.5, 'penalty': 'l2', 'solver': 'saga'}",0.807984,0.792319,...,0.810029,0.812332,0.810366,0.81031,0.809243,0.81004,0.810714,0.811331,0.810762,0.000987
9,0.500858,0.121501,0.011768,0.005275,1.0,l2,liblinear,"{'C': 1, 'penalty': 'l2', 'solver': 'liblinear'}",0.808489,0.799394,...,0.814297,0.815532,0.813623,0.813511,0.813398,0.812735,0.813746,0.813914,0.814148,0.000955


In [93]:
#cv_result_lrc is already a DataFrame, so export it directly
cv_result_lrc.to_csv('cv_result_lrc.csv')

In [94]:
#Implement the GridSearch for Support Vector Machine

In [95]:
#Define parameter for support Vector Classifier

In [96]:
svc_param={'C': [0.01,0.1,0.5,1,2,5,10],
            'kernel': ['rbf','linear'],
          'gamma':[0.1,0.25,0.5,1,5]}

In [97]:
#the parameter results in 7*2*5=70 different combinations

In [98]:
#CV=10 for 70 different combinations means 700 jobs/models
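
In scikit-learn's `SVC`, `gamma` is a kernel coefficient for the `rbf`, `poly` and `sigmoid` kernels and is not used by the `linear` kernel, so every linear candidate is fitted five times with identical results. A possible way to avoid that, sketched below, is to pass `param_grid` as a list of dictionaries, one per kernel. The name `svc_param_split` is introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

C_values = [0.01, 0.1, 0.5, 1, 2, 5, 10]

# One sub-grid per kernel: gamma is only varied where it has an effect
svc_param_split = [
    {"kernel": ["rbf"], "C": C_values, "gamma": [0.1, 0.25, 0.5, 1, 5]},
    {"kernel": ["linear"], "C": C_values},
]

# 7*5 rbf candidates + 7 linear candidates
n_split = len(ParameterGrid(svc_param_split))
print(n_split, n_split * 10)
```

The list can be passed to `GridSearchCV(param_grid=svc_param_split, ...)` unchanged, which reduces the search from 700 to 420 cross-validated fits.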

In [99]:
#Create the grid search objects
svc_grid = GridSearchCV(estimator=svc, 
                        param_grid=svc_param,
                        scoring='accuracy',
                        cv=10,
                        n_jobs=-1,
                        return_train_score=True)

In [100]:
#Fit the data to grid search

In [101]:
svc_grid_fit=svc_grid.fit(X,Y)

In [102]:
cv_result_svc=svc_grid_fit.cv_results_

In [103]:
cv_result_svc

{'mean_fit_time': array([   41.70601852,    37.57959447,    40.05763092,    38.58436844,
           57.38280053,    37.22357903,    72.92646639,    34.58094566,
           89.98974562,    33.82671378,    35.5294312 ,    46.16309614,
           41.64465389,    46.34740345,    63.94713244,    43.67934613,
           90.85779889,    41.71572883,   133.61181278,    40.08246064,
           31.60671899,    76.82221236,    36.58339827,    76.47803891,
           73.5399507 ,    76.51566916,   105.22790399,    71.66398299,
          149.65542171,    72.3970345 ,    30.13881729,   113.56551971,
           53.14344594,   109.70718751,    79.27706966,   117.68642011,
          115.55160491,   114.71986372,   151.34661772,   117.90265114,
           33.48799736,   212.32206838,    68.51880541,   213.9950022 ,
          101.25822172,   212.52162554,   115.00451655,   211.30782766,
          155.43856766,   201.78652396,    38.34449923,   418.52680507,
           87.88139217,   413.40430143,   106.8

In [104]:
#Convert the results to a DataFrame

In [105]:
cv_result_svc=pd.DataFrame.from_dict(svc_grid_fit.cv_results_)

In [106]:
cv_result_svc

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_C,param_gamma,param_kernel,params,split0_test_score,split1_test_score,...,split2_train_score,split3_train_score,split4_train_score,split5_train_score,split6_train_score,split7_train_score,split8_train_score,split9_train_score,mean_train_score,std_train_score
0,41.706019,1.619968,1.969547,0.201109,0.01,0.1,rbf,"{'C': 0.01, 'gamma': 0.1, 'kernel': 'rbf'}",0.736736,0.736736,...,0.736579,0.736579,0.736579,0.736635,0.736635,0.736594,0.736594,0.736594,0.736595,0.000021
1,37.579594,1.390514,1.186453,0.085674,0.01,0.1,linear,"{'C': 0.01, 'gamma': 0.1, 'kernel': 'linear'}",0.803941,0.793835,...,0.808794,0.810366,0.808513,0.808513,0.808232,0.808580,0.808692,0.809591,0.809145,0.000778
2,40.057631,2.157766,2.423865,0.406386,0.01,0.25,rbf,"{'C': 0.01, 'gamma': 0.25, 'kernel': 'rbf'}",0.736736,0.736736,...,0.736579,0.736579,0.736579,0.736635,0.736635,0.736594,0.736594,0.736594,0.736595,0.000021
3,38.584368,4.628833,1.412633,0.367932,0.01,0.25,linear,"{'C': 0.01, 'gamma': 0.25, 'kernel': 'linear'}",0.803941,0.793835,...,0.808794,0.810366,0.808513,0.808513,0.808232,0.808580,0.808692,0.809591,0.809145,0.000778
4,57.382801,8.022057,3.133083,0.934870,0.01,0.5,rbf,"{'C': 0.01, 'gamma': 0.5, 'kernel': 'rbf'}",0.736736,0.736736,...,0.736579,0.736579,0.736579,0.736635,0.736635,0.736594,0.736594,0.736594,0.736595,0.000021
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,643.097507,17.460349,0.596212,0.027681,10,0.5,linear,"{'C': 10, 'gamma': 0.5, 'kernel': 'linear'}",0.803436,0.796867,...,0.809018,0.810928,0.808906,0.809636,0.808850,0.809310,0.809591,0.810601,0.809875,0.000861
66,98.033350,4.721209,2.535846,0.122627,10,1,rbf,"{'C': 10, 'gamma': 1, 'kernel': 'rbf'}",0.793835,0.779687,...,0.903807,0.905717,0.903583,0.903807,0.903807,0.904150,0.905329,0.903532,0.904354,0.000833
67,654.046519,26.045299,0.593763,0.034749,10,1,linear,"{'C': 10, 'gamma': 1, 'kernel': 'linear'}",0.803436,0.796867,...,0.809018,0.810928,0.808906,0.809636,0.808850,0.809310,0.809591,0.810601,0.809875,0.000861
68,131.024122,10.375285,3.702564,0.143556,10,5,rbf,"{'C': 10, 'gamma': 5, 'kernel': 'rbf'}",0.773118,0.764528,...,0.903807,0.905717,0.903583,0.903807,0.903807,0.904150,0.905329,0.903532,0.904354,0.000833


In [107]:
#cv_result_svc is already a DataFrame, so export it directly
cv_result_svc.to_csv('cv_result_svc.csv')

In [109]:
# Get the top ranked test score for all the three classifiers
rfc_top_rank = cv_result_rfc[cv_result_rfc['rank_test_score'] == 1].iloc[0]
lrc_top_rank = cv_result_lrc[cv_result_lrc['rank_test_score'] == 1].iloc[0]
svc_top_rank = cv_result_svc[cv_result_svc['rank_test_score'] == 1].iloc[0]

Print the train and test scores for the three classifiers

In [120]:
print('\n\n')
print ('                    ',
       '  Random Forest    ',
       '  Logistic Regression  ',
       '  Support Vector   ')
print ('                    ',
       '  ---------------- ',
       '  -------------------- ',
       '  ---------------- ')
print ('  Mean Test Score   : ', 
       str('%.4f' %rfc_top_rank['mean_test_score']),
       '            ',
       str('%.4f' %lrc_top_rank['mean_test_score']),
       '                ',
       str('%.4f' %svc_top_rank['mean_test_score'])
       )
print ('  Mean Train Score  : ', 
       str('%.4f' %rfc_top_rank['mean_train_score']),
       '            ',
       str('%.4f' %lrc_top_rank['mean_train_score']),
       '                ',
       str('%.4f' %svc_top_rank['mean_train_score'])
       )




                       Random Forest       Logistic Regression     Support Vector   
                       ----------------    --------------------    ---------------- 
  Mean Test Score   :  0.8184              0.8146                  0.8144
  Mean Train Score  :  0.8385              0.8148                  0.8429


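Random Forest gives the best mean test accuracy of the three classifiers (0.8184), although the margin over Logistic Regression (0.8146) and Support Vector (0.8144) is small.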
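
As an alternative to hand-aligned `print` calls, the comparison can be held in a small DataFrame. This is a sketch: the `top_rows` dictionary stands in for `rfc_top_rank`, `lrc_top_rank` and `svc_top_rank`, and its numbers are copied from the printed output above rather than recomputed.

```python
import pandas as pd

# Stand-ins for the three top-ranked rows; values copied from the output above
top_rows = {
    "Random Forest": pd.Series({"mean_test_score": 0.8184, "mean_train_score": 0.8385}),
    "Logistic Regression": pd.Series({"mean_test_score": 0.8146, "mean_train_score": 0.8148}),
    "Support Vector": pd.Series({"mean_test_score": 0.8144, "mean_train_score": 0.8429}),
}

# One column per classifier, one row per score
summary = pd.DataFrame(top_rows).loc[["mean_test_score", "mean_train_score"]].round(4)

# The train/test gap is a rough indicator of overfitting
summary.loc["train_minus_test"] = (
    summary.loc["mean_train_score"] - summary.loc["mean_test_score"]
)

best_model = summary.loc["mean_test_score"].idxmax()
print(summary)
print("Best mean test score:", best_model)
```

In the notebook itself, the dictionary values could be replaced by the actual `*_top_rank` Series.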

In [121]:
# Print the best parameters of the Random Forest Classifier
print('\n The best Parameters are : ')
print(rfc_grid_fit.best_params_)


 The best Parameters are : 
{'min_samples_leaf': 5, 'min_samples_split': 16, 'n_estimators': 10}
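
The searches above are fitted on all of X and Y, so the reported scores are cross-validation estimates. A common follow-up, sketched here on synthetic data with a reduced grid (both assumptions), is to hold out a test split, search on the training part only, and score the refitted `best_estimator_` on the held-out part.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption; the notebook uses the census CSV)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)

# Keep 25% of the rows aside; the search never sees them
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1234, stratify=y_demo
)

search = GridSearchCV(
    RandomForestClassifier(random_state=1234),
    param_grid={"n_estimators": [10, 20], "min_samples_leaf": [1, 5]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is already refitted on X_train
best_rfc = search.best_estimator_
test_acc = accuracy_score(y_test, best_rfc.predict(X_test))

print(search.best_params_)
print(round(search.best_score_, 4), round(test_acc, 4))
```

Comparing `best_score_` with the held-out accuracy gives a check on whether the cross-validated score was optimistic.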


# Perform RandomizedSearchCV for hyperparameter tuning

In [122]:
# Import libraries
import pandas as pd

In [123]:
# Read dataset
data = pd.read_csv('04+-+decisiontreeAdultIncome.csv')

In [124]:
# Create Dummy variables
data_prep = pd.get_dummies(data, drop_first=True)

In [125]:
# Create X and Y Variables
X = data_prep.iloc[:, :-1]
Y = data_prep.iloc[:, -1]

In [126]:
# Import and create Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(random_state=1234)

In [127]:
# Import RandomizedSearchCV
from sklearn.model_selection import RandomizedSearchCV

In [128]:
# define parameters for Random Forest
rfc_param = {'n_estimators':[10,15,20], 
            'min_samples_split':[8,16],
            'min_samples_leaf':[1,2,3,4,5]
            }

In [129]:
#The parameters result in 3 x 2 x 5 = 30 different combinations

In [130]:
# Create the RandomizedSearchCV object
rfc_rs = RandomizedSearchCV(estimator=rfc, 
                        param_distributions=rfc_param,
                        scoring='accuracy',
                        cv=10,
                        n_iter=10,
                        return_train_score=True,
                        random_state=1234)

In [133]:
#n_iter selects 10 combinations out of 30 possible
#Now 10 x 10 = 100 jobs will be executed
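
To inspect which 10 of the 30 combinations get drawn, scikit-learn's `ParameterSampler` can be used on its own. This is a sketch: because `RandomizedSearchCV` draws its candidates through the same sampler, using the same grid, `n_iter` and `random_state` should list the same candidates, but treat that as an expectation rather than a guarantee across versions.

```python
from sklearn.model_selection import ParameterSampler

# Same grid as defined for the random forest above
rfc_param = {'n_estimators': [10, 15, 20],
             'min_samples_split': [8, 16],
             'min_samples_leaf': [1, 2, 3, 4, 5]}

# When every parameter is given as a list, sampling is done
# without replacement, so no combination is drawn twice
sampled = list(ParameterSampler(rfc_param, n_iter=10, random_state=1234))

for params in sampled:
    print(params)
```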

In [134]:
# Fit the data to RandomizedSearchCV object
rfc_rs_fit = rfc_rs.fit(X, Y)

In [135]:
# Get the results of RandomizedSearch
cv_results_rfc_rs = pd.DataFrame.from_dict(rfc_rs_fit.cv_results_)

In [136]:
# Print the best parameters of Randomized Search for Random Forest
print('\n The best Parameters are : ')
print(rfc_rs_fit.best_params_)


 The best Parameters are : 
{'n_estimators': 15, 'min_samples_split': 16, 'min_samples_leaf': 5}
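
Random search is not limited to fixed lists: `param_distributions` also accepts distributions to sample from. The sketch below uses `scipy.stats.randint` over roughly the same ranges as the grid above; the synthetic data, the range bounds, `n_iter=5` and `cv=3` are assumptions chosen to keep the example quick.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption; the notebook uses the census CSV)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# randint(low, high) samples integers from low to high - 1
rfc_dist = {
    "n_estimators": randint(10, 21),       # 10..20
    "min_samples_split": randint(8, 17),   # 8..16
    "min_samples_leaf": randint(1, 6),     # 1..5
}

dist_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1234),
    param_distributions=rfc_dist,
    n_iter=5,
    scoring="accuracy",
    cv=3,
    random_state=1234,
)
dist_search.fit(X_demo, y_demo)
print(dist_search.best_params_)
```

With distributions, the search can land on values between the grid points, such as `n_estimators=13`, which a fixed list would never try.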
