**Q1. Problem Statement: Grid Search**

Load the ‘voice.csv’ dataset into a DataFrame and perform the following tasks:
1.	Treat the ‘label’ column as the target variable and rename it to ‘Gender_Identified’
2.	Encode the target column as integers using LabelEncoder from sklearn’s preprocessing module
3.	Separate the target variable and the feature vectors
4.	Build a RandomForestClassifier model and find the best parameters using a grid search
5.	Print the best parameters and the best estimator


**Step-1:** Importing the required libraries.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

**Step-2:** Loading the CSV data into a DataFrame.


In [None]:
data=pd.read_csv('/content/sample_data/voice.csv')
data.head()

,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,...,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx,label
0,0.059781,0.064241,0.032027,0.015071,0.090193,0.075122,12.863462,274.402906,0.893369,0.491918,...,0.059781,0.084279,0.015702,0.275862,0.007812,0.007812,0.007812,0.0,0.0,male
1,0.066009,0.06731,0.040229,0.019414,0.092666,0.073252,22.423285,634.613855,0.892193,0.513724,...,0.066009,0.107937,0.015826,0.25,0.009014,0.007812,0.054688,0.046875,0.052632,male
2,0.077316,0.083829,0.036718,0.008701,0.131908,0.123207,30.757155,1024.927705,0.846389,0.478905,...,0.077316,0.098706,0.015656,0.271186,0.00799,0.007812,0.015625,0.007812,0.046512,male
3,0.151228,0.072111,0.158011,0.096582,0.207955,0.111374,1.232831,4.177296,0.963322,0.727232,...,0.151228,0.088965,0.017798,0.25,0.201497,0.007812,0.5625,0.554688,0.247119,male
4,0.13512,0.079146,0.124656,0.07872,0.206045,0.127325,1.101174,4.333713,0.971955,0.783568,...,0.13512,0.106398,0.016931,0.266667,0.712812,0.007812,5.484375,5.476562,0.208274,male


**Step-3:** Treating the 'label' column as the target variable and renaming it to 'Gender_Identified'.

In [None]:
data.rename(columns = {'label':'Gender_Identified'}, inplace = True)
data.head()

,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,...,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx,Gender_Identified
0,0.059781,0.064241,0.032027,0.015071,0.090193,0.075122,12.863462,274.402906,0.893369,0.491918,...,0.059781,0.084279,0.015702,0.275862,0.007812,0.007812,0.007812,0.0,0.0,male
1,0.066009,0.06731,0.040229,0.019414,0.092666,0.073252,22.423285,634.613855,0.892193,0.513724,...,0.066009,0.107937,0.015826,0.25,0.009014,0.007812,0.054688,0.046875,0.052632,male
2,0.077316,0.083829,0.036718,0.008701,0.131908,0.123207,30.757155,1024.927705,0.846389,0.478905,...,0.077316,0.098706,0.015656,0.271186,0.00799,0.007812,0.015625,0.007812,0.046512,male
3,0.151228,0.072111,0.158011,0.096582,0.207955,0.111374,1.232831,4.177296,0.963322,0.727232,...,0.151228,0.088965,0.017798,0.25,0.201497,0.007812,0.5625,0.554688,0.247119,male
4,0.13512,0.079146,0.124656,0.07872,0.206045,0.127325,1.101174,4.333713,0.971955,0.783568,...,0.13512,0.106398,0.016931,0.266667,0.712812,0.007812,5.484375,5.476562,0.208274,male


**Step-4:** Checking for null values.

In [None]:
data.isnull().sum()

meanfreq             0
sd                   0
median               0
Q25                  0
Q75                  0
IQR                  0
skew                 0
kurt                 0
sp.ent               0
sfm                  0
mode                 0
centroid             0
meanfun              0
minfun               0
maxfun               0
meandom              0
mindom               0
maxdom               0
dfrange              0
modindx              0
Gender_Identified    0
dtype: int64

**Step-5:** Encoding the target column as integers using LabelEncoder from sklearn's preprocessing module.

In [None]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
# LabelEncoder numbers the classes in sorted order: 'female' -> 0, 'male' -> 1.
data['Gender_Identified'] = le.fit_transform(data['Gender_Identified'])

In [None]:
data.sample(10)

,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,...,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx,Gender_Identified
2264,0.21817,0.040306,0.22,0.206122,0.240408,0.034286,3.019389,12.813422,0.826914,0.215681,...,0.21817,0.197125,0.048144,0.27907,1.125499,0.023438,11.320312,11.296875,0.080103,0
2302,0.206964,0.055695,0.216829,0.195488,0.239451,0.043963,1.788383,5.904736,0.893281,0.365547,...,0.206964,0.181608,0.048583,0.277457,0.930804,0.09375,12.023438,11.929688,0.056189,0
1376,0.197676,0.056322,0.218584,0.147979,0.247599,0.09962,1.476049,5.256377,0.899415,0.323876,...,0.197676,0.124853,0.047151,0.27907,0.530887,0.023438,3.632812,3.609375,0.105195,1
671,0.156121,0.063649,0.176965,0.095662,0.194816,0.099155,3.270269,16.45823,0.887308,0.389775,...,0.156121,0.099671,0.040486,0.25641,0.596977,0.102539,0.791016,0.688477,0.499645,1
1465,0.182457,0.059544,0.184897,0.136138,0.226897,0.090759,1.328761,5.231848,0.923035,0.419386,...,0.182457,0.113367,0.047291,0.27907,0.867821,0.023438,6.421875,6.398438,0.120536,1
2132,0.191832,0.042054,0.194731,0.184405,0.208894,0.024489,3.182206,14.47612,0.853694,0.324583,...,0.191832,0.179635,0.065574,0.253968,1.150065,0.101562,6.859375,6.757812,0.295049,0
2155,0.187345,0.031208,0.186529,0.179118,0.198471,0.019353,4.114415,24.880305,0.806934,0.23016,...,0.187345,0.151647,0.016145,0.266667,1.189453,0.179688,6.578125,6.398438,0.270655,0
113,0.169353,0.068168,0.142837,0.115805,0.239285,0.12348,2.038745,7.546507,0.921624,0.48264,...,0.169353,0.11109,0.023256,0.181818,0.917969,0.007812,5.59375,5.585938,0.239254,1
164,0.171829,0.071933,0.180107,0.099059,0.239643,0.140584,19.305794,514.178305,0.926782,0.528599,...,0.171829,0.08987,0.015905,0.275862,0.078804,0.070312,0.25,0.179688,0.083092,1
2027,0.119572,0.095837,0.110706,0.012263,0.211533,0.19927,30.054855,999.054995,0.812738,0.448048,...,0.119572,0.169795,0.016227,0.258065,0.007812,0.007812,0.007812,0.0,0.0,0
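
The 0/1 values in the sample above follow from how LabelEncoder assigns codes: the distinct class labels are sorted, and each one gets its position in that order. The following is a minimal pure-Python sketch of that rule, not the library's implementation. The `labels` list is a made-up example (an assumption, not rows from voice.csv), and it assumes the two classes are 'female' and 'male'.

```python
# Hypothetical raw labels for illustration; the real column comes from voice.csv.
labels = ["male", "female", "male", "female", "female"]

# Sorted unique classes define the integer codes, mirroring LabelEncoder's ordering.
classes = sorted(set(labels))
code_of = {name: code for code, name in enumerate(classes)}

# Encode every label, then map the codes back to check the round trip.
encoded = [code_of[name] for name in labels]
decoded = [classes[code] for code in encoded]

print(classes)
print(encoded)
```

With the fitted encoder from this step, `le.classes_` lists the classes in the same sorted order, and `le.inverse_transform` maps the integer codes back to the original strings.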


**Step-6:** Separating the feature vectors and the target variable.

In [None]:
X = data.drop(columns=['Gender_Identified'])
y = data['Gender_Identified']

In [None]:
X.head()

,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,mode,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx
0,0.059781,0.064241,0.032027,0.015071,0.090193,0.075122,12.863462,274.402906,0.893369,0.491918,0.0,0.059781,0.084279,0.015702,0.275862,0.007812,0.007812,0.007812,0.0,0.0
1,0.066009,0.06731,0.040229,0.019414,0.092666,0.073252,22.423285,634.613855,0.892193,0.513724,0.0,0.066009,0.107937,0.015826,0.25,0.009014,0.007812,0.054688,0.046875,0.052632
2,0.077316,0.083829,0.036718,0.008701,0.131908,0.123207,30.757155,1024.927705,0.846389,0.478905,0.0,0.077316,0.098706,0.015656,0.271186,0.00799,0.007812,0.015625,0.007812,0.046512
3,0.151228,0.072111,0.158011,0.096582,0.207955,0.111374,1.232831,4.177296,0.963322,0.727232,0.083878,0.151228,0.088965,0.017798,0.25,0.201497,0.007812,0.5625,0.554688,0.247119
4,0.13512,0.079146,0.124656,0.07872,0.206045,0.127325,1.101174,4.333713,0.971955,0.783568,0.104261,0.13512,0.106398,0.016931,0.266667,0.712812,0.007812,5.484375,5.476562,0.208274


In [None]:
y.head()

0    1
1    1
2    1
3    1
4    1
Name: Gender_Identified, dtype: int64

**Step-7:** Building a RandomForestClassifier model and finding the best parameters using a grid search.

In [None]:
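# 2 criteria x 4 forest sizes = 8 candidate settings, each scored with 3-fold CV.
params = {"criterion": ["gini", "entropy"], "n_estimators": [100, 150, 200, 300]}
# No random_state is set, so the scores can vary slightly between runs.
rf_gsv = GridSearchCV(estimator=RandomForestClassifier(), param_grid=params, cv=3, scoring='accuracy')
rf_gsv.fit(X, y)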
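
To see how much work this grid implies, the candidate settings can be enumerated by hand. The sketch below uses only the standard library and is an illustration of the idea, not GridSearchCV's own code. The names `cv_folds`, `candidates`, and `total_cv_fits` are introduced here for the example.

```python
from itertools import product

# Same grid as in the cell above.
params = {"criterion": ["gini", "entropy"], "n_estimators": [100, 150, 200, 300]}
cv_folds = 3  # matches cv=3

# Build every combination of the listed values (the Cartesian product).
names = list(params)
candidates = [
    dict(zip(names, values))
    for values in product(*(params[name] for name in names))
]

# Each candidate is trained once per fold during the search.
total_cv_fits = len(candidates) * cv_folds

print(len(candidates), total_cv_fits)
for candidate in candidates:
    print(candidate)
```

With the default `refit=True`, GridSearchCV trains the winning setting once more on the full data after the search, so the forest is fitted one additional time beyond the cross-validation fits.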

In [None]:
pd.DataFrame(rf_gsv.cv_results_).sort_values('rank_test_score')

,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_criterion,param_n_estimators,params,split0_test_score,split1_test_score,split2_test_score,mean_test_score,std_test_score,rank_test_score
4,1.18025,0.080899,0.024733,0.003156,entropy,100,"{'criterion': 'entropy', 'n_estimators': 100}",0.947917,0.982955,0.969697,0.966856,0.014445,1
6,1.36819,0.334688,0.024658,0.000513,entropy,200,"{'criterion': 'entropy', 'n_estimators': 200}",0.946023,0.982955,0.970644,0.96654,0.015354,2
3,1.737101,0.03751,0.045451,0.010404,gini,300,"{'criterion': 'gini', 'n_estimators': 300}",0.942235,0.982008,0.973485,0.965909,0.017098,3
5,1.333313,0.323902,0.027292,0.006674,entropy,150,"{'criterion': 'entropy', 'n_estimators': 150}",0.943182,0.982008,0.969697,0.964962,0.0162,4
2,1.106227,0.047773,0.025154,0.000337,gini,200,"{'criterion': 'gini', 'n_estimators': 200}",0.941288,0.982008,0.970644,0.964646,0.017156,5
1,0.980231,0.138713,0.024023,0.006419,gini,150,"{'criterion': 'gini', 'n_estimators': 150}",0.939394,0.981061,0.972538,0.964331,0.017973,6
7,2.661536,1.146583,0.036759,0.000174,entropy,300,"{'criterion': 'entropy', 'n_estimators': 300}",0.940341,0.982008,0.970644,0.964331,0.017586,6
0,0.651732,0.117702,0.019315,0.003853,gini,100,"{'criterion': 'gini', 'n_estimators': 100}",0.938447,0.981061,0.96875,0.962753,0.017906,8
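
The `mean_test_score` and `std_test_score` columns summarize the three per-split scores in each row. As a quick check, the sketch below recomputes both for the top-ranked row from the split scores printed in the table, assuming the standard deviation is the population form (dividing by the number of splits), which reproduces the tabulated value.

```python
from statistics import fmean, pstdev

# split0, split1, split2 test scores of the rank-1 row (entropy, 100 trees).
split_scores = [0.947917, 0.982955, 0.969697]

# Mean accuracy across the three validation folds.
mean_score = fmean(split_scores)

# Population standard deviation (divides by n, not n - 1).
std_score = pstdev(split_scores)

print(round(mean_score, 6), round(std_score, 6))
```

The fold-to-fold spread is worth keeping in view when comparing rows: it is several times larger than the gaps between the mean scores of neighbouring candidates.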


**Step-8:** Printing the best parameters.

In [None]:
print("The best parameters are:")
rf_gsv.best_params_

The best parameters are:


{'criterion': 'entropy', 'n_estimators': 100}
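
The reported best parameters are simply the candidate with the highest mean cross-validated accuracy. The sketch below reproduces that selection from the `mean_test_score` values in the results table. The list `mean_test_scores` is typed in by hand for illustration and is not read from `rf_gsv`.

```python
# (params, mean_test_score) pairs copied from the cv_results_ table.
mean_test_scores = [
    ({"criterion": "gini", "n_estimators": 100}, 0.962753),
    ({"criterion": "gini", "n_estimators": 150}, 0.964331),
    ({"criterion": "gini", "n_estimators": 200}, 0.964646),
    ({"criterion": "gini", "n_estimators": 300}, 0.965909),
    ({"criterion": "entropy", "n_estimators": 100}, 0.966856),
    ({"criterion": "entropy", "n_estimators": 150}, 0.964962),
    ({"criterion": "entropy", "n_estimators": 200}, 0.966540),
    ({"criterion": "entropy", "n_estimators": 300}, 0.964331),
]

# Pick the candidate with the highest mean accuracy.
best_params, best_score = max(mean_test_scores, key=lambda item: item[1])

# Spread between the best and worst candidates in this grid.
worst_score = min(score for _, score in mean_test_scores)
score_range = best_score - worst_score

print(best_params, best_score, round(score_range, 6))
```

In this run the best and worst candidates differ by about 0.004 in mean accuracy, while the standard deviation across folds is roughly 0.014 to 0.018. Since the forest is also built without a fixed `random_state`, a rerun could plausibly rank the candidates differently.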

**Step-9:** Printing the best estimator.

In [None]:
print("The best estimator is:")
rf_gsv.best_estimator_

The best estimator is:
