## Supervised Learning 
### Case Study - 4 (CS 33)

**Domain** – Media

**Focus** – Optimize the selection process


**Business challenge/requirement:**

Motion Studios is the largest radio production house in Europe, with total revenue of over $1B. The company has launched a new reality show, "The Star RJ", which aims to find a new radio jockey (RJ) who will be the star presenter on upcoming shows.

In the first round, participants upload a voice clip online, and experts evaluate it for selection into the next round. Separate teams evaluate male and female voices in this round.


The response to the show has been unprecedented, and the company is flooded with voice clips.

As an ML expert, your task is to classify each voice as male or female so that this first level of filtering is quicker.


### Importing libraries and loading data

In [95]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [96]:
df = pd.read_csv("datasets/voice-classification.csv")
df

| | meanfreq | sd | median | Q25 | Q75 | IQR | skew | kurt | sp.ent | sfm | ... | centroid | meanfun | minfun | maxfun | meandom | mindom | maxdom | dfrange | modindx | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.059781 | 0.064241 | 0.032027 | 0.015071 | 0.090193 | 0.075122 | 12.863462 | 274.402906 | 0.893369 | 0.491918 | ... | 0.059781 | 0.084279 | 0.015702 | 0.275862 | 0.007812 | 0.007812 | 0.007812 | 0.000000 | 0.000000 | male |
| 1 | 0.066009 | 0.067310 | 0.040229 | 0.019414 | 0.092666 | 0.073252 | 22.423285 | 634.613855 | 0.892193 | 0.513724 | ... | 0.066009 | 0.107937 | 0.015826 | 0.250000 | 0.009014 | 0.007812 | 0.054688 | 0.046875 | 0.052632 | male |
| 2 | 0.077316 | 0.083829 | 0.036718 | 0.008701 | 0.131908 | 0.123207 | 30.757155 | 1024.927705 | 0.846389 | 0.478905 | ... | 0.077316 | 0.098706 | 0.015656 | 0.271186 | 0.007990 | 0.007812 | 0.015625 | 0.007812 | 0.046512 | male |
| 3 | 0.151228 | 0.072111 | 0.158011 | 0.096582 | 0.207955 | 0.111374 | 1.232831 | 4.177296 | 0.963322 | 0.727232 | ... | 0.151228 | 0.088965 | 0.017798 | 0.250000 | 0.201497 | 0.007812 | 0.562500 | 0.554688 | 0.247119 | male |
| 4 | 0.135120 | 0.079146 | 0.124656 | 0.078720 | 0.206045 | 0.127325 | 1.101174 | 4.333713 | 0.971955 | 0.783568 | ... | 0.135120 | 0.106398 | 0.016931 | 0.266667 | 0.712812 | 0.007812 | 5.484375 | 5.476562 | 0.208274 | male |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3163 | 0.131884 | 0.084734 | 0.153707 | 0.049285 | 0.201144 | 0.151859 | 1.762129 | 6.630383 | 0.962934 | 0.763182 | ... | 0.131884 | 0.182790 | 0.083770 | 0.262295 | 0.832899 | 0.007812 | 4.210938 | 4.203125 | 0.161929 | female |
| 3164 | 0.116221 | 0.089221 | 0.076758 | 0.042718 | 0.204911 | 0.162193 | 0.693730 | 2.503954 | 0.960716 | 0.709570 | ... | 0.116221 | 0.188980 | 0.034409 | 0.275862 | 0.909856 | 0.039062 | 3.679688 | 3.640625 | 0.277897 | female |
| 3165 | 0.142056 | 0.095798 | 0.183731 | 0.033424 | 0.224360 | 0.190936 | 1.876502 | 6.604509 | 0.946854 | 0.654196 | ... | 0.142056 | 0.209918 | 0.039506 | 0.275862 | 0.494271 | 0.007812 | 2.937500 | 2.929688 | 0.194759 | female |
| 3166 | 0.143659 | 0.090628 | 0.184976 | 0.043508 | 0.219943 | 0.176435 | 1.591065 | 5.388298 | 0.950436 | 0.675470 | ... | 0.143659 | 0.172375 | 0.034483 | 0.250000 | 0.791360 | 0.007812 | 3.593750 | 3.585938 | 0.311002 | female |


In [97]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3168 entries, 0 to 3167
Data columns (total 21 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   meanfreq  3168 non-null   float64
 1   sd        3168 non-null   float64
 2   median    3168 non-null   float64
 3   Q25       3168 non-null   float64
 4   Q75       3168 non-null   float64
 5   IQR       3168 non-null   float64
 6   skew      3168 non-null   float64
 7   kurt      3168 non-null   float64
 8   sp.ent    3168 non-null   float64
 9   sfm       3168 non-null   float64
 10  mode      3168 non-null   float64
 11  centroid  3168 non-null   float64
 12  meanfun   3168 non-null   float64
 13  minfun    3168 non-null   float64
 14  maxfun    3168 non-null   float64
 15  meandom   3168 non-null   float64
 16  mindom    3168 non-null   float64
 17  maxdom    3168 non-null   float64
 18  dfrange   3168 non-null   float64
 19  modindx   3168 non-null   float64
 20  label     3168 non-null   object

In [98]:
df.isnull().sum()

meanfreq    0
sd          0
median      0
Q25         0
Q75         0
IQR         0
skew        0
kurt        0
sp.ent      0
sfm         0
mode        0
centroid    0
meanfun     0
minfun      0
maxfun      0
meandom     0
mindom      0
maxdom      0
dfrange     0
modindx     0
label       0
dtype: int64

In [99]:
df.shape

(3168, 21)

In [100]:
print("Total number of males: {}.".format(df[df.label=='male'].shape[0]))

Total number of males: 1584.


In [101]:
print("Total number of females: {}.".format(df[df.label=='female'].shape[0]))

Total number of females: 1584.


### Creating variable X

In [102]:
x = df.iloc[:,:-1]
x.shape

(3168, 20)

In [103]:
display(x)

| | meanfreq | sd | median | Q25 | Q75 | IQR | skew | kurt | sp.ent | sfm | mode | centroid | meanfun | minfun | maxfun | meandom | mindom | maxdom | dfrange | modindx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.059781 | 0.064241 | 0.032027 | 0.015071 | 0.090193 | 0.075122 | 12.863462 | 274.402906 | 0.893369 | 0.491918 | 0.000000 | 0.059781 | 0.084279 | 0.015702 | 0.275862 | 0.007812 | 0.007812 | 0.007812 | 0.000000 | 0.000000 |
| 1 | 0.066009 | 0.067310 | 0.040229 | 0.019414 | 0.092666 | 0.073252 | 22.423285 | 634.613855 | 0.892193 | 0.513724 | 0.000000 | 0.066009 | 0.107937 | 0.015826 | 0.250000 | 0.009014 | 0.007812 | 0.054688 | 0.046875 | 0.052632 |
| 2 | 0.077316 | 0.083829 | 0.036718 | 0.008701 | 0.131908 | 0.123207 | 30.757155 | 1024.927705 | 0.846389 | 0.478905 | 0.000000 | 0.077316 | 0.098706 | 0.015656 | 0.271186 | 0.007990 | 0.007812 | 0.015625 | 0.007812 | 0.046512 |
| 3 | 0.151228 | 0.072111 | 0.158011 | 0.096582 | 0.207955 | 0.111374 | 1.232831 | 4.177296 | 0.963322 | 0.727232 | 0.083878 | 0.151228 | 0.088965 | 0.017798 | 0.250000 | 0.201497 | 0.007812 | 0.562500 | 0.554688 | 0.247119 |
| 4 | 0.135120 | 0.079146 | 0.124656 | 0.078720 | 0.206045 | 0.127325 | 1.101174 | 4.333713 | 0.971955 | 0.783568 | 0.104261 | 0.135120 | 0.106398 | 0.016931 | 0.266667 | 0.712812 | 0.007812 | 5.484375 | 5.476562 | 0.208274 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3163 | 0.131884 | 0.084734 | 0.153707 | 0.049285 | 0.201144 | 0.151859 | 1.762129 | 6.630383 | 0.962934 | 0.763182 | 0.200836 | 0.131884 | 0.182790 | 0.083770 | 0.262295 | 0.832899 | 0.007812 | 4.210938 | 4.203125 | 0.161929 |
| 3164 | 0.116221 | 0.089221 | 0.076758 | 0.042718 | 0.204911 | 0.162193 | 0.693730 | 2.503954 | 0.960716 | 0.709570 | 0.013683 | 0.116221 | 0.188980 | 0.034409 | 0.275862 | 0.909856 | 0.039062 | 3.679688 | 3.640625 | 0.277897 |
| 3165 | 0.142056 | 0.095798 | 0.183731 | 0.033424 | 0.224360 | 0.190936 | 1.876502 | 6.604509 | 0.946854 | 0.654196 | 0.008006 | 0.142056 | 0.209918 | 0.039506 | 0.275862 | 0.494271 | 0.007812 | 2.937500 | 2.929688 | 0.194759 |
| 3166 | 0.143659 | 0.090628 | 0.184976 | 0.043508 | 0.219943 | 0.176435 | 1.591065 | 5.388298 | 0.950436 | 0.675470 | 0.212202 | 0.143659 | 0.172375 | 0.034483 | 0.250000 | 0.791360 | 0.007812 | 3.593750 | 3.585938 | 0.311002 |


### Creating variable Y

In [104]:
y = df.iloc[:,-1]
y.shape

(3168,)

In [105]:
display(y)

0         male
1         male
2         male
3         male
4         male
         ...  
3163    female
3164    female
3165    female
3166    female
3167    female
Name: label, Length: 3168, dtype: object

### Label Encoding

In [106]:
from sklearn.preprocessing import LabelEncoder
gen_coder = LabelEncoder()
y = gen_coder.fit_transform(y)

y

array([1, 1, 1, ..., 0, 0, 0])
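`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the class names, so `female` becomes 0 and `male` becomes 1; that is why the first rows, which are all `male`, encode to 1. The sketch below uses a short hypothetical label list in place of `df["label"]` to show how one might inspect the mapping and decode 0/1 predictions back into names:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels standing in for df["label"] (illustrative only)
labels = ["male", "male", "female", "female", "male"]

enc = LabelEncoder()
codes = enc.fit_transform(labels)  # -> array([1, 1, 0, 0, 1])

# classes_ is sorted alphabetically; each name's position is its integer code
mapping = {str(name): code for code, name in enumerate(enc.classes_)}
print(mapping)  # {'female': 0, 'male': 1}

# inverse_transform turns 0/1 model outputs back into readable labels
decoded = [str(s) for s in enc.inverse_transform([1, 0, 1])]
print(decoded)  # ['male', 'female', 'male']
```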

### Standardization

In [107]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(x)

x = scaler.transform(x)
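One caveat on this step: the scaler is fitted on all 3,168 rows before the train/test split, so the means and standard deviations used during training already include the test clips. The effect on this dataset may well be small, but a cleaner pattern is to split first and let a scikit-learn `Pipeline` fit the scaler on the training data only. A minimal sketch, assuming synthetic data from `make_classification` as a stand-in for the 20 voice features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 20 acoustic features (not the real dataset)
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# Split the raw, unscaled features first
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)

# The pipeline fits StandardScaler on X_train only, then trains the SVC
pipe = make_pipeline(StandardScaler(), SVC())
pipe.fit(X_train, y_train)

# At predict time, the training-set mean/std are reused to transform X_test
test_acc = pipe.score(X_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")

# The fitted scaler's means come from the training split alone
scaler = pipe.named_steps["standardscaler"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True
```

The same pipeline can be passed to `GridSearchCV`; the grid keys then become `svc__C` and `svc__gamma`, and the scaler is refitted inside every cross-validation fold as well.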

### Train-Test Split

In [108]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100)
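With `test_size=0.3`, scikit-learn rounds the test size up, so the test set holds ⌈0.3 × 3168⌉ = 951 clips (the `support` total in the reports below) and the training set the remaining 2,217. The classes are exactly balanced (1,584 each), but this split is not stratified, and the test set ends up with 471 female and 480 male clips. Passing `stratify=y` would keep the 50/50 ratio in both parts. A small sketch with hypothetical placeholder arrays of the same size and balance:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 3168
print(math.ceil(0.3 * n_samples))  # 951 clips go to the test set

# Hypothetical placeholder data with the same size and 50/50 class balance
X = np.arange(n_samples).reshape(-1, 1)
y = np.array([1] * 1584 + [0] * 1584)

# stratify=y keeps the male/female ratio (almost) identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100, stratify=y
)
print(len(y_train), len(y_test))  # 2217 951

# Per-class counts in the test set differ by at most one clip
counts = np.bincount(y_test)
print(counts)
```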

---

### SVM

In [109]:
from sklearn.svm import SVC
from sklearn import metrics
from sklearn.metrics import classification_report,confusion_matrix

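svc_model = SVC()  # RBF kernel with the default C=1.0 and gamma='scale'

svc_model.fit(X_train, y_train)  # train on the standardized training split

y_pred = svc_model.predict(X_test)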

print(y_pred)

[1 1 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0
 0 1 1 0 1 1 1 0 0 1 0 0 1 1 0 1 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 1 0 0 1 1
 1 1 0 0 1 1 0 1 1 0 1 1 1 0 0 0 0 1 1 0 1 1 1 1 0 0 0 0 1 1 0 1 1 1 0 0 0
 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 0 0 1 0 0
 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 1 1 1 1 1 1 1 0 1 0 0 0 1 0 0
 1 0 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 0 1 0 1 1 1
 0 0 1 1 1 1 0 1 1 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0
 1 1 1 1 1 0 0 1 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0
 1 1 0 1 1 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 1
 0 1 0 0 0 1 1 0 1 1 1 0 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 1 1 0 0 0 1 1 0 0 0
 0 1 0 0 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 1 1 0 1 1 0 1 0 1 1 0 0 0 1 1 0 1 0
 0 0 0 1 1 1 1 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1
 0 1 1 1 0 0 1 0 0 1 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 1
 1 0 1 0 1 1 0 0 1 0 0 1 ...

### Accuracy Score

In [110]:
print("Accuracy Score: {}".format(metrics.accuracy_score(y_test, y_pred)))

Accuracy Score: 0.9737118822292324


### Confusion Matrix

In [111]:
print("Confusion matrix:\n{}".format(confusion_matrix(y_test, y_pred)))

Confusion matrix:
[[458  13]
 [ 12 468]]
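In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels (0 = female, 1 = male), so the SVM misclassified 13 female clips as male and 12 male clips as female. As a sanity check, the headline metrics can be recomputed by hand from this matrix; the sketch below treats class 1 (male) as the "positive" class, which is a labeling choice made only for illustration:

```python
import numpy as np

# SVM confusion matrix from the output above: rows = true, columns = predicted
cm = np.array([[458, 13],
               [12, 468]])

tn, fp, fn, tp = cm.ravel()      # class 1 (male) treated as "positive"
accuracy = (tn + tp) / cm.sum()  # correct predictions / all 951 test clips
precision_male = tp / (tp + fp)  # of clips predicted male, share truly male
recall_male = tp / (tp + fn)     # of truly male clips, share predicted male
recall_female = tn / (tn + fp)   # of truly female clips, share predicted female

print(f"accuracy       = {accuracy:.4f}")  # 0.9737
print(f"precision male = {precision_male:.4f}")
print(f"recall male    = {recall_male:.4f}")
print(f"recall female  = {recall_female:.4f}")
```

These values agree with the accuracy printed above and with the rounded 0.97 figures in the classification report that follows.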


### Classification Report

In [112]:
print("Classification report:\n{}".format(classification_report(y_test, y_pred)))

Classification report:
              precision    recall  f1-score   support

           0       0.97      0.97      0.97       471
           1       0.97      0.97      0.97       480

    accuracy                           0.97       951
   macro avg       0.97      0.97      0.97       951
weighted avg       0.97      0.97      0.97       951



---

### Grid Search CV

In [113]:
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.1,1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}

grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=2)

grid.fit(X_train,y_train)

Fitting 5 folds for each of 16 candidates, totalling 80 fits
[CV] END .....................................C=0.1, gamma=1; total time=   0.1s
[CV] END .....................................C=0.1, gamma=1; total time=   0.1s
[CV] END .....................................C=0.1, gamma=1; total time=   0.1s
[CV] END .....................................C=0.1, gamma=1; total time=   0.1s
[CV] END .....................................C=0.1, gamma=1; total time=   0.1s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.0s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.0s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.0s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.0s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.0s
[CV] END ..................................C=0.1, gamma=0.01; total time=   0.0s
...

GridSearchCV(estimator=SVC(),
             param_grid={'C': [0.1, 1, 10, 100],
                         'gamma': [1, 0.1, 0.01, 0.001]},
             verbose=2)
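`GridSearchCV` evaluates every combination in `param_grid` (4 values of `C` × 4 values of `gamma` = 16 candidates) with 5-fold cross-validation, hence the 80 fits in the log. With `refit=True`, it then retrains the best combination on the whole training set, and `grid.predict` uses that refitted model. The candidate count can be checked with scikit-learn's `ParameterGrid`:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}
candidates = list(ParameterGrid(param_grid))
n_folds = 5  # number of CV folds reported in the log

print(len(candidates))            # 16
print(len(candidates) * n_folds)  # 80
```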

### Best Parameter

In [114]:
print("Best parameter is: {}".format(grid.best_params_))

Best parameter is: {'C': 1, 'gamma': 0.1}
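Besides `best_params_`, a fitted `GridSearchCV` exposes `best_score_` (the mean cross-validated accuracy of the winning combination on the training folds) and `best_estimator_` (the refitted `SVC`). Comparing `best_score_` with the test accuracy is a quick check that the chosen parameters generalize. A sketch on synthetic data from `make_classification`, since the real `X_train` is not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (not the voice dataset)
X, y = make_classification(n_samples=300, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}
grid = GridSearchCV(SVC(), param_grid, refit=True)
grid.fit(X_train, y_train)

print("best params:", grid.best_params_)
print(f"mean CV accuracy: {grid.best_score_:.3f}")
print(f"test accuracy:    {grid.score(X_test, y_test):.3f}")

# best_estimator_ is an SVC already refitted on all of X_train
best_svc = grid.best_estimator_
print(best_svc.C, best_svc.gamma)
```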


### Predicting

In [115]:
print("Predictions:\n{}".format(grid.predict(X_test)))

Predictions:
[1 1 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0
 0 1 1 0 1 1 1 0 0 1 0 0 1 1 0 1 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 1 0 0 1 1
 1 1 0 0 1 1 0 1 1 0 1 1 1 0 0 0 0 1 1 0 1 1 1 1 0 0 0 0 1 1 0 1 1 1 0 0 0
 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 0 0 1 0 0
 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 0 1 1 1 1 1 1 0 1 0 0 0 1 0 0
 1 0 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 0 1 0 1 1 1
 0 0 1 1 1 1 0 1 1 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0
 1 1 1 1 1 0 0 1 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0
 1 1 0 1 1 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 1
 0 1 0 0 0 1 1 0 1 1 1 0 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 1 1 0 0 0 1 1 0 0 0
 0 1 0 0 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 1 1 0 1 1 0 1 0 1 1 0 0 0 1 1 0 1 0
 0 0 0 1 1 1 1 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1
 0 1 1 1 0 0 1 0 0 1 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 1
 1 0 1 0 1 1 ...

### Accuracy Score

In [116]:
print("Accuracy score: {}".format(metrics.accuracy_score(y_test, grid.predict(X_test))))

Accuracy score: 0.9747634069400631


### Confusion matrix

In [117]:
print("Confusion matrix:\n{}".format(confusion_matrix(y_test, grid.predict(X_test))))

Confusion matrix:
[[459  12]
 [ 12 468]]


### Classification report

In [118]:
print("Classification report:\n{}".format(classification_report(y_test, grid.predict(X_test))))

Classification report:
              precision    recall  f1-score   support

           0       0.97      0.97      0.97       471
           1       0.97      0.97      0.97       480

    accuracy                           0.97       951
   macro avg       0.97      0.97      0.97       951
weighted avg       0.97      0.97      0.97       951



---

### Naive Bayes (GaussianNB)

In [119]:
from sklearn.naive_bayes import GaussianNB

GNB = GaussianNB()  # Gaussian Naive Bayes with the default var_smoothing=1e-9
GNB.fit(X_train, y_train)
GNB_pred = GNB.predict(X_test)

print("Predictions:\n{}".format(GNB_pred))

Predictions:
[1 0 1 0 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0
 0 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 1 0 1 0 0 0 1 1 0 1 0 1 1 1
 1 1 0 0 1 0 0 1 1 0 1 0 1 0 0 0 0 1 1 0 1 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0
 0 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 1 1 1
 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0
 1 0 0 0 0 0 1 0 0 0 1 1 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 0 1 0 0 1 1
 0 0 1 0 0 1 1 1 1 1 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0
 1 1 1 0 1 1 0 1 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 1 0 0 1 0 1 1 0
 1 1 0 1 1 0 0 0 1 1 0 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 1
 0 1 0 0 0 1 1 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0
 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 1 1 1 0 1 1 0 1 0 0 1 0 1 1 1 0 0 1 1 0 1 0
 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1
 0 1 1 1 0 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1
 1 1 1 0 1 1 ...

In [120]:
# Count how many test clips were predicted as each class (0 = female, 1 = male)
print(pd.DataFrame(GNB_pred).groupby(0).agg({0:np.size}))

     0
0     
0  465
1  486
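These prediction counts are the column sums of the GaussianNB confusion matrix printed further down (In [123]): 406 + 59 = 465 clips predicted female and 65 + 421 = 486 predicted male, against 471 and 480 true clips. For a raw 0/1 prediction array, `np.bincount` gives the same kind of tally in one call. A sketch using that confusion matrix plus a hypothetical toy prediction array:

```python
import numpy as np

# GaussianNB confusion matrix from In [123] (rows = true, columns = predicted)
cm_gnb = np.array([[406, 65],
                   [59, 421]])

predicted_counts = cm_gnb.sum(axis=0)  # column sums: predictions per class
true_counts = cm_gnb.sum(axis=1)       # row sums: test clips per true class
print(predicted_counts)  # [465 486]
print(true_counts)       # [471 480]

# Hypothetical toy predictions; minlength=2 guarantees a slot for both classes
toy_pred = np.array([1, 0, 1, 1, 0])
print(np.bincount(toy_pred, minlength=2))  # [2 3]
```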


In [121]:
print("Accuracy Score: {}".format(metrics.accuracy_score(y_test,GNB_pred)))

Accuracy Score: 0.8696109358569927


In [122]:
print("Classification report:\n{}".format(classification_report(y_test, GNB_pred)))

Classification report:
              precision    recall  f1-score   support

           0       0.87      0.86      0.87       471
           1       0.87      0.88      0.87       480

    accuracy                           0.87       951
   macro avg       0.87      0.87      0.87       951
weighted avg       0.87      0.87      0.87       951



In [123]:
print("Confusion matrix:\n{}".format(confusion_matrix(y_test, GNB_pred)))

Confusion matrix:
[[406  65]
 [ 59 421]]


---

### Applying Grid Search CV to Naive Bayes

In [138]:
param_grid2 = {'var_smoothing': np.logspace(0,-9, num=100)} 

grid = GridSearchCV(GaussianNB(), param_grid2, refit=True, verbose=2)  # reuses the name grid, replacing the SVM search object
grid.fit(X_train,y_train)

Fitting 5 folds for each of 100 candidates, totalling 500 fits
[CV] END ..................................var_smoothing=1.0; total time=   0.0s
[CV] END ..................................var_smoothing=1.0; total time=   0.0s
[CV] END ..................................var_smoothing=1.0; total time=   0.0s
[CV] END ..................................var_smoothing=1.0; total time=   0.0s
[CV] END ..................................var_smoothing=1.0; total time=   0.0s
[CV] END ...................var_smoothing=0.8111308307896871; total time=   0.0s
[CV] END ...................var_smoothing=0.8111308307896871; total time=   0.0s
[CV] END ...................var_smoothing=0.8111308307896871; total time=   0.0s
[CV] END ...................var_smoothing=0.8111308307896871; total time=   0.0s
[CV] END ...................var_smoothing=0.8111308307896871; total time=   0.0s
[CV] END ....................var_smoothing=0.657933224657568; total time=   0.0s
...
[CV] END .................var_smoothing=0.008111308307896872; total time=   0.0s
[CV] END .................var_smoothing=0.008111308307896872; total time=   0.0s
[CV] END .................var_smoothing=0.008111308307896872; total time=   0.0s
[CV] END .................var_smoothing=0.008111308307896872; total time=   0.0s
[CV] END .................var_smoothing=0.006579332246575682; total time=   0.0s
[CV] END .................var_smoothing=0.006579332246575682; total time=   0.0s
[CV] END .................var_smoothing=0.006579332246575682; total time=   0.0s
[CV] END .................var_smoothing=0.006579332246575682; total time=   0.0s
[CV] END .................var_smoothing=0.006579332246575682; total time=   0.0s
[CV] END .................var_smoothing=0.005336699231206307; total time=   0.0s
[CV] END .................var_smoothing=0.005336699231206307; total time=   0.0s
[CV] END .................var_smoothing=0.005336699231206307; total time=   0.0s
...
[CV] END ................var_smoothing=4.328761281083062e-05; total time=   0.0s
[CV] END ................var_smoothing=4.328761281083062e-05; total time=   0.0s
[CV] END ................var_smoothing=4.328761281083062e-05; total time=   0.0s
[CV] END ................var_smoothing=4.328761281083062e-05; total time=   0.0s
[CV] END ................var_smoothing=4.328761281083062e-05; total time=   0.0s
[CV] END ................var_smoothing=3.511191734215127e-05; total time=   0.0s
[CV] END ................var_smoothing=3.511191734215127e-05; total time=   0.0s
[CV] END ................var_smoothing=3.511191734215127e-05; total time=   0.0s
[CV] END ................var_smoothing=3.511191734215127e-05; total time=   0.0s
[CV] END ................var_smoothing=3.511191734215127e-05; total time=   0.0s
[CV] END ................var_smoothing=2.848035868435799e-05; total time=   0.0s
[CV] END ................var_smoothing=2.848035868435799e-05; total time=   0.0s
...
[CV] END ................var_smoothing=1.873817422860383e-07; total time=   0.0s
[CV] END ................var_smoothing=1.873817422860383e-07; total time=   0.0s
[CV] END ................var_smoothing=1.519911082952933e-07; total time=   0.0s
[CV] END ................var_smoothing=1.519911082952933e-07; total time=   0.0s
[CV] END ................var_smoothing=1.519911082952933e-07; total time=   0.0s
[CV] END ................var_smoothing=1.519911082952933e-07; total time=   0.0s
[CV] END ................var_smoothing=1.519911082952933e-07; total time=   0.0s
[CV] END ................var_smoothing=1.232846739442066e-07; total time=   0.0s
[CV] END ................var_smoothing=1.232846739442066e-07; total time=   0.0s
[CV] END ................var_smoothing=1.232846739442066e-07; total time=   0.0s
[CV] END ................var_smoothing=1.232846739442066e-07; total time=   0.0s
[CV] END ................var_smoothing=1.232846739442066e-07; total time=   0.0s
...
[CV] END ................................var_smoothing=1e-09; total time=   0.0s
[CV] END ................................var_smoothing=1e-09; total time=   0.0s
[CV] END ................................var_smoothing=1e-09; total time=   0.0s
[CV] END ................................var_smoothing=1e-09; total time=   0.0s


GridSearchCV(estimator=GaussianNB(),
             param_grid={'var_smoothing': array([1.00000000e+00, 8.11130831e-01, 6.57933225e-01, 5.33669923e-01,
       4.32876128e-01, 3.51119173e-01, 2.84803587e-01, 2.31012970e-01,
       1.87381742e-01, 1.51991108e-01, 1.23284674e-01, 1.00000000e-01,
       8.11130831e-02, 6.57933225e-02, 5.33669923e-02, 4.32876128e-02,
       3.51119173e-02, 2.84803587e-02, 2.3101297...
       1.23284674e-07, 1.00000000e-07, 8.11130831e-08, 6.57933225e-08,
       5.33669923e-08, 4.32876128e-08, 3.51119173e-08, 2.84803587e-08,
       2.31012970e-08, 1.87381742e-08, 1.51991108e-08, 1.23284674e-08,
       1.00000000e-08, 8.11130831e-09, 6.57933225e-09, 5.33669923e-09,
       4.32876128e-09, 3.51119173e-09, 2.84803587e-09, 2.31012970e-09,
       1.87381742e-09, 1.51991108e-09, 1.23284674e-09, 1.00000000e-09])},
             verbose=2)
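`var_smoothing` adds a small constant to every per-class feature variance for numerical stability; scikit-learn sets that constant to `var_smoothing` times the largest feature variance in the training data and stores it as `epsilon_`. The grid `np.logspace(0, -9, num=100)` spans 1 down to 1e-9 in 100 log-spaced steps, which is where values such as 0.8111308307896871 in the log come from. Since the features here were standardized, their largest variance should be close to 1, so the selected value of about 2.8e-4 adds very little to each variance, which is consistent with the tuned model behaving almost exactly like the default one. Both points can be checked on synthetic data from `make_classification`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Same candidate grid as param_grid2 above
grid_values = np.logspace(0, -9, num=100)
print(grid_values[0], grid_values[1], grid_values[-1])  # first, second, last

# Synthetic stand-in data (not the voice dataset)
X, y = make_classification(n_samples=200, n_features=20, random_state=2)

smoothing = 1e-2
gnb = GaussianNB(var_smoothing=smoothing).fit(X, y)

# epsilon_ = var_smoothing * (largest per-feature variance of the training data)
expected_eps = smoothing * np.var(X, axis=0).max()
print(np.isclose(gnb.epsilon_, expected_eps))  # True
```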

### Best parameter

In [139]:
best_params = grid.best_params_
print("Best parameter: {}".format(best_params))

Best parameter: {'var_smoothing': 0.0002848035868435802}


In [142]:
grid_predictions = grid.predict(X_test)
print("Predictions:\n{}".format(grid_predictions))

Predictions:
[1 0 1 0 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0
 0 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 1 0 1 0 0 0 1 1 0 1 0 1 1 1
 1 1 0 0 1 0 0 1 1 0 1 0 1 0 0 0 0 1 1 0 1 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0
 0 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 1 1 1
 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0
 1 0 1 0 0 0 1 0 0 0 1 1 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 0 1 0 0 1 1
 0 0 1 0 0 1 1 1 1 1 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0
 1 1 1 0 1 1 0 1 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 1 0 0 1 0 1 1 0
 1 1 0 1 1 0 0 0 1 1 0 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 1
 0 1 0 0 0 1 1 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0
 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 1 1 1 0 1 1 0 1 0 0 1 0 1 1 1 0 0 1 1 0 1 0
 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1
 0 1 1 1 0 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1
 1 1 1 0 1 1 ...

### Accuracy Score

In [143]:
print("Accuracy score: {}".format(metrics.accuracy_score(y_test, grid_predictions)))

Accuracy score: 0.868559411146162


### Classification report

In [144]:
print("Classification report:\n{}".format(classification_report(y_test, grid_predictions)))

Classification report:
              precision    recall  f1-score   support

           0       0.87      0.86      0.87       471
           1       0.86      0.88      0.87       480

    accuracy                           0.87       951
   macro avg       0.87      0.87      0.87       951
weighted avg       0.87      0.87      0.87       951



### Confusion matrix

In [146]:
print("Confusion matrix:\n{}".format(confusion_matrix(y_test, grid_predictions)))

Confusion matrix:
[[405  66]
 [ 59 421]]


## Conclusion

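### SVM (default parameters) accuracy = 97.37%

### SVM with Grid Search CV (C=1, gamma=0.1) accuracy = 97.48%

### Gaussian Naive Bayes (GaussianNB) accuracy = 86.96%

### GaussianNB with Grid Search CV (var_smoothing ≈ 2.85e-4) accuracy = 86.86%

On the 951 held-out clips, the SVM is correct about 97% of the time, roughly 10 percentage points more than GaussianNB, which makes it the better candidate for the first-level male/female filter.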
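All four accuracies follow from the confusion matrices printed above: the correct predictions on the diagonal divided by the 951 test clips. Recomputing them also shows that the tuned and untuned versions of each model differ by a single clip, that is, 1/951 ≈ 0.1 percentage points, so the grid searches barely changed the outcome on this split. A short sketch with the matrices copied from the outputs:

```python
import numpy as np

# Confusion matrices copied from the outputs above (rows = true, cols = predicted)
results = {
    "SVM (default)":        np.array([[458, 13], [12, 468]]),
    "SVM (grid search)":    np.array([[459, 12], [12, 468]]),
    "GaussianNB (default)": np.array([[406, 65], [59, 421]]),
    "GaussianNB (grid)":    np.array([[405, 66], [59, 421]]),
}

# Accuracy = trace (correct predictions) / total number of test clips
accuracies = {name: np.trace(cm) / cm.sum() for name, cm in results.items()}
for name, acc in accuracies.items():
    print(f"{name:22s} {acc:.2%}")

# Tuning moved each model's accuracy by exactly one clip out of 951
svm_gain = accuracies["SVM (grid search)"] - accuracies["SVM (default)"]
print(f"SVM tuning gain: {svm_gain:.5f} (1/951 = {1 / 951:.5f})")
```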

---