# Instruction

In this part of the assignment, you will prepare the data to analyze the "meaningful votes" for the European Union Withdrawal Agreement and carry out a classification task. 

There were three attempts to pass a version of the withdrawal agreement (agreed in late 2018) in the House of Commons, but in all three the government led by Prime Minister Theresa May was defeated. The defeats were due to the large number of rebels among Conservative MPs.

If you are not familiar with the background, you can consult the following sources:

- Aidt, T., Grey, F. & Savu, A. The Meaningful Votes: Voting on Brexit in the British House of Commons. *Public Choice* (2019).
  - https://link.springer.com/article/10.1007/s11127-019-00762-9
  - An academic article analysing the situation
  - Its analysis is similar to what you will do
- Wikipedia:
  - https://en.wikipedia.org/wiki/Parliamentary_votes_on_Brexit



There were three meaningful votes (see the links above), and the results are accessible here:

- Vote1: https://votes.parliament.uk/Votes/Commons/Division/562
- Vote2: https://votes.parliament.uk/Votes/Commons/Division/623
- Vote3: https://votes.parliament.uk/Votes/Commons/Division/664

I compiled the results of the three meaningful votes, along with the [Revoke Article 50 and remain in the EU petition](https://petition.parliament.uk/archived/petitions/241584) (from Assignment 2), in a CSV file.

## Your task

1. Get other datasets and merge them with the voting record data
2. Complete a machine learning task to predict rebels among Conservative MPs


In [1]:
import numpy as np
import pandas as pd


In [2]:
import matplotlib.pyplot as plt
import seaborn as sns

# Get the main data from the GV918 data repository (4 percent)

Get the data hosted on:
https://github.com/University-of-Essex-Dept-of-Government/GV918-UK-politics-data

- The parliamentary votes and the petition outcomes are in `df_meaningful_vote.csv`


In [3]:
!git clone https://github.com/University-of-Essex-Dept-of-Government/GV918-UK-politics-data

fatal: destination path 'GV918-UK-politics-data' already exists and is not an empty directory.


In [4]:
df_meaningful_vote = pd.read_csv("/content/GV918-UK-politics-data/Data/df_meaningful_vote.csv")

In [5]:
df_meaningful_vote.head()

Unnamed: 0,index,MemberId,Name,Party,MemberFrom,vote1,vote2,vote3,ons_code,signature_count_241584
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974


# Other data sources 

In this section, you will get the data from several sources and merge them with the main dataframe. 



## Referendum votes, general election data (3 percent)

In this section you will merge two additional datasets.

1. Election outcomes of 2017 (You can use the code below)
2. Constituency level referendum output (We used this data in the previous class)

Once you have merged them, create a new variable for the number of petition signatures per elector (signatures divided by the electorate).

In [6]:
df_elec = pd.read_csv("http://researchbriefings.files.parliament.uk/documents/CBP-7979/HoC-GE2017-results-by-candidate.csv")
df_const = pd.read_csv("http://researchbriefings.files.parliament.uk/documents/CBP-7979/HoC-GE2017-constituency-results.csv")
df_const = df_const[['ons_id', 'electorate']]

df_merging_elec_const = pd.merge(df_const, df_elec)


In [7]:
# Rename the constituency key so it matches the voting-record data
df_merging_elec_const.rename(columns={'ons_id': 'ons_code'}, inplace=True)
# Attach each MP's voting record to the 2017 candidate rows of the same constituency
df_1_merge = pd.merge(df_merging_elec_const, df_meaningful_vote, on='ons_code')
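
The petition-signatures-per-elector variable requested above is not created in the cells that follow, so here is a minimal sketch of one way to compute it. The two toy rows reuse electorate and signature figures that appear in this notebook's output; the column name `signatures_per_elector` and the frame `df_toy` are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the merged frame (two constituencies only).
# `signatures_per_elector` is a hypothetical column name.
df_toy = pd.DataFrame({
    "Name": ["Nigel Adams", "Bim Afolami"],
    "electorate": [75918, 75916],
    "signature_count_241584": [7303, 16696],
})

# Share of the registered electorate that signed the petition
df_toy["signatures_per_elector"] = (
    df_toy["signature_count_241584"] / df_toy["electorate"]
)
print(df_toy)
```

The same one-line division should apply to the full merged frame once `electorate` and `signature_count_241584` sit in the same row.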

## 3. MPs positions data (3 percent)

The last dataset to merge is MPs' positions in the Brexit referendum. The data come from Aidt et al. (2019).
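
If the join key includes MP names, spelling differences between sources can silently drop rows. The sketch below shows one way to check for that, using pandas' `indicator` and `validate` options on hypothetical miniature tables (the names and votes are invented, not taken from the real data).

```python
import pandas as pd

# Hypothetical miniature tables; names and values are illustrative only
votes = pd.DataFrame({
    "Name": ["Ann Lee", "Bob Ray", "Cy Dunn"],
    "vote3": [1.0, 0.0, 1.0],
})
positions = pd.DataFrame({
    "Name": ["Ann Lee", "Bob Ray"],
    "MP vote for Brexit": ["Leave", "Remain"],
})

# A left join keeps every MP in `votes`; `_merge` records where each row came from,
# and `validate` raises an error if a name is duplicated on either side
merged = pd.merge(votes, positions, on="Name", how="left",
                  indicator=True, validate="one_to_one")

# MPs present in the voting record but missing from the positions data
unmatched = merged.loc[merged["_merge"] == "left_only", "Name"].tolist()
print(unmatched)
```

Inspecting `unmatched` before switching to an inner join makes it clear how many MPs would be lost and why.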

In [8]:
df_mp_positions = pd.read_csv("/content/GV918-UK-politics-data/Data/mp_positions-cleaned.csv")
df_mp_positions.head()

Unnamed: 0,Party,Name,MP vote for Brexit,Constituency
0,Con,Nigel Adams,Leave,Selby and Ainsty
1,Con,Bim Afolami,Remain,Hitchin and Harpenden
2,Con,Stuart Andrew,Leave,Pudsey
3,Con,Edward Argar,Remain,Charnwood
4,Con,Victoria Atkins,Remain,Louth and Horncastle


In [9]:
df_mp_positions.rename(columns={'Party': 'party_abbreviation'}, inplace=True)
# Merge on the columns the two frames have in common
df_main_dataset = pd.merge(df_mp_positions, df_1_merge)

# Rows with missing values are kept here; missing votes are handled before model fitting
df_main_dataset

Unnamed: 0,party_abbreviation,Name,MP vote for Brexit,Constituency,ons_code,electorate,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,firstname,surname,gender,sitting_mp,former_mp,votes,share,change,index,MemberId,Party,MemberFrom,vote1,vote2,vote3,signature_count_241584
0,Con,Nigel Adams,Leave,Selby and Ainsty,E14000917,75918,E12000003,Selby and Ainsty,North Yorkshire,Yorkshire and The Humber,England,County,Conservative,Nigel,Adams,Male,Yes,Yes,32921,0.587078,0.062023,108,4057,Conservative,Selby and Ainsty,1.0,1.0,1.0,7303
1,Con,Bim Afolami,Remain,Hitchin and Harpenden,E14000749,75916,E12000006,Hitchin and Harpenden,Hertfordshire,East,England,County,Conservative,Bim,Afolami,Male,No,No,31189,0.530579,-0.038053,193,4639,Conservative,Hitchin and Harpenden,1.0,1.0,1.0,16696
2,Con,Stuart Andrew,Leave,Pudsey,E14000886,72622,E12000003,Pudsey,West Yorkshire,Yorkshire and The Humber,England,Borough,Conservative,Stuart,Andrew,Male,Yes,Yes,25550,0.473508,0.009373,99,4032,Conservative,Pudsey,1.0,1.0,1.0,10134
3,Con,Edward Argar,Remain,Charnwood,E14000625,78071,E12000004,Charnwood,Leicestershire,East Midlands,England,County,Conservative,Edward,Argar,Male,Yes,Yes,33318,0.603849,0.060729,141,4362,Conservative,Charnwood,1.0,1.0,1.0,6532
4,Con,Victoria Atkins,Remain,Louth and Horncastle,E14000798,79007,E12000004,Louth and Horncastle,Lincolnshire,East Midlands,England,County,Conservative,Victoria,Atkins,Female,Yes,Yes,33733,0.639234,0.127572,146,4399,Conservative,Louth and Horncastle,1.0,1.0,1.0,4293
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
594,DUP,Sammy Wilson,Leave,East Antrim,N06000005,62908,N92000002,East Antrim,Northern Ireland,Northern Ireland,Northern Ireland,County,Democratic Unionist Party,Sammy,Wilson,Male,Yes,Yes,21873,0.573447,0.212131,363,1593,Democratic Unionist Party,East Antrim,0.0,0.0,0.0,5278
595,SNP,Pete Wishart,Remain,Perth and North Perthshire,S14000054,71762,S92000003,Perth and North Perthshire,Scotland,Scotland,Scotland,County,Scottish National Party,Pete,Wishart,Male,Yes,Yes,21804,0.423173,-0.081974,312,1440,Scottish National Party,Perth and North Perthshire,0.0,0.0,0.0,9877
596,Con,Sarah Wollaston,Remain,Totnes,E14001001,68914,E12000009,Totnes,Devon,South West,England,County,Conservative,Sarah,Wollaston,Female,Yes,Yes,26972,0.536543,0.006976,420,4073,Conservative,Totnes,0.0,0.0,0.0,10948
597,Lab,Mohammad Yasin,Remain,Bedford,E14000552,71829,E12000006,Bedford,Bedfordshire,East,England,Borough,Labour,Mohammad,Yasin,Male,No,No,22712,0.468482,0.066451,568,4598,Labour,Bedford,0.0,0.0,0.0,7924


# Machine learning (25 percent)

Using the dataset you have prepared, run the classification problem below:

- Data: Conservative MPs' meaningful votes
- Output: Rebellion in the meaningful votes
  - Rebel = Conservative MP who voted no (if you don't understand the logic, refer to Aidt et al. (2019))
- You can choose the inputs, but you should include at least:
  - Petition signatures per elector
  - MP's position in the referendum
  - Referendum outcome in the constituency
  - Electoral strength, measured by vote share
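
The rebel definition above can be turned into a binary target along these lines. This is a sketch on invented rows, assuming the vote columns are coded 1.0 for a vote in favour, 0.0 for a vote against, and NaN for no recorded vote. Dropping MPs without a recorded vote is an assumption of this sketch, and the column name `rebel` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the merged data
df_toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "party_name": ["Conservative", "Conservative", "Labour", "Conservative"],
    "vote3": [1.0, 0.0, 0.0, np.nan],
})

# Keep Conservative MPs with a recorded vote (assumption: no-votes are excluded)
is_con = df_toy["party_name"] == "Conservative"
con = df_toy[is_con & df_toy["vote3"].notna()].copy()

# rebel = 1 when a Conservative MP voted against the agreement
con["rebel"] = (con["vote3"] == 0.0).astype(int)
print(con[["Name", "vote3", "rebel"]])
```

Coding the rebel as the positive class (1) means that precision, recall and F1 with their default settings describe how well rebels are detected.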


## ML procedures

You need to take the following steps:

1. Train-test split
2. Data wrangling (including standardization)
3. Model fitting
  - Run multiple algorithms. Explain the model choice (i.e. why you think the algorithm is worth trying)
  - Carry out parameter tuning
4. Evaluate/compare models
  - How do the different algorithms perform?
5. Summarise your findings and provide some discussion in writing (300 words or more). The discussion can include:
    - Which algorithm worked best?
    - Which meaningful vote does the model explain best?
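
Steps 1 to 4 above can be sketched end to end as follows. The data are synthetic (generated with `make_classification`, not real votes). The stratified split, the use of a `Pipeline`, and the grid over `C` are illustrative choices rather than requirements of the assignment, and all variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the Conservative MP data
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3,
    weights=[0.9, 0.1], random_state=0,
)

# 1. Train-test split, stratified so both parts keep the class ratio
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=101
)

# 2-3. Scaler and model in one pipeline, so that during cross-validation
# the scaler is fitted on the training folds only
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="f1")
grid.fit(X_tr, y_tr)

# 4. Evaluate the tuned model on the held-out test set
print(grid.best_params_)
print(classification_report(y_te, grid.predict(X_te), zero_division=0))
```

Swapping the `clf` step and its parameter grid is enough to repeat the same procedure for other algorithms.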


In [10]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score, make_scorer
f1 = make_scorer(f1_score, average = 'binary', pos_label = 1)
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
svcmod = SVC(gamma='auto')

# Train Test Split

In [11]:
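# Encode the MP's referendum position: Leave = 1, Remain = 0 (undeclared MPs are also coded 0)
df_def_data_set = df_main_dataset.replace({"Leave":1, "Remain":0, "No declared":0})
# Note: "No declared" has already been replaced by 0 above, so this filter removes no rows
df_def_data_set_DEF = df_def_data_set[df_def_data_set["MP vote for Brexit"] != 'No declared']
df_def_data_set_DEF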

Unnamed: 0,party_abbreviation,Name,MP vote for Brexit,Constituency,ons_code,electorate,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,firstname,surname,gender,sitting_mp,former_mp,votes,share,change,index,MemberId,Party,MemberFrom,vote1,vote2,vote3,signature_count_241584
0,Con,Nigel Adams,1,Selby and Ainsty,E14000917,75918,E12000003,Selby and Ainsty,North Yorkshire,Yorkshire and The Humber,England,County,Conservative,Nigel,Adams,Male,Yes,Yes,32921,0.587078,0.062023,108,4057,Conservative,Selby and Ainsty,1.0,1.0,1.0,7303
1,Con,Bim Afolami,0,Hitchin and Harpenden,E14000749,75916,E12000006,Hitchin and Harpenden,Hertfordshire,East,England,County,Conservative,Bim,Afolami,Male,No,No,31189,0.530579,-0.038053,193,4639,Conservative,Hitchin and Harpenden,1.0,1.0,1.0,16696
2,Con,Stuart Andrew,1,Pudsey,E14000886,72622,E12000003,Pudsey,West Yorkshire,Yorkshire and The Humber,England,Borough,Conservative,Stuart,Andrew,Male,Yes,Yes,25550,0.473508,0.009373,99,4032,Conservative,Pudsey,1.0,1.0,1.0,10134
3,Con,Edward Argar,0,Charnwood,E14000625,78071,E12000004,Charnwood,Leicestershire,East Midlands,England,County,Conservative,Edward,Argar,Male,Yes,Yes,33318,0.603849,0.060729,141,4362,Conservative,Charnwood,1.0,1.0,1.0,6532
4,Con,Victoria Atkins,0,Louth and Horncastle,E14000798,79007,E12000004,Louth and Horncastle,Lincolnshire,East Midlands,England,County,Conservative,Victoria,Atkins,Female,Yes,Yes,33733,0.639234,0.127572,146,4399,Conservative,Louth and Horncastle,1.0,1.0,1.0,4293
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
594,DUP,Sammy Wilson,1,East Antrim,N06000005,62908,N92000002,East Antrim,Northern Ireland,Northern Ireland,Northern Ireland,County,Democratic Unionist Party,Sammy,Wilson,Male,Yes,Yes,21873,0.573447,0.212131,363,1593,Democratic Unionist Party,East Antrim,0.0,0.0,0.0,5278
595,SNP,Pete Wishart,0,Perth and North Perthshire,S14000054,71762,S92000003,Perth and North Perthshire,Scotland,Scotland,Scotland,County,Scottish National Party,Pete,Wishart,Male,Yes,Yes,21804,0.423173,-0.081974,312,1440,Scottish National Party,Perth and North Perthshire,0.0,0.0,0.0,9877
596,Con,Sarah Wollaston,0,Totnes,E14001001,68914,E12000009,Totnes,Devon,South West,England,County,Conservative,Sarah,Wollaston,Female,Yes,Yes,26972,0.536543,0.006976,420,4073,Conservative,Totnes,0.0,0.0,0.0,10948
597,Lab,Mohammad Yasin,0,Bedford,E14000552,71829,E12000006,Bedford,Bedfordshire,East,England,Borough,Labour,Mohammad,Yasin,Male,No,No,22712,0.468482,0.066451,568,4598,Labour,Bedford,0.0,0.0,0.0,7924


In [12]:
X = df_def_data_set.loc[df_main_dataset['party_name'] =='Conservative'][["signature_count_241584", "votes", "share", "electorate", "change", "MP vote for Brexit"]]
Y = df_def_data_set.loc[df_main_dataset['party_name'] =='Conservative'][["vote3"]]

In [13]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=101)

In [14]:
# Note: X_test is not filtered alongside Y_test, so the two stay aligned only if no test vote is missing
Y_test = Y_test.dropna()
# Recode any missing or unexpected value in the training target as 0.0
Y_train = Y_train.where(Y_train.isin([0.0, 1.0]), 0.0)

Standardizing Data

In [15]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

In [16]:
# Fit the scaler on the training set only, then apply the same scaling to the test set
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Model

In [17]:
from sklearn.linear_model import LinearRegression
model_lm = LinearRegression()

Estimating Model

In [18]:
from sklearn.linear_model import LogisticRegression
logitmod = LogisticRegression()

In [19]:
# Pass the target as a 1-D array to avoid the column-vector warning
logitmod.fit(X_train, Y_train.values.ravel())


LogisticRegression()

In [20]:
pred_test = logitmod.predict(X_test)

In [21]:
from sklearn.metrics import classification_report, confusion_matrix

In [22]:
confusion_matrix(Y_test, pred_test)

array([[ 0,  8],
       [ 0, 83]])
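
The confusion matrix above shows every one of the 91 test MPs assigned to class 1 (8 true class-0 MPs, 83 true class-1 MPs). The sketch below rebuilds that pattern and shows why accuracy-style scores can look strong while the minority class is never detected. `zero_division=0` and balanced accuracy are additions for illustration, not part of the original analysis.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score

# Pattern taken from the confusion matrix: 8 MPs in class 0, 83 in class 1,
# and every MP predicted as class 1
y_true = np.array([0] * 8 + [1] * 83)
y_pred = np.ones(91, dtype=int)

# F1 with class 1 as the positive label: 2 * 83 / (2 * 83 + 8)
f1_class1 = f1_score(y_true, y_pred, pos_label=1)
# F1 with class 0 as the positive label: no class-0 prediction at all
f1_class0 = f1_score(y_true, y_pred, pos_label=0, zero_division=0)
# Mean of the per-class recalls: (0 + 1) / 2
bal_acc = balanced_accuracy_score(y_true, y_pred)

print(round(f1_class1, 3), f1_class0, bal_acc)
```

Because the scorer used for tuning in this notebook sets `pos_label=1`, a model that always predicts class 1 already scores highly. A scorer with `pos_label=0`, or balanced accuracy, would reward detecting the MPs who voted against.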

In [23]:
df_logitmode = pd.DataFrame(logitmod.coef_, columns = X.columns)
df_logitmode

Unnamed: 0,signature_count_241584,votes,share,electorate,change,MP vote for Brexit
0,0.221354,-0.102249,0.077499,0.207998,0.451745,-0.719913


# Model Evaluations

In [24]:
from sklearn.linear_model import LinearRegression
model_lm = LinearRegression()
# X_train and X_test were already standardized above, so reuse them rather than scaling a second time
X_train_scaled = X_train
X_test_scaled = X_test
model_lm.fit(X_train_scaled, Y_train)
pred_train = model_lm.predict(X_train_scaled)


In [25]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [26]:
pred_test = model_lm.predict(X_test_scaled)

In [27]:
MAE = mean_absolute_error(Y_test, pred_test)
MSE = mean_squared_error(Y_test, pred_test)
RMSE = np.sqrt(mean_squared_error(Y_test, pred_test))

In [28]:
[MAE, MSE, RMSE]

[0.18548623404046669, 0.08171671595821028, 0.2858613579310962]

# KNN CLASSIFIER

In [29]:
knnmod = KNeighborsClassifier(n_neighbors=2)
knnmod.fit(X_train, Y_train.values.ravel())
pred_knn = knnmod.predict(X_test)


In [30]:
confusion_matrix(Y_test, pred_knn)

array([[ 1,  7],
       [24, 59]])

In [31]:
print(classification_report(Y_test, pred_knn))

              precision    recall  f1-score   support

         0.0       0.04      0.12      0.06         8
         1.0       0.89      0.71      0.79        83

    accuracy                           0.66        91
   macro avg       0.47      0.42      0.43        91
weighted avg       0.82      0.66      0.73        91



## Parameter Tuning KNN

In [32]:
knn2 = KNeighborsClassifier()
ks = list(range(1, 26))
parameter_grid = {'n_neighbors': ks}
knn_cv = GridSearchCV(knn2, parameter_grid, cv=10, scoring=f1)
#fit model to data
knn_cv.fit(X_train, Y_train.values.ravel())

GridSearchCV(cv=10, estimator=KNeighborsClassifier(),
             param_grid={'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                                         13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
                                         23, 24, 25]},
             scoring=make_scorer(f1_score, average=binary, pos_label=1))

In [33]:
print(knn_cv.best_score_)
print(knn_cv.best_params_)

0.9288367729831144
{'n_neighbors': 11}


In [34]:
knn_cv.cv_results_['mean_test_score']

array([0.88136921, 0.84096344, 0.91028541, 0.89290315, 0.91990746,
       0.9133355 , 0.92331046, 0.92007159, 0.92614447, 0.92331046,
       0.92883677, 0.92883677, 0.92883677, 0.92883677, 0.92883677,
       0.92883677, 0.92883677, 0.92883677, 0.92883677, 0.92883677,
       0.92883677, 0.92883677, 0.92883677, 0.92883677, 0.92883677])

In [35]:
pred_knn = knn_cv.predict(X_test)

In [36]:
print(confusion_matrix(Y_test, pred_knn))
print(classification_report(Y_test, pred_knn))

[[ 0  8]
 [ 0 83]]
              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         8
         1.0       0.91      1.00      0.95        83

    accuracy                           0.91        91
   macro avg       0.46      0.50      0.48        91
weighted avg       0.83      0.91      0.87        91





## Random Forest Classifier

In [37]:
model_rf = RandomForestClassifier()
model_rf.fit(X_train, Y_train.values.ravel())
pred_rf = model_rf.predict(X_test)

  


In [38]:
print(confusion_matrix(Y_test, pred_rf))
print(classification_report(Y_test, pred_rf))


[[ 0  8]
 [ 2 81]]
              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         8
         1.0       0.91      0.98      0.94        83

    accuracy                           0.89        91
   macro avg       0.46      0.49      0.47        91
weighted avg       0.83      0.89      0.86        91



### PARAMETER TUNING RANDOM FOREST

In [39]:
model_rf2 = RandomForestClassifier()
# X has six features, so max_features cannot exceed 6; larger values make the fits fail
parameter_grid = {'n_estimators': [32, 64, 100, 128, 200],
                  'max_features': [2, 3, 4, 5]}
model_rf_cv = GridSearchCV(model_rf2, parameter_grid, cv=10, scoring=f1)
#fit model to data
model_rf_cv.fit(X_train, Y_train.values.ravel())

GridSearchCV(cv=10, estimator=RandomForestClassifier(),
             param_grid={'max_features': [2, 3, 4, 5],
                         'n_estimators': [32, 64, 100, 128, 200]},
             scoring=make_scorer(f1_score, average=binary, pos_label=1))

In [40]:
model_rf_cv.best_params_

{'max_features': 2, 'n_estimators': 128}

In [41]:
pred_rf_cv = model_rf_cv.predict(X_test)

In [42]:
print(confusion_matrix(Y_test, pred_rf_cv))
print(classification_report(Y_test, pred_rf_cv))

[[ 0  8]
 [ 3 80]]
              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         8
         1.0       0.91      0.96      0.94        83

    accuracy                           0.88        91
   macro avg       0.45      0.48      0.47        91
weighted avg       0.83      0.88      0.85        91



# ADABOOST CLASSIFIER

In [43]:
AdaBoostClassifier()

AdaBoostClassifier()

In [44]:
model_ab = AdaBoostClassifier()
model_ab.fit(X_train, Y_train.values.ravel())
pred_ab = model_ab.predict(X_test)


In [45]:
print(confusion_matrix(Y_test, pred_ab))
print(classification_report(Y_test, pred_ab))

[[ 1  7]
 [ 6 77]]
              precision    recall  f1-score   support

         0.0       0.14      0.12      0.13         8
         1.0       0.92      0.93      0.92        83

    accuracy                           0.86        91
   macro avg       0.53      0.53      0.53        91
weighted avg       0.85      0.86      0.85        91



## Parameter Tuning ADABOOST

In [46]:
model_ab2 = AdaBoostClassifier()
parameter_grid = {'n_estimators': [30, 50, 100, 200],
                  'learning_rate':[0.01, 0.1, 0.2, 0.5, 1.0]}
model_ab_cv = GridSearchCV(model_ab2, parameter_grid, cv=10, scoring=f1)
#fit model to data
model_ab_cv.fit(X_train, Y_train.values.ravel())

GridSearchCV(cv=10, estimator=AdaBoostClassifier(),
             param_grid={'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0],
                         'n_estimators': [30, 50, 100, 200]},
             scoring=make_scorer(f1_score, average=binary, pos_label=1))

In [47]:
model_ab_cv.best_params_

{'learning_rate': 0.01, 'n_estimators': 30}

In [48]:
pred_ab_cv = model_ab_cv.predict(X_test)


In [49]:
print(confusion_matrix(Y_test, pred_ab_cv))
print(classification_report(Y_test, pred_ab_cv))


[[ 0  8]
 [ 0 83]]
              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         8
         1.0       0.91      1.00      0.95        83

    accuracy                           0.91        91
   macro avg       0.46      0.50      0.48        91
weighted avg       0.83      0.91      0.87        91





# SUPPORT VECTOR CLASSIFIER

In [50]:
from sklearn.svm import SVC
svcmod = SVC(gamma='auto')

from sklearn.model_selection import GridSearchCV

In [51]:
svcmod.fit(X_train, Y_train.values.ravel())


SVC(gamma='auto')

In [52]:
pred_svc = svcmod.predict(X_test)

In [53]:
print(confusion_matrix(Y_test, pred_svc))
print(classification_report(Y_test, pred_svc))

[[ 0  8]
 [ 0 83]]
              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         8
         1.0       0.91      1.00      0.95        83

    accuracy                           0.91        91
   macro avg       0.46      0.50      0.48        91
weighted avg       0.83      0.91      0.87        91





In [54]:
param_grid = {'C':[1,10,100,1000], # cost of misclassification
              'gamma':[1,0.1,0.001,0.0001], # flexibility of the model 
              'kernel':['rbf']}
svc_cv = GridSearchCV(SVC(),param_grid, refit = True, verbose=2)
svc_cv.fit(X_train,Y_train.values.ravel())

Fitting 5 folds for each of 16 candidates, totalling 80 fits
[CV] END ...........................C=1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END ...........................C=1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END ...........................C=1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END ...........................C=1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END ...........................C=1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END .........................C=1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV] END .........................C=1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV] END .........................C=1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV] END .........................C=1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV] END .........................C=1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.0s
[CV] END .......................C=1, gamma=0.001

GridSearchCV(estimator=SVC(),
             param_grid={'C': [1, 10, 100, 1000],
                         'gamma': [1, 0.1, 0.001, 0.0001], 'kernel': ['rbf']},
             verbose=2)

In [55]:
pred_svc = svc_cv.predict(X_test)
print(classification_report(Y_test, pred_svc))
print(confusion_matrix(Y_test, pred_svc))

              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         8
         1.0       0.91      1.00      0.95        83

    accuracy                           0.91        91
   macro avg       0.46      0.50      0.48        91
weighted avg       0.83      0.91      0.87        91

[[ 0  8]
 [ 0 83]]


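
To compare the algorithms tried above on an equal footing, one option is to cross-validate each of them inside the same preprocessing pipeline and collect the scores in a table. The sketch below does this on synthetic data (not the real votes). The choice of balanced accuracy as the metric, the default hyperparameters, and all variable names are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the Conservative MP data
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3,
    weights=[0.9, 0.1], random_state=0,
)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "svc": SVC(gamma="auto"),
}

# Same stratified folds for every model, so the scores are comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=101)
rows = []
for name, estimator in models.items():
    pipe = make_pipeline(StandardScaler(), estimator)
    scores = cross_val_score(pipe, X_demo, y_demo, cv=cv,
                             scoring="balanced_accuracy")
    rows.append({"model": name, "mean": scores.mean(), "std": scores.std()})

comparison = pd.DataFrame(rows).sort_values("mean", ascending=False)
print(comparison)
```

A table like this gives the written discussion a single place to point to when arguing which algorithm worked best.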
