# Instruction

In this part of the assignment, you will prepare the data to analyze the "meaningful votes" for the European Union Withdrawal Agreement and carry out a classification task. 

There were three attempts to pass a version of the withdrawal agreement (agreed in late 2018) in the House of Commons, but all three times the government led by Prime Minister Theresa May failed to pass it. The failures were due to the large number of rebels among Conservative MPs. 

If you are not familiar with the background, you can consult the following sources:

- Aidt, T., Grey, F. & Savu, A. The Meaningful Votes: Voting on Brexit in the British House of Commons. *Public Choice* (2019).
  - https://link.springer.com/article/10.1007/s11127-019-00762-9
  - An academic article that analyzes the situation
  - The analysis is similar to what you will do
- Wikipedia:
  - https://en.wikipedia.org/wiki/Parliamentary_votes_on_Brexit



There are three meaningful votes (see the links above), and the results are accessible here:

- Vote1: https://votes.parliament.uk/Votes/Commons/Division/562
- Vote2: https://votes.parliament.uk/Votes/Commons/Division/623
- Vote3: https://votes.parliament.uk/Votes/Commons/Division/664

I compiled the results of the three meaningful votes, along with the [Revoke Article 50 and remain in the EU petition](https://petition.parliament.uk/archived/petitions/241584) (from Assignment 2), in a CSV file.

## Your task

1. Get other datasets and merge them with the voting record data
2. Complete a machine learning task to predict rebels among Conservative MPs


In [1]:
import numpy as np
import pandas as pd
import json
from urllib.request import urlopen

# Get the main data from the GV918 data repository (5 percent)

Get the data hosted on:
https://github.com/University-of-Essex-Dept-of-Government/GV918-UK-politics-data

- The parliamentary votes as well as the petition outcomes are in `df_meaningful_vote.csv`


In [2]:
!git clone https://github.com/University-of-Essex-Dept-of-Government/GV918-UK-politics-data.git

Cloning into 'GV918-UK-politics-data'...
remote: Enumerating objects: 57, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 57 (delta 0), reused 0 (delta 0), pack-reused 56
Unpacking objects: 100% (57/57), done.


In [5]:
votes = pd.read_csv("/content/GV918-UK-politics-data/Data/df_meaningful_vote.csv")

In [41]:
votes

Unnamed: 0,index,MemberId,Name,Party,MemberFrom,vote1,vote2,vote3,ons_code,signature_count_241584
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974
...,...,...,...,...,...,...,...,...,...,...
633,633,4698,Janet Daby,Labour,Lewisham East,0.0,0.0,0.0,E14000787,13684
634,634,4358,Wendy Morton,Conservative,Aldridge-Brownhills,,1.0,1.0,E14000531,3025
635,635,3928,Nick Smith,Labour,Blaenau Gwent,,0.0,0.0,W07000072,2264
636,636,4491,Vicky Foxcroft,Labour,"Lewisham, Deptford",,0.0,0.0,E14000789,24819


# Other data sources 

In this section, you will get the data from several sources and merge them with the main dataframe. 



## Referendum votes, general election data (5 percent)

In this section, you will merge two additional datasets. 

1. 2017 general election results (you can use the code below)
2. Constituency-level referendum results (we used this data in the previous class)

Once merged, create a new variable: the number of petition signatures divided by the size of the constituency electorate.

In [58]:
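# Candidate-level and constituency-level results of the 2017 general election
df_elec = pd.read_csv("/content/GV918-UK-politics-data/Data/HoC-GE2017-results-by-candidate.csv")
df_const = pd.read_csv("/content/GV918-UK-politics-data/Data/HoC-GE2017-constituency-results.csv")
# Keep only the constituency identifier and the size of the electorate
df_const = df_const[['ons_id', 'electorate']]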

In [59]:
df_votes = pd.merge(votes, df_const, how = "left", left_on = "ons_code", right_on = "ons_id")
df_votes.head()

Unnamed: 0,index,MemberId,Name,Party,MemberFrom,vote1,vote2,vote3,ons_code,signature_count_241584,ons_id,electorate
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559,E14000803,76076
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129,E14000538,82546
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543,E14000631,71654
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920,E14000828,68786
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974,E14000845,75965


In [60]:
df_votes["sig_per_electorate"] = df_votes['signature_count_241584'] / df_votes['electorate']

In [None]:
df_brexit = pd.read_csv("/content/GV918-UK-politics-data/Data/merged_brexit_data.csv")
df_brexit = df_brexit[["ons_id", "leave_pct"]]

In [62]:
# pd.merge defaults to an inner join, so MPs without a match in df_brexit would be dropped
df_votes = pd.merge(df_votes, df_brexit, on = "ons_id")
df_votes

Unnamed: 0,index,MemberId,Name,Party,MemberFrom,vote1,vote2,vote3,ons_code,signature_count_241584,ons_id,electorate,sig_per_electorate,leave_pct
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559,E14000803,76076,0.178230,0.450257
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129,E14000538,82546,0.122707,0.517879
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543,E14000631,71654,0.189005,0.449850
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920,E14000828,68786,0.115140,0.552577
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974,E14000845,75965,0.144461,0.514301
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
633,633,4698,Janet Daby,Labour,Lewisham East,0.0,0.0,0.0,E14000787,13684,E14000787,68124,0.200869,0.353705
634,634,4358,Wendy Morton,Conservative,Aldridge-Brownhills,,1.0,1.0,E14000531,3025,E14000531,60363,0.050113,0.677963
635,635,3928,Nick Smith,Labour,Blaenau Gwent,,0.0,0.0,W07000072,2264,W07000072,51227,0.044195,0.620280
636,636,4491,Vicky Foxcroft,Labour,"Lewisham, Deptford",,0.0,0.0,E14000789,24819,E14000789,78468,0.316295,0.244262
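
After each merge it can be worth confirming that no MP was lost or left unmatched. The following is a minimal sketch on made-up toy data (the `left` and `right` frames and their values are hypothetical, not the course files); it uses the `how`, `indicator`, and `validate` arguments of `pd.merge` to keep every row of the left table and to list the keys that found no partner.

```python
import pandas as pd

# Toy stand-ins for the voting records and the referendum data (all values are made up).
left = pd.DataFrame({"ons_id": ["A1", "A2", "A3"], "Name": ["MP one", "MP two", "MP three"]})
right = pd.DataFrame({"ons_id": ["A1", "A2"], "leave_pct": [0.45, 0.52]})

# how="left" keeps every MP; indicator=True adds a `_merge` column with the match status.
# validate="m:1" raises an error if a constituency appears more than once in `right`.
merged = pd.merge(left, right, on="ons_id", how="left", indicator=True, validate="m:1")

# Keys present in the voting records but missing from the second table.
unmatched = merged.loc[merged["_merge"] == "left_only", "ons_id"].tolist()
print(len(left), len(merged), unmatched)
```

If `unmatched` is non-empty on the real data, those rows will carry missing values for the merged columns and need a decision (drop, fix the key, or impute) before modelling.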


## MPs' positions data (5 percent)

The last dataset to merge contains each MP's position in the Brexit referendum. The data come from Aidt et al. (2019).

In [63]:
df_mp_positions = pd.read_csv("/content/GV918-UK-politics-data/Data/mp_positions-cleaned.csv")
df_mp_positions = df_mp_positions[["Constituency", "MP vote for Brexit"]]

In [None]:
df_votes = pd.merge(df_votes, df_mp_positions, left_on = "MemberFrom", right_on = "Constituency")
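
Merging on constituency names is more fragile than merging on ONS codes, because spellings may differ between files (commas, "&" versus "and", capitalisation). Whether that happens in these particular files is not stated here, so the following is only a sketch under that assumption. The helper `normalise_name`, the toy frames, and the placeholder positions "A" and "B" are hypothetical.

```python
import pandas as pd

def normalise_name(s: pd.Series) -> pd.Series:
    """Lower-case, replace '&' with 'and', drop commas, and collapse whitespace."""
    return (
        s.str.lower()
        .str.replace("&", "and", regex=False)
        .str.replace(",", "", regex=False)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )

# Toy frames; the spelling differences and the position labels are invented for illustration.
mps = pd.DataFrame({"MemberFrom": ["Lewisham, Deptford", "Chesham and Amersham"]})
positions = pd.DataFrame({
    "Constituency": ["Lewisham Deptford", "Chesham & Amersham"],
    "MP vote for Brexit": ["A", "B"],
})

# Build a shared, normalised key on both sides and merge on it.
mps["const_key"] = normalise_name(mps["MemberFrom"])
positions["const_key"] = normalise_name(positions["Constituency"])
merged = mps.merge(positions, on="const_key", how="left")

print(merged["MP vote for Brexit"].isna().sum())  # number of MPs left unmatched
```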


# Machine learning 

Using the dataset you have prepared, carry out the classification task below:

- Data: Conservative MPs' meaningful votes 
- Output: Rebellion in the meaningful votes
  - Rebel = Conservative MP who voted no (if you don't understand the logic, refer to Aidt et al. (2019))
- You can choose the inputs, but you should include at least:
  - Per-electorate signatures for the petition
  - MP's position in the referendum
  - Referendum outcome in the constituency
  - Electoral strength measured by the percentage of votes
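
The target and the electoral-strength input could be built along the following lines. This is a sketch on toy data: the candidate-level column names `party` and `votes` are assumptions (check the actual headers of `HoC-GE2017-results-by-candidate.csv`), and using the winner's vote share as the MP's electoral strength assumes the sitting MP is the 2017 winner, which does not hold for MPs elected at later by-elections.

```python
import pandas as pd

# Toy candidate-level results; column names and numbers are made up.
cand = pd.DataFrame({
    "ons_id": ["A1", "A1", "A2", "A2"],
    "party": ["Con", "Lab", "Con", "Lab"],
    "votes": [30000, 20000, 26000, 24000],
})

# Vote share of each candidate within the constituency.
cand["vote_share"] = cand["votes"] / cand.groupby("ons_id")["votes"].transform("sum")
# Electoral strength: the share obtained by the winning candidate.
winner_share = cand.groupby("ons_id")["vote_share"].max().rename("winner_share").reset_index()

# Toy voting records, coded as in the main dataframe (1.0 = aye, 0.0 = no).
mps = pd.DataFrame({
    "ons_id": ["A1", "A2"],
    "Party": ["Conservative", "Conservative"],
    "vote1": [1.0, 0.0],
})

# Keep Conservative MPs with a recorded vote; a rebel is one who voted no.
con = mps[mps["Party"] == "Conservative"].dropna(subset=["vote1"]).copy()
con["rebel1"] = (con["vote1"] == 0).astype(int)
con = con.merge(winner_share, on="ons_id", how="left")
print(con[["ons_id", "rebel1", "winner_share"]])
```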




## ML procedures (25 percent)

Now carry out the machine learning task. You need to take the following steps:

1. Train-test split
2. Data wrangling (including standardization)
3. Model fitting
   - Run multiple algorithms. Explain your model choice (i.e. why you think each algorithm is worth trying)
   - Carry out parameter tuning
4. Evaluate/compare models
   - How do the different algorithms perform?
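
The steps above can be sketched as follows. This is one possible workflow, not a required solution: it uses synthetic data from `make_classification` in place of the MP dataset, and the two algorithms, the parameter grids, and the F1 metric are illustrative assumptions. Putting the scaler inside a `Pipeline` means it is fitted on the training folds only during cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary data standing in for the rebel / non-rebel outcome.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, weights=[0.7, 0.3], random_state=0)

# 1. Train-test split, stratified so both parts keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 2-3. Preprocessing and model in one pipeline, with a small grid per algorithm.
candidates = {
    "logistic": (
        Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression(max_iter=1000))]),
        {"clf__C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        Pipeline([("clf", RandomForestClassifier(random_state=0))]),
        {"clf__n_estimators": [100], "clf__max_depth": [2, 4, None]},
    ),
}

# 4. Tune by cross-validation on the training set, then score once on the test set.
results = {}
for name, (pipe, grid) in candidates.items():
    search = GridSearchCV(pipe, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_f1": search.best_score_,
        "test_f1": f1_score(y_test, search.predict(X_test)),
    }

for name, res in results.items():
    print(name, res)
```

With a rare outcome such as rebellion, accuracy alone can be misleading, which is why the sketch scores with F1; other metrics (precision, recall, ROC AUC) may suit the question better.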


## Interpretations (10 percent)

Summarise the findings and provide some discussion in writing (300 words or more). The discussion can include: 

- Which algorithm worked the best? Why?
- Which meaningful vote does the model explain the most? Why?
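
One way to approach the second question is to fit the same model specification to each vote's outcome and compare cross-validated scores. The sketch below does this on synthetic data: the features, the three simulated targets `rebel1` to `rebel3`, and the noise levels are all invented, and ROC AUC is just one possible metric. On the real data, the set of MPs with a recorded vote differs across the three votes, so the samples are not identical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic features named after two of the inputs (values are random, not real data).
X = pd.DataFrame({
    "sig_per_electorate": rng.normal(size=n),
    "leave_pct": rng.normal(size=n),
})

# Three simulated outcomes sharing the same signal but with different noise levels.
signal = X["leave_pct"] - X["sig_per_electorate"]
targets = {
    f"rebel{i}": (signal + rng.normal(scale=noise, size=n) > 0).astype(int)
    for i, noise in zip((1, 2, 3), (0.5, 1.0, 2.0))
}

# Same specification for every vote, so differences reflect the outcome, not the model.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc_by_vote = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, y in targets.items()
}
print(auc_by_vote)
```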
