### Scraping data from a single vote
Converting the webpage for a single vote into a pandas dataframe is simple.
After importing pandas, pass a URL to the `pd.read_html` function, which returns a list of dataframes, one per HTML table found on the page.

In [2]:
import pandas as pd
# Converting the webpage for a single vote into a pandas df
dfs = pd.read_html("https://www.ourcommons.ca/Members/en/votes/43/2/17", header=0)
df = dfs[0]
df

HTTPError: HTTP Error 404: Not Found
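
The traceback above shows that the request can fail, for example when a vote page is not available. Here is a minimal sketch of one way to guard the call, assuming the failure surfaces as `urllib.error.HTTPError` (as the traceback suggests) and that getting `None` back is preferable to stopping the notebook. `read_first_table` and its `reader` parameter are hypothetical names introduced for illustration.

```python
from urllib.error import HTTPError

import pandas as pd


def read_first_table(url, reader=pd.read_html):
    """Return the first table found at `url`, or None on an HTTP error.

    `reader` defaults to pd.read_html; it is a parameter only so the
    function can be tried out without network access.
    """
    try:
        tables = reader(url, header=0)
    except HTTPError as err:
        # Report the failed page and let the caller decide what to do
        print(f"Skipping {url}: HTTP {err.code}")
        return None
    return tables[0]
```

A caller can then test the result with `if df is None:` before going any further.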

In [2]:
df.shape

(319, 4)

In [3]:
# To keep things clean, we will rename the column
# headers and then display the first five rows of data
df.columns = ["Member", "Party", "Vote", "Paired"]
df.head()

Unnamed: 0,Member,Party,Vote,Paired
0,Mr. Ziad Aboultaif(Edmonton Manning),Conservative,Nay,
1,Mr. Scott Aitchison(Parry Sound—Muskoka),Conservative,Nay,
2,Mr. Dan Albas(Central Okanagan—Similkameen—Nic...,Conservative,Nay,
3,Mr. Omar Alghabra(Mississauga Centre),Liberal,Nay,
4,Ms. Leona Alleslev(Aurora—Oak Ridges—Richmond ...,Conservative,Nay,


In [4]:
# Check how the party numbers broke down with `value_counts()`
df["Party"].value_counts().to_frame()

Unnamed: 0,Party
Liberal,146
Conservative,115
Bloc Québécois,31
NDP,22
Green Party,3
Independent,2


In [5]:
# How the vote went
df["Vote"].value_counts()

Nay    263
Yea     56
Name: Vote, dtype: int64

## Automating vote tabulation

Build a program that will automate the process of scraping and then tabulating hundreds of votes at a time.

1. Identify the votes we want to analyse.
    * Use `pd.read_html` to scrape the page that lists all the votes on private members' bills from the first session.
    * Pass the data to a new df, `vote_list`.

In [6]:
# Votes on private members' bills from the first session of the 42nd Parliament

dfs_vote_list = pd.read_html('https://www.ourcommons.ca/Members/en/votes?parlSession=42-1&billDocumentTypeId=4',
                             header=0)
vote_list = dfs_vote_list[0]

Now, since we’re going to have to keep track of the number of votes of each type as their records are scraped, I’ll initialize some variables and set them to zero. There will be a party-line and a non-party-line variable for each party so we’ll know how many of each category we’ve seen. I’ll also track the total number of votes we’re covering.

In [7]:
total_votes = 0
partyLine_votes_Conservative = 0
non_partyLine_votes_Conservative = 0
partyLine_votes_Liberal = 0
non_partyLine_votes_Liberal = 0
partyLine_votes_NDP = 0
non_partyLine_votes_NDP = 0
partyLine_votes_Bloc = 0
non_partyLine_votes_Bloc = 0
partyLine_votes_GreenParty = 0
non_partyLine_votes_GreenParty = 0
partyLine_votes_Independent = 0
non_partyLine_votes_Independent = 0
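
The thirteen separate variables above can also be held in a single container. Below is a minimal sketch of that alternative, assuming a `collections.Counter` keyed by `(party, kind)` tuples. The `tallies` name and the `record` helper are hypothetical and are not used by the rest of the notebook.

```python
from collections import Counter

# One counter holds every tally; missing keys read as zero.
tallies = Counter()


def record(party, split):
    """Add one vote to the party's 'split' or 'party_line' tally."""
    kind = "split" if split else "party_line"
    tallies[(party, kind)] += 1


record("Conservative", split=False)
record("Conservative", split=True)
record("Liberal", split=False)

print(tallies[("Conservative", "party_line")])  # 1
print(tallies[("NDP", "split")])                # 0, never recorded
```

Because a `Counter` is mutable, a function can update it in place, which plain integer variables do not allow.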

Once the core program is run, we’ll call a tabulating function once for each of the four parties we are tracking. The function will count the Yeas and Nays within a party and then test for the presence of at least one Yea and one Nay to identify split voting. If there was a split, it will increment the non-party-line variable by one. Otherwise, the party-line variable will be incremented by one.

In [8]:
df[df['Party'].str.contains('Conservative')]

Unnamed: 0,Member,Party,Vote,Paired
0,Mr. Ziad Aboultaif(Edmonton Manning),Conservative,Nay,
1,Mr. Scott Aitchison(Parry Sound—Muskoka),Conservative,Nay,
2,Mr. Dan Albas(Central Okanagan—Similkameen—Nic...,Conservative,Nay,
4,Ms. Leona Alleslev(Aurora—Oak Ridges—Richmond ...,Conservative,Nay,
5,Mr. Dean Allison(Niagara West),Conservative,Nay,
...,...,...,...,...
306,Mr. Len Webber(Calgary Confederation),Conservative,Nay,
309,Mr. John Williamson(New Brunswick Southwest),Conservative,Nay,
311,Mrs. Alice Wong(Richmond Centre),Conservative,Nay,
314,Mr. David Yurdiga(Fort McMurray—Cold Lake),Conservative,Nay,


In [18]:
# def conservative_votes():
#     global partyLine_votes_Conservative
#     global non_partyLine_votes_Conservative
    
#     df_party = df[df['Party'].str.contains('Conservative')]
#     vote_output_yea = df_party['Vote'].str.contains('Yea')
#     total_votes_yea = vote_output_yea.sum()
#     vote_output_nay = df_party['Vote'].str.contains('Nay')
#     total_votes_nay = vote_output_nay.sum()
#     if total_votes_yea > 0 and total_votes_nay > 0:
#         non_partyLine_votes_Conservative += 1
#     else:
#         partyLine_votes_Conservative += 1
            
            
# def liberal_votes():
#     global partyLine_votes_Liberal
#     global non_partyLine_votes_Liberal
    
#     df_party = df[df['Party'].str.contains('Liberal')]
#     vote_output_yea = df_party['Vote'].str.contains('Yea')
#     total_votes_yea = vote_output_yea.sum()
#     vote_output_nay = df_party['Vote'].str.contains('Nay')
#     total_votes_nay = vote_output_nay.sum()
#     if total_votes_yea > 0 and total_votes_nay > 0:
#         non_partyLine_votes_Liberal += 1
#     else:
#         partyLine_votes_Liberal += 1

# def ndp_votes():
#     global partyLine_votes_NDP
#     global non_partyLine_votes_NDP
    
#     df_party = df[df['Party'].str.contains('NDP')]
#     vote_output_yea = df_party['Vote'].str.contains('Yea')
#     total_votes_yea = vote_output_yea.sum()
#     vote_output_nay = df_party['Vote'].str.contains('Nay')
#     total_votes_nay = vote_output_nay.sum()
#     if total_votes_yea > 0 and total_votes_nay > 0:
#         non_partyLine_votes_NDP += 1
#     else:
#         partyLine_votes_NDP += 1
# def bloc_votes():
#     global partyLine_votes_Bloc
#     global non_partyLine_votes_Bloc
    
#     df_party = df[df['Party'].str.contains('Bloc Québécois')]
#     vote_output_yea = df_party['Vote'].str.contains('Yea')
#     total_votes_yea = vote_output_yea.sum()
#     vote_output_nay = df_party['Vote'].str.contains('Nay')
#     total_votes_nay = vote_output_nay.sum()
#     if total_votes_yea > 0 and total_votes_nay > 0:
#         non_partyLine_votes_Bloc += 1
#     else:
#         partyLine_votes_Bloc += 1
            
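def party_votes(party_name, partyLine_vote, non_partyLine_votes):
    # The two counter arguments are kept so existing calls still work, but
    # ints are passed by value: incrementing them here would be lost.
    # Instead, update the matching module-level counters by name.
    suffix = party_name.split()[0]  # e.g. "Bloc Québécois" -> "Bloc"

    df_party = df[df['Party'].str.contains(party_name, na=False)]
    total_votes_yea = df_party['Vote'].str.contains('Yea', na=False).sum()
    total_votes_nay = df_party['Vote'].str.contains('Nay', na=False).sum()
    # At least one Yea and one Nay means the party split its vote
    if total_votes_yea > 0 and total_votes_nay > 0:
        globals()["non_partyLine_votes_" + suffix] += 1
    else:
        globals()["partyLine_votes_" + suffix] += 1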
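
To see the split-vote test in isolation, here is a small sketch on made-up data. The `classify_party_vote` helper and the toy dataframe are assumptions for illustration. Unlike the function above, it returns a label instead of updating counters, compares values exactly rather than by substring, and returns `None` when no member of the party voted.

```python
import pandas as pd


def classify_party_vote(votes, party):
    """Label one party's behaviour in a single vote."""
    party_votes_cast = votes.loc[votes["Party"] == party, "Vote"]
    has_yea = (party_votes_cast == "Yea").any()
    has_nay = (party_votes_cast == "Nay").any()
    if not (has_yea or has_nay):
        return None            # nobody from this party voted
    return "split" if has_yea and has_nay else "party_line"


# Toy data, not real voting records
toy = pd.DataFrame({
    "Party": ["Liberal", "Liberal", "NDP", "NDP"],
    "Vote":  ["Yea",     "Nay",     "Nay", "Nay"],
})

print(classify_party_vote(toy, "Liberal"))       # split
print(classify_party_vote(toy, "NDP"))           # party_line
print(classify_party_vote(toy, "Green Party"))   # None
```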

    

Our next job will be to build a list of the URLs we’ll be scraping.
We’re only interested in collecting the vote number from each row so we can append it to the base URL (identified as https://www.ourcommons.ca/Members/en/votes/42/1/ in the code).

In [19]:
dfs_vote_list = pd.read_html("https://www.ourcommons.ca/Members/en/votes?parlSession=42-1&billDocumentTypeId=3", header=0)
vote_list = dfs_vote_list[0]
vote_list

Unnamed: 0,Vote Number,Vote Respecting,Subject,Votes (Yeas / Nays / Paired),Vote Result,Date
0,No. 1379,House Government Bill,Motion respecting Senate amendments to Bill C-...,161 / 58 / 2,Agreed To,"June 19, 2019"
1,No. 1378,House Government Bill,Motion for closure,149 / 67 / 2,Agreed To,"June 19, 2019"
2,No. 1374,House Government Bill,Motion respecting Senate amendments to Bill C-...,190 / 86 / 2,Agreed To,"June 18, 2019"
3,No. 1373,House Government Bill,Motion for closure,157 / 113 / 2,Agreed To,"June 18, 2019"
4,No. 1372,House Government Bill,"3rd reading and adoption of Bill C-102, An Act...",167 / 123 / 2,Agreed To,"June 18, 2019"
...,...,...,...,...,...,...
434,No. 17,House Government Bill,"2nd reading of Bill C-4, An Act to amend the C...",219 / 90 / 0,Agreed To,"March 7, 2016"
435,No. 16,House Government Bill,"2nd reading of Bill C-4, An Act to amend the C...",91 / 220 / 0,Negatived,"March 7, 2016"
436,No. 7,House Government Bill,"3rd reading and adoption of Bill C-3, An Act f...",227 / 96 / 0,Agreed To,"December 10, 2015"
437,No. 6,House Government Bill,"Concurrence at report stage of Bill C-3, An Ac...",227 / 96 / 0,Agreed To,"December 10, 2015"


In [20]:
vote_list.columns = ["Number", "Type", "Subject", "Votes", "Result", "Date"]
vote_list["Number"] = vote_list["Number"].str.extract(r"(\d+)", expand=False)
base_url = "https://www.ourcommons.ca/Members/en/votes/42/1/"
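
The `str.extract` call keeps only the digits from labels such as `No. 1379`. Here is a quick sketch on a hand-made series; the sample values are copied from the table above, and the variable names are illustrative.

```python
import pandas as pd

labels = pd.Series(["No. 1379", "No. 17", "No. 7"])

# The capture group (\d+) matches the first run of digits in each label;
# expand=False returns a Series rather than a one-column dataframe.
numbers = labels.str.extract(r"(\d+)", expand=False)

print(numbers.tolist())  # ['1379', '17', '7']
```

The results are strings rather than integers, which is what allows them to be concatenated onto the base URL.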



`url_data` is the name of a new dataframe I create to contain our set of production-ready URLs. I then run a for-loop that will iterate through each number from the Number column and add it to the end of the base URL. Each finished URL will then be appended to the `url_data` dataframe.



In [21]:
# Create a `Vote` column in the data frame
url_data = pd.DataFrame(columns=["Vote"])

Vote = []

for num in vote_list["Number"]:
    newURL = base_url + num
    Vote.append(newURL)
url_data["Vote"] = Vote

url_data

Unnamed: 0,Vote
0,https://www.ourcommons.ca/Members/en/votes/42/...
1,https://www.ourcommons.ca/Members/en/votes/42/...
2,https://www.ourcommons.ca/Members/en/votes/42/...
3,https://www.ourcommons.ca/Members/en/votes/42/...
4,https://www.ourcommons.ca/Members/en/votes/42/...
...,...
434,https://www.ourcommons.ca/Members/en/votes/42/...
435,https://www.ourcommons.ca/Members/en/votes/42/...
436,https://www.ourcommons.ca/Members/en/votes/42/1/7
437,https://www.ourcommons.ca/Members/en/votes/42/1/6
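
The same column can be built without an explicit loop, because adding a string to a pandas Series of strings concatenates element by element. The sketch below relies on that, using a short stand-in for `vote_list["Number"]`.

```python
import pandas as pd

base_url = "https://www.ourcommons.ca/Members/en/votes/42/1/"
numbers = pd.Series(["1379", "7", "6"])  # stand-in for vote_list["Number"]

# String + Series of strings concatenates element-wise
url_data = pd.DataFrame({"Vote": base_url + numbers})

print(url_data["Vote"].iloc[-1])
# https://www.ourcommons.ca/Members/en/votes/42/1/6
```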


I’ll want to save those URLs to a permanent file so they’ll be available if I want to run similar queries later. Just be careful not to run this command more than once, as it will add a second (or third) identical set of URLs to the file, doubling (or tripling) the number of requests your program will make.

In [22]:
url_data.to_csv(r"url-text-42-1-privatemembers",
               header=None, index=None,
               sep=" ",
               mode="a")
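
One way to make the append-mode caveat less risky is to drop repeated URLs when the file is read back. Here is a minimal sketch using only the standard library. `read_unique_urls` is a hypothetical helper, and the demonstration writes to a temporary file with placeholder URLs rather than the real URL list.

```python
import os
import tempfile


def read_unique_urls(path):
    """Return the non-empty lines of `path`, without repeats, in file order."""
    with open(path, "r") as handle:
        lines = [line.strip() for line in handle]
    # dict keys keep insertion order, so this removes repeats stably
    return list(dict.fromkeys(line for line in lines if line))


# Simulate a file that was appended to twice
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "urls.txt")
    with open(demo_path, "a") as handle:
        for _ in range(2):
            handle.write("https://example.org/votes/7\n")
            handle.write("https://example.org/votes/6\n")
    urls = read_unique_urls(demo_path)

print(urls)
# ['https://example.org/votes/7', 'https://example.org/votes/6']
```

Alternatively, leaving `mode` at its default of `"w"` in `to_csv` overwrites the file on each run, so repeats never accumulate.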

This brings us at last to the program’s core. We’ll use another for-loop to iterate through each URL in the file, read and convert the content to a dataframe, rename a couple of column headers to make them easier to work with, and then test for unanimous votes (i.e., bills which generated no Nay votes at all).

Why bother? Because unanimous votes, often motions to honour individuals or institutions, will teach us nothing about normal voting patterns and, on the contrary, could skew our results. If a vote was unanimous, `continue` will tell Python to skip it and move on to the next URL.

For all other votes (`else`), the code will call the tabulating function once for each of the four parties and then increment the `total_votes` variable by one.
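
The unanimity test described above can be tried on its own. This is a sketch on toy data, taking "unanimous" to mean that no Nay votes were recorded, as in the text; `has_nay_votes` is an illustrative name.

```python
import pandas as pd


def has_nay_votes(votes):
    """True if at least one recorded vote is a Nay."""
    # na=False treats missing entries as non-matches
    return bool(votes["Vote"].str.contains("Nay", na=False).any())


# Toy data, not real voting records
unanimous = pd.DataFrame({"Vote": ["Yea", "Yea", None]})
contested = pd.DataFrame({"Vote": ["Yea", "Nay", "Yea"]})

for label, votes in [("unanimous", unanimous), ("contested", contested)]:
    if not has_nay_votes(votes):
        print(label, "-> skipped")
        continue
    print(label, "-> tabulated")
```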



In [23]:
df

Unnamed: 0,Member,Party,Vote,Paired
0,Mr. Ziad Aboultaif(Edmonton Manning),Conservative,Nay,
1,Mr. Scott Aitchison(Parry Sound—Muskoka),Conservative,Nay,
2,Mr. Dan Albas(Central Okanagan—Similkameen—Nic...,Conservative,Nay,
3,Mr. Omar Alghabra(Mississauga Centre),Liberal,Nay,
4,Ms. Leona Alleslev(Aurora—Oak Ridges—Richmond ...,Conservative,Nay,
...,...,...,...,...
314,Mr. David Yurdiga(Fort McMurray—Cold Lake),Conservative,Nay,
315,Mrs. Salma Zahid(Scarborough Centre),Liberal,Nay,
316,Ms. Lenore Zann(Cumberland—Colchester),Liberal,Nay,
317,Mr. Bob Zimmer(Prince George—Peace River—North...,Conservative,Nay,


In [None]:
# Read the saved URLs, one per line
with open("url-text-42-1-privatemembers", "r") as URLS:
    for url in URLS:
        url = url.strip()  # drop the trailing newline
        if not url:
            continue
        # Read next HTML page in set. The result is stored in `df`
        # because party_votes() reads the module-level `df`.
        dfs = pd.read_html(url, header=0)
        df = dfs[0]
        df = df.rename(columns={"Member Voted": "Vote",
                                "Political Affiliation": "Party"})

        # Ignore unanimous votes (no Nay votes at all):
        total_votes_nay = df["Vote"].str.contains("Nay", na=False).sum()
        if total_votes_nay == 0:
            continue

        # Call the function to tabulate votes for each party
        party_votes("Conservative", partyLine_votes_Conservative, non_partyLine_votes_Conservative)
        party_votes("Liberal", partyLine_votes_Liberal, non_partyLine_votes_Liberal)
        party_votes("NDP", partyLine_votes_NDP, non_partyLine_votes_NDP)
        party_votes("Bloc Québécois", partyLine_votes_Bloc, non_partyLine_votes_Bloc)
        total_votes += 1
        

In [None]:
print("We counted", total_votes, "votes in total.")
print("Conservative members voted the party line",
      partyLine_votes_Conservative,
      "times, and split their vote",
      non_partyLine_votes_Conservative, "times.")
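
Raw counts are easier to compare across parties as a share of the votes tabulated. Below is a small sketch of that arithmetic. `party_line_rate` is a hypothetical helper, and the numbers in the example call are made up, not results from the scrape.

```python
def party_line_rate(party_line, split):
    """Fraction of tabulated votes in which the party did not split."""
    total = party_line + split
    if total == 0:
        return None  # nothing tabulated, so no rate to report
    return party_line / total


# Made-up counts for illustration only
rate = party_line_rate(party_line=3, split=1)
print(f"{rate:.0%}")  # 75%
```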

# Python for Data Science Project

This repository contains work from exploring the IBM course titled _Python for Data Science Project_.

## Learning Objectives

In this repo I:
- Demonstrate my skills for working with Python and Data
- Create a dashboard that shows key performance indicators from a specific data set