![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=SocialStudies/HansardAnalysis/hansard-analysis.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Open Parliament

The [Hansard](https://en.wikipedia.org/wiki/Hansard) is a transcript of debates in the Canadian Parliament. It is available from the official [Parliament of Canada website](https://www.parl.ca) as well as other sources such as [Open Parliament](https://openparliament.ca) and [LiPaD: The Linked Parliamentary Data Project](https://www.lipad.ca).

Later in this notebook, we'll also use data from [openparliament.ca](https://openparliament.ca/), which provides up-to-date information about parliamentary activity.

We have downloaded the 2020 files from LiPaD, and can load them by selecting the following code cell and clicking the `▶Run` button.

In [1]:
# Python libraries

from collections import Counter
import re
import plotly.express as px
import json
import numpy as np
import pandas as pd

# Import requests, installing it first if this Jupyter environment does not have it

try:
    import requests
except ImportError:
    %pip install requests
    import requests
    
# Import BeautifulSoup, installing it first if this Jupyter environment does not have it
    
try:
    import bs4
    from bs4 import BeautifulSoup
except ImportError:
    %pip install beautifulsoup4
    import bs4
    from bs4 import BeautifulSoup

# Import WordCloud, installing it first if this Jupyter environment does not have it

try:
    from wordcloud import WordCloud
except ImportError:
    %pip install wordcloud
    from wordcloud import WordCloud
    
hansard = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/SocialStudies/HansardAnalysis/proceedings2020.csv')
print(f'There are {hansard.shape[0]} rows and {hansard.shape[1]} columns of data:')
hansard.columns

There are 4945 rows and 15 columns of data:


Index(['basepk', 'hid', 'speechdate', 'pid', 'opid', 'speakeroldname',
       'speakerposition', 'maintopic', 'subtopic', 'subsubtopic', 'speechtext',
       'speakerparty', 'speakerriding', 'speakername', 'speakerurl'],
      dtype='object')

## Who Spoke?

Let's have a look at who spoke during these debates.

In [2]:
speakers = hansard.drop_duplicates(subset=['speakername'])[['speakername','speakerparty','speakerriding','speakerurl']]
speakers = speakers.dropna().reset_index().drop(columns=['index'])
print('There were',speakers.shape[0],'speakers from the',speakers['speakerparty'].unique(),'parties.')

There were 312 speakers from the ['Liberal' 'New Democratic Party' 'Green Party' 'Conservative'
 'Bloc Québécois' 'Independent'] parties.


We can compare that to the list of Members of Parliament from the [43rd Parliament](https://en.wikipedia.org/wiki/43rd_Canadian_Parliament) that started on December 5, 2019.

In [3]:
members = pd.read_csv('https://www.ourcommons.ca/members/en/search/csv?parliament=43')
print('There were',members.shape[0],'Members from the',members['Political Affiliation'].unique(),'parties.')

There were 340 Members from the ['Conservative' 'Liberal' 'NDP' 'Bloc Québécois' 'Green Party'
 'Independent'] parties.


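The House of Commons has 338 seats and we found 312 unique speakers, which suggests that 26 Members are not recorded as speaking during 2020. (The members list has 340 rows rather than 338, probably because it also includes Members who joined or left partway through the Parliament.) Let's see if we can identify who they were.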

In [4]:
members['Name'] = members['First Name'] +' '+ members['Last Name']
silent = []
for member in members['Name']:
    if member not in speakers['speakername'].values:
        silent.append(member)
print('That is',len(silent),'Members not recorded as speaking in 2020:')
print(silent)

That is 36 Members not recorded as speaking in 2020:
['Jaime Battiste', 'Terry Beech', 'Bob Benzen', 'Stéphane Bergeron', 'Sylvie Bérubé', 'Yves-François Blanchet', 'Élisabeth Brière', 'Jim Carr', 'Sean Casey', 'Serge Cormier', 'Neil Ellis', 'Marie-Hélène Gaudreau', 'Marci Ien', 'Robert Kitchen', 'Andréanne Larouche', 'Sébastien Lemire', 'Dave MacKenzie', 'Simon Marcil', 'Ken McDonald', 'John McKay', 'Alexandra Mendès', 'Bill Morneau', 'Robert Morrissey', 'Joyce Murray', 'Robert Oliphant', "Seamus O'Regan", 'Louis Plamondon', 'Marcus Powlowski', 'Michelle Rempel Garner', 'Raj Saini', 'Judy A. Sgro', 'Terry Sheehan', 'Doug Shipley', 'Rachael Thomas', 'Chris Warkentin', 'Alice Wong']


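Of course 36 is not equal to 26. Part of the gap comes from names being recorded differently in the two sources: for example, the members list has 'Michelle Rempel Garner', while the Hansard records the same Member as 'Michelle Rempel'. We will leave it to you to compare the list `silent` to the list from `speakers['speakername'].unique()` if you are interested.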
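
One way to start that comparison is to look for names in `silent` that closely resemble a name in the Hansard. The sketch below is only illustrative: it uses Python's standard `difflib` module on two short sample lists, and the names `silent_sample`, `hansard_names_sample`, `likely_same_person`, and the 0.75 cutoff are our own assumptions. Any match it suggests still needs to be checked by hand.

```python
from difflib import get_close_matches

# Small samples for illustration. In the notebook, the full lists would be
# `silent` and `speakers['speakername'].unique()`.
silent_sample = ['Michelle Rempel Garner', 'Bill Morneau']
hansard_names_sample = ['Michelle Rempel', 'Patty Hajdu', 'Anthony Rota']

def likely_same_person(name, candidates, cutoff=0.75):
    """Return the most similar candidate name, or None if nothing is close enough."""
    matches = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Map each "silent" name to its closest Hansard spelling, if any
possible_matches = {name: likely_same_person(name, hansard_names_sample)
                    for name in silent_sample}
print(possible_matches)
```

A name that maps to `None` has no close spelling in the sample, so that Member may genuinely be missing from the Hansard records for 2020.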

## Who Spoke Most?

We can check how many times each speaker is recorded in the Hansard.

In [5]:
hansard_speakers = pd.DataFrame(hansard['speakername'].value_counts())
hansard_speakers

Unnamed: 0,speakername
Anthony Rota,261
Kevin Lamoureux,234
Some hon. members,117
Carol Hughes,104
Bruce Stanton,103
...,...
Mark Holland,1
Karina Gould,1
Dan Vandal,1
Mumilaaq Qaqqaq,1


Let's also calculate the length (number of characters) of each of those speeches, and sort them by who said the most.

In [6]:
hansard['speechlength'] = hansard['speechtext'].str.len()
hansard.groupby('speakername').sum(numeric_only=True).sort_values('speechlength', ascending=False)

Unnamed: 0_level_0,speechlength
speakername,Unnamed: 1_level_1
Kevin Lamoureux,229100.0
Garnett Genuis,151029.0
Arif Virani,100618.0
Gérard Deltell,99181.0
Paul Manly,98090.0
...,...
Chandra Arya,269.0
The Speaker,114.0
An hon. member,78.0
The Assistant Deputy Speaker (Mrs. Alexandra Mendès),42.0
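
Character counts are only one way to measure how much someone said. As a rough sketch in plain Python (the sample speeches and the names 'Member A' and 'Member B' are invented for illustration), we could total both characters and words per speaker and compare the two measures:

```python
from collections import defaultdict

# Invented (speaker, speech text) pairs standing in for rows of the Hansard
sample_speeches = [
    ('Member A', 'Mr. Speaker, I rise today to present a petition.'),
    ('Member B', 'Order.'),
    ('Member A', 'I thank the member for the question.'),
]

# Accumulate character and word totals for each speaker
totals = defaultdict(lambda: {'characters': 0, 'words': 0})
for speaker, text in sample_speeches:
    totals[speaker]['characters'] += len(text)
    totals[speaker]['words'] += len(text.split())

# Sort speakers from most to fewest characters, as in the table above
ranked = sorted(totals.items(), key=lambda item: item[1]['characters'], reverse=True)
for speaker, measures in ranked:
    print(speaker, measures)
```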


We can also visualize the number of times *any* MP spoke with a histogram:

In [35]:
px.histogram(hansard_speakers, x='speakername', title='Histogram of Number of Speeches by Member', labels={'speakername':'Number of Speeches'}).update(layout_showlegend=False)

### Thinking Proportionally
The results above show who spoke the most, but it would be reasonable to expect the parties with the most members to have the longest or most frequent speeches. In the next step, we'll look at the composition of the 43rd Parliament, and normalize how often each party spoke by the number of seats it holds:

In [8]:
seats = pd.DataFrame(list(zip(['Liberal', 'Conservative', 'New Democratic Party', 'Bloc Québécois', 'Green Party', 'Independent'],[157, 121, 32, 24, 3, 1])), columns=['Party', 'Seats']).set_index('Party')
px.bar(seats, x=seats.index, y='Seats', title='Number of Seats in 43rd Parliament, by Party', color= seats.index, color_discrete_map={'Liberal': 'red', 'Conservative': 'blue', 'New Democratic Party': 'orange', "Bloc Québécois": 'lightblue', 'Green Party': 'green', 'Independent': 'lightseagreen'}).update(layout_showlegend=False)

In [9]:
freq_norm = hansard['speakerparty'].value_counts().div(seats['Seats'], axis=0)
freq_norm = pd.DataFrame({'party':freq_norm.index, 'frequency':freq_norm.values})
px.bar(freq_norm, x='party', y='frequency', title='Hansard Speaker Frequency by Party (Normalized by Number of Seats)', color='party', color_discrete_map={'Liberal': 'red', 'Conservative': 'blue', 'New Democratic Party': 'orange', "Bloc Québécois": 'lightblue', 'Green Party': 'green', 'Independent': 'lightseagreen'}).update(layout_showlegend=False)
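
To see why normalizing matters, here is a small worked example in plain Python. The seat counts are the ones used above, but the speech counts are hypothetical numbers chosen for easy arithmetic, not the real 2020 totals.

```python
# Seat counts from the `seats` DataFrame above
seats = {'Liberal': 157, 'Conservative': 121, 'New Democratic Party': 32,
         'Bloc Québécois': 24, 'Green Party': 3, 'Independent': 1}

# Hypothetical speech counts, for illustration only
speech_counts = {'Liberal': 1570, 'Conservative': 1815, 'Green Party': 90}

# Speeches per seat: divide each party's count by its number of seats
speeches_per_seat = {party: count / seats[party]
                     for party, count in speech_counts.items()}
print(speeches_per_seat)
```

With these made-up numbers, the party with the fewest speeches overall has the most speeches per seat, which is exactly the kind of pattern the raw counts would hide.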

### Questions:

1. Can you think of any reasons why certain people in Parliament might speak more than others?
2. How might the number of times someone speaks in Parliament relate to their influence or effectiveness as a representative?
3. Why do you think certain parties speak more than others? Vice-versa, why do you think certain parties speak less than others?
4. What ethical considerations should be taken into account when analyzing and interpreting data on parliamentary speeches?

### Topics of Importance
We can also look at which topics are addressed the most and the least, and at which members of Parliament speak about a particular topic.

In [10]:
hansard_topics = pd.DataFrame(hansard.groupby('subtopic')['subtopic'].aggregate('count').reset_index(name='count'))
hansard_topics = hansard_topics.sort_values(by=['count']).reset_index()
display(hansard_topics)

Unnamed: 0,index,subtopic,count
0,0,100th Anniversary of the Sainte-Thérèse Women'...,1
1,283,National Security and Intelligence Committee o...,1
2,282,National Internment Education Day,1
3,281,National Freshwater Strategy Act,1
4,279,National Football League,1
...,...,...,...
465,94,Citizenship Act,208
466,76,Canada-United States-Mexico Agreement Implemen...,286
467,361,Resumption of Debate on Address in Reply,325
468,114,Criminal Code,383


We can take a look at the top 10 *most spoken* topics, alongside the 10 *least spoken* topics in Parliament.

In [11]:
top_10_fig = px.bar(hansard_topics.tail(10), title="Top 10 Topics spoken in Parliament", y="subtopic", x="count", labels={'subtopic': "Topic"}, orientation='h', color='count')
top_10_fig.update_layout(showlegend=False).update_layout(yaxis_title=None).show()

bot_10_fig = px.bar(hansard_topics.head(10), title="Bottom 10 Topics spoken in Parliament", y="subtopic", x="count", labels={'subtopic': "Topic"}, orientation='h')
bot_10_fig.update_layout(showlegend=False).update_layout(yaxis_title=None).show()

Looking at both bar charts, are there topics that you think are *not* being addressed enough? Conversely, are there topics that you think are being addressed too often?

We can also look at which *members of Parliament* speak on topics that you find *important*. In the cells below, try different `subtopic` names and see which members of Parliament talk about your particular topic!

In [12]:
list_of_topics = hansard_topics['subtopic'].unique()
print(list_of_topics)

["100th Anniversary of the Sainte-Thérèse Women's Organization"
 'National Security and Intelligence Committee of Parliamentarians'
 'National Internment Education Day' 'National Freshwater Strategy Act'
 'National Football League' 'National Defence Act'
 'National Caregiver Week' 'Nagorno-Karabakh Region' 'Métis Week'
 'Murray Drudge' 'Movember'
 'Montreal Island North Health and Social Services Centre'
 'Model United Nations' 'Moby Bukhari' 'Mississauga Food Bank'
 'Missing and Murdered Indigenous Women and Girls' 'Mining Industry'
 'Micah Messent' 'Meteorological Service of Canada'
 'Mental Illness Awareness Week' 'National Volunteer Week'
 'Navigable Waters Act' 'Navratri' 'Nelly Dubourg' 'Order of Canada'
 'Opioids' 'Opening of Session' 'Ontario By-elections' 'Old Age Security'
 'Okill Stuart' 'Okanagan Nation Alliance'
 'Office of the Correctional Investigator' 'Oaths of Office'
 'Member for Yukon' 'OCHL Volunteer' 'Nowruz' 'Nobel Prize Winner'
 'No. 2 Construction Battalion' 'Ni

The cell above holds all the subtopics spoken in Parliament. The various subtopics can be inputted in the `topic` variable in the code cell below.

In [13]:
# Change the value of topic to any subtopic you're interested in
# For example, instead of 'Health' you can input 'Petitions'
topic = 'Health'

members_by_topic = hansard.loc[hansard['subtopic'] == topic]
members_by_topic = members_by_topic.drop_duplicates(subset=['speakername'])
members_by_topic = members_by_topic.drop(columns=['basepk', 'hid', 'speechdate', 'pid', 'opid', 'speakerposition', 'subsubtopic', 'speechtext', 'speakeroldname', 'speakerurl', 'speakerriding']).reset_index(drop=True)
if members_by_topic.empty:
    print('No matches. Did you make sure to capitalize and space correctly?')
else:
    display(members_by_topic)

Unnamed: 0,maintopic,subtopic,speakerparty,speakername,speechlength
0,Oral Questions,Health,Liberal,Patty Hajdu,527.0
1,Oral Questions,Health,Conservative,Michelle Rempel,423.0
2,Oral Questions,Health,Liberal,Anthony Rota,89.0
3,Oral Questions,Health,Liberal,Pablo Rodriguez,375.0
4,Oral Questions,Health,Conservative,Rosemarie Falk,594.0
5,Oral Questions,Health,Liberal,Deb Schulte,755.0
6,Oral Questions,Health,Conservative,Todd Doherty,597.0
7,Oral Questions,Health,Bloc Québécois,Alain Therrien,682.0
8,Oral Questions,Health,Bloc Québécois,Marilène Gill,596.0
9,Oral Questions,Health,Bloc Québécois,Gabriel Ste-Marie,460.0
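
If an exact match feels too strict, a case-insensitive partial match can help you find the right spelling of a subtopic before using it as `topic`. This is a minimal sketch using a handful of subtopic names that appear in this notebook. `find_subtopics` is a hypothetical helper of our own, not part of the notebook's code.

```python
# A few subtopic names taken from the outputs in this notebook
sample_subtopics = ['Health', 'Opioids', 'Old Age Security',
                    'Mental Illness Awareness Week', 'Criminal Code']

def find_subtopics(query, subtopics):
    """Return the subtopics that contain the query, ignoring case and outer spaces."""
    needle = query.strip().casefold()
    return [name for name in subtopics if needle in name.casefold()]

print(find_subtopics(' health ', sample_subtopics))
print(find_subtopics('week', sample_subtopics))
```

In the notebook itself, the full list of names would come from `list_of_topics`.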


<div class="alert alert-block alert-info">
<b>Optional:</b> The code cell below randomly replaces the name of each party in the dataframe with a letter, allowing you to guess each party based on its top 10 most spoken topics! Another cell after the plots reveals which party is which letter, but if you want to use the party names in the plots you can comment out the cell below (place a <tt>#</tt> at the beginning of each line).
</div>

In [14]:
# Obscure party names
# Note, if you don't want to randomize the parties, add a # on each of the lines below
# A quick short-cut to do this is to select everything below and press (Ctrl + /)

import random 
letters = ['Party A', 'Party B', 'Party C', 'Party D', 'Party E', 'Party F']
parties = hansard['speakerparty'].dropna().unique().tolist()
random.shuffle(letters)
random.shuffle(parties)

# Pair each shuffled party with a shuffled letter
mapping = dict(zip(parties, letters))
        
hansard['speakerparty'] = hansard['speakerparty'].replace(mapping)

We can look at which topics each party spoke about most by using the `speakerparty` column.

In [15]:
colors = ['red', 'orange', 'green', 'blue', 'lightblue', 'lightseagreen']
for index, party in enumerate(hansard['speakerparty'].dropna().unique()):
    party_topics = pd.DataFrame(hansard.groupby(['subtopic', 'speakerparty'])['subtopic'].aggregate('count').reset_index(name='count'))
    party_topics = party_topics.sort_values(by=['count'])
    party_topics = party_topics[party_topics['speakerparty'] == party]
    fig = px.bar(party_topics.tail(10), title=f"{party}'s Top 10 Topics", y='subtopic', x='count', orientation='h')
    fig.update_traces(marker_color=colors[index]).update_layout(yaxis_title=None, showlegend=False, height=500).show()


Uncomment the below cell (remove the `#`) to reveal the party names:

In [16]:
# mapping

### Questions:

1. Which topics stand out between the different parties of Parliament?
2. What is the significance of studying and analyzing the topics discussed by members of Parliament in Canadian politics?
3. How might the frequency of discussions on specific topics reflect the priorities or concerns of the government and the society?
4. What challenges might arise when analyzing and interpreting data on the topics discussed in Parliament?

### Investigating Canadian Parliament's 'Eh'-Pi

An API, which stands for **Application Programming Interface**, is like a bridge that allows different software applications to communicate and interact with each other. 

Imagine you're at a restaurant. The _menu_ acts as an API because it provides a simplified way for you to interact with the kitchen. Instead of going into the kitchen directly and asking the chef how to cook your dish, you simply order off the menu. The kitchen staff then uses the instructions provided on the menu to prepare and serve your meal.

The API we'll be using is from [openparliament.ca](https://openparliament.ca).

Let's obtain information from _openparliament_ by making a request to a specific web address. 
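
A web address for an API request usually has two parts: the endpoint (which resource we want) and the query parameters (how we want it). The sketch below builds the address used in the next cell from those two parts with Python's standard `urllib.parse` module. It does not contact the server, and the variable names are our own.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# The endpoint: the "votes" resource of the openparliament API
base_url = 'http://api.openparliament.ca/votes/'

# The query parameters: ask for JSON, and for up to 100 results
params = {'format': 'json', 'limit': 100}

# Join the two parts into the full request address
request_url = f'{base_url}?{urlencode(params)}'
print(request_url)

# Going the other way: split an address back into its parameters
print(parse_qs(urlsplit(request_url).query))
```

The `requests` library can do this joining for us as well: `requests.get(base_url, params=params)` sends the same request.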

In [17]:
r = requests.get('http://api.openparliament.ca/votes/?format=json&limit=100')
data = r.json()

df = pd.DataFrame(data['objects'])
display(df)

Unnamed: 0,result,url,session,bill_url,nay_total,date,yea_total,paired_total,number,description
0,Passed,/votes/44-1/366/,44-1,/bills/44-1/C-47/,146,2023-06-08,177,2,366,{'fr': '3e lecture et adoption du projet de lo...
1,Passed,/votes/44-1/365/,44-1,/bills/44-1/C-47/,145,2023-06-07,178,2,365,{'fr': 'Adoption à l’étape du rapport du proje...
2,Failed,/votes/44-1/364/,44-1,/bills/44-1/C-47/,207,2023-06-07,113,2,364,"{'fr': 'Projet de loi C-47, Loi portant exécut..."
3,Failed,/votes/44-1/363/,44-1,/bills/44-1/C-47/,208,2023-06-07,112,2,363,"{'fr': 'Projet de loi C-47, Loi portant exécut..."
4,Failed,/votes/44-1/362/,44-1,/bills/44-1/C-47/,208,2023-06-07,114,2,362,"{'fr': 'Projet de loi C-47, Loi portant exécut..."
...,...,...,...,...,...,...,...,...,...,...
95,Passed,/votes/44-1/271/,44-1,/bills/44-1/S-224/,0,2023-03-22,322,4,271,"{'fr': '2e lecture du projet de loi S-224, Loi..."
96,Failed,/votes/44-1/270/,44-1,/bills/44-1/C-289/,171,2023-03-22,149,4,270,"{'fr': '2e lecture du projet de loi C-289, Loi..."
97,Passed,/votes/44-1/269/,44-1,/bills/44-1/S-209/,114,2023-03-22,208,4,269,"{'fr': '2e lecture du projet de loi S-209, Loi..."
98,Passed,/votes/44-1/268/,44-1,,0,2023-03-21,325,4,268,{'fr': 'Quatrième rapport du Comité permanent ...


Here we have information on the 100 most recent votes held in Parliament. However, some of the data we obtained isn't yet in the format we want. We want our dataframe to be _clean_ so that we can use it effectively. Data cleaning refers to the process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataframe. This could take the form of removing missing values, standardizing formats, or resolving inconsistencies.

In our first step of data cleaning, let's separate the `description` column into two different columns, `english_desc` and `french_desc`.

In [18]:
df['english_desc'] = df['description'].apply(lambda x: x['en'])
df['french_desc'] = df['description'].apply(lambda x: x['fr'])
df = df.drop(columns=['description'])
display(df)

Unnamed: 0,result,url,session,bill_url,nay_total,date,yea_total,paired_total,number,english_desc,french_desc
0,Passed,/votes/44-1/366/,44-1,/bills/44-1/C-47/,146,2023-06-08,177,2,366,"3rd reading and adoption of Bill C-47, An Act ...","3e lecture et adoption du projet de loi C-47, ..."
1,Passed,/votes/44-1/365/,44-1,/bills/44-1/C-47/,145,2023-06-07,178,2,365,"Concurrence at report stage of Bill C-47, An A...",Adoption à l’étape du rapport du projet de loi...
2,Failed,/votes/44-1/364/,44-1,/bills/44-1/C-47/,207,2023-06-07,113,2,364,"Bill C-47, An Act to implement certain provisi...","Projet de loi C-47, Loi portant exécution de c..."
3,Failed,/votes/44-1/363/,44-1,/bills/44-1/C-47/,208,2023-06-07,112,2,363,"Bill C-47, An Act to implement certain provisi...","Projet de loi C-47, Loi portant exécution de c..."
4,Failed,/votes/44-1/362/,44-1,/bills/44-1/C-47/,208,2023-06-07,114,2,362,"Bill C-47, An Act to implement certain provisi...","Projet de loi C-47, Loi portant exécution de c..."
...,...,...,...,...,...,...,...,...,...,...,...
95,Passed,/votes/44-1/271/,44-1,/bills/44-1/S-224/,0,2023-03-22,322,4,271,"2nd reading of Bill S-224, An Act to amend the...","2e lecture du projet de loi S-224, Loi modifia..."
96,Failed,/votes/44-1/270/,44-1,/bills/44-1/C-289/,171,2023-03-22,149,4,270,"2nd reading of Bill C-289, An Act to amend the...","2e lecture du projet de loi C-289, Loi modifia..."
97,Passed,/votes/44-1/269/,44-1,/bills/44-1/S-209/,114,2023-03-22,208,4,269,"2nd reading of Bill S-209, An Act respecting P...","2e lecture du projet de loi S-209, Loi institu..."
98,Passed,/votes/44-1/268/,44-1,,0,2023-03-21,325,4,268,Fourth report of the Standing Committee on Int...,Quatrième rapport du Comité permanent du comme...
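
The `lambda x: x['en']` approach above works as long as every description has both an `en` and an `fr` entry. If a record were missing one of them, `x['en']` would raise a `KeyError`. The plain-Python sketch below shows a more forgiving version using `dict.get`. The second record is hypothetical, invented to show the missing-language case.

```python
# Two vote records shaped like the API output; the second one is invented
sample_votes = [
    {'number': 366,
     'description': {'en': '3rd reading and adoption of Bill C-47',
                     'fr': '3e lecture et adoption du projet de loi C-47'}},
    {'number': 1,
     'description': {'en': 'English-only description (hypothetical)'}},
]

cleaned = []
for vote in sample_votes:
    # Fall back to an empty dict if the description itself is missing
    description = vote.get('description') or {}
    cleaned.append({
        'number': vote['number'],
        'english_desc': description.get('en'),  # None if there is no English text
        'french_desc': description.get('fr'),   # None if there is no French text
    })

for row in cleaned:
    print(row)
```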


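Next, let's remove any votes that have no `bill_url` (that is, a value of *None*), and then extract each bill's name from its `bill_url`. Note that `dropna()` removes rows with a missing value in any column.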

In [19]:
temp_fig = df.dropna().reset_index(drop=True)

bill_names = [re.search(f"/bills/{session_name}/(.*)/", bill_url).group(1)
              for bill_url, session_name in zip(temp_fig['bill_url'], temp_fig['session'])]

temp_fig['bill_name'] = bill_names
display(temp_fig)

Unnamed: 0,result,url,session,bill_url,nay_total,date,yea_total,paired_total,number,english_desc,french_desc,bill_name
0,Passed,/votes/44-1/366/,44-1,/bills/44-1/C-47/,146,2023-06-08,177,2,366,"3rd reading and adoption of Bill C-47, An Act ...","3e lecture et adoption du projet de loi C-47, ...",C-47
1,Passed,/votes/44-1/365/,44-1,/bills/44-1/C-47/,145,2023-06-07,178,2,365,"Concurrence at report stage of Bill C-47, An A...",Adoption à l’étape du rapport du projet de loi...,C-47
2,Failed,/votes/44-1/364/,44-1,/bills/44-1/C-47/,207,2023-06-07,113,2,364,"Bill C-47, An Act to implement certain provisi...","Projet de loi C-47, Loi portant exécution de c...",C-47
3,Failed,/votes/44-1/363/,44-1,/bills/44-1/C-47/,208,2023-06-07,112,2,363,"Bill C-47, An Act to implement certain provisi...","Projet de loi C-47, Loi portant exécution de c...",C-47
4,Failed,/votes/44-1/362/,44-1,/bills/44-1/C-47/,208,2023-06-07,114,2,362,"Bill C-47, An Act to implement certain provisi...","Projet de loi C-47, Loi portant exécution de c...",C-47
...,...,...,...,...,...,...,...,...,...,...,...,...
59,Failed,/votes/44-1/274/,44-1,/bills/44-1/C-283/,177,2023-03-22,146,2,274,"2nd reading of Bill C-283, An Act to amend the...","2e lecture du projet de loi C-283, Loi modifia...",C-283
60,Passed,/votes/44-1/273/,44-1,/bills/44-1/C-241/,152,2023-03-22,172,2,273,"3rd reading and adoption of Bill C-241, An Act...","3e lecture et adoption du projet de loi C-241,...",C-241
61,Passed,/votes/44-1/271/,44-1,/bills/44-1/S-224/,0,2023-03-22,322,4,271,"2nd reading of Bill S-224, An Act to amend the...","2e lecture du projet de loi S-224, Loi modifia...",S-224
62,Failed,/votes/44-1/270/,44-1,/bills/44-1/C-289/,171,2023-03-22,149,4,270,"2nd reading of Bill C-289, An Act to amend the...","2e lecture du projet de loi C-289, Loi modifia...",C-289
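
The regular expression does the real work in the cell above, so it is worth trying on its own. Here is a small standalone sketch. The helper `bill_name_from_url` and the named groups are our own additions, and unlike the cell above it returns `None` instead of raising an error when a value does not look like a bill URL.

```python
import re

# /bills/<session>/<bill name>/ where neither part contains a slash
BILL_URL_PATTERN = re.compile(r'^/bills/(?P<session>[^/]+)/(?P<bill>[^/]+)/$')

def bill_name_from_url(bill_url):
    """Return the bill name from a bill URL, or None if it cannot be found."""
    if not bill_url:
        return None
    match = BILL_URL_PATTERN.match(bill_url)
    return match.group('bill') if match else None

print(bill_name_from_url('/bills/44-1/C-47/'))
print(bill_name_from_url('/bills/44-1/S-224/'))
print(bill_name_from_url(None))
```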


Perfect! Now we have _clean_ data in the correct format in which we can use it properly. 

We can find the total percentage of bills in Parliament that have either **passed** or **failed** alongside the individual bills.  

In [20]:
# Count the votes for each result; naming the columns explicitly keeps this
# working across pandas versions
res = temp_fig['result'].value_counts().rename_axis('result').reset_index(name='count')
px.pie(res, values='count', names='result', title="Percentage of Bills that have Passed/Failed").show()
px.bar(temp_fig, x='bill_name', y='number', color='result', hover_data=['yea_total', 'nay_total'], height=400, title="Bills that have Passed/Failed").show()

Looking at the figures above, is the percentage of bills that pass/fail surprising? Think about the government that has the majority of seats and the bills that are frequently being passed. Is there a correlation between these factors?
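
The percentages in a pie chart like this are simple to compute by hand. As a small sketch, the code below uses only the results from the first five rows of the table shown earlier (two `Passed`, then three `Failed`), so its numbers describe that sample and not the full dataset.

```python
from collections import Counter

# Results from the first five rows of the votes table above
sample_results = ['Passed', 'Passed', 'Failed', 'Failed', 'Failed']

# Count each result, then convert the counts to percentages of the total
counts = Counter(sample_results)
total = sum(counts.values())
percentages = {result: 100 * n / total for result, n in counts.items()}
print(percentages)
```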

Let's take a deeper look at bills that have passed/failed multiple times. This is usually the result of bills having multiple readings/being at different stages, thus being altered at each step to suit the needs of every party in Parliament.

In [21]:
# Change name_of_bill to take a look at the different readings/stages of bills in Parliament
# In order to look at different bills, change "C-21". For example, you can input "C-11" in place of "C-21"
name_of_bill = "C-21" 

party_names = ['Green Party of Canada', "Liberal Party of Canada", "Bloc Québécois", "New Democratic Party", "Conservative Party of Canada"]
df_with_bill = temp_fig.loc[temp_fig['bill_name'] == name_of_bill]

if len(df_with_bill) == 0:
    print("No results, use the plots above to find a bill to investigate.")

for index, row in df_with_bill.iterrows():
    r = requests.get(f"http://api.openparliament.ca{row['url']}?format=json")
    data = r.json()
    vote_info = pd.DataFrame(data['party_votes'])
    # Overwrite the nested party records with readable names
    # (this assumes the API lists the parties in the same order as party_names)
    vote_info['party'] = party_names
    
    voter_percentage = vote_info['vote'].value_counts(normalize=True)
    vote_info = vote_info.style.set_caption(row['english_desc'])
    display(vote_info)
    print("Percentage of parties who voted yes/no:\n", voter_percentage.to_string(),'\n')

Unnamed: 0,vote,party,disagreement
0,Yes,Green Party of Canada,0.0
1,Yes,Liberal Party of Canada,0.013423
2,Yes,Bloc Québécois,0.0
3,Yes,New Democratic Party,0.0
4,No,Conservative Party of Canada,0.0


Percentage of parties who voted yes/no:
 Yes    0.8
No     0.2 



Unnamed: 0,vote,party,disagreement
0,No,Green Party of Canada,0.0
1,No,Liberal Party of Canada,0.0
2,No,Bloc Québécois,0.0
3,No,New Democratic Party,0.0
4,Yes,Conservative Party of Canada,0.0


Percentage of parties who voted yes/no:
 No     0.8
Yes    0.2 



Unnamed: 0,vote,party,disagreement
0,Yes,Green Party of Canada,0.0
1,Yes,Liberal Party of Canada,0.0
2,Yes,Bloc Québécois,0.0
3,Yes,New Democratic Party,0.0
4,No,Conservative Party of Canada,0.0


Percentage of parties who voted yes/no:
 Yes    0.8
No     0.2 



Unnamed: 0,vote,party,disagreement
0,Yes,Green Party of Canada,0.0
1,Yes,Liberal Party of Canada,0.0
2,Yes,Bloc Québécois,0.0
3,Yes,New Democratic Party,0.0
4,Yes,Conservative Party of Canada,0.0


Percentage of parties who voted yes/no:
 Yes    1.0 



Unnamed: 0,vote,party,disagreement
0,Yes,Green Party of Canada,0.0
1,Yes,Liberal Party of Canada,0.0
2,Yes,Bloc Québécois,0.0
3,Yes,New Democratic Party,0.0
4,No,Conservative Party of Canada,0.0


Percentage of parties who voted yes/no:
 Yes    0.8
No     0.2 



Unnamed: 0,vote,party,disagreement
0,No,Green Party of Canada,0.0
1,No,Liberal Party of Canada,0.0
2,No,Bloc Québécois,0.0
3,No,New Democratic Party,0.0
4,Yes,Conservative Party of Canada,0.0


Percentage of parties who voted yes/no:
 No     0.8
Yes    0.2 



We can take a deeper dive and see how different members of Parliament vote on certain bills. 

In [22]:
pd.set_option('display.max_rows', None)
display(temp_fig[['url', 'english_desc']].head(None))

Unnamed: 0,url,english_desc
0,/votes/44-1/366/,"3rd reading and adoption of Bill C-47, An Act ..."
1,/votes/44-1/365/,"Concurrence at report stage of Bill C-47, An A..."
2,/votes/44-1/364/,"Bill C-47, An Act to implement certain provisi..."
3,/votes/44-1/363/,"Bill C-47, An Act to implement certain provisi..."
4,/votes/44-1/362/,"Bill C-47, An Act to implement certain provisi..."
5,/votes/44-1/361/,"Bill C-47, An Act to implement certain provisi..."
6,/votes/44-1/360/,"Bill C-47, An Act to implement certain provisi..."
7,/votes/44-1/359/,"Bill C-47, An Act to implement certain provisi..."
8,/votes/44-1/358/,"Bill C-47, An Act to implement certain provisi..."
9,/votes/44-1/357/,"Bill C-47, An Act to implement certain provisi..."


Listed above are the `url` values of the votes and their corresponding descriptions. You can use this list to find a particular vote to explore in the cell below.

In [23]:
# Change bill_to_explore to take a look at the different bills members of Parliament voted on.
# In order to look at different bills, change "/votes/44-1/333/". For example, you can input "/votes/44-1/279/" in place of "/votes/44-1/333/"
bill_to_explore = '/votes/44-1/333/'

r = requests.get(f"http://api.openparliament.ca/votes/ballots/?format=json&vote={bill_to_explore}")
data = r.json()
politician_vote_info = pd.DataFrame(data['objects'])

politician_urls = politician_vote_info['politician_url']
membership_urls = [f"http://api.openparliament.ca{url}?format=json" for url in politician_urls]

responses = [requests.get(url) for url in membership_urls]
data = [response.json() for response in responses]

parties = [d['memberships'][0]['party']['name']['en'] for d in data]
provinces = [d['memberships'][0]['riding']['province'] for d in data]

politician_vote_info['party'] = np.array(parties)
politician_vote_info['province_info'] = np.array(provinces)

politician_vote_info['name'] = politician_vote_info['politician_url'].str.extract("/politicians/(.*)/", expand=False)
display(politician_vote_info)


Unnamed: 0,politician_membership_url,ballot,politician_url,vote_url,party,province_info,name
0,/politicians/memberships/4603/,No,/politicians/brendan-hanley/,/votes/44-1/333/,Liberal Party of Canada,YT,brendan-hanley
1,/politicians/memberships/4367/,No,/politicians/michael-mcleod/,/votes/44-1/333/,Liberal Party of Canada,NT,michael-mcleod
2,/politicians/memberships/4652/,Didn't vote,/politicians/alain-rayes/,/votes/44-1/333/,Independent,QC,alain-rayes
3,/politicians/memberships/1211/,Didn't vote,/politicians/justin-trudeau/,/votes/44-1/333/,Liberal Party of Canada,QC,justin-trudeau
4,/politicians/memberships/4404/,Didn't vote,/politicians/anthony-rota/,/votes/44-1/333/,Liberal Party of Canada,ON,anthony-rota
5,/politicians/memberships/4156/,Didn't vote,/politicians/yvonne-jones/,/votes/44-1/333/,Liberal Party of Canada,NL,yvonne-jones
6,/politicians/memberships/4385/,Didn't vote,/politicians/michelle-rempel/,/votes/44-1/333/,Conservative Party of Canada,AB,michelle-rempel
7,/politicians/memberships/1067/,Didn't vote,/politicians/randy-hoback/,/votes/44-1/333/,Conservative Party of Canada,SK,randy-hoback
8,/politicians/memberships/4185/,Didn't vote,/politicians/francois-philippe-champagne/,/votes/44-1/333/,Liberal Party of Canada,QC,francois-philippe-champagne
9,/politicians/memberships/1028/,Didn't vote,/politicians/kirsty-duncan/,/votes/44-1/333/,Liberal Party of Canada,ON,kirsty-duncan


The description for */votes/44-1/333/* states:
> 3rd reading and adoption of Bill C-21, An Act to amend certain Acts and to make certain consequential amendments (firearms)

Knowing what the bill is about, we can start to look into why individual members of Parliament might have voted the way they did.

We can also look at how parties voted on certain bills by grouping together members of Parliament who share the same party.

In [24]:
party_counts = politician_vote_info.groupby(['party', 'ballot'])['name'].agg('count').reset_index()
party_counts.rename(columns={"name": "count"}, inplace=True)
display(party_counts)

Unnamed: 0,party,ballot,count
0,Conservative Party of Canada,Didn't vote,3
1,Conservative Party of Canada,No,3
2,Independent,Didn't vote,1
3,Liberal Party of Canada,Didn't vote,5
4,Liberal Party of Canada,No,2
5,Liberal Party of Canada,Yes,5
6,New Democratic Party,Yes,1


In [25]:
party_fig = px.bar(party_counts, x='party', y='count', color='ballot', title='Ballot votes of each Party').show()

### Questions:

1. What factors do you think influence how political parties decide to vote on specific bills?
2. How can data science techniques be used to analyze and predict how certain parties may vote on a particular bill?
3. Why is it important for political parties to have a consistent voting pattern on bills in Parliament?
4. In what ways can the study of party voting patterns help citizens understand the political landscape and hold their representatives accountable?

### Making Soup
To get the Hansard data, we will scrape the website https://openparliament.ca directly (previously we used openparliament's API). To do this, we use the `requests` module to request a web page, which returns the HTML markup for that page. To make sense of the markup, we will use BeautifulSoup (the `bs4` module), which parses the markup and lets us pull out the specific data we need.

In [26]:
# The date of the debate is hard-coded below
# If you want to change the date of the debate, replace '2023/03/31' with another valid date using the format YYYY/MM/DD
dateOfDebate = '2023/03/31'
# The resulting url is https://openparliament.ca/debates/2023/03/31/?singlepage=1
# '?singlepage=1' puts all of the speakers on a single page
page = requests.get('https://openparliament.ca/debates/' + dateOfDebate + '/?singlepage=1').text
# Parse the HTML markup into a BeautifulSoup object
data = BeautifulSoup(page, 'html.parser')

Now let's start scraping! The cell below extracts information from the debate page you requested. It looks for specific elements on the page and collects the name of each speaker, their political party, their affiliation (the riding shown with their name), and what they said during the debate.
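
To get a feel for what "looking for certain elements" means, here is a tiny offline sketch. It does not use BeautifulSoup. Instead it swaps in Python's built-in `html.parser` so that it runs without any installation, and it only collects speaker names. The HTML snippet is made up: it reuses the class names that the cell below searches for, but the real page's structure may differ.

```python
from html.parser import HTMLParser

class SpeakerNameParser(HTMLParser):
    """Collect the text inside every <span class="pol_name"> element."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'span' and 'pol_name' in classes:
            self.in_name = True
            self.names.append('')

    def handle_endtag(self, tag):
        if tag == 'span':
            self.in_name = False

    def handle_data(self, data):
        if self.in_name:
            self.names[-1] += data

# Made-up markup that mimics one statement block on a debate page
sample_html = '''
<div class="row statement_browser statement">
  <span class="pol_name">Ziad Aboultaif</span>
  <span class="pol_affil">Edmonton Manning, AB</span>
  <p class="partytag">Conservative</p>
  <div class="text">Madam Speaker, ...</div>
</div>
'''

parser = SpeakerNameParser()
parser.feed(sample_html)
speaker_names = [name.strip() for name in parser.names]
print(speaker_names)
```

BeautifulSoup does the same kind of search in a single call, such as `find('span', class_='pol_name')`, which is why the cell below is much shorter than it would be with the built-in parser.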

In [27]:
# i is item in list
# row statement_browser statement - is the block of code that allows me to find the Name; Party and What They Said
debateDict = {'Name': [],
              'Party' : [],
              'Affiliation' : [],
              'Said' : []
             }
for i in data.findAll("div", class_="row statement_browser statement"):
    #getting the name of the speaker
    try:
        name = i.find('span', class_='pol_name').text
        name = str(name)
    except AttributeError:
        continue
    
    print(name)
    #Try is to find if they have spoken already, if they have, we do not find their party or affiliation
    try:
        index = debateDict['Name'].index(name)
        indexFound = True
    # If it throws an error, then they are not pre-existing in the dict
    except ValueError:
        indexFound = False
        # Finding the affiliation
        try:
            affiliation = i.find('span', class_="pol_affil").text
            affiliation = str(affiliation)
            affiliation = affiliation.replace("						", "")
        except AttributeError:
            affiliation = 'N/A'
        print(affiliation)
        #
        # For speakers without party tags
        try:
            party = i.find('p', class_='partytag').text
            party = str(party)

        except AttributeError:
            party = 'N/A'
        print(party)
    
    said = i.find('div', class_='text').text
    print(said)
    print('\n')
    
    if indexFound:
        debateDict["Said"][index] = debateDict["Said"][index] + said
    else:
        debateDict['Name'].append(name)
        debateDict['Party'].append(party)
        debateDict['Affiliation'].append(affiliation)
        debateDict['Said'].append(said)
        
print(debateDict)
    
# debateDict now holds, for each speaker: their name, party, affiliation, and everything they said

Ziad Aboultaif

Edmonton Manning, AB
Conservative

Madam Speaker, Canada is a safe haven for money laundering. It is a known fact, and it is getting worse by the day. 
Would the minister be able to advise us of the following? First, how much would this bill limit or downsize the money laundering market in Canada and, second, what is the amount of money laundering in Canada that is known to the minister or the government?


François-Philippe Champagne

Saint-Maurice—Champlain, QC
Liberal

Madam Speaker, my hon. colleague knows I have enormous respect for him, and I take it from his comments that he will be supporting Bill C-42.
The genesis of Bill C-42 is to combat money laundering. It is to make Canada best in class. It is to make Canada a leader in the G7. The faster this House can pass Bill C-42, the better off we will all be. I dream that we could even do that by unanimous consent so that we can move to phase this in very quickly. The reason is that the longer we wait, the less we w
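
The scraping loop merges every statement by the same speaker into a single entry. Here is a minimal sketch of just that merging step, using made-up statements and a plain dictionary keyed by speaker name rather than the notebook's `debateDict` layout.

```python
# Made-up statements for illustration: (speaker name, what they said)
statements = [
    ('Speaker A', 'First point. '),
    ('Speaker B', 'A reply. '),
    ('Speaker A', 'Second point. '),
]

merged = {}
for name, said in statements:
    # dict.get returns '' the first time we see a speaker,
    # so later statements are appended to earlier ones
    merged[name] = merged.get(name, '') + said

print(merged)
```

Each speaker appears once, with their statements joined in the order they were made.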

Let's visualize what we scraped in a *dataframe*. 

In [28]:
dataFrame = pd.DataFrame.from_dict(debateDict)
dataFrame

Unnamed: 0,Name,Party,Affiliation,Said
0,Ziad Aboultaif,Conservative\n,"\nEdmonton Manning, AB","Madam Speaker, Canada is a safe haven for mone..."
1,François-Philippe Champagne,Liberal\n,"\nSaint-Maurice—Champlain, QC","Madam Speaker, my hon. colleague knows I have ..."
2,Gabriel Ste-Marie,Bloc\n,"\nJoliette, QC","Madam Speaker, I thank the hon. minister for t..."
3,Daniel Blaikie,NDP\n,"\nElmwood—Transcona, MB","Madam Speaker, we know that a well-designed re..."
4,Brad Vis,Conservative\n,"\nMission—Matsqui—Fraser Canyon, BC","Madam Speaker, proposed subsection 21.‍303(3),..."
5,Chandra Arya,Liberal\n,"\nNepean, ON","Madam Speaker, corporations exist basically to..."
6,The Assistant Deputy Speaker (Mrs. Alexandra M...,Liberal\n,\nAlexandra Mendes,I have to give the minister time to answer.\nT...
7,Marty Morantz,Conservative\n,"\nCharleswood—St. James—Assiniboia—Headingley, MB","Madam Speaker, there are a number of areas of ..."
8,Mike Morrice,Green\n,"\nKitchener Centre, ON","Madam Speaker, around the world we are seeing ..."
9,Kevin Lamoureux,Liberal\n,"\nWinnipeg North, MB","Madam Speaker, budget 2023 continues to demons..."


Notice how some of our columns contain a '\n'. In Python, this is the **newline character**, which represents a line break: wherever you see a '\n', the text continues on a new line. For our purposes, we're going to remove this newline character to keep our dataframe tidy.
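
To see the newline character on its own, here is a small sketch using a plain Python string (the value is copied from the `Party` column above). `repr` displays the hidden character instead of printing a line break.

```python
party = 'Liberal\n'

print(repr(party))                  # repr shows the hidden newline character
cleaned = party.replace('\n', '')   # replace it with an empty string
print(repr(cleaned))
```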

In [29]:
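# Remove newline characters and assign the cleaned columns back to the dataframe
dataFrame['Party'] = dataFrame['Party'].replace('\n', '', regex=True)
dataFrame['Affiliation'] = dataFrame['Affiliation'].replace('\n', '', regex=True)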
dataFrame

Unnamed: 0,Name,Party,Affiliation,Said
0,Ziad Aboultaif,Conservative,"Edmonton Manning, AB","Madam Speaker, Canada is a safe haven for mone..."
1,François-Philippe Champagne,Liberal,"Saint-Maurice—Champlain, QC","Madam Speaker, my hon. colleague knows I have ..."
2,Gabriel Ste-Marie,Bloc,"Joliette, QC","Madam Speaker, I thank the hon. minister for t..."
3,Daniel Blaikie,NDP,"Elmwood—Transcona, MB","Madam Speaker, we know that a well-designed re..."
4,Brad Vis,Conservative,"Mission—Matsqui—Fraser Canyon, BC","Madam Speaker, proposed subsection 21.‍303(3),..."
5,Chandra Arya,Liberal,"Nepean, ON","Madam Speaker, corporations exist basically to..."
6,The Assistant Deputy Speaker (Mrs. Alexandra M...,Liberal,Alexandra Mendes,I have to give the minister time to answer.\nT...
7,Marty Morantz,Conservative,"Charleswood—St. James—Assiniboia—Headingley, MB","Madam Speaker, there are a number of areas of ..."
8,Mike Morrice,Green,"Kitchener Centre, ON","Madam Speaker, around the world we are seeing ..."
9,Kevin Lamoureux,Liberal,"Winnipeg North, MB","Madam Speaker, budget 2023 continues to demons..."


Now let's define a way to find nouns for our dataframe. The easiest way to do this is through the use of a **function**. 

In Python, a function is a named block of code that performs a specific task or operation. It is like a reusable recipe or set of instructions that you can give to the computer to perform a certain action.

Functions are useful because they allow you to organize your code into smaller, manageable pieces. Instead of writing the same code over and over again, you can define a function and reuse it whenever needed by calling it with different inputs. This promotes code reusability, modularity, and makes your code easier to read and maintain.
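
As a small illustration of defining a function once and calling it with different inputs, consider the sketch below. The function `count_words` is a made-up example and is not used elsewhere in this notebook.

```python
def count_words(text):
    """Return how many whitespace-separated words are in text."""
    return len(text.split())

# The same function is reused with different inputs
print(count_words('Madam Speaker, I rise today'))
print(count_words('Order'))
```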

In [30]:
try:
    import spacy
    nlp = spacy.load('en_core_web_sm')
except:
    !pip install spacy --user
    !python -m spacy download en_core_web_sm
    import spacy
    nlp = spacy.load('en_core_web_sm')
from IPython.display import clear_output
clear_output()

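def find_nouns(text):
    """Return the lemmas of all tokens in text that spaCy tags as nouns."""
    nouns = []
    try:
        for token in nlp(text):
            if token.pos_ == 'NOUN':
                nouns.append(token.lemma_)
    except Exception:
        # If spaCy cannot process the text, return the nouns found so far
        pass
    return nouns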





## What Every Party Said

Using our function defined earlier, let's look at the top 25 nouns spoken by each party in Parliament. You can also alter the variable `n` below to look at the top `n` nouns spoken by a party. 

In [None]:
# Alter this variable n if you'd like to see other top 'n' values 
# For example, changing '25' for '30' would give you the top 30 nouns spoken by a party
n = 25

In [None]:
party = 'Liberal'
pos = 'NOUN'

exclude_words = ['government', 'member', 'people', 'time', 'year', 'legislation', 'bill', 'madam']

word_list = []
index = dataFrame[dataFrame["Party"]==party].index.values
cell_values = ''
for item in index:
    cell_values = cell_values + dataFrame.iloc[item]["Said"]
for words in cell_values.split(' '):
    for word in find_nouns(words):
        if word not in exclude_words:
            word_list.append(word)
common_words = pd.DataFrame.from_dict(Counter(word_list), orient='index').sort_values(0, ascending=False).head(n)
title = 'Top '+str(n)+' '+pos.lower()+'s'+' spoken by the '+party+' Party'
lib_fig = px.bar(common_words, title=title, labels={'index':pos.capitalize(), 'value':'Count'}).update_layout(showlegend=False)
lib_fig.update_traces(marker_color='red')

In [None]:
party = 'Conservative'
pos = 'NOUN'

exclude_words = ['government', 'member', 'people', 'time', 'year', 'legislation', 'bill', 'madam']

word_list = []
index = dataFrame[dataFrame["Party"]==party].index.values
cell_values = ''
for item in index:
    cell_values = cell_values + dataFrame.iloc[item]["Said"]
for words in cell_values.split(' '):
    for word in find_nouns(words):
        if word not in exclude_words:
            word_list.append(word)
common_words = pd.DataFrame.from_dict(Counter(word_list), orient='index').sort_values(0, ascending=False).head(n)
title = 'Top '+str(n)+' '+pos.lower()+'s'+' spoken by the '+party+' Party'
con_fig = px.bar(common_words, title=title, labels={'index':pos.capitalize(), 'value':'Count'}).update_layout(showlegend=False)
con_fig.update_traces(marker_color='blue')

In [None]:
party = 'NDP'
pos = 'NOUN'

exclude_words = ['government', 'member', 'people', 'time', 'year', 'legislation', 'bill', 'madam']

word_list = []
index = dataFrame[dataFrame["Party"]==party].index.values
cell_values = ''
for item in index:
    cell_values = cell_values + dataFrame.iloc[item]["Said"]
for words in cell_values.split(' '):
    for word in find_nouns(words):
        if word not in exclude_words:
            word_list.append(word)
common_words = pd.DataFrame.from_dict(Counter(word_list), orient='index').sort_values(0, ascending=False).head(n)
title = 'Top '+str(n)+' '+pos.lower()+'s'+' spoken by the '+party+' Party'
ndp_fig = px.bar(common_words, title=title, labels={'index':pos.capitalize(), 'value':'Count'}).update_layout(showlegend=False)
ndp_fig.update_traces(marker_color='orange')

In [None]:
party = 'Bloc'
pos = 'NOUN'

exclude_words = ['government', 'member', 'people', 'time', 'year', 'legislation', 'bill', 'madam']

word_list = []
index = dataFrame[dataFrame["Party"]==party].index.values
cell_values = ''
for item in index:
    cell_values = cell_values + dataFrame.iloc[item]["Said"]
for words in cell_values.split(' '):
    for word in find_nouns(words):
        if word not in exclude_words:
            word_list.append(word)
common_words = pd.DataFrame.from_dict(Counter(word_list), orient='index').sort_values(0, ascending=False).head(n)
title = 'Top '+str(n)+' '+pos.lower()+'s'+' spoken by the '+party+' Party'
bloc_fig = px.bar(common_words, title=title, labels={'index':pos.capitalize(), 'value':'Count'}).update_layout(showlegend=False)
bloc_fig.update_traces(marker_color='lightblue')

In [None]:
party = 'Green'
pos = 'NOUN'

exclude_words = ['government', 'member', 'people', 'time', 'year', 'legislation', 'bill', 'madam']

word_list = []
index = dataFrame[dataFrame["Party"]==party].index.values
cell_values = ''
for item in index:
    cell_values = cell_values + dataFrame.iloc[item]["Said"]
for words in cell_values.split(' '):
    for word in find_nouns(words):
        if word not in exclude_words:
            word_list.append(word)
common_words = pd.DataFrame.from_dict(Counter(word_list), orient='index').sort_values(0, ascending=False).head(n)
title = 'Top '+str(n)+' '+pos.lower()+'s'+' spoken by the '+party+' Party'
grn_fig = px.bar(common_words, title=title, labels={'index':pos.capitalize(), 'value':'Count'}).update_layout(showlegend=False)
grn_fig.update_traces(marker_color='green')
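
The five cells above repeat the same steps and differ only in the party name and bar colour. As a minimal sketch, the counting step could be written once as a reusable function. The names `top_words` and `toy_nouns` are hypothetical; `toy_nouns` is a stand-in so the example runs without spaCy, and in this notebook `find_nouns` would be passed instead.

```python
from collections import Counter

def top_words(texts, extract, exclude_words, n):
    """Count the words returned by extract(text) across texts; return the n most common."""
    counts = Counter()
    for text in texts:
        for word in extract(text):
            if word not in exclude_words:
                counts[word] += 1
    return counts.most_common(n)

def toy_nouns(text):
    # Stand-in extractor for illustration only: treats every word as a "noun"
    return text.split()

speeches = ['the budget helps families', 'families need the budget', 'the budget']
print(top_words(speeches, toy_nouns, exclude_words=['the'], n=2))
```

The function returns a list of `(word, count)` pairs, which could then be turned into a dataframe and plotted in the same way as the cells above.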

## What Your Representative Said

We can also use the `Name` column in our dataframe to see which nouns are most common for individual members of Parliament.

In [None]:
# Change the value of speaker in order to find different common nouns
# For example, instead of 'Clifford Small', you can input 'Jenny Kwan'
speaker = 'Clifford Small'

pos = 'NOUN'
# Alter this variable n if you'd like to see other top 'n' values 
# For example, changing '25' for '30' would give you the top 30 nouns spoken by a speaker
n = 25
exclude_words = ['government', 'member', 'people', 'time', 'year', 'legislation', 'bill', 'madam']

word_list = []
index = dataFrame[dataFrame["Name"]==speaker].index.values
cell_value = ''
for item in index:
    cell_value = cell_value + dataFrame.iloc[item]["Said"]
for words in cell_value.split(" "):
    for word in find_nouns(words):
        if word not in exclude_words:
            word_list.append(word)
common_words = pd.DataFrame.from_dict(Counter(word_list), orient='index').sort_values(0, ascending=False).head(n)
title = 'Top '+str(n)+' '+speaker+' '+pos.lower()+'s'
px.bar(common_words, title=title, labels={'index': 'Word', 'value': 'Count'}).update_layout(showlegend=False, height=300).show()

## By Area

Lastly, we can also find the common nouns of representatives in certain `provinces`, `cities`, or `ridings`.

In [None]:
# area can be city, riding, or province; it is just looking for a sub-string in the Affiliation cell
# For example, instead of 'AB', you can insert 'Edmonton'
area = 'AB'
pos = 'NOUN'
n = 25
exclude_words = ['government', 'member', 'people', 'time', 'year', 'legislation', 'bill', 'madam']

word_list = []
cell_values = ''
for item in range(len(dataFrame.index)):
    if area in dataFrame.iloc[item]["Affiliation"]:
        cell_values = cell_values + dataFrame.iloc[item]["Said"]
    else:
        continue
for words in cell_values.split(' '):
    for word in find_nouns(words):
        if word not in exclude_words:
            word_list.append(word)
common_words = pd.DataFrame.from_dict(Counter(word_list), orient='index').sort_values(0, ascending=False).head(n)
title = 'Top '+str(n)+' '+pos.lower()+'s'+' spoken by the representatives for '+area
px.bar(common_words, title=title, labels={'index':pos.capitalize(), 'value':'Count'}).update_layout(showlegend=False)
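
The area filter is a plain substring test, so it is case-sensitive. The sketch below shows that behaviour on a few affiliation strings copied from the dataframe shown earlier; the helper name `matches` is hypothetical.

```python
# Affiliation strings copied from the dataframe shown earlier
affiliations = ['Edmonton Manning, AB', 'Nepean, ON', 'Kitchener Centre, ON', 'Winnipeg North, MB']

def matches(area, case_sensitive=True):
    """Return the affiliations that contain area as a substring."""
    if case_sensitive:
        return [a for a in affiliations if area in a]
    return [a for a in affiliations if area.lower() in a.lower()]

print(matches('ON'))                              # both Ontario ridings
print(matches('edmonton'))                        # nothing: the case differs
print(matches('edmonton', case_sensitive=False))  # the Edmonton riding
```

Ignoring case is more forgiving for city names, but it makes short codes ambiguous: a case-insensitive search for 'on' would also match 'Edmonton'.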

### Questions:

1. What are the benefits and limitations of web scraping as a method to collect data from online sources, such as the debates in the Canadian Parliament?
2. How can the analysis of debates and identification of common nouns be used to compare and contrast the priorities of different political parties over time?
3. Can the analysis of common nouns in the debates help us understand the language and rhetoric used by political parties and its impact on public discourse?
4. What are the potential biases or limitations in analyzing debates and identifying common nouns, and how can they be addressed to ensure the accuracy and reliability of the findings?

# Conclusion

The Canadian government provides transcripts of debates in the House of Commons, called the [Hansard](https://en.wikipedia.org/wiki/Hansard). In this notebook we imported the Hansard data from 2020 and identified the frequencies of some [parts of speech](https://universaldependencies.org/docs/u/pos) using natural language processing with [spaCy](https://spacy.io). We also found which parties spoke the most relative to their seats, as well as how often certain members of Parliament spoke.

We also used the Hansard to find out which topics each party prioritized, and tested whether you could identify parties based on the top 10 topics they spoke about.

Lastly, using [openparliament.ca](https://openparliament.ca/), we identified trends in bill voting, specifically how certain members and parties voted on certain bills, and which bills passed or failed.

Perhaps you can try extension activities such as predicting which bills will pass or fail in Parliament, identifying the most common [named entities](https://www.geeksforgeeks.org/python-named-entity-recognition-ner-using-spacy), or creating [word clouds](https://github.com/callysto/curriculum-notebooks/blob/master/EnglishLanguageArts/WordClouds/word-clouds.ipynb).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)