# Data Cleaning and EDA

## Initial Overview


In [None]:
# libraries
import pandas as pd
import numpy as np

In [None]:
# read in the csv
df = pd.read_csv('data/contributions/od_cntrbtn_audt_e.csv')

In [6]:
df.head()

Unnamed: 0,Political Entity,Recipient ID,Recipient,Recipient last name,Recipient first name,Recipient middle initial,Political Party of Recipient,Electoral District,Electoral event,Fiscal/Election date,...,Contributor first name,Contributor middle initial,Contributor City,Contributor Province,Contributor Postal code,Contribution Received date,Monetary amount,Non-Monetary amount,Contribution given through,Leadership contestant
0,﻿Candidates,4716,"Béchard, Bruno-Marie",Béchard,Bruno-Marie,,Liberal Party of Canada,Sherbrooke,38th general election,2004-06-28,...,DONAT,,Magog,QC,J1X 2C3,,400.0,0.0,,
1,Candidates,4716,"Béchard, Bruno-Marie",Béchard,Bruno-Marie,,Liberal Party of Canada,Sherbrooke,38th general election,2004-06-28,...,REAL,,Sherbrooke,QC,J1L 2B6,,500.0,0.0,,
2,Candidates,4716,"Béchard, Bruno-Marie",Béchard,Bruno-Marie,,Liberal Party of Canada,Sherbrooke,38th general election,2004-06-28,...,ANDRE,,Ascot,QC,J1K 3B4,,500.0,0.0,,
3,Candidates,4716,"Béchard, Bruno-Marie",Béchard,Bruno-Marie,,Liberal Party of Canada,Sherbrooke,38th general election,2004-06-28,...,VIOLETTE,,Shebrooke,QC,J1H 4J9,,2500.0,0.0,,
4,Candidates,4716,"Béchard, Bruno-Marie",Béchard,Bruno-Marie,,Liberal Party of Canada,Sherbrooke,38th general election,2004-06-28,...,JEAN,,Katevale,QC,J0B 1W0,,500.0,0.0,,


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5815963 entries, 0 to 5815962
Data columns (total 27 columns):
 #   Column                        Dtype  
---  ------                        -----  
 0   Political Entity              object 
 1   Recipient ID                  int64  
 2   Recipient                     object 
 3   Recipient last name           object 
 4   Recipient first name          object 
 5   Recipient middle initial      object 
 6   Political Party of Recipient  object 
 7   Electoral District            object 
 8   Electoral event               object 
 9   Fiscal/Election date          object 
 10  Form ID                       object 
 11  Financial Report              object 
 12  Part Number of Return         object 
 13  Financial Report part         object 
 14  Contributor type              object 
 15  Contributor name              object 
 16  Contributor last name         object 
 17  Contributor first name        object 
 18  Contributor middle ini

In [None]:
# check for any missing contributions - unlikely to be able to fill in the gaps
df['Monetary amount'].isna().any()

False

- No monetary amounts are missing, so there are no gaps to fill in this column.
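The check above covers a single column. A minimal sketch of the same idea across every column is shown here on a small made-up frame; `toy` and its values are illustrative stand-ins, not rows from the real file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the contributions frame; the values are made up.
toy = pd.DataFrame({
    'Monetary amount': [400.0, 500.0, 0.0],
    'Non-Monetary amount': [0.0, 0.0, 1742.79],
    'Contributor City': ['Magog', np.nan, 'Tofino'],
    'Contribution Received date': [np.nan, np.nan, '2004-03-07'],
})

# Count missing values per column, largest first.
missing_counts = toy.isna().sum().sort_values(ascending=False)
print(missing_counts)
```

On the real `df`, the same call would show which contributor and date columns are sparsely filled, as the blank cells in `df.head()` suggest.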

### Election Events

The best way to join the StatsCan data sets on electoral outcomes is to first identify which elections are needed, then separate the contributions by election.

In [9]:
df.sort_values(by=['Fiscal/Election date'], ascending=False)

Unnamed: 0,Political Entity,Recipient ID,Recipient,Recipient last name,Recipient first name,Recipient middle initial,Political Party of Recipient,Electoral District,Electoral event,Fiscal/Election date,...,Contributor first name,Contributor middle initial,Contributor City,Contributor Province,Contributor Postal code,Contribution Received date,Monetary amount,Non-Monetary amount,Contribution given through,Leadership contestant
1177195,Registered associations,52760,Perth--Wellington Federal Liberal Association,Perth--Wellington Federal Liberal Association,,,Liberal Party of Canada,Perth--Wellington,Annual,2024-12-31,...,,,,,,,3181.00,0.00,,
614555,Nomination contestants,54712,"Nazeer, Nadirah",Nazeer,Nadirah,,Conservative Party of Canada,Oakville West,,2024-11-26,...,Angela,,Montreal,QC,H3E 2B7,2025-01-08,1000.00,0.00,,
614549,Nomination contestants,54712,"Nazeer, Nadirah",Nazeer,Nadirah,,Conservative Party of Canada,Oakville West,,2024-11-26,...,Mohamad,,Milton,ON,L9E 1J1,2024-12-03,250.00,0.00,,
614550,Nomination contestants,54712,"Nazeer, Nadirah",Nazeer,Nadirah,,Conservative Party of Canada,Oakville West,,2024-11-26,...,Palma,,Oakville,ON,L6J 4 J,2024-11-25,1000.00,0.00,,
614551,Nomination contestants,54712,"Nazeer, Nadirah",Nazeer,Nadirah,,Conservative Party of Canada,Oakville West,,2024-11-26,...,Michael,,Toronto,ON,M2P 1W8,2024-11-22,300.00,0.00,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
601793,Nomination contestants,3865,"Fraser, Scott",Fraser,Scott,,New Democratic Party,Nanaimo--Alberni,,2004-02-01,...,Scott,,Tofino,BC,V0R2Z0,2004-03-07,0.00,1742.79,,
601167,Nomination contestants,1883,"Broadbent, Ed",Broadbent,Ed,,New Democratic Party,Ottawa Centre,,2004-01-20,...,of Canada,,Ottawa,On,K1P 5S9,,2830.00,0.00,,
603277,Nomination contestants,1883,"Broadbent, Ed",Broadbent,Ed,,New Democratic Party,Ottawa Centre,,2004-01-20,...,,,,,,,160.00,0.00,,
603276,Nomination contestants,1883,"Broadbent, Ed",Broadbent,Ed,,New Democratic Party,Ottawa Centre,,2004-01-20,...,,,,,,,40.13,0.00,,


In [10]:
df['Electoral event'].unique()

array(['38th general election', '39th general election',
       'November 27, 2006 By-elections', 'May 27, 2004, By-election',
       'May 24, 2005 By-election', 'March 17, 2008 By-elections',
       '40th general election', 'September 17, 2007 By-elections',
       'September 8, 2008 By-elections', '41st general election',
       'November 26, 2012 By-elections', 'September 22, 2008 By-election',
       'November 9, 2009 By-elections', 'November 29, 2010 By-elections',
       'March 19, 2012 By-election', 'June 30, 2014 By-elections',
       'November 25, 2013 By-elections', 'May 13, 2013 By-election',
       'November 17, 2014 By-elections', '42nd general election',
       'October 24, 2016 By-election', 'April 3, 2017 By-elections',
       'May 6, 2019 By-election', 'October 23, 2017 By-elections',
       'February 25, 2019, By-elections',
       'December 11, 2017, By-elections', 'June 18, 2018, By-election',
       'December 3, 2018, By-election', 'October 19, 2015 By-elections',


There are a lot of electoral events above. For now we focus on the general elections; by-election results are available on the Elections Canada website.
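The general elections could also be picked out by pattern rather than by typing each label. This is a sketch only: it assumes every general election label ends in `general election`, as the ones printed above do, and `events` is a hypothetical stand-in for `df['Electoral event']`.

```python
import pandas as pd

# Hypothetical stand-in for df['Electoral event'].
events = pd.Series([
    '38th general election',
    'November 27, 2006 By-elections',
    'May 27, 2004, By-election',
    '44th general election',
    None,  # some rows (e.g. nomination contestants) have no electoral event
])

# na=False treats missing labels as non-matches instead of propagating NaN.
is_general = events.str.endswith('general election', na=False)
general_labels = sorted(events[is_general].unique())
print(general_labels)
```

The boolean mask `is_general` can be used directly to filter the full frame, which avoids maintaining a hand-written list of labels.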

In [11]:
df[df['Electoral event'] == '44th general election']

Unnamed: 0,Political Entity,Recipient ID,Recipient,Recipient last name,Recipient first name,Recipient middle initial,Political Party of Recipient,Electoral District,Electoral event,Fiscal/Election date,...,Contributor first name,Contributor middle initial,Contributor City,Contributor Province,Contributor Postal code,Contribution Received date,Monetary amount,Non-Monetary amount,Contribution given through,Leadership contestant
118278,Candidates,47414,"Ward, Jeff",Ward,Jeff,,New Democratic Party,Sydney--Victoria,44th general election,2021-09-20,...,Yvonne,,Membertou,NS,B1S 3K8,2021-09-08,300.00,0.0,,
118279,Candidates,47414,"Ward, Jeff",Ward,Jeff,,New Democratic Party,Sydney--Victoria,44th general election,2021-09-20,...,Parker,,Coxhealth,NS,B1R 1T8,2021-09-05,500.00,0.0,,
118280,Candidates,47414,"Ward, Jeff",Ward,Jeff,,New Democratic Party,Sydney--Victoria,44th general election,2021-09-20,...,Tanaysha,R,Membertou,NS,B1S 0K2,2021-09-15,300.00,0.0,,
118281,Candidates,47414,"Ward, Jeff",Ward,Jeff,,New Democratic Party,Sydney--Victoria,44th general election,2021-09-20,...,Tayna,P,North York,ON,M2M 2W2,2021-09-11,215.00,0.0,,
118282,Candidates,47414,"Ward, Jeff",Ward,Jeff,,New Democratic Party,Sydney--Victoria,44th general election,2021-09-20,...,Todd,,Dutch Brook,NS,B1L 1E9,2021-09-19,400.00,0.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135436,Candidates,48462,"Zimmer, Katelyn",Zimmer,Katelyn,,Liberal Party of Canada,Moose Jaw--Lake Centre--Lanigan,44th general election,2021-09-20,...,,,,,,,90.00,0.0,,
135437,Candidates,48463,"Yeh, Teresa",Yeh,Teresa,,New Democratic Party,Brampton North,44th general election,2021-09-20,...,,,,,,,56.79,0.0,,
135438,Candidates,48465,"Dwyer, Madelaine",Dwyer,Madelaine,,New Democratic Party,Charleswood--St. James--Assiniboia--Headingley,44th general election,2021-09-20,...,,,,,,,250.00,0.0,,
135439,Candidates,48470,"Niles, Tyler",Niles,Tyler,,People's Party of Canada,Mission--Matsqui--Fraser Canyon,44th general election,2021-09-20,...,,,,,,,600.00,0.0,,


In [23]:
# save one dataframe per general election, mostly for reference/isolation as needed
general_elections_list = ['38th general election', '39th general election', '40th general election',
                          '41st general election', '42nd general election', '43rd general election', '44th general election']

for election in general_elections_list:
    election_df = df[df['Electoral event'] == election]
    election_df.to_csv(f'data/contributions/general_elections/{election}.csv')
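An alternative sketch of the same split uses `groupby`, which scans the frame once instead of once per election. The toy frame, the temporary output directory, and the underscore file names are assumptions made for illustration and are not what the notebook does.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the contributions frame; the values are made up.
toy = pd.DataFrame({
    'Electoral event': ['43rd general election', '44th general election',
                        '44th general election', 'May 6, 2019 By-election'],
    'Monetary amount': [100.0, 300.0, 500.0, 50.0],
})

# Keep only general elections, then write one CSV per event.
general = toy[toy['Electoral event'].str.endswith('general election', na=False)]

out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
written = {}
for event, group in general.groupby('Electoral event'):
    path = out_dir / f"{event.replace(' ', '_')}.csv"
    # index=False keeps the row index out of the file, so reloading
    # does not produce an extra 'Unnamed: 0' column.
    group.to_csv(path, index=False)
    written[event] = path

print(sorted(p.name for p in written.values()))
```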

### StatsCan/Elections Canada Data

In [20]:
stats_df = pd.read_csv('data/stats_can_general_elections/44_general_election.csv')
stats_df

Unnamed: 0,Province,Electoral District Name/Nom de circonscription,Electoral District Number/Numéro de circonscription,Population,Electors/Électeurs,Polling Stations/Bureaux de scrutin,Valid Ballots/Bulletins valides,Percentage of Valid Ballots /Pourcentage des bulletins valides,Rejected Ballots/Bulletins rejetés,Percentage of Rejected Ballots /Pourcentage des bulletins rejetés,Total Ballots Cast/Total des bulletins déposés,Percentage of Voter Turnout/Pourcentage de la participation électorale,Elected Candidate/Candidat élu
0,Newfoundland and Labrador/Terre-Neuve-et-Labrador,Avalon,10001,86494,70903,230,37144,99.3,273,0.7,37417,52.8,"McDonald, Ken Liberal/Libéral"
1,Newfoundland and Labrador/Terre-Neuve-et-Labrador,Bonavista--Burin--Trinity,10002,74116,59605,273,29991,98.4,482,1.6,30473,51.1,"Rogers, Churence Liberal/Libéral"
2,Newfoundland and Labrador/Terre-Neuve-et-Labrador,Coast of Bays--Central--Notre Dame,10003,77680,63631,244,31834,97.9,695,2.1,32529,51.1,"Small, Clifford Conservative/Conservateur"
3,Newfoundland and Labrador/Terre-Neuve-et-Labrador,Labrador,10004,27197,20239,87,9653,99.0,94,1.0,9747,48.2,"Jones, Yvonne Liberal/Libéral"
4,Newfoundland and Labrador/Terre-Neuve-et-Labrador,Long Range Mountains,10005,86553,70208,263,36447,98.8,461,1.2,36908,52.6,"Hutchings, Gudie Liberal/Libéral"
...,...,...,...,...,...,...,...,...,...,...,...,...,...
333,British Columbia/Colombie-Britannique,Victoria,59041,117133,101151,232,66748,99.3,468,0.7,67216,66.5,"Collins, Laurel NDP-New Democratic Party/NPD-N..."
334,British Columbia/Colombie-Britannique,West Vancouver--Sunshine Coast--Sea to Sky Cou...,59042,119113,98717,256,63459,99.6,279,0.4,63738,64.6,"Weiler, Patrick Liberal/Libéral"
335,Yukon,Yukon,60001,35874,30482,88,19406,99.3,142,0.7,19548,64.1,"Hanley, Brendan Liberal/Libéral"
336,Northwest Territories/Territoires du Nord-Ouest,Northwest Territories/Territoires du Nord-Ouest,61001,41786,30519,101,14095,98.9,155,1.1,14250,46.7,"McLeod, Michael Liberal/Libéral"


Next step: combining these results with the contribution data will need a merge/melt, or stacking all the per-election frames into one.
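One possible shape for the stacking option is sketched below. It assumes the per-election results files share the column layout shown above; the frames `results_43`, `results_44`, and `contributions`, and all their values, are made up for illustration.

```python
import pandas as pd

district_col = 'Electoral District Name/Nom de circonscription'
ballots_col = 'Total Ballots Cast/Total des bulletins déposés'

# Toy stand-ins for two per-election results frames (values are made up).
results_43 = pd.DataFrame({district_col: ['Avalon', 'Labrador'], ballots_col: [100, 50]})
results_44 = pd.DataFrame({district_col: ['Avalon', 'Labrador'], ballots_col: [110, 40]})

# Stack the frames, tagging each row with the election it came from.
frames = {'43rd general election': results_43, '44th general election': results_44}
results = pd.concat(
    [frame.assign(**{'Electoral event': event}) for event, frame in frames.items()],
    ignore_index=True,
).rename(columns={district_col: 'Electoral District'})

# Toy contributions, aggregated to one total per district and election.
contributions = pd.DataFrame({
    'Electoral District': ['Avalon', 'Avalon', 'Labrador'],
    'Electoral event': ['44th general election'] * 3,
    'Monetary amount': [300.0, 200.0, 50.0],
})
totals = contributions.groupby(
    ['Electoral District', 'Electoral event'], as_index=False
)['Monetary amount'].sum()

# A left merge keeps every results row, even those without contributions.
merged = results.merge(totals, on=['Electoral District', 'Electoral event'], how='left')
print(merged)
```

Some district names in the results are bilingual (for example `Northwest Territories/Territoires du Nord-Ouest`), so the real join would likely need the names normalised first, or a join on district number if one can be mapped to the contributions.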