## In-class Lesson 1: Reading and writing csv files using `pandas`

I've always found that learning a programming language sticks in my mind best when I have a specific project to work on. It keeps me motivated and gets me beyond running exercises that can seem pointless. 

So today, we will jump into Python with a small project. Our goal is to read in a csv file of campaign finance independent expenditures, do some cleaning and analysis, and write the results out to a new csv.


## Import the `pandas` package

```python
import pandas as pd
```

## Read our csv

```python
df = pd.read_csv('./data/ies.csv')

df
```

```text
,committee_id,committee_name,report_year,file_number,payee_name,expenditure_description,expenditure_date,dissemination_date,expenditure_amount,support_oppose_indicator,candidate_id,candidate_name,candidate_office,candidate_office_state,candidate_office_district,candidate_party,election_type,election_type_full
0,C00487470,CLUB FOR GROWTH ACTION,2022,1593238,CLUB FOR GROWTH,RADIO AD COSTS (FROM ADVANCE LINE 21),4/20/22 0:00,4/20/22 0:00,37.29,S,H2NC05157,"HINES, ROBERT",H,NC,13,REP,P2022,
1,C00487470,CLUB FOR GROWTH ACTION,2022,1573503,PRIME MEDIA PARTNERS,DIGITAL ADVERTISING,3/10/22 0:00,3/10/22 0:00,8942.00,O,S2OH00436,"VANCE, J D",S,OH,0,REP,P2022,
2,C00487470,CLUB FOR GROWTH ACTION,2021,1547523,CLUB FOR GROWTH,INTERNET COMMUNICATIONS (FROM ADVANCE LINE 21),10/25/21 0:00,10/25/21 0:00,10.17,O,S2OH00436,"VANCE, J D",S,OH,0,REP,P2022,
3,C00487470,CLUB FOR GROWTH ACTION,2021,1514640,"RUMBLEUP, LLC",TEXT MESSAGING,4/27/21 0:00,4/27/21 0:00,5711.28,S,H2TX06251,"WRIGHT, SUSAN",H,TX,6,REP,S2021,SPECIAL-GENERAL
4,C00487470,CLUB FOR GROWTH ACTION,2021,1524799,CLUB FOR GROWTH,MAIL PRODUCTION (FROM ADVANCE LINE 21),7/12/21 0:00,7/12/21 0:00,166.22,O,H2OH15145,"LARE, JEFF",H,OH,15,REP,S2021,SPECIAL-PRIMARY
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
802,C00487470,CLUB FOR GROWTH ACTION,2022,1593238,"GRP BUYING, LLC",TV AD PLACEMENT,4/18/22 0:00,4/20/22 0:00,352920.00,S,H2NC05157,"HINES, ROBERT",H,NC,13,REP,P2022,
803,C00487470,CLUB FOR GROWTH ACTION,2021,1537325,"MEDIUM BUYING, LLC",TV AD PLACEMENT,8/30/21 0:00,9/7/21 0:00,1303766.25,S,S2NC00505,"BUDD, THEODORE P",S,NC,0,REP,P2022,
804,C00487470,CLUB FOR GROWTH ACTION,2022,1577405,CLUB FOR GROWTH,INTERNET COMMUNICATIONS (FROM ADVANCE LINE 21),3/17/22 0:00,3/17/22 0:00,56.53,S,S8AL00381,"BROOKS, MO",S,AL,0,REP,P2022,
805,C00487470,CLUB FOR GROWTH ACTION,2022,1593238,CLUB FOR GROWTH,TV AD COSTS (FROM ADVANCE LINE 21),4/20/22 0:00,4/20/22 0:00,37.29,S,H2NC05157,"HINES, ROBERT",H,NC,13,REP,P2022,
```
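If you don't have `./data/ies.csv` handy, the sketch below feeds `read_csv` a few rows copied from the output above, wrapped in `io.StringIO` so pandas can treat a string like an open file. The three-row excerpt, the reduced set of columns, and the name `sample_csv` are illustrative choices, not the full dataset. Note that the unlabeled numbers on the left of a DataFrame display are the index pandas generates for each row; they are not a column in the file.

```python
import io

import pandas as pd

# A tiny excerpt of the independent-expenditure data (illustrative sample only).
sample_csv = """committee_id,committee_name,report_year,expenditure_date,expenditure_amount,support_oppose_indicator,candidate_name
C00487470,CLUB FOR GROWTH ACTION,2022,4/20/22 0:00,37.29,S,"HINES, ROBERT"
C00487470,CLUB FOR GROWTH ACTION,2022,3/10/22 0:00,8942.00,O,"VANCE, J D"
C00487470,CLUB FOR GROWTH ACTION,2021,10/25/21 0:00,10.17,O,"VANCE, J D"
"""

# io.StringIO wraps the string so read_csv can read it like a file on disk.
# Quoted fields such as "HINES, ROBERT" keep their internal comma.
df = pd.read_csv(io.StringIO(sample_csv))

print(df.shape)  # (number of rows, number of columns)
print(df.columns.tolist())
```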


## Inspect our data

```python
df.dtypes
```

```text
committee_id                  object
committee_name                object
report_year                    int64
file_number                    int64
payee_name                    object
expenditure_description       object
expenditure_date              object
dissemination_date            object
expenditure_amount           float64
support_oppose_indicator      object
candidate_id                  object
candidate_name                object
candidate_office              object
candidate_office_state        object
candidate_office_district      int64
candidate_party               object
election_type                 object
election_type_full            object
dtype: object
```
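Two things in the `dtypes` listing are worth checking before any analysis: the date columns come in as text rather than dates, and some columns (like `election_type_full`) are blank in many rows. Here's a minimal sketch of how you might check both, using a few hypothetical rows shaped like ours; `isna().sum()` counts missing values per column, and `select_dtypes` pulls out the numeric columns.

```python
import io

import pandas as pd

# Illustrative rows: election_type_full is blank in two of them, as in the data above.
sample_csv = """expenditure_date,expenditure_amount,candidate_office_district,election_type,election_type_full
4/20/22 0:00,37.29,13,P2022,
3/10/22 0:00,8942.00,0,P2022,
4/27/21 0:00,5711.28,6,S2021,SPECIAL-GENERAL
"""
df = pd.read_csv(io.StringIO(sample_csv))

# Dates arrive as plain text strings until we convert them explicitly.
print(df.dtypes)

# Count missing values per column; blank CSV fields are read as NaN.
missing = df.isna().sum()
print(missing)

# Pick out only the numeric columns, e.g. to see what can be summed.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```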

```python
df.tail(5)
```

```text
,committee_id,committee_name,report_year,file_number,payee_name,expenditure_description,expenditure_date,dissemination_date,expenditure_amount,support_oppose_indicator,candidate_id,candidate_name,candidate_office,candidate_office_state,candidate_office_district,candidate_party,election_type,election_type_full
802,C00487470,CLUB FOR GROWTH ACTION,2022,1593238,"GRP BUYING, LLC",TV AD PLACEMENT,4/18/22 0:00,4/20/22 0:00,352920.0,S,H2NC05157,"HINES, ROBERT",H,NC,13,REP,P2022,
803,C00487470,CLUB FOR GROWTH ACTION,2021,1537325,"MEDIUM BUYING, LLC",TV AD PLACEMENT,8/30/21 0:00,9/7/21 0:00,1303766.25,S,S2NC00505,"BUDD, THEODORE P",S,NC,0,REP,P2022,
804,C00487470,CLUB FOR GROWTH ACTION,2022,1577405,CLUB FOR GROWTH,INTERNET COMMUNICATIONS (FROM ADVANCE LINE 21),3/17/22 0:00,3/17/22 0:00,56.53,S,S8AL00381,"BROOKS, MO",S,AL,0,REP,P2022,
805,C00487470,CLUB FOR GROWTH ACTION,2022,1593238,CLUB FOR GROWTH,TV AD COSTS (FROM ADVANCE LINE 21),4/20/22 0:00,4/20/22 0:00,37.29,S,H2NC05157,"HINES, ROBERT",H,NC,13,REP,P2022,
806,C00487470,CLUB FOR GROWTH ACTION,2022,1557769,"BIG DOG STRATEGIES, LLC","MAIL PRODUCTION, POSTAGE",1/14/22 0:00,1/18/22 0:00,35186.58,S,S8AL00381,"BROOKS, MO",S,AL,0,REP,P2022,
```
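`tail` has a twin, `head`, and both are shortcuts for positional slicing. The small sketch below uses a stand-in DataFrame with a few amounts copied from the outputs (otherwise hypothetical) to show that `tail(n)` keeps the original index labels and matches `iloc[-n:]`.

```python
import pandas as pd

# A small stand-in DataFrame with the default 0..n-1 index.
df = pd.DataFrame({'expenditure_amount': [37.29, 8942.00, 10.17, 5711.28, 166.22, 12664.68]})

first_two = df.head(2)   # first n rows
last_two = df.tail(2)    # last n rows, keeping their original index labels

# tail(n) selects the same rows as positional slicing of the last n rows.
print(last_two.equals(df.iloc[-2:]))

print(len(df), last_two.index.tolist())
```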


## Selecting a subset of columns

```python
# A list of the column names we want to keep
columns = ['expenditure_amount',
           'support_oppose_indicator',
           'candidate_name']

subset = df[columns]
subset
```

```text
,expenditure_amount,support_oppose_indicator,candidate_name
0,37.29,S,"HINES, ROBERT"
1,8942.00,O,"VANCE, J D"
2,10.17,O,"VANCE, J D"
3,5711.28,S,"WRIGHT, SUSAN"
4,166.22,O,"LARE, JEFF"
...,...,...,...
802,352920.00,S,"HINES, ROBERT"
803,1303766.25,S,"BUDD, THEODORE P"
804,56.53,S,"BROOKS, MO"
805,37.29,S,"HINES, ROBERT"
```
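The brackets behave differently depending on what you put inside them: a list of names gives back a DataFrame, while a single name gives back a one-dimensional Series. The sketch below uses a made-up two-row DataFrame to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    'expenditure_amount': [37.29, 8942.00],
    'support_oppose_indicator': ['S', 'O'],
    'candidate_name': ['HINES, ROBERT', 'VANCE, J D'],
    'committee_id': ['C00487470', 'C00487470'],
})

# A list of column names returns a DataFrame (rows and columns)...
columns = ['expenditure_amount', 'candidate_name']
subset = df[columns]

# ...while a single name in plain brackets returns a Series (one column).
amounts = df['expenditure_amount']

print(type(subset).__name__, type(amounts).__name__)
```

If you plan to add or change columns on the subset later, `df[columns].copy()` gives you an independent DataFrame and avoids pandas' `SettingWithCopyWarning`.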


## Summing

```python
subset['expenditure_amount'].sum()
```

```text
37227226.67
```
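One grand total mixes money spent supporting candidates with money spent opposing them. As a sketch, a boolean mask (a Series of True/False values) lets us split the sum by `support_oppose_indicator`. The five rows are copied from the subset output above; everything else is illustrative. Keep in mind that `.sum()` also includes any negative amounts, which lower the total.

```python
import pandas as pd

subset = pd.DataFrame({
    'expenditure_amount': [37.29, 8942.00, 10.17, 5711.28, 166.22],
    'support_oppose_indicator': ['S', 'O', 'O', 'S', 'O'],
})

total = subset['expenditure_amount'].sum()

# Boolean mask: True where the spending supported the candidate.
is_support = subset['support_oppose_indicator'] == 'S'
support_total = subset.loc[is_support, 'expenditure_amount'].sum()
# ~ flips the mask; here every other row is 'O', so this is the oppose total.
oppose_total = subset.loc[~is_support, 'expenditure_amount'].sum()

print(round(total, 2), round(support_total, 2), round(oppose_total, 2))
```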

## Sorting

```python
subset.sort_values(by=['expenditure_amount'], ascending=False)
```

```text
,expenditure_amount,support_oppose_indicator,candidate_name
295,2782500.00,O,"VANCE, J D"
654,2018594.54,S,"BARNETTE, KATHY"
499,1384811.25,S,"BUDD, THEODORE P"
803,1303766.25,S,"BUDD, THEODORE P"
376,1269747.33,O,"GIBBONS, MICHAEL"
...,...,...,...
152,1.34,O,"MCCRORY, PATRICK LLOYD"
209,-11912.95,S,"HINES, ROBERT"
659,-17191.47,S,"MILLER, MARY"
181,-126042.86,S,"HINES, ROBERT"
```
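A few related sorting tricks, sketched on rows copied from the outputs above: `nlargest` returns the same top rows as sorting descending and taking the head (ties aside), `sort_values` accepts several columns with a matching list for `ascending`, and a simple filter isolates the negative amounts at the bottom of the sort. Those negatives may reflect refunds or corrections to earlier filings, but that is an assumption worth checking against the filings themselves.

```python
import pandas as pd

subset = pd.DataFrame({
    'expenditure_amount': [37.29, 2782500.00, -11912.95, 1303766.25, 56.53],
    'support_oppose_indicator': ['S', 'O', 'S', 'S', 'S'],
    'candidate_name': ['HINES, ROBERT', 'VANCE, J D', 'HINES, ROBERT',
                       'BUDD, THEODORE P', 'BROOKS, MO'],
})

# The three largest amounts, in descending order.
top3 = subset.nlargest(3, 'expenditure_amount')

# Sort by name A to Z, then by amount largest-first within each name.
by_name = subset.sort_values(by=['candidate_name', 'expenditure_amount'],
                             ascending=[True, False])

# Pull out the negative amounts for a closer look.
negatives = subset[subset['expenditure_amount'] < 0]

print(top3)
print(by_name)
print(negatives)
```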


## Converting dates

```python
df['expenditure_date'] = pd.to_datetime(df['expenditure_date'])
```
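`to_datetime` guesses the layout of dates like `4/20/22 0:00`. A minimal sketch of the more explicit route, assuming every value follows month/day/two-digit-year hour:minute: pass a `format` string so pandas doesn't have to guess, and `errors='coerce'` so an unparseable value becomes `NaT` (pandas' missing-date marker) instead of stopping the script. The third sample value is a deliberately bad, hypothetical entry. The same call would work for `dissemination_date`, which the lesson leaves as text.

```python
import pandas as pd

raw = pd.Series(['4/20/22 0:00', '10/25/21 0:00', 'not a date'])

# Spell out the layout: month/day/two-digit year, then hour:minute.
# errors='coerce' turns anything that doesn't match into NaT instead of raising.
dates = pd.to_datetime(raw, format='%m/%d/%y %H:%M', errors='coerce')

print(dates)

# Once converted, the .dt accessor exposes parts of each date (NaT stays missing).
print(dates.dt.year)
```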

## Filtering

```python
df[df['expenditure_date'] > '2022-03-01']
```

```text
,committee_id,committee_name,report_year,file_number,payee_name,expenditure_description,expenditure_date,dissemination_date,expenditure_amount,support_oppose_indicator,candidate_id,candidate_name,candidate_office,candidate_office_state,candidate_office_district,candidate_party,election_type,election_type_full
0,C00487470,CLUB FOR GROWTH ACTION,2022,1593238,CLUB FOR GROWTH,RADIO AD COSTS (FROM ADVANCE LINE 21),2022-04-20,4/20/22 0:00,37.29,S,H2NC05157,"HINES, ROBERT",H,NC,13,REP,P2022,
1,C00487470,CLUB FOR GROWTH ACTION,2022,1573503,PRIME MEDIA PARTNERS,DIGITAL ADVERTISING,2022-03-10,3/10/22 0:00,8942.00,O,S2OH00436,"VANCE, J D",S,OH,0,REP,P2022,
5,C00487470,CLUB FOR GROWTH ACTION,2022,1590097,"THE STONERIDGE GROUP, LLC","MAIL PRODUCTION, POSTAGE",2022-04-20,4/21/22 0:00,12664.68,S,H4WV02080,"MOONEY, ALEXANDER XAVIER",H,WV,2,REP,P2022,
7,C00487470,CLUB FOR GROWTH ACTION,2022,1600846,COLD SPARK MEDIA,"MAIL PRODUCTION, POSTAGE",2022-06-02,6/7/22 0:00,11610.57,S,H0IL15129,"MILLER, MARY",H,IL,15,REP,P2022,
9,C00487470,CLUB FOR GROWTH ACTION,2022,1600846,COLD SPARK MEDIA,"MAIL PRODUCTION, POSTAGE",2022-06-02,6/7/22 0:00,11610.57,O,H2IL13120,"DAVIS, RODNEY L",H,IL,15,REP,P2022,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
800,C00487470,CLUB FOR GROWTH ACTION,2022,1579070,"MEDIUM BUYING, LLC",RADIO AD PLACEMENT,2022-04-04,4/6/22 0:00,33356.25,S,H2NC05157,"HINES, ROBERT",H,NC,13,REP,P2022,
801,C00487470,CLUB FOR GROWTH ACTION,2022,1591723,CLUB FOR GROWTH,MAIL PRODUCTION (FROM ADVANCE LINE 21),2022-05-03,5/3/22 0:00,227.45,S,H0IL15129,"MILLER, MARY",H,IL,15,REP,P2022,
802,C00487470,CLUB FOR GROWTH ACTION,2022,1593238,"GRP BUYING, LLC",TV AD PLACEMENT,2022-04-18,4/20/22 0:00,352920.00,S,H2NC05157,"HINES, ROBERT",H,NC,13,REP,P2022,
804,C00487470,CLUB FOR GROWTH ACTION,2022,1577405,CLUB FOR GROWTH,INTERNET COMMUNICATIONS (FROM ADVANCE LINE 21),2022-03-17,3/17/22 0:00,56.53,S,S8AL00381,"BROOKS, MO",S,AL,0,REP,P2022,
```
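Filters get more useful once you combine them. Here's a sketch on four rows copied from the outputs above: wrap each condition in parentheses and join them with `&` (and), `|` (or) or `~` (not); `between` checks a range and is inclusive on both ends by default; `isin` keeps rows whose value appears in a list. The date ranges and candidate list are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({
    'expenditure_date': pd.to_datetime(['2022-04-20', '2022-03-10',
                                        '2021-10-25', '2022-06-02']),
    'expenditure_amount': [37.29, 8942.00, 10.17, 11610.57],
    'support_oppose_indicator': ['S', 'O', 'O', 'S'],
    'candidate_name': ['HINES, ROBERT', 'VANCE, J D', 'VANCE, J D', 'MILLER, MARY'],
})

# Both conditions must hold; the parentheses are required because & binds tightly.
recent_support = df[(df['expenditure_date'] > '2022-03-01')
                    & (df['support_oppose_indicator'] == 'S')]

# Dates from March 1 through May 31, 2022, endpoints included.
spring_2022 = df[df['expenditure_date'].between('2022-03-01', '2022-05-31')]

# Rows for any candidate in the list.
named = df[df['candidate_name'].isin(['HINES, ROBERT', 'MILLER, MARY'])]

print(recent_support)
print(spring_2022)
print(named)
```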


## Grouping

```python
output = (subset
          .groupby(['candidate_name', 'support_oppose_indicator'])
          .sum()
          .reset_index())

output
```

```text
,candidate_name,support_oppose_indicator,expenditure_amount
0,"BARNETTE, KATHY",S,2166930.86
1,"BRITT, KATIE BOYD",O,2645101.82
2,"BROOKS, MO",S,1764446.66
3,"BROWN, SAM",O,268715.83
4,"BUDD, THEODORE P",S,6934688.66
5,"BURLISON, ERIC",S,281480.66
6,"CHENEY, ELIZABETH MRS.",O,51084.47
7,"DAUGHTRY, KELLY",O,884450.96
8,"DAVIS, RODNEY L",O,944007.6
9,"ELLZEY, JOHN KEVIN SR",O,392855.28
```
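Two variations on this grouping, sketched on four rows copied from the earlier outputs. The first selects the amount column explicitly and uses `as_index=False` in place of `reset_index()`, which should give the same long table. The second uses `pivot_table` to put support and opposition side by side, one row per candidate; `fill_value=0` fills the combinations that never occur (here, nobody both supported and opposed the same candidate).

```python
import pandas as pd

subset = pd.DataFrame({
    'expenditure_amount': [37.29, 8942.00, 10.17, 352920.00],
    'support_oppose_indicator': ['S', 'O', 'O', 'S'],
    'candidate_name': ['HINES, ROBERT', 'VANCE, J D', 'VANCE, J D', 'HINES, ROBERT'],
})

# Long form: one row per (candidate, indicator) pair, keys kept as columns.
long_form = subset.groupby(['candidate_name', 'support_oppose_indicator'],
                           as_index=False)['expenditure_amount'].sum()

# Wide form: one row per candidate, one column per indicator (O and S).
wide_form = subset.pivot_table(index='candidate_name',
                               columns='support_oppose_indicator',
                               values='expenditure_amount',
                               aggfunc='sum',
                               fill_value=0)

print(long_form)
print(wide_form)
```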


## Writing a csv

```python
output.to_csv('./data/cfg_sums.csv', index=False)
```
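Why `index=False`? By default `to_csv` also writes the DataFrame's index as an unnamed first column, and when you read that file back, pandas names it `Unnamed: 0`. The sketch below writes two hypothetical rows (copied from the grouped output) both ways into a temporary folder and reads them back to compare.

```python
import os
import tempfile

import pandas as pd

output = pd.DataFrame({
    'candidate_name': ['BARNETTE, KATHY', 'BROOKS, MO'],
    'support_oppose_indicator': ['S', 'S'],
    'expenditure_amount': [2166930.86, 1764446.66],
})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, 'with_index.csv')
    without_index = os.path.join(tmp, 'without_index.csv')

    # Default index=True writes the 0, 1, ... row labels as an unnamed first column.
    output.to_csv(with_index)
    # index=False writes only the data columns.
    output.to_csv(without_index, index=False)

    # Read both back before the temporary folder is deleted.
    reread_with = pd.read_csv(with_index)
    reread_without = pd.read_csv(without_index)

# The unnamed index column returns as a column called 'Unnamed: 0'.
print(reread_with.columns.tolist())
print(reread_without.columns.tolist())
```

If you do want to keep the index, `pd.read_csv(path, index_col=0)` turns that first column back into the index instead of a data column.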