/
write_by_year.Rmd
84 lines (62 loc) · 1.9 KB
/
write_by_year.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
---
title: "Cleaning Campaign Spending Data"
output: html_notebook
---
This script downloads the campaign spending dataset and writes separate CSVs by
year. Cleaning methods and decisions are documented.
_Reminder: Don't forget to set your working directory._
```{r}
# Load packages
library(tidyverse)
library(lubridate)
```
```{r}
# Download file
if (file.exists("data/expenditures.csv.zip") == FALSE) {
download.file(
"https://www.strongspace.com/shared/f5g1t7fcsb",
destfile = "data/expenditures.csv.zip",
method = "curl"
)}
unzip("data/expenditures.csv.zip", exdir = "data")
```
```{r}
# Read and clean files
# Main expenditures dataset
expend <- read_csv("data/expenditures.csv", col_names = c(
"filing_id", "payee", "street", "city", "state", "date", "amount", "cat"
))
expend <- mutate(expend, just_year = year(date)) # Extract year
# Filing ID data
ids <- read_csv("https://www.strongspace.com/shared/qrdxsuqoqm")
# Join expenditures and filing ID data
expend <- left_join(expend, ids, by = "filing_id")
```
Some of these years look like errors, so I'm throwing out the ones that are a
single line.
```{r}
unique(expend$just_year)
```
```{r}
# Which ones only have 1 row?
weird_years <- c(2020, 2091, 2019, 2912, 5012, 2102, 2106, 1899)
# Count rows
weird_rows <- sapply(
weird_years, function(x) {nrow(filter(expend, just_year == x))}
)
weird_years[which(weird_rows == 1)]
```
```{r}
# Remove weird years
expend <- filter(expend, !(just_year %in% weird_years[which(weird_rows == 1)]))
```
```{r}
# Write CSVs for each year
writeYears <- function(chosen_year, df) {
write_csv(filter(df, just_year == chosen_year), paste0("data/", chosen_year, "_campaign_spending", ".csv"))
}
# Write all but NAs
lapply(unique(expend$just_year), writeYears, df = filter(expend, !is.na(just_year)))
# Write data frame of NA years
write_csv(filter(expend, is.na(just_year)), "data/NA_campaign_spending.csv")
```