## code: gathering the data

by: Allie

**Purpose**

This code chunk assumes that hourly air pollutant data has been downloaded from Environment Canterbury's website as one file per site, named "hourly_(site location).csv", and that these files are stored in a directory called "hourly_data" within the current working directory.
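Before running the main code, it can help to check that the files are where the code expects them. The sketch below lists the matching files and recovers each site label from its file name. It assumes the "hourly_(site location).csv" naming described above; `find_hourly_files()` is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: list the per-site files and derive a site label from each name.
# Assumes file names follow "hourly_<site>.csv", as described above.
find_hourly_files <- function(dir = "hourly_data") {
  # full.names = TRUE returns paths usable on any OS, so no "\\" separator is hard-coded
  files <- list.files(path = dir, pattern = "^hourly_.*\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    stop("No 'hourly_*.csv' files found in ", dir)
  }
  # Strip the "hourly_" prefix and ".csv" extension to recover the site label
  sites <- sub("^hourly_(.*)\\.csv$", "\\1", basename(files))
  data.frame(site = sites, path = files, stringsAsFactors = FALSE)
}
```

For example, `find_hourly_files()$site` should list one label per downloaded site; an error means the directory or file names do not match what the main code expects.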

Environment Canterbury's data tool exports data separately for each measurement site, so this code combines the site files into a single dataframe. It also parses the combined date-time field into separate date, time, and weekday columns, then writes both the full combined data and a smaller copy that drops the obsolete columns.
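The combine-and-parse steps can be sketched on toy data. The two data frames below are made-up stand-ins for site files (the values are not real measurements). The sketch uses base R only, so `rbind` stands in for `bind_rows` and assumes both sites share the same columns. It also assumes the raw timestamps use the `%d/%m/%Y %I:%M:%S %p` format parsed in the main code and are recorded in New Zealand local time.

```r
# Toy stand-ins for two per-site files (made-up values, for illustration only).
# Column names mirror those used in the main code block.
site_a <- data.frame(site_name = "Site A", contaminant = "PM10",
                     observation_date = c("01/07/2019 11:00:00 PM", "02/07/2019 12:00:00 AM"),
                     value = c(12.5, 14.1), stringsAsFactors = FALSE)
site_b <- data.frame(site_name = "Site B", contaminant = "PM10",
                     observation_date = "01/07/2019 11:00:00 PM",
                     value = 9.8, stringsAsFactors = FALSE)

# Combine the per-site frames; base rbind matches columns by name but needs
# identical column sets (dplyr::bind_rows also tolerates mismatched ones)
combined <- do.call(rbind, list(site_a, site_b))

# Parse in NZ time so the results don't depend on the machine's time zone
nz_time <- "Pacific/Auckland"
stamps <- as.POSIXct(combined$observation_date,
                     format = "%d/%m/%Y %I:%M:%S %p", tz = nz_time)
combined$obs_date <- as.Date(stamps, tz = nz_time)
combined$obs_time <- format(stamps, "%H:%M:%S", tz = nz_time)
combined$weekday  <- weekdays(combined$obs_date)

# Keep only the columns needed downstream
combined_smaller <- combined[, c("site_name", "contaminant", "value",
                                 "obs_date", "obs_time", "weekday")]
print(combined_smaller)
```

Note that "12:00:00 AM" parses to midnight at the start of the stated day, which is why the second Site A row lands on 2 July at `00:00:00`.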

**Source:** https://data.ecan.govt.nz/Catalogue/Method?MethodId=23#tab-data

**Note:** This block has been separated from the main code block as it takes a while to complete and only needs to be performed once. 
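One way to make the run-once step explicit is to skip the rebuild when the output file already exists. This is a minimal sketch; `run_once()` and its `build_fn` argument are hypothetical names, with `build_fn` standing in for the main code block below.

```r
# Hypothetical wrapper: rebuild the output only if it is missing, or if the
# caller forces a rebuild. `build_fn` receives the output path and writes it.
run_once <- function(output_file, build_fn, force = FALSE) {
  if (file.exists(output_file) && !force) {
    message("Using existing ", output_file)
    return(invisible(FALSE))  # nothing rebuilt
  }
  build_fn(output_file)
  invisible(TRUE)             # output was (re)built
}
```

Setting `force = TRUE` regenerates the file, for example after downloading newer data.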


```r
library(tidyverse)

# List the CSV files in "hourly_data" that carry the "hourly_" prefix.
# full.names = TRUE returns usable paths on any OS, so no "\\" separator is needed.
data_files = list.files(path = "hourly_data", pattern = "^hourly_.*\\.csv$", full.names = TRUE)

# Read each file and combine them into one data frame with bind_rows()
data_full = lapply(data_files, read.csv, header = TRUE) %>% bind_rows()

# Parse the timestamps with as.POSIXct() in New Zealand time, split them into
# date and time columns, then derive the weekday.
# Assumption: the raw timestamps are recorded in NZ local time. Without tz,
# as.POSIXct() would use the machine's own time zone.
nz_time = "Pacific/Auckland"
data_full$observation_date = as.POSIXct(data_full$observation_date, format = "%d/%m/%Y %I:%M:%S %p", tz = nz_time)
data_full$obs_date = as.Date(data_full$observation_date, tz = nz_time)
data_full$obs_time = format(data_full$observation_date, "%H:%M:%S", tz = nz_time)
data_full$weekday = weekdays(data_full$obs_date)

# Sort by site, date, contaminant, and time
data_full = data_full %>% arrange(site_name, obs_date, contaminant, obs_time)

# Write the full data set to the current working directory
write.csv(data_full, file = "full_hourly_data.csv", row.names = FALSE)

# Write a smaller copy that keeps only the relevant columns
data_smaller = data_full %>% select(site_name, contaminant, value, obs_date, obs_time, weekday)
write.csv(data_smaller, file = "full_hourly_smaller.csv", row.names = FALSE)
```
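CSV files do not store column types, so `obs_date` comes back as text when the smaller file is reloaded. The following is a minimal sketch for restoring it, assuming the file was written by `write.csv()` as above (which stores dates as `YYYY-MM-DD`) and that the times are NZ local time. `load_hourly_smaller()` and the `obs_datetime` column are hypothetical additions, not part of the original code.

```r
# Hypothetical loader: reload the smaller file and restore the column types
# lost in the CSV round trip.
load_hourly_smaller <- function(path = "full_hourly_smaller.csv",
                                tz = "Pacific/Auckland") {
  df <- read.csv(path, stringsAsFactors = FALSE)
  # write.csv() stored the dates as "YYYY-MM-DD" text
  df$obs_date <- as.Date(df$obs_date)
  # Rebuild a single timestamp from the separate date and time columns
  df$obs_datetime <- as.POSIXct(paste(df$obs_date, df$obs_time), tz = tz)
  df
}
```

Keeping `obs_time` as text mirrors the original output; `obs_datetime` is only there for analyses that need a proper time axis.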