In [1]:
knitr::opts_chunk$set(echo = TRUE)



# CliFlo weather data acquisition

This script was used to acquire the weather data integrated in this project. The following libraries are required.


In [2]:
library(tidyverse)
library(clifro)
library(feather)
library(sf)


── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.5     ✔ dplyr   1.0.7
✔ tidyr   1.1.4     ✔ stringr 1.4.0
✔ readr   2.0.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1



## Suburb and Region assignment

To integrate the weather data with the project's other datasets, each weather station needed a suburb column. This was attached using the functions below. The spatial data sets used were obtained from LINZ. The `st_intersection` function finds where geometries overlap; here it was used to find which suburb and region polygons contain each station's coordinates.


In [3]:
assignSuburb = function(df){
  #assigns lat and long to a suburb in the FENZ localities database
  #make sure coord columns are named lat lon
  #map data for fenz suburbs
  map.fenz = st_read("spatialData/lds-fire-and-emergency-nz-localities-SHP/fire-and-emergency-nz-localities.shp"
  )
  
  points = st_transform(st_as_sf(df, coords=c("lon","lat"), crs =4326),crs=2193) #transform to planar projection
  
  #intersect points
  intersect = st_intersection(points, map.fenz) %>% 
    st_drop_geometry() %>% 
    rename(SuburbName = suburb_4th, 
           SuburbID = id)
  left_join(df, intersect)
}


In [4]:
assignRegion = function(df){
  #assigns lat and long to a region in the LINZ land districts data
  #make sure coord columns are named lat lon
  
  #map data for regions
  map.lds = st_read("spatialData/lds-nz-land-districts-SHP/nz-land-districts.shp")
  
  points = st_transform(st_as_sf(df, coords=c("lon","lat"), crs =4326),crs=2193) #transform to planar projection
  
  #intersect points
  intersect = st_intersection(points, map.lds) %>% 
    st_drop_geometry() %>% 
    rename(RegionName = name,
           RegionID = id)
  left_join(df, intersect)
}
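
The matching step used by both functions above can be tried on toy data. The following is a minimal sketch, not the project's data: the two rectangular areas, the station coordinates and the helper `box` are all made up for illustration, and the only assumption carried over from the functions is that coordinates arrive as `lat`/`lon` columns in WGS84 (EPSG:4326) and are matched in NZTM2000 (EPSG:2193).

```r
library(sf)

# Hypothetical stations with the lat/lon column names the functions expect
stations = data.frame(Agent = c(1, 2),
                      lat = c(-43.5, -43.5),
                      lon = c(172.6, 173.5))

# Illustrative helper: a rectangle built from lon/lat bounds
box = function(xmin, xmax, ymin, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Two made-up areas standing in for the LINZ polygons, moved to NZTM2000
areas = st_sf(area = c("A", "B"),
              geometry = st_sfc(box(172, 173, -44, -43),
                                box(173, 174, -44, -43), crs = 4326))
areas = st_transform(areas, crs = 2193)

# Same steps as above: lon/lat columns -> point geometries -> planar CRS
pts = st_transform(st_as_sf(stations, coords = c("lon", "lat"), crs = 4326),
                   crs = 2193)

# Each point picks up the attributes of the polygon that contains it
matched = st_drop_geometry(st_intersection(pts, areas))
print(matched)
```

Note that `st_intersection` only returns points that fall inside a polygon, which is why the functions above join the result back onto the original data frame with `left_join`: stations outside every polygon are kept, with missing values in the added columns.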


## Collection

The following function collected weather data for weather stations in Canterbury. Data was collected from CliFlo using the `clifro` library.

### CliFlo user

`clifro` requires a CliFlo account. The `cf_user` function gives access using the account's username and password.

### Station list

The station list was downloaded from CliFlo manually. This is set out in Appendix B of the report.

### Datatype

The `dataTypes` object, created with `cf_datatype`, is a list of numbers that correspond to selections in the CliFlo menus.

### Query

`clifro` was halted when it attempted more than 10 queries in quick succession, and also when it tried to query more than 20 stations simultaneously. To avoid these issues, we broke the queries into parts. The retrieved data was written to file and processed in the next step.
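
The batching idea can be sketched in base R without contacting CliFlo. This is an illustration only: the agent numbers are invented, `runBatch` is a hypothetical placeholder for a real query, and the pause length is an assumption rather than a documented CliFlo requirement.

```r
# Hypothetical agent numbers standing in for the station list
agents = 1001:1054

# Group consecutive stations into batches of at most `batchSize`
batchSize = 10
batches = split(agents, ceiling(seq_along(agents) / batchSize))

# Placeholder for a real query; it only records what would be requested
runBatch = function(batch) {
  data.frame(Agent = batch, queried = TRUE)
}

# Run the batches one after another, pausing briefly between them
results = lapply(batches, function(batch) {
  out = runBatch(batch)
  Sys.sleep(0.1)  # pause between queries (the length is an assumption)
  out
})

# Stack the per-batch results into one data frame
combined = do.call(rbind, results)
```

With 54 stations and a batch size of 10 this gives six batches, the last holding four stations, which keeps each request under the 20-station limit described above.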


In [5]:
clifloCollect = function(){
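  #account credentials (username, password), read from environment variables
  #rather than hard-coded; CLIFLO_USER and CLIFLO_PASS are assumed names
  me = cf_user(Sys.getenv("CLIFLO_USER"), Sys.getenv("CLIFLO_PASS"))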
  
  #manually collected station list
  
  #fix station list - quick fix for broken station list. 
  # readLines("collectedData/clifloStationList-Active.csv") %>%
  #   str_replace(pattern = ", ", replacement = "") %>%
  #   writeLines("collectedData/clifloStationList-Active.csv")
  # read.csv("collectedData/clifloStationList-Active.csv") %>% as_tibble %>% 
  #   separate(col = Long.dec_deg., into = c("Long.dec_deg.","drop"), sep = " ") %>% 
  #   rename(lat = Lat.dec_deg., lon = Long.dec_deg.) %>% 
  #   select(-drop) %>% select(-`Dist_Km`) %>% 
  #   write.csv(file = "collectedData/clifloStationList-Active.csv")
  
  NZstations = read.csv("collectedData/clifloStationList-Active.csv")
  
  #add suburb column
  NZstations = NZstations %>% assignSuburb()
  
  #add region column
  NZstations = NZstations %>% assignRegion()
  
  #select stations to query
  cantStations = NZstations %>% filter(RegionName == "Canterbury")
  st = cf_station(cantStations[1, "Agent"])
  
  #select datatypes 
  dataTypes = cf_datatype(1,1,3,2,2)
  
  #setup: single-day query for one station to get the column layout used by rbind below
  cantWeather = cf_query(user = me, 
                         datatype = dataTypes,
                         station = st,
                         start_date = "2018-01-01 00",
                         end_date = "2018-01-01 00") %>% 
    as_tibble()
  cantWeather = cantWeather[-1,] #drop the first row
  
  # loop queries in batches of 10 stations to avoid the query limits
  queryCliflo = function(){
    
    #row indices of the 54 Canterbury stations, grouped as 1:10, 11:20, ..., 51:54
    batches = split(1:54, ceiling((1:54)/10))
    
    for (rows in batches){
      response = cf_query(user = me, 
                          datatype = dataTypes,
                          station = cf_station(cantStations[rows, "Agent"]),
                          start_date = "2018-01-01 00",
                          end_date = "2021-10-30 00") %>% as_tibble()
      
      #append this batch to the results collected so far
      cantWeather = rbind(cantWeather, response)
    }
    
    return(cantWeather)
  }
  
  cantWeather = queryCliflo()
  cantStations = cantStations %>% rename(Station = Name)
  
  write_feather(left_join(cantStations, cantWeather), "outputData/weatherFull-canterbury.feather")
}


## Clean up

The following function drops irrelevant columns and renames the remainder for joining with other datasets. Intermediate files were written in the `.feather` format for speed during development. Output files were written as `.csv` for broad compatibility.


In [6]:
cleanUp.export.WeatherData = function(){
  cantWeather = read_feather("outputData/weatherFull-canterbury.feather")
  
  # Trim and rearrange for tidy output
  cantWeather = cantWeather %>% 
    mutate(
      medianTemp.degC = (`Tmax(C)` + `Tmin(C)`)/2 #midpoint of daily max and min; easier metric to work with
    ) %>% 
    rename( #rename to remove brackets, dots for units
      windspeed.kmhr = `WSpd(km/hr)`,
      rain.mm = `Rain(mm)`,
      sunshine.hrs = `Sun(Hrs)`,
      date = `Day(Local_Date)`,
      TownCity = city_name,
      cityID =city_id
    ) %>% 
    select( #drop unrequired columns
      date,
      lat, lon,
      SuburbName, SuburbID,
      TownCity, cityID,
      RegionName, RegionID,
      windspeed.kmhr, medianTemp.degC, sunshine.hrs
    )
  write.csv(cantWeather, "outputData/weatherTrimmed-canterbury.csv") #final write output
  print(cantWeather)
}
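
The temperature metric and the renaming in the function above can be checked on a small example. This is a base R sketch with invented values; only the column names `Tmax(C)`, `Tmin(C)` and `Rain(mm)` are taken from the code above. Although the new column is called `medianTemp.degC`, the calculation is the midpoint of the daily maximum and minimum.

```r
# Invented readings using the bracketed column names returned by CliFlo
raw = data.frame(`Tmax(C)` = c(25.1, 19.4),
                 `Tmin(C)` = c(11.3, 8.0),
                 `Rain(mm)` = c(0, 2.4),
                 check.names = FALSE)  # keep the brackets in the names

# Midpoint of the daily maximum and minimum temperature
raw$medianTemp.degC = (raw[["Tmax(C)"]] + raw[["Tmin(C)"]]) / 2

# Replace a bracketed name with a plain one that carries the unit after a dot
names(raw)[names(raw) == "Rain(mm)"] = "rain.mm"

print(raw)
```

Names without brackets can be used in later code without backticks, which is the motivation for the `rename` step.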


## Main function

The main function below executed the previously defined functions. This style of programming was used to compartmentalize the different tasks. The code was originally developed in a single R script.


In [7]:
main = function(){
  clifloCollect()
  cleanUp.export.WeatherData()
}
main()


Reading layer `fire-and-emergency-nz-localities' from data source 
  `/Users/ll/Documents/MADS/DATA422/AOJCCYDPLL_Data422GroupProject/spatialData/lds-fire-and-emergency-nz-localities-SHP/fire-and-emergency-nz-localities.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 7375 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1089971 ymin: 4747979 xmax: 2470566 ymax: 6223164
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000


“attribute variables are assumed to be spatially constant throughout all geometries”
Joining, by = c("X", "Agent", "Network", "Start_Date", "End_Date", "Percent_Complete", "Name")



Reading layer `nz-land-districts' from data source 
  `/Users/ll/Documents/MADS/DATA422/AOJCCYDPLL_Data422GroupProject/spatialData/lds-nz-land-districts-SHP/nz-land-districts.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 12 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1066764 ymin: 4700195 xmax: 2523319 ymax: 6237682
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000


“attribute variables are assumed to be spatially constant throughout all geometries”
Joining, by = c("X", "Agent", "Network", "Start_Date", "End_Date", "Percent_Complete", "Name", "SuburbID", "parent_id", "SuburbName", "suburb_3rd", "suburb_2nd", "suburb_1st", "type_order", "type", "city_id", "city_name", "has_addres", "start_date", "end_date", "majorlocal", "majorloc_1")

connecting to CliFlo...

reading data...

UserName is = lpl251234
Total number of rows output = 7
Number of rows remaining in subscription = 437276
Copyright NIWA 2021 Subject to NIWA's Terms and Conditions
See: https://cliflo.niwa.co.nz/doc/terms.html
Comments to: cliflo@niwa.co.nz



connecting to CliFlo...

reading data...

UserName is = lpl251234
Total number of rows output = 40041
Note: The end date was revised to meet the maximum number of rows allowed per query [40000]
or due to running out of rows in your subscription. Also, one or more datatypes may have been disabled due to the above.
Number of rows remaini

# A tibble: 32,138 × 12
   date      lat   lon SuburbName  SuburbID TownCity  cityID RegionName RegionID
   <chr>   <dbl> <dbl> <chr>          <int> <chr>      <int> <chr>         <int>
 1 201801… -42.5  173. Hanmer Spr…      498 Hanmer S… 100031 Canterbury     1010
 2 201801… -42.5  173. Hanmer Spr…      498 Hanmer S… 100031 Canterbury     1010
 3 201801… -42.5  173. Hanmer Spr…      498 Hanmer S… 100031 Canterbury     1010
 4 201801… -42.5  173. Hanmer Spr…      498 Hanmer S… 100031 Canterbury     1010
[90m