## Introduction

TBA

## Preliminary Results

### R Preamble

In [48]:
# Run this first.
set.seed(5)

library(tidyverse)
library(tidymodels)
library(RColorBrewer)

### Loading and wrangling the dataset

The raw UBC and SFU datasets span different date ranges. Our analysis focuses mainly on the `min_temp_c` variable, which varies with the time of year, so comparing the two locations over mismatched date ranges could skew our results. To prevent this, we filter the wrangled dataset so that it spans only the intersection of the UBC and SFU date ranges (January 1 to June 30, 1995).

In [53]:
# Load and label the raw datasets for UBC and SFU, respectively.
ubc_raw <- read_csv("./ubc.csv", show_col_types = FALSE) |>
    mutate(location = as.factor("UBC"))

sfu_raw <- read_csv("./sfu.csv", show_col_types = FALSE) |>
    mutate(location = as.factor("SFU"))

# Stack the two datasets row-wise, keep the relevant variables, and drop
# rows with a missing minimum temperature.
ubc_sfu_raw <- bind_rows(ubc_raw, sfu_raw) |>
    rename(date_time = "Date/Time", min_temp_c = "Min Temp (°C)") |>
    select(location, date_time, min_temp_c) |>
    filter(!is.na(min_temp_c))

# Find the intersection of the date ranges: the latest start date and the
# earliest end date across the two locations.
date_range <- ubc_sfu_raw |>
    group_by(location) |>
    summarize(first_date = min(date_time), last_date = max(date_time)) |>
    ungroup() |>
    summarize(start = max(first_date), end = min(last_date))

# Keep only observations inside the shared date range (inclusive).
ubc_sfu_data <- ubc_sfu_raw |>
    filter(between(date_time, date_range$start, date_range$end))

# Preview the dataset.
head(ubc_sfu_data)
tail(ubc_sfu_data)

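First six rows (`head`):

| location `<fct>` | date_time `<date>` | min_temp_c `<dbl>` |
|------------------|--------------------|--------------------|
| UBC | 1995-01-01 | -3.5 |
| UBC | 1995-01-02 | -3.5 |
| UBC | 1995-01-03 | -4.5 |
| UBC | 1995-01-04 | -5.0 |
| UBC | 1995-01-05 | -4.0 |
| UBC | 1995-01-06 | -4.0 |

Last six rows (`tail`):

| location `<fct>` | date_time `<date>` | min_temp_c `<dbl>` |
|------------------|--------------------|--------------------|
| SFU | 1995-06-25 | 12 |
| SFU | 1995-06-26 | 12 |
| SFU | 1995-06-27 | 16 |
| SFU | 1995-06-28 | 19 |
| SFU | 1995-06-29 | 20 |
| SFU | 1995-06-30 | 21 |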
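
The date-range intersection used in the wrangling step can be sanity-checked on a small toy dataset. The sketch below is illustrative only: the `toy` tibble, the locations `A` and `B`, and the column names `first_day`, `last_day`, `start`, and `end` are hypothetical and not part of our data. It takes the latest start date and the earliest end date across locations, filters to that window, and then confirms that both locations cover the same dates.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: location A covers Jan 1-5, location B covers Jan 3-8.
toy <- tibble(
    location = factor(c(rep("A", 5), rep("B", 6))),
    date_time = c(
        seq(as.Date("1995-01-01"), as.Date("1995-01-05"), by = "day"),
        seq(as.Date("1995-01-03"), as.Date("1995-01-08"), by = "day")
    )
)

# The overlap runs from the latest start date to the earliest end date.
overlap <- toy |>
    group_by(location) |>
    summarize(first_day = min(date_time), last_day = max(date_time)) |>
    summarize(start = max(first_day), end = min(last_day))

# Keep only rows inside the shared window (between() is inclusive).
toy_overlap <- toy |>
    filter(between(date_time, overlap$start, overlap$end))

# Per-location check: both groups should share the same first and last day.
toy_check <- toy_overlap |>
    group_by(location) |>
    summarize(first_day = min(date_time), last_day = max(date_time), n_days = n())

toy_check
```

Running the same per-location summary on `ubc_sfu_data` is one way to confirm that the UBC and SFU observations begin and end on the same dates. The number of rows per location can still differ if either station has missing `min_temp_c` values inside the window.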
