# PREDICTING VANCOUVER WEATHER PROPOSAL

## Introduction 

Our analysis uses daily weather data for Vancouver, obtained from visualcrossing.com, recorded between 1 January 2022 and 28 February 2023. The dataset includes daily temperature, humidity, wind speed, precipitation cover, and other measurements. It also summarizes each day's conditions with an icon label indicating whether the day was clear, rainy, snowy, and so on. Vancouver is known for frequent rain throughout the year, but many days have no precipitation or a different type of precipitation. We will use this dataset to predict the precipitation type on a given day from other meteorological factors, including average temperature, humidity, and wind speed. Our question is therefore: what type of precipitation (or lack thereof) is expected on a given day in Vancouver, given the temperature, humidity, and wind speed?


Reference: https://www.visualcrossing.com/

## Methods
We will use temperature (in degrees Celsius), humidity, and wind speed to predict the expected precipitation type. Before working with the data, we will form rough expectations from common sense. For example, we expect less precipitation when the temperature is high, and more precipitation when the wind speed is high. We will then apply the classification functions and code from our lessons to predict the precipitation type for specific days in Vancouver. We also plan to create three scatterplots, with temperature, humidity, and cloud cover respectively on the y-axis and the date on the x-axis, and to label each point by its precipitation type (for example, by colour).
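
The planned scatterplots could be drawn along the following lines. This is a minimal sketch rather than our final figure: it uses four rows copied from the dataset preview in the exploratory section, the names `weather_sample`, `weather_long`, and `weather_plot` are hypothetical, and it stacks the three variables as panels of one figure instead of producing three separate plots.

```r
library(tidyverse)

# Four rows copied from the dataset preview (not the full dataset)
weather_sample <- tribble(
  ~datetime,    ~temp, ~humidity, ~cloudcover, ~weather,
  "2020-10-31",   7.3,      83.8,        47.8, "partly-cloudy-day",
  "2020-11-03",  10.4,      96.3,        90.2, "rain",
  "2020-11-08",   4.6,      56.9,        13.0, "clear-day",
  "2020-11-09",   3.6,      81.7,        85.8, "rain"
) |>
  mutate(datetime = as.Date(datetime))

# Reshape so that each variable becomes its own panel
weather_long <- weather_sample |>
  pivot_longer(c(temp, humidity, cloudcover),
               names_to = "variable", values_to = "value")

# Date on the x-axis, one panel per variable, points coloured by weather type
weather_plot <- ggplot(weather_long,
                       aes(x = datetime, y = value, colour = weather)) +
  geom_point() +
  facet_wrap(~variable, ncol = 1, scales = "free_y") +
  labs(x = "Date", y = "Daily value", colour = "Weather type")

weather_plot
```

With the full dataset, `weather_sample` would be replaced by the selected weather columns, and `scales = "free_y"` keeps each variable on its own axis range.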


## Preliminary exploratory data analysis

In [2]:
library(repr)
library(tidyverse)
library(tidymodels)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──

✔ broom        1.0.0     ✔ rsample      1.0.0
✔ dials        1.0.0     ✔ tune         1.0.0
✔ infer        1.0.2     ✔ workflows    1.0.0

In [39]:
# read data as tibble
weather_data <- read_csv("data/vancouver_weather.csv")
weather_data



Rows: 876 Columns: 33
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (6): name, preciptype, conditions, description, icon, stations
dbl  (24): tempmax, tempmin, temp, feelslikemax, feelslikemin, feelslike, de...
dttm  (2): sunrise, sunset
date  (1): datetime

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| name | datetime | tempmax | tempmin | temp | feelslikemax | feelslikemin | feelslike | dew | humidity | ⋯ | solarenergy | uvindex | severerisk | sunrise | sunset | moonphase | conditions | description | icon | stations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<date>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dttm>` | `<dttm>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| vancouver | 2020-10-31 | 10.3 | 4.7 | 7.3 | 10.3 | 4.6 | 7.2 | 4.7 | 83.8 | ⋯ | 7.6 | 4 |  | 2020-10-31 07:59:48 | 2020-10-31 17:51:33 | 0.50 | Partially cloudy | Partly cloudy throughout the day. | partly-cloudy-day | 71608099999,CWWA,71784099999,71892099999,71042099999,71201099999,F1856 |
| vancouver | 2020-11-01 | 12.6 | 4.5 | 7.5 | 12.6 | 3.1 | 7.3 | 5.4 | 86.8 | ⋯ | 8.6 | 4 |  | 2020-11-01 07:01:25 | 2020-11-01 16:49:54 | 0.54 | Partially cloudy | Partly cloudy throughout the day. | partly-cloudy-day | 71608099999,CWWA,71784099999,71892099999,71042099999,71201099999,F1856 |
| vancouver | 2020-11-02 | 13.3 | 5.2 | 10.1 | 13.3 | 5.2 | 10.0 | 6.7 | 80.4 | ⋯ | 6.6 | 4 |  | 2020-11-02 07:03:02 | 2020-11-02 16:48:16 | 0.58 | Partially cloudy | Partly cloudy throughout the day. | partly-cloudy-day | 71608099999,CWWA,71784099999,71892099999,71042099999,71201099999,F1856 |
| vancouver | 2020-11-03 | 11.1 | 9.6 | 10.4 | 11.1 | 8.6 | 10.3 | 9.8 | 96.3 | ⋯ | 0.5 | 0 |  | 2020-11-03 07:04:39 | 2020-11-03 16:46:40 | 0.61 | Rain, Overcast | Cloudy skies throughout the day with a chance of rain throughout the day. | rain | 71608099999,CWWA,71784099999,D3147,E9431,71892099999,71042099999,71201099999,F1856 |
| vancouver | 2020-11-04 | 14.7 | 11.7 | 13.4 | 14.7 | 11.7 | 13.4 | 12.6 | 95.3 | ⋯ | 1.0 | 1 |  | 2020-11-04 07:06:16 | 2020-11-04 16:45:06 | 0.64 | Rain, Overcast | Cloudy skies throughout the day with a chance of rain throughout the day. | rain | 71608099999,CWWA,71784099999,71892099999,71042099999,71201099999,F1856 |
| vancouver | 2020-11-05 | 14.3 | 6.6 | 10.1 | 14.3 | 6.2 | 10.1 | 6.5 | 78.6 | ⋯ | 3.3 | 2 |  | 2020-11-05 07:07:53 | 2020-11-05 16:43:34 | 0.68 | Rain, Partially cloudy | Partly cloudy throughout the day with early morning rain. | rain | 71608099999,CWWA,71784099999,CWSK,CWEL,CWEZ,CWWK,71892099999,CWMM,71042099999,71201099999,F1856 |
| vancouver | 2020-11-06 | 9.3 | 3.8 | 6.7 | 9.0 | 0.4 | 4.7 | 2.9 | 76.9 | ⋯ | 8.1 | 4 |  | 2020-11-06 07:09:31 | 2020-11-06 16:42:03 | 0.71 | Partially cloudy | Partly cloudy throughout the day. | partly-cloudy-day | 71608099999,CWWA,71784099999,71892099999,71042099999,71201099999,F1856 |
| vancouver | 2020-11-07 | 9.1 | 2.5 | 5.7 | 9.1 | -0.5 | 4.2 | -0.3 | 69.1 | ⋯ | 5.7 | 3 |  | 2020-11-07 07:11:08 | 2020-11-07 16:40:34 | 0.75 | Partially cloudy | Partly cloudy throughout the day. | partly-cloudy-day | 71608099999,CWWA,71784099999,71892099999,71042099999,71201099999,F1856 |
| vancouver | 2020-11-08 | 8.4 | 1.8 | 4.6 | 8.3 | -1.4 | 4.1 | -3.6 | 56.9 | ⋯ | 7.8 | 4 |  | 2020-11-08 07:12:44 | 2020-11-08 16:39:07 | 0.75 | Clear | Clear conditions throughout the day. | clear-day | 71608099999,CWWA,71784099999,71892099999,71042099999,71201099999,F1856 |
| vancouver | 2020-11-09 | 5.0 | 1.6 | 3.6 | 4.9 | -0.4 | 2.1 | 0.7 | 81.7 | ⋯ | 1.7 | 1 |  | 2020-11-09 07:14:21 | 2020-11-09 16:37:42 | 0.82 | Rain, Partially cloudy | Partly cloudy throughout the day with rain. | rain | 71608099999,CWWA,F1130,71784099999,71892099999,71042099999,71201099999,F1856 |


In [42]:
# keep only the variables relevant to our predictions and rename icon to weather
weather_selected <- weather_data |>
    select(datetime, temp, feelslike, humidity, sealevelpressure, windspeed, cloudcover, icon) |>
    rename(weather = icon)
weather_selected

| datetime | temp | feelslike | humidity | sealevelpressure | windspeed | cloudcover | weather |
|---|---|---|---|---|---|---|---|
| `<date>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| 2020-10-31 | 7.3 | 7.2 | 83.8 | 1029.0 | 6.3 | 47.8 | partly-cloudy-day |
| 2020-11-01 | 7.5 | 7.3 | 86.8 | 1024.2 | 7.5 | 29.9 | partly-cloudy-day |
| 2020-11-02 | 10.1 | 10.0 | 80.4 | 1017.9 | 8.4 | 63.8 | partly-cloudy-day |
| 2020-11-03 | 10.4 | 10.3 | 96.3 | 1011.2 | 13.8 | 90.2 | rain |
| 2020-11-04 | 13.4 | 13.4 | 95.3 | 1011.9 | 15.5 | 90.9 | rain |
| 2020-11-05 | 10.1 | 10.1 | 78.6 | 1015.5 | 14.4 | 82.9 | rain |
| 2020-11-06 | 6.7 | 4.7 | 76.9 | 1012.2 | 20.7 | 36.4 | partly-cloudy-day |
| 2020-11-07 | 5.7 | 4.2 | 69.1 | 1008.9 | 16.3 | 30.6 | partly-cloudy-day |
| 2020-11-08 | 4.6 | 4.1 | 56.9 | 1018.5 | 14.5 | 13.0 | clear-day |
| 2020-11-09 | 3.6 | 2.1 | 81.7 | 1020.6 | 14.2 | 85.8 | rain |
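
Because `weather` is the label we want to predict, it helps to check how many days fall into each class before modelling. The sketch below does this for the ten previewed rows only. The names `preview_labels` and `class_counts` are hypothetical; in the actual analysis the same `count()` call would presumably be applied to `weather_selected`.

```r
library(tidyverse)

# Weather labels copied from the ten previewed rows (not the full dataset)
preview_labels <- tibble(
  weather = c("partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day",
              "rain", "rain", "rain",
              "partly-cloudy-day", "partly-cloudy-day", "clear-day", "rain")
)

# Count the days in each class and express each count as a percentage
class_counts <- preview_labels |>
  count(weather, name = "n_days") |>
  mutate(percent = 100 * n_days / sum(n_days))

class_counts
```

A strongly unbalanced result on the full data would be worth knowing early, since rare classes are harder for a classifier to learn.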


In [45]:
# split data into training and testing sets
weather_split <- initial_split(weather_selected, prop = 0.75, strata = weather)
weather_training <- training(weather_split)
weather_testing <- testing(weather_split)


| datetime | temp | feelslike | humidity | sealevelpressure | windspeed | cloudcover | weather |
|---|---|---|---|---|---|---|---|
| `<date>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| 2020-11-01 | 7.5 | 7.3 | 86.8 | 1024.2 | 7.5 | 29.9 | partly-cloudy-day |
| 2020-11-21 | 6.2 | 6.1 | 90.6 | 1029.5 | 7.2 | 82.6 | partly-cloudy-day |
| 2020-12-01 | 4.8 | 4.4 | 85.3 | 1035.2 | 10.1 | 21.8 | partly-cloudy-day |
| 2020-12-02 | 4.4 | 4.4 | 92.7 | 1025.1 | 6.1 | 20.2 | partly-cloudy-day |
| 2020-12-03 | 5.9 | 5.8 | 91.7 | 1026.2 | 7.0 | 80.2 | partly-cloudy-day |
| 2020-12-04 | 7.3 | 7.3 | 85.6 | 1030.2 | 6.0 | 33.7 | partly-cloudy-day |
| 2020-12-05 | 6.8 | 6.4 | 83.3 | 1018.9 | 8.0 | 35.8 | partly-cloudy-day |
| 2020-12-23 | 2.8 | 1.5 | 87.7 | 1032.5 | 9.6 | 65.1 | partly-cloudy-day |
| 2021-01-07 | 5.5 | 4.1 | 89.0 | 1022.3 | 14.0 | 79.6 | partly-cloudy-day |
| 2021-01-18 | 5.1 | 4.9 | 88.6 | 1035.5 | 9.5 | 32.4 | partly-cloudy-day |
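
This proposal does not yet fix a particular classifier, so the following is only a sketch under the assumption that we use K-nearest neighbours through tidymodels (which needs the kknn package installed). It is fitted to the ten rows from the `weather_selected` preview rather than to the real training set, and `weather_sample`, `new_day`, and `neighbors = 3` are illustrative placeholders, not tuned or final choices.

```r
library(tidyverse)
library(tidymodels)

# Ten rows copied from the weather_selected preview (not the full dataset)
weather_sample <- tribble(
  ~temp, ~humidity, ~windspeed, ~weather,
    7.3,      83.8,        6.3, "partly-cloudy-day",
    7.5,      86.8,        7.5, "partly-cloudy-day",
   10.1,      80.4,        8.4, "partly-cloudy-day",
   10.4,      96.3,       13.8, "rain",
   13.4,      95.3,       15.5, "rain",
   10.1,      78.6,       14.4, "rain",
    6.7,      76.9,       20.7, "partly-cloudy-day",
    5.7,      69.1,       16.3, "partly-cloudy-day",
    4.6,      56.9,       14.5, "clear-day",
    3.6,      81.7,       14.2, "rain"
) |>
  mutate(weather = as_factor(weather))  # classification needs a factor outcome

# Standardize the predictors so that no single scale dominates the distance
weather_recipe <- recipe(weather ~ temp + humidity + windspeed,
                         data = weather_sample) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())

# neighbors = 3 is an arbitrary placeholder, not a tuned value
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 3) |>
  set_engine("kknn") |>
  set_mode("classification")

knn_fit <- workflow() |>
  add_recipe(weather_recipe) |>
  add_model(knn_spec) |>
  fit(data = weather_sample)

# Predict the weather type for one hypothetical day
new_day <- tibble(temp = 9, humidity = 90, windspeed = 14)
weather_prediction <- predict(knn_fit, new_day)
weather_prediction
```

In the real analysis we would fit on `weather_training`, choose the number of neighbours by cross-validation, and measure accuracy on `weather_testing`. Calling `set.seed()` before `initial_split()` would also make the split reproducible.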


## Expected outcomes and significance
We expect to be able to reasonably predict the type of precipitation on a given day in Vancouver, British Columbia. Although weather can be unpredictable, we believe that a model given other meteorological indicators may be able to predict the precipitation type. A successful model could summarize a day's precipitation type using only a few indicators. It could serve as a backup tool, or as a general model that is scaled up to tell people the overall weather for a day, for example as part of a weather app. This leads to future questions about how localized the model is and how feasible it would be to apply it to other locations.
