# Group Proposal - Predicting Following Day Rainfall in Australia


## Introduction

#### Background

Meteorological data has been collected and used to predict weather conditions for as long as we have had the means to do so (since 1869). Forecasts help people plan their day and prepare for the expected weather. Predicting a rainy day requires considering many variables, such as wind, humidity, and temperature.

#### Central Question

Will it rain tomorrow for a given region in Australia based on a set of meteorological characteristics?

#### Dataset

The dataset we will use is the “Rain in Australia” dataset by Joe Young and Adam Young. It contains meteorological data collected by weather stations in various regions of Australia over nearly 10 years, from 2007-10-31 to 2017-06-24. The variables include weather conditions (wind speed, wind direction, and temperature) as well as the amount of rainfall on any given day.

## Preliminary Exploratory Data Analysis

To help choose our predictor variables, we count how many valid (non-NA) observations each column contains. Columns with more valid data give us a larger sample, which reduces the impact of factors such as random error in the observation process and improves the overall quality of the analysis. We plan to restrict the analysis to three major cities (Sydney, Melbourne, and Canberra); for this preliminary count, however, the city filter is disabled so that all locations are included, and the categorical variables are removed.

```r
# Load the tidyverse
library(tidyverse)
```

```r
# Load the data into R
weather_data_raw <- read_csv("weatherAUS.csv")
```

```
Rows: 145460 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (6): Location, WindGustDir, WindDir9am, WindDir3pm, RainToday, RainTom...
dbl  (16): MinTemp, MaxTemp, Rainfall, Evaporation, Sunshine, WindGustSpeed,...
date  (1): Date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```


```r
# Select three cities for analysis (the filter is commented out for now,
# so every location is counted) and drop the categorical and date columns
options(repr.plot.width = 14, repr.plot.height = 8)
numeric_weather <- weather_data_raw |>
    # filter(Location %in% c("Sydney", "Melbourne", "Canberra")) |>
    select(-WindGustDir, -WindDir9am, -WindDir3pm, -RainToday, -RainTomorrow, -Date, -Location)

# Count the non-NA cells in each column, convert the result to a data frame
# with one row per measurement, and sort from most to least complete
output <- as.data.frame(colSums(!is.na(numeric_weather)))
output <- cbind(rownames(output), output)
rownames(output) <- NULL
colnames(output) <- c("measurement", "count")
output <- arrange(output, desc(count))
output
```

| measurement | count |
|---|---|
| MaxTemp | 144199 |
| MinTemp | 143975 |
| WindSpeed9am | 143693 |
| Temp9am | 143693 |
| Humidity9am | 142806 |
| WindSpeed3pm | 142398 |
| Rainfall | 142199 |
| Temp3pm | 141851 |
| Humidity3pm | 140953 |
| WindGustSpeed | 135197 |
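The same count can be wrapped in a reusable helper that keeps the numeric columns automatically, instead of listing the categorical ones by name, and also reports the fraction of rows that are non-NA. This is a base R sketch; the helper name `completeness_table` and the `fraction` column are our own illustrative additions, and the toy data frame is made up rather than taken from `weatherAUS.csv`.

```r
# Hypothetical helper: rank numeric columns by how many non-NA values they hold.
completeness_table <- function(df) {
  # Keep numeric columns only; is.numeric() is FALSE for character,
  # factor and Date columns, so those are dropped automatically
  numeric_cols <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  # Number of non-NA cells in each remaining column
  non_na <- colSums(!is.na(numeric_cols))
  result <- data.frame(
    measurement = names(non_na),
    count = unname(non_na),
    fraction = unname(non_na) / nrow(df),
    stringsAsFactors = FALSE
  )
  # Most complete columns first
  result <- result[order(result$count, decreasing = TRUE), ]
  rownames(result) <- NULL
  result
}

# Made-up toy rows for illustration only, not real observations
toy <- data.frame(
  Location = c("Sydney", "Canberra", "Melbourne"),
  MaxTemp = c(25, NA, 18),
  Rainfall = c(0, 0.4, 3.0),
  RainTomorrow = c("No", "Yes", NA),
  stringsAsFactors = FALSE
)
tab <- completeness_table(toy)
tab  # Rainfall (3 non-NA values) ranks above MaxTemp (2)
```

Applied to `weather_data_raw`, it should give the same counts as the exploratory code above, since both keep only the numeric (`dbl`) columns.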


## Methods

#### Explain how you will conduct your data analysis and which variables/columns you will use

The columns we plan to use are those that quantify the day’s weather and have the fewest NA observations: minimum temperature, maximum temperature, rainfall, humidity, and wind speed. We confirmed this by counting the non-NA values in each column with `colSums(!is.na(...))`, as shown in the table above.
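A minimal sketch of this selection step, written in base R so it runs without the tidyverse (the dplyr/tidyr equivalents would be `filter()`, `select()` and `tidyr::drop_na()`). The helper name `prepare_weather`, the exact column list (using the 9am and 3pm versions of humidity and wind speed), and the choice to drop every row that still has an NA are assumptions for illustration; the three cities come from the commented-out filter in the exploratory code.

```r
# Hypothetical helper: keep the three cities and the low-NA predictors,
# then drop any row that still has a missing value.
prepare_weather <- function(weather,
                            cities = c("Sydney", "Melbourne", "Canberra"),
                            predictors = c("MinTemp", "MaxTemp", "Rainfall",
                                           "Humidity9am", "Humidity3pm",
                                           "WindSpeed9am", "WindSpeed3pm")) {
  # Keep only rows from the chosen cities
  in_cities <- weather[weather$Location %in% cities, , drop = FALSE]
  # Keep the location, the predictors, and the response variable
  kept <- in_cities[, c("Location", predictors, "RainTomorrow"), drop = FALSE]
  # complete.cases() is TRUE only for rows with no NA in any kept column
  result <- kept[complete.cases(kept), , drop = FALSE]
  # Store the response as a factor, as classification methods expect
  result$RainTomorrow <- factor(result$RainTomorrow, levels = c("No", "Yes"))
  rownames(result) <- NULL
  result
}

# Made-up toy rows for illustration only, not real observations
toy <- data.frame(
  Location = c("Sydney", "Perth", "Canberra", "Melbourne"),
  MinTemp = c(15, 12, NA, 9),
  MaxTemp = c(25, 22, 20, 18),
  Rainfall = c(0, 1.2, 0.4, 3.0),
  Humidity9am = c(70, 60, 80, 90),
  Humidity3pm = c(50, 40, 60, 75),
  WindSpeed9am = c(10, 8, 6, 12),
  WindSpeed3pm = c(20, 15, 14, 22),
  RainTomorrow = c("No", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)
prepared <- prepare_weather(toy)
nrow(prepared)  # 2: Perth is filtered out and the Canberra row has an NA MinTemp
```

Dropping incomplete rows is the simplest option; because the chosen columns have the fewest NAs, it should discard relatively few observations.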


#### Describe at least one way that you will visualize the results

## Expected Outcomes and Significance

#### Expected Outcomes

We expect to build a model that uses a given day’s meteorological measurements to predict whether it will rain the following day. The expected outcome is a model that accurately predicts the occurrence of rain on the next day.
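One way to make “accurately predicts” measurable is to compare predicted and observed `RainTomorrow` labels with a confusion matrix and the overall accuracy. The sketch below is a hypothetical base R helper, `evaluate_predictions`, applied to made-up labels rather than real model output. It also reports a majority-class baseline (the accuracy of always predicting the most common label), which we assume is a useful reference because a model is only informative if it beats that baseline.

```r
# Hypothetical helper: compare predicted and observed RainTomorrow labels.
evaluate_predictions <- function(predicted, observed) {
  # Use the same "No"/"Yes" levels so the labels are directly comparable
  predicted <- factor(predicted, levels = c("No", "Yes"))
  observed <- factor(observed, levels = c("No", "Yes"))
  # Rows are predicted labels, columns are observed labels
  confusion <- table(predicted = predicted, observed = observed)
  # Proportion of days classified correctly
  accuracy <- mean(predicted == observed)
  # Accuracy of always predicting the most common observed label
  baseline <- max(prop.table(table(observed)))
  list(confusion = confusion, accuracy = accuracy, baseline = baseline)
}

# Made-up labels for illustration only, not model output
observed  <- c("No", "No", "Yes", "No", "Yes")
predicted <- c("No", "No", "Yes", "No", "No")
result <- evaluate_predictions(predicted, observed)
result$accuracy  # 0.8
result$baseline  # 0.6
```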

#### Significance of Investigation

The significance of this analysis lies in the immense impact that weather, and rain in particular, has on society. Being able to predict rain is not only beneficial for day-to-day life but essential for industries such as agriculture, tourism, and urban development.

#### Extended/Further Questions

Further investigation of precipitation could explore how rain patterns have changed over the last decade. It could also examine how accurately our model, built on data from 2007 to 2017, predicts rain under present-day conditions.