Weather affects everyone on earth, for better or worse, and we've come to rely on forecasts to plan our days. But how is the weather actually predicted? In this post we'll build a machine learning model based on recurrent neural networks and see how well it can predict the weather.

As in previous posts, we'll go through the following steps (typical of any machine learning project):
1. Data exploration & analysis
2. Build a model
3. Train the model
4. Evaluate the model


In [3]:
import pandas as pd
import plotly
import plotly.express as px
from IPython.display import HTML
import torch
import numpy as np

# Data Exploration & Analysis

We'll be using daily rainfall records for Newcastle, NSW, retrieved from the Australian Bureau of Meteorology. The data can be downloaded at: http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=061055

In [23]:
rainfall = pd.read_csv('data/IDCJAC0009_061055_1800_Data.csv')
print(rainfall.head(1))

  Product code  Bureau of Meteorology station number  Year  Month  Day  \
0   IDCJAC0009                                 61055  1862      1    1   

   Rainfall amount (millimetres)  \
0                            0.0   

   Period over which rainfall was measured (days) Quality  
0                                             NaN       Y  


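Let's clean it up: combine the `Year`, `Month` and `Day` columns into a single timestamp, drop the columns we don't need, and give the remaining ones shorter names.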

In [24]:
# Combine Year/Month/Day into one datetime column, then drop the columns we no longer need
rainfall['timestamp'] = pd.to_datetime(rainfall[['Year', 'Month', 'Day']])
rainfall = rainfall.drop(columns=['Product code', 'Bureau of Meteorology station number', 'Year', 'Month', 'Day',
                                  'Period over which rainfall was measured (days)'])
# Shorten the remaining column names
rainfall = rainfall.rename(columns={"Rainfall amount (millimetres)": "rainfall", "Quality": "quality"})
rainfall.head()

   rainfall quality  timestamp
0       0.0       Y 1862-01-01
1       0.0       Y 1862-01-02
2       0.0       Y 1862-01-03
3       0.0       Y 1862-01-04
4       0.0       Y 1862-01-05
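
Before going further, it's worth checking how many readings are missing and which quality flags appear in the data. The snippet below is only a sketch: `sample` is a made-up stand-in for the cleaned `rainfall` frame, and `summarise_quality` is a hypothetical helper rather than part of the original notebook. What each flag means should be checked against the Bureau's own documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned `rainfall` frame (values are made up).
sample = pd.DataFrame({
    "rainfall": [0.0, 1.2, np.nan, 5.4, 0.0],
    "quality": ["Y", "Y", np.nan, "N", "Y"],
    "timestamp": pd.date_range("1862-01-01", periods=5, freq="D"),
})

def summarise_quality(df):
    """Return the number of missing rainfall readings and the count of each quality flag."""
    missing = int(df["rainfall"].isna().sum())
    # dropna=False keeps rows with no quality flag in the tally
    flag_counts = df["quality"].value_counts(dropna=False)
    return missing, flag_counts

missing, flag_counts = summarise_quality(sample)
print(f"Missing rainfall readings: {missing}")
print(flag_counts)
```

On the real data the same helper could be called as `summarise_quality(rainfall)`.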


In [27]:
print(f"First date: {rainfall.timestamp.min()}, last date: {rainfall.timestamp.max()}")

First date: 1862-01-01 00:00:00, last date: 2022-04-07 00:00:00
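
A record this long may well have gaps, and a recurrent model generally assumes evenly spaced time steps, so it helps to know whether any days are absent. Here is a minimal sketch of one way to check, using made-up timestamps; `missing_days` is a hypothetical helper, not something from the original notebook.

```python
import pandas as pd

# Hypothetical timestamps with one day (1862-01-03) deliberately left out.
timestamps = pd.to_datetime(["1862-01-01", "1862-01-02", "1862-01-04", "1862-01-05"])

def missing_days(ts):
    """Return the calendar days between the first and last timestamp that are absent from ts."""
    full_range = pd.date_range(ts.min(), ts.max(), freq="D")
    return full_range.difference(pd.DatetimeIndex(ts))

gaps = missing_days(timestamps)
print(f"{len(gaps)} missing day(s): {list(gaps.strftime('%Y-%m-%d'))}")
```

On the real data this could be run as `missing_days(rainfall.timestamp)`.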
