In [1]:
# Import necessary libraries
import os
import torch
import numpy as np
import pandas as pd
import seaborn as sns
from torch import nn, optim
from datetime import datetime
import matplotlib.pyplot as plt
from sklearn import preprocessing
import torch.nn.functional as func
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report





1. Deep Neural Network Concepts (1 hour)
   - Research and summarize key concepts of deep neural networks, focusing on architectures relevant to time-series data (e.g., LSTM, GRU).
   - Explain the advantages of deep neural networks in processing complex patterns in data.

## GRU vs LSTM
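Both GRU and LSTM are variants of the recurrent neural network (RNN), which feeds its own hidden state back into itself at each time step in order to learn from temporal data. Plain RNNs suffer from two main problems:
* Vanishing gradient problem: gradients shrink as they are propagated back through many time steps, so early steps barely influence learning.
* Short-term memory problem: as a result, the network struggles to retain information from far back in the sequence.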
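
To make the vanishing gradient problem concrete: backpropagating through many time steps multiplies the gradient by one factor per step, so factors below 1 shrink it geometrically. The sketch below assumes a constant per-step factor of 0.9, which is a made-up value chosen purely for illustration; in a real RNN the factor depends on the weights and activations at each step.

```python
# Illustrative only: assume every time step scales the gradient by the same factor.
factor = 0.9  # hypothetical per-step scaling, not measured from any model

def gradient_scale(steps, per_step=factor):
    """Return how much a gradient shrinks after `steps` multiplications."""
    return per_step ** steps

for steps in (1, 10, 50, 100):
    print(f"{steps:>3} steps back: gradient scaled by {gradient_scale(steps):.2e}")
```

Under this assumption, the signal from 100 steps back is several orders of magnitude weaker than the signal from the previous step, which is the behaviour that gated units are designed to counter.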

### GRU
A GRU has two gates that control the flow of information:
* update - This gate determines how much information we pass along from the previous state. This is helpful for modelling rain because the weather of the day before has a sizable impact on the weather of the following day.
* reset - This gate decides how much of the previous state we ignore when computing the new candidate state.
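
The two gates can be sketched by computing one GRU step by hand and comparing it with PyTorch's `nn.GRUCell`. This is a minimal illustration, not part of the rain model: the hidden size of 4 is arbitrary, the inputs are random stand-ins, and the equations follow the convention used by PyTorch, where an update value near 1 keeps the old state.

```python
import torch
from torch import nn

torch.manual_seed(0)
n_features, hidden = 11, 4  # 11 matches the notebook's feature count; 4 is arbitrary

cell = nn.GRUCell(n_features, hidden)
x = torch.randn(1, n_features)   # one day's features (random stand-in)
h_prev = torch.randn(1, hidden)  # previous hidden state (random stand-in)

# PyTorch stacks the reset, update and candidate weights in that order.
w_ir, w_iz, w_in = cell.weight_ih.chunk(3, dim=0)
w_hr, w_hz, w_hn = cell.weight_hh.chunk(3, dim=0)
b_ir, b_iz, b_in = cell.bias_ih.chunk(3)
b_hr, b_hz, b_hn = cell.bias_hh.chunk(3)

with torch.no_grad():
    reset = torch.sigmoid(x @ w_ir.T + b_ir + h_prev @ w_hr.T + b_hr)
    update = torch.sigmoid(x @ w_iz.T + b_iz + h_prev @ w_hz.T + b_hz)
    # The reset gate scales how much of the previous state feeds the candidate.
    candidate = torch.tanh(x @ w_in.T + b_in + reset * (h_prev @ w_hn.T + b_hn))
    # The update gate blends the previous state with the candidate.
    h_manual = (1 - update) * candidate + update * h_prev
    h_torch = cell(x, h_prev)

print(torch.allclose(h_manual, h_torch, atol=1e-6))
```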
### LSTM
An LSTM has three gates that control the flow of information:
* input - Decides what new information will be stored in long-term memory (the cell state)
* forget - Decides which information from long-term memory should be kept or discarded
* output - Uses the current input and the previous short-term memory (the hidden state) to decide how much of the updated long-term memory is exposed as the new short-term memory
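
One practical consequence of the gate counts is model size. A GRU layer holds three weight blocks (two gates plus the candidate state) while an LSTM layer holds four (three gates plus the candidate cell state). The sketch below counts parameters for both, using the same sizes as the model in this notebook; the helper `count_parameters` is a name introduced here for illustration.

```python
from torch import nn

def count_parameters(module):
    """Total number of trainable values in a module."""
    return sum(p.numel() for p in module.parameters())

n_features, hidden = 11, 50  # same sizes as the RainModel in this notebook

gru_layer = nn.GRU(input_size=n_features, hidden_size=hidden, batch_first=True)
lstm_layer = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)

# Each block has input weights, recurrent weights and two bias vectors.
per_block = hidden * (n_features + hidden) + 2 * hidden

print("GRU :", count_parameters(gru_layer), "expected", 3 * per_block)
print("LSTM:", count_parameters(lstm_layer), "expected", 4 * per_block)
```

At these sizes the GRU is about a quarter smaller than the LSTM, which is one reason it is sometimes preferred when training time matters.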

Deep neural networks have the advantage of learning complex, non-linear patterns directly from large amounts of data, and they generally keep improving as more representative data becomes available. More data is not a guarantee on its own, though: the gains also depend on data quality, model capacity, and how well the model is regularized.

2. Model Development with PyTorch (2 hours)
   - Design and implement a deep neural network model using PyTorch, appropriate for the time-series nature of the dataset (consider LSTM or GRU networks).
   - Integrate layers like dropout or batch normalization for model optimization.

In [2]:
# LOAD DATA AND MAKE IT USABLE FOR PYTORCH
url = "./weatherAUS.csv"

column_names = ["Date", "Location", "MinTemp", "MaxTemp", "Rainfall", "Evaporation", "Sunshine", "WindGustDir", "WindGustSpeed", "WindDir9am"]
feature_cols = []
data = pd.read_csv(url)

#remove rows with NaN
data = data.dropna()

    
# Had help from https://www.kaggle.com/code/data13/recurrent-neural-network-for-rain-forecasting
# I was super lost on how to set this data up

#reformat date column
data['Date'] = data['Date'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d'))
data['Year'] = data['Date'].dt.year
data['Month'] = data['Date'].dt.month
data['Day'] = data['Date'].dt.day

#encode location
le = preprocessing.LabelEncoder()
data['Location'] = le.fit_transform(data['Location'])

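# encode RainToday & RainTomorrow
# (assign the result back; calling replace(..., inplace=True) on a selected column may not update the DataFrame in newer pandas versions)
data['RainToday'] = data['RainToday'].map({'No': 0, 'Yes': 1})
data['RainTomorrow'] = data['RainTomorrow'].map({'No': 0, 'Yes': 1})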

#normalize
normalized = data.copy()
for feature_name in data.select_dtypes(include=['int', 'float']).columns:
    max_value = data[feature_name].max()
    min_value = data[feature_name].min()
    normalized[feature_name] = (data[feature_name] - min_value) / (max_value - min_value)

#encode categorical
hotdata = pd.get_dummies(normalized)

features = list(hotdata.columns)

#print(features)


chosen_features = ["Temp9am", "MinTemp", "MaxTemp", "Rainfall", "Humidity9am", "WindSpeed9am", "RainToday", "Location", "Year", "Month", "Day"]


#Choose Features
X = hotdata[chosen_features]
Y = hotdata['RainTomorrow']


#Split Data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = .3, random_state = 1)

X_train = torch.from_numpy(X_train.to_numpy()).float()
y_train = torch.squeeze(torch.from_numpy(y_train.to_numpy()).float())

X_test = torch.from_numpy(X_test.to_numpy()).float()
y_test = torch.squeeze(torch.from_numpy(y_test.to_numpy()).float())

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
X.describe()

torch.Size([39494, 11]) torch.Size([39494])
torch.Size([16926, 11]) torch.Size([16926])


| Statistic | Temp9am | MinTemp | MaxTemp | Rainfall | Humidity9am | WindSpeed9am | RainToday | Location | Year | Month | Day |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 56420.0 | 56420.0 | 56420.0 | 56420.0 | 56420.0 | 56420.0 | 56420.0 | 56420.0 | 56420.0 | 56420.0 | 56420.0 |
| mean | 0.471445 | 0.529259 | 0.457255 | 0.010332 | 0.658741 | 0.210265 | 0.220879 | 0.505153 | 0.522107 | 0.493183 | 0.490797 |
| std | 0.16379 | 0.168417 | 0.158424 | 0.03402 | 0.185133 | 0.127954 | 0.414843 | 0.292049 | 0.245098 | 0.313762 | 0.292751 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.34414 | 0.401575 | 0.331818 | 0.0 | 0.55 | 0.107692 | 0.0 | 0.28 | 0.3 | 0.181818 | 0.233333 |
| 50% | 0.461347 | 0.52231 | 0.45 | 0.0 | 0.67 | 0.2 | 0.0 | 0.52 | 0.5 | 0.454545 | 0.5 |
| 75% | 0.598504 | 0.658793 | 0.581818 | 0.00291 | 0.79 | 0.276923 | 0.0 | 0.76 | 0.7 | 0.727273 | 0.733333 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
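
One caveat about the normalization in this cell: the minimum and maximum are computed on the full dataset before the split, so the test rows influence the scaling. A minimal sketch of fitting the min-max scaling on the training split only follows. The toy frame, its values, and the helper names `fit_min_max` and `apply_min_max` are hypothetical and only serve the illustration.

```python
import pandas as pd

def fit_min_max(train_df):
    """Return (min, range) per column, computed from training rows only."""
    col_min = train_df.min()
    col_range = (train_df.max() - col_min).replace(0, 1)  # avoid dividing by zero
    return col_min, col_range

def apply_min_max(df, col_min, col_range):
    """Scale a frame with statistics that were fitted elsewhere."""
    return (df - col_min) / col_range

# Toy stand-in for the weather features (values are made up)
toy = pd.DataFrame({"MinTemp": [5.0, 10.0, 15.0, 20.0],
                    "Rainfall": [0.0, 2.0, 0.0, 8.0]})
train, test = toy.iloc[:3], toy.iloc[3:]

col_min, col_range = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_range)
test_scaled = apply_min_max(test, col_min, col_range)  # may fall outside [0, 1]
print(test_scaled)
```

Test values can land outside the 0 to 1 range with this approach, which is expected: the scaling reflects only what the model could have known at training time.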


In [3]:
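class RainModel(nn.Module):
    """Single-layer LSTM followed by a linear layer that outputs one score per row."""
    def __init__(self, n_features):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=50, num_layers=1, batch_first=True)
        self.linear = nn.Linear(50, 1)

    def forward(self, x):
        # Note: x is 2-D here (rows, features), which nn.LSTM interprets as a
        # single unbatched sequence with one time step per row.
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x

lstm = RainModel(X_train.shape[1])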
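
Because the model above receives a 2-D tensor, `nn.LSTM` treats the whole (shuffled) training set as one long sequence instead of many short weather histories. A possible alternative, sketched below, builds fixed-length windows of consecutive days and adds a dropout layer as the task description suggests. The names `make_windows` and `WindowedRainModel`, the window length of 7, and the dropout rate of 0.2 are assumptions for illustration, and the sketch assumes rows are sorted by date for a single location.

```python
import torch
from torch import nn

def make_windows(features, targets, seq_len):
    """Stack overlapping windows: window i covers rows i .. i+seq_len-1
    and is paired with the target of its last row."""
    n_windows = len(features) - seq_len + 1
    xs = torch.stack([features[i:i + seq_len] for i in range(n_windows)])
    ys = targets[seq_len - 1:]
    return xs, ys

class WindowedRainModel(nn.Module):
    def __init__(self, n_features, hidden_size=50, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)   # (batch, seq_len, hidden)
        last = out[:, -1, :]    # hidden state after the final day of each window
        return self.linear(self.dropout(last)).squeeze(-1)  # one raw score per window

# Random stand-in data: 30 consecutive days with 11 features each
features = torch.randn(30, 11)
targets = torch.randint(0, 2, (30,)).float()

xs, ys = make_windows(features, targets, seq_len=7)
model = WindowedRainModel(n_features=11)
print(xs.shape, ys.shape, model(xs).shape)
```

Each window then carries a week of context for one prediction, which is closer to how a recurrent layer is meant to be used on time-series data.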

3. Hyperparameter Optimization (1 hour)
   - Experiment with various hyperparameters (e.g., number of layers, hidden units, learning rate) in the PyTorch model to optimize performance.
   - Document the process and rationale behind the chosen hyperparameters.

In [4]:
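# MSE treats the 0/1 labels as regression targets; the model output is not restricted to [0, 1]
criterion = nn.MSELoss()
# Adam optimiser with a learning rate of 0.001
optimiser = optim.Adam(lstm.parameters(), lr=0.001)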

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
X_train = X_train.to(device)
y_train = y_train.to(device)

X_test = X_test.to(device)
y_test = y_test.to(device)

lstm = lstm.to(device)
criterion = criterion.to(device)
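
The cell above uses `nn.MSELoss`, which works but is not designed for a yes/no target. A common alternative for binary classification is `nn.BCEWithLogitsLoss`, whose `pos_weight` argument can also give the rarer rain class more influence. The sketch below is an assumption about how that could look here, not what this notebook trained with; the class counts come from the training split reported in the classification report, and the logits are made up.

```python
import torch
from torch import nn

# Class counts taken from the training split reported in this notebook
n_no_rain, n_rain = 30847, 8647
pos_weight = torch.tensor([n_no_rain / n_rain])  # roughly 3.57

# Assumption: the model returns raw scores (logits), as RainModel does
weighted_bce = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.tensor([2.0, -1.0, 0.5, -3.0])  # made-up model outputs
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])

loss = weighted_bce(logits, labels)
probabilities = torch.sigmoid(logits)          # convert logits to probabilities
predictions = (probabilities >= 0.5).float()   # always 0 or 1
print(loss.item(), predictions)
```

With this loss, a missed rainy day costs about 3.6 times as much as a false alarm, which pushes the model towards predicting rain more often.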




In [5]:
def round_tensor(t, decimal_places = 3):
    return round(t.item(), decimal_places)



def calculate_accuracy(y_true, y_pred):
    predicted = y_pred.ge(.5).view(-1)
    return (y_true == predicted).sum().float() / len(y_true)

In [6]:
# Train on the full training set each epoch and report metrics every 100 epochs
for epoch in range(1000):
    y_pred = lstm(X_train)
    y_pred = torch.squeeze(y_pred)
    train_loss = criterion(y_pred, y_train)
    if epoch % 100 == 0:
        train_acc = calculate_accuracy(y_train, y_pred)
        with torch.no_grad():  # no gradients needed for evaluation
            y_test_pred = torch.squeeze(lstm(X_test))
            test_loss = criterion(y_test_pred, y_test)
        test_acc = calculate_accuracy(y_test, y_test_pred)
        print(f"epoch {epoch} Train set: loss: {round_tensor(train_loss)}, accuracy: {round_tensor(train_acc)} "
              f"Test  set: loss: {round_tensor(test_loss)}, accuracy: {round_tensor(test_acc)}")
    optimiser.zero_grad()
    train_loss.backward()
    optimiser.step()

epoch 0 Train set: loss: 0.255, accuracy: 0.781 Test  set: loss: 0.261, accuracy: 0.777
epoch 100 Train set: loss: 0.159, accuracy: 0.781 Test  set: loss: 0.162, accuracy: 0.777
epoch 200 Train set: loss: 0.146, accuracy: 0.804 Test  set: loss: 0.149, accuracy: 0.801
epoch 300 Train set: loss: 0.142, accuracy: 0.808 Test  set: loss: 0.145, accuracy: 0.805
epoch 400 Train set: loss: 0.141, accuracy: 0.81 Test  set: loss: 0.144, accuracy: 0.806
epoch 500 Train set: loss: 0.14, accuracy: 0.81 Test  set: loss: 0.144, accuracy: 0.806
epoch 600 Train set: loss: 0.14, accuracy: 0.81 Test  set: loss: 0.143, accuracy: 0.805
epoch 700 Train set: loss: 0.139, accuracy: 0.811 Test  set: loss: 0.143, accuracy: 0.806
epoch 800 Train set: loss: 0.139, accuracy: 0.811 Test  set: loss: 0.142, accuracy: 0.806
epoch 900 Train set: loss: 0.138, accuracy: 0.813 Test  set: loss: 0.142, accuracy: 0.807
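
The run above uses a single configuration (50 hidden units, learning rate 0.001). One way to experiment with hyperparameters is a small grid search scored on a validation split. The sketch below runs on synthetic stand-in data so that it finishes quickly; the grid values, the helper `train_and_score`, and the use of `nn.BCEWithLogitsLoss` are assumptions for illustration and do not reproduce this notebook's results.

```python
import itertools
import torch
from torch import nn, optim

torch.manual_seed(0)
# Synthetic stand-in data: 200 rows, 11 features, label depends on the first feature
X = torch.randn(200, 11)
y = (X[:, 0] > 0).float()
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

def train_and_score(hidden_size, lr, epochs=50):
    """Train a small LSTM classifier and return its validation accuracy."""
    lstm_layer = nn.LSTM(11, hidden_size, batch_first=True)
    head = nn.Linear(hidden_size, 1)
    params = list(lstm_layer.parameters()) + list(head.parameters())
    optimiser = optim.Adam(params, lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for epoch in range(epochs):
        out, _ = lstm_layer(X_tr.unsqueeze(1))  # each row as a length-1 sequence
        loss = loss_fn(head(out[:, -1, :]).squeeze(-1), y_tr)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    with torch.no_grad():
        out, _ = lstm_layer(X_val.unsqueeze(1))
        preds = head(out[:, -1, :]).squeeze(-1) >= 0  # logit >= 0 means probability >= 0.5
    return (preds.float() == y_val).float().mean().item()

results = {}
for hidden_size, lr in itertools.product([8, 32], [0.001, 0.01]):
    results[(hidden_size, lr)] = train_and_score(hidden_size, lr)

best = max(results, key=results.get)
print(results, "best:", best)
```

Selecting on a validation split rather than the test split keeps the final test score an honest estimate of performance on unseen data.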


4. Model Evaluation and Analysis (1 hour)
   - Evaluate the model's performance using metrics suitable for classification tasks (accuracy, precision, recall, F1 score, and confusion matrix).
   - Analyze the results, focusing on the model’s ability to capture patterns and make predictions.

In [7]:
# Create predictions (move tensors to the CPU before converting to NumPy)
with torch.no_grad():
    predict_train = lstm(X_train).cpu().numpy()
    predict_test = lstm(X_test).cpu().numpy()

# Reshape n x 1 matrix to vector
predict_train = np.reshape(predict_train, predict_train.shape[0])
predict_test = np.reshape(predict_test, predict_test.shape[0])

# Round to the nearest integer. The model output is unbounded, so values
# above 1.5 round to 2, which is why a third class can appear in the reports.
predict_train = np.rint(predict_train)
predict_test = np.rint(predict_test)

# Labels must also be on the CPU for scikit-learn
y_train_cpu = y_train.cpu().numpy()
y_test_cpu = y_test.cpu().numpy()

print("Train Error")
print(confusion_matrix(y_train_cpu, predict_train))
print(classification_report(y_train_cpu, predict_train))

print("Test Error")
print(confusion_matrix(y_test_cpu, predict_test))
print(classification_report(y_test_cpu, predict_test))

Train Error
[[29774  1073     0]
 [ 6324  2322     1]
 [    0     0     0]]
              precision    recall  f1-score   support

         0.0       0.82      0.97      0.89     30847
         1.0       0.68      0.27      0.39      8647
         2.0       0.00      0.00      0.00         0

    accuracy                           0.81     39494
   macro avg       0.50      0.41      0.43     39494
weighted avg       0.79      0.81      0.78     39494

Test Error
[[12678   468     0]
 [ 2792   987     1]
 [    0     0     0]]
              precision    recall  f1-score   support

         0.0       0.82      0.96      0.89     13146
         1.0       0.68      0.26      0.38      3780
         2.0       0.00      0.00      0.00         0

    accuracy                           0.81     16926
   macro avg       0.50      0.41      0.42     16926
weighted avg       0.79      0.81      0.77     16926
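
The `2.0` row in both reports comes from rounding an unbounded model output: any value above 1.5 rounds to 2. A small sketch with made-up outputs shows the difference between `np.rint` and the 0.5 threshold that `calculate_accuracy` already uses during training.

```python
import numpy as np

raw_outputs = np.array([-0.2, 0.3, 0.7, 1.6])  # made-up unbounded model outputs

rounded = np.rint(raw_outputs)                    # nearest integer, can leave {0, 1}
thresholded = (raw_outputs >= 0.5).astype(float)  # always 0 or 1

print("np.rint  :", rounded)
print("threshold:", thresholded)
```

Thresholding keeps every prediction in one of the two real classes, so the reports would contain only the `0.0` and `1.0` rows.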





This model leans towards predicting no rain: recall for the no-rain class is high (.96 on the test set), but this comes at the cost of rarely predicting rain. With a recall of .26 and a precision of .68 for the rain class on the test set, the model misses roughly three out of four rainy days, so we cannot have much confidence in it.
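
The rain-class figures can be checked by hand from the test-set confusion matrix. The short calculation below uses only counts printed in this notebook.

```python
# Counts read from the test-set confusion matrix in this notebook
true_positives = 987     # rain predicted, rain observed
false_positives = 468    # rain predicted, no rain observed
actual_rain_days = 3780  # support for class 1.0

precision = true_positives / (true_positives + false_positives)
recall = true_positives / actual_rain_days
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers "when the model says rain, how often is it right?", while recall answers "of all rainy days, how many did the model catch?". The gap between the two is what makes the rain predictions hard to rely on.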

5. Comparative Analysis and Reporting (1 hour)
- Compare the developed PyTorch model with the previous FCNN based model in terms of performance and learning experience.
- Prepare a comprehensive report summarizing the approach, findings, and experiences in using PyTorch for deep learning in rainfall prediction.

My LSTM model performs only slightly better than my FCNN model: its accuracy is .81, compared with .79 for the FCNN. The FCNN was simpler and its predictions were more balanced, with similar precision and recall for both True and False. The LSTM was much better at identifying the False cases but really struggled to predict the cases where it would rain tomorrow.

It was a very valuable experience to create these two models. Creating the FCNN showed me the importance of preparing the data and selecting features, while the LSTM showed me how intricate model building can get. Getting an FCNN to run with scikit-learn was fairly simple and worked out of the box, while PyTorch is much more hands-on. I think more effort in PyTorch could produce a much better model, whereas scikit-learn offers far less room for that kind of customization.

The complexity added by my LSTM does not seem to have been worth the effort, but with more work pushing the model to learn the positive cases, one could potentially get a better model.
