# Weather Prediction using RNNs 

## By Rahul Mondal, 18MF3IM31
---

In this notebook, we show how the long-term trend of rainfall can be predicted with decent accuracy using a simple recurrent neural network (RNN). A one-layer RNN model appears sufficient to predict long-term trends from limited training data surprisingly well.

In [None]:
%cd /content/
!git clone https://github.com/abhinav-bohra/RNN-Weather-Prediction.git
%cd /content/RNN-Weather-Prediction

In [None]:
!git pull

# **Univariate Time Series Model**
---

In [None]:
#--------------------------------------------------
# Importing Libraries
#--------------------------------------------------
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import Callback
from datetime import datetime, timedelta
from tensorflow.keras.layers import Dense, SimpleRNN
from tensorflow.keras.optimizers import RMSprop

pd.set_option('mode.chained_assignment', None)
pd.options.display.max_columns = None

## **1. Data loading and pre-processing**

### 1.1 Loading the dataset

In [None]:
#--------------------------------------------------
# Loading the dataset
#--------------------------------------------------
raw_df = pd.read_csv( "weather_data.csv", sep = ',', na_values = ['', ' '])
raw_df.columns = raw_df.columns.str.lower().str.replace(' ', '_')

#--------------------------------------------------
# Pre-processing the dataset
#--------------------------------------------------
full_df = raw_df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
df = pd.get_dummies( full_df['raint'], drop_first=True).rename(columns = {'Yes':'raint'})

### 1.2 Data Visualization

In [None]:
def plot_train_points(df,Tp=7000):
    plt.figure(figsize=(15,4))
    plt.title("Rainfall of first {} data points".format(Tp),fontsize=16)
    plt.plot(df['raint'][:Tp],c='k',lw=1)
    plt.grid(True)
    plt.xticks(fontsize=14)
    plt.yticks(fontsize=14)
    plt.show()

In [None]:
plot_train_points(df)

### 1.4 Train-Test Split

In [None]:
# We train the RNN on the first 80% of the data points (Tp) and hold out
# the remaining 20% to test how well it predicts the later part of the series.
Tp = int(len(df['raint'])*0.8)
train = np.array(df['raint'][:Tp]).reshape(-1,1)
test = np.array(df['raint'][Tp:]).reshape(-1,1)

### 1.5 Choose the embedding or step size
An RNN model requires a step value: the number of consecutive elements that make up one input sequence. Here, we choose `step=14`. In more complex RNNs, and in particular for text processing, this is also called the _embedding size_. The idea here is that **we are assuming that 14 consecutive observations of weather data can effectively predict the 15th, and so on.**
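
To make the idea concrete, here is a minimal sketch of the sliding-window split on a toy series. The helper name `make_windows` and the toy values are illustrative assumptions, not part of this notebook's pipeline, and a step of 3 is used only to keep the output short.

```python
def make_windows(series, step):
    """Hypothetical helper: pair each window of `step` values with the value that follows it."""
    windows = [series[i:i + step] for i in range(len(series) - step)]
    targets = [series[i + step] for i in range(len(series) - step)]
    return windows, targets

# Toy rain/no-rain series (1 = rain); the values are made up for illustration.
toy_series = [0, 0, 1, 1, 0, 1]
toy_windows, toy_targets = make_windows(toy_series, step=3)

for window, target in zip(toy_windows, toy_targets):
    print(window, "->", target)
```

With six points and a step of 3 this yields three (window, target) pairs. The last `step` positions of a series can never start a complete pair, which is why the end of the series needs padding or trimming.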

In [None]:
step = 14

In [None]:
# add step elements into train and test
test = np.append(test,np.repeat(test[-1,],step))
train = np.append(train,np.repeat(train[-1,],step))

In [None]:
print("Train data length:", train.shape)
print("Test data length:", test.shape)

### 1.6 Converting to a multi-dimensional array
Next, we convert the train and test data into matrices of sliding windows: each row of X holds `step` consecutive values, and the matching entry of Y is the value that follows them.
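
As a hedged illustration of the shapes involved, the sketch below pads a toy series by repeating its last value, builds the windows, and reshapes them to the 3-D layout `(samples, timesteps, features)` that Keras recurrent layers expect. The toy data and the `toy_` names are assumptions made for this example, and only NumPy is needed.

```python
import numpy as np

toy_step = 3
toy = np.arange(10, dtype=float)          # stand-in for one column of data

# Repeat the last value `toy_step` times so every original point starts a window.
toy_padded = np.append(toy, np.repeat(toy[-1], toy_step))

# Row i holds toy_padded[i : i + toy_step]; the target is the value right after it.
n_windows = len(toy_padded) - toy_step
toy_X = np.array([toy_padded[i:i + toy_step] for i in range(n_windows)])
toy_Y = toy_padded[toy_step:]

# One timestep per sample, with the whole window presented as the feature vector.
toy_X3d = toy_X.reshape(toy_X.shape[0], 1, toy_X.shape[1])

print(toy_X.shape, toy_Y.shape, toy_X3d.shape)
```

Because of the padding, the number of windows equals the number of original points, but the targets of the last `toy_step` windows are copies of the final value rather than real observations.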

In [None]:
def convertToMatrix(data, step):
    X, Y =[], []
    for i in range(len(data)-step):
        d=i+step  
        X.append(data[i:d,])
        Y.append(data[d,])
    return np.array(X), np.array(Y)

In [None]:
trainX,trainY = convertToMatrix(train,step)
testX,testY = convertToMatrix(test,step)

In [None]:
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

In [None]:
print("Training data shape:", trainX.shape,', ',trainY.shape)
print("Test data shape:", testX.shape,', ',testY.shape)

## **2. Modeling**

### Keras model with `SimpleRNN` layer

A simple function to define the RNN model. It uses a single neuron with a sigmoid activation for the output layer because we are predicting a binary label (rain or no rain) as a probability. The RNN layer and the hidden dense layer use the ReLU activation. The following arguments are supported.

- neurons in the RNN layer
- embedding length (i.e. the step length we chose)
- neurons in the densely connected layer
- learning rate
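
To show what this architecture computes, here is a rough NumPy sketch of one forward pass with randomly initialised weights. It assumes the input has a single timestep, as in this notebook, in which case the recurrent state starts at zero and the simple RNN layer reduces to `relu(x @ W + b)`. The weight names and sizes are illustrative assumptions, and this is not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_step, n_units, n_dense = 4, 14, 128, 32  # illustrative sizes

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialised weights stand in for trained parameters.
W_in = rng.normal(scale=0.1, size=(n_step, n_units))    # input -> RNN units
U_rec = rng.normal(scale=0.1, size=(n_units, n_units))  # previous state -> RNN units
b_rnn = np.zeros(n_units)
W_dense = rng.normal(scale=0.1, size=(n_units, n_dense))
b_dense = np.zeros(n_dense)
W_out = rng.normal(scale=0.1, size=(n_dense, 1))
b_out = np.zeros(1)

# Fake binary windows in the (samples, timesteps, features) layout.
x = rng.integers(0, 2, size=(n_samples, 1, n_step)).astype(float)

h = np.zeros((n_samples, n_units))       # initial recurrent state
for t in range(x.shape[1]):              # runs once here: a single timestep
    h = relu(x[:, t, :] @ W_in + h @ U_rec + b_rnn)

hidden = relu(h @ W_dense + b_dense)
prob_rain = sigmoid(hidden @ W_out + b_out)   # shape (samples, 1), values in (0, 1)
print(prob_rain.shape)
```

Because there is only one timestep, the recurrent weights `U_rec` multiply a zero state and never influence the output, so the model effectively treats each window as a flat feature vector.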

In [None]:
# Metrics
from keras import backend as K

def recall_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))

In [None]:
import tensorflow as tf

def build_rnn(num_units=128, embedding=14, num_dense=32, lr=0.001):
    """
    Builds and compiles a simple RNN model
    Arguments:
              num_units: Number of units in the simple RNN layer
              embedding: Embedding length
              num_dense: Number of neurons in the dense layer that follows the RNN layer
              lr: Learning rate (only used by the commented-out RMSprop variant; the active compile call uses Adam with its default rate)
    Returns:
              A compiled Keras model.
    """
    model = Sequential()
    model.add(SimpleRNN(units=num_units, input_shape=(1,embedding), activation="relu"))
    model.add(Dense(num_dense, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['mse'])
    # model.compile(optimizer=RMSprop(learning_rate=lr), loss='binary_crossentropy')

    return model

In [None]:
model_rainfall = build_rnn(embedding=step,lr=0.0005)

In [None]:
model_rainfall.summary()

In [None]:
# Keras `Callback` class to print progress of the training at regular epoch interval
class MyCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        if (epoch+1) % 50 == 0 and epoch>0:
            print("Epoch number {} done".format(epoch+1))

In [None]:
# Batch size and number of epochs
batch_size = 128
num_epochs = 1000

### Training the model
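
The training call below passes an `EarlyStopping` callback with `monitor='mse'` and `patience=5`. As a simplified sketch of the patience idea (not Keras's actual implementation, and ignoring options such as `min_delta`), training stops once the monitored value has failed to improve for `patience` consecutive epochs. The function name and the metric values are made up for illustration.

```python
def epoch_of_early_stop(values, patience):
    """Hypothetical helper: return the 1-based epoch at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, value in enumerate(values, start=1):
        if value < best:
            best = value      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1         # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(values)        # ran out of epochs without triggering the stop

toy_mse = [0.50, 0.40, 0.41, 0.42, 0.43, 0.30]
stop_epoch = epoch_of_early_stop(toy_mse, patience=3)
print(stop_epoch)
```

In this toy run the value stops improving after epoch 2, so with a patience of 3 the loop ends at epoch 5 and never reaches the better value at epoch 6. A larger patience trades longer training for a lower chance of stopping too soon.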

In [None]:
%%time
model_rainfall.fit( trainX, trainY, 
                    epochs=num_epochs, 
                    batch_size=batch_size, 
                    callbacks=[MyCallback(), tf.keras.callbacks.EarlyStopping(monitor='mse', patience=5)],verbose=1)

### Plot RMSE loss over epochs

In [None]:
plt.figure(figsize=(7,5))
plt.title("RMSE loss over epochs",fontsize=16)
plt.plot(np.sqrt(model_rainfall.history.history['mse']),c='k',lw=2)
plt.grid(True)
plt.xlabel("Epochs",fontsize=14)
plt.ylabel("Root-mean-squared Error",fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.show()

## Result and analysis

### What did the model see while training?
The plot below shows again exactly what the model saw during training.

In [None]:
plt.figure(figsize=(20,4))
plt.title("This is what the model saw",fontsize=18)
x_axis = np.arange(1, 1+len(trainX), 1, dtype=int)
plt.scatter(x_axis, trainX[:,0][:,0])
plt.show()

### Now predict the future points
Now, we can generate predictions for the future by passing `testX` to the trained model.
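
A sigmoid output is a probability between 0 and 1, so it has to be thresholded to obtain a rain/no-rain label. The vectorised sketch below uses made-up probabilities in the `(samples, 1)` shape that `predict` returns for a single-output model; `toy_probs` and `toy_threshold` are assumptions for the example.

```python
import numpy as np

toy_threshold = 0.5
toy_probs = np.array([[0.91], [0.12], [0.50], [0.49]])  # made-up model outputs

# Flatten to 1-D, then label as rain (1) when the probability reaches the threshold.
toy_labels = (toy_probs.ravel() >= toy_threshold).astype(int)
print(toy_labels)
```

A probability exactly equal to the threshold is labelled as rain here, matching the `>=` comparison used in the notebook cell.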

In [None]:
threshold = 0.5
trainPredict = model_rainfall.predict(trainX)
trainPredict = [1 if p>=threshold else 0 for p in trainPredict]
testPredict= model_rainfall.predict(testX)
testPredict = [1 if p>=threshold else 0 for p in testPredict]
predicted=np.concatenate((trainPredict,testPredict),axis=0)

In [None]:
plt.figure(figsize=(20,4))
plt.title("This is what the model predicted",fontsize=18)
x_axis = np.arange(1, 1+len(testPredict), 1, dtype=int)
plt.scatter(x_axis, testPredict, c='orange')
plt.show()

### Plotting the ground truth and model predictions together
Plotting the ground truth and the model predictions together shows whether the predictions follow the general trends in the ground truth data.

In [None]:
index = df.index.values

plt.figure(figsize=(15,5))
plt.title("Rainfall: Ground truth and prediction together",fontsize=18)
plt.plot(index,df['raint'],c='blue')
plt.plot(index,predicted,c='orange',alpha=0.75)
plt.legend(['True data','Predicted'],fontsize=15)
plt.axvline(x=Tp, c="r")
plt.grid(True)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.ylim(0,1)
plt.show()

## **Performance Evaluation**
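
The classification report used below lists precision, recall and F1 per class. As a small worked example with made-up labels (not results from this notebook), these can be computed by hand for the positive class as follows; the helper name is hypothetical.

```python
def binary_scores(y_true, y_pred):
    """Hypothetical helper: precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels: 1 = rain, 0 = no rain.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_precision, toy_recall, toy_f1 = binary_scores(toy_true, toy_pred)
print(toy_precision, toy_recall, toy_f1)
```

Here there are three true positives, one false positive and one false negative, so precision and recall are both 3/4. On imbalanced rain data these per-class scores are usually more informative than plain accuracy.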

In [None]:
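from sklearn.metrics import classification_report
# Each prediction targets the value that follows its input window, so compare
# against trainY/testY rather than the unshifted slices of df['raint'].
trainTruth = trainY.astype(int)
testTruth = testY.astype(int)
cm_train = classification_report(trainTruth, trainPredict)
cm_test = classification_report(testTruth, testPredict)
cm_full = classification_report(np.concatenate((trainTruth, testTruth)), predicted)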

In [None]:
print(cm_train)

In [None]:
print(cm_test)

In [None]:
print(cm_full)

## Performance on test set

In [None]:
# from sklearn.model_selection import train_test_split
# x_train, x_test , y_train, y_test = train_test_split( df['raint'], df['raint'], test_size = 0.2, random_state = 42)

# def convertToMatrix_new(data, step):
#     X, Y =[], []
#     print(data, step)
#     for i in range(len(data)-step):
#         d=i+step  
#         X.append(data[i:d,])
#         Y.append(data[d,])
#     return np.array(X), np.array(Y)

# test_split_X, test_split_Y = convertToMatrix_new(np.asarray(x_test).reshape(1,-1)[0],step)

# y_pred = model_rainfall.predict([test_split_X])
# cm_train = classification_report(test_split_Y, y_pred)

In [None]:
# Test-set accuracy (%): share of predictions that match the ground truth
cnt = 0
for g, p in zip(testTruth, testPredict):
    if g == p:
        cnt += 1
print(100 * cnt / len(testPredict))