## Deep Learning Classifiers

We use deep learning methods because neural networks can model nonlinear relationships in complex datasets.
In this notebook, we first build a basic neural network. We then compare its results with those obtained after applying resampling methods. Next, we use PCA to decorrelate temperature and humidity. Lastly, we tune the hyperparameters of the neural network.

In [None]:
import pandas as pd
import numpy as np
from numpy.random import seed
from datetime import datetime
from collections import Counter
import os
import random

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import roc_curve, auc, f1_score, recall_score, precision_score, accuracy_score
from sklearn.model_selection import GridSearchCV

from imblearn.over_sampling import SMOTENC
from imblearn.under_sampling import EditedNearestNeighbours
from imblearn.combine import SMOTEENN

from keras.models import Sequential
from keras.layers import Dense,Dropout,LSTM
from keras.callbacks import EarlyStopping
from tensorflow.random import set_seed
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import SGD

Because the neural network's weight initialization is random, we set the random seed for every source of randomness to make the results as reproducible as possible: Python's hash seed (the <code>PYTHONHASHSEED</code> environment variable, set through <code>os</code>), the <code>random</code> package, <code>numpy</code>'s random module and <code>tensorflow</code>'s random module.

In [None]:
os.environ['PYTHONHASHSEED'] = str(0)
random.seed(0)
seed(0)
set_seed(0)
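The effect of seeding can be checked with a small sketch. The helper `seed_everything` is hypothetical and not part of the notebook; it covers only `random` and `numpy` so that it runs without TensorFlow. Note that `PYTHONHASHSEED` is read when the interpreter starts, so setting it inside an already running kernel affects only child processes.

```python
import os
import random

import numpy as np


def seed_everything(seed=0):
    """Hypothetical helper: reseed the Python and NumPy random generators."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # only inherited by child processes
    random.seed(seed)
    np.random.seed(seed)


# Drawing after reseeding with the same value gives the same numbers.
seed_everything(0)
first = (random.random(), np.random.rand(3).tolist())
seed_everything(0)
second = (random.random(), np.random.rand(3).tolist())
print(first == second)
```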

Load Data

In [None]:
weather_full=pd.read_csv("../Data/weather_data_2.csv")
weather_full

Unnamed: 0,timestamp,region,past_temperature,past_humidity,past_rainfall,past_wind_x,past_wind_y,delta_temperature,delta_humidity,delta_wind_x,delta_wind_y,rainfall
0,2017-01-01 03:00:00,central,26.850000,90.300000,0.0,-6.0,-4.0,-0.100000,-1.421085e-14,0.0,0.0,0.0
1,2017-01-01 03:00:00,east,26.125000,87.433333,0.0,-1.0,1.0,-0.150000,-5.333333e-01,0.0,1.0,0.0
2,2017-01-01 03:00:00,north,26.000000,87.000000,0.0,-1.0,2.0,-0.100000,4.000000e-01,0.0,0.0,0.0
3,2017-01-01 03:00:00,north-east,26.000000,89.250000,0.0,-1.0,1.0,0.033333,-1.500000e-01,-2.0,-1.0,0.0
4,2017-01-01 03:00:00,west,26.100000,87.066667,0.0,-4.0,0.0,-0.140000,8.666667e-01,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
197635,2021-12-31 23:00:00,central,23.666667,95.400000,1.0,-1.0,-1.0,-0.766667,-2.150000e+00,-3.0,1.0,1.0
197636,2021-12-31 23:00:00,east,24.800000,93.000000,1.0,-3.0,0.0,0.100000,-3.000000e-01,-2.0,-2.0,0.0
197637,2021-12-31 23:00:00,north,24.500000,96.200000,1.0,0.0,-1.0,0.200000,-6.000000e-01,-2.0,-1.0,0.0
197638,2021-12-31 23:00:00,north-east,24.300000,92.650000,0.0,0.0,-1.0,0.350000,-1.000000e+00,0.0,-3.0,0.0


Create time variables

In [None]:
weather_full.timestamp=pd.to_datetime(weather_full.timestamp,infer_datetime_format=True)
weather_full["year"]=weather_full.timestamp.dt.year
weather_full["quarter"]=weather_full.timestamp.dt.quarter
weather_full["month"]=weather_full.timestamp.dt.month
weather_full["day"]=weather_full.timestamp.dt.day
weather_full["hour"]=weather_full.timestamp.dt.hour
weather_full=pd.concat([weather_full.iloc[:,12:],weather_full.iloc[:,:11],weather_full.iloc[:,11:12]],axis=1)
weather_full

Unnamed: 0,year,quarter,month,day,hour,timestamp,region,past_temperature,past_humidity,past_rainfall,past_wind_x,past_wind_y,delta_temperature,delta_humidity,delta_wind_x,delta_wind_y,rainfall
0,2017,1,1,1,3,2017-01-01 03:00:00,central,26.850000,90.300000,0.0,-6.0,-4.0,-0.100000,-1.421085e-14,0.0,0.0,0.0
1,2017,1,1,1,3,2017-01-01 03:00:00,east,26.125000,87.433333,0.0,-1.0,1.0,-0.150000,-5.333333e-01,0.0,1.0,0.0
2,2017,1,1,1,3,2017-01-01 03:00:00,north,26.000000,87.000000,0.0,-1.0,2.0,-0.100000,4.000000e-01,0.0,0.0,0.0
3,2017,1,1,1,3,2017-01-01 03:00:00,north-east,26.000000,89.250000,0.0,-1.0,1.0,0.033333,-1.500000e-01,-2.0,-1.0,0.0
4,2017,1,1,1,3,2017-01-01 03:00:00,west,26.100000,87.066667,0.0,-4.0,0.0,-0.140000,8.666667e-01,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
197635,2021,4,12,31,23,2021-12-31 23:00:00,central,23.666667,95.400000,1.0,-1.0,-1.0,-0.766667,-2.150000e+00,-3.0,1.0,1.0
197636,2021,4,12,31,23,2021-12-31 23:00:00,east,24.800000,93.000000,1.0,-3.0,0.0,0.100000,-3.000000e-01,-2.0,-2.0,0.0
197637,2021,4,12,31,23,2021-12-31 23:00:00,north,24.500000,96.200000,1.0,0.0,-1.0,0.200000,-6.000000e-01,-2.0,-1.0,0.0
197638,2021,4,12,31,23,2021-12-31 23:00:00,north-east,24.300000,92.650000,0.0,0.0,-1.0,0.350000,-1.000000e+00,0.0,-3.0,0.0


We split the data into train and test sets chronologically to avoid data leakage. The train set contains the first 80% of the timeline, that is, all hourly timestamps up to and including 31 December 2020, 6pm.

In [None]:
runtimes=list(pd.date_range('2017-01-01 00:00:00',
                            '2021-12-31 23:59:59',
                            freq='60T'))
training_runtimes=runtimes[:int(0.8*len(runtimes))]
X_train = weather_full[weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
X_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
y_train = weather_full[weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
y_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
X_train=X_train.drop(columns=["timestamp"])
X_test=X_test.drop(columns=["timestamp"])
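The cutoff of the chronological split can be verified with a short sketch. It rebuilds the same hourly timeline as the cell above (using the `"60min"` alias in place of `"60T"`) and inspects the timestamps on either side of the 80% boundary; the names `n_train`, `last_train` and `first_test` are introduced here for illustration only.

```python
import pandas as pd

# Same hourly timeline as in the notebook.
runtimes = pd.date_range("2017-01-01 00:00:00", "2021-12-31 23:59:59", freq="60min")

n_train = int(0.8 * len(runtimes))   # number of hourly timestamps in the train set
last_train = runtimes[n_train - 1]   # last timestamp that falls in the train set
first_test = runtimes[n_train]       # first timestamp that falls in the test set
print(len(runtimes), n_train, last_train, first_test)
```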

## Section 1: Basic Neural Network

Dummifying and Scaling

In [None]:
temp_df=pd.concat([X_train,X_test],axis=0)
temp_df=pd.get_dummies(temp_df, columns=["region"], prefix=["region"])
scaler=StandardScaler()
temp_df.iloc[:,5:14]=scaler.fit_transform(temp_df.iloc[:,5:14])
X_train = temp_df.iloc[:len(X_train),:]
X_test = temp_df.iloc[len(X_train):,:]
del temp_df
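The cell above fits the scaler on the concatenated train and test data, so the test set influences the scaling statistics. A stricter alternative, sketched below on synthetic data, is to fit `StandardScaler` on the training rows only and reuse those statistics for the test rows. The arrays `X_train_demo` and `X_test_demo` are made up for illustration and are not the weather data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=25.0, scale=2.0, size=(100, 3))  # synthetic train rows
X_test_demo = rng.normal(loc=27.0, scale=2.0, size=(20, 3))    # synthetic test rows

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # statistics come from train only
X_test_scaled = scaler.transform(X_test_demo)        # test reuses the train statistics

print(X_train_scaled.mean(axis=0).round(6), X_test_scaled.mean(axis=0).round(2))
```

With this approach the test columns will generally not have exactly zero mean and unit variance, which is expected.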

A neural network needs several hyperparameters to be specified before training can start. For the basic network we use:
1. The <code>"relu"</code> activation function, as it is the most widely used for hidden layers.
2. 3 hidden layers, forming a standard Multi-Layer Perceptron (MLP).
3. 24 nodes per hidden layer, as there are only about 20 input features.
4. The <code>"adam"</code> optimizer, as it is the most common choice.
5. A batch size of 64.
6. Early stopping, which terminates training once the validation loss has not improved for 5 consecutive epochs.

In [None]:
nn_model=Sequential()
nn_model.add(Dense(24, input_dim=X_train.shape[1], activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(1, activation='sigmoid'))
nn_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,patience=5)
history = nn_model.fit(X_train, y_train, epochs=100, batch_size=64,validation_data=(X_test, y_test),callbacks=[es])
nn_y_pred=nn_model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_test, nn_y_pred, pos_label=1)
print(nn_model.summary())
print(auc(fpr, tpr))

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 00012: early stopping
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 24)                480       
_________________________________________________________________
dense_1 (Dense)              (None, 24)                600       
_________________________________________________________________
dense_2 (Dense)              (None, 24)                600       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 25        
Total params: 1,705
Trainable params: 1,705
Non-trainable params: 0
_________________________________________________________________
None
0.8030421589361034
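The parameter counts in the summary above can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 19 input features, a value inferred from the 480 parameters of the first layer; `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return (n_inputs + 1) * n_units


n_features = 19                # inferred from the first layer's 480 parameters
layer_units = [24, 24, 24, 1]  # three hidden layers and the sigmoid output

counts = []
n_in = n_features
for units in layer_units:
    counts.append(dense_params(n_in, units))
    n_in = units               # this layer's output feeds the next layer

print(counts, sum(counts))
```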


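As the <code>predict</code> function of the neural network returns an array of probabilities, we wrote a function that finds the threshold value maximizing the F1 score. Note that the threshold is selected on the test set, so the reported scores are somewhat optimistic.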

In [None]:
def max_thresh(y_test,predictions):
    curr=0
    ix=0
    rec=0
    prec=0
    acc=0
    for i in np.arange(0, 1, 0.001):
        temp=(predictions>i).astype(int)
        fs=round(f1_score(y_test,temp),3)
        if fs>curr:
            curr=fs
            ix=i
            rec=round(recall_score(y_test,temp),3)
            prec=round(precision_score(y_test,temp),3)
            acc=round(accuracy_score(y_test,temp),3)
    print("Optimizing Threshold:",round(ix,3))
    print("F1 score:",curr)
    print("Recall score:",rec)
    print("Precision score:",prec)
    print("Accuracy:",acc)
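The same threshold sweep can also be written without a Python loop. The sketch below is an alternative under stated assumptions, not the function used in this notebook: `best_f1_threshold` is a hypothetical name, it does not round the scores, and it builds a thresholds-by-samples boolean matrix, so memory grows with both.

```python
import numpy as np


def best_f1_threshold(y_true, scores, thresholds=None):
    """Return (threshold, f1) for the threshold with the highest F1 score."""
    if thresholds is None:
        thresholds = np.arange(0, 1, 0.001)
    y_true = np.asarray(y_true).ravel().astype(bool)
    scores = np.asarray(scores).ravel()
    # Row i holds the predictions obtained with thresholds[i].
    pred = scores[None, :] > thresholds[:, None]
    tp = (pred & y_true).sum(axis=1)
    fp = (pred & ~y_true).sum(axis=1)
    fn = (~pred & y_true).sum(axis=1)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 where the denominator is 0.
    f1 = np.divide(2 * tp, denom, out=np.zeros(len(thresholds)), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


thr, f1 = best_f1_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(thr, f1)
```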

In [None]:
max_thresh(y_test,nn_y_pred)

Optimizing Threshold: 0.054
F1 score: 0.545
Recall score: 0.548
Precision score: 0.541
Accuracy: 0.915


## Section 2: Using Resampling Methods
### Section 2.1: SMOTE-NC

In [None]:
runtimes=list(pd.date_range('2017-01-01 00:00:00',
                            '2021-12-31 23:59:59',
                            freq='60T'))
training_runtimes=runtimes[:int(0.8*len(runtimes))]
X_train = weather_full[weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
X_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
y_train = weather_full[weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
y_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
X_train=X_train.drop(columns=["timestamp"])
X_test=X_test.drop(columns=["timestamp"])

SMOTE-NC is an oversampling method that generates synthetic minority-class samples to deal with class imbalance. Unlike plain SMOTE, it can handle nominal (categorical) features such as <code>region</code> alongside continuous ones, so the data does not need to be dummified first.
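To make the idea concrete, here is a simplified sketch of how one synthetic sample could be generated: continuous features are interpolated between a minority point and a randomly chosen neighbour, and the nominal feature takes the most frequent category among the neighbours. This is an illustration only, not the `imblearn` implementation; the function `synthesize` and all values are hypothetical.

```python
from collections import Counter

import numpy as np


def synthesize(x_num, neighbours_num, neighbours_cat, rng):
    """Create one synthetic sample from a minority point and its neighbours."""
    j = rng.integers(len(neighbours_num))  # pick one neighbour at random
    lam = rng.random()                     # interpolation factor in [0, 1)
    new_num = x_num + lam * (neighbours_num[j] - x_num)
    new_cat = Counter(neighbours_cat).most_common(1)[0][0]  # modal category
    return new_num, new_cat


rng = np.random.default_rng(0)
x = np.array([26.0, 90.0])  # made-up temperature and humidity
neighbours = np.array([[27.0, 88.0], [25.0, 92.0], [26.5, 91.0]])
regions = ["east", "east", "north"]
new_num, new_cat = synthesize(x, neighbours, regions, rng)
print(new_num, new_cat)
```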

In [None]:
print(Counter(y_train.rainfall))
smote_nc = SMOTENC(categorical_features=[5], random_state=0)
X_resampled, y_resampled = smote_nc.fit_resample(X_train, y_train.rainfall)
Counter(y_resampled)

Counter({0.0: 143549, 1.0: 12625})


Counter({0.0: 143549, 1.0: 143549})

In [None]:
temp_df=pd.concat([X_resampled,X_test],axis=0)
temp_df=pd.get_dummies(temp_df, columns=["region"], prefix=["region"])
scaler=StandardScaler()
temp_df.iloc[:,5:14]=scaler.fit_transform(temp_df.iloc[:,5:14])
X_resampled = temp_df.iloc[:len(X_resampled),:]
X_test = temp_df.iloc[len(X_resampled):,:]
del temp_df

In [None]:
nn_model=Sequential()
nn_model.add(Dense(24, input_dim=X_resampled.shape[1], activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(1, activation='sigmoid'))
nn_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,patience=5)
history = nn_model.fit(X_resampled, y_resampled, epochs=100, batch_size=64,validation_data=(X_test, y_test),callbacks=[es])
nn_y_pred=nn_model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_test, nn_y_pred, pos_label=1)
print(auc(fpr, tpr))
max_thresh(y_test,nn_y_pred)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 00006: early stopping
0.8361307783368785
Optimizing Threshold: 0.802
F1 score: 0.544
Recall score: 0.547
Precision score: 0.54
Accuracy: 0.915


### Section 2.2: ENN

Edited Nearest Neighbours (ENN) is an undersampling method that checks every sample in the majority class and removes it if its class differs from that of its nearest neighbours. Since the method cannot accept nominal variables, we must dummify the data first.
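The editing rule can be sketched with a brute-force nearest-neighbour search. The example assumes 3 neighbours and removes a majority sample unless all of its neighbours share its class; these settings are assumed to mirror the library defaults, and `enn_keep_mask` is a hypothetical helper rather than the `imblearn` code.

```python
import numpy as np


def enn_keep_mask(X, y, majority_label=0, k=3):
    """Boolean mask of samples kept by a simple Edited Nearest Neighbours rule."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances (fine for small demo data only).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbour
    nn_idx = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest neighbours
    agrees = (y[nn_idx] == y[:, None]).all(axis=1)
    # Minority samples are always kept; majority samples only if neighbours agree.
    return (y != majority_label) | agrees


X_demo = np.array([[0.0], [0.1], [0.2], [0.3], [5.05], [5.0], [5.1], [5.2]])
y_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1])
keep = enn_keep_mask(X_demo, y_demo)
print(keep)
```

In this toy data the majority sample at 5.05 sits among minority samples, so it is the only one removed.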

In [None]:
runtimes=list(pd.date_range('2017-01-01 00:00:00',
                            '2021-12-31 23:59:59',
                            freq='60T'))
training_runtimes=runtimes[:int(0.8*len(runtimes))]
X_train = weather_full[weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
X_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
y_train = weather_full[weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
y_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
X_train=X_train.drop(columns=["timestamp"])
X_test=X_test.drop(columns=["timestamp"])

In [None]:
temp_df=pd.concat([X_train,X_test],axis=0)
temp_df=pd.get_dummies(temp_df, columns=["region"], prefix=["region"])
X_train = temp_df.iloc[:len(X_train),:]
X_test = temp_df.iloc[len(X_train):,:]
del temp_df

In [None]:
print(Counter(y_train.rainfall))
enn = EditedNearestNeighbours()
X_resampled, y_resampled = enn.fit_resample(X_train, y_train.rainfall)
Counter(y_resampled)

Counter({0.0: 143549, 1.0: 12625})


Counter({0.0: 125275, 1.0: 12625})

In [None]:
temp_df=pd.concat([X_resampled,X_test],axis=0)
scaler=StandardScaler()
temp_df.iloc[:,5:14]=scaler.fit_transform(temp_df.iloc[:,5:14])
X_resampled = temp_df.iloc[:len(X_resampled),:]
X_test = temp_df.iloc[len(X_resampled):,:]
del temp_df

In [None]:
nn_model=Sequential()
nn_model.add(Dense(24, input_dim=X_resampled.shape[1], activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(1, activation='sigmoid'))
nn_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,patience=5)
history = nn_model.fit(X_resampled, y_resampled, epochs=100, batch_size=64,validation_data=(X_test, y_test),callbacks=[es])
nn_y_pred=nn_model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_test, nn_y_pred, pos_label=1)
print(auc(fpr, tpr))
max_thresh(y_test,nn_y_pred)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 00030: early stopping
0.8395031749663662
Optimizing Threshold: 0.109
F1 score: 0.546
Recall score: 0.568
Precision score: 0.525
Accuracy: 0.912


### Section 2.3: SMOTEENN

SMOTEENN is a hybrid method that first uses SMOTE to generate synthetic minority samples and then applies ENN to remove noisy samples. In this way, it aims to combine the advantages of both resampling methods while mitigating their disadvantages. Since the method cannot accept nominal variables, we must dummify the data first.

In [None]:
runtimes=list(pd.date_range('2017-01-01 00:00:00',
                            '2021-12-31 23:59:59',
                            freq='60T'))
training_runtimes=runtimes[:int(0.8*len(runtimes))]
X_train = weather_full[weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
X_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
y_train = weather_full[weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
y_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
X_train=X_train.drop(columns=["timestamp"])
X_test=X_test.drop(columns=["timestamp"])

In [None]:
temp_df=pd.concat([X_train,X_test],axis=0)
temp_df=pd.get_dummies(temp_df, columns=["region"], prefix=["region"])
X_train = temp_df.iloc[:len(X_train),:]
X_test = temp_df.iloc[len(X_train):,:]
del temp_df

In [None]:
print(Counter(y_train.rainfall))
smoteenn = SMOTEENN(random_state=0)
X_resampled, y_resampled = smoteenn.fit_resample(X_train, y_train.rainfall)
Counter(y_resampled)

Counter({0.0: 143549, 1.0: 12625})


Counter({0.0: 109257, 1.0: 142924})

In [None]:
temp_df=pd.concat([X_resampled,X_test],axis=0)
scaler=StandardScaler()
temp_df.iloc[:,5:14]=scaler.fit_transform(temp_df.iloc[:,5:14])
X_resampled = temp_df.iloc[:len(X_resampled),:]
X_test = temp_df.iloc[len(X_resampled):,:]
del temp_df

In [None]:
nn_model=Sequential()
nn_model.add(Dense(24, input_dim=X_resampled.shape[1], activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(1, activation='sigmoid'))
nn_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,patience=5)
history = nn_model.fit(X_resampled, y_resampled, epochs=100, batch_size=64,validation_data=(X_test, y_test),callbacks=[es])
nn_y_pred=nn_model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_test, nn_y_pred, pos_label=1)
print(auc(fpr, tpr))
max_thresh(y_test,nn_y_pred)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 00010: early stopping
0.805074277201179
Optimizing Threshold: 0.546
F1 score: 0.544
Recall score: 0.546
Precision score: 0.541
Accuracy: 0.915


## Section 3: With PCA Decomposition

Moving forward, we use the ENN method to resample the data, as it gave the highest F1 score and recall among the three resampling methods. Next, we use PCA to decorrelate temperature and humidity, keeping as many principal components as are needed to explain 95% of the variance.
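How a variance target translates into a number of components can be sketched with plain NumPy. The example computes the explained-variance ratios from the singular values of the centred data and counts how many components are needed to pass the target. The helper `n_components_for_variance` and the toy matrix are hypothetical, chosen so that the ratios are exactly 0.90, 0.08 and 0.02.

```python
import numpy as np


def n_components_for_variance(X, target=0.95):
    """Smallest number of principal components whose variance share exceeds target."""
    Xc = X - X.mean(axis=0)                  # PCA works on centred data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    ratio = s**2 / np.sum(s**2)              # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    n = int(np.searchsorted(cumulative, target) + 1)
    return n, ratio


a = np.sqrt(45.0)
X_demo = np.array([
    [a, 0, 0], [-a, 0, 0],
    [0, 2, 0], [0, -2, 0],
    [0, 0, 1], [0, 0, -1],
])
n, ratio = n_components_for_variance(X_demo, target=0.95)
print(n, ratio.round(2))
```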

In [None]:
runtimes=list(pd.date_range('2017-01-01 00:00:00',
                            '2021-12-31 23:59:59',
                            freq='60T'))
training_runtimes=runtimes[:int(0.8*len(runtimes))]
X_train = weather_full[weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
X_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
y_train = weather_full[weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
y_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
X_train=X_train.drop(columns=["timestamp"])
X_test=X_test.drop(columns=["timestamp"])

In [None]:
temp_df=pd.concat([X_train,X_test],axis=0)
temp_df=pd.get_dummies(temp_df, columns=["region"], prefix=["region"])
X_train = temp_df.iloc[:len(X_train),:]
X_test = temp_df.iloc[len(X_train):,:]
del temp_df

In [None]:
print(Counter(y_train.rainfall))
enn = EditedNearestNeighbours()
X_resampled, y_resampled = enn.fit_resample(X_train, y_train.rainfall)
Counter(y_resampled)

Counter({0.0: 143549, 1.0: 12625})


Counter({0.0: 125275, 1.0: 12625})

In [None]:
temp_df=pd.concat([X_resampled,X_test],axis=0)
scaler=StandardScaler()
temp_df.iloc[:,:]=scaler.fit_transform(temp_df.iloc[:,:])
X_resampled = temp_df.iloc[:len(X_resampled),:]
X_test = temp_df.iloc[len(X_resampled):,:]
del temp_df

In [None]:
temp_df=pd.concat([X_resampled,X_test],axis=0)
pca=PCA(n_components=0.95)
pca_components=pca.fit_transform(temp_df)
pd.DataFrame(pca_components)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13
0,0.896822,-1.985212,-1.090147,0.732924,0.062913,0.569500,1.779771,-1.431385,0.692408,1.034582,2.061520,0.154732,0.278872,0.531186
1,1.020685,-2.108500,-1.030085,0.718981,0.496335,-1.396812,-1.116262,-1.317279,0.718286,0.886699,2.025631,-0.224338,0.051122,0.536204
2,1.035899,-1.844449,-1.341639,1.148550,-1.575293,-0.033163,-0.147252,0.579343,2.180761,0.641435,1.840040,-0.186955,-0.051551,0.049170
3,1.142184,-1.404419,-0.954344,2.811254,-0.201593,-0.759045,0.579118,0.449187,-0.577666,0.907852,2.030088,-0.376833,0.058777,0.902288
4,1.656287,-3.017580,-0.829855,-1.086927,1.382967,2.021935,-0.913306,1.308390,-1.495097,2.592738,2.736437,0.298685,0.779658,2.038284
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
179361,3.054612,1.976332,0.523260,0.256753,1.669365,1.803262,-1.182191,0.777372,0.023814,-1.125567,-2.180560,2.689101,-0.729506,-1.166186
179362,2.361976,2.105666,0.263856,-0.047074,1.443637,0.510551,1.702881,-0.549396,0.788644,-1.556215,-2.306541,3.317226,-0.865543,-0.618537
179363,2.449384,1.829267,0.198619,-0.504190,1.617983,-1.460553,-1.211269,-0.382657,0.454499,-1.454730,-2.225594,3.008285,-0.845628,-1.069541
179364,1.108842,1.833946,-0.616016,-1.193958,-1.380587,-0.155524,-0.075980,1.094924,1.000776,-1.122334,-2.211904,0.478994,-2.275337,0.397401


In [None]:
pca_df = pd.DataFrame(pca_components,columns=[f"PC{i+1}" for i in range(pca_components.shape[1])])

nn_model=Sequential()
nn_model.add(Dense(24, input_dim=pca_df.shape[1], activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(24, activation='relu'))
nn_model.add(Dense(1, activation='sigmoid'))
nn_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,patience=5)
history = nn_model.fit(pca_df.iloc[:len(X_resampled),:], y_resampled, epochs=100, batch_size=64,validation_data=(pca_df.iloc[len(X_resampled):,:], y_test),callbacks=[es])
nn_y_pred=nn_model.predict(pca_df.iloc[len(X_resampled):,:])
fpr, tpr, thresholds = roc_curve(y_test, nn_y_pred, pos_label=1)
print(auc(fpr, tpr))
max_thresh(y_test,nn_y_pred)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 00015: early stopping
0.8414620757532851
Optimizing Threshold: 0.426
F1 score: 0.535
Recall score: 0.538
Precision score: 0.531
Accuracy: 0.913


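Although the ROC AUC is marginally higher with PCA (0.841 versus 0.840 for ENN alone), the F1 score and recall are both lower (0.535 versus 0.546 and 0.538 versus 0.568). PCA therefore does not appear to help, and we will not use it going forward.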

## Section 4: With Hyperparameter Tuning

In [None]:
runtimes=list(pd.date_range('2017-01-01 00:00:00',
                            '2021-12-31 23:59:59',
                            freq='60T'))
training_runtimes=runtimes[:int(0.8*len(runtimes))]
X_train = weather_full[weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
X_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)].iloc[:,:-1]
y_train = weather_full[weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
y_test = weather_full[~weather_full["timestamp"].isin(training_runtimes)][["rainfall"]]
X_train=X_train.drop(columns=["timestamp"])
X_test=X_test.drop(columns=["timestamp"])

In [None]:
temp_df=pd.concat([X_train,X_test],axis=0)
temp_df=pd.get_dummies(temp_df, columns=["region"], prefix=["region"])
X_train = temp_df.iloc[:len(X_train),:]
X_test = temp_df.iloc[len(X_train):,:]
del temp_df

In [None]:
print(Counter(y_train.rainfall))
enn = EditedNearestNeighbours()
X_resampled, y_resampled = enn.fit_resample(X_train, y_train.rainfall)
Counter(y_resampled)

Counter({0.0: 143549, 1.0: 12625})


Counter({0.0: 125275, 1.0: 12625})

In [None]:
temp_df=pd.concat([X_resampled,X_test],axis=0)
scaler=StandardScaler()
temp_df.iloc[:,5:14]=scaler.fit_transform(temp_df.iloc[:,5:14])
X_resampled = temp_df.iloc[:len(X_resampled),:]
X_test = temp_df.iloc[len(X_resampled):,:]
del temp_df

In hyperparameter tuning, we attempt to tune the following hyperparameters:
1. Batch size
2. Optimization algorithm
3. Learning rate
4. Activation function
5. Dropout rate
6. Number of neurons

### Section 4.1: Batch Sizes
The batch size of 64 was chosen arbitrarily, based on past projects. A different batch size might give a better result.
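One way to see what the batch size changes is to count the weight updates made in one epoch. The sketch below uses the class counts after ENN resampling for the full resampled training set; within 3-fold cross-validation, each fit sees only about two thirds of these rows.

```python
import math

n_samples = 125275 + 12625  # class counts after ENN resampling
batch_sizes = [10, 20, 40, 60, 80, 100]

# Each batch triggers one weight update; the last batch may be smaller.
updates_per_epoch = {b: math.ceil(n_samples / b) for b in batch_sizes}
print(updates_per_epoch)
```

Smaller batches give more, noisier updates per epoch, while larger batches give fewer, smoother ones.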

In [None]:
def create_model():
    model = Sequential()
    model.add(Dense(24, input_dim=X_resampled.shape[1], activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
nn_model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid={'batch_size':[10, 20, 40, 60, 80, 100]}
grid = GridSearchCV(estimator=nn_model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=10)
grid_result = grid.fit(X_resampled, y_resampled)
grid_result.best_params_

{'batch_size': 40}

### Section 4.2: Optimizer
While <code>'adam'</code> is the most commonly used optimizer in the literature, other optimizers can be tried.

In [None]:
def create_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(24, input_dim=X_resampled.shape[1], activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
nn_model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid={'optimizer':["adam","adamax","SGD"]}
grid = GridSearchCV(estimator=nn_model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=10)
grid_result = grid.fit(X_resampled, y_resampled)
grid_result.best_params_

Fitting 3 folds for each of 3 candidates, totalling 9 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   9 | elapsed:    5.5s remaining:   19.4s
[Parallel(n_jobs=-1)]: Done   3 out of   9 | elapsed:    5.7s remaining:   11.6s
[Parallel(n_jobs=-1)]: Done   4 out of   9 | elapsed:    5.8s remaining:    7.2s
[Parallel(n_jobs=-1)]: Done   5 out of   9 | elapsed:    5.8s remaining:    4.6s
[Parallel(n_jobs=-1)]: Done   6 out of   9 | elapsed:    5.8s remaining:    2.9s
[Parallel(n_jobs=-1)]: Done   7 out of   9 | elapsed:    5.9s remaining:    1.6s
[Parallel(n_jobs=-1)]: Done   9 out of   9 | elapsed:    7.6s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   9 out of   9 | elapsed:    7.6s finished


{'optimizer': 'SGD'}

### Section 4.3: Learn Rates

As the optimizers are based on gradient descent, setting the learning rate too high can make the optimizer overshoot and settle on a worse solution, while setting it too low makes training very slow, so the optimizer may fail to reach the optimum within the available epochs.
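A toy example makes this trade-off visible. The sketch below runs plain gradient descent on f(x) = x², whose minimum is at 0, with three learning rates. The function and the chosen rates are illustrative only and unrelated to the network trained here.

```python
def gradient_descent(lr, steps=100, x0=1.0):
    """Minimize f(x) = x**2 (gradient 2*x) from x0 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # step against the gradient
    return x


for lr in (0.001, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

After 100 steps the smallest rate has barely moved from the start, the middle rate is very close to 0, and the largest rate has diverged.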

In [None]:
def create_model(learn_rate=0.01):
    model = Sequential()
    model.add(Dense(24, input_dim=X_resampled.shape[1], activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    optimizer = SGD(learning_rate=learn_rate)  # `lr` is a deprecated alias of `learning_rate`
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
nn_model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid={'learn_rate':[0.001, 0.01, 0.1, 0.2, 0.3]}
grid = GridSearchCV(estimator=nn_model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=10)
grid_result = grid.fit(X_resampled, y_resampled)
grid_result.best_params_

Fitting 3 folds for each of 5 candidates, totalling 15 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of  15 | elapsed:    5.0s remaining:   33.3s
[Parallel(n_jobs=-1)]: Done   4 out of  15 | elapsed:    5.1s remaining:   14.3s
[Parallel(n_jobs=-1)]: Done   6 out of  15 | elapsed:    5.2s remaining:    7.9s
[Parallel(n_jobs=-1)]: Done   8 out of  15 | elapsed:    5.5s remaining:    4.8s
[Parallel(n_jobs=-1)]: Done  10 out of  15 | elapsed:    9.7s remaining:    4.8s
[Parallel(n_jobs=-1)]: Done  12 out of  15 | elapsed:    9.7s remaining:    2.4s
[Parallel(n_jobs=-1)]: Done  15 out of  15 | elapsed:   10.2s finished


{'learn_rate': 0.001}

### Section 4.4: Activation Functions
While <code>'relu'</code> is the most commonly used activation function in the literature, other activation functions for the hidden layers can be tried.

In [None]:
def create_model(activation='relu'):
    model = Sequential()
    model.add(Dense(24, input_dim=X_resampled.shape[1], activation=activation))
    model.add(Dense(24, activation=activation))
    model.add(Dense(24, activation=activation))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer="SGD", metrics=['accuracy'])
    return model
nn_model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid={'activation':["relu","tanh","sigmoid","linear","elu"]}
grid = GridSearchCV(estimator=nn_model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=10)
grid_result = grid.fit(X_resampled, y_resampled)
grid_result.best_params_

Fitting 3 folds for each of 5 candidates, totalling 15 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of  15 | elapsed:   14.0s remaining:  1.5min
[Parallel(n_jobs=-1)]: Done   4 out of  15 | elapsed:   14.0s remaining:   38.7s
[Parallel(n_jobs=-1)]: Done   6 out of  15 | elapsed:   14.1s remaining:   21.1s
[Parallel(n_jobs=-1)]: Done   8 out of  15 | elapsed:   14.7s remaining:   12.8s
[Parallel(n_jobs=-1)]: Done  10 out of  15 | elapsed:   18.9s remaining:    9.4s
[Parallel(n_jobs=-1)]: Done  12 out of  15 | elapsed:   18.9s remaining:    4.6s
[Parallel(n_jobs=-1)]: Done  15 out of  15 | elapsed:   19.2s finished


{'activation': 'relu'}

### Section 4.5: Dropout Layers
Dropout layers randomly zero a fraction of a layer's outputs during training, which helps the neural network avoid overfitting the training data.
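The mechanism can be sketched in NumPy as "inverted" dropout: during training a random mask zeroes a fraction `rate` of the activations and the survivors are scaled by 1 / (1 - rate), so the expected activation is unchanged. The function `dropout` below is a hypothetical illustration, not the Keras layer itself.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Training-time inverted dropout on an array of activations."""
    if rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True for units that survive
    return activations * mask / keep_prob             # rescale the survivors


rng = np.random.default_rng(0)
a = np.ones(1000)
out = dropout(a, 0.5, rng)
print(np.unique(out), out.mean())
```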

In [None]:
def create_model(dropout_rate=0.0):
    model = Sequential()
    model.add(Dense(24, input_dim=X_resampled.shape[1], activation="relu"))
    model.add(Dropout(dropout_rate))
    model.add(Dense(24, activation="relu"))
    model.add(Dense(24, activation="relu"))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer="SGD", metrics=['accuracy'])
    return model
nn_model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid={'dropout_rate':[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}
grid = GridSearchCV(estimator=nn_model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=10)
grid_result = grid.fit(X_resampled, y_resampled)
grid_result.best_params_

Fitting 3 folds for each of 10 candidates, totalling 30 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    5.1s
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:   10.2s
[Parallel(n_jobs=-1)]: Done  19 out of  30 | elapsed:   15.6s remaining:    9.0s
[Parallel(n_jobs=-1)]: Done  23 out of  30 | elapsed:   16.6s remaining:    5.0s
[Parallel(n_jobs=-1)]: Done  27 out of  30 | elapsed:   20.5s remaining:    2.2s
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:   20.9s finished


{'dropout_rate': 0.0}

### Section 4.6: Number of Neurons
Lastly, we tune the number of neurons in each hidden layer.

In [None]:
def create_model(neurons=10):
    model = Sequential()
    model.add(Dense(neurons, input_dim=X_resampled.shape[1], activation="relu"))
    model.add(Dense(neurons, activation="relu"))
    model.add(Dense(neurons, activation="relu"))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer="SGD", metrics=['accuracy'])
    return model
nn_model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid={'neurons':[10,20,30,40,50]}
grid = GridSearchCV(estimator=nn_model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=10)
grid_result = grid.fit(X_resampled, y_resampled)
grid_result.best_params_

Fitting 3 folds for each of 5 candidates, totalling 15 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of  15 | elapsed:    5.2s remaining:   34.1s
[Parallel(n_jobs=-1)]: Done   4 out of  15 | elapsed:    5.3s remaining:   14.7s
[Parallel(n_jobs=-1)]: Done   6 out of  15 | elapsed:    5.5s remaining:    8.4s
[Parallel(n_jobs=-1)]: Done   8 out of  15 | elapsed:    5.6s remaining:    4.9s
[Parallel(n_jobs=-1)]: Done  10 out of  15 | elapsed:   10.3s remaining:    5.1s
[Parallel(n_jobs=-1)]: Done  12 out of  15 | elapsed:   10.4s remaining:    2.5s
[Parallel(n_jobs=-1)]: Done  15 out of  15 | elapsed:   10.9s finished


{'neurons': 10}

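Therefore, we train a final model using the tuned hyperparameters: a batch size of 40, the SGD optimizer, <code>'relu'</code> activations, no dropout and 10 neurons per hidden layer. Note that <code>optimizer='SGD'</code> is passed as a string here, so the model trains with Keras's default SGD learning rate rather than the tuned value of 0.001.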

In [None]:
nn_model=Sequential()
nn_model.add(Dense(10, input_dim=X_resampled.shape[1], activation='relu'))
nn_model.add(Dense(10, activation='relu'))
nn_model.add(Dense(10, activation='relu'))
nn_model.add(Dense(1, activation='sigmoid'))
nn_model.compile(loss='binary_crossentropy', optimizer='SGD', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,patience=5)
history = nn_model.fit(X_resampled, y_resampled, epochs=100, batch_size=40,validation_data=(X_test, y_test),callbacks=[es])
nn_y_pred=nn_model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_test, nn_y_pred, pos_label=1)
print(auc(fpr, tpr))
max_thresh(y_test,nn_y_pred)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 00010: early stopping
0.5
Optimizing Threshold: 0.0
F1 score: 0.17
Recall score: 1.0
Precision score: 0.093
Accuracy: 0.093


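To conclude, tuning made our results worse: the final model has a ROC AUC of 0.5, and its best F1 score is reached at a threshold of 0, which means the best it can do is predict rain for every sample. We tuned each hyperparameter separately while holding the others fixed, which only finds the best value at each step and not the globally optimal combination.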
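The difference between tuning one hyperparameter at a time and searching all combinations can be shown with a toy example. The scores below are invented for two hypothetical hyperparameters `a` and `b`; they are not results from this notebook.

```python
from itertools import product

# Hypothetical validation scores for each (a, b) combination.
scores = {(0, 0): 0.50, (1, 0): 0.40, (0, 1): 0.45, (1, 1): 0.90}
values = [0, 1]

# One at a time: tune a with b at its default of 0, then tune b with the chosen a.
a_best = max(values, key=lambda a: scores[(a, 0)])
b_best = max(values, key=lambda b: scores[(a_best, b)])
sequential = (a_best, b_best)

# Joint grid search: evaluate every combination.
joint = max(product(values, values), key=scores.get)

print(sequential, scores[sequential], joint, scores[joint])
```

With `GridSearchCV`, a joint search corresponds to passing several keys in one `param_grid`, at the cost of many more fits.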

## <center>Summary</center>
|           |   Basic   |  SMOTE-NC |    ENN    |  SMOTEENN |  With PCA | With Tuning |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:-----------:|
|  ROC AUC  |   0.803   |   0.836   |   0.840   |   0.805   | **0.841** |    0.500    |
| F1 Score  |   0.545   |   0.544   | **0.546** |   0.544   |   0.535   |    0.170    |
|   Recall  |   0.548   |   0.547   | **0.568** |   0.546   |   0.538   |    1.000    |
| Precision | **0.541** |   0.540   |   0.525   | **0.541** |   0.531   |    0.093    |
| Accuracy  | **0.915** | **0.915** |   0.912   | **0.915** |   0.913   |    0.093    |