In [2]:
# imports
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.linear_model import LassoCV
import torch
import torch.nn as nn
import torch.optim as optim
from collections import OrderedDict
from datetime import datetime
from sklearn import metrics
from torch.utils.data import DataLoader, TensorDataset


In [3]:
# load the data
data = pd.read_csv("weatherAUS.csv")

## <b> Deep neural network concepts </b> <br>

#### Summary <br>
Deep neural networks are models that have more than just an input and output layer, but have 1 to many hidden layers. These hidden layers add to the additional complexity to the model where features can be picked apart more finely. RNN's which look to take into account sequences, or time series, to make more accuracte predictions by taking into account previous observations. A form of RNN is an LSTM, used in this project, which looks to fix a problem with RNN's specifically. While RNN's do use a form of "memory", LSTM's memory is longer hence the long in its name. In this case, with the rain dataset, it is a time series so it is well suited for this model. While these types of models are great when used appropriately, they also come with their own set of problems. Speed and computational complexity are two of the biggest challenges. As explored with previous for a fully connected neural network, the training time and complexity was simpiler and faster. However, with LSTM's the training time increases by a substantial amount. Further, the bigger datasets you work with and how you look to tune the hyperparameters will increase the training time.

#### Advantages of deep neural networks in processing complex patterns in data <br>
Deep neural networks are able to learn the additional compelxity that may be represented by features. In this case giving higher priority to features that provide a more decisive result than those that may appear anywhere. With most data there will also be noise and instability within it. Despite this, deep neural networks are able to circumvent this. As well, despite overfitting being a potential concern for NN models, when used on appropriately complex models, they generalize well.

In [4]:
# Preprocessing
columns_to_keep = ['Date', 'RainToday', 'Pressure9am', 'MaxTemp', 'Humidity3pm', 'WindGustSpeed', 'Cloud3pm', 'Rainfall', 'Sunshine', 'Pressure3pm', 'RainTomorrow']
data = data.drop(columns=data.columns.difference(columns_to_keep))

# Use literally everything
# data = data.drop(columns=['Location', 'WindGustDir', 'WindDir9am', 'WindDir3pm'])

# drop null values
data = data.dropna()


# Parse through the dates so we have a sequence
data['Date'] = data['Date'].apply(lambda x : datetime.strptime(x, "%Y-%m-%d"))
data['Year'] = data['Date'].dt.year
data['Month'] = data['Date'].dt.month
data['Day'] = data['Date'].dt.day

# Then drop the date column
data = data.drop(columns=['Date'])


# Encode these values
data.loc[data['RainToday'] == 'Yes', 'RainToday'] = 1
data.loc[data['RainToday'] == 'No', 'RainToday'] = 0

# Encoding and Normalizing
scale = StandardScaler()
labelenc = LabelEncoder()
ohe = OneHotEncoder()

y = data[['RainTomorrow']]

x = data.drop(['RainTomorrow'], axis=1)

# Normalize x data 
xscale = scale.fit_transform(x)

X = torch.FloatTensor(xscale)
# Encode the response variable  
y = labelenc.fit_transform(y)


# Split the data
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_train = torch.LongTensor(y_train)


seqence_size = 1000
batch_size = 25
train_dataset = TensorDataset(x_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)


  y = column_or_1d(y, warn=True)


In [5]:
# Model developement
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, dropout_rate=0.0):
        super(LSTMModel, self).__init__()
        # LSTM layer
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, dropout=dropout_rate, batch_first=True)
        # Normalization layer
        self.norm = nn.LayerNorm(hidden_size)
        # Drop out layer
        self.dropout = nn.Dropout(p=0.2)
        # Fully connected layer
        self.lin = nn.Linear(hidden_size, hidden_size)
        self.relu = nn.ReLU()

        # To output layer
        self.lin2 = nn.Linear(hidden_size, output_size)
        # Output layer
        self.sig = nn.Sigmoid()

    # Step forward
    def forward(self, x):
        # Lstm output
        val, (hidden_state, cell_state) = self.lstm(x)

        # Do through every other layer
        return self.sig(self.lin2(self.relu(self.lin(self.dropout(self.norm(val))))))


In [6]:
# Some hyperparameters before optimization
input_size = x_train.shape[1]
hidden_size = 25
num_layers = 8
output_size = 1

# Make the model
clf = LSTMModel(input_size, hidden_size, num_layers, output_size)


# Optimizer
opt = optim.Adam(clf.parameters(), lr=0.001)
# Loss function
loss_fn = nn.BCELoss()

In [7]:
# Model training
for epoch in range(15):
    batch = 0
    # Do in batches
    for data, label in train_loader:
        
        # Get predictions
        yhat = clf(data).squeeze(dim=1)

        # Find loss
        loss = loss_fn(yhat, label.float())

        # Back propagation
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Check every 100th batch
        batch += 1
        if batch % 100 == 0:
            print(f'Epoch: {epoch}, Batch: {batch}, Loss: {loss}')
    # Find epoch performance
    print(f'Epoch [{epoch + 1}/{15}], Loss: {loss}')


Epoch: 0, Batch: 100, Loss: 0.423104465007782
Epoch: 0, Batch: 200, Loss: 0.6269589066505432
Epoch: 0, Batch: 300, Loss: 0.4575839936733246
Epoch: 0, Batch: 400, Loss: 0.7371435761451721
Epoch: 0, Batch: 500, Loss: 0.7127088904380798
Epoch: 0, Batch: 600, Loss: 0.49902573227882385
Epoch: 0, Batch: 700, Loss: 0.6208954453468323
Epoch: 0, Batch: 800, Loss: 0.553520143032074
Epoch: 0, Batch: 900, Loss: 0.5547534823417664
Epoch: 0, Batch: 1000, Loss: 0.46147289872169495
Epoch: 0, Batch: 1100, Loss: 0.5561729669570923
Epoch: 0, Batch: 1200, Loss: 0.5143591165542603
Epoch: 0, Batch: 1300, Loss: 0.6061510443687439
Epoch: 0, Batch: 1400, Loss: 0.49440228939056396
Epoch: 0, Batch: 1500, Loss: 0.3912881016731262
Epoch: 0, Batch: 1600, Loss: 0.5046612024307251
Epoch [1/15], Loss: 0.6528722643852234
Epoch: 1, Batch: 100, Loss: 0.5863404273986816
Epoch: 1, Batch: 200, Loss: 0.631150484085083
Epoch: 1, Batch: 300, Loss: 0.4953569173812866
Epoch: 1, Batch: 400, Loss: 0.5602943897247314
Epoch: 1, Batc

In [8]:
# Don't even know if I use this
def printStats(predicitons, reals):
    # Get the total amount of predictions
    total_samples = len(reals)

    # convert the predictions of the model to classifications
    accuracy = predicitons.ge(0.5)
    # Reshape the array to get the accuracy
    accuracy = accuracy.int().reshape(1, -1)

    # Format the test set for the comparison
    y_test_tensor = torch.tensor(reals)

    # Print the accuracy
    # See the accuracy
    print(f'Accuracy: {((accuracy == y_test_tensor).sum() / total_samples).item()}')
    # print(((accuracy == y_test_tensor).sum() / total_samples).item().dtype)
    y_test_tensor = y_test_tensor.reshape(1, -1)
    print("Confusion matrix:")
    print(metrics.confusion_matrix(y_test_tensor.view(-1).numpy(), accuracy.view(-1).numpy()))
    return (((accuracy == y_test_tensor).sum() / total_samples).item())

In [9]:
# Testing
y_test_tens = torch.LongTensor(y_test)
test_dataset = TensorDataset(x_test, y_test_tens)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

# Don't train the model any more
clf.eval()

total_samples = 0
correct_predictions = 0
for batch in test_loader:
    out = clf(batch[0]).squeeze(dim=1)

    # Get output
    out = out.ge(0.5)
    out = out.int().reshape(1, -1)

    # Update counts
    correct_predictions += torch.sum(out == batch[1]).item()
    total_samples += batch[1].size(0)

# Print accuracy
print(f'Accuracy: {correct_predictions / total_samples}')
    

Accuracy: 0.8540933098591549


In [10]:
# Hyper parameter optimization

hyperparameters = {
    'num_layers': [1, 2, 3, 4],
    'hidden_size': [10, 25],
    'learning_rate': [0.001, 0.01],
    'dropout_rate': [0.2, 0.3, 0.4, 0.5],
    'batch_size': [64],
    'num_epochs': [15]
}

best_accuracy = 0.0
best_hyperparameters = {}

# Best subset selection
model = 1
for num_layers in hyperparameters['num_layers']:
    for hidden_size in hyperparameters['hidden_size']:
        for learning_rate in hyperparameters['learning_rate']:
            for dropout_rate in hyperparameters['dropout_rate']:
                for batch_size in hyperparameters['batch_size']:
                    for num_epochs in hyperparameters['num_epochs']:

                        # Optimizer
                        opt = optim.Adam(clf.parameters(), lr=learning_rate)
                        # Loss function
                        loss_fn = nn.BCELoss()

                        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

                        # Make model
                        clf = LSTMModel(input_size, hidden_size, num_layers, output_size, dropout_rate=dropout_rate)
                        
                        # Train
                        for epoch in range(num_epochs):
                            batch = 0
                            for data, label in train_loader:
                                
                                # Predictions
                                yhat = clf(data).squeeze(dim=1)
                                # Calculate loss
                                loss = loss_fn(yhat, label.float())

                                # Back propagation
                                opt.zero_grad()
                                loss.backward()
                                opt.step()

                            # Epoch performance
                            print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss}')


                        # Testing
                        y_test_tens = torch.LongTensor(y_test)
                        test_dataset = TensorDataset(x_test, y_test_tens)
                        test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

                        # Don't train the model any more
                        clf.eval()

                        total_samples = 0
                        correct_predictions = 0
                        for batch in test_loader:
                            out = clf(batch[0]).squeeze(dim=1)

                            out = out.ge(0.5)
                            out = out.int().reshape(1, -1)

                            # Update counts
                            correct_predictions += torch.sum(out == batch[1]).item()
                            total_samples += batch[1].size(0)
                        accuracy = correct_predictions / total_samples
                        print(f'Accuracy: {accuracy}')

                        # Update the best possible subset if needed
                        if accuracy > best_accuracy:
                            best_accuracy = accuracy
                            best_hyperparameters = {
                                'num_layers': num_layers,
                                'hidden_size': hidden_size,
                                'learning_rate': learning_rate,
                                'dropout_rate': dropout_rate,
                                'batch_size': batch_size,
                                'num_epochs': num_epochs
                            }
                        # Keep track of where we are
                        # Because it takes like an hour
                        print(f'Model {model} finished')
                        model += 1
# print the results
print("Best Hyperparameters:", best_hyperparameters)
print("Best Accuracy:", best_accuracy)



Epoch [1/15], Loss: 0.8808826208114624
Epoch [2/15], Loss: 0.8923761248588562
Epoch [3/15], Loss: 0.8928030133247375
Epoch [4/15], Loss: 0.8245386481285095
Epoch [5/15], Loss: 0.941407322883606
Epoch [6/15], Loss: 0.858975350856781
Epoch [7/15], Loss: 0.9207064509391785
Epoch [8/15], Loss: 0.8217532634735107
Epoch [9/15], Loss: 0.8567960858345032
Epoch [10/15], Loss: 0.9605319499969482
Epoch [11/15], Loss: 0.9338351488113403
Epoch [12/15], Loss: 0.8633711934089661
Epoch [13/15], Loss: 0.861169695854187
Epoch [14/15], Loss: 0.9762448668479919
Epoch [15/15], Loss: 0.8563021421432495
Accuracy: 0.2250770246478873
Model 1 finished




Epoch [1/15], Loss: 0.6779001951217651
Epoch [2/15], Loss: 0.65106600522995
Epoch [3/15], Loss: 0.6364338994026184
Epoch [4/15], Loss: 0.6740342378616333
Epoch [5/15], Loss: 0.6651738286018372
Epoch [6/15], Loss: 0.6524777412414551
Epoch [7/15], Loss: 0.6514087915420532
Epoch [8/15], Loss: 0.6508761048316956
Epoch [9/15], Loss: 0.6367076635360718
Epoch [10/15], Loss: 0.6313464045524597
Epoch [11/15], Loss: 0.6329087615013123
Epoch [12/15], Loss: 0.6354649662971497
Epoch [13/15], Loss: 0.658370316028595
Epoch [14/15], Loss: 0.6922593116760254
Epoch [15/15], Loss: 0.626977801322937
Accuracy: 0.750825264084507
Model 2 finished




Epoch [1/15], Loss: 0.7727003693580627
Epoch [2/15], Loss: 0.7643250226974487
Epoch [3/15], Loss: 0.7338169813156128
Epoch [4/15], Loss: 0.7783766388893127
Epoch [5/15], Loss: 0.7988974452018738
Epoch [6/15], Loss: 0.7806248664855957
Epoch [7/15], Loss: 0.7698679566383362
Epoch [8/15], Loss: 0.755848228931427
Epoch [9/15], Loss: 0.7614108324050903
Epoch [10/15], Loss: 0.7418099045753479
Epoch [11/15], Loss: 0.7568839192390442
Epoch [12/15], Loss: 0.7757710218429565
Epoch [13/15], Loss: 0.7478230595588684
Epoch [14/15], Loss: 0.785753071308136
Epoch [15/15], Loss: 0.7834633588790894
Accuracy: 0.26700044014084506
Model 3 finished




Epoch [1/15], Loss: 0.6812231540679932
Epoch [2/15], Loss: 0.6711939573287964
Epoch [3/15], Loss: 0.6881669163703918
Epoch [4/15], Loss: 0.6827751398086548
Epoch [5/15], Loss: 0.7136669158935547
Epoch [6/15], Loss: 0.6859149932861328
Epoch [7/15], Loss: 0.7038635015487671
Epoch [8/15], Loss: 0.7221837043762207
Epoch [9/15], Loss: 0.6788902282714844
Epoch [10/15], Loss: 0.6866686940193176
Epoch [11/15], Loss: 0.6667006611824036
Epoch [12/15], Loss: 0.6888329982757568
Epoch [13/15], Loss: 0.7007710933685303
Epoch [14/15], Loss: 0.6774211525917053
Epoch [15/15], Loss: 0.6913421154022217
Accuracy: 0.5255281690140845
Model 4 finished
Epoch [1/15], Loss: 0.6196545362472534
Epoch [2/15], Loss: 0.550919234752655
Epoch [3/15], Loss: 0.5167945623397827
Epoch [4/15], Loss: 0.5705490112304688
Epoch [5/15], Loss: 0.5219389200210571
Epoch [6/15], Loss: 0.4949495196342468
Epoch [7/15], Loss: 0.5625052452087402
Epoch [8/15], Loss: 0.5622072219848633
Epoch [9/15], Loss: 0.45089563727378845
Epoch [10/15

In [30]:
# Using the best set of hyperparameters
clf = LSTMModel(input_size, best_hyperparameters['hidden_size'], best_hyperparameters['num_layers'], output_size, best_hyperparameters['dropout_rate'])


# Optimizer
opt = optim.Adam(clf.parameters(), lr=best_hyperparameters['learning_rate'])
# Loss function
loss_fn = nn.BCELoss()

for epoch in range(best_hyperparameters['num_epochs']):
    batch = 0
    for data, label in train_loader:
        
        yhat = clf(data).squeeze(dim=1)

        loss = loss_fn(yhat, label.float())

        opt.zero_grad()
        loss.backward()
        opt.step()

        batch += 1
        if batch % 100 == 0:
            print(f'Epoch: {epoch}, Batch: {batch}, Loss: {loss}')

    print(f'Epoch [{epoch + 1}/{15}], Loss: {loss}')

# Testing
y_test_tens = torch.LongTensor(y_test)
test_dataset = TensorDataset(x_test, y_test_tens)
test_loader = DataLoader(test_dataset, batch_size=best_hyperparameters['batch_size'], shuffle=True)





Epoch: 0, Batch: 100, Loss: 0.40653443336486816
Epoch: 0, Batch: 200, Loss: 0.3550562262535095
Epoch: 0, Batch: 300, Loss: 0.3551463186740875
Epoch: 0, Batch: 400, Loss: 0.24728640913963318
Epoch: 0, Batch: 500, Loss: 0.35005974769592285
Epoch: 0, Batch: 600, Loss: 0.26467257738113403
Epoch [1/15], Loss: 0.6018450856208801
Epoch: 1, Batch: 100, Loss: 0.417006254196167
Epoch: 1, Batch: 200, Loss: 0.15704463422298431
Epoch: 1, Batch: 300, Loss: 0.33455419540405273
Epoch: 1, Batch: 400, Loss: 0.24234077334403992
Epoch: 1, Batch: 500, Loss: 0.37085339426994324
Epoch: 1, Batch: 600, Loss: 0.28023743629455566
Epoch [2/15], Loss: 0.492572546005249
Epoch: 2, Batch: 100, Loss: 0.3613828122615814
Epoch: 2, Batch: 200, Loss: 0.4198001027107239
Epoch: 2, Batch: 300, Loss: 0.42834344506263733
Epoch: 2, Batch: 400, Loss: 0.35202330350875854
Epoch: 2, Batch: 500, Loss: 0.340670645236969
Epoch: 2, Batch: 600, Loss: 0.2357664853334427
Epoch [3/15], Loss: 0.3307550251483917
Epoch: 3, Batch: 100, Loss: 0

In [31]:
# Don't train the model any more
clf.eval()

# Do the testing
predictions = []
for batch in test_loader:
    out = clf(batch[0]).squeeze(dim=1)

    out = out.ge(0.5)
    out = out.int().reshape(1, -1)

    # Update counts
    correct_predictions += torch.sum(out == batch[1]).item()
    total_samples += batch[1].size(0)

    predictions.append(out.reshape(-1, 1))
    
# Show the stats for the model
all_pred = torch.cat(predictions, dim=0).squeeze(1)
print(f"Stats for the model:\nPercision score: {metrics.precision_score(y_test_tens, all_pred)}\nAccuracy score: {metrics.accuracy_score(y_test_tens, all_pred)}\nRecall: {metrics.recall_score(y_test_tens, all_pred)}\nF1 score: {metrics.f1_score(y_test_tens, all_pred)}\nMatrix: {metrics.confusion_matrix(y_test_tens, all_pred)}")

Stats for the model:
Percision score: 0.22776183644189382
Accuracy score: 0.6947073063380281
Recall: 0.15752914909451748
F1 score: 0.18624431734858485
Matrix: [[11992  2153]
 [ 3396   635]]


## <b> Results </b> <br>
The results of the hyperparameter optimization don't make sense. I think there is something underlying that I cannot see that is causing my training set to change slightly between runs giving a far below accuracy. Using some of the parameters I have given before hand, I have managed to get ~86% accuracy on a good run. This is later discussed in the analysis where I compare it to the fully connected neural network from the previous assignment. Despite this, I am not quite sure why the accuracy of the model is going down slightly overtime. I have tested this, as the accuracy of the model before the hyperparameters is ~85%, but will dip to ~68% after running some other models. However, we can see that using the best hyperparameters the results are: <br>
Stats for the model: <br>
Percision score: 0.213640610401744 <br>
Accuracy score: 0.6770466549295775 <br>
Recall: 0.17018109650210866 <br>
F1 score: 0.18945042805854737 <br>
Matrix: [[11620  2525] <br>
[ 3345   686]]

## <b> Analysis </b> <br>
#### Methodologies <br>
Using the data from the previous fully connected neural network as the data for the LSTM, we can make a comparison in how they compare to one another with no outside changes to the underlying data. As stated, I opted to use an LSTM over GRU due to not having the constraints that would make and LSTM harder to get away with. Despite this, the model for my LSTM is still quite simple containing the LSTM layer, a normalization layer, a dropout layer, a fully connected layer (with ReLU) (FCNN), a linear layer, followed by a sigmoid for the output layer. As well as, adam for the optimizer and Binary Cross Entropy for the loss function. <br>
#### Model performance <br>
From some set hyperparameters before any tuning what-so-ever, I managed to get the same accuracy as the FCNN with my LSTM. However, after optimizing the hyperparameters, the accuracy is far lower than expected for some reason as mentioned. However, if this bug was to be fixed I think I could achieve accuracy of ~86%-88% if the hyperparameters were found. <br>

#### Comparison <br>
Comparing the fully connected neural network, from previous assignment, to the LSTM here, the results are almost the same. Averaging around the 85% accuracy mark for predictions. While both models did a good job in the end with being able to choose correctly most of the time, one is better than the other. When it came to the training of the models, the FCNN was far more computationaly simple. This came into affect when attempting to find the best hyperparameters. The LSTM took a while at over an hour to run the best possible subset selection for the given ones. Whereas, the FCNN took minutes. <br>
#### Personal take <br>
Despite this, I really had to wonder at the end, why are the two models so similar in their accuracy? I can't exactly pinpoint why, but I would say it relates to what I discussed in the previous assignment where my data is a bit more sloppy in areas and I didn't take as much time as I should have. This likely lead to a getting an almost equal prediction rate. <br>
#### Reflection on this project <br>
I have never worked on any topics within machine learning prior to this class. And trying to take in everything throughout this semester and shove in some of the later topics into this project really pushed me to the edge. I think learning how to work with pytorch will be invaluable in the long run where I want to explore it on my own. Even though it sounds like this may be bad, personally, I do need to be pushed to that edge and basically perform a learn it or fail way. I have my doubts in being able to understand what I've done in this project, but within a few months from now I will have come out better knowing what I have done and be able to expand upon what I have done. <br>
#### Pytorch specific reflection <br>
Honestly, there's not that much to talk about when it comes to pytorch, I've learned a little more about how tensors work, but the concept still makes me dizzy. The actual model creation is simple for someone who doesn't need to go into it. So as long as I know roughly what everything does and how it can relate to the data at hand I can more or less just shove things in and likely get what I want out.