# Exercise 2.2 - Complex Machine Learning Models and Keras

This script is a copy of the CNN model script, modified to run an RNN model instead. The cleaned and correctly shaped data is simply imported.

## 1. Importing Libraries and Data

In [1]:
import pandas as pd
import numpy as np
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Dropout, Flatten, MaxPooling1D

import matplotlib.pyplot as plt
import seaborn as sns
import operator

In [2]:
path = r'C:\Users\kyles\CareerFoundary\Machine Learning\Achievement 2\02 Data'

In [3]:
# Import cleaned weather data
X = pd.read_csv(os.path.join(path, 'Cleaned_Weather_Data.csv'))
X.drop(columns=["Unnamed: 0"], inplace = True)

# Check
X

Unnamed: 0,BASEL_cloud_cover,BASEL_humidity,BASEL_pressure,BASEL_global_radiation,BASEL_precipitation,BASEL_sunshine,BASEL_temp_mean,BASEL_temp_min,BASEL_temp_max,BELGRADE_cloud_cover,...,STOCKHOLM_temp_max,VALENTIA_cloud_cover,VALENTIA_humidity,VALENTIA_pressure,VALENTIA_global_radiation,VALENTIA_precipitation,VALENTIA_sunshine,VALENTIA_temp_mean,VALENTIA_temp_min,VALENTIA_temp_max
0,7,0.85,1.0180,0.32,0.09,0.7,6.5,0.8,10.9,1,...,4.9,5,0.88,1.0003,0.45,0.34,4.7,8.5,6.0,10.9
1,6,0.84,1.0180,0.36,1.05,1.1,6.1,3.3,10.1,6,...,5.0,7,0.91,1.0007,0.25,0.84,0.7,8.9,5.6,12.1
2,8,0.90,1.0180,0.18,0.30,0.0,8.5,5.1,9.9,6,...,4.1,7,0.91,1.0096,0.17,0.08,0.1,10.5,8.1,12.9
3,3,0.92,1.0180,0.58,0.00,4.1,6.3,3.8,10.6,8,...,2.3,7,0.86,1.0184,0.13,0.98,0.0,7.4,7.3,10.6
4,6,0.95,1.0180,0.65,0.14,5.4,3.0,-0.7,6.0,8,...,4.3,3,0.80,1.0328,0.46,0.00,5.7,5.7,3.0,8.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22945,1,0.79,1.0248,1.34,0.22,7.7,15.9,11.4,21.4,2,...,14.2,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22946,6,0.77,1.0244,1.34,0.22,5.4,16.7,14.3,21.9,0,...,14.3,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22947,4,0.76,1.0227,1.34,0.22,6.1,16.7,13.1,22.4,2,...,14.4,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22948,5,0.80,1.0212,1.34,0.22,5.8,15.4,11.6,21.1,1,...,12.4,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
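The `Unnamed: 0` column is most likely the row index that was written out when the cleaned CSV was saved. As a small sketch (using a made-up two-row CSV held in memory, not the real file), `index_col=0` asks pandas to read that first column back as the index, so no separate drop is needed:

```python
import io
import pandas as pd

# Hypothetical miniature of a CSV that was saved together with its index
csv_text = ",BASEL_cloud_cover,BASEL_humidity\n0,7,0.85\n1,6,0.84\n"

# Default read: the unnamed first column comes back as "Unnamed: 0"
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```

Either approach should give the same feature columns; this one just skips the extra `drop` step.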


In [4]:
# Import answers
y = pd.read_csv(os.path.join(path, 'Pleasant_Weather_Prediction_Answers.csv'))
y.drop(columns=["DATE"], inplace = True)

# Check
y

Unnamed: 0,BASEL_pleasant_weather,BELGRADE_pleasant_weather,BUDAPEST_pleasant_weather,DEBILT_pleasant_weather,DUSSELDORF_pleasant_weather,HEATHROW_pleasant_weather,KASSEL_pleasant_weather,LJUBLJANA_pleasant_weather,MAASTRICHT_pleasant_weather,MADRID_pleasant_weather,MUNCHENB_pleasant_weather,OSLO_pleasant_weather,SONNBLICK_pleasant_weather,STOCKHOLM_pleasant_weather,VALENTIA_pleasant_weather
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22945,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22946,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22947,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22948,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## 2. Reshaping the Data

Split the observations (X) into **15 groups** (one per station) of **9 types of observations**; the labels (y) should also be in **15 groups**. That is:

- Shape of X = (22950, 15, 9) and y = (22950, 15)
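A quick sketch of what this reshape does, using a small fake table rather than the real data. It assumes the 135 columns are ordered station by station, each station with the same 9 features in the same order (which is what the column header above suggests):

```python
import numpy as np

n_days, n_stations, n_features = 4, 15, 9

# Fake flat table: each value encodes its own column number (0..134)
flat = np.tile(np.arange(n_stations * n_features), (n_days, 1))

# Same call as in the notebook: -1 lets NumPy work out the number of days
cube = flat.reshape(-1, n_stations, n_features)

# Station s, feature f in the cube is column s * 9 + f of the flat table
s, f = 2, 4
assert (cube[:, s, f] == flat[:, s * n_features + f]).all()
print(cube.shape)  # (4, 15, 9)
```

If the columns were ordered differently (for example feature by feature), the reshape would still run but would mix stations together, so the column order is worth checking first.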

In [5]:
# Turn X and y into arrays
X = np.array(X)
y = np.array(y)

In [6]:
# Reshape X (don't need to reshape y)
X = X.reshape(-1, 15, 9)

# Check
X.shape

(22950, 15, 9)

## 3. Split the Data

In [7]:
# Import train_test_split
from sklearn.model_selection import train_test_split

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42) # Can also control size with train_size before random_state

In [8]:
# Check sizes
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(17212, 15, 9) (17212, 15)
(5738, 15, 9) (5738, 15)
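These sizes follow from `train_test_split`'s default of holding out 25% of the rows when no size is given. The arithmetic can be sketched as below; the rounding-up of the test count is my understanding of scikit-learn's behaviour and is consistent with the shapes printed above:

```python
import math

n_samples = 22950   # rows in X, from the reshape check earlier
test_size = 0.25    # scikit-learn's default when neither size is given

n_test = math.ceil(test_size * n_samples)   # test count is rounded up
n_train = n_samples - n_test                # the rest goes to training

print(n_train, n_test)  # 17212 5738
```

To change the proportions, pass a size explicitly, for example `train_test_split(X, y, train_size=0.8, random_state=42)`.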


## 4. Keras Layered Model - Recurrent Neural Network (RNN) / LSTM Model

In [90]:
from tensorflow.keras.layers import LSTM

# Define hyperparameters at the top for easy adjustments
epochs = 30
batch_size = 16
n_hidden = 64

timesteps = len(X_train[0])
input_dim = len(X_train[0][0])
n_classes = len(y_train[0])

model = Sequential()
model.add(Conv1D(n_hidden, kernel_size=2, activation='relu', input_shape=(timesteps, input_dim)))
model.add(MaxPooling1D())
model.add(LSTM(n_hidden))  # input shape is inferred from the pooling layer's output
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='tanh')) # options include: softmax, sigmoid, or tanh (DON'T USE RELU HERE)



In [91]:
model.summary()
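The summary output is not shown here, so as a rough cross-check, the layer shapes and parameter counts can be worked out by hand. This sketch assumes the Conv1D + MaxPooling1D + LSTM + Dense stack defined above with `n_hidden = 64`, Keras's default `'valid'` padding, and the default pooling size of 2:

```python
timesteps, input_dim = 15, 9      # from X_train.shape
n_hidden, n_classes = 64, 15
kernel_size, pool_size = 2, 2     # pool_size=2 is MaxPooling1D's default

# Conv1D with 'valid' padding shortens the sequence by kernel_size - 1
conv_steps = timesteps - kernel_size + 1
conv_params = kernel_size * input_dim * n_hidden + n_hidden

# MaxPooling1D (stride = pool_size) halves the sequence, rounding down
pool_steps = (conv_steps - pool_size) // pool_size + 1

# LSTM: 4 gates, each with input weights (64 conv filters in),
# recurrent weights (64 units) and a bias per unit
lstm_params = 4 * ((n_hidden + n_hidden) * n_hidden + n_hidden)

# Dense: one weight per (LSTM unit, class) pair plus one bias per class
dense_params = n_hidden * n_classes + n_classes

print(conv_steps, pool_steps)                  # 14 7
print(conv_params, lstm_params, dense_params)  # 1216 33024 975
```

So the LSTM only sees 7 time steps of 64 features each, and it holds most of the roughly 35,000 trainable parameters.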

## 6. Compiling and Running the Model

In [92]:
# Unsure whether to use sparse_categorical_crossentropy or categorical_crossentropy,
# so print the full label array to check whether the data is one-hot encoded
import sys
np.set_printoptions(threshold=sys.maxsize)  # show every row instead of a truncated view
y

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0,

Unsure whether this counts as one-hot encoding: the rows shown are all zeros, whereas a one-hot row contains exactly one 1. Will try categorical_crossentropy anyway, since an example did it this way.
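Rather than reading the printed array by eye, the one-hot question can be checked directly. A minimal sketch, where `is_one_hot` is a hypothetical helper and the arrays are made-up examples rather than the real answers file:

```python
import numpy as np

def is_one_hot(labels):
    """True if every row holds only 0s and 1s and sums to exactly 1."""
    labels = np.asarray(labels)
    only_binary = np.isin(labels, (0, 1)).all()
    return bool(only_binary and (labels.sum(axis=1) == 1).all())

# Made-up examples, not the real answers file
one_hot = np.array([[0, 1, 0], [1, 0, 0]])
multi_label = np.array([[0, 0, 0], [1, 1, 0]])  # no 1s, or several 1s

print(is_one_hot(one_hot))      # True
print(is_one_hot(multi_label))  # False
```

Calling `is_one_hot(y)` on the notebook's label array would settle it; given the all-zero rows printed above, it should return `False`.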

In [93]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [94]:
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)

Epoch 1/30
1076/1076 - 4s - 4ms/step - accuracy: 0.0245 - loss: 24.3459
Epoch 2/30
1076/1076 - 2s - 2ms/step - accuracy: 0.1211 - loss: 23.6739
Epoch 3/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0783 - loss: 23.8056
Epoch 4/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0175 - loss: 24.6209
Epoch 5/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0168 - loss: 25.3151
Epoch 6/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0232 - loss: 24.8667
Epoch 7/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0249 - loss: 24.8033
Epoch 8/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0286 - loss: 24.7598
Epoch 9/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0320 - loss: 24.6778
Epoch 10/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0841 - loss: 24.9842
Epoch 11/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0846 - loss: 24.6049
Epoch 12/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0941 - loss: 24.7990
Epoch 13/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0452 - loss: 24.4368
Epoch 14/30
1076/1076 - 2s - 2ms/step - accuracy: 0.0351 - l

<keras.src.callbacks.history.History at 0x1b660ff8b50>
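One alternative that may be worth trying (it is not what this notebook runs) is a sigmoid output with `binary_crossentropy`, which treats each of the 15 station columns as its own yes/no question and stays well defined for rows that are all zeros. A NumPy sketch of that loss on made-up numbers; the helper name and the `eps` value are assumptions for illustration:

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all (sample, station) cells."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    cell_loss = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(cell_loss.mean())

y_true = np.array([[0, 0, 0], [1, 0, 1]])   # includes an all-zero row
good = np.array([[0.1, 0.1, 0.1], [0.9, 0.1, 0.9]])
bad = np.array([[0.9, 0.9, 0.9], [0.1, 0.9, 0.1]])

print(binary_crossentropy(y_true, good) < binary_crossentropy(y_true, bad))  # True
```

In Keras terms this would mean `activation='sigmoid'` on the final Dense layer and `loss='binary_crossentropy'` in `model.compile`.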

## 7. Confusion Matrix

In [95]:
# Set up 'label key' for confusion matrix (similar to activities in HAR Data)
stations = {
    0: 'BASEL', 
    1: 'BELGRADE', 
    2: 'BUDAPEST', 
    3: 'DEBILT', 
    4: 'DUSSELDORF', 
    5: 'HEATHROW', 
    6: 'KASSEL', 
    7: 'LJUBLJANA', 
    8: 'MAASTRICHT', 
    9: 'MADRID', 
    10: 'MUNCHENB', 
    11: 'OSLO', 
    12: 'SONNBLICK', 
    13: 'STOCKHOLM', 
    14: 'VALENTIA'
}

In [96]:
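def confusion_matrix(Y_true, Y_pred):
    """Cross-tabulate the top-scoring station per row (true vs. predicted)."""
    # argmax reduces each 15-value row to a single station index
    Y_true = pd.Series([stations[y] for y in np.argmax(Y_true, axis=1)])
    Y_pred = pd.Series([stations[y] for y in np.argmax(Y_pred, axis=1)])

    # Rows are the true stations, columns the predicted ones
    return pd.crosstab(Y_true, Y_pred, rownames=['True'], colnames=['Pred'])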

# Evaluate
print(confusion_matrix(y_test, model.predict(X_test)))

180/180 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Pred        BUDAPEST  SONNBLICK
True                           
BASEL           3681          1
BELGRADE        1092          0
BUDAPEST         214          0
DEBILT            82          0
DUSSELDORF        29          0
HEATHROW          82          0
KASSEL            11          0
LJUBLJANA         61          0
MAASTRICHT         9          0
MADRID           458          0
MUNCHENB           8          0
OSLO               5          0
STOCKHOLM          4          0
VALENTIA           1          0
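One thing to keep in mind when reading this table: `np.argmax` returns index 0 for a row that is all zeros, so every day with no pleasant station is counted under BASEL, and a day with several pleasant stations is reduced to the first of them. That may explain why BASEL dominates the "True" rows. The sketch below shows this on made-up rows and adds a per-station accuracy as one possible alternative; the helper name and the 0.5 threshold are assumptions, and the threshold only makes sense for outputs in the 0 to 1 range (such as sigmoid):

```python
import numpy as np

# Made-up labels: no pleasant station, two pleasant stations, one
y_true = np.array([[0, 0, 0], [0, 1, 1], [0, 0, 1]])
print(np.argmax(y_true, axis=1))  # [0 1 2]

def per_station_accuracy(y_true, y_score, threshold=0.5):
    """Share of days on which each station's 0/1 label is predicted correctly."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return (y_pred == np.asarray(y_true)).mean(axis=0)

# Made-up model scores for the same three days
y_score = np.array([[0.2, 0.1, 0.4], [0.3, 0.8, 0.9], [0.1, 0.7, 0.6]])
print(per_station_accuracy(y_true, y_score))
```

This gives one accuracy value per station column instead of forcing each day into a single class.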


## 8. Notes:

- What happens to loss and accuracy for different combinations of hyperparameters?
- Does the model converge or does the loss grow exponentially? (list the activation type used)
- How accurate is the model at recognising the stations?


### 8.1 - Sigmoid Activation w/ various layers

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 4*** | activation = sigmoid

*Result 1:* Accuracy decreased from 18.4% to 6.8% | loss gradually and slowly increased from 9.1 to 12.9 | recognised 3 stations

*Result 2:* Accuracy decreased from 12.4% to 6.1% | loss gradually and slowly increased from 9.1 to 11.6 | recognised 3 stations

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 8*** | activation = sigmoid

*Result 1:* Accuracy decreased from 6.8% to 5.9% | loss gradually and slowly increased from 9.5 to 12.9 | recognised 2 stations

*Result 2:* Accuracy decreased from 6.7% to 6.1% | loss gradually and slowly increased from 5.8 to 12.8 | recognised 2 stations

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 16*** | activation = sigmoid

*Result 1:* Accuracy decreased from 12.3% to 6.0% | loss gradually and slowly increased from 9.7 to 15.3 | recognised 2 stations

*Result 2:* Accuracy decreased from 11.0% to 5.5% | loss gradually and slowly increased from 9.5 to 13.9 | recognised 2 stations

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 32*** | activation = sigmoid

*Result 1:* Accuracy decreased from 13.2% to 5.0% | loss gradually and slowly increased from 10.2 to 16.4 | recognised 2 stations

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 64*** | activation = sigmoid

*Result 1:* Accuracy decreased from 8.6% to 5.8% | loss gradually and slowly increased from 11.1 to 25.1 | recognised 2 stations

### 8.2 - Softmax Activation w/ various layers

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 4*** | activation = softmax

*Result 1:* Accuracy increased from 4.1% to 5.8% | loss gradually and slowly increased from 8.8 to 10.8 | recognised 2 stations

*Result 2:* Accuracy increased from 5.0% to 6.6% | loss gradually and slowly increased from 8.8 to 11.4 | recognised 1 station

**Note:** Accuracy in result 2 fluctuated massively, from 5.0% to 11.6% between epochs 1 and 2, but plateaued over time.

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 8*** | activation = softmax

*Result 1:* Accuracy decreased from 11.8% to 6.2% | loss gradually and slowly increased from 9.3 to 12.8 | recognised 1 station

*Result 2:* Accuracy increased from 4.6% to 6.0% | loss gradually and slowly increased from 9.4 to 12.7 | recognised 1 station


Params: epoch = 30 | batch_size = 16 | ***n_hidden = 16*** | activation = softmax

*Result 1:* Accuracy decreased from 5.5% to 5.2% | loss gradually and slowly increased from 9.7 to 16.0 | recognised 4 stations

*Result 2:* Accuracy decreased from 8.2% to 5.7% | loss gradually and slowly increased from 10.3 to 14.1 | recognised 2 stations


Params: epoch = 30 | batch_size = 16 | ***n_hidden = 32*** | activation = softmax

*Result 1:* Accuracy decreased from 10.3% to 5.6% | loss gradually and slowly increased from 10.5 to 17.9 | recognised 5 stations

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 64*** | activation = softmax

*Result 1:* Accuracy decreased from 10.0% to 5.3% | loss gradually and slowly increased from 10.6 to 13.6 | recognised 6 stations

### 8.3 - Tanh Activation w/ various layers

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 4*** | activation = tanh

*Result 1:* Accuracy decreased from 3.8% to 0.6% | loss went up and down: from 23.5 to 24.0 | recognised 7 stations

*Result 2:* Accuracy increased from 9.5% to 22.8% | loss went up and down: from 26.1 to 28.8 | recognised 5 stations

**Note:** In one run, the model reported an accuracy of 64.4% and a loss of "nan".

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 8*** | activation = tanh

*Result 1:* Accuracy decreased from 2.6% to 0.5% | loss went up and down: from 25.5 to 25.5 | recognised 5 stations

*Result 2:* Accuracy went down and up from 11.1% to 21.8% | loss went up and down: from 24.7 to 25.1 | recognised 4 stations

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 16*** | activation = tanh

*Result 1:* Accuracy went up and down from 2.9% to 1.9% | loss went up and down: from 24.6 to 27.0 | recognised 4 stations

*Result 2:* Accuracy went up and down from 4.5% to 4.0% | loss simply fluctuated: from 23.6 to 25.2 | recognised 6 stations

**Note:** Accuracy has fluctuated in strange ways (improvement over 10 or so epochs followed by decreases until epoch 30)

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 32*** | activation = tanh

*Result 1:* Accuracy went up and down from 3.3% to 5.5% | loss simply fluctuated: from 24.2 to 24.7 | recognised 7 stations

*Result 2:* Accuracy went up and down from 4.6% to 10.4% | loss simply fluctuated: from 22.2 to 24.5 | recognised 6 stations

**Note:** Model accuracy still fluctuates quite a lot. Might toy with the idea of changing the dropout rate and number of epochs later.

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 64*** | activation = tanh

*Result 1:* Accuracy went down and up from 5.9% to 8.4% | loss simply fluctuated: from 24.5 to 24.6 | recognised 6 stations

*Result 2:* Accuracy went up and down from 4.1% to 11.6% | loss simply fluctuated: from 24.3 to 24.6 | recognised 3 stations

**Note:** Model accuracy went down very low before starting its rise up towards the end. May need more epochs

### 8.4 - Other Adjustments

Params: epoch = 30 | batch_size = 16 | ***n_hidden = 64*** | activation = tanh | **Convolution & Pooling Layers**

*Result 1:* Accuracy started at 2.2%, fluctuated, finished at 3.8% | loss simply fluctuated: from 24.5 to 24.9 | recognised 3 stations

*Result 2:* Accuracy started at 2.5%, fluctuated, finished at 3.1% | loss simply fluctuated: from 24.3 to 24.5 | recognised 2 stations

**Note:** Extra layers did not seem to make much of a difference in this case

### 8.5 - Adjustments for Later

- Consider changing the number of **epochs** to allow the model more time to converge (maybe up to 100)
- Consider changing the **dropout rate** from 0.5 to 0.2 or 0.3. This may reduce the fluctuations seen in the accuracy over many epochs
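To keep track of these combinations, the settings could be listed up front instead of being edited by hand between runs. A small sketch: the values come from the notes above, while the grid itself and the `build_grid` helper are only a suggestion:

```python
from itertools import product

# Values taken from the notes above; the grid itself is only a suggestion
n_hidden_options = [4, 8, 16, 32, 64]
activations = ["sigmoid", "softmax", "tanh"]
dropout_rates = [0.2, 0.3, 0.5]
epoch_options = [30, 100]

def build_grid():
    """Return one dict of settings per combination to try."""
    keys = ("n_hidden", "activation", "dropout", "epochs")
    combos = product(n_hidden_options, activations, dropout_rates, epoch_options)
    return [dict(zip(keys, combo)) for combo in combos]

grid = build_grid()
print(len(grid))  # 90
print(grid[0])
```

Each dict can then be used to build and fit one model, with the final loss and accuracy stored next to the settings that produced them.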
