#Introduction

Below, we use the data prepared in the data science part of this assignment to train and evaluate a deep neural network (DNN) and to predict the number of collisions on any given day. The DNN takes one-hot encoded categorical inputs (day of the week and month of the year) together with the year and the standardised maximum temperature.
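
To make the two kinds of input concrete, the sketch below shows in plain Python how a day of the week becomes a one-hot vector and how a temperature is standardised. The helper names `one_hot` and `standardise` and the sample temperatures are illustrative assumptions; the actual preparation was done in the data science part of the assignment and may differ in detail (for example, in whether the sample or population standard deviation was used).

```python
from statistics import mean, pstdev

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical maximum temperatures, for illustration only.
temps = [2.0, 11.5, 19.0, 27.5]

thu = one_hot("Thu", DAYS)
temps_std = standardise(temps)

print(thu)  # [0, 0, 0, 1, 0, 0, 0]
print([round(t, 3) for t in temps_std])
```

In pandas, `pd.get_dummies` is the usual way to produce the one-hot columns for a whole data frame.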

In [None]:
#required for data frame
import pandas as pd

#required for maths calculations
import numpy as np

# create data frame from csv file hosted on github
dnn = pd.read_csv('https://raw.githubusercontent.com/Ritchie-Robinson/22024961_DataAnalytics/main/dnndata.csv', index_col=0)

In [None]:
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

2.15.0


In [None]:
dnn[:5]

Unnamed: 0,Apr,Aug,Dec,Feb,Jan,Jul,Jun,Mar,May,Nov,...,Fri,Mon,Sat,Sun,Thu,Tue,Wed,year,max_temp_standardised,num_collisions
1,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,0,2019,-1.466071,0.473577
2,0,0,0,0,1,0,0,0,0,0,...,1,0,0,0,0,0,0,2014,-2.561928,0.715447
3,0,0,0,0,1,0,0,0,0,0,...,0,0,0,1,0,0,0,2019,-0.912551,0.186992
4,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,1,2019,-0.409351,0.302846
5,0,0,0,0,1,0,0,0,0,0,...,0,1,0,0,0,0,0,2019,-1.795946,0.711382


Below, we create five models, numbered 0 through 4.

In Model 0, we will use all of the available input variables that showed either a positive or negative correlation in our data science element of the project. These are the month, day of the week, year, and maximum temperature.

In Model 1, we will drop the year variable.

In Model 2, we will drop the year and month variables.

In Model 3, we will use day only, as this has the strongest correlation.

In Model 4, we will use all variables except the day of the week.
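
The five feature sets can also be written down once as column lists, which makes the differences between the models easy to check. This is only a sketch: `MODEL_FEATURES` is a hypothetical name, and the column names are those shown in the `dnn` preview above (the cells below rename `max_temp_standardised` to `max_temp`, which this sketch does not do).

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
YEAR = ["year"]
TEMP = ["max_temp_standardised"]
LABEL = "num_collisions"

# Input columns for each model, following the descriptions above.
MODEL_FEATURES = {
    0: MONTHS + DAYS + YEAR + TEMP,
    1: MONTHS + DAYS + TEMP,
    2: DAYS + TEMP,
    3: DAYS,
    4: MONTHS + YEAR + TEMP,
}

for model_id, cols in MODEL_FEATURES.items():
    print(f"Model {model_id}: {len(cols)} input features")

# With the data frame loaded, one model's table could be selected with
# dnn[MODEL_FEATURES[0] + [LABEL]]
```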



#Model 0 - Month, Day, Year, Max Temp

In [None]:
#create new df with headers as required
dnn_data_m0 = [dnn["Jan"], dnn["Feb"], dnn["Mar"], dnn["Apr"], dnn["May"], dnn["Jun"], dnn["Jul"], dnn["Aug"], dnn["Sep"], dnn["Oct"], dnn["Nov"], dnn["Dec"],
                  dnn["Mon"], dnn["Tue"], dnn["Wed"], dnn["Thu"], dnn["Fri"], dnn["Sat"], dnn["Sun"], dnn["year"], dnn["max_temp_standardised"], dnn["num_collisions"]]
dnn_headers_m0 = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec", "Mon","Tue","Wed","Thu","Fri", "Sat", "Sun", "year", "max_temp", "num_collisions"]
#concat headers and data as required
df_dnn_m0 = pd.concat(dnn_data_m0, axis=1, keys=dnn_headers_m0)
#print
df_dnn_m0.head()

Unnamed: 0,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,...,Mon,Tue,Wed,Thu,Fri,Sat,Sun,year,max_temp,num_collisions
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,2019,-1.466071,0.473577
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,2014,-2.561928,0.715447
3,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,2019,-0.912551,0.186992
4,1,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,2019,-0.409351,0.302846
5,1,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,2019,-1.795946,0.711382


In [None]:
#split training set and test set 80/20
training_dataset_m0 = df_dnn_m0.sample(frac=0.8, random_state=0)
test_dataset_m0 = df_dnn_m0.drop(training_dataset_m0.index)

In [None]:
#split labels (outputs) and features (inputs)
training_features_m0 = training_dataset_m0.copy()
test_features_m0 = test_dataset_m0.copy()

training_labels_m0 = training_features_m0.pop('num_collisions')
test_labels_m0 = test_features_m0.pop('num_collisions')
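
The split above draws 80% of the rows at random with a fixed seed and keeps the remaining rows for testing. The sketch below shows the same idea on plain row indices. The function name `split_indices` is hypothetical, and Python's `random` module will not select the same rows as pandas does for `random_state=0`; only the principle carries over.

```python
import random


def split_indices(n_rows, train_frac=0.8, seed=0):
    """Return (train, test) index lists that are disjoint and cover all rows."""
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(n_rows * train_frac)
    train = sorted(indices[:n_train])
    test = sorted(indices[n_train:])
    return train, test


train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```

Because the same seed is used for every model in this notebook, each model is trained and tested on the same rows, which keeps the comparison between them fair.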

In [None]:
normaliser_m0 = tf.keras.layers.Normalization(axis=-1)
normaliser_m0.adapt(np.array(training_features_m0))
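
As a rough illustration of what the `Normalization` layer does once it has been adapted, the sketch below computes a mean and variance per column and rescales each value with them. It is plain Python rather than Keras, the function names are hypothetical, and it assumes the documented behaviour of subtracting the mean and dividing by the square root of the variance; edge cases such as constant columns are not handled here.

```python
from math import sqrt
from statistics import mean, pvariance


def adapt(rows):
    """Compute the per-column mean and population variance of a 2-D list."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def normalise(rows, means, variances):
    """Apply (x - mean) / sqrt(variance) column by column."""
    return [
        [(x - m) / sqrt(v) for x, m, v in zip(row, means, variances)]
        for row in rows
    ]


# Hypothetical rows, for illustration only: [Mon flag, year, max_temp]
rows = [[1, 2014, -1.2], [0, 2019, 0.3], [0, 2015, 0.9], [1, 2019, -0.4]]

means, variances = adapt(rows)
scaled = normalise(rows, means, variances)
print([round(x, 3) for x in scaled[0]])
```

One practical effect is that the year column, whose raw values are in the thousands, ends up on the same scale as the one-hot flags and the temperature.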

In [None]:
#define DNN regression model with two hidden layers of 48 neurons each and one output layer.
#mean absolute error as loss function and Adam optimiser.
dnn_model_0 = keras.Sequential([
      normaliser_m0,
      layers.Dense(48, activation='relu'),
      layers.Dense(48, activation='relu'),
      layers.Dense(1)
  ])

dnn_model_0.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))
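
To give a sense of the size of this network, the sketch below counts the trainable weights and biases of the three dense layers for the 21 input features of Model 0. The helper `dense_params` is a hypothetical name; the statistics held by the normalisation layer are not learned by gradient descent and are left out of the count.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 21            # 12 months + 7 days + year + max_temp
layer_units = [48, 48, 1]  # two hidden layers and the output layer

total = 0
n_in = n_features
for units in layer_units:
    total += dense_params(n_in, units)
    n_in = units           # this layer's outputs feed the next layer

print(total)  # 3457
```

With the notebook's model built, `dnn_model_0.summary()` reports the per-layer counts for comparison.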

##Train the Model

In [None]:
%%time
#The model is trained using the training features and labels. Training is done for 100 epochs with a validation split of 20%.
history_m0 = dnn_model_0.fit(
    training_features_m0,
    training_labels_m0,
    validation_split=0.2,
    verbose=0,
    epochs=100)

CPU times: user 19.7 s, sys: 811 ms, total: 20.5 s
Wall time: 21.4 s
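
`fit` returns a `History` object whose `history` attribute is a dictionary of per-epoch metrics; with a validation split it should hold `loss` and `val_loss` lists. The sketch below finds the epoch with the lowest validation loss in such a dictionary. The helper name `best_epoch` and the loss values are made up for illustration and are not results from this notebook.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) at which `key` was lowest."""
    values = history_dict[key]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up losses for five epochs, for illustration only.
example_history = {
    "loss": [0.30, 0.18, 0.13, 0.11, 0.10],
    "val_loss": [0.28, 0.17, 0.12, 0.13, 0.14],
}

epoch, value = best_epoch(example_history)
print(epoch, value)  # 3 0.12
```

In the notebook this could be called as `best_epoch(history_m0.history)`. A validation loss that starts rising while the training loss keeps falling, as in the made-up values, is a common sign of over-fitting.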


##Evaluate Model

In [None]:
dnn_model_0_results = dnn_model_0.evaluate(test_features_m0, test_labels_m0, verbose=0)
print(dnn_model_0_results)

0.10664400458335876


##Predictions

In [None]:
#create dummy input data
input_0 = pd.DataFrame.from_dict(data =
				{
         'Jan' : [0,0,0],
         'Feb' : [0,0,1],
         'Mar' : [0,0,0],
         'Apr' : [0,0,0],
         'May' : [0,0,0],
         'Jun' : [0,0,0],
         'Jul' : [0,0,0],
         'Aug' : [1,0,0],
         'Sep' : [0,0,0],
         'Oct' : [0,0,0],
         'Nov' : [0,0,0],
         'Dec' : [0,1,0],
         'Mon' : [0,0,0],
         'Tue' : [0,0,1],
         'Wed' : [0,0,0],
         'Thu' : [0,0,0],
         'Fri' : [0,0,0],
         'Sat' : [0,1,0],
         'Sun' : [1,0,0],
         'year' : [2015,2014,2019],
         'max_temp' : [-0.063549, -1.735158, 1.539546],
        })
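
Writing the one-hot flags out by hand is easy to get wrong, so a small helper can build each row from readable values instead. The sketch below is one possible way to do this; `make_row` is a hypothetical name, and the three rows reproduce the dummy inputs defined above.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def make_row(month, day, year, max_temp):
    """Build one Model 0 input row as a dict of column name -> value."""
    if month not in MONTHS or day not in DAYS:
        raise ValueError("unknown month or day")
    row = {m: int(m == month) for m in MONTHS}
    row.update({d: int(d == day) for d in DAYS})
    row["year"] = year
    row["max_temp"] = max_temp
    return row


rows = [
    make_row("Aug", "Sun", 2015, -0.063549),
    make_row("Dec", "Sat", 2014, -1.735158),
    make_row("Feb", "Tue", 2019, 1.539546),
]

print(rows[0]["Aug"], rows[0]["Sun"], sum(rows[0][m] for m in MONTHS))  # 1 1 1

# pd.DataFrame(rows) would then give a frame with the same columns as input_0
```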

In [None]:
m0_predictions = dnn_model_0.predict(input_0[:3])
print("\nNormalised:\n", m0_predictions)

SCALE_NUM_COLL = 1.0
min_val = 353
max_val = 845

unnormalised_predictions_0 = m0_predictions / SCALE_NUM_COLL * (max_val - min_val) + min_val

unnormalised_predictions_0 = unnormalised_predictions_0.astype(int)

print("\nAbsolute Values:\n", unnormalised_predictions_0)


Normalised:
 [[0.15917325]
 [0.43280727]
 [0.42833877]]

Absolute Values:
 [[431]
 [565]
 [563]]
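
The conversion back to collision counts reverses a min-max scaling, using the minimum (353) and maximum (845) from the cell above. The sketch below repeats that arithmetic for the three normalised predictions printed above and shows the effect of `astype(int)`, which truncates rather than rounds. The function name `denormalise` is hypothetical.

```python
MIN_COLLISIONS = 353
MAX_COLLISIONS = 845


def denormalise(value, min_val=MIN_COLLISIONS, max_val=MAX_COLLISIONS):
    """Map a min-max normalised value back to a collision count."""
    return value * (max_val - min_val) + min_val


# Normalised Model 0 predictions copied from the output above.
predictions = [0.15917325, 0.43280727, 0.42833877]

for p in predictions:
    raw = denormalise(p)
    print(f"{p:.4f} -> {raw:.2f} (truncated {int(raw)}, rounded {round(raw)})")
```

Truncation reproduces the printed values of 431, 565 and 563, whereas rounding would give 431, 566 and 564.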


#Model 1 - Month, Day, Max Temp

In [None]:
#create new df with headers as required
dnn_data_m1 = [dnn["Jan"], dnn["Feb"], dnn["Mar"], dnn["Apr"], dnn["May"], dnn["Jun"], dnn["Jul"], dnn["Aug"], dnn["Sep"], dnn["Oct"], dnn["Nov"], dnn["Dec"],
                  dnn["Mon"], dnn["Tue"], dnn["Wed"], dnn["Thu"], dnn["Fri"], dnn["Sat"], dnn["Sun"], dnn["max_temp_standardised"], dnn["num_collisions"]]
dnn_headers_m1 = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec", "Mon","Tue","Wed","Thu","Fri", "Sat", "Sun", "max_temp", "num_collisions"]
#concat headers and data as required
df_dnn_m1 = pd.concat(dnn_data_m1, axis=1, keys=dnn_headers_m1)
#print
df_dnn_m1.head()

Unnamed: 0,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,...,Dec,Mon,Tue,Wed,Thu,Fri,Sat,Sun,max_temp,num_collisions
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,-1.466071,0.473577
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,-2.561928,0.715447
3,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,-0.912551,0.186992
4,1,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,-0.409351,0.302846
5,1,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,-1.795946,0.711382


In [None]:
#split training set and test set 80/20
training_dataset_m1 = df_dnn_m1.sample(frac=0.8, random_state=0)
test_dataset_m1 = df_dnn_m1.drop(training_dataset_m1.index)

In [None]:
#split labels (outputs) and features (inputs)
training_features_m1 = training_dataset_m1.copy()
test_features_m1 = test_dataset_m1.copy()

training_labels_m1 = training_features_m1.pop('num_collisions')
test_labels_m1 = test_features_m1.pop('num_collisions')

In [None]:
normaliser_m1 = tf.keras.layers.Normalization(axis=-1)
normaliser_m1.adapt(np.array(training_features_m1))

In [None]:
#define DNN regression model with two hidden layers of 48 neurons each and one output layer.
#mean absolute error as loss function and Adam optimiser.
dnn_model_1 = keras.Sequential([
      normaliser_m1,
      layers.Dense(48, activation='relu'),
      layers.Dense(48, activation='relu'),
      layers.Dense(1)
  ])

dnn_model_1.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))

##Train the Model

In [None]:
%%time
# The model is trained using the training features and labels. Training is done for 100 epochs with a validation split of 20%.
history_m1 = dnn_model_1.fit(
    training_features_m1,
    training_labels_m1,
    validation_split=0.2,
    verbose=0,
    epochs=100)

CPU times: user 19.4 s, sys: 745 ms, total: 20.1 s
Wall time: 19.6 s


##Evaluate Model

In [None]:
dnn_model_1_results = dnn_model_1.evaluate(test_features_m1, test_labels_m1, verbose=0)
print(dnn_model_1_results)

0.11528263241052628


##Predictions

In [None]:
#create dummy input data
input_1 = pd.DataFrame.from_dict(data =
				{
         'Jan' : [0,0,0],
         'Feb' : [0,0,1],
         'Mar' : [0,0,0],
         'Apr' : [0,0,0],
         'May' : [0,0,0],
         'Jun' : [0,0,0],
         'Jul' : [0,0,0],
         'Aug' : [1,0,0],
         'Sep' : [0,0,0],
         'Oct' : [0,0,0],
         'Nov' : [0,0,0],
         'Dec' : [0,1,0],
         'Mon' : [1,0,0],
         'Tue' : [0,0,0],
         'Wed' : [0,0,0],
         'Thu' : [0,1,0],
         'Fri' : [0,0,0],
         'Sat' : [0,0,1],
         'Sun' : [0,0,0],
         'max_temp' : [-0.063549, -1.735158, 1.539546],
        })

In [None]:
m1_predictions = dnn_model_1.predict(input_1[:3])
print("\nNormalised:\n", m1_predictions)

SCALE_NUM_COLL = 1.0
min_val = 353
max_val = 845

unnormalised_predictions_1 = m1_predictions / SCALE_NUM_COLL * (max_val - min_val) + min_val

unnormalised_predictions_1 = unnormalised_predictions_1.astype(int)

print("\nAbsolute Values:\n", unnormalised_predictions_1)


Normalised:
 [[0.32035854]
 [0.61727923]
 [0.5403572 ]]

Absolute Values:
 [[510]
 [656]
 [618]]


#Model 2 - Day, Max Temp

In [None]:
#create new df with headers as required
dnn_data_m2 = [dnn["Mon"], dnn["Tue"], dnn["Wed"], dnn["Thu"], dnn["Fri"], dnn["Sat"], dnn["Sun"], dnn["max_temp_standardised"], dnn["num_collisions"]]
dnn_headers_m2 = ["Mon","Tue","Wed","Thu","Fri", "Sat", "Sun", "max_temp", "num_collisions"]
#concat headers and data as required
df_dnn_m2 = pd.concat(dnn_data_m2, axis=1, keys=dnn_headers_m2)
#print
df_dnn_m2.head()

Unnamed: 0,Mon,Tue,Wed,Thu,Fri,Sat,Sun,max_temp,num_collisions
1,0,0,0,1,0,0,0,-1.466071,0.473577
2,0,0,0,0,1,0,0,-2.561928,0.715447
3,0,0,0,0,0,0,1,-0.912551,0.186992
4,0,0,1,0,0,0,0,-0.409351,0.302846
5,1,0,0,0,0,0,0,-1.795946,0.711382


In [None]:
#split training set and test set 80/20
training_dataset_m2 = df_dnn_m2.sample(frac=0.8, random_state=0)
test_dataset_m2 = df_dnn_m2.drop(training_dataset_m2.index)

In [None]:
#split labels (outputs) and features (inputs)
training_features_m2 = training_dataset_m2.copy()
test_features_m2 = test_dataset_m2.copy()

training_labels_m2 = training_features_m2.pop('num_collisions')
test_labels_m2 = test_features_m2.pop('num_collisions')

In [None]:
normaliser_m2 = tf.keras.layers.Normalization(axis=-1)
normaliser_m2.adapt(np.array(training_features_m2))

In [None]:
#define DNN regression model with two hidden layers of 48 neurons each and one output layer.
#mean absolute error as loss function and Adam optimiser.
dnn_model_2 = keras.Sequential([
      normaliser_m2,
      layers.Dense(48, activation='relu'),
      layers.Dense(48, activation='relu'),
      layers.Dense(1)
  ])

dnn_model_2.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))

##Train the Model

In [None]:
%%time
# The model is trained using the training features and labels. Training is done for 100 epochs with a validation split of 20%.
history_m2 = dnn_model_2.fit(
    training_features_m2,
    training_labels_m2,
    validation_split=0.2,
    verbose=0,
    epochs=100)

CPU times: user 19 s, sys: 801 ms, total: 19.8 s
Wall time: 21.3 s


##Evaluate Model

In [None]:
dnn_model_2_results = dnn_model_2.evaluate(test_features_m2, test_labels_m2, verbose=0)
print(dnn_model_2_results)

0.12059439718723297


##Predictions

In [None]:
#create dummy input data
input_2 = pd.DataFrame.from_dict(data =
				{
         'Mon' : [1,0,0],
         'Tue' : [0,0,0],
         'Wed' : [0,1,0],
         'Thu' : [0,0,0],
         'Fri' : [0,0,0],
         'Sat' : [0,0,1],
         'Sun' : [0,0,0],
         'max_temp' : [-0.063549, -1.735158, 1.539546],
        })

In [None]:
m2_predictions = dnn_model_2.predict(input_2[:3])
print("\nNormalised:\n", m2_predictions)

SCALE_NUM_COLL = 1.0
min_val = 353
max_val = 845

unnormalised_predictions_2 = m2_predictions / SCALE_NUM_COLL * (max_val - min_val) + min_val

unnormalised_predictions_2 = unnormalised_predictions_2.astype(int)

print("\nAbsolute Values:\n", unnormalised_predictions_2)




Normalised:
 [[0.53175783]
 [0.41505828]
 [0.43503448]]

Absolute Values:
 [[614]
 [557]
 [567]]


#Model 3 - Day

In [None]:
#create new df with headers as required
dnn_data_m3 = [dnn["Mon"], dnn["Tue"], dnn["Wed"], dnn["Thu"], dnn["Fri"], dnn["Sat"], dnn["Sun"], dnn["num_collisions"]]
dnn_headers_m3 = ["Mon","Tue","Wed","Thu","Fri", "Sat", "Sun", "num_collisions"]
#concat headers and data as required
df_dnn_m3 = pd.concat(dnn_data_m3, axis=1, keys=dnn_headers_m3)
#print
df_dnn_m3.head()

Unnamed: 0,Mon,Tue,Wed,Thu,Fri,Sat,Sun,num_collisions
1,0,0,0,1,0,0,0,0.473577
2,0,0,0,0,1,0,0,0.715447
3,0,0,0,0,0,0,1,0.186992
4,0,0,1,0,0,0,0,0.302846
5,1,0,0,0,0,0,0,0.711382


In [None]:
#split training set and test set 80/20
training_dataset_m3 = df_dnn_m3.sample(frac=0.8, random_state=0)
test_dataset_m3 = df_dnn_m3.drop(training_dataset_m3.index)

In [None]:
#split labels (outputs) and features (inputs)
training_features_m3 = training_dataset_m3.copy()
test_features_m3 = test_dataset_m3.copy()

training_labels_m3 = training_features_m3.pop('num_collisions')
test_labels_m3 = test_features_m3.pop('num_collisions')

In [None]:
normaliser_m3 = tf.keras.layers.Normalization(axis=-1)
normaliser_m3.adapt(np.array(training_features_m3))

In [None]:
#define DNN regression model with two hidden layers of 48 neurons each and one output layer.
#mean absolute error as loss function and Adam optimiser.
dnn_model_3 = keras.Sequential([
      normaliser_m3,
      layers.Dense(48, activation='relu'),
      layers.Dense(48, activation='relu'),
      layers.Dense(1)
  ])

dnn_model_3.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))

##Train the Model

In [None]:
%%time
#The model is trained using the training features and labels. Training is done for 100 epochs with a validation split of 20%.
history_m3 = dnn_model_3.fit(
    training_features_m3,
    training_labels_m3,
    validation_split=0.2,
    verbose=0,
    epochs=100)

CPU times: user 18.7 s, sys: 808 ms, total: 19.5 s
Wall time: 41.7 s


##Evaluate Model

In [None]:
dnn_model_3_results = dnn_model_3.evaluate(test_features_m3, test_labels_m3, verbose=0)
print(dnn_model_3_results)

0.125098317861557


##Predictions

In [None]:
#create dummy input data
input_3 = pd.DataFrame.from_dict(data =
				{
         'Mon' : [1,0,0],
         'Tue' : [0,0,0],
         'Wed' : [0,1,0],
         'Thu' : [0,0,0],
         'Fri' : [0,0,0],
         'Sat' : [0,0,1],
         'Sun' : [0,0,0]
        })

In [None]:
m3_predictions = dnn_model_3.predict(input_3[:3])
print("\nNormalised:\n", m3_predictions)

SCALE_NUM_COLL = 1.0
min_val = 353
max_val = 845

unnormalised_predictions_3 = m3_predictions / SCALE_NUM_COLL * (max_val - min_val) + min_val

unnormalised_predictions_3 = unnormalised_predictions_3.astype(int)

print("\nAbsolute Values:\n", unnormalised_predictions_3)




Normalised:
 [[0.52345586]
 [0.53807616]
 [0.43205953]]

Absolute Values:
 [[610]
 [617]
 [565]]


#Model 4 - Month, Year, Max Temp

Finally, we will run a model with everything but the day of the week. Our hypothesis is that this should be the worst model, as the day of the week is the variable with the strongest correlation.

In [None]:
#create new df with headers as required
dnn_data_m4 = [dnn["Jan"], dnn["Feb"], dnn["Mar"], dnn["Apr"], dnn["May"], dnn["Jun"], dnn["Jul"], dnn["Aug"], dnn["Sep"], dnn["Oct"], dnn["Nov"], dnn["Dec"],
                   dnn["year"], dnn["max_temp_standardised"], dnn["num_collisions"]]
dnn_headers_m4 = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec", "year", "max_temp", "num_collisions"]
#concat headers and data as required
df_dnn_m4 = pd.concat(dnn_data_m4, axis=1, keys=dnn_headers_m4)
#print
df_dnn_m4.head()

Unnamed: 0,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,year,max_temp,num_collisions
1,1,0,0,0,0,0,0,0,0,0,0,0,2019,-1.466071,0.473577
2,1,0,0,0,0,0,0,0,0,0,0,0,2014,-2.561928,0.715447
3,1,0,0,0,0,0,0,0,0,0,0,0,2019,-0.912551,0.186992
4,1,0,0,0,0,0,0,0,0,0,0,0,2019,-0.409351,0.302846
5,1,0,0,0,0,0,0,0,0,0,0,0,2019,-1.795946,0.711382


In [None]:
#split training set and test set 80/20
training_dataset_m4 = df_dnn_m4.sample(frac=0.8, random_state=0)
test_dataset_m4 = df_dnn_m4.drop(training_dataset_m4.index)

In [None]:
#split labels (outputs) and features (inputs)
training_features_m4 = training_dataset_m4.copy()
test_features_m4 = test_dataset_m4.copy()

training_labels_m4 = training_features_m4.pop('num_collisions')
test_labels_m4 = test_features_m4.pop('num_collisions')

In [None]:
normaliser_m4 = tf.keras.layers.Normalization(axis=-1)
normaliser_m4.adapt(np.array(training_features_m4))

In [None]:
#define DNN regression model with two hidden layers of 48 neurons each and one output layer.
#mean absolute error as loss function and Adam optimiser.
dnn_model_4 = keras.Sequential([
      normaliser_m4,
      layers.Dense(48, activation='relu'),
      layers.Dense(48, activation='relu'),
      layers.Dense(1)
  ])

dnn_model_4.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))

##Train the Model

In [None]:
%%time
#The model is trained using the training features and labels. Training is done for 100 epochs with a validation split of 20%.
history_m4 = dnn_model_4.fit(
    training_features_m4,
    training_labels_m4,
    validation_split=0.2,
    verbose=0,
    epochs=100)

CPU times: user 19.5 s, sys: 769 ms, total: 20.3 s
Wall time: 21.7 s


##Evaluate Model

In [None]:
dnn_model_4_results = dnn_model_4.evaluate(test_features_m4, test_labels_m4, verbose=0)
print(dnn_model_4_results)

0.13752339780330658


##Predictions

In [None]:
#create dummy input data
input_4 = pd.DataFrame.from_dict(data =
				{
         'Jan' : [0,0,0],
         'Feb' : [0,0,1],
         'Mar' : [0,0,0],
         'Apr' : [0,0,0],
         'May' : [0,0,0],
         'Jun' : [0,0,0],
         'Jul' : [0,0,0],
         'Aug' : [1,0,0],
         'Sep' : [0,0,0],
         'Oct' : [0,0,0],
         'Nov' : [0,0,0],
         'Dec' : [0,1,0],
         'year' : [2015,2014,2019],
         'max_temp' : [-0.063549, -1.735158, 1.539546],
        })

In [None]:
m4_predictions = dnn_model_4.predict(input_4[:3])
print("\nNormalised:\n", m4_predictions)

SCALE_NUM_COLL = 1.0
min_val = 353
max_val = 845

unnormalised_predictions_4 = m4_predictions / SCALE_NUM_COLL * (max_val - min_val) + min_val

unnormalised_predictions_4 = unnormalised_predictions_4.astype(int)

print("\nAbsolute Values:\n", unnormalised_predictions_4)


Normalised:
 [[0.3338152 ]
 [0.5730394 ]
 [0.58465886]]

Absolute Values:
 [[517]
 [634]
 [640]]


#Conclusion

Overall, model 0 gave the lowest test error (MAE of approx. 0.1066) using month, day of the week, year, and max temperature. This suggests that the DNN picks up more of the pattern when it is given more of the features that correlate with the number of collisions. The worst-performing model was model 4 (approx. 0.1375), which did not include the day of the week. This is consistent with our hypothesis, as the day of the week was the variable with the strongest correlation. However, because the target label num_collisions is normalised to the range 0 to 1, an MAE above 0.1 means the predictions are off by more than 10% of the observed range on average, which is too large an error against actual real-world data. As a result, further work is required to improve these models.
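
To put the errors in real-world terms, the sketch below converts each model's test MAE from the normalised scale back to collisions per day, using the same minimum (353) and maximum (845) as the prediction cells. The MAE values are copied from the evaluation outputs above; the variable names are hypothetical.

```python
MIN_COLLISIONS = 353
MAX_COLLISIONS = 845
span = MAX_COLLISIONS - MIN_COLLISIONS  # 492

# Test MAE per model, copied from the evaluation cells above.
test_mae = {
    0: 0.10664400458335876,
    1: 0.11528263241052628,
    2: 0.12059439718723297,
    3: 0.125098317861557,
    4: 0.13752339780330658,
}

# An error is a difference between two values, so only the span is applied.
mae_in_collisions = {m: mae * span for m, mae in test_mae.items()}

for model_id, err in mae_in_collisions.items():
    print(f"Model {model_id}: about {err:.1f} collisions per day")

best = min(test_mae, key=test_mae.get)
worst = max(test_mae, key=test_mae.get)
print(best, worst)  # 0 4
```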

As with our linear regression, we can iteratively revisit the data science element of the project, for example through further feature engineering, collecting larger datasets (potentially using the pre- and post-lockdown data omitted previously), and better data pre-processing. We can also consider the changes below to improve the accuracy of the models:

· Nodes and Layers: Mbura (2023) describes the input layer, hidden layers, and the output layer of a DNN, along with its nodes. In our models, we have used two hidden layers with 48 neurons each and an output layer with 1 neuron for regression. Further research would involve experimenting with the number of layers and nodes in each layer.

· Regularisation: According to Wu et al. (2018), regularisation techniques are commonly used in DNN training to reduce over-fitting. Therefore, we could apply regularisation to the dense layers through their kernel_regularizer argument (TensorFlow, 2024).

· Loss Function: As stated by Keras (2024), loss functions “compute the quantity that a model should seek to minimize during training”. We have used mean absolute error (MAE) as our loss function. However, we could experiment with other loss functions to see whether they give better results.

· Learning Rate: We could experiment further with the learning rate of the optimiser. Wu et al. (2019) explain that a learning rate that is too small can make the model learn too slowly, whereas one that is too large can make learning unstable or cause it to overshoot. Choosing a suitable learning rate therefore helps training converge on a good solution. The learning rate is set by passing a float value to the Adam optimiser (Keras, 2024).
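
The four changes above could be explored together as a small grid of candidate configurations. The sketch below only enumerates such a grid in plain Python and does not train anything. The candidate values and the names `search_space` and `configurations` are assumptions chosen for illustration, not settings we have tested.

```python
from itertools import product

# Hypothetical candidate values for each of the four changes listed above.
search_space = {
    "hidden_layers": [(48, 48), (64, 64), (48, 48, 48)],
    "l2_strength": [0.0, 0.001],
    "loss": ["mean_absolute_error", "mean_squared_error"],
    "learning_rate": [0.0001, 0.001, 0.01],
}


def configurations(space):
    """Yield one dict per combination of the candidate values."""
    keys = list(space)
    for combo in product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


configs = list(configurations(search_space))
print(len(configs))  # 36
print(configs[0])

# Each configuration would map onto the Keras calls used in this notebook:
#   layers.Dense(units, activation='relu',
#                kernel_regularizer=tf.keras.regularizers.l2(l2_strength))
#   model.compile(loss=loss,
#                 optimizer=tf.keras.optimizers.Adam(learning_rate))
```

When the loss function itself varies between configurations, the models would need to be compared on a shared metric such as MAE, since the loss values are then not on the same scale.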

#Bibliography

Keras (2024) Adam. Available at: https://keras.io/api/optimizers/adam/ (Accessed: 06 Jan 2024).

Keras (2024) Losses. Available at: https://keras.io/api/losses/ (Accessed: 06 Jan 2024).

Mbura, P. (2023) Introduction to Deep Learning with TensorFlow and Keras. Available at: https://python.plainenglish.io/introduction-to-deep-learning-with-tensorflow-and-keras-d9008758ba61 (Accessed: 26 Aug 2023).

TensorFlow (2024) tf.keras.regularizers.Regularizer. Available at: https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/Regularizer (Accessed: 06 Jan 2024).

Wu, C. et al. (2018) 'Improving Interpretability and Regularization in Deep Learning', IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(2), pp.256-65.

Wu, Y. et al. (2019) 'Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks', 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, CA: IEEE, pp.1971-80.