# Artificial Neural Network

### Importing the libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [2]:
tf.__version__

'2.13.0'

## Part 1 - Data Preprocessing

### Importing the dataset
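Reading the dataset's source documentation matters here: the acronyms used in the website description differ from the column names in the file (the description calls temperature T and energy output EP, while the file uses AT and PE).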

Extracted from the website: https://archive.ics.uci.edu/dataset/294/combined+cycle+power+plant

Features consist of hourly average ambient variables:
- Temperature (T) in the range 1.81°C to 37.11°C
- Ambient Pressure (AP) in the range 992.89-1033.30 millibar
- Relative Humidity (RH) in the range 25.56% to 100.16%
- Exhaust Vacuum (V) in the range 25.36-81.56 cm Hg
- Net hourly electrical energy output (EP) in the range 420.26-495.76 MW

The averages are taken from various sensors located around the plant that record the ambient variables every second. The variables are given without normalization.

In [3]:
dataset = pd.read_excel('Dataset/Folds5x2_pp.xlsx')
# AT is the average temperature and PE is the power output (the website description above labels them T and EP, which is inconsistent with the file)
dataset

Unnamed: 0,AT,V,AP,RH,PE
0,14.96,41.76,1024.07,73.17,463.26
1,25.18,62.96,1020.04,59.08,444.37
2,5.11,39.40,1012.16,92.14,488.56
3,20.86,57.32,1010.24,76.64,446.48
4,10.82,37.50,1009.23,96.62,473.90
...,...,...,...,...,...
9563,16.65,49.69,1014.01,91.00,460.03
9564,13.19,39.18,1023.67,66.78,469.62
9565,31.32,74.33,1012.92,36.48,429.57
9566,24.48,69.45,1013.86,62.39,435.74


### Splitting the dataset into the feature matrix and the dependent variable

In [4]:
# take everything but the dependent variable
dataset.iloc[:, :-1]

Unnamed: 0,AT,V,AP,RH
0,14.96,41.76,1024.07,73.17
1,25.18,62.96,1020.04,59.08
2,5.11,39.40,1012.16,92.14
3,20.86,57.32,1010.24,76.64
4,10.82,37.50,1009.23,96.62
...,...,...,...,...
9563,16.65,49.69,1014.01,91.00
9564,13.19,39.18,1023.67,66.78
9565,31.32,74.33,1012.92,36.48
9566,24.48,69.45,1013.86,62.39


In [5]:
# :-1 is a slice: it selects every index from the start up to, but not including, -1 (the upper bound is excluded), so everything except the last index
X = dataset.iloc[:, :-1].values

# iloc[]: a pandas DataFrame indexer for integer-location based indexing, i.e. you select elements of the DataFrame by integer position.
# In iloc[:, -1], the colon : in the first position means we want all rows of the DataFrame.
# -1: selects the last column. In Python, indexing starts at 0 and negative indexing starts at -1 for the last element, so -1 is the last column, -2 the second-to-last, and so on.
# .values: returns a NumPy array representation of the selection, here the values of the last column.
# i.e. the first position is for rows and the second is for columns, and you can slice in either position
y = dataset.iloc[:, -1].values

In [6]:
y

array([463.26, 444.37, 488.56, ..., 429.57, 435.74, 453.28])
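The two `iloc` selections above can be tried on a small frame. This is a minimal sketch: `toy`, `X_toy` and `y_toy` are illustrative names, and the frame holds only three of the columns and the first three rows shown in the table above.

```python
import pandas as pd

# Toy frame: first three rows of the table above, with only three of the columns.
toy = pd.DataFrame({
    "AT": [14.96, 25.18, 5.11],
    "V": [41.76, 62.96, 39.40],
    "PE": [463.26, 444.37, 488.56],
})

X_toy = toy.iloc[:, :-1].values  # all rows, every column except the last
y_toy = toy.iloc[:, -1].values   # all rows, last column only

print(X_toy.shape)  # (3, 2)
print(y_toy.shape)  # (3,)
```

Note that `y_toy` is one-dimensional, which is why `y_test` has to be reshaped later before it can be placed next to the predictions.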

### Splitting the dataset into the Training set and Test set

In [7]:
from sklearn.model_selection import train_test_split
# see your template if needed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=0)
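
To make the 80/20 split concrete, here is a rough NumPy sketch of what a shuffled split does. It is not scikit-learn's implementation and will not reproduce the same rows as `train_test_split`; `simple_split` is a hypothetical helper, and the 9568-row count is an assumption taken from the instance count on the UCI page.

```python
import math
import numpy as np

def simple_split(X, y, test_size=0.2, seed=0):
    """Shuffle the row indices, then use the first `test_size` share as the test set."""
    n_samples = len(X)
    n_test = math.ceil(n_samples * test_size)  # the test share is rounded up
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder arrays shaped like the real data (assumed 9568 rows, 4 features).
X_demo = np.zeros((9568, 4))
y_demo = np.zeros(9568)
X_tr, X_te, y_tr, y_te = simple_split(X_demo, y_demo)

print(X_tr.shape, X_te.shape)  # (7654, 4) (1914, 4)
```

The 1914 test rows match the shape of `y_pred` reported in Part 4.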

## Part 2 - Building the Artificial Neural Network (ANN)

This is the ANN we will be building
- We will have 4 features (each contributing to one neuron in the input layer)
- We will have 2 hidden layers with 6 neurons each
- Finally we will have the output layer, outputting the energy output prediction

(Side note: why 2 hidden layers with 6 neurons each? The course author said this choice comes from his personal experience of what gives the best results.)

![Alt text](ANN_Architecture.png)
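
As a quick sanity check on the size of this architecture, the number of trainable parameters can be counted by hand. The sketch below assumes every dense layer has one weight per input-output pair plus one bias per neuron (the Keras default for `Dense`); `dense_params` is an illustrative helper name.

```python
# Layer widths: 4 input features, two hidden layers of 6 neurons, 1 output neuron.
layer_sizes = [4, 6, 6, 1]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)       # [30, 42, 7]
print(sum(per_layer))  # 79
```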

### Initializing the ANN

In [8]:
# Broadly, a neural network can be defined in one of two ways:
# 1. As a sequence of layers
# 2. As a computational graph (e.g. Boltzmann machines: restricted Boltzmann machines or deep Boltzmann machines)
# Here, we will use a sequence of layers, following the diagram shown above

# Instantiate an object of the Sequential class, which gives us built-in methods such as add, compile, fit and predict
# side note: Keras used to be a separate library, but TensorFlow 2 incorporated Keras into its library
ann = tf.keras.models.Sequential()

### Adding the input layer and the first hidden layer


- `tf.keras.layers.Dense(units=6, activation='relu')`: This creates a dense (fully connected) layer.

    - `units=6`: This argument specifies that there will be 6 neurons (or units) in this dense layer.
  
    - `activation='relu'`: This argument sets the activation function for the dense layer to ReLU (Rectified Linear Activation). An activation function defines the output of a neuron given an input. The ReLU function outputs the input directly if it is positive; otherwise, it will output zero. Mathematically, it is defined as \(f(x) = \max(0, x)\).

So in simple terms, this line of code is adding a fully connected layer with 6 neurons and ReLU activation to the `ann` model.
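
The ReLU definition above is short enough to try directly. A minimal NumPy sketch, with `relu` and `sample` as illustrative names:

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x): negative inputs become 0, positive inputs pass through."""
    return np.maximum(0, x)

sample = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(sample).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```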

In [9]:
# You initialized an ANN above, but now you need to add in the different layers yourself.
# You can use the add function to add layers to a neural network

# The layer we want is created with the 'Dense' class. To reach it, we go through the tf library, then the keras library, then the layers module, i.e. tf.keras.layers.Dense
# The name "dense" comes from the "high density" of connections between the layers: every neuron is connected to every neuron in the next layer, i.e. fully connected layers

# units refer to the number of neurons we want in the layer - we have 6 in this case
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# As described above, ReLU outputs the input directly if it is positive and zero otherwise, i.e. f(x) = max(0, x).


# The input layer is instantiated automatically
# Side note: notice how we didn't need to state that there are four features in the input layer. TensorFlow infers the number of input features when we pass in the dataset later

### Adding the second hidden layer

In [10]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

### Adding the output layer

In [11]:
# Usually, for regression problems, you have a single neuron in the output layer with no activation function (equivalently, a linear activation function).
# Rationale: when predicting a continuous value such as a housing price or a temperature, we want the network to be able to output any value, without constraint. (Other activation functions constrain the output: sigmoid to between 0 and 1, ReLU to non-negative values.)
# side note: for other kinds of output you would use a different activation function, e.g. sigmoid for binary classification or softmax for multi-class classification
ann.add(tf.keras.layers.Dense(units=1))
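
To see how the three layers fit together, here is a rough NumPy sketch of a forward pass through a 4-6-6-1 network. It is not what Keras runs internally: the weights are random placeholders rather than trained values, the inputs are made up, and `dense` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b, activation=None):
    """Fully connected layer: weighted sum plus bias, optionally followed by ReLU."""
    z = x @ W + b
    return np.maximum(0, z) if activation == "relu" else z

# Random placeholder weights (a trained model would have learned these).
W1, b1 = rng.normal(size=(4, 6)), np.zeros(6)
W2, b2 = rng.normal(size=(6, 6)), np.zeros(6)
W3, b3 = rng.normal(size=(6, 1)), np.zeros(1)

x = rng.normal(size=(5, 4))       # 5 made-up samples with 4 features each
h1 = dense(x, W1, b1, "relu")     # first hidden layer, shape (5, 6)
h2 = dense(h1, W2, b2, "relu")    # second hidden layer, shape (5, 6)
out = dense(h2, W3, b3)           # output layer, no activation, shape (5, 1)

print(out.shape)  # (5, 1)
```

Because the last layer applies no activation, `out` can take any real value, which is what a regression target needs.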

## Part 3 - Training the ANN

### Compiling the ANN with an optimizer and loss function

In [12]:
# Optimizer: the algorithm used to perform (a variant of) stochastic gradient descent
# It uses the gradients of the loss function to update the weights so that the loss decreases

# compile is a method to configure the model for training.
# Inside the compile method, there are two important parameters being set: optimizer and loss.

# optimizer='adam': The optimizer is responsible for updating the weights of the neurons in the network to minimize the loss function. 'Adam' is a specific type of optimization algorithm that is often used because it is efficient and has low memory requirements. It stands for "Adaptive Moment Estimation" and is known for its effectiveness in practice and efficiency in computation.

# loss='mean_squared_error': The loss function, or cost function, is a measure of how well the model is doing, and the training process aims to minimize this value. 'Mean Squared Error' is a common loss function used for regression problems. It calculates the average of the squares of the differences between predicted and actual values.

# As for WHY these two in particular: Adam is a widely used default optimizer, and mean squared error suits regression because the target is a continuous value.
ann.compile(optimizer='adam', loss='mean_squared_error')
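
The mean squared error described above can be computed by hand on a few numbers. This sketch uses made-up values, and the function is a plain NumPy stand-in rather than the Keras loss object:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Made-up values: the differences are 2, -1 and -4.
y_true_demo = [463.0, 444.0, 488.0]
y_pred_demo = [461.0, 445.0, 492.0]

print(mean_squared_error(y_true_demo, y_pred_demo))  # (4 + 1 + 16) / 3 = 7.0
```

Squaring makes large errors count disproportionately: the single 4 MW miss contributes 16 of the 21 units of squared error here.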

### Training the ANN model on the Training set

In [13]:
# .fit trains the model
# epochs=100 is a reasonable starting point
# batch_size=32 is a common default. Strictly speaking this is mini-batch gradient descent rather than stochastic gradient descent (SGD uses a batch size of 1, while mini-batch gradient descent uses a batch size greater than 1)
# if you run this, you'll see the loss start to plateau/converge at around epoch 50, so training could be stopped around there
ann.fit(X_train, y_train, batch_size=32, epochs=100)

Epoch 1/100
...

<keras.src.callbacks.History at 0x21051bb7250>
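
To illustrate what `batch_size=32` means for one epoch, the sketch below counts the weight updates per epoch and walks through the mini-batches. It assumes a training set of 7654 rows (9568 total rows minus the 1914 test rows), and `iterate_minibatches` is a hypothetical helper, not part of Keras.

```python
import math
import numpy as np

n_train = 7654   # assumed: 9568 rows in total minus 1914 test rows
batch_size = 32

# One weight update per mini-batch, so this is the number of updates per epoch.
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 240

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches that together cover the data exactly once."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X_demo = np.zeros((n_train, 4))   # placeholder data with the assumed shape
y_demo = np.zeros(n_train)
batch_lengths = [len(xb) for xb, _ in
                 iterate_minibatches(X_demo, y_demo, batch_size, rng)]

print(len(batch_lengths), batch_lengths[-1])  # 240 6
```

The last batch holds only 6 rows because 7654 is not a multiple of 32.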

## Part 4 - Making Predictions

### Predicting the results of the Test set

In [14]:
# using our trained ANN to do prediction
y_pred = ann.predict(X_test)

# ensure 2dp output when printing - just to make sure output is neater
np.set_printoptions(precision=2)





In [15]:
y_pred.shape

(1914, 1)

In [16]:

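# y_pred already has shape (1914, 1), but y_test is one-dimensional, so both are reshaped into columns before being joined side by side
side_by_side_comparison = np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1)
# left side is predicted value, right side is actual value
# they are close, e.g. 430.99 vs 431.23 in the first row
# in the rows shown, the predictions are within a few MW of the actual values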
side_by_side_comparison

array([[430.99, 431.23],
       [462.01, 460.01],
       [465.5 , 461.14],
       ...,
       [472.72, 473.26],
       [439.58, 438.  ],
       [458.75, 463.28]])
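
Rather than judging closeness by eye, the gap can be summarised with error metrics. The sketch below computes the mean absolute error and root mean squared error on only the six rows visible in the truncated output above, so the numbers are illustrative and not the model's test-set performance.

```python
import numpy as np

# The six (predicted, actual) rows visible in the truncated output above.
pairs = np.array([
    [430.99, 431.23],
    [462.01, 460.01],
    [465.50, 461.14],
    [472.72, 473.26],
    [439.58, 438.00],
    [458.75, 463.28],
])
predicted, actual = pairs[:, 0], pairs[:, 1]
errors = predicted - actual

mae = float(np.mean(np.abs(errors)))         # mean absolute error, in MW
rmse = float(np.sqrt(np.mean(errors ** 2)))  # root mean squared error, in MW

print(round(mae, 2), round(rmse, 2))  # 2.21 2.78
```

On the full test set the same two lines would be applied to `y_pred.flatten()` and `y_test`.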

In [1]:
# Scatter plot of predicted vs. actual values; points on the red diagonal are perfect predictions.
# Fix over the earlier draft: the reference line now spans the data range instead of [-100, 100], which lies far outside the plotted values (roughly 420-496 MW).
import matplotlib.pyplot as plt

plt.scatter(y_test, y_pred.flatten())
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.axis('square')
lims = [y_test.min(), y_test.max()]
plt.plot(lims, lims, c='red')
plt.show()