# Assignment 4: Neural Networks (30 marks)
### Due date: March 31 at 11:59pm
*Author: Hetalben Virani*

For this assignment, you will be practicing using scikit-learn and TensorFlow to implement basic neural networks (MLP). You can use the given dataset below, or you can use the dataset you have selected for your project.

**Note: If you use the dataset from your project - this assignment is meant to be completed individually. You may work with your group members to complete this assignment, but the work you submit must be your own. Submitting identical assignments is a form of academic misconduct**

In [33]:
import numpy as np
import pandas as pd

## Part 1: Load your dataset (1 mark)

As stated above, you can use the dataset from your project. If you want to practice neural networks with a different dataset, you can use the energy dataset from Yellowbrick (https://www.scikit-yb.org/en/latest/api/datasets/energy.html)

In [34]:
!pip install yellowbrick



In [35]:
# Load dataset

from sklearn.model_selection import train_test_split

# Read the energy data from a local Excel file
df = pd.read_excel('ENB2012_data1.xlsx')

# Give the columns descriptive names
df.columns = [
    "relative compactness",
    "surface area",
    "wall area",
    "roof area",
    "overall height",
    "orientation",
    "glazing area",
    "glazing area distribution",
    "heating load",
    "cooling load",
]



## Part 2: Process your dataset (5 marks)

In [36]:
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

In [37]:
# Check if there are any missing values - if yes, decide how to fill them

print(df.isnull().sum())

relative compactness         0
surface area                 0
wall area                    0
roof area                    0
overall height               0
orientation                  0
glazing area                 0
glazing area distribution    0
heating load                 0
cooling load                 0
dtype: int64


In [38]:
# Check the range of each feature - do you need to scale your data?
feature_range = df.describe().loc[['min', 'max']]  # avoid shadowing the built-in range()
print(feature_range)

     relative compactness  surface area  wall area  roof area  overall height  \
min                  0.62         514.5      245.0     110.25             3.5   
max                  0.98         808.5      416.5     220.50             7.0   

     orientation  glazing area  glazing area distribution  heating load  \
min          2.0           0.0                        0.0          6.01   
max          5.0           0.4                        5.0         43.10   

     cooling load  
min         10.90  
max         48.03  
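
To make the scaling question concrete, the spread of each feature can be compared directly. The sketch below is illustrative only: it rebuilds a small frame from the min and max values printed above for four of the features (the names `bounds`, `span` and `ratio` are made up for this example) and reports how much wider the widest range is than the narrowest.

```python
import pandas as pd

# Min and max values copied from the describe() output above (subset of features)
bounds = pd.DataFrame(
    {
        "relative compactness": [0.62, 0.98],
        "surface area": [514.5, 808.5],
        "overall height": [3.5, 7.0],
        "glazing area": [0.0, 0.4],
    },
    index=["min", "max"],
)

# Width of each feature's range
span = bounds.loc["max"] - bounds.loc["min"]
print(span)

# Ratio between the widest and the narrowest range; a large ratio suggests scaling
ratio = span.max() / span.min()
print(f"widest / narrowest range: {ratio:.0f}x")
```

A ratio in the hundreds, as here, is a common sign that gradient-based models such as an MLP will benefit from standardized inputs.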


In [39]:
# Split your data into training and testing datasets (select random_state=0 and use default test_size)

features = [
   "relative compactness",
   "surface area",
   "wall area",
   "roof area",
   "overall height",
   "orientation",
   "glazing area",
   "glazing area distribution"
]
target = ["heating load", "cooling load"]

X, y = df[features], df[target]

X.shape
y.shape

(768, 2)

In [40]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [41]:
# Implement scaling and/or encoding here if needed (2 marks for preprocessing properly or justifying why it isn't needed)


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

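Scaling is applied here because the features, although all numerical, are measured on very different scales: relative compactness ranges from 0.62 to 0.98, while surface area ranges from 514.5 to 808.5. The scaler is fit on the training set only and then applied to the test set.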

Only the columns "orientation" and "glazing area distribution" contain categorical values; because these values are already numerically encoded, no further encoding is needed.
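
As a small illustration of the fit-on-train, transform-on-test pattern used in the cell above, the sketch below runs `StandardScaler` on synthetic data (the arrays `train` and `test` are made up for this example and are not taken from the energy dataset). After scaling, the training columns have mean 0 and standard deviation 1, while the test columns are only approximately standardized because they reuse the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic features on very different scales (illustrative only)
train = np.column_stack([rng.uniform(0.6, 1.0, 80), rng.uniform(500, 800, 80)])
test = np.column_stack([rng.uniform(0.6, 1.0, 20), rng.uniform(500, 800, 20)])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learn mean/std from the training set
test_scaled = scaler.transform(test)        # reuse the training mean/std

print("train mean:", train_scaled.mean(axis=0).round(3))
print("train std: ", train_scaled.std(axis=0).round(3))
print("test mean: ", test_scaled.mean(axis=0).round(3))
```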

## Part 4: Implement MLP using scikit-learn (5 marks)

In [42]:
from sklearn.neural_network import MLPRegressor


In [43]:
# Test using default parameters (set max_iter=500 - for this assignment, don't worry about reaching convergence)


mlp1 = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='adam',alpha=0.001, max_iter=500, random_state=None)
mlp1.fit(X_train, y_train)




In [44]:
# Test using two hidden layers with 100 nodes each

mlp2 = MLPRegressor(hidden_layer_sizes=(100,100), activation='relu', solver='adam', max_iter=500, random_state=None)
mlp2.fit(X_train, y_train)




In [45]:
# Test using three hidden layers with 100 nodes each
mlp3 = MLPRegressor(hidden_layer_sizes=(100,100,100), activation='relu', solver='adam', max_iter=500, random_state=None)
mlp3.fit(X_train, y_train)




## Part 5: Implement MLP using TensorFlow (7 marks)

In [46]:
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

Instead of scaling the data using a scikit-learn scaler, you can scale the data using a normalization layer.

In [47]:
# Define normalization layer
# layers.Normalization replaces the older layers.experimental.preprocessing.Normalization
normalizer = layers.Normalization()
normalizer.adapt(X_train)

A normalization layer normalizes the input data so that the mean and standard deviation are close to 0 and 1, respectively. This can help to increase neural network performance by making the optimisation process more reliable and efficient. 
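
The arithmetic behind that description can be sketched with plain NumPy. This is an illustration of the standardization formula, (x - mean) / sqrt(variance), on made-up data; it is not the Keras implementation, and the array `x` and the helper `standardize` are hypothetical names for this example.

```python
import numpy as np

def standardize(x, mean, var):
    """Shift by the mean and divide by the standard deviation, per feature."""
    return (x - mean) / np.sqrt(var)

# Made-up data: 5 samples, 2 features on different scales
x = np.array([[0.62, 514.5],
              [0.71, 588.0],
              [0.80, 661.5],
              [0.89, 735.0],
              [0.98, 808.5]])

# Per-feature statistics, analogous to what adapt() learns from the data
mean = x.mean(axis=0)
var = x.var(axis=0)

x_norm = standardize(x, mean, var)
print(x_norm.mean(axis=0).round(6))  # close to 0 for each feature
print(x_norm.std(axis=0).round(6))   # close to 1 for each feature
```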

Using `keras.Sequential`, implement an MLP with the same hidden layer setups as above:

In [48]:
# One hidden layer with 100 nodes and the relu activation function
# Compile the model with loss='mean_absolute_error' and optimizer=tf.keras.optimizers.Adam(0.001)
# Fit the model using validation_split=0.2, verbose=0 and epochs=100


modeltf1= tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear')
])

optim=tf.keras.optimizers.Adam(0.001)

modeltf1.compile(optimizer=optim,loss='mean_absolute_error')

history=modeltf1.fit(X_train, y_train,validation_split=0.2, epochs=100,verbose=0)




In [49]:
# Repeat with two hidden layers with 100 nodes each and the relu activation function


modeltf2 = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear')
])

optim=tf.keras.optimizers.Adam(0.001)

modeltf2.compile(optimizer=optim,loss='mean_absolute_error')

history=modeltf2.fit(X_train, y_train,validation_split=0.2, epochs=100,verbose=0)




In [50]:
# Repeat with three hidden layers with 100 nodes each and the relu activation function

modeltf3 = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear')
])

optim=tf.keras.optimizers.Adam(0.001)

modeltf3.compile(optimizer=optim,loss='mean_absolute_error')

history=modeltf3.fit(X_train, y_train,validation_split=0.2, epochs=100,verbose=0)


## Part 6: Compare the accuracy of both methods (7 marks)

For this part, calculate the mean absolute error for each model and print in a table using pandas

In [51]:
# Calculate the MAE for the three scikit-learn tests

for model in (mlp1, mlp2, mlp3):
    y_pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    print("mean_absolute_error", mae)


mean_absolute_error 1.971054823002238
mean_absolute_error 1.2197642210729747
mean_absolute_error 0.6974092523020543


In [52]:
# Calculate the MAE for the three TensorFlow tests

for model in (modeltf1, modeltf2, modeltf3):
    # evaluate() returns the compiled loss, which is the mean absolute error here
    mae = model.evaluate(X_test, y_test, verbose=0)
    print("mean_absolute_error", mae)

mean_absolute_error 1.950364112854004
mean_absolute_error 1.7963893413543701
mean_absolute_error 1.710618257522583


In [54]:
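# Print the results

# Values entered by hand, rounded to 4 decimals. They do not match the outputs
# printed in the cells above exactly; the models are unseeded (random_state=None),
# so the MAE changes from run to run.
results = {
    'Model': ['MAE_scikit-learn', 'MAE_tensor-flow'],
    'Model1': [2.0132, 1.9773],
    'Model2': [1.2209, 1.7151],
    'Model3': [0.7827, 1.6292],
}

result1 = pd.DataFrame(results)

print(result1)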

              Model  Model1  Model2  Model3
0  MAE_scikit-learn  2.0132  1.2209  0.7827
1   MAE_tensor-flow  1.9773  1.7151  1.6292
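
Rather than typing the MAE values in by hand, the table can be assembled from the computed scores so that it always matches the latest run. The sketch below is a hedged illustration on made-up numbers: `y_true` and the entries of `predictions` stand in for `y_test` and the output of each model's `predict`, and the dictionary keys are only example labels.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Made-up targets with two columns, standing in for y_test
y_true = np.array([[10.0, 12.0], [20.0, 22.0], [30.0, 33.0]])

# Made-up predictions, standing in for model.predict(X_test)
predictions = {
    "Model1": y_true + 2.0,
    "Model2": y_true + 1.0,
    "Model3": y_true - 0.5,
}

# Compute one MAE per model and collect the scores in a single-row table
scores = {name: mean_absolute_error(y_true, pred) for name, pred in predictions.items()}
table = pd.DataFrame(scores, index=["MAE"]).round(4)
print(table)
```

The same pattern extends to a second row for the TensorFlow models by adding another dictionary of scores.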


## Part 7: Questions (5 marks total)

### Question 1: Which model produced the least amount of error? (1 mark)

The scikit-learn MLP with three hidden layers of 100 nodes each and the relu activation function produced the lowest mean absolute error (0.7827).

              Model  Model1  Model2  Model3
0  MAE_scikit-learn  2.0132  1.2209  0.7827
1   MAE_tensor-flow  1.9773  1.7151  1.6292

### Question 2: Why are the numbers different between the scikit-learn and TensorFlow methods when we used the same number of hidden layers and hidden units per layer? (2 marks)

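There are several reasons why the MAEs differ even when the number of hidden layers and neurons per layer is the same.

scikit-learn and TensorFlow use different training procedures and defaults. In this notebook, for example, `MLPRegressor` minimizes the squared error and was trained with `max_iter=500`, while the Keras models minimize the mean absolute error directly, train for 100 epochs, and hold out 20% of the training data through `validation_split=0.2`.

Hyperparameters such as the learning rate and the regularisation strength can also have a considerable impact on performance. `MLPRegressor` applies an L2 penalty controlled by `alpha`, whereas the Keras models above use no regularisation. For a fair comparison, the hyperparameter values should be made identical in both libraries.

Neither set of models uses a fixed random seed, so the weight initialisation differs from run to run.

The MAEs are also affected by preprocessing. If the data is scaled differently in scikit-learn and TensorFlow, the results can differ.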
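
One further thing that may be worth checking is the output shape. The targets here have two columns (heating load and cooling load), while the TensorFlow models above end in `Dense(1)`. The NumPy sketch below, on made-up numbers, shows how an elementwise absolute error silently broadcasts a single predicted column against two target columns. Whether this accounts for part of the gap in this notebook is an assumption that would need to be verified by retraining with two output units.

```python
import numpy as np

# Made-up targets: two columns per sample (e.g. heating load, cooling load)
y_true = np.array([[10.0, 14.0],
                   [20.0, 24.0],
                   [30.0, 34.0]])

# A single-output model can only produce one value per sample, shape (3, 1)
y_pred_one = np.array([[12.0], [22.0], [32.0]])

# A two-output model can match each target column separately, shape (3, 2)
y_pred_two = y_true.copy()

# Broadcasting stretches the single column across both targets without error
mae_one = np.abs(y_true - y_pred_one).mean()
mae_two = np.abs(y_true - y_pred_two).mean()

print("one output unit :", mae_one)
print("two output units:", mae_two)
```

In this toy case the single prediction sits between the two targets, which is the best it can do, so its error cannot drop below half the gap between the columns.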

### Question 3: Reflection (2 marks)
Include a sentence or two about:
- what you liked or disliked,
- found interesting, confusing, challenging, motivating
while working on this assignment.

This assignment gave me a fundamental understanding of how to build an MLP using TensorFlow and scikit-learn.

By computing and comparing the MAE for each model, I saw how model performance changes as the hidden layers and nodes in the MLP are altered.

Understanding the hyperparameters of the TensorFlow models was a little difficult because the concept was completely new to me, so I first had to get the basic fundamentals clear.

In my opinion, further lectures, notes, and examples on how to design an MLP using scikit-learn and TensorFlow, along with more in-depth coverage of these libraries, would be beneficial.