# Cement Strength Neural Network


## Goal 
- Develop a neural network that can predict cement strength based on various features.

## Approach
- We will primarily use Python and the deep learning library Keras to develop this solution.

## Performance Evaluation
- To evaluate our model, we will use the mean squared error (MSE), defined as: $\frac{1}{n} \sum \limits _{i=1} ^{n} (y_i - \hat y_i)^2$
    - Where $y_i$ is the actual value and $\hat y_i$ is the prediction for a given sample $i$ out of $n$ total samples.
- We will try to minimize this quantity while building the solution.
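
As a quick illustration of the formula, the sketch below computes the MSE by hand with NumPy and checks it against `mean_squared_error` from scikit-learn. The four values are made up for the example and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative values only; these are not taken from the concrete dataset.
y_true = np.array([80.0, 62.0, 40.0, 41.0])
y_pred = np.array([78.0, 65.0, 44.0, 41.0])

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual)   # 7.25
print(mse_sklearn)  # 7.25
```

The squared errors here are 4, 9, 16 and 0, so the mean is 29 / 4 = 7.25.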

## Neural Network Architecture
- Through multiple implementations, we will explore various neural network architectures and evaluate each one with our performance metric to determine which one solves our problem best.

### Implementation details
We will start by developing a neural network that consists of one hidden layer with 10 nodes and a ReLU activation function.
<img src="https://i.ibb.co/31MmjTB/Neural-Network-Architecture-dot.png" alt="Drawing" style="height: 300px;"/>

#### Dependencies
First, we need to install the Keras library as well as TensorFlow, since Keras runs on top of the TensorFlow backend.

In [1]:
%pip install tensorflow keras



Once Keras has been installed, we can import the needed dependencies.

In [2]:
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import InputLayer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

#### Dataset
We will be using the same dataset that was used in lab3 about concrete compressive strength.


In [3]:
dataset = pd.read_csv("https://cocl.us/concrete_data")
dataset.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


Let's separate the predictors (X) from the target (y) of this dataset.  


In [4]:
yColumnName = 'Strength'

X = dataset[dataset.columns[dataset.columns != yColumnName]]
y = dataset[yColumnName]
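
The boolean-mask selection above can also be written with `DataFrame.drop`. The sketch below shows that the two forms select the same columns. It uses a small toy frame with illustrative values rather than the real dataset, and the names `toy` and `target_column` are chosen for the example only.

```python
import pandas as pd

# Toy frame standing in for the concrete dataset (values are illustrative).
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Strength": [79.99, 40.27],
})

target_column = "Strength"

# Boolean-mask selection, as in the cell above.
X_mask = toy[toy.columns[toy.columns != target_column]]
# Equivalent selection with DataFrame.drop.
X_drop = toy.drop(columns=[target_column])
y_toy = toy[target_column]

print(X_mask.equals(X_drop))  # True
print(list(X_drop.columns))   # ['Cement', 'Water']
```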

Let's take a look at our inputs and target values.


In [5]:
X.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [6]:
y.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

#### Building the neural network
Here we define a class that represents a neural network. 
  
With this class, we can build a neural network with some level of configuration.

We decided to use the **_adam_** optimizer and the **_mean squared error_** as the loss function.

In [7]:
class NeuralNetwork:

    def __init__(
        self,
        numberOfInputs,
        numberOfHiddenLayers, 
        numberOfNodesInHiddenLayers
    ):
        self.numberOfInputs = numberOfInputs
        self.numberOfHiddenLayers = numberOfHiddenLayers
        self.numberOfNodesInHiddenLayers = numberOfNodesInHiddenLayers
        self.__initializeModel()


    def __initializeModel(self):
        self.__createModel()
        
        self.__addInputLayer()
        self.__addHiddenLayers()
        self.__addOutputLayer()
        
        self.__compileModel()


    def __createModel(self):
        self.model = Sequential()


    def __addInputLayer(self):
        self.model.add(InputLayer(input_shape=(self.numberOfInputs,)))


    def __addHiddenLayers(self):
        for layer in range(self.numberOfHiddenLayers):
            self.model.add(Dense(self.numberOfNodesInHiddenLayers, activation='relu'))


    def __addOutputLayer(self):
        self.model.add(Dense(1))


    def __compileModel(self):
        self.model.compile(optimizer='adam', loss='mean_squared_error')


    def undoLearning(self):
        self.__initializeModel()


    def train(
        self, 
        inputs, 
        targetOutputs,
        epochs
    ):
        self.model.fit(
            inputs, 
            targetOutputs,
            epochs=epochs,
            verbose=0
        )


    def predict(
        self,
        inputs
    ):
        return self.model.predict(inputs)


    def __str__(self):
        return (
                "----------------------------------\n" +
                "Inputs: {}\n" + 
                "Hidden layers count: {}\n" + 
                    "\tHidden layer nodes count: {}" + 
                "\nOutputs: 1\n" +
                "----------------------------------"
               ).format(
                    self.numberOfInputs, 
                    self.numberOfHiddenLayers, 
                    self.numberOfNodesInHiddenLayers
                )

We create a neural network for this first implementation with the desired configuration described earlier.


In [8]:
neuralNetwork1 = NeuralNetwork(
    numberOfInputs=X.shape[1], 
    numberOfHiddenLayers=1,
    numberOfNodesInHiddenLayers=10
)
print(neuralNetwork1)

----------------------------------
Inputs: 8
Hidden layers count: 1
	Hidden layer nodes count: 10
Outputs: 1
----------------------------------
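
To get a feel for the size of this network, the sketch below counts the trainable parameters of a fully connected network from its layer sizes. It assumes every `Dense` layer has one weight per input-output pair plus one bias per node, which is the Keras default. The helper name `count_dense_parameters` is hypothetical and is not part of the notebook.

```python
def count_dense_parameters(number_of_inputs, hidden_layers, nodes_per_hidden_layer, outputs=1):
    """Count weights and biases of a fully connected network (hypothetical helper)."""
    layer_sizes = [number_of_inputs] + [nodes_per_hidden_layer] * hidden_layers + [outputs]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weights + biases of one layer
    return total


print(count_dense_parameters(8, 1, 10))  # 101
print(count_dense_parameters(8, 3, 10))  # 321
```

With 8 inputs and one hidden layer of 10 nodes, the hidden layer holds 8 × 10 + 10 = 90 parameters and the output layer 10 × 1 + 1 = 11, for 101 in total. The same figure should be reported by `model.summary()` on the Keras model.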


#### Training the neural network

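We will train our neural network on randomly split data: 30% of the dataset is reserved for testing and the remaining 70% is used for training.

This gives an out-of-sample test on each split. Note, however, that `trainAndEvaluate` keeps training the same network across iterations and draws a new random split each time, so from the second iteration onward the test set can contain samples the network has already been trained on.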

Note: To obtain the same split every time the code runs, pass the `random_state` parameter to `train_test_split`. 
For more information, see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html.

Here we did not want to control the shuffling, so we left it at its default value.
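
For reference, here is a minimal sketch of what fixing `random_state` does. The arrays and the seed value 42 are arbitrary choices for the example, not something the notebook uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Small illustrative arrays; not the concrete dataset.
features = np.arange(20).reshape(10, 2)
targets = np.arange(10)

# The same seed produces the same split on every call.
a_train, a_test, _, _ = train_test_split(features, targets, test_size=0.30, random_state=42)
b_train, b_test, _, _ = train_test_split(features, targets, test_size=0.30, random_state=42)

print(np.array_equal(a_train, b_train))  # True
print(np.array_equal(a_test, b_test))    # True
```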

In [9]:
def trainAndEvaluate(
    neuralNetwork,
    numberOfTrainingsToPerform,
    epochsForEachTraining,
    meanSquaredErrors,
    predictors,
    target
):
    print("Training started...\n")
    for trainingNumber in range(numberOfTrainingsToPerform):
        
        X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.30)
    
        neuralNetwork.train(X_train, y_train, epochs=epochsForEachTraining)
    
        predictions = neuralNetwork.predict(X_test)
    
        meanSquaredError = mean_squared_error(y_test, predictions)
        meanSquaredErrors.append(meanSquaredError)

    print("Training done!\n")
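
One possible variant of this loop builds a new, untrained model for every split, so that no test sample has been seen during training. The sketch below illustrates the idea under loud assumptions: `evaluateWithFreshModel` and `buildModel` are hypothetical names, the data is synthetic, and a scikit-learn `LinearRegression` stands in for the Keras network to keep the example light.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def evaluateWithFreshModel(buildModel, predictors, target, numberOfRuns, testSize=0.30):
    """Repeated hold-out evaluation (hypothetical helper, not part of the notebook)."""
    errors = []
    for runNumber in range(numberOfRuns):
        X_train, X_test, y_train, y_test = train_test_split(
            predictors, target, test_size=testSize, random_state=runNumber
        )
        model = buildModel()  # new, untrained model for every split
        model.fit(X_train, y_train)
        errors.append(mean_squared_error(y_test, model.predict(X_test)))
    return np.array(errors)


# Synthetic data (assumption for the example): y = 3*x0 - 2*x1 + small noise.
rng = np.random.default_rng(0)
syntheticX = rng.normal(size=(200, 2))
syntheticY = 3 * syntheticX[:, 0] - 2 * syntheticX[:, 1] + rng.normal(scale=0.1, size=200)

errors = evaluateWithFreshModel(LinearRegression, syntheticX, syntheticY, numberOfRuns=5)
print("Mean: {:.4f}".format(errors.mean()))
print("Standard deviation: {:.4f}".format(errors.std()))
```

With the `NeuralNetwork` class defined earlier, a similar effect could probably be obtained by calling `undoLearning()` before each call to `train`, since that method rebuilds the model from scratch.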

In [10]:
meanSquaredErrors1 = []

trainAndEvaluate(
    neuralNetwork=neuralNetwork1,
    numberOfTrainingsToPerform=50,
    epochsForEachTraining=50,
    meanSquaredErrors=meanSquaredErrors1,
    predictors=X,
    target=y
)

meanOfMeanSquaredErrors1 = np.array(meanSquaredErrors1).mean()
standardDeviationOfMeanSquaredErrors1 = np.array(meanSquaredErrors1).std()

print("Mean: {:.2f}\n".format(meanOfMeanSquaredErrors1))
print("Standard deviation: {:.2f}\n".format(standardDeviationOfMeanSquaredErrors1))

Training started...

Training done!

Mean: 69.34

Standard deviation: 29.89



### Data normalization experiment

Let's repeat the same process, but this time, we will normalize the data first by subtracting each column's mean and dividing by its standard deviation.

In [11]:
normalizedX = (X - X.mean()) / X.std()
normalizedX.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
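
As a sanity check on this kind of normalization, the sketch below applies the same expression to a small toy frame (illustrative values, not the full dataset) and confirms that every column ends up with mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd

# Toy frame with illustrative values; not the full concrete dataset.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6, 266.0],
    "Age": [28.0, 270.0, 360.0, 90.0],
})

# Same expression as in the cell above: subtract the mean, divide by the std.
normalized = (toy - toy.mean()) / toy.std()

print(np.allclose(normalized.mean(), 0.0))  # True
print(np.allclose(normalized.std(), 1.0))   # True
```

Note that pandas computes the sample standard deviation (`ddof=1`) by default, whereas `numpy.std` defaults to `ddof=0`, so the two libraries give slightly different scaled values.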


Now that our inputs are normalized, we will reset the network with `undoLearning()` and train it again, this time with normalized data.

Then we will compare the performance of our model with and without data normalization.

In [None]:
neuralNetwork1.undoLearning()

meanSquaredErrors2 = []

trainAndEvaluate(
    neuralNetwork=neuralNetwork1,
    numberOfTrainingsToPerform=50,
    epochsForEachTraining=50,
    meanSquaredErrors=meanSquaredErrors2,
    predictors=normalizedX,
    target=y
)

meanOfMeanSquaredErrors2 = np.array(meanSquaredErrors2).mean()
standardDeviationOfMeanSquaredErrors2 = np.array(meanSquaredErrors2).std()

print("Mean without data normalization: {:.2f}\n".format(meanOfMeanSquaredErrors1))
print("Mean with data normalization: {:.2f}\n".format(meanOfMeanSquaredErrors2))

print("Standard deviation without data normalization: {:.2f}\n".format(standardDeviationOfMeanSquaredErrors1))
print("Standard deviation with data normalization: {:.2f}\n".format(standardDeviationOfMeanSquaredErrors2))

Training started...



#### Results breakdown
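We can see that data normalization improved our model's average performance: the mean of the mean squared errors is smaller with data normalization (50.48 versus 69.34, the normalized figure being the one printed by the next experiment). The results were less consistent, however, as the standard deviation rose from 29.89 to 44.82.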

### Number of epochs experiment
Let's reset our model again and retrain it on normalized data, this time with **100 epochs** per training.

Then let's compare the performance we get.

In [14]:
neuralNetwork1.undoLearning()

meanSquaredErrors3 = []

trainAndEvaluate(
    neuralNetwork=neuralNetwork1,
    numberOfTrainingsToPerform=50,
    epochsForEachTraining=100,
    meanSquaredErrors=meanSquaredErrors3,
    predictors=normalizedX,
    target=y
)

meanOfMeanSquaredErrors3 = np.array(meanSquaredErrors3).mean()
standardDeviationOfMeanSquaredErrors3 = np.array(meanSquaredErrors3).std()

print("Mean with data normalization (50 epochs): {:.2f}\n".format(meanOfMeanSquaredErrors2))
print("Mean with data normalization (100 epochs): {:.2f}\n".format(meanOfMeanSquaredErrors3))

print("Standard deviation with data normalization (50 epochs): {:.2f}\n".format(standardDeviationOfMeanSquaredErrors2))
print("Standard deviation with data normalization (100 epochs): {:.2f}\n".format(standardDeviationOfMeanSquaredErrors3))

Training started...

Training done!

Mean with data normalization (50 epochs): 50.48

Mean with data normalization (100 epochs): 34.84

Standard deviation with data normalization (50 epochs): 44.82

Standard deviation with data normalization (100 epochs): 20.57



#### Results breakdown
We can see that training for more epochs can improve the performance of our model considerably.  
That being said, doubling the epochs will not **always** have such a big impact; we can only confirm it for the step from 50 to 100.  
Overall, the mean of the mean squared errors was lower when training with 100 epochs (34.84) compared to 50 epochs (50.48).

### Number of hidden layers experiment
Let's now create a neural network with a different architecture.  
In the following implementation, we will create a neural network identical to the previous one but with 3 hidden layers instead of 1.

<img src="https://i.ibb.co/wg7DnZw/Neural-Network-Architecture2-dot.png" alt="Drawing" style="height: 300px;"/>

In [15]:
neuralNetwork2 = NeuralNetwork(
    numberOfInputs=X.shape[1], 
    numberOfHiddenLayers=3,
    numberOfNodesInHiddenLayers=10
)
print(neuralNetwork2)

----------------------------------
Inputs: 8
Hidden layers count: 3
	Hidden layer nodes count: 10
Outputs: 1
----------------------------------


#### Training the neural network
We will train with normalized data and 50 epochs, then compare the results with those of the initial neural network, which had only 1 hidden layer and was also trained with normalized data and 50 epochs.

In [16]:
meanSquaredErrors4 = []

trainAndEvaluate(
    neuralNetwork=neuralNetwork2,
    numberOfTrainingsToPerform=50,
    epochsForEachTraining=50,
    meanSquaredErrors=meanSquaredErrors4,
    predictors=normalizedX,
    target=y
)

meanOfMeanSquaredErrors4 = np.array(meanSquaredErrors4).mean()
standardDeviationOfMeanSquaredErrors4 = np.array(meanSquaredErrors4).std()

print("Mean (network with 1 hidden layer): {:.2f}\n".format(meanOfMeanSquaredErrors2))
print("Mean (network with 3 hidden layers): {:.2f}\n".format(meanOfMeanSquaredErrors4))

print("Standard deviation (network with 1 hidden layer): {:.2f}\n".format(standardDeviationOfMeanSquaredErrors2))
print("Standard deviation (network with 3 hidden layers): {:.2f}\n".format(standardDeviationOfMeanSquaredErrors4))

Training started...

Training done!

Mean (network with 1 hidden layer): 50.48

Mean (network with 3 hidden layers): 30.99

Standard deviation (network with 1 hidden layer): 44.82

Standard deviation (network with 3 hidden layers): 17.72



#### Results breakdown
We can observe that adding more layers can improve the performance of our model.  
At least that was the case here when going from 1 hidden layer to 3.  
This is shown by the lower mean of the mean squared errors (30.99 versus 50.48) compared to the simpler neural network architecture.  
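
To put the four experiments side by side, the short sketch below takes the mean MSE values printed in the cell outputs above and computes, for each configuration, the root mean squared error and the relative reduction with respect to the first run. The configuration labels and variable names are chosen for the example.

```python
# Mean MSE values copied from the cell outputs in this notebook.
results = {
    "1 hidden layer, raw data, 50 epochs": 69.34,
    "1 hidden layer, normalized, 50 epochs": 50.48,
    "1 hidden layer, normalized, 100 epochs": 34.84,
    "3 hidden layers, normalized, 50 epochs": 30.99,
}

baseline = results["1 hidden layer, raw data, 50 epochs"]

# Relative reduction of the mean MSE compared to the first experiment, in percent.
reductions = {
    name: 100 * (baseline - meanMse) / baseline
    for name, meanMse in results.items()
}

for name, meanMse in results.items():
    print("{:<40} MSE {:6.2f}  RMSE {:5.2f}  reduction {:5.1f}%".format(
        name, meanMse, meanMse ** 0.5, reductions[name]
    ))
```

Each figure comes from a single batch of 50 randomly split trainings, so rerunning the notebook will give somewhat different values.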