# Introduction to Deep Learning & Neural Networks with Keras: Final Assignment

In this markdown file, I will build a regression model with a neural network. The exercise is based on a concrete data set provided by Cognitive Class, and the goal is to predict concrete strength as accurately as possible. Accuracy is measured by the mean squared error (MSE) on the test data: the lower the MSE, the better the model.
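As a reference point, the MSE averages the squared differences between the observed strengths and the model's predictions, so large errors weigh more than small ones. A minimal NumPy sketch of the formula (the function and the numbers are illustrative only; it should agree with `sklearn.metrics.mean_squared_error`):

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Made-up strengths and predictions: errors of 4 and -3 give (16 + 9) / 2
print(mean_squared_error([80.0, 62.0], [76.0, 65.0]))  # 12.5
```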

I will use the Keras library to build the artificial neural network and to execute the training, testing and evaluation process.

I will proceed in the following steps:

1. Loading the data
2. Transforming the predictor and dependent variable (indicator) data
3. Creating a neural network function according to the given prerequisites
4. Training the model on the data
5. Evaluating the accuracy of the model
6. Modifying the model (normalized data, more epochs, more hidden layers), re-running it and comparing the results

## A: Build a baseline model

### 1. Loading the data

In [2]:
import numpy as np
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense

In [4]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


### 2. Transforming predictor and dependent variable (indicator) data

In [5]:
concrete_data_col = concrete_data.columns

In [6]:
predictors = concrete_data[concrete_data_col[concrete_data_col != "Strength"]]
indicator = concrete_data[concrete_data_col[concrete_data_col == "Strength"]]
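An equivalent and slightly more idiomatic way to separate the predictors from the target is `DataFrame.drop`. The sketch below uses a three-row stand-in with a subset of the columns from the preview (`concrete_demo` and the other `_demo` names are hypothetical). Note that `df["Strength"]` returns a Series, whereas the selection above returns a one-column DataFrame; Keras should accept either.

```python
import pandas as pd

# Hypothetical stand-in: a few columns and rows copied from the preview above
concrete_demo = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5],
    "Water": [162.0, 162.0, 228.0],
    "Age": [28, 28, 270],
    "Strength": [79.99, 61.89, 40.27],
})

# All columns except the target are predictors
predictors_demo = concrete_demo.drop(columns="Strength")
# The target as a Series; concrete_demo[["Strength"]] would give a one-column DataFrame
target_demo = concrete_demo["Strength"]

print(list(predictors_demo.columns))  # ['Cement', 'Water', 'Age']
print(target_demo.shape)              # (3,)
```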

In [7]:
# Number of predictor columns (the input dimension of the network):

n_cols = predictors.shape[1]

# Splitting the data: 70% training, 30% test (fixed seed for reproducibility):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(predictors, indicator, test_size=0.3, random_state=4)

### 3. Neural Network Function

In [17]:
def regression_model():
    # One hidden layer with 10 ReLU units and a single linear output unit
    model = Sequential()
    model.add(Dense(10, activation="relu", input_shape=(n_cols,)))
    model.add(Dense(1))

    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

rows = []
for i in range(50):  # 50 independent training runs
    model = regression_model()
    # Note: fits on the full data set, so the test rows are also seen during training
    model.fit(predictors, indicator, validation_data=(X_test, y_test), epochs=50, verbose=0)
    scores = model.evaluate(X_test, y_test, verbose=1)  # test-set MSE
    rows.append(scores)
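The loop above fits each model on the full `predictors` table, which includes the rows later used for testing, and reuses a single split (`random_state=4`) in every run. A stricter protocol draws a fresh split per repetition and trains on the training rows only. The sketch below assumes a hypothetical `fit_predict(X_train, y_train, X_test)` callback; `keras_fit_predict` mirrors the network above but is not executed here, and `linear_fit_predict` (scikit-learn's `LinearRegression`) is a cheap stand-in used only so the loop can be tried without Keras on synthetic data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def repeat_experiment(X, y, fit_predict, n_repeats=50, test_size=0.3):
    """Return one test-set MSE per repetition of split -> train -> evaluate."""
    mses = []
    for seed in range(n_repeats):
        # A new random 70/30 split in every repetition (seeded for reproducibility)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        # The model only ever sees the training rows
        y_pred = fit_predict(X_train, y_train, X_test)
        mses.append(mean_squared_error(y_test, y_pred))
    return np.array(mses)


def keras_fit_predict(X_train, y_train, X_test, epochs=50):
    """Hypothetical adapter for the Keras network above (not run in this sketch)."""
    from keras.layers import Dense
    from keras.models import Sequential

    model = Sequential()
    model.add(Dense(10, activation="relu", input_shape=(X_train.shape[1],)))
    model.add(Dense(1))
    model.compile(optimizer="adam", loss="mean_squared_error")
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    return model.predict(X_test, verbose=0).ravel()


def linear_fit_predict(X_train, y_train, X_test):
    """Cheap stand-in model so the loop can be tried without Keras."""
    return LinearRegression().fit(X_train, y_train).predict(X_test)


# Synthetic data (assumption): 200 rows, 8 features, a linear signal plus small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = X_demo @ rng.normal(size=8) + rng.normal(scale=0.1, size=200)

mses = repeat_experiment(X_demo, y_demo, linear_fit_predict, n_repeats=5)
print(f"mean MSE: {mses.mean():.4f}, std: {mses.std(ddof=1):.4f}")
```

With Keras installed, `repeat_experiment(predictors, indicator, keras_fit_predict)` would run the same 50 repetitions on the concrete data; passing the model as a callback keeps the evaluation protocol separate from the architecture being compared.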



In [18]:
rows = pd.DataFrame(rows)
print("Mean Value: ", rows.mean(), "\n" "Standard Deviation Value: ", rows.std())

Mean Value:  0    305.423741
dtype: float64 
Standard Deviation Value:  0    349.409952
dtype: float64
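The cell above prints one-element Series, hence the extra `0` and `dtype` lines. Selecting the column yields plain numbers. Note also that pandas' `.std()` computes the sample standard deviation (`ddof=1`), whereas NumPy's `np.std` defaults to `ddof=0`. A small sketch with made-up scores standing in for `rows`:

```python
import numpy as np
import pandas as pd

# Made-up MSE scores standing in for the 50 values collected in `rows`
scores = [1.0, 2.0, 3.0, 4.0]
rows_demo = pd.DataFrame(scores)

mean_mse = rows_demo[0].mean()     # plain float instead of a Series
std_sample = rows_demo[0].std()    # pandas default: ddof=1
std_population = np.std(scores)    # NumPy default: ddof=0

print(f"Mean MSE: {mean_mse:.2f}")              # Mean MSE: 2.50
print(f"Std (ddof=1): {std_sample:.4f}")        # Std (ddof=1): 1.2910
print(f"Std (ddof=0): {std_population:.4f}")    # Std (ddof=0): 1.1180
```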


## B: Normalizing the data

Part B repeats Part A, but both the predictors and the target are normalized by subtracting each column's mean and dividing by its standard deviation:

In [8]:
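# z-score normalization: subtract each column's mean and divide by its standard deviation
predictors_norm = (predictors - predictors.mean()) / predictors.std()
indicator_norm = (indicator - indicator.mean()) / indicator.std()
n_cols = predictors_norm.shape[1]

# Note: y_train and y_test are now on the normalized target scale
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, indicator_norm, test_size=0.3, random_state=4)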
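The cell above computes the means and standard deviations on the full data set before splitting, so the test rows contribute to the normalization statistics. A common alternative, sketched here on synthetic data (column names borrowed from the preview, values random), is to compute the statistics on the training split only, apply them unchanged to the test split, and leave the target on its original scale so that the MSE stays in the same units as in Part A:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for `predictors` and `indicator` (random values, assumption)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(loc=300.0, scale=50.0, size=(100, 3)),
                 columns=["Cement", "Water", "Age"])
y = pd.Series(rng.normal(loc=35.0, scale=15.0, size=100), name="Strength")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

# z-score statistics from the training rows only
mu = X_train.mean()
sigma = X_train.std()

# The same transformation is applied to both splits; the target is not rescaled
X_train_norm = (X_train - mu) / sigma
X_test_norm = (X_test - mu) / sigma

print(X_train_norm.mean().round(6).tolist())  # all (close to) 0
print(X_train_norm.std().round(6).tolist())   # all 1
```

scikit-learn's `StandardScaler`, fitted on `X_train`, does the same thing, except that it divides by the population standard deviation (`ddof=0`).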

In [32]:
def regression_model():
    # One hidden layer with 10 ReLU units and a single linear output unit
    model = Sequential()
    model.add(Dense(10, activation="relu", input_shape=(n_cols,)))
    model.add(Dense(1))

    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

rows_B = []
for i in range(50):  # 50 independent training runs
    model = regression_model()
    # Note: fits on all rows with the raw target `indicator`, while y_test is normalized
    model.fit(predictors_norm, indicator, validation_data=(X_test, y_test), epochs=50, verbose=0)
    scores = model.evaluate(X_test, y_test, verbose=1)  # test-set MSE
    rows_B.append(scores)



In [33]:
rows_B = pd.DataFrame(rows_B)
print("Mean Value: ", rows_B.mean(), "\n" "Standard Deviation Value: ", rows_B.std())

Mean Value:  0    1100.6742
dtype: float64 
Standard Deviation Value:  0    152.84856
dtype: float64


In [34]:
# Differences between A and B: 

print("Mean Value Difference: ", (rows_B.mean() - rows.mean()), "\n" "Standard Deviation Value: ", (rows_B.std() - rows.std()))


Mean Value Difference:  0    795.25046
dtype: float64 
Standard Deviation Value:  0   -196.561392
dtype: float64


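With the normalized data, the mean MSE over the 50 runs is 1100.67, an increase of 795.25 over the Part A mean of 305.42, while the standard deviation drops by 196.56. Note, however, that both the predictors and the target were normalized above, and the model is fitted on the raw `indicator` but evaluated against the normalized `y_test`. The Part B MSE is therefore measured on a different scale from Part A, so the two means are not directly comparable; the increase most likely reflects this mismatch rather than an effect of normalizing the predictors.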
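The size of such a scale mismatch is easy to underestimate. In the sketch below (four made-up target values, not the concrete data), a model that predicts the raw targets perfectly still receives a large MSE when it is scored against z-scored targets, which is the situation in the Part B code (fit on `indicator`, evaluate on `y_test` derived from `indicator_norm`). The second half shows the consistent way to move between scales: an MSE computed on a z-scored target equals the original-unit MSE divided by σ².

```python
import numpy as np

# Made-up target values (assumption, not taken from the concrete data)
y = np.array([20.0, 30.0, 40.0, 50.0])
mu, sigma = y.mean(), y.std(ddof=1)  # pandas-style sample std
y_norm = (y - mu) / sigma

# 1) A model that predicts the raw targets perfectly ...
pred_raw = y.copy()
mse_same_scale = np.mean((pred_raw - y) ** 2)        # ... scores 0 against raw targets
mse_mixed_scale = np.mean((pred_raw - y_norm) ** 2)  # ... but is heavily penalized against normalized ones

# 2) An MSE measured on the normalized scale converts back by a factor of sigma**2
pred_norm = y_norm + 0.1  # constant error of 0.1 normalized units
mse_norm = np.mean((pred_norm - y_norm) ** 2)
mse_orig = np.mean((pred_norm * sigma + mu - y) ** 2)

print(mse_same_scale, round(mse_mixed_scale, 1))  # 0.0 1331.4
print(np.isclose(mse_orig, sigma**2 * mse_norm))  # True
```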

## C: Increasing the number of epochs

In [12]:
def regression_model():
    # One hidden layer with 10 ReLU units and a single linear output unit
    model = Sequential()
    model.add(Dense(10, activation="relu", input_shape=(n_cols,)))
    model.add(Dense(1))

    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

rows_C = []
for i in range(50):  # 50 independent training runs
    model = regression_model()
    # Same data setup as Part B, but trained for 100 epochs instead of 50
    model.fit(predictors_norm, indicator, validation_data=(X_test, y_test), epochs=100, verbose=0)
    scores = model.evaluate(X_test, y_test, verbose=1)  # test-set MSE
    rows_C.append(scores)



In [13]:
rows_C = pd.DataFrame(rows_C)
print("Mean Value: ", rows_C.mean(), "\n" "Standard Deviation Value: ", rows_C.std())

Mean Value:  0    1367.37397
dtype: float64 
Standard Deviation Value:  0    25.486814
dtype: float64


In [40]:
# Differences between B and C: 

print("Mean Value Difference: ", (rows_C.mean() - rows_B.mean()), "\n" "Standard Deviation Value: ", (rows_C.std() - rows_B.std()))


Mean Value Difference:  0    269.044261
dtype: float64 
Standard Deviation Value:  0   -134.208558
dtype: float64


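With 100 epochs instead of the 50 used in Part B, the mean MSE in the output above is 1367.37, with a much smaller standard deviation of 25.49. The difference cell reports an increase of 269.04 over Part B, but it was computed from a different run of the Part C loop than the output shown above (implying a Part C mean of about 1369.72 in that run), which is why it does not exactly equal 1367.37 − 1100.67 = 266.70. Part C keeps the Part B setup (fitting on the raw `indicator`, scoring against the normalized `y_test`), so these values are on the same mixed scale as in Part B.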

## D: Increasing the number of hidden layers

In [9]:
def regression_model():
    # Three hidden layers with 10 ReLU units each and a single linear output unit;
    # only the first layer needs input_shape
    model = Sequential()
    model.add(Dense(10, activation="relu", input_shape=(n_cols,)))
    model.add(Dense(10, activation="relu"))
    model.add(Dense(10, activation="relu"))
    model.add(Dense(1))

    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

rows_D = []
for i in range(50):  # 50 independent training runs
    model = regression_model()
    # Same data setup as Parts B and C: raw target for training, normalized y_test for evaluation
    model.fit(predictors_norm, indicator, validation_data=(X_test, y_test), epochs=100, verbose=0)
    scores = model.evaluate(X_test, y_test, verbose=1)  # test-set MSE
    rows_D.append(scores)



In [10]:
rows_D = pd.DataFrame(rows_D)
print("Mean Value: ", rows_D.mean(), "\n" "Standard Deviation Value: ", rows_D.std())

Mean Value:  0    1439.214116
dtype: float64 
Standard Deviation Value:  0    29.294346
dtype: float64


In [14]:
# Differences between C and D:

print("Mean Value Difference: ", (rows_D.mean() - rows_C.mean()), "\n" "Standard Deviation Value: ", (rows_D.std() - rows_C.std()))



Mean Value Difference:  0    71.840146
dtype: float64 
Standard Deviation Value:  0    3.807532
dtype: float64


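With three hidden layers instead of one, the mean MSE is 1439.21, an increase of 71.84 over the Part C mean of 1367.37, and the standard deviation rises slightly, by 3.81. As in Parts B and C, the model is fitted on the raw `indicator` but scored against the normalized `y_test`, so these values are not on the same scale as the Part A baseline.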
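Adding hidden layers mainly increases the number of trainable weights. A Dense layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases. The sketch below totals this for the networks used here, assuming the eight predictor columns shown in the data preview (`dense_params` and `count_params` are illustrative helpers, not Keras functions).

```python
def dense_params(n_in, n_out):
    """Weights (n_in x n_out) plus biases (n_out) of one Dense layer."""
    return n_in * n_out + n_out


def count_params(n_inputs, hidden_units):
    """Total trainable parameters of a stack of Dense layers ending in one output unit."""
    sizes = [n_inputs] + list(hidden_units) + [1]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


n_inputs = 8  # assumption: the eight predictor columns in the data preview

print(count_params(n_inputs, [10]))          # Parts A to C: 101
print(count_params(n_inputs, [10, 10, 10]))  # Part D: 321
```

The totals should match what `model.summary()` or `model.count_params()` reports for the corresponding Keras models.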