# Introduction to Deep Learning & Neural Networks with Keras Course Project
created by Lena Horsley

In this course project, you will build a regression model using the Keras deep learning library. You will then experiment with increasing the number of training epochs and changing the number of hidden layers, and observe how these changes affect the model's performance.

### Part B. Normalize the data 

Repeat Part A but use a normalized version of the data. 
Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.
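
As a small worked example of this rule, the sketch below standardizes a toy two-column array with NumPy. The array values and variable names are made up for illustration, and `ddof=1` is chosen on the assumption that the result should match the sample standard deviation that pandas' `DataFrame.std()` uses by default.

```python
import numpy as np

# Toy predictor matrix (hypothetical values): 4 samples, 2 features on very different scales.
X = np.array([[100.0, 1.0],
              [200.0, 2.0],
              [300.0, 3.0],
              [400.0, 4.0]])

# Column-wise mean and sample standard deviation (ddof=1, the pandas default).
col_mean = X.mean(axis=0)
col_std = X.std(axis=0, ddof=1)

# Subtract the mean and divide by the standard deviation, column by column.
X_norm = (X - col_mean) / col_std

print(X_norm.mean(axis=0))         # approximately 0 for each column
print(X_norm.std(axis=0, ddof=1))  # approximately 1 for each column
```

After the transformation both columns hold the same values, even though the raw features differed by a factor of 100.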

In [1]:
import pandas as pd
import numpy as np

### Let's look at the data

In [2]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data')
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [3]:
concrete_data.shape

(1030, 9)

In [4]:
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

### Split the data 
Step 1. Create the target and predictor sets

In [6]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [7]:
predictors.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [9]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069


In [10]:
n_cols_norm = predictors_norm.shape[1] # number of predictors
n_cols_norm

8

Step 2. Using sklearn.model_selection, split the data into training and test sets

In [11]:
from sklearn.model_selection import train_test_split

X_train_norm, X_test_norm, y_train_norm, y_test_norm = train_test_split(predictors_norm, target, test_size=0.3, random_state=101)
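
The cells above normalize the full dataset first and split afterwards, so the test rows contribute to the mean and standard deviation. A common alternative, shown here only as a sketch and not as the method used in this notebook, is to estimate the statistics from the training rows and apply them to both subsets. The data is synthetic and all names (`X_train_scaled`, `train_mean`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(101)

# Hypothetical stand-in for the predictors: 10 samples, 3 features.
X = rng.normal(loc=50.0, scale=10.0, size=(10, 3))

# Shuffle the row indices and hold out 30% of the rows as a test set.
indices = rng.permutation(len(X))
n_test = round(0.3 * len(X))
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, X_test = X[train_idx], X[test_idx]

# Estimate the statistics from the training rows only...
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0, ddof=1)

# ...and apply the same transformation to both subsets.
X_train_scaled = (X_train - train_mean) / train_std
X_test_scaled = (X_test - train_mean) / train_std

print(X_train_scaled.shape, X_test_scaled.shape)
```

With this ordering the test set stays unseen until evaluation, at the cost of test columns that are only approximately zero-mean.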

### Build the model

In [12]:
# Importing keras directly didn't work, so I had to do it in a different way.
# Check out the following link:
# https://www.tensorflow.org/guide/keras/sequential_model
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols_norm,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
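
To make the architecture concrete, the sketch below reproduces in NumPy what a network of this shape computes: one hidden layer of 10 ReLU units followed by a single linear output. The weights are random placeholders rather than trained Keras weights, and the function name `forward` and the sample batch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 10

# Randomly initialised placeholder parameters (not the values Keras would learn).
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """Hidden layer with ReLU, then a single linear output unit."""
    hidden = np.maximum(0.0, X @ W1 + b1)
    return hidden @ W2 + b2

X_batch = rng.normal(size=(5, n_features))  # 5 hypothetical normalized samples
predictions = forward(X_batch)

# 8*10 weights + 10 biases in the hidden layer, 10 weights + 1 bias in the output.
n_params = W1.size + b1.size + W2.size + b2.size
print(predictions.shape)  # (5, 1)
print(n_params)           # 101
```

The output has one column per sample, which is also the shape that `predict` returns for a model ending in `Dense(1)`.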

### Train the model

In [13]:
model_norm = regression_model()
# fit the model
model_norm.fit(X_train_norm, y_train_norm, epochs=50, verbose=1)

Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x7fbe182cf880>

### Evaluate the model

In [14]:
test_value = model_norm.evaluate(X_test_norm, y_test_norm, verbose=0)
y_pred = model_norm.predict(X_test_norm)
print("test value: " + str(test_value))

test value: 326.0127868652344


In [15]:
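from sklearn.metrics import mean_squared_error

# mean_squared_error returns a single scalar for this one train/test run,
# so np.mean of it is the value itself and np.std of it is always 0.0.
# A meaningful mean and standard deviation need the list of repeated runs built later.
mean_square_error = mean_squared_error(y_test_norm, y_pred)
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)

print("mean: " + str(mean))
print("mean_square_error: " + str(mean_square_error))
print("standard_deviation: "+ str(standard_deviation))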

mean: 326.0127799930269
mean_square_error: 326.0127799930269
standard_deviation: 0.0
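
For reference, the sketch below computes the mean squared error by hand with NumPy on made-up numbers. It also illustrates a shape pitfall when doing this manually: a model ending in `Dense(1)` returns predictions of shape `(n, 1)`, while the target is one-dimensional, so subtracting them directly broadcasts to an `(n, n)` matrix. The values and variable names are hypothetical.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])        # hypothetical targets, shape (3,)
y_pred = np.array([[2.0], [5.0], [9.0]])  # hypothetical predictions, shape (3, 1)

# Pitfall: (3,) minus (3, 1) broadcasts to a (3, 3) matrix of all pairwise differences.
wrong = np.mean((y_true - y_pred) ** 2)

# Flatten the predictions first so the subtraction is element-wise.
mse = np.mean((y_true - y_pred.ravel()) ** 2)

print(mse)    # (1 + 0 + 4) / 3
print(wrong)  # differs from mse because of the broadcasting

# A single scalar has no spread, which is why np.std of one MSE value is 0.0.
print(np.std(mse))
```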


### Create a list of 50 mean squared errors

In [16]:
total_mean_squared_errors_norm = 50
epochs_norm = 50
mean_squared_errors_norm = []
# Note: model_norm is not re-created inside the loop and random_state is fixed at 101,
# so every iteration continues training the same model on the same split.
# The errors printed by this cell therefore trace continued training, not 50 independent runs.
for i in range(total_mean_squared_errors_norm):
    X_train_norm, X_test_norm, y_train_norm, y_test_norm = train_test_split(predictors_norm, target, test_size=0.3, random_state=101)
    model_norm.fit(X_train_norm, y_train_norm, epochs=epochs_norm, verbose=0)
    y_pred_norm = model_norm.predict(X_test_norm)
    mean_square_error_norm = mean_squared_error(y_test_norm, y_pred_norm)
    print("Mean Square Error #" + str(i + 1) + ": " + str(mean_square_error_norm))
    
    mean_squared_errors_norm.append(mean_square_error_norm)

Mean Square Error #1: 189.94990250663855
Mean Square Error #2: 149.85864396533793
Mean Square Error #3: 105.35208744305633
Mean Square Error #4: 77.91850915540748
Mean Square Error #5: 64.26369742847044
Mean Square Error #6: 57.39356287882025
Mean Square Error #7: 53.72893588160795
Mean Square Error #8: 51.74967153894234
Mean Square Error #9: 50.448611298995104
Mean Square Error #10: 49.691231183009876
Mean Square Error #11: 49.074678526710514
Mean Square Error #12: 48.31351909872083
Mean Square Error #13: 47.91717063642985
Mean Square Error #14: 47.25746937622464
Mean Square Error #15: 46.95845913014506
Mean Square Error #16: 46.82394492709461
Mean Square Error #17: 46.52971300870531
Mean Square Error #18: 46.5364329883702
Mean Square Error #19: 46.22047466785192
Mean Square Error #20: 46.10935936993016
Mean Square Error #21: 46.054180176901404
Mean Square Error #22: 45.60133948086615
Mean Square Error #23: 45.103794283940054
Mean Square Error #24: 44.68559284095342
Mean Square Error 

### Mean and Standard Deviation

In [17]:
mean_squared_errors_norm = np.array(mean_squared_errors_norm)
mean_norm = np.mean(mean_squared_errors_norm)
standard_deviation_norm = np.std(mean_squared_errors_norm)

print("epochs per training: " + str(epochs_norm))
print("Mean: " + str(mean_norm))
print("Standard Deviation: " + str(standard_deviation_norm))

epochs per training: 50
Mean: 52.51014777173482
Standard Deviation: 26.37165345758153
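
In the loop above, `model_norm` is created once and `random_state` is fixed, so each pass continues training the same network on the same split. If 50 independent estimates are wanted, one option is to draw a new split and build a fresh model on every repetition. The sketch below shows that structure with a NumPy least-squares fit standing in for the network so that it runs without TensorFlow; the synthetic data and the helper `fit_and_score` are hypothetical. In the notebook, the stand-in would be replaced by a new `regression_model()` call followed by `fit` and `predict`.

```python
import numpy as np

def fit_and_score(X_train, y_train, X_test, y_test):
    """Fit a fresh least-squares model (stand-in for a new network) and return the test MSE."""
    A_train = np.column_stack([X_train, np.ones(len(X_train))])  # add intercept column
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([X_test, np.ones(len(X_test))])
    return float(np.mean((y_test - A_test @ coef) ** 2))

# Synthetic data (hypothetical): 200 samples, 8 features, noisy linear target.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 8))
y = X @ data_rng.normal(size=8) + data_rng.normal(scale=0.5, size=200)

n_repeats = 50
errors = []
for seed in range(n_repeats):
    # A different seed per repetition gives a different 70/30 split each time.
    idx = np.random.default_rng(seed).permutation(len(X))
    n_test = round(0.3 * len(X))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # The model is fitted from scratch inside the loop, so runs do not share state.
    errors.append(fit_and_score(X[train_idx], y[train_idx], X[test_idx], y[test_idx]))

errors = np.array(errors)
print("Mean:", errors.mean())
print("Standard Deviation:", errors.std())
```

With independent repetitions, the standard deviation reflects variation across splits and initializations instead of the downward trend of a single, ever longer training run.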


### Discussion
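Normalizing the predictors puts them on a comparable scale, which generally helps a gradient-based optimizer such as Adam converge in fewer epochs. Consistent with this, less time was required for learning in Part B than in Part A.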

### References
- [The Sequential model](https://www.tensorflow.org/guide/keras/sequential_model)
- [Splitting Datasets With the Sklearn train_test_split Function](https://www.bitdegree.org/learn/train-test-split)
- [Why random_state in train_test_split is equal 42](https://www.researchgate.net/post/Why_random_state_in_train_test_split_is_equal_42)
- [Python | Mean Squared Error](https://www.geeksforgeeks.org/python-mean-squared-error/)