# Keras Regression Model

Peer-graded assignment for the [Introduction to Deep Learning & Neural Networks with Keras](https://www.coursera.org/learn/introduction-to-deep-learning-with-keras/home/week/5) course.

Note: this notebook was run with TensorFlow 2.3.


## Data import and preparation

In [1]:
import pandas as pd
import numpy as np

In [2]:
df_concrete = pd.read_csv('https://cocl.us/concrete_data')
df_concrete.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [3]:
df_concrete.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Cement              1030 non-null   float64
 1   Blast Furnace Slag  1030 non-null   float64
 2   Fly Ash             1030 non-null   float64
 3   Water               1030 non-null   float64
 4   Superplasticizer    1030 non-null   float64
 5   Coarse Aggregate    1030 non-null   float64
 6   Fine Aggregate      1030 non-null   float64
 7   Age                 1030 non-null   int64  
 8   Strength            1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.5 KB


In [4]:
features = df_concrete.drop(columns='Strength')
n_features = features.shape[1]
target = df_concrete['Strength']

## A. Build a baseline model (5 marks)

In [5]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error 

In [6]:
# define regression model
def regression_model(n_hidden_layers: int = 1):
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_features,)))
    
    # Add additional hidden layers, if more than one is specified 
    for i in range(n_hidden_layers - 1):
        model.add(Dense(10, activation='relu'))
        
    model.add(Dense(1))
    
    model.compile(optimizer='adam', loss='mean_squared_error')
    
    return model

In [7]:
model = regression_model()
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 10)                90        
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
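
The parameter counts in the summary can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes 8 input features and 10 units per hidden layer, as in `regression_model`; the helper names `dense_params` and `model_params` are illustrative and not part of the notebook.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per unit."""
    return n_in * n_out + n_out


def model_params(n_features: int, n_hidden_layers: int = 1, width: int = 10) -> list:
    """Per-layer parameter counts for the architecture built by regression_model."""
    counts = [dense_params(n_features, width)]                       # first hidden layer
    counts += [dense_params(width, width)] * (n_hidden_layers - 1)   # extra hidden layers
    counts.append(dense_params(width, 1))                            # single-unit output layer
    return counts


baseline = model_params(n_features=8)
print(baseline, sum(baseline))  # [90, 11] 101
```

Calling `model_params(8, n_hidden_layers=3)` gives `[90, 110, 110, 11]`, which sums to 321 parameters.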


In [8]:
model = regression_model()

# Train test split with 30% test data.
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)

# Train model with 50 epochs
model.fit(X_train, y_train, epochs=50, verbose=2)

mean_squared_error(y_test, model.predict(X_test))

Epoch 1/50
23/23 - 0s - loss: 303813.4062
Epoch 2/50
23/23 - 0s - loss: 173520.1562
Epoch 3/50
23/23 - 0s - loss: 93436.1172
Epoch 4/50
23/23 - 0s - loss: 47445.4805
Epoch 5/50
23/23 - 0s - loss: 22614.1250
Epoch 6/50
23/23 - 0s - loss: 10237.2021
Epoch 7/50
23/23 - 0s - loss: 4764.8789
Epoch 8/50
23/23 - 0s - loss: 2715.3352
Epoch 9/50
23/23 - 0s - loss: 2046.6160
Epoch 10/50
23/23 - 0s - loss: 1880.5614
Epoch 11/50
23/23 - 0s - loss: 1817.9279
Epoch 12/50
23/23 - 0s - loss: 1793.9312
Epoch 13/50
23/23 - 0s - loss: 1770.8691
Epoch 14/50
23/23 - 0s - loss: 1749.5392
Epoch 15/50
23/23 - 0s - loss: 1728.9260
Epoch 16/50
23/23 - 0s - loss: 1706.2247
Epoch 17/50
23/23 - 0s - loss: 1683.9493
Epoch 18/50
23/23 - 0s - loss: 1662.6371
Epoch 19/50
23/23 - 0s - loss: 1640.2822
Epoch 20/50
23/23 - 0s - loss: 1617.8794
Epoch 21/50
23/23 - 0s - loss: 1595.0896
Epoch 22/50
23/23 - 0s - loss: 1574.2621
Epoch 23/50
23/23 - 0s - loss: 1551.4760
Epoch 24/50
23/23 - 0s - loss: 1528.2744
Epoch 25/50
23/23

1052.5276830333326
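
The `23/23` in the training log is the number of batches per epoch. As a rough check, the arithmetic below reproduces it from the dataset size, assuming scikit-learn's rounding of the test set (ceiling) and the Keras default batch size of 32, since `model.fit` is called without `batch_size`.

```python
import math

n_samples = 1030      # rows reported by df_concrete.info()
test_size = 0.3       # fraction passed to train_test_split
batch_size = 32       # Keras default when model.fit gets no batch_size

# scikit-learn rounds the test set up and gives the remainder to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# One step per batch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)  # 721 309 23
```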

In [27]:
n_trainings = 50
def run_trainings(X: pd.DataFrame, y: pd.Series, n_epochs: int, n_hidden_layers: int = 1) -> np.ndarray:
    """
    Run `n_trainings` (50) trainings and return the mean squared errors as evaluated on 30% test data.
    """
    errors = np.zeros(n_trainings)
    for i in range(n_trainings):
        # Create new model.
        model = regression_model(n_hidden_layers)

        # Train test split with 30% test data.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

        # Train model
        model.fit(X_train, y_train, epochs=n_epochs, verbose=0)

        errors[i] = mean_squared_error(y_test, model.predict(X_test))
    
    # Print model structure
    print(model.summary())
    
    return errors
   

In [10]:
def print_error_stats(errors: np.ndarray):
    print(f'Mean of mean squared errors: {np.mean(errors)}')
    print(f'Standard deviation of mean squared errors: {np.std(errors)}')

In [11]:
n_epochs = 50
errors = run_trainings(features, target, n_epochs)

Model: "sequential_51"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_102 (Dense)            (None, 10)                90        
_________________________________________________________________
dense_103 (Dense)            (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
None


In [12]:
print('Using raw features and 50 trainings:')
print_error_stats(errors)

Using raw features and 50 trainings:
Mean of mean squared errors: 374.8981800878648
Standard deviation of mean squared errors: 382.33207659041676


## B. Normalize the data

In [13]:
# Normalize the data by subtracting the mean and dividing by the standard deviation.
features_norm = (features - features.mean()) / features.std()
features_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
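
For reference, the z-score transformation used above can be written without pandas. This is a minimal sketch on a hypothetical toy column (not taken from the concrete dataset). It uses the sample standard deviation, which divides by `n - 1` and matches the pandas default for `.std()`.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize with the sample standard deviation (ddof=1), as pandas .std() does."""
    mu = mean(values)
    sigma = stdev(values)  # divides by n - 1
    return [(v - mu) / sigma for v in values]


# Hypothetical toy column, not taken from the concrete dataset.
toy = [100.0, 200.0, 300.0, 400.0, 500.0]
scaled = zscore(toy)
print([round(v, 3) for v in scaled])
```

After scaling, the column has mean 0 and sample standard deviation 1, so every feature enters the network on a comparable scale.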


In [14]:
# Run trainings with normalized features
n_epochs = 50
errors_norm = run_trainings(features_norm, target, n_epochs)

Model: "sequential_101"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_202 (Dense)            (None, 10)                90        
_________________________________________________________________
dense_203 (Dense)            (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
None


In [15]:
print('Using normalized features:')
print_error_stats(errors_norm)

Using normalized features:
Mean of mean squared errors: 389.36165017193105
Standard deviation of mean squared errors: 135.65472909413674


**How does the mean of the mean squared errors compare to that from Step A?**

Answer: The mean of the mean squared errors is about the same with normalized data (389.4 vs. 374.9 in Step A), but the standard deviation is much lower (135.7 vs. 382.3). Even with 50 trainings, the mean depends heavily on chance, because both the train/test split and the TensorFlow weight initialization are random. Results tend to be more stable with normalized features.
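
One rough way to judge whether the two means really differ is to compare their difference with its standard error. The sketch below plugs in the numbers printed in Steps A and B and assumes the 50 runs in each step are independent; the helper `standard_error` is illustrative and not part of the notebook.

```python
import math

n_runs = 50  # n_trainings in the notebook

# Mean and standard deviation of the MSEs as printed in Steps A and B.
mean_raw, std_raw = 374.8981800878648, 382.33207659041676
mean_norm, std_norm = 389.36165017193105, 135.65472909413674


def standard_error(std: float, n: int) -> float:
    """Standard error of a mean estimated from n independent runs."""
    return std / math.sqrt(n)


se_raw = standard_error(std_raw, n_runs)
se_norm = standard_error(std_norm, n_runs)
# Assumes the two sets of runs are independent of each other.
se_diff = math.sqrt(se_raw**2 + se_norm**2)

diff = mean_norm - mean_raw
print(f"difference = {diff:.1f}, standard error of difference = {se_diff:.1f}")
```

The difference of about 14.5 is well below its standard error of about 57.4, which is consistent with treating the two means as indistinguishable.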

## C. Increase the number of epochs

In [16]:
n_epochs = 100
errors_100_epochs = run_trainings(features_norm, target, n_epochs)

Model: "sequential_151"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_302 (Dense)            (None, 10)                90        
_________________________________________________________________
dense_303 (Dense)            (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
None


In [17]:
print('Using 100 epochs and normalized features:')
print_error_stats(errors_100_epochs)

Using 100 epochs and normalized features:
Mean of mean squared errors: 167.45088965122676
Standard deviation of mean squared errors: 23.79470485361998


**How does the mean of the mean squared errors compare to that from Step B?**

Answer: Increasing the number of epochs from 50 to 100 improves the models significantly. The mean of the mean squared errors drops from 389.4 to 167.5, and the standard deviation from 135.7 to 23.8.

## D. Increase the number of hidden layers

In [18]:
n_epochs = 50
n_hidden_layers = 3
errors_3_layers = run_trainings(features_norm, target, n_epochs, n_hidden_layers)

Model: "sequential_201"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_500 (Dense)            (None, 10)                90        
_________________________________________________________________
dense_501 (Dense)            (None, 10)                110       
_________________________________________________________________
dense_502 (Dense)            (None, 10)                110       
_________________________________________________________________
dense_503 (Dense)            (None, 1)                 11        
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
None


In [19]:
print('Using 3 hidden layers, 50 epochs and normalized features:')
print_error_stats(errors_3_layers)

Using 3 hidden layers, 50 epochs and normalized features:
Mean of mean squared errors: 128.76145924142867
Standard deviation of mean squared errors: 15.15424505084721


**How does the mean of the mean squared errors compare to that from Step B?**

Answer: Increasing the number of hidden layers from 1 to 3, still with 50 epochs, improves the models significantly. The mean of the mean squared errors drops from 389.4 to 128.8, and the standard deviation from 135.7 to 15.2.
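
To put the improvements on a common scale, the reported means can be expressed as a relative reduction against Step B. This is a small sketch using only the numbers printed above; the helper `relative_reduction` and the dictionary labels are illustrative.

```python
# Mean of the MSEs as printed in Steps B, C and D (all with normalized features).
reported = {
    "B: 1 hidden layer, 50 epochs": 389.36165017193105,
    "C: 1 hidden layer, 100 epochs": 167.45088965122676,
    "D: 3 hidden layers, 50 epochs": 128.76145924142867,
}

baseline = reported["B: 1 hidden layer, 50 epochs"]


def relative_reduction(value: float, reference: float) -> float:
    """Fraction by which value is lower than reference."""
    return (reference - value) / reference


reductions = {name: relative_reduction(mse, baseline) for name, mse in reported.items()}
for name, r in reductions.items():
    print(f"{name}: {r:.0%} lower than Step B")
```

With these numbers, Step C comes out about 57% lower than Step B and Step D about 67% lower. Each figure comes from a single batch of 50 random runs, so the exact percentages would shift on a rerun.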