# Regression Models with Keras

### Instructions

A. Build a baseline model (5 marks)

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the Adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

Submit your Jupyter Notebook with your code and comments.

B. Normalize the data (5 marks)

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.

How does the mean of the mean squared errors compare to that from Part A?

### Installing and importing

In [None]:
# installing scikit-learn if not installed
# !pip show scikit-learn

In [None]:
# importing all the libraries needed
import pandas as pd
import numpy as np
import sklearn

## Let's download the data and read it into a pandas dataframe.

In [None]:
concrete_data = pd.read_csv('/content/drive/MyDrive/IBM_AI_Engineering_Professional_Certificate/Course 2 Introduction to Deep Learning & Neural Networks with Keras/Course_Project/concrete_data.csv')
concrete_data.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [None]:
concrete_data.describe()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [None]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

In [None]:
concrete_data.shape

(1030, 9)

## 1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

***Split data into training and test sets***

In [None]:
from sklearn.model_selection import train_test_split

def split_train_test(concrete_data, seed=None):
  # seed=None gives a different random 70/30 split on every call, which the
  # repeated experiment needs; pass an integer to reproduce one specific split
  train_data, test_data = train_test_split(concrete_data, test_size=0.3, random_state=seed, shuffle=True)
  # print(f"Training dataset is 70% = {train_data.shape}")
  # print(f"Test dataset is 30% = {test_data.shape}")
  return train_data, test_data
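
The effect of the seed on a random holdout split can be sketched with NumPy alone. This is only an illustration, not how train_test_split is implemented; the helper name holdout_indices and the seeds are assumptions made up for the example. A fixed seed reproduces the same split on every call, while different seeds (or no seed) give different splits.

```python
import numpy as np

def holdout_indices(n_rows, test_fraction=0.3, seed=None):
    """Hypothetical helper: return (train_idx, test_idx) for a random holdout split."""
    rng = np.random.default_rng(seed)        # seed=None draws fresh entropy on each call
    shuffled = rng.permutation(n_rows)       # random ordering of the row positions
    n_test = int(round(test_fraction * n_rows))
    return shuffled[n_test:], shuffled[:n_test]

# 1030 rows, as in concrete_data.shape
train_a, test_a = holdout_indices(1030, seed=42)
train_b, test_b = holdout_indices(1030, seed=42)
train_c, test_c = holdout_indices(1030, seed=7)

print(len(train_a), len(test_a))            # sizes of the 70/30 split
print(np.array_equal(test_a, test_b))       # same seed: identical test rows
print(np.array_equal(test_a, test_c))       # different seed: different test rows
```

When the experiment is repeated many times, the split only varies between repetitions if the seed varies (or is left unset).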

***Split data into predictors and target***

The target variable in this problem is the concrete sample strength (the Strength column). The predictors are all the remaining columns except Age, which leaves seven predictors.

In [None]:
concrete_data_columns = concrete_data.columns
# predictor columns: every column except the target (Strength) and Age
predictor_columns = [col for col in concrete_data_columns if col not in ('Strength', 'Age')]

def split_pred_target(train_data, test_data):
  predictors_train = train_data[predictor_columns]
  target_train = train_data['Strength'] # Strength column
  predictors_test = test_data[predictor_columns]
  target_test = test_data['Strength'] # Strength column
  return (predictors_train, target_train, predictors_test, target_test)


***Normalize the data***

In [None]:
def normalize(x):
  x_norm = (x - x.mean()) / x.std()
  return x_norm


def normalize_data(predictors_train, target_train, predictors_test, target_test):
  predictors_train_norm = normalize(predictors_train)
  # predictors_train_norm.head()
  target_train_norm = normalize(target_train)
  # target_train_norm.head()
  predictors_test_norm = normalize(predictors_test)
  # predictors_test_norm.head()
  target_test_norm = normalize(target_test)
  # target_test_norm.head()
  return (predictors_train_norm, target_train_norm, predictors_test_norm, target_test_norm)
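
The normalize_data function above standardizes each split with its own mean and standard deviation. A common alternative, shown here only as a sketch and not as the approach used in this notebook, is to compute the statistics on the training split and reuse them for the test split. The helper names and the synthetic arrays below are assumptions for illustration. The example uses NumPy with ddof=1 so that the standard deviation matches the pandas default used by x.std().

```python
import numpy as np

def fit_standardizer(train):
    """Hypothetical helper: column-wise mean and sample std (ddof=1, the pandas default)."""
    return train.mean(axis=0), train.std(axis=0, ddof=1)

def apply_standardizer(x, mean, std):
    """Apply previously computed statistics to any split."""
    return (x - mean) / std

# synthetic two-column data standing in for predictor splits (not the concrete dataset)
rng = np.random.default_rng(0)
train = rng.normal(loc=[280.0, 180.0], scale=[100.0, 20.0], size=(700, 2))
test = rng.normal(loc=[280.0, 180.0], scale=[100.0, 20.0], size=(300, 2))

mean, std = fit_standardizer(train)               # statistics come from the training split only
train_norm = apply_standardizer(train, mean, std)
test_norm = apply_standardizer(test, mean, std)   # test split reuses the training statistics

print(train_norm.mean(axis=0))          # approximately 0 per column
print(train_norm.std(axis=0, ddof=1))   # approximately 1 per column
print(test_norm.mean(axis=0))           # close to 0, but not exactly
```

With this variant the test split plays no part in choosing the scaling, so the test columns end up only approximately centered.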

In [None]:
n_cols = 7
# print(n_cols)

## 2. Train the model on the training data using 50 epochs.
## 3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

***Build a Neural Network***

Import Keras and important modules

In [None]:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

Define regression model

In [None]:
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))  # hidden layer; input has seven predictors
    model.add(Dense(1))

    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [None]:
# build the model
model = regression_model()
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                80        
                                                                 
 dense_1 (Dense)             (None, 1)                 11        
                                                                 
Total params: 91 (364.00 Byte)
Trainable params: 91 (364.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
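
The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The helper name dense_param_count below is an assumption for illustration.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_param_count(7, 10)   # 7 predictors feeding 10 hidden nodes
output = dense_param_count(10, 1)   # 10 hidden nodes feeding 1 output node

print(hidden, output, hidden + output)  # 80 11 91
```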


## 4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

In [None]:
from sklearn.metrics import mean_squared_error
mse = []

for i in range(50):
  train_data, test_data = split_train_test(concrete_data)

  predictors_train, target_train, predictors_test, target_test = split_pred_target(train_data, test_data)

  predictors_train_norm, target_train_norm, predictors_test_norm, target_test_norm = normalize_data(predictors_train, target_train, predictors_test, target_test)

  # build a fresh model each repetition so that weights learned in earlier
  # repetitions do not carry over into this one
  model = regression_model()

  # fit the model (validation_split holds out 30% of the training rows for validation)
  model.fit(predictors_train_norm, target_train_norm, validation_split=0.3, epochs=50, verbose=2)
  prediction_test = model.predict(predictors_test_norm)

  mse.append(mean_squared_error(target_test_norm, prediction_test))

Streaming output truncated to the last 5000 lines.
16/16 - 0s - loss: 0.5281 - val_loss: 0.5782 - 65ms/epoch - 4ms/step
Epoch 27/50
16/16 - 0s - loss: 0.5239 - val_loss: 0.5776 - 64ms/epoch - 4ms/step
Epoch 28/50
16/16 - 0s - loss: 0.5200 - val_loss: 0.5759 - 72ms/epoch - 4ms/step
Epoch 29/50
16/16 - 0s - loss: 0.5171 - val_loss: 0.5726 - 71ms/epoch - 4ms/step
Epoch 30/50
16/16 - 0s - loss: 0.5140 - val_loss: 0.5721 - 68ms/epoch - 4ms/step
Epoch 31/50
16/16 - 0s - loss: 0.5098 - val_loss: 0.5705 - 75ms/epoch - 5ms/step
Epoch 32/50
16/16 - 0s - loss: 0.5073 - val_loss: 0.5673 - 77ms/epoch - 5ms/step
Epoch 33/50
16/16 - 0s - loss: 0.5041 - val_loss: 0.5663 - 67ms/epoch - 4ms/step
Epoch 34/50
16/16 - 0s - loss: 0.5012 - val_loss: 0.5629 - 81ms/epoch - 5ms/step
Epoch 35/50
16/16 - 0s - loss: 0.4990 - val_loss: 0.5621 - 64ms/epoch - 4ms/step
Epoch 36/50
16/16 - 0s - loss: 0.4958 - val_loss: 0.5603 - 64ms/epoch - 4ms/step
Epoch 37/50
16/16 - 0s - loss: 0.4930 - val_loss: 0.5562






In [None]:
print(mse)  # list of 50 mean squared errors
len(mse)

[0.5840221917923275, 0.5899150604831233, 0.5896918557502793, 0.5894860955704747, 0.5887554117935347, 0.5899475079396561, 0.5870670666329406, 0.593686232297959, 0.5964081286064236, 0.5967044009119145, 0.598506243591532, 0.5993958390680247, 0.599948043103538, 0.6034113987841003, 0.6001399978137739, 0.6007501361646378, 0.601786459978713, 0.6040579439695565, 0.6036675839907416, 0.6038971626650422, 0.6028780928135107, 0.6062895429505759, 0.6006314092635295, 0.6019127843414874, 0.6026070836604764, 0.6005458375078272, 0.6081091891586686, 0.595605241491073, 0.5993072857799776, 0.5990654490325503, 0.5996760704841014, 0.5986788690972525, 0.6032886861145274, 0.6036951661946228, 0.604509943210015, 0.6007122066821581, 0.6000091950660997, 0.6009452572807844, 0.5975050974634841, 0.5999987654512314, 0.6025083342504657, 0.599097737750514, 0.5951170724856145, 0.59779445413191, 0.5964862236429846, 0.6011173232679448, 0.5997146751217992, 0.5987637449308417, 0.6008721217423092, 0.603806129394519]


50

## 5. Report the mean and the standard deviation of the mean squared errors.

In [None]:
# mean and (population) standard deviation of the mean squared errors
mean = sum(mse) / len(mse)
variance = sum([((x - mean) ** 2) for x in mse]) / len(mse)
std = variance ** 0.5

print(f'mean and the standard deviation of the mean squared errors is {mean:.5f} and {std:.5f} respectively.')


mean and the standard deviation of the mean squared errors is 0.59885 and 0.00509 respectively.
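
As a cross-check, the same statistics can be computed with NumPy. The sketch below uses only the first five values from the list above as sample data. np.std divides by n by default (ddof=0), which matches the manual formula in the cell above; passing ddof=1 gives the sample standard deviation instead.

```python
import numpy as np

# first five entries of the mse list printed earlier, used here as sample data
sample_mse = [0.5840221917923275, 0.5899150604831233, 0.5896918557502793,
              0.5894860955704747, 0.5887554117935347]

# manual computation, same formula as in the notebook cell
manual_mean = sum(sample_mse) / len(sample_mse)
manual_var = sum((x - manual_mean) ** 2 for x in sample_mse) / len(sample_mse)
manual_std = manual_var ** 0.5

np_mean = np.mean(sample_mse)
np_std_population = np.std(sample_mse)        # ddof=0, divides by n
np_std_sample = np.std(sample_mse, ddof=1)    # divides by n - 1

print(np.isclose(manual_mean, np_mean))
print(np.isclose(manual_std, np_std_population))
print(np_std_sample > np_std_population)
```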


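**The mean and standard deviation of the mean squared errors are small compared to Part A. Note that the target is also standardized here, so these errors are expressed in standardized units rather than in the original strength units.**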
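
Because normalize_data also standardizes the target, an MSE computed on the standardized target is scaled by the inverse of the target's variance relative to an MSE in the original units. The sketch below illustrates that relationship on synthetic numbers; the values are invented for the example and are not results from this notebook.

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.normal(35.0, 16.0, size=300)            # synthetic "strength" values
y_pred = y_true + rng.normal(0.0, 10.0, size=300)    # synthetic predictions in original units

mu, sigma = y_true.mean(), y_true.std(ddof=1)        # statistics used to standardize the target

mse_raw = np.mean((y_true - y_pred) ** 2)            # MSE in original units
mse_standardized = np.mean(((y_true - mu) / sigma - (y_pred - mu) / sigma) ** 2)

# multiplying by the variance maps the standardized MSE back to original units
print(np.isclose(mse_standardized * sigma ** 2, mse_raw))
```

Under this relationship, a standardized MSE would need to be multiplied by the variance of the target it was standardized with before being compared with an MSE measured in the original units.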