# Peer-graded Assignment: Build a Regression Model in Keras
## Activities:
Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.


### 1. Import Libraries

In [1]:
# Import libs
import numpy as np
import pandas as pd

### 2. Load the Dataset

In [2]:
# Load the dataset in a pandas dataframe
data_file_path = 'concrete_data.csv'
data = pd.read_csv(data_file_path)

In [6]:
data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


### 3. Data Overview and Summary Statistics

In [3]:
# Check basic information and first few rows
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Cement              1030 non-null   float64
 1   Blast Furnace Slag  1030 non-null   float64
 2   Fly Ash             1030 non-null   float64
 3   Water               1030 non-null   float64
 4   Superplasticizer    1030 non-null   float64
 5   Coarse Aggregate    1030 non-null   float64
 6   Fine Aggregate      1030 non-null   float64
 7   Age                 1030 non-null   int64  
 8   Strength            1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.6 KB


In [5]:
# Summary statistics to check basic stats
data.describe()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [9]:
# To check for any missing data within the records
data.isnull().sum()

| Column | Missing values |
|---|---|
| Cement | 0 |
| Blast Furnace Slag | 0 |
| Fly Ash | 0 |
| Water | 0 |
| Superplasticizer | 0 |
| Coarse Aggregate | 0 |
| Fine Aggregate | 0 |
| Age | 0 |
| Strength | 0 |


The dataset has no missing values and every column is numeric, so it is ready to be used to build our model.

### 4. Split the dataset into Predictors and Target
The target variable in this problem is the concrete sample strength. Therefore, our predictors will be the rest of the columns in the dataset.



In [15]:
data_columns = data.columns
predictors = data[data_columns[data_columns !=  'Strength']] # all columns except the Strength
target = data['Strength'] # target Strength column

print("The shape of predictors dataframe:", predictors.shape)

The shape of predictors dataframe: (1030, 8)
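
As a side note, the same predictor/target separation can be written with `DataFrame.drop`. The sketch below uses a small stand-in frame named `toy` (a hypothetical name, with a few values copied from `data.head()`) to check that both forms select the same columns.

```python
import pandas as pd

# Small stand-in frame; `toy` is an illustrative name, not part of the notebook.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

# Boolean-mask selection, as in the cell above.
mask_predictors = toy[toy.columns[toy.columns != "Strength"]]

# Equivalent selection with DataFrame.drop, which returns a new frame
# and leaves `toy` unchanged.
drop_predictors = toy.drop(columns="Strength")
toy_target = toy["Strength"]

print(drop_predictors.equals(mask_predictors))
print(list(drop_predictors.columns))
```

Either form works; `drop` simply states the intent (everything except the target) more directly.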


In [12]:
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [13]:
target.head()

|   | Strength |
|---|---|
| 0 | 79.99 |
| 1 | 61.89 |
| 2 | 40.27 |
| 3 | 41.05 |
| 4 | 44.3 |


The last preprocessing step is to normalize the predictors by subtracting each column's mean and dividing by its standard deviation.


In [20]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
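
The cell above computes the mean and standard deviation over all 1030 rows, so the statistics also include rows that later end up in the test set. A common variant, shown here only as a sketch on synthetic data, computes them from the training rows and reuses them for the test rows. The frame `toy_X`, its column names, and the simple last-30% holdout are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the predictors; names and values are illustrative only.
rng = np.random.default_rng(0)
toy_X = pd.DataFrame(
    rng.normal(loc=100.0, scale=20.0, size=(200, 3)),
    columns=["a", "b", "c"],
)

# Hold out the last 30% of rows to keep the sketch short;
# a random split would work the same way.
n_train = len(toy_X) * 7 // 10
X_tr, X_te = toy_X.iloc[:n_train], toy_X.iloc[n_train:]

# Statistics come from the training rows only...
mu, sigma = X_tr.mean(), X_tr.std()

# ...and are reused unchanged for the test rows.
X_tr_norm = (X_tr - mu) / sigma
X_te_norm = (X_te - mu) / sigma

print(X_tr_norm.mean().round(6).tolist())
print(X_tr_norm.std().round(6).tolist())
```

With this variant the training columns have mean 0 and standard deviation 1 by construction, while the test columns are only approximately standardized.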


In [32]:
# Number of input columns (predictors) fed to the model
n_cols = predictors.shape[1]
n_cols

8

### 5. Build Regression Model Using Keras
- Import Keras and the packages needed to build the regression model
- One hidden layer of 10 nodes, and a ReLU activation function
- Use the adam optimizer and the mean squared error as the loss function.

In [24]:
# Import keras and the essential packages for building the regression model
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Input

In [33]:
# build regression model
def regression_model():

  # create model
  model = Sequential()

  # add layers to the model
  model.add(Input(shape=(n_cols,)))
  model.add(Dense(10, activation='relu'))
  model.add(Dense(1))

  # compile model
  model.compile(optimizer='adam', loss='mean_squared_error')
  return model
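
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The helper `dense_params` below is a hypothetical name introduced for this sketch; it assumes each Dense layer has one weight per input-unit pair plus one bias per unit, which is the Keras default.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Trainable parameters of a fully connected layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units


n_cols = 8                           # number of predictors, as computed earlier
hidden = dense_params(n_cols, 10)    # 8*10 weights + 10 biases = 90
output = dense_params(10, 1)         # 10*1 weights + 1 bias = 11
total = hidden + output

print(hidden, output, total)  # 90 11 101
```

Calling `model.summary()` on the model returned by `regression_model()` should report the same total.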

### 6. Split the Train and Test Dataset
- Randomly split the data into training and test sets, holding out 30% of the data for testing.

In [22]:
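# import train_test_split from scikit-learn to randomly split the data into training and test sets
from sklearn.model_selection import train_test_split

# split the normalized predictors and the target (70% train, 30% test);
# the normalized predictors are used so the split matches the data the model is fitted on
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)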
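
The split sizes follow directly from the 1030 rows reported by `data.info()`. The short calculation below is a sketch that also assumes Keras's default batch size of 32, which is what `fit` uses when `batch_size` is not passed.

```python
import math

n_rows = 1030                    # from data.info()
n_test = n_rows * 30 // 100      # 30% held out for testing
n_train = n_rows - n_test        # remaining 70% for training

batch_size = 32                  # assumed: Keras default when batch_size is not given
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_test, n_train, steps_per_epoch)  # 309 721 23
```

A training portion of 721 rows gives 23 batches per epoch, which matches the `23/23` shown in the training log.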

### 7. Train and Test the Network
- Train the model on the training data using 50 epochs

In [34]:
# build the model
model = regression_model()

In [35]:
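# fit the model for 50 epochs
# note: validation_split=0.3 holds out the last 30% of the rows (taken before shuffling),
# so this call does not use the X_train/X_test split created above
model.fit(predictors_norm, target, validation_split=0.3, epochs=50, verbose=2)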

Epoch 1/50
23/23 - 2s - 78ms/step - loss: 1670.8003 - val_loss: 1171.5165
Epoch 2/50
23/23 - 0s - 17ms/step - loss: 1654.5127 - val_loss: 1159.6831
Epoch 3/50
23/23 - 0s - 6ms/step - loss: 1637.9960 - val_loss: 1147.3541
Epoch 4/50
23/23 - 0s - 6ms/step - loss: 1620.3817 - val_loss: 1134.3417
Epoch 5/50
23/23 - 0s - 6ms/step - loss: 1601.8138 - val_loss: 1120.3435
Epoch 6/50
23/23 - 0s - 7ms/step - loss: 1581.7870 - val_loss: 1105.6458
Epoch 7/50
23/23 - 0s - 12ms/step - loss: 1560.6403 - val_loss: 1090.1735
Epoch 8/50
23/23 - 0s - 6ms/step - loss: 1537.8423 - val_loss: 1073.4569
Epoch 9/50
23/23 - 0s - 6ms/step - loss: 1513.1927 - val_loss: 1055.9915
Epoch 10/50
23/23 - 0s - 6ms/step - loss: 1487.4213 - val_loss: 1037.3048
Epoch 11/50
23/23 - 0s - 6ms/step - loss: 1459.7382 - val_loss: 1017.6387
Epoch 12/50
23/23 - 0s - 6ms/step - loss: 1430.5392 - val_loss: 997.2557
Epoch 13/50
23/23 - 0s - 7ms/step - loss: 1399.6505 - val_loss: 976.4081
Epoch 14/50
23/23 - 0s - 6ms/step - loss: 1367

<keras.src.callbacks.history.History at 0x7a4fd1b7a6d0>
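
Steps 1 to 5 of the assignment can be combined into a single loop. The following is a minimal sketch rather than a definitive solution: `build_model` and `repeated_mse` are hypothetical helper names, and the model mirrors `regression_model()` with the input width passed as an argument instead of read from a global.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def build_model(n_features):
    """Same architecture as regression_model(), with the input width passed in."""
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model


def repeated_mse(X, y, n_repeats=50, epochs=50):
    """Repeat steps 1-3 n_repeats times and return the list of test MSEs."""
    X = np.asarray(X, dtype="float32")
    y = np.asarray(y, dtype="float32")
    scores = []
    for _ in range(n_repeats):
        # Step 1: a fresh random 70/30 split (no fixed random_state, so splits differ).
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)
        # Step 2: a new, untrained model for every repetition.
        model = build_model(X.shape[1])
        model.fit(X_tr, y_tr, epochs=epochs, verbose=0)
        # Step 3: MSE between predicted and actual values on the held-out rows.
        y_pred = model.predict(X_te, verbose=0).ravel()
        scores.append(mean_squared_error(y_te, y_pred))
    return scores


# Hypothetical usage with the notebook's variables (steps 4 and 5):
# mse_list = repeated_mse(predictors_norm, target)
# print("mean MSE:", np.mean(mse_list), "std of MSE:", np.std(mse_list))
```

Building a new model inside the loop matters: reusing one model would carry trained weights from one repetition into the next, so the 50 errors would no longer be independent runs.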