# Regression Model in Keras: Assignment A

## Download and Clean Dataset

In [2]:
import pandas as pd
import numpy as np

Let's download the data and read it into a *pandas* dataframe.

In [3]:
c_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
c_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


Check how many data points we have.

In [4]:
c_data.shape

(1030, 9)

Let's check the dataset for any missing values.

In [5]:
c_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

As you can see, there are no missing values.

### Split data into predictors and target

In [6]:
c_data_columns = c_data.columns

predictors = c_data[c_data_columns[c_data_columns != 'Strength']] # all columns except Strength
target = c_data['Strength'] # Strength column
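Selecting every column except `Strength` with a boolean mask works; `DataFrame.drop(columns=...)` is an equivalent and arguably more readable alternative. The sketch below checks that both give the same predictors on a tiny frame built from a few values in the rows shown above (a stand-in for `c_data`, not the full dataset).

```python
import pandas as pd

# Tiny stand-in for c_data, using values from rows 0 and 2 shown above
df = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

# Boolean-mask selection, as in the cell above
mask_predictors = df[df.columns[df.columns != "Strength"]]

# Equivalent idiom: drop the target column by name
drop_predictors = df.drop(columns="Strength")
target = df["Strength"]

print(mask_predictors.equals(drop_predictors))  # True
print(list(drop_predictors.columns))            # ['Cement', 'Water', 'Age']
```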

Let's check the predictors and the target.

In [7]:
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

### In step A, we use the raw (unnormalized) predictors

Let's save the number of predictors to n_cols since we will need this number when building our network.

In [9]:
n_cols = predictors.shape[1] # number of predictors
n_cols

8

Let's import the scikit-learn functions we need for splitting the data and scoring the model.

In [10]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

### Train test split

In [11]:
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)
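With `test_size=0.3`, scikit-learn rounds the test count up, so the 1030 rows split into 721 training rows and 309 test rows. The sketch below checks this on row indices alone (a stand-in for the actual data) and confirms that the two parts do not overlap.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 1030 row indices stand in for the 1030 rows of c_data
rows = np.arange(1030)
train_idx, test_idx = train_test_split(rows, test_size=0.3, random_state=42)

print(len(train_idx), len(test_idx))              # 721 309
# the two parts are disjoint and together cover every row
print(np.intersect1d(train_idx, test_idx).size)   # 0
```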

## Import Keras

In [12]:
import keras

Using TensorFlow backend.


Next, import the Keras classes needed to build our regression model.

In [13]:
from keras.models import Sequential
from keras.layers import Dense

## Build a Neural Network

In [14]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The function above creates a model with one hidden layer of 10 ReLU units and a single linear output unit, compiled with the Adam optimizer and a mean squared error loss.
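To make the architecture concrete: the hidden `Dense` layer has 8 × 10 weights plus 10 biases (90 parameters) and the output layer has 10 weights plus 1 bias (11 parameters), 101 trainable parameters in total, which is the count `model.summary()` should report. The NumPy sketch below mirrors the same forward pass (ReLU hidden layer, linear output) with random weights; it only illustrates the computation and is not how Keras implements or trains the layers. The `forward` function is a hypothetical helper.

```python
import numpy as np

n_cols = 8      # number of predictors, as computed above
n_hidden = 10   # hidden units in regression_model()

# Trainable parameters: weights + biases of each Dense layer
hidden_params = n_cols * n_hidden + n_hidden   # 8*10 + 10 = 90
output_params = n_hidden * 1 + 1               # 10*1 + 1 = 11
print(hidden_params + output_params)           # 101

def forward(X, W1, b1, W2, b2):
    """Dense(10, relu) -> Dense(1) for a batch X of shape (n, n_cols)."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU hidden layer, shape (n, 10)
    return hidden @ W2 + b2                # linear output, shape (n, 1)

rng = np.random.default_rng(0)
# random weights, for illustration only
W1, b1 = rng.normal(size=(n_cols, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, 1)), np.zeros(1)
X = rng.normal(size=(5, n_cols))
print(forward(X, W1, b1, W2, b2).shape)        # (5, 1)
```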

## Train and Test the Network

In [15]:
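# build the model once, outside the loop, so the same network keeps training across iterations
model = regression_model()

# fit the model 50 times, recording the MSE on X_test after each round
mean_squared_error_list = []
for i in range(50):
    # fit() receives all rows (predictors, target); Keras holds out the last 30% for validation,
    # so the rows it trains on include part of X_test
    model.fit(predictors, target, validation_split=0.3, epochs=50, verbose=2)
    mean_squared_error_list.append(mean_squared_error(y_test, model.predict_on_batch(X_test)))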

Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 0s - loss: 7674.5222 - val_loss: 4079.4124
Epoch 2/50
 - 0s - loss: 5635.6249 - val_loss: 3033.3437
Epoch 3/50
 - 0s - loss: 4398.4605 - val_loss: 2501.7752
Epoch 4/50
 - 0s - loss: 3499.6441 - val_loss: 2147.0937
Epoch 5/50
 - 0s - loss: 2776.2874 - val_loss: 1862.5343
Epoch 6/50
 - 0s - loss: 2227.0402 - val_loss: 1669.1454
Epoch 7/50
 - 0s - loss: 1813.9871 - val_loss: 1555.4199
Epoch 8/50
 - 0s - loss: 1467.0336 - val_loss: 1461.3246
Epoch 9/50
 - 0s - loss: 1223.6822 - val_loss: 1322.3046
Epoch 10/50
 - 0s - loss: 999.3343 - val_loss: 1251.4988
Epoch 11/50
 - 0s - loss: 849.1528 - val_loss: 1166.2701
Epoch 12/50
 - 0s - loss: 721.2406 - val_loss: 1051.5836
Epoch 13/50
 - 0s - loss: 616.3286 - val_loss: 940.4439
Epoch 14/50
 - 0s - loss: 532.1975 - val_loss: 834.4766
Epoch 15/50
 - 0s - loss: 468.9721 - val_loss: 756.7728
Epoch 16/50
 - 0s - loss: 424.7061 - val_loss: 686.9943
Epoch 17/50
 - 0s - loss: 381.3298 - val_loss:
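In the cell above, `regression_model()` is called only once, so the same network keeps training for 50 × 50 = 2,500 epochs in total, and `fit` is given all 1030 rows rather than `X_train`, `y_train`. Keras takes the validation split from the last samples it is given, so part of `X_test` ends up in the training rows. If the goal is a test-set MSE from each of 50 independent repetitions, one option is a fresh split and a fresh model per repetition. A minimal sketch, assuming a model object with Keras-style `fit(X, y, epochs=..., verbose=...)` and `predict(X)` methods; `repeated_test_mse` is a hypothetical helper name, and varying `random_state` by repetition is an assumption.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def repeated_test_mse(X, y, build_model, n_repeats=50, epochs=50, test_size=0.3):
    """Hypothetical helper: one test-set MSE per repetition, each with a new split and a new model."""
    mses = []
    for seed in range(n_repeats):
        # fresh random 70/30 split; the seed changes so each repetition differs
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = build_model()                            # new, untrained weights every time
        model.fit(X_tr, y_tr, epochs=epochs, verbose=0)  # train on the training part only
        y_hat = np.ravel(model.predict(X_te))            # (n, 1) -> (n,)
        mses.append(mean_squared_error(y_te, y_hat))
    return np.asarray(mses)


# Usage with the objects defined in this notebook (slow: 50 models x 50 epochs):
# mse_list50 = repeated_test_mse(predictors, target, regression_model)
# print(mse_list50.mean(), mse_list50.std())
```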

In [16]:
# evaluate the model
model.evaluate(X_test, y_test)



55.275602309834994

### Mean and standard deviation of the 50 test-set MSEs

In [20]:
mean_squared_error_list50 = np.asarray(mean_squared_error_list)

In [18]:
model.metrics_names

['loss']
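`model.metrics_names` lists only `'loss'`, so `model.evaluate(X_test, y_test)` returns just the compiled loss, which here is the mean squared error on the test set (the 55.28 above). MSE is the mean of the squared residuals; the check below, using the first three `Strength` values shown earlier and hypothetical predictions, confirms that a direct NumPy computation matches `sklearn.metrics.mean_squared_error`, the function used in the training loop.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([79.99, 61.89, 40.27])   # first three Strength values shown above
y_pred = np.array([75.0, 60.0, 45.0])      # hypothetical predictions

# MSE = mean of squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)
print(np.isclose(mse_manual, mse_sklearn))  # True
```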

In [23]:
mean_squared_error_list50.mean()

64.19210031317972

In [24]:
mean_squared_error_list50.std()

10.899199053123281
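Note that `ndarray.std()` defaults to `ddof=0`, the population standard deviation (dividing by n), whereas pandas' `Series.std()` defaults to `ddof=1` (dividing by n - 1). With 50 values the difference is small, but it helps to state which one is reported. A quick comparison on a made-up array:

```python
import numpy as np
import pandas as pd

values = np.array([50.0, 60.0, 70.0, 80.0])  # hypothetical MSE values

pop_std = values.std()                 # NumPy default: ddof=0, divides by n
sample_std = values.std(ddof=1)        # divides by n - 1
pandas_std = pd.Series(values).std()   # pandas default: ddof=1

print(pop_std, sample_std, pandas_std)
```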