# Regression Model in Keras Assignment A

## Download and Clean Dataset

In [1]:
import pandas as pd
import numpy as np

Let's download the data and read it into a <em>pandas</em> dataframe.

In [2]:
c_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
c_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


Check how many data points we have.

In [3]:
c_data.shape

(1030, 9)

Let's check the dataset for any missing values.

In [4]:
c_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

As you can see, there is not any null value.

### Split data into predictors and target

In [5]:
c_data_columns = c_data.columns

predictors = c_data[c_data_columns[c_data_columns != 'Strength']] # all columns except Strength
target = c_data['Strength'] # Strength column

Let's check predictors and target variables.

In [6]:
predictors.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [7]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

### In step A, we don't want to use normalized data.

Let's save the number of predictors to n_cols since we will need this number when building our network.

In [8]:
n_cols = predictors.shape[1] # number of predictors
n_cols

8

Let's import necessary sklearn libraries.

In [9]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

### Train test split

In [10]:
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)

## Import Keras

In [11]:
import keras

Using TensorFlow backend.


Necessary libraries to build our regression model.

In [12]:
from keras.models import Sequential
from keras.layers import Dense

## Build a Neural Network

In [13]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The above function creates a model that has one hidden layer with 10 hidden units.

## Train and Test the Network

In [14]:
# build the model
model = regression_model()

# fit the model
mean_squared_error_list = []
for i in range(50):
    model.fit(predictors, target, validation_split=0.3, epochs=50, verbose=2)
    mean_squared_error_list.append(mean_squared_error(y_test, model.predict_on_batch(X_test)))

Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 0s - loss: 4480.7651 - val_loss: 3075.3609
Epoch 2/50
 - 0s - loss: 3446.0284 - val_loss: 2408.7937
Epoch 3/50
 - 0s - loss: 2566.6976 - val_loss: 1859.6284
Epoch 4/50
 - 0s - loss: 1921.0122 - val_loss: 1461.6833
Epoch 5/50
 - 0s - loss: 1434.2797 - val_loss: 1073.6497
Epoch 6/50
 - 0s - loss: 1050.9395 - val_loss: 800.7287
Epoch 7/50
 - 0s - loss: 772.2829 - val_loss: 633.0229
Epoch 8/50
 - 0s - loss: 577.9711 - val_loss: 484.7680
Epoch 9/50
 - 0s - loss: 449.2090 - val_loss: 390.4287
Epoch 10/50
 - 0s - loss: 354.3859 - val_loss: 390.6742
Epoch 11/50
 - 0s - loss: 300.9363 - val_loss: 320.7981
Epoch 12/50
 - 0s - loss: 274.4159 - val_loss: 248.4270
Epoch 13/50
 - 0s - loss: 240.2498 - val_loss: 205.7630
Epoch 14/50
 - 0s - loss: 212.3716 - val_loss: 196.2661
Epoch 15/50
 - 0s - loss: 195.8518 - val_loss: 204.7303
Epoch 16/50
 - 0s - loss: 187.3188 - val_loss: 190.0569
Epoch 17/50
 - 0s - loss: 174.1237 - val_loss: 152.9456


In [15]:
# evaluate the model
model.evaluate(X_test, y_test)



59.30616782475444

### Mean squared values for train and test

In [16]:
mean_squared_error_list50 = np.asarray(mean_squared_error_list)

In [17]:
model.metrics_names

['loss']

In [None]:
mean_squared_error_50.mean()

In [None]:
mean_squared_error_50.std()