<a id="item31"></a>

## Download and Clean Dataset

Let's start by importing the <em>pandas</em> and NumPy libraries.

In [1]:
#PART A

import pandas as pd
import numpy as np

Let's download the data and read it into a <em>pandas</em> dataframe.

In [2]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


#### Let's check how many data points we have.

In [3]:
concrete_data.shape

(1030, 9)

In [4]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


#### Split data into predictors and target

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [5]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
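
The same split can also be written with `DataFrame.drop`, which some readers find easier to scan than boolean indexing on the column index. The sketch below is only an illustration: `toy` is a made-up stand-in for `concrete_data`, holding a few values copied from the first and third rows shown above, so that the snippet runs without downloading anything.

```python
import pandas as pd

# Hypothetical two-row frame standing in for concrete_data (illustration only).
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

# drop() returns a new frame and leaves the original untouched.
toy_predictors = toy.drop(columns=["Strength"])  # every column except Strength
toy_target = toy["Strength"]                     # the Strength column only

print(list(toy_predictors.columns))
print(toy_target.tolist())
```

Both forms should select the same columns; which one to use is a matter of taste.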

<a id="item2"></a>

In [6]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [7]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [8]:
n_cols = predictors.shape[1] # number of predictors

<a id="item1"></a>

In [9]:
import keras

Using TensorFlow backend.


As the output shows, Keras is running with TensorFlow as its backend.

Let's import the rest of the packages from the Keras library that we will need to build our regression model.

In [10]:
from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers  # regularizers is a top-level Keras module; not used below

<a id='item33'></a>

## Build a Neural Network

Let's define a function that defines our regression model for us so that we can conveniently call it to create our model.

In [12]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
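
To get a feel for how small this network is, the number of trainable parameters can be worked out by hand. The sketch below assumes `n_cols` is 8 (the eight predictor columns shown earlier); `dense_param_count` is a helper name introduced here for illustration and is not part of Keras.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of one fully connected (Dense) layer."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumption: eight predictor columns, as in predictors.head()

hidden = dense_param_count(n_cols, 10)  # 8 * 10 weights + 10 biases = 90
output = dense_param_count(10, 1)       # 10 * 1 weights + 1 bias = 11
total_params = hidden + output

print(total_params)  # 101
```

Calling `regression_model().summary()` should report the same total, which is a quick sanity check that the layers are wired as intended.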

<a id="item4"></a>

<a id='item34'></a>

## Train and Test the Network

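Let's now call the function to create a fresh model for each of 50 random train/test splits, train it for 50 epochs, and record the mean squared error on the test set.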

In [29]:

from sklearn.model_selection import train_test_split
mse_list = []
for i in range(50):
    #changing split for every iteration
    X_train, X_test, y_train, y_test = train_test_split( predictors, target, test_size=0.3)
    model = regression_model()
    model.fit(X_train, y_train, validation_split=0.3, epochs=50, verbose=0)
    mse = model.evaluate(X_test,y_test)
    #appending mse of each iteration to the list
    mse_list.append(mse)



In [34]:
#Calculating mean and standard deviation of mse list
mean_mse = sum(mse_list) / len(mse_list) 
variance = sum([((x - mean_mse) ** 2) for x in mse_list]) / len(mse_list) 
mse_std_deviation = variance ** 0.5
print("Mean MSE: ",mean_mse)
print("Standard Deviation of MSE:",mse_std_deviation)

Mean MSE:  728.3870615902997
Standard Deviation of MSE: 896.4221980423395
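
The hand-written formulas above divide by `len(mse_list)`, so they compute the population standard deviation rather than the sample one. As a rough cross-check, the standard library's `statistics` module gives the same numbers. The values in `example_mse` are made up for illustration and are not results from this notebook.

```python
import statistics

# Hypothetical MSE values (illustration only, not notebook results).
example_mse = [100.0, 120.0, 80.0, 110.0, 90.0]

# Manual computation, mirroring the cell above.
mean_manual = sum(example_mse) / len(example_mse)
variance_manual = sum((x - mean_manual) ** 2 for x in example_mse) / len(example_mse)
std_manual = variance_manual ** 0.5

# Library equivalents.
mean_lib = statistics.mean(example_mse)
std_population = statistics.pstdev(example_mse)  # divides by N, matches the manual formula
std_sample = statistics.stdev(example_mse)       # divides by N - 1, slightly larger

print(mean_manual, mean_lib)
print(std_manual, std_population, std_sample)
```

With 50 runs the two standard deviations differ only slightly, but it is worth stating which one is being reported.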


In [None]:
#PART B

In [35]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
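
The normalization above subtracts each column's mean and divides by its standard deviation, so every normalized column should end up with mean 0 and standard deviation 1. The sketch below checks this on a small made-up frame (`toy`, with a few values borrowed from the table of predictors); note that pandas `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical three-row frame standing in for predictors (illustration only).
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Age": [28.0, 270.0, 360.0],
})

# Column-wise z-score: pandas broadcasts the per-column mean and std.
toy_norm = (toy - toy.mean()) / toy.std()

print(toy_norm.mean())  # each value should be very close to 0
print(toy_norm.std())   # each value should be very close to 1
```

One caveat, offered as a general observation rather than a criticism of the lab: the statistics here come from the full dataset, so the test rows influence the scaling. A stricter variant would compute the mean and standard deviation on the training split only and reuse them for the test split.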


In [36]:
mse_list_b = []
for i in range(50):
    #changing split for every iteration
    X_train, X_test, y_train, y_test = train_test_split( predictors_norm, target, test_size=0.3)
    model = regression_model()
    model.fit(X_train, y_train, validation_split=0.3, epochs=50, verbose=0)
    mse = model.evaluate(X_test,y_test)
    #appending mse of each iteration to the list
    mse_list_b.append(mse)



In [37]:
mean_mse_b = sum(mse_list_b) / len(mse_list_b) 
variance_b = sum([((x - mean_mse_b) ** 2) for x in mse_list_b]) / len(mse_list_b) 
mse_std_deviation_b = variance_b ** 0.5
print("Mean : ",mean_mse_b)
print("Standard Deviation :",mse_std_deviation_b)

Mean :  644.7875507678184
Standard Deviation : 153.45514645982544


In [38]:
#PART C

In [39]:
mse_list_c = []
for i in range(50):
    #changing split for every iteration
    X_train, X_test, y_train, y_test = train_test_split( predictors_norm, target, test_size=0.3)
    model = regression_model()
    model.fit(X_train, y_train, validation_split=0.3, epochs=100, verbose=0)
    mse = model.evaluate(X_test,y_test)
    #appending mse of each iteration to the list
    mse_list_c.append(mse)



In [42]:
mean_mse_c = sum(mse_list_c) / len(mse_list_c) 
variance_c = sum([((x - mean_mse_c) ** 2) for x in mse_list_c]) / len(mse_list_c) 
mse_std_deviation_c = variance_c ** 0.5
print("Mean : ",mean_mse_c)
print("Standard Deviation :",mse_std_deviation_c)

Mean :  228.6044069476575
Standard Deviation : 32.202443202848144


In [43]:
#PART D

In [44]:
# define a deeper regression model with three hidden layers of 10 nodes each
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [45]:
mse_list_d = []
for i in range(50):
    #changing split for every iteration
    X_train, X_test, y_train, y_test = train_test_split( predictors_norm, target, test_size=0.3)
    model = regression_model()
    model.fit(X_train, y_train, validation_split=0.3, epochs=50, verbose=0)
    mse = model.evaluate(X_test,y_test)
    #appending mse of each iteration to the list
    mse_list_d.append(mse)



In [46]:
mean_mse_d = sum(mse_list_d) / len(mse_list_d) 
variance_d = sum([((x - mean_mse_d) ** 2) for x in mse_list_d]) / len(mse_list_d) 
mse_std_deviation_d = variance_d ** 0.5
print("Mean : ",mean_mse_d)
print("Standard Deviation :",mse_std_deviation_d)

Mean :  150.94560288315066
Standard Deviation : 14.873075851501572


<hr>

Copyright &copy; 2019 [IBM Developer Skills Network](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).