### First, load the required libraries:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, LSTM, TimeDistributed, RepeatVector

Using TensorFlow backend.


### Load CSV File From URL

In [13]:
url=r"https://cocl.us/concrete_data"
concrete_data=pd.read_csv(url)
concrete_data.head(5)

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


The dataset describes the compressive strength (`Strength`) of concrete samples as a function of the amounts of the ingredients used to make them and of the sample's age (`Age`, in days). The ingredients are:
1. Cement
2. Blast Furnace Slag
3. Fly Ash
4. Water
5. Superplasticizer
6. Coarse Aggregate
7. Fine Aggregate

In [14]:
concrete_data.shape

(1030, 9)

Check the data type of each column:

In [15]:
concrete_data.dtypes

Cement                float64
Blast Furnace Slag    float64
Fly Ash               float64
Water                 float64
Superplasticizer      float64
Coarse Aggregate      float64
Fine Aggregate        float64
Age                     int64
Strength              float64
dtype: object

Check each column for missing values:

In [16]:
pd.isna(concrete_data).sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

No column has missing values, so the data is ready to be used to build our model.
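The two checks above (data types and missing values) can be folded into one helper that reports non-numeric columns and missing-value counts. This is a minimal sketch: the function name `check_ready_for_modeling` is hypothetical, and the toy frame only illustrates what the report looks like.

```python
import pandas as pd


def check_ready_for_modeling(df: pd.DataFrame) -> dict:
    """Report non-numeric columns and missing values (hypothetical helper)."""
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    missing_per_column = df.isna().sum()
    return {
        "non_numeric_columns": non_numeric,
        "total_missing": int(missing_per_column.sum()),
        "columns_with_missing": missing_per_column[missing_per_column > 0].index.tolist(),
    }


# Toy frame (illustrative values) with one missing Cement entry
toy = pd.DataFrame({"Cement": [540.0, 332.5, None], "Age": [28, 270, 365]})
print(check_ready_for_modeling(toy))
```

On `concrete_data` itself, the outputs above suggest the report would list no non-numeric columns and zero missing values.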

In [17]:
X=concrete_data[concrete_data.columns[:8]]
y=pd.DataFrame(concrete_data["Strength"])
Xcol=X.columns
ycol=y.columns
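`concrete_data.columns[:8]` relies on `Strength` being the last column. A sketch of an equivalent split that selects by name instead, assuming the target column is called `Strength` as in this dataset (the toy frame stands in for `concrete_data`):

```python
import pandas as pd

# Toy frame: predictors plus the target column "Strength"
df = pd.DataFrame({"Cement": [540.0, 332.5], "Age": [28, 270], "Strength": [79.99, 40.27]})

X = df.drop(columns="Strength")  # every column except the target
y = df[["Strength"]]             # double brackets keep y as a one-column DataFrame
```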

## Normalize Data

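Repeat Part A, but use a normalized version of the data. Here, each predictor and the target are standardized to zero mean and unit variance with `StandardScaler`.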

In [18]:
# Standardize predictors and target; keep the fitted scalers so results can be mapped back later
stdard_x=preprocessing.StandardScaler()
stdard_y=preprocessing.StandardScaler()
X=pd.DataFrame(stdard_x.fit_transform(X),columns=Xcol)
y=pd.DataFrame(stdard_y.fit_transform(y),columns=ycol)
print("Independent variable of dataset:\n {}\n\n".format(X.head()))
print("Dependent variable of dataset:\n {}".format(y.head()))

Independent variable of dataset:
      Cement  Blast Furnace Slag   Fly Ash     Water  Superplasticizer  \
0  2.477915           -0.856888 -0.847144 -0.916764         -0.620448   
1  2.477915           -0.856888 -0.847144 -0.916764         -0.620448   
2  0.491425            0.795526 -0.847144  2.175461         -1.039143   
3  0.491425            0.795526 -0.847144  2.175461         -1.039143   
4 -0.790459            0.678408 -0.847144  0.488793         -1.039143   

   Coarse Aggregate  Fine Aggregate       Age  
0          0.863154       -1.217670 -0.279733  
1          1.056164       -1.217670 -0.279733  
2         -0.526517       -2.240917  3.553066  
3         -0.526517       -2.240917  5.057677  
4          0.070527        0.647884  4.978487  


Dependent variable of dataset:
    Strength
0  2.645408
1  1.561421
2  0.266627
3  0.313340
4  0.507979
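`StandardScaler` standardizes each column as z = (x − mean) / std, where std is the population standard deviation (numpy's default `ddof=0`), and `inverse_transform` undoes this with x = z · std + mean. A small check of that formula on the first five `Cement` values shown earlier; because the scaler is fit on only these five rows here, the z-scores differ from the ones printed above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The first five Cement values from the head() output, as a single-column array
x = np.array([[540.0], [540.0], [332.5], [332.5], [198.6]])

scaler = StandardScaler().fit(x)
z_sklearn = scaler.transform(x)

# Manual z-score: subtract the column mean, divide by the population std (ddof=0)
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)

print(np.allclose(z_sklearn, z_manual))                    # True
print(np.allclose(scaler.inverse_transform(z_sklearn), x))  # True
```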


Use the Keras library to build a neural network with one hidden layer of 10 nodes and a ReLU activation function, using the adam optimizer and the mean squared error as the loss function.
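To make the architecture concrete, here is a NumPy sketch of the forward pass that the model in the next cell computes: a 10-unit ReLU hidden layer followed by a single linear output. It is not Keras's implementation; the random weight initialization is an arbitrary assumption and the Adam training step is omitted. With 8 inputs, the hidden layer has 8 × 10 + 10 = 90 parameters and the output layer 10 + 1 = 11, so 101 in total.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 10

# Weights and biases of the two Dense layers (random initialization is illustrative only)
W1 = rng.normal(scale=0.05, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.05, size=(n_hidden, 1))
b2 = np.zeros(1)


def forward(X):
    """Dense(10, relu) followed by Dense(1) with a linear (identity) output."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2                # one regression output per sample


def mse_loss(y_true, y_pred):
    """Mean squared error, the loss passed to model.compile."""
    return np.mean((y_true - y_pred) ** 2)


n_params = W1.size + b1.size + W2.size + b2.size
X_demo = rng.normal(size=(5, n_features))
print(forward(X_demo).shape)  # (5, 1)
print(n_params)               # 101
```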

In [19]:
model=Sequential()
input_number=X.shape[1]
model.add(Dense(units=10, input_shape=(input_number,), activation='relu', kernel_initializer='normal')) 
model.add(Dense(units=1, kernel_initializer='normal')) 

model.compile(loss='mean_squared_error',
              optimizer='adam')

1. Randomly split the data into training and test sets, holding out 30% of the data for testing.

In [20]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)
print("Training data sample size:{}".format(X_train.shape[0]))
print("Validation data sample size:{}".format(X_val.shape[0]))

Training data sample size:721
Validation data sample size:309
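In this notebook the scalers are fit on all 1030 rows before `train_test_split` is called, so the test rows influence the mean and standard deviation used for scaling. A common alternative is to split first and fit the scalers on the training portion only. The sketch below uses synthetic data in place of `concrete_data` (an assumption for illustration); the split sizes match the 721/309 reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 1030 x 8 predictor matrix and the target column
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=100.0, scale=20.0, size=(1030, 8))
y_raw = rng.normal(loc=35.0, scale=15.0, size=(1030, 1))

# 1) Split the raw data first
X_train, X_test, y_train, y_test = train_test_split(X_raw, y_raw, test_size=0.3, random_state=1)

# 2) Fit the scalers on the training portion only
scaler_x = StandardScaler().fit(X_train)
scaler_y = StandardScaler().fit(y_train)

# 3) Apply the training-set statistics to both portions
X_train_s, X_test_s = scaler_x.transform(X_train), scaler_x.transform(X_test)
y_train_s, y_test_s = scaler_y.transform(y_train), scaler_y.transform(y_test)

print(X_train.shape[0], X_test.shape[0])  # 721 309
```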


2. Train the model on the training data for 100 epochs.

In [21]:
epochs=100
model.fit(X_train,y_train, epochs=epochs, verbose=1)

Epoch 1/100
...
Epoch 78

<keras.callbacks.callbacks.History at 0x2633333ae48>
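The `model.fit` call above does not pass `batch_size`, so Keras uses its default of 32 samples per mini-batch and performs one weight update per mini-batch (the last, smaller batch included). The repeated runs further down instead pass `batch_size=len(X_train)`, which means a single full-batch update per epoch. The plain-Python arithmetic below shows how different the number of updates is for the same 100 epochs; `gradient_updates` is a hypothetical helper name.

```python
import math


def gradient_updates(n_samples: int, batch_size: int, epochs: int) -> int:
    """Weight updates: one per mini-batch, ceil(n / batch_size) mini-batches per epoch."""
    return math.ceil(n_samples / batch_size) * epochs


n_train, epochs = 721, 100
print(gradient_updates(n_train, 32, epochs))       # default batch_size=32 -> 2300
print(gradient_updates(n_train, n_train, epochs))  # full batch -> 100
```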

3. Evaluate the model on the test data and compute the mean squared error between the predicted and the actual concrete strength.

In [22]:
loss_val = model.evaluate(X_val, y_val)
loss_val  # mean squared error on the test set, in standardized units of y



0.18960762048037694
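Because `y` was standardized, this loss is measured in standardized units and is not directly comparable to MSE values computed after `stdard_y.inverse_transform`. Since y = z · std + mean, the mean cancels in every error and the error is multiplied by std, so MSE in original units = std² × MSE in standardized units. A check of that relation on synthetic targets and predictions (illustrative data, not the concrete dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
y = rng.normal(loc=35.0, scale=16.0, size=(200, 1))  # synthetic targets
scaler = StandardScaler().fit(y)

y_std = scaler.transform(y)
pred_std = y_std + rng.normal(scale=0.4, size=y_std.shape)  # imperfect predictions

mse_std = np.mean((y_std - pred_std) ** 2)
mse_orig = np.mean((scaler.inverse_transform(y_std) - scaler.inverse_transform(pred_std)) ** 2)

# The mean cancels in the difference; only the scale factor survives, squared
print(np.isclose(mse_orig, scaler.scale_[0] ** 2 * mse_std))  # True
```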

4. Repeat steps 1 to 3, 50 times.

In [23]:
MSE=[]
repeat=50
epochs=100
for i in range(repeat):
    # New random 70/30 split on each run, seeded by the run index
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=i)
    # Note: the same `model` object is reused, so each run continues training from the
    # weights left by the previous run (earlier runs may have trained on this run's test rows)
    model.fit(X_train, y_train, epochs=epochs, batch_size=len(X_train), verbose=0, shuffle=True)
    # Map predictions and targets back to the original strength scale before scoring
    prediction_inverse = stdard_y.inverse_transform(model.predict(X_val))
    y_val_inverse = stdard_y.inverse_transform(y_val)
    mse = np.mean((y_val_inverse - prediction_inverse) ** 2)
    print('Run{} MSE:{}'.format(i, mse))
    MSE.append(mse)

Run0 MSE:42.89967856741234
Run1 MSE:51.9066077760661
Run2 MSE:38.67413809413373
Run3 MSE:43.07804001024621
Run4 MSE:42.634011611565704
Run5 MSE:45.945742798019495
Run6 MSE:47.585577741433355
Run7 MSE:37.49600754185974
Run8 MSE:39.6347826312917
Run9 MSE:43.609061500150254
Run10 MSE:40.474952236486935
Run11 MSE:38.497345530230305
Run12 MSE:46.6572939840996
Run13 MSE:44.84385341627038
Run14 MSE:38.146814100136524
Run15 MSE:35.72545717372704
Run16 MSE:38.24909288654348
Run17 MSE:36.627139943317914
Run18 MSE:38.612888349974384
Run19 MSE:36.5216854290369
Run20 MSE:36.15926625953761
Run21 MSE:35.841610357929305
Run22 MSE:33.958421271077476
Run23 MSE:38.158238986488584
Run24 MSE:39.56037516451183
Run25 MSE:40.25034826921756
Run26 MSE:33.682663961755395
Run27 MSE:32.667560041727526
Run28 MSE:41.6629976228038
Run29 MSE:35.82737944399498
Run30 MSE:36.24988417675795
Run31 MSE:31.144886927026047
Run32 MSE:31.20388078234237
Run33 MSE:35.8958201945533
Run34 MSE:35.05916871268007
Run35 MSE:36.88688650
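If each of the 50 runs should start from untrained weights, the model can be rebuilt inside the loop. A minimal sketch, assuming a factory function is passed in; `repeated_holdout_mse` and `build_model` are hypothetical names. For this notebook's setup, `build_model` could wrap the In [19] cell (create and compile a new `Sequential` model) with `fit_kwargs=dict(epochs=100, verbose=0)`. The errors come out in whatever units `y` is in, so they would be standardized unless predictions and targets are inverse-transformed first. The demo swaps in a scikit-learn linear model for the Keras network purely to keep the example fast.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def repeated_holdout_mse(X, y, build_model, fit_kwargs=None, repeat=50, test_size=0.3):
    """Return one test-set MSE per run, training a freshly built model each time.

    `build_model` is any zero-argument callable returning an object with
    fit(X, y, **kwargs) and predict(X).
    """
    fit_kwargs = fit_kwargs or {}
    errors = []
    for i in range(repeat):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=i)
        model = build_model()  # new, untrained model on every run
        model.fit(X_tr, y_tr, **fit_kwargs)
        pred = np.asarray(model.predict(X_te)).reshape(-1)
        errors.append(np.mean((np.asarray(y_te).reshape(-1) - pred) ** 2))
    return np.array(errors)


# Demo on synthetic linear data with a scikit-learn stand-in model
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
demo_errors = repeated_holdout_mse(X_demo, y_demo, LinearRegression, repeat=5)
print(demo_errors.mean(), demo_errors.std())
```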

5. Report the mean and the standard deviation of the mean squared errors.

In [24]:
Mean_MSE=np.mean(MSE)
STD_MSE=np.std(MSE)
print("Below is the mean and standard deviation of {} mean squared errors with normalized data.\n\
Total number of epochs for each training is: {}".format(repeat,epochs))
print("mean of MSE:{}".format(Mean_MSE))
print("Std of MSE:{}".format(STD_MSE))

Below is the mean and standard deviation of 50 mean squared errors with normalized data.
Total number of epochs for each training is: 100
mean of MSE:37.77547370506537
Std of MSE:4.620226182594506
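`np.std` defaults to the population standard deviation (`ddof=0`), whereas pandas' `Series.std` defaults to the sample standard deviation (`ddof=1`). The two differ by a factor of sqrt(n / (n − 1)), which for 50 runs is about 1%. A quick comparison using the first five MSE values above, rounded (illustrative only):

```python
import numpy as np

# First five run MSEs from the output above, rounded to one decimal
mse_runs = np.array([42.9, 51.9, 38.7, 43.1, 42.6])

population_std = np.std(mse_runs)      # ddof=0, numpy's default
sample_std = np.std(mse_runs, ddof=1)  # divides by n - 1 instead of n

n = len(mse_runs)
print(population_std < sample_std)                                       # True
print(np.isclose(sample_std, population_std * np.sqrt(n / (n - 1))))     # True
```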


Repeat Part B, but use 100 epochs for training this time.  
How does the mean of the mean squared errors compare to that from Part B?

In [29]:
compare_result=pd.DataFrame({"Model":["without_Normalized","with_Normalized","with_Normalized"],
                             "epochs":["50","50","100"],
                             "Mean_MSE":[110.004,39.877,37.775],
                             "Std_MSE":[8.508,5.335,4.620]})
compare_result

| | Mean_MSE | Model | Std_MSE | epochs |
|---|---|---|---|---|
| 0 | 110.004 | without_Normalized | 8.508 | 50 |
| 1 | 39.877 | with_Normalized | 5.335 | 50 |
| 2 | 37.775 | with_Normalized | 4.62 | 100 |
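To put the table's numbers on a common footing, the relative reduction in mean MSE can be computed for each change. This sketch uses only the values from the table; `relative_reduction` is a hypothetical helper name.

```python
# Mean MSE values copied from the comparison table above
mean_mse = {
    ("without_Normalized", 50): 110.004,
    ("with_Normalized", 50): 39.877,
    ("with_Normalized", 100): 37.775,
}


def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after` (0.25 means 25% lower)."""
    return (before - after) / before


norm_gain = relative_reduction(mean_mse[("without_Normalized", 50)], mean_mse[("with_Normalized", 50)])
epoch_gain = relative_reduction(mean_mse[("with_Normalized", 50)], mean_mse[("with_Normalized", 100)])
print(f"normalization: {norm_gain:.1%}, 50 -> 100 epochs: {epoch_gain:.1%}")
# normalization: 63.7%, 50 -> 100 epochs: 5.3%
```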


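With normalized data, increasing the number of epochs from 50 to 100 lowers the mean MSE from 39.877 to 37.775 and its standard deviation from 5.335 to 4.620. This is a modest improvement compared with the drop from 110.004 to 39.877 obtained by normalizing the data.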