# Regression models with Keras
I will use Keras Neural Network API to build a regression model.

In [16]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense

In [25]:
# Check that GPU is working
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 2844000594274819444, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 6662668288
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 8449685535556982453
 physical_device_desc: "device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1"]

## About the data

The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them.

In [3]:
CSV_PATH = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv'
concrete_data = pd.read_csv(CSV_PATH)
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [4]:
concrete_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
Cement                1030 non-null float64
Blast Furnace Slag    1030 non-null float64
Fly Ash               1030 non-null float64
Water                 1030 non-null float64
Superplasticizer      1030 non-null float64
Coarse Aggregate      1030 non-null float64
Fine Aggregate        1030 non-null float64
Age                   1030 non-null int64
Strength              1030 non-null float64
dtypes: float64(8), int64(1)
memory usage: 72.5 KB


In [5]:
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


The data looks very clean

## Train test split

In [6]:
X = concrete_data.drop('Strength', axis=1)
y = concrete_data['Strength']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)

## Preprocessing

In [7]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.fit_transform(X_test)

#X_train = (X_train - X_train.mean()) / X_train.std()
#X = (X - X.mean()) / X.std()

In [8]:
X_train

array([[ 1.82397267,  0.52449106, -0.84805943, ..., -1.54479455,
         0.08715809,  0.1646987 ],
       [ 0.99378155, -0.62989123,  0.61027759, ..., -0.43182836,
         0.8715104 , -0.68661169],
       [-1.25999867, -0.86357185,  1.97708069, ..., -1.71582079,
         1.3569442 , -0.28505019],
       ...,
       [-1.18547015,  1.97564773, -0.84805943, ...,  0.65652043,
        -0.94423433, -0.28505019],
       [-0.6694309 ,  2.82975041, -0.84805943, ...,  0.07347644,
        -1.05169677, -0.28505019],
       [-0.32886386, -0.86357185,  1.09276994, ...,  1.49610377,
         0.32678698,  0.1646987 ]])

In [9]:
y_train

144    72.30
488    22.75
974    15.53
895    49.77
627     7.84
       ...  
688     2.33
621    34.49
850    37.36
583    37.81
332    60.32
Name: Strength, Length: 824, dtype: float64

## Building a neural network

In [10]:
n_columns = X_train.shape[1]
print("Input columns:", n_columns)

Input columns: 8


In [11]:
def regression_model():
    model = Sequential()
    model.add(Dense(50, activation="relu", input_shape=(n_columns,)))
    model.add(Dense(50, activation="relu"))
    model.add(Dense(1))
    
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

The above function create a model with two hidden layers, each of 50 units

In [12]:
model = regression_model()

In [13]:
model.fit(X_train, y_train, validation_split=0.3, epochs=100, verbose=2)

Train on 576 samples, validate on 248 samples
Epoch 1/100
 - 0s - loss: 1463.3497 - val_loss: 1609.6971
Epoch 2/100
 - 0s - loss: 1355.0188 - val_loss: 1476.0904
Epoch 3/100
 - 0s - loss: 1210.7950 - val_loss: 1288.4408
Epoch 4/100
 - 0s - loss: 1014.1016 - val_loss: 1032.2124
Epoch 5/100
 - 0s - loss: 762.7237 - val_loss: 747.3810
Epoch 6/100
 - 0s - loss: 516.5452 - val_loss: 492.7225
Epoch 7/100
 - 0s - loss: 334.1922 - val_loss: 329.9197
Epoch 8/100
 - 0s - loss: 245.2057 - val_loss: 269.8536
Epoch 9/100
 - 0s - loss: 212.7997 - val_loss: 254.7220
Epoch 10/100
 - 0s - loss: 199.6700 - val_loss: 246.5906
Epoch 11/100
 - 0s - loss: 190.4087 - val_loss: 242.0433
Epoch 12/100
 - 0s - loss: 183.3955 - val_loss: 237.9842
Epoch 13/100
 - 0s - loss: 178.1038 - val_loss: 234.2495
Epoch 14/100
 - 0s - loss: 173.5124 - val_loss: 227.9049
Epoch 15/100
 - 0s - loss: 168.8549 - val_loss: 223.2834
Epoch 16/100
 - 0s - loss: 165.6780 - val_loss: 220.6509
Epoch 17/100
 - 0s - loss: 162.1366 - val_l

<keras.callbacks.callbacks.History at 0x237d9738888>

In [18]:
y_predicted = model.predict(X_test)

In [20]:
mean_squared_error(y_test, y_predicted)

52.28245479268069

Let's compare predicted and original values

In [24]:
for a, b in zip(y_predicted, y_test):
    print(a, b)

[49.214535] 35.3
[29.363642] 21.78
[33.287716] 31.45
[36.916454] 43.58
[57.342865] 59.3
[26.985846] 26.31
[33.8508] 37.36
[45.938744] 35.85
[29.94523] 36.99
[18.863531] 13.52
[46.0881] 41.54
[16.085415] 7.32
[24.086374] 27.04
[39.696377] 50.77
[22.504194] 19.54
[37.7478] 33.19
[43.201553] 24.1
[46.925983] 49.8
[33.353313] 31.97
[38.685562] 43.38
[21.344826] 20.77
[31.145462] 31.42
[45.31134] 33.94
[51.80158] 46.2
[22.86045] 24.13
[41.029625] 32.96
[19.476915] 24.28
[41.02467] 39.42
[36.3944] 42.22
[32.538994] 42.64
[38.08911] 33.76
[47.565514] 52.83
[47.186104] 41.05
[38.317966] 43.06
[29.317379] 25.56
[46.288662] 48.85
[25.499012] 6.27
[37.578236] 42.7
[48.73928] 55.51
[17.117655] 9.45
[53.846462] 50.53
[40.93162] 49.19
[23.096174] 21.16
[12.078311] 13.36
[41.904568] 53.3
[35.138123] 30.88
[16.431734] 14.2
[56.369343] 44.42
[41.925686] 54.38
[47.570198] 44.09
[27.099127] 21.75
[64.72136] 54.77
[27.045158] 16.5
[60.28986] 60.2
[51.19864] 41.1
[33.99989] 26.74
[26.657015] 33.66
[47.1485

Not so precise but that model was fast and dirty.