Multivariate Linear Regression using scikit-learn and Regression with Keras Project

Predict the compressive strength of concrete using Machine Learning and Deep Learning

Regression
The compressive strength of concrete is a non-linear function of its age and ingredients.

by Blibo Albert @bliboalbert

Regression with sklearn: multivariable linear regression

In [1]:
# import libraries: pandas, numpy, matplotlib, scikit-learn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

In [2]:
#get data for processing
data = pd.read_excel('Concrete_Data.xls')
data.head()

Unnamed: 0,Cement (component 1)(kg in a m^3 mixture),Blast Furnace Slag (component 2)(kg in a m^3 mixture),Fly Ash (component 3)(kg in a m^3 mixture),Water (component 4)(kg in a m^3 mixture),Superplasticizer (component 5)(kg in a m^3 mixture),Coarse Aggregate (component 6)(kg in a m^3 mixture),Fine Aggregate (component 7)(kg in a m^3 mixture),Age (day),"Concrete compressive strength(MPa, megapascals)"
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05278
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.296075


In [3]:
data.columns

Index(['Cement (component 1)(kg in a m^3 mixture)',
       'Blast Furnace Slag (component 2)(kg in a m^3 mixture)',
       'Fly Ash (component 3)(kg in a m^3 mixture)',
       'Water  (component 4)(kg in a m^3 mixture)',
       'Superplasticizer (component 5)(kg in a m^3 mixture)',
       'Coarse Aggregate  (component 6)(kg in a m^3 mixture)',
       'Fine Aggregate (component 7)(kg in a m^3 mixture)', 'Age (day)',
       'Concrete compressive strength(MPa, megapascals) '],
      dtype='object')

In [4]:
# the column names look untidy; let's clean them up by renaming the columns
# note: rename() only matches exact keys. The strength column's real name ends with a
# trailing space (see data.columns above), so the last key below does not match it
# and that column keeps its original name
data = data.rename(columns={'Cement (component 1)(kg in a m^3 mixture)':'Cement',
                            'Blast Furnace Slag (component 2)(kg in a m^3 mixture)':'Blast Furnace Slag',
                            'Fly Ash (component 3)(kg in a m^3 mixture)':'Fly Ash',
                            'Water  (component 4)(kg in a m^3 mixture)':'Water',
                            'Superplasticizer (component 5)(kg in a m^3 mixture)':'Superplasticizer',
                            'Coarse Aggregate  (component 6)(kg in a m^3 mixture)':'Coarse Aggregate',
                            'Fine Aggregate (component 7)(kg in a m^3 mixture)':'Fine Aggregate',
                            'Age (day)':'Age (day)', 'Concrete compressive strength(MPa, megapascals)':'Strength'})
data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age (day),"Concrete compressive strength(MPa, megapascals)"
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05278
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.296075
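The output above still shows the long strength column name, because rename() silently skips keys that do not match exactly. One possible way to make hand-typed keys robust is to normalize the whitespace in the column names first. This is only a sketch on a toy frame: the variable `toy` and its two columns are illustrative stand-ins, not the full dataset.

```python
import pandas as pd

# Toy frame with the same kind of stray whitespace seen in data.columns
# (two columns only; values copied from the first rows shown above).
toy = pd.DataFrame({
    'Water  (component 4)(kg in a m^3 mixture)': [162.0, 228.0],
    'Concrete compressive strength(MPa, megapascals) ': [79.986111, 40.269535],
})

# Strip leading/trailing whitespace and collapse internal runs of spaces,
# so that keys typed by hand match the actual column names.
toy.columns = toy.columns.str.strip().str.replace(r'\s+', ' ', regex=True)

toy = toy.rename(columns={
    'Water (component 4)(kg in a m^3 mixture)': 'Water',
    'Concrete compressive strength(MPa, megapascals)': 'Strength',
})
print(list(toy.columns))  # ['Water', 'Strength']
```

Since the later cells select the target by position (the last column), they work either way; the cleaned name only affects how the tables are labeled.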


In [6]:
# extract feature matrix and response vector
#data_columns = data.columns
X = data.iloc[:,:-1] # all columns except the last one (features)
y = data.iloc[:,-1]  # last column (target: compressive strength)

In [9]:
# preprocessing: check for missing values
X.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age (day)             0
dtype: int64

In [10]:
y.isnull().sum()

0

In [12]:
# no missing values, so the data is clean
# standardize each feature, then scale each row to unit L2 norm
X = StandardScaler().fit(X).transform(X)
X = normalize(X, norm='l2')

In [13]:
# test / train dataset
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.33, random_state=40)
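In the cells above the scaler is fitted on all rows before the split, so statistics from the test rows influence the training features. A common alternative, sketched here and not what this notebook does, is to split first and let a pipeline fit the scaler on the training rows only. The data below is synthetic, and the names `X_demo`, `y_demo`, `pipe` and `r2_demo` are placeholders chosen for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the concrete data: 200 samples, 8 features.
rng = np.random.default_rng(40)
X_demo = rng.normal(size=(200, 8))
y_demo = X_demo @ rng.normal(size=8) + rng.normal(scale=0.1, size=200)

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=40)

# fit() fits the scaler on X_tr only; score() reuses those statistics on X_te.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)
r2_demo = pipe.score(X_te, y_te)
print('R^2 on held-out rows: {0:.2f}'.format(r2_demo))
```

With a pipeline, the same preprocessing is applied automatically at prediction time, which also keeps later cross-validation free of leakage.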

In [14]:
# initialize and train model
Linear_reg_sklearn = LinearRegression(n_jobs=10)
Linear_reg_sklearn.fit(X_train, y_train)
Linear_reg_sklearn


LinearRegression(n_jobs=10)

In [15]:
# Testing and Evaluation
sklearn_predict = Linear_reg_sklearn.predict(X_test)
coef_det = r2_score(y_test, sklearn_predict)
print('[Coefficient of Determination]: {0:.2f}'.format(coef_det))

[Coefficient of Determination]: 0.62


In [16]:
# mean square error - mse
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, sklearn_predict)
print('[Mean Squared Error of Model]: {0:.2f}'.format(mse))

[Mean Squared Error of Model]: 101.25


In [20]:
evaluate_df = pd.DataFrame(y_test)
evaluate_df['prediction'] = sklearn_predict
evaluate_df.head(10)

Unnamed: 0,"Concrete compressive strength(MPa, megapascals)",prediction
808,11.465986,14.92454
281,32.660478,22.495261
243,40.858348,38.742538
589,31.899986,26.455793
283,44.209201,43.582484
136,74.497882,57.639511
966,12.459521,18.21792
14,47.813782,24.514133
0,79.986111,49.786775
752,59.76378,44.805387
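To make the two metrics used above concrete, here is a small sketch that computes the mean squared error and the coefficient of determination by hand from the ten rows displayed in the table. The helper name `mse_and_r2` is made up for this example, and because it uses only ten rows, its numbers will differ from the full test-set values reported earlier.

```python
import numpy as np

# The ten (actual, predicted) pairs displayed in the table above.
y_true = np.array([11.465986, 32.660478, 40.858348, 31.899986, 44.209201,
                   74.497882, 12.459521, 47.813782, 79.986111, 59.76378])
y_pred = np.array([14.92454, 22.495261, 38.742538, 26.455793, 43.582484,
                   57.639511, 18.21792, 24.514133, 49.786775, 44.805387])

def mse_and_r2(actual, predicted):
    """Return (MSE, R^2) using the standard definitions."""
    residuals = actual - predicted
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    return ss_res / actual.size, 1.0 - ss_res / ss_tot

mse_10, r2_10 = mse_and_r2(y_true, y_pred)
print('MSE: {0:.2f}, R^2: {1:.2f}'.format(mse_10, r2_10))
```

The two quantities are tied together: R^2 is one minus the ratio of the residual sum of squares to the total sum of squares, so a larger MSE on the same data always means a lower R^2.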


In [18]:
# The MSE of 101.25 and the R-squared of 0.62 show that the linear model captures
# only part of the variation in strength.
# Because strength depends non-linearly on the inputs, a plain linear fit is more likely
# underfitting than overfitting; feature engineering (e.g. polynomial or interaction terms)
# may improve the accuracy of the model.
# suggestion: use GridSearchCV to tune the hyperparameters of Lasso() or Ridge(),
# which helps if overfitting appears once more features are added
# END OF SKLEARN LINEAR REGRESSION
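The suggestion above could look roughly like the following sketch, which tunes the `alpha` of a Ridge model with GridSearchCV. It runs on synthetic data so that it is self-contained; the names `X_demo`, `y_demo`, `pipe` and `search`, and the grid of alpha values, are assumptions made for illustration and not tuned for the concrete dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 150 samples, 8 features, noisy linear target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 8))
y_demo = X_demo @ rng.normal(size=8) + rng.normal(scale=0.5, size=150)

# Scaling lives inside the pipeline so each CV fold is scaled on its own
# training part only.
pipe = Pipeline([('scale', StandardScaler()), ('ridge', Ridge())])

# Candidate regularization strengths (an arbitrary illustrative grid).
param_grid = {'ridge__alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}

search = GridSearchCV(pipe, param_grid, cv=5,
                      scoring='neg_mean_squared_error')
search.fit(X_demo, y_demo)

print(search.best_params_)
print('cross-validated MSE: {0:.2f}'.format(-search.best_score_))
```

Swapping `Ridge()` for `Lasso()` (and renaming the step and grid key accordingly) gives the other variant mentioned in the suggestion.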

Regression with Keras Library - Deep Learning

In [22]:
# download and install keras library with !pip
!pip install keras

Defaulting to user installation because normal site-packages is not writeable
Collecting keras
  Downloading keras-2.11.0-py2.py3-none-any.whl (1.7 MB)
     ---------------------------------------- 1.7/1.7 MB 112.3 kB/s eta 0:00:00
Installing collected packages: keras
Successfully installed keras-2.11.0


In [27]:
!pip install tensorflow


In [28]:
# import keras module and libraries
import keras
from keras.models import Sequential
from keras.layers import Dense

In [34]:
# number of nodes in input layer
num_input = X.shape[1]

In [35]:
# initialize model
def regression_model():
    
    # create model
    model = Sequential()
    model.add(Dense(50, activation='relu', input_shape=(num_input,)))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1))
    
    #compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
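As a rough check on the size of this network, the sketch below counts its trainable parameters by hand. It assumes 8 input features (the eight feature columns listed earlier) and the usual Dense layout of one weight per input-unit pair plus one bias per unit; `dense_params` and `layer_sizes` are names invented for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input width followed by the width of each Dense layer in regression_model().
layer_sizes = [8, 50, 50, 8, 1]

counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [450, 2550, 408, 9]
print(sum(counts))  # 3417
```

Calling `model.summary()` on the built Keras model should report the same per-layer figures if the input really has 8 columns.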

    
    


In [36]:
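# build and train the model
# note: this first fit uses the full X and y (with 30% held out for validation),
# so it also sees the rows that later form X_test
model = regression_model()
model.fit(X, y, validation_split=.30, epochs=100, verbose=1)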

[Training log: Epoch 1/100 through Epoch 100/100; per-epoch loss values were not captured]


<keras.callbacks.History at 0x2876b770310>

In [37]:
# continue training the same model on the training split only
model.fit(X_train, y_train, epochs=100, verbose=1)

[Training log: Epoch 1/100 onward; the captured output is cut off after epoch 78 and contains no loss values]

<keras.callbacks.History at 0x2876c9c3b20>

In [38]:
# evaluate() returns the test loss, i.e. the mean squared error set in compile()
model.evaluate(X_test, y_test)



42.66318130493164
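Both models report a mean squared error, which is in squared megapascals. Taking the square root gives a root mean squared error in MPa, which is easier to compare with the strengths in the tables. The sketch below does that arithmetic for the two figures printed in this notebook; the variable names are chosen for this example.

```python
import math

# Test-set MSE values reported by the two models in this notebook.
mse_sklearn = 101.25
mse_keras = 42.66318130493164

# RMSE is the square root of MSE and has the same unit as the target (MPa).
rmse_sklearn = math.sqrt(mse_sklearn)
rmse_keras = math.sqrt(mse_keras)

print('sklearn RMSE: {0:.2f} MPa'.format(rmse_sklearn))  # 10.06 MPa
print('keras RMSE:   {0:.2f} MPa'.format(rmse_keras))    # 6.53 MPa
```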

In [40]:
keras_predict = model.predict(X_test)



In [41]:
keras_df = pd.DataFrame(y_test)
keras_df['prediction'] = keras_predict.ravel()  # flatten the (n, 1) output to 1-D
keras_df.head(10)

Unnamed: 0,"Concrete compressive strength(MPa, megapascals)",prediction
808,11.465986,10.938446
281,32.660478,32.439484
243,40.858348,34.434925
589,31.899986,30.757206
283,44.209201,40.398224
136,74.497882,62.756866
966,12.459521,13.452812
14,47.813782,36.854584
0,79.986111,65.003456
752,59.76378,50.452049


In [42]:
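# the Keras network reached a lower test MSE (42.66) than the sklearn linear regression (101.25)
# caveat: the first model.fit call trained on the full X and y, which include the test rows,
# so the Keras score is likely optimistic
## END OF PROJECT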

@bliboalbert <- github / LinkedIn