Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. Measurements for Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, and Fine Aggregate are all in units of kg/m^3 of concrete mixture. The Age is measured in days. The Concrete Compressive Strength is measured in MPa.

These data were downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength).

The original source of the data is: I-Cheng Yeh, "Modeling of strength of high performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).

We are building a simple model of the form

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8$$

where:

x1 is Cement
x2 is Slag
x3 is FlyAsh
x4 is Water
x5 is Plasticizer
x6 is CoarseAgg
x7 is FineAgg
x8 is Age

the target y is Strength, and all of the β values are determined by linear regression.

After reading the data, you should use this code to change the column names:

df.columns = ['Cement', 'Slag', 'FlyAsh', 'Water', 'Plasticizer', 'CoarseAgg', 'FineAgg', 'Age', 'Strength']
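The β values in the model correspond to what scikit-learn's `LinearRegression` stores after fitting: β0 in `intercept_` and β1 through β8 in `coef_`, in the same order as the feature columns. The sketch below checks this on synthetic, noise-free data generated from made-up coefficients (`true_intercept` and `true_betas` are illustrative assumptions, not values from the concrete dataset); on the real data you would fit on the eight renamed feature columns with Strength as the target and read the same two attributes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Feature names follow the renamed columns; Strength would be the target y
features = ['Cement', 'Slag', 'FlyAsh', 'Water', 'Plasticizer', 'CoarseAgg', 'FineAgg', 'Age']

# Hypothetical synthetic data: y = beta_0 + X @ beta, with no noise
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(0, 100, size=(200, len(features))), columns=features)
true_intercept = 5.0
true_betas = np.array([0.12, 0.10, 0.08, -0.15, 0.30, 0.01, 0.02, 0.11])
y = true_intercept + X.to_numpy() @ true_betas

model = LinearRegression().fit(X, y)

# beta_0 is the intercept; beta_1..beta_8 line up with the feature columns
betas = pd.Series(model.coef_, index=features)
print(f"beta_0 = {model.intercept_:.3f}")
print(betas.round(3))
```

Because the synthetic data has no noise, ordinary least squares recovers the made-up coefficients up to floating-point error; with the real measurements the fitted β values are estimates and the fit is only as good as the linear form allows.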
Problem 1: Regression

Develop a machine learning model that can predict the Concrete Compressive Strength for a particular concrete recipe, given the quantities of the input ingredients and the number of days (Age) the concrete has been cured. This is a baseline model. What we are primarily interested in here is:

Making sure that the data is properly formatted for scikit-learn.
Identifying and separating features (X) and target (y).
Having a base score for the model that we can use to measure progress.
Validating that we have enough data for both training and testing.

Use at least 7 conventional machine learning algorithms and DEEP LEARNING (TensorFlow/Keras or PyTorch) to predict the Concrete Compressive Strength.
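One way to cover the base score, the check that there is enough data, and the seven-algorithm requirement together is to compare a trivial `DummyRegressor` (which always predicts the training mean) against seven conventional regressors under 5-fold cross-validation. This is a sketch under stated assumptions: it uses synthetic data from `make_regression` as a stand-in for the concrete features, and the particular choice of models (including `Ridge` and `GradientBoostingRegressor`) is a suggestion, not a list prescribed by the assignment. On the real data, pass the eight feature columns as `X` and Strength as `y`.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Stand-in data with 8 features like the concrete mix; swap in the real X, y
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

models = {
    "Baseline (mean)": DummyRegressor(strategy="mean"),
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # The scaler is refit on each training fold, so the test fold never leaks into it
    pipe = make_pipeline(StandardScaler(), model)
    r2 = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
    scores[name] = r2.mean()
    print(f"{name:18s} R2 = {r2.mean():.3f} +/- {r2.std():.3f}")
```

The baseline scores an R2 near zero by construction, so any model that does not clearly beat it is not learning anything useful. A small spread of R2 across folds is also a rough sign that the dataset is large enough for the chosen train/test split.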
Problem 2: Classification

Develop a machine learning model that can predict the ConcreteClass for a particular concrete recipe.

Take the concrete regression data and modify it to be suitable for classification. Create new columns using the following functions:

# create new categorical targets
def green_classifier(s):
    """
    Use numeric data to create a Green categorical feature.
    """
    if (s.Slag + s.FlyAsh < 150.0) and (s.Plasticizer < 10.0):
        return "n/a"
    else:
        return "green"

def strength_classifer(x):
    """
    Use numeric data to create a ConcreteClass categorical feature.
    This is based on "CIP 35 - Testing Compressive Strength of Concrete",
    National Ready Mixed Concrete Association (www.nrmca.org), 2003 & 2014.
    """
    if x < 17.0:
        return "non-structural"
    elif x < 28.0:
        return "residential"
    elif x < 70.0:
        return "commercial"
    else:
        return "high-strength"

df["Green"] = df.apply(green_classifier, axis=1)
df["ConcreteClass"] = df.Strength.apply(strength_classifer)

# convert Plasticizer to text (the numeric values are embedded in Green)
df.Plasticizer = df.Plasticizer.apply(lambda x: "yes" if x > 0 else "no")

# remove the Strength feature, as it is replaced by the categorical target ConcreteClass
df.drop("Strength", axis=1, inplace=True)
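`df.apply(..., axis=1)` calls a Python function once per row. A vectorized alternative, offered here as a sketch rather than a required change, builds the same labels with `pd.cut` (using left-closed bins, because `strength_classifer` uses strict `<` comparisons) and `np.where`. The small frame below is made-up test data, chosen so that some strengths sit exactly on the 17, 28, and 70 MPa boundaries; the check confirms that the two approaches agree on it.

```python
import numpy as np
import pandas as pd

def strength_classifer(x):
    # Row-wise version from the assignment (identifier spelling kept as-is)
    if x < 17.0:
        return "non-structural"
    elif x < 28.0:
        return "residential"
    elif x < 70.0:
        return "commercial"
    else:
        return "high-strength"

# Made-up rows, including strengths exactly on the class boundaries
df = pd.DataFrame({
    "Slag": [0.0, 100.0, 0.0, 200.0, 50.0],
    "FlyAsh": [0.0, 60.0, 0.0, 0.0, 50.0],
    "Plasticizer": [0.0, 0.0, 12.0, 5.0, 9.9],
    "Strength": [16.9, 17.0, 28.0, 69.99, 70.0],
})

# right=False makes each bin [low, high), matching the strict '<' tests above
bins = [-np.inf, 17.0, 28.0, 70.0, np.inf]
labels = ["non-structural", "residential", "commercial", "high-strength"]
df["ConcreteClass"] = pd.cut(df.Strength, bins=bins, labels=labels, right=False).astype(str)

# "n/a" only when both the slag + fly ash and the plasticizer conditions hold
not_green = (df.Slag + df.FlyAsh < 150.0) & (df.Plasticizer < 10.0)
df["Green"] = np.where(not_green, "n/a", "green")

print(df[["Strength", "ConcreteClass", "Green"]])
```

On about a thousand rows the speed difference is negligible, so the row-wise functions are perfectly adequate here; the vectorized form mainly makes the bin edges explicit in one place.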

Develop a machine learning model that can predict the ConcreteClass for a particular concrete recipe, given the quantities of the input ingredients and the number of days (Age) the concrete has been cured.

This is a baseline model. What we are primarily interested in here is:

Making sure that the data is properly formatted for scikit-learn.
Identifying and separating features (X) and target (y).
Having a base score for the model that we can use to measure progress.
Validating that we have enough data for both training and testing.
Use at least 7 conventional machine learning algorithms and DEEP LEARNING (TensorFlow/Keras or PyTorch) to predict ConcreteClass.
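Two items in the list above can be checked directly: whether every class has enough samples (the high-strength class has only 6 test samples in the reports below), and whether seven conventional classifiers beat a trivial baseline. The sketch uses synthetic, imbalanced data from `make_classification` with made-up class weights as a stand-in for the concrete features. The stratified split, the `DummyClassifier` baseline, the added `GradientBoostingClassifier`, and the balanced-accuracy metric are all suggestions, not part of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data with 4 imbalanced classes; swap in the real X_class, y_class
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=4, weights=[0.64, 0.17, 0.16, 0.03],
                           random_state=0)

# Inspect the class balance before splitting
print(pd.Series(y).value_counts(normalize=True).round(3))

# stratify=y keeps the class proportions the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Baseline (most frequent)": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "SVC": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model).fit(X_train, y_train)
    # Balanced accuracy averages per-class recall, so a rare class is not ignored
    scores[name] = balanced_accuracy_score(y_test, pipe.predict(X_test))
    print(f"{name:26s} balanced accuracy = {scores[name]:.3f}")
```

With four classes, the most-frequent baseline scores exactly 0.25 balanced accuracy, whereas its plain accuracy would equal the majority-class share. That gap is why plain accuracy alone can make a model that ignores high-strength concrete look good.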

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.svm import SVR, SVC
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, classification_report
import tensorflow as tf

# Load the data
df = pd.read_excel('Concrete_Data.xls')
df.columns = ['Cement', 'Slag', 'FlyAsh', 'Water', 'Plasticizer', 'CoarseAgg', 'FineAgg', 'Age', 'Strength']

# Regression task
X_reg = df.drop('Strength', axis=1)
y_reg = df['Strength']

# Classification task
def green_classifier(s):
    if (s.Slag + s.FlyAsh < 150.0) and (s.Plasticizer < 10.0):
        return "n/a"
    else:
        return "green"

def strength_classifer(x):
    if x < 17.0:
        return "non-structural"
    elif x < 28.0:
        return "residential"
    elif x < 70.0:
        return "commercial"
    else:
        return "high-strength"

df["Green"] = df.apply(green_classifier, axis=1)
df["ConcreteClass"] = df.Strength.apply(strength_classifer)
df.Plasticizer = df.Plasticizer.apply(lambda x: "yes" if x > 0 else "no")
df.drop("Strength", axis=1, inplace=True)

X_class = pd.get_dummies(df.drop('ConcreteClass', axis=1))
y_class = df['ConcreteClass']

# Split the data
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
X_class_train, X_class_test, y_class_train, y_class_test = train_test_split(X_class, y_class, test_size=0.2, random_state=42)

# Scale the features
scaler = StandardScaler()
X_reg_train_scaled = scaler.fit_transform(X_reg_train)
X_reg_test_scaled = scaler.transform(X_reg_test)
X_class_train_scaled = scaler.fit_transform(X_class_train)
X_class_test_scaled = scaler.transform(X_class_test)

# Regression models
reg_models = {
    'Linear Regression': LinearRegression(),
    'Decision Tree': DecisionTreeRegressor(),
    'Random Forest': RandomForestRegressor(),
    'SVR': SVR(),
    'KNN': KNeighborsRegressor(),
}

for name, model in reg_models.items():
    model.fit(X_reg_train_scaled, y_reg_train)
    y_pred = model.predict(X_reg_test_scaled)
    mse = mean_squared_error(y_reg_test, y_pred)
    r2 = r2_score(y_reg_test, y_pred)
    print(f"{name} - MSE: {mse:.4f}, R2: {r2:.4f}")

# Deep Learning Regression model
# An explicit Input layer avoids the Keras warning about passing input_shape to a layer
model_reg = tf.keras.Sequential([
    tf.keras.Input(shape=(X_reg_train_scaled.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)  # single linear output for a continuous target
])
model_reg.compile(optimizer='adam', loss='mse')
model_reg.fit(X_reg_train_scaled, y_reg_train, epochs=100, batch_size=32, verbose=0)
# predict() returns shape (n, 1); ravel() flattens it to (n,) to match y_reg_test
y_pred = model_reg.predict(X_reg_test_scaled, verbose=0).ravel()
mse = mean_squared_error(y_reg_test, y_pred)
r2 = r2_score(y_reg_test, y_pred)
print(f"Deep Learning Regression - MSE: {mse:.4f}, R2: {r2:.4f}")

# Classification models
class_models = {
    'Logistic Regression': LogisticRegression(),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
    'SVC': SVC(),
    'KNN': KNeighborsClassifier(),
    'Naive Bayes': GaussianNB(),
}

for name, model in class_models.items():
    model.fit(X_class_train_scaled, y_class_train)
    y_pred = model.predict(X_class_test_scaled)
    accuracy = accuracy_score(y_class_test, y_pred)
    print(f"{name} - Accuracy: {accuracy:.4f}")
    # zero_division=0 reports 0.0 instead of warning when a class is never predicted
    print(classification_report(y_class_test, y_pred, zero_division=0))

# Deep Learning Classification model
from sklearn.preprocessing import LabelEncoder

# One fixed mapping from class names to integers 0..K-1 for both train and test
le = LabelEncoder().fit(y_class)
y_class_train_enc = le.transform(y_class_train)
y_class_test_enc = le.transform(y_class_test)
n_classes = len(le.classes_)

model_class = tf.keras.Sequential([
    tf.keras.Input(shape=(X_class_train_scaled.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(n_classes, activation='softmax')  # one probability per class
])
model_class.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_class.fit(X_class_train_scaled, y_class_train_enc, epochs=100, batch_size=32, verbose=0)
y_pred = model_class.predict(X_class_test_scaled, verbose=0).argmax(axis=1)
accuracy = accuracy_score(y_class_test_enc, y_pred)
print(f"Deep Learning Classification - Accuracy: {accuracy:.4f}")
print(classification_report(y_class_test_enc, y_pred, labels=np.arange(n_classes),
                            target_names=le.classes_, zero_division=0))

Linear Regression - MSE: 95.9755, R2: 0.6275
Decision Tree - MSE: 47.0538, R2: 0.8174
Random Forest - MSE: 29.6219, R2: 0.8850
SVR - MSE: 88.9783, R2: 0.6547
KNN - MSE: 72.4172, R2: 0.7190


Deep Learning Regression - MSE: 36.3740, R2: 0.8588
Logistic Regression - Accuracy: 0.8058
                precision    recall  f1-score   support

    commercial       0.87      0.89      0.88       133
 high-strength       0.33      0.17      0.22         6
non-structural       0.78      0.91      0.84        32
   residential       0.59      0.49      0.53        35

      accuracy                           0.81       206
     macro avg       0.64      0.61      0.62       206
  weighted avg       0.79      0.81      0.80       206

Decision Tree - Accuracy: 0.8447
                precision    recall  f1-score   support

    commercial       0.92      0.91      0.92       133
 high-strength       1.00      0.50      0.67         6
non-structural       0.88      0.72      0.79        32
   residential       0.59      0.77      0.67        35

      accuracy                           0.84       206
     macro avg 

                precision    recall  f1-score   support

    commercial       0.83      0.94      0.88       133
 high-strength       0.00      0.00      0.00         6
non-structural       0.62      0.47      0.54        32
   residential       0.44      0.40      0.42        35

      accuracy                           0.75       206
     macro avg       0.47      0.45      0.46       206
  weighted avg       0.71      0.75      0.72       206

KNN - Accuracy: 0.7621
                precision    recall  f1-score   support

    commercial       0.85      0.92      0.88       133
 high-strength       0.50      0.17      0.25         6
non-structural       0.70      0.59      0.64        32
   residential       0.45      0.43      0.44        35

      accuracy                           0.76       206
     macro avg       0.63      0.53      0.55       206
  weighted avg       0.75      0.76      0.75       206

Naive Bayes - Accuracy: 0.6602
                precision    recall  f1-scor