# Concrete Strength Prediction Using Ensemble Methods
---
*Created by: Md. Rafiquzzaman Rafi*

*Date: 27 August, 2024*

---

This notebook demonstrates how to predict concrete compressive strength with an ensemble that averages a Random Forest Regressor and a neural network.

The workflow includes:
1. **Data Loading and Preprocessing**
2. **Model Training**
3. **Evaluation**
4. **Ensemble Prediction**
5. **Saving Models and Components**
6. **Loading Models and Components**

Let's start by loading the necessary libraries and the dataset.


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error, r2_score
from tensorflow.keras.optimizers import Adam

# Load the dataset
data = pd.read_csv('Concrete_Data.csv')

# Rename columns for easier access
data = data.rename(columns={
    'Cement (component 1)(kg in a m^3 mixture)': 'cement',
    'Blast Furnace Slag (component 2)(kg in a m^3 mixture)': 'blast_furnace_slag',
    'Fly Ash (component 3)(kg in a m^3 mixture)': 'fly_ash',
    'Water  (component 4)(kg in a m^3 mixture)': 'water',
    'Superplasticizer (component 5)(kg in a m^3 mixture)': 'superplasticizer',
    'Coarse Aggregate  (component 6)(kg in a m^3 mixture)': 'coarse_aggregate',
    'Fine Aggregate (component 7)(kg in a m^3 mixture)': 'fine_aggregate',
    'Age (day)': 'age',
    'Concrete compressive strength(MPa, megapascals) ': 'compressive_strength'
})

# Create additional features
data['cement_coarse'] = data.cement / data.coarse_aggregate
data['cement_fine'] = data.cement / data.fine_aggregate

# Define features and target variable
X = data.drop(['compressive_strength'], axis=1)
y = data['compressive_strength']

# Standardize the features (zero mean, unit variance per column)
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
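
The cell above fits the scaler on the full dataset before splitting, so the test rows contribute to the scaling statistics. A common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that ordering on synthetic data; the names `X_raw`, `y_raw`, and `scaler_tr` are placeholders introduced here, not variables from this notebook, and the reported metrics in later cells were not produced this way.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target (not the concrete data)
rng = np.random.default_rng(42)
X_raw = rng.normal(loc=50.0, scale=10.0, size=(200, 10))
y_raw = rng.normal(loc=35.0, scale=15.0, size=200)

# Split first, so the test rows never influence the scaling statistics
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42
)

# Fit on the training rows only, then reuse those statistics for the test rows
scaler_tr = StandardScaler()
X_tr_scaled = scaler_tr.fit_transform(X_tr)
X_te_scaled = scaler_tr.transform(X_te)
```

With this ordering the training columns have exactly zero mean after scaling, while the test columns are only approximately centered, which is expected.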

## Model Definitions

### Neural Network Model

We define a neural network with the following architecture:
- Dense layers with ReLU activation
- Dropout for regularization
- Output layer with ReLU activation

### Random Forest Regressor

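We use a Random Forest Regressor with 300 trees and `max_features="log2"`; the remaining hyperparameters are left at their default values. It serves both as a baseline and as the second member of the ensemble.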


In [2]:
def build_model():
    model = Sequential()
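    # Declare the input shape explicitly; passing input_dim to a layer is deprecated in recent Keras
    model.add(tf.keras.Input(shape=(X_train.shape[1],)))
    model.add(Dense(256, activation='relu'))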
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='relu'))
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])
    return model

# Build the Neural Network (it is trained further below)
model = build_model()

# Initialize and train the Random Forest Regressor
rf = RandomForestRegressor(
    max_depth=None,
    max_features="log2",
    min_samples_leaf=1,
    min_samples_split=2,
    n_estimators=300,
    random_state=42,
)
rf.fit(X_train, y_train)

# Train the Neural Network model
history = model.fit(X_train, y_train, 
                    validation_split=0.2,
                    epochs=100, 
                    batch_size=16, 
                    callbacks=[EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)],
                    verbose=0)




## Evaluation

We evaluate both models and calculate the performance metrics on the test set.


In [3]:
# Evaluate the Neural Network model
test_loss, test_mae = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Loss: {test_loss:.4f}, Test MAE: {test_mae:.4f}')

# Predict with the Random Forest model
rf_predictions = rf.predict(X_test)
rf_mse = mean_squared_error(y_test, rf_predictions)
rf_r2 = r2_score(y_test, rf_predictions)
print(f'Random Forest MSE: {rf_mse:.4f}')
print(f'Random Forest R² Score: {rf_r2:.4f}')

# Predict with the Neural Network model
nn_predictions = model.predict(X_test).flatten()
nn_mse = mean_squared_error(y_test, nn_predictions)
nn_r2 = r2_score(y_test, nn_predictions)
print(f'Neural Network MSE: {nn_mse:.4f}')
print(f'Neural Network R² Score: {nn_r2:.4f}')

Test Loss: 33.8419, Test MAE: 4.1719
Random Forest MSE: 27.4620
Random Forest R² Score: 0.8934
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
Neural Network MSE: 33.8419
Neural Network R² Score: 0.8687
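
For reference, the two metrics printed above can be written out directly. Because the network is compiled with `loss='mse'`, its "Test Loss" and its MSE are the same number (33.8419). Taking the square root puts the error back in MPa: roughly 5.24 MPa for the Random Forest and 5.82 MPa for the neural network. The helper `mse_and_r2` below is a hypothetical name used only for this illustration, and the toy values are not from the dataset.

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Compute mean squared error and R² from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2

# Toy values, not taken from the concrete dataset
y_true_demo = [30.0, 40.0, 50.0, 60.0]
y_pred_demo = [32.0, 38.0, 53.0, 59.0]
mse_demo, r2_demo = mse_and_r2(y_true_demo, y_pred_demo)
rmse_demo = np.sqrt(mse_demo)  # same units as the target
```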


## Ensemble Prediction

We combine the predictions from the Random Forest and Neural Network models to create an ensemble prediction and evaluate its performance.


In [4]:
# Combine Random Forest predictions with Neural Network predictions
ensemble_predictions = 0.5 * rf_predictions + 0.5 * nn_predictions

# Calculate the ensemble MSE and R² score
ensemble_mse = mean_squared_error(y_test, ensemble_predictions)
ensemble_r2 = r2_score(y_test, ensemble_predictions)
print(f'Ensemble MSE: {ensemble_mse:.4f}')
print(f'Ensemble R² Score: {ensemble_r2:.4f}')

Ensemble MSE: 26.9487
Ensemble R² Score: 0.8954
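
The 0.5/0.5 split above is a fixed choice. One possible refinement, sketched here under the assumption that a separate validation split is available, is to sweep the blend weight and keep the value with the lowest validation MSE. The functions `weighted_ensemble` and `best_weight` are hypothetical helpers, not part of this notebook, and the arrays are toy values. Tuning the weight on the test set itself would bias the reported score.

```python
import numpy as np

def weighted_ensemble(pred_a, pred_b, weight_a=0.5):
    """Blend two prediction arrays; the weight on pred_b is 1 - weight_a."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    return weight_a * pred_a + (1.0 - weight_a) * pred_b

def best_weight(y_val, pred_a, pred_b, grid=None):
    """Return the weight on pred_a that minimizes MSE on validation data."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)  # 0.0, 0.1, ..., 1.0
    y_val = np.asarray(y_val, dtype=float)
    errors = [
        np.mean((y_val - weighted_ensemble(pred_a, pred_b, w)) ** 2)
        for w in grid
    ]
    return float(grid[int(np.argmin(errors))])

# Toy validation data: one model overshoots by 2, the other undershoots by 2
y_val_demo = np.array([30.0, 40.0, 50.0])
pred_a_demo = np.array([32.0, 42.0, 52.0])
pred_b_demo = np.array([28.0, 38.0, 48.0])
w_demo = best_weight(y_val_demo, pred_a_demo, pred_b_demo)
```

In this toy case the errors cancel exactly at equal weights, so the sweep selects 0.5.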


## Saving Models and Components

Here we save the trained models and other components for future use.


In [5]:
import pickle
import joblib

# Save Random Forest model
rf_path = 'random_forest_model.pkl'
joblib.dump(rf, rf_path)

# Save Neural Network model
nn_model_path = 'neural_network_model.keras'
model.save(nn_model_path)

# Save Scaler
scaler_path = 'scaler.pkl'
joblib.dump(scaler, scaler_path)

# Define ensemble configuration
ensemble_config = {
    'rf_weight': 0.5,
    'nn_weight': 0.5,
    'rf_model_path': rf_path,
    'nn_model_path': nn_model_path,
    'scaler_path': scaler_path
}

# Save the ensemble configuration (weights and file paths, not the models themselves)
with open('ensemble_model.pkl', 'wb') as f:
    pickle.dump(ensemble_config, f)


## Loading Models and Components

Here we load the saved models and components back so they can be used for prediction.

In [6]:
import pickle
import joblib
from tensorflow.keras.models import load_model

# Load the ensemble configuration
with open('ensemble_model.pkl', 'rb') as f:
    ensemble_config = pickle.load(f)

# Load Random Forest model
rf = joblib.load(ensemble_config['rf_model_path'])

# Load Neural Network model
nn_model = load_model(ensemble_config['nn_model_path'])

# Load Scaler
scaler = joblib.load(ensemble_config['scaler_path'])

# Extract weights from the configuration
rf_weight = ensemble_config['rf_weight']
nn_weight = ensemble_config['nn_weight']
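
Once the components are loaded, a prediction is the same weighted average used earlier. The sketch below shows one way to wrap that step; `ensemble_predict` and `_ConstantModel` are hypothetical names introduced here, and the stand-in models only exist so the example runs without the saved files. With the real objects, new rows would first need the same ten features as in training (the eight mix columns plus `cement_coarse` and `cement_fine`) and would be passed through `scaler.transform`, not `fit_transform`.

```python
import numpy as np

def ensemble_predict(X_scaled, rf_model, nn_model, rf_weight, nn_weight):
    """Weighted average of two models' predictions on already-scaled features."""
    rf_pred = np.asarray(rf_model.predict(X_scaled)).ravel()
    # A Keras model returns shape (n, 1); ravel() flattens it to (n,)
    nn_pred = np.asarray(nn_model.predict(X_scaled)).ravel()
    return rf_weight * rf_pred + nn_weight * nn_pred

class _ConstantModel:
    """Stand-in for a loaded model (hypothetical); always predicts one value."""
    def __init__(self, value, as_column=False):
        self.value = value
        self.as_column = as_column

    def predict(self, X):
        out = np.full(len(X), self.value, dtype=float)
        return out.reshape(-1, 1) if self.as_column else out

# Three rows with ten features each, matching the training feature count
X_demo = np.zeros((3, 10))
demo_pred = ensemble_predict(
    X_demo,
    _ConstantModel(40.0),                  # plays the role of the Random Forest
    _ConstantModel(30.0, as_column=True),  # plays the role of the Keras model
    rf_weight=0.5,
    nn_weight=0.5,
)
```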

## Data Source

---
  Original Owner and Donor

  Prof. I-Cheng Yeh

  Department of Information Management

  Chung-Hua University, 

  Hsin Chu, Taiwan 30067, R.O.C.

  e-mail:icyeh@chu.edu.tw
  
  TEL:886-3-5186511

  Date Donated: August 3, 2007
 
---