# Concrete Strength Prediction

Predicts the compressive strength of concrete with Random Forest and XGBoost regressors, then combines the two models' predictions with a weighted average.

## 1. Import the Libraries

Import the required libraries.

In [1]:
# Import libraries like scikit-learn and xgboost
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

import xgboost as xgb

## 2. Prepare the Data

Load the dataset from Kaggle into a pandas DataFrame, separate the features from the target, and split the data into training and test sets.

In [2]:
# Load the dataset
data = pd.read_csv('/kaggle/input/concrete-compressive-strength-data-set/concrete_data.csv')
data.head()

| | cement | blast_furnace_slag | fly_ash | water | superplasticizer | coarse_aggregate | fine_aggregate | age | concrete_compressive_strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [3]:
# Split into training and testing
x_data = data.drop(columns = ['concrete_compressive_strength'])
y_data = data['concrete_compressive_strength']

x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size = 0.2)

In [4]:
# Check the shape and length of the data
shape = x_train.shape
length = len(y_test)

print(f"Shape of the training data: {shape}")
print(f"Length of the test data: {length}")

Shape of the training data: (824, 8)
Length of the test data: 206
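
Because `train_test_split` shuffles the rows randomly, the call above yields a different split, and therefore different scores, on every run. A minimal sketch of a reproducible split follows, assuming a fixed `random_state`. The seed value 50 and the synthetic arrays `x_demo` and `y_demo` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the concrete data: 100 rows, 8 features (illustrative only)
rng = np.random.default_rng(0)
x_demo = rng.random((100, 8))
y_demo = rng.random(100)

# Passing the same random_state makes the shuffle, and so the split, repeatable
split_a = train_test_split(x_demo, y_demo, test_size=0.2, random_state=50)
split_b = train_test_split(x_demo, y_demo, test_size=0.2, random_state=50)

same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(f"Identical splits: {same_split}")
print(f"Train rows: {len(split_a[0])}, test rows: {len(split_a[1])}")
```

With a fixed seed, the metrics printed later in the notebook would stay the same between runs, which makes model comparisons easier to trust.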


## 3. Random Forest

* **RandomForestRegressor()**: Ensemble learning method using multiple decision trees for regression tasks.

* **n_estimators = 100**: Number of decision trees to build in the forest.

* **max_depth = 10**: Maximum depth allowed for each decision tree; limits overfitting.

* **random_state = 50**: Seed for random number generator; ensures reproducible results.

* **fit(x_train, y_train)**: Trains the model using the training dataset.

* **predict(x_test)**: Uses the trained model to predict outcomes for new data.
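
To make the "ensemble of decision trees" idea concrete, the sketch below checks that a fitted `RandomForestRegressor` predicts the average of its individual trees' predictions. It runs on synthetic data (an assumption for illustration, not the concrete dataset) and uses smaller settings than the notebook to keep it quick.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative only, not the concrete dataset)
rng = np.random.default_rng(0)
x_demo = rng.random((200, 8))
y_demo = x_demo @ np.arange(1, 9) + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=50)
forest.fit(x_demo, y_demo)

# Each fitted tree is available in estimators_; average their predictions by hand
tree_predictions = np.stack([tree.predict(x_demo) for tree in forest.estimators_])
manual_average = tree_predictions.mean(axis=0)

matches = np.allclose(manual_average, forest.predict(x_demo))
print(f"Forest prediction equals mean of tree predictions: {matches}")
```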

In [5]:
rf_regressor = RandomForestRegressor(n_estimators = 100, max_depth = 10, random_state = 50)
rf_regressor.fit(x_train, y_train)

y_predicted_rf = rf_regressor.predict(x_test)

# Mean Squared Error and R2 Score for Random Forest Regressor
mse_rf = mean_squared_error(y_test, y_predicted_rf)
r2_rf = r2_score(y_test, y_predicted_rf)
print(f"Mean Squared Error for Random Forest: {mse_rf}")
print(f"R2 Score for Random Forest: {r2_rf}")

Mean Squared Error for Random Forest: 28.256079528693423
R2 Score for Random Forest: 0.9115206060433415


## 4. XGBoost

* **XGBRegressor()**: XGBoost's implementation for regression using gradient boosting.

* **n_estimators = 100**: Number of boosting rounds or trees to build.

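* **learning_rate = 0.1**: Shrinks each new tree's contribution before it is added to the ensemble; smaller values learn more slowly but are less prone to overfitting.

* **max_depth = 5**: Maximum depth of each boosted tree.

* **random_state = 50**: Seed for the random number generator; ensures reproducible results.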
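
The role of the learning rate is easier to see in a hand-rolled boosting loop. The following is a minimal sketch of plain gradient boosting for squared error, built from scikit-learn decision trees on synthetic data. It is not XGBoost's actual implementation, which also uses second-order gradient information and regularization. All names in it are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
x_demo = rng.random((200, 3))
y_demo = np.sin(4 * x_demo[:, 0]) + x_demo[:, 1] ** 2 + rng.normal(scale=0.05, size=200)

learning_rate = 0.1
n_rounds = 100

# Start from a constant prediction, then let each tree fit the current residuals
prediction = np.full_like(y_demo, y_demo.mean())
mse_start = np.mean((y_demo - prediction) ** 2)

for _ in range(n_rounds):
    residuals = y_demo - prediction
    tree = DecisionTreeRegressor(max_depth=3, random_state=50)
    tree.fit(x_demo, residuals)
    # learning_rate shrinks the tree's correction before it is added
    prediction += learning_rate * tree.predict(x_demo)

mse_end = np.mean((y_demo - prediction) ** 2)
print(f"Training MSE before boosting: {mse_start:.4f}")
print(f"Training MSE after {n_rounds} rounds: {mse_end:.4f}")
```

Each round removes only a fraction of the remaining error, which is why a smaller `learning_rate` generally needs a larger `n_estimators`.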

In [6]:
xgb_regressor = xgb.XGBRegressor(n_estimators = 100, learning_rate = 0.1, max_depth = 5, random_state = 50)
xgb_regressor.fit(x_train, y_train)

y_predicted_xgb = xgb_regressor.predict(x_test)

# Mean Squared Error and R2 Score for XGBoost Regressor
mse_xgb = mean_squared_error(y_test, y_predicted_xgb)
r2_xgb = r2_score(y_test, y_predicted_xgb)
print(f"Mean Squared Error for XGBoost: {mse_xgb}")
print(f"R2 Score for XGBoost: {r2_xgb}")

Mean Squared Error for XGBoost: 23.555267449721466
R2 Score for XGBoost: 0.9262404472523533


## 5. Weighted Average

Combine the two models' predictions with a weighted average. The weights sum to 1, and XGBoost, which scored the lower MSE above, receives the larger weight.

In [7]:
# Define weight of each ensemble
w_rf = 0.1
w_xgb = 0.9

# Final predicted value
y_predicted = (w_rf * y_predicted_rf) + (w_xgb * y_predicted_xgb)
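
The weights above are set by hand. One possible way to choose them is to scan a grid of candidate weights and keep the one with the lowest MSE, sketched here under the assumption that a separate validation set is available. The toy arrays and the helper name `best_blend_weight` are hypothetical and not part of the original notebook.

```python
import numpy as np

def best_blend_weight(y_true, pred_a, pred_b, n_steps=11):
    """Return the weight for pred_a (pred_b gets 1 - weight) with the lowest MSE."""
    candidates = np.linspace(0.0, 1.0, n_steps)
    errors = [
        np.mean((y_true - (w * pred_a + (1.0 - w) * pred_b)) ** 2)
        for w in candidates
    ]
    return float(candidates[int(np.argmin(errors))])

# Toy validation data: model A overshoots by 1, model B undershoots by 1
y_val = np.array([10.0, 20.0, 30.0, 40.0])
pred_a = y_val + 1.0
pred_b = y_val - 1.0

w_a = best_blend_weight(y_val, pred_a, pred_b)
print(f"Best weight for model A: {w_a}, for model B: {1.0 - w_a}")
```

Tuning the weights on a validation split rather than on the test set keeps the final test score an honest estimate of performance on unseen data.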

## 6. Evaluate the Model

* **mean_squared_error(y_test, y_predicted)**: Measures the average squared difference between actual and predicted values; lower values indicate better performance.

* **r2_score(y_test, y_predicted)**: Indicates the proportion of variance in the target explained by the model; values closer to 1 indicate a better fit.
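
Both metrics are simple to compute by hand, which helps when interpreting the numbers. The sketch below reimplements them with NumPy on a small made-up example. The function names `mse` and `r2` and the demo arrays are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """1 minus the ratio of residual sum of squares to total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

print(f"MSE: {mse(y_true_demo, y_pred_demo)}")
print(f"R2: {r2(y_true_demo, y_pred_demo):.4f}")
```

Note that MSE is expressed in squared units of the target, so its square root is often easier to compare against the strength values themselves.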

In [8]:
mse_final = mean_squared_error(y_test, y_predicted)
r2_final = r2_score(y_test, y_predicted)

print(f"Mean Squared Error: {mse_final}")
print(f"R2 Score: {r2_final}")

Mean Squared Error: 23.347037931766632
R2 Score: 0.9268924846849994
