# Predicting Concrete Compressive Strength Using Random Forest Regression
---
## 1. Problem Definition
Concrete is one of the most widely used construction materials, and its compressive strength largely determines the quality, durability, and safety of structures. Traditionally, strength is measured through laboratory testing, which is time-consuming and costly, and results are available only after the specimens have cured.

In this project, the goal is to develop a predictive model that estimates the compressive strength of concrete (in MPa) from its mixture components and age. We use Random Forest Regression because it can capture nonlinear relationships between the inputs (cement, water, aggregates, and admixtures) and the resulting strength.
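
To illustrate what "capturing nonlinear relationships" means in practice, here is a minimal sketch on synthetic data. The inputs `x0` and `x1`, the target formula, and all parameter values are assumptions made for this illustration only; they are not concrete measurements and are not taken from the dataset used in this project. The sketch fits a linear model and a random forest to the same nonlinear target and compares their test-set R² scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data only: x0 and x1 are made-up inputs, not concrete ingredients.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(600, 2))

# The target depends on the inputs through a sine and a product (both nonlinear),
# plus a small amount of Gaussian noise.
y = 10.0 * np.sin(X[:, 0]) + X[:, 0] * X[:, 1] / 5.0 + rng.normal(0.0, 0.5, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit both models on the same training split.
linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Compare how much of the test-set variance each model explains.
linear_r2 = r2_score(y_test, linear.predict(X_test))
forest_r2 = r2_score(y_test, forest.predict(X_test))
print(f"linear R^2: {linear_r2:.3f}")
print(f"forest R^2: {forest_r2:.3f}")
```

On a target like this, the forest should score noticeably higher than the linear model, because tree ensembles approximate curved and interacting effects piecewise rather than assuming a single straight-line trend.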

### Objectives
- To build a regression model that predicts concrete compressive strength using machine learning.
- To analyze the influence of different ingredients (cement, slag, fly ash, etc.) on concrete strength.
- To reduce reliance on physical testing by providing a faster, data-driven alternative.
---


## 2. Data Collection
The dataset used in this project is the Concrete Compressive Strength dataset, which is publicly available on Kaggle. It contains 1,030 instances and 9 quantitative attributes (8 input variables plus the target, compressive strength), with no missing values. The data are laboratory measurements of compressive strength for various mixture proportions and curing ages.

In [None]:
# Importing necessary libraries
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
# Loading the dataset
df = pd.read_csv("Concrete Compressive Strength.csv")

In [None]:
# Dimensions of the dataset (number of rows, number of columns)
df.shape

(1030, 9)

---


## 3. Data Cleaning


In [20]:
# Displaying the first few rows of the dataset
df.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age (day) | Concrete compressive strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.986111 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.887366 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.269535 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05278 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.296075 |


In [21]:
# Checking the data types of each column
df.dtypes

Cement                            float64
Blast Furnace Slag                float64
Fly Ash                           float64
Water                             float64
Superplasticizer                  float64
Coarse Aggregate                  float64
Fine Aggregate                    float64
Age (day)                           int64
Concrete compressive strength     float64
dtype: object
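
The dtype listing above can also be turned into an explicit check, so that a renamed or non-numeric column is caught before modeling. The following is a minimal sketch, not part of the original notebook: `check_schema` and `EXPECTED_COLUMNS` are hypothetical names introduced here, and the two sample rows are copied from the `df.head()` output so the snippet runs without the CSV file.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Column names as listed in the dtype output above.
EXPECTED_COLUMNS = [
    "Cement", "Blast Furnace Slag", "Fly Ash", "Water", "Superplasticizer",
    "Coarse Aggregate", "Fine Aggregate", "Age (day)",
    "Concrete compressive strength",
]


def check_schema(frame, expected_columns):
    """Return a list of problems; an empty list means the frame looks as expected."""
    problems = []
    for column in expected_columns:
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
        elif not is_numeric_dtype(frame[column]):
            problems.append(f"non-numeric column: {column}")
    return problems


# Two rows copied from the df.head() output, embedded so the sketch runs
# without the original CSV file.
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111\n"
    "332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535\n"
)
sample = pd.read_csv(sample_csv)

schema_problems = check_schema(sample, EXPECTED_COLUMNS)
print(schema_problems)
```

In the notebook itself, the same helper could be called as `check_schema(df, EXPECTED_COLUMNS)` right after loading the data.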

In [22]:
# Checking for missing values
df.isnull().sum()

Cement                            0
Blast Furnace Slag                0
Fly Ash                           0
Water                             0
Superplasticizer                  0
Coarse Aggregate                  0
Fine Aggregate                    0
Age (day)                         0
Concrete compressive strength     0
dtype: int64

In [23]:
# Checking for duplicate rows
df.duplicated().sum()


np.int64(25)
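
The count above only reports the duplicates; it does not remove them. Below is a minimal sketch of one common follow-up, dropping exact repeats with `drop_duplicates`. The toy frame is an assumption for illustration (three rows, one deliberately repeated, using a subset of the columns), and whether the 25 duplicate rows in the real dataset should be dropped is a judgement call, since they could be legitimate repeat measurements of the same mix.

```python
import pandas as pd

# Toy frame for illustration: rows 0 and 2 are identical on purpose.
toy = pd.DataFrame(
    {
        "Cement": [540.0, 332.5, 540.0],
        "Water": [162.0, 228.0, 162.0],
        "Concrete compressive strength": [79.986111, 40.269535, 79.986111],
    }
)

# duplicated() flags every repeat after the first occurrence of a row.
n_duplicates = int(toy.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index.
deduplicated = toy.drop_duplicates(keep="first").reset_index(drop=True)

print(f"duplicates found: {n_duplicates}")
print(f"rows before: {len(toy)}, rows after: {len(deduplicated)}")
```

Applied to the notebook's `df`, the same call would be `df = df.drop_duplicates().reset_index(drop=True)`, after which `df.duplicated().sum()` should return 0.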