Uploading Data

In [32]:
from google.colab import files
uploaded = files.upload()

Saving Material Strength Predictor data.csv to Material Strength Predictor data (1).csv


Loading Data

In [None]:
import pandas as pd
raw_data = pd.read_csv("Material Strength Predictor data.csv")
print(raw_data.head())
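
The column headers in this dataset contain irregular spacing (a double space after `Water`, a trailing space after the strength column), which makes a `KeyError` easy to trigger when typing names by hand. The following is a minimal sketch of a check for such headers; `has_irregular_whitespace` is a hypothetical helper, not part of pandas, and the two header strings are copied from this notebook. With the real data, `raw_data.columns` could be passed in place of the `headers` list.

```python
import re

# Two header strings as they appear in this dataset: note the double space
# after "Water" and the trailing space after the closing parenthesis.
headers = [
    "Water  (component 4)(kg in a m^3 mixture)",
    "Concrete compressive strength(MPa, megapascals) ",
]

def has_irregular_whitespace(name: str) -> bool:
    """Return True for leading/trailing whitespace or runs of 2+ whitespace characters."""
    return name != name.strip() or re.search(r"\s{2,}", name) is not None

for name in headers:
    # repr() makes otherwise invisible spaces visible in the printed output
    print(repr(name), has_irregular_whitespace(name))
```

This only inspects the names. The later cells in this notebook index columns by their raw headers, so the headers are left unchanged until the rename step.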

### Feature Engineering

Based on the insights from EDA, we know that concrete strength depends not just on individual ingredients, but also on derived features and interactions.

In this notebook, we will:

- Create physics-informed features like Water-to-Cement ratio.
- Transform skewed variables for better model performance.
- Engineer interaction features such as Age × Water-to-Cement ratio.

The goal is to prepare the dataset for machine learning while preserving physical meaning.


**Water-to-Cement Ratio**: A critical physics-informed feature. A lower water-to-cement ratio generally means lower porosity and higher strength.

In [52]:
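# The ratio is stored at full precision; it is rounded to 4 decimals only where it feeds the Age x W/C feature
raw_data['Water_Cement_Ratio'] = raw_data['Water  (component 4)(kg in a m^3 mixture)'] / raw_data['Cement (component 1)(kg in a m^3 mixture)']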
print(raw_data['Water_Cement_Ratio'].head(5))

0    0.300000
1    0.300000
2    0.685714
3    0.685714
4    0.966767
Name: Water_Cement_Ratio, dtype: float64
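
As a quick sanity check, the ratio can be recomputed by hand for a few rows. This sketch uses plain Python and the water and cement values of the first, third and fifth rows of the dataset; the variable names are illustrative only.

```python
# (water, cement) in kg per m^3 for rows 0, 2 and 4 of the dataset
mixes = [(162.0, 540.0), (228.0, 332.5), (192.0, 198.6)]

# Water-to-cement ratio = water mass / cement mass, shown to 6 decimals
ratios = [round(water / cement, 6) for water, cement in mixes]
print(ratios)
```

The three values agree with rows 0, 2 and 4 of the printed column above.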


**Total Aggregate**: Represents the combined filler content, important for volume stability.

In [53]:
# Parentheses make round() apply to the sum rather than to the Fine Aggregate column alone
raw_data['Total_Aggregate'] = (raw_data['Coarse Aggregate  (component 6)(kg in a m^3 mixture)'] + raw_data['Fine Aggregate (component 7)(kg in a m^3 mixture)']).round(3)
print(raw_data['Total_Aggregate'].head(5))

0    1716.0
1    1731.0
2    1526.0
3    1526.0
4    1803.9
Name: Total_Aggregate, dtype: float64


**Interaction Feature (Age × W/C)**: Captures how strength growth over time depends on the mixture’s chemistry.

In [54]:
# The ratio is rounded to 4 decimals first, then multiplied by age
raw_data['Age_WCR'] = raw_data['Age (day)'] * raw_data['Water_Cement_Ratio'].round(4)
print(raw_data['Age_WCR'].head(5))

0      8.4000
1      8.4000
2    185.1390
3    250.2805
4    348.0480
Name: Age_WCR, dtype: float64
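
To illustrate what the interaction adds, consider rows 2 and 3: they share the same mix (and therefore the same W/C ratio) but differ in age, so only the interaction term tells them apart on the W/C axis. A small worked sketch in plain Python, using those two rows and illustrative variable names:

```python
# Rows 2 and 3 share the same mix: water 228.0, cement 332.5 (kg per m^3)
wcr = round(228.0 / 332.5, 4)  # ratio rounded to 4 decimals, as in the cell above

# The same ratio at two curing ages (days) yields two different interaction values
ages = [270, 365]
interaction = [round(age * wcr, 4) for age in ages]
print(wcr, interaction)
```

The results correspond to rows 2 and 3 of the `Age_WCR` output above.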


**Log Age**: Helps linear models capture the non-linear strength gain over time.

In [55]:
import numpy as np
raw_data['Log_Age'] = np.log1p(raw_data['Age (day)']).round(4)  # log(1 + Age)
print(raw_data['Log_Age'].head(5))

0    3.3673
1    3.3673
2    5.6021
3    5.9026
4    5.8889
Name: Log_Age, dtype: float64
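
The transform is `log(1 + Age)`, which compresses large ages and can be inverted with `expm1`. The sketch below reproduces the printed values with the standard library's `math.log1p` (a scalar stand-in for `np.log1p`), using the ages of rows 0, 2, 3 and 4.

```python
import math

# Ages in days for rows 0, 2, 3 and 4 of the dataset
ages = [28, 270, 365, 360]

# log1p(x) = ln(1 + x); rounded to 4 decimals as in the cell above
log_ages = [round(math.log1p(age), 4) for age in ages]
print(log_ages)

# expm1 is the inverse transform; rounding recovers the original whole-day ages
recovered = [round(math.expm1(value)) for value in log_ages]
print(recovered)
```

Going from 28 to 365 days multiplies the raw age by about 13, while `Log_Age` increases by less than a factor of 2, which is the compression a linear model benefits from.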


## Downloading the Processed File

In [56]:
from google.colab import files

raw_data = raw_data.rename(columns={
    'Cement (component 1)(kg in a m^3 mixture)': 'Cement',
    'Blast Furnace Slag (component 2)(kg in a m^3 mixture)': 'Slag',
    'Fly Ash (component 3)(kg in a m^3 mixture)': 'FlyAsh',
    'Water  (component 4)(kg in a m^3 mixture)': 'Water',
    'Superplasticizer (component 5)(kg in a m^3 mixture)': 'Superplasticizer',
    'Coarse Aggregate  (component 6)(kg in a m^3 mixture)': 'CoarseAgg',
    'Fine Aggregate (component 7)(kg in a m^3 mixture)': 'FineAgg',
    'Age (day)': 'Age',
    'Concrete compressive strength(MPa, megapascals) ': 'Strength',
    'Water_Cement_Ratio': 'WCR',
    'Total_Aggregate': 'TotalAgg',
    'Age_WCR': 'Age_x_WCR',
})

print(raw_data.head(10))

# Save the processed data with all new features
processed_file = "processed_material_concrete_data.csv"
raw_data.to_csv(processed_file, index=False)

# Download to your laptop
files.download(processed_file)

   Cement   Slag  FlyAsh  Water  Superplasticizer  CoarseAgg  FineAgg  Age  \
0   540.0    0.0     0.0  162.0               2.5     1040.0    676.0   28   
1   540.0    0.0     0.0  162.0               2.5     1055.0    676.0   28   
2   332.5  142.5     0.0  228.0               0.0      932.0    594.0  270   
3   332.5  142.5     0.0  228.0               0.0      932.0    594.0  365   
4   198.6  132.4     0.0  192.0               0.0      978.4    825.5  360   
5   266.0  114.0     0.0  228.0               0.0      932.0    670.0   90   
6   380.0   95.0     0.0  228.0               0.0      932.0    594.0  365   
7   380.0   95.0     0.0  228.0               0.0      932.0    594.0   28   
8   266.0  114.0     0.0  228.0               0.0      932.0    670.0   28   
9   475.0    0.0     0.0  228.0               0.0      932.0    594.0   28   

   Strength       WCR  TotalAgg  Age_x_WCR  Log_Age  
0     79.99  0.300000    1716.0     8.4000   3.3673  
1     61.89  0.300000    1731.0  

These engineered features are intended to improve model interpretability and predictive power, and the renamed columns replace the verbose original headers for cleaner analysis.
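
For reuse outside this notebook, the four feature steps could be collected into one function. The following is a sketch under stated assumptions, not the notebook's own code: `add_engineered_features` is a hypothetical name, it expects the short column names produced by the rename step (`Cement`, `Water`, `CoarseAgg`, `FineAgg`, `Age`), and the sample frame holds rows 0 and 2 of the dataset.

```python
import numpy as np
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with WCR, TotalAgg, Age_x_WCR and Log_Age added.

    Assumes the short column names used after the rename step.
    """
    out = df.copy()  # leave the caller's frame untouched
    out["WCR"] = out["Water"] / out["Cement"]
    out["TotalAgg"] = out["CoarseAgg"] + out["FineAgg"]
    out["Age_x_WCR"] = out["Age"] * out["WCR"].round(4)
    out["Log_Age"] = np.log1p(out["Age"]).round(4)  # log(1 + Age)
    return out

# Rows 0 and 2 of the dataset, restricted to the columns the function needs
sample = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "CoarseAgg": [1040.0, 932.0],
    "FineAgg": [676.0, 594.0],
    "Age": [28, 270],
})
features = add_engineered_features(sample)
print(features[["WCR", "TotalAgg", "Age_x_WCR", "Log_Age"]])
```

Working on a copy keeps the input frame unchanged, so the function can be applied to new data without side effects.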