# Predicting Concrete Compressive Strength

Compressive strength is a key design property of concrete in civil engineering. It depends on the mix proportions of the ingredients and on the curing age. The model uses the following variables:

* Dependent Variable (Y): Concrete Compressive Strength (MPa)

* Independent Variables (X):

    * Cement (kg/m³)

    * Blast Furnace Slag (kg/m³)

    * Fly Ash (kg/m³)

    * Water (kg/m³)

    * Superplasticizer (kg/m³)

    * Coarse Aggregate (kg/m³)

    * Fine Aggregate (kg/m³)

    * Age (days)
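
The model fitted later in this example is a multiple linear regression. As a sketch, it can be written as follows, where the coefficient symbols $\beta_j$, the shortened subscripts, and the error term $\varepsilon$ are notation introduced here for illustration:

```latex
\[
Y = \beta_0
  + \beta_1 X_{\text{Cement}}
  + \beta_2 X_{\text{Slag}}
  + \beta_3 X_{\text{FlyAsh}}
  + \beta_4 X_{\text{Water}}
  + \beta_5 X_{\text{Superplasticizer}}
  + \beta_6 X_{\text{CoarseAgg}}
  + \beta_7 X_{\text{FineAgg}}
  + \beta_8 X_{\text{Age}}
  + \varepsilon
\]
```

Each $\beta_j$ is the expected change in strength (MPa) per unit increase in the corresponding variable, with the other variables held constant.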

### Sample Data
Real concrete strength datasets can be large, so this simulation uses a small synthetic dataset of 20 mixes whose strength follows a simplified linear relationship plus noise. In a real engineering scenario, this data would come from laboratory experiments or field measurements.

In [None]:
import pandas as pd
import numpy as np

# Synthetic data for concrete compressive strength
# (real datasets are much larger and more complex)
np.random.seed(42)  # for reproducibility
n_samples = 20
df = pd.DataFrame({
    'Cement': np.random.uniform(100, 500, n_samples),
    'BlastFurnaceSlag': np.random.uniform(0, 300, n_samples),
    'FlyAsh': np.random.uniform(0, 200, n_samples),
    'Water': np.random.uniform(150, 250, n_samples),
    'Superplasticizer': np.random.uniform(0, 20, n_samples),
    'CoarseAggregate': np.random.uniform(800, 1200, n_samples),
    'FineAggregate': np.random.uniform(600, 900, n_samples),
    'Age': np.random.randint(7, 90, n_samples),
})

# Compressive strength: simplified linear relationship + noise.
# It is computed from the mix columns above (not from fresh random draws),
# so the features actually drive the target.
df['CompressiveStrength'] = (
    5
    + 0.05 * df['Cement']
    + 0.03 * df['BlastFurnaceSlag']
    + 0.02 * df['FlyAsh']
    - 0.1 * df['Water']                    # more water lowers strength
    + 0.2 * df['Superplasticizer']
    + 0.01 * df['CoarseAggregate']
    + 0.01 * df['FineAggregate']
    + 0.3 * df['Age']
    + np.random.normal(0, 5, n_samples)    # noise
)

# Ensure strength is positive and somewhat realistic (floor at 10 MPa)
df['CompressiveStrength'] = df['CompressiveStrength'].clip(lower=10).round(2)

print("Sample Concrete Compressive Strength Data:")
print(df.head())

### Using scikit-learn for Predictive Modeling
Use scikit-learn to build a model that can predict the compressive strength of concrete given its mix proportions and age. This is useful for quality control and mix design optimization.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Define independent variables (X) and dependent variable (y)
X = df[['Cement', 'BlastFurnaceSlag', 'FlyAsh', 'Water',
        'Superplasticizer', 'CoarseAggregate', 'FineAggregate', 'Age']]
y = df['CompressiveStrength']

# Split data into training and testing sets (e.g., 80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Linear Regression model object
model_skl = LinearRegression()

# Train the model using the training data
model_skl.fit(X_train, y_train)

# Make predictions on the test set
y_pred_skl = model_skl.predict(X_test)

# Evaluate the model's performance
mse_skl = mean_squared_error(y_test, y_pred_skl)
rmse_skl = np.sqrt(mse_skl) # Root Mean Squared Error
r2_skl = r2_score(y_test, y_pred_skl)

print("\n--- Scikit-learn Results (for Prediction) ---")
print("Coefficients for each constituent (Cement to Age):")
for col, coef in zip(X.columns, model_skl.coef_):
    print(f"  {col}: {coef:.4f}")
print(f"Intercept: {model_skl.intercept_:.2f}")
print(f"Mean Squared Error (MSE): {mse_skl:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse_skl:.2f}") # Easier to interpret than MSE
print(f"R-squared (R²): {r2_skl:.2f}")

# Example prediction: Predict strength for a new mix
new_mix = pd.DataFrame([[350, 50, 20, 180, 5, 1000, 700, 28]],
                       columns=X.columns)
predicted_strength = model_skl.predict(new_mix)
print(f"\nPredicted Compressive Strength for a new mix: {predicted_strength[0]:.2f} MPa")

#### Engineering Relevance of scikit-learn Output:

* Coefficients: Each coefficient estimates the change in strength (MPa) per unit increase in that variable, with all other variables held constant. For example, a positive coefficient for 'Cement' means more cement generally leads to higher strength, while a negative coefficient for 'Water' means more water tends to reduce strength. Magnitudes are only directly comparable between variables that share units and a similar range.

* RMSE: Gives the typical size of the prediction error in the same units as the dependent variable (MPa), which makes it easier to judge against a strength tolerance than MSE. Lower is better.

* R²: Indicates how much of the variability in concrete strength the model explains. A high R² suggests the model's inputs are good predictors. Here the test set holds only 4 samples (20% of 20 rows), so the reported value can swing widely with the choice of split.
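
The points above about comparing coefficients and judging RMSE can be sketched as follows. This is a minimal illustration, not part of the original notebook: it standardizes the features so each coefficient is expressed in MPa per one standard deviation of its variable, and it estimates RMSE with 5-fold cross-validation instead of a single small test split. The data, the three-variable mix, and the names `X_demo`, `y_demo`, and `pipe` are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: three mix variables and an assumed linear rule.
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    'Cement': rng.uniform(100, 500, n),
    'Water': rng.uniform(150, 250, n),
    'Age': rng.integers(7, 90, n),
})
y_demo = (40 + 0.05 * X_demo['Cement'] - 0.1 * X_demo['Water']
          + 0.3 * X_demo['Age'] + rng.normal(0, 5, n))

# Scale features to zero mean and unit variance, then fit the regression.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_demo, y_demo)

# Coefficients are now in MPa per one standard deviation of each feature.
std_coefs = pd.Series(pipe.named_steps['linearregression'].coef_,
                      index=X_demo.columns)
print(std_coefs.sort_values(key=abs, ascending=False))

# 5-fold cross-validated RMSE: every row is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
neg_mse = cross_val_score(pipe, X_demo, y_demo, cv=cv,
                          scoring='neg_mean_squared_error')
cv_rmse = np.sqrt(-neg_mse)
print(f"CV RMSE: {cv_rmse.mean():.2f} ± {cv_rmse.std():.2f} MPa")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into the scaling.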

### Using statsmodels for Statistical Inference
statsmodels is well suited to civil engineers or material scientists who need to assess the statistical significance of each concrete constituent's effect on strength, or to test hypotheses about mix design.

In [None]:
import statsmodels.api as sm

# Add a constant (intercept) term to the independent variables
X_sm = sm.add_constant(df[['Cement', 'BlastFurnaceSlag', 'FlyAsh', 'Water',
                           'Superplasticizer', 'CoarseAggregate', 'FineAggregate', 'Age']])
y_sm = df['CompressiveStrength']

# Create and fit the OLS (Ordinary Least Squares) model
model_sm = sm.OLS(y_sm, X_sm).fit()

# Print the comprehensive summary of the regression results
print("\n--- Statsmodels Results (for Statistical Inference) ---")
print(model_sm.summary())

#### Engineering Relevance of statsmodels Output:

* P-values (P>|t|): A low p-value (e.g., < 0.05) for a specific ingredient is evidence that its coefficient differs from zero once the other variables are accounted for, i.e., that its quantity has a statistically significant association with concrete strength. For instance, if 'Water' has a very low p-value, the data support treating water content as a variable worth controlling for strength.

* Confidence Intervals ([0.025, 0.975]): Give the 95% confidence interval for each coefficient. A narrow interval indicates a precisely estimated effect, and an interval that excludes zero corresponds to a p-value below 0.05.

* F-statistic and Prob (F-statistic): These indicate whether the overall model is statistically significant. A low p-value here means that the ingredients, taken together, explain strength better than an intercept-only model.

* R-squared and Adjusted R-squared: statsmodels reports these on the data used for fitting (all 20 rows here), whereas the scikit-learn example scores a held-out test set, so the two values are not directly comparable. Adjusted R-squared penalizes additional predictors, which matters with 8 predictors and only 20 observations. The summary also includes further diagnostics that help check model fit and assumptions.

This engineering example shows that multiple linear regression serves two purposes: predicting strength for a proposed mix, and quantifying how each input relates to performance. Together these help engineers optimize mix designs and control quality.