### Using the same dataset: Civil_Engineering_Regression_Dataset.csv

Part 5: Model Interpretation & Conclusion

- Summarize the key takeaways from your regression models.
- How can construction companies use regression analysis to estimate costs more effectively?
- What limitations did you encounter in this analysis?
- If you were to improve this model, what additional variables might you consider?
- How does regression analysis in civil engineering contribute to cost-effective planning?
- Provide a conclusion on the role of data science in optimizing construction project costs.


In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

# Load dataset
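df = pd.read_csv("Civil_Engineering_Regression_Dataset.csv")
# Strip stray whitespace from text columns (DataFrame.applymap is deprecated since pandas 2.1)
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].str.strip()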

# Define variables
dep_var = "Construction_Cost"
indep_vars = ["Building_Height", "Material_Quality_Index", "Labor_Cost", "Concrete_Strength", "Foundation_Depth"]
X, y = df[indep_vars], df[dep_var]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a height-only (simple) model and a five-predictor (multiple) model
simple_model = LinearRegression().fit(X_train[["Building_Height"]], y_train)
model = LinearRegression().fit(X_train, y_train)

# Model evaluation on the held-out test set
y_pred = model.predict(X_test)
simple_r2 = simple_model.score(X_test[["Building_Height"]], y_test)  # R² of the height-only model
multiple_r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
# Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), with n test rows and p predictors
n, p = len(y_test), len(indep_vars)
adj_r2 = 1 - (1 - multiple_r2) * (n - 1) / (n - p - 1)

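# VIF calculation: VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing feature j on the others.
# The constant is included so those auxiliary regressions have an intercept; ignore the "const" row itself.
X_const = sm.add_constant(X)
vif = pd.DataFrame({
    "Feature": X_const.columns,
    "VIF": [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
})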

# Results
print(f"Simple R²: {simple_r2:.4f}, Multiple R²: {multiple_r2:.4f}, MSE: {mse:.4f}, Adjusted R²: {adj_r2:.4f}")
print("VIF:", vif)




Simple R²: 0.9251, Multiple R²: 0.9998, MSE: 113.5044, Adjusted R²: 0.9997
VIF:                   Feature        VIF
0                   const  36.217244
1         Building_Height   1.047164
2  Material_Quality_Index   1.048067
3              Labor_Cost   1.054086
4       Concrete_Strength   1.019701
5        Foundation_Depth   1.040594
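One limitation is visible in the output above. The R² of 0.9998 and the MSE of 113.5044 come from a single 80/20 split (`random_state=42`), and one split can flatter or penalize a model by chance. A minimal robustness check, sketched here under the assumption that plain scikit-learn k-fold cross-validation is acceptable, reports out-of-fold R² for every fold. The `cv_r2` helper is hypothetical. The demo uses synthetic `make_regression` data rather than the project CSV, so its scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

def cv_r2(X, y, n_splits: int = 5, seed: int = 42) -> np.ndarray:
    """Hypothetical helper: out-of-fold R² of a linear regression over shuffled k folds."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(LinearRegression(), X, y, cv=folds, scoring="r2")

# Synthetic stand-in with 5 predictors, like the notebook (not the real data)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
scores = cv_r2(X_demo, y_demo)
print(f"R² per fold: {np.round(scores, 4)}; mean {scores.mean():.4f} ± {scores.std():.4f}")
```

In the notebook, calling `cv_r2(X, y)` on the five predictors would show whether the near-perfect R² holds on every fold or depends on this particular split.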
