Built a regression-based model to predict bridge condition ratings for the Texas Department of Transportation using predictors including age, average usage, truck traffic, material, and design.
Performed feature engineering (e.g., deriving age), variable transformation, correlation analysis, and residual evaluation to assess model effectiveness and predictor impact.
To understand how well the proposed variables can predict the condition of a bridge, and which variables have the greatest influence on it.
- File:
tx19_bridges_sample.csv - Source: National Bridge Inventory
- Columns:
Age: Derived fromYear(current year minus year built)AverageDaily: Average daily traffic volumeTrucks_percent: Percent of traffic made up of trucksMaterial: Primary bridge materialDesign: Structural design of the bridgeCurrent_Condition: Derived by summingDeck_rating,Superstr_rating, andSubstr_rating
👉 Bridge-condition-regression.ipynb
- Loaded and cleaned the dataset, removed null values and outliers
- Created the
Agevariable, renamed columns for clarity - Explored data distributions with histograms and box plots
- Simplified categorical variables (
Material,Design) - Converted condition ratings (ordinal) into continuous scale
- Derived the target variable:
Current_Condition
- Analyzed correlation matrix and visualized with heatmap
- Created scatter plots and pair plots for continuous variables
- Used kernel density plots to compare group distributions
- Performed crosstabs and normalization for categorical variables
- Plotted conditional probability distributions and heatmaps
- Applied regression using both continuous and categorical predictors
- Evaluated model performance using R² = 0.456
- Analyzed residual distributions to assess model fit
- Compared coefficients and interpreted variable influence
- Summarized key insights and project conclusions
The regression model explains approximately 46% of the variance in bridge condition (R² = 0.456).
This shows that the selected variables can predict condition fairly well, though improvements are possible.
| Predictor | Coefficient (β) | Interpretation |
|---|---|---|
| Age | -0.0485 | Each 1-year increase reduces condition by ~4.8% |
| Average Use | -7.97×10⁻⁷ | Negligible effect |
| Percent Trucks | +0.0047 | Minimal impact |
| Material: Other | +0.3539 | 35% better than concrete |
| Material: Steel | -1.376 | 1.3% worse than concrete |
| Material: Timber | -3.197 | 3.3% worse than concrete |
| Design: Other | +0.011 | Almost no difference from beam design |
| Design: Slab | -0.082 | Slightly worse than beam design |
- Age is the most influential predictor — older bridges tend to be in worse condition.
- Traffic volume and truck percentage show minimal influence.
- Material type significantly impacts condition:
- Timber and steel bridges are in worse condition than concrete ones.
- Design type contributes little to predictive power.
- Structural age and material are the strongest predictors of bridge condition.
- While the model doesn't capture all variance, it provides actionable insights for infrastructure risk assessment.
- Potential improvements:
- Include more variables (e.g., regional climate, maintenance history)
- Try non-linear models (e.g., Random Forest, Gradient Boosting)
- Segment analysis by region or bridge type
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- scikit-learn