# Predicting Pipe Breaks with Data: A Civil Engineer’s AI Challenge

In 2022, the City of Syracuse reported **231 water main breaks** across its public water infrastructure network (City of Syracuse Open Data Portal, 2023). These breaks disrupt daily life, damage property, close roads, and strain limited municipal budgets. While each incident varies in severity and cost, the cumulative burden highlights the growing pressure on aging infrastructure systems nationwide.

This post outlines a data-driven effort to predict water main failures before the next costly break occurs, combining civil engineering insight with machine learning techniques.

## Understanding the Problem

Across the U.S., municipalities face the challenge of maintaining water distribution networks with limited resources. Traditionally, pipe replacement and maintenance efforts have relied heavily on straightforward heuristics like pipe age or material. While these variables are important, they rarely provide the nuance necessary to accurately assess risk across a citywide system.

This project proposes a risk-based predictive approach that combines historical failure data and engineering metadata with machine learning classification models. The goal is not simply to react to failures but to prioritize prevention where it matters most.

## Data Access Through FOIL

To support this initiative, a formal Freedom of Information Law (FOIL) request was submitted to the City of Syracuse. The request specifically seeks:

* Dates and locations of pipe failures
* Pipe characteristics (material, diameter, installation year)
* Depth of cover and soil/subgrade conditions
* Repair types and work orders

The city has acknowledged receipt of the request and noted that a response can typically be expected within **20 business days**, consistent with guidance from the New York State Committee on Open Government (n.d.). Under standard FOIL procedure, an agency must either fulfill the request within that period or explain any delay in writing.

Once received, this data will form the backbone of the modeling effort.

## Planned Methodology

While the dataset is pending, the foundational methodology has been outlined. The process will follow these core steps:

* **Data Cleaning and Preprocessing**: Normalize and structure records, address missing values, and prepare geospatial data.
* **Exploratory Data Analysis (EDA)**: Visualize distributions and identify patterns in break frequency, pipe age, and other features.
* **Feature Engineering**: Derive new metrics such as pipe age, traffic zone intensity, and historical break density.
* **Model Development**: Evaluate classification models including logistic regression, random forest, and gradient boosting.
* **Validation and Testing**: Apply k-fold cross-validation and evaluate each model with metrics such as precision, recall, and AUC.
* **Visualization**: Build an interactive dashboard to present risk scores by location and pipe attributes.
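
To make the feature engineering step more concrete, here is a minimal Python sketch of how pipe age and historical break density might be derived. It is an illustration under stated assumptions, not the project's actual code: the `PipeSegment` fields, the cast-iron flag, the per-1,000-ft normalization, and all sample values are hypothetical, since the FOIL data has not yet been received.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class PipeSegment:
    """Hypothetical record layout; the real FOIL fields may differ."""
    segment_id: str
    material: str
    diameter_in: float
    install_year: int
    length_ft: float


def pipe_age(segment: PipeSegment, as_of: date) -> int:
    """Age in whole calendar years at the reference date."""
    return as_of.year - segment.install_year


def break_density(segment: PipeSegment, break_count: int) -> float:
    """Historical breaks per 1,000 ft of pipe (assumed normalization)."""
    return 1000.0 * break_count / segment.length_ft


def build_features(segments, break_segment_ids, as_of):
    """Return one feature dict per segment."""
    counts = Counter(break_segment_ids)  # breaks recorded per segment id
    rows = []
    for seg in segments:
        n_breaks = counts.get(seg.segment_id, 0)
        rows.append({
            "segment_id": seg.segment_id,
            "age_years": pipe_age(seg, as_of),
            "breaks_per_1000ft": break_density(seg, n_breaks),
            "is_cast_iron": int(seg.material == "cast iron"),
        })
    return rows


# Made-up sample data, for illustration only.
segments = [
    PipeSegment("A1", "cast iron", 6.0, 1925, 500.0),
    PipeSegment("B2", "ductile iron", 8.0, 1988, 1000.0),
]
breaks = ["A1", "A1", "B2"]  # one entry per recorded incident
features = build_features(segments, breaks, as_of=date(2023, 1, 1))
for row in features:
    print(row)
```

Normalizing break counts by segment length keeps long mains from looking riskier simply because they have more pipe to fail. Whether the real dataset supports that normalization depends on the fields the city provides.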

This process draws on standard machine learning workflows but tailors them to a high-stakes public infrastructure application.
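
As an illustration of the validation step, the sketch below implements k-fold index splitting, precision and recall, and ROC AUC in plain Python. The labels and risk scores are invented for demonstration, the 0.5 threshold is an arbitrary assumption, and the AUC shown is the ROC variant. In practice a library such as scikit-learn provides equivalents (`KFold`, `precision_score`, `recall_score`, `roc_auc_score`), so this is only meant to show what those metrics compute.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = pipe broke)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
folds = list(k_fold_indices(len(y_true), 3))
print(precision, recall, auc)
```

Because pipe breaks are rare events and the records are time-ordered, contiguous unshuffled folds like these are likely too naive for the real dataset. A stratified or time-based split may be a better fit once the data's structure is known.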

## What Comes Next

This article is the first in a two-part series. Once the FOIL data is received, the follow-up post will present:

* Exploratory analysis of the received dataset
* Initial modeling results and performance benchmarks
* Visualizations that illustrate risk zones and prioritization recommendations

In parallel, a supplementary technical series will document the numerical methods used in model development, with a closer look at foundational algorithms such as Newton’s method, LU decomposition, and curve fitting.

Stay tuned for updates, and follow along for a closer look at how data science and civil engineering can work together to solve problems beneath our feet.

---

## References

City of Syracuse Open Data Portal. (2023). *Water main break incidents – 2022*. Retrieved from [https://data.syr.gov/items/c8be66d9d53945edad5886e914418b68](https://data.syr.gov/items/c8be66d9d53945edad5886e914418b68)

New York State Committee on Open Government. (n.d.). *Time Limits for Responding to FOIL Requests*. Retrieved from [https://opengovernment.ny.gov/explanation-time-limits-response](https://opengovernment.ny.gov/explanation-time-limits-response)
