Water Consumption Forecasting Using Digital Twins and AI/ML

Digital Twin in Water Industries

Overview

This project combines Digital Twin technology with Machine Learning (ML) and Artificial Intelligence (AI) models to forecast water consumption across several rural villages in Spain. By integrating the real-time and historical data collected by the Digital Twin, we aim to support effective resource management, optimize infrastructure planning, and enhance sustainability.

Digital Twin System

Our Digital Twin setup collects data from multiple sources:

Water Meters: Record water usage at regular intervals.
Meteorological Stations: Provide weather data, helping identify correlations with water consumption.
Programmable Logic Controllers (PLCs): Monitor and control the water distribution system for improved operational efficiency.

This data is aggregated and processed through ML and AI models to forecast water consumption and detect anomalies. By identifying unusual patterns early, we enable timely responses to leaks, reduce waste, and support efficient water usage.

Importance of Water Consumption Forecasting

Accurate water consumption forecasting is crucial for managing finite water resources, planning infrastructure, and enhancing sustainability. A robust forecasting model offers the following benefits:

Improved Resource Management: Predictions allow for optimized supply, minimizing waste and cost.
Better Infrastructure Decisions: Insights from forecasts support maintenance and expansion prioritization.
Enhanced Sustainability: By aligning distribution with demand, we conserve energy and reduce environmental impacts.

Forecasting Models and Methodologies

The project evaluates several models for water consumption forecasting over 6-month and 18-month horizons, comparing traditional and advanced ML methods:

Prophet Model: A decomposable time series model with built-in support for seasonality, holidays, and custom regressors, which makes it well suited to periodic changes in consumption.
XGBoost and LightGBM: Efficient and accurate boosted tree algorithms, further improved with custom feature engineering.
LSTM Neural Networks: Long Short-Term Memory networks, which can capture long-range dependencies in sequential data such as water consumption time series.
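The LSTM models consume fixed-length input sequences. A minimal sketch of how a daily consumption series could be cut into such windows, assuming a univariate series and one-step-ahead targets; `make_windows` is a hypothetical helper, not code from the project notebook, and `seq_len` plays the role of the sequence length mentioned under Hyperparameter Tuning.

```python
import numpy as np


def make_windows(series, seq_len, horizon=1):
    """Split a 1-D series into LSTM inputs (samples, seq_len, 1) and targets (samples, horizon).

    Hypothetical helper for illustration; assumes one feature (consumption) per time step.
    """
    values = np.asarray(series, dtype=np.float32)
    n_samples = len(values) - seq_len - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested seq_len and horizon")
    # Each input window covers seq_len consecutive days; its target is the next `horizon` days
    X = np.stack([values[i:i + seq_len] for i in range(n_samples)])
    y = np.stack([values[i + seq_len:i + seq_len + horizon] for i in range(n_samples)])
    # Add a trailing feature axis: the (batch, time, features) layout expected by LSTM layers
    return X[..., np.newaxis], y


X, y = make_windows(range(10), seq_len=3)
print(X.shape, y.shape)  # (7, 3, 1) (7, 1)
```

A larger `seq_len` gives the network more history per sample but yields fewer training windows from the same series.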

Feature Engineering

Advanced feature engineering techniques were applied to enhance predictive accuracy:

Lag Features: Integrate past consumption data to capture temporal patterns.
Rolling Statistics: Rolling means smooth out short-term fluctuations, while rolling standard deviations and maxima summarize recent variability and peaks.
Domain-Specific Variables: Incorporate factors such as maximum daily temperature and day of the week, which significantly affect water usage.
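The three feature families above can be sketched with pandas. This is a minimal illustration, assuming a daily DataFrame indexed by date with a `consumption` column; the column names, lag set, and window size are hypothetical choices, not the project's configuration.

```python
import numpy as np
import pandas as pd


def add_features(df, target="consumption", lags=(1, 7), windows=(7,)):
    """Add lag, rolling-statistic, and calendar features to a daily DataFrame.

    Other columns (e.g. a hypothetical `tmax` for maximum temperature) are kept as-is.
    """
    out = df.copy()
    for lag in lags:
        out[f"lag_{lag}"] = out[target].shift(lag)
    # Shift by one day first so rolling statistics use only past values (no target leakage)
    past = out[target].shift(1)
    for w in windows:
        out[f"roll_mean_{w}"] = past.rolling(w).mean()
        out[f"roll_std_{w}"] = past.rolling(w).std()
        out[f"roll_max_{w}"] = past.rolling(w).max()
    out["day_of_week"] = out.index.dayofweek  # 0 = Monday
    return out.dropna()  # drop the warm-up rows that lack a full history


dates = pd.date_range("2024-01-01", periods=20, freq="D")
df = pd.DataFrame({"consumption": np.arange(20, dtype=float), "tmax": 20.0}, index=dates)
features = add_features(df)
print(features.head(3))
```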

Hyperparameter Tuning

We fine-tuned each model to optimize performance:

Prophet Model: Tuned the seasonality components and added holiday effects and custom regressors.
LSTM Model: Adjusted dropout rates, learning rates, sequence length, and layer units through grid and randomized search.
Stacking Ensemble Methods: Combined outputs from XGBoost and LightGBM models to increase robustness by capturing different aspects of the data.

Digital Twin Diagram

Below is a preview of the Digital Twin system diagram:

Project Structure

Data Sources:
- Historical Water Consumption Data: Collected from sensors and databases over time.
- Real-time Water Consumption Data: Continuously monitored; water meters in the village capture readings every 8 hours, and the data is collected daily.
- Meteorological Data: Collected from meteorological stations to improve prediction accuracy. We used the Pearson correlation coefficient to evaluate the relationship between each meteorological parameter and water consumption; the analysis showed that maximum temperature is positively correlated with water consumption. The results of this correlation analysis are presented in the following figure.
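The Pearson coefficient used in this analysis can be computed directly from its definition, the covariance of the two series divided by the product of their standard deviations. A minimal plain-Python sketch; the temperature and consumption values in the example are made up for illustration and are not the project's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # Covariance and standard deviations share the 1/n factor, so it cancels out
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Hypothetical daily maximum temperatures (°C) and consumption values
t_max = [18.0, 22.5, 25.0, 29.5, 33.0, 35.5]
consumption = [41.0, 44.0, 47.5, 52.0, 55.0, 58.5]
print(round(pearson_r(t_max, consumption), 3))
```

A value close to +1 indicates a strong positive linear relationship, the kind of relationship reported for maximum temperature. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.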
Main Features:
- Water Consumption Prediction: AI and ML models, including LSTM and Prophet, are used for forecasting daily water usage.
- Leakage Detection: Early detection of water leakages by analyzing consumption patterns.
- Energy Consumption and CO₂ Footprint: Monitoring the energy used in water distribution and the associated CO₂ emissions, which result from maintaining the water distribution network. For example, each operator handles several tasks that differ in duration, location, priority, and other attributes, and scheduling these tasks with preemption for urgent work is an NP-hard problem.
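The leakage-detection method is not detailed here, so the following is only one common baseline, shown as an assumption: flag a day whose consumption deviates from the mean of the preceding `window` days by more than `k` standard deviations. `flag_anomalies`, `window`, and `k` are hypothetical names and defaults.

```python
import statistics


def flag_anomalies(daily, window=7, k=3.0):
    """Return indices of days that deviate from the preceding `window` days by more than k std devs."""
    flagged = []
    for i in range(window, len(daily)):
        past = daily[i - window:i]
        mu = statistics.mean(past)
        sd = statistics.stdev(past)
        # Skip flat windows (sd == 0), where a z-score is undefined
        if sd > 0 and abs(daily[i] - mu) > k * sd:
            flagged.append(i)
    return flagged


daily = [10, 11, 10, 12, 11, 10, 11, 11, 10, 30, 11]  # illustrative values with one spike
print(flag_anomalies(daily))  # [9]
```

The spike itself then enters the window for the following days and inflates the standard deviation, so a leak that persists over several days may only be flagged on its first day.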
Pre-Processing and AI/ML Pipeline: The project follows a well-defined data pipeline that includes:
- Pre-processing Stage: Data cleaning, normalization, and preparation for analysis.
- Analysis Stage: Applying models like LSTM and Prophet for time series forecasting.
- Post-Processing Stage: Interpreting the model outputs to provide actionable insights into water consumption and leakage detection.
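A minimal sketch of the pre- and post-processing stages with pandas, under stated assumptions: negative meter readings are treated as errors, gaps are filled by linear interpolation, and min-max scaling uses statistics from the training period only, so that no future information leaks into the model. The function names and the `train_end` cut-off are hypothetical.

```python
import pandas as pd


def preprocess(raw, train_end):
    """Clean a daily consumption Series (DatetimeIndex) and min-max scale it.

    Returns the scaled series and the (min, max) pair needed to undo the scaling.
    """
    # Cleaning: mask negative readings (assumed meter errors), then interpolate all gaps
    cleaned = raw.astype(float).mask(raw < 0).interpolate(limit_direction="both")
    # Normalization: fit min-max parameters on the training period only
    train = cleaned.loc[:train_end]
    lo, hi = train.min(), train.max()
    return (cleaned - lo) / (hi - lo), (lo, hi)


def postprocess(scaled_pred, params):
    """Map scaled model outputs back to the original consumption units."""
    lo, hi = params
    return scaled_pred * (hi - lo) + lo


idx = pd.date_range("2024-01-01", periods=5, freq="D")
raw = pd.Series([10.0, None, 30.0, -5.0, 50.0], index=idx)
scaled, params = preprocess(raw, train_end="2024-01-03")
print(scaled.tolist())  # [0.0, 0.5, 1.0, 1.5, 2.0]
print(postprocess(scaled, params).tolist())  # [10.0, 20.0, 30.0, 40.0, 50.0]
```

Values after the training period may fall outside [0, 1]; this is expected when scaling parameters are fit on training data only.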

Project Workflow

Data Input: Collecting historical and real-time water consumption and meteorological data.
Pre-Processing: Cleaning and preparing the data for machine learning models.
AI/ML Processing: Applying LSTM and Prophet models to predict future water consumption and detect anomalies such as leakages.
Output: Predictions and analytics related to water usage, energy consumption, and environmental impact.

Models Used

No	Main Method	Algorithm Name	Differences/Parameters
1	Prophet	Prophet Basic	Basic model, no additional seasonality or regressors
		Prophet + Seasonality	Includes seasonality components (e.g., yearly or weekly)
		Advanced Prophet	Includes advanced features such as holidays and added regressors
		Prophet Adv. Engineering	Custom feature engineering (lag, rolling means, etc.)
2	LSTM	LSTM Basic	Vanilla LSTM, no additional tuning or feature engineering
		LSTM Hyperparameter Tuning	LSTM with tuned hyperparameters (e.g., learning rate, units)
		LSTM + GRU Hybrid	Combination of LSTM and GRU layers for better generalization
		LSTM Rolling Mean Features	LSTM with rolling mean features for smoother predictions
3	XGBoost	XGBoost Basic	Basic XGBoost model without additional feature engineering
		XGBoost with Feature Engineering	XGBoost with advanced feature engineering (lag, etc.)
4	LightGBM	LightGBM Basic	Basic LightGBM model, no feature engineering
		LightGBM with Feature Engineering	LightGBM with engineered features (e.g., lags, moving averages)
5	Stacking	Stacking XGBoost + LightGBM	Ensemble of XGBoost and LightGBM, stacking the models

Model Evaluation Metrics

Each model was evaluated on the following metrics to identify top performers across 6-month and 18-month forecasting periods:

Mean Absolute Error (MAE): Average magnitude of the prediction errors.
Root Mean Squared Error (RMSE): Penalizes large prediction errors more heavily than MAE, because errors are squared before averaging.
Mean Absolute Percentage Error (MAPE): Error expressed as a percentage of the actual value, which makes it scale-independent and comparable across models.
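The three metrics can be written out in a few lines. A minimal plain-Python sketch; the functions assume equal-length sequences of actual and predicted values, and the example values are illustrative only.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: squaring weights large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; undefined if any actual value is 0."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


actual, forecast = [10, 20, 40], [12, 18, 30]
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

For these values MAE ≈ 4.67, RMSE = 6.0, and MAPE ≈ 18.33%: the single 10-unit miss on the last point pushes RMSE well above MAE, which is the extra penalty on large errors that RMSE is meant to provide.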

Based on these metrics, a selection of the best-performing models is summarized in the following table.

No	Model	6-Month MAE	6-Month RMSE	6-Month MAPE	18-Month MAE	18-Month RMSE	18-Month MAPE
1	LightGBM	5.90	8.25	19.64%	11.77	18.31	24.98%
2	LSTM Hyper. Tuning Plus	5.96	9.38	18.64%	12.63	20.66	25.61%
3	Prophet Adv. Engineering	6.21	8.75	20.61%	10.12	17.02	21.43%
4	Advanced Prophet	6.24	8.78	20.77%	11.14	18.02	22.34%
5	LSTM Rolling Mean Features	7.94	10.82	27.59%	12.33	20.57	24.67%

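These results underscore the effectiveness of combining feature engineering, hyperparameter tuning, and advanced machine learning techniques to improve water consumption forecasting accuracy. Errors grow with the horizon (MAE increases by a factor of about 1.6 to 2.1 from 6 to 18 months for these models), and the ranking changes: LightGBM has the lowest 6-month MAE and RMSE, LSTM Hyper. Tuning Plus the lowest 6-month MAPE, and Prophet Adv. Engineering the lowest error on all three 18-month metrics. The complete results for all evaluated models are listed below.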

No	Model	6-Month MAE	6-Month RMSE	6-Month MAPE	18-Month MAE	18-Month RMSE	18-Month MAPE
1	Prophet Basic	10.37	13.66	22.45%	19.70	28.68	22.45%
2	Prophet + Seasonality	12.39	14.91	22.45%	24.25	35.02	22.45%
3	Advanced Prophet	6.24	8.78	20.77%	11.14	18.02	22.34%
4	Prophet Adv. Engineering	6.21	8.75	20.61%	10.12	17.02	21.43%
5	XGBoost	7.02	8.74	24.93%	12.34	18.50	27.49%
6	LightGBM	5.90	8.25	19.64%	11.77	18.31	24.98%
7	Stacking XGBoost + LightGBM	6.57	8.70	22.45%	12.48	18.94	27.62%
8	LSTM Network	7.31	10.74	22.43%	16.03	22.16	39.95%
9	LSTM Hyperparameter Tuning	6.51	9.61	21.34%	12.72	20.61	26.47%
10	LSTM Hyper. Tuning Plus	5.96	9.38	18.64%	12.63	20.66	25.61%
11	LSTM Hyper. Tuning Changed Params	6.52	9.63	21.38%	13.33	20.60	29.21%
12	LSTM + GRU Hybrid	8.18	10.10	30.10%	14.64	22.32	34.06%
13	LSTM Rolling Mean Features	7.94	10.82	27.59%	12.33	20.57	24.67%

Prophet model's results:

Figure 4: Prophet model with advanced feature engineering, 6-month forecast

Figure 5: Prophet model with advanced feature engineering, 18-month forecast

Water Distribution System (WDS) Maintenance Optimization

Overview

In rural Water Distribution Networks (WDNs), operators face the challenge of efficiently routing and scheduling maintenance tasks across vast areas with varied priorities and dependencies. These decisions often rely on operators’ judgment, which can lead to inefficiencies, especially when several tasks with different priorities must be handled at the same time. This project addresses these challenges with a systematic approach to routing and scheduling optimization that reduces operational costs such as travel time, fuel consumption, and CO₂ emissions.

Problem Description

Maintenance scheduling in WDNs is modeled as a complex single-machine scheduling problem with preemptive tasks, variable release times, and task dependencies. This problem is NP-hard and involves several constraints, including:

Task Prioritization: Tasks are assigned different priorities, requiring high-priority tasks to be addressed promptly.
Emergency Tasks: High-priority emergency tasks can arrive at any time and need to be incorporated into the existing schedule, often with penalties for delays.
Task Dependencies: Some tasks are dependent on the completion of others, which must be respected to avoid operational conflicts.

The objective is to develop an optimized schedule that minimizes completion time, fuel consumption, CO₂ emissions, and task delays, while ensuring efficient handling of tasks and respecting dependencies and preemptions.

Mathematical Model and Objectives

To solve this problem, we formulated a Constraint Programming (CP) model that accounts for deterministic parameters, aiming to:

Minimize Total Completion Time, measured as the completion time of the last task $(C_{max})$
Minimize Total Fuel Consumption $(F_{total})$
Minimize Total CO₂ Emissions $(C_{total})$
Minimize Delays and Penalties for high-priority tasks $(D_{total})$

Model Components

The model includes various sets, indices, parameters, decision variables, and constraints:

Sets and Indices:
- $T$ : Set of all tasks
- $D$ : Set of task dependencies, where a dependency $(i, j)$ indicates task $j$ must follow task $i$
Parameters:
- $p_{i}$ : Processing time for each task $i$
- $d_{i j}$ : Travel time between tasks $i$ and $j$
- $f_{i}$ : Fuel consumption for each task $i$
- $c_{i}$ : CO₂ emissions for each task $i$
- $r_{i}$ : Release time for task $i$ (emergency tasks have $r_{i} \geq 0$ )
Decision Variables:
- Task Scheduling Variables: Define start and end times for each task or segment.
- Sequencing Variables: Determine the order in which tasks are performed.
- Auxiliary Variables: Help manage task preemptions and dependencies.

Objective Function

The overall objective function is a weighted sum of the various components we aim to minimize:

$\min Z = w_{t} \times (C_{max} - S) + w_{f} \times F_{total} + w_{c} \times C_{total} + w_{d} \times D_{total}$

where:

$w_{t}, w_{f}, w_{c}, w_{d}$ are the weights for completion time, fuel, CO₂ emissions, and delays.
$C_{max}$ : Completion time of the last task.
$F_{total}$ : Total fuel consumed.
$C_{total}$ : Total CO₂ emissions.
$D_{total}$ : Total delays for emergency tasks.
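The weighted sum can be evaluated directly once the four components of a schedule are known. A minimal sketch; the weights and component values in the example are hypothetical, and because $S$ is not defined in the text it is simply passed in as a number (assumed here to be a reference start time).

```python
def objective_z(c_max, s, f_total, c_total, d_total, w_t, w_f, w_c, w_d):
    """Weighted-sum objective Z = w_t (C_max - S) + w_f F_total + w_c C_total + w_d D_total."""
    return w_t * (c_max - s) + w_f * f_total + w_c * c_total + w_d * d_total


# Hypothetical schedule summary (hours, litres, kg, hours) and hypothetical weights
z = objective_z(c_max=10, s=2, f_total=5, c_total=20, d_total=1, w_t=1, w_f=2, w_c=0.5, w_d=10)
print(z)  # 38.0
```

Since the terms are in different units (hours, litres, kilograms), the weights also act as scaling factors: with unit weights, a kilogram-valued CO₂ total in the hundreds would tend to dominate the hour-valued terms.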

Constraints

To ensure the model functions effectively within operational limits, we included several constraints:

Processing Time: Total scheduled processing time for each task matches its required time.
Precedence for Dependencies: If task $j$ depends on task $i$ , task $j$ cannot start until task $i$ completes.
Non-Overlap Constraint: No two tasks (or task segments) are processed at the same time on the single machine.
Work Hours: Tasks are scheduled within defined working hours.
Emergency Task Release Times: Emergency tasks cannot start before their release time.
Limit on Preemptions: Each task has a maximum number of preemptions.
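To make these constraints concrete, here is a minimal sketch of a feasibility check for a candidate preemptive schedule, expressed as a list of `(task, start, end)` segments (a task split into k segments has been preempted k - 1 times). The data layout, a single daily working window `work_hours` repeating every 24 hours, and the default preemption limit are assumptions for illustration; the checker validates a schedule, it does not optimize one.

```python
from collections import defaultdict


def check_schedule(segments, tasks, deps, work_hours=(8, 16), max_preemptions=2):
    """Return a list of constraint violations; an empty list means the schedule is feasible.

    segments: list of (task, start, end) in hours from day 0 at 00:00
    tasks: task -> (processing_time, release_time)
    deps: (i, j) pairs meaning task j may not start before task i completes
    """
    violations = []
    by_task = defaultdict(list)
    for task, start, end in segments:
        by_task[task].append((start, end))

    for task, (proc_time, release) in tasks.items():
        segs = sorted(by_task[task])
        if sum(end - start for start, end in segs) != proc_time:  # processing time
            violations.append(f"{task}: processing time")
        if segs and segs[0][0] < release:  # release time
            violations.append(f"{task}: release time")
        if len(segs) - 1 > max_preemptions:  # limit on preemptions
            violations.append(f"{task}: preemptions")

    for i, j in deps:  # precedence: first segment of j after last segment of i
        if by_task[i] and by_task[j] and min(s for s, _ in by_task[j]) < max(e for _, e in by_task[i]):
            violations.append(f"{j} before {i} completes")

    latest_end, latest_task = None, None
    for start, end, task in sorted((s, e, t) for t, s, e in segments):  # non-overlap
        if latest_end is not None and start < latest_end:
            violations.append(f"{task} overlaps {latest_task}")
        if latest_end is None or end > latest_end:
            latest_end, latest_task = end, task

    open_h, close_h = work_hours
    for task, start, end in segments:  # work hours: each segment within one day's window
        day_offset = (start // 24) * 24
        if start < day_offset + open_h or end > day_offset + close_h:
            violations.append(f"{task}: outside work hours")
    return violations


# Hypothetical instance: emergency task "e" is released at 10:00 and preempts task "a"
tasks = {"a": (3, 8), "b": (2, 8), "e": (1, 10)}
plan = [("a", 8, 10), ("e", 10, 11), ("a", 11, 12), ("b", 12, 14)]
print(check_schedule(plan, tasks, deps=[("a", "b")]))  # []
```

Starting `b` at 10:00 instead, before `a` finishes, would be reported both as a precedence violation and as overlaps.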

Optimization Approach

Given the NP-hard nature of the problem, Constraint Programming (CP) was chosen for its effectiveness in solving complex scheduling problems with various dependencies and constraints. We used a CP solver (e.g., Google OR-Tools) to identify the optimal scheduling arrangement.

Performance Comparison

The optimized model was compared against conventional operator methods, demonstrating significant improvements:

Metric	Conventional Method	Proposed Model	Improvement (%)
Total Completion Time	180.58 hours	155.24 hours	14%
Delays and Penalties	17.5 hours	13.15 hours	25%
CO₂ Emissions	660.8 kg	545.7 kg	17%
Fuel Consumption	85.58 Litres	71.98 Litres	16%
Efficiency and Utilization	86.17%	92.23%	7%
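The Improvement column can be reproduced from the other two columns. A short check, assuming relative change with respect to the conventional method (for Efficiency and Utilization, where higher is better, the relative increase).

```python
def improvement(baseline, proposed, higher_is_better=False):
    """Relative improvement of the proposed model over the baseline, in percent."""
    change = (proposed - baseline) if higher_is_better else (baseline - proposed)
    return 100 * change / baseline


# (conventional, proposed, higher_is_better), values copied from the comparison table
rows = {
    "Total Completion Time (h)": (180.58, 155.24, False),
    "Delays and Penalties (h)": (17.5, 13.15, False),
    "CO2 Emissions (kg)": (660.8, 545.7, False),
    "Fuel Consumption (L)": (85.58, 71.98, False),
    "Efficiency and Utilization (%)": (86.17, 92.23, True),
}
for metric, (base, prop, higher) in rows.items():
    print(f"{metric}: {improvement(base, prop, higher):.1f}%")
```

This prints 14.0%, 24.9%, 17.4%, 15.9%, and 7.0%, which round to the table's 14%, 25%, 17%, 16%, and 7%. The efficiency figure is a relative increase; in absolute terms it is 6.06 percentage points.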

These improvements illustrate the effectiveness of our model in reducing operational costs and environmental impact, while ensuring timely and efficient task completion.

Visual Analysis

Below are visual representations of key metrics analyzed in the model:

Technologies

Digital Twins: Simulating the water consumption of villages in Spain, providing a virtual representation for better decision-making and forecasting.
AI/ML Techniques: Leveraging advanced machine learning algorithms to generate accurate forecasts and detect anomalies in water usage.

How to Run

Set up the Environment:
- Install necessary dependencies by running:
```
pip install -r Requirements.txt
```
Run the Jupyter Notebook:
- The project's core is the Jupyter notebook Water_Consumption_Forcasting.ipynb. Open it to explore the data and model workflow:
```
jupyter notebook Water_Consumption_Forcasting.ipynb
```

Future Work

Expand the data sources to include additional villages.
Improve the models' accuracy by incorporating more environmental and socioeconomic data.
Integrate real-time alerts for water leakage and unusual consumption patterns.

Contributors

Hubert Homaei, Oscar Mogollon.

How to cite this research:

📌 APA (7th Edition) arXiv version:

Homaei, M., Di Bartolo, A. J., Ávila, M., Mogollón-Gutiérrez, Ó., & Caro, A. (2024). Digital transformation in the water distribution system based on the digital twins concept. arXiv. https://doi.org/10.48550/arXiv.2412.06694

MDPI Preprints version:

Homaei, M., Di Bartolo, A. J., Ávila, M., Mogollón-Gutiérrez, Ó., & Caro, A. (2024). Digital transformation in the water distribution system based on the digital twins concept. MDPI Preprints. https://doi.org/10.20944/preprints202412.0756.v1

📌 IEEE arXiv version:

[1] M. Homaei, A. J. Di Bartolo, M. Ávila, Ó. Mogollón-Gutiérrez, and A. Caro, “Digital transformation in the water distribution system based on the digital twins concept,” arXiv, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2412.06694.

MDPI Preprints version:

[2] M. Homaei, A. J. Di Bartolo, M. Ávila, Ó. Mogollón-Gutiérrez, and A. Caro, “Digital transformation in the water distribution system based on the digital twins concept,” MDPI Preprints, Dec. 2024. [Online]. Available: https://doi.org/10.20944/preprints202412.0756.v1.

📌 Chicago (Author-Date) arXiv version:

Homaei, MohammadHossein, Agustín Javier Di Bartolo, Mar Ávila, Óscar Mogollón-Gutiérrez, and Andrés Caro. 2024. “Digital Transformation in the Water Distribution System Based on the Digital Twins Concept.” arXiv. https://doi.org/10.48550/arXiv.2412.06694.

MDPI Preprints version:

Homaei, MohammadHossein, Agustín Javier Di Bartolo, Mar Ávila, Óscar Mogollón-Gutiérrez, and Andrés Caro. 2024. “Digital Transformation in the Water Distribution System Based on the Digital Twins Concept.” MDPI Preprints, December. https://doi.org/10.20944/preprints202412.0756.v1.

License

This project is licensed under the MIT License - see the LICENSE file for details.
