# Use-Case: Machine Learning based Leakage Detection

Anomaly detection, such as leakage detection, is a classic but often non-trivial task in the operation of water distribution networks (WDNs). With traditional (model-based) methods reaching their limits, Machine Learning offers promising solutions.

#### Outline 
This notebook demonstrates how EPyT-Flow can be utilized to create a scenario containing several leakages that have to be detected.
Here, we use a simple Machine Learning based leakage detector that is already included in EPyT-Flow.
It consists of the following steps:
1. Create a new (realistic) scenario.
2. Add some leakages to the scenario.
3. Create a simple Machine Learning based leakage detector.
4. Evaluate the leakage detector.

In [None]:
%pip install epyt-flow

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=ImportWarning)

import numpy as np
import matplotlib.pyplot as plt

from epyt_flow.data.networks import load_ltown
from epyt_flow.simulation import ScenarioSimulator
from epyt_flow.simulation.events import AbruptLeakage, IncipientLeakage
from epyt_flow.utils import to_seconds, time_points_to_one_hot_encoding
from epyt_flow.models import SensorInterpolationDetector

### 1. Create new Scenario

Create a new scenario based on the [L-Town network](https://epyt-flow.readthedocs.io/en/stable/epyt_flow.data.html#epyt_flow.data.networks.load_ltown) with a default sensor configuration and realistic demand patterns from the [BattLeDIM challenge](https://battledim.ucy.ac.cy/):

In [None]:
config = load_ltown(use_realistic_demands=True,
                    include_default_sensor_placement=True,
                    verbose=True)

scenario = ScenarioSimulator(scenario_config=config)

Set the simulation duration to two weeks and use 5-minute time steps for both the hydraulic simulation and the reporting:

In [None]:
params = {"simulation_duration": to_seconds(days=14),
          "hydraulic_time_step": to_seconds(minutes=5),
          "reporting_time_step": to_seconds(minutes=5)}
scenario.set_general_parameters(**params)

### 2. Add Leakages to the Scenario

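In this example, we build a scenario with two leakages, both in the second week: a small abrupt leakage (active from day 7 to day 8) and a large incipient leakage (starting on day 11, reaching its peak on day 12, and ending on day 13):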

In [None]:
leak1 = AbruptLeakage(link_id="p673", diameter=0.002,
                      start_time=to_seconds(days=7),
                      end_time=to_seconds(days=8))
scenario.add_leakage(leak1)

leak2 = IncipientLeakage(link_id="p100", diameter=0.1,
                         start_time=to_seconds(days=11),
                         end_time=to_seconds(days=13),
                         peak_time=to_seconds(days=12))
scenario.add_leakage(leak2)

Run the complete simulation:

In [None]:
scada_data = scenario.run_simulation(verbose=True)

### 3. Machine Learning based Leakage Detection

Prepare the simulation results for calibrating (i.e. creating) a Machine Learning based leakage detection method:

- Create a feature vector (pressure and flow readings at the sensors).
- Create ground-truth labels utilizing the [`time_points_to_one_hot_encoding()`](https://epyt-flow.readthedocs.io/en/stable/epyt_flow.html#epyt_flow.utils.time_points_to_one_hot_encoding) helper function.

In [None]:
# Concatenate pressure and flow readings into a single feature vector
X = np.concatenate((scada_data.get_data_pressures(), scada_data.get_data_flows()), axis=1)

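# Build ground-truth labels -- i.e. indicator of events.
# Event times are given in seconds; dividing by the reporting time step
# (equal to the hydraulic time step here) converts them to sample indices.
events_times = [int(t / params["reporting_time_step"])
                for t in scenario.get_events_active_time_points()]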
y = time_points_to_one_hot_encoding(events_times, total_length=X.shape[0])
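To make the label construction concrete, here is a minimal NumPy sketch of the idea behind the encoding. The helper `one_hot_from_time_points` is a hypothetical stand-in written for illustration only; it is not EPyT-Flow's implementation, and the toy event times are made up:

```python
import numpy as np


def one_hot_from_time_points(time_points, total_length):
    """Hypothetical stand-in: 1 at every listed sample index, 0 elsewhere."""
    y = np.zeros(total_length, dtype=int)
    y[np.asarray(time_points, dtype=int)] = 1
    return y


step = 300  # 5 minutes in seconds
active_seconds = [600, 900, 1200]  # toy event, active from minute 10 to minute 20
active_indices = [t // step for t in active_seconds]  # seconds -> sample indices

y_demo = one_hot_from_time_points(active_indices, total_length=6)
print(y_demo)  # [0 0 1 1 1 0]
```

The result is a binary indicator with one entry per sample, which is the shape needed to compare against the detector's alarms later on.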

Split the data into a training and a test set. The training data covers the first 2000 time steps, i.e. almost all of the fault-free first week, and the remaining data (containing the leakages in the second week) is the test data:

In [None]:
split_point = 2000
X_train, y_train = X[:split_point, :], y[:split_point]
X_test, y_test = X[split_point:, :], y[split_point:]
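As a sanity check on the split, the sample index of the one-week boundary can be derived from the time step. This is a small arithmetic sketch that assumes sample `k` corresponds to time `k * 300` seconds; the exact number of rows returned by the simulator may differ by one depending on whether the initial time step is reported:

```python
SECONDS_PER_DAY = 24 * 60 * 60
step_seconds = 5 * 60  # reporting time step used in this notebook

steps_per_day = SECONDS_PER_DAY // step_seconds   # samples per day
one_week_index = 7 * steps_per_day                # first sample of day 7
total_steps = 14 * steps_per_day                  # samples in two weeks

print(steps_per_day, one_week_index, total_steps)  # 288 2016 4032
```

Under this assumption, a split point of 2000 lies slightly before the first leakage, which starts on day 7 (index 2016), so the training data stays fault-free and a few fault-free samples end up at the start of the test set.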

#### Machine Learning based Event Detector


As a classic baseline, EPyT-Flow already implements a residual-based interpolation detection method called [`SensorInterpolationDetector`](https://epyt-flow.readthedocs.io/en/stable/epyt_flow.models.html#epyt_flow.models.sensor_interpolation_detector.SensorInterpolationDetector).

This method tries to predict the readings of a given sensor based on all other sensors: $f: \vec{x}_t\setminus\{i\} \mapsto (\vec{x}_t)_i$, where $\vec{x}_t$ refers to the sensor readings at time $t$, and $\vec{x}_t\setminus\{i\}$ denotes these sensor readings without the $i$-th sensor.
An alarm is raised (i.e. event detected) whenever the prediction and the observation of at least one sensor differ significantly:
$$
   \exists i:\; |f(\vec{x}_t\setminus\{i\}) - (\vec{x}_t)_i| > \theta_i
$$
where $\theta_i > 0$ denotes a sensor-specific threshold at which the difference is considered as significant.
For this, the detection method has to be calibrated (i.e. fitted) to a time window of (ideally event-free) sensor readings to determine suitable thresholds $\theta_i$ that do not raise an alarm when the network is in normal operation (i.e. no events present).
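The idea can be illustrated with a simplified sketch. The class below is not EPyT-Flow's implementation: it assumes a linear least-squares model for $f$ and takes the largest training residual as the threshold for each sensor. The class name and the synthetic data are invented for illustration:

```python
import numpy as np


class SimpleInterpolationDetector:
    """Simplified illustration of a residual-based detector (assumption:
    linear model per sensor, threshold = largest training residual)."""

    @staticmethod
    def _inputs(X, i):
        # Readings of all sensors except the i-th, plus a bias column
        others = np.delete(X, i, axis=1)
        return np.hstack([others, np.ones((X.shape[0], 1))])

    def fit(self, X):
        self.weights, self.thresholds = [], []
        for i in range(X.shape[1]):
            A = self._inputs(X, i)
            w, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
            residuals = np.abs(A @ w - X[:, i])
            self.weights.append(w)
            self.thresholds.append(residuals.max())  # sensor-specific threshold
        return self

    def apply(self, X):
        alarms = np.zeros(X.shape[0], dtype=bool)
        for i, (w, theta) in enumerate(zip(self.weights, self.thresholds)):
            residuals = np.abs(self._inputs(X, i) @ w - X[:, i])
            alarms |= residuals > theta  # alarm if any sensor deviates
        return np.flatnonzero(alarms)  # indices of suspicious time points


# Synthetic, strongly correlated "sensor" readings (made-up data)
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X_normal = np.hstack([base, 2 * base + 1, -base + 3])
X_normal = X_normal + 0.01 * rng.normal(size=X_normal.shape)

X_faulty = X_normal[:50].copy()
X_faulty[20:30, 1] -= 1.0  # simulated drop at sensor 1 during steps 20..29

detector_demo = SimpleInterpolationDetector().fit(X_normal)
alarm_indices = detector_demo.apply(X_faulty)
print(alarm_indices)  # should include the indices 20 to 29
```

Using the largest training residual as the threshold means the sketch raises no alarm on the calibration data itself, at the cost of being sensitive to outliers in that data.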

We use this event detector to detect leakages in our generated scenario.
We create and calibrate (i.e. fit) the leakage detector to the first week of simulated data:

In [None]:
detector = SensorInterpolationDetector()
detector.fit(X_train)

Apply the detector to the test data (i.e. second week of simulated data):

In [None]:
suspicious_time_points = detector.apply(X_test)
y_test_pred = time_points_to_one_hot_encoding(suspicious_time_points, X_test.shape[0])

### 4. Evaluation

In order to evaluate the performance of the leakage detector, we could either compute the [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) or plot the raised alarms together with the ground truth labels.
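For the first option, the entries of the confusion matrix can be counted directly from two binary indicator arrays. The following sketch uses made-up toy labels; `confusion_counts` is a hypothetical helper, not part of EPyT-Flow. In the notebook, `y_test` and `y_test_pred` would take the place of the toy arrays:

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Hypothetical helper: count TP, FP, FN, TN for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # event present, alarm raised
    fp = int(np.sum(~y_true & y_pred))   # no event, alarm raised
    fn = int(np.sum(y_true & ~y_pred))   # event present, no alarm
    tn = int(np.sum(~y_true & ~y_pred))  # no event, no alarm
    return tp, fp, fn, tn


y_true_demo = [0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true_demo, y_pred_demo)
print(tp, fp, fn, tn)  # 2 1 1 2
```

Because leakages are active for only part of the test period, counts such as these are usually more informative than plain accuracy.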

Here, we plot the event (i.e. leakage) presence over time together with the alarms raised by the detector:

In [None]:
plt.figure()
plt.plot(list(range(len(y_test))), y_test, color="red", label="Ground truth")
plt.bar(list(range(len(y_test_pred))), y_test_pred, label="Raised alarm")
plt.legend()
plt.ylabel("Leakage indicator")
plt.yticks([0, 1], ["Inactive", "Active"])
plt.xlabel("Time (5min steps)")
plt.show()

We observe that the small abrupt leakage is not detected, while the large incipient leakage is detected -- only a single false alarm is raised.

**Note:** More advanced algorithms & methods are likely to show a better detection performance.

### Close the Simulation

Do not forget to close the simulation:

In [None]:
scenario.close()