<a id='toc'></a>
# Table of Contents:
1. [Make Graph](#makegraph)
2. [Read in Yearly Prediction and Scale Back to Original Interval](#readscale)
3. [Exploratory Data Analysis](#eda) <br>
    3.1 [Data Wrangling](#wrangling) <br>
    3.2 [Calculating Per-Node Error](#node-error) <br>
    3.3 [Calculating Per-Pipe Error](#pipe-error) <br>
    3.4 [Leakage Labelset](#leaks) <br>
    3.5 [Dataset Pre-Processing](#pre-process) <br>
5. [Boem et al. Residual Analysis](#residual)

# Leak Detection

> Garðar Örn Garðarsson <br>
Integrated Machine Learning Systems 20-21 <br>
University College London

<a id='makegraph'></a>
*Back to [Table of Contents](#toc)*

## 1. Make Graph

Load the `L-TOWN` EPANET model with `epynet`, solve it for a single timestep and convert it to a `networkx` graph.

In [2]:
import os
import yaml
import time
import torch
import epynet
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from utils.epanet_loader import get_nx_graph
from utils.epanet_simulator import epanetSimulator
from utils.data_loader import battledimLoader, dataCleaner, dataGenerator, embedSignalOnGraph, rescaleSignal
from modules.torch_gnn import ChebNet
from utils.visualisation import visualise

# Runtime configuration
path_to_wdn     = './data/L-TOWN.inp'
path_to_data    = './data/l-town-data/'
weight_mode     = 'pipe_length'
self_loops      = True
scaling         = 'minmax'
figsize         = (50,16)
print_out_rate  = 1               
model_name      = 'l-town-chebnet-' + weight_mode +'-' + scaling + '{}'.format('-self_loop' if self_loops else '')
last_model_path = './studies/models/' + model_name + '-1.pt'
last_log_path   = './studies/logs/'   + model_name + '-1.csv' 

# Import the .inp file using the EPYNET library
wdn = epynet.Network(path_to_wdn)

# Solve hydraulic model for a single timestep
wdn.solve()

# Convert the network to a networkx graph with a custom function based on:
# https://github.com/BME-SmartLab/GraphConvWat
G, pos, head = get_nx_graph(wdn, weight_mode=weight_mode, get_head=True)
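`get_nx_graph` is a custom helper from `utils.epanet_loader` and is not shown here. As a rough sketch of the idea only, assuming each pipe is given as a hypothetical `(pipe_id, start_node, end_node, length)` tuple, that `weight_mode='pipe_length'` stores the raw pipe length as the edge weight, and that self-loops get unit weight, a weighted graph could be built like this:

```python
import networkx as nx

def build_graph(pipes, self_loops=True):
    """Hypothetical helper: build an undirected weighted graph from
    (pipe_id, start_node, end_node, length) tuples.

    Using the raw pipe length as the weight is an assumption about what
    weight_mode='pipe_length' does in get_nx_graph.
    """
    G = nx.Graph()
    for pipe_id, start, end, length in pipes:
        G.add_edge(start, end, weight=length, name=pipe_id)
    if self_loops:
        # Assumed convention: every node gets a self-loop with unit weight
        for node in list(G.nodes):
            G.add_edge(node, node, weight=1.0)
    return G

# Toy network: three pipes around junction n2 (made-up lengths in metres)
pipes = [('p1', 'n1', 'n2', 120.0), ('p2', 'n2', 'n3', 80.0), ('p3', 'n2', 'n4', 45.5)]
G = build_graph(pipes)
print(G.number_of_nodes(), G.number_of_edges())  # 4 nodes; 3 pipes + 4 self-loops = 7 edges
```

Note that `nx.Graph` keeps only one edge per node pair, so parallel pipes would need a `MultiGraph` or a merging rule.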

<a id='readscale'></a>
*Back to [Table of Contents](#toc)*

## 2. Read in Yearly Prediction and Scale Back to Original Interval

In [3]:
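def read_prediction(filename='predictions.csv', scale=1, bias=0, start_date='2018-01-01 00:00:00'):
    """Load node predictions and rescale them to the original interval.

    Columns are renamed from 0-based indices to node IDs (n1, n2, ...),
    values are mapped back with df*scale + bias, and a 5-minute
    datetime index starting at start_date is attached.
    """
    df = pd.read_csv(filename, index_col='Unnamed: 0')
    df.columns = ['n{}'.format(int(node) + 1) for node in df.columns]
    df = df * scale + bias
    df.index = pd.date_range(start=start_date,
                             periods=len(df),
                             freq='5min')
    return df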
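`read_prediction` undoes the scaling with the affine map `df*scale + bias`. Assuming the model was trained on signals scaled with a single global minimum and maximum (one plausible reading of `scaling = 'minmax'`), the matching arguments are `scale = max - min` and `bias = min`. A small check on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up pressure heads (m) for two nodes, in the original interval
original = pd.DataFrame({'n1': [50.0, 55.0, 60.0], 'n2': [40.0, 45.0, 52.0]})

# Assumed min-max scaling with one global minimum and maximum
x_min, x_max = original.values.min(), original.values.max()
scaled = (original - x_min) / (x_max - x_min)

# Inverse transform with the same affine map as read_prediction: df*scale + bias
scale, bias = x_max - x_min, x_min
restored = scaled * scale + bias

print(np.allclose(restored.values, original.values))  # True
```

If the scaling was done per node instead, `scale` and `bias` would have to be per-column Series rather than scalars.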

## 2019 Detections

In [41]:
faults_19 = pd.read_csv('InceptionTime_Predictions.csv', index_col='Unnamed: 0')

In [42]:
# Every label with a predicted leak probability above 20% is classified as a leak (1),
# every other label as no leak (0)
faults_19 = (faults_19 > 0.2).astype('int')
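Thresholding turns each predicted leak probability into a binary label. A toy example with made-up probabilities for two pipes shows the effect of the 0.2 cut-off; note that a probability of exactly 0.2 is not counted as a leak, because the comparison is strict:

```python
import pandas as pd

# Made-up leak probabilities for two pipes over four 5-minute steps
probs = pd.DataFrame({'p1': [0.05, 0.25, 0.9, 0.1], 'p2': [0.2, 0.19, 0.0, 1.0]},
                     index=pd.date_range('2019-01-01', periods=4, freq='5min'))

labels = (probs > 0.2).astype('int')  # strictly above 20% counts as a leak
print(labels)
```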

In [43]:
# Rising edges: +1 where a leak starts, -1 where it ends; the first row keeps its own label
faults_19 = faults_19.diff(periods=1).fillna(faults_19.iloc[0].astype('int'))
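Differencing each binary column marks the start of a leak with `+1` and its end with `-1`. Filling the first row with its own label means a leak that is already active at the first timestamp also counts as an onset. A sketch on made-up labels for two pipes:

```python
import pandas as pd

idx = pd.date_range('2019-01-01', periods=6, freq='5min')
# Made-up binary labels: p1 leaks from the first step, p2 leaks twice
labels = pd.DataFrame({'p1': [1, 1, 0, 0, 0, 0], 'p2': [0, 1, 1, 0, 1, 1]}, index=idx)

# +1 at onsets, -1 at repairs; the NaN in the first row is replaced by that row's label
edges = labels.diff(periods=1).fillna(labels.iloc[0].astype('int'))

# Onset timestamps per pipe: only strictly positive differences
onsets = {pipe: edges[pipe].index[edges[pipe] > 0] for pipe in edges}
# onsets['p1'] holds 00:00; onsets['p2'] holds 00:05 and 00:20
```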

In [44]:
# Collect the onset timestamps of every pipe with at least one detected leak
detections_19 = {}

for pipe in faults_19:
    timestamp = faults_19[pipe].index[faults_19[pipe] > 0]
    if not timestamp.empty:
        detections_19[pipe] = timestamp

In [45]:
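# Write the detections in the '#linkID, startTime' format (seconds cut off);
# the with block closes the file automatically
with open('inceptionTime_results.txt', 'w') as f:
    f.write('#linkID, startTime\n')
    for key, timestamps in detections_19.items():
        for val in timestamps:
            f.write(key + ', ' + str(val)[:-3] + '\n')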
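`str(val)[:-3]` relies on a `Timestamp` printing as `YYYY-MM-DD HH:MM:SS` and cuts off the seconds, leaving the `YYYY-MM-DD HH:MM` form written after each link ID. `strftime` states that format explicitly and does not depend on the string representation. A small comparison, assuming timestamps without time zone or sub-second parts (the link ID `p257` is made up):

```python
import pandas as pd

ts = pd.Timestamp('2019-03-07 14:35:00')
via_slice = str(ts)[:-3]                      # drop the trailing ':00'
via_strftime = ts.strftime('%Y-%m-%d %H:%M')  # explicit minute-resolution format

# One row of the '#linkID, startTime' results file
line = 'p257' + ', ' + via_strftime + '\n'
```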

In [46]:
# One row per detection: start time (without seconds) and pipe ID
results = []
for key, timestamps in detections_19.items():
    for val in timestamps:
        results.append([str(val)[:-3], key])

results = pd.DataFrame(results)
results.set_index(0, drop=True, inplace=True)

In [26]:
results.to_csv('results_data.csv')

In [27]:
len(detections_19.keys())

14

Number of pipes with at least one detection for other values of $\alpha$:

- 124 for $\alpha = 2$
- 691 for $\alpha = 1.5$
- 783 for $\alpha = 1.0$

In [28]:
detections = {}

In [29]:
detections = pd.read_csv('results_data.csv', index_col = '0').sort_index()

In [30]:
detections.index = pd.to_datetime(detections.index)

In [31]:
detections['leakTimeStamp'] = detections.index

In [32]:
detections = detections.resample('d').first()

In [33]:
detections = detections.replace(to_replace='None', value=np.nan).dropna()

In [34]:
detections.set_index('leakTimeStamp', drop=True, inplace=True)
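The cells above keep at most one detection per day: the timestamp is copied into a column, `resample('d').first()` picks the earliest row in each calendar day, days without a detection are dropped, and the original timestamp becomes the index again. A compact sketch of the same steps on made-up detections (pipe IDs in column `'1'`, as in `results_data.csv`):

```python
import pandas as pd

# Made-up detections: two on 1 January, none on 2 January, one on 3 January
detections = pd.DataFrame(
    {'1': ['p10', 'p42', 'p7']},
    index=pd.to_datetime(['2019-01-01 08:05', '2019-01-01 13:40', '2019-01-03 02:15']),
)
detections['leakTimeStamp'] = detections.index

daily = detections.resample('D').first()  # earliest detection per calendar day
daily = daily.dropna()                    # 2 January has no detection and is dropped
daily = daily.set_index('leakTimeStamp')  # restore the exact detection time as index
```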

In [35]:
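# Write one detection per day; the loop variable is not called `time`
# so that it does not shadow the `time` module imported at the top
with open('results_data.txt', 'w') as f:
    f.write('#linkID, startTime\n')
    for timestamp, pipe in detections['1'].to_dict().items():
        f.write(pipe + ', ' + str(timestamp)[:-3] + '\n')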

In [36]:
detections.shape

(365, 1)

In [37]:
import numpy as np
import pandas as pd

In [38]:
# Random baseline: one random pipe ID per day (the upper bound is exclusive, so IDs 1 to 904)
gibberish_detections = np.random.randint(1, 905, 365)

In [39]:
dateRange = pd.date_range("2019-01-01 00:00", periods = 365, freq='D')

In [40]:
with open('random_results.txt', 'w') as f:
    f.write('#linkID, startTime\n')
    for pipe, timestamp in zip(gibberish_detections, dateRange):
        f.write('p' + str(pipe) + ', ' + str(timestamp)[:-3] + '\n')
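The random baseline is not reproducible as written, since no seed is set. A seeded variant using NumPy's `Generator` API is sketched below; the seed value is arbitrary, and the bounds are copied from the `randint` call (the upper bound of `Generator.integers` is also exclusive by default, so the largest possible ID is 904):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)       # arbitrary seed, for reproducibility only
pipe_ids = rng.integers(1, 905, size=365)  # same bounds as randint(1, 905, 365)
dates = pd.date_range('2019-01-01 00:00', periods=365, freq='D')

# Build the '#linkID, startTime' rows in memory; write them with f.writelines(lines)
lines = ['#linkID, startTime\n']
lines += ['p{}, {}\n'.format(pipe, day.strftime('%Y-%m-%d %H:%M'))
          for pipe, day in zip(pipe_ids, dates)]
```

Re-running with the same seed yields the same 365 pipe IDs, so the baseline score can be reproduced.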

For comparison, `detections.shape` for other values of $\alpha$:

- `(220, 1)` for $\alpha = 1.5$
- `(317, 1)` for $\alpha = 1.0$