# Detection algorithm
## First thoughts

Many variables that an ideal detection mechanism would use are missing, such as the outside temperature, the return (backflow) temperature, and the times of the day or night shutdown. <br>
Furthermore, it is important to consider whether false positives or false negatives are more harmful, as no algorithm is likely to be 100% accurate. <br><br>
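
To make the trade-off between false positives and false negatives concrete, here is a minimal sketch on synthetic data. It flags points that deviate from a rolling mean by more than `k` rolling standard deviations and counts both error types for several values of `k`. The synthetic temperatures, the injected faults, the helper `count_errors` and the chosen `k` values are assumptions for illustration only, not results from the real measurements.

```python
import numpy as np
import pandas as pd

def count_errors(values, is_fault, k, window=24):
    """Flag points more than k rolling standard deviations away from the
    rolling mean, then count false positives and false negatives."""
    s = pd.Series(values)
    mean = s.rolling(window, min_periods=1).mean()
    std = s.rolling(window, min_periods=1).std()
    flagged = ((s - mean).abs() > k * std).to_numpy()
    false_pos = int(np.sum(flagged & ~is_fault))
    false_neg = int(np.sum(~flagged & is_fault))
    return false_pos, false_neg

# Synthetic data (assumption): noisy 60 degree signal with three injected drops
rng = np.random.default_rng(0)
temps = 60 + rng.normal(0, 0.5, size=500)
is_fault = np.zeros(500, dtype=bool)
is_fault[[100, 250, 400]] = True
temps[is_fault] -= 3.0

errors_by_k = {k: count_errors(temps, is_fault, k) for k in (1.5, 2.0, 3.0)}
for k, (fp, fn) in errors_by_k.items():
    print(f"k={k}: false positives={fp}, false negatives={fn}")
```

Raising `k` can only shrink the set of flagged points, so false positives never increase and false negatives never decrease. Which side to favour depends on whether a missed fault or an unnecessary alert is more costly.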

I will start with a simple detection mechanism for domestic hot water, which can then be extended into a detection model for forward flow temperatures at a later stage. <br><br>

For real-time detection, I would suggest training a time series model such as ARIMAX and comparing the expected values with the actual values.
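
The idea of comparing expected with actual values can be sketched without committing to a specific library. The example below is a simplified stand-in for ARIMAX, not ARIMAX itself: an ARX model (autoregressive lags plus one exogenous input) fitted by ordinary least squares with NumPy. The synthetic series, the exogenous variable `outside_temp`, the lag order `p`, the train/test split and the residual threshold are all assumptions for illustration.

```python
import numpy as np

def design_matrix(y, x, p):
    """Rows [1, y[t-1], ..., y[t-p], x[t]] for t = p .. len(y)-1."""
    return np.array([
        np.concatenate(([1.0], y[t - p:t][::-1], [x[t]]))
        for t in range(p, len(y))
    ])

# Synthetic data (assumption): temperature depends on its last value
# and on a daily outside-temperature cycle sampled every 15 minutes.
rng = np.random.default_rng(1)
n = 600
outside_temp = 5 + 5 * np.sin(np.arange(n) * 2 * np.pi / 96)
y = np.empty(n)
y[0] = 57.5
for t in range(1, n):
    y[t] = 12.0 + 0.8 * y[t - 1] - 0.1 * outside_temp[t] + rng.normal(0, 0.2)

p, split = 2, 400

# Fit on the first part of the series, which is assumed to be fault-free.
X_train = design_matrix(y[:split], outside_temp[:split], p)
coef, *_ = np.linalg.lstsq(X_train, y[p:split], rcond=None)
train_resid = y[p:split] - X_train @ coef
threshold = 4 * train_resid.std()  # assumed tolerance band

# Inject one artificial fault into the held-out part.
y_obs = y.copy()
y_obs[500] -= 4.0

# One-step-ahead expected values for t = split .. n-1, compared with actuals.
X_test = design_matrix(y_obs[split - p:], outside_temp[split - p:], p)
expected = X_test @ coef
resid = y_obs[split:] - expected
flagged = np.flatnonzero(np.abs(resid) > threshold) + split
print("flagged indices:", flagged)
```

Because the prediction uses observed lags, a single faulty reading can also distort the expected value for the following steps, so neighbouring points may be flagged as well.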


In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime 

In [10]:
df = pd.read_csv("combined_data.csv", parse_dates=["timestamp"])
df.set_index("timestamp", inplace=True)

df.head()

timestamp,avg,min_v,max_v,nighttime,time_diff_min,T_diff,type,ID
2022-12-07 01:15:00+00:00,65.7,64.1,67.2,True,15,3.1,forward,S3C1
2022-12-07 01:15:00+00:00,65.2,63.5,66.7,True,15,3.2,forward,S3C1
2022-12-07 01:30:00+00:00,64.7,62.5,66.8,True,15,4.3,forward,S3C1
2022-12-07 01:30:00+00:00,65.3,63.1,67.2,True,15,4.1,forward,S3C1
2022-12-07 01:45:00+00:00,64.3,62.5,66.6,True,15,4.1,forward,S3C1


In [11]:
def find_anomaly_domestic(dataset):
    # Quick-and-dirty anomaly detector; for now it only handles domestic hot water data.
    df_domestic = dataset[dataset['type'] == 'domestic'].copy()
    # Rolling mean and standard deviation over the last 24 rows
    # (about 6 hours if rows are 15 minutes apart).
    rolling_mean = df_domestic['avg'].rolling(window=24, min_periods=1).mean()
    rolling_std = df_domestic['avg'].rolling(window=24, min_periods=1).std()
    # Flag a row when it deviates from the rolling mean by more than 2 standard deviations.
    df_domestic['anomaly'] = np.abs(df_domestic['avg'] - rolling_mean) > 2 * rolling_std
    anomalies = {}
    in_anomaly = False
    period_count = 1
    for idx, row in df_domestic.iterrows():
        if row['anomaly'] and not in_anomaly:
            start = idx
            in_anomaly = True
        elif not row['anomaly'] and in_anomaly:
            end = prev_idx
            anomalies[f"Period {period_count}"] = [
                start.strftime('%A'), start.strftime('%H:%M'),
                end.strftime('%A'), end.strftime('%H:%M')
            ]
            period_count += 1
            in_anomaly = False
        prev_idx = idx
        
    if in_anomaly:
        end = prev_idx
        anomalies[f"Period {period_count}"] = [
            start.strftime('%A'), start.strftime('%H:%M'),
            end.strftime('%A'), end.strftime('%H:%M')
        ]
    return anomalies
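
The function above reports only weekday names and clock times, which become ambiguous once the data span more than one week. As a possible alternative, the sketch below groups consecutive flagged rows into periods and keeps the full timestamps. The helper name `anomaly_periods` and the small example series are hypothetical and not part of the original analysis.

```python
import pandas as pd

def anomaly_periods(flags):
    """Group consecutive True values of a boolean Series into
    (start, end) pairs taken from the Series index."""
    flags = flags.astype(bool)
    # A new run starts wherever the flag differs from the previous row.
    run_id = (flags != flags.shift(fill_value=False)).cumsum()
    flagged = flags[flags]
    # Group by position-based run ids so duplicate timestamps cause no alignment issues.
    keys = run_id[flags].to_numpy()
    return [(run.index[0], run.index[-1]) for _, run in flagged.groupby(keys)]

# Hypothetical example: eight 15-minute readings with two flagged runs
idx = pd.date_range("2022-12-07 09:00", periods=8, freq="15min", tz="UTC")
flags = pd.Series([False, True, True, False, False, True, False, False], index=idx)

periods = anomaly_periods(flags)
for start, end in periods:
    print(start.isoformat(), "->", end.isoformat())
```

With the notebook's data, the same helper could be applied to the `anomaly` column of the filtered domestic frame, assuming the rows are sorted by timestamp.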

In [12]:
find_anomaly_domestic(df)

{'Period 1': ['Wednesday', '09:30', 'Wednesday', '09:30'],
 'Period 2': ['Wednesday', '19:00', 'Wednesday', '19:00'],
 'Period 3': ['Friday', '08:45', 'Friday', '08:45'],
 'Period 4': ['Friday', '13:15', 'Friday', '13:15'],
 'Period 5': ['Friday', '16:15', 'Friday', '16:15'],
 'Period 6': ['Friday', '21:45', 'Friday', '21:45'],
 'Period 7': ['Saturday', '07:15', 'Saturday', '07:15'],
 'Period 8': ['Saturday', '15:30', 'Saturday', '15:45'],
 'Period 9': ['Saturday', '19:45', 'Saturday', '19:45'],
 'Period 10': ['Sunday', '05:45', 'Sunday', '05:45'],
 'Period 11': ['Monday', '07:00', 'Monday', '07:00'],
 'Period 12': ['Monday', '13:15', 'Monday', '13:15'],
 'Period 13': ['Monday', '18:15', 'Monday', '18:15'],
 'Period 14': ['Tuesday', '04:30', 'Tuesday', '04:30'],
 'Period 15': ['Tuesday', '13:45', 'Tuesday', '13:45'],
 'Period 16': ['Wednesday', '05:30', 'Wednesday', '05:45'],
 'Period 17': ['Wednesday', '15:30', 'Wednesday', '15:30'],
 'Period 18': ['Thursday', '11:30', 'Thursday', '11