# Overview

This notebook is an optimized version of the notebook "start-transfers-default.ipynb". Read that notebook first to understand the script below.

In this first block, we fetch the data, train the model, run the prediction, detect the outliers, and calculate each outlier's distance from the model interval. The following sections demonstrate different ways of detecting anomalies among those outliers.

The main change is the interval_width parameter, set here to 0.55 to narrow the model's uncertainty interval (Prophet's default is 0.80).
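
To give an intuition for how much a smaller interval_width narrows the band, the sketch below compares the half-width of a central interval at 0.80 and 0.55. It assumes normally distributed errors, which is only an illustration: Prophet estimates its interval from simulated samples, and the helper name `half_width_in_sigmas` is hypothetical.

```python
from statistics import NormalDist

def half_width_in_sigmas(interval_width: float) -> float:
    """Half-width of a central interval, in standard deviations,
    for a standard normal distribution (illustrative assumption)."""
    upper_quantile = (1.0 + interval_width) / 2.0
    return NormalDist().inv_cdf(upper_quantile)

default_band = half_width_in_sigmas(0.80)  # Prophet's default interval_width
narrow_band = half_width_in_sigmas(0.55)   # value used in this notebook

# A narrower band means more observations fall outside it.
print(f"0.80 -> +/-{default_band:.2f} sigma, 0.55 -> +/-{narrow_band:.2f} sigma")
```

Under this assumption the band shrinks from roughly 1.28 to roughly 0.76 standard deviations on each side, so more points end up flagged as outliers.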

> AS A REMINDER: WE WANT TO DETECT THE ANOMALY ON FEBRUARY 16th AT 5:30 AM !!!

In [3]:
import pandas as pd
from pandas.core.frame import DataFrame
import plotly.graph_objects as go
import plotly.express as px
from prophet import Prophet

def categorized(anomalies):
    anomalies.loc[anomalies['distance'] < .25, 'category'] = 'warning'
    anomalies.loc[(anomalies['distance'] >= .25) & (anomalies['distance'] < .50), 'category'] = 'minor'
    anomalies.loc[(anomalies['distance'] >= .50) & (anomalies['distance'] < .75), 'category'] = 'major'
    anomalies.loc[anomalies['distance'] >= .75, 'category'] = 'critical'

def showAnomalies(forecasted, anomalies):
    #render the anomalies
    fig = go.Figure()
    #display model
    fig.add_trace(go.Scatter(
        x=forecasted.ds,
        y=forecasted.yhat,
        line_color='rgba(51, 81, 184, 0.8)',
        name="yhat"))
    fig.add_trace(go.Scatter(
        x=forecasted.ds,
        y=forecasted.yhat_upper,
        line_color='rgba(51, 81, 184, 0.5)',
        customdata=forecasted.yhat_upper - forecasted.yhat,
        hovertemplate='yhat_upper:%{y:.3f}<br>ref:%{customdata:.2f}',
        name="yhat_upper"))
    fig.add_trace(go.Scatter(
        x=forecasted.ds,
        y=forecasted.yhat_lower,
        customdata=forecasted.yhat - forecasted.yhat_lower,
        hovertemplate='yhat_lower:%{y:.3f}<br>ref:%{customdata:.2f}',
        fill='tonexty',
        line_color='rgba(51, 81, 184, 0.5)',
        name="yhat_lower"))
    #display fact
    fig.add_trace(go.Scatter(
        x=forecasted.ds,
        y=forecasted.fact,
        line_color='rgba(148, 242, 119, 1)',
        name="fact"))
    #display anomalies
    if(anomalies.size > 0):
        fig_anomalies = px.scatter(anomalies,
            x="ds",
            y="fact",
            color="category",
            hover_data={
                'distance':':.2f'},
            color_discrete_map={
                    "warning": "#008cd1",
                    "minor": "yellow",
                    "major": "orange",
                    "critical": "red"})
        fig.add_traces(fig_anomalies.data)
    fig.update_layout(hovermode='x unified')
    fig.show()

#load the data for the model
df = pd.read_csv('../data/start-tranfers.csv')

#Configure prophet engine
prophet = Prophet(interval_width=0.55)
#train
model = prophet.fit(df)
#run
forecasted = model.predict(df)
forecasted['fact'] = df['y'].reset_index(drop = True)

#outlier detection
forecasted['anomaly'] = 0
forecasted.loc[forecasted['fact'] > forecasted['yhat_upper'], 'anomaly'] = 1
forecasted.loc[forecasted['fact'] < forecasted['yhat_lower'], 'anomaly'] = -1
#calculate the distance
forecasted.loc[forecasted['anomaly'] ==1, 'distance'] = \
    (forecasted['fact'] - forecasted['yhat_upper']) / (forecasted['yhat_upper'] - forecasted['yhat'])
forecasted.loc[forecasted['anomaly'] ==-1, 'distance'] = \
    (forecasted['yhat_lower'] - forecasted['fact']) / (forecasted['yhat'] - forecasted['yhat_lower'])



INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.


Initial log joint probability = -20.831
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      67       1604.12   9.79998e-05       67.6619   1.397e-06       0.001      129  LS failed, Hessian reset 
      99       1604.14   8.56352e-05       74.3953           1           1      170   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     161       1604.15   6.35551e-09        73.759     0.07329     0.07329      256   
Optimization terminated normally: 
  Convergence detected: absolute parameter change was below tolerance
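
The outlier rule and the distance computed in the cell above can be read as a per-point function: the distance is how far the observed value overshoots the band, divided by the half-width of the band on that side. The sketch below restates that rule in plain Python for a single point; the function name `band_distance` is hypothetical and the numbers are made up for illustration.

```python
def band_distance(fact, yhat, yhat_lower, yhat_upper):
    """Return (anomaly, distance) for a single observation.

    anomaly is 1 above the band, -1 below it, 0 inside it.
    distance is the overshoot divided by the half-width of the band
    on the side that was crossed, or None inside the band.
    """
    if fact > yhat_upper:
        return 1, (fact - yhat_upper) / (yhat_upper - yhat)
    if fact < yhat_lower:
        return -1, (yhat_lower - fact) / (yhat - yhat_lower)
    return 0, None

# Made-up example: forecast 100, band [80, 120].
below = band_distance(70, 100, 80, 120)    # 10 under the band, half-width 20
above = band_distance(130, 100, 80, 120)   # 10 over the band, half-width 20
inside = band_distance(90, 100, 80, 120)   # within the band
print(below, above, inside)
```

A distance of 0.5 therefore means the point lies half a half-width outside the band, and the value has no upper bound.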


# Detect anomaly based on the distance

We consider all outliers below the model and score them according to their distance. I keep only the critical anomalies.

Conclusion: I detect a lot of anomalies, but most of them are not relevant. There are many false positives, and the main issue is only detected at 10:00 am, roughly four and a half hours after the 5:30 am incident.
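
The scoring used by categorized is a step function over three thresholds (0.25, 0.50, 0.75). As a minimal sketch, the same mapping can be written as a sorted-threshold lookup; the names `THRESHOLDS`, `LABELS` and `category` are hypothetical and not part of the notebook.

```python
from bisect import bisect_right

THRESHOLDS = [0.25, 0.50, 0.75]
LABELS = ["warning", "minor", "major", "critical"]

def category(distance: float) -> str:
    """Map a distance to its label; each threshold is an inclusive lower bound."""
    return LABELS[bisect_right(THRESHOLDS, distance)]

# Boundary values land in the upper bucket, as in categorized().
labels = [category(d) for d in (0.10, 0.25, 0.60, 0.75, 2.0)]
print(labels)
```

Because the distance is unbounded, any outlier at least three quarters of a half-width outside the band is critical, however far beyond that it lies.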

In [6]:
#keep only the outliers below the model
anomalies = forecasted[forecasted['anomaly'] ==-1]

categorized(anomalies)
anomalies = anomalies[anomalies['category'] == "critical"]
showAnomalies(forecasted, anomalies)





A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
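
The message above is pandas' SettingWithCopyWarning text. It appears because categorized assigns a new column on a frame obtained by filtering forecasted. One common way to avoid it, sketched here on made-up data, is to take an explicit copy of the filtered rows before assigning.

```python
import pandas as pd

# Made-up stand-in for the notebook's forecasted frame.
forecasted = pd.DataFrame({
    "anomaly": [0, -1, -1, 1],
    "distance": [float("nan"), 0.3, 0.9, 0.4],
})

# .copy() makes the filtered frame independent of forecasted,
# so the assignment below is unambiguous.
anomalies = forecasted[forecasted["anomaly"] == -1].copy()
anomalies.loc[anomalies["distance"] >= 0.75, "category"] = "critical"
print(anomalies)
```

The original frame is left untouched; only the copy gains the category column.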






# Detect anomaly based on the distance sum for adjacent outliers

We consider all outliers below the model and sum the distances of adjacent outliers.

Conclusion: we again have a lot of false positives, but we detect the main anomaly at 8:30 am.
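
To make the idea of summing over adjacent outliers concrete, here is a minimal sketch on made-up data. It labels each run of consecutive identical anomaly values with a run id, accumulates the distance within each run, and caps the result at 1. This sketch groups on the run id alone, which is an assumption: adding a near-unique column to the group keys would put almost every row in its own group and nothing would accumulate.

```python
import pandas as pd

toy = pd.DataFrame({
    "anomaly":  [0, -1, -1, -1, 0, -1],
    "distance": [float("nan"), 0.25, 0.5, 0.5, float("nan"), 0.25],
})

# A new run starts whenever the anomaly flag differs from the previous row.
toy["crossing"] = (toy["anomaly"] != toy["anomaly"].shift()).cumsum()
# Running total of the distance inside each run.
toy["cumul_distance"] = toy.groupby("crossing")["distance"].cumsum()
# Cap the accumulated score at 1.
toy["capped"] = toy["cumul_distance"].clip(upper=1)
print(toy)
```

In this toy series the three-point run reaches the cap on its third point, while the isolated outlier at the end keeps its own small distance.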

In [9]:
#group adjacent outliers into runs and sum the distance within each run
forecasted['crossing'] = (forecasted.anomaly != forecasted.anomaly.shift()).cumsum()
forecasted['cumul_distance'] = forecasted.groupby(['crossing','anomaly', 'fact'])['distance'].cumsum()
forecasted.loc[forecasted['cumul_distance'] >= 1, 'distance'] = 1
forecasted.loc[forecasted['cumul_distance'] < 1, 'distance'] = forecasted['cumul_distance']

#keep only the outliers below the model
anomalies = forecasted[forecasted['anomaly'] ==-1]

categorized(anomalies)
anomalies = anomalies[anomalies['category'] == "critical"]
showAnomalies(forecasted, anomalies)



# Detect anomaly based on a step multiplier for adjacent outliers

We consider all outliers below the model and apply a multiplier that grows with each adjacent outlier.

Conclusion: the false positives disappear because isolated (non-adjacent) outliers are no longer considered, and we detect the main incident at 9:45 am.
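
The sketch below shows on made-up data why isolated outliers drop out. The multiplier is the position of a point inside its run times step, and positions start at 0, so the first outlier of every run (and therefore any isolated one) is scaled to a distance of 0. The toy frame and the `scaled` column are hypothetical.

```python
import pandas as pd

step = 0.1
nan = float("nan")
toy = pd.DataFrame({
    "anomaly":  [0, -1, 0, -1, -1, -1, 0],
    "distance": [nan, 5.0, nan, 5.0, 5.0, 5.0, nan],
})

# Label runs of consecutive identical anomaly values.
toy["crossing"] = (toy["anomaly"] != toy["anomaly"].shift()).cumsum()
# Position inside the run (0, 1, 2, ...) times the step.
toy["score_multiplicator"] = toy.groupby("crossing").cumcount() * step
# Scale the distance by the multiplier.
toy["scaled"] = toy["distance"] * toy["score_multiplicator"]
print(toy)
```

The isolated outlier and the first point of the longer run both end up with a scaled distance of 0, which falls in the warning category that the cell below filters out.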

In [14]:
step = 0.1
#number the points inside each run of adjacent outliers and scale the distance by that position
forecasted['crossing'] = (forecasted.anomaly != forecasted.anomaly.shift()).cumsum()
forecasted['score_multiplicator'] = forecasted.groupby(['crossing','anomaly']).cumcount() * step
forecasted.loc[forecasted['anomaly'] == 0,'score_multiplicator'] = 0
forecasted.loc[forecasted['anomaly'] != 0, 'distance'] = forecasted['distance'] * forecasted['score_multiplicator']

#keep only the outliers below the model
anomalies = forecasted[forecasted['anomaly'] ==-1]

categorized(anomalies)
anomalies = anomalies[(anomalies['category'] != 'warning')]
showAnomalies(forecasted, anomalies)