# Anomaly Detection

# Test with streaming data

In the training stage we fitted an Isolation Forest on our data and saved the model to a `model.sav` file in the `/data` folder in Orchest. As mentioned in the [Orchest docs](https://docs.orchest.io/en/latest/index.html), the `/data` directory is accessible by all pipelines, and jobs only snapshot the project directory, not the data directory. Keeping the project directory small therefore keeps the job snapshots small.
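The save/load round trip between the training notebook and this one can be sketched with `pickle` as below. This is a minimal illustration, not the training notebook's code: the synthetic training data is hypothetical, and a temporary directory stands in for Orchest's `/data` folder so the snippet runs outside Orchest.

```python
# Sketch of saving a fitted IsolationForest with pickle and loading it back.
# Assumptions: synthetic training data, and a temp dir instead of /data.
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 1))  # hypothetical data

# Training stage: fit the model and pickle it to disk.
model = IsolationForest(random_state=0).fit(X_train)
data_dir = tempfile.mkdtemp()  # hypothetical stand-in for /data
path = os.path.join(data_dir, "model.sav")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Test stage: load the pickled model back from the same location.
with open(path, "rb") as f:
    loaded = pickle.load(f)

# The loaded model gives the same labels as the original one.
X_new = np.array([[0.1], [5.0]])
print(loaded.predict(X_new), model.predict(X_new))
```

Because `/data` is shared across pipelines but excluded from job snapshots, storing the pickle there keeps the model available to this step without inflating each job.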

Here we get the new data from Clarify and use the model's `predict` method to decide whether each sample is an outlier.

We will also plot graphs similar to those in the training-stage notebook.
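Checking whether one particular sample is an outlier needs a little care with shapes: scikit-learn's `predict` expects a 2-D input of shape `(n_samples, n_features)`, which is also why the cells below pass `df[['x']]` rather than `df['x']`. The sketch below wraps a single scalar as a 1x1 array. The helper `is_outlier` and the synthetic training data are hypothetical, for illustration only.

```python
# Sketch: classify a single scalar sample with a fitted IsolationForest.
# Assumptions: synthetic training data; `is_outlier` is a hypothetical helper.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 1))  # hypothetical training data
model = IsolationForest(random_state=0).fit(X_train)


def is_outlier(model, value):
    """Return True if the scalar `value` is predicted as an outlier (-1).

    The estimator expects shape (n_samples, n_features), so one scalar
    sample is wrapped as a 1x1 array before calling predict().
    """
    label = model.predict(np.array([[value]]))[0]
    return label == -1


print(is_outlier(model, 0.0), is_outlier(model, 8.0))
```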

In [None]:
from plotly.graph_objs import Layout, XAxis, YAxis
import plotly.graph_objects as go
import plotly.express as px

from sklearn.ensemble import IsolationForest
import pandas as pd
import numpy as np

import orchest
import pickle
import json
import os

In [None]:
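orchest_data = orchest.get_inputs()
response = orchest_data["response"]
# The previous step sends (data frame, item name, hours, item id).
df = response[0]
item_name = response[1]
hours = response[2]
item_id = response[3]
try:
    dates = df.index
    # Flatten the frame's values (row by row) into a plain list.
    values = df.values.ravel().tolist()
    print("Data received:", len(values), "points")
except Exception as e:
    print(e)
    # Re-raise so the failure is visible here instead of as a NameError later.
    raise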

In [None]:
df = pd.DataFrame(data = {'date': dates, 'x': values})
df.head()

In [None]:
fig = go.Figure(data=go.Scatter(x = df['date'], y = df['x'], mode='markers', marker=dict(color='blue', size=5)))
fig.update_layout(title = "New Data points")
fig.update_xaxes(rangeslider_visible=True)
fig.show()

In [None]:
print("Load the model...")
file = '../data/model.sav'
# Use a context manager so the file handle is closed after unpickling.
with open(file, 'rb') as f:
    model = pickle.load(f)

In [None]:
scores = model.decision_function(df[['x']])
n_points = len(scores)

fig = go.Figure(data=go.Scatter(x=np.arange(n_points), y=scores))
fig.update_layout(title="Score values")
fig.show()

The plot shows the score returned by `decision_function` for each point in the new data set. Negative scores represent outliers, and positive scores represent normal points.
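Where the zero threshold comes from can be checked directly: in scikit-learn, `decision_function` is the raw `score_samples` output shifted by the fitted `offset_`, so that 0 separates inliers from outliers. The sketch below verifies this on hypothetical synthetic data with the default `contamination="auto"`, for which scikit-learn fixes `offset_` at -0.5.

```python
# Sketch: decision_function() == score_samples() - offset_ for IsolationForest.
# Assumption: synthetic data and default parameters (contamination="auto").
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 1))  # hypothetical data

model = IsolationForest(random_state=0).fit(X)

raw = model.score_samples(X)         # lower means more abnormal
scores = model.decision_function(X)  # shifted so that 0 is the threshold

print(np.allclose(scores, raw - model.offset_))  # True
print(model.offset_)                             # -0.5
```

If the training stage set a numeric `contamination` instead, `offset_` is learned from the training scores, which moves where the zero line falls.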

In [None]:
pred = model.predict(df[['x']])
df['anomaly'] = pred
# predict() labels outliers with -1 and normal points with 1, so keep the -1 rows.
anomaly = df.loc[df['anomaly'] == -1, ['date', 'x']]
print("The parameters of our model are: ", model.get_params())
print("Note: these parameters were set in the training stage.")

In [None]:
fig = go.Figure(data=go.Scatter(x=np.arange(len(pred)), y=pred))
fig.update_layout(title="Predicted labels (1 = normal, -1 = anomaly)")
fig.show()

The `predict` method returns either 1 or -1. Data points with a negative score are labeled -1 (anomalies), and data points with a non-negative score are labeled 1 (normal).
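The link between the two methods, and the filter that keeps only the anomalies, can be sketched as follows. The series is hypothetical (normal noise plus a few injected spikes) and the model is fitted on it directly rather than loaded from `model.sav`; only the relationship between `predict` and `decision_function` is the point here.

```python
# Sketch: predict() is the sign of decision_function(); anomalies are the -1 rows.
# Assumptions: hypothetical synthetic series, model fitted in place.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# Mostly normal values plus a few injected spikes (hypothetical data).
values = np.concatenate([rng.normal(size=200), [6.0, -7.0, 8.5]])
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=len(values), freq="D"),
    "x": values,
})

model = IsolationForest(random_state=0).fit(df[["x"]])
scores = model.decision_function(df[["x"]])
pred = model.predict(df[["x"]])

# -1 where the score is negative, 1 otherwise (a score of exactly 0 counts as normal).
assert np.array_equal(pred, np.where(scores < 0, -1, 1))

# Keep only the rows labeled as anomalies.
df["anomaly"] = pred
anomaly = df.loc[df["anomaly"] == -1, ["date", "x"]]
print(anomaly)
```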

In [None]:
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)', 
    xaxis=XAxis(showgrid=True, zeroline=True, showline=True, zerolinecolor='#DBDBDB', zerolinewidth=2, gridcolor='#DBDBDB', gridwidth=2,  linecolor='#AFAFAF', linewidth=2),
    yaxis=YAxis(showgrid=True, zeroline=True, showline=True, zerolinecolor='#DBDBDB', zerolinewidth=2, gridcolor='#DBDBDB', gridwidth=2, linecolor='#AFAFAF', linewidth=2)
)

fig = go.Figure(data=go.Scatter(x = df['date'], y = df['x'], mode='lines', name = "Normal values", marker=dict(color='blue', size=2)), layout = layout)
fig.add_traces(go.Scatter(x = anomaly['date'], y = anomaly['x'], textposition='top left', mode='markers+text', name = "Anomaly value", marker=dict(color='red', size=6)))
fig.update_layout(title = "New Data points")
fig.show()

In [None]:
if not anomaly.empty:
    print("Anomaly points exist")
    fig.write_image("../data/plot.pdf")
else:
    print("No anomaly points")

In [None]:
# Pass anomaly data to the next step.
orchest.output((anomaly, item_name, hours, item_id), name = "anomaly")
print("Done!")