# Anomaly Detection with Apache Druid and Python

**Outline:** The user would like to analyze the orders placed on their e-commerce website over the last 5 hours. Using Python, the user queries the data in Druid and exports the time series to a pandas data frame. The user then calculates some basic descriptive statistics with pandas, such as the number of orders, the average order value, and the minimum and maximum order values. Next, the user generates an interactive plot of the time series with Plotly. The plot indicates a seasonal peak every 2 hours, as well as a number of potential anomalies. To identify more systematically the anomalies that require further investigation, the user fits a Prophet model to the time series.

## 1. Load the data from Druid

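```python
import requests
import numpy as np
import pandas as pd
import plotly.graph_objects as go

# Prophet is published as `prophet` since version 1.0;
# older releases used the package name `fbprophet`
try:
    from prophet import Prophet
except ImportError:
    from fbprophet import Prophet
```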
import requests

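```python
# Druid endpoint for native (JSON) queries
url = 'http://54.78.73.75:8888/druid/v2/?pretty'

# define the Druid query: scan all rows in the selected time interval
query = {'queryType': 'scan',
         'dataSource': '1_1_OrdersNew',
         'intervals': ['2020-06-01T08:00:00.000Z/2020-06-01T10:00:00.000Z'],
         'granularity': 'all'}

# run the Druid query and stop if the request failed
response = requests.post(url, headers={'Content-Type': 'application/json'}, json=query)
response.raise_for_status()

# the scan query returns one result object per segment,
# so collect the events of all segments, not just one of them
events = [event for segment in response.json() for event in segment['events']]

# organize the results of the Druid query in a pandas data frame
df = pd.DataFrame(events)[['__time', 'Value']]
df = df.rename(columns={'__time': 'time', 'Value': 'value'})

# Druid returns __time as milliseconds since the Unix epoch
df['time'] = pd.to_datetime(df['time'], unit='ms')
df = df.sort_values(by='time').reset_index(drop=True)
```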
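A Druid scan query returns a JSON array with one object per scanned segment, and each object lists its rows under `events`; with the default `list` result format every row is a JSON object whose `__time` is given in milliseconds since the Unix epoch. The sketch below flattens such a response into a data frame. The helper name `scan_events_to_frame` and the two-segment mock response (made-up segment IDs and values) are hypothetical stand-ins for a live broker reply.

```python
import pandas as pd


def scan_events_to_frame(response_json):
    """Flatten a Druid scan-query response (a list of per-segment results).

    Hypothetical helper: assumes resultFormat='list', so each event is a dict.
    """
    # concatenate the events of every segment in the response
    events = [event for segment in response_json for event in segment['events']]
    frame = pd.DataFrame(events)
    # __time is returned as epoch milliseconds
    frame['__time'] = pd.to_datetime(frame['__time'], unit='ms')
    return frame.sort_values('__time').reset_index(drop=True)


# mock response with two segments (illustrative values only)
mock_response = [
    {'segmentId': 'segment-a', 'columns': ['__time', 'Value'],
     'events': [{'__time': 1590998400000, 'Value': 42.5},
                {'__time': 1590998460000, 'Value': 180.0}]},
    {'segmentId': 'segment-b', 'columns': ['__time', 'Value'],
     'events': [{'__time': 1590998430000, 'Value': 99.9}]},
]

frame = scan_events_to_frame(mock_response)
print(frame)
```

Rows from both segments end up in one time-ordered frame, which matters because a query interval can span several segments.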

Display the first five rows of the data frame.

```python
df.head()
```

Display the last five rows of the data frame.

```python
df.tail()
```
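Besides looking at the first and last rows, it can be worth checking how regularly the orders arrive, since long gaps in the data may later look like anomalies or distort the model. A minimal sketch with `Series.diff`; the helper name `find_gaps`, the illustrative timestamps and the 5-minute gap threshold are all assumptions.

```python
import pandas as pd

# illustrative order timestamps
times = pd.Series(pd.to_datetime(['2020-06-01 08:00:00', '2020-06-01 08:00:40',
                                  '2020-06-01 08:01:10', '2020-06-01 08:09:10',
                                  '2020-06-01 08:09:30']))


def find_gaps(time_col, threshold='5min'):
    """Return the intervals between consecutive orders that exceed `threshold`."""
    ordered = time_col.sort_values().reset_index(drop=True)
    gaps = ordered.diff()  # the first difference is NaT and never counts as a gap
    mask = gaps > pd.Timedelta(threshold)
    return pd.DataFrame({'gap_start': ordered.shift(1)[mask],
                         'gap_end': ordered[mask],
                         'length': gaps[mask]})


gaps = find_gaps(times)
print(gaps)
```

With the real data, `find_gaps(df['time'])` lists any silent periods longer than the chosen threshold.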

## 2. Perform some basic data analysis using pandas

Calculate the descriptive statistics of the order values (count, mean, standard deviation, minimum, quartiles and maximum).

```python
df.describe()
```
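`df.describe()` summarizes the order values over the whole period. To see how the number of orders and the revenue vary over time, which is where a recurring peak would first show up, the orders can also be aggregated into fixed time windows with `resample`. A minimal sketch, assuming a frame with the `time` and `value` columns built in Section 1; the helper name `orders_per_window`, the 10-minute window and the small synthetic frame are illustrative choices.

```python
import pandas as pd

# small synthetic stand-in for the orders data frame (illustrative values only)
df_example = pd.DataFrame({
    'time': pd.to_datetime(['2020-06-01 08:01', '2020-06-01 08:04',
                            '2020-06-01 08:12', '2020-06-01 08:25',
                            '2020-06-01 08:27']),
    'value': [120.0, 30.0, 75.5, 210.0, 64.5],
})


def orders_per_window(frame, window='10min'):
    """Number of orders, total value and average order value per time window."""
    return (frame.resample(window, on='time')['value']
                 .agg(['count', 'sum', 'mean'])
                 .rename(columns={'count': 'orders', 'sum': 'total_value',
                                  'mean': 'avg_order_value'}))


summary = orders_per_window(df_example)
print(summary)
```

With the real data, call `orders_per_window(df)`; windows without orders show a count of zero and a missing average.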

Count the orders with a value greater than 150 and less than 50, respectively.

```python
(df['value'] > 150).sum()
```

```python
(df['value'] < 50).sum()
```
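The two counts above can be complemented with the number and share of orders in each value band. A minimal sketch using `pd.cut`; the band edges mirror the thresholds of 50 and 150, and the sample values are illustrative. With `right=False` the bands are left-closed, so the lowest band matches `< 50` exactly, while an order worth exactly 150 falls into the top band (unlike `> 150`).

```python
import numpy as np
import pandas as pd

# illustrative order values
values = pd.Series([12.0, 49.9, 50.0, 80.0, 150.0, 151.0, 300.0, 95.0])

# left-closed bands: [0, 50), [50, 150), [150, inf)
bands = pd.cut(values, bins=[0, 50, 150, np.inf], right=False,
               labels=['below 50', '50 to 150', '150 and above'])

# number and share of orders per band, in band order
band_summary = pd.DataFrame({
    'orders': bands.value_counts(sort=False),
    'share': bands.value_counts(sort=False, normalize=True),
})
print(band_summary)
```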

## 3. Plot the data with Plotly

The following code generates a basic time series plot with Plotly Express.

```python
import plotly.express as px

fig = px.scatter(df, x='time', y='value')
fig.update_traces(marker=dict(color='#b5babe'))
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')
fig.show()
```
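In a scatter plot of individual orders, a recurring peak can be hard to tell apart from noise. One option, a smoothing step that is not part of the original analysis, is to overlay a time-based rolling median computed with pandas. A minimal sketch; the helper name `rolling_median`, the 15-minute window and the synthetic series are assumptions for illustration.

```python
import pandas as pd

# synthetic orders: one order per minute for 30 minutes (illustrative values only)
times = pd.date_range('2020-06-01 08:00', periods=30, freq='min')
orders = pd.DataFrame({'time': times, 'value': [float(v % 7) * 10 for v in range(30)]})


def rolling_median(frame, window='15min'):
    """Median order value over a trailing time window, one value per order."""
    series = frame.set_index('time')['value'].sort_index()
    # time-based windows require a monotonic DatetimeIndex;
    # each window covers (t - window, t]
    return series.rolling(window).median()


smooth = rolling_median(orders)
print(smooth.tail())
```

The smoothed series can then be added to the Plotly Express figure as a line, for example with `fig.add_scatter(x=smooth.index, y=smooth.values, mode='lines')`.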

The following code generates an interactive time series plot with Plotly Graph Objects. The buttons in the top left corner restrict the plot to a recent time window (the last 10 minutes, the last 30 minutes or the last 60 minutes), and a fourth button restores the full time series. The range slider at the bottom lets the user select a custom time window. Hovering over a data point displays a tooltip with the corresponding time and value. Additional interactive features, such as zooming in and out, are provided by the Plotly modebar, which appears in the top right corner on hover. Finally, the plot is exported as a standalone HTML file, which can be embedded in a website, an interactive report or a dashboard.

```python
# create the layout
layout = {'plot_bgcolor': 'white',
          'paper_bgcolor': 'white',
          'margin': {'t':10, 'b':10, 'l':10, 'r':10, 'pad':0},
          'showlegend': False,
          'yaxis': {'showgrid': True,
                    'zeroline': False,
                    'mirror': True,
                    'color': '#737373',
                    'linecolor': '#d9d9d9',
                    'gridcolor': '#d9d9d9',
                    'tickformat': '$,.0f'},
          'xaxis': {'range':[df['time'].min(), df['time'].max()],
                    'autorange': False,
                    'showgrid': True,
                    'zeroline': False,
                    'mirror': True,
                    'color': '#737373',
                    'linecolor': '#d9d9d9',
                    'gridcolor': '#d9d9d9',
                    'type': 'date',
                    'tickformat': '%d %b %y %H:%M',
                    'tickangle': 0,
                    'nticks': 5,
                    'rangeslider': {'visible': True},
                    'rangeselector': {'buttons': [
                        {'count': 10, 'label': '10m', 'step': 'minute', 'stepmode': 'backward'},
                        {'count': 30, 'label': '30m', 'step': 'minute', 'stepmode': 'backward'},
                        {'count': 60, 'label': '60m', 'step': 'minute', 'stepmode': 'backward'},
                        {'step': 'all'}]}}}

# create the traces
data = go.Scatter(x=df['time'],
                  y=df['value'],
                  mode='markers',
                  marker=dict(color='#b5babe', size=5),
                  hovertemplate='<b>Time:</b> %{x|%d %b %Y %H:%M:%S}<br>'
                  '<b>Value:</b> %{y: $,.2f}<extra></extra>')

# create the figure
fig = go.Figure(data=data, layout=layout)

# display the figure
fig.show()

# export the figure
fig.write_html('orders_plot.html')
```

## 4. Detect the anomalies with the Prophet model

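Fit a Prophet model to the time series. Prophet expects a data frame with a `ds` column holding the timestamps and a `y` column holding the values to be modelled. The `interval_width` parameter sets the width of the model's uncertainty interval; a value as high as 0.9999 makes the interval very wide, so that only the most extreme observations fall outside of it.

```python
# Prophet expects the columns 'ds' (timestamps) and 'y' (values)
X = pd.DataFrame({'ds': df['time'], 'y': df['value']})
m = Prophet(interval_width=0.9999).fit(X)
```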
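Prophet's built-in seasonalities are yearly, weekly and daily, and the daily one is only switched on automatically when the history spans at least two days, so a model fitted to a few hours of orders will not capture a two-hourly peak unless that seasonality is declared explicitly. A hedged sketch of one way to do so with `add_seasonality`, whose `period` is expressed in days. The helper names, the seasonality name `two_hourly` and `fourier_order=5` are illustrative assumptions; the helper `periods_covered` reports how many full two-hour cycles the data spans, since a cycle observed only once or twice gives the model little to learn from.

```python
import pandas as pd

TWO_HOURS = pd.Timedelta(hours=2)
# Prophet expresses seasonality periods in days: 2 hours = 1/12 of a day
TWO_HOURS_IN_DAYS = TWO_HOURS / pd.Timedelta(days=1)


def periods_covered(ds, period=TWO_HOURS):
    """Number of complete periods spanned by the timestamps in `ds` (hypothetical helper)."""
    ds = pd.to_datetime(pd.Series(ds))
    return int((ds.max() - ds.min()) // period)


def fit_with_two_hour_seasonality(X, interval_width=0.9999, fourier_order=5):
    """Fit Prophet with an extra two-hour seasonality (hypothetical helper)."""
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    model = Prophet(interval_width=interval_width)
    # custom seasonalities must be added before calling fit()
    model.add_seasonality(name='two_hourly', period=TWO_HOURS_IN_DAYS,
                          fourier_order=fourier_order)
    return model.fit(X)
```

With the data from Section 1, `periods_covered(X['ds'])` shows whether the query window is long enough for the extra seasonality to be meaningful before calling `fit_with_two_hour_seasonality(X)`.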

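Extract the model predictions: `yhat` is the fitted value, while `yhat_lower` and `yhat_upper` are the lower and upper bounds of the uncertainty interval.

```python
predictions = m.predict(X)
# keep only the relevant columns; .copy() avoids pandas' SettingWithCopyWarning
predictions = predictions[['ds', 'yhat_lower', 'yhat_upper', 'yhat']].copy()
# add the actual values (both data frames are sorted by time)
predictions['ytrue'] = df['value'].values
predictions.head()
```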

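Identify the anomalies, i.e. the orders whose actual value falls outside of the uncertainty interval.

```python
# an order is anomalous if its value lies below the lower bound or above the upper bound
predictions['anomaly'] = ((predictions['ytrue'] < predictions['yhat_lower']) |
                          (predictions['ytrue'] > predictions['yhat_upper']))
predictions.head()
```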
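The `anomaly` flag is binary. To decide which anomalies to investigate first, each order can be ranked by how far it lies outside the uncertainty interval, relative to the interval's width. A minimal sketch; the `severity` score and the helper `add_severity` are a hypothetical heuristic rather than a Prophet output, and the toy frame only mimics the columns of `predictions`.

```python
import pandas as pd

# toy frame with the same columns as `predictions` (illustrative values only)
toy = pd.DataFrame({
    'ds': pd.date_range('2020-06-01 08:00', periods=4, freq='min'),
    'yhat_lower': [50.0, 50.0, 50.0, 50.0],
    'yhat_upper': [150.0, 150.0, 150.0, 150.0],
    'ytrue': [100.0, 175.0, 20.0, 450.0],
})


def add_severity(frame):
    """Distance outside [yhat_lower, yhat_upper], in units of the interval width.

    Assumes a strictly positive interval width.
    """
    out = frame.copy()
    below = (out['yhat_lower'] - out['ytrue']).clip(lower=0)
    above = (out['ytrue'] - out['yhat_upper']).clip(lower=0)
    width = out['yhat_upper'] - out['yhat_lower']
    out['severity'] = (below + above) / width
    # same rule as the anomaly flag: any positive distance means outside the interval
    out['anomaly'] = out['severity'] > 0
    return out.sort_values('severity', ascending=False)


ranked = add_severity(toy)
print(ranked)
```

Applied to `predictions`, the first rows of `add_severity(predictions)` are the orders furthest from what the model considers plausible.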

Plot the results with Plotly.

```python
# create the layout
layout = {'plot_bgcolor': 'white',
          'paper_bgcolor': 'white',
          'margin': {'t':10, 'b':10, 'l':10, 'r':10, 'pad':0},
          'yaxis': {'showgrid': True,
                    'zeroline': False,
                    'mirror': True,
                    'color': '#737373',
                    'linecolor': '#d9d9d9',
                    'gridcolor': '#d9d9d9',
                    'tickformat': '$,.0f'},
          'xaxis': {'range':[predictions['ds'].min(), predictions['ds'].max()],
                    'autorange': False,
                    'showgrid': True,
                    'zeroline': False,
                    'mirror': True,
                    'color': '#737373',
                    'linecolor': '#d9d9d9',
                    'gridcolor': '#d9d9d9',
                    'type': 'date',
                    'tickformat': '%d %b %y %H:%M',
                    'tickangle': 0,
                    'nticks': 5}}

# create the traces
data = []

# regular orders
data.append(go.Scatter(x=predictions.query('anomaly == False')['ds'],
                       y=predictions.query('anomaly == False')['ytrue'],
                       mode='markers',
                       marker=dict(color='#b5babe', size=5),
                       name='Orders',
                       hovertemplate='<b>Orders</b><br>'
                       '<b>Time:</b> %{x|%d %b %Y %H:%M:%S}<br>'
                       '<b>Value:</b> %{y: $,.2f}<extra></extra>'))

# orders flagged as anomalies
data.append(go.Scatter(x=predictions.query('anomaly == True')['ds'],
                       y=predictions.query('anomaly == True')['ytrue'],
                       mode='markers',
                       marker=dict(color='#e83e8c', size=5),
                       name='Anomalies',
                       hovertemplate='<b>Anomalies</b><br>'
                       '<b>Time:</b> %{x|%d %b %Y %H:%M:%S}<br>'
                       '<b>Value:</b> %{y: $,.2f}<extra></extra>'))

# model predictions
data.append(go.Scatter(x=predictions['ds'],
                       y=predictions['yhat'],
                       mode='lines',
                       line=dict(color='#8A348E', width=4, dash='dot'),
                       name='Model Predictions',
                       hovertemplate='<b>Model Predictions</b><br>'
                       '<b>Time:</b> %{x|%d %b %Y %H:%M:%S}<br>'
                       '<b>Value:</b> %{y: $,.2f}<extra></extra>'))

# create the figure
fig = go.Figure(data=data, layout=layout)

# display the figure
fig.show()

# export the figure
fig.write_html('model_plot.html')
```