# Anomaly Detection

In this notebook, we will visualize the time series to help spot anomalies.

<img src="https://box.hu-berlin.de/f/53a91798173c4dad9345/?dl=1" width=800/>

A time series is a sequence of real values ordered by time. The dataset you are given contains a total of 30 time series. Each time series has exactly one anomaly of unknown size and shape. The anomaly is located in the test segment; you are also given a train segment that contains no anomaly. The number at the end of each file name (e.g. `5000` in `001_Anomaly_5000.csv`) is the offset at which the test segment starts.

**Your task is, given each time series, to locate the anomaly in the time series.**

### Required Libraries

In [1]:
try:
    import plotly.graph_objects as go
except ImportError as e:
    !pip install "plotly>=5.0"

In [2]:
import os
import fnmatch
import zipfile

import numpy as np
import pandas as pd

import plotly.io as pio
pio.renderers.default = "notebook"  # or "plotly_mimetype+notebook"

import plotly.graph_objects as go
from ipywidgets import interact
import ipywidgets as widgets

%config InlineBackend.figure_formats = {'png', 'retina'}


### Utility functions
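def read_series(file, locations):
    """Read one series from the zip archive.

    Relies on the globals `zf` and `folder_in_zip`, which are defined in the
    plotting cell below. Returns (file_name, test_start, data, anomaly).
    """
    # file is e.g. "000_Anomaly_2500.csv"
    internal_name = folder_in_zip + file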

    with zf.open(internal_name) as f:
        data = pd.read_csv(f, header=None)

    data = np.array(data).flatten()

    # Strip the extension, then parse the test-segment start from the name suffix
    file_name = file.split('.')[0]
    splits = file_name.split('_')
    test_start = int(splits[-1])

    # Look up the labeled anomaly; (-1, -1) means the series has no label yet
    anomaly = (-1, -1)
    if file_name in locations.index:
        row = locations.loc[file_name]
        anomaly = int(row["Start"]), int(row["End"])

    return (file_name, int(test_start), data, anomaly)


## Task 1: Identify Anomalies by Inspection

<div class="alert alert-block alert-success">
Your task is to identify the anomalies by visual inspection. You must correctly identify at least 10 anomalies.
</div>

Use the file `labels.csv` to store your anomalies. Each anomaly has a `Start` and an `End` offset. Update these offsets accordingly. Use the `plotly` controls below (zoom and range slider) to zoom into the time series.
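Recording a finding in `labels.csv` could look like the following sketch. It assumes the column names `Name`, `TrainSplit`, `Start`, and `End` as they appear in the table at the end of this notebook. The helper `set_anomaly` and the `-1` placeholder values are hypothetical and only serve the illustration.

```python
import pandas as pd

def set_anomaly(labels: pd.DataFrame, name: str, start: int, end: int) -> pd.DataFrame:
    """Return a copy of `labels` (indexed by Name) with Start/End set for one series."""
    if name not in labels.index:
        raise KeyError(f"unknown series: {name}")
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    updated = labels.copy()
    updated.loc[name, ["Start", "End"]] = [start, end]
    return updated

# In-memory stand-in for labels.csv; the -1 placeholders are made up.
labels = pd.DataFrame(
    {"TrainSplit": [5000, 4375], "Start": [-1, -1], "End": [-1, -1]},
    index=pd.Index(["001_Anomaly_5000", "002_Anomaly_4375"], name="Name"),
)
labels = set_anomaly(labels, "001_Anomaly_5000", 7428, 7517)
```

With the real file, the table would be loaded with `pd.read_csv("labels.csv").set_index("Name")` and written back with `labels.to_csv("labels.csv")`.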

**Hints:**
- Some files contain anomalies from the same dataset. You might spot anomalies faster by comparing these.
- You could also implement additional functionality to guide you in spotting anomalies. You may use the code box below the plots.
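One possible helper along the lines of the second hint is sketched below. It is not part of the assignment material: it z-scores the test segment with the mean and standard deviation of the train segment, averages the absolute z-score over a sliding window, and proposes the highest-scoring window as a region to inspect. The function name `candidate_window`, the window length, and the synthetic series are all assumptions. A heuristic like this only reacts to unusual amplitudes, so anomalies that differ in shape alone will likely be missed.

```python
import numpy as np

def candidate_window(data, test_start, window=50):
    """Return (start, end) of the test window with the largest mean |z-score|."""
    data = np.asarray(data, dtype=float)
    train, test = data[:test_start], data[test_start:]
    if len(test) < window:
        raise ValueError("test segment shorter than window")
    std = train.std()
    # z-score the test values against train statistics (guard against zero std)
    z = np.abs(test - train.mean()) / (std if std > 0 else 1.0)
    # Moving average of |z|; mode="valid" keeps only complete windows
    score = np.convolve(z, np.ones(window) / window, mode="valid")
    start = test_start + int(np.argmax(score))
    return start, start + window

# Synthetic example: a noisy sine with an injected level shift in the test part
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 60 * np.pi, 3000)) + 0.05 * rng.standard_normal(3000)
series[2200:2250] += 3.0
start, end = candidate_window(series, test_start=1500, window=50)
```

The returned offsets are only a suggestion for where to zoom in; the final `Start` and `End` values should still be confirmed in the plot.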


In [3]:
# Line colors (alpha 0.9)
test_color = 'rgba(76, 114, 176, 0.9)'   # seaborn blue with alpha
train_color = 'rgba(221, 132, 82, 0.9)'  # seaborn orange with alpha

# Read labels.csv once
locations = pd.read_csv("labels.csv")
locations.set_index("Name", inplace=True)

zip_path = "phase_1.zip"
folder_in_zip = "phase_1/"

# Pre-open the zip for listing files
zf = zipfile.ZipFile(zip_path)

# All CSV files inside phase_1/ (excluding labels.csv for the widget)
file_list = np.sort([
    name[len(folder_in_zip):]  # strip "phase_1/" so widget shows just the filename
    for name in zf.namelist()
    if name.startswith(folder_in_zip)
       and fnmatch.fnmatch(name, "*.csv")
       and not name.endswith("labels.csv")
])

    
@interact
def show(file=file_list):
    name, test_start, data, anomaly = read_series(file, locations)
        
    # Create figure
    layout = dict(xaxis = dict(showgrid=False, ticks='inside'),
                  yaxis = dict(showgrid=False, ticks='inside'),
                  font=dict(size=12),
                )
    
    fig = go.Figure(layout=layout)

    # Train
    fig.add_trace(
        go.Scatter(x=list(range(test_start)), y=data[:test_start],
                   line=dict(width=1, color=train_color)))
    
    # Test
    fig.add_trace(
        go.Scatter(x=list(range(test_start, len(data))), y=data[test_start:],
                   line=dict(width=1, color=test_color)))
    
    # Labeled anomaly, drawn in green on top of the test segment (skipped if unlabeled)
    if anomaly[0] > 0:
        fig.add_trace(
            go.Scatter(x=list(range(anomaly[0], anomaly[1])), 
                       y=data[anomaly[0]:anomaly[1]], 
                       line=dict(width=1, color='green')))

    # Set title
    fig.update_layout(
        title_text="Time Series with Train (Orange), Test (Blue), and Labeled Anomaly (Green) Segments",
        autosize=True,
        margin=dict(l=10, r=10, t=30, b=30),
        showlegend=False,
        paper_bgcolor="white",
        plot_bgcolor="white"     
    )

    # Add range slider
    fig.update_layout(
        xaxis=dict(            
            rangeslider=dict(
                visible=True
            ),
            type="linear"
        )
    )

    fig.show()


## Inspect Annotated Anomalies

In [4]:
locations

Name,TrainSplit,Start,End
001_Anomaly_5000,5000,7428,7517
002_Anomaly_4375,4375,7074,7126
003_Anomaly_4375,4375,8380,8390
004_Anomaly_2500,2500,5450,5600
005_Anomaly_4000,4000,5389,5392
006_Anomaly_4000,4000,5691,5740
007_Anomaly_4000,4000,6510,6550
008_Anomaly_4000,4000,5544,5600
009_Anomaly_4000,4000,4850,4890
010_Anomaly_4000,4000,6037,6077
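Since every anomaly lies in the test segment, each row should satisfy `TrainSplit <= Start < End`. A minimal sanity check, assuming the column names shown in the table above, might look like this; `check_labels` is a hypothetical helper, and the three rows are copied from that table.

```python
import pandas as pd

def check_labels(labels: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose offsets violate TrainSplit <= Start < End."""
    ok = (labels["TrainSplit"] <= labels["Start"]) & (labels["Start"] < labels["End"])
    return labels[~ok]

labels = pd.DataFrame(
    {
        "TrainSplit": [5000, 4375, 4000],
        "Start": [7428, 7074, 5389],
        "End": [7517, 7126, 5392],
    },
    index=pd.Index(
        ["001_Anomaly_5000", "002_Anomaly_4375", "005_Anomaly_4000"], name="Name"
    ),
)
bad = check_labels(labels)
```

An empty result means all offsets are plausible. It does not say whether the anomaly itself was located correctly.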
