# Transforming Temporal-Dynamic Graphs Into Time-Series Data for Solving Event Detection Problems

Event detection on temporal-dynamic graphs aims to detect important events by detecting abnormal changes in the
network. Because of the widespread use of social media, many real-world problems can be modelled as temporal-dynamic graph
data. With the recent progress in graph representation learning, new anomaly detection methods for static graphs have been studied. In this
work, we present a workflow for event detection on temporal-dynamic graphs using graph representation learning.
Our workflow uses generated embeddings of the temporal-dynamic graph to transform the problem into an unsupervised
time-series anomaly detection problem. Since this is a widely studied research area, transforming temporal-dynamic graph
data into multivariate time-series data opens up many possible solutions for event detection problems. We have evaluated
our proposed workflow on four different real-world datasets and compared our results with previous studies, against which
our workflow shows competitive performance. This study gives a proof of concept for using graph embeddings as time-series
data in anomaly detection tasks.

# Proposed Workflow

The figure below shows the proposed workflow. The input is a temporal-dynamic graph G, which consists of static
snapshots of the graph taken at different time steps. The model then generates n-dimensional vector embeddings from the
input graph using graph representation learning. After this step, the model passes these embeddings to an unsupervised
anomaly detector. The output of the proposed workflow is an anomaly score for each time step.

<img src="Proposed_Workflow.png">
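The workflow can be sketched end to end on toy data. The snippet below is only an illustration under assumptions: the embeddings are hand-written 2-dimensional vectors rather than tdGraphEmbed output, and the detector is a simple distance-from-mean score standing in for the unsupervised detectors used later. The function name `anomaly_scores` is hypothetical.

```python
import math

def anomaly_scores(embeddings):
    """Score each time step by its Euclidean distance from the mean embedding.

    Stand-in detector for illustration only; it is not the detector used
    in the experiments.
    """
    dim = len(embeddings[0])
    mean = [sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dim)]
    return [math.dist(vec, mean) for vec in embeddings]

# One embedding per graph snapshot (toy values); the fourth snapshot is unusual.
embeddings = [
    [0.10, 0.20],
    [0.12, 0.18],
    [0.09, 0.22],
    [0.90, 0.95],
    [0.11, 0.19],
]

scores = anomaly_scores(embeddings)
most_anomalous_step = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous_step)
```

The point of the sketch is the data shape: one vector per time step goes in, one score per time step comes out, so any multivariate time-series detector can take the place of `anomaly_scores`.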

In the following experiments, we use our proposed workflow. After pre-processing the data, the first step is
to generate graph embeddings. For this task we used the tdGraphEmbed model. The model generates 40 random walks
for each node in the graph, and the length of each walk is 16 in our experiments. The model is trained for 50 iterations
on the generated random-walk documents. In the second step, we use time-series anomaly detectors.
For these algorithms we used the Merlion machine learning library for time-series data.
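To make the walk settings concrete, here is a minimal sketch of random-walk generation on a single toy snapshot. It assumes uniform transitions between neighbours, which is a simplification; the actual model may weight transitions differently. The function `generate_walks` and the toy adjacency list are hypothetical.

```python
import random

def generate_walks(adjacency, walks_per_node=40, walk_length=16, seed=0):
    """Generate uniform random walks starting from every node of one snapshot."""
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Toy undirected snapshot with four nodes.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = generate_walks(adjacency)
print(len(walks), len(walks[0]))
```

With four nodes and 40 walks per node, the snapshot yields 160 walks of 16 nodes each. Joined together, the walks of one snapshot play the role of a "document" for the embedding model.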

# Importing Packages

Before importing the libraries, make sure to install the requirements listed in the GitHub repository.

In [1]:
from gensim.models.doc2vec import Doc2Vec
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from merlion.utils import TimeSeries
from evaluation_util import *

from tdGraphEmbed.tdGraphEmbed.temporal_graph import TemporalGraph
from tdGraphEmbed.tdGraphEmbed.model import TdGraphEmbed
from datasetConverter import dataset_convert
from tempfile import TemporaryFile

import datetime



# Generating Temporal-Dynamic Graph Embeddings

In this step we read the data from files and create a model for the training process. To load the datasets, we use the dataset_convert() function. Available datasets are:

Tw-WorldCup - The Twitter WorldCup dataset. In experiments you can use hours as the granularity.

Tw-Terror-Security - The Twitter Terror Security dataset. In experiments you can use days as the granularity.

gameofthrones - The Reddit Game of Thrones dataset. You can read it directly, since this dataset is provided as a pickle file.

formula - The Reddit Formula 1 dataset. You can read it directly, since this dataset is provided as a pickle file.
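The granularity setting decides how interactions are grouped into graph snapshots. The sketch below shows one way such grouping could work on a list of timestamped edges. It is not the implementation of dataset_convert(); the function `bucket_edges`, the edge format, and the user names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Fields that are reset to zero when truncating a timestamp to a period.
_TRUNCATE_FIELDS = {
    "hours": {"minute": 0, "second": 0, "microsecond": 0},
    "days": {"hour": 0, "minute": 0, "second": 0, "microsecond": 0},
}

def bucket_edges(edges, granularity="hours"):
    """Group (timestamp, source, target) edges into one edge list per period."""
    if granularity not in _TRUNCATE_FIELDS:
        raise ValueError(f"unsupported granularity: {granularity}")
    reset = _TRUNCATE_FIELDS[granularity]
    snapshots = defaultdict(list)
    for timestamp, source, target in edges:
        snapshots[timestamp.replace(**reset)].append((source, target))
    # Sort by period so that snapshots are in chronological order.
    return dict(sorted(snapshots.items()))

edges = [
    (datetime(2014, 7, 13, 18, 5), "u1", "u2"),
    (datetime(2014, 7, 13, 18, 40), "u2", "u3"),
    (datetime(2014, 7, 13, 19, 10), "u1", "u3"),
]

hourly = bucket_edges(edges, "hours")
daily = bucket_edges(edges, "days")
print(len(hourly), len(daily))
```

The same three edges form two hourly snapshots but only one daily snapshot, so a finer granularity gives a longer but sparser time series.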

In [2]:
graphs = dataset_convert(dataset="Tw-WorldCup",granularity="hours")
model = TdGraphEmbed(dataset_name="Tw-WorldCup")

In [4]:
documents = model.get_documents_from_graph(graphs)

Computing transition probabilities: 100%|██████████| 11/11 [00:00<00:00, 10987.70it/s]
Generating walks (CPU: 1): 100%|██████████| 40/40 [00:00<00:00, 2353.18it/s]
Computing transition probabilities: 100%|██████████| 48/48 [00:00<00:00, 7994.54it/s]
Generating walks (CPU: 1): 100%|██████████| 40/40 [00:00<00:00, 449.35it/s]
Computing transition probabilities: 100%|██████████| 59/59 [00:00<00:00, 8428.32it/s]
Generating walks (CPU: 1): 100%|██████████| 40/40 [00:00<00:00, 366.97it/s]
Computing transition probabilities: 100%|██████████| 48/48 [00:00<00:00, 8003.12it/s]
Generating walks (CPU: 1): 100%|██████████| 40/40 [00:00<00:00, 470.61it/s]
Computing transition probabilities: 100%|██████████| 104/104 [00:00<00:00, 5778.12it/s]
Generating walks (CPU: 1): 100%|██████████| 40/40 [00:00<00:00, 204.06it/s]
Computing transition probabilities: 100%|██████████| 43/43 [00:00<00:00, 8600.62it/s]
Generating walks (CPU: 1): 100%|██████████| 40/40 [00:00<00:00, 536.84it/s]
...

In [5]:
model.run_doc2vec(documents)

iteration 0
iteration 1
iteration 2
...
iteration 49
Model Saved


In [6]:
graph_vectors = model.get_embeddings()
np.save("tdGraphEmbed/saved_embeddings/Tw-WorldCup.npy", graph_vectors)

The code above reads the dataset, trains the graph representation learning model, and saves the model and embeddings to files. Training takes a long time, around 15-24 hours to complete. Because of this, we will use the model and files we have already saved.

# Unsupervised Time Series Anomaly Detection

Below, we read the saved model and the dataset labels and prepare the data in the format expected by the Merlion library. The saved model file provides the time stamps and n-dimensional embeddings, and we use the labels to evaluate our model. The whole training process is fully unsupervised; labels are used only for evaluation.

In [2]:
model_path = "tdGraphEmbed/trained_models/Tw-WorldCup.model"
labels_path = "Datasets/Twitter_WorldCup/Twitter_WorldCup_2014_labels.txt"

#model_path = "tdGraphEmbed/trained_models/Tw-Terror-Security.model"
#labels_path = "Datasets/Twitter_Security/Twitter_May_Aug_2014_TerrorSecurity_labels.txt"

#model_path = "tdGraphEmbed/trained_models/GoT-2017.model"
#labels_path = "Datasets/gameofthrones/gameofthrones_2017_labels.txt"

#model_path = "tdGraphEmbed/trained_models/Formula-2019.model"
#labels_path = "Datasets/formula/formula_2019_labels.txt"



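# Load the trained Doc2Vec model; each document vector is the embedding of one graph snapshot.
model = Doc2Vec.load(model_path)
doc_vecs = model.docvecs.doctag_syn0
# Reorder the rows so that they follow the sorted order of their document tags (time stamps).
doc_vecs = doc_vecs[np.argsort([model.docvecs.index_to_doctag(i) for i in range(0, doc_vecs.shape[0])])]

# Build a DataFrame with one row per time stamp and one column per embedding dimension.
time_stamps = list(model.docvecs.doctags.keys())
time_series_custom = pd.DataFrame(doc_vecs, index=time_stamps)

# Read the ground-truth labels; they are used only for evaluation.
ls = readFiles(labels_path, granularity="hours")
df_metadata = pd.DataFrame(columns = ['trainval', 'anomaly'], index = time_stamps)
df_metadata = generate_metadata(df_metadata, time_stamps, ls)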

Call to deprecated `doctag_syn0` (Attribute will be removed in 4.0.0, use docvecs.vectors_docs instead).
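When embedding rows are reordered by their tags, the index labels need to follow the same order, otherwise a vector could end up paired with the wrong time stamp. The following is a minimal sketch of that alignment in plain Python, using made-up tags and vectors; `align_by_timestamp` is a hypothetical helper and not part of the notebook's utilities.

```python
def align_by_timestamp(tags, vectors):
    """Sort tags chronologically and reorder the vectors in the same way."""
    order = sorted(range(len(tags)), key=lambda i: tags[i])
    return [tags[i] for i in order], [vectors[i] for i in order]

# Tags in "YYYY-MM-DD HH:MM:SS" form sort chronologically as plain strings.
tags = ["2014-07-13 19:00:00", "2014-07-13 17:00:00", "2014-07-13 18:00:00"]
vectors = [[0.3, 0.3], [0.1, 0.1], [0.2, 0.2]]

sorted_tags, sorted_vectors = align_by_timestamp(tags, vectors)
print(sorted_tags[0], sorted_vectors[0])
```

Applying one permutation to both lists keeps each vector attached to its own time stamp, which is the property the time series relies on.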


In this step we train our unsupervised time-series anomaly detection model. Settings differ between datasets; below, they are adjusted for the Twitter World-Cup dataset. The most important parameter is "top", which is the k in Recall@k and Precision@k and defines the number of anomalies to detect. The models we recommend are "VAE", "LSTMED", and "IsolationForest". For other datasets, please follow the comments.

In [6]:
from merlion.models.factory import ModelFactory
from merlion.post_process.threshold import AggregateAlarms

# @k parameter to set
top=5

train_data = TimeSeries.from_pd(time_series_custom[:])
test_labels = TimeSeries.from_pd(df_metadata["anomaly"][:])

#Available models are VAE, LSTMED, and IsolationForest.
#Because of the data properties, IsolationForest works best on the Twitter Security dataset.
#It is best to use VAE or LSTMED on the other datasets.
model = ModelFactory.create("VAE",
                            threshold=AggregateAlarms(alm_threshold=0))

model.train(train_data)
labels = model.get_anomaly_label(train_data)
df_temp = labels.to_pd()
df_cpy = df_temp.copy()

#For datasets other than the Twitter World-Cup dataset, ascending should be True. This appears to be
#due to an implementation issue in the Merlion library.
df_temp = get_top_anomalies(df_temp,ascending=False , top=top)

#If you are using the IsolationForest model, you have to use test_labels_temp = test_labels[1:]
#because Isolation Forest does not return an anomaly score for the first time stamp.
test_labels_temp = test_labels[:]

 |████████████████████████████████████████| 100.0% Complete, Loss 0.0013
Anomaly Threshold: 
anom_score    2.435622
Name: 2014-07-13 18:00:00, dtype: float64
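The role of `top` and `ascending` can be illustrated with a small stand-alone sketch. It marks the k most extreme scores as anomalies; the helper `top_k_flags` is hypothetical, and the real get_top_anomalies in evaluation_util may behave differently.

```python
def top_k_flags(scores, k, ascending=False):
    """Return a 0/1 list marking the k highest (or lowest) scores as anomalies."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=not ascending)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]

scores = [0.1, 2.4, 0.3, 1.7, 0.2, 0.9]

flags_high = top_k_flags(scores, k=2)                 # largest scores are anomalous
flags_low = top_k_flags(scores, k=2, ascending=True)  # smallest scores are anomalous
print(flags_high)
print(flags_low)
```

Switching `ascending` only matters when a detector's scores are oriented the other way round, with lower values indicating more anomalous time stamps.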


# Evaluation

In this part you can evaluate the model with precision and recall. You can also try out different delay factors. Because model training involves randomness, results can vary between runs within a small interval.

In [7]:
prec = get_precision(df_temp, test_labels_temp,delay=0)
rec = get_recall(df_temp, test_labels_temp,delay=0,top=top)
acc = get_accuracy(df_temp, test_labels_temp)
print("Top",top,"time-stamps are considered as anomalies")
print("Precision:", prec)
print("Recall:", rec)
print("Accuracy:", acc)

Top 5 time-stamps are considered as anomalies
Precision: 0.6
Recall: 0.025210084033613446
Accuracy: 0.9341150195421553
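The printed recall equals 3/119, which is consistent with three correct detections among 119 labelled anomalous time stamps; this reading is an inference from the numbers rather than something stated by the notebook. The sketch below reproduces that arithmetic on toy data with a hypothetical helper `precision_recall`; it is not the code behind get_precision and get_recall.

```python
def precision_recall(flags, labels):
    """Precision and recall of binary flags against binary ground-truth labels."""
    hits = sum(1 for f, y in zip(flags, labels) if f == 1 and y == 1)
    flagged = sum(flags)
    positives = sum(labels)
    precision = hits / flagged if flagged else 0.0
    recall = hits / positives if positives else 0.0
    return precision, recall

# Toy data shaped like the run above: 5 flagged, 3 correct, 119 true anomalies.
labels = [1] * 119 + [0] * 10
flags = [1, 1, 1] + [0] * 116 + [1, 1] + [0] * 8

precision, recall = precision_recall(flags, labels)
print(precision, recall)
```

Because only k = 5 time stamps are flagged, recall is bounded by 5/119 in this setting, so a low recall next to a high precision is expected.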


In [8]:
for i in range(6):  
    prec = get_precision(df_temp, test_labels_temp,delay=i)
    rec = get_recall(df_temp, test_labels_temp,delay=i,top=top)
    print("Precision and Recall with delay",i)
    print("Precision:", prec)
    print("Recall:", rec)
    print("---------")
    

Precision and Recall with delay 0
Precision: 0.6
Recall: 0.025210084033613446
---------
Precision and Recall with delay 1
Precision: 1.0
Recall: 0.07563025210084033
---------
Precision and Recall with delay 2
Precision: 1.0
Recall: 0.1092436974789916
---------
Precision and Recall with delay 3
Precision: 1.0
Recall: 0.12605042016806722
---------
Precision and Recall with delay 4
Precision: 1.0
Recall: 0.14285714285714285
---------
Precision and Recall with delay 5
Precision: 1.0
Recall: 0.15966386554621848
---------
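One way to read the delay factor is as a tolerance window around each labelled anomaly. The sketch below assumes a symmetric window, meaning a labelled anomaly at step t counts as found if any step in [t - delay, t + delay] is flagged. The actual get_recall may define the window differently, for example only after the event; `recall_with_delay` is a hypothetical helper.

```python
def recall_with_delay(flags, labels, delay=0):
    """Fraction of labelled anomalies with at least one flag within `delay` steps.

    Assumption: the tolerance window is symmetric around the labelled step.
    """
    positives = [t for t, y in enumerate(labels) if y == 1]
    if not positives:
        return 0.0
    found = 0
    for t in positives:
        start = max(0, t - delay)
        stop = min(len(flags), t + delay + 1)
        if any(flags[start:stop]):
            found += 1
    return found / len(positives)

labels = [0, 1, 1, 1, 0, 0, 1, 0]
flags = [0, 0, 1, 0, 0, 0, 0, 0]

for delay in range(3):
    print(delay, recall_with_delay(flags, labels, delay))
```

With a single flag at step 2, the toy recall rises from 0.25 at delay 0 to 0.75 at delay 1, mirroring how a larger delay lets one detection account for several neighbouring labelled time stamps.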
