The tegdet package includes the modules that implement the TEG detectors:
teg.py: it is the main module that includes the API;
graph_comparison.py: it includes the Graph class and is responsible for computing the difference between two graphs according to a given metric (a variant of the strategy pattern).
The tegdet package depends on several well-known Python packages as shown in the diagram below:
Each module includes a set of classes, which are detailed in the class diagram below, where colour is used to map each class to the module it belongs to:
teg module
The teg module includes the following classes:
TEGDetector: the API class, intended to be used directly by the user.

| attribute (class) | description |
|---|---|
| `__N_BINS: int` | Level of discretization of real-valued observations (number of levels). Value=30. |
| `__N_OBS_PER_PERIOD: int` | Number of observations per period. Value=336. |
| `__ALPHA: int` | Determines the significance level, 100 - __ALPHA. Value=5. |


| attribute | description |
|---|---|
| `__metric: string` | Dissimilarity metric used to compare two graphs. Input parameter. |
| `__n_bins: int` | Level of discretization of real-valued observations (number of levels). Input parameter. Default value = __N_BINS. |
| `__n_obs_per_period: int` | Number of observations per period. Input parameter. Default value = __N_OBS_PER_PERIOD. |


| method | description |
|---|---|
| | Builds the prediction model based on the training_dataset and returns it together with the time to build the model (float type). |
| `predict(testing_dataset: Dataframe, model: ModelBuilder): numpy array of int, int, float` | Makes predictions on the testing_dataset using the model. It returns the outliers (numpy array of {0,1} values), the total number of observations (int type), and the time to make predictions (float type). |
| `compute_confusion_matrix(ground_true: numpy array of int, predictions: numpy array of int): dict` | Computes the confusion matrix from the ground-truth values and the predicted values (numpy arrays of {0,1} values). It returns the confusion matrix as a dictionary (dict type). |
| | Prints to stdout the detector (dict type, including the metric and the input parameter settings), the testing_set, the performance metrics perf (dict type, including the time to build the model and the time to make predictions) and the confusion matrix cm. |
| | Saves to the file with pathname results_csv_path (comma-separated values format) the detector (dict type), the testing_set, the performance metrics perf (dict type) and the confusion matrix cm (dict type). |

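
The confusion matrix returned by compute_confusion_matrix can be illustrated with a minimal sketch. It assumes that 1 marks an outlier (the positive class) and uses the hypothetical keys tp, fp, tn and fn; the key names actually used by tegdet are not documented here, and plain lists stand in for numpy arrays to keep the example self-contained.

```python
from typing import Dict, List


def confusion_matrix(ground_true: List[int], predictions: List[int]) -> Dict[str, int]:
    """Count true/false positives/negatives for {0,1} labels, where 1 marks an outlier."""
    if len(ground_true) != len(predictions):
        raise ValueError("ground_true and predictions must have the same length")
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}  # hypothetical key names
    for truth, pred in zip(ground_true, predictions):
        if pred == 1:
            cm["tp" if truth == 1 else "fp"] += 1
        else:
            cm["fn" if truth == 1 else "tn"] += 1
    return cm


print(confusion_matrix([1, 0, 1, 0], [1, 1, 0, 0]))  # {'tp': 1, 'fp': 1, 'tn': 1, 'fn': 1}
```

Performance ratios such as precision or recall can then be derived from these four counts.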
ModelBuilder: the builder of the prediction model, which is based on the TEG and on the baseline distribution of graph dissimilarities.

| attribute | description |
|---|---|
| `__obs: Dataframe` | Training set. |
| `__le: LevelExtractor` | LevelExtractor instance. |
| `__baseline: numpy array of float` | Baseline distribution of the training period. |
| `__global_graph: Graph` | Global graph associated with the training period. |


| method | description |
|---|---|
| `__init__(observations: Dataframe, n_bins: int)` | Constructor that initializes the ModelBuilder from the training dataset observations and n_bins. |
| `__sum_graphs(gr1: Graph, gr2: Graph)` | Adds the graph gr2 to the graph gr1. Pre-condition: the node set of gr1 includes the node set of gr2. |
| `__compute_global_graph(graphs: list of Graph): Graph` | Creates and returns a global graph as the sum of a list of graphs. |
| `get_level_extractor(): LevelExtractor` | Returns __le. |
| `get_baseline(): numpy array of float` | Returns __baseline. |
| `get_global_graph(): Graph` | Returns __global_graph. |
| `build_model(metric: string, n_periods: int)` | Computes and sets __global_graph and __baseline based on the metric, the number of periods n_periods and __obs. |

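
The graph summation performed by __sum_graphs and __compute_global_graph can be sketched as follows. This is a minimal illustration under stated assumptions, not the tegdet code: SimpleGraph is a hypothetical stand-in for Graph (sorted nodes, node frequencies, and a transition-count matrix as plain lists), and the global graph is assumed to start empty over the union of all nodes, which guarantees the pre-condition of __sum_graphs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleGraph:
    """Hypothetical stand-in for Graph: sorted nodes, node frequencies, transition counts."""
    nodes: List[int] = field(default_factory=list)
    nodes_freq: List[int] = field(default_factory=list)
    matrix: List[List[int]] = field(default_factory=list)


def sum_graphs(gr1: SimpleGraph, gr2: SimpleGraph) -> None:
    """Add gr2 to gr1 in place. Pre-condition: every node of gr2 is also a node of gr1."""
    pos = [gr1.nodes.index(node) for node in gr2.nodes]  # gr2 index -> gr1 index
    for i, freq in enumerate(gr2.nodes_freq):
        gr1.nodes_freq[pos[i]] += freq
    for i, row in enumerate(gr2.matrix):
        for j, count in enumerate(row):
            gr1.matrix[pos[i]][pos[j]] += count


def compute_global_graph(graphs: List[SimpleGraph]) -> SimpleGraph:
    """Sum a list of graphs into one graph defined over the union of their nodes."""
    nodes = sorted({node for g in graphs for node in g.nodes})
    n = len(nodes)
    global_graph = SimpleGraph(nodes, [0] * n, [[0] * n for _ in range(n)])
    for g in graphs:
        sum_graphs(global_graph, g)  # pre-condition holds by construction
    return global_graph


g1 = SimpleGraph([0, 1], [2, 1], [[1, 1], [0, 0]])  # e.g. from the series 0, 0, 1
g2 = SimpleGraph([1, 2], [1, 1], [[0, 1], [0, 0]])  # e.g. from the series 1, 2
gg = compute_global_graph([g1, g2])
print(gg.nodes, gg.nodes_freq, gg.matrix)
# [0, 1, 2] [2, 2, 1] [[1, 1, 0], [0, 0, 1], [0, 0, 0]]
```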
AnomalyDetector: it makes predictions and computes outliers

| attribute | description |
|---|---|
| `__model: ModelBuilder` | ModelBuilder instance. |


| method | description |
|---|---|
| `__init__(model: ModelBuilder)` | Constructor that initializes __model with the reference ModelBuilder. |
| `make_prediction(metric: string, observations: Dataframe, n_periods: int): numpy array of float` | Makes predictions on the observations based on the dissimilarity metric, the number of periods n_periods and the reference __model. |
| `compute_outliers(prediction: numpy array of float, sigLevel: int): numpy array of int` | Computes the outliers based on the prediction, the significance level sigLevel and the reference model (specifically, its __baseline distribution). |

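
One plausible reading of compute_outliers is a percentile test against the baseline: a period is flagged as an outlier when its dissimilarity exceeds the sigLevel-th percentile of the baseline distribution. The sketch below assumes this rule and assumes sigLevel = 100 - __ALPHA (95 with the default); neither detail is confirmed by the text above. The percentile uses linear interpolation between closest ranks, which matches the default behaviour of numpy.percentile.

```python
import math
from typing import List


def percentile(values: List[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolating between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def compute_outliers(prediction: List[float], baseline: List[float], sig_level: int) -> List[int]:
    """Flag 1 where a dissimilarity exceeds the sig_level-th percentile of the baseline (assumed rule)."""
    threshold = percentile(baseline, sig_level)
    return [1 if d > threshold else 0 for d in prediction]


baseline = [0.1, 0.2, 0.3, 0.4, 0.5]  # dissimilarities observed on the training periods
print(compute_outliers([0.2, 0.49, 0.9], baseline, sig_level=95))  # [0, 1, 1]
```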
LevelExtractor: extractor of levels and univariate time-series discretizer

| attribute | description |
|---|---|
| `__level: numpy array of int` | Discretization levels. |
| `__step: int` | Discretization step. |
| `__minValue: float` | Minimum value of the training observation set. |


| method | description |
|---|---|
| `__init__(minValue: float, step: int, n_bins: int)` | Constructor that initializes the LevelExtractor attributes. |
| `get_levels(): numpy array of int` | Returns __level. |
| `discretize(observations: Dataframe): numpy array of int` | Discretizes the real-valued observations according to the discretization levels and returns the discretized observations. |

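
An equal-width discretizer in the spirit of LevelExtractor can be sketched as follows. The assumptions here are ours: level k covers the interval [minValue + k·step, minValue + (k+1)·step), the step is derived from the training range as (max - min) / n_bins, and values outside the training range are clipped to the first or last level. SimpleLevelExtractor is a hypothetical name, and the step is a float in this sketch.

```python
from typing import List


class SimpleLevelExtractor:
    """Hypothetical equal-width discretizer modelled on LevelExtractor."""

    def __init__(self, min_value: float, step: float, n_bins: int) -> None:
        self.min_value = min_value
        self.step = step
        self.levels = list(range(n_bins))  # levels 0 .. n_bins - 1

    def discretize(self, observations: List[float]) -> List[int]:
        top = self.levels[-1]
        result = []
        for x in observations:
            level = int((x - self.min_value) // self.step)
            result.append(min(max(level, 0), top))  # clip values outside the training range
        return result


training = [0.0, 4.0, 10.0]
n_bins = 5
le = SimpleLevelExtractor(min(training), (max(training) - min(training)) / n_bins, n_bins)
print(le.discretize([0.0, 3.9, 10.0, 12.0, -1.0]))  # [0, 1, 4, 4, 0]
```

Clipping keeps every testing observation mapped to one of the n_bins levels learned on the training set.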
TEGGenerator: Time-Evolving-Graph generator

| attribute | description |
|---|---|
| `__teg: list of Graph` | Time-evolving graph. |


| method | description |
|---|---|
| `__init__(observation_discretized: numpy array of int, n_periods: int)` | Generates and sets __teg from the discretized observations observation_discretized and the number of periods n_periods. |
| `get_teg(): list of Graph` | Returns the generated __teg. |

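
The TEG construction can be illustrated with a short sketch, assuming that the discretized series is split into n_periods consecutive windows of equal length and that one graph is built per window. Here each graph is encoded as a pair of Counter objects (node frequencies and edge frequencies, where edge (a, b) counts how often level a is immediately followed by level b). The encoding, the function name generate_teg, and the handling of a trailing remainder are all assumptions of this sketch.

```python
from collections import Counter
from typing import List, Tuple


def generate_teg(obs_discretized: List[int], n_periods: int) -> List[Tuple[Counter, Counter]]:
    """Build one graph per period from a discretized series.

    Each graph is (node frequencies, edge frequencies); edge (a, b) counts how often
    level a is immediately followed by level b within the same period.
    """
    period_len = len(obs_discretized) // n_periods  # any trailing remainder is ignored
    teg = []
    for p in range(n_periods):
        window = obs_discretized[p * period_len:(p + 1) * period_len]
        teg.append((Counter(window), Counter(zip(window, window[1:]))))
    return teg


for nodes, edges in generate_teg([0, 0, 1, 1, 2, 2], n_periods=2):
    print(dict(nodes), dict(edges))
```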
GraphDistanceCollector: Collector of distances between graphs in a TEG and the global graph

| attribute | description |
|---|---|
| `__distance: numpy array of float` | Graph distances. |


| method | description |
|---|---|
| `__init__(n_periods: int)` | Constructor that sets the __distance attribute to an empty array of length n_periods. |
| `compute_graphs_dist(teg: list of Graph, global_graph: Graph, metric: string): numpy array of float` | Computes and returns the distances between each graph in teg and global_graph using the dissimilarity metric. |

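
The collection step of compute_graphs_dist amounts to applying one dissimilarity function, selected by its metric name, to every (period graph, global graph) pair. The sketch below shows that strategy-style dispatch with a hypothetical edge-dictionary graph encoding and a single registered "Hamming" entry; the metric-name strings accepted by tegdet are not confirmed here.

```python
from typing import Callable, Dict, List, Tuple

Edges = Dict[Tuple[int, int], int]  # hypothetical graph encoding: edge -> frequency


def hamming(g1: Edges, g2: Edges) -> float:
    """Number of edges whose frequency differs (absent edges count as frequency 0)."""
    return float(sum(1 for e in set(g1) | set(g2) if g1.get(e, 0) != g2.get(e, 0)))


# Strategy table: metric name -> dissimilarity function (one entry in this sketch).
METRICS: Dict[str, Callable[[Edges, Edges], float]] = {"Hamming": hamming}


def compute_graphs_dist(teg: List[Edges], global_graph: Edges, metric: str) -> List[float]:
    dissimilarity = METRICS[metric]  # pick the strategy by name
    return [dissimilarity(graph, global_graph) for graph in teg]


teg = [{(0, 0): 1, (0, 1): 1}, {(1, 2): 1}]
global_graph = {(0, 0): 1, (0, 1): 1, (1, 2): 1}
print(compute_graphs_dist(teg, global_graph, "Hamming"))  # [1.0, 2.0]
```

Registering further metrics only requires adding entries to the table, which mirrors how the GraphComparator subclasses plug into the comparison.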
graph_comparison module
This module includes the following classes:
Graph: graph generator (empty graph, or graph from a univariate time series) and manipulator (graph expansion).

| method | description |
|---|---|
| | Constructor that initializes the Graph attributes (possibly as an empty graph). |
| `__get_index(element: int): int` | Returns the index (int type) of the matrix row/column corresponding to element. |
| `get_nodes()` | Returns __nodes. |
| `get_nodes_freq()` | Returns __nodes_freq. |
| `get_matrix()` | Returns __matrix. |
| `update_node_freq(pos: int, value: int)` | Increments the pos element of __nodes_freq by value. |
| `update_matrix_entry(row: int, col: int, value: int)` | Increments the __matrix entry at position (row, col) by value. |
| `generate_graph(obs_discretized_: Dataframe)` | Generates the graph from the discretized observations obs_discretized. |
| `expand_graph(position: int, vertex: int)` | Expands the graph by inserting a new node vertex at position. The newly added fictitious node has frequency -1, and the newly added row and column of the adjacency matrix have -1 entries. |

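
The behaviour of generate_graph and expand_graph can be sketched with plain lists. This is a simplified illustration, not the tegdet implementation: SimpleGraph is hypothetical, nodes are assumed to be the sorted distinct levels of the series, and matrix[i][j] is assumed to count transitions from nodes[i] to nodes[j] between consecutive observations. expand_graph follows the description above (frequency -1 and -1 row/column entries).

```python
from typing import List


class SimpleGraph:
    """Hypothetical, simplified Graph (plain lists instead of numpy arrays)."""

    def __init__(self) -> None:
        self.nodes: List[int] = []          # sorted discretization levels seen
        self.nodes_freq: List[int] = []     # occurrences of each node
        self.matrix: List[List[int]] = []   # matrix[i][j]: transitions nodes[i] -> nodes[j]

    def generate_graph(self, obs_discretized: List[int]) -> None:
        self.nodes = sorted(set(obs_discretized))
        index = {node: i for i, node in enumerate(self.nodes)}
        n = len(self.nodes)
        self.nodes_freq = [0] * n
        self.matrix = [[0] * n for _ in range(n)]
        for value in obs_discretized:
            self.nodes_freq[index[value]] += 1
        for a, b in zip(obs_discretized, obs_discretized[1:]):
            self.matrix[index[a]][index[b]] += 1

    def expand_graph(self, position: int, vertex: int) -> None:
        """Insert a fictitious node with frequency -1 and -1 row/column entries."""
        self.nodes.insert(position, vertex)
        self.nodes_freq.insert(position, -1)
        for row in self.matrix:
            row.insert(position, -1)
        self.matrix.insert(position, [-1] * len(self.nodes))


g = SimpleGraph()
g.generate_graph([0, 0, 2, 0])  # levels 0 and 2 only
g.expand_graph(1, 1)            # add the missing level 1 as a fictitious node
print(g.nodes, g.nodes_freq, g.matrix)
# [0, 1, 2] [3, -1, 1] [[1, -1, 1], [-1, -1, -1], [1, -1, 0]]
```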
GraphComparator: graph comparison operator. It is the superclass of the metric-specific comparators and is never instantiated directly.

| attribute | description |
|---|---|
| `_graph1: Graph` | First operand. |
| `_graph2: Graph` | Second operand. |


| method | description |
|---|---|
| `__init__(gr1: Graph, gr2: Graph)` | Constructor that initializes _graph1 and _graph2 as gr1 and gr2, respectively. |
| `_normalize_matrices()` | Converts the incidence matrices of _graph1 and _graph2 into one-dimensional arrays and normalizes their entries (i.e., relative frequencies). |
| `resize_graphs()` | Compares the node sets of the two graphs and, if needed, expands them so that both graphs have the same nodes. |
| `compare_graphs()` | Signature only (overridden by the subclasses). |

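
The effect of resize_graphs can be sketched as aligning both graphs on the sorted union of their nodes, inserting each missing node with the -1 convention of expand_graph. GraphRepr (a pair of sorted node list and matrix) and the helper expand are hypothetical, and node frequencies are omitted for brevity.

```python
from typing import List, Tuple

GraphRepr = Tuple[List[int], List[List[int]]]  # hypothetical encoding: (sorted nodes, matrix)


def expand(graph: GraphRepr, position: int, vertex: int) -> None:
    """Insert a fictitious node with -1 row and column entries (as in expand_graph)."""
    nodes, matrix = graph
    nodes.insert(position, vertex)
    for row in matrix:
        row.insert(position, -1)
    matrix.insert(position, [-1] * len(nodes))


def resize_graphs(g1: GraphRepr, g2: GraphRepr) -> None:
    """Expand both graphs in place so that they share the same sorted node set."""
    all_nodes = sorted(set(g1[0]) | set(g2[0]))
    for graph in (g1, g2):
        for position, vertex in enumerate(all_nodes):
            if position >= len(graph[0]) or graph[0][position] != vertex:
                expand(graph, position, vertex)


a = ([0, 2], [[1, 1], [1, 0]])
b = ([1, 2], [[0, 1], [0, 2]])
resize_graphs(a, b)
print(a)  # ([0, 1, 2], [[1, -1, 1], [-1, -1, -1], [1, -1, 0]])
print(b)  # ([0, 1, 2], [[-1, -1, -1], [-1, 0, 1], [-1, 0, 2]])
```

Because both node lists are sorted, walking the union in order inserts each missing node exactly where it belongs, so the two matrices end up with matching rows and columns.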
The remaining classes are subclasses of GraphComparator that override the method compare_graphs(). Each subclass GraphMetricDissimilarity (where Metric stands for the name of the metric) implements one of the dissimilarity metrics listed in the following Table.
The Hamming metric is computed on two vectors P and Q obtained by flattening the incidence matrices of the two graphs.
The Cosine metric is computed on two vectors P and Q obtained by flattening the node-frequency vectors and the incidence matrices of the two graphs.
The remaining metrics are computed on two vectors P and Q obtained by normalizing the incidence matrices of the two graphs.
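
The three ways of building P and Q can be sketched on two small graphs that already share the same node set. The formulas are assumptions of this sketch: Hamming counts the positions where P and Q differ, the cosine dissimilarity is taken as 1 minus the cosine similarity with the node frequencies placed before the flattened matrix, and normalization divides each entry by the total so the entries become relative frequencies. None of the function names come from tegdet.

```python
import math
from typing import List


def flatten(matrix: List[List[int]]) -> List[float]:
    return [float(x) for row in matrix for x in row]


def normalize(vector: List[float]) -> List[float]:
    """Relative frequencies: divide each entry by the total (assumes a positive total)."""
    total = sum(vector)
    return [x / total for x in vector]


def hamming(p: List[float], q: List[float]) -> int:
    """Number of positions where P and Q differ."""
    return sum(1 for a, b in zip(p, q) if a != b)


def cosine_dissimilarity(p: List[float], q: List[float]) -> float:
    """1 - cosine similarity of P and Q (assumed definition)."""
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norms


m1, freq1 = [[1, 1], [1, 0]], [3, 1]  # e.g. from the series 0, 0, 1, 0
m2, freq2 = [[1, 0], [1, 1]], [2, 2]  # e.g. from the series 1, 1, 0, 0
print(hamming(flatten(m1), flatten(m2)))  # 2
print(cosine_dissimilarity(freq1 + flatten(m1), freq2 + flatten(m2)))
print(normalize(flatten(m1)))
```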
User script scenarios
The following sequence diagrams show two scenarios in which a userScript uses the API of the tegdet library 1) to build a model from a training set, and 2) to make predictions with the model on a testing set and print the metrics of interest (the confusion matrix and the times to build the model and to make predictions).