Sanjay Nadhavajhala edited this page Jan 25, 2023 · 21 revisions

Overview

Artificial Intelligence for IT Operations (AIOps) is the application of AI and machine learning to IT and observability data. Opni AIOps offers the following features:

  • Log Anomaly Detection

Log Anomaly Detection

The goal of log anomaly detection is to apply machine learning to structure log data and detect anomalous logs. Opni AIOps combines a deep learning model with a log parsing method and offers:

Data flow overview

Figure: Log Anomaly Detection data flow

Key Technologies

This section walks through the key technologies that Opni Log Anomaly Detection is based on.

Deep Learning Model

Opni AIOps uses transformers to build a self-supervised model trained with masked language modeling, that is, predicting masked words in log messages. The trained model is then applied to log anomaly detection in an unsupervised scenario. It can also be fine-tuned on a downstream log anomaly detection task when ground-truth data is available.

Figure: Deep Learning Model Architecture

Technical Details

  1. Model input. Raw unstructured logs.
  2. Tokenization. Raw logs are normalized into lists of tokens. Some words are replaced with predefined tokens; for example, https://rancher.com becomes <URL>. Every tokenized sentence starts with the <CLS> token, ends with the padding token <PAD>, and has a fixed number of tokens (such as 64 or 128) set by a hyperparameter.
  3. Word masking. One token in each sentence is replaced with <MASK> for the model to predict.
  4. Model construction. The model consists of N encoder layers, N decoder layers, and a generator layer (linear + softmax). Typically N = 1 or 2.
  5. Model output. The predicted tokens for the <MASK> positions.
  6. Model training. The model is trained on batches of training data. The loss function is defined by the accuracy of masked word prediction.
  7. Model inference as an unsupervised log anomaly detection task. The trained model tries to predict every token of an input sentence (except the leading <CLS> token and the <PAD> padding tokens). The anomaly score is then calculated as anomaly_score = token_correct_predict / token_total.
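
The tokenization, masking, and scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not Opni's implementation: the helper names (tokenize, mask_token, anomaly_score, train_bigram_predictor), the whitespace splitting, the URL regex, and the tiny max_len are all assumptions, and a toy bigram lookup stands in for the transformer so the example runs with the standard library only.

```python
import re
from collections import Counter, defaultdict

CLS, PAD, MASK, URL = "<CLS>", "<PAD>", "<MASK>", "<URL>"
URL_PATTERN = re.compile(r"https?://\S+")  # assumed pattern, for illustration


def tokenize(log_line, max_len=8):
    """Normalize a raw log line into exactly max_len tokens."""
    words = URL_PATTERN.sub(URL, log_line).split()
    # Reserve one slot for <CLS> and at least one for a trailing <PAD>.
    tokens = [CLS] + words[: max_len - 2]
    return tokens + [PAD] * (max_len - len(tokens))


def mask_token(tokens, position):
    """Return a copy of tokens with one position replaced by <MASK>."""
    masked = list(tokens)
    masked[position] = MASK
    return masked


def anomaly_score(tokens, predict_token):
    """Compute token_correct_predict / token_total as given in step 7.

    predict_token(masked_tokens, position) is any callable that returns
    the predicted token for the masked position.
    """
    positions = [i for i, t in enumerate(tokens) if t not in (CLS, PAD)]
    if not positions:
        raise ValueError("no tokens to score")
    correct = sum(
        1 for i in positions
        if predict_token(mask_token(tokens, i), i) == tokens[i]
    )
    return correct / len(positions)


def train_bigram_predictor(sentences):
    """Toy stand-in for the trained model: predict from the previous token."""
    following = defaultdict(Counter)
    for tokens in sentences:
        for prev, cur in zip(tokens, tokens[1:]):
            following[prev][cur] += 1

    def predict(masked_tokens, position):
        candidates = following.get(masked_tokens[position - 1])
        return candidates.most_common(1)[0][0] if candidates else None

    return predict


predict = train_bigram_predictor([tokenize("pod started at https://rancher.com")])
familiar = tokenize("pod started at https://example.org")
unusual = tokenize("pod crashed at https://example.org")
print(anomaly_score(familiar, predict))  # 1.0
print(anomaly_score(unusual, predict))   # 0.5
```

With the formula as written, the score is the fraction of correctly predicted tokens, so in this sketch a lower value corresponds to a log line the model finds harder to predict.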

Log Parsing/Templatization

Log parsing is the first step of log analysis. It transforms unstructured log messages into structured events, namely log templates. Opni AIOps builds its log parsing module on top of the open-source project Drain3, an online log template miner that extracts templates from streaming log messages.

Figure: Log Parsing Example
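
To make the idea of online template mining concrete, here is a deliberately simplified sketch in standard-library Python. It is not Drain3's algorithm or API (Drain uses a fixed-depth parse tree to narrow the search): the ToyTemplateMiner class, its add method, the <*> wildcard, and the 0.5 similarity threshold are hypothetical names and values chosen for illustration. It only shows how messages with the same shape can be merged into one template as they stream in.

```python
WILDCARD = "<*>"


class ToyTemplateMiner:
    """Hypothetical, simplified stand-in for an online log template miner."""

    def __init__(self, sim_threshold=0.5):
        self.sim_threshold = sim_threshold
        self.templates = []  # each template is a list of tokens

    def add(self, message):
        """Match a message to a template, updating or creating one."""
        tokens = message.split()
        if not tokens:
            return ""
        best_idx, best_sim = None, 0.0
        for idx, template in enumerate(self.templates):
            if len(template) != len(tokens):
                continue  # only compare messages with the same token count
            same = sum(1 for a, b in zip(template, tokens) if a == b)
            sim = same / len(tokens)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= self.sim_threshold:
            # Positions that differ are treated as variable parameters.
            merged = [
                a if a == b else WILDCARD
                for a, b in zip(self.templates[best_idx], tokens)
            ]
            self.templates[best_idx] = merged
            return " ".join(merged)
        self.templates.append(tokens)
        return " ".join(tokens)


miner = ToyTemplateMiner()
for line in [
    "connected to 10.0.0.1",
    "connected to 10.0.0.2",
    "disk usage at 91%",
    "connected to 10.0.0.3",
]:
    print(miner.add(line))
```

After the second message arrives, the two "connected to" lines share the template "connected to <*>", while the disk usage line starts a template of its own.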

Data Pipeline

Opni AIOps uses NATS to implement its data pipeline, including message queues and key-value storage. A Python package, Nats Wrapper, simplifies the use of NATS in Python services. Additionally, Protobuf is used to encapsulate the data exchanged between microservices.
