Salad - A Content Anomaly Detector based on n-Grams

Letter Salad, or Salad for short, is an efficient and flexible implementation of the well-known anomaly detection method Anagram by Wang et al. (RAID 2006).

Salad enables detecting anomalies in large-scale string data. The tool is based on the concept of n-grams, that is, strings are compared using all of their substrings of length n. During training, these n-grams are extracted from a collection of strings and stored in a Bloom filter. This enables the detector to represent a large number of n-grams in very little memory. During anomaly detection, the n-grams of unknown strings are matched against the Bloom filter, and strings containing several n-grams not seen during training are flagged as anomalous.
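The training and detection steps described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Salad's actual implementation: the class `BloomFilter`, the helpers `byte_ngrams`, `train` and `anomaly_score`, the filter size, the number of hash functions and the choice of BLAKE2b as hash are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def byte_ngrams(data, n):
    """Return all substrings of length n of a byte string."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def train(strings, n):
    """Store the n-grams of all training strings in a Bloom filter."""
    bloom = BloomFilter()
    for s in strings:
        for gram in byte_ngrams(s, n):
            bloom.add(gram)
    return bloom


def anomaly_score(bloom, data, n):
    """Fraction of n-grams in data that were not seen during training."""
    grams = byte_ngrams(data, n)
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in bloom)
    return unseen / len(grams)


if __name__ == "__main__":
    model = train([b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"], n=3)
    print(anomaly_score(model, b"GET /index.html HTTP/1.1", n=3))
    print(anomaly_score(model, b"\x90\x90\x90\x90/bin/sh", n=3))
```

A string would then be flagged as anomalous if its score exceeds a threshold chosen by the user. Because a Bloom filter can report false positives but never false negatives, a string seen during training always scores 0.0, while an unknown n-gram may occasionally be mistaken for a known one.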

Salad extends the original method Anagram in several ways: First, the tool does not only operate on n-grams of bytes, but is also capable of comparing n-grams over words and tokens. Second, Salad implements a 2-class version of the detector that enables discriminating between strings of two types. Finally, the tool features a built-in inspection and statistics mode that can help to analyze the learned Bloom filter and its predictions.
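To make the first two extensions more concrete, the following sketch shows n-grams over words and a simple two-class score. It is an assumption-laden illustration rather than Salad's own code: the function names `word_ngrams`, `fit` and `two_class_score`, the delimiter set, and the scoring rule (difference of the unseen fractions under both models) are hypothetical, and a plain Python `set` stands in for the Bloom filter to keep the example short.

```python
import re


def word_ngrams(text, n, delimiters=" \t\r\n/?&=.:"):
    """Split text at the delimiter characters and return n-grams of words."""
    pattern = "[" + re.escape(delimiters) + "]+"
    tokens = [t for t in re.split(pattern, text) if t]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fit(strings, n):
    """Collect the word n-grams of one class (a set replaces the Bloom filter)."""
    model = set()
    for s in strings:
        model.update(word_ngrams(s, n))
    return model


def two_class_score(benign, malicious, text, n):
    """Assumed scoring rule: unseen fraction under the benign model minus
    unseen fraction under the malicious model. Positive values mean the
    string looks more like the malicious class."""
    grams = word_ngrams(text, n)
    if not grams:
        return 0.0
    unseen_benign = sum(1 for g in grams if g not in benign)
    unseen_malicious = sum(1 for g in grams if g not in malicious)
    return (unseen_benign - unseen_malicious) / len(grams)


if __name__ == "__main__":
    benign = fit(["GET /index.html HTTP/1.1", "GET /about.html HTTP/1.1"], n=2)
    malicious = fit(["GET /cgi-bin/../../etc/passwd HTTP/1.1"], n=2)
    print(two_class_score(benign, malicious, "GET /cgi-bin/../etc/shadow HTTP/1.1", n=2))
    print(two_class_score(benign, malicious, "GET /index.html HTTP/1.1", n=2))
```

With these toy models, the first request yields a positive score and the second a negative one, so the sign of the score separates the two classes.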

The tool can be applied in different fields. For example, the concept underlying Salad has been prominently used for intrusion detection, but it is not limited to this scenario. To illustrate the versatility of Salad, we provide some concrete examples of its usage. All examples come with data sets and instructions.

Copyright (C) 2012-2014 Christian Wressnegger
