Skip to content
Branch: master
Go to file
Code

Latest commit

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
bin
 
 
doc
 
 
lib
 
 
res
 
 
src
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Salad - A Content Anomaly Detector based on n-Grams

Letter Salad or Salad for short, is an efficient and flexible implementation of the well-known anomaly detection method Anagram by Wang et al. (RAID 2006)

Salad enables detecting anomalies in large-scale string data. The tool is based on the concepts of n-grams, that is, strings are compared using all substrings of length n. During training, these n-grams are extracted from a collection of strings and stored in a Bloom filter. This enables the detector to represent a large number of n-grams in very little memory. During anomaly detection, the n-grams of unknown strings are matched against the Bloom filter and strings containing several n-grams not seen during training are flagged as anomalous.

Salad extends the original method Anagram in different ways: First, the tool does not only operate on n-grams of bytes, but is also capable of comparing n-grams over words and tokens. Second, Salad implements a 2-class version of the detector that enables discriminating strings of two types. Finally, the tool features a build-in inspection and statistic mode that can help to analyze the learned Bloom filter and its predictions.

The tool can be utilized in different fields of application. For example, the concept underlying Salad has been prominently used for intrusion detection, but is not limited to this scenario. To illustrate the versatility of Salad we provide some concrete examples of its usage. All examples come with data sets and instructions.

Copyright (C) 2012-2014 Christian Wressnegger

You can’t perform that action at this time.