Snap processor plugin regexp-engine; parse with multiple regexes, split on regexp, template tags, and more!



Split, parse, and re-tag string metrics.

  1. Getting Started
  2. Documentation
  3. Community Support
  4. Contributing
  5. License
  6. Acknowledgements

Getting Started

System Requirements

Operating systems

All OSs currently supported by Snap:

  • Linux/amd64
  • Darwin/amd64


Download plugin binary:

You can get the pre-built binaries for your OS and architecture on the plugin's release page. For Snap, check here.

To build the plugin binary:


Clone repo into $GOPATH/src/

$ git clone https://github.com/<yourGithubID>/snap-plugin-processor-regexp-engine.git


The following instructions cover building the plugin yourself if you decided to download the source. We assume you already have a $GOPATH set up for golang development. The repository uses glide for dependency management.

build: make

testing: make test-small

Configuration and Usage

Load the Plugin

Once the framework is up and running, you can load the plugin.

$ snaptel plugin load snap-plugin-processor-regexp-engine
Plugin loaded
Name: regexp-engine
Version: 1
Type: processor
Signed: false
Loaded Time: Sat, 18 Mar 2017 13:28:45 PDT


This is a Snap processor plugin; metrics are passed in from the collector (or from previous processors), modified here, and passed on "down the chain" to either more processors or publishers. This plugin can split one metric into many by a regular expression, then enrich each of the split metrics with further regular expressions and golang templating.

Upon receiving a metric, this plugin will:

  1. Attempt to match the metric against each "gate" provided in configuration. If it matches, the plugin will process the metric against that gate's configuration by:
    1. Splitting the metrics by the regexes provided in the 'split' section, in order
    2. Attempting to match each split against the original gate, filtering it out entirely on failure
    3. Parsing the regexes, using capture groups to capture parts of the string to store in the metric as tags
    4. Using golang templating against the metric as a whole to create or override further tags for the metric.
  2. If no gates match, the metric is simply passed "down the chain" as-is.

If the metric matches more than one gate, it will be processed for each gate.
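As a rough illustration of the gate-matching step above, the Go sketch below checks a metric value against a list of gate patterns and returns every gate that matches. The function name `matchingGates` and the use of an ordered slice of patterns are assumptions made for this example, not the plugin's actual code. A metric that matches no gate would simply be passed on unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchingGates returns every gate pattern that matches value.
// (Hypothetical helper; the plugin's real internals may differ.)
func matchingGates(gates []string, value string) []string {
	var matched []string
	for _, g := range gates {
		if regexp.MustCompile(g).MatchString(value) {
			matched = append(matched, g)
		}
	}
	return matched
}

func main() {
	gates := []string{`^feature ([A-Za-z0-9]+)`, `^<[^>]+> .*$`}

	// Matches the first gate only, so it is processed once.
	fmt.Println(matchingGates(gates, "feature 1|feature 2|feature 3"))

	// Matches no gate, so it would be passed down the chain as-is.
	fmt.Println(len(matchingGates(gates, "%some_other_value%")))
}
```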

Imagine a task manifest like:

  version: 1
  schedule:
    type: "simple"
    interval: "3s"
  max-failures: 10
  workflow:
    collect:
      config:
        metric_name: featurelistfile
        cache_dir: /var/lib/snap/logcache
        log_dir: /var/log
        log_file: featurelistfile
        splitter_type: new-line
        collection_time: 2s
        metrics_limit: 1000
      metrics:
        /intel/logs/*: {}
      process:
        - plugin_name: "regexp-engine"
          config:
            "^feature ([A-Za-z0-9]+)":
              split:
                - "\|"
              parse:
                - "^feature (?P<feature_name>[A-Za-z0-9]+)"
              template:
                feature_index: "{{ .Tags.feature_name }}"
          publish:
            - plugin_name: "file"
              config:
                file: /tmp/logmetrics

And a list of metrics comes into the plugin like:

  {
    "Name": "/intel/logs/featurelistfile/message",
    "Value": "feature 1|feature 2|feature 3",
    "Tags": {}
  }

The resulting metrics list will be passed down like:

  {
    "Name": "/intel/logs/featurelistfile/message",
    "Value": "feature 1",
    "Tags": {
      "feature_name": "1",
      "feature_index": "1"
    }
  },
  {
    "Name": "/intel/logs/featurelistfile/message",
    "Value": "feature 2",
    "Tags": {
      "feature_name": "2",
      "feature_index": "2"
    }
  },
  {
    "Name": "/intel/logs/featurelistfile/message",
    "Value": "feature 3",
    "Tags": {
      "feature_name": "3",
      "feature_index": "3"
    }
  }

Parse Details


The configuration is a YAML dict where the keys are regexp matches -- "gates" -- and the values are further dicts that express instructions on handling a metric with data matching the key. Take this sample config for instance:

  "^<[^>]+> .*$":
    parse:
      - "<(?P<user>[^>]+)> some IRC message"

A metric with a value of "<zcarlson> some IRC message" would gain a 'user' tag of 'zcarlson', while a metric with a value of "%some_other_value%" would pass through unprocessed.

The next few sections describe how to define the parsing of string metrics that match this gate.

Split phase

If you want to split a metric's string value, use the 'split' key with a list value. The list contains the regular expressions that will be used to split the string metrics. So for a config like this:

    split:
      - '\\$'
      - '\\|'
    parse:
      - '.*'

And a metric string value like map1|key1$value1|map2$key2|value2, you would get metrics with values "map1", "key1", "value1", "map2", "key2", "value2". The splits here are applied in the order they are defined.
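The ordered splitting described above can be sketched in Go as follows. This is only an illustration using the standard `regexp` package. The helper name `splitAll` is hypothetical, and the raw-string patterns `\$` and `\|` assume that the escaped values in the config resolve to those regular expressions.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitAll applies each pattern in order; every piece produced by one
// pattern is split again by the next pattern.
// (Hypothetical helper, not taken from the plugin source.)
func splitAll(value string, patterns []string) []string {
	pieces := []string{value}
	for _, p := range patterns {
		re := regexp.MustCompile(p)
		var next []string
		for _, piece := range pieces {
			next = append(next, re.Split(piece, -1)...)
		}
		pieces = next
	}
	return pieces
}

func main() {
	// First split on "$", then split each result on "|".
	out := splitAll("map1|key1$value1|map2$key2|value2", []string{`\$`, `\|`})
	fmt.Println(out)
}
```

Each resulting piece would become the value of its own metric before the later phases run.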

Match-again phase

If a metric was split, the gate match is attempted against the split metric's value; if there is no match, the split metric is discarded.
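A minimal Go sketch of this filtering step, assuming the gate is re-applied to each split value exactly as it was to the original metric. The helper name `keepMatching` and the sample piece "other 2" are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// keepMatching re-applies the gate pattern to each split piece and
// drops the pieces that no longer match.
// (Hypothetical helper, not taken from the plugin source.)
func keepMatching(gate string, pieces []string) []string {
	re := regexp.MustCompile(gate)
	var kept []string
	for _, piece := range pieces {
		if re.MatchString(piece) {
			kept = append(kept, piece)
		}
	}
	return kept
}

func main() {
	// "other 2" is an invented piece that fails the gate after splitting.
	pieces := []string{"feature 1", "other 2", "feature 3"}
	fmt.Println(keepMatching(`^feature ([A-Za-z0-9]+)`, pieces))
}
```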

Parse phase

The parse key is required, and its value is also a list of regular expressions, again applied in order. For instance:

  "instanceHostname\": \"([^\"]*)\"":
    parse:
      - 'instanceHostname\": \"(?P<hostname>[^\"]*)\"'
      - 'otherHostname\": \"(?P<hostname>[^\"]+)\"'

For a metric with a JSON-like value like {"instanceHostname": "myhost1"}, the hostname tag will be myhost1; however, if the JSON-value looked like:

{"instanceHostname": "myhost1", "otherHostname": "differenthost"}

The hostname tag would be "differenthost". Remember, though, parse tags only get set on match, so JSON like this:

{"instanceHostname": "myhost1", "otherHostname": ""}

Would still see hostname set to "myhost1" (because the otherHostname value is empty and the regex requires at least one non-quote character).

As you might expect, JSON like:

{"instanceHostname": ""}

Will end up setting the hostname tag to an empty string.

(If you want to do just a split or template for some reason, you can set parse to a list containing only a ".*", but parsing is the primary intended use of the plugin.)
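The hostname examples above can be reproduced with a short Go sketch built on named capture groups from the standard `regexp` package. The helper name `parseTags` is hypothetical, and the "later match overrides earlier" behaviour is modelled on the description in this section rather than on the plugin source.

```go
package main

import (
	"fmt"
	"regexp"
)

// parseTags applies each pattern in order and stores named capture
// groups as tags. A tag is only written when its pattern matches, so a
// later matching pattern overrides an earlier one.
// (Hypothetical helper, not taken from the plugin source.)
func parseTags(value string, patterns []string) map[string]string {
	tags := map[string]string{}
	for _, p := range patterns {
		re := regexp.MustCompile(p)
		m := re.FindStringSubmatch(value)
		if m == nil {
			continue // no match: leave existing tags untouched
		}
		for i, name := range re.SubexpNames() {
			if name != "" {
				tags[name] = m[i]
			}
		}
	}
	return tags
}

func main() {
	patterns := []string{
		`instanceHostname": "(?P<hostname>[^"]*)"`,
		`otherHostname": "(?P<hostname>[^"]+)"`,
	}
	values := []string{
		`{"instanceHostname": "myhost1"}`,
		`{"instanceHostname": "myhost1", "otherHostname": "differenthost"}`,
		`{"instanceHostname": "myhost1", "otherHostname": ""}`,
		`{"instanceHostname": ""}`,
	}
	for _, v := range values {
		fmt.Printf("%q\n", parseTags(v, patterns)["hostname"])
	}
}
```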

Template phase

The metric is essentially filled out after the parse phase, but you can do some additional processing and tag-setting with golang templating and the template key. Its value is another dict where the keys are tag names and the values are golang templates (see the documentation linked) that provide the intended values for a tag. For instance:

  "instanceHostname\": \"(?P<host>[^\"]+)\"":
    parse:
      - "instanceHostname\": \"(?P<host>[^\"]+)\""
      - "instanceHttpPort\": (?P<port>[0-9]+)"
    template:
      url: "http://{{ .Tags.host }}:{{ .Tags.port }}/"
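To show how such a template might be evaluated, here is a small Go sketch using the standard `text/template` package. The `metric` struct, the `renderTag` helper, the `.Tags.host` reference, and the port value 8080 are assumptions made for this example. The plugin's real data model may expose more fields.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// metric is a hypothetical stand-in for the data handed to a template.
type metric struct {
	Value string
	Tags  map[string]string
}

// renderTag executes one tag template against the metric and returns
// the rendered tag value.
func renderTag(tmpl string, m metric) (string, error) {
	t, err := template.New("tag").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, m); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Tags as they might look after the parse phase (values assumed).
	m := metric{Tags: map[string]string{"host": "myhost1", "port": "8080"}}

	url, err := renderTag("http://{{ .Tags.host }}:{{ .Tags.port }}/", m)
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	m.Tags["url"] = url // the rendered value becomes a new tag
	fmt.Println(m.Tags["url"])
}
```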


We keep working on more features and will update the processor as needed.

Community Support

Open an issue and we will respond.


Contributing

We love contributions!

The most immediately helpful way you can benefit this plug-in is by cloning the repository, adding some further examples and submitting a pull request.


License

Released under the Apache 2.0 License.