
Run Stream Enrich 0 10

Mike Jongbloet edited this page Jan 11, 2021 · 2 revisions

This documentation is outdated!

🚧 The latest Stream Enrich documentation can be found on the Snowplow documentation site.


This documentation is for version 0.5.0 - 0.10.0 of Stream Enrich.

Running

Stream Enrich is an executable jarfile which should be runnable from any Unix-like shell environment. Simply provide the configuration file and the resolver as parameters:

$ ./snowplow-stream-enrich-0.x.0 --config my.conf --resolver file:resolver.json

This will start the Stream Enrich app to read raw events from Kinesis and write enriched events back to Kinesis.

If you are using configurable enrichments, provide the path to your enrichments directory as a parameter:

$ ./snowplow-stream-enrich-0.x.0 --config my.conf --resolver file:resolver.json --enrichments file:path/to/enrichments

If you are storing the resolver and/or enrichments in DynamoDB, use the "dynamodb:" prefix in place of the "file:" prefix:

$ ./snowplow-stream-enrich-0.x.0 --config my.conf --resolver dynamodb:eu-west-1/ConfigurationTable/resolver --enrichments dynamodb:eu-west-1/ConfigurationTable/enrichment_

The above command assumes that the enrichments and resolver are stored in a table named ConfigurationTable in eu-west-1, that the hash key for that table is "id", that the resolver JSON is stored in an item whose hash key has value "resolver", and that the enrichments are stored in items whose hash keys have values beginning with "enrichment_".
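
Judging from the example above, a "dynamodb:" argument has the shape dynamodb:<region>/<table>/<hash key value or prefix>. The following is a minimal sketch that splits such an argument into those three parts, which can be handy for checking a value before passing it to Stream Enrich. The function name parse_dynamodb_arg is hypothetical and this is not Stream Enrich's own parsing code.

```shell
#!/bin/sh
# Sketch only: split a "dynamodb:" argument into region, table and key.
# parse_dynamodb_arg is a hypothetical helper, not part of Stream Enrich.
parse_dynamodb_arg() {
    arg="$1"
    case "$arg" in
        dynamodb:*/*/*) ;;   # looks like dynamodb:<region>/<table>/<key>
        *)
            echo "expected dynamodb:<region>/<table>/<key>" >&2
            return 1 ;;
    esac
    rest="${arg#dynamodb:}"   # strip the prefix
    region="${rest%%/*}"      # text before the first slash
    rest="${rest#*/}"         # drop the region
    table="${rest%%/*}"       # text before the next slash
    key="${rest#*/}"          # remainder: hash key value or key prefix
    printf 'region=%s table=%s key=%s\n' "$region" "$table" "$key"
}
```

For the enrichments argument used above, `parse_dynamodb_arg dynamodb:eu-west-1/ConfigurationTable/enrichment_` prints `region=eu-west-1 table=ConfigurationTable key=enrichment_`.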

Running in local mode

When developing the Scala collector and Kinesis enrichment components, we realized that there were strong parallels between the Kinesis stream processing paradigm and conventional Unix stdio I/O streams. As a result, we added the ability for:

  1. Scala Stream Collector to write Snowplow raw events to stdout instead of a Kinesis stream
  2. Stream Enrich to read Snowplow raw events from stdin, and write enriched events to stdout

This has a nice side-effect: it is possible to run Snowplow in a "local mode", where you simply pipe the output of Scala Stream Collector directly into Stream Enrich, and can then see the generated enriched events printed to your console. You can run Snowplow in local mode with a shell script like this:

#!/bin/sh

echo "Piping local collector into local enrichment..."
./snowplow-stream-collector-0.1.0 --config ./collector.conf | ./snowplow-kinesis-enrich-0.1.0 --config ./enrich.conf

Make sure to set the sources and sinks in your configuration files (Scala configuration template, Kinesis enrich template) to the relevant stdin/stdout settings.

Snowplow "local mode" could be helpful for debugging Snowplow tracker implementations before putting tags live.
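
Because local mode prints enriched events to stdout, the same pipe can be extended with tee to keep a copy for later inspection. The sketch below is one way to do that. The variables COLLECTOR_CMD, ENRICH_CMD and OUT_FILE and the function run_local_mode are hypothetical names introduced for illustration, and the default commands are simply the ones from the script above.

```shell
#!/bin/sh
# Sketch only: run the collector | enrich pipe and save a copy of the output.
# COLLECTOR_CMD, ENRICH_CMD and OUT_FILE are hypothetical variables, not Snowplow settings.
COLLECTOR_CMD="${COLLECTOR_CMD:-./snowplow-stream-collector-0.1.0 --config ./collector.conf}"
ENRICH_CMD="${ENRICH_CMD:-./snowplow-kinesis-enrich-0.1.0 --config ./enrich.conf}"
OUT_FILE="${OUT_FILE:-enriched-events.log}"

run_local_mode() {
    # The command variables are left unquoted on purpose so that each
    # string is split into a program name and its arguments.
    # tee writes the enriched events to OUT_FILE and still prints them.
    $COLLECTOR_CMD | $ENRICH_CMD | tee "$OUT_FILE"
}
```

Calling `run_local_mode` at the end of the script starts the pipe. Overriding the two command variables with stand-ins such as `echo` and `cat` is a quick way to check the plumbing before pointing it at the real binaries.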

Configuring the log level

Stream Enrich uses slf4j logging. If you run the executable jarfile using the java -jar command, you can set the log level as a system property:

$ java -jar -Dorg.slf4j.simpleLogger.defaultLogLevel=debug \
    snowplow-stream-enrich-0.x.0 --config my.conf --resolver file:resolver.json

This will also affect messages logged by the Kinesis Client Library (which Stream Enrich uses to read from Kinesis).
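
A mistyped level name is easy to miss on the command line, so a wrapper script may want to validate it first. The following is a minimal sketch that builds the system property shown above from a level name, accepting only the levels slf4j's SimpleLogger recognises (trace, debug, info, warn, error, off). The function name log_level_opt and the LOG_LEVEL variable in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch only: build the slf4j SimpleLogger system property from a level name.
# log_level_opt is a hypothetical helper, not part of Stream Enrich.
log_level_opt() {
    level="${1:-info}"   # fall back to "info", SimpleLogger's own default
    case "$level" in
        trace|debug|info|warn|error|off)
            printf '%s\n' "-Dorg.slf4j.simpleLogger.defaultLogLevel=$level" ;;
        *)
            echo "unknown log level: $level" >&2
            return 1 ;;
    esac
}

# Possible usage (LOG_LEVEL is a hypothetical environment variable):
#   opt="$(log_level_opt "$LOG_LEVEL")" || exit 1
#   java -jar "$opt" snowplow-stream-enrich-0.x.0 --config my.conf --resolver file:resolver.json
```

The helper only prints the option, so it can be checked on its own before it is wired into the java command.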

All done?

You have set up Stream Enrich! You are now ready to set up alternative data stores.

Return to the setup guide.
