AMP Logs Routing

Up to AMP 0.12.0, logs are gathered on the Docker nodes by the AMP agent, sent to NATS, fetched by ampbeat, and pushed to Elasticsearch. Our log analysis capabilities are limited: most of the powerful tools on top of Elasticsearch are not free and require a paid license, which is not compatible with the AMP open source model.

To unlock log analysis capabilities, and to cover the case where a log analysis tool already exists in the corporate IS, we should be able to integrate AMP with an external datastore.

One solution is to add Logstash to the core services, which makes it possible, through a simple reconfiguration, to route the logs out of the AMP core stack.

There are native plugins in both libbeat (used by ampbeat) and Logstash to create a flow between them. Logstash will listen for events with the beats input plugin. Ampbeat will disable the Elasticsearch output plugin and use the Logstash output plugin instead. A pipeline definition on the Logstash side will route these events to the Elasticsearch output.
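
A minimal sketch of both sides (the service names `logstash` and `elasticsearch` and the port 5044 are assumptions, not the actual AMP settings). On the ampbeat side, the libbeat output section would point to Logstash instead of Elasticsearch:

```yaml
# ampbeat configuration sketch: enable the logstash output
# (leaving output.elasticsearch unset disables the direct ES output)
output.logstash:
  hosts: ["logstash:5044"]
```

And on the Logstash side, a pipeline that receives the beats events and forwards them unchanged to Elasticsearch:

```
# Logstash pipeline sketch: listen for beats events, route them to ES
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
}
```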

This simple step makes it possible to enable, by configuration, routing to Elasticsearch, to an external stash, or to both. The Docker config feature is a way to update the routing: Logstash can be configured to rescan the pipeline definition on a regular basis (a sketch follows below). This also brings a buffer between beat and ES, as well as the ability to filter events (to transform them, send them to another index, or drop them). To make sure logs are not lost, the buffer has to be persisted, safe from container failure and node failure. There are several ways to do that, but that should be the subject of a different document.
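
Logstash supports this rescanning natively through its reload settings; a sketch of logstash.yml (the interval value is just an example, and its syntax varies across Logstash versions):

```yaml
# logstash.yml sketch: rescan the pipeline definition periodically,
# so a rotated Docker config is picked up without restarting the service
config.reload.automatic: true
# in Logstash 5.x the interval is a number of seconds (e.g. 3)
config.reload.interval: 3
```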

One consequence of adding LS between ampbeat and ES is that ampbeat will only be able to send the data to LS, not the metadata (the mappings); ampbeat used to send both to ES. The standard way to handle this is to store the mappings in the Logstash image and tell Logstash to create them on ES. That's a minor inconvenience, since the mappings are built in the cmd/ampbeat tree.
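
The Logstash elasticsearch output supports this through its index template options; a sketch, assuming the mappings are baked into the image at a hypothetical path /usr/share/logstash/templates/ampbeat.json:

```
# Logstash output sketch: install the mappings shipped in the image
# as an ES index template before indexing events
output {
  elasticsearch {
    hosts           => ["elasticsearch:9200"]
    template        => "/usr/share/logstash/templates/ampbeat.json"
    template_name   => "ampbeat"
    manage_template => true
  }
}
```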

This already answers the external stash use case for the next release, but it also opens a convenient way to manage logs on a multi-swarm deployment. A Logstash dedicated to a Swarm can make its data available to a higher-level Logstash via a Redis datastore: Redis would be the output of the first Logstash and the input of the second, while the logs could still be duplicated to a local Elasticsearch if needed. Redis has an official Prometheus metrics exporter, which would be a nice way to monitor the status of the logs in a single Swarm. For now, Logstash produces metrics, but not in the Prometheus format.
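
A sketch of that fan-out, using the standard redis output and input plugins (the host names, the list key amp-logs, and the duplication to a local ES are assumptions):

```
# per-swarm Logstash sketch: push events to Redis,
# and duplicate them to the local Elasticsearch
output {
  redis {
    host      => "redis"
    data_type => "list"
    key       => "amp-logs"
  }
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
}
```

```
# higher-level Logstash sketch: consume the same Redis list
# and index the events into the central Elasticsearch
input {
  redis {
    host      => "redis"
    data_type => "list"
    key       => "amp-logs"
  }
}
output {
  elasticsearch {
    hosts => ["central-elasticsearch:9200"]
  }
}
```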

Prototyped in #1526