Open Source Elastic Stack First Steps

Table Of Contents

  • Preface
  • Elastic Stack Documentation
  • Start / Installation
  • Configuration
    • Elasticsearch
    • Filebeat
    • Logstash
    • Kibana
  • Known-Issues
  • Lessons Learned

Preface

This document is about my first steps with the Open Source Elastic Stack, which I intend to use for the management of logging data. Its main purpose is to document the setup work I did, not to introduce the Elastic Stack.

Thus, to understand this document you should know the basic concepts of the Elastic Stack and its components. You can find a lot of information on these topics in the links provided below.

Elastic Stack Documentation

When it comes to dealing with a new technology, I always like to read about its basic concepts first. Fortunately, the Elastic Stack is well documented. These are the links I used a lot:

The product entry pages provide some overview information, and on my first visit I wondered where to find the reference (product) documentation. Once you notice it, it is really obvious: each product page has a little link bar at the right side with Docs, Forum and GitHub links (see below).

(Screenshot: Elasticsearch product page)

Start / Installation

I took my first steps on Windows, where the installation mostly consisted of extracting the archives and - after configuration - starting a batch script.

So please take a look at the Get Started page, which guides you through the downloads of all Elastic Stack components, and follow the installation instructions for each component as described in its reference documentation (see links above).
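
For reference, this is roughly how I start the individual components on Windows after extracting the archives (a sketch assuming default installation layouts and the configuration files described below; each command is run from the respective installation directory):

REM Elasticsearch
bin\elasticsearch.bat
REM Logstash with the pipeline configuration described below
bin\logstash.bat -f logstash-beats.conf
REM Filebeat with the filebeat.yml described below
filebeat.exe -e -c filebeat.yml
REM Kibana
bin\kibana.bat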

Configuration

Elasticsearch

Configuration file: elasticsearch.yml

I used the default setup of Elasticsearch and just changed the path to the data store:

#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: C:/data/FooBar/elasticsearch
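
Once Elasticsearch is running, a quick way to verify that it responds is an HTTP request against its REST API (assuming the default port 9200; with X-Pack security installed, credentials are needed, e.g. the default elastic/changeme also used in the Logstash output below):

curl -u elastic:changeme http://localhost:9200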

Filebeat

Currently, Filebeat is the recommended way to forward logging data to Elasticsearch or Logstash. In my setup I use Filebeat to send the logging data of two different applications to Logstash.

The complete configuration can be found here: filebeat.yml

As the Filebeat prospectors are the most relevant part, I am going to explain their configuration in detail here:

filebeat.prospectors:

- input_type: log
  paths:
    - C:\Progs\FooBarBackend\logs\FooBarBackend*.log
  # log4j2 pattern: %date{ISO8601}{UTC}Z | %5.5level | %15.15thread | %25.25logger{1.} | %30.30class{1.}(%4.4line) |> %message | MDC=%MDC%n
  # consider lines that do not start with the date pattern to belong to the previous line
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
  # custom fields that will be used for conditional filtering in logstash
  fields:
    component: FooBarBackend

Above is the configuration for the prospector that handles the log files of the first application (let's call it Backend). The paths declaration is the pattern for the log files that should be handled. For rolling log files the recommendation seems to be to match all log files with the pattern ('*') and not only the "active" one.

Usually a log file entry is just a single line. But in some cases (like stack traces or formatted data such as XML or JSON) a log entry covers several lines, and these should be kept together. That is where the multiline configuration options are needed. In my case a regular log file entry starts with a timestamp, so I configured it this way: each line that does not match my timestamp pattern (pattern + negate = true) is treated as a continuation of the preceding regular log entry (the one with a timestamp in front).
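
For illustration, here is a made-up Backend log excerpt (the concrete values are invented): the second entry is followed by a stack trace, and with the multiline settings above those lines are grouped into the same Filebeat event because they do not start with a date:

2017-05-04T10:15:30,123Z |  INFO |            main | c.f.FooBarService | c.f.FooBarService(  42) |> Service started | MDC=
2017-05-04T10:15:31,456Z | ERROR |            main | c.f.FooBarService | c.f.FooBarService(  57) |> Request failed | MDC=
java.lang.IllegalStateException: something went wrong
    at c.f.FooBarService.handle(FooBarService.java:57)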

And as a last important configuration option I set a custom component field to identify the component that produced the logging entry (my Backend application). I need this in Logstash when transforming the log entries.

The prospector for the log files of my second application (Frontend) looks very similar but takes into account that a different logging framework with a different timestamp pattern is used:

- input_type: log
  paths:
    - C:\Progs\FooBarFrontend\logs\FooBarFrontend.log*
  # log4j pattern: %d{dd.MM.yy,HH:mm:ss,SSS} %-5p [%20.20t] %30.30c{1.} - %m%n 
  # consider lines that do not start with the date pattern to belong to the previous line
  multiline.pattern: '^[0-9]{2}\.[0-9]{2}\.[0-9]{2}'
  multiline.negate: true
  multiline.match: after
  # custom fields that will be used for conditional filtering in logstash
  fields:
    component: FooBarFrontend

To send all data to Logstash (listening on the default port) the output has to be configured as shown below:

output.logstash:
  hosts: ["localhost:5044"]

Logstash

A Logstash pipeline consists of input, filter (optional) and output components. These pipeline components are realized by plugins; numerous plugins offer a rich feature set for different use cases.

            |------------ Logstash pipeline  -----------|

  data -->>    inputs -> filters (optional) -> outputs    -->> Elasticsearch

In my setup, Logstash is used to filter, parse and modify the logging data that is shipped by Filebeat before it is forwarded to Elasticsearch.
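
In the configuration file this pipeline structure maps directly to three sections (just the skeleton here; the concrete plugins I use follow below):

input {
  # where the events come from (here: the beats plugin)
}
filter {
  # optional processing of the events (here: grok and date)
}
output {
  # where the events go (here: the elasticsearch plugin)
}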

The complete configuration can be found here: logstash-beats.conf

As my input comes from Filebeat, I use the beats input plugin to enable Logstash to receive events from the Beats framework.

input {
  beats {
    port => 5044
  }
}

The incoming data (a so-called Logstash event) should be converted into proper (structured) data before it is forwarded to Elasticsearch. Therefore a filter is used:

filter {

  if [fields][component] == "FooBarBackend" {

    grok {
      # log4j2 pattern: %date{ISO8601}{UTC}Z | %5.5level | %15.15thread | %25.25logger{1.} | %30.30class{1.}(%4.4line) |> %message | MDC=%MDC%n
      match => { "message" => "%{TIMESTAMP_ISO8601:logtimestamp} +\| +%{WORD:level} +\| +%{DATA:thread} +\| +%{DATA:logger} +\| +%{DATA:class} \|\> +%{DATA:msg} \| +MDC=%{DATA:mdc}$" }
      add_field => { "[@metadata][index]" => "foobar-backend" }
    }

  }

  if [fields][component] == "FooBarFrontend" {

    grok {
      # log4j pattern: %d{dd.MM.yy,HH:mm:ss,SSS} %-5p [%20.20t] %30.30c{1.} - %m%n
      match => { "message" => "%{GUI_DATETIME:logtimestamp} +%{WORD:level} +\[ +%{DATA:thread}\] +%{DATA:class} - %{DATA:msg}$" }
      add_field => { "[@metadata][index]" => "foobar-frontend" }
    }

    date {
       match => [ "logtimestamp", "dd.MM.yy,HH:mm:ss,SSS" ]
       target => "logtimestamp" 
    }

  }
  
}

The filter section should handle Backend and Frontend events differently. Therefore a condition on the component name (which was attached as a custom field in Filebeat, see the event sketch below) is used.
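
With the default Filebeat settings the custom fields are nested below fields, so the relevant part of an incoming event looks roughly like this (a simplified sketch, not a complete event):

{
  "message": "<the raw log line>",
  "source": "C:\\Progs\\FooBarBackend\\logs\\FooBarBackend.log",
  "fields": {
    "component": "FooBarBackend"
  }
}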

Regular expression syntax is used in conjunction with the match configuration option of the grok filter plugin to split the log line into structured data (key/value pairs). A variety of predefined patterns is available for this task.

>    grok {
>      # log4j2 pattern: %date{ISO8601}{UTC}Z | %5.5level | %15.15thread | %25.25logger{1.} | %30.30class{1.}(%4.4line) |> %message | MDC=%MDC%n
>      match => { "message" => "%{TIMESTAMP_ISO8601:logtimestamp} +\| +%{WORD:level} +\| +%{DATA:thread} +\| +%{DATA:logger} +\| +%{DATA:class} \|\> +%{DATA:msg} \| +MDC=%{DATA:mdc}$" }
>      ...
>    }

The FRONTEND_DATETIME pattern, which is used to parse the datetime of a Frontend log entry, is not a standard pattern. But custom patterns can easily be added by providing a text file with these patterns, which needs to be located in the logstash/patterns application directory. I did this with the patterns/custom file. Its contents look like this:

> FRONTEND_DATETIME %{MONTHDAY}\.%{MONTHNUM}\.%{YEAR},%{HOUR}:?%{MINUTE}(?::?%{SECOND})
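
As an alternative to the application patterns directory (not what I did, just an option the grok filter plugin offers), custom pattern files can also be referenced explicitly via the patterns_dir setting; the path below is only an example:

grok {
  # load additional pattern definitions from a custom directory
  patterns_dir => ["C:/Progs/logstash/patterns"]
  # match configuration as shown above
}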

The grok filter above ensures that, after filtering, the event contains structured fields for the timestamp, thread, logging level, class and message. This data will be relevant for the later output to Elasticsearch.

Finding the right pattern that matches a log line can be hard. The online Grok Debugger is a great help with this.

Besides handling the given log data, my filter also adds @metadata information to a logging event. This is only used internally and is not part of the later output. In my setup I added the index name that should be used for the Elasticsearch output of the events by using the add_field option. Backend and Frontend logging data should be stored in different indices in Elasticsearch to make them accessible independently.

>     add_field => { "[@metadata][index]" => "foobar-backend" }
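
Since @metadata is not part of the regular output, it can be hard to see whether it was set correctly. For debugging purposes the stdout output with the rubydebug codec can be told to print it as well (an optional aid, not part of my final setup):

output {
  stdout { codec => rubydebug { metadata => true } }
}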

Last but not least, I had to convert the logtimestamp data of the Frontend events explicitly to a date type, because the custom pattern is not detected as a date value by default. I used the date filter for this:

>    date {
>       match => [ "logtimestamp", "dd.MM.yy,HH:mm:ss,SSS" ]
>       target => "logtimestamp" 
>    }

It is important to have the correct date type here, otherwise the values cannot be used as a time field later on (when working with the data in Elasticsearch / Kibana).

To complete the Logstash pipeline the elasticsearch output plugin is used.

output {
  elasticsearch {
    hosts => "localhost:9200"
    user => "elastic"
    password => "changeme"
    manage_template => false
    index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Besides the connection options, the index and document_type are also specified, using the @metadata information defined earlier.

The logging data of the two applications is now stored in different Elasticsearch indices.

Logging data             Elasticsearch index
Backend Application      foobar-backend-YYYY.MM.dd
Frontend Application     foobar-frontend-YYYY.MM.dd

Kibana

Configuration file: kibana.yml

I used the default setup of Kibana. I only changed the server.host value from localhost to 0.0.0.0 to allow remote connections.

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
#server.host: "localhost"
server.host: "0.0.0.0"
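
After starting Kibana it should be reachable in a browser on the default port 5601, either locally or - thanks to the changed server.host - from a remote machine (<server-ip> is a placeholder for the actual address):

http://localhost:5601
http://<server-ip>:5601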

With this setup I can now declare the following index patterns in Kibana:

Kibana index pattern     Affects logs of
foobar-*                 Backend and Frontend application
foobar-backend-*         the Backend application
foobar-frontend-*        the Frontend application

Discovering log entries for both or individual applications works great ;-)

Known-Issues

  • multiline log entries are grouped together into one log event, but the extracted msg content contains only the message of the first line (the rest is hidden in the source attribute of the logging data that is sent to Elasticsearch)

Lessons Learned

  • treating each logging source (application) individually is hard and error-prone

    • consider using a common logging framework with a common pattern layout for all applications of a system
    • consider using log file names that can serve as a template for the Elasticsearch index; the log file name is part of the source field of the log event data and can probably be used to generate an index name automatically
      • index names must be all lowercase (see here)
  • ...

  • License

    • see Subscriptions
    • using the Monitoring feature of X-Pack requires a free license which needs to be renewed every year