Amazon Managed ELKK

This repository contains an implementation example of a managed ELKK stack using the AWS Cloud Development Kit. This example uses Python.

Table of Contents

  1. Context
  2. Prerequisites
  3. Amazon Virtual Private Cloud
  4. Amazon Managed Streaming for Apache Kafka
  5. Filebeat
  6. Amazon Elasticsearch Service
  7. Kibana
  8. Amazon Athena
  9. Logstash
  10. Clean up

Context

The ELKK stack is a pipeline of services to support real-time reporting and analytics. AWS provides a managed ELKK stack using Amazon Elasticsearch Service, Logstash on Amazon EC2 or Amazon Elastic Container Service, and Amazon Managed Streaming for Apache Kafka. Kibana is included as a capability of Amazon Elasticsearch Service. As part of a holistic solution, Logstash outputs logs to Amazon S3 for longer-term storage in addition to outputting them to Amazon Elasticsearch. Amazon Athena can then be used to query the files in Amazon S3 directly.

Components

Filebeat agents will be used to collect the logs from the application/host systems, and publish the logs to Amazon MSK. Filebeat agents are deployed on an Amazon EC2 instance to simulate log generation.

Amazon Managed Streaming for Kafka (Amazon MSK) is used as a buffering layer to handle the collection of logs and manage the back-pressure from downstream components in the architecture. The buffering layer will provide recoverability and extensibility in the platform.

The Logstash layer performs a dual purpose: reading the data from Amazon MSK and indexing the logs into Amazon Elasticsearch in real time, as well as storing the data in S3.

Users can search for logs in Amazon Elasticsearch Service using the Kibana front-end UI. Amazon Elasticsearch is a fully managed service that provides a rich set of features such as dashboards, alerts, SQL query support and more, which can be used based on workload-specific requirements.

Logs are stored in Amazon S3 to support cold data log analysis requirements. The AWS Glue catalog stores the metadata associated with the log files, making it available to users for ad-hoc analysis.

Amazon Athena supports SQL queries against log data stored in Amazon S3.

ELKK Architecture


Prerequisites

The following tools are required to deploy this Amazon Managed ELKK stack.

If using AWS Cloud9, skip to the section "AWS Cloud9 - Create Cloud9 Environment" below.

AWS CDK - https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html
AWS CLI - https://aws.amazon.com/cli/
Git - https://git-scm.com/downloads
python (3.6 or later) - https://www.python.org/downloads/
Docker - https://www.docker.com/

If desired, AWS Cloud9 setup is detailed in the AWS Cloud9 setup instructions.
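
A quick way to confirm that the required tools are installed and available on the path:

# confirm the prerequisite tools are available
$ cdk --version
$ aws --version
$ git --version
$ python3 --version
$ docker --version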

Create the Managed ELKK

Complete the following steps to set up the Managed ELKK workshop in your environment.

At a bash terminal session:

# clone the repo
$ git clone https://github.com/aws-samples/aws-cdk-managed-elkk
# move to directory
$ cd aws-cdk-managed-elkk

Create Elkk - 1

# bootstrap the remaining setup (assumes us-east-1)
$ bash bootstrap.sh
# activate the virtual environment
$ source .env/bin/activate

Create Elkk - 2

Bootstrap the CDK

Create the CDK configuration by bootstrapping the CDK.

# bootstrap the cdk
(.env)$ cdk bootstrap aws://youraccount/yourregion
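
If you are unsure which account and region values to substitute, they can be looked up with the AWS CLI:

# look up the current account id and default region
(.env)$ aws sts get-caller-identity --output text --query 'Account'
(.env)$ aws configure get region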

Terminal - Bootstrap the CDK



Amazon Virtual Private Cloud

The first stage in the ELKK deployment is to create an Amazon Virtual Private Cloud with public and private subnets. The Managed ELKK stack will be deployed into this VPC.

Use the AWS CDK to deploy an Amazon VPC across multiple availability zones.
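
Before deploying, the stacks defined by the CDK app can be listed to confirm the environment is set up:

# list the stacks defined in the cdk app
(.env)$ cdk ls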

# deploy the vpc stack
(.env)$ cdk deploy elkk-vpc

ELKK VPC - 1

ELKK VPC - 2


Amazon Managed Streaming for Apache Kafka

The second stage in the ELKK deployment is to create the Amazon Managed Streaming for Apache Kafka cluster. An Amazon EC2 instance is created with the Apache Kafka client installed to interact with the Amazon MSK cluster.

Use the AWS CDK to deploy an Amazon MSK Cluster into the VPC.

# deploy the kafka stack
(.env)$ cdk deploy elkk-kafka

The CDK will prompt to apply security changes; input "y" for Yes.

ELKK Kafka - 1

When Client is set to True, an Amazon EC2 instance is deployed to interact with the Amazon MSK cluster. It can take up to 30 minutes for the Amazon MSK cluster and client EC2 instance to be deployed.

ELKK Kafka - 2

Wait until 2/2 checks are completed on the Kafka client EC2 instance to ensure that the userdata scripts have fully run.

ELKK Kafka - 3

On creation the Kafka client EC2 instance will create three Kafka topics: "elkktopic", "apachelog", and "appevent".
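Once connected to the Kafka client instance (as described in the following steps), the topics can optionally be listed to confirm they were created; this sketch assumes the same Kafka client install path used elsewhere in this walkthrough:

# Get the cluster ARN and bootstrap brokers (same commands as in the sessions below)
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# list the topics on the cluster
$ /opt/kafka_2.12-2.4.0/bin/kafka-topics.sh --list --bootstrap-server $kafka_brokers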

Open a terminal window to connect to the Kafka client Amazon EC2 instance and create a Kafka producer session:

# get the ec2 instance public dns
(.env)$ kafka_client_dns=`aws ec2 describe-instances --filter file://kafka/kafka_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}[0].Instance"` && echo $kafka_client_dns
# use the public dns to connect to the amazon ec2 instance
(.env)$ ssh ec2-user@$kafka_client_dns

ElKK Kafka - 4

While connected to the Kafka client EC2 instance, create the Kafka producer session on the elkktopic Kafka topic:

# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a producer on the Kafka topic "elkktopic" 
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-producer.sh --broker-list $kafka_brokers --topic elkktopic

ElKK Kafka - 5

Leave the Kafka producer session window open.

Open a new terminal window and connect to the Kafka client EC2 instance to create a Kafka consumer session:

ElKK Kafka - 6

# get the ec2 instance public dns
(.env)$ kafka_client_dns=`aws ec2 describe-instances --filter file://kafka/kafka_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}[0].Instance"` && echo $kafka_client_dns
# use the public dns to connect to the ec2 instance
(.env)$ ssh ec2-user@$kafka_client_dns

Note the optional steps in red, used if the yourkeypair key pair is not recognized.
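
If SSH cannot locate the key pair automatically, it can be passed explicitly. The path below is an assumption; substitute the location of your own key file:

# connect using an explicit key file (hypothetical path)
(.env)$ ssh -i ~/.ssh/yourkeypair.pem ec2-user@$kafka_client_dns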

ElKK Kafka - 7

While connected to the Kafka client EC2 instance, create the consumer session on the elkktopic Kafka topic:

# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a consumer
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server $kafka_brokers --topic elkktopic --from-beginning

Type messages into the Kafka producer session; they are published to the Amazon MSK cluster.

ElKK Kafka - 8

The messages published to the Amazon MSK cluster by the producer session will appear in the Kafka consumer window as they are read from the cluster.

ElKK Kafka - 9

The Kafka client EC2 instance windows can be closed.


Filebeat

To simulate incoming logs for the ELKK cluster, Filebeat will be installed on an Amazon EC2 instance. Filebeat will harvest logs generated by a dummy log generator and push these logs to the Amazon MSK cluster.

Use the AWS CDK to create an Amazon EC2 instance installed with Filebeat and a dummy log generator.

# deploy the Filebeat stack
(.env)$ cdk deploy elkk-filebeat

ElKK Filebeat - 1

An Amazon EC2 instance is deployed with Filebeat installed and configured to output to Kafka.

ElKK Filebeat - 2

Wait until 2/2 checks are completed on the Filebeat EC2 instance to ensure that the userdata script has run.

Open a new terminal window, connect to the Filebeat EC2 instance, and create dummy logs:

# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns

ElKK Filebeat - 3

While connected to the Filebeat EC2 instance, create dummy logs:

# generate dummy apache logs with log generator
$ ./log_generator.py

ElKK Filebeat - 4

Dummy logs created by the log generator will be written to the apachelog folder. Filebeat will harvest the logs and publish them to the Amazon MSK cluster.

In the Kafka client EC2 instance terminal window disconnect the consumer session with <control+c>.

Create a Kafka consumer session on the apachelog Kafka topic:

# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a consumer
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server $kafka_brokers --topic apachelog --from-beginning

ElKK Filebeat - 5

Messages generated by the log generator should appear in the Kafka consumer terminal window.

ElKK Filebeat - 6
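
If no messages appear, an optional check on the Filebeat EC2 instance is to confirm that the configuration is valid and the Filebeat service is running. This is a sketch and assumes a systemd-managed Filebeat install:

# verify the filebeat configuration
$ sudo filebeat test config
# check that the filebeat service is running
$ sudo systemctl status filebeat
# review recent filebeat log output for connection errors
$ sudo journalctl -u filebeat --no-pager | tail -n 50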


Amazon Elasticsearch Service

The Amazon Elasticsearch Service provides an Elasticsearch domain and Kibana dashboards. The elkk-elastic stack also creates an Amazon EC2 instance to interact with the Elasticsearch domain. The EC2 instance can also be used to create an SSH tunnel into the VPC for Kibana dashboard viewing.

# deploy the elastic stack
(.env)$ cdk deploy elkk-elastic

When prompted input "y" for Yes to continue.

ElKK Elastic - 1

An Amazon EC2 instance is deployed to interact with the Amazon Elasticsearch Service domain.

New Amazon Elasticsearch Service domains take about ten minutes to initialize.

ElKK Elastic - 2
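
While waiting, the domain status can be polled from your workstation; the domain is ready once Processing reports false. This optional check uses the same CLI queries as the commands shown later:

# get the elastic domain name
(.env)$ elastic_domain=`aws es list-domain-names --output text --query '*'` && echo $elastic_domain
# the domain is ready when this returns false
(.env)$ aws es describe-elasticsearch-domain --domain-name $elastic_domain --output text --query 'DomainStatus.Processing'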

Wait until 2/2 checks are completed on the Amazon EC2 instance to ensure that the userdata script has run.

ElKK Elastic - 3

Connect to the EC2 instance using a terminal window:

# get the elastic ec2 instance public dns
(.env)$ elastic_dns=`aws ec2 describe-instances --filter file://elastic/elastic_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $elastic_dns
# use the public dns to connect to the elastic ec2 instance
(.env)$ ssh ec2-user@$elastic_dns

ElKK Elastic - 4

While connected to the Elastic EC2 instance:

# get the elastic domain
$ elastic_domain=`aws es list-domain-names --output text --query '*'` && echo $elastic_domain
# get the elastic endpoint
$ elastic_endpoint=`aws es describe-elasticsearch-domain --domain-name $elastic_domain --output text --query 'DomainStatus.Endpoints.vpc'` && echo $elastic_endpoint
# curl a doc into elasticsearch
$ curl -XPOST $elastic_endpoint/elkktopic/_doc/ -d '{"message": "Hello - this is a test message"}' -H 'Content-Type: application/json'
# curl to query elasticsearch
$ curl -XPOST $elastic_endpoint/elkktopic/_search -d' { "query": { "match_all": {} } }' -H 'Content-Type: application/json'
# count the records in the index
$ curl -XGET $elastic_endpoint/elkktopic/_count
# exit the Elastic ec2 instance
$ exit

ElKK Elastic - 5


Kibana

Amazon Elasticsearch Service has been deployed within a VPC in a private subnet. To allow connections to the Kibana dashboard we deploy a public endpoint using Amazon API Gateway, AWS Lambda, Amazon CloudFront, and Amazon S3.

# deploy the kibana endpoint
(.env)$ cdk deploy elkk-kibana

Kibana - 1

When prompted "Do you wish to deploy these changes?", enter "y" for Yes.

Kibana - 2

When the deployment is complete the Kibana url is output by the AWS CDK as "elkk-kibana.kibanalink". Click on the link to navigate to Kibana.

Kibana - 3

Open the link.

Kibana - 4

The Kibana Dashboard is visible.

Kibana - 5

To view the records on the Kibana dashboard an "index pattern" needs to be created.

Select "Management" on the left of the Kibana Dashboard.

Kibana - 6

Select "Index Patterns" at the top left of the Management Screen.

Select Index Patterns

Input an index pattern into the Index Pattern field as "elkktopic*".

Input Pattern

Click "Next Step".

Click "Create Index Pattern".

The fields from the index can be seen. Click on "Discover".

Select Discover

The data can be seen on the Discovery Dashboard.

Dashboard


Amazon Athena

Amazon Simple Storage Service (Amazon S3) is used to store logs for longer-term retention. Amazon Athena can be used to query the files on S3.

# deploy the athena stack
(.env)$ cdk deploy elkk-athena

ELKK Athena
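
Once Logstash (deployed in the next section) has pushed log files to S3 and the AWS Glue catalog entries are in place, the logs can be queried ad hoc with the AWS CLI. The database, table, and results bucket names below are hypothetical placeholders; substitute the names created in your account:

# start an ad-hoc query (database, table, and bucket names are placeholders)
(.env)$ query_id=`aws athena start-query-execution --query-string "SELECT * FROM elkk_database.elkk_logs LIMIT 10" --result-configuration OutputLocation=s3://your-athena-results-bucket/ --output text --query 'QueryExecutionId'` && echo $query_id
# fetch the results once the query has completed
(.env)$ aws athena get-query-results --query-execution-id $query_id --output table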



Logstash

Logstash is deployed to subscribe to the Kafka topics and output the data into Elasticsearch. An additional output is added to push the data into S3. Logstash also parses the Apache common log format and transforms the log data into JSON.

The Logstash pipeline configuration can be viewed in logstash/logstash.conf.

Check the app.py file and verify that the elkk-logstash stack is initially set to deploy Logstash on an Amazon EC2 instance, with the Amazon Fargate deployment disabled.

# logstash stack
logstash_stack = LogstashStack(
    app,
    "elkk-logstash",
    vpc_stack,
    logstash_ec2=True,
    logstash_fargate=False,
    env=core.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
)

When we deploy the elkk-logstash stack we will be deploying Logstash on an Amazon EC2 instance.

(.env)$ cdk deploy elkk-logstash

ELKK Logstash 1

An Amazon EC2 instance is deployed with Logstash installed and configured with an input from Kafka and outputs to Elasticsearch and Amazon S3.

ELKK Logstash 2

Wait until 2/2 checks are completed on the Logstash EC2 instance to ensure that the userdata scripts have fully run.

ELKK Logstash 3

Connect to the Logstash EC2 instance using a terminal window:

# get the logstash instance public dns
$ logstash_dns=`aws ec2 describe-instances --filter file://logstash/logstash_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $logstash_dns
# use the public dns to connect to the logstash instance
$ ssh ec2-user@$logstash_dns

ELKK Logstash 4

While connected to the Logstash EC2 instance:

# verify the logstash config, the last line should contain "Config Validation Result: OK. Exiting Logstash"
$ /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/logstash.conf
# check the logstash status
$ service logstash status -l

ELKK Logstash 5

Exit the Logstash instance and reconnect to the Filebeat instance.

# exit logstash instance
$ exit
# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns

In the Filebeat EC2 instance, generate new log files.

# generate new logs
$ ./log_generator.py

ELKK Logstash 6

Navigate to Kibana and view the logs generated.

Create a new Index Pattern for the apache logs using pattern "elkk-apachelog*".

ELKK Logstash 7

At the Configure Settings dialog there is now an option to select a timestamp. Select "@timestamp".

Dashboard

Apache Logs will now appear on a refreshed Dashboard by their timestamp. Apache Logs are selected by their index at mid-left of the Dashboard.

Dashboard

Navigate to Amazon S3 to view the files pushed to S3 by Logstash.

Dashboard

Dashboard

Logstash can be deployed into containers or onto virtual machines. To deploy Logstash in containers, update the Logstash deployment from Amazon EC2 to AWS Fargate.

Update the app.py file and verify that the elkk-logstash stack is set to deploy on AWS Fargate and not on Amazon EC2.

# logstash stack
logstash_stack = LogstashStack(
    app,
    "elkk-logstash",
    vpc_stack,
    logstash_ec2=False,
    logstash_fargate=True,
    env=core.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
)

Deploy the updated stack, terminating the Logstash EC2 instance and creating a Logstash service on AWS Fargate.

(.env)$ cdk deploy elkk-logstash

Logstash 12

The logstash EC2 instance will be terminated and an AWS Fargate cluster will be created. Logstash will be deployed as containerized tasks.

Logstash 13
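
The running tasks can optionally be confirmed with the ECS CLI. The cluster name below is a placeholder; use the cluster ARN returned by the first command:

# list the ecs clusters created by the stack
(.env)$ aws ecs list-clusters
# list the running logstash tasks (substitute the cluster arn from the previous command)
(.env)$ aws ecs list-tasks --cluster your-logstash-cluster-arn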

Reconnect to the Filebeat EC2 instance and generate new log files:

# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns

# generate new logs, the -f 20 will generate 20 files at 30 second intervals
$ ./log_generator.py -f 20

Logstash 14

Navigate to Kibana and view the logs generated. They will appear in the Dashboard for Apache Logs as they are generated.

Dashboard


Clean up

To clean up the stacks, destroy the elkk-vpc stack; all other stacks will be torn down due to dependencies.

CloudWatch logs will need to be separately removed.

(.env)$ cdk destroy elkk-vpc

Destroy
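
The remaining log groups can be listed and deleted with the AWS CLI; the log group name below is a placeholder, repeat the delete for each group created by the stacks:

# list the remaining log groups
(.env)$ aws logs describe-log-groups --output text --query 'logGroups[*].logGroupName'
# delete a log group by name (placeholder name; repeat per group)
(.env)$ aws logs delete-log-group --log-group-name /your/log/group/name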

License

This library is licensed under the MIT-0 License. See the LICENSE file.