A Docker Compose file to bring up a Kafka and Zookeeper installation for testing purposes.
This build was put together as part of a Databricks blog post demonstrating a simple architecture for consuming Windows endpoint logs into Databricks, using Winlogbeat on the endpoint to send Sysmon events via Kafka. The data is then read and processed by the linked notebook.
You can use it to quickly spin up a Kafka instance for testing purposes. Please DO NOT try using it for production use cases. It works for my simple needs, and it will work for readers of the blog researching how to re-create the simple architecture.
💡 Remember: This should be used for testing only
Best installed on a fresh AWS instance, since it uses the instance metadata service to configure advertised.listeners. If you want to run this somewhere else, change the docker-compose.yml file accordingly.
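Under the hood this just means the advertised hostname comes from the instance metadata endpoint; a minimal sketch of the kind of lookup involved (assuming IMDSv1 is enabled; IMDSv2 would need a session token first):

# Ask the EC2 metadata service for the instance's public hostname
curl -s http://169.254.169.254/latest/meta-data/public-hostname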
- Create an EC2 instance with a MINIMUM of 16GB of RAM. Anything less will cause the install to fail.
- If you plan to send data to the Kafka server from outside of the cloud provider, create an inbound security group rule for TCP 9094 (a CLI sketch follows this list).
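If you manage security groups from the CLI, the rule can be added like this; a sketch with placeholder values (the group ID and source CIDR below are made up, so substitute your own):

# Hypothetical group ID and CIDR - replace with your own values
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 9094 --cidr 203.0.113.0/24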
I have tested this on a t2.xlarge running Ubuntu 20.04, which works for my purposes.
There are two ways to install:
- Create your own infrastructure and install manually by cloning the repo and executing the install script.
- Clone the repo onto your local machine and execute the Terraform scripts (assuming you have Terraform installed already).
Once you have created an EC2 instance, clone the repo and execute the install script. The install.sh script will download and install Docker, docker-compose, and everything else you need to get up and running.
git clone https://github.com/DerekKing001/kafka-in-docker.git
cd kafka-in-docker
sudo ./install.sh
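Once the script finishes, a quick sanity check confirms everything came up; a sketch, assuming install.sh put docker and docker-compose on the PATH:

sudo docker --version          # confirm docker installed correctly
sudo docker-compose --version  # confirm docker-compose installed correctly
sudo docker ps                 # the kafka and zookeeper containers should both be running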
If you want the instance to install and run Kafka automatically at launch, put the following in the EC2 user data section (user data scripts need a shebang line to be executed by cloud-init):
#!/bin/bash
git clone https://github.com/DerekKing001/kafka-in-docker.git 1>>/var/log/install.log 2>&1
cd kafka-in-docker 1>>/var/log/install.log 2>&1
sudo ./install.sh 1>>/var/log/install.log 2>&1
- Clone the repo onto your local machine.
- Make sure you already have Terraform installed (brew install terraform, or use your chosen package manager).
- cd into the kafka-in-docker/terraform_scripts/kafka directory.
- Execute terraform init.
- Make any changes to the main.tf and variables.tf files for VPC names etc.
- Execute terraform apply.
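If you'd rather not edit the files, Terraform also accepts variable overrides at apply time; a sketch, where the variable name region is purely hypothetical (check variables.tf for the real names):

# Hypothetical variable name - see variables.tf for what the scripts actually expose
terraform apply -var "region=us-east-1"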
Once installed, you'll probably want to check a few things out. Log on to the host via SSH.
Example for an EC2 instance based on Ubuntu:
ssh -i <pem_file> ubuntu@<hostname>
# List the running containers - kafka and zookeeper should both be up
sudo docker ps
# Follow the kafka broker logs
sudo docker logs --follow docker-kafka-1
# Open a shell inside the kafka container
sudo docker exec -it docker-kafka-1 bash
After opening a shell on the kafka container:
cd /opt/kafka/bin
./kafka-topics.sh --bootstrap-server <hostname>:9094 --list
By default, the compose file is set up to create only the 'winlogbeat' topic.
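If you want to experiment with an extra topic without editing the compose file, you can create one ad hoc from the same directory; a sketch using a made-up topic name:

# Create a hypothetical topic with one partition and one replica
./kafka-topics.sh --bootstrap-server <hostname>:9094 --create --topic my-test-topic --partitions 1 --replication-factor 1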
From /opt/kafka/bin:
./kafka-console-producer.sh --bootstrap-server <hostname>:9094 --topic winlogbeat
Input is read from stdin until you press Ctrl-D.
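For a non-interactive smoke test you can also pipe a message straight in; a minimal sketch with a made-up payload:

# Send a single test message and exit
echo '{"event": "test"}' | ./kafka-console-producer.sh --bootstrap-server <hostname>:9094 --topic winlogbeat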
From /opt/kafka/bin:
./kafka-console-consumer.sh --bootstrap-server <hostname>:9094 --topic winlogbeat --from-beginning
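By default the consumer streams until interrupted; if you just want to confirm messages are arriving, the standard --max-messages flag exits after a fixed count:

# Print the first five messages, then exit
./kafka-console-consumer.sh --bootstrap-server <hostname>:9094 --topic winlogbeat --from-beginning --max-messages 5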
Use the docker-compose.yml file to pass in Kafka variables if needed.
Initially, only the winlogbeat topic is configured. If you want to add more topics, amend KAFKA_CREATE_TOPICS accordingly. It is a comma-separated config:
KAFKA_CREATE_TOPICS: "winlogbeat:1:1, an-other-topic:1:1"
where each entry takes the form topic:partitions:replicas, so 1:1 corresponds to one partition and one replica.
The compose file is set to expose TCP 9092 for the internal listener and TCP 9094 for the external one. Make sure that when producing or consuming messages you use the external listener port (9094).
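Before pointing Winlogbeat at the broker, it's worth verifying the external listener is reachable from outside the cloud provider; a quick check, assuming netcat is installed on the sending machine:

# Should report the port as open/succeeded if the security group rule is in place
nc -vz <hostname> 9094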
