Running Apache Kafka Streams at scale on AWS Fargate with Amazon MSK
Build a real-time stream processing application using Amazon MSK, AWS Fargate, and the Apache Kafka Streams API. The Kafka Streams API is a client library that simplifies development of stream applications. Behind the scenes, the Kafka Streams library is an abstraction over the standard Kafka Producer and Kafka Consumer APIs. When you build applications with the Kafka Streams library, your data streams are automatically made fault tolerant and are transparently and elastically distributed over the instances of the application. Kafka Streams applications are supported by Amazon MSK. AWS Fargate is a serverless compute engine for containers that works with AWS container orchestration services such as Amazon Elastic Container Service (Amazon ECS), which lets you easily run, scale, and secure containerized applications.
Our streaming application architecture consists of a Stream Producer, which connects to the Twitter Stream API, reads tweets, and publishes them to Amazon MSK. A Kafka Streams Processor consumes these messages, performs a windowed aggregation, writes the aggregates to the result topic, and also prints them to the logs. Both apps are hosted on AWS Fargate as services.
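To make the windowed aggregation step more concrete, here is a minimal plain-Java sketch of how records can be grouped into fixed-size, non-overlapping (tumbling) windows and counted. It deliberately does not use the Kafka Streams API, which handles windowing for you in the real processor. The class name `TumblingWindowSketch`, the one-minute window size, the sample timestamps, and the choice of a simple count are all assumptions made for illustration; the actual application may use a different window type or aggregate.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not part of the sample application.
public class TumblingWindowSketch {

    // Returns the start of the tumbling window that contains the timestamp.
    // Windows are aligned to multiples of windowSizeMs.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Counts events per window, keyed by window start time (sorted by key).
    static Map<Long, Long> countPerWindow(long[] timestampsMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            counts.merge(windowStart(ts, windowSizeMs), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data: arrival times of five tweets, in milliseconds.
        long[] tweetTimestamps = {0L, 10_000L, 59_999L, 60_000L, 125_000L};
        // Assumed window size: one minute.
        System.out.println(countPerWindow(tweetTimestamps, 60_000L));
        // prints {0=3, 60000=1, 120000=1}
    }
}
```

Each key in the output is a window start time and each value is the number of events that fell into that window. In a Kafka Streams application, the library maintains this per-window state for each record key and makes it fault tolerant.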
You can find further details and a more thorough discussion of the architecture on the AWS Big Data Blog.
Make sure to complete the following steps as prerequisites:
Create an AWS account. For this post, you configure the required AWS resources in the us-west-2 Region. If you haven’t signed up, complete the following tasks:
a. Create an account. For instructions, see Sign Up for AWS.
Have a Bearer Token associated with your Twitter app. To create a developer account, see Get started with the Twitter developer platform.
Install Docker on your local machine.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.