Demo applications and code examples for Apache Kafka's Streams API.
Switch branches/tags
v5.1.0-beta201806200051 v5.1.0-beta201806191956 v5.1.0-beta39 v5.1.0-beta38 v5.1.0-beta37 v5.1.0-beta36 v5.1.0-beta35 v5.1.0-beta34 v5.1.0-beta180919183606 v5.1.0-beta180917172706 v5.1.0-beta180912202326 v5.1.0-beta180911213206 v5.1.0-beta180905054157 v5.1.0-beta180829024526 v5.1.0-beta180828173516 v5.1.0-beta180828022857 v5.1.0-beta180824214446 v5.1.0-beta180820223106 v5.1.0-beta180812233046 v5.1.0-beta180730185716 v5.1.0-beta180724024536 v5.1.0-beta180723173636 v5.1.0-beta180723023347 v5.1.0-beta180722215127 v5.1.0-beta180718203536 v5.1.0-beta180707004950 v5.1.0-beta180706202701 v5.1.0-beta180703024529 v5.1.0-beta180702220040 v5.1.0-beta180702214311 v5.1.0-beta180702063440 v5.1.0-beta180702063039 v5.1.0-beta180701175749 v5.1.0-beta180701010040 v5.1.0-beta180630224439 v5.1.0-beta180628184841 v5.1.0-beta180628064520 v5.1.0-beta180627203509 v5.1.0-beta180626014959 v5.1.0-beta180622181348 v5.1.0-beta180620183559 v5.1.0-beta180620180739 v5.1.0-beta180620180431 v5.1.0-beta180619025141 v5.1.0-beta180618225004 v5.1.0-beta180618223247 v5.1.0-beta180618214711 v5.1.0-beta180618191747 v5.1.0-beta180615005408 v5.1.0-beta180614233101 v5.1.0-beta180613013021 v5.1.0-beta180612224009 v5.1.0-beta180612043613 v5.1.0-beta180611231144 v5.0.1-beta180914024526 v5.0.1-beta180913003126 v5.0.1-beta180911213156 v5.0.1-beta180909000436 v5.0.1-beta180909000146 v5.0.1-beta180905054336 v5.0.1-beta180902210116 v5.0.1-beta180830182727 v5.0.1-beta180828173436 v5.0.1-beta180826190446 v5.0.1-beta180824214627 v5.0.1-beta180812233236 v5.0.1-beta180802235906 v5.0.0 v5.0.0-rc4 v5.0.0-rc3 v5.0.0-rc1 v5.0.0-beta33 v5.0.0-beta32 v5.0.0-beta31 v5.0.0-beta30 v5.0.0-beta29 v5.0.0-beta28 v5.0.0-beta27 v5.0.0-beta26 v5.0.0-beta25 v5.0.0-beta24 v5.0.0-beta23 v5.0.0-beta22 v5.0.0-beta21 v5.0.0-beta20 v5.0.0-beta19 v5.0.0-beta18 v5.0.0-beta17 v5.0.0-beta16 v5.0.0-beta15 v5.0.0-beta14 v5.0.0-beta12 v5.0.0-beta11 v5.0.0-beta10 v5.0.0-beta9 v5.0.0-beta8 v5.0.0-beta7 v5.0.0-beta6 v5.0.0-beta5 v5.0.0-beta3
Nothing to show
Clone or download
ybyzek DEVX-265: Moving code from quickstart-demos to here for microservices…
… demo (#165)

* Moving code from quickstart-demos to here for microservices demo

* Conform to master's new checks

* Addressed reviewer comments

* #165 code review (#166)

* reformat and remove unused imports

* fix wrong package

* code cleanup, including unused exceptions and variables, deprecated methods, compiler warnings, etc
Latest commit 935284f Sep 21, 2018

README.md

Kafka Streams Examples

This project contains code examples that demonstrate how to implement real-time applications and event-driven microservices using the Streams API of Apache Kafka aka Kafka Streams.

For more information take a look at the latest Confluent documentation on the Kafka Streams API, notably the Developer Guide.


Table of Contents


Available examples

This repository has several branches to help you find the correct code examples for the version of Apache Kafka and/or Confluent Platform that you are using. See Version Compatibility Matrix below for details.

There are two kinds of examples:

  • Examples under src/main/: These examples are short and concise. Also, you can interactively test-drive these examples, e.g. against a local Kafka cluster. If you want to actually run these examples, then you must first install and run Apache Kafka and friends, which we describe in section Packaging and running the examples. Each example also states its exact requirements and instructions at the very top.
  • Examples under src/test/: These examples are a bit longer because they implement integration tests that demonstrate end-to-end data pipelines. Here, we use a testing framework to automatically spawn embedded Kafka clusters, feed input data to them (using the standard Kafka producer client), process the data using Kafka Streams, and finally read and verify the output results (using the standard Kafka consumer client). These examples are also a good starting point to learn how to implement your own end-to-end integration tests.

Examples

Name Concepts used Java 8+ Java 7+ Scala
WordCount DSL Java 8+ example Scala Example
MapFunction DSL, stateless transformations, map() Java 8+ example Scala Example
SessionWindows Sessionization of user events, user behavior analysis Java 7+ example
Sum DSL, stateful transformations, reduce Java 8+ example
PageViewRegion join between KStream and KTable Java 8+ example Java 7+ example
PageViewRegionGenericAvro Generic Avro Java 8+ example Java 7+ example
WikipediaFeedSpecificAvro Specific Avro Java 8+ example Java 7+ example
SecureKafkaStreams Secure, encryption, client authentication Java 7+ example
StatesStoresDSL State Stores, DSL Java 8+ example
WordCountInteractiveQueries Interactive Queries, REST, RPC Java 8+ example
KafkaMusic Interactive Queries, State Stores, REST API Java 8+ example
PoisonPill Corrupt input records Java 8+ example
MixAndMatch DSL+Processor DSL, Processor API, KStream#transform(), KStream#process(), custom Transformer and Processor implementations Java 8+ example
ApplicationReset Application Reset Tool bin/kafka-streams-application-reset Java 8+ example
GlobalKTable join between KStream and GlobalKTable Java 8+ example
Microservice Microservice ecosystem Java 8+ example

Additional examples

Additional examples may be found here.

Integration Tests

We also provide several integration tests, which demonstrate end-to-end data pipelines. Here, we spawn embedded Kafka clusters and the Confluent Schema Registry, feed input data to them (using the standard Kafka producer client), process the data using Kafka Streams, and finally read and verify the output results (using the standard Kafka consumer client).

Tip: Run mvn test to launch the integration tests.

Integration Test Name Java 8+ Java 7+ Scala
WordCount Java 8+ Integration Test Scala Integration Test
WordCountInteractiveQueries Java 7+ Integration Test
EventDeduplication Java 8+ Integration Test
GlobalKTable Java 7+ Integration Test
HandlingCorruptedInputRecords Java 7+ Integration Test
KafkaMusic Java 7+ Integration Test
MapFunction Java 8+ Integration Test
MixAndMatch DSL+Processor Java 8+ Integration Test
PassThrough Java 7+ Integration Test
SessionWindows Java 7+ Integration Test
Sum Java 8+ Integration Test
StreamToStreamJoin Java 7+ Integration Test
StreamToTableJoin Java 7+ Integration Test Scala Integration Test
TableToTableJoin Java 7+ Integration Test
UserCountsPerRegion Java 8+ Integration Test
GenericAvro Java 7+ Integration Test Scala Integration Test
SpecificAvro Java 7+ Integration Test Scala Integration Test
ValidateStateWithInteractiveQueries Java 8+ Integration Test
ProbabilisticCounting*** Scala Integration Test

***demonstrates how to probabilistically count items in an input stream by implementing a custom state store (CMSStore) that is backed by a Count-Min Sketch data structure (with the CMS implementation of Twitter Algebird

Requirements

Apache Kafka

The code in this repository requires Apache Kafka 0.10+ because from this point onwards Kafka includes its Kafka Streams library. See Version Compatibility Matrix for further details, as different branches of this repository may have different Kafka requirements.

For the master branch: To build a development version, you typically need the latest trunk version of Apache Kafka (cf. kafka.version in pom.xml for details). The following instructions will build and locally install the latest trunk Kafka version:

$ git clone git@github.com:apache/kafka.git
$ cd kafka
$ git checkout trunk

# Bootstrap gradle wrapper
$ gradle

# Now build and install Kafka locally
$ ./gradlew clean installAll

Confluent Platform

The code in this repository requires Confluent Schema Registry. And to build Confluent Schema Registry in its development version, further dependencies of Confluent Platform are needed (e.g. Confluent Common and Confluent Rest Utils, please read its own README file for details). See Version Compatibility Matrix for further details, as different branches of this repository may have different Confluent Platform requirements.

For the master branch: To build a development version, you typically need the latest master version of Confluent Platform's Schema Registry (cf. confluent.version in pom.xml for details). The following instructions will build and locally install the latest master Schema Registry version:

$ git clone https://github.com/confluentinc/common.git
$ cd common
$ git checkout master

# Build and install common locally
$ mvn -DskipTests=true clean install

$ git clone https://github.com/confluentinc/rest-utils.git
$ cd rest-utils
$ git checkout master

# Build and install rest-utils locally
$ mvn -DskipTests=true clean install

$ git clone https://github.com/confluentinc/schema-registry.git
$ cd schema-registry
$ git checkout master

# Now build and install schema-registry locally
$ mvn -DskipTests=true clean install

Also, each example states its exact requirements at the very top.

Java 8

Some code examples require Java 8, primarily because of the usage of lambda expressions.

IntelliJ IDEA users:

  • Open File > Project structure
  • Select "Project" on the left.
    • Set "Project SDK" to Java 1.8.
    • Set "Project language level" to "8 - Lambdas, type annotations, etc."

Scala

Scala is required only for the Scala examples in this repository. If you are a Java developer you can safely ignore this section.

If you want to experiment with the Scala examples in this repository, you need a version of Scala that supports Java 8 and SAM / Java lambda (e.g. Scala 2.11 with -Xexperimental compiler flag, or 2.12).

Packaging and running the examples

Tip: If you only want to run the integration tests (mvn test), then you do not need to package or install anything -- just run mvn test. The instructions below are only needed if you want to interactively test-drive the examples under src/main/.

The first step is to install and run a Kafka cluster, which must consist of at least one Kafka broker as well as at least one ZooKeeper instance. Some examples may also require a running instance of Confluent schema registry. The Confluent Platform Quickstart guide provides the full details.

In a nutshell:

# Start ZooKeeper
$ ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties

# In a separate terminal, start Kafka broker
$ ./bin/kafka-server-start ./etc/kafka/server.properties

# In a separate terminal, start Confluent schema registry
$ ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties

# Again, please refer to the Confluent Platform Quickstart for details such as
# how to download Confluent Platform, how to stop the above three services, etc.

Tip: You can also run mvn test, which executes the included integration tests. These tests spawn embedded Kafka clusters to showcase the Kafka Streams functionality end-to-end. The benefit of the integration tests is that you don't need to install and run a Kafka cluster yourself.

If you want to run the examples against a Kafka cluster, you may want to create a standalone jar ("fat jar") of the Kafka Streams examples via:

# Create a standalone jar
#
# Tip: You can also disable the test suite (e.g. to speed up the packaging
#      or to lower JVM memory usage) if needed:
#
#     $ mvn -DskipTests=true clean package
#
$ mvn clean package

# >>> Creates target/kafka-streams-examples-5.0.0-SNAPSHOT-standalone.jar

You can now run the example applications as follows:

# Run an example application from the standalone jar.
# Here: `WordCountLambdaExample`
$ java -cp target/kafka-streams-examples-5.0.0-SNAPSHOT-standalone.jar \
  io.confluent.examples.streams.WordCountLambdaExample

The application will try to read from the specified input topic (in the above example it is TextLinesTopic), execute the processing logic, and then try to write back to the specified output topic (in the above example it is WordsWithCountsTopic). In order to observe the expected output stream, you will need to start a console producer to send messages into the input topic and start a console consumer to continuously read from the output topic. More details in how to run the examples can be found in the java docs of each example code.

If you want to turn on log4j while running your example application, you can edit the log4j.properties file and then execute as follows:

# Run an example application from the standalone jar.
# Here: `WordCountLambdaExample`
$ java -cp target/kafka-streams-examples-5.0.0-SNAPSHOT-standalone.jar \
  -Dlog4j.configuration=file:src/main/resources/log4j.properties \
  io.confluent.examples.streams.WordCountLambdaExample

Keep in mind that the machine on which you run the command above must have access to the Kafka/ZK clusters you configured in the code examples. By default, the code examples assume the Kafka cluster is accessible via localhost:9092 (aka Kafka's bootstrap.servers parameter) and the ZooKeeper ensemble via localhost:2181. You can override the default bootstrap.servers parameter through a command line argument.

Development

This project uses the standard maven lifecycle and commands such as:

$ mvn compile # This also generates Java classes from the Avro schemas
$ mvn test    # Runs unit and integration tests

Version Compatibility Matrix

Branch (this repo) Apache Kafka Confluent Platform Notes
master 2.0.0-SNAPSHOT 5.0.0-SNAPSHOT You must manually build the trunk version of Apache Kafka and the master version of Confluent Platform. See instructions above.
4.1.0-post 1.1.0(-cp1) 4.1.0 Works out of the box
4.0.0-post 1.0.0(-cp1) 4.0.0 Works out of the box
3.3.0-post 0.11.0.0(-cp1) 3.3.0 Works out of the box

The master branch of this repository represents active development, and may require additional steps on your side to make it compile. Check this README as well as pom.xml for any such information.

Docker Examples

This example launches:

The Kafka Music application demonstrates how to build of a simple music charts application that continuously computes, in real-time, the latest charts such as Top 5 songs per music genre. It exposes its latest processing results -- the latest charts -- via Kafka’s Interactive Queries feature and a REST API. The application's input data is in Avro format and comes from two sources: a stream of play events (think: "song X was played") and a stream of song metadata ("song X was written by artist Y").

More specifically, we will run the following services:

  • Confluent's Kafka Music demo application
  • a single-node Kafka cluster with a single-node ZooKeeper ensemble
  • Confluent Schema Registry

You can find detailed documentation at http://docs.confluent.io/current/streams/kafka-streams-examples/docs/index.html

Where to find help