spark-streaming-direct-kafka

High Performance Spark Streaming with Direct Kafka in Java

What & Why

This simple library provides an easy way to consume from Kafka using Spark Streaming. It keeps offsets in ZooKeeper instead of storing them in HDFS. Because the library stores offsets only once per batch, it can achieve very high throughput.

This approach is relatively reliable, but some data loss is still possible. In most scenarios it provides at-least-once guarantees. We managed to consume over 100,000 messages/sec using this library.
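To make the once-per-batch offset idea concrete, here is a minimal sketch in plain Java. It is not the library's code: `BatchOffsetSketch`, `InMemoryOffsetStore`, and `runBatch` are hypothetical names, the in-memory map only stands in for ZooKeeper, and a `List` stands in for a Kafka partition. It illustrates why committing the offset after a batch tends to give at-least-once delivery: if the job dies before the commit, the next run re-reads the same range. It does not model the data-loss cases mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchOffsetSketch {

    /** Hypothetical stand-in for a ZooKeeper-backed offset store. */
    static class InMemoryOffsetStore {
        private final Map<Integer, Long> committed = new HashMap<>();
        int writes = 0; // counts store writes, to show there is one per batch

        long read(int partition) {
            return committed.getOrDefault(partition, 0L);
        }

        void commit(int partition, long untilOffset) {
            committed.put(partition, untilOffset);
            writes++;
        }
    }

    /**
     * Reads up to batchSize messages starting at the last committed offset,
     * "processes" them (here: just returns them), then commits the end offset
     * once for the whole batch. If failBeforeCommit is true, the commit is
     * skipped to simulate a crash after processing.
     */
    static List<String> runBatch(List<String> log, int partition, int batchSize,
                                 InMemoryOffsetStore store, boolean failBeforeCommit) {
        long from = store.read(partition);
        long until = Math.min(from + batchSize, log.size());
        List<String> processed = new ArrayList<>(log.subList((int) from, (int) until));
        if (!failBeforeCommit) {
            store.commit(partition, until); // single write per batch, not per message
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2", "m3", "m4");
        InMemoryOffsetStore store = new InMemoryOffsetStore();

        // First attempt crashes before the commit, so the offset stays at 0.
        System.out.println("attempt 1: " + runBatch(log, 0, 2, store, true));
        // The retry re-reads the same range: duplicates, but nothing skipped.
        System.out.println("attempt 2: " + runBatch(log, 0, 2, store, false));
        System.out.println("next batch: " + runBatch(log, 0, 2, store, false));
        System.out.println("store writes: " + store.writes);
    }
}
```

In this sketch the store is written once per committed batch regardless of how many messages the batch holds, which is the property the throughput claim above relies on. Downstream processing should tolerate the replayed messages.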

How to Run:

This is how you start your job:

spark-streaming-direct-kafka/src/main/java/com/spark/streaming/tools/StreamingEngine.java

Configs are self-explanatory and can be changed here:

spark-streaming-direct-kafka/src/main/resources/streaming.yml

ameyamk/spark-streaming-direct-kafka
