Integration of Apache Kafka with Spark Streaming API.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
project create project with sbt spark-streaming-kafka Nov 8, 2018
src add arguments Nov 18, 2018
.gitignore create project with sbt spark-streaming-kafka Nov 8, 2018
README.md safe state Nov 8, 2018
build.sbt safe state Nov 8, 2018

README.md

Apacha Kafka - Integration of Spark

..

Java Framework

Verify Java Installation

java -version

Install Java

sudo apt-get install default-jre

Install ZooKeeper Framework

cd /opt
wget http://apache.lauf-forum.at/zookeeper/stable/zookeeper-3.4.12.tar.gz
tar -zxf zookeeper-3.4.12.tar.gz
cd zookeeper-3.4.12
mkdir data

Create config conf/zoo.cfg

tickTime=2000
dataDir=/opt/zookeeper-3.4.12/data
clientPort=2181
initLimit=5
syncLimit=2

Start

bin/zkServer.sh start

Test with Netcat, send ok, get imok

ncat 192.168.56.101 2181
ruok
imok

Start Cli

bin/zkCli.sh

Stop

bin/zkServer.sh stop

Apache Kafka Installation

Download�and configure

wget http://mirrors.ae-online.de/apache/kafka/2.0.0/kafka_2.11-2.0.0.tgz
tar -zxf kafka_2.11-2.0.0.tgz
cd kafka_2.11-2.0.0

Start

bin/kafka-server-start.sh config/server.properties

Stop

bin/kafka-server-stop.sh config/server.properties

Create a Kafka Topic

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List topics

bin/kafka-topics.sh --list --zookeeper localhost:2181

Start Producer to send messages

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test	

Start Consumer to receive messages

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

Delete a topic

bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic Hello-kafka

List consumer groups

bin/kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 --list

https://hevodata.com/blog/how-to-install-kafka-on-ubuntu/

https://www.tutorialspoint.com/apache_kafka/apache_kafka_basic_operations.htm

https://kafka.apache.org/quickstart

https://stackoverflow.com/questions/41376647/understanding-consumer-group-id

Spark Implementation

build.sbt

name := "kafka-spark-integration"

version := "0.1"

scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2",
  "org.apache.spark" %% "spark-sql" % "2.3.2",
  "org.apache.spark" %% "spark-streaming" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0",
  "org.jfree" % "jfreechart" % "1.0.19",
  "junit" % "junit" % "4.12",
  "org.scalactic" %% "scalactic" % "3.0.5",
  "org.scalatest" %% "scalatest" % "3.0.5" % "test",
  "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.7",
  "org.apache.logging.log4j" %% "log4j-api-scala" % "2.7"
)