This project contains learning material and experiments with Apache Spark.
The following packages contain the corresponding API programs:
- package org.apn.spark.rdd contains Spark RDD code.
- package org.apn.spark.dsl contains Spark DSL code.
- package org.apn.spark.sql contains Spark SQL code.
- package org.apn.spark.sql.streaming contains Spark Structured Streaming code.
- package org.apn.spark.streaming contains Spark Streaming (DStream) code.
Spark-Coach uses Java 1.8 features, so you'll need a Java 1.8 JDK to run it.
Some of the Kafka features used here are only available since the 0.10.x release.
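Since the project relies on Java 1.8 features, here is a minimal, self-contained sketch of the functional map/reduce style the Spark examples use, expressed with plain Java 8 streams (the class and method names are illustrative, not part of the repository):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Word count in the same map/reduce style as the Spark RDD examples,
// using only the JDK. Names here are illustrative placeholders.
public class WordCount {
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                // split each line into words, flattening into one stream
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // group identical words and count occurrences
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(Arrays.asList("spark streaming", "spark sql"));
        System.out.println(counts.get("spark")); // 2
    }
}
```

The same `flatMap`/`groupBy`-style pipeline maps almost one-to-one onto Spark's RDD transformations.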
The master branch of this demo uses 0.10.x features of Apache Kafka, so all you need to do is clone Kafka and install release 0.10.0.0 into your local Maven repository:
$ git clone https://github.com/apache/kafka.git $KAFKA_HOME
$ cd $KAFKA_HOME
$ git checkout 0.10.0.0
$ gradle
$ ./gradlew -PscalaVersion=2.11 jar
$
$ # IF YOU ARE USING WINDOWS, USE `.bat` IN PLACE OF `.sh` FOR THE LAUNCH SCRIPTS BELOW:
$ export SCALA_VERSION="2.11.8"; export SCALA_BINARY_VERSION="2.11";
$ ./bin/zookeeper-server-start.sh ./config/zookeeper.properties &
$ ./bin/kafka-server-start.sh ./config/server.properties
Initialize topics: if you're already running local ZooKeeper and Kafka instances with topic auto-creation enabled on the broker, you can skip the following setup. Note, however, that if your default partition count is 1, you will only be able to run a single-instance demo.
$ # IF YOU ARE USING WINDOWS, USE `.bat` IN PLACE OF `.sh` FOR THE LAUNCH SCRIPTS BELOW:
$ ./bin/kafka-topics.sh --zookeeper localhost --create --topic topicname --replication-factor 1 --partitions 4
In another terminal, cd into the directory where you cloned the spark-coach project and use Maven to build the archive:
$ mvn clean package
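Once the build succeeds, the examples can be launched with spark-submit. This invocation is a sketch: the main class and jar name below are placeholders, so substitute the actual main class and the artifact name produced by your pom.xml:

```shell
# Run one of the examples on a local Spark master.
# org.apn.spark.rdd.SomeExample and the jar name are hypothetical --
# replace them with a real main class and the built artifact.
$SPARK_HOME/bin/spark-submit \
  --class org.apn.spark.rdd.SomeExample \
  --master "local[2]" \
  target/spark-coach-1.0-SNAPSHOT.jar
```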