amitnema/spark-coach


spark-coach

This project contains learning exercises and experiments with Apache Spark.

The following packages contain programs for the corresponding Spark APIs:

  • Package org.apn.spark.rdd contains Spark RDD code.
  • Package org.apn.spark.dsl contains Spark DSL code.
  • Package org.apn.spark.sql contains Spark SQL code.
  • Package org.apn.spark.sql.streaming contains Spark Structured Streaming code.
  • Package org.apn.spark.streaming contains Spark Streaming (DStream) code.

Quick Start

Prerequisites

Spark-Coach uses Java 1.8 features, so you'll need a JDK 1.8 installation to build and run it.

Some of the Kafka features used in these examples are available only from the 0.10.x release onward.

Setup local environment

The master branch of this demo uses Apache Kafka 0.10.x features, so all you need to do is clone Kafka, check out 0.10.0.0, and build it locally:

$ git clone https://github.com/apache/kafka.git $KAFKA_HOME
$ cd $KAFKA_HOME
$ git checkout 0.10.0.0
$ gradle
$ ./gradlew -PscalaVersion=2.11 jar 
$
$ # IF YOU ARE USING WINDOWS, USE `.bat` IN PLACE OF `.sh` FOR THE LAUNCH SCRIPTS BELOW:
$ export SCALA_VERSION="2.11.8"; export SCALA_BINARY_VERSION="2.11";
$ ./bin/zookeeper-server-start.sh ./config/zookeeper.properties &
$ ./bin/kafka-server-start.sh ./config/server.properties

Initialize topics: if you are already running a local ZooKeeper and Kafka, and topic auto-creation is enabled on the broker, you can skip the following setup. Note, however, that if the broker's default partition count is 1, you will only be able to run a single-instance demo.

$ # IF YOU ARE USING WINDOWS, USE `.bat` IN PLACE OF `.sh` FOR THE LAUNCH SCRIPTS BELOW:
$ ./bin/kafka-topics.sh --zookeeper localhost --create --topic topicname --replication-factor 1 --partitions 4
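
The note above about a partition count of 1 follows from how Kafka consumer groups work: within one group, each partition is read by at most one consumer at a time. The sketch below is a simplified round-robin model of that behaviour, assuming the demo instances share a consumer group. The class and method names are hypothetical and are not part of spark-coach or Kafka; Kafka's real assignors are more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simplified round-robin model of how partitions are
// spread across the instances of one consumer group. Not Kafka's own code.
class PartitionAssignmentSketch {

    // Returns, for each instance, the list of partition ids it would own.
    static List<List<Integer>> assign(int partitions, int instances) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        List<List<Integer>> owned = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            owned.add(new ArrayList<>());
        }
        // Each partition goes to exactly one instance.
        for (int p = 0; p < partitions; p++) {
            owned.get(p % instances).add(p);
        }
        return owned;
    }

    // Counts the instances that received at least one partition.
    static long activeInstances(int partitions, int instances) {
        return assign(partitions, instances).stream()
                .filter(list -> !list.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 3));          // prints [[0, 3], [1], [2]]
        System.out.println(activeInstances(1, 3)); // prints 1
    }
}
```

With the 4 partitions created by the command above, up to four instances can each own a partition under this model; with a single partition, any additional instance would sit idle.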

Build the Spark-Coach

In another terminal, cd into the directory where you cloned the spark-coach project and use Maven to build the archive:

$ mvn clean package