Complete Pipeline Training at Big Data Scala By the Bay

README.md

pipeline

Join the chat at https://gitter.im/bythebay/pipeline

Complete Pipeline Training at Big Data Scala By the Bay

Pipeline Description

Dating ratings data => Akka app => Kafka => Spark Streaming => Cassandra => Dashboard
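
To make the flow above concrete, here is a minimal in-memory sketch of the same stages in plain Scala. It is not the repository's code and uses no Akka, Kafka, Spark, or Cassandra APIs: the `Rating` fields, the CSV encoding, the `PipelineSketch` object, and the per-user average are all assumptions chosen for illustration, with a queue standing in for the Kafka topic and a map standing in for the Cassandra table.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical record shape; the real schema lives in the repository, not here.
final case class Rating(fromUserId: Int, toUserId: Int, rating: Int, batchTime: Long)

object PipelineSketch {
  // Feeder stand-in: encode a rating as a CSV line, as a producer might.
  def encode(r: Rating): String =
    s"${r.fromUserId},${r.toUserId},${r.rating},${r.batchTime}"

  // Streaming stand-in: parse a message back, dropping malformed lines.
  def decode(line: String): Option[Rating] =
    line.split(",") match {
      case Array(from, to, rating, time) =>
        Try(Rating(from.toInt, to.toInt, rating.toInt, time.toLong)).toOption
      case _ => None
    }

  // Runs the whole flow in memory and returns what a dashboard might read:
  // the average rating received by each user.
  def run(input: Seq[Rating], batchSize: Int): Map[Int, Double] = {
    require(batchSize > 0, "batchSize must be positive")

    val topic = mutable.Queue.empty[String]          // Kafka topic stand-in
    val table = mutable.Map.empty[Int, (Long, Int)]  // Cassandra table stand-in: toUserId -> (sum, count)

    // The feeder publishes every rating to the topic.
    input.foreach(r => topic.enqueue(encode(r)))

    // The streaming job drains the topic in micro-batches and updates the table.
    while (topic.nonEmpty) {
      val batch = (1 to math.min(batchSize, topic.size)).map(_ => topic.dequeue())
      batch.flatMap(decode).foreach { r =>
        val (sum, count) = table.getOrElse(r.toUserId, (0L, 0))
        table.update(r.toUserId, (sum + r.rating, count + 1))
      }
    }

    table.map { case (user, (sum, count)) => user -> sum.toDouble / count }.toMap
  }
}
```

Calling `PipelineSketch.run(ratings, batchSize = 2)` on a small `Seq[Rating]` walks each record through every stage. In the actual training, each stand-in is replaced by the real component named in the pipeline description.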

In addition, Spark MLlib and DataFrames will be demonstrated in a notebook interface, using a combination of real-time data from Cassandra and static Parquet data.

Follow the Wiki to continue exploring.