Databricks Reference Apps

At Databricks, we are developing a set of reference applications that demonstrate how to use Apache Spark. This book/repo contains the reference applications.

These reference applications are for anyone who wants to learn Spark and learns best by example. Browse the applications, find the features that resemble what you want to build, and adapt the code samples to your needs. This is also meant to be a practical guide to using Spark in your own systems, so the applications mention other technologies that work with Spark, such as which file systems to use for storing massive data sets.

  • Log Analysis Application - The log analysis reference application contains a series of tutorials for learning Spark by example, as well as a final application that can be used to monitor Apache access logs. The examples use Spark in batch mode and also cover Spark SQL and Spark Streaming.

  • Twitter Streaming Language Classifier - This application demonstrates how to fetch tweets and use Spark MLlib to train a language classifier on them. Spark Streaming then applies the trained classifier to live tweets and keeps only those that match a specified cluster. For directions on building and running this app, see twitter_classifier/scala/README.md.

  • Weather TimeSeries Data Application with Cassandra - This reference application works with weather data recorded at a given weather station at a given point in time. The app demonstrates several strategies for using Spark Streaming together with Apache Cassandra and Apache Kafka to run fast, fault-tolerant streaming computations over time series data.
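For the Log Analysis Application, most of the per-record work is turning raw Apache access log lines into structured records and then aggregating them. The following is a minimal plain-Python sketch of that step, not the tutorial's own code. It assumes the Apache Common Log Format, and the names `AccessLog`, `parse_log_line`, and `summarize` are hypothetical. In a Spark job, a parsing function like this would typically be passed to an RDD `map`, and the aggregates would become Spark actions or Spark SQL queries.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed input: Apache Common Log Format, for example
# 127.0.0.1 - - [21/Jul/2014:9:55:27 -0800] "GET /home.html HTTP/1.1" 200 2048
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)$'
)


class AccessLog(NamedTuple):
    ip_address: str
    client_identd: str
    user_id: str
    date_time: str
    method: str
    endpoint: str
    protocol: str
    response_code: int
    content_size: int


def parse_log_line(line: str) -> Optional[AccessLog]:
    """Parse one log line; return None for lines that do not match."""
    m = LOG_PATTERN.match(line.strip())
    if m is None:
        return None
    ip, identd, user, ts, method, endpoint, protocol, code, size = m.groups()
    # A "-" size means no body was returned; count it as 0 bytes.
    return AccessLog(ip, identd, user, ts, method, endpoint, protocol,
                     int(code), 0 if size == "-" else int(size))


def summarize(lines: Iterable[str]) -> dict:
    """Batch statistics over the lines that parse successfully.

    With PySpark the same shape would be roughly
    sc.textFile(path).map(parse_log_line).filter(lambda x: x is not None)
    followed by actions such as reduce() or countByValue().
    """
    logs = [log for log in map(parse_log_line, lines) if log is not None]
    sizes = [log.content_size for log in logs]
    return {
        "count": len(logs),
        "avg_content_size": sum(sizes) / len(sizes) if sizes else 0.0,
        "min_content_size": min(sizes, default=0),
        "max_content_size": max(sizes, default=0),
        "response_codes": Counter(log.response_code for log in logs),
        "top_endpoints": Counter(log.endpoint for log in logs).most_common(10),
    }
```

Returning `None` for unparseable lines, rather than raising an exception, lets a single malformed line be filtered out without failing the whole batch. That matters when the same function runs over millions of lines.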
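For the Twitter Streaming Language Classifier, live tweets are kept when they match a specified cluster, which implies a clustering model. The sketch below assumes that model is k-means over hashed character-bigram frequencies. In Spark, MLlib (for example its `KMeans` and `HashingTF` classes) would fill that role; here everything is plain Python so it runs without a cluster. The names `featurize`, `train_kmeans`, `predict`, and `filter_cluster` are hypothetical, and the feature size of 1000 is an arbitrary choice.

```python
import zlib
from typing import List, Sequence

Vector = List[float]
NUM_FEATURES = 1000  # hypothetical feature-vector size


def featurize(text: str, num_features: int = NUM_FEATURES) -> Vector:
    """Hashed term-frequency vector over the text's character bigrams."""
    vec = [0.0] * num_features
    for i in range(len(text) - 1):
        # crc32 is stable across runs, unlike Python's salted str hash().
        index = zlib.crc32(text[i:i + 2].encode("utf-8")) % num_features
        vec[index] += 1.0
    return vec


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(vec: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Index of the nearest centroid."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(vec, centroids[k]))


def train_kmeans(vectors: Sequence[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Plain Lloyd's k-means with a naive first-k-points initialization."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            clusters[predict(v, centroids)].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def filter_cluster(tweets: Sequence[str], centroids: Sequence[Vector], cluster: int) -> List[str]:
    """Keep only tweets whose features fall in the requested cluster."""
    return [t for t in tweets if predict(featurize(t), centroids) == cluster]
```

In a streaming setting, training would happen offline on a batch of collected tweets, and only the cheap part (`featurize` plus `predict`) would run on each incoming micro-batch, analogous to `filter_cluster` here.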
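For the Weather TimeSeries application, a typical time-series computation is rolling raw station readings up into coarser aggregates. The sketch below is a plain-Python stand-in for one such step. It rests on assumptions not taken from the app: a hypothetical `RawReading` schema with hourly temperatures, and a daily high, low, and mean as the aggregate. In the app's setting, readings would arrive from Kafka into Spark Streaming and the results would be written to Cassandra. Keying by `(station_id, year, month, day)` mirrors a common Cassandra layout, with the station as the partition key and the date fields as clustering columns; that layout is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

DayKey = Tuple[str, int, int, int]  # (station id, year, month, day)


@dataclass(frozen=True)
class RawReading:
    """Hypothetical raw record: one reading from one station at one hour."""
    station_id: str
    year: int
    month: int
    day: int
    hour: int
    temperature: float  # assumed degrees Celsius


@dataclass(frozen=True)
class DailyTemperature:
    high: float
    low: float
    mean: float
    readings: int


def daily_temperature(readings: Iterable[RawReading]) -> Dict[DayKey, DailyTemperature]:
    """Roll raw readings up into per-station, per-day temperature stats."""
    groups: Dict[DayKey, List[float]] = defaultdict(list)
    for r in readings:
        groups[(r.station_id, r.year, r.month, r.day)].append(r.temperature)
    return {
        key: DailyTemperature(high=max(temps), low=min(temps),
                              mean=sum(temps) / len(temps), readings=len(temps))
        for key, temps in groups.items()
    }
```

Run per micro-batch, this function only sees the readings that arrived in that batch. Producing full-day figures would mean merging each batch's result with the values already stored. Keeping the reading count alongside the mean is what makes that merge possible.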

These reference apps are covered by the license terms in the LICENSE file.
