This repository complements my blog post on stateful streaming in Spark (http://asyncified.io/2016/07/31/exploring-stateful-streaming-with-apache-spark/). It contains a working example that uses the same data structures discussed in the post.

This example assumes you already have Spark set up locally or remotely.

Steps to make this code run:

  1. Clone the code
  2. Set a checkpoint directory inside the application.conf file under the "checkpoint-directory" key:
```
spark {
  spark-master-url = "local[*]"
  checkpoint-directory = ""
  timeout-in-minutes = 5
}
```
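For reference, here is a minimal sketch of how these keys might be read, assuming the project uses Typesafe Config (the repository's actual loading code may differ):

```scala
import com.typesafe.config.ConfigFactory

// Loads application.conf from the classpath.
val config = ConfigFactory.load()

val sparkMasterUrl      = config.getString("spark.spark-master-url")
val checkpointDirectory = config.getString("spark.checkpoint-directory")
val timeoutInMinutes    = config.getInt("spark.timeout-in-minutes")
```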
  3. This application consumes data from a socket stream (the simplest way to get the example working). For that, you need to pass two arguments to the program: host and port.

For anyone using IntelliJ, you can set the host and port under "Program arguments" in the run configuration:

(Configuration screenshot)

Otherwise, you can pass them as arguments to spark-submit or set them in your favorite IDE's run settings.
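For example, a hypothetical spark-submit invocation (the main class and jar names below are placeholders; substitute your build's actual output):

```bash
# The main class and jar names here are placeholders; use your build's output.
spark-submit \
  --class com.example.StatefulStreamingApp \
  --master "local[*]" \
  stateful-streaming-example.jar \
  localhost 9999
```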

  4. Start up netcat on the same port you pass to the program (see the example after these steps).
  5. Take the data from resources/sample-data.txt and send it via netcat.
  6. Set breakpoints as you desire and watch the app run!
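For steps 4 and 5, assuming you chose port 9999 (any free port works), start netcat listening and then paste in the sample data:

```bash
# Listen on the port the application will connect to.
nc -lk 9999
# Then paste the contents of resources/sample-data.txt into this terminal;
# each line is pushed to the connected Spark socket stream.
```

To give a rough idea of what happens on the other end of the socket, here is a minimal, self-contained sketch of consuming a socket stream with mapWithState. The object name, batch interval, and state logic are illustrative placeholders, not the repository's actual code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object SocketStreamSketch {
  def main(args: Array[String]): Unit = {
    val Array(host, port) = args // the two required program arguments

    val conf = new SparkConf()
      .setAppName("stateful-streaming-sketch")
      .setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(4))
    ssc.checkpoint("/tmp/checkpoint") // should match checkpoint-directory in application.conf

    // Placeholder state: count how many times each line has been seen.
    val spec = StateSpec.function((line: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (line, sum)
    })

    ssc.socketTextStream(host, port.toInt)
      .map((_, 1))
      .mapWithState(spec)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```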
