Experiments in Streaming
Scala
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
core
docs
project
samza
.gitignore
LICENSE
NOTICE
README.md

README.md

coast

In this dark stream-processing landscape, coast is a ray of light.

Why coast?

  • Simple: coast provides a simple streaming model with strong ordering and exactly-once semantics. This straightforward behaviour extends across multiple machines, state aggregations, and even between independent jobs, making it easier to reason about how your entire system behaves.

  • Easy: Streams are built up and wired together using a concise, idiomatic Scala API. These dataflow graphs can be as small or as large as you like: no need to cram all your logic in one big job, or to write a bunch of single-stage jobs and track their relationships by hand.

  • Kafkaesque: coast's core abstractions are patterned after Kafka's data model, and it's designed to fit comfortably in the middle of a larger Kafka-based infrastructure. By taking advantage of Kafka's messaging guarantees, coast can implement exactly-once semantics for messages and state without a heavy coordination cost.

Quick Introduction

coast's streams are closely patterned after Kafka's topics: a stream has multiple partitions, and each partition has an ordered series of values. A stream can have any number of partitions, each of which has a unique key. You can create a stream by pulling data from a topic, but coast also has a rich API for building derivative streams: applying transformations, merging streams together, regrouping, aggregating state, or performing joins. Once you've defined a stream you like, you can give it a name and publish it out to another topic.

By defining streams and networking them together, it's possible to express arbitrarily-complex dataflow graphs, including cycles and joins. You can use the resulting graphs in multiple ways: print it out as a GraphViz image, unit-test your logic using a simple in-memory implementation, or compile the graph to multiple Samza jobs and run it on a cluster.

Sound promising? You might be interested in:

Getting Started

The 0.2.0 release is published on Bintray. If you're using maven, you'll want to point your pom.xml at the repo:

<repository>
  <id>bintray-coast</id>
  <url>https://dl.bintray.com/bkirwi/maven</url>
</repository>

...and add coast to your dependencies:

<dependency>
  <groupId>com.monovore</groupId>
  <artifactId>coast-samza_2.10</artifactId>
  <version>0.2.0</version>
</dependency>

Mutatis mutandis, the same goes for SBT and Gradle.

Mandatory Word Count Example

val Sentences = Topic[Source, String]("sentences")

val WordCounts = Topic[String, Int]("word-counts")

val graph = Flow.build { implicit builder =>

  Sentences.asSource
    .flatMap { _.split("\\s+") }
    .map { _ -> 1 }
    .groupByKey
    .streamTo("words")
    .sum.updates
    .sinkTo(WordCounts)
}

Future Work

If you're interested in what the future holds for coast -- or have questions or bugs to report -- come on over to the issue tracker.