coast

In this dark stream-processing landscape, coast is a ray of light.

Why `coast`?

Simple: coast provides a simple streaming model with strong ordering and exactly-once semantics. This straightforward behaviour extends across multiple machines, state aggregations, and even between independent jobs, making it easier to reason about how your entire system behaves.
Easy: Streams are built up and wired together using a concise, idiomatic Scala API. These dataflow graphs can be as small or as large as you like: no need to cram all your logic in one big job, or to write a bunch of single-stage jobs and track their relationships by hand.
Kafkaesque: coast's core abstractions are patterned after Kafka's data model, and it's designed to fit comfortably in the middle of a larger Kafka-based infrastructure. By taking advantage of Kafka's messaging guarantees, coast can implement exactly-once semantics for messages and state without a heavy coordination cost.

Quick Introduction

coast's streams are closely patterned after Kafka's topics: a stream has multiple partitions, and each partition has an ordered series of values. A stream can have any number of partitions, each of which has a unique key. You can create a stream by pulling data from a topic, but coast also has a rich API for building derivative streams: applying transformations, merging streams together, regrouping, aggregating state, or performing joins. Once you've defined a stream you like, you can give it a name and publish it out to another topic.
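To make the data model concrete, here is a minimal plain-Scala sketch of a stream as a map from partition key to an ordered sequence of values. It does not use coast's API: the names `StreamModel`, `PartitionedStream`, `mapValues`, and `regroup` are hypothetical and exist only for illustration.

```scala
object StreamModel {
  // Hypothetical model: partition key -> ordered values in that partition.
  type PartitionedStream[K, V] = Map[K, Vector[V]]

  // Transform every value; keys and per-partition order are unchanged.
  def mapValues[K, A, B](s: PartitionedStream[K, A])(f: A => B): PartitionedStream[K, B] =
    s.map { case (k, vs) => k -> vs.map(f) }

  // Regroup: move each value to the partition named by a new key.
  // Order is preserved among values from the same source partition;
  // interleaving across source partitions is arbitrary in this sketch.
  def regroup[K, V, K2](s: PartitionedStream[K, V])(key: V => K2): PartitionedStream[K2, V] =
    s.values.flatten.toVector.groupBy(key)
}
```

For example, `StreamModel.regroup(Map("user-1" -> Vector("hi", "hello"), "user-2" -> Vector("hey")))(_.length)` moves each string into a partition keyed by its length.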

By defining streams and wiring them together, it's possible to express arbitrarily complex dataflow graphs, including cycles and joins. You can use the resulting graph in multiple ways: print it out as a GraphViz image, unit-test your logic using a simple in-memory implementation, or compile the graph to multiple Samza jobs and run it on a cluster.

Sound promising? You might be interested in:

The heavily-commented 'Twitter reach' example, which walks through all the pieces of a real job.
A fork of the hello-samza project with setup and deployment instructions.
Some wiki documentation on the core concepts, nuances of the graph-builder API, or details on the Samza integration.

Getting Started

The 0.2.0 release is published on Bintray. If you're using Maven, you'll want to point your pom.xml at the repo:

<repository>
  <id>bintray-coast</id>
  <url>https://dl.bintray.com/bkirwi/maven</url>
</repository>

...and add coast to your dependencies:

<dependency>
  <groupId>com.monovore</groupId>
  <artifactId>coast-samza_2.10</artifactId>
  <version>0.2.0</version>
</dependency>

Mutatis mutandis, the same goes for SBT and Gradle.
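For SBT, the equivalent of the Maven configuration above would look roughly like this `build.sbt` fragment. It is a sketch that mirrors the Maven coordinates exactly; the resolver name `bintray-coast` is arbitrary, and the artifact's Scala version suffix is spelled out explicitly rather than derived from the build's `scalaVersion`.

```scala
// Point SBT at the same repository as the Maven <repository> entry.
resolvers += "bintray-coast" at "https://dl.bintray.com/bkirwi/maven"

// Same groupId / artifactId / version as the Maven <dependency> entry.
libraryDependencies += "com.monovore" % "coast-samza_2.10" % "0.2.0"
```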

Mandatory Word Count Example

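// Input topic of raw sentences and output topic of per-word counts.
val Sentences = Topic[Source, String]("sentences")

val WordCounts = Topic[String, Int]("word-counts")

val graph = Flow.build { implicit builder =>

  Sentences.asSource
    .flatMap { _.split("\\s+") }  // split each sentence into words
    .map { _ -> 1 }               // pair each word with a count of one
    .groupByKey                   // regroup so that each word is its own key
    .streamTo("words")            // name and publish the intermediate stream
    .sum.updates                  // keep a running total per word
    .sinkTo(WordCounts)           // write the totals to the output topic
}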
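As a rough guide to what the pipeline above computes, here is a plain-Scala sketch that does not use coast at all. It assumes that `.sum.updates` emits the new running total for a word each time that word is seen; this is an interpretation, not a statement of coast's documented behaviour. `WordCountSketch` and `countUpdates` are hypothetical names.

```scala
object WordCountSketch {
  // Returns one (word, running count) pair per input word, in input order.
  // A single global sequence is used for simplicity; in a partitioned
  // stream only the order within each key would be meaningful.
  def countUpdates(sentences: Seq[String]): Vector[(String, Int)] = {
    val words = sentences.flatMap(_.split("\\s+"))
    // Fold over the words, carrying the counts so far and the updates emitted.
    val (_, updates) =
      words.foldLeft((Map.empty[String, Int], Vector.empty[(String, Int)])) {
        case ((counts, out), word) =>
          val n = counts.getOrElse(word, 0) + 1
          (counts.updated(word, n), out :+ (word -> n))
      }
    updates
  }
}
```

Under that assumption, the sentences `"a b a"` and `"b c"` would produce the updates `a -> 1`, `b -> 1`, `a -> 2`, `b -> 2`, `c -> 1`.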

Future Work

If you're interested in what the future holds for coast -- or have questions or bugs to report -- come on over to the issue tracker.
