Example of using Scala types with Apache Beam pipelines

A short project demonstrating how Scala type aliases can make type signatures more informative when working with Apache Beam pipelines.

Our story looks a little like this:

As a ...
I want to know how many times each user accesses each URL and the status code they received
So that ...

Our log lines look like this:

1.2.3.4,bob,2017-01-01T00:00:00.001Z,/,200
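
The pipeline below passes each line through `AccessLog.parseLine` to obtain an `Entry`. The following is a minimal sketch of what such a parser might look like, not the project's actual implementation. The field names `ip` and `timestamp`, the use of `java.time.Instant`, and the choice to throw on malformed input are assumptions; `userId`, `path` and `statusCode` are the names the pipeline uses.

```scala
import java.time.Instant

// Assumed shape of a parsed log line. Only userId, path and statusCode
// are named in the pipeline; ip and timestamp are hypothetical names.
final case class Entry(
  ip: String,
  userId: String,
  timestamp: Instant,
  path: String,
  statusCode: Int
)

object AccessLog {
  // Split a comma-separated log line into its five fields.
  // Throws IllegalArgumentException if the field count is wrong;
  // Instant.parse and toInt throw their own exceptions on bad values.
  def parseLine(line: String): Entry =
    line.split(",", -1) match {
      case Array(ip, userId, ts, path, status) =>
        Entry(ip, userId, Instant.parse(ts), path, status.toInt)
      case _ =>
        throw new IllegalArgumentException(s"Malformed log line: $line")
    }
}
```

Calling `AccessLog.parseLine` on the sample line above yields an `Entry` whose `userId` is `"bob"`, `path` is `"/"` and `statusCode` is `200`.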

Use your IDE's support to show the type signature at different points in the pipeline.

Without type aliases:

sc.textFile(args("input"))
  // SCollection[String]
  .map(AccessLog.parseLine)
  // SCollection[Entry]
  .map(x => (x.userId, x.path, x.statusCode))
  // SCollection[(String, String, Int)]
  .countByValue
  // SCollection[((String, String, Int), Long)]

With type aliases:

sc.textFile(args("input"))
  // SCollection[String]
  .map(AccessLog.parseLine)
  // SCollection[Entry]
  .map(x => (x.userId, x.path, x.statusCode))
  // SCollection[(UserId, Path, StatusCode)]
  .countByValue
  // SCollection[((UserId, Path, StatusCode), Long)]
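
The aliases themselves could be declared along the following lines. This is a sketch under assumptions rather than the project's code: the enclosing object name `TypeAliasSketch`, the trimmed three-field `Entry`, and the in-memory `countByValue` helper (a stand-in for the pipeline step so the example runs without Beam) are all hypothetical.

```scala
object TypeAliasSketch {
  // Each alias is just another name for its underlying type.
  type UserId = String
  type Path = String
  type StatusCode = Int

  // Declaring the fields with the aliases is what lets the aliased
  // names appear in the inferred signatures further down the pipeline.
  final case class Entry(userId: UserId, path: Path, statusCode: StatusCode)

  // In-memory stand-in for countByValue: occurrences of each distinct value.
  def countByValue[A](xs: Seq[A]): Map[A, Long] =
    xs.groupBy(identity).map { case (k, v) => k -> v.size.toLong }

  val entries: Seq[Entry] = Seq(
    Entry("bob", "/", 200),
    Entry("bob", "/", 200),
    Entry("alice", "/login", 401)
  )

  // The signature now says what each tuple element means.
  val counts: Map[(UserId, Path, StatusCode), Long] =
    countByValue(entries.map(e => (e.userId, e.path, e.statusCode)))
}
```

Note that aliases improve readability only. The compiler treats `UserId` and `Path` as plain `String`, so passing one where the other is expected still compiles.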