Joe Betz edited this page May 27, 2015 · 4 revisions

Courier Design

Why

Improve software development for teams whose REST endpoints are written in Scala and consumed by clients in multiple languages, by introducing a schema language and generating Scala data bindings from its schemas.

Say you've got developers writing web, iOS and Android clients using a shared set of REST APIs. To stay productive, the client developers need a clear understanding of the request and response data structures for every REST endpoint provided to them. Scala already has a number of high-quality libraries for serializing and deserializing JSON, but not all client developers are fluent in Scala, so it is not reasonable to expect them to read the Scala source code to work out the exact structure of request and response data. Manually documenting the data structures for all REST endpoints is tedious and error-prone.

By using a schema language to clearly describe the JSON structure of all our data, we give every developer a shared reference to use as needed. If those schemas are also machine-parseable, developers can use them to automate tasks such as validating data and generating language bindings. For Scala, we primarily want to read and write the data using idiomatic data bindings. For example, a JSON object that represents a record would bind to a Scala case class with a field for each JSON object field, where any optional field in the JSON object is represented with Option[T] in Scala.
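As a minimal sketch of the binding described above, consider a JSON record such as { "message": "...", "author": "..." } in which "author" is optional. The Fortune name and its fields are hypothetical, chosen only for illustration, and the class is hand-written rather than actual generator output.

```scala
// Hypothetical binding for a JSON record with one required and one optional field.
// Required JSON field -> plain Scala field; optional JSON field -> Option[T].
final case class Fortune(message: String, author: Option[String] = None)

object FortuneExample {
  // Client code handles the optional field through ordinary pattern matching.
  def describe(fortune: Fortune): String =
    fortune.author match {
      case Some(name) => s"${fortune.message} (by $name)"
      case None       => fortune.message
    }
}
```

Because the optional field is an Option[String], the compiler forces callers to handle the case where "author" is absent from the JSON.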

Schema language selection

We reviewed a variety of messaging protocols and schema languages, including JSON Schema, Thrift, Protobuf, Pegasus, Avro and MessagePack. Of these, only Pegasus had both an expressive schema language and a high-quality JVM implementation.

Pegasus is an Apache Avro based schema language that retains compatibility with Avro while offering an improved JSON serialization. This means that we can use Pegasus for both JSON and Avro binary. Pegasus has a well-engineered, feature-rich implementation for Java. It is a subsystem of the Rest.li open source project, is well maintained by the service infrastructure team at LinkedIn, and is in widespread use at LinkedIn, both by Rest.li and by a variety of data systems.

Our criteria for a schema language:

  • Able to represent all types with natural looking JSON
  • Includes types for Records, Tagged unions, Maps and Arrays (we basically want ADTs)
  • Supports optional types and defaults
  • Includes primitive types for all JSON types, and, ideally, distinguishes between floating point numeric types and integer types
  • Allows validation rules to be included in the schema
  • Composability. Type declarations can be written in multiple files that reference each other.

Code generator

Write a code generator for Scala that generates a Scala class for each Pegasus schema (.pdsc file).

With only a few minor modifications, we can reuse the existing rest.li-sbt-plugin, which is already able to generate Java files for Pegasus schemas. We can simply swap in a different code generator implementation.

To generate Scala code, we'll use Twirl, the same string template generator used by the Play! Framework.
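To illustrate what string-template code generation looks like, here is a rough sketch that renders a case class declaration from a simplified record description. It uses plain Scala string interpolation as a stand-in; real Twirl templates have their own syntax, and FieldSpec, RecordSpec and CaseClassTemplate are hypothetical names that are not part of Pegasus or Twirl.

```scala
// Hypothetical, simplified description of a record schema.
final case class FieldSpec(name: String, scalaType: String, optional: Boolean)
final case class RecordSpec(name: String, fields: Seq[FieldSpec])

object CaseClassTemplate {
  // Renders one case class declaration as source text.
  def render(record: RecordSpec): String = {
    val params = record.fields.map { field =>
      // Optional schema fields are wrapped in Option[...].
      val tpe = if (field.optional) s"Option[${field.scalaType}]" else field.scalaType
      s"${field.name}: $tpe"
    }
    s"case class ${record.name}(${params.mkString(", ")})"
  }
}
```

The template reads almost like the code it produces, which is the property that makes a string template approach easy for engineers to follow and modify.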

Generator alternatives considered:

  • Treehugger: A DSL-based Scala code generator. We almost used this generator because it is able to produce correctly indented and escaped code. Unfortunately, the DSL has a bit of a learning curve, and we want to make it as easy as possible for engineers to read through the code generator source, understand how it works and, if needed, make changes to it.
  • Scala Macros: Macros are designed to generate Scala expression trees from Scala expression trees and are not a good fit for generating Scala source from a JSON-based schema language.
  • Write our own: This is not a viable option. It is simply too large a detour for us at this point in time.

Structure of the Generated Code

We want all generated classes to be idiomatic Scala.

Schema type     Scala type
record          case class
tagged union    sealed trait with case class for each union member type
enumeration     Enumeration
map             Map[String, T]
array           Seq[T]
optional        Option[T]
int             Int
long            Long
float           Float
double          Double
boolean         Boolean
string          String
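The mapping above can be sketched with a few hand-written types. All names here (Telling, FortuneCookie, MagicEightBall, Reading) are hypothetical and only illustrate the shape of the target types. As a simplification, the member records extend the sealed trait directly; a real generator might instead wrap each member type in its own case class.

```scala
// enumeration -> Enumeration
object MagicEightBallAnswer extends Enumeration {
  val Yes, No, AskAgainLater = Value
}

// tagged union -> sealed trait with a case class for each member type
sealed trait Telling
// record -> case class; array -> Seq[T]
final case class FortuneCookie(message: String, luckyNumbers: Seq[Int]) extends Telling
final case class MagicEightBall(question: String, answer: MagicEightBallAnswer.Value) extends Telling

// map -> Map[String, T]; optional -> Option[T]
final case class Reading(
  telling: Telling,
  tags: Map[String, String],
  note: Option[String] = None
)

object ReadingExample {
  // Because Telling is sealed, the compiler can check that this match is exhaustive.
  def summarize(reading: Reading): String = reading.telling match {
    case FortuneCookie(message, numbers) => s"cookie: $message ${numbers.mkString(",")}"
    case MagicEightBall(question, _)     => s"8-ball: $question"
  }
}
```

Modeling the union as a sealed trait is what gives clients exhaustive pattern matching over the member types.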

We also need to integrate with the Pegasus data bindings code. This means that each Scala class must extend DataTemplate. In particular, each Scala class must provide access to both the schema that it was generated from and the underlying data, so that it can be serialized and deserialized by all the codecs that Pegasus supports.
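The idea of a binding that exposes both its schema and its raw data can be sketched as follows. DataTemplateLike is a hypothetical, simplified stand-in for the Pegasus DataTemplate interface so that the example stays self-contained; the Fortune record, its fields, and the use of Map[String, Any] for the raw data are likewise assumptions.

```scala
// Hypothetical stand-in for the Pegasus DataTemplate interface.
trait DataTemplateLike {
  def schemaJson: String      // the schema this class was generated from
  def data: Map[String, Any]  // the raw data, as a codec would see it
}

// Typed accessors are thin views over the raw data map.
final class Fortune private (val data: Map[String, Any]) extends DataTemplateLike {
  def schemaJson: String = Fortune.SchemaJson
  def message: String = data("message").asInstanceOf[String]
  def author: Option[String] = data.get("author").map(_.asInstanceOf[String])
}

object Fortune {
  val SchemaJson: String =
    """{"type":"record","name":"Fortune","fields":[""" +
      """{"name":"message","type":"string"},""" +
      """{"name":"author","type":"string","optional":true}]}"""

  // An absent optional field is simply left out of the raw data.
  def apply(message: String, author: Option[String] = None): Fortune =
    new Fortune(Map[String, Any]("message" -> message) ++ author.map("author" -> _))
}
```

Keeping the raw data as the single source of truth means any codec that understands the raw representation can serialize the object without knowing about the Scala class.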

Sample Pegasus schemas: https://github.com/coursera/courier/tree/master/spec/src/test/pegasus/org/coursera/fortune

Sample Scala data bindings that should be generated from those schemas: https://github.com/coursera/courier/tree/master/spec/src/test/scala/org/coursera/fortune