Description: extprot: extensible binary protocols for cross-language communication and long-term serialization
Clone URL: git://github.com/mfp/extprot.git
README.md

Note: the latest documentation can be found in extprot's git repository. Click on README.md in the directory view so that relative links work.

Introduction

extprot allows you to create compact, efficient, extensible, binary protocols that can be used for cross-language communication and long-term data serialization. extprot supports protocols with rich, composable types, whose definition can evolve while keeping both forward and backward compatibility.

The extprot compiler (extprotc) takes a protocol description and generates code in any of the supported languages to serialize and deserialize the associated data structures. It is accompanied by a runtime library for each target language which is used to read and write the structures defined by the protocol.

The protocols created using extprot are:

  • extensible: types can be extended in several ways without breaking compatibility with existing producers/consumers.
  • self-delimited: each message indicates its own length, so you can send sequences of messages (streaming) without adding message delimiters.
  • self-describing: a message can be decoded even without the protocol definition. What you get is roughly equivalent to XML without the DTD.
  • compact: messages take 2 to more than 6 times less space than XML, and typically 2 to 4 times less space than individually compressed XML messages.
  • fast: messages can be deserialized one to two orders of magnitude faster than XML, and faster than merely decompressing the equivalent XML data.
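Because every value carries its own wire type and length, a reader can walk a buffer without the protocol definition and can split a stream of back-to-back messages. The Ruby sketch below illustrates both properties. It handles only the three wire types that appear in the sample message in the Example section. The wire-type numbering it uses (low 4 bits of the prefix: 0 = varint, 1 = tuple, 3 = bytes), the varint format and the zigzag handling of ints are assumptions inferred from that sample, not taken from the specification in doc/. The helpers read_varint, read_value and each_message are hypothetical and are not part of the extprot Ruby runtime.

```ruby
# Schema-less reader sketch. ASSUMPTIONS (inferred from the README sample,
# not from the extprot spec): the low 4 bits of each value's prefix give the
# wire type (0 = varint, 1 = tuple, 3 = bytes); varints are base-128,
# least-significant group first; ints are zigzag-encoded.

# Reads one varint starting at byte offset pos; returns [value, next_pos].
def read_varint(buf, pos)
  result = 0
  shift = 0
  loop do
    byte = buf.getbyte(pos)
    raise ArgumentError, "truncated varint" if byte.nil?
    pos += 1
    result |= (byte & 0x7f) << shift
    return [result, pos] if byte < 0x80
    shift += 7
  end
end

# Decodes one value without any schema; returns [value, next_pos].
# Tuples become arrays, so field names are not recovered.
def read_value(buf, pos)
  prefix, pos = read_varint(buf, pos)
  case prefix & 0x0f
  when 0 # varint: undo the (assumed) zigzag mapping
    raw, pos = read_varint(buf, pos)
    [(raw >> 1) ^ -(raw & 1), pos]
  when 1 # tuple: byte length, element count, elements
    len, pos = read_varint(buf, pos)
    stop = pos + len # end of the tuple, known before decoding its elements
    count, pos = read_varint(buf, pos)
    elems = []
    count.times do
      value, pos = read_value(buf, pos)
      elems << value
    end
    [elems, stop]
  when 3 # bytes: length, then raw bytes
    len, pos = read_varint(buf, pos)
    [buf.byteslice(pos, len), pos + len]
  else
    raise ArgumentError, "wire type #{prefix & 0x0f} not handled in this sketch"
  end
end

# Yields each message in a buffer of back-to-back messages; no delimiters needed.
def each_message(buf)
  pos = 0
  while pos < buf.bytesize
    msg, pos = read_value(buf, pos)
    yield msg
  end
end

sample = ["0113020002030e4a2e522e522e20546f6c6b69656e"].pack("H*")
each_message(sample + sample) { |m| p m } # prints [1, "J.R.R. Tolkien"] twice
```

The decoded value is a plain array: the field order survives, but the names id and name do not, which is what "roughly equivalent to XML without the DTD" amounts to in practice.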

There are three parts to extprot, from lower to higher level:

  1. the low-level encoding
  2. the abstract syntax to define the protocol
  3. the mapping to the target language

The abstract syntax is what the extprot user feeds to the extprotc compiler; it defines the protocol, and controls how it maps to both the low-level encoding and the target language's data model.

The low-level encoding is of interest to people who want to add support for additional target languages, since the runtime library each new language requires must implement it.
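A new runtime usually starts with its integer primitives. The Ruby sketch below assumes that extprot uses the common base-128 varint scheme (7 data bits per byte, least-significant group first, high bit set on every byte except the last) and zigzag-maps signed ints before varint-encoding them. The sample message in the Example section stores the int 1 as the byte 0x02, which is consistent with this, but the encoding documentation in doc/ is the authority. The helper names zigzag, unzigzag, encode_varint and decode_varint are hypothetical.

```ruby
# Integer primitives sketch. ASSUMPTION: base-128 varints plus zigzag mapping
# for signed ints; check against the encoding spec in doc/ before relying on it.

# Maps signed ints to unsigned ones: 0, -1, 1, -2, ... => 0, 1, 2, 3, ...
def zigzag(n)
  n >= 0 ? n << 1 : ((-n) << 1) - 1
end

# Inverse of zigzag.
def unzigzag(z)
  (z >> 1) ^ -(z & 1)
end

# Encodes a non-negative integer as a binary string of varint bytes.
def encode_varint(n)
  raise ArgumentError, "varints are unsigned" if n < 0
  bytes = []
  loop do
    low = n & 0x7f
    n >>= 7
    if n.zero?
      bytes << low # last byte: continuation bit clear
      return bytes.pack("C*")
    end
    bytes << (low | 0x80) # more bytes follow
  end
end

# Decodes a varint from the start of a binary string.
def decode_varint(str)
  result = 0
  str.each_byte.with_index do |byte, i|
    result |= (byte & 0x7f) << (7 * i)
    return result if byte < 0x80
  end
  raise ArgumentError, "truncated varint"
end

p encode_varint(zigzag(1)).bytes                     # prints [2]
p encode_varint(300).bytes                           # prints [172, 2]
p unzigzag(decode_varint(encode_varint(zigzag(-5)))) # prints -5
```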

Example

Here's a trivial protocol definition:

(* this is a comment (* and this a nested comment *) *)
message user = {
  id : int;
  name : string;
}

The value

{ id = 1; name = "J.R.R. Tolkien" }

is serialized as this 21-byte message (output from hexdump -C):

00000000  01 13 02 00 02 03 0e 4a  2e 52 2e 52 2e 20 54 6f  |.......J.R.R. To|
00000010  6c 6b 69 65 6e                                    |lkien|
00000015
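To show how those 21 bytes are laid out, here is a hand-written Ruby sketch that reproduces the dump byte for byte. It is not the code extprotc generates. The layout it follows (a tuple prefix 0x01, the tuple's byte length 0x13, the element count 0x02, then an int with prefix 0x00 and value 0x02, and a string with prefix 0x03, length 0x0e and its bytes) is read off the dump itself. The wire-type numbers, the zigzag mapping of ints, and the helper names varint, encode_int, encode_string and encode_user are assumptions made for illustration.

```ruby
# Hypothetical encoder for `message user` that mirrors the hexdump above.
# ASSUMPTIONS: wire types 0 (varint), 1 (tuple), 3 (bytes) in the low 4 bits
# of a prefix whose tag bits are 0 here; base-128 varints; zigzag ints.

# Base-128 varint of a non-negative integer, as a binary string.
def varint(n)
  out = []
  loop do
    low = n & 0x7f
    n >>= 7
    return (out << low).pack("C*") if n.zero?
    out << (low | 0x80)
  end
end

def encode_int(n)
  zz = n >= 0 ? n << 1 : ((-n) << 1) - 1 # zigzag: 1 becomes 2
  varint(0) + varint(zz)                 # prefix: wire type 0 (varint)
end

def encode_string(s)
  raw = s.b
  varint(3) + varint(raw.bytesize) + raw # prefix: wire type 3 (bytes), length
end

def encode_user(id, name)
  fields = [encode_int(id), encode_string(name)]
  body = varint(fields.size) + fields.join # element count, then the elements
  varint(1) + varint(body.bytesize) + body # prefix: wire type 1 (tuple), length
end

msg = encode_user(1, "J.R.R. Tolkien")
puts msg.bytesize        # prints 21
puts msg.unpack1("H*")   # prints 0113020002030e4a2e522e522e20546f6c6b69656e
```

The tuple length (0x13 = 19) counts the element-count byte plus both encoded fields, which is what lets a reader skip the whole message without decoding it.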

The code generated by extprotc lets you manipulate such messages like any ordinary value. For instance, in the Ruby target (in progress as of 2008-11-04), you would write:

# writing
puts "About to save record for user #{user.name}"
user.write(buf)
# save buf

# reading
user = User.read(io)
puts "Got user #{user.id} #{user.name}"

In OCaml, the message is simply a record:

let u = User.io_read_user stream in
  Printf.printf "User %S has got id %d\n" u.name u.id

Usage

  1. Write a protocol definition using extprot's abstract syntax: myprotocol.proto

  2. Run the extprotc compiler to generate the code needed to read, write, and inspect the messages defined in the protocol: extprotc myprotocol.proto. This generates the code for the target language, e.g. myprotocol.ml for OCaml. More information about the generated code can be found here.

  3. Use it from your application code.