
Replace flatbuffers #2121

Open
ry opened this Issue Apr 15, 2019 · 6 comments

@ry
Collaborator

commented Apr 15, 2019

Our benchmarks are finally at the point where we're able to measure the overhead of serialization. The gap below is due to serialization:
[Screenshot (Apr 15, 2019): benchmark chart showing the throughput gap attributed to serialization]
FlatBuffers now appears to be our bottleneck (see the benchmarks in #1966).

Here are our requirements for a new serialization library:

  1. Supports Rust and TypeScript.
  2. Code generation for Rust complicates our build system. Ideally we could specify the message schema directly as Rust structs and avoid codegen entirely.
  3. Must be able to serialize into an existing byte slice (&[u8] in Rust, Uint8Array in JS). That is, it must not return a newly allocated slice.
  4. Must support (at minimum) String and i32 types within a struct.
  5. Must handle at least 200k serializations/deserializations per second, but probably needs to be closer to 2M/sec in order to close the throughput gap shown above.
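To make requirement 3 concrete, here is a minimal sketch (all names are hypothetical, not any existing library's API) of a serializer that writes into a caller-provided buffer instead of allocating:

```typescript
// Hypothetical illustration of requirement 3: serialize into an
// existing buffer; no allocation on the hot path.
interface ExitMessage {
  code: number; // i32
}

// Writes msg into the caller's Int32Array; returns slots used.
function serializeExit(msg: ExitMessage, out: Int32Array): number {
  out[0] = msg.code | 0; // coerce to i32
  return 1;
}

function parseExit(buf: Int32Array): ExitMessage {
  return { code: buf[0] };
}
```

Reusing one preallocated Int32Array across calls keeps the hot path allocation-free, which is what makes a multi-million-ops/sec target plausible.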
@rsp


commented Apr 15, 2019

Cap'n Proto seems like the way to go for maximum performance. Good points were made in issue #269, and with @kentonv himself on board, we couldn't go wrong on serialization (even if some tweaks are needed to meet the requirements).

@kentonv


commented Apr 15, 2019

I'm not sure if Cap'n Proto will perform much better than FlatBuffers. In theory, they should be similar, as they are both zero-copy protocols. I don't know very much about the details of FlatBuffers, so it's entirely possible there's some reason I'm not aware of why it would be significantly slower (or faster?) than Cap'n Proto. You could certainly try it and see what happens.

If they do in fact perform similarly, then I think you're out of options. I don't think there's any other serialization framework which would be expected to perform better than these for this use case.

@ry

Collaborator Author

commented Apr 15, 2019

Some of our messages are very small: just two i32 values. We need a framework which can take a struct like

struct ReadMessage {
  rid: i32,
}

and compile into JavaScript code which only does the bare minimum:

interface ReadMessage { rid: number };

function ReadMessageParse(i32: Int32Array): ReadMessage {
  assert(i32[0] === READ_MESSAGE_ID);
  return { rid: i32[1] };
}

function ReadMessageSerialize(msg: ReadMessage, i32: Int32Array): void {
  i32[0] = READ_MESSAGE_ID;
  i32[1] = msg.rid;
}

We know this is fast... Unfortunately, existing systems seem to generate a lot more code than this. I'd be interested to know whether Cap'n Proto can produce minimal code like the above.

@kentonv


commented Apr 15, 2019

Hmm. You could short-circuit Cap'n Proto to provide something like that. Fundamentally when you have a struct reader in Cap'n Proto, it's just a pointer to a buffer, and each accessor method simply reads from / writes to an offset in that buffer. So if you directly construct a struct reader from a buffer that contains only that one struct, then that's basically what you get.

However, I'm not sure how you extend that to support strings, or any sort of variable-length fields. At that point, you fundamentally need some sort of way to delimit objects, deal with memory allocation, etc. That's where the complexity comes from.

Could you give me an example of what kind of code you'd hope to see generated for a struct that contains some string fields?
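For illustration only (hypothetical names, assuming a simple length-prefixed UTF-8 encoding rather than any particular framework's layout), generated code for a struct with one string field might look like:

```typescript
// Hypothetical sketch: one i32 field plus one length-prefixed
// UTF-8 string, written into a caller-provided Uint8Array.
interface OpenMessage {
  rid: number;      // i32
  filename: string; // String
}

// Writes msg into buf starting at offset 0; returns bytes used.
function serializeOpen(msg: OpenMessage, buf: Uint8Array): number {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  view.setInt32(0, msg.rid, true); // little-endian i32
  const bytes = new TextEncoder().encode(msg.filename);
  view.setInt32(4, bytes.length, true); // length prefix
  buf.set(bytes, 8);
  return 8 + bytes.length;
}

function parseOpen(buf: Uint8Array): OpenMessage {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const rid = view.getInt32(0, true);
  const len = view.getInt32(4, true);
  const filename = new TextDecoder().decode(buf.subarray(8, 8 + len));
  return { rid, filename };
}
```

This sidesteps memory allocation by making the caller responsible for sizing the buffer, which is exactly the delimiting/allocation complexity being discussed.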

@kentonv


commented Apr 15, 2019

Oh, also: Is forwards/backwards-compatibility important to you? Or can you always assume the TypeScript and Rust sides are updated in lockstep?

If compatibility is not an issue, I suppose there's a lot you can simplify, but serialization frameworks usually prioritize compatibility.

@afinch7

Contributor

commented Apr 16, 2019

I think the zero-copy parameter already solves the variable-length problem. The read operation currently looks like this:

function reqRead(
  rid: number,
  zeroCopy: Uint8Array
): [flatbuffers.Builder, msg.Any, flatbuffers.Offset, Uint8Array] {
  const builder = flatbuffers.createBuilder();
  const inner = msg.Read.createRead(builder, rid);
  return [builder, msg.Any.Read, inner, zeroCopy];
}

export async function read(rid: number, zeroCopy: Uint8Array): Promise<ReadResult> {
  // dispatch.sendAsync sends our serialized flatbuffers Uint8Array, along with a "zeroCopy"
  // handle, to the C++ V8 bindings, which then pass it along to Rust land via FFI. zeroCopy is
  // used as a fixed-length buffer; in this case its length is the number of bytes you want to
  // read from a file. We never have to copy zeroCopy: we can just write directly into it from
  // the other side, hence the name "zeroCopy".
  return resRead(await dispatch.sendAsync(...reqRead(rid, zeroCopy)));
}

We don't perform any serialization on the zero-copy parameter, so it can be any length it needs to be.
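The idea can be sketched framework-free (all names here are hypothetical, not deno's actual code): the control message has a fixed size, while the zero-copy buffer is handed over untouched and filled in place by the other side:

```typescript
// Hypothetical stand-in for the Rust side writing file bytes
// directly into the caller's buffer.
function fakeOsRead(dest: Uint8Array): number {
  const fileBytes = new Uint8Array([104, 105]); // "hi"
  const n = Math.min(fileBytes.length, dest.length);
  dest.set(fileBytes.subarray(0, n));
  return n;
}

function read(rid: number, zeroCopy: Uint8Array): number {
  // The serialized control message is just two i32s; zeroCopy's
  // length never has to be encoded anywhere.
  const ctrl = new Int32Array([1 /* hypothetical READ op id */, rid]);
  // ...ctrl would be dispatched alongside zeroCopy here...
  return fakeOsRead(zeroCopy);
}
```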
