Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace flatbuffers #2121

Closed
ry opened this issue Apr 15, 2019 · 13 comments · Fixed by #2818

Comments

@ry
Copy link
Collaborator

commented Apr 15, 2019

Our benchmarks are finally at the point where we're able to measure the overhead of serialization. The gap below is due to serialization:
Screen Shot 2019-04-15 at 1 13 19 PM
Flatbuffers appears to now be our bottleneck (see benchmarks in #1966)

Here are our requirements for a new serialization library:

  1. Supports Rust and TypeScript.
  2. Generated Rust code complicates our build system. Ideally we could specify the message schema using Rust structs to avoid this.
  3. Must be able to serialize into existing byte slice (&[u8] in Rust or Uint8Array in JS). That is, it should not return a newly allocated slice.
  4. Must support (at minimum) String and i32 types within a struct.
  5. At minimum be able to handle 200k serialization/deserialization per second. But probably needs to be more like 2m/sec in order to close the above throughput gap
@rsp

This comment has been minimized.

Copy link

commented Apr 15, 2019

Cap'n Proto seems like the way to go for maximum performance. There were good points provided in issue #269 and having @kentonv himself on board we couldn't go wrong with serialization (even if we need some tweaks to meet the requirements).

@kentonv

This comment has been minimized.

Copy link

commented Apr 15, 2019

I'm not sure if Cap'n Proto will perform much better than FlatBuffers. In theory, they should be similar, as they are both zero-copy protocols. I don't know very much about the details of FlatBuffers, so it's entirely possible there's some reason I'm not aware of why it would be significantly slower (or faster?) than Cap'n Proto. You could certainly try it and see what happens.

If they do in fact perform similarly, then I think you're out of options. I don't think there's any other serialization framework which would be expected to perform better than these for this use case.

@ry

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 15, 2019

Some of our messages are very small - two int32 integers. We need a framework which can take a struct like

struct ReadMessage {
  rid: i32;
}

and compile into JavaScript code which only does the bare minimum:

interface ReadMessage { rid: number };

function ReadMessageParse(i32: Int32Array) => ReadMessage {
  assert(i32[0] == READ_MESSAGE_ID);
  return { rid: i32[1] };
}

function ReadMessageSerialize(msg: ReadMessage, i32: Int32Array): void {
  i32[0] = READ_MESSAGE_ID;
  i32[1] = msg.rid;
}

We know this is fast... Unfortunately existing systems seem to have a lot more code. I'd be interested to know if cap'n'proto can produce minimal code like this.

@kentonv

This comment has been minimized.

Copy link

commented Apr 15, 2019

Hmm. You could short-circuit Cap'n Proto to provide something like that. Fundamentally when you have a struct reader in Cap'n Proto, it's just a pointer to a buffer, and each accessor method simply reads from / writes to an offset in that buffer. So if you directly construct a struct reader from a buffer that contains only that one struct, then that's basically what you get.

However, I'm not sure how you extend that to support strings, or any sort of variable-length fields. At that point, you fundamentally need some sort of way to delimit objects, deal with memory allocation, etc. That's where the complexity comes from.

Could you give me an example of what kind of code you'd hope to see generated for a struct that contains some string fields?

@kentonv

This comment has been minimized.

Copy link

commented Apr 15, 2019

Oh, also: Is forwards/backwards-compatibility important to you? Or can you always assume the TypeScript and Rust sides are updated in lockstep?

If compatibility is not an issue I suppose there's a lot you can simplify, but serialization frameworks usually prioritize compatibility.

@afinch7

This comment has been minimized.

Copy link
Contributor

commented Apr 16, 2019

I think the zero-copy parameter solves the problem of variable length. The read operation currently looks like this:

function reqRead(
  rid: number,
  zeroCopy: Uint8Array
): [flatbuffers.Builder, msg.Any, flatbuffers.Offset, Uint8Array] {
  const builder = flatbuffers.createBuilder();
  const inner = msg.Read.createRead(builder, rid);
  return [builder, msg.Any.Read, inner, zeroCopy];
}

export async function read(rid: number, zeroCopy: Uint8Array): Promise<ReadResult> {
  // dispatch.sendAsync sends our serialized flatbuffers int8Array with a "zeroCopy" handle
  // to c++ v8 bindings that then passes it along to rust land via ffi. zeroCopy is used as a fixed
  // length buffer in this case the length being the number of bytes you want to read from a file. 
  // We never have to copy zeroCopy we can just write directly to it from the other side hence the
  // name "zeroCopy".
  return resRead(await dispatch.sendAsync(...reqRead(rid, zeroCopy)));
}

We don't perform any serialization on the zero copy parameter, so It can be any length it needs.

@jcao219

This comment has been minimized.

Copy link
Contributor

commented Apr 27, 2019

Only for strings and buffers, but Servo's Tendril project solves similar problems.

@ry

This comment has been minimized.

Copy link
Collaborator Author

commented May 3, 2019

First part of this transition has landed in 00ac871
and resulted in this perf improvement
Screen Shot 2019-05-03 at 9 54 06 AM

The new serialization method doesn't generalize - it's just the simplest thing that would work for read, write (and soon accept). It will require more work to migrate other ops to this system.

@jesskfullwood

This comment has been minimized.

Copy link

commented Jun 1, 2019

Just a suggestion, have you looked at Abomonation?

It's super-fast and also super-unsafe - but perhaps no moreso than rolling your own.

@FishOrBear

This comment has been minimized.

Copy link

commented Jul 25, 2019

https://github.com/FishOrBear/SimpleFlatBuffers
Simplify and optimize performance, you can refer to

@ry

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 25, 2019

@FishOrBear oh very nice! I wonder if you could give an example of how translating, say, our chown function would look using SimpleFlatBuffer
https://github.com/denoland/deno/blob/master/js/chown.ts#L11

@FishOrBear

This comment has been minimized.

Copy link

commented Jul 25, 2019

I am not familiar with deno, I will try to answer it.

chown.ts
In this file, I seem to only see the call to builder.createString.

If you use SimpleFlatBuffers, it is also very simple

let path: string;
let f = new SimpleFlatBuffers();
f.writeString(path);
//If finished
f.finish();

//send buffers
let bytes = f.bb;

Unlike FlatBuffers, SimpleFlatBuffer uses sequential write data.

The party receiving the data can also directly read the data.

let f = new SimpleFlatBuffers(bytes);
let path = f.readString();

I am not familiar with the interaction of js and c++ or rust.

If the passed uint8array is passed in memory, then even we can make a memory alignment pass object.

For example in C++:

Struct A
{
    Int32 a;
    Int32 b;
}

Pass binary memory to js.

Let f = new SimpleFlatBuffers(btyes);
Let a = f.readInt32();
Let b = f.readInt32();
@FishOrBear

This comment has been minimized.

Copy link

commented Jul 25, 2019

In addition, some of the performance of flatbuffers can be optimized by using DataView.
ref:https://v8.dev/blog/dataview

Part of it might be through optimizing String Decode.
ref:Https://bugs.chromium.org/p/v8/issues/detail?id=4383
Avoid using js to encode strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
7 participants
You can’t perform that action at this time.