Binary encode from back-to-front #1518

tbkka · 2023-11-30T00:16:25Z

Our current front-to-back binary encoder has a subtle performance problem: In order to allocate the correct number of bytes for the size of a sub-message, it needs to know the final size of the sub-message. This results in recursive calls to size each message, leading to overall performance that is quadratic in the depth of nesting.

To be clear: this is not usually a serious problem since few people have messages with more than 2 or 3 layers of nesting.

This PR changes the encoder to encode from the end of the buffer back towards the front. This allows us to write the size after we finish each sub-message, avoiding the recursive sizing calls. With this version, we have only one sizing traversal in the initial top-level encoding request in order to properly allocate the output buffer.

Working from back to front does mean that individual fields get written in the opposite order. This is mostly not a problem: Protobuf explicitly states that decoders must accept fields in any order. The one exception is for repeated fields, which we handle here by iterating the arrays backwards inside the encoder.

A few cases where order might matter:

Unrecognized enum cases in repeated fields are treted as "unknown" fields which means that re-serializing puts them into a different place. Since this code puts the unknown fields at the beginning of the buffer rather than the end, this means that we've changed the resulting order after deserializing/reserializing.
The conformance test has one test case that verifies merging behavior and seems very sensitive to the order of fields. I suspect this is a bug in the test, but need to check further.

tbkka · 2023-11-30T00:19:29Z

Finally sat down and hammered this out. (It was easier than I thought.) As noted above, there are a couple of issues that need to be further explored before this could be accepted as-is. And of course, I'd have to work through and update the many tests that hard-code the exact bytes expected from the binary output. (Or write a test helper that is smart enough to order-insensitively compare two binary serialized messages.)

Reworking this to sit on top of #1504 would allow us to iterate the fields in reverse order, which would let this encoder provide the same field ordering as the current encoding does.

tbkka · 2023-11-30T00:22:38Z

And of course I also need to get the perf harness working again so I can verify the performance of this approach...

Sources/SwiftProtobuf/BinaryReverseEncoder.swift

thomasvl · 2023-11-30T01:02:56Z

Nice! I think waiting on the other cl so we could visit in reverse order would be best since it solves the ordering problem is the best bet vs. changing all the test/writing helpers/etc.

Performance wise…

One minor one, I think I can dig up a reference around doing the varints so they don't have to be recursive.

The sizing allocation/resizing is likely the biggest thing. To compute the buffer size initial means serializedDataSize and BinaryEncodingSizeVisitor for sub messages is also the bad performance case; we traverse the sub messages multiple times. So the draft removes those multiple passes while encoding the bytes, but we're still playing the sizing costs. (and there is a non trivial number of usages way more than 3-4 levels deep 😄 )

Thinking about this, I see a few options:

Completely remove BinaryEncodingSizeVisitor and make binary encoding resize the buffer (likely in some chunk size) as it prepends bytes. Some of the "sub" encoders/visitors might make this complex.
Change BinaryEncodingSizeVisitor to over estimate things. When it goes to compute a sub message size, just assume it will take the max number of bytes for a varint. Then once the full message has been written out, figure out how many wasted bytes are on the front and copy the data down to the front and resize the buffer.
Or, can we do the same trick in your reverse encoder within BinaryEncodingSizeVisitor? Each time we need a sub message size, cache the current value for serializedSize, visit into the sub message with self, then compute the size using the cached value and the current value for serializedSize? Thus we avoid the over visiting the sub messages.

Did any of those ideas make sense?

tbkka · 2023-11-30T01:26:16Z

I really don't believe the initial traversal to compute the overall size is a problem at all. The problem with the old approach was that we did many smaller traversals as well:

The initial size traversal pass walks the entire message, including all submessages, recursively. Last time I profiled this code, this was actually quite fast.
As we serialized, each time we looked at a submessage, we had to compute the size of that sub message again by walking it and all of its children again. This repeated walk as we visit every sub message is where we get the quadratic behavior.

Google's C++ implementation avoids recomputing the size of each submessage by caching the size on each message object. This is essentially your idea number 3 above. It works, but it's awkward and requires some additional storage somewhere to store all the sizes. (Google's C++ implementation adds a hidden field to every message for this purpose.)

I suspect re-allocating buffers would be more expensive than doing a single sizing pass up front. Not only is malloc a fairly expensive call, but keeping track of when a new allocation is needed adds a lot of bookkeeping logic.

(A more efficient way to serialize the varints would be nice. This PR is still "draft" for a reason. 😉 )

(Edited: I was using the word "resizing" to mean both "recomputing the size of something" and "changing the size of a buffer." Hopefully clearer now.)

Our current front-to-back binary encoder has a subtle performance problem: In order to allocate the correct number of bytes for the size of a sub-message, it needs to know the final size of the sub-message. This results in recursive calls to size each message, leading to overall performance that is quadratic in the depth of nesting. This is not usually a serious problem since few people have messages with more than 2 or 3 layers of nesting. This PR changes the encoder to encode from the end of the buffer back towards the front. This allows us to write the size after we finish each sub-message, avoiding the recursive sizing calls. With this version, we have only one sizing traversal in the initial top-level encoding request in order to properly allocate the output buffer. Working from back to front does mean that individual fields get written in the opposite order. This is _mostly_ not a problem: Protobuf explicitly states that decoders must accept fields in any order. The one exception is for repeated fields, which we handle here by iterating the arrays backwards inside the encoder. A few cases where order might matter: * Unrecognized enum cases in repeated fields are treted as "unknown" fields which means that re-serializing puts them into a different place. Since this code puts the unknown fields at the beginning of the buffer rather than the end, this means that we've changed the resulting order after deserializing/reserializing. * The conformance test has one test case that verifies merging behavior and seems very sensitive to the order of fields. I suspect this is a bug in the test, but need to check further.

While I'm digging around, fix some whitespace errors and simplify a few minor points in the code.

tbkka · 2024-03-04T23:12:43Z

Link this to #713

tbkka requested a review from thomasvl November 30, 2023 00:16

thomasvl reviewed Nov 30, 2023

View reviewed changes

Sources/SwiftProtobuf/BinaryReverseEncoder.swift Outdated Show resolved Hide resolved

tbkka mentioned this pull request Nov 30, 2023

Update Visitor traversal logic to use statically defined Field descriptors #1504

Open

tbkka added 2 commits December 6, 2023 15:58

Implement the missing non-contiguous String case

e976607

While I'm digging around, fix some whitespace errors and simplify a few minor points in the code.

tbkka force-pushed the tbkka-binary-reverse-encoding branch from 773afaa to e976607 Compare December 6, 2023 23:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Binary encode from back-to-front #1518

Binary encode from back-to-front #1518

tbkka commented Nov 30, 2023

tbkka commented Nov 30, 2023

tbkka commented Nov 30, 2023

thomasvl commented Nov 30, 2023

tbkka commented Nov 30, 2023 •

edited

tbkka commented Mar 4, 2024

Binary encode from back-to-front #1518

Are you sure you want to change the base?

Binary encode from back-to-front #1518

Conversation

tbkka commented Nov 30, 2023

tbkka commented Nov 30, 2023

tbkka commented Nov 30, 2023

thomasvl commented Nov 30, 2023

tbkka commented Nov 30, 2023 • edited

tbkka commented Mar 4, 2024

tbkka commented Nov 30, 2023 •

edited