This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Replication message overhead, especially for ACK. #1636

Open
lmwnshn opened this issue Aug 19, 2021 · 6 comments
Labels
performance Performance related issues or changes.

Comments

@lmwnshn
Contributor

lmwnshn commented Aug 19, 2021

The message payload is a very small fraction of the overall message.

Here's an example of an ACK:
[image: packet capture of an ACK frame]

The ratio is not as bad for lengthier messages such as TXN_APPLIED, but it is still poor. Note that we could theoretically Huffman-encode the keys (or similar) and significantly shrink the message size, which would bring us back to the ACK case:

[image: packet capture of a longer message frame]

To fix this, we should look into batching messages, or maintaining a dedicated channel for sending small messages.

@lmwnshn added the performance label on Aug 19, 2021
@lmwnshn
Contributor Author

lmwnshn commented Aug 19, 2021

A quick counter of pending_messages_.size() suggests that batching would be quite profitable.

https://github.com/cmu-db/noisepage/blob/master/src/messenger/messenger.cpp#L526

@jkosh44
Contributor

jkosh44 commented Aug 19, 2021

In terms of reading these pictures, which number is the payload and which is the overall message size?

I think switching to a smaller data serialization format like protobuf would probably help with individual message size, which is something I'd like to try next semester.

@lmwnshn
Contributor Author

lmwnshn commented Aug 19, 2021

@jkosh44 The first line, `Frame ...: N bytes on wire`, is the total frame size.
Our payload is the last line, `Data (N bytes)`.

@lmwnshn
Contributor Author

lmwnshn commented Aug 19, 2021

A quick thought: maybe you could just introduce a new BatchedMessage message type, which the receiver interprets as many individual little messages. One complication is that we do not currently separate pending_messages_ by recipient.

@jkosh44
Contributor

jkosh44 commented Aug 19, 2021

Ahh ok, I think I misunderstood then. So the non-data bytes are things like TCP headers? In that case protobuf would shrink the payload, but it wouldn't help with the non-payload bytes per message.

I like the idea of a BatchedMessage type.

One option would be for the ReplicationManager to be responsible for batching and unbatching messages, so it just hands a single batched message to the Messenger, which can treat it like any other message. That way we don't need to separate out pending_messages_ by recipient. However no other clients of the Messenger would benefit from batched messages.

Though I don't think separating pending_messages_ out by recipient would be too hard, and it could allow us to do some cool things in the future, like seeing whether certain recipients are more backed up than others.

@lmwnshn
Contributor Author

lmwnshn commented Aug 19, 2021

A separate optimization is that we can assume messages are not lost by default, which means that we don't need to ACK the majority of messages.
