-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save TLog resources by letting peek request only spilled data. #1584
Conversation
If a peek is entirely fulfilled from spilled data, then it's likely that the next peek will be also. It is thus wasteful for each of these peeks to call peekMessagesFromMemory, which memcpy's excessively, and then throw all that data away without using it. Now, TLogs will give a hint back to peek cursors about if the provided reply was served entirely from the spilled data, which peek curors then feed back as the hint into their next request. At some point, a cursor will send a request for only spilled data, get an incomplete response, and then be told to send its next request as one that peeks from memory as well, and then it will fully catch up.
If there's some spilled data, there's probably a lot of spilled data, and now we can pull all of it faster.
I realized I was down to a one-line diff to address #1529 ParallelPeekGetMore when we've peeked only spilled data. I haven't fully thought through if this is a good idea in all cases, but it does pass correctness ¯\_(ツ)_/¯ |
@fdb-build, test this please |
@@ -158,6 +158,7 @@ ACTOR Future<Void> serverPeekParallelGetMore( ILogSystem::ServerPeekCursor* self | |||
expectedBegin = res.end; | |||
self->futureResults.pop_front(); | |||
self->results = res; | |||
self->onlySpilled = res.onlySpilled; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When switching from parallelGetMore back to regular GetMore, we still want to process the requests we already have outstanding to the TLogs to both avoid wasting work by the logs and to avoid an oscillation where the tlog spills more just as we stop using parallelGetMore. I think this can be done by uses the size of futureResults as another indicator that this function should be called, and then avoid issuing more requests is onlySpilled==false
A profile on mighty showed that we're spending ~10%-15% of CPU doing peekMessagesFromMemory, which are largely going to be for spilled pieces of data, and thus completely wasted.
This involves a protocol change, so it's not 6.1 patchable.
There's another way to go with this, which would be to do this as a heuristic entirely on the TLog, via looking at if the request is for a begin that's within N million versions of the current version, and then taking endVersion an knownDurableVersion from the peek reply, and doing the same heuristic on the client side. That feels potentially trickier to feel comfortable with, but would be 6.1 patchable instead.