Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in scan / watch setup #33

Merged
merged 1 commit into from Mar 23, 2024
Merged

Conversation

barrucadu
Copy link
Owner

I set up a watch by:

  1. Scanning all the keys under all of the prefix
  2. Establishing a watch from the revision returned by the scan

To avoid missing data, it's essential that all the scanning is done at the same revision. Otherwise, if the scan requires multiple requests, and some data is written between the two requests, the second request could miss it.

Unfortunately, that's what seemed to be happening.

I was able to track it down by adding a bit more logging of revision numbers: each scan used the following revision, rather than using the same revision! Turns out that I had misinterpreted the meaning of the revision field of the response header. The documentation says:

revision is the key-value store revision when the request was applied.

Which is terse, but made me think that:

  • If I don't supply a revision in the RangeRequest (ie, revision: 0) it returns the latest revision.
  • If I do supply a revision in the RangeRequest, it uses that.

That's not right. The header revision is actually the latest revision at the time the request was answered, and has no relation to the revision specified in the request.

So the correct thing to do is, on the very first request, to not supply a revision - and then use the header revision returned from that first request for all subsequent requests.

Closes #32

I set up a watch by:

1. Scanning all the keys under all of the prefix
2. Establishing a watch from the revision returned by the scan

To avoid missing data, it's essential that all the scanning is done at
the *same* revision.  Otherwise, if the scan requires multiple requests,
and some data is written between the two requests, the second request
could miss it.

Unfortunately, that's what seemed to be happening.

I was able to track it down by adding a bit more logging of revision
numbers: each scan used the following revision, rather than using the
same revision!  Turns out that I had misinterpreted the meaning of the
`revision` field of the response header.  The documentation says:

> revision is the key-value store revision when the request was applied.

Which is terse, but made me think that:

- If I don't supply a revision in the `RangeRequest` (ie, `revision: 0`) it
   returns the latest revision.
- If I do supply a revision in the `RangeRequest`, it uses that.

That's not right.  The header `revision` is actually the latest
revision at the time the request was answered, and has no relation to
the revision specified in the request.

So the correct thing to do is, on the very first request, to not supply
a revision - and then use the header `revision` returned *from that
first request* for all subsequent requests.
@barrucadu barrucadu merged commit 7473e11 into master Mar 23, 2024
3 checks passed
@barrucadu barrucadu deleted the fix-revision-race branch March 23, 2024 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix race condition in schedulerd / workerd coming online
1 participant