storage: change the GC to keep values within the GC interval #6778
Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.
storage/engine/gc.go, line 65 [r1] (raw file):
Does this correctly distinguish between putting an empty value and a nonexistent one?
storage/engine/gc.go, line 71 [r1] (raw file):
This is a little confusing. Better to stick with explicit
Comments from Reviewable
Reviewed 3 of 3 files at r1.
storage/gc_queue_test.go, line 188 [r1] (raw file):
s/first/only oldest/ The example would benefit from another value sitting at ts2-1s. ts2 is exactly the GC threshold, but there's nothing special about it in this case. Should also have (separate) examples in which the last value below the GC threshold timestamp is a deletion (in which case it should be deleted).
storage/engine/gc.go, line 48 [r1] (raw file):
Add commentary here (or elsewhere; find a good place for it) about what the job of the garbage collector is: to give us the key-value pairs we can delete without invalidating any reads which happen in the time interval
storage/engine/gc.go, line 66 [r1] (raw file):
The code seems to make assumptions about the order in which the keys enter this function. Update the comment to
the wording "As a first change, modify the GC to keep all values within the GC interval" is a bit misleading. How about
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.
storage/engine/gc.go, line 65 [r1] (raw file):
Done.
basically looks good, but I'm going to have to finish this up tomorrow.
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.
storage/gc_queue_test.go, line 188 [r1] (raw file):
Agreed that the test would be much more legible if the outcome were recorded along with the test data. Maybe you could send a follow-up.
I decided to not do a major refactor of the tests now because I'd like to move on and things are working. Also maybe I forgot a reason it was needed in this PR. Otherwise give this a review.
Reviewed 3 of 3 files at r5.
storage/gc_queue_test.go, line 216 [r5] (raw file):
s/within/above/
storage/engine/gc.go, line 49 [r5] (raw file):
storage/engine/gc.go, line 67 [r5] (raw file):
You could also use
storage/engine/gc.go, line 70 [r5] (raw file):
storage/engine/gc.go, line 72 [r5] (raw file):
storage/engine/gc.go, line 75 [r5] (raw file):
Time travel queries (#5963) will need to be able to request data from the MVCC layer for any time within the GC interval. As a first change, modify the GC to return correct results for all reads with timestamps within the GC interval. This means we must now keep any value that was valid at any time during the GC interval, which in particular can include a value with a timestamp before the GC interval. A deleted value is the exception: if the most recent value outside the GC window is a deletion, it is marked for GC as well. Further changes will record this valid GC window per replica and enforce that reads are not done outside of it.
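To make the rule above concrete, here is a minimal sketch in Go. It is not the code in storage/engine/gc.go: the `version` type, the `collectible` function, and the integer timestamps are hypothetical stand-ins, and it assumes versions arrive sorted newest first and that a version written exactly at the threshold counts as "at or below" it.

```go
package main

import "fmt"

// version is a hypothetical stand-in for one MVCC version of a key.
type version struct {
	ts      int64 // write timestamp, in arbitrary units
	deleted bool  // true if this version is a deletion tombstone
}

// collectible returns the versions that can be removed without changing
// the result of any read at or above threshold. It assumes versions is
// sorted newest first (an assumption of this sketch).
func collectible(versions []version, threshold int64) []version {
	for i, v := range versions {
		if v.ts > threshold {
			continue // written inside the GC interval: always kept
		}
		// v is the newest version at or below the threshold, so a read
		// at the threshold still observes it.
		if v.deleted {
			// A tombstone reads the same as a missing key, so it can
			// go together with everything older.
			return versions[i:]
		}
		// A live value must stay; only strictly older versions go.
		return versions[i+1:]
	}
	return nil // every version lies inside the GC interval
}

func main() {
	live := []version{{30, false}, {20, false}, {10, false}, {5, false}}
	fmt.Println(collectible(live, 15)) // [{5 false}]

	tomb := []version{{30, false}, {10, true}, {5, false}}
	fmt.Println(collectible(tomb, 15)) // [{10 true} {5 false}]
}
```

In the first call the value at timestamp 10 is kept even though it predates the threshold, because a read at 15 would still return it. In the second call the tombstone at 10 is collected along with the older value, since removing both leaves such a read unchanged.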
Reviewed 2 of 2 files at r6.
Time travel queries (#5963) will need to be able to request data from the
MVCC layer for any time within the GC interval. As a first change, modify
the GC to keep all values within the GC interval. This means we must now
keep any value that was valid at any time during the GC interval. A deleted
value is the exception: if the most recent value is deleted and it is
outside the GC window, it is marked for GC.
Further changes will record this valid GC window per replica and enforce
that reads are not done outside of it.