Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: change the GC to keep values within the GC interval #6778

Merged
merged 1 commit into from May 24, 2016
Merged

storage: change the GC to keep values within the GC interval #6778

merged 1 commit into from May 24, 2016

Conversation

mjibson
Copy link
Contributor

@mjibson mjibson commented May 18, 2016

Time travel queries (#5963) will need to be able to request data from the
MVCC layer for anytime within the GC interval. As a first change, modify
the GC to keep all values within the GC interval. This means we must now
keep any value that was valid for any time during the GC. A deleted value
is the exception: if the most recent value is deleted and it is outside
the GC window, it is marked for GC.

Further changes will record this valid GC window per replica and enforce
that reads are not done outside of it.


This change is Reviewable

@tbg
Copy link
Member

tbg commented May 18, 2016

Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


storage/engine/gc.go, line 65 [r1] (raw file):

          continue
      }
      deleted := len(values[i]) == 0

does this correctly distinguish between putting an empty value and a nonexistent one?


storage/engine/gc.go, line 71 [r1] (raw file):

      }
      // Otherwise mark the next timestamp for deletion.
      if i := i + 1; i < len(keys) {

this is a little confusing. Better to stick with explicit i+1 (or at least not shadowing i).


Comments from Reviewable

@tbg
Copy link
Member

tbg commented May 18, 2016

Reviewed 3 of 3 files at r1.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


storage/gc_queue_test.go, line 188 [r1] (raw file):

      {key1, ts2, false, false},
      {key1, ts5, false, false},
      // For key2, we expect first value to GC, even though most recent is deletion.

s/first/only oldest/

The example would benefit from another value sitting at ts2-1s. ts2 is exactly the GC threshold, but there's nothing special about it in this case.

Should also have (separate) examples in which the last value below the GC threshold timestamp is a deletion (in which case it should be deleted)
If those are covered anywhere, please point them out.


storage/engine/gc.go, line 48 [r1] (raw file):

// be garbage collected. If no values should be GC'd, returns
// roachpb.ZeroTimestamp.
func (gc GarbageCollector) Filter(keys []MVCCKey, values [][]byte) roachpb.Timestamp {

Add commentary here (or elsewhere; find a good place for it) about what the job of the garbage collector is: to give us the key-value pairs we can delete without invalidating any reads which happen in the time interval [now, \infinity).


storage/engine/gc.go, line 66 [r1] (raw file):

      }
      deleted := len(values[i]) == 0
      // If most recent after GC interval is empty, mark for deletion.

The code seems to make assumptions about the order in which the keys enter this function. Update the comment to Filter to document that (and check that the assumptions actually hold for all callers).


Comments from Reviewable

@tbg
Copy link
Member

tbg commented May 18, 2016

the wording "As a first change, modify the GC to keep all values within the GC interval" is a bit misleading. How about

As a first change, modify the GC to return correct results for all reads with timestamps within the GC interval. This means we must now keep any value that was valid for any time during the GC, which in particular includes values with timestamps before the GC interval.


Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


Comments from Reviewable

@mjibson
Copy link
Contributor Author

mjibson commented May 18, 2016

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


storage/engine/gc.go, line 65 [r1] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

does this correctly distinguish between putting an empty value and a nonexistent one?

An empty value isn't encoded as 0 bytes, so this should be ok. (And it's the same as the previous code, just relocated.)

storage/engine/gc.go, line 66 [r1] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

The code seems to make assumptions about the order in which the keys enter this function. Update the comment to Filter to document that (and check that the assumptions actually hold for all callers).

The only usage of this function uses a `replicaDataIterator` to populate `keys`, so I think that assumption does hold. Documented.

storage/engine/gc.go, line 71 [r1] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

this is a little confusing. Better to stick with explicit i+1 (or at least not shadowing i).

Done.

Comments from Reviewable

@mjibson
Copy link
Contributor Author

mjibson commented May 18, 2016

Done.

Previously, tschottdorf (Tobias Schottdorf) wrote…

the wording "As a first change, modify the GC to keep all values within the GC interval" is a bit misleading. How about

As a first change, modify the GC to return correct results for all reads with timestamps within the GC interval. This means we must now keep any value that was valid for any time during the GC, which in particular includes values with timestamps before the GC interval.


Review status: 0 of 2 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


storage/gc_queue_test.go, line 188 [r1] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

s/first/only oldest/

The example would benefit from another value sitting at ts2-1s. ts2 is exactly the GC threshold, but there's nothing special about it in this case.

Should also have (separate) examples in which the last value below the GC threshold timestamp is a deletion (in which case it should be deleted)
If those are covered anywhere, please point them out.

The `key5` test does part of what you describe. But you raise a good point that Filter should GC data as it's deleted. I've added a test and code for this more aggressive deletion.

storage/engine/gc.go, line 48 [r1] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Add commentary here (or elsewhere; find a good place for it) about what the job of the garbage collector is: to give us the key-value pairs we can delete without invalidating any reads which happen in the time interval [now, \infinity).

Done.

Comments from Reviewable

@tbg
Copy link
Member

tbg commented May 18, 2016

basically looks good, but I'm going to have to finish this up tomorrow.

Previously, mjibson (Matt Jibson) wrote…

Done.


Reviewed 1 of 1 files at r2.
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


storage/gc_queue_test.go, line 188 [r1] (raw file):

Previously, mjibson (Matt Jibson) wrote…

The key5 test does part of what you describe. But you raise a good point that Filter should GC data as it's deleted. I've added a test and code for this more aggressive deletion.

can you change the wording "expect values before GC to GC" for clarity? And i'd still like to see an explicit test in which we keep a value that's below the cutoff because it could be read by a read with a timestamp >= the cutoff (in particular, one that doesn't have an exact version at `ts2`, which is an unrealistic case in practice and makes Filter do less work). Maybe that exists, but see below:

I find these test cases are still very hard to parse, and that a comment on each key (at least for those that have something interesting happening) would be helpful.


storage/gc_queue_test.go, line 192 [r3] (raw file):

      // For key2, we expect values before GC to GC, even though most recent is deletion.
      {key2, ts1, false, false},
      {key2, ts2.Prev(), false, false},

intentionally in the wrong order?


Comments from Reviewable

@mjibson
Copy link
Contributor Author

mjibson commented May 19, 2016

Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


storage/gc_queue_test.go, line 188 [r1] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

can you change the wording "expect values before GC to GC" for clarity?
And i'd still like to see an explicit test in which we keep a value that's below the cutoff because it could be read by a read with a timestamp >= the cutoff (in particular, one that doesn't have an exact version at ts2, which is an unrealistic case in practice and makes Filter do less work). Maybe that exists, but see below:

I find these test cases are still very hard to parse, and that a comment on each key (at least for those that have something interesting happening) would be helpful.

Done.

I've also changed the key2 test to use a value before ts2. As well, the key11 test was already not testing ts2 and verifies that ts1 (before ts2), is retained.

I find the comments before each key block to be as much description as I need. Adding comments about each key would be pretty close to duplications of the block comments, so I think they would add confusion instead of clarity. However I've tried to revise and add a few more descriptive comments where useful.

I do find it unfortunate that the results list is so far separated from the inputs, but if that were to change it should be in another PR since it would confuse the diff about what this PR is changing.


storage/gc_queue_test.go, line 192 [r3] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

intentionally in the wrong order?

Not sure I understand. ts1 < ts2.Prev < ts2 < ts5. Or am I missing something?

Comments from Reviewable

@tbg
Copy link
Member

tbg commented May 19, 2016

Agreed about that the test would be much more legible if the outcome were recorded along with the test data. Maybe you could send a follow-up.
LGTM after my comment (not deleting tombstones which are >= expiration for paranoia)

Previously, tschottdorf (Tobias Schottdorf) wrote…

basically looks good, but I'm going to have to finish this up tomorrow.


Reviewed 3 of 3 files at r3, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


storage/gc_queue_test.go, line 192 [r3] (raw file):

Previously, mjibson (Matt Jibson) wrote…

Not sure I understand. ts1 < ts2.Prev < ts2 < ts5. Or am I missing something?

Nope, I am. Carry on.

storage/engine/gc.go, line 77 [r3] (raw file):

      break
  }
  // We can GC deleted values.

I think I might have mislead you (and myself) a little - if we remove deletion tombstones, we're inviting a hypothetical reincarnation of #6227 if the deletion tombstone we remove is at a timestamp which could possibly still be written at by a transaction.
We shouldn't remove any tombstones at >= gc.expiration (and independently, we should also add a side effect to GCRequest which updates the timestamp cache to make sure that a txn can't write "behind" what's been GC'ed or, and probably simpler, extend that read-timestamp check that you're going to be putting in as part of your time travel work to writing queries in general). Mind keeping that in your notes somewhere or copy-pasting into an appropriate issue so that we can come back to it then?


Comments from Reviewable

@mjibson
Copy link
Contributor Author

mjibson commented May 19, 2016

I decided to not do a major refactor of the tests now because I'd like to move on and things are working. Also maybe I forgot a reason it was needed in this PR. Otherwise give this a review.

Previously, tschottdorf (Tobias Schottdorf) wrote…

Agreed about that the test would be much more legible if the outcome were recorded along with the test data. Maybe you could send a follow-up.
LGTM after my comment (not deleting tombstones which are >= expiration for paranoia)


Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


Comments from Reviewable

@tbg
Copy link
Member

tbg commented May 19, 2016

Reviewed 3 of 3 files at r5.
Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


storage/gc_queue_test.go, line 216 [r5] (raw file):

      // For key9, resolve naked intent with no remaining values.
      {key9, ts3, false, true},
      // For key10, GC ts1 because it's a delete but not ts3 because it's within the threshold.

s/within/above/


storage/engine/gc.go, line 49 [r5] (raw file):

// roachpb.ZeroTimestamp. keys must be in descending time order.
// Values deleted at or before the returned timestamp can be deleted without
// invalidating any reads in the time interval [gc.expiration, \infinity).

(gc.expiration, \infinity)? And while you're at it, a high-level comment about how the filtering works (deleting


storage/engine/gc.go, line 67 [r5] (raw file):

          return roachpb.ZeroTimestamp
      }
      if gc.expiration.Less(key.Timestamp) {

You could also use !key.Timestamp.Less(gc.expiration) to keep the [threshold, \infty) interval.


storage/engine/gc.go, line 70 [r5] (raw file):

          continue
      }
      // Now key.Timestamp is <= gc.expiration.

Now key.Timestamp is <= gc.expiration, but the key-value pair is still "visible" at timestamp gc.expiration (and up to the next version).


storage/engine/gc.go, line 72 [r5] (raw file):

      // Now key.Timestamp is <= gc.expiration.
      if deleted := len(values[i]) == 0; deleted {
          // We can GC deleted keys only if they are <= gc.expiration, see #6227.

We don't have to keep a delete visible (since GCing it does not change the outcome of the read). Note however that we can't touch deletes at higher timestamps immediately preceding this one, since they're above gc.expiration and are needed for correctness; see #6227.


storage/engine/gc.go, line 75 [r5] (raw file):

          delTS = key.Timestamp
      } else if i+1 < len(keys) {
          // Otherwise mark the previous timestamp for deletion.

// Otherwise mark the previous timestamp for deletion (since it won't ever be returned for reads at gc.expiration and up)


Comments from Reviewable

Time travel queries (#5963) will need to be able to request data from
the MVCC layer for anytime within the GC interval. As a first change,
modify the GC to return correct results for all reads with timestamps
within the GC interval. This means we must now keep any value that was
valid for any time during the GC, which in particular includes values
with timestamps before the GC interval. A deleted value is the exception:
if the most recent value is deleted and it is outside the GC window,
it is marked for GC.

Further changes will record this valid GC window per replica and enforce
that reads are not done outside of it.
@tbg
Copy link
Member

tbg commented May 24, 2016

Reviewed 2 of 2 files at r6.
Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


Comments from Reviewable

@mjibson mjibson merged commit cc4ce37 into cockroachdb:master May 24, 2016
@mjibson mjibson deleted the gc-keep-interval branch May 24, 2016 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants