Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: fix incorrect SeekPrefixGE call in merging iterator #3324

Closed

Conversation

RaduBerinde
Copy link
Member

We also add assertions checking the prefix at various levels.

@RaduBerinde RaduBerinde requested review from a team and sumeerbhola February 21, 2024 02:14
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)


merging_iter.go line 1102 at r1 (raw file):

				key = l.tombstone.End
				if m.prefix != nil && !bytes.Equal(m.prefix, key[:m.split(key)]) {
					m.prefix = nil

It feels like we could just break here but it doesn't work.. Setting m.prefix to key[:m.split(key)] doesn't work either.

@RaduBerinde
Copy link
Member Author

Here is a case where we are calling SeekPrefixGE when we don't have a Split, which is against the contract:
https://github.com/cockroachdb/pebble/blob/master/get_iter.go#L195

Any advice how to fix that? Just call SeekGE when there is no Split?

More generally, it's a bit strange that we tolerate Split = nil instead of just defining the trivial implementation in DefaultComparator. But these behave differently, it's not an easy change to make.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use of SeekPrefixGE was added in #2846.

More generally, it's a bit strange that we tolerate Split = nil instead of just defining the trivial implementation in DefaultComparator.

This is what #2846 did, at least for the get iterator. I believe the construction of bloom filters also use an implicit trivial func(k []byte) int { return len(k) } Split function in its absence.

But these behave differently, it's not an easy change to make.

How do these behave differently?

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde and @sumeerbhola)


merging_iter.go line 1102 at r1 (raw file):

Previously, RaduBerinde wrote…

It feels like we could just break here but it doesn't work.. Setting m.prefix to key[:m.split(key)] doesn't work either.

What's motivating the change here? The iterator contract so far has been that the seek key need only be >= m.lower, allowing it to exceed the prefix. Breaking doesn't work because we'll leave subsequent levels in arbitrary positions and then construct the heap using those arbitrary positions. We could conceivably update the loop to just set iterKey, iterValue = nil, base.LazyValue{} for all subsequent levels, but I think that would violate the TrySeekUsingNext optimization's invariant that requires all levels to be re-positioned to a key >= key after SeekPrefixGE(key).

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use of SeekPrefixGE was added in https://github.com/cockroachdb/pebble/blob/master/internal/base/iterator.go#L113

Thanks, I will use SeekGE when there is no Split (which is what we did previously).

How do these behave differently?

I tried setting it in DefaultComparator and the test fixtures (e.g. h.sst) all changed. I didn't look in more detail why, I can investigate separately.

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @jbowens and @sumeerbhola)


merging_iter.go line 1102 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

What's motivating the change here? The iterator contract so far has been that the seek key need only be >= m.lower, allowing it to exceed the prefix. Breaking doesn't work because we'll leave subsequent levels in arbitrary positions and then construct the heap using those arbitrary positions. We could conceivably update the loop to just set iterKey, iterValue = nil, base.LazyValue{} for all subsequent levels, but I think that would violate the TrySeekUsingNext optimization's invariant that requires all levels to be re-positioned to a key >= key after SeekPrefixGE(key).

If we leave the prefix as is, the next l.iter.SeekPrefixGE(m.prefix, key, flags) call above violates the SeekPrefixGE contract which says that prefix has to be key's prefix as per Split. I assumed this contract is true in the prefix rewriting iterator (I only check prefix for the synthetic prefix).

The contract is documented here: https://github.com/cockroachdb/pebble/blob/master/internal/base/iterator.go#L113

Maybe we can just not call SeekPrefixGE after this point and assume it would have returned nil since the key moved beyond the prefix we care about?

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)


merging_iter.go line 1102 at r1 (raw file):

Maybe we can just not call SeekPrefixGE after this point and assume it would have returned nil since the key moved beyond the prefix we care about?

I think this will still run afoul of the TrySeekUsingNext optimizations because the underlying level's iterator may still be positioned beyond key. If it's positioned at k3, we seek for k1 and do not reposition the iterator (leaving it at k3), and then subsequently seek for k2, the TrySeekUsingNext optimization will try to position the iterator through next, but it's already positioned beyond k2. Maybe @sumeerbhola has thoughts here.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)


merging_iter.go line 1102 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Maybe we can just not call SeekPrefixGE after this point and assume it would have returned nil since the key moved beyond the prefix we care about?

I think this will still run afoul of the TrySeekUsingNext optimizations because the underlying level's iterator may still be positioned beyond key. If it's positioned at k3, we seek for k1 and do not reposition the iterator (leaving it at k3), and then subsequently seek for k2, the TrySeekUsingNext optimization will try to position the iterator through next, but it's already positioned beyond k2. Maybe @sumeerbhola has thoughts here.

Setting the prefix to nil (the current diff) seems to work fine. We just sometimes return a key when we wouldn't need to.

@RaduBerinde
Copy link
Member Author

Here's a difference between Split being nil vs a trivial implementation in the sstable writer:

			if w.split != nil {
				w.props.PrefixExtractorName = o.Comparer.Name
				w.props.PrefixFiltering = true
			} else {
				w.props.WholeKeyFiltering = true
			}

We also add assertions checking the prefix at various levels.
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually use Properties.{PrefixFiltering,WholeKeyFiltering} to decide anything at read time -- I didn't find anything? Seems like we are making an implicit assumption that all bloom filter construction in a DB, or in ingested sstables always uses the same split function.

Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @jbowens and @RaduBerinde)


get_iter.go line 238 at r2 (raw file):

      g.iterKey, g.iterValue = g.iter.SeekPrefixGE(prefix, g.key, base.SeekGEFlagsNone)
		} else {
      g.iterKey, g.iterValue = g.iter.SeekGE(g.key, base.SeekGEFlagsNone)

if there is no split defined, aren't all the bloom filters on the whole key? In which case we still want to use the bloom filters.


merging_iter.go line 1102 at r1 (raw file):

Previously, RaduBerinde wrote…

Setting the prefix to nil (the current diff) seems to work fine. We just sometimes return a key when we wouldn't need to.

I'm having difficulty understanding what bug this PR is trying to fix. Is there a test case with the bug?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @jbowens and @sumeerbhola)


get_iter.go line 238 at r2 (raw file):

Previously, sumeerbhola wrote…

if there is no split defined, aren't all the bloom filters on the whole key? In which case we still want to use the bloom filters.

If you think the existing code is ok, then help me write a corrected version of the contract for SeekPrefixGE which is currently being violated: it says that "A user-defined Split function must be supplied to the Comparer for the DB":

	// The prefix argument is used by some InternalIterator implementations (e.g.
	// sstable.Reader) to avoid expensive operations. A user-defined Split
	// function must be supplied to the Comparer for the DB. The supplied prefix
	// will be the prefix of the given key returned by that Split function. If
	// the iterator is able to determine that no key with the prefix exists, it
	// can return (nil,nilv). Unlike SeekGE, this is not an indication that
	// iteration is exhausted.

merging_iter.go line 1102 at r1 (raw file):

Previously, sumeerbhola wrote…

I'm having difficulty understanding what bug this PR is trying to fix. Is there a test case with the bug?

We are failing this assertion that I recently added:
https://github.com/cockroachdb/pebble/blob/master/sstable/prefix_replacing_iterator.go#L137

I added similar assertions in the merge and level iterator in this PR, which also trigger without the other changes.

This assertion was added based on the documentation for InternalIterator.SeekPrefixGE:

	// The prefix argument is used by some InternalIterator implementations (e.g.
	// sstable.Reader) to avoid expensive operations. A user-defined Split
	// function must be supplied to the Comparer for the DB. The supplied prefix
	// will be the prefix of the given key returned by that Split function. If
	// the iterator is able to determine that no key with the prefix exists, it
	// can return (nil,nilv). Unlike SeekGE, this is not an indication that
	// iteration is exhausted.

If we think there's nothing to fix (other than the assertion I added), then help me correct the doc for InternalIterator.SeekPrefixGE

@RaduBerinde
Copy link
Member Author

Closing in favor of #3329 and #3344.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants