Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: fixes to prefix replacing iterator #3344

Closed

Conversation

RaduBerinde
Copy link
Member

We add more logic inside the prefix replacing iterators:

  • pass through SeekGE calls even when not strictly necessary, to
    enable optimizations indicated by the flags.
  • keep track of whether we are positioned before/after the synthetic
    prefix range, to enable expected behaviors like SeekGE after the
    range followed by Prev (or SeekLT to before the range followed
    by Next)

@RaduBerinde RaduBerinde requested review from dt, itsbilal, jbowens and a team February 24, 2024 02:40
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard to know for sure this is correct without some of the semantics being clear (see discussion in #3329) but this passes more iterations of the metamorphic test (w prefix replacement added) than what we have. I haven't stressed it for a long time yet but I plan to do that before merging.

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @dt, @itsbilal, and @jbowens)


sstable/prefix_replacing_iterator.go line 157 at r1 (raw file):

			// We still want to seek the underlying iterator to potentially enable
			// optimizations passed through flags.
			p.i.SeekGE(p.rewriteArg(p.lower), flags)

@jbowens do you think this is enough to retain any seek optimizations?

const (
// inSync indicates that the prefix replacing iterator is "in sync" with the
// underlying iterator; any Next/Prev calls can be passed through.
inSync iteratorState = iota
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be inRange, or alternatively iterPosCur if we want to really mimic wording in other iterators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return nil, base.LazyValue{}
case beforeRange:
p.state = inSync
return p.rewriteResult(p.i.First())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a violation of the internalIterator contract, right? We don't allow First() to be called if there's a lower bound, instead we want SeekGE(lower) to be called. Same with Last() and the upper bound. You might need to save the bounds in SetBounds so you can do the appropriate translation here instead of passing through a first/last to the child iter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done. TFTR!

Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM save for NextPrefix fix

case beforeRange:
p.state = inRange
if p.lowerBound != nil {
return p.rewriteResult(p.i.SeekGE(p.lowerBound, base.SeekGEFlagsNone))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might need to be SeekGE(succKey) as the lower bound might not have the prefix we want. And same for the First() call below; we can actually unconditionally call SeekGE(succKey) here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think about this some more. What I did is incorrect, we can't just pass a lower bound below without rewriting its prefix.. We need to basically figure out the intersection of the given bounds and the vtable bounds (most of that is happening today below us).

We add more logic inside the prefix replacing iterators:
 - pass through `SeekGE` calls even when not strictly necessary, to
   enable optimizations indicated by the flags.
 - keep track of whether we are positioned before/after the synthetic
   prefix range, to enable expected behaviors like `SeekGE` after the
   range followed by `Prev` (or `SeekLT` to before the range followed
   by `Next`)
@RaduBerinde
Copy link
Member Author

Looks like this requires a bunch more work. I wonder if I should hold off and see if we can switch to the block-level prefix synthesis instead @dt @msbutler

@itsbilal
Copy link
Member

Does it require that much more work? If it's just NextPrefix that's affected, the lazy solution is to just call the prefix replacing iter's own SeekGE(succKey) in that case.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 14 unresolved discussions (waiting on @dt, @itsbilal, @jbowens, and @RaduBerinde)


sstable/prefix_replacing_iterator.go line 157 at r1 (raw file):

Previously, RaduBerinde wrote…

@jbowens do you think this is enough to retain any seek optimizations?

I'm a little confused here. I think all we need is to track whether we did the absolute positioning or ignored the SeekPrefixGE (see other comment below).
So we would remember positioned/unpositioned. The ignore case here is the only one which would result in a state transition to unpositioned. Any relative positioning call when unpositioned will return nil. Any absolute positioning call where the state was unpositioned would turn off the TrySeekUsingNext. This is following the same pattern as the next lower layer where we do

		if !i.lastBloomFilterMatched {
			// Iterator is not positioned based on last seek.
			flags = flags.DisableTrySeekUsingNext()
		}

I could be wrong, since we need to flesh out the reasoning in https://cockroachlabs.slack.com/archives/CAC6K3SLU/p1708716407912329?thread_ts=1708549343.720779&cid=CAC6K3SLU


sstable/prefix_replacing_iterator.go line 26 at r2 (raw file):

	syntheticPrefix []byte

	// lower is a valid key that has syntheticPreficx and is a lower bound for all

nit: syntheticPrefix

Aren't we requiring that i only produce keys with contentPrefix. So how can a key starting with syntheticPrefix be a lower bound for i?


sstable/prefix_replacing_iterator.go line 63 at r2 (raw file):

var _ Iterator = (*prefixReplacingIterator)(nil)

// newPrefixReplacingIterator wraps an iterator over keys that have

nit: over point keys ...


sstable/prefix_replacing_iterator.go line 68 at r2 (raw file):

// `contentPrefix`.
//
// lower is a valid key that has syntheticPreficx and is a lower bound for all

syntheticPrefix

Same question about "lower bound for all keys produced by i`.


sstable/prefix_replacing_iterator.go line 201 at r2 (raw file):

Previously, RaduBerinde wrote…

Good point, done. TFTR!

I suspect we are trying to be too clever with this code.

The underlying iterator i already has the bounds, but they are in terms of ContentPrefix. So our difficulty arises when someone tries SeekGE above the upper bound or SeekLT below the lower bound, since we can't translate them into seek keys in terms of ContentPrefix. But levelIter shouldn't be doing that, since it loads the relevant file with the first key >= seek key in the case of SeekGE and < seek key in case of SeekLT. So how about we treat those seek cases as simply calls to Last(); Next() and First(); Prev(), since they are not performance sensitive (unless I am mistaken).

I think this eliminates the need for beforeRange and afterRange and lowerBound, upperBound.


sstable/prefix_replacing_iterator.go line 49 at r3 (raw file):

	// inRange indicates that the prefix replacing iterator is "in sync" with the
	// underlying iterator; any Next/Prev calls can be passed through.
	inRange prefixReplacingIteratorState = iota

is there a reason to default to inRange instead of making the iota value panic-worthy?


sstable/prefix_replacing_iterator.go line 56 at r3 (raw file):

	// prefix range. A Next() call should return the first key/span in the range.
	beforeRange
	empty

comment for empty?


sstable/prefix_replacing_iterator.go line 136 at r3 (raw file):

	p.state = inRange
	if !bytes.HasPrefix(key, p.syntheticPrefix) {
		if p.cmp(key, p.lower) > 0 {

this is an odd use of p.lower that is worth a comment. One would expect this check to happen against upper, but this works because lower and upper share SyntheticPrefix.


sstable/prefix_replacing_iterator.go line 142 at r3 (raw file):

		// Key must be before the range; seek to the lower bound instead.
		// We don't use First because we may miss out on optimizations passed
		// through SeekEFlags.

SeekGEFlags


sstable/prefix_replacing_iterator.go line 143 at r3 (raw file):

		// We don't use First because we may miss out on optimizations passed
		// through SeekEFlags.
		key = p.lower

Presumably prefixReplacingIterator is being used in the performance sensitive case after the sstable has been ingested, so this is in a level and being iterated over using levelIter. Is that accurate?


sstable/prefix_replacing_iterator.go line 226 at r3 (raw file):

		return p.rewriteResult(p.i.First())
	case inRange:
		return p.rewriteResult(p.i.NextPrefix(succKey))

what is the contract regarding succKey. Are we guaranteed that succKey has SyntheticPrefix? I believe not since levelIter will just pass through. So we have to be able to handle a succKey that is greater than the upper, i.e.,

if !bytes.HasPrefix(succKey, p.syntheticPrefix) {
		if p.cmp(key, p.lower) > 0 {
                   // Go past the end of this sstable
                   ...
                   return
                 }  else {
                   //panic. Can't be calling NextPrefix on this iterator if it is already positioned after `succKey`
                 }
}
// succKey has synthetic prefix. So rewrite succKey and call on i.

sstable/prefix_replacing_iterator.go line 268 at r3 (raw file):

	// is going to rewrite them itself or pass them to something e.g. vState that
	// will rewrite them.
	if x, ok := p.i.(interface{ SetBoundsWithSyntheticPrefix() bool }); ok && x.SetBoundsWithSyntheticPrefix() {

this seems error-prone. What if someone has a typo and causes us to fall into the path below. I guess tests will catch it quickly enough, but what iterator type do we wrap that does not implement SetBoundsWithSyntheticPrefix?


sstable/reader_virtual.go line 137 at r2 (raw file):

	i, err := v.reader.newIterWithBlockPropertyFiltersAndContext(
		ctx, transforms, lower, upper, filterer, useFilterBlock,
		stats, categoryAndQoS, statsCollector, rp, &v.vState)

Seems like since virtualState.{lower,upper} also have SyntheticPrefix, the narrowed bound in virtualState.constrainBounds also have SyntheticPrefix, so the inverted bounds will have ContentPrefix. Is my understanding correct?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 14 unresolved discussions (waiting on @dt, @itsbilal, @jbowens, and @sumeerbhola)


sstable/prefix_replacing_iterator.go line 157 at r1 (raw file):

Previously, sumeerbhola wrote…

I'm a little confused here. I think all we need is to track whether we did the absolute positioning or ignored the SeekPrefixGE (see other comment below).
So we would remember positioned/unpositioned. The ignore case here is the only one which would result in a state transition to unpositioned. Any relative positioning call when unpositioned will return nil. Any absolute positioning call where the state was unpositioned would turn off the TrySeekUsingNext. This is following the same pattern as the next lower layer where we do

		if !i.lastBloomFilterMatched {
			// Iterator is not positioned based on last seek.
			flags = flags.DisableTrySeekUsingNext()
		}

I could be wrong, since we need to flesh out the reasoning in https://cockroachlabs.slack.com/archives/CAC6K3SLU/p1708716407912329?thread_ts=1708549343.720779&cid=CAC6K3SLU

I see, so we would not call SeekGE here at all and for any further Seek we will DisableTrySeekUsingNext in the flags. That makes sense but I don't have a good understanding of all the details.


sstable/prefix_replacing_iterator.go line 26 at r2 (raw file):

Previously, sumeerbhola wrote…

nit: syntheticPrefix

Aren't we requiring that i only produce keys with contentPrefix. So how can a key starting with syntheticPrefix be a lower bound for i?

Yeah, the description is incorrect. It's a lower bound for all keys produced by i after prefix replacement.


sstable/prefix_replacing_iterator.go line 201 at r2 (raw file):

Previously, sumeerbhola wrote…

I suspect we are trying to be too clever with this code.

The underlying iterator i already has the bounds, but they are in terms of ContentPrefix. So our difficulty arises when someone tries SeekGE above the upper bound or SeekLT below the lower bound, since we can't translate them into seek keys in terms of ContentPrefix. But levelIter shouldn't be doing that, since it loads the relevant file with the first key >= seek key in the case of SeekGE and < seek key in case of SeekLT. So how about we treat those seek cases as simply calls to Last(); Next() and First(); Prev(), since they are not performance sensitive (unless I am mistaken).

I think this eliminates the need for beforeRange and afterRange and lowerBound, upperBound.

The problem is that if SetBounds() was passed down, then it's illegal to call First and/or Last.. We would need to Seek from a bound. But we need to "trim" those bounds to the synthetic prefix range and convert them to content-prefix.

Currently, this happens underneath us - we pass the unchanged bounds to SingleLeveliterator.SetBounds which deals with the prefix replacement. It's a big mess.
https://github.com/cockroachdb/pebble/blob/master/sstable/reader_iter_single_lvl.go#L367

I think the correct thing to do is move all that logic in here instead. But I'm kind of losing steam on making this work given that we may be able to switch to just prepending a synthetic prefix and doing that in blockIter (#3350).


sstable/prefix_replacing_iterator.go line 56 at r3 (raw file):

Previously, sumeerbhola wrote…

comment for empty?

Sorry, left over


sstable/prefix_replacing_iterator.go line 143 at r3 (raw file):

Previously, sumeerbhola wrote…

Presumably prefixReplacingIterator is being used in the performance sensitive case after the sstable has been ingested, so this is in a level and being iterated over using levelIter. Is that accurate?

Correct.


sstable/prefix_replacing_iterator.go line 226 at r3 (raw file):

Previously, sumeerbhola wrote…

what is the contract regarding succKey. Are we guaranteed that succKey has SyntheticPrefix? I believe not since levelIter will just pass through. So we have to be able to handle a succKey that is greater than the upper, i.e.,

if !bytes.HasPrefix(succKey, p.syntheticPrefix) {
		if p.cmp(key, p.lower) > 0 {
                   // Go past the end of this sstable
                   ...
                   return
                 }  else {
                   //panic. Can't be calling NextPrefix on this iterator if it is already positioned after `succKey`
                 }
}
// succKey has synthetic prefix. So rewrite succKey and call on i.

Hm, good point. The contract as written is that succKey is the immediate successor key to the iterator's current key prefix: https://github.com/cockroachdb/pebble/blob/master/internal/base/iterator.go#L168

In practice at least, the immediate successor key should have the same synthetic prefix. But I learned to take these docs with a grain of salt :)


sstable/prefix_replacing_iterator.go line 268 at r3 (raw file):

Previously, sumeerbhola wrote…

this seems error-prone. What if someone has a typo and causes us to fall into the path below. I guess tests will catch it quickly enough, but what iterator type do we wrap that does not implement SetBoundsWithSyntheticPrefix?

I don't think we wrap anything else. I agree this kind of thing is very error prone. The correct thing would be to do the bounds conversion here instead of below us.


sstable/reader_virtual.go line 137 at r2 (raw file):

Previously, sumeerbhola wrote…

Seems like since virtualState.{lower,upper} also have SyntheticPrefix, the narrowed bound in virtualState.constrainBounds also have SyntheticPrefix, so the inverted bounds will have ContentPrefix. Is my understanding correct?

Right, the constrainBounds call in single/twoLevelIterator.init() will intersect [lower, upper) with [vState.lower, upper) then convert from SyntheticPrefix to ContentPrefix.

@RaduBerinde
Copy link
Member Author

Removing prefix replacing iterator in #3405

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants