Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: remove the interleaved logic from fetchers #70239

Merged
merged 3 commits into from Oct 25, 2021

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Sep 15, 2021

row: simplify fetcher to only handle a single table

Previously, row.Fetcher could handle multiple tables at the same time.
This was needed to support the interleaved tables efficiently, but since
the interleaved tables are no more, we can simplify the fetcher now.

There are more things that can be removed from the fetcher about
handling of the interleaved tables, but I chose to defer them to a later
commit. This commit's main focus is simplifying the fetcher to work
under the assumption that it will be fetching only from a single table.

Addresses: #69150

Release note: None

colbuilder: remove the check for index join on the interleaved index

Interleaved indexes are no more.

Release note: None

sql: remove the interleaved logic from fetchers

This commit removes the remaining bits of handling the interleaved
tables from both row fetcher and cFetcher. It also removes a test that
verifies that we can read from the interleaved table restored from
a backup.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich mentioned this pull request Sep 15, 2021
@yuzefovich yuzefovich force-pushed the fetcher-multiple-tables branch 3 times, most recently from f6038e0 to 72ad1e9 Compare September 15, 2021 21:22
@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Sep 15, 2021
@yuzefovich yuzefovich marked this pull request as ready for review September 15, 2021 21:22
@yuzefovich yuzefovich requested a review from a team as a code owner September 15, 2021 21:22
@yuzefovich yuzefovich requested a review from a team September 15, 2021 21:22
@yuzefovich
Copy link
Member Author

Will hold off on merging this until 21.2.0 is out.

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best type of PR. 🙂

Searching for "interleave" in fetcher.go turned up a few more spots. Not sure how comprehensive you're trying to make this... if I'm asking too much just skip them.

Reviewed 6 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/row/fetcher.go, line 147 at r1 (raw file):

Quoted 5 lines of code…
	// The spans of keys to return for the given table. Fetcher
	// ignores keys outside these spans.
	// This is irrelevant if Fetcher is initialize with only one
	// table.
	Spans            roachpb.Spans

Should this also be deleted? (Or the comment changed?)


pkg/sql/row/fetcher.go, line 188 at r1 (raw file):

// Fetcher handles fetching kvs and forming table rows for an
// arbitrary number of tables.

nit: Should this now say "a single table"?


pkg/sql/row/fetcher.go, line 222 at r1 (raw file):

Quoted 5 lines of code…
	// True if the index key must be decoded.
	// If there is more than one table, the index key must always be decoded.
	// This is only false if there are no needed columns and the (single)
	// table has no interleave children.
	mustDecodeIndexKey bool

Is this still necessary?


pkg/sql/row/fetcher.go, line 414 at r1 (raw file):

Quoted 10 lines of code…
	// - If there is more than one table, we have to decode the index key to
	//   figure out which table the row belongs to.
	// - If there are interleaves, we need to read the index key in order to
	//   determine whether this row is actually part of the index we're scanning.
	// - If there are needed columns from the index key, we need to read it.
	//
	// Otherwise, we can completely avoid decoding the index key.
	if !rf.mustDecodeIndexKey && (neededIndexCols > 0 || table.index.NumInterleavedBy() > 0 || table.index.NumInterleaveAncestors() > 0) {
		rf.mustDecodeIndexKey = true
	}

I think some of this can also be deleted.


pkg/sql/row/fetcher.go, line 713 at r1 (raw file):

Quoted 5 lines of code…
			if _, foundSentinel := encoding.DecodeIfInterleavedSentinel(keySuffix); foundSentinel {
				// We found an interleaved sentinel, which means that the key we just
				// found belongs to a different interleave. That means we have to go
				// through with index key decoding.
				unchangedPrefix = false

Should some of this also be deleted?


pkg/sql/row/fetcher.go, line 731 at r1 (raw file):

Quoted 6 lines of code…
			if !moreKVs {
				// The key did not match any of the table
				// descriptors, which means it's interleaved
				// data from some other table or index.
				continue
			}

Can this be deleted?


pkg/sql/row/fetcher.go, line 1562 at r1 (raw file):

Quoted 41 lines of code…
	for i := 0; i < index.NumInterleaveAncestors(); i++ {
		ancestor := index.GetInterleaveAncestor(i)
		length := int(ancestor.SharedPrefixLen)
		// Skip up to length values.
		for j := 0; j < length; j++ {
			if consumedCols == nCols {
				// We're done early, in the middle of an interleave.
				return origKeyLen - len(key), nil
			}
			l, err := encoding.PeekLength(key)
			if err != nil {
				return 0, err
			}
			key = key[l:]
			consumedCols++
		}
		var ok bool
		key, ok = encoding.DecodeIfInterleavedSentinel(key)
		if !ok {
			return 0, errors.New("unexpected lack of sentinel key")
		}
		// Skip the TableID/IndexID pair for each ancestor except for the
		// first, which has already been skipped in our input.
		for j := 0; j < 2; j++ {
			idLen, err := encoding.PeekLength(key)
			if err != nil {
				return 0, err
			}
			key = key[idLen:]
		}
	}
	// Decode the remaining values in the key, in the final interleave.
	for ; consumedCols < nCols; consumedCols++ {
		l, err := encoding.PeekLength(key)
		if err != nil {
			return 0, err
		}
		key = key[l:]
	}
	return origKeyLen - len(key), nil
}

Some of this can probably change.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, it's quite a pleasant exercise :)

Thanks for the review! I consciously deferred the removal of some other interleaved business from these files to make this commit simpler and serve the goal of making fetcher work only on one table. Updated the commit message to include this context.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/row/fetcher.go, line 147 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…
	// The spans of keys to return for the given table. Fetcher
	// ignores keys outside these spans.
	// This is irrelevant if Fetcher is initialize with only one
	// table.
	Spans            roachpb.Spans

Should this also be deleted? (Or the comment changed?)

Good point, removed.


pkg/sql/row/fetcher.go, line 222 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…
	// True if the index key must be decoded.
	// If there is more than one table, the index key must always be decoded.
	// This is only false if there are no needed columns and the (single)
	// table has no interleave children.
	mustDecodeIndexKey bool

Is this still necessary?

It needs to be adjusted, and I deferred this spot for #70235.


pkg/sql/row/fetcher.go, line 414 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…
	// - If there is more than one table, we have to decode the index key to
	//   figure out which table the row belongs to.
	// - If there are interleaves, we need to read the index key in order to
	//   determine whether this row is actually part of the index we're scanning.
	// - If there are needed columns from the index key, we need to read it.
	//
	// Otherwise, we can completely avoid decoding the index key.
	if !rf.mustDecodeIndexKey && (neededIndexCols > 0 || table.index.NumInterleavedBy() > 0 || table.index.NumInterleaveAncestors() > 0) {
		rf.mustDecodeIndexKey = true
	}

I think some of this can also be deleted.

True, but I decided to defer this to #70235.


pkg/sql/row/fetcher.go, line 713 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…
			if _, foundSentinel := encoding.DecodeIfInterleavedSentinel(keySuffix); foundSentinel {
				// We found an interleaved sentinel, which means that the key we just
				// found belongs to a different interleave. That means we have to go
				// through with index key decoding.
				unchangedPrefix = false

Should some of this also be deleted?

Ditto


pkg/sql/row/fetcher.go, line 731 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…
			if !moreKVs {
				// The key did not match any of the table
				// descriptors, which means it's interleaved
				// data from some other table or index.
				continue
			}

Can this be deleted?

Ditto


pkg/sql/row/fetcher.go, line 1562 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…
	for i := 0; i < index.NumInterleaveAncestors(); i++ {
		ancestor := index.GetInterleaveAncestor(i)
		length := int(ancestor.SharedPrefixLen)
		// Skip up to length values.
		for j := 0; j < length; j++ {
			if consumedCols == nCols {
				// We're done early, in the middle of an interleave.
				return origKeyLen - len(key), nil
			}
			l, err := encoding.PeekLength(key)
			if err != nil {
				return 0, err
			}
			key = key[l:]
			consumedCols++
		}
		var ok bool
		key, ok = encoding.DecodeIfInterleavedSentinel(key)
		if !ok {
			return 0, errors.New("unexpected lack of sentinel key")
		}
		// Skip the TableID/IndexID pair for each ancestor except for the
		// first, which has already been skipped in our input.
		for j := 0; j < 2; j++ {
			idLen, err := encoding.PeekLength(key)
			if err != nil {
				return 0, err
			}
			key = key[idLen:]
		}
	}
	// Decode the remaining values in the key, in the final interleave.
	for ; consumedCols < nCols; consumedCols++ {
		l, err := encoding.PeekLength(key)
		if err != nil {
			return 0, err
		}
		key = key[l:]
	}
	return origKeyLen - len(key), nil
}

Some of this can probably change.

Ditto.

@yuzefovich yuzefovich changed the title row: simplify fetcher to only handle a single table sql: remove the interleaved logic from fetchers Sep 16, 2021
@yuzefovich
Copy link
Member Author

Decided to include two more commits that are related to the fetchers simplification into this PR, and I think all spots mentioned by Michael should now be addressed.

@yuzefovich
Copy link
Member Author

Rebased on top of the latest master.

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice! :lgtm:

Reviewed 6 of 7 files at r2, 11 of 11 files at r5, 3 of 3 files at r6, 5 of 5 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/row/fetcher.go, line 1153 at r5 (raw file):

Quoted 5 lines of code…
// It also returns the table and index descriptor associated with the row
// (relevant when more than one table is specified during initialization).
func (rf *Fetcher) NextRow(
	ctx context.Context,
) (row rowenc.EncDatumRow, table catalog.TableDescriptor, index catalog.Index, err error) {

One day (future PR) we could probably get rid of the table return value.

@yuzefovich
Copy link
Member Author

yuzefovich commented Oct 9, 2021

Hey @postamar, in #70618 you're removing TestRestoreOldVersions/interleaved, is that desirable? This test is failing on this PR once the interleaved logic is removed from the fetchers. Do we have a test somewhere that ensures that restoring a backup with interleaved tables will fail with a helpful message? On this PR it fails with a very unexpected index_value_decoding_error: unknown type NotNull. cc @fqazi

@postamar
Copy link
Contributor

I removed this test because:

  • The interleave feature is already disabled in 21.2; all existing customers using interleaved tables are aware that they can't restore backups with interleaves;
  • My PR will be merged in the 22.1 release cycle, I thought it reasonably far in the future to stop worrying about interleaves -- we will have to at some point in time, after all.

I may be wrong and am happy to have my mind changed, in any case I didn't single out this test in particular. As far as your PR is concerned I wouldn't have a problem with you removing that test entirely, assuming there's no straightforward fix. It's as if there never had been any interleaved backups out there in existence in the first place.

@yuzefovich yuzefovich requested review from a team and dt and removed request for a team October 15, 2021 19:39
@yuzefovich
Copy link
Member Author

@dt could you please take a look at the first commit?

@dt
Copy link
Member

dt commented Oct 15, 2021

Don't we still want the test that checks that attempting to restore an old backup that uses interleaves is rejected? it looks like this is deleting the interleaved backup from testdata and then removing the only call to restoreOldVersionTestWithInterleave which asserts that we get the rejection error? or am I misreading it?

@yuzefovich
Copy link
Member Author

it looks like this is deleting the interleaved backup from testdata and then removing the only call to restoreOldVersionTestWithInterleave which asserts that we get the rejection error? or am I misreading it?

Yes, that's currently the case.

With the removal of all interleaved tables logic from the fetchers, we are now hitting an internal error when attempting to restore a backup with an interleaved table. Is that acceptable? I guess a more user-friendly thing to do would be to convert that error to rejecting the backup.

@yuzefovich yuzefovich force-pushed the fetcher-multiple-tables branch 2 times, most recently from 0f22ed6 to 4d47411 Compare October 15, 2021 20:04
@yuzefovich
Copy link
Member Author

I misunderstood things, sorry for the noise - we only want to remove the test that actually attempts to read from the interleaved table.

@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Oct 25, 2021
Previously, `row.Fetcher` could handle multiple tables at the same time.
This was needed to support the interleaved tables efficiently, but since
the interleaved tables are no more, we can simplify the fetcher now.

There are more things that can be removed from the fetcher about
handling of the interleaved tables, but I chose to defer them to a later
commit. This commit's main focus is simplifying the fetcher to work
under the assumption that it will be fetching only from a single table.

Release note: None
Interleaved indexes are no more.

Release note: None
This commit removes the remaining bits of handling the interleaved
tables from both row fetcher and cFetcher. It also removes a test that
verifies that we can read from the interleaved table restored from
a backup.

Release note: None
@yuzefovich
Copy link
Member Author

I rebased on top of latest master, and I think now that we're about to pick the SHA for 21.2.0, it's safe to merge this.

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 25, 2021

Build failed (retrying...):

@andreimatei
Copy link
Contributor

bors says:

go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry_external_test.go:849:6: InterleavedTablesEnabled not declared by package sq

https://teamcity.cockroachdb.com/viewLog.html?buildId=3633781&buildTypeId=Cockroach_UnitTests

I'm guessing it's about this PR. Although I see that it has passed CI...

@yuzefovich
Copy link
Member Author

@andreimatei it's #71537 to blame

@craig
Copy link
Contributor

craig bot commented Oct 25, 2021

Build succeeded:

@craig craig bot merged commit c5ca3a6 into cockroachdb:master Oct 25, 2021
@yuzefovich yuzefovich deleted the fetcher-multiple-tables branch October 25, 2021 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants