New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: use ClearRange for index truncation #31326

Open
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
5 participants
@eriktrinh
Contributor

eriktrinh commented Oct 12, 2018

This change makes the deletion of index data use the ClearRange batch
request. DROP INDEX schema changes in the same transaction as the one
which created the table still uses the slower DelRange because
ClearRange cannot be run inside a transaction and will remove write
intents of the index keys which have not been resolved in the
transaction.

This deletion of index data happens once the GC TTL period has passed
and it is safe to remove all the data. See PR #30566 for which delays
this data deletion.

The GC threshold is also forwarded for the cleared range to prevent
reads and writes of index data.

Fixes #20696.

Release note (performance improvement): Deletion of index data is faster.

@eriktrinh eriktrinh requested a review from vivekmenezes Oct 12, 2018

@eriktrinh eriktrinh requested review from cockroachdb/core-prs as code owners Oct 12, 2018

@cockroach-teamcity

This comment has been minimized.

Show comment
Hide comment
@cockroach-teamcity

cockroach-teamcity Oct 12, 2018

Member

This change is Reviewable

Member

cockroach-teamcity commented Oct 12, 2018

This change is Reviewable

@vivekmenezes

Nicely done!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/lease_test.go, line 1025 at r1 (raw file):

		t.Fatal(err)
	}

I think it's valuable to count the size of the index and assert that the above INSERT added a value to the index even though it was dropped.


pkg/sql/tablewriter_delete.go, line 261 at r1 (raw file):

	autoCommit autoCommitOpt,
	traceKV bool,
	allowClearRange bool,

We should create a new function clearIndex() and use that instead of passing in a bool. In clearRange() you can assert that !idx.IsInterleaved

@eriktrinh

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/lease_test.go, line 1025 at r1 (raw file):

Previously, vivekmenezes wrote…

I think it's valuable to count the size of the index and assert that the above INSERT added a value to the index even though it was dropped.

Done.


pkg/sql/tablewriter_delete.go, line 261 at r1 (raw file):

Previously, vivekmenezes wrote…

We should create a new function clearIndex() and use that instead of passing in a bool. In clearRange() you can assert that !idx.IsInterleaved

Done.

@vivekmenezes

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/tablewriter_delete.go, line 306 at r2 (raw file):

) (roachpb.Span, error) {
	if idx.IsInterleaved() {
		if log.V(2) {

you can return an error here. There should be no reason why IsInterleaved() is true


pkg/storage/batcheval/cmd_clear_range.go, line 95 at r2 (raw file):

	if err := stateLoader.SetGCThreshold(ctx, batch, cArgs.Stats, &gcThreshold); err != nil {
		return result.Result{}, err
	}

We should have a test in storage that tests this. I'm adding @benesch as a reviewer

@vivekmenezes vivekmenezes requested a review from benesch Oct 15, 2018

@vivekmenezes

This comment has been minimized.

Show comment
Hide comment
@vivekmenezes

vivekmenezes Oct 15, 2018

Contributor

looks like this has a broken test in Import

Contributor

vivekmenezes commented Oct 15, 2018

looks like this has a broken test in Import

@eriktrinh

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/tablewriter_delete.go, line 306 at r2 (raw file):

Previously, vivekmenezes wrote…

you can return an error here. There should be no reason why IsInterleaved() is true

It can be true because truncateIndexes (which calls clearIndex) is still called in the synchronous code path for interleaved indexes. In fact it will also call this for non-interleaved indexes in the synchronous code path if the cluster version does not allow for ClearRange (on a related note, do we need to add a new cluster version flag for truncating indexes?).

We should check canClearRangeForDrop once again in truncateIndexes and call this if it can, and deleteIndex otherwise.

@vivekmenezes

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/tablewriter_delete.go, line 306 at r2 (raw file):

Previously, eriktrinh (Erik Trinh) wrote…

It can be true because truncateIndexes (which calls clearIndex) is still called in the synchronous code path for interleaved indexes. In fact it will also call this for non-interleaved indexes in the synchronous code path if the cluster version does not allow for ClearRange (on a related note, do we need to add a new cluster version flag for truncating indexes?).

We should check canClearRangeForDrop once again in truncateIndexes and call this if it can, and deleteIndex otherwise.

Let's maintain two separate code paths for the deleteIndex and clearIndex and return an error from clearIndex if it shouldn't have been called. Thanks!

@eriktrinh

looks like this has a broken test in Import

looks like there is a race, still taking a look

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/tablewriter_delete.go, line 306 at r2 (raw file):

Previously, vivekmenezes wrote…

Let's maintain two separate code paths for the deleteIndex and clearIndex and return an error from clearIndex if it shouldn't have been called. Thanks!

Done.

@benesch

Reviewed 2 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/batcheval/cmd_clear_range.go, line 95 at r2 (raw file):

Previously, vivekmenezes wrote…

We should have a test in storage that tests this. I'm adding @benesch as a reviewer

I think the implications of this are problematic. Say the index is located on the same range as the primary key, which is the case for any table whose primary + index data is less than 64MB. Dropping that index would forward the GC threshold on that entire range to now, meaning that AOST queries on the primary key data will be impossible!

In fact, even if you assume that the index is on a separate range, I'm not sure this is sufficient? Don't AOST queries see the old version of the descriptor? So they'll try to use the index and explode because the index's range won't serve reads at that time.

Anyway, @tschottdorf is the expert here. I might be off base.

@tschottdorf

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/batcheval/cmd_clear_range.go, line 95 at r2 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

I think the implications of this are problematic. Say the index is located on the same range as the primary key, which is the case for any table whose primary + index data is less than 64MB. Dropping that index would forward the GC threshold on that entire range to now, meaning that AOST queries on the primary key data will be impossible!

In fact, even if you assume that the index is on a separate range, I'm not sure this is sufficient? Don't AOST queries see the old version of the descriptor? So they'll try to use the index and explode because the index's range won't serve reads at that time.

Anyway, @tschottdorf is the expert here. I might be off base.

Yeah, we wouldn't want to wipe the AOST window here. I'm probably lacking context here, but why do we need to do this? I was assuming the schema change machinery would delete this data only once it's not being accessed any more.

@benesch

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/batcheval/cmd_clear_range.go, line 95 at r2 (raw file):
I think you actually answered that question a year ago. :)

One complication is that all index operations are serialized through the surrounding table descriptor, so that the naive approach of waiting for the TTL is not great (as it holds up unrelated schema changes).

#20696

sql: use ClearRange for index truncation
This change makes the deletion of index data use the ClearRange batch
request. DROP INDEX schema changes in the same transaction as the one
which created the table still uses the slower DelRange because
ClearRange cannot be run inside a transaction and will remove write
intents of the index keys which have not been resolved in the
transaction.

This deletion of index data happens once the GC TTL period has passed
and it is safe to remove all the data. See PR #30566 for which delays
this data deletion.

The GC threshold is also forwarded for the cleared range to prevent
reads and writes of index data.

Fixes #20696.

Release note (performance improvement): Deletion of index data is faster.
@eriktrinh

PTAL, updated the pr to move ahead with this despite the ClearRange limitation. The failing test from before was probably the race between the split queue and forwarding the GC threshold.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment