sql: fast path uniqueness checks for single-row insert #111822

msirek · 2023-10-05T07:14:47Z

This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.

This includes INSERT cases such as:

a single-row VALUES clause into a REGIONAL BY ROW table with a
PRIMARY KEY which has a UUID column generated by default, ie.
id UUID PRIMARY KEY DEFAULT gen_random_uuid(), where the
crdb_region column is not specified in the VALUES clause; either
the gateway region is used or it is computed based on other column
values.
a single-row VALUES clause into a REGIONAL BY ROW table with a
hash sharded unique index where the crdb_region column is not specified
in the VALUES clause

In optbuild, when creating a uniqueness check for rows which are added
to a table, a fast path index check relation is also built when the
mutation source is a single-row values expression or WithScan from
a single-row values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If there is a
uniqueness check with a partial index predicate, fast path is
disallowed.

A new exploration rule called InsertFastPath is added to walk the memo
group members created during exploration in FastPathUniqueChecks of
the InsertExpr, to find any which have been rewritten as a constrained
ScanExpr. If found, that means that Scan fully represents the lookup
needed to check for duplicate entries and the Scan constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast path check.

Function CanUseUniqueChecksForInsertFastPath walks the expressions
generated during exploration of the FastPathUniqueChecks.Check
relation. If a constrained scan is found, it is used to build elements
of the FastPathUniqueChecksItemPrivate structure to communicate to the
execbuilder the table and index to use for the check, and the column ids
in the insert row to use for building the fast path KV request. In
addition, a new DatumsFromConstraint field is added, consisting of a
ScalarListExpr of TupleExprs specifying the index key, which allows an
index key column to be matched with multiple Datums. One insert row may
result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the InsertFastPathFKUniqCheck structure
in the execbuilder. The new FastPathUniqueChecksItemPrivate is built
into a new the corresponding FastPathUniqueChecksItems of a new
FastPathUniqueChecksExpr and communicated to the caller via return
value newFastPathUniqueChecks.

A small adjustment is made in the coster to make the fast path unique
constraint slightly cheaper, so it should always be chosen over the
original non-fast path check.

Epic: CRDB-26290
Fixes: #58047

Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This results in a
reduction in latency for single-row inserts to REGIONAL BY ROW tables
and hash-sharded REGIONAL BY ROW tables with unique indexes.

optbuilder: build insert fast-path uniq checks only if all checks can be built

This commit avoids unnecessary processing of insert fast path uniqueness
checks by modifying the optbuilder to avoid building a
FastPathUniqueChecksExpr in the Insert expression if one or more
of the required FastPathUniqueChecksItems could not be built.

Epic: CRDB-26290
Informs: #58047

Release note: None

cockroach-teamcity · 2023-10-05T07:14:57Z

This change is

msirek · 2023-10-05T15:21:02Z

This PR contains the main commit enabling insert fast-path uniqueness checks, split off from #102995.

mgartner

Reviewed 20 of 20 files at r1, 2 of 2 files at r2, 5 of 5 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)

-- commits line 21 at r1:
nit: I'm not sure it's necessary to make this point—it's equivalent to the first point.

pkg/sql/opt/exec/execbuilder/mutation.go line 182 at r1 (raw file):

			execFastPathCheck.InsertCols[j] = exec.TableColumnOrdinal(md.ColumnMeta(insertCol).Table.ColumnOrdinal(insertCol))
		}
		execFastPathCheck.MatchMethod = tree.MatchFull

I beleive match method is only relevant for foriegn key checks. Why is it needed here?

pkg/sql/opt/exec/execbuilder/mutation.go line 881 at r1 (raw file):

		}
		if !found {
			panic(errors.AssertionFailedf(

It's probably not guaranteed that a panic will be caught in the context that execFastPathCheck.MkErr is used. There's no benefit to panicking here if we've already plumbed a return error. So to avoid crashing the node, let'd just return the error here.

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 553 at r3 (raw file):

		newPossibleScan := newScanScope.expr
		if skipProjectExpr, ok := newPossibleScan.(*memo.ProjectExpr); ok {
			newPossibleScan = skipProjectExpr.Input

It looks like you're lacking test coverage of this case. Do you have an example of when it is needed?

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 561 at r3 (raw file):

		} else {
			// Don't build a fast-path check if we failed to create a new ScanExpr.
			buildFastPathCheck = false

Looks like there isn't coverage here either. Is it needed?

pkg/sql/opt/xform/insert_funcs.go line 43 at r1 (raw file):

		return memo.FastPathUniqueChecksExpr{}, false
	}
	if c.e.evalCtx == nil {

Can evalCtx be nil?

pkg/sql/opt/xform/insert_funcs.go line 88 at r1 (raw file):

		// appendNewUniqueChecksItem handles building of the FastPathCheck into a
		// new UniqueChecksItem for the new InsertExpr being constructed.
		appendNewUniqueChecksItem := func(private *memo.FastPathUniqueChecksItemPrivate) {

nit: I don't see much benefit in encapsulating this logic in an anonymous function when it's only used once below and it's fairly simple.

pkg/sql/opt/xform/insert_funcs.go line 125 at r1 (raw file):

		if indexJoinExpr, isIndexJoin := expr.(*memo.IndexJoinExpr); isIndexJoin {
			expr = indexJoinExpr.Input

Can't we only skip over the index join if there the filter is true?

pkg/sql/opt/xform/insert_funcs.go line 129 at r1 (raw file):

		if scan, isScan := expr.(*memo.ScanExpr); isScan {
			// We need a scan with a constraint for analysis, and it should not be
			// a contradiction (having no spans).

In what case would it be a contradiction?

pkg/sql/opt/xform/insert_funcs.go line 166 at r1 (raw file):

		span := scanExpr.Constraint.Spans.Get(j)
		// Verify there is a single key...
		if span.Prefix(scanExpr.Memo().EvalContext()) != span.StartKey().Length() {

I don't think this is sufficient. You should use span.HasSingleKey here instead.

pkg/sql/opt/xform/insert_funcs.go line 170 at r1 (raw file):

		}
		// ... and that the span has the same number of columns as the index key.
		if span.StartKey().Length() != numKeyCols {

nit: Length is likely a faster check than checking that it's a single key, so you could move this up.

pkg/sql/opt/xform/rules/insert.opt line 16 at r1 (raw file):

# only applies to an insert of values, and not other forms like INSERT SELECT.
# If fast path information has already been built, this rule will not try to
# rewrite the insert a second time.

I know we've spoken about this before, but can you explain why this logic is done during the exploration phase? Is there any cases where we don't want to do a fast path because there is a more efficient alternative?

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 1640 at r1 (raw file):

                          table: regional_by_row_table@new_idx
                          spans: [/'ca-central-1'/1/1 - /'ca-central-1'/1/1] [/'us-east-1'/1/1 - /'us-east-1'/1/1]

nit: unnecessary new line

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3758 at r1 (raw file):

statement error pq: duplicate key value violates unique constraint "users2_email_key"\nDETAIL: Key \(email\)=\('craig@cockroachlabs.com'\) already exists\.
INSERT INTO users2 (crdb_region, name, email)
VALUES ('ca-central-1', 'Craig Roacher', 'craig@cockroachlabs.com')

Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table.

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3763 at r1 (raw file):

query T
EXPLAIN INSERT INTO users2 (name, email)
VALUES ('Bill Roacher', 'bill@cockroachlabs.com'), ('Jill Roacher', 'jill@cockroachlabs.com')

We typically don't put EXPLAIN queries in logictests because they can flake due to the random column families that logic tests can assign. EXPLAIN queries can go in logic tests within execbuilder tests where those mutations are not made. But since these require CCL features, I'd move these to a separate query_behavior test file and make the column families explicit so they don't flake. Example: https://github.com/cockroachdb/cockroach/blob/b34827ad7e26169dbc7213ecb542194e06376624/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior

Don't forget the other EXPLAIN tests below.

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3863 at r1 (raw file):

ALTER TABLE users2 ADD CONSTRAINT users4_fk FOREIGN KEY (name) REFERENCES users4 (name) ON DELETE CASCADE

# Verify multi-row insert fast path detects FK constraint violations.

nit "FK" should be "unique", correct?

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 4144 at r1 (raw file):

statement ok
CREATE TABLE hash_sharded_rbr_computed (
	region_id STRING(10) NOT NULL,

nit: replace tabs with spaces here

pkg/sql/logictest/testdata/logic_test/unique line 965 at r1 (raw file):

  INDEX (b,a) STORING(pk2, j),
  INVERTED INDEX (j),
  FAMILY (pk, pk2, a, b)

Don't you want these tables to mimic regional by row tables with a UNIQUE(prefix_col, other_cols...) and UNIQUE WITHOUT INDEX (other_cols...)? Or is there another reason for these tests?

pkg/sql/logictest/testdata/logic_test/unique line 1009 at r1 (raw file):

statement error pq: duplicate key value violates unique constraint "unique_b_c"\nDETAIL: Key \(b, c\)=\(2, 3\) already exists\.
INSERT INTO multiple_uniq (a,c,b,d)
VALUES (11,3,2,14), (15,16,17,18)

Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table here too.

pkg/sql/opt/exec/execbuilder/testdata/unique line 4800 at r1 (raw file):

  INDEX (b,a) STORING(pk2, j),
  INVERTED INDEX (j),
  FAMILY (pk, pk2, a, b)

This should mimic a RBR table, shouldn't it?

msirek

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)

-- commits line 21 at r1:

Previously, mgartner (Marcus Gartner) wrote…

nit: I'm not sure it's necessary to make this point—it's equivalent to the first point.

OK, removed that point

pkg/sql/opt/exec/execbuilder/mutation.go line 182 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I beleive match method is only relevant for foriegn key checks. Why is it needed here?

You're right. Removed it.

pkg/sql/opt/exec/execbuilder/mutation.go line 881 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It's probably not guaranteed that a panic will be caught in the context that execFastPathCheck.MkErr is used. There's no benefit to panicking here if we've already plumbed a return error. So to avoid crashing the node, let'd just return the error here.

Done

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 553 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It looks like you're lacking test coverage of this case. Do you have an example of when it is needed?

Hash-sharded RBR tables sometimes need to do this, like the following case:
https://github.com/msirek/cockroach/blob/6ab911a9d6c8efe420487224d8c5125ec4741001/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior#L4156-L4171

Added a code comment.

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 561 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Looks like there isn't coverage here either. Is it needed?

Even if a test case cannot be found, this code is needed because if the new scan cannot be constructed or found, there is no way to build a fast path check for this case.

pkg/sql/opt/xform/insert_funcs.go line 43 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can evalCtx be nil?

Yes. I believe the errors I saw were in tests using hypothetical indexes.

pkg/sql/opt/xform/insert_funcs.go line 88 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: I don't see much benefit in encapsulating this logic in an anonymous function when it's only used once below and it's fairly simple.

OK, removed the function.

pkg/sql/opt/xform/insert_funcs.go line 125 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can't we only skip over the index join if there the filter is true?

As far as I can tell, index join has no filter. It just unconditionally looks up the primary index row, right?

pkg/sql/opt/xform/insert_funcs.go line 129 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

In what case would it be a contradiction?

Maybe if there is a check constraint on the table that contradicts the unique check filters. That case is not optimized right now, e.g., by allowing the fast path checks, but skipping that particular check, so fast path is just disabled.

pkg/sql/opt/xform/insert_funcs.go line 166 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I don't think this is sufficient. You should use span.HasSingleKey here instead.

Modified as suggested

pkg/sql/opt/xform/insert_funcs.go line 170 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Length is likely a faster check than checking that it's a single key, so you could move this up.

Done

pkg/sql/opt/xform/rules/insert.opt line 16 at r1 (raw file):
One reason is that placeholders are not filled in during optbuild, so for normalization vs. exploration, exploration phase is required. For execbuilder vs. exploration, building of check constraint filters, derivation of hash-sharded column constraints, invisible index handling and costing occur during exploration. Re-implementing these in the execbuilder is possible, but means we'd have essentially duplicated logic for every new feature which is added needing fast-path support.

Is there any cases where we don't want to do a fast path because there is a more efficient alternative?

It's true, exploration has some processing overhead. The overhead may be very low compared to latencies of performing the actual KV lookups as it's just exploring a single Select and Scan. Though, it may be possible to further optimize it. For example, instead of keeping the fast path check relation as a child of FastPathUniqueChecksItem, push it into FastPathUniqueChecksItemPrivate and change this rule to only call GenerateConstrainedScans on the check relation. This way less possible exploration functions are fired.

I guess the point is that there are benefits to keeping the logic in the optimizer/exploration phase, the overhead of exploring a single Select/Scan is low compared to KV lookup times, and there are ways to reduce exploration costs by limiting exploration to a single rule, if need be.

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 1640 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: unnecessary new line

Removed

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3863 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit "FK" should be "unique", correct?

Fixed

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 4144 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: replace tabs with spaces here

Done

msirek

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3758 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table.

Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.

pkg/sql/logictest/testdata/logic_test/unique line 965 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Don't you want these tables to mimic regional by row tables with a UNIQUE(prefix_col, other_cols...) and UNIQUE WITHOUT INDEX (other_cols...)? Or is there another reason for these tests?

No, logic tests for REGIONAL BY ROW tables don't need to use mimicking, and are in file regional_by_row_query_behavior. These are additional tests that check that for general functionality of UNIQUE WITHOUT INDEX with insert fast path.

pkg/sql/logictest/testdata/logic_test/unique line 1009 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table here too.

Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.

msirek

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3763 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

We typically don't put EXPLAIN queries in logictests because they can flake due to the random column families that logic tests can assign. EXPLAIN queries can go in logic tests within execbuilder tests where those mutations are not made. But since these require CCL features, I'd move these to a separate query_behavior test file and make the column families explicit so they don't flake. Example: https://github.com/cockroachdb/cockroach/blob/b34827ad7e26169dbc7213ecb542194e06376624/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior

Don't forget the other EXPLAIN tests below.

Thanks for the tip. OK, I've moved the EXPLAIN queries where the tables don't define column families to a new file, and kept the actual query runs in place so that random column families will provide better test coverage. For tables already defining column families, I left those EXPLAIN queries in place so you can see the query plan alongside the actual execution.

pkg/sql/opt/exec/execbuilder/testdata/unique line 4800 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This should mimic a RBR table, shouldn't it?

No, these are general UNIQUE WITHOUT INDEX cases. I've gone ahead and added a test which mimics an RBR table though.

mgartner

Reviewed 1 of 7 files at r4, 10 of 10 files at r6, 5 of 5 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)

pkg/sql/opt/exec/execbuilder/mutation.go line 881 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Done

This should still be an assertion error if the key values cannot be found, like it was before, but without the panic, because it is unexpected that we would be unable to find the key values and because mkUniqueCheckErr has not been proven to work with partial or no key values.

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 561 at r3 (raw file):

Previously, msirek (Mark Sirek) wrote…

Even if a test case cannot be found, this code is needed because if the new scan cannot be constructed or found, there is no way to build a fast path check for this case.

Any reason not to just return uniqueChecks, memo.FastPathUniqueChecksItem{} here then?

pkg/sql/opt/xform/insert_funcs.go line 43 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Yes. I believe the errors I saw were in tests using hypothetical indexes.

That's surprising because there are many places where c.e.evalCtx is accessed without checking if it's nil, like here:

cockroach/pkg/sql/opt/xform/scan_funcs.go

Line 115 in 452a2de

if !c.e.evalCtx.SessionData().LocalityOptimizedSearch {

Do you think it can now be nil due to some change in this PR? If this is now required we need to understand why and verify that other accesses of c.e.evalCtx are not affected.

pkg/sql/opt/xform/insert_funcs.go line 125 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

As far as I can tell, index join has no filter. It just unconditionally looks up the primary index row, right?

Right, sorry I was confused.

pkg/sql/opt/xform/insert_funcs.go line 118 at r6 (raw file):

	var scanExpr *memo.ScanExpr
	for check = check.FirstExpr(); check != nil; check = check.NextExpr() {

Iterating through members of the memo group is an odd thing to do in an exploration rule. Typically (maybe always) we rely on the code generated for all exploration rules to explore each equivalent expression in a group—I believe you'd need to move some of the pattern matching into the optgen portion of the rule to achieve this. Can you add an explanation why this iteration is done instead, if there is a specific reason.

pkg/sql/opt/xform/insert_funcs.go line 168 at r6 (raw file):

		}
		// ... and that the span has a single key.
		if !span.HasSingleKey(scanExpr.Memo().EvalContext()) {

nit: use c.e.evalCtx for consistency instead of scanExpr.Memo().EvalContext().

pkg/sql/opt/xform/rules/insert.opt line 16 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

One reason is that placeholders are not filled in during optbuild, so for normalization vs. exploration, exploration phase is required. For execbuilder vs. exploration, building of check constraint filters, derivation of hash-sharded column constraints, invisible index handling and costing occur during exploration. Re-implementing these in the execbuilder is possible, but means we'd have essentially duplicated logic for every new feature which is added needing fast-path support.

Is there any cases where we don't want to do a fast path because there is a more efficient alternative?

It's true, exploration has some processing overhead. The overhead may be very low compared to latencies of performing the actual KV lookups as it's just exploring a single Select and Scan. Though, it may be possible to further optimize it. For example, instead of keeping the fast path check relation as a child of FastPathUniqueChecksItem, push it into FastPathUniqueChecksItemPrivate and change this rule to only call GenerateConstrainedScans on the check relation. This way less possible exploration functions are fired.

I guess the point is that there are benefits to keeping the logic in the optimizer/exploration phase, the overhead of exploring a single Select/Scan is low compared to KV lookup times, and there are ways to reduce exploration costs by limiting exploration to a single rule, if need be.

The exploration phase's purpose is to explore alternative query plans, so the question I was asking is "are there any cases where we would prefer a query plan without an insert fast path over a query with a fast path insert"? If the answer is "no", this rule needs some explanation for why this rule breaks from the normal pattern of exploration rules.

I agree there is some benefit to relying on exsiting xform logic for check constraint filters, derivation of hash-sharded column constraints—but my comment here isn't related to that logic, it's related to the logic of this rule specifically.

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3758 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.

Ahh good point.

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3763 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Thanks for the tip. OK, I've moved the EXPLAIN queries where the tables don't define column families to a new file, and kept the actual query runs in place so that random column families will provide better test coverage. For tables already defining column families, I left those EXPLAIN queries in place so you can see the query plan alongside the actual execution.

Thanks. You've switched the file names though—"query_behavior" should be the file with the EXPLAIN plans.

pkg/sql/opt/exec/execbuilder/testdata/unique line 5370 at r6 (raw file):

  INDEX (b,a) STORING(pk2, j),
  INVERTED INDEX (j),
  FAMILY (pk, pk2, a, b)

This does not mimic a RBR table. A unique index in a RBR table is syntactic sugar:

CREATE TABLE t (
  a INT,
  UNIQUE (a)
) REGIONAL BY ROW;
-- Is equivalent to:
CREATE TABLE t (
  region STRING, -- Using STRING, but would be crdb_region in practice.
  a INT,
  UNIQUE (region, a),
  UNIQUE WITHOUT INDEX (a),
  CHECK (region IN (...))
);

msirek

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)

pkg/sql/opt/exec/execbuilder/mutation.go line 881 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This should still be an assertion error if the key values cannot be found, like it was before, but without the panic, because it is unexpected that we would be unable to find the key values and because mkUniqueCheckErr has not been proven to work with partial or no key values.

Makes sense. Made the change.

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 561 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Any reason not to just return uniqueChecks, memo.FastPathUniqueChecksItem{} here then?

Yes, that's cleaner. I've moved the building of fastPathChecks below uniqueChecks.

pkg/sql/opt/xform/insert_funcs.go line 43 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

That's surprising because there are many places where c.e.evalCtx is accessed without checking if it's nil, like here:

cockroach/pkg/sql/opt/xform/scan_funcs.go

Line 115 in 452a2de

if !c.e.evalCtx.SessionData().LocalityOptimizedSearch {

Do you think it can now be nil due to some change in this PR? If this is now required we need to understand why and verify that other accesses of c.e.evalCtx are not affected.

I'll comment this out and see which, if any, tests fail. Then we can either remove it or add a comment on it.

pkg/sql/opt/xform/insert_funcs.go line 168 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: use c.e.evalCtx for consistency instead of scanExpr.Memo().EvalContext().

Done

pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3763 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Thanks. You've switched the file names though—"query_behavior" should be the file with the EXPLAIN plans.

Ahh, corrected this. Thanks!

pkg/sql/opt/exec/execbuilder/testdata/unique line 5370 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This does not mimic a RBR table. A unique index in a RBR table is syntactic sugar:

CREATE TABLE t (
  a INT,
  UNIQUE (a)
) REGIONAL BY ROW;
-- Is equivalent to:
CREATE TABLE t (
  region STRING, -- Using STRING, but would be crdb_region in practice.
  a INT,
  UNIQUE (region, a),
  UNIQUE WITHOUT INDEX (a),
  CHECK (region IN (...))
);

I think I'm doing about the same thing, but in your version you have UNIQUE (region, a) while I have INDEX (b,a) STORING(pk2, j). Maybe the underlying index should be unique too, so I've updated the test case to use UNIQUE INDEX.

msirek · 2023-10-07T18:06:03Z

Can evalCtx be nil?

Yes. I believe the errors I saw were in tests using hypothetical indexes.

That's surprising because there are many places where c.e.evalCtx is accessed without checking if it's nil, like here:

cockroach/pkg/sql/opt/xform/scan_funcs.go

Line 115 in 452a2de

if !c.e.evalCtx.SessionData().LocalityOptimizedSearch {

Do you think it can now be nil due to some change in this PR? If this is now required we need to understand why and verify that other accesses of c.e.evalCtx are not affected.

I retested this with the nil pointer check removed, and no errors occurred, so I have removed it.

msirek

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)

pkg/sql/opt/xform/insert_funcs.go line 118 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Iterating through members of the memo group is an odd thing to do in an exploration rule. Typically (maybe always) we rely on the code generated for all exploration rules to explore each equivalent expression in a group—I believe you'd need to move some of the pattern matching into the optgen portion of the rule to achieve this. Can you add an explanation why this iteration is done instead, if there is a specific reason.

As discussed offline it doesn't appear that the optgen language provides pattern matching functionality with state, meaning we'd have to generate a new expression for each matched FastPathUniqueChecksItem.
As agreed, keeping this as-is for now.

Possible alternative mentioned offline: 'A normalization rule that just "fills in" the fast path unique checks when a constrained scan is built within it'

pkg/sql/opt/xform/rules/insert.opt line 16 at r1 (raw file):
No, I don't think non-fast path would ever be preferred over fast path. There should be less overhead and more parallelization possibilities with fast path.

If the answer is "no", this rule needs some explanation for why this rule breaks from the normal pattern of exploration rules.

The optgen language doesn't seem to have a component which allows pattern matching with state (e.g. collection of a list of items which all satisfy a given property, with one resulting transformation). So, the only alternative is logic in a custom function.

msirek · 2023-10-07T19:12:27Z

I've added a check to disallow fast path uniqueness checks involving volatile expressions, as discussed offline.
This could potentially be supported in the future by grabbing the value from the insert row itself, instead of the constraint span, in the execbuilder.

This adds support for building and executing simple INSERT statements with a single-row VALUES clause where any required uniqueness constraint checks can be handled via a constrained scan on an index. This includes INSERT cases such as: - a single-row VALUES clause into a REGIONAL BY ROW table with a PRIMARY KEY which has a UUID column generated by default, ie. `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the crdb_region column is not specified in the VALUES clause; either the gateway region is used or it is computed based on other column values. - a single-row VALUES clause into a REGIONAL BY ROW table with a hash sharded unique index where the crdb_region column is not specified in the VALUES clause In optbuild, when creating a uniqueness check for rows which are added to a table, a fast path index check relation is also built when the mutation source is a single-row values expression or WithScan from a single-row values expression. That relation is a filtered Select of a Scan from the target table, where the filters equate all of the unique check columns with their corresponding constants or placeholders from the Values expression. If there is a uniqueness check with a partial index predicate, fast path is disallowed. A new exploration rule called InsertFastPath is added to walk the memo group members created during exploration in `FastPathUniqueChecks` of the `InsertExpr`, to find any which have been rewritten as a constrained `ScanExpr`. If found, that means that Scan fully represents the lookup needed to check for duplicate entries and the Scan constraint can be used to identify the constants to use in a KV lookup on the scanned index in a fast path check. Function CanUseUniqueChecksForInsertFastPath walks the expressions generated during exploration of the `FastPathUniqueChecks.Check` relation. If a constrained scan is found, it is used to build elements of the `FastPathUniqueChecksItemPrivate` structure to communicate to the execbuilder the table and index to use for the check, and the column ids in the insert row to use for building the fast path KV request. In addition, a new `DatumsFromConstraint` field is added, consisting of a ScalarListExpr of TupleExprs specifying the index key, which allows an index key column to be matched with multiple Datums. One insert row may result in more than one KV lookup for a given uniqueness constraint. These items are used to build the `InsertFastPathFKUniqCheck` structure in the execbuilder. The new `FastPathUniqueChecksItemPrivate` is built into a new the corresponding `FastPathUniqueChecksItem`s of a new `FastPathUniqueChecksExpr` and communicated to the caller via return value `newFastPathUniqueChecks`. A small adjustment is made in the coster to make the fast path unique constraint slightly cheaper, so it should always be chosen over the original non-fast path check. Epic: CRDB-26290 Fixes: cockroachdb#58047 Release note (performance improvement): This patch adds support for insert fast-path uniqueness checks on REGIONAL BY ROW tables where the source is a VALUES clause with a single row. This results in a reduction in latency for single-row inserts to REGIONAL BY ROW tables and hash-sharded REGIONAL BY ROW tables with unique indexes.

… be built This commit avoids unnecessary processing of insert fast path uniqueness checks by modifying the optbuilder to avoid building a `FastPathUniqueChecksExpr` in the Insert expression if one or more of the required `FastPathUniqueChecksItem`s could not be built. Epic: CRDB-26290 Informs: cockroachdb#58047 Release note: None

mgartner

Reviewed 3 of 9 files at r8, 1 of 6 files at r10, 5 of 5 files at r14, 5 of 5 files at r15, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)

pkg/sql/opt/xform/testdata/rules/insert line 344 at r14 (raw file):

# random() because the evaluation used in the check relation may differ from the
# value used in the insert row.
opt format=show-fastpathchecks expect-not=InsertFastPath

This check is within optbuilder, so it'd make more sense to put the test in an optbuilder test—in addition optbuilder tests print the fast path check expressions by default, which is nice.

msirek

TFTR!
bors r=mgartner

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)

pkg/sql/opt/xform/testdata/rules/insert line 344 at r14 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This check is within optbuilder, so it'd make more sense to put the test in an optbuilder test—in addition optbuilder tests print the fast path check expressions by default, which is nice.

Perhaps, but this test also has expect and expect-not checks to test when the rule fires, so we'd have to keep a duplicate of the test as a rules test to keep that functionality.

craig · 2023-10-09T17:17:10Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

msirek · 2023-10-09T17:17:42Z

bors retry

craig · 2023-10-09T17:17:44Z

Already running a review

craig · 2023-10-09T18:53:10Z

Build succeeded:

Bazel Essential CI (Cockroach)

mgartner · 2023-10-09T19:51:02Z

pkg/sql/opt/xform/testdata/rules/insert line 344 at r14 (raw file):

Previously, msirek (Mark Sirek) wrote…

Perhaps, but this test also has expect and expect-not checks to test when the rule fires, so we'd have to keep a duplicate of the test as a rules test to keep that functionality.

The expect-not isn't really relevant here—you're testing that optbuilder doesn't build a fk-fast-path-check if an expression in the VALUES clause of the insert is volatile, which is not related to the exploration rule.

msirek

Reviewable status: complete! 1 of 0 LGTMs obtained

pkg/sql/opt/xform/testdata/rules/insert line 344 at r14 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

The expect-not isn't really relevant here—you're testing that optbuilder doesn't build a fk-fast-path-check if an expression in the VALUES clause of the insert is volatile, which is not related to the exploration rule.

Sorry, I thought you meant change the whole test file to be an optbuilder test. Sure, I can open a PR to move this one test somewhere else.

msirek requested review from mgartner and rharding6373 October 5, 2023 07:14

msirek requested a review from a team as a code owner October 5, 2023 07:14

msirek force-pushed the insert-fast-path-uniqueness-checks branch from 2e69f03 to 1a95cc8 Compare October 5, 2023 19:14

msirek mentioned this pull request Oct 5, 2023

sql: add new check relation to help build insert fast path uniqueness checks #111403

Merged

mgartner requested changes Oct 5, 2023

View reviewed changes

msirek force-pushed the insert-fast-path-uniqueness-checks branch from 1a95cc8 to 0e9e42f Compare October 5, 2023 23:40

msirek commented Oct 5, 2023

View reviewed changes

msirek commented Oct 6, 2023

View reviewed changes

msirek force-pushed the insert-fast-path-uniqueness-checks branch from 0e9e42f to 4fba820 Compare October 6, 2023 08:02

msirek commented Oct 6, 2023

View reviewed changes

msirek requested a review from mgartner October 6, 2023 14:37

mgartner requested changes Oct 6, 2023

View reviewed changes

msirek force-pushed the insert-fast-path-uniqueness-checks branch from 4fba820 to e33ad40 Compare October 6, 2023 17:31

msirek commented Oct 6, 2023

View reviewed changes

msirek force-pushed the insert-fast-path-uniqueness-checks branch from e33ad40 to 7edd6d2 Compare October 7, 2023 18:05

msirek commented Oct 7, 2023

View reviewed changes

msirek requested a review from mgartner October 7, 2023 18:21

msirek force-pushed the insert-fast-path-uniqueness-checks branch from 7edd6d2 to 21d1e8b Compare October 7, 2023 19:07

Mark Sirek added 2 commits October 7, 2023 14:49

msirek force-pushed the insert-fast-path-uniqueness-checks branch from 21d1e8b to 2fefefc Compare October 7, 2023 21:49

mgartner approved these changes Oct 9, 2023

View reviewed changes

msirek commented Oct 9, 2023

View reviewed changes

craig bot merged commit 69731b1 into cockroachdb:master Oct 9, 2023
8 checks passed

msirek commented Oct 9, 2023

View reviewed changes

msirek mentioned this pull request Oct 10, 2023

sql: fast path uniqueness checks for single-row insert #102995

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: fast path uniqueness checks for single-row insert #111822

sql: fast path uniqueness checks for single-row insert #111822

msirek commented Oct 5, 2023 •

edited

cockroach-teamcity commented Oct 5, 2023

msirek commented Oct 5, 2023

mgartner left a comment

msirek left a comment

msirek left a comment

msirek left a comment

mgartner left a comment

msirek left a comment

msirek commented Oct 7, 2023

msirek left a comment

msirek commented Oct 7, 2023

mgartner left a comment

msirek left a comment

craig bot commented Oct 9, 2023

msirek commented Oct 9, 2023

craig bot commented Oct 9, 2023

craig bot commented Oct 9, 2023

mgartner commented Oct 9, 2023

msirek left a comment

sql: fast path uniqueness checks for single-row insert #111822

sql: fast path uniqueness checks for single-row insert #111822

Conversation

msirek commented Oct 5, 2023 • edited

cockroach-teamcity commented Oct 5, 2023

msirek commented Oct 5, 2023

mgartner left a comment

Choose a reason for hiding this comment

msirek left a comment

Choose a reason for hiding this comment

msirek left a comment

Choose a reason for hiding this comment

msirek left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

msirek left a comment

Choose a reason for hiding this comment

msirek commented Oct 7, 2023

msirek left a comment

Choose a reason for hiding this comment

msirek commented Oct 7, 2023

mgartner left a comment

Choose a reason for hiding this comment

msirek left a comment

Choose a reason for hiding this comment

craig bot commented Oct 9, 2023

msirek commented Oct 9, 2023

craig bot commented Oct 9, 2023

craig bot commented Oct 9, 2023

mgartner commented Oct 9, 2023

msirek left a comment

Choose a reason for hiding this comment

msirek commented Oct 5, 2023 •

edited