sql: fast path uniqueness checks for single-row insert #102995

msirek · 2023-05-10T05:59:11Z

sql: add new check relation to help build insert fast path uniqueness checks

This creates a FastPathUniqueChecks list as a child of InsertExpr,
similar to UniqueChecks, but consisting of relational expressions
which can be used to build a fast path uniqueness check using a
particular index. Each item in the FastPathUniqueChecks list is a
FastPathUniqueChecksItem whose Check expression, if the unique
constraint is provisionally eligible to use a fast path uniqueness
check, is a filtered scan from the target table of an
INSERT INTO ... VALUES statement. It is only built if the
VALUES clause has a single row. The filter equates each of the unique
constraint key columns with its corresponding value from the VALUES
clause. The values may either be a ValuesExpr or a WithScanExpr
whose source is a ValuesExpr. During exploration,
GenerateConstrainedScans will find all valid constrained index scans
which are equivalent to the original filtered Scan. The built relations
are not executed or displayed in the EXPLAIN, but will be analyzed in a
later commit to specify the index and index key values to use for insert
fast path uniqueness checks.
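As a rough illustration of the filter construction described in this commit, here is a minimal Go sketch. It uses plain strings instead of memo expressions, and the function name `buildFastPathFilters` and its signature are hypothetical. In this simplified model, a key column that the VALUES row does not supply disqualifies the check; the real optbuilder works on memo expressions and is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFastPathFilters is a hypothetical, simplified model of the fast-path
// check filter: each unique-constraint key column is equated with the
// corresponding value from the VALUES row. It returns ok=false unless the
// VALUES clause has exactly one row, since the relation is only built then.
func buildFastPathFilters(keyCols, insertCols []string, rows [][]string) (filters []string, ok bool) {
	if len(rows) != 1 {
		return nil, false
	}
	row := rows[0]
	// Map each insert column name to its position in the VALUES row.
	pos := make(map[string]int, len(insertCols))
	for i, c := range insertCols {
		pos[c] = i
	}
	for _, k := range keyCols {
		i, found := pos[k]
		if !found {
			// In this sketch, a key value missing from the row disqualifies the check.
			return nil, false
		}
		filters = append(filters, fmt.Sprintf("t.%s = %s", k, row[i]))
	}
	return filters, true
}

func main() {
	f, ok := buildFastPathFilters([]string{"a"}, []string{"k", "a"}, [][]string{{"1", "10"}})
	fmt.Println(ok, strings.Join(f, " AND ")) // true t.a = 10
}
```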

FastPathUniqueChecksItemPrivate is added with elements which are
designed to communicate details of how to build a fast path uniqueness
check to the execbuilder.

Epic: CRDB-26290
Informs: #58047

Release note: None

memo: optbuild unit tests for fast path unique check exprs

This commit adds the ExprFmtHideFastPathChecks flag, which hides fast
path check expressions from the output of expression formatting. It is
used in all places except TestBuilder. Also, unit tests for
buildFiltersForFastPathCheck are added.
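A hide flag like ExprFmtHideFastPathChecks is typically one bit in a flag set that callers OR together, as the explain_bundle.go snippet later in this thread does. The sketch below assumes that shape; the bit values, the `HasFlags` method, and the `formatInsertChildren` helper are illustrative assumptions, not the memo package's actual code.

```go
package main

import "fmt"

// ExprFmtFlags models a bit set of formatting options. The flag names mirror
// the PR, but the bit values here are illustrative assumptions.
type ExprFmtFlags int

const (
	ExprFmtHideQualifications ExprFmtFlags = 1 << iota
	ExprFmtHideScalars
	ExprFmtHideTypes
	ExprFmtHideFastPathChecks
)

// HasFlags reports whether all bits in subset are set in f.
func (f ExprFmtFlags) HasFlags(subset ExprFmtFlags) bool {
	return f&subset == subset
}

// formatInsertChildren (hypothetical) lists the Insert children a formatter
// would print, skipping the fast-path checks when the hide flag is set.
func formatInsertChildren(f ExprFmtFlags) []string {
	children := []string{"values", "unique-checks"}
	if !f.HasFlags(ExprFmtHideFastPathChecks) {
		children = append(children, "fast-path-unique-checks")
	}
	return children
}

func main() {
	fmt.Println(formatInsertChildren(ExprFmtHideTypes))
	fmt.Println(formatInsertChildren(ExprFmtHideTypes | ExprFmtHideFastPathChecks))
}
```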

Epic: CRDB-26290
Informs: #58047

Release note: None

xform: opt test for insert fast path checks

This commit adds an insert transformation rules test which
displays the fast path checks built in the optbuilder after they
have been explored and optimized. This test is modified in a later
commit to test for rule firing.

Epic: CRDB-26290
Informs: #58047

Release note: None

sql: insert fast path uniq checks, execution support

This commit wires up insert fast path execution to support fast path
uniqueness checks via KV Scan requests, in a manner similar to how
foreign key checks are currently done. Unlike a foreign key check, a
uniqueness check errors out if a row is found rather than if a row is
not found. The addUniqChecks and runUniqChecks functions are provided to
initialize the checks and run them, respectively. Interfaces will be
modified in a subsequent commit to utilize the new methods.
exec.InsertFastPathFKCheck is renamed to
exec.InsertFastPathFKUniqCheck to reflect that it will be used to
build both foreign key and uniqueness fast path checks.
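The inverted error condition is the key semantic difference between the two kinds of check. Here is a minimal Go sketch of it, using a map in place of a KV Scan; the types, function names, key strings, and error messages are hypothetical.

```go
package main

import "fmt"

// checkKind distinguishes the two fast-path check flavors.
type checkKind int

const (
	fkCheck checkKind = iota
	uniqCheck
)

// kvStore stands in for a KV Scan against an index: a set of encoded keys.
type kvStore map[string]bool

// runCheck sketches the inverted error conditions: an FK check fails when
// the referenced row is missing, while a uniqueness check fails when a
// conflicting row already exists.
func runCheck(kind checkKind, store kvStore, key string) error {
	found := store[key]
	switch {
	case kind == fkCheck && !found:
		return fmt.Errorf("foreign key violation: no row for key %q", key)
	case kind == uniqCheck && found:
		return fmt.Errorf("duplicate key value violates unique constraint: key %q", key)
	}
	return nil
}

func main() {
	store := kvStore{"/t/idx/10": true}
	fmt.Println(runCheck(fkCheck, store, "/t/idx/10"))   // <nil>
	fmt.Println(runCheck(uniqCheck, store, "/t/idx/10")) // prints a duplicate-key error
}
```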

A DatumsFromConstraint field is added to
exec.InsertFastPathFKUniqCheck.
DatumsFromConstraint contains constant values from the WHERE clause
constraint which are part of the index key to look up. The number of
entries corresponds to the number of KV lookups. For example, when built
from analyzing a locality-optimized operation accessing 1 local region and
2 remote regions, the resulting DatumsFromConstraint would have 3 entries.
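To make the "one entry per KV lookup" relationship concrete, here is a hedged Go sketch: each template row contributes constant leading key values (here a region), and the insert row supplies the rest. The function name, the string key format, and the region names are all hypothetical; real keys are encoded bytes, not strings.

```go
package main

import (
	"fmt"
	"strings"
)

// buildLookupKeys (hypothetical) combines each template row from
// datumsFromConstraint with the insert row's key values. One key prefix, and
// therefore one KV lookup, is produced per template row.
func buildLookupKeys(datumsFromConstraint [][]string, insertVals []string) []string {
	keys := make([]string, 0, len(datumsFromConstraint))
	for _, tmpl := range datumsFromConstraint {
		// Copy tmpl so appending insertVals never mutates the template row.
		parts := append(append([]string{}, tmpl...), insertVals...)
		keys = append(keys, "/"+strings.Join(parts, "/"))
	}
	return keys
}

func main() {
	// 1 local region plus 2 remote regions gives 3 entries, hence 3 lookups.
	regions := [][]string{{"'us-east1'"}, {"'us-west1'"}, {"'europe-west1'"}}
	for _, k := range buildLookupKeys(regions, []string{"42"}) {
		fmt.Println(k)
	}
}
```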

A TestAddUniqChecks unit test is added.

Epic: CRDB-26290
Informs: #58047

Release note: None

eval: add AutoCommit method to Planner interface.

This commit adds an AutoCommit method to the Planner interface.
AutoCommit indicates whether the Planner has flagged the current
statement as eligible for transaction auto-commit.
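The shape of the addition can be sketched as follows. The interface is a pared-down, hypothetical stand-in for eval.Planner showing only the new method, and `fakePlanner` and `eligibleForAutoCommit` are illustrative names that do not appear in the PR.

```go
package main

import "fmt"

// Planner is a hypothetical, pared-down stand-in for eval.Planner.
type Planner interface {
	// AutoCommit reports whether the current statement has been flagged as
	// eligible for transaction auto-commit.
	AutoCommit() bool
}

// fakePlanner is a test double that returns a fixed flag.
type fakePlanner struct{ autoCommit bool }

func (p *fakePlanner) AutoCommit() bool { return p.autoCommit }

// eligibleForAutoCommit shows a caller querying the flag through the interface.
func eligibleForAutoCommit(p Planner) bool { return p.AutoCommit() }

func main() {
	fmt.Println(eligibleForAutoCommit(&fakePlanner{autoCommit: true})) // true
	fmt.Println(eligibleForAutoCommit(&fakePlanner{}))                 // false
}
```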

Epic: CRDB-26290
Informs: #58047

Release note: None

sql: fast path uniqueness checks for single-row insert

This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.

This includes INSERT cases such as:

- a single-row VALUES clause into a REGIONAL BY ROW table with a
  PRIMARY KEY which has a UUID column generated by default, i.e.
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(), where the
  crdb_region column is not specified in the VALUES clause; either
  the gateway region is used or it is computed based on other column
  values.
- a single-row VALUES clause into a REGIONAL BY ROW table with a
  hash-sharded unique index where the crdb_region column is not specified
  in the VALUES clause.
- a single-row VALUES clause into a table with an experimental UNIQUE
  WITHOUT INDEX constraint which also has an index with leading key
  columns matching the constraint columns.

In optbuild, when creating a uniqueness check for rows which are added
to a table, a fast path index check relation is also built when the
mutation source is a single-row values expression or WithScan from
a single-row values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If there is a
uniqueness check with a partial index predicate, fast path is
disallowed.
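The eligibility conditions in the paragraph above reduce to a simple predicate. The sketch below assumes a flattened summary of the mutation source and its checks; `uniqueCheck` and `fastPathAllowed` are hypothetical names, and the real decision is made on memo expressions.

```go
package main

import "fmt"

// uniqueCheck is a hypothetical summary of one uniqueness check.
type uniqueCheck struct {
	name             string
	partialPredicate string // empty if the constraint is not partial
}

// fastPathAllowed sketches the conditions described above: the mutation
// source must be a single-row VALUES (or a WithScan of one), and no
// uniqueness check may carry a partial index predicate.
func fastPathAllowed(sourceIsValues bool, numRows int, checks []uniqueCheck) bool {
	if !sourceIsValues || numRows != 1 {
		return false
	}
	for _, c := range checks {
		if c.partialPredicate != "" {
			return false
		}
	}
	return true
}

func main() {
	checks := []uniqueCheck{{name: "t_a_key"}}
	fmt.Println(fastPathAllowed(true, 1, checks)) // true
	partial := append(checks, uniqueCheck{name: "t_b_key", partialPredicate: "b > 0"})
	fmt.Println(fastPathAllowed(true, 1, partial)) // false
}
```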

A new exploration rule called InsertFastPath is added to walk the memo
group members created during exploration in FastPathUniqueChecks of
the InsertExpr, to find any which have been rewritten as a constrained
ScanExpr. If found, that means that Scan fully represents the lookup
needed to check for duplicate entries and the Scan constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast path check.
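The walk over memo group members can be pictured with a small model in which a group is a linked list of equivalent expressions. These types are simplified, hypothetical stand-ins for the memo's RelExpr, ScanExpr, and SelectExpr; they are not the real optimizer API.

```go
package main

import "fmt"

// RelExpr is a simplified stand-in: each group member links to the next
// equivalent expression in the same memo group.
type RelExpr interface{ NextExpr() RelExpr }

// ScanExpr models a scan; a nil Constraint means the scan is unconstrained.
type ScanExpr struct {
	Index      string
	Constraint []string // constraint spans
	next       RelExpr
}

func (s *ScanExpr) NextExpr() RelExpr { return s.next }

// SelectExpr models the original filtered Select built by the optbuilder.
type SelectExpr struct{ next RelExpr }

func (s *SelectExpr) NextExpr() RelExpr { return s.next }

// findConstrainedScan returns the first constrained scan in the group. If
// one exists, the scan alone represents the duplicate lookup, so its
// constraint can supply the key values for the KV request.
func findConstrainedScan(first RelExpr) *ScanExpr {
	for e := first; e != nil; e = e.NextExpr() {
		if scan, ok := e.(*ScanExpr); ok && scan.Constraint != nil {
			return scan
		}
	}
	return nil
}

func main() {
	scan := &ScanExpr{Index: "t@t_a_key", Constraint: []string{"[/10 - /10]"}}
	group := &SelectExpr{next: scan}
	if s := findConstrainedScan(group); s != nil {
		fmt.Println(s.Index, s.Constraint)
	}
}
```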

Function CanUseUniqueChecksForInsertFastPath walks the expressions
generated during exploration of the FastPathUniqueChecks.Check
relation. If a constrained scan is found, it is used to build elements
of the FastPathUniqueChecksItemPrivate structure to communicate to the
execbuilder the table and index to use for the check, and the column IDs
in the insert row to use for building the fast path KV request. In
addition, a new DatumsFromConstraint field is added, consisting of a
ScalarListExpr of TupleExprs specifying the index key, which allows an
index key column to be matched with multiple Datums. One insert row may
result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the InsertFastPathFKUniqCheck structure
in the execbuilder. Each new FastPathUniqueChecksItemPrivate is built
into the corresponding FastPathUniqueChecksItem of a new
FastPathUniqueChecksExpr, which is returned to the caller as
newFastPathUniqueChecks.

A small adjustment is made in the coster to make the fast path uniqueness
check slightly cheaper, so that it should always be chosen over the
original non-fast path check.

Epic: CRDB-26290
Fixes: #58047

Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This results in a
reduction in latency for single-row inserts to REGIONAL BY ROW tables
and hash-sharded REGIONAL BY ROW tables with unique indexes.

sql: run insert fast-path FK checks in parallel with uniqueness checks

This commit combines the foreign key checks of insert fast-path with the
uniqueness checks into a single batch so that they may be processed in
parallel for reduced latency.

Epic: CRDB-26290
Informs: #58047

Release note: None

bench: insert fast path single row unique check benchmark

This commit adds a small benchmark for single-row insert fast path with
a UNIQUE WITHOUT INDEX constraint.

BenchmarkUniqInsert/SingleRow
BenchmarkUniqInsert/SingleRow/NoFastPath
BenchmarkUniqInsert/SingleRow/NoFastPath-10                 1623            634627 ns/op
BenchmarkUniqInsert/SingleRow/FastPath
BenchmarkUniqInsert/SingleRow/FastPath-10                   3058            399215 ns/op
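As a quick sanity check on the numbers above, this short Go program computes the relative improvement from the two ns/op figures. It prints a speedup of about 1.59x and a latency reduction of about 37.1%.

```go
package main

import "fmt"

// reduction returns the fractional decrease from before to after.
func reduction(before, after float64) float64 { return (before - after) / before }

func main() {
	noFast, fast := 634627.0, 399215.0 // ns/op from the benchmark output above
	fmt.Printf("speedup: %.2fx\n", noFast/fast)
	fmt.Printf("latency reduction: %.1f%%\n", 100*reduction(noFast, fast))
}
```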

Epic: CRDB-26290
Informs: #58047

Release note: None

xform: minimize the number of KV lookups for fast-path unique checks

This commit optimizes fast-path uniqueness checks by picking index scans
which minimize the number of constraint spans, which in turn minimizes
the number of KV lookups performed by the check. A latency cost is
associated with each KV lookup, so it's desirable to use as few KV
requests as possible.
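The heuristic amounts to choosing the candidate scan with the fewest spans. Here is a minimal Go sketch of it; `candidate` and `pickFewestSpans` are hypothetical names, and ties keep the earliest candidate. Per the TODO in this commit, a latency-aware cost model might eventually replace this rule.

```go
package main

import "fmt"

// candidate is a hypothetical constrained scan found during exploration.
type candidate struct {
	index string
	spans int // number of constraint spans, i.e. KV lookups for the check
}

// pickFewestSpans returns the candidate with the fewest spans. On a tie it
// keeps the earliest one.
func pickFewestSpans(cands []candidate) (candidate, bool) {
	if len(cands) == 0 {
		return candidate{}, false
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.spans < best.spans {
			best = c
		}
	}
	return best, true
}

func main() {
	cands := []candidate{{"t@t_r_k_idx", 3}, {"t@t_k_key", 1}}
	if best, ok := pickFewestSpans(cands); ok {
		fmt.Println(best.index, best.spans) // t@t_k_key 1
	}
}
```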

A TODO is added to make this a cost-based decision in the future, where
latency costs are considered.

Epic: CRDB-26290
Informs: #58047

Release note: None

cockroach-teamcity · 2023-05-10T05:59:19Z


yuzefovich

Reviewed 11 of 14 files at r1, 1 of 2 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rytaft)

pkg/sql/insert_fast_path.go line 191 at r1 (raw file):

		// Combine them together to get the final index prefix key to search.
		for _, templateRow := range c.DatumsFromConstraint {
			// Each row must have its own memory as it is part of a batch.

I'm confused by this - I'd assume that the BatchRequest would only contain the span generated based on the combined row which should internally copy datums into roachpb.Keys, no?

pkg/sql/insert_fast_path.go line 487 at r3 (raw file):

		// Perform the FK checks.
		// TODO(radu): we could run the FK batch in parallel with the main batch (if

nit: do you plan on addressing this TODO as well? It seems like it'd be nice to do it since you're in this area.

pkg/sql/opt/exec/factory.opt line 468 at r1 (raw file):

#  - all FK checks can be performed using direct lookups into unique indexes.
#  - all UNIQUE WITHOUT INDEX checks can be performed using direct lookups
#  - into an index (could be a key prefix and not the entire key).

nit: probably we don't want - before "into an index"?

msirek

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)

pkg/sql/insert_fast_path.go line 191 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I'm confused by this - I'd assume that the BatchRequest would only contain the span generated based on the combined row which should internally copy datums into roachpb.Keys, no?

Good catch. I updated the logic to allocate the combined row once, and updated the comment.
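The reason a single combined row is enough is that key encoding copies datum values into a fresh byte slice, so the row can be overwritten afterward. A hedged Go sketch of that reasoning follows. It uses ints in place of datums and assumes all template rows have the same width; `encodeKey` and `buildKeys` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// encodeKey appends an encoding of row to a copy of prefix, so the returned
// key never aliases the row. It stands in for real key encoding.
func encodeKey(prefix []byte, row []int) []byte {
	key := append([]byte(nil), prefix...)
	for _, d := range row {
		key = append(key, '/')
		key = strconv.AppendInt(key, int64(d), 10)
	}
	return key
}

// buildKeys reuses one combined row for every template row (all templates
// are assumed to have the same width). Reuse is safe because each key is
// encoded, and thus copied, before the row is overwritten.
func buildKeys(prefix []byte, templates [][]int, insertVals []int) [][]byte {
	width := 0
	if len(templates) > 0 {
		width = len(templates[0])
	}
	combined := make([]int, width+len(insertVals)) // allocated once
	copy(combined[width:], insertVals)
	keys := make([][]byte, 0, len(templates))
	for _, tmpl := range templates {
		copy(combined, tmpl)
		keys = append(keys, encodeKey(prefix, combined))
	}
	return keys
}

func main() {
	for _, k := range buildKeys([]byte("/t/2"), [][]int{{1}, {2}, {3}}, []int{42}) {
		fmt.Println(string(k))
	}
}
```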

pkg/sql/insert_fast_path.go line 487 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: do you plan on addressing this TODO as well? It seems like it'd be nice to do it since you're in this area.

There was more work I wanted to do in uniqueness checks in general, to support more cases (more forms of INSERT statements). But I cut it short in favor of switching to partial stats work. If we were to defer the partial stats work a bit longer, I'd rather give priority to those missing cases above this TODO. Also, only the FK checks could be done in parallel with the insert, because uniqueness checks could be trying to read from the same index as being inserted, and we don't want to "see" the newly-inserted row. So, in that case we'd have to run the uniqueness check before the insert(s) anyway. So, this would be an enhancement for cases which only have FK checks and no uniqueness checks, which this PR is not covering.

pkg/sql/opt/exec/factory.opt line 468 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: probably we don't want - before "into an index"?

Fixed

msirek · 2023-09-21T21:47:51Z

The last 2 pushes update the code to the latest master and add the Locking attribute to fast path checks, copied over from the Scan which the check was built upon.

rharding6373

Wow, this is impressive work! I've reviewed 5/9 commits so far with mostly just clarification questions to understand it better. If you wanted to I think you could split some of the commits off into their own PRs (e.g., the Autocommit commit) but I'm not sure that would make the whole thing more manageable, and splitting off some of the meaty commits would lose some comment history, so not sure that's desirable.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @msirek, @rytaft, and @yuzefovich)

pkg/sql/explain_bundle.go line 338 at r57 (raw file):

		memo.ExprFmtHideQualifications|memo.ExprFmtHideScalars|memo.ExprFmtHideTypes|memo.ExprFmtHideNotVisibleIndexInfo|memo.ExprFmtHideFastPathChecks,
	))
	b.z.AddFile("opt-vv.txt", formatOptPlan(memo.ExprFmtHideQualifications|memo.ExprFmtHideNotVisibleIndexInfo|memo.ExprFmtHideFastPathChecks))

Why omit it from opt-vv.txt? Seems like it would be useful to include the plan for the fast path check in stmt bundles. If it will be used for some checks in production, how will we debug if it turns out there are issues with the fast path?

Including it would also be consistent with the treatment of EXPLAIN(TYPES) below.

pkg/sql/opt/exec/factory.go line 256 at r59 (raw file):

// ConstructInsertFastPath). It identifies the index into which we can perform
// the lookup.
type InsertFastPathFKUniqCheck struct {

optional nit: maybe rename this and others to InsertFastPathCheck instead.

pkg/sql/opt/memo/interner.go line 1197 at r48 (raw file):

	}
	for i := range l {
		if l[i].ReferencedTableID != r[i].ReferencedTableID {

Should we also use h.IsRelExprEqual to compare the Check RelExpr, too?

pkg/sql/opt/optbuilder/testdata/unique-checks-insert line 2134 at r57 (raw file):

      │              └── t.a:24 = 10
      └── fast-path-unique-checks-item: &{0 0 [] [] {  record best-effort}}
           └── values

Is this arm of the explain tree from making a non-nil fast path so we don't break RelExpr? Trying to understand what

fast-path-unique-checks-item: &{0 0 [] [] {  record best-effort}}
           └── values

means, and why we have both this and a fleshed out fast path check above it.

pkg/sql/opt/xform/testdata/rules/insert line 187 at r58 (raw file):

      │         │         ├── columns: t.k:26!null t.r:27!null t.c:30
      │         │         ├── constraint: /27/30/26
      │         │         │    ├── [/'east' - /'east']

I thought that fast path unique checks were characterized by a single KV lookup, but this looks like at least 2 KV lookups (for east and west) plus an index join. Am I not reading or understanding this right?

rharding6373

I've reviewed the remaining files and don't have any additional feedback, but I'd like to check out the benchmark results (are the ones in the commit message up-to-date?) and have responses to the open comments before giving a stamp of approval.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @msirek, @rytaft, and @yuzefovich)

pkg/sql/opt/bench/uniq_test.go line 54 at r63 (raw file):

}

func BenchmarkUniqInsert(b *testing.B) {

This might have gotten lost in the sea of comments, but could you please re-share the latest benchmark results? Thanks!

mgartner

I've left my comments for the first commit. The first commit also needs:

- optbuilder tests that show the new expressions being built for the cases covered by the new code.
- norm tests for the new pruning normalization rules.
- TestInterner tests for the new checks in interner_test.go - this might have caught the missing check that Rachael found.

If you wanted to I think you could split some of the commits off into their own PRs

+1 building this incrementally would speed up time-to-merge. It doesn't help that Reviewable is struggling with so many commit revisions.

Reviewed 3 of 28 files at r32, 1 of 3 files at r38, 37 of 37 files at r40, 3 of 5 files at r47, 49 of 49 files at r48.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @msirek, and @yuzefovich)

pkg/sql/opt/norm/rules/prune_cols.opt line 542 at r40 (raw file):

)

# PruneReturningCols removes columns from the Insert operator's ReturnCols

nit: PruneInsertReturnCols

pkg/sql/opt/ops/mutation.opt line 302 at r40 (raw file):

    # columns for a uniqueness check, in the order of the ReferencedIndexOrdinal
    # columns. For each, the value in the array indicates the index of the
    # column in the input table.

nit: This last sentence confuses me. A ColList is a list of column IDs, not ordinals. I believe that these column IDs contain the values being inserted, and the position in this list corresponds with the ordinal of the referenced index columns. Is that correct?
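Under the interpretation proposed in the question above, which this thread does not confirm, the list position encodes the index-column ordinal and each element is a column ID. A minimal Go sketch of that reading follows; `ColumnID` is a hypothetical stand-in for opt.ColumnID and `keyValuesForIndex` is an illustrative name.

```go
package main

import "fmt"

// ColumnID is a hypothetical stand-in for opt.ColumnID.
type ColumnID int

// keyValuesForIndex reads insertCols positionally: insertCols[ord] is the
// ID of the input column that holds the value for index column ord.
func keyValuesForIndex(insertCols []ColumnID, row map[ColumnID]string) []string {
	vals := make([]string, len(insertCols))
	for ord, colID := range insertCols {
		vals[ord] = row[colID]
	}
	return vals
}

func main() {
	row := map[ColumnID]string{5: "'east'", 7: "10"}
	fmt.Println(keyValuesForIndex([]ColumnID{7, 5}, row)) // [10 'east']
}
```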

pkg/sql/opt/ops/mutation.opt line 305 at r40 (raw file):

    InsertCols ColList

    # DatumsFromConstraint contains constant values from the WHERE clause

This WHERE clause is from partial unique constraints? Or somewhere else?

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 501 at r34 (raw file):

Previously, msirek (Mark Sirek) wrote…

It may make things more complicated if we try to specifically search for and process a projection of a virtual computed column, and my understanding is we wish to target a very narrow set of cases in the first pass. In some cases, where we build a Values expression, the projection may be merged with the Values by a rewrite, in which case we'll find the value and handle it. In other cases where we have a WithScanExpr of a Values expression, and a projection, the projection may not be merged. I think that if and when foreign key fast path checking code gets refactored, the need to build a WithScanExpr may go away and the current simple handling will also handle those cases.

It sounds like calling uniqueChecksHelper.buildTableScan is preferred over DuplicateScanPrivate though, so I've updated the code to use the former.

I was genuinely curious, not necessarily preferring buildTableScan. DuplicateScanPrivate would be fine and seems simpler.

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 96 at r40 (raw file):

			// any existing rows. The check prevents rows from matching themselves by
			// adding a filter based on the primary key.
			uniqueCheckItems, _ := h.buildInsertionCheck()

Can you leave a TODO to avoid the extra work of building the fast paths for this case?

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 136 at r40 (raw file):

			// any existing rows. The check prevents rows from matching themselves by
			// adding a filter based on the primary key.
			uniqueCheckItems, _ := h.buildInsertionCheck()

ditto: add a TODO here to avoid the extra work of building the fast paths for this case

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 334 at r40 (raw file):

	if withScanScopeIsValues {
		valuesExpr = withScanScopeValues
		isValues = true

nit: simplify this to

withScanExpr, isWithScan := withScanScope.expr.(*memo.WithScanExpr)
values, isValues := withScanScope.expr.(*memo.ValuesExpr)
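The suggestion relies on Go's comma-ok type assertion, which yields the typed value and a boolean in one step. Here is a self-contained Go sketch of the idiom using hypothetical stand-ins for the memo types.

```go
package main

import "fmt"

// Hypothetical stand-ins for the memo types.
type RelExpr interface{ isRel() }
type WithScanExpr struct{ ID int }
type ValuesExpr struct{ Rows int }

func (*WithScanExpr) isRel() {}
func (*ValuesExpr) isRel()   {}

// classify uses two comma-ok type assertions instead of an intermediate
// boolean plus reassignment. A failed assertion yields a nil pointer and false.
func classify(expr RelExpr) string {
	withScan, isWithScan := expr.(*WithScanExpr)
	values, isValues := expr.(*ValuesExpr)
	switch {
	case isWithScan:
		return fmt.Sprintf("withscan %d", withScan.ID)
	case isValues:
		return fmt.Sprintf("values with %d row(s)", values.Rows)
	}
	return "other"
}

func main() {
	fmt.Println(classify(&ValuesExpr{Rows: 1}))
	fmt.Println(classify(&WithScanExpr{ID: 3}))
}
```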

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 428 at r40 (raw file):

		))
	}
	// Find the ScanExpr which reads from the table this unique check applies to.

nit: mention what cases there would be a project here and why it's safe to skip over the projects - I presume because we won't support tables with computed columns yet?

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 513 at r40 (raw file):

	} else {
		// Things blow up if a RelExpr is nil, so construct a minimal dummy relation
		// that will not be used.

Can we just return nil as the second return value of this function instead?

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 315 at r48 (raw file):

// path uniqueness check).
func (h *uniqueCheckHelper) buildFiltersForFastPathCheck(
	withScanScope *scope, scanExpr *memo.ScanExpr,

This deserves some more explanation of how it works. It looks like you're either pulling out a Values expression from the unique check that's already been inlined, or you're trying to find a Values expression in the Insert's input. Is that correct?

Also, I think naming like withScanScope is adding confusion here because it's not necessarily a WithScan expression. You could consider a name like uniqueCheckScope instead, and insertInputValues instead of possibleValues, for example.

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 323 at r48 (raw file):

	// Skip to the WithScan or Values.
	for skipProjectExpr, ok := possibleValues.(*memo.ProjectExpr); ok; skipProjectExpr, ok = possibleValues.(*memo.ProjectExpr) {
		possibleValues = skipProjectExpr.Input

This code relies on h.mb.outScope.expr to be the input of the insert expression, correct? That invariant should be made clear, at the least in a comment somewhere. Even better would be an assertion of some type to check this, or to use another method to find the Insert's input, though I'm not sure at the moment what that would be. Maybe it'd be possible to store a pointer to that expression when it is built in mutationBuilder and access it directly.
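The quoted loop uses a comma-ok type assertion in both the init and post statements of a `for` loop. Here is a runnable Go sketch of the same pattern with hypothetical stand-in types. It does not add the invariant assertion suggested above.

```go
package main

import "fmt"

// Hypothetical stand-ins for the memo types.
type RelExpr interface{}
type ProjectExpr struct{ Input RelExpr }
type ValuesExpr struct{ Rows int }

// skipProjects descends through any chain of ProjectExpr to reach the
// underlying expression, mirroring the loop quoted above.
func skipProjects(e RelExpr) RelExpr {
	for p, ok := e.(*ProjectExpr); ok; p, ok = e.(*ProjectExpr) {
		e = p.Input
	}
	return e
}

func main() {
	e := skipProjects(&ProjectExpr{Input: &ProjectExpr{Input: &ValuesExpr{Rows: 1}}})
	if v, ok := e.(*ValuesExpr); ok {
		fmt.Println("found values with", v.Rows, "row(s)")
	}
}
```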

msirek

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @rharding6373, and @yuzefovich)

pkg/sql/explain_bundle.go line 338 at r57 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Why omit it from opt-vv.txt? Seems like it would be useful to include the plan for the fast path check in stmt bundles. If it will be used for some checks in production, how will we debug if it turns out there are issues with the fast path?

Including it would also be consistent with the treatment of EXPLAIN(TYPES) below.

Good point. Modified it to include fast path check expressions.

pkg/sql/opt/exec/factory.go line 256 at r59 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

optional nit: maybe rename this and others to InsertFastPathCheck instead.

Done

pkg/sql/opt/memo/interner.go line 1197 at r48 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Should we also use h.IsRelExprEqual to compare the Check RelExpr, too?

Done

pkg/sql/opt/norm/rules/prune_cols.opt line 542 at r40 (raw file):