Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fast path uniqueness checks for single-row insert #102995

Closed
wants to merge 9 commits into from

Conversation

msirek
Copy link
Contributor

@msirek msirek commented May 10, 2023

sql: add new check relation to help build insert fast path uniqueness checks

This creates a FastPathUniqueChecks list as a child of InsertExpr,
similar to UniqueChecks, but consisting of relational expressions
which can be used to build a fast path uniqueness check using a
particular index. Each item in the FastPathUniqueChecks list is a
FastPathUniqueChecksItem whose Check expression, if the unique
constraint is provisionally eligible to use a fast path uniqueness
check, is a filtered scan from the target table of an
INSERT INTO ... VALUES statement. It is only built if the
VALUES clause has a single row. The filter equates each of the unique
constraint key columns with its corresponding value from the VALUES
clause. The values may either be a ValuesExpr or a WithScanExpr
whose source is a ValuesExpr. During exploration,
GenerateConstrainedScans will find all valid constrained index scans
which are equivalent to the original filtered Scan. The built relations
are not executed or displayed in the EXPLAIN, but will be analyzed in a
later commit to specify the index and index key values to use for insert
fast path uniqueness checks.

FastPathUniqueChecksItemPrivate is added with elements which are
designed to communicate details of how to build a fast path uniqueness
check to the execbuilder.

Epic: CRDB-26290
Informs: #58047

Release note: None


memo: optbuild unit tests for fast path unique check exprs

This commit adds the ExprFmtHideFastPathChecks flag, which hides fast
path check expressions from the output of expression formatting. It is
used in all places except TestBuilder. Also, unit tests for
buildFiltersForFastPathCheck are added.

Epic: CRDB-26290
Informs: #58047

Release note: None


xform: opt test for insert fast path checks

This commit adds an insert transformation rules test which
displays optimized fast path checks which are built in the optbuilder
and later explored and optimized. This test is modified in a later
commit to test for rule firing.

Epic: CRDB-26290
Informs: #58047

Release note: None


sql: insert fast path uniq checks, execution support

This commit wires up insert fast path execution to support fast path
uniqueness checks via KV Scan requests in a similar method as is
currently done for foreign key checks. Except with uniqueness checks, the
statement errors out if a row is found instead of if a row is not found.
Both addUniqChecks and runUniqChecks functions are provided to
initialize the checks and run them. Interfaces will be modified in a
subsequent commit to utilize the new methods.
exec.InsertFastPathFKCheck is renamed to
exec.InsertFastPathFKUniqCheck to reflect that it will be used to
build both foreign key and uniqueness fast path checks.

A DatumsFromConstraint field is added to
exec.InsertFastPathFKUniqCheck.
DatumsFromConstraint contains constant values from the WHERE clause
constraint which are part of the index key to look up. The number of
entries corresponds with the number of KV lookups. For example, when built
from analyzing a locality-optimized operation accessing 1 local region and
remote regions, the resulting DatumsFromConstraint would have 3 entries.

A TestAddUniqChecks unit test is added.

Epic: CRDB-26290
Informs: #58047

Release note: None


eval: add AutoCommit method to Planner interface.

This commit adds an AutoCommit method to the Planner interface
AutoCommit indicates whether the Planner has flagged the current
statement as eligible for transaction auto-commit.

Epic: CRDB-26290
Informs: #58047

Release note: None


sql: fast path uniqueness checks for single-row insert

This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.

This includes INSERT cases such as:

  • a single-row VALUES clause into a REGIONAL BY ROW table with a
    PRIMARY KEY which has a UUID column generated by default, ie.
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(), where the
    crdb_region column is not specified in the VALUES clause; either
    the gateway region is used or it is computed based on other column
    values.
  • a single-row VALUES clause into a REGIONAL BY ROW table with a
    hash sharded unique index where the crdb_region column is not specified
    in the VALUES clause
  • a single-row VALUES clause into a table with an experimental UNIQUE
    WITHOUT INDEX constraint which also has an index with leading key
    columns matching the constraint columns.

In optbuild, when creating a uniqueness check for rows which are added
to a table, a fast path index check relation is also built when the
mutation source is a single-row values expression or WithScan from
a single-row values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If there is a
uniqueness check with a partial index predicate, fast path is
disallowed.

A new exploration rule called InsertFastPath is added to walk the memo
group members created during exploration in FastPathUniqueChecks of
the InsertExpr, to find any which have been rewritten as a constrained
ScanExpr. If found, that means that Scan fully represents the lookup
needed to check for duplicate entries and the Scan constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast path check.

Function CanUseUniqueChecksForInsertFastPath walks the expressions
generated during exploration of the FastPathUniqueChecks.Check
relation. If a constrained scan is found, it is used to build elements
of the FastPathUniqueChecksItemPrivate structure to communicate to the
execbuilder the table and index to use for the check, and the column ids
in the insert row to use for building the fast path KV request. In
addition, a new DatumsFromConstraint field is added, consisting of a
ScalarListExpr of TupleExprs specifying the index key, which allows an
index key column to be matched with multiple Datums. One insert row may
result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the InsertFastPathFKUniqCheck structure
in the execbuilder. The new FastPathUniqueChecksItemPrivate is built
into a new the corresponding FastPathUniqueChecksItems of a new
FastPathUniqueChecksExpr and communicated to the caller via return
value newFastPathUniqueChecks.

A small adjustment is made in the coster to make the fast path unique
constraint slightly cheaper, so it should always be chosen over the
original non-fast path check.

Epic: CRDB-26290
Fixes: #58047

Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This results in a
reduction in latency for single-row inserts to REGIONAL BY ROW tables
and hash-sharded REGIONAL BY ROW tables with unique indexes.


sql: run insert fast-path FK checks in parallel with uniqueness checks

This commit combines the foreign key checks of insert fast-path with the
uniqueness checks into a single batch so that they may be processed in
parallel for reduced latency.

Epic: CRDB-26290
Informs: #58047

Release note: None


bench: insert fast path single row unique check benchmark

This commit adds a small benchmark for single-row insert fast path with
a UNIQUE WITHOUT INDEX constraint.

BenchmarkUniqInsert/SingleRow
BenchmarkUniqInsert/SingleRow/NoFastPath
BenchmarkUniqInsert/SingleRow/NoFastPath-10                 1623            634627 ns/op
BenchmarkUniqInsert/SingleRow/FastPath
BenchmarkUniqInsert/SingleRow/FastPath-10                   3058            399215 ns/op

Epic: CRDB-26290
Informs: #58047

Release note: None


xform: minimize the number of KV lookups for fast-path unique checks

This commit optimizes fast-path uniqueness checks by picking index scans
which minimize the number of constraint spans which in turn minimizes
the number of KV lookups performed by the check. A latency cost is
associated with each KV lookup, so it's desirable to use as few KV
requests as possible.

A TODO is added to make this a cost-based decision in the future, where
latency costs are considered.

Epic: CRDB-26290
Informs: #58047

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msirek msirek added the do-not-merge bors won't merge a PR with this label. label May 10, 2023
@msirek msirek force-pushed the 58047 branch 9 times, most recently from f888d7b to 8f6b764 Compare May 15, 2023 22:42
@msirek msirek force-pushed the 58047 branch 11 times, most recently from 1475d2a to 4748ce3 Compare June 14, 2023 04:23
@msirek msirek removed the do-not-merge bors won't merge a PR with this label. label Jun 14, 2023
@msirek msirek changed the title [DNM] sql: multi-row VALUES insert fast-path UNIQUE WITHOUT INDEX support sql: uniqueness checks for insert fast-path Jun 14, 2023
@msirek msirek marked this pull request as ready for review June 14, 2023 04:27
@msirek msirek requested a review from a team as a code owner June 14, 2023 04:27
@msirek msirek requested a review from rytaft June 14, 2023 04:27
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 14 files at r1, 1 of 2 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rytaft)


pkg/sql/insert_fast_path.go line 191 at r1 (raw file):

		// Combine them together to get the final index prefix key to search.
		for _, templateRow := range c.DatumsFromConstraint {
			// Each row must have its own memory as it is part of a batch.

I'm confused by this - I'd assume that the BatchRequest would only contain the span generated based on the combined row which should internally copy datums into roachpb.Keys, no?


pkg/sql/insert_fast_path.go line 487 at r3 (raw file):

		// Perform the FK checks.
		// TODO(radu): we could run the FK batch in parallel with the main batch (if

nit: do you plan on addressing this TODO as well? It seems like it'd be nice to do it since you're in this area.


pkg/sql/opt/exec/factory.opt line 468 at r1 (raw file):

#  - all FK checks can be performed using direct lookups into unique indexes.
#  - all UNIQUE WITHOUT INDEX checks can be performed using direct lookups
#  - into an index (could be a key prefix and not the entire key).

nit: probably we don't want - before "into an index"?

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)


pkg/sql/insert_fast_path.go line 191 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I'm confused by this - I'd assume that the BatchRequest would only contain the span generated based on the combined row which should internally copy datums into roachpb.Keys, no?

Good catch. I updated the logic to allocate the combined row once, and updated the comment.


pkg/sql/insert_fast_path.go line 487 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: do you plan on addressing this TODO as well? It seems like it'd be nice to do it since you're in this area.

There was more work I wanted to do in uniqueness checks in general, to support more cases (more forms of INSERT statements). But I cut it short in favor of switching to partial stats work. If we were to defer the partial stats work a bit longer, I'd rather give priority to those missing cases above this TODO. Also, only the FK checks could be done in parallel with the insert, because uniqueness checks could be trying to read from the same index as being inserted, and we don't want to "see" the newly-inserted row. So, in that case we'd have to run the uniqueness check before the insert(s) anyway. So, this would be an enhancement for cases which only have FK checks and no uniqueness checks, which this PR is not covering.


pkg/sql/opt/exec/factory.opt line 468 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: probably we don't want - before "into an index"?

Fixed

@msirek
Copy link
Contributor Author

msirek commented Sep 21, 2023

The last 2 pushes update the code to the latest master and add the Locking attribute to fast path checks, copied over from the Scan which the check was built upon.

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is impressive work! I've reviewed 5/9 commits so far with mostly just clarification questions to understand it better. If you wanted to I think you could split some of the commits off into their own PRs (e.g., the Autocommit commit) but I'm not sure that would make the whole thing more manageable, and splitting off some of the meaty commits would lose some comment history, so not sure that's desirable.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @msirek, @rytaft, and @yuzefovich)


pkg/sql/explain_bundle.go line 338 at r57 (raw file):

		memo.ExprFmtHideQualifications|memo.ExprFmtHideScalars|memo.ExprFmtHideTypes|memo.ExprFmtHideNotVisibleIndexInfo|memo.ExprFmtHideFastPathChecks,
	))
	b.z.AddFile("opt-vv.txt", formatOptPlan(memo.ExprFmtHideQualifications|memo.ExprFmtHideNotVisibleIndexInfo|memo.ExprFmtHideFastPathChecks))

Why omit it from opt-vv.txt? Seems like it would be useful to include the plan for the fast path check in stmt bundles. If it will be used for some checks in production, how will we debug if it turns out there are issues with the fast path?

Including it would also be consistent with the treatment of EXPLAIN(TYPES) below.


pkg/sql/opt/exec/factory.go line 256 at r59 (raw file):

// ConstructInsertFastPath). It identifies the index into which we can perform
// the lookup.
type InsertFastPathFKUniqCheck struct {

optional nit: maybe rename this and others to InsertFastPathCheck instead.


pkg/sql/opt/memo/interner.go line 1197 at r48 (raw file):

	}
	for i := range l {
		if l[i].ReferencedTableID != r[i].ReferencedTableID {

Should we also use h.IsRelExprEqual to compare the Check RelExpr, too?


pkg/sql/opt/optbuilder/testdata/unique-checks-insert line 2134 at r57 (raw file):

      │              └── t.a:24 = 10
      └── fast-path-unique-checks-item: &{0 0 [] [] {  record best-effort}}
           └── values

Is this arm of the explain tree from making a non-nil fast path so we don't break RelExpr? Trying to understand what

fast-path-unique-checks-item: &{0 0 [] [] {  record best-effort}}
           └── values

means, and why we have both this and a fleshed out fast path check above it.


pkg/sql/opt/xform/testdata/rules/insert line 187 at r58 (raw file):

      │         │         ├── columns: t.k:26!null t.r:27!null t.c:30
      │         │         ├── constraint: /27/30/26
      │         │         │    ├── [/'east' - /'east']

I thought that fast path unique checks were characterized by a single KV lookup, but this looks like at least 2 KV look ups (for east and west) plus an index join. Am I not reading or understanding this right?

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the remaining files and don't have any additional feedback, but I'd like to check out the benchmark results (are the ones in the commit message up-to-date?) and have responses to the open comments before giving a stamp of approval.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @msirek, @rytaft, and @yuzefovich)


pkg/sql/opt/bench/uniq_test.go line 54 at r63 (raw file):

}

func BenchmarkUniqInsert(b *testing.B) {

This might have gotten lost in the sea of comments, but could you please re-share the latest benchmark results? Thanks!

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left my comments for the first commit. The first commit also needs:

  1. optbuilder tests that show the new expressions being built for the cases covered by the new code.
  2. norm tests for the new pruning normalization rules.
  3. TestInterner tests for the new checks in interner_test.go - this might have caught the missing check that Rachael found

If you wanted to I think you could split some of the commits off into their own PRs

+1 building this incrementally would speed up time-to-merge. It doesn't help that Reviewable is struggling with so many commit revisions.

Reviewed 3 of 28 files at r32, 1 of 3 files at r38, 37 of 37 files at r40, 3 of 5 files at r47, 49 of 49 files at r48.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @msirek, and @yuzefovich)


pkg/sql/opt/norm/rules/prune_cols.opt line 542 at r40 (raw file):

)

# PruneReturningCols removes columns from the Insert operator's ReturnCols

nit: PruneInsertReturnCols


pkg/sql/opt/ops/mutation.opt line 302 at r40 (raw file):

    # columns for a uniqueness check, in the order of the ReferencedIndexOrdinal
    # columns. For each, the value in the array indicates the index of the
    # column in the input table.

nit: This last sentence confuses me. A ColList is a list of column IDs, not ordinals. I believe that these column IDs contain the values being inserted, and the position in this list corresponds with the ordinal of the referenced index columns. Is that correct?


pkg/sql/opt/ops/mutation.opt line 305 at r40 (raw file):

    InsertCols ColList

    # DatumsFromConstraint contains constant values from the WHERE clause

This WHERE clause is from partial unique constraints? Or somewhere else?


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 501 at r34 (raw file):

Previously, msirek (Mark Sirek) wrote…

It may make things more complicated if we try to specifically search for and process a projection of a virtual computed column, and my understanding is we wish to target a very narrow set of cases in the first pass. In some cases, where we build a Values expression, the projection may be merged with the Values by a rewrite, in which case we'll find the value and handle it. In other cases where we have a WithScanExpr of a Values expression, and a projection, the projection may not be merged. I think that if and when foreign key fast path checking code gets refactored, the need to build a WithScanExpr may go away and the current simple handling will also handle those cases.

It sounds like calling uniqueChecksHelper.buildTableScan is preferred over DuplicateScanPrivate though, so I've updated the code to use the former.

I was genuinely curious, not necessarily preferring buildTableScan. DuplicateScanPrivate would be fine and seems simpler.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 96 at r40 (raw file):

			// any existing rows. The check prevents rows from matching themselves by
			// adding a filter based on the primary key.
			uniqueCheckItems, _ := h.buildInsertionCheck()

Can you leave a TODO to avoid the extra work of building the fast paths for this case?


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 136 at r40 (raw file):

			// any existing rows. The check prevents rows from matching themselves by
			// adding a filter based on the primary key.
			uniqueCheckItems, _ := h.buildInsertionCheck()

ditto: add a TODO here to avoid the extra work of building the fast paths for this case


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 334 at r40 (raw file):

	if withScanScopeIsValues {
		valuesExpr = withScanScopeValues
		isValues = true

nit: simplify this to

withScanExpr, isWithScan := withScanScope.expr.(*memo.WithScanExpr)
values, isValues := withScanScope.expr.(*memo.ValuesExpr)

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 428 at r40 (raw file):

		))
	}
	// Find the ScanExpr which reads from the table this unique check applies to.

nit: mention what cases there would be a project here and why it's safe to skip over the projects - I presume because we won't support tables with computed columns yet?


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 513 at r40 (raw file):

	} else {
		// Things blow up if a RelExpr is nil, so construct a minimal dummy relation
		// that will not be used.

Can we just return nil as the second return value of this function instead?


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 315 at r48 (raw file):

// path uniqueness check).
func (h *uniqueCheckHelper) buildFiltersForFastPathCheck(
	withScanScope *scope, scanExpr *memo.ScanExpr,

This deserves some more explanation of how it works. It looks like you're either pulling out a Values expression from the unique check that's already been inlined, or you're trying to find a Values expression in the Insert's input. Is that correct?

Also, I think the naming like withScanScope is adding confusion here because it's not necessarily a WithScan expression. You could consider the names uniqueCheckScope. And insertInputValues instead of possibleValues, for example.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 323 at r48 (raw file):

	// Skip to the WithScan or Values.
	for skipProjectExpr, ok := possibleValues.(*memo.ProjectExpr); ok; skipProjectExpr, ok = possibleValues.(*memo.ProjectExpr) {
		possibleValues = skipProjectExpr.Input

This code relies on h.mb.outScope.expr to be the input of the insert expression, correct? That invariant should be made clear, at the least in a comment somewhere. Even better would be an assertion of some type to check this, or to use another method to find the Insert's input, though I'm not sure at the moment what that would be. Maybe it'd be possible to store a pointer to that expression when it is built in mutationBuilder and access it directly.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @rharding6373, and @yuzefovich)


pkg/sql/explain_bundle.go line 338 at r57 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Why omit it from opt-vv.txt? Seems like it would be useful to include the plan for the fast path check in stmt bundles. If it will be used for some checks in production, how will we debug if it turns out there are issues with the fast path?

Including it would also be consistent with the treatment of EXPLAIN(TYPES) below.

Good point. Modified it to include fast path check expressions.


pkg/sql/opt/exec/factory.go line 256 at r59 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

optional nit: maybe rename this and others to InsertFastPathCheck instead.

Done


pkg/sql/opt/memo/interner.go line 1197 at r48 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Should we also use h.IsRelExprEqual to compare the Check RelExpr, too?

Done


pkg/sql/opt/norm/rules/prune_cols.opt line 542 at r40 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: PruneInsertReturnCols

Done


pkg/sql/opt/ops/mutation.opt line 302 at r40 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: This last sentence confuses me. A ColList is a list of column IDs, not ordinals. I believe that these column IDs contain the values being inserted, and the position in this list corresponds with the ordinal of the referenced index columns. Is that correct?

Reworded this to:

	# InsertCols contains the table column ordinals of the referenced index key
	# columns. The position in this list corresponds with the ordinal of the
	# referenced index column (its position in the index key).

pkg/sql/opt/ops/mutation.opt line 305 at r40 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This WHERE clause is from partial unique constraints? Or somewhere else?

Mention of a WHERE clause is confusing, perhaps. The values are from the insert row or check constraint. Updated it to:

    # DatumsFromConstraint contains constant values from the insert row for the
    # columns in the unique constraint. Columns not available directly from the
    # insert row or computed from insert row values (computed column) may be
    # filled in from a CHECK constraint on the column. The number of entries
    # corresponds with the number of KV lookups. For example, when built from
    # analyzing a locality-optimized operation accessing 1 local region and 2
    # remote regions, the resulting DatumsFromConstraint would have 3 entries.

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 501 at r34 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I was genuinely curious, not necessarily preferring buildTableScan. DuplicateScanPrivate would be fine and seems simpler.

OK. Looking at this again, it seems like calling buildTableScan may be more robust as it will always build the correct fields under the Scan, and DuplicateScanPrivate is not coded to copy all scan private fields.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 96 at r40 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you leave a TODO to avoid the extra work of building the fast paths for this case?

Added a buildFastPathCheck parameter to buildInsertionCheck to handle this.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 136 at r40 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

ditto: add a TODO here to avoid the extra work of building the fast paths for this case

Added a buildFastPathCheck parameter to buildInsertionCheck to handle this.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 334 at r40 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: simplify this to

withScanExpr, isWithScan := withScanScope.expr.(*memo.WithScanExpr)
values, isValues := withScanScope.expr.(*memo.ValuesExpr)

Done


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 428 at r40 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: mention what cases there would be a project here and why it's safe to skip over the projects - I presume because we won't support tables with computed columns yet?

OK, added:

		// Projections may have been added such as for computed column expressions.
		// Skip over these to find the Scan on the target table of the Insert.
		// Fast path uniqueness checks on computed columns aren't supported
		// currently. We just need to access the Scan defining regular columns.

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 513 at r40 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can we just return nil as the second return value of this function instead?

The 2nd return value is not a pointer, so nil can't be returned. Modified it so we return memo.FastPathUniqueChecksItem{} when buildFastPathCheck is false. When it's true we still need to build a valid relation that can be optimized, otherwise we'll hit an error during exploration.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 315 at r48 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This deserves some more explanation of how it works. It looks like you're either pulling out a Values expression from the unique check that's already been inlined, or you're trying to find a Values expression in the Insert's input. Is that correct?

Also, I think the naming like withScanScope is adding confusion here because it's not necessarily a WithScan expression. You could consider the names uniqueCheckScope. And insertInputValues instead of possibleValues, for example.

Renamed items as suggested. Added more details to the description and more comments. Let me know if more details are needed of if anything is not clear:

// buildFiltersForFastPathCheck builds ANDed equality filters between the
// columns in the uniqueness check defined by h.uniqueOrdinals and scalar
// expressions present in a single Values row being inserted. It is expected
// that buildCheckInputScan has been called and has set up in
// uniqueCheckExpr the columns corresponding with the scalars in the
// insert row. buildCheckInputScan has either inlined the insert row as a Values
// expression, or embedded it within a WithScanExpr, in which case `h.mb.inputForInsertExpr`
// holds the input to the WithScanExpr. In the latter case, for a
// given table column ordinal `i` in `h.uniqueOrdinals`, instead of finding the
// matching scalar in the Values row via uniqueCheckCols[i],
// withScanExpr.InCols[i] holds the column ID to match on. scanExpr is
// the scan on the insert target table used on the right side of the semijoins
// in the non-fast-path uniqueness checks, with column ids matching h.scanScope.cols.
//
// The purpose of this function is to build filters representing a uniqueness
// check on a given insert row, which can be applied as a Select from a Scan,
// and optimized during exploration when all placeholders have been filled in.
// The goal is to find a constrained Scan of an index, which consumes all
// filters (meaning it could also be safely executed via a KV lookup in a fast
// path uniqueness check).

pkg/sql/opt/optbuilder/mutation_builder_unique.go line 323 at r48 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This code relies on h.mb.outScope.expr to be the input of the insert expression, correct? That invariant should be made clear, at the least in a comment somewhere. Even better would be an assertion of some type to check this, or to use another method to find the Insert's input, though I'm not sure at the moment what that would be. Maybe it'd be possible to store a pointer to that expression when it is built in mutationBuilder and access it directly.

I guess there's no guarantee someone won't re-use h.mb.outScope.expr, so I've created a inputForInsertExpr in mutationBuilder to guarantee it won't change since the most recent call to buildInputForInsert and added a comment.


pkg/sql/opt/optbuilder/testdata/unique-checks-insert line 2134 at r57 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Is this arm of the explain tree from making a non-nil fast path so we don't break RelExpr? Trying to understand what

fast-path-unique-checks-item: &{0 0 [] [] {  record best-effort}}
           └── values

means, and why we have both this and a fleshed out fast path check above it.

fast-path-unique-checks holds two fast-path-unique-checks-items here. One for each check:

  UNIQUE WITHOUT INDEX (a),
  UNIQUE WITHOUT INDEX (b),

For the check on b, that is a computed column, so the dummy values check is built, meaning fast path is not possible for that check. For the check on a, we build a full fast-path-unique-checks-item on it with a Select expression, meaning fast path is possible on it. Since not all fast-path-unique-checks-items are fully built out, fast path will not be picked for this query.


pkg/sql/opt/xform/testdata/rules/insert line 187 at r58 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

I thought that fast path unique checks were characterized by a single KV lookup, but this looks like at least 2 KV look ups (for east and west) plus an index join. Am I not reading or understanding this right?

Your read of the tree is correct, and there will be more than one KV lookup for this check. Check out the comments on DatumsFromConstraint. Basically, there will be one KV lookup per constraint span.


pkg/sql/opt/bench/uniq_test.go line 54 at r63 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

This might have gotten lost in the sea of comments, but could you please re-share the latest benchmark results? Thanks!

Sure.
Here are a few runs of it:

BenchmarkUniqInsert/SingleRow/NoFastPath-10         	    1690	    600194 ns/op
BenchmarkUniqInsert/SingleRow/FastPath-10           	    3080	    381874 ns/op

BenchmarkUniqInsert/SingleRow/NoFastPath-10         	    1774	    596803 ns/op
BenchmarkUniqInsert/SingleRow/FastPath-10           	    3147	    375995 ns/op

BenchmarkUniqInsert/SingleRow/NoFastPath-10         	    1726	    586618 ns/op
BenchmarkUniqInsert/SingleRow/FastPath-10           	    3145	    376511 ns/op

BenchmarkUniqInsert/SingleRow/NoFastPath-10         	    1760	    594442 ns/op
BenchmarkUniqInsert/SingleRow/FastPath-10           	    3129	    374240 ns/op

BenchmarkUniqInsert/SingleRow/NoFastPath-10         	    1710	    595147 ns/op
BenchmarkUniqInsert/SingleRow/FastPath-10           	    3184	    378356 ns/op

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit also needs:

  1. optbuilder tests that show the new expressions being built for the cases covered by the new code.

The commit titled "memo: optbuild unit tests for fast path unique check exprs" has these tests, in file pkg/sql/opt/optbuilder/testdata/unique-checks-insert.

  1. norm tests for the new pruning normalization rules.

I found PruneInsertInputCols is never fired and there doesn't seem to be a use case for it, so removed it. I've added some norm tests for PruneInsertReturnCols.

  1. TestInterner tests for the new checks in interner_test.go

Added TestInterner tests

If you wanted to I think you could split some of the commits off into their own PRs

+1 building this incrementally would speed up time-to-merge. It doesn't help that Reviewable is struggling with so many commit revisions.

I've split the first 3 commits off into a separate PR: #111403
Once that merges I can either rebase this PR, or split off the next couple commits, etc.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @rharding6373, @rytaft, and @yuzefovich)

Mark Sirek added 9 commits September 27, 2023 23:57
… checks

This creates a `FastPathUniqueChecks` list as a child of `InsertExpr`,
similar to `UniqueChecks`, but consisting of relational expressions
which can be used to build a fast path uniqueness check using a
 particular index. Each item in the `FastPathUniqueChecks` list is a
`FastPathUniqueChecksItem` whose `Check` expression, if the unique
constraint is provisionally eligible to use a fast path uniqueness
check, is a filtered scan from the target table of an
`INSERT INTO ... VALUES` statement. It is only built if the
VALUES clause has a single row. The filter equates each of the unique
constraint key columns with its corresponding value from the VALUES
clause. The values may either be a `ValuesExpr` or a `WithScanExpr`
whose source is a `ValuesExpr`. During exploration,
GenerateConstrainedScans will find all valid constrained index scans
which are equivalent to the original filtered Scan. The built relations
are not executed or displayed in the EXPLAIN, but will be analyzed in a
later commit to specify the index and index key values to use for insert
fast path uniqueness checks.

FastPathUniqueChecksItemPrivate is added with elements which are
designed to communicate details of how to build a fast path uniqueness
check to the execbuilder.

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
This commit adds the `ExprFmtHideFastPathChecks` flag, which hides fast
path check expressions from the output of expression formatting. It is
used in all places except TestBuilder. Also, unit tests for
`buildFiltersForFastPathCheck` are added.

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
This commit adds an insert transformation rules test which
displays optimized fast path checks which are built in the optbuilder
and later explored and optimized. This test is modified in a later
commit to test for rule firing.

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
This commit wires up insert fast path execution to support fast path
uniqueness checks via KV Scan requests in a similar method as is
currently done for foreign key checks. Except with uniqueness checks, the
statement errors out if a row is found instead of if a row is not found.
Both `addUniqChecks` and `runUniqChecks` functions are provided to
initialize the checks and run them. Interfaces will be modified in a
subsequent commit to utilize the new methods.
`exec.InsertFastPathFKCheck` is renamed to
`exec.InsertFastPathFKUniqCheck` to reflect that it will be used to
build both foreign key and uniqueness fast path checks.

A `DatumsFromConstraint` field is added to
`exec.InsertFastPathFKUniqCheck`.
DatumsFromConstraint contains constant values from the WHERE clause
constraint which are part of the index key to look up. The number of
entries corresponds with the number of KV lookups. For example, when built
from analyzing a locality-optimized operation accessing 1 local region and
remote regions, the resulting DatumsFromConstraint would have 3 entries.

A `TestAddUniqChecks` unit test is added.

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
This commit adds an AutoCommit method to the Planner interface
AutoCommit indicates whether the Planner has flagged the current
statement as eligible for transaction auto-commit.

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.

This includes INSERT cases such as:

- a single-row VALUES clause into a REGIONAL BY ROW table with a
PRIMARY KEY which has a UUID column generated by default, ie.
`id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the
crdb_region column is not specified in the VALUES clause; either
the gateway region is used or it is computed based on other column
values.
- a single-row VALUES clause into a REGIONAL BY ROW table with a
hash sharded unique index where the crdb_region column is not specified
in the VALUES clause
- a single-row VALUES clause into a table with an experimental UNIQUE
WITHOUT INDEX constraint which also has an index with leading key
columns matching the constraint columns.

In optbuild, when creating a uniqueness check for rows which are added
to a table, a fast path index check relation is also built when the
mutation source is a single-row values expression or WithScan from
a single-row values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If there is a
uniqueness check with a partial index predicate, fast path is
disallowed.

A new exploration rule called InsertFastPath is added to walk the memo
group members created during exploration in `FastPathUniqueChecks` of
the `InsertExpr`, to find any which have been rewritten as a constrained
`ScanExpr`. If found, that means that Scan fully represents the lookup
needed to check for duplicate entries and the Scan constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast path check.

Function CanUseUniqueChecksForInsertFastPath walks the expressions
generated during exploration of the `FastPathUniqueChecks.Check`
relation.  If a constrained scan is found, it is used to build elements
of the `FastPathUniqueChecksItemPrivate` structure to communicate to the
execbuilder the table and index to use for the check, and the column ids
in the insert row to use for building the fast path KV request. In
addition, a new `DatumsFromConstraint` field is added, consisting of a
ScalarListExpr of TupleExprs specifying the index key, which allows an
index key column to be matched with multiple Datums. One insert row may
result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the `InsertFastPathFKUniqCheck` structure
in the execbuilder. The new `FastPathUniqueChecksItemPrivate` is built
into a new the corresponding `FastPathUniqueChecksItem`s of a new
`FastPathUniqueChecksExpr` and communicated to the caller via return
value `newFastPathUniqueChecks`.

A small adjustment is made in the coster to make the fast path unique
constraint slightly cheaper, so it should always be chosen over the
original non-fast path check.

Epic: CRDB-26290
Fixes: cockroachdb#58047

Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This results in a
reduction in latency for single-row inserts to REGIONAL BY ROW tables
and hash-sharded REGIONAL BY ROW tables with unique indexes.
This commit combines the foreign key checks of insert fast-path with the
uniqueness checks into a single batch so that they may be processed in
parallel for reduced latency.

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
This commit adds a small benchmark for single-row insert fast path with
a UNIQUE WITHOUT INDEX constraint.

```
BenchmarkUniqInsert/SingleRow
BenchmarkUniqInsert/SingleRow/NoFastPath
BenchmarkUniqInsert/SingleRow/NoFastPath-10         	    1623	    634627 ns/op
BenchmarkUniqInsert/SingleRow/FastPath
BenchmarkUniqInsert/SingleRow/FastPath-10           	    3058	    399215 ns/op
```

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
This commit optimizes fast-path uniqueness checks by picking index scans
which minimize the number of constraint spans which in turn minimizes
the number of KV lookups performed by the check. A latency cost is
associated with each KV lookup, so it's desirable to use as few KV
requests as possible.

A TODO is added to make this a cost-based decision in the future, where
latency costs are considered.

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
@msirek
Copy link
Contributor Author

msirek commented Oct 2, 2023

RFAL, either in this PR, or in the split off PR: #111403

@yuzefovich
Copy link
Member

It's been a while since I looked at this PR, is there a particular commit(s) that you'd like me to take a look at?

@msirek
Copy link
Contributor Author

msirek commented Oct 2, 2023

It's been a while since I looked at this PR, is there a particular commit(s) that you'd like me to take a look at?

Thanks! The execution side changes haven't really changed much, except there is now unit test TestAddUniqChecks in 5c0000a and file mutation.go in b7f7665 looks a bit different.

@msirek
Copy link
Contributor Author

msirek commented Oct 10, 2023

The changes in this PR have been merged as separate PRs:
#111403
#111786
#111822
#112059

The last commit, xform: minimize the number of KV lookups for fast-path unique checks, was not merged, but could be opened as a possible future improvement.

@msirek msirek closed this Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: add support for using the insert fast path with uniqueness checks
8 participants