Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: remove conflicting input rows in ON CONFLICT DO NOTHING case #45443

Merged
merged 1 commit into from
Mar 5, 2020

Conversation

andy-kimball
Copy link
Contributor

@andy-kimball andy-kimball commented Feb 26, 2020

Previously, the INSERT..ON CONFLICT DO NOTHING statement raised an error
if its input contained rows which conflicted with one another. For example:

CREATE TABLE t (x INT PRIMARY KEY)
INSERT INTO t (x) VALUES (1), (1) ON CONFLICT DO NOTHING

This raised a unique key conflict error rather than doing nothing.

This commit fixes that by wrapping the input in one or more UpsertDistinctOn
operators - one for each unique key in the target table. This ensures that
any duplicates which might cause a conflict are removed from the input.

Fixes #37880
Fixes #44434

Release note (sql change): Duplicate rows in the input to an INSERT..ON
CONFLICT DO NOTHING statement will now be ignored rather than triggering
an error.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rytaft)


pkg/sql/opt/ops/relational.opt, line 533 at r1 (raw file):

    # ErrorOnDup, if true, triggers an error if any aggregation group contains
    # more than one row. This can only be true for the UpsertDistinctOn operator.
    ErrorOnDup bool

I think this code needs to be updated to check the new field instead: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/statistics_builder.go#L1500

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: (once you address Radu's point)

Reviewed 17 of 17 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball)


pkg/sql/opt/memo/testdata/logprops/upsert, line 222 at r1 (raw file):

      ├── key: (1)
      ├── fd: (1)-->(2-4), (2)-->(3), (4)~~>(1-3), (2,3)~~>(1,4)
      └── upsert-distinct-on

should ErrOnDup be printed here?


pkg/sql/opt/norm/groupby.go, line 219 at r1 (raw file):

		}

	case *memo.UpsertDistinctOnExpr:

Can you also include regular DistinctOn here?


pkg/sql/opt/optbuilder/insert.go, line 741 at r1 (raw file):

		// Add an UpsertDistinctOn operator to ensure there are no duplicate input
		// rows for this unique index. Duplicate rows trigger conflict errors,

Might help to specify that these conflict errors happen at execution time

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/memo/testdata/logprops/upsert, line 222 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

should ErrOnDup be printed here?

Done.


pkg/sql/opt/norm/groupby.go, line 219 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Can you also include regular DistinctOn here?

I think it could in theory be included, but I think the case would be so rare as to be not worth the extra complication to test and maintain. I want to keep the cases constrained to what we might see in upsert queries (I added DistinctOn at the top-level mostly to make it easier to test). I thought about using a Rule property so that we don't descend the tree on every distinct operator, but figured it was OK as-is, since we only descend down a single path here, and not the entire subtree.


pkg/sql/opt/ops/relational.opt, line 533 at r1 (raw file):

Previously, RaduBerinde wrote…

I think this code needs to be updated to check the new field instead: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/statistics_builder.go#L1500

Yes, I missed that. Fixed.


pkg/sql/opt/optbuilder/insert.go, line 741 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Might help to specify that these conflict errors happen at execution time

Done.

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I also fixed the problem that Radu found in the previous review, regarding incorrect removal of upsert-distinct-on when it wraps left-join.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @RaduBerinde, and @rytaft)


pkg/sql/opt/norm/groupby.go, line 238 at r2 (raw file):

			}
		}
		if !keyCols.SubsetOf(eqCols) {

Don't we want to use ColsAreLaxKey? This won't work well if we have multiple keys.

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/norm/groupby.go, line 238 at r2 (raw file):

Previously, RaduBerinde wrote…

Don't we want to use ColsAreLaxKey? This won't work well if we have multiple keys.

Yes, ColsAreLaxKey is better. Switched to that.

The current code was able to work OK b/c we always left-join with an index scan having a key that matches the upsert-distinct-on grouping cols.

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility I've thought about is whether we should instead have an insert-distinct-on operator rather than adding ErrorOnDup to upsert-distinct-on.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having a separate operator so you don't need the extra field

Reviewed 9 of 12 files at r2, 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)


pkg/sql/opt/memo/extract.go, line 227 at r3 (raw file):

// ExtractJoinEquality returns true if the given condition is a simple equality
// condition with two variables (e.g. a=b), where one of the variables (returned
// as "left") is in the set of leftCols and the other (returned as "right") in

[nit] (returned as "right") in -> (returned as "right") is in


pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):

		// will raise an error, so in non-error cases its distinct count is the
		// same as its row count.
		colStat.DistinctCount = s.RowCount

But this only applies if the columns in colSet match the grouping columns, right? I think the code above will already do what you want in that case.

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/memo/extract.go, line 227 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] (returned as "right") in -> (returned as "right") is in

Done.


pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

But this only applies if the columns in colSet match the grouping columns, right? I think the code above will already do what you want in that case.

How so? The RowCount of the distinct operator in the error case is equal to the RowCount of its input. But it's DistinctCount should also be equal to the RowCount of its input, which the code above would not do.

Maybe I should put this new code only in the 2nd branch of that code (when cols are grouping cols). So DistinctCount would equal RowCount when cols are grouping cols, but DistinctCount would be equal to 1st branch if not.

@andy-kimball andy-kimball force-pushed the donothing branch 2 times, most recently from 4d12d4d to e7051d0 Compare March 3, 2020 03:51
Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: the separate operator, I think I'll just keep as-is for now. There are pros and cons to each approach, and I don't think we gain much by switching. We can always change later if we decide it's not working well.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)


pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

How so? The RowCount of the distinct operator in the error case is equal to the RowCount of its input. But it's DistinctCount should also be equal to the RowCount of its input, which the code above would not do.

Maybe I should put this new code only in the 2nd branch of that code (when cols are grouping cols). So DistinctCount would equal RowCount when cols are grouping cols, but DistinctCount would be equal to 1st branch if not.

But you'd need to be sure that the colSet is equal to groupingColSet. If there are two grouping columns and colSet only contains one of them, then this logic is incorrect. The GroupBy and DistinctOn operators don't change the distinct counts of grouping columns, regardless of whether ErrOnDup is set or not. (ErrOnDup does indicate something about the row count, though, which you've already handled above in buildGroupBy).

So the distinct count for colSet here should be the same as the distinct count of colSet in the input, which is what line 1546 is already doing (copying from input).

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

But you'd need to be sure that the colSet is equal to groupingColSet. If there are two grouping columns and colSet only contains one of them, then this logic is incorrect. The GroupBy and DistinctOn operators don't change the distinct counts of grouping columns, regardless of whether ErrOnDup is set or not. (ErrOnDup does indicate something about the row count, though, which you've already handled above in buildGroupBy).

So the distinct count for colSet here should be the same as the distinct count of colSet in the input, which is what line 1546 is already doing (copying from input).

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)

@craig
Copy link
Contributor

craig bot commented Mar 4, 2020

Build failed (retrying...)

@RaduBerinde
Copy link
Member

@craig
Copy link
Contributor

craig bot commented Mar 4, 2020

Canceled

Previously, the INSERT..ON CONFLICT DO NOTHING statement raised an error
if its input contained rows which conflicted with one another. For example:

  CREATE TABLE t (x INT PRIMARY KEY)
  INSERT INTO t (x) VALUES (1), (1) ON CONFLICT DO NOTHING

This raised a unique key conflict error rather than doing nothing.

This commit fixes that by wrapping the input in one or more UpsertDistinctOn
operators - one for each unique key in the target table. This ensures that
any duplicates which might cause a conflict are removed from the input.

Fixes cockroachdb#37880
Fixes cockroachdb#44434

Release note (sql change): Duplicate rows in the input to an INSERT..ON
CONFLICT DO NOTHING statement will now be ignored rather than triggering
an error.
@andy-kimball
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 5, 2020

Build failed

@andy-kimball
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 5, 2020

Build failed

@andy-kimball
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 5, 2020

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants