-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: remove conflicting input rows in ON CONFLICT DO NOTHING case #45443
Conversation
2dcb3ce
to
5bb6021
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rytaft)
pkg/sql/opt/ops/relational.opt, line 533 at r1 (raw file):
# ErrorOnDup, if true, triggers an error if any aggregation group contains # more than one row. This can only be true for the UpsertDistinctOn operator. ErrorOnDup bool
I think this code needs to be updated to check the new field instead: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/statistics_builder.go#L1500
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(once you address Radu's point)
Reviewed 17 of 17 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball)
pkg/sql/opt/memo/testdata/logprops/upsert, line 222 at r1 (raw file):
├── key: (1) ├── fd: (1)-->(2-4), (2)-->(3), (4)~~>(1-3), (2,3)~~>(1,4) └── upsert-distinct-on
should ErrOnDup be printed here?
pkg/sql/opt/norm/groupby.go, line 219 at r1 (raw file):
} case *memo.UpsertDistinctOnExpr:
Can you also include regular DistinctOn
here?
pkg/sql/opt/optbuilder/insert.go, line 741 at r1 (raw file):
// Add an UpsertDistinctOn operator to ensure there are no duplicate input // rows for this unique index. Duplicate rows trigger conflict errors,
Might help to specify that these conflict errors happen at execution time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/testdata/logprops/upsert, line 222 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
should ErrOnDup be printed here?
Done.
pkg/sql/opt/norm/groupby.go, line 219 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Can you also include regular
DistinctOn
here?
I think it could in theory be included, but I think the case would be so rare as to be not worth the extra complication to test and maintain. I want to keep the cases constrained to what we might see in upsert queries (I added DistinctOn
at the top-level mostly to make it easier to test). I thought about using a Rule property so that we don't descend the tree on every distinct operator, but figured it was OK as-is, since we only descend down a single path here, and not the entire subtree.
pkg/sql/opt/ops/relational.opt, line 533 at r1 (raw file):
Previously, RaduBerinde wrote…
I think this code needs to be updated to check the new field instead: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/statistics_builder.go#L1500
Yes, I missed that. Fixed.
pkg/sql/opt/optbuilder/insert.go, line 741 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Might help to specify that these conflict errors happen at execution time
Done.
5bb6021
to
1a2c1b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I also fixed the problem that Radu found in the previous review, regarding incorrect removal of upsert-distinct-on when it wraps left-join.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @RaduBerinde, and @rytaft)
pkg/sql/opt/norm/groupby.go, line 238 at r2 (raw file):
} } if !keyCols.SubsetOf(eqCols) {
Don't we want to use ColsAreLaxKey
? This won't work well if we have multiple keys.
1a2c1b1
to
bf2f9c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/norm/groupby.go, line 238 at r2 (raw file):
Previously, RaduBerinde wrote…
Don't we want to use
ColsAreLaxKey
? This won't work well if we have multiple keys.
Yes, ColsAreLaxKey
is better. Switched to that.
The current code was able to work OK b/c we always left-join with an index scan having a key that matches the upsert-distinct-on grouping cols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility I've thought about is whether we should instead have an insert-distinct-on
operator rather than adding ErrorOnDup
to upsert-distinct-on
.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of having a separate operator so you don't need the extra field
Reviewed 9 of 12 files at r2, 4 of 4 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)
pkg/sql/opt/memo/extract.go, line 227 at r3 (raw file):
// ExtractJoinEquality returns true if the given condition is a simple equality // condition with two variables (e.g. a=b), where one of the variables (returned // as "left") is in the set of leftCols and the other (returned as "right") in
[nit] (returned as "right") in -> (returned as "right") is in
pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):
// will raise an error, so in non-error cases its distinct count is the // same as its row count. colStat.DistinctCount = s.RowCount
But this only applies if the columns in colSet match the grouping columns, right? I think the code above will already do what you want in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/extract.go, line 227 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] (returned as "right") in -> (returned as "right") is in
Done.
pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
But this only applies if the columns in colSet match the grouping columns, right? I think the code above will already do what you want in that case.
How so? The RowCount
of the distinct operator in the error case is equal to the RowCount
of its input. But it's DistinctCount
should also be equal to the RowCount
of its input, which the code above would not do.
Maybe I should put this new code only in the 2nd branch of that code (when cols are grouping cols). So DistinctCount
would equal RowCount
when cols are grouping cols, but DistinctCount
would be equal to 1st branch if not.
4d12d4d
to
e7051d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: the separate operator, I think I'll just keep as-is for now. There are pros and cons to each approach, and I don't think we gain much by switching. We can always change later if we decide it's not working well.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)
pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
How so? The
RowCount
of the distinct operator in the error case is equal to theRowCount
of its input. But it'sDistinctCount
should also be equal to theRowCount
of its input, which the code above would not do.Maybe I should put this new code only in the 2nd branch of that code (when cols are grouping cols). So
DistinctCount
would equalRowCount
when cols are grouping cols, butDistinctCount
would be equal to 1st branch if not.
But you'd need to be sure that the colSet
is equal to groupingColSet
. If there are two grouping columns and colSet
only contains one of them, then this logic is incorrect. The GroupBy
and DistinctOn
operators don't change the distinct counts of grouping columns, regardless of whether ErrOnDup
is set or not. (ErrOnDup
does indicate something about the row count, though, which you've already handled above in buildGroupBy
).
So the distinct count for colSet
here should be the same as the distinct count of colSet
in the input, which is what line 1546 is already doing (copying from input).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
But you'd need to be sure that the
colSet
is equal togroupingColSet
. If there are two grouping columns andcolSet
only contains one of them, then this logic is incorrect. TheGroupBy
andDistinctOn
operators don't change the distinct counts of grouping columns, regardless of whetherErrOnDup
is set or not. (ErrOnDup
does indicate something about the row count, though, which you've already handled above inbuildGroupBy
).So the distinct count for
colSet
here should be the same as the distinct count ofcolSet
in the input, which is what line 1546 is already doing (copying from input).
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
Build failed (retrying...) |
bors r- optbuilder testfiles need a rewrite: https://teamcity.cockroachdb.com/viewLog.html?buildId=1785780&buildTypeId=Cockroach_UnitTests |
Canceled |
Previously, the INSERT..ON CONFLICT DO NOTHING statement raised an error if its input contained rows which conflicted with one another. For example: CREATE TABLE t (x INT PRIMARY KEY) INSERT INTO t (x) VALUES (1), (1) ON CONFLICT DO NOTHING This raised a unique key conflict error rather than doing nothing. This commit fixes that by wrapping the input in one or more UpsertDistinctOn operators - one for each unique key in the target table. This ensures that any duplicates which might cause a conflict are removed from the input. Fixes cockroachdb#37880 Fixes cockroachdb#44434 Release note (sql change): Duplicate rows in the input to an INSERT..ON CONFLICT DO NOTHING statement will now be ignored rather than triggering an error.
bors r+ |
Build failed |
bors r+ |
Build failed |
bors r+ |
Build succeeded |
Previously, the INSERT..ON CONFLICT DO NOTHING statement raised an error
if its input contained rows which conflicted with one another. For example:
CREATE TABLE t (x INT PRIMARY KEY)
INSERT INTO t (x) VALUES (1), (1) ON CONFLICT DO NOTHING
This raised a unique key conflict error rather than doing nothing.
This commit fixes that by wrapping the input in one or more UpsertDistinctOn
operators - one for each unique key in the target table. This ensures that
any duplicates which might cause a conflict are removed from the input.
Fixes #37880
Fixes #44434
Release note (sql change): Duplicate rows in the input to an INSERT..ON
CONFLICT DO NOTHING statement will now be ignored rather than triggering
an error.