
Multi-record UPSERT inserts duplicate values in PRIMARY KEY, resulting in inconsistent results #44466

Closed
mrigger opened this issue Jan 28, 2020 · 12 comments · Fixed by #45372
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

mrigger commented Jan 28, 2020

Consider the following testcase:

CREATE TABLE t0(c0 INT PRIMARY KEY, c1 BOOL, c2 INT UNIQUE);
INSERT INTO t0(c0) VALUES (0);
UPSERT INTO t0(c2, c0) VALUES (0, 0), (1, 0);
SELECT * FROM t0; -- {0 | NULL | 1}

A single row is fetched, while the following query fetches two records with a duplicate value for c0:

SELECT c0 FROM t0; -- {0, 0}

I found this issue in CockroachDB based on commit 1917a12003daad100b491e2b0195d7df5909e63f.

@yuzefovich yuzefovich added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 29, 2020
rohany (Contributor) commented Jan 30, 2020

I found the cause of this; I'll have to dig in more to see why it happens:

  output row: ['movr']
  rows affected: 1
  Scan /Table/59/1/0{-/#}
  querying next range at /Table/59/1/0
  r59: sending batch 1 Scan to (n1,s1):1
  fetched: /t0/primary/0 -> NULL
  Put /Table/59/1/0/0 -> /TUPLE/3:3:Int/0
  Del /Table/59/2/NULL/0/0
  CPut /Table/59/2/0/0 -> /BYTES/0x88 (expecting does not exist)
  Put /Table/59/1/0/0 -> /TUPLE/3:3:Int/1
  Del /Table/59/2/NULL/0/0
  CPut /Table/59/2/1/0 -> /BYTES/0x88 (expecting does not exist)
  querying next range at /Table/59/1/0/0
  r59: sending batch 2 Put, 2 CPut, 2 Del, 1 EndTxn to (n1,s1):1
  fast path completed
  rows affected: 2
  output row: ['movr']
  rows affected: 1
  output row: ['movr']
  rows affected: 1

The upsert issues Del /Table/59/2/NULL/0/0 twice rather than deleting /Table/59/2/0/0 the second time. A query that then does an index scan on index 2 sees 2 rows.

rohany (Contributor) commented Feb 4, 2020

I investigated this more, and it is because the upserter cannot see the writes that it makes. It first updates the row from 0, NULL, NULL to 0, NULL, 0, but then cannot see that update. So it next deletes 0, NULL, NULL and inserts 0, NULL, 1 instead of deleting 0, NULL, 0. Based on an offline conversation with @RaduBerinde, this seems like something that was intended in the optimizer?
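To make the failure mode concrete, here is a minimal hypothetical sketch (plain Python, not CockroachDB code) of an upserter that reads the table once up front and then applies every input row against that stale snapshot, reproducing the corrupted secondary index from the testcase above:

```python
# Hypothetical simulation: the upsert reads existing rows once, then applies
# each VALUES row against that stale snapshot, never seeing its own writes.

primary = {0: {"c0": 0, "c1": None, "c2": None}}    # PK -> row
secondary = {None: 0}                                # c2 -> c0 (unique index)

snapshot = {k: dict(v) for k, v in primary.items()}  # the single up-front read

for c2, c0 in [(0, 0), (1, 0)]:                      # UPSERT ... VALUES (0,0),(1,0)
    old = snapshot.get(c0)                           # stale: never sees prior writes
    primary[c0] = {"c0": c0, "c1": old["c1"], "c2": c2}
    secondary.pop(old["c2"], None)                   # deletes the NULL entry twice,
    secondary[c2] = c0                               # never deletes the c2=0 entry

print(primary)    # one row via the primary index
print(secondary)  # two entries via the secondary index: inconsistent
```

This mirrors the trace: Del /Table/59/2/NULL/0/0 is issued twice, and the c2=0 index entry written by the first input row is never cleaned up.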

RaduBerinde (Member):
The optimizer's intended behavior w.r.t. duplicate values in UPSERT is described in #37880. In this case it looks like the PK and the index have become inconsistent (the first query returns one row and the second returns two). Querying each index directly makes this clearer:

root@127.149.115.8:37621/movr> SELECT c0, c2 FROM t0@primary;
  c0 | c2  
+----+----+
   0 |  1  
(1 row)

root@127.149.115.8:37621/movr> SELECT c0, c2 FROM t0@t0_c2_key;
  c0 | c2  
+----+----+
   0 |  0  
   0 |  1  
(2 rows)

Regardless of the semantics of UPSERT, this is definitely a bug. It seems like the "cannot affect row a second time" error should have been triggered.
CC @andy-kimball

andy-kimball (Contributor):

My first look suggests that we'd need to add a DISTINCT ON clause on the primary key in order to fix this. However, there are downsides:

  1. In cases where we can't optimize it away, it'd be a non-negligible drag on performance.
  2. We'd be less compatible with PG, since PG errors when the same upsert statement attempts to touch the same row twice. Instead, we'd silently pick one of the duplicate values to apply.

I may be missing another option, but I think avoiding problems #1 and #2 above would require KV support. We'd need a way to tell KV to raise an error if a write tries to place an intent on a row that already has an intent from a previous write with seqNum >= X, where X = first seqNum used by the current statement.

CC @knz, @andreimatei, @nvanbenschoten, in case they have opinions on:

  1. How much of a perf concern is adding DISTINCT ON to every UPSERT statement (assuming we optimize it away in the common case where the input is a constant VALUES clause)?
  2. How difficult/disruptive would a solution at the KV level be? Does my suggestion above have fatal flaws? Is there an alternate better way?

knz (Contributor) commented Feb 5, 2020

Generally:

  1. I think the DISTINCT ON on the input will make more pg users happy? I vaguely recall there was a discussion about this before. No comments on performance.

  2. Regarding a KV solution: it's not unreasonable, because it's closer to what pg does, but there's a caveat (see below).

We'd need a way to tell KV to raise an error if a write tries to place an intent on a row that already has an intent from a previous write with seqNum >= X, where X = first seqNum used by the current statement.

That sounds complicated. I'd instead use the new machinery introduced to fix the Halloween problem: seqnums that mark statement boundaries.

Then the rule is simpler: we can ask MVCC to raise an error when creating an intent if there's already one past the current read seqnum. That's more or less a direct translation of pg's rule "no touching the same row twice under the same sub-transaction".
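As a hypothetical sketch of that rule (invented names; not KV code): a write errors out when the key already carries an intent written at or after the seqnum where the current statement began reading.

```python
# Sketch of the proposed MVCC-level check: reject a write to a key that was
# already written by the current statement (intent seqnum >= read seqnum).

class DuplicateWriteError(Exception):
    pass

def write_intent(intents, key, seqnum, read_seqnum):
    """intents maps key -> seqnum of the latest intent on that key."""
    existing = intents.get(key)
    if existing is not None and existing >= read_seqnum:
        raise DuplicateWriteError(f"key {key} already written by this statement")
    intents[key] = seqnum

intents = {}
write_intent(intents, "/t0/primary/0", seqnum=5, read_seqnum=5)      # first write: ok
try:
    write_intent(intents, "/t0/primary/0", seqnum=6, read_seqnum=5)  # same statement
except DuplicateWriteError as e:
    print(e)  # second write to the same key within the statement is rejected
```

Per the caveat that follows, such a check operates on KV keys, i.e. on individual column families, not on logical rows.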

The caveat: all these KV-based solutions can only detect duplicate writes when there is a single column family:

  1. If there are 2+ column families, it's possible for a complex INSERT ON CONFLICT clause to alternately touch different families of the same row without detecting a duplicate write.
  2. There's a weird special case when an UPDATE or INSERT ON CONFLICT updates a column that's part of the PK; in that case we delete the current row and create a new one. I believe (but am not sure) it's also possible for this to touch one column family without checking the others, in which case an earlier write may not be detected as a duplicate.

What I am trying to convey with these pitfalls is that we need a test suite that exercises all the combinations of code paths, and then to evaluate the solution using that test suite. If we don't, we'll get weird problems (like the one motivating this discussion).

nvanbenschoten (Member):

I agree with @knz - pushing this complexity into KV is possible, but before doing so, we'd need to make sure that the semantics we want can be defined at the per-column-family level rather than the per-row level. Verifying that seems like a non-trivial task, and I don't know that any one person has a good enough sense of when we read and write (and omit reads and writes of) partial rows to be able to say a per-column-family check would be sufficient.

I'd like to hear a bit more about how we could optimize the DISTINCT ON away for constant VALUES clauses. That is by far the most common use case of UPSERT, so if we could make this behavior free in that case and only marginally slow down performance in the others in order to fully mirror PG's behavior, that would be a compelling option.

RaduBerinde (Member):

I'd like to hear a bit more about how we could optimize the DISTINCT ON away for constant VALUES clauses

Essentially, it would be the optimizer running the DISTINCT ON logic during planning. So it won't be free, but the optimizer already interns Datums, so it could compare pointers instead of encoding Datums.
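A hypothetical sketch of that plan-time check (invented names; not optimizer code): with interned datums, equal values share one canonical object, so distinctness of a constant VALUES clause can be decided by comparing object identities rather than re-encoding each datum.

```python
# Sketch: decide at plan time whether constant VALUES rows are provably
# distinct on the conflict columns, so the runtime distinct check can be
# dropped. `interner` maps a value to its canonical (interned) object.

def values_provably_distinct(rows, key_indices, interner):
    seen = set()
    for row in rows:
        key_vals = [row[i] for i in key_indices]
        if any(v is None for v in key_vals):
            continue  # NULLs are never equal under a unique index
        key = tuple(id(interner.setdefault(v, v)) for v in key_vals)  # pointer compare
        if key in seen:
            return False
        seen.add(key)
    return True

print(values_provably_distinct([(0, 0), (1, 0)], [0], {}))  # True: c2 values differ
print(values_provably_distinct([(0, 0), (0, 1)], [0], {}))  # False: duplicate c2=0
```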

knz (Contributor) commented Feb 5, 2020

Heads up: we found this blog post, and point 6 in it seems related.

I'd like to invite @karlseguin to post the example here that triggered the duplicate error on INSERT ON CONFLICT.

andy-kimball (Contributor):

@knz, I think you're confusing the INSERT ... DO NOTHING variation with the simple UPSERT case. That blog post and much of the discussion on #37880 are about INSERT ... DO NOTHING (i.e. we shouldn't raise any error on conflicts caused by input rows with duplicate key values). This issue is only about the UPSERT case. While the two are related, they differ in important ways, and fixing one does not fix the other.

andy-kimball (Contributor):

I tried to implement the DistinctOn solution, but ran into several issues:

  1. PG (usually, but not always) raises an error when the same row is updated twice. However, if I use the stock DistinctOn operator, CRDB would just arbitrarily apply one of the operations and ignore the others. Is that difference in behavior OK? We'd have to document it as undefined which operation CRDB applies when there are duplicates.

  2. It turns out that the DistinctOn operator doesn't have the right NULL behavior. It treats NULL values as equal to one another, whereas the Insert operator treats NULL values as not equal to one another. I don't see any reasonable way around this, short of implementing another variation of the DistinctOn operator. Ideas are welcome.

andy-kimball (Contributor):

Re: #2, one janky solution I thought about was creating an expression similar to this for nullable columns in a conflict index:

CREATE TABLE t (x INT PRIMARY KEY, y INT, UNIQUE (y));
SELECT DISTINCT ON (y, CASE WHEN y IS NULL THEN x ELSE NULL END) * FROM t

andy-kimball (Contributor):

I decided to go ahead with a new variation of the DistinctOn operator that has the right NULL behavior and that raises an error when it detects duplicate input rows.

andy-kimball added a commit to andy-kimball/cockroach that referenced this issue Feb 22, 2020
Previously, there were cases where an UPSERT or INSERT..ON CONFLICT
statement could operate on the same row more than once. This is a
problem because these statements are designed to read once to get
existing rows, and then to insert or update based on that information.
They don't expect that previous writes for the same statement will
affect the correctness of subsequent writes. This can lead to index
corruption in cases where a row "moves" from one location to another
when one of its index keys changes. See issue cockroachdb#44466 for an example.

This commit fixes the problem by introducing a new variation of the
DistinctOn operator that ensures that the input to the Upsert
operator never has duplicates. It differs from the regular DistinctOn
operator in two ways:

  1. Null behavior: UpsertDistinctOn treats NULL values as not equal
     to one another for purposes of grouping. Two rows having a NULL-
     valued grouping column will be placed in different groups. This
     differs from DistinctOn behavior, where the two rows would be
     grouped together. This behavior difference reflects SQL semantics,
     in which a unique index key still allows multiple NULL values.

  2. Duplicate behavior: UpsertDistinctOn raises an error if any
     distinct grouping contains more than one row. It has "input must
     be distinct" semantics rather than "make the input distinct"
     semantics. This is used to ensure that no row will be updated
     more than once.

The optbuilder now wraps the input to the Upsert operator with this
new UpsertDistinctOn operator.

Fixes cockroachdb#44466

Release note (sql change): UPSERT and INSERT..ON CONFLICT statements
will sometimes need to do an extra check to ensure that they never
update the same row twice. This may adversely affect performance in
cases where the optimizer cannot statically prove the extra check is
unnecessary.
andy-kimball added further commits to andy-kimball/cockroach that referenced this issue with the same commit message on Feb 25, 2020
craig bot pushed a commit that referenced this issue Feb 26, 2020
45372: sql: ensure that Upsert will never update the same row twice r=andy-kimball a=andy-kimball

Previously, there were cases where an UPSERT or INSERT..ON CONFLICT
statement could operate on the same row more than once. This is a
problem because these statements are designed to read once to get
existing rows, and then to insert or update based on that information.
They don't expect that previous writes for the same statement will
affect the correctness of subsequent writes. This can lead to index
corruption in cases where a row "moves" from one location to another
when one of its index keys changes. See issue #44466 for an example.

This PR fixes the problem by introducing a new variation of the
DistinctOn operator that ensures that the input to the Upsert
operator never has duplicates. It differs from the regular DistinctOn
operator in two ways:

  1. Null behavior: UpsertDistinctOn treats NULL values as not equal
     to one another for purposes of grouping. Two rows having a NULL-
     valued grouping column will be placed in different groups. This
     differs from DistinctOn behavior, where the two rows would be
     grouped together. This behavior difference reflects SQL semantics,
     in which a unique index key still allows multiple NULL values.

  2. Duplicate behavior: UpsertDistinctOn raises an error if any
     distinct grouping contains more than one row. It has "input must
     be distinct" semantics rather than "make the input distinct"
     semantics. This is used to ensure that no row will be updated
     more than once.

The optbuilder now wraps the input to the Upsert operator with this
new UpsertDistinctOn operator. In addition, there are several commits
that add optimization rules designed to remove these distinct operators
when they can be proven to not be necessary.

Fixes #44466


Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
@craig craig bot closed this as completed in 166f98e Feb 26, 2020
SQL Features (Deprecated - use SQL Experience board) automation moved this from Next up to Done Feb 26, 2020
bobvawter added a commit to cockroachdb/replicator that referenced this issue May 31, 2022
This change works around a CockroachDB implementation quirk wherein an UPSERT
is unable to read its own writes, which prevents the same row from being
upserted twice within the same statement. This also corrects a potential source
of data inconsistency in the applier code, if a single call to Apply contains
both upserts and deletes of the same row.

Both of the above issues can be addressed by sorting the mutations according to
their HLC time and then de-duplicating them by primary key.

The newly-added test in the apply package checks for the specific upsert
behavior, so that this workaround can be reconsidered if a future version of
CockroachDB allows duplicate rows within an UPSERT or CTE.

X-Ref: cockroachdb/cockroach#44466
X-Ref: cockroachdb/cockroach#70731
X-Ref: cockroachdb/cockroach#45372
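The workaround above can be sketched hypothetically (invented names; not the actual replicator code): sort the batch by HLC timestamp, then keep only the newest mutation per primary key, so no row is upserted twice in one statement.

```python
# Sketch: de-duplicate a batch of mutations by primary key, keeping the
# mutation with the latest HLC timestamp for each key.
from collections import namedtuple

Mutation = namedtuple("Mutation", ["hlc", "pk", "payload"])

def dedupe_mutations(muts):
    latest = {}
    for m in sorted(muts, key=lambda m: m.hlc):  # oldest first
        latest[m.pk] = m                         # a newer HLC overwrites an older one
    return sorted(latest.values(), key=lambda m: m.hlc)

batch = [Mutation(3, "a", "v3"), Mutation(1, "a", "v1"), Mutation(2, "b", "v2")]
print([m.payload for m in dedupe_mutations(batch)])  # ['v2', 'v3']
```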
bobvawter added further commits to cockroachdb/replicator that referenced this issue with the same commit message on May 31 and Jun 7, 2022
8 participants