sql: rollback to savepoint broken in mixed-version (19.2/20.1) clusters #46372

andreimatei · 2020-03-20T17:02:25Z

A 20.1 node rolling back to a savepoint doesn't affect request evaluation on 19.2 nodes, as 19.2 doesn't understand about ignored sequence number ranges. This means that reads served by old nodes will still return values that should have been rolled back, and they'll also commit them.
It's not the highest impact bug in the world seems it seems dubious to start using savepoints in a mixed-version cluster since some nodes would error out, but if you do start using it it's not good.

I guess we should gate execution of savepoints on a cluster version. One problem is with rolling back to the cockroach_restart savepoint which, in 20.1 nodes, uses the same ignored seqnum mechanism when the rollback happens from the Open state. That's a bit of an edge case, as generally one rolls back to cockroach_restart after a retriable error (so, not from the Open state).

Repro below. t is a table with the leaseholder on 19.2, t2 has the leaseholder on 20.1. After a rollback, writes to the latter go away, but not to the former.

root@localhost:26259/defaultdb> begin;
BEGIN
root@localhost:26259/defaultdb  OPEN> savepoint a;
SAVEPOINT
root@localhost:26259/defaultdb  OPEN> insert into t(x) values (1);
INSERT 1
root@localhost:26259/defaultdb  OPEN> insert into t2(x) values (1);
INSERT 1
root@localhost:26259/defaultdb  OPEN> select * from t;
  x
-----
  1
(1 row)
root@localhost:26259/defaultdb  OPEN> select * from t2;
  x
-----
  1
(1 row)
root@localhost:26259/defaultdb  OPEN> rollback to savepoint a;
ROLLBACK
root@localhost:26259/defaultdb  OPEN> select * from t;
  x
-----
  1
(1 row)
root@localhost:26259/defaultdb  OPEN> select * from t2;
  x
-----
(0 rows)

The text was updated successfully, but these errors were encountered:

knz · 2020-03-20T18:45:36Z

oh I know what to do with this

46391: kv/concurrency: never regress timestamp in lockTable on lock acquisition r=sumeerbhola a=nvanbenschoten Fixes #46290. Fixes #43735. Fixes #41425. Fixes #43809. This change adjusts the lockTable to be more lenient to the timestamp of new lock acquisitions. Specifically, it lifts the restriction that all calls to AcquireLock use monotonically increasing timestamps. Instead, it properly handles apparent timestamp regressions by ratcheting lock timestamps instead of replacing them directly. This matches the corresponding behavior in MVCC: https://github.com/cockroachdb/cockroach/blob/92107b551bbafe54fddb496442c590cb6feb5d65/pkg/storage/mvcc.go#L1631 This leniency is needed for sequences of events like the following: - txn A acquires lock at epoch 1, ts 10 - txn B pushes txn A to ts 20 - txn B updates lock to ts 20 - txn A restarts at ts 15 without noticing that it has been pushes - txn A re-acquires lock at epoch 2, ts 15 - we hit the lock timestamp regression assertion We see this frequently in CDC roachtests because the rangefeed processor performs high-priority timestamp pushes on long-running transactions. Outside of CDC, this is rare in our system. Release note (bug fix): CDC no longer combines with long running transactions to trigger an assertion. Release justification: fixes a high-priority bug in existing functionality. The bug could crash a server if the right sequence of events occurred. This was typically rare, but was much more common when CDC was in use. This is a fairly large PR, but there's only a single line of non-testing code touched. 46415: sql: misc savepoint fixes r=andreimatei a=knz Fixes #46372 Release justification: bug fixes and low-risk updates to new functionality These commits correct a few gaps in the recent implementation. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

andreimatei added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Mar 20, 2020

andreimatei self-assigned this Mar 20, 2020

jordanlewis mentioned this issue Mar 20, 2020

20.1 release blockers list #45599

Closed

83 tasks

andreimatei mentioned this issue Mar 20, 2020

kv: idempotency failure across implicit commit breaks refreshing a parallel commit request #46341

Closed

knz self-assigned this Mar 21, 2020

knz mentioned this issue Mar 23, 2020

sql: misc savepoint fixes #46415

Merged

craig bot closed this as completed in #46415 Mar 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: rollback to savepoint broken in mixed-version (19.2/20.1) clusters #46372

sql: rollback to savepoint broken in mixed-version (19.2/20.1) clusters #46372

andreimatei commented Mar 20, 2020 •

edited

Loading

knz commented Mar 20, 2020

sql: rollback to savepoint broken in mixed-version (19.2/20.1) clusters #46372

sql: rollback to savepoint broken in mixed-version (19.2/20.1) clusters #46372

Comments

andreimatei commented Mar 20, 2020 • edited Loading

knz commented Mar 20, 2020

andreimatei commented Mar 20, 2020 •

edited

Loading