
kv: timetravel queries do not error on ClearRange'd data #31563

Closed
eriktrinh opened this issue Oct 17, 2018 · 10 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@eriktrinh

eriktrinh commented Oct 17, 2018

Describe the problem

After a table is dropped and its underlying data has been deleted by the ClearRange command, queries with a transaction timestamp from before the drop serve empty, invalid results. They should instead return an error: by the time ClearRange is issued, the GC TTL has already passed, so reads at those old timestamps are no longer valid.

This also affects DROP INDEX when index truncation uses ClearRange: if a transaction uses an older version of the table descriptor that still contains the index, incorrect results are returned whenever that index is used.

To Reproduce

  1. create table t(i int primary key);
  2. insert some data into t
  3. note timestamp from select now();
  4. (optionally) alter table t configure zone with gc.ttlseconds={ttlseconds}
  5. drop table t
  6. wait out the GC period until the drop job has completed
  7. select * from t as of system time '{timestamp from 3}'
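The steps above can be condensed into one SQL session (a sketch; the timestamp and TTL values are placeholders, and step 4 is written with the `CONFIGURE ZONE USING` syntax shown later in the thread):

```sql
CREATE TABLE t (i INT PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
SELECT now();  -- note this timestamp for step 7
ALTER TABLE t CONFIGURE ZONE USING gc.ttlseconds = 5;  -- optional: shorten the GC TTL
DROP TABLE t;
-- wait out the GC TTL until the drop job completes, then:
SELECT * FROM t AS OF SYSTEM TIME '<timestamp from step 3>';
-- observed: empty, invalid results rather than a GC threshold error
```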

Jira issue: CRDB-4786

@eriktrinh
Author

cc: @vivekmenezes

@vivekmenezes vivekmenezes added A-kv-client Relating to the KV client and the KV interface. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data) labels Oct 19, 2018
@vivekmenezes vivekmenezes added this to Triage in Disaster Recovery Backlog via automation Oct 19, 2018
craig bot pushed a commit that referenced this issue Oct 19, 2018
31326: sql: use ClearRange for index truncation r=eriktrinh a=eriktrinh

This change makes the deletion of index data use the ClearRange batch
request. DROP INDEX schema changes in the same transaction as the one
which created the table still use the slower DelRange because
ClearRange cannot be run inside a transaction and will remove write
intents of the index keys which have not been resolved in the
transaction.

This deletion of index data happens once the GC TTL period has passed
and it is safe to remove all the data. See PR #30566, which delays
this data deletion.

Note: See #31563 for a limitation where queries with old timestamps return inconsistent results for data that has been deleted with ClearRange.

Fixes #20696.

Release note (performance improvement): Deletion of index data is faster.

Co-authored-by: Erik Trinh <erik@cockroachlabs.com>
@vivekmenezes vivekmenezes moved this from Triage to New schema lease in Disaster Recovery Backlog Oct 19, 2018
@vivekmenezes vivekmenezes moved this from New schema lease to Backlog for later in Disaster Recovery Backlog Oct 19, 2018
@knz knz added S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. and removed S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data) labels Nov 12, 2018
@andreimatei
Contributor

@vivekmenezes is there a plan here? This seems like a biggie to me; I don't think s-3 does it justice. Particularly since I think this now affects dropped indexes, which is much worse I'd say than dropped tables.

@vivekmenezes
Contributor

@andreimatei there is no plan for this. I think there is value here in considering returning an error from SQL-land to handle this case.

@dt dt moved this from Untriaged to Backlog: Disowned in Disaster Recovery Backlog Jul 16, 2019
@dt dt added this to Incoming in KV via automation Jul 16, 2019
@dt dt removed this from Backlog: Disowned in Disaster Recovery Backlog Jul 16, 2019
@lunevalex lunevalex moved this from Incoming to Transactions in KV Jul 28, 2020
@tbg tbg added this to Considered in KV 21.1 Stability Period Sep 10, 2020
@tbg tbg moved this from Considered to SH in KV 21.1 Stability Period Sep 10, 2020
@irfansharif irfansharif changed the title timetravel queries do not error on ClearRange'd data kv: timetravel queries do not error on ClearRange'd data Sep 10, 2020
@tbg tbg moved this from SH to 21.2 Radar in KV 21.1 Stability Period Mar 4, 2021
@tbg
Member

tbg commented Mar 8, 2021

Looking at #31504 which is a PR that attempted to address the issue, it seems that we got hung up on the semantics of moving the GCThreshold forward. The PR made setting a GCThreshold part of ClearRange, which would solve the issue observed here. However, since the GC threshold is per range, this raised questions about how this might affect unrelated colocated data not targeted by the ClearRange.

Also, has this been fixed? On ./cockroach demo on today's master:

demo@127.0.0.1:26257/movr> create table t(i int primary key); insert into t values (1), (2), (3); select now();
                  now
---------------------------------------
  2021-03-08 14:11:52.597502+00:00:00
(1 row)

demo@127.0.0.1:26257/movr> alter table t configure zone using gc.ttlseconds=5;
CONFIGURE ZONE 1

demo@127.0.0.1:26257/movr> drop table t; select pg_sleep(10);
  pg_sleep
------------
    true
(1 row)

demo@127.0.0.1:26257/movr> select * from t as of system time '2021-03-08 14:11:52.597502+00:00:00';
ERROR: relation "t" does not exist
SQLSTATE: 42P01
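For contrast: when old values are removed by normal MVCC garbage collection (which advances the range's GC threshold), a read below that threshold fails loudly. The error below is approximate, from memory of CockroachDB's behavior, not output reproduced in this thread:

```sql
SELECT * FROM t AS OF SYSTEM TIME '<timestamp below the GC threshold>';
-- ERROR: batch timestamp ... must be after replica GC threshold ...
```

The original complaint is that ClearRange removed the data without moving that threshold, so a time-travel read could return empty results instead of an error like this.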

It seems as though we're smart enough to prevent use of the foo descriptor once it's deleted:

demo@127.0.0.1:26257/movr> create table foo (id int primary key); drop table foo; select * from foo as of system time interval '-1h';
ERROR: relation "foo" does not exist
SQLSTATE: 42P01

Maybe @ajwerner can point to a specific change SQL has made here.

@ajwerner
Contributor

ajwerner commented Mar 8, 2021

This is funny. I have a PR open right now to add back the ability to read data from a time before a descriptor was deleted. I “broke” it during this release. It turns out to be a somewhat important property for correctness in some cases, namely if you have a read only transaction which knows (via pg catalog or something like that) that some table exists but then cannot interact with it. It was one of the things breaking the randomized schema change tests.

#61493

This does seem problematic re: ClearRange. What I wish we'd do is set the GC threshold in the same batch.

@ajwerner
Contributor

ajwerner commented Mar 8, 2021

The real motivation for the above PR isn't about table data but rather about types and schemas, which have their descriptors removed immediately without a TTL. However, the same logic is what enables the code to discover an old table descriptor after the TTL has expired and the data has been removed.

@ajwerner
Contributor

ajwerner commented Mar 8, 2021

I've stewed on this a bit and I now don't think that the GC threshold is necessarily the right thing. Unfortunately this space is sort of complex. Ideally we'd find a way to expose a consistent view of the catalog to time travel queries. Right now we just blindly move the read timestamp of the kv txn and things almost perfectly just work out. However, this has its own problems, namely that we use historical privileges (https://groups.google.com/a/cockroachlabs.com/g/sql-schema-team/c/L4oUTiceGY8/m/J030sIylAQAJ, #51861). Ideally what we'd do is know the status of these dropped descriptors.

However, it's even worse than all of this; we support setting a GC TTL for an index. We would need to track the status of that index somewhere and currently that somewhere is not on the descriptor. The point of all of this is that our time travel queries when it comes to schema are somewhat under-considered. Fixing them up will require a reasonably big investment. In the short term we are working slowly on building better abstractions. Namely on pulling the catalog functionality which drives virtual table usage into the catalog layer of the code. At that point we could talk about doing fancier things.

As far as the above PR, there is a more ideal thing we could do that still wouldn't be perfect. My issue is that I really want tables which have not been GC'd to still be accessible -- but I'm fine with preventing access after the descriptor has been dropped. The problem is that right now, when an AS OF SYSTEM TIME query wants to find the relevant descriptor, the way we do it is by having the current leased descriptor in memory and then walking the chain backwards using the ModificationTime values. This means that if we don't have a cached descriptor newer than the one we need, we can't find the relevant descriptor. The way the code used to handle that was that any time it could not find a descriptor, it would just go to the store to try to read it; in that case, we lose any insight into the present state of the descriptor.

The fix here is to instead take the timestamp we want and do an ExportRequest to get the version of the descriptor that applies as well as its successor. If we did that, then, in principle, we could also go grab the most recent version and detect whether the descriptor has actually been dropped. The only flaw with that approach is that there's no way to paginate the versions of a key, so if a user somehow had a ton of versions of a big descriptor, we might be in trouble. FWIW that's the case today; we just would accumulate that memory more slowly, so I'm inclined to not immediately count that as a disqualification. Cross-referencing with #61623, which I just created as loosely related.

tl;dr

  • Unfortunately we live with this for the time being
  • Schema will track this as a broader problem to solve related to historical schema access
  • There exists an intermediate state that allows us to solve this specific issue, which we can pursue in 21.2: revitalize the reverted change in sql/catalog/lease: revert fallback change and fallback also when doing by-id lookups #61493, and fail with a GC TTL error for tables which have been fully dropped.
  • Dropped and clear-ranged indexes pose an interesting challenge that needs some additional work even with the above point.

cc @vy-ton as a problem area

@tbg tbg added this to Triage in SQL Foundations via automation Mar 9, 2021
@tbg tbg removed this from Transactions in KV Mar 9, 2021
@tbg tbg removed this from 21.2 Radar in KV 21.1 Stability Period Mar 9, 2021
@tbg tbg removed the A-kv-client Relating to the KV client and the KV interface. label Mar 9, 2021
@ajwerner ajwerner moved this from Triage to Backlog: transactional schema changes in SQL Foundations Mar 9, 2021
@ajwerner
Contributor

The solution here should probably have something to do with #62585.

@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@postamar postamar moved this from Backlog: transactional schema changes to Triage in SQL Foundations Mar 8, 2022
@ajwerner
Contributor

This should go away magically after #70429.

@ajwerner ajwerner moved this from Triage to Cold storage in SQL Foundations Mar 11, 2022
@ajwerner
Contributor

This is fixed for 23.1, closing.

SQL Foundations automation moved this from Cold storage to Done Mar 13, 2023
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023