
kv: timetravel queries do not error on ClearRange'd data #31563

Closed
eriktrinh opened this issue Oct 17, 2018 · 10 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@eriktrinh

eriktrinh commented Oct 17, 2018

Describe the problem

After a table is dropped and its underlying data has been deleted by the ClearRange command, queries with a transaction timestamp from before the drop serve empty, invalid results. They should instead return an error: by the time ClearRange is issued, the GC TTL has already passed, so reads at those old timestamps are no longer valid.

This also affects DROP INDEX when index truncation uses ClearRange: if a transaction uses an older version of the table descriptor that still contains the index, incorrect results are returned whenever that index is used.

To Reproduce

  1. create table t(i int primary key);
  2. insert some data into t
  3. note timestamp from select now();
  4. (optionally) alter table t configure zone with gc.ttlseconds={ttlseconds}
  5. drop table t
  6. wait out the GC period until the drop job has completed
  7. select * from t as of system time '{timestamp from 3}'
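The steps above can be condensed into one SQL session (a sketch; the timestamp and TTL values are placeholders, and step 4 is written with the `CONFIGURE ZONE USING` syntax shown later in the thread):

```sql
CREATE TABLE t (i INT PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
SELECT now();  -- note this timestamp for step 7
ALTER TABLE t CONFIGURE ZONE USING gc.ttlseconds = 5;  -- optional: shorten the GC TTL
DROP TABLE t;
-- wait out the GC TTL until the drop job completes, then:
SELECT * FROM t AS OF SYSTEM TIME '<timestamp from step 3>';
-- observed: empty, invalid results rather than a GC threshold error
```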

Jira issue: CRDB-4786

@eriktrinh
Author

cc: @vivekmenezes

@vivekmenezes vivekmenezes added A-kv-client Relating to the KV client and the KV interface. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data) labels Oct 19, 2018
@vivekmenezes vivekmenezes added this to Triage in Disaster Recovery Backlog via automation Oct 19, 2018
craig bot pushed a commit that referenced this issue Oct 19, 2018
31326: sql: use ClearRange for index truncation r=eriktrinh a=eriktrinh

This change makes the deletion of index data use the ClearRange batch
request. DROP INDEX schema changes in the same transaction as the one
which created the table still use the slower DelRange because
ClearRange cannot be run inside a transaction and will remove write
intents of the index keys which have not been resolved in the
transaction.

This deletion of index data happens once the GC TTL period has passed
and it is safe to remove all the data. See PR #30566, which delays
this data deletion.

Note: See #31563 for a limitation where queries with old timestamps return inconsistent results for data that has been deleted with ClearRange.

Fixes #20696.

Release note (performance improvement): Deletion of index data is faster.

Co-authored-by: Erik Trinh <erik@cockroachlabs.com>
@vivekmenezes vivekmenezes moved this from Triage to New schema lease in Disaster Recovery Backlog Oct 19, 2018
@vivekmenezes vivekmenezes moved this from New schema lease to Backlog for later in Disaster Recovery Backlog Oct 19, 2018
@knz knz added S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. and removed S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data) labels Nov 12, 2018
@andreimatei
Contributor

@vivekmenezes is there a plan here? This seems like a biggie to me; I don't think s-3 does it justice. Particularly since I think this now affects dropped indexes, which is much worse I'd say than dropped tables.

@vivekmenezes
Contributor

@andreimatei there is no plan for this. I think there is value here in considering returning an error from SQL-land to handle this case.

@dt dt moved this from Untriaged to Backlog: Disowned in Disaster Recovery Backlog Jul 16, 2019
@dt dt added this to Incoming in KV via automation Jul 16, 2019
@dt dt removed this from Backlog: Disowned in Disaster Recovery Backlog Jul 16, 2019
@lunevalex lunevalex moved this from Incoming to Transactions in KV Jul 28, 2020
@tbg tbg added this to Considered in KV 21.1 Stability Period Sep 10, 2020
@tbg tbg moved this from Considered to SH in KV 21.1 Stability Period Sep 10, 2020
@irfansharif irfansharif changed the title timetravel queries do not error on ClearRange'd data kv: timetravel queries do not error on ClearRange'd data Sep 10, 2020
@tbg tbg moved this from SH to 21.2 Radar in KV 21.1 Stability Period Mar 4, 2021
@tbg
Member

tbg commented Mar 8, 2021

Looking at #31504 which is a PR that attempted to address the issue, it seems that we got hung up on the semantics of moving the GCThreshold forward. The PR made setting a GCThreshold part of ClearRange, which would solve the issue observed here. However, since the GC threshold is per range, this raised questions about how this might affect unrelated colocated data not targeted by the ClearRange.

Also, has this been fixed? On ./cockroach demo on today's master:

demo@127.0.0.1:26257/movr> create table t(i int primary key); insert into t values (1), (2), (3); select now();
                  now
---------------------------------------
  2021-03-08 14:11:52.597502+00:00:00
(1 row)

demo@127.0.0.1:26257/movr> alter table t configure zone using gc.ttlseconds=5;
CONFIGURE ZONE 1

demo@127.0.0.1:26257/movr> drop table t; select pg_sleep(10);
  pg_sleep
------------
    true
(1 row)

demo@127.0.0.1:26257/movr> select * from t as of system time '2021-03-08 14:11:52.597502+00:00:00';
ERROR: relation "t" does not exist
SQLSTATE: 42P01
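For contrast: when old values are removed by normal MVCC garbage collection (which advances the range's GC threshold), a read below that threshold fails loudly. The error below is approximate, from memory of CockroachDB's behavior, not output reproduced in this thread:

```sql
SELECT * FROM t AS OF SYSTEM TIME '<timestamp below the GC threshold>';
-- ERROR: batch timestamp ... must be after replica GC threshold ...
```

The original complaint is that ClearRange removed the data without moving that threshold, so a time-travel read could return empty results instead of an error like this.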

It seems as though we're smart enough to prevent use of the foo descriptor once it's deleted:

demo@127.0.0.1:26257/movr> create table foo (id int primary key); drop table foo; select * from foo as of system time interval '-1h';
ERROR: relation "foo" does not exist
SQLSTATE: 42P01

Maybe @ajwerner can point to a specific change SQL has made here.

@ajwerner
Contributor

ajwerner commented Mar 8, 2021

This is funny. I have a PR open right now to add back the ability to read data from a time before a descriptor was deleted. I “broke” it during this release. It turns out to be a somewhat important property for correctness in some cases, namely if you have a read only transaction which knows (via pg catalog or something like that) that some table exists but then cannot interact with it. It was one of the things breaking the randomized schema change tests.

#61493

This does seem problematic re: ClearRange. What I wish we'd do is set the GC threshold in the same batch.

@ajwerner
Contributor

ajwerner commented Mar 8, 2021

The real motivation for the above PR isn't about table data but rather about types and schemas, which have their descriptors removed immediately without a TTL. However, the same logic is what enables the code to discover an old table descriptor after the TTL has expired and the data has been removed.

@ajwerner
Contributor

ajwerner commented Mar 8, 2021

I've stewed on this a bit and I now don't think that the GC threshold is necessarily the right thing. Unfortunately this space is sort of complex. Ideally we'd find a way to expose a consistent view of the catalog to time travel queries. Right now we just blindly move the read timestamp of the kv txn and things almost perfectly just work out. However, this has its own problems, namely that we use historical privileges (https://groups.google.com/a/cockroachlabs.com/g/sql-schema-team/c/L4oUTiceGY8/m/J030sIylAQAJ, #51861). Ideally what we'd do is know the status of these dropped descriptors.

However, it's even worse than all of this; we support setting a GC TTL for an index. We would need to track the status of that index somewhere and currently that somewhere is not on the descriptor. The point of all of this is that our time travel queries when it comes to schema are somewhat under-considered. Fixing them up will require a reasonably big investment. In the short term we are working slowly on building better abstractions. Namely on pulling the catalog functionality which drives virtual table usage into the catalog layer of the code. At that point we could talk about doing fancier things.

As far as the above PR, there is a more ideal thing we could do that still wouldn't be perfect. My issue is that I really want tables which have not been GC'd to still be accessible -- but I'm fine with preventing access after the descriptor has been dropped. The problem is that right now, when an AS OF SYSTEM TIME query wants to find the relevant descriptor, the way we do it is by having the current leased descriptor in memory and then walking the chain backwards using the ModificationTime values. This means that if we don't have a cached descriptor newer than the one we need, we can't find the relevant descriptor. The way the code used to handle that was that any time it could not find a descriptor, it would just go to the store to try to read it; in that case, we lose any insight into the present state of the descriptor.

The fix here is to instead take the timestamp we want and do an ExportRequest to get the version of the descriptor that applies as well as its successor. If we did that, then, in principle, we could also go grab the most recent version and detect whether the descriptor has actually been dropped. The only flaw with that approach is that there's no way to paginate the versions of a key, so if a user somehow had a ton of versions of a big descriptor, we might be in trouble. FWIW that's the case today; we just would accumulate that memory more slowly, so I'm inclined to not immediately count that as a disqualification. Cross-referencing with #61623, which I just created as loosely related.

tl;dr

  • Unfortunately we live with this for the time being
  • Schema will track this as a broader problem to solve related to historical schema access
  • There exists an intermediate state that allows us to solve this specific issue, which we can pursue in 21.2: revitalize the reverted change in sql/catalog/lease: revert fallback change and fallback also when doing by-id lookups #61493, and fail with a GC TTL error for tables which have been fully dropped.
  • Dropped and clear-ranged indexes pose an interesting challenge that needs some additional work even with the above point.

cc @vy-ton as a problem area

@tbg tbg added this to Triage in SQL Foundations via automation Mar 9, 2021
@tbg tbg removed this from Transactions in KV Mar 9, 2021
@tbg tbg removed this from 21.2 Radar in KV 21.1 Stability Period Mar 9, 2021
@tbg tbg removed the A-kv-client Relating to the KV client and the KV interface. label Mar 9, 2021
@ajwerner ajwerner moved this from Triage to Backlog: transactional schema changes in SQL Foundations Mar 9, 2021
@ajwerner
Contributor

The solution here should probably have something to do with #62585.

@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@postamar postamar moved this from Backlog: transactional schema changes to Triage in SQL Foundations Mar 8, 2022
@ajwerner
Contributor

This should go away magically after #70429.

@ajwerner ajwerner moved this from Triage to Cold storage in SQL Foundations Mar 11, 2022
@ajwerner
Contributor

This is fixed for 23.1, closing.

SQL Foundations automation moved this from Cold storage to Done Mar 13, 2023
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023