sql: potential inconsistency after certain schema changes #44007

rohany · 2020-01-15T15:01:58Z

In some cases (such as interleaved tables or foreign keys), a table A has a reference to table B, and table B maintains a back reference to A. At least in cases where we drop an object that requires also removing the back reference from referenced table, I believe there is a chance for nodes to be in an inconsistent state.

For example, take the code path for dropping an index, with cascade. If the index being dropped is in use for a foreign key constraint, then the constraint is also dropped. What this means is that the appropriate foreign key constraint is removed from both the origin table and the referenced table in the same transaction.

Now because descriptors are cached and gossip is used to invalidate the caches, a node could end up in an inconsistent state with this deleted foreign key constraint. Imagine that a node gets the invalidation for the origin table, but has not yet received the gossip update for the referenced table -- then it will be in a state where the origin table says there isn't a foreign key, while the referenced table says that there is one. (assuming my understanding of gossip is correct)

I believe this problem could affect most places where we drop resources that have this reference and back reference (like dropping an interleaved index etc.), and I don't see if there is something in place that avoids this. It also feels like it is affects #43759 for the same reason.

Jira issue: CRDB-5259

rohany · 2020-01-15T15:09:12Z

Not that I know how gossip works, but could grouping descriptors modified in a txn into the same gossip message solve this problem? That would be difficult though, because different ranges could be affected by the descriptor writes.

ajwerner · 2020-01-15T15:10:13Z

We need to define what is meant by "inconsistent state". Let's make this more concrete with descriptor versions.

A1: Referenced by B
B1: References A
----- Txn Commits ----
A2: No reference from B
B2: No reference to A

Any combination of these pairs is valid (i.e. (A1,B1), (A2,B1), (A1,B2), (A2,B2)). The question I have is which combination of these lease pairs is problematic.

Not that I know how gossip works, but could grouping descriptors modified in a txn into the same gossip message solve this problem? That would be difficult though, because different ranges could be affected by the descriptor writes.

They will be gossiped at the same time but I don't think that's relevant to the discussion at hand. Let's forget about gossip and its role in cache invalidation as it's not that critical and is more of an optimization.

rohany · 2020-01-15T15:17:01Z

Generally, I'd imagine that either A1, B2 or A2, B1 are problematic, but it would depend on the specific case (foreign keys, interleaved tables), and what operation is being performed.

I think for operations that modify the descriptors we don't have to worry about this, so we just have to look at operations that just read the descriptors.

Take an insert on a table with a foreign key reference. If the origin table has the reference, but the referenced table doesn't, should we still check that the constraint is validated on the insert?

In the case of primary key changes, we aren't going to be deleting these references, but instead changing the objects on the descriptors that they point to (for example, in the interleaved case, we want to change the indexID). If we have one of these version mismatched pairs, then these references could point to objects which no longer exist on the newer version descriptors, which is definitely a problem.

ajwerner · 2020-01-15T15:21:30Z

Generally, I'd imagine that either A1, B2 or A2, B1 are problematic, but it would depend on the specific case (foreign keys, interleaved tables), and what operation is being performed.

It is critical for schema changes to worry about exactly this.

I think for operations that modify the descriptors we don't have to worry about this, so we just have to look at operations that just read the descriptors.

I disagree, great care has been taken to make sure that the transitions between versions are safe. Generally by modifying either the logical or physical schema layout of a table in one step but never both.

Take an insert on a table with a foreign key reference. If the origin table has the reference, but the referenced table doesn't, should we still check that the constraint is validated on the insert?

I'm still not sure I see a hazard at this point. Can you spell out a problematic case where we'll hit an error if in A1, B2 or A2, B1?

In the case of primary key changes, we aren't going to be deleting these references, but instead changing the objects on the descriptors that they point to (for example, in the interleaved case, we want to change the indexID). If we have one of these version mismatched pairs, then these references could point to objects which no longer exist on the newer version descriptors, which is definitely a problem.

This implies that the protocol for schema changes when changing a primary key will need to step through multiple versions.

ajwerner · 2020-01-15T15:22:06Z

If the origin table has the reference, but the referenced table doesn't, should we still check that the constraint is validated on the insert?

Either seems valid to me. That's a good bit of complexity to put into the optimizer to detect that case.

rohany · 2020-01-15T15:38:47Z

Is it not wrong for one node (on A1 B2) to have different behavior than another node on (A2 B1)?

Also, my understanding of the schema change system is that multiple stages are there to ensure that everything works when different versions of the same table exist in the cluster, not different versions of different tables.

ajwerner · 2020-01-15T15:41:14Z

Is it not wrong for one node (on A1 B2) to have different behavior than another node on (A2 B1)?

Not obviously, no. It seems wrong for there to be more than 2 behaviors total but if A1,B2 is like A2,B2 and A2,B1 is like A1,B1 that seems fine to me.

Also, my understanding of the schema change system is that multiple stages are there to ensure that everything works when different versions of the same table exist in the cluster, not different versions of different tables.

That seems correct to me. Perhaps we need to add a dependency mechanism. When a table refers to another it should do so at a minimum version or something. Then we could make it illegal to use mixed versions which are incompatible.

rohany · 2020-01-15T15:54:18Z

Oh that could work! Though I'm not sure how we could migrate all the existing descriptors to do that.

To clarify what I meant above "I think for operations that modify the descriptors we don't have to worry about this, so we just have to look at operations that just read the descriptors." -- the operations that modify descriptors don't have to worry about the mixed version cases, because they will always read (A2, B2) right?

ajwerner · 2020-01-15T16:01:47Z

the operations that modify descriptors don't have to worry about the mixed version cases, because they will always read (A2, B2) right?

At least in the scope of execution of a transaction I believe that's roughly correct. A statement that modifies descriptors will have references to both the old versions and the uncommitted copies. For the rest of the transaction the uncommitted versions will get used. They are stored in the connExecutor in the extraTxnState in the TableCollection.uncommittedVersions.
All this being said, it's critical that that transaction not change the physical layout of any table during the transaction. We'd like to get there eventually but that's a whole different project.

Though I'm not sure how we could migrate all the existing descriptors to do that.

Does it matter? Certainly it matters for the upcoming primary key change work but does it matter prior to that work? I'm not clear on whether we've identified a hazardous case that exists today.

rohany · 2020-01-15T16:08:33Z

All this being said, it's critical that that transaction not change the physical layout of any table during the transaction.

Can you clarify this?

Does it matter? Certainly it matters for the upcoming primary key change work but does it matter prior to that work? I'm not clear on whether we've identified a hazardous case that exists today.

I'll look around more and see if I can find something, but as of now, no.

ajwerner · 2020-01-15T16:11:23Z

All this being said, it's critical that that transaction not change the physical layout of any table during the transaction.

Can you clarify this?

Say we add a column to a table in a transaction. We cannot write to that column during that transaction. If we did then other concurrent transactions wouldn't know how to interpret data in the table. See #43057 / #42061 (comment) for that issue.

rohany · 2020-01-15T17:01:21Z

FWIW, the "issue" that I'm for-seeing with primary key changes on interleaved tables exists right now when dropping an interleaved index. In this case, A1 has the interleaved child in InterleavedBy while A2 doesn't, while B1 has index descriptor A1 points to, and B2 doesn't. In the code right now there is a possibility in the (A1, B2) state for some errors to get raised because the operations on A1 are looking for an index on B2 that they cannot find. We probably rarely see errors like this because the index would have to progress through all the schema change states (B2 in this case would be more like B6) before A2 was received by a node.

ajwerner · 2020-01-15T17:02:46Z

Sounds like you found a bug.

rohany · 2020-01-15T17:09:04Z

Sounds like you found a bug.

Actually, I'm not sure. It seems like the only uses are in places which modify the table descriptor, so they wouldn't enter this mixed version state in the first place.

rohany · 2020-01-15T18:04:19Z

@ajwerner and I discussed this offline. The summary of the discussion was:

This isn't really a problem right now, but could definitely become a problem in the future. For example, consider a chain of FK references where A -> B -> C. The optimizer could use this information to eliminate some checks, but if one of these tables was in this "mixed version" state where one of the references actually isn't correct, then this could lead to an incorrect result.
The above problem affects the planning of DML statements where the descriptors are not available in a transactional manner.
A possible solution is for each tableID reference on a table descriptor, a minimum version of the referenced table is also required. When the TableCollection is loading tables, it can see whether certain tables should be refreshed from because the versions they reference are too low. This solution however would require a complicated migration process.

Please update this if I missed anything or remembered incorrectly.

rohany added the A-schema-changes label Jan 15, 2020

knz added the C-investigation Further steps needed to qualify. C-label will change. label Jan 15, 2020

This was referenced Jan 15, 2020

Support primary key changes on interleaved tables #43759

Closed

*: long-running migrations #39182

Closed

mwang1026 added this to Triage in SQL Foundations via automation Jan 27, 2020

jordanlewis changed the title ~~potential inconsistency after certain schema changes~~ sql: potential inconsistency after certain schema changes Mar 18, 2020

thoszhang moved this from Needs Triage (4/28/2020) to Backlog in SQL Foundations Apr 30, 2020

ajstorm mentioned this issue Apr 28, 2021

sql: Repartition tables before dropping regions #64273

Merged

jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021

postamar moved this from Backlog to Tech Debt That Nobody Outside Of The Team Cares About in SQL Foundations Nov 10, 2022

exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: potential inconsistency after certain schema changes #44007

sql: potential inconsistency after certain schema changes #44007

rohany commented Jan 15, 2020 •

edited by cockroach-jira-scripts

rohany commented Jan 15, 2020

ajwerner commented Jan 15, 2020 •

edited

rohany commented Jan 15, 2020 •

edited

ajwerner commented Jan 15, 2020

ajwerner commented Jan 15, 2020 •

edited

rohany commented Jan 15, 2020

ajwerner commented Jan 15, 2020

rohany commented Jan 15, 2020 •

edited

ajwerner commented Jan 15, 2020

rohany commented Jan 15, 2020 •

edited

ajwerner commented Jan 15, 2020

rohany commented Jan 15, 2020 •

edited

ajwerner commented Jan 15, 2020

rohany commented Jan 15, 2020

rohany commented Jan 15, 2020 •

edited

sql: potential inconsistency after certain schema changes #44007

sql: potential inconsistency after certain schema changes #44007

Comments

rohany commented Jan 15, 2020 • edited by cockroach-jira-scripts

rohany commented Jan 15, 2020

ajwerner commented Jan 15, 2020 • edited

rohany commented Jan 15, 2020 • edited

ajwerner commented Jan 15, 2020

ajwerner commented Jan 15, 2020 • edited

rohany commented Jan 15, 2020

ajwerner commented Jan 15, 2020

rohany commented Jan 15, 2020 • edited

ajwerner commented Jan 15, 2020

rohany commented Jan 15, 2020 • edited

ajwerner commented Jan 15, 2020

rohany commented Jan 15, 2020 • edited

ajwerner commented Jan 15, 2020

rohany commented Jan 15, 2020

rohany commented Jan 15, 2020 • edited

rohany commented Jan 15, 2020 •

edited by cockroach-jira-scripts

ajwerner commented Jan 15, 2020 •

edited

rohany commented Jan 15, 2020 •

edited

ajwerner commented Jan 15, 2020 •

edited

rohany commented Jan 15, 2020 •

edited

rohany commented Jan 15, 2020 •

edited

rohany commented Jan 15, 2020 •

edited

rohany commented Jan 15, 2020 •

edited