Crash when deleting a row on a table with a unique index and similar PK #54629
Thanks for the report and for the reproduction steps. We'll look into this.
Note that if you create the
Also, if you drop the unique index, then drop the hidden
I wonder if the issue is that
I looked into this, and I believe that @dpkirchner's hypothesis is correct: the hidden
However, I'm not sure where the problem (or the fix) lies; the comments on
I decided to try a different approach of updating the deleter itself to ignore
This sounds like the problem. ExtraColumnID should never be set for the primary index. In particular, in this case the optimizer determines that it only needs to produce a value for column
The optimizer can't tell (through the catalog) whether the primary index has a column in
Perhaps the right solution here is to always have empty
Moving this over to SQL Experience because of their experience with primary key changes; we've reached the end of what the Execution team can do.
@lucy-zhang, this feels related to some of the things we were recently discussing about special properties of primary indexes.
Summarizing some discussion today: this seems to be a problem with the secondary index on
Looking at the descriptors using
it's apparent that the new primary index (ID 3) on
Note that in the implementation of primary key changes, we always generate a fresh primary index descriptor, which never has
@yuzefovich or anyone else, do you have any idea why we might be handling hidden columns incorrectly in the deleter specifically for secondary indexes? As for the primary key change itself, I think the right thing to do for
Thinking about this again, we probably shouldn't do this, since it'll break existing queries. So
One more observation: I put
in a logic test, and found that it sometimes passes without crashing. It's only when I add
From a UX perspective this makes sense to me. I suppose it's possible that the user could have queries which depend on rowid, but it feels unlikely to me. We should also consider the fact that other objects could have dependencies on the rowid column. For instance, there's nothing stopping a user from creating a secondary index which includes rowid in the key. Again, unlikely, but possible.
I think that the "alter primary key" operation does something wrong in this edge case because the comment on
I'm not seeing how this is true. When I tried setting up the conditions to reproduce the crash, with the following:
I see an empty
Were you seeing something different?
Hm, I don't remember; I was just looking at my old comment
It's quite possible that I wrote up my observations incorrectly.
@yuzefovich, would you or anyone else on the Execution team be able to take a look at what's going on with the deleter for secondary indexes, specifically, then? From my end, I don't see anything invalid about the primary index descriptor after the PK change, and while the behavior of a non-PK
Thinking about this again, I'm no longer entirely convinced that it's a good idea to just drop
One argument in favor of dropping
This, by the way, is also broken and crashes for the same reason:
I continued experimenting with this. It turns out that manually creating an index on
It doesn't even have to be a unique index, though that works too. I don't know if this actually "fixes" the crash, or if this is some sort of coincidental alignment of slice elements. But this all seems to suggest that the problem isn't with hidden columns per se, but something to do with the other indexes. One special thing that's going on here is that when we perform a PK swap and the prior PK was
cockroach/pkg/sql/alter_primary_key.go Lines 343 to 350 in 862edb7
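The comment above does not say which slices are involved, so the following is only a generic Go illustration of the kind of "coincidental alignment" it describes, not CockroachDB code: a position computed against one column list is used to index a slice laid out in a different order. lookupByOrdinal and lookupByID are hypothetical helpers. Under that assumption, adding or removing columns in either list changes whether the buggy lookup happens to work, which is one way an unrelated index could mask or expose such a bug.

```go
package main

import "fmt"

// lookupByOrdinal is a deliberately buggy, hypothetical lookup: it finds
// colID's position among ALL table columns, then uses that position to
// index row, which only holds the fetched columns. It returns the right
// value only when the two orderings happen to line up.
func lookupByOrdinal(tableCols []int, row []string, colID int) (string, bool) {
	for i, c := range tableCols {
		if c == colID {
			if i >= len(row) {
				// Without this guard, row[i] would panic with index out of range.
				return "", false
			}
			return row[i], true
		}
	}
	return "", false
}

// lookupByID finds colID's position in fetchedCols, the same ordering that
// row uses, so the result does not depend on coincidental alignment.
func lookupByID(fetchedCols []int, row []string, colID int) (string, bool) {
	for i, c := range fetchedCols {
		if c == colID && i < len(row) {
			return row[i], true
		}
	}
	return "", false
}

func main() {
	tableCols := []int{1, 2, 3, 4} // every column in the table
	fetched := []int{1, 4}         // the columns actually fetched
	row := []string{"a", "d"}      // values for the fetched columns, in fetched order

	v, ok := lookupByOrdinal(tableCols, row, 1)
	fmt.Printf("%q %v\n", v, ok) // "a" true: correct only by coincidence
	v, ok = lookupByOrdinal(tableCols, row, 2)
	fmt.Printf("%q %v\n", v, ok) // "d" true: wrong, column 2 was never fetched
	v, ok = lookupByOrdinal(tableCols, row, 4)
	fmt.Printf("%q %v\n", v, ok) // "" false: position 3 is past the end of row
	v, ok = lookupByID(fetched, row, 4)
	fmt.Printf("%q %v\n", v, ok) // "d" true
}
```

The ID-based lookup is the safer design because it resolves positions against the same ordering the data actually uses.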
Here is a simple diff that appears to fix the problem too:
Overall, it seems very suspicious to me that the exec factory modifies
Also, for what it's worth, this doesn't seem to have anything to do with
I am now pretty confident that #58153 is the right fix.
Thanks, Yahor. Do you mind briefly filling me in on what the bug was? We have an impending rewrite of primary key changes, and I want to make sure we keep doing the right thing (even though it seems like the descriptors are presently fine as they are).
I believe the bug is as follows:
Unfortunately, I don't think we have a person who really knows how mutations work (I guess I might be the person with the most knowledge, but my knowledge is very limited :/) and who is familiar with the primary key change, but I don't see anything wrong with the current approach.
Thanks for the explanation. At this point, I'm still not 100% convinced that it's safe to have non-PK columns in
This is the table schema we get:
Here,
This seems like an unpleasant case to have to account for, and it'd be better if we could somehow rewrite the metadata to avoid non-PK
In any case, this problem is probably beyond the scope of this particular issue. I'll investigate more and open a separate issue if necessary.
I talked to @jordanlewis about this offline, and I'm now persuaded that we don't have a real problem here. The impact of the
Describe the problem
Deleting a row in one of our tables causes the node to crash instantly, and the row remains. I believe this is the same issue as #47541, although I can't prove it.
This is likely a rare circumstance due to the poor indexing (see below).
To Reproduce
docker-compose.yaml: https://gist.github.com/dpkirchner/1ab4d1e0450a2d381c762837711cbb5f
docker exec -it crdb-1 cockroach sql --insecure -e 'create database test'
docker exec -it crdb-1 cockroach sql --insecure -d test
DELETE FROM foo WHERE ref1_name = 'a' AND ref2_id = 1;
ERROR: driver: bad connection
Expected behavior
I expected the row to be deleted and for the node not to crash.
Additional data / screenshots
Log of the full panic for Google-bait:
I'll send the logs to support@ after getting the issue ID, but I can also share them with whoever wants them; nothing proprietary here. The repro above works 100% of the time.
Environment:
Client app: cockroach sql
Additional context
What was the impact?
The node crashes causing query failures throughout the cluster.
Deleting either the PK or the unique index resolves the issue.