-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/catalog/lease,descs: do not cross validate when waiting for leases #86690
sql/catalog/lease,descs: do not cross validate when waiting for leases #86690
Conversation
cb17ec8
to
660a17a
Compare
// Use the lower-level GetCrossReferencedDescriptorsForValidation to avoid | ||
// performing cross-descriptor validation while waiting for leases to | ||
// drain. On the one hand, it's somewhat expensive. More importantly, | ||
// there are valid cases where descriptors can be removed or made invalid, | ||
// and we don't particularly care about them when using this method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we give a more specific example in the comment of "valid cases where descriptors can be removed or made invalid" so that the reader knows better why we don't want to do cross-descriptor validation when waiting for one lease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example.
var version descpb.DescriptorVersion | ||
if desc != nil { | ||
version = desc.GetVersion() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good that we now check on err
!
But is it possible that desc
and err
are both nil
, in which case log.Infof
risks "nil pointer dereference" on desc.GetVersion()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added commentary to indicate that WaitForOneVersion
won't return nil, nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. I've left two nits here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
@@ -0,0 +1,10 @@ | |||
exec db=defaultdb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this test? In other words, how is it going to test the change that we no longer do cross-descriptor validation when "waitForOneVersion"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR: it will fail with a descriptor not found
error without changes in this PR.
It's because we call cleanupTempSystemTables()
here in a txn, where we run DDLs using an internal executor. Prior to this PR, if I migrate that to using descs.TxnWithExector()
(which is the correct interface to run DDLs with internal executor), it will give a descriptor not found
error. This is because, in descs.TxnWithExecutor()
, when running WaitForOneVersion
, we weren’t waiting for the true descriptor id, rather we were waiting for something which referenced that id, and it was deleted.
See more discussion here (internal link)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a welcome change.
Should #87067 reach a suitable state soon, I suggest rebasing this change onto it, the call to |
This change removes descriptor cross-validation when waiting for one version. It also means that an `ErrDescriptorNotFound` now definitively means that the descriptor in question does not exist. Lastly, it extends the logic in the descs.Txn loop to fall back to just waiting for leases to drain if a descriptor is not found. This will be used as part of cockroachdb#86340. Release justification: Needed for bug fixes. Release note: None
660a17a
to
d106bd6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
// Use the lower-level GetCrossReferencedDescriptorsForValidation to avoid | ||
// performing cross-descriptor validation while waiting for leases to | ||
// drain. On the one hand, it's somewhat expensive. More importantly, | ||
// there are valid cases where descriptors can be removed or made invalid, | ||
// and we don't particularly care about them when using this method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example.
var version descpb.DescriptorVersion | ||
if desc != nil { | ||
version = desc.GetVersion() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added commentary to indicate that WaitForOneVersion
won't return nil, nil
bors r+ |
Build failed: |
flake seems unrelated bors r+ |
Build succeeded: |
80474: gcp: make per-chunk retry upload timeout configurable r=dt a=adityamaru This change adds a cluster setting `cloudstorage.gs.chunking.retry_timeout` that can be used to change the default per-chunk retry timeout that GCS imposes when chunking of file upload is enabled. The default value is set to 60 seconds, which is double of the default google sdk value of 30s. This change was motivated by sporadic occurrences of a 503 service unavailable error during backups. On its own this change is not expected to solve the resiliency issues of backup when the upload service is unavailable, but it is nice to have configurable setting nonetheless. Release note (sql change): `cloudstorage.gs.chunking.retry_timeout` is a cluster setting that can be used to configure the per-chunk retry timeout of files to Google Cloud Storage. The default value is 60 seconds. 86696: sql/gcjob: make index GC robust to descriptors being deleted r=ajwerner a=ajwerner First commit is #86690 If the descriptor was deleted, the GC job should exit gracefully. Fixes #86340 Release justification: bug fix for backport Release note (bug fix): In some scenarios, when a DROP INDEX was run around the same time as a DROP TABLE or DROP DATABASE covering the same data, the `DROP INDEX` gc job could get caught retrying indefinitely. This has been fixed. 87378: builtins: stream consistency checker output r=pavelkalinnikov a=tbg Also makes it resilient to per-Range errors, which now no longer tank the entire operation. ```sql -- avoid buffering in cli \set display_format=csv; -- avoid rows getting buffered at server set avoid_buffering=true; -- compute as fast as possible SET CLUSTER SETTING server.consistency_check.max_rate = '1tb'; SELECT * FROM crdb_internal.check_consistency(false, '', ''); ``` Release justification: improvement for a debugging-related feature Release note: None 87657: Makefile: always use `cockroach-short` for file generation r=knz a=rickystewart This defaults to the full `cockroach` executable which requires pulling in all the UI stuff. Use `cockroach-short` to make generation require fewer dependencies. Release note: None 87701: testccl/workload/schemachange: skip random schema test r=ajwerner a=ajwerner This is very flakey. Some of it is due to #87672. Some of it was due to #85677. There are some issues with inserts which need to be fixed. Until this stabilizes, it's causing problems. Along the way, I'm marking #78478 as a GA blocker so we do actually fix it. Release note: None Co-authored-by: Aditya Maru <adityamaru@gmail.com> Co-authored-by: Andrew Werner <awerner32@gmail.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
This change removes descriptor cross-validation when waiting for one version.
It also means that an
ErrDescriptorNotFound
now definitively means that thedescriptor in question does not exist. Lastly, it extends the logic in the
descs.Txn loop to fall back to just waiting for leases to drain if a descriptor
is not found. This will be used as part of #86340.
Release justification: Needed for bug fixes.
Release note: None