-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backupccl: add timeout to ExportRequest sent by a backup #63383
Conversation
Just wanted to get an idea if this is the approach we want to go with for adding timeouts. While exploring this space I realized there is some work that can be done to retry in the face of non-intent errors (but would we do anything more than what dist sender already does?), and to populate the job progress with the intents we hit when retrying for better visibility, but both those seemed orthogonal to this work. |
49298cc
to
55106e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to follow up that this approach looks reasonable to me
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @dt)
pkg/ccl/backupccl/backup_processor.go, line 63 at r1 (raw file):
timeoutPerAttmpt = settings.RegisterDurationSetting( "bulkio.backup.read_timeout", "amount of time after which a read attempt is considered timed out and is canceled",
Probably good to make it more explicit that this timeout will fail the backup job.
55106e5
to
76054de
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @pbardea)
pkg/ccl/backupccl/backup_processor.go, line 63 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Probably good to make it more explicit that this timeout will fail the backup job.
Done.
Previously, there was no timeout around the ExportRequest that was sent as part of a backup. If the request returned certain kinds of unambiguous errors, dist sender would keep retrying the request thereby causing the backup to hang. This change adds an upper bound on the time it should take the sender to receive a response, thereby erroring out instead of mysteriously hanging. The upper bound is configurable via a cluster settings and is set to 5 mins by default, which is very conservative as these requests should be very fast under normal circumstances. Release note: None
76054de
to
ae48e20
Compare
TFTR! bors r=pbardea |
Build succeeded: |
Previously, there was no timeout around the ExportRequest that
was sent as part of a backup. If the request returned certain
kinds of unambiguous errors, dist sender would keep retrying the request
thereby causing the backup to hang.
This change adds an upper bound on the time it should take the sender
to receive a response, thereby erroring out instead of mysteriously
hanging. The upper bound is configurable via a cluster settings and is
set to 5 mins by default, which is very conservative as these requests
should be very fast under normal circumstances.
Release note: None