cdc: Treat node draining errors as retryable. #49743
5387253 to fc4a300
Reviewed 5 of 5 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy and @pbardea)
pkg/ccl/changefeedccl/changefeed_stmt.go, line 583 at r1 (raw file):
// Instead, we want to make sure that the changefeed job is not marked failed
// due to a transient, retryable error.
err = jobs.NewRetryJobError(fmt.Sprintf("retryable flow error: %+v", err))
Is there a way to use https://godoc.org/github.com/cockroachdb/errors#CombineErrors to retain the structured error?
pkg/ccl/changefeedccl/changefeed_test.go, line 2866 at r1 (raw file):
// Even though we disabled merges via the store testing knob, we must also
// disable the setting in order for manual splits to be allowed.
sqlDB.Exec(t, "SET CLUSTER SETTING kv.range_merge.queue_enabled = false")
Is this true? I thought since 19.2 you can do manual splits with the queue enabled. You may need to specify an expiration using a WITH EXPIRATION clause.
Just a note: I think the first issue linked in the PR is not the right issue. Perhaps you meant #46515?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @pbardea)
pkg/ccl/changefeedccl/changefeed_stmt.go, line 583 at r1 (raw file):
Previously, ajwerner wrote…
Is there a way to use https://godoc.org/github.com/cockroachdb/errors#CombineErrors to retain the structured error?
Probably... This jobs error API is not great. I think we need to take a pass to change the retryable job error type to be something other than a string. I think it's better to do this cleanup in a separate PR.
You are right. Updated.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @pbardea)
pkg/ccl/changefeedccl/changefeed_test.go, line 2866 at r1 (raw file):
Previously, ajwerner wrote…
// Even though we disabled merges via the store testing knob, we must also
// disable the setting in order for manual splits to be allowed.
sqlDB.Exec(t, "SET CLUSTER SETTING kv.range_merge.queue_enabled = false")
Is this true? I thought since 19.2 you can do manual splits with the queue enabled. You may need to specify an expiration using a WITH EXPIRATION clause.
I just copied this from another test. Removing it didn't change anything. So, it's gone.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @pbardea)
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @miretskiy)
pkg/ccl/changefeedccl/changefeed_stmt.go, line 583 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Probably... This jobs error API is not great. I think we need to take a pass to change the retryable job error type to be something other than a string. I think it's better to do this cleanup in a separate PR.
Works for me. Committer's discretion.
11d7475 to 816c314
Handle flow registration errors due to draining node as retryable.

Release notes (reliability): Treat errors due to draining nodes as retryable when starting CDC.
bors r+
Build succeeded
50088: release-20.1: cdc: Treat node draining errors as retryable. r=miretskiy a=miretskiy

Backport 1/1 commits from #49743.

/cc @cockroachdb/release

---

Fixes #46515
Fixes #43771

Handle flow registration errors due to draining node as retryable.

Release notes (reliability): Treat errors due to draining nodes as retryable when starting CDC.

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Fixes #46515
Fixes #43771
Handle flow registration errors due to draining node as retryable.
Release notes (reliability): Treat errors due to draining nodes
as retryable when starting CDC.