changefeedccl: reset retry backoff on highwater advance#164933
changefeedccl: reset retry backoff on highwater advance#164933trunk-io[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
|
😎 Merged successfully - details. |
d0ea233 to
16754f1
Compare
4c07ade to
df6219b
Compare
| // BeforeCheckpoint, if set, is called at the start of | ||
| // checkpointJobProgress. Returning a non-nil error causes a retryable | ||
| // failure before the checkpoint is persisted. | ||
| BeforeCheckpoint func() error |
There was a problem hiding this comment.
I kinda like the old name more cause it tells you what it does.
There was a problem hiding this comment.
I was somewhat torn too, open to suggestions here. Specifically, let me know what you think about the context of the test I added. There, we're using this knob to count checkpoints (even though the first few times we don't raise an error), and the connection to checkpoints isn't clear from the name.
RaiseRetryableErrorOnCheckpoint is a bit wordy (and is still weird when we don't actually raise an error). The way it is now, though, it's not clear that the error will necessary be retryable, so there is that drawback.
Seems like it should be simple, but it's something of a hard needle to thread. Don't want to overthink it though.
There was a problem hiding this comment.
The knob's main purpose is to raise an error, which is why I having error mentioned. I'll say I like MaybeRaiseRetryableError the most, but I'll let you make the final call.
There was a problem hiding this comment.
Drive by: Both the name and behavior for RaiseRetryableError have always bugged me because it's not at all descriptive where in the pipeline the retryable error is raised and I also don't like that the error you return for this hook isn't the one that actually gets returned to the changefeed (at that point you might as well just have the hook return a bool). I'm in favor of renaming (and I think naming it BeforeCheckpoint is fine), but if possible, I'd like it if we could also refactor the hook so that we just return the error instead of creating a different error that just says the hook name.
There was a problem hiding this comment.
Thanks @andyyang890 ! I like that suggestion, Implemented that
Rename the `RaiseRetryableError` testing knob to `BeforeCheckpoint` to follow the existing `Before*` naming convention (e.g. `BeforeEmitRow`) and to make it clear that the knob fires at the start of `checkpointJobProgress`. The old name described what it did (raise a retryable error) but not where it fired, making tests that relied on this hook being at that time harder to read. Also propagate the hook's returned error directly instead of wrapping it in a new error with the hook name, so tests control the exact error the changefeed sees. Informs: cockroachdb#164695
Previously, the changefeed retry backoff only reset after running error-free for a configurable duration (default 10 minutes via `changefeed.retry_backoff_reset`). Now the retry backoff also resets whenever the changefeed's highwater mark (resolved timestamp) advances between retries. This directly measures forward progress, so transient errors during rolling restarts no longer leave the backoff unnecessarily elevated when the changefeed is otherwise healthy. The time-based reset remains as a fallback. Fixes: cockroachdb#164695 Release note (bug fix): Changefeed retry backoff now resets when the changefeed's resolved timestamp (highwater mark) advances between retries, in addition to the existing time-based reset (changefeed.retry_backoff_reset). This prevents transient rolling restarts from causing changefeeds to fall behind because of excessive backoff.
|
/trunk merge TFTR! |
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Put the retry backoff reset on highwater advance (from PR cockroachdb#164933) behind a new cluster setting: changefeed.reset_backoff_on_highwater_advance.enabled The setting is enabled by default on master, preserving the current behavior. This gating allows the change to be safely backported to release branches where the setting can be disabled by default. Informs: cockroachdb#164695 Release note: None
Previously, the changefeed retry backoff only reset after running
error-free for a configurable duration (default 10 minutes via
changefeed.retry_backoff_reset).Now the retry backoff also resets whenever the changefeed's highwater
mark (resolved timestamp) advances between retries. This directly
measures forward progress, so transient errors during rolling restarts
no longer leave the backoff unnecessarily elevated when the changefeed
is otherwise healthy.
The time-based reset remains as a fallback.
Fixes: #164695
Release note (bug fix): Changefeed retry backoff now resets when
the changefeed's resolved timestamp (highwater mark) advances between
retries, in addition to the existing time-based reset
(changefeed.retry_backoff_reset). This prevents transient rolling
restarts from causing changefeeds to fall behind because of
excessive backoff.