streamingccl: mark cutback retention jobs as successful #123934
Previously we started creating a stream producer job in the destination cluster when we completed replication cutover, to preserve the history as of that cutover time in case another cluster would subsequently want to start replicating as of that time, e.g. reversing the direction of replication, or in case the promoted cluster would want to revert to the cutover time as part of a demotion back to a standby.
However, this placeholder job is, by design, never actually used by replication -- it exists only to keep the option open for some other replication job to be started -- and thus is never heartbeated or marked as no longer needed due to successful completion of replication, causing it to be marked as FAILED when it expires.
This changes the initial status so that the job is created already indicating that replication succeeded. Thus when it expires, it is marked as successful instead of failed, avoiding the spurious 'failures' that one observes in the job system surfaces.
Release note (enterprise change): History Retention jobs created at the completion of cluster replication no longer erroneously indicate they failed when they expire.
Epic: none.
TFTR! bors r+
blathers backport 24.1
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating backport branch refs/heads/blathers/backport-release-24.1-123934: POST https://api.github.com/repos/cockroachdb/cockroach/git/refs: 422 Reference already exists []
Backport to branch 24.1 failed. See errors above.
As of cockroachdb#123934, the producer job succeeds instead of fails. This patch teaches some test infra about this. Fixes cockroachdb#124139 Fixes cockroachdb#124138 Fixes cockroachdb#124151 Fixes cockroachdb#124137 Release note: none
124162: streamingccl: deflake a few tests r=msbutler a=msbutler As of #123934, the producer job succeeds instead of fails. This patch teaches some test infra about this. Fixes #124139 Fixes #124138 Fixes #124151 Fixes #124137 Release note: none Co-authored-by: Michael Butler <butler@cockroachlabs.com>