jobs: add expiration for paused jobs #84598
Comments
Discussion: 10 days instead of 3
@amruss don't we recommend people do full backups once a day and incrementals throughout the day? In that case, if I am going to do a full backup in less than a day, why would we want a busted incremental to linger for 3 or even 10 days?
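As a concrete reference for the cadence mentioned above, here is a minimal sketch of such a schedule: a daily full backup with hourly incrementals. The label and the `external://` storage URI are placeholders, not taken from this thread.

```sql
-- Hypothetical scheduled backup: hourly incrementals on top of a daily full.
-- A failed incremental here is superseded by the next full within a day,
-- which is the premise of the question above.
CREATE SCHEDULE hourly_backup
  FOR BACKUP INTO 'external://backup_storage'
  RECURRING '@hourly'
  FULL BACKUP '@daily';
```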
I find the suggestion of 3d already somewhat surprising, given these PTS records are left around on the error path -- an accrual of up to 3d of MVCC garbage for the entire keyspan (full backups, for example) can be debilitating storage-wise, with effects on read latencies, etc. We can improve MVCC garbage observability and admission control for this work, and that's all good, but I still question anything longer than single-digit hours here. Certainly there will be users wanting to accommodate an incident response time of >= 3d, which configurability could let them do; I'm just questioning whether it should be made the default. +cc @williamkulju, @mwang1026.
If we had some way of making PTS records dramatically cheaper WRT holding up garbage collection, say by doing something like #86697, then perhaps a multi-day expiration time would be palatable; the data being kept around would be O(live-bytes). But today it's O(live-bytes * time), a compounding we should consider in the defaults we ship with. +cc @nvanbenschoten.
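To make the O(live-bytes * time) compounding concrete, a back-of-envelope calculation with an assumed overwrite rate W across the protected keyspan (the 10 MB/s figure is illustrative, not from this thread):

$$
\text{garbage} \approx W \cdot t, \qquad \text{e.g. } 10\ \text{MB/s} \times 3\ \text{days} = 10\ \text{MB/s} \times 259{,}200\ \text{s} \approx 2.6\ \text{TB}
$$

of un-collectable MVCC garbage pinned by a single stale PTS record, on top of the live data.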
Expiring a paused job is not intended to be the defense against MVCC garbage accumulation due to PTS records. Note that garbage can accumulate for a variety of reasons other than a paused job (e.g. a non-paused job that is running slowly or is stuck, or not even a job at all but a backup schedule that hasn't completed yet), so the answer to catching and fixing garbage accumulation is better KV observability tools: garbage-bytes-percentage alerts, displaying garbage amounts on the databases and tables console pages, etc. Expiring paused jobs after 10 or 3 days is just there to catch something that is forgotten, so it doesn't linger indefinitely, i.e. across major versions.
Moving this back to triage to discuss w/ CDC team
SRE just discussed this. Can we execute on this ticket ~now and include it in a patch update, with a cluster setting (disabled by default if needed) that we can flip on in CC immediately? The pause-job-if-backup-fails behavior implies a lot of regular operational work (read: toil) for SRE to keep CC backups from holding up GC and spending lots of disk unnecessarily. Re: my suggestion to fix this ~now, in CC at least: I think landing pause-instead-of-fail in 22.1 without also landing expiration for paused jobs should be considered a regression.
I'm not exactly sure what a patch-release version of this would look like, since we don't track when a job was paused right now, so existing paused jobs don't reflect how long they've been paused. We'll need to start writing a new field at pause time to use for this, and any jobs that predate it will likely just be left alone?
There is nothing in the system that at least indirectly keeps track of when a job went from running to paused? Isn't there a heartbeated-at column, or something like that?
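For reference, a minimal sketch of what an operator can see today, assuming the current `SHOW JOBS` columns: there is a `status` and `created`/`modified` timestamps, but no paused-at time, so a query like the following is the closest approximation.

```sql
-- List paused jobs; `created` (and `modified`, if selected) are only
-- proxies for pause time, since no paused-at timestamp is recorded.
SELECT job_id, job_type, status, created
FROM [SHOW JOBS]
WHERE status = 'paused';
```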
I'm not sure I will wind up adding job expiration, but I will add protected timestamp expiration.
Changefeeds utilize the protected timestamp system (PTS) to ensure that the data targeted by a changefeed is not garbage collected prematurely. The PTS record is managed by the running changefeed, which periodically advances the record's timestamp so that data older than that timestamp may be GCed. However, if the changefeed stops running because it is paused (either due to operator action or due to the `on_error=pause` option), the PTS record remains so that the changefeed can be resumed at a later time. It is also possible that the operator may not notice that the job has been paused for too long, causing a buildup of garbage data. Excessive buildup of GC work is not great since it impacts overall cluster performance, and, once GC can resume, its cost is proportional to how much GC work needs to be done.

This PR introduces a new setting, `changefeed.protect_timestamp.expire_after`, to automatically expire PTS records that are too old. This automatic expiration is a safety mechanism, and the setting's default is rather conservative. The operator is still expected to monitor changefeed jobs and to restart paused changefeeds promptly. If the changefeed job remains paused and the underlying PTS record is removed due to expiration, then the changefeed job will fail when it is restarted. This failure is preferable to unbounded buildup of garbage data in the cluster.

Epic: CRDB-21953

This PR does not add expiration to the job itself, as requested by cockroachdb#84598, but it does accomplish the same goal.

Release note (enterprise change): Changefeeds will automatically expire PTS records after `changefeed.protect_timestamp.expire_after` so that paused changefeed jobs do not cause GC data buildup in the cluster.
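Under the cluster-setting approach described in this revision, enabling the expiration would presumably look like the following; the setting name is taken from the PR text above, and the `72h` value is illustrative rather than a recommended default.

```sql
-- Illustrative: expire changefeed PTS records older than 72 hours,
-- cluster-wide. Setting name per the PR description above.
SET CLUSTER SETTING changefeed.protect_timestamp.expire_after = '72h';
```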
Changefeeds utilize the protected timestamp system (PTS) to ensure that the data targeted by a changefeed is not garbage collected prematurely. The PTS record is managed by the running changefeed, which periodically advances the record's timestamp so that data older than that timestamp may be GCed. However, if the changefeed stops running because it is paused (either due to operator action or due to the `on_error=pause` option), the PTS record remains so that the changefeed can be resumed at a later time. It is also possible that the operator may not notice that the job has been paused for too long, causing a buildup of garbage data. Excessive buildup of GC work is not great since it impacts overall cluster performance, and, once GC can resume, its cost is proportional to how much GC work needs to be done.

This PR introduces a new changefeed option, `gc_protect_expires_after`, to automatically expire PTS records that are too old. This automatic expiration is a safety mechanism for the case where a changefeed job is paused by an operator or due to an error while holding onto a PTS record because of the `protect_gc_on_pause` option. The operator is still expected to monitor changefeed jobs and to restart paused changefeeds promptly. If the changefeed job remains paused and the underlying PTS record expires, then the changefeed job will be canceled to prevent buildup of GC data.

Epic: CRDB-21953

This PR does not add expiration to the job itself, as requested by cockroachdb#84598, but it does accomplish the same goal.

Release note (enterprise change): Changefeeds will automatically expire PTS records for paused jobs if the changefeed is configured with the `gc_protect_expires_after` option.
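With the per-changefeed option this revision settles on, usage would presumably look like the following; the table name and sink URI are placeholders.

```sql
-- Illustrative: protect data across pauses, but bound the protection's
-- lifetime so a forgotten paused changefeed cannot pin GC indefinitely.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://broker:9092'
  WITH on_error = 'pause',
       protect_gc_on_pause,
       gc_protect_expires_after = '24h';
```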
97148: changefeedccl: Expire protected timestamps r=miretskiy a=miretskiy

(Same description as above.) Epic: [CRDB-21953](https://cockroachlabs.atlassian.net/browse/CRDB-21953) Informs #84598

Release note (enterprise change): Changefeeds will automatically expire PTS records for paused jobs if the changefeed is configured with the `gc_protect_expires_after` option.

97539: kvserver: fix deadlock on rebalance obj change r=kvoli a=kvoli

Previously, changing the rebalance objective could lead to an inconsistent locking order between the load-based splitter and the rebalance objective. When the objective was updated, the previous method also blocked batch requests from completing until every replica's load-based splitter was reset.

This commit moves the split objective to be a variable owned by the decider, rather than inferred on each decider operation. The split objective is updated on a rebalance objective change atomically for each replica, but not atomically over a store. This removes the need to block batch requests until every replica is updated.

Resolves: #97000
Resolves: #97445
Resolves: #97450
Resolves: #97452
Resolves: #97457

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
I'm going to close this issue as complete per the above PR.
@miretskiy, @dt: am I correct in reading that #97148 only applies to protected timestamps left by changefeeds? But we also have backups that can be paused, and the motivating incidents above happened due to PTS records left behind when backups were paused. I'm going to re-open.
Sure; but we will not be working on other job implementations... Perhaps we can move this to DR...
Paused jobs can easily be forgotten about. Long-paused jobs can cause cluster issues because of the garbage they allow to accumulate. Additionally, supporting long-paused jobs through cluster upgrades is hard.
We should:
- Automatically cancel (expire) jobs that have been paused for longer than a threshold, defaulting to 3 days and configurable via a cluster setting.
Rationale behind 3 days: it gives you enough working-day time to have done something about the paused job.
Follow-up work:
Jira issue: CRDB-17756
Epic: CRDB-20422