sqlbase: change default jobs ttl back to default (25h) #45767
Conversation
LGTM
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andreimatei, and @pbardea)
I'm on board. Do we have a test that creates a boatload of jobs that update themselves a ton? Might be nice.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @pbardea)
We changed the default TTL on the jobs table to 10min in reaction to issues where misbehaved jobs destabilized a cluster by causing the jobs range to become oversized due to MVCC garbage.

However, this low TTL breaks incremental backups. Previously this was not a significant issue for most users, as BACKUP was typically run only on user tables, but increasingly users want to back up the metadata in their system tables too, and 20.1 will showcase full-cluster backups that include system tables automatically.

This reverts the default TTL override for the jobs table, meaning clusters created on 20.1 will inherit the 25h TTL. It also means that if a user changes the default TTL to account for a different BACKUP cadence, they don't need to also update a second TTL for the jobs table.

Release note (general change): on new clusters, the internal system.jobs table now uses the default zone config and TTL (25h).
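For context, the TTL in question is part of the table's replication zone configuration. A sketch of how an operator could inspect or override it, using CockroachDB's `CONFIGURE ZONE` statements (the 90000-second value shown here corresponds to the 25h default this PR reverts to):

```sql
-- Inspect the current zone configuration for the jobs table.
SHOW ZONE CONFIGURATION FOR TABLE system.jobs;

-- Set an explicit GC TTL on system.jobs (90000s = 25h, the cluster default).
ALTER TABLE system.jobs CONFIGURE ZONE USING gc.ttlseconds = 90000;

-- Or remove the table-level override so system.jobs inherits the default
-- zone config, which is the effect this change gives new clusters.
ALTER TABLE system.jobs CONFIGURE ZONE DISCARD;
```

With the override gone, adjusting the cluster-wide default TTL (e.g. for a longer incremental BACKUP cadence) covers the jobs table automatically, with no second knob to remember.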
We don't -- any job that updated a ton has usually been fixed. I think it's usually a misbehaving job that gets us into trouble, and that isn't something we can really test for. We can test an over-full range, but that's independent of jobs, right?
LGTM
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @pbardea)
bors r+
Build succeeded