
spanconfig: mark reconciliation job as idle #76384

Merged 1 commit into cockroachdb:master on Feb 10, 2022

Conversation

irfansharif (Contributor)

Fixes #70538.

We have a forever-running background AUTO SPAN CONFIG RECONCILIATION job
on tenant pods. To know when it's safe to wind down pods, we use the
number of currently running jobs as an indicator. Since this job runs
forever, we need a separate signal indicating that, despite the job's
presence, it's safe to wind down.

In #74747 we added a thin API to the jobs subsystem to do just that,
with the intent of using it for idle changefeed jobs. We just cargo-cult
that same approach here to mark the reconciliation job as always idle.

Release note: None
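
For illustration, here is a minimal, self-contained sketch of the idea (stand-in types and names, not the actual CockroachDB code): the reconciliation job's resumer marks its job handle idle as soon as it starts, using the kind of MarkIdle hook that #74747 introduced, so the forever-running job no longer counts against the "can we wind down this pod?" check.

```go
package main

import (
	"context"
	"fmt"
)

// job is a stand-in for the jobs-subsystem handle; the real API (added in
// cockroachdb#74747) exposes a way to mark a running job as idle, which is
// surfaced through the jobs.<job-type>.currently_idle metric.
type job struct{ idle bool }

func (j *job) MarkIdle(idle bool) { j.idle = idle }

// resumer is a stand-in for the span config reconciliation job's resumer.
type resumer struct{ job *job }

// Resume sketches the change in this PR: the reconciliation job runs for the
// lifetime of the pod, so it marks itself idle up front, signalling that its
// presence alone shouldn't block winding the pod down.
func (r *resumer) Resume(ctx context.Context) error {
	r.job.MarkIdle(true)

	// ...start the reconciler and block until ctx is canceled...
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	r := &resumer{job: &job{}}
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel immediately so the sketch terminates
	if err := r.Resume(ctx); err != nil {
		fmt.Println("resume returned:", err) // context canceled, expected here
	}
	fmt.Println("idle:", r.job.idle) // idle: true
}
```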

@cockroach-teamcity (Member)

This change is Reviewable

@irfansharif (Contributor, Author)

[image: metrics screenshot]

Sanity checking: it shows up as jobs.auto_span_config_reconciliation.currently_idle.

@arulajmani (Collaborator) left a comment

Change looks good to me. Have you considered how marking this thing as "always idle" will interact with the pagination story we hope to address soon? It might be worth filing an issue, or capturing these thoughts on the pagination issue itself.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @irfansharif, and @nvanbenschoten)


pkg/spanconfig/spanconfigjob/job.go, line 49 at r1 (raw file):

	// The reconciliation job is a forever running background job. It's always
	// safe to wind the SQL pod down whenever it's running -- something

"-- something"?

@irfansharif (Contributor, Author) left a comment

Thanks! I didn't realize there were any implications. Was there something specific you were thinking about?

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)


pkg/spanconfig/spanconfigjob/job.go, line 49 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

"-- something"?

Whoops, done.

@arulajmani (Collaborator)

I was thinking about the case where we've got a bunch of IDs from the SQLWatcher to reconcile and the job gets torn down while we're batching through the updates. We could end up in a scenario where zone configuration A was applied before zone configuration B, but zone configuration B makes its way down into KV while zone configuration A does not, because the job was shut down. Maybe that's okay given our invariant, though.

@craig (Contributor)

craig bot commented Feb 10, 2022

Build succeeded.

@craig craig bot merged commit 45d1c9e into cockroachdb:master Feb 10, 2022
@irfansharif irfansharif deleted the 220210.spanconfig-job-idle branch February 10, 2022 19:58
@irfansharif (Contributor, Author) left a comment

Yeah, that should be fine -- we're not making any sort of ordering guarantees like that. Besides, if the pod is wound down, I don't think there are any observers left for the "inconsistent" state anyway.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@arulajmani (Collaborator)

KV will still be observing the "inconsistent" state, but you're right, we're not making such ordering guarantees so we should be fine.
