Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: fix hot ranges logging scheduling interval change #111305

Merged
merged 1 commit into from Sep 27, 2023

Conversation

xinhaoz
Copy link
Member

@xinhaoz xinhaoz commented Sep 26, 2023

Hot ranges telemetry events are on a logging schedule controlled by the cluster setting TelemetryHotRangesStatsInterval. A ticker init'd with this interval duration is used to control the logging cycles.

After the ticker is initialized, callback fn is registered at the scheduler startup to reset the ticker on any interval setting changes. The scheduler start is part of the async tenant server startup process which means we could run into a scenario where the interval is changed before we register the callback to reset the ticker. If this occurs the ticker will never get updated with the new duration.

More concretely, the following can happen:

  1. Hot ranges scheduler starts
  2. Ticker is initialized with TelemetryHotRangesStatsInterval val
  3. TelemetryHotRangesStatsInterval setting is changed
  4. Callback fn registered to watch for TelemetryHotRangesStatsInterval changes and reset ticker
  5. Scheduler waits for ticker notifications
  6. Ticker will never update with the new duration

To fix this we ensure we register the callback fn before the ticker is started. The fn is changed to send a chan notification. This ensures that the ticker will always be updated with the most recent interval value.

Release note (bug fix): Fixes a potential bug where changing the setting
server.telemetry.hot_ranges_stats.interval directly after startup is
a no-op.
Fixes: #111104

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@xinhaoz xinhaoz force-pushed the fix-hotranges-interval-changes branch 3 times, most recently from 7be912c to 7d9c5fd Compare September 26, 2023 20:38
@xinhaoz xinhaoz marked this pull request as ready for review September 26, 2023 20:55
@xinhaoz xinhaoz requested review from a team as code owners September 26, 2023 20:55
@xinhaoz xinhaoz requested review from ericharmeling and removed request for a team September 26, 2023 20:55
Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling and @xinhaoz)


-- commits line 30 at r1:
What is the impact if a users hit this?

@xinhaoz
Copy link
Member Author

xinhaoz commented Sep 27, 2023

-- commits line 30 at r1:

Previously, j82w (Jake) wrote…

What is the impact if a users hit this?

That they will never see the value they changed the interval to go into effect. In practice though I believe this will rarely occur. It's just in the test we immediately override the setting after server startup. I can change the release note to a bug fix.

Hot ranges telemetry events are on a logging schedule controlled by the
cluster setting `TelemetryHotRangesStatsInterval`. A ticker init'd with
this interval duration is used to control the logging cycles.

After the ticker is initialized, callback fn is registered at the
scheduler startup to reset the ticker on any interval setting changes.
The scheduler start is part of the async tenant server startup process
which means we could run into a scenario where the interval is
changed before we register the callback to reset the ticker. If this
occurs the ticker will never get updated with the new duration.

More concretely, the following can happen:

1. Hot ranges scheduler starts
2. Ticker is initialized with `TelemetryHotRangesStatsInterval` val
3. `TelemetryHotRangesStatsInterval` setting is changed
4. Callback fn registered to watch for `TelemetryHotRangesStatsInterval`
   changes and reset ticker
5. Scheduler waits for ticker notifications
6. Ticker will never update with the new duration

To fix this we ensure we register the callback fn before the ticker
is started. The fn is changed to send a chan notification.
This ensures that the ticker will always be updated with the most recent
interval value.

Release note (bug fix): Fixes a potential bug where changing the setting
`server.telemetry.hot_ranges_stats.interval` directly after startup is
a no-op.

Fixes: cockroachdb#111104
@xinhaoz xinhaoz force-pushed the fix-hotranges-interval-changes branch from 7d9c5fd to 6377707 Compare September 27, 2023 16:47
@xinhaoz xinhaoz added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Sep 27, 2023
@xinhaoz xinhaoz requested review from j82w and a team September 27, 2023 16:47
Copy link
Contributor

@zachlite zachlite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling and @j82w)

Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ericharmeling)

@xinhaoz
Copy link
Member Author

xinhaoz commented Sep 27, 2023

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 27, 2023

Build succeeded:

@craig craig bot merged commit 4a9f4b6 into cockroachdb:master Sep 27, 2023
8 checks passed
@xinhaoz xinhaoz deleted the fix-hotranges-interval-changes branch April 1, 2024 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pkg/server/structlogging/structlogging_test: TestHotRangesStats failed
4 participants