print replication levels in coordinator segment logs #12511

Merged
merged 3 commits into from
May 17, 2022


clintropolis
Member

Description

Improves coordinator load rule logging to include current replication levels, and adds missing segment id and tier information to some of the log messages. I've seen some strange behavior where segments become very over-replicated, particularly when multiple tiers are involved, but sometimes even when not, and am hoping these logging improvements can help track down the issue.

These logs should now print replication levels for all tiers, as well as consistently provide segment id and tier information:

2022-05-11T10:26:26,009 WARN [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - No available [_default_tier] servers or node capacity to assign segment [wikipedia_hour_2016-06-27T00:00:00.000Z_2016-06-27T01:00:00.000Z_2022-02-18T23:27:41.739Z]! Current replication: [[_default_tier:1/2]]
2022-05-11T10:31:26,123 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'replica' for segment [nested_test2_-146136543-09-08T08:23:32.096Z_146140482-04-24T15:36:27.903Z_2022-04-06T00:07:18.795Z] to server [localhost:8084] in tier [_default_tier]. Current replication: [[_default_tier:1/2]]
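
As a rough illustration of the `Current replication` suffix in the messages above, the sketch below builds a `tier:current/target` string from two maps. The class name, method name, and map parameters are hypothetical and are not taken from the Druid code base; the actual `LoadRule` code may build this string differently, and the separator used between multiple tiers is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper, not part of Druid: formats per-tier replication
// as "[tier:current/target]" entries wrapped in an outer pair of brackets.
public class ReplicationLogSketch
{
  public static String formatReplication(Map<String, Integer> current, Map<String, Integer> target)
  {
    // Include every tier that appears in either map, so a tier holding
    // replicas without a target (or a target without replicas) still shows up.
    Set<String> tiers = new LinkedHashSet<>(target.keySet());
    tiers.addAll(current.keySet());

    // Assumption: multiple tiers are separated by ", ".
    StringJoiner joiner = new StringJoiner(", ", "[", "]");
    for (String tier : tiers) {
      int have = current.getOrDefault(tier, 0);
      int want = target.getOrDefault(tier, 0);
      joiner.add("[" + tier + ":" + have + "/" + want + "]");
    }
    return joiner.toString();
  }

  public static void main(String[] args)
  {
    Map<String, Integer> target = new LinkedHashMap<>();
    target.put("_default_tier", 2);
    Map<String, Integer> current = new LinkedHashMap<>();
    current.put("_default_tier", 1);
    // Prints: Current replication: [[_default_tier:1/2]]
    System.out.println("Current replication: " + formatReplication(current, target));
  }
}
```

A tier whose current count exceeds its target (for example `2/1`) would be the over-replication case these logs are meant to help spot.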

Also fixes a confusing log message about skipping drops, which prior to this PR would always print if a segment was under-replicated on any tier, even if there was no tier to drop it from. Now this log message should only print when there is actually something to drop, and it also includes the segment id and the tier from which the drop was skipped.
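
The condition described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, the method, and the message wording are hypothetical and do not reproduce the actual Druid code or log text. The point is that a skip message is produced only for a tier that actually holds excess replicas, rather than whenever any tier is under-replicated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Druid code: decides when a "skipping drop"
// message is worth logging for a segment.
public class DropSkipLogSketch
{
  public static List<String> skippedDropMessages(
      String segmentId,
      Map<String, Integer> current,
      Map<String, Integer> target
  )
  {
    // A segment is under-replicated if any tier holds fewer replicas than its target.
    boolean underReplicated = false;
    for (Map.Entry<String, Integer> entry : target.entrySet()) {
      if (current.getOrDefault(entry.getKey(), 0) < entry.getValue()) {
        underReplicated = true;
      }
    }

    List<String> messages = new ArrayList<>();
    if (!underReplicated) {
      // Nothing blocks drops, so there is no skip to report.
      return messages;
    }

    // Only report a skip for tiers that really have something to drop.
    for (Map.Entry<String, Integer> entry : current.entrySet()) {
      int wanted = target.getOrDefault(entry.getKey(), 0);
      if (entry.getValue() > wanted) {
        // Illustrative wording only; not the real log message.
        messages.add(
            String.format("Skipping drop of segment [%s] from tier [%s]", segmentId, entry.getKey())
        );
      }
    }
    return messages;
  }
}
```

With a single tier at `1/2`, this sketch returns no messages, which corresponds to the case where the old log line printed even though there was nothing to drop.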

Finally, I've added currently served segment count to the EmitClusterStatsAndMetrics server details:

2022-05-11T10:26:06,056 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - Server[localhost:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 28 served, 0 bytes queued, 370,550,415 bytes served.
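
For readers who want to see how a line of this shape could be assembled, here is a minimal sketch using `String.format` with grouping separators. The class name and parameter names are hypothetical; the actual `EmitClusterStatsAndMetrics` code may format the message differently.

```java
import java.util.Locale;

// Hypothetical sketch, not Druid code: formats a per-server summary line
// that includes the served segment count.
public class ServerStatsLogSketch
{
  public static String format(
      String host,
      String type,
      String tier,
      int toLoad,
      int toDrop,
      int served,
      long bytesQueued,
      long bytesServed
  )
  {
    // %,d inserts grouping separators; Locale.US pins them to commas.
    return String.format(
        Locale.US,
        "Server[%s, %s, %s] has %,d left to load, %,d left to drop, %,d served, "
        + "%,d bytes queued, %,d bytes served.",
        host, type, tier, toLoad, toDrop, served, bytesQueued, bytesServed
    );
  }

  public static void main(String[] args)
  {
    System.out.println(
        format("localhost:8083", "historical", "_default_tier", 0, 0, 28, 0L, 370_550_415L)
    );
  }
}
```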

@clintropolis clintropolis merged commit b23ddc5 into apache:master May 17, 2022
@clintropolis clintropolis deleted the coordinator-log-stuff branch May 17, 2022 09:24
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022