-
Notifications
You must be signed in to change notification settings - Fork 13.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MINOR: Adding KRaft Monitoring Related Metrics to docs/ops.html #12679
Conversation
3f48418
to
5bed465
Compare
docs/ops.html
Outdated
<tr> | ||
<td>Current State</td> | ||
<td>The current state of this member; possible values are leader, candidate, voted, follower, unattached.</td> | ||
<td>kafka.raft:type=raft-metrics,name=Current-state</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the type etc is correct here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct the mbean name:
kafka.server:type=raft-metrics
And below. Thanks.
docs/ops.html
Outdated
</tr> | ||
<tr> | ||
<td>Active Controller Count</td> | ||
<td></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the metrics available in QuorumControllerMetrics
have a description attached to them. I can write up descriptions for these, but want to make sure that they match up with any literature that we already have. I could not see mentions of these in the KIPs I looked at either. What would be the best course of action here?
<td>Metadata Apply Error Count</td> | ||
<td>The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.</td> | ||
<td>kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count</td> | ||
</tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this is the complete set of metrics, based on the files I went through. Please point out any files/KIPs that I should look at to add more metrics here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niket-goel , thanks for the PR to add the missing KRaft metrics. I found there are still some metrics missing in this PR. I suggested you run a KRaft kafka in local to get all the metrics it has, then map back to the code/KIP to get the description.
docs/ops.html
Outdated
@@ -1815,6 +1815,162 @@ <h4 class="anchor-heading"><a id="remote_jmx" class="anchor-link"></a><a href="# | |||
</tr> | |||
</tbody></table> | |||
|
|||
<h5 class="anchor-heading"><a id="kraft_monitoring" class="anchor-link"></a><a href="#kraft_monitoring">KRaft Metrics</a></h5> | |||
All of the following metrics allow monitoring of the KRaft quourm and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and [what?]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.! how did that get deleted. Fixing it. Thanks for catching.
docs/ops.html
Outdated
<td>The time in milliseconds to elect a new leader.</td> | ||
<td>kafka.raft:type=raft-metrics,name=election-latency</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description of election latency is not clear to me. Maybe:
The time in milliseconds spent on electing a leader.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there is no election-latency
metrics, only
election-latency-avg
election-latency-max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to remain true to the code. I agree that it is not the best description. Let me fix both.
RE: the avg and max I think I made a mistake and picked up the sensor name instead of the metric name. this is tru for another few metrics here. Will fix all.
docs/ops.html
Outdated
<tr> | ||
<td>Current State</td> | ||
<td>The current state of this member; possible values are leader, candidate, voted, follower, unattached.</td> | ||
<td>kafka.raft:type=raft-metrics,name=Current-state</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct the mbean name:
kafka.server:type=raft-metrics
And below. Thanks.
docs/ops.html
Outdated
<tr> | ||
<td>Fetch Records</td> | ||
<td>The average number of records fetched from the leader of the raft quorum.</td> | ||
<td>kafka.raft:type=raft-metrics,name=fetch-records</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: fetch-records-rate
Are your sure it is the avg number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description in code says that it is. Let me verify the value and update both of needed.
docs/ops.html
Outdated
<tr> | ||
<td>Poll Idle Ratio</td> | ||
<td>The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records.</td> | ||
<td>kafka.raft:type=raft-metrics,name=poll-idle-ratio</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name=poll-idle-ratio-avg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack! Will fix all metrics coming from this file.
metric description and naming to align with their usage.
@showuon Thanks for the careful feedback. I have updated the PR with the correct Mbean names and fixed many typos. Please take a look when you have a chance. |
* Split the metrics into Broker and Controller Sections * Changed the num-unknown-voter-connections to reflect the KIP
kafkaRaftMetrics.updateNumUnknownVoterConnections(quorum.remoteVoters().size()); | ||
// All Raft voters are statically configured and known at startup | ||
// so there are no unknown voter connections. Report this metric as 0. | ||
kafkaRaftMetrics.updateNumUnknownVoterConnections(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should file a jira to follow up with removal of this metric if it is not reporting anything useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! Was planning on filing a JIRA and KIP for its removal. Will do that next week. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I will let @showuon take one more look before we merge this. Thanks for the patch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
Failed tests are unrelated:
|
This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html). Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
backport to v3.3 |
…he#12679) This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html). Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
…he#12679) This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html). Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).
All metrics contained in the following classes have been added to the commit:
KRaft related KIPs mention more metrics, but this commit only adds in the metrics available in the source code at the time of authoring.