Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: Adding KRaft Monitoring Related Metrics to docs/ops.html #12679

Merged
merged 6 commits into from
Sep 26, 2022

Conversation

niket-goel
Copy link
Contributor

This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).

All metrics contained in the following classes have been added to the commit:

BrokerServerMetrics
QuorumControllerMetrics
KafkaRaftMetrics

KRaft related KIPs mention more metrics, but this commit only adds in the metrics available in the source code at the time of authoring.

docs/ops.html Outdated
<tr>
<td>Current State</td>
<td>The current state of this member; possible values are leader, candidate, voted, follower, unattached.</td>
<td>kafka.raft:type=raft-metrics,name=Current-state</td>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the type etc is correct here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct the mbean name:
kafka.server:type=raft-metrics
And below. Thanks.

docs/ops.html Outdated
</tr>
<tr>
<td>Active Controller Count</td>
<td></td>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the metrics available in QuorumControllerMetrics have a description attached to them. I can write up descriptions for these, but want to make sure that they match up with any literature that we already have. I could not see mentions of these in the KIPs I looked at either. What would be the best course of action here?

<td>Metadata Apply Error Count</td>
<td>The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.</td>
<td>kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count</td>
</tr>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is the complete set of metrics, based on the files I went through. Please point out any files/KIPs that I should look at to add more metrics here.

@niket-goel niket-goel marked this pull request as ready for review September 22, 2022 23:37
Copy link
Contributor

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niket-goel , thanks for the PR to add the missing KRaft metrics. I found there are still some metrics missing in this PR. I suggested you run a KRaft kafka in local to get all the metrics it has, then map back to the code/KIP to get the description.

docs/ops.html Outdated
@@ -1815,6 +1815,162 @@ <h4 class="anchor-heading"><a id="remote_jmx" class="anchor-link"></a><a href="#
</tr>
</tbody></table>

<h5 class="anchor-heading"><a id="kraft_monitoring" class="anchor-link"></a><a href="#kraft_monitoring">KRaft Metrics</a></h5>
All of the following metrics allow monitoring of the KRaft quourm and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and [what?]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.! how did that get deleted. Fixing it. Thanks for catching.

docs/ops.html Outdated
Comment on lines 1859 to 1860
<td>The time in milliseconds to elect a new leader.</td>
<td>kafka.raft:type=raft-metrics,name=election-latency</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description of election latency is not clear to me. Maybe:
The time in milliseconds spent on electing a leader.
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there is no election-latency metrics, only
election-latency-avg
election-latency-max

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to remain true to the code. I agree that it is not the best description. Let me fix both.

RE: the avg and max I think I made a mistake and picked up the sensor name instead of the metric name. this is tru for another few metrics here. Will fix all.

docs/ops.html Outdated
<tr>
<td>Current State</td>
<td>The current state of this member; possible values are leader, candidate, voted, follower, unattached.</td>
<td>kafka.raft:type=raft-metrics,name=Current-state</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct the mbean name:
kafka.server:type=raft-metrics
And below. Thanks.

docs/ops.html Outdated
<tr>
<td>Fetch Records</td>
<td>The average number of records fetched from the leader of the raft quorum.</td>
<td>kafka.raft:type=raft-metrics,name=fetch-records</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name: fetch-records-rate
Are your sure it is the avg number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description in code says that it is. Let me verify the value and update both of needed.

docs/ops.html Outdated
<tr>
<td>Poll Idle Ratio</td>
<td>The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records.</td>
<td>kafka.raft:type=raft-metrics,name=poll-idle-ratio</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name=poll-idle-ratio-avg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack! Will fix all metrics coming from this file.

metric description and naming to align with their usage.
@niket-goel
Copy link
Contributor Author

@showuon Thanks for the careful feedback. I have updated the PR with the correct Mbean names and fixed many typos. Please take a look when you have a chance.

* Split the metrics into Broker and Controller Sections
* Changed the num-unknown-voter-connections to reflect the KIP
docs/ops.html Outdated Show resolved Hide resolved
kafkaRaftMetrics.updateNumUnknownVoterConnections(quorum.remoteVoters().size());
// All Raft voters are statically configured and known at startup
// so there are no unknown voter connections. Report this metric as 0.
kafkaRaftMetrics.updateNumUnknownVoterConnections(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should file a jira to follow up with removal of this metric if it is not reporting anything useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! Was planning on filing a JIRA and KIP for its removal. Will do that next week. :)

docs/ops.html Outdated Show resolved Hide resolved
Copy link
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I will let @showuon take one more look before we merge this. Thanks for the patch!

Copy link
Contributor

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@showuon showuon changed the title Adding KRaft Monitoring Related Metrics to docs/ops.html MINOR: Adding KRaft Monitoring Related Metrics to docs/ops.html Sep 26, 2022
@showuon
Copy link
Contributor

showuon commented Sep 26, 2022

Failed tests are unrelated:

Build / JDK 11 and Scala 2.13 / kafka.admin.LeaderElectionCommandTest.[1] Type=Raft-Distributed, Name=testPathToJsonFile, MetadataVersion=3.3-IV3, Security=PLAINTEXT

@showuon showuon merged commit eb8f0bd into apache:trunk Sep 26, 2022
showuon pushed a commit that referenced this pull request Sep 26, 2022
This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).

Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
@showuon
Copy link
Contributor

showuon commented Sep 26, 2022

backport to v3.3

guozhangwang pushed a commit to guozhangwang/kafka that referenced this pull request Jan 25, 2023
…he#12679)

This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).

Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
rutvijmehta-harness pushed a commit to rutvijmehta-harness/kafka that referenced this pull request Feb 9, 2024
…he#12679)

This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).

Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
rutvijmehta-harness added a commit to rutvijmehta-harness/kafka that referenced this pull request Feb 9, 2024
…he#12679) (#12)

This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).

Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>

Co-authored-by: Niket <niket-goel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants