Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-16731: Added share group metrics class. #16488

Merged
merged 6 commits into from
Jul 7, 2024
Merged

Conversation

smjn
Copy link
Contributor

@smjn smjn commented Jun 28, 2024

What

  • Added ShareGroupMetrics inner class to SharePartitionManager.
  • Added code to record metrics at various checkpoints in code.

Why

  • We want to record the shareAcknowledgement rate and count, partition load time max and average and record ack rate and count.

Testing

  • Updated tests to verify partition load metrics, record ack and share ack requests

Copy link
Collaborator

@AndrewJSchofield AndrewJSchofield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Just a couple of minor comments.

@@ -96,6 +101,7 @@
import static org.mockito.Mockito.when;

@Timeout(120)
@SuppressWarnings({ "ClassDataAbstractionCoupling"})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Superfluous space

partitionLoadTimeSensor.add(metrics.metricName(
PARTITION_LOAD_TIME_AVG,
METRICS_GROUP_NAME,
"Average time taken to load the share partitions."),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The average time in milliseconds to load the share partitions".

partitionLoadTimeSensor.add(metrics.metricName(
PARTITION_LOAD_TIME_MAX,
METRICS_GROUP_NAME,
"Maximum time taken to load the share partitions."),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The maximum time in milliseconds to load the share partitions."

Copy link
Collaborator

@apoorvmittal10 apoorvmittal10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ll take a look at it, early next week, just need to re-check the metrics definition once again around offset and records commit rate.

Copy link
Collaborator

@AndrewJSchofield AndrewJSchofield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. A few minor comments.

void recordAcknowledgement(byte ackType) {
if (recordAcksSensorMap.containsKey(ackType)) {
recordAcksSensorMap.get(ackType).record();
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ackType can be zero when a gap is being acknowledged. For example, when a batch of transactional records is received by the client, it responds with 0 for the offsets which correspond to the transactional control records. This will cause an error to be logged by the broker for each of the records.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apoorvmittal10 I've seen this line being logged during integration tests. How do you think we should handle this? Either we need to ensure the metrics are only logged for ack types the metrics understands, or we need to permit the metrics to just ignore types which are not understood (which is my preferred option).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, as discussed let's ignore the unknown ones (gaps).

metrics.metricName(
SHARE_ACK_RATE,
METRICS_GROUP_NAME,
"The rate of number of acknowledge requests."),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Rate of number" sounds odd. "Rate of acknowledge requests" or "acknowledge request rate" would be better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably "Rate of acknowledge requests" is best given my next comment.

metrics.metricName(
RECORD_ACK_RATE,
METRICS_GROUP_NAME,
"The rate of number of records acknowledged per acknowledgement type.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Rate of number" sounds odd. "Rate of records acknowledged per acknowledgement type" sounds better I think.

Copy link
Collaborator

@apoorvmittal10 apoorvmittal10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, LGTM!

Copy link
Collaborator

@AndrewJSchofield AndrewJSchofield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. lgtm

@@ -808,8 +808,6 @@ void shareAcknowledgement() {
void recordAcknowledgement(byte ackType) {
if (recordAcksSensorMap.containsKey(ackType)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment above the if test that says something like // unknown ack types (such as gaps for control records) are intentionally ignored

Copy link
Contributor

@omkreddy omkreddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smjn Thanks for the PR. LGTM

@omkreddy omkreddy merged commit 932759b into apache:trunk Jul 7, 2024
1 check failed
abhi-ksolves pushed a commit to ksolves/kafka that referenced this pull request Jul 31, 2024
Added ShareGroupMetrics inner class to SharePartitionManager.
Added code to record metrics at various checkpoints in code.

Reviewers:  Andrew Schofield <aschofield@confluent.io>,Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants