Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-15618: Kafka metrics collector and supporting classes (KIP-714) #14620

Merged
merged 18 commits into from
Nov 29, 2023

Conversation

apoorvmittal10
Copy link
Contributor

The PR (jira) outlines classes to collect metrics for client by KafkaMetricsCollector implementation. The MetricsCollector defines mechanism to collect client metrics in sum and gauge metrics format. This requires to define cumulative and delta telemetry metrics while collecting raw metrics.

Singl point metric class helps creating OTLP format Metric object wrapped over Single point metric class itself.

The PR requires merge of other open PRs for successful build: #14575, #14618, #14619.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@hudeqi hudeqi added the kip Requires or implements a KIP label Oct 24, 2023
Copy link
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a pass but did not get to the tests yet. Couple of minor questions.

@apoorvmittal10
Copy link
Contributor Author

Thanks @AndrewJSchofield for approving the PR. @mjsax @xvrl @philipnee please if you can review again as well.

@philipnee
Copy link
Collaborator

Hi @apoorvmittal10 - Thanks for addressing the comments, I have no more.

@apoorvmittal10
Copy link
Contributor Author

@xvrl Can you please re-review and let me know if fixing temporality change by resetting ledger in follow up PR makes sense?

@apoorvmittal10
Copy link
Contributor Author

@AndrewJSchofield @philipnee Thanks for approving the PR. @xvrl I have also added support for handling temporality change in KafkaMetricsCollector (KAFKA-15877).

Below is the status of build which reports 11 failing tests where none of the test is related to the changes in the PR. However I investigated and can find, among 11, 10 already have jira reported as per other PR build failure. I tried reproducing the 1 remaining tests and cannot reproduce hence have created jira to track same.

@mjsax Please let me know if we can go ahead with the PR merge or I shall address something else as well?

New failing - 11

Already existing jira, flaky test already known: Build / JDK 11 and Scala 2.13 / testResetSinkConnectorOffsetsOverriddenConsumerGroupId – org.apache.kafka.connect.integration.OffsetsApiIntegrationTest
16s: https://issues.apache.org/jira/browse/KAFKA-15891

Already existing jira, flaky test already known: Build / JDK 11 and Scala 2.13 / testResetSinkConnectorOffsetsZombieSinkTasks – org.apache.kafka.connect.integration.OffsetsApiIntegrationTest
59s: https://issues.apache.org/jira/browse/KAFKA-15524

Already existing jira, flaky test already known: Build / JDK 11 and Scala 2.13 / testResetSinkConnectorOffsetsZombieSinkTasks – org.apache.kafka.connect.integration.OffsetsApiIntegrationTest
52s: https://issues.apache.org/jira/browse/KAFKA-15524

Already existing jira, flaky test already known: Build / JDK 11 and Scala 2.13 / testTaskRequestWithOldStartMsGetsUpdated() – org.apache.kafka.trogdor.coordinator.CoordinatorTest
2m 0s: https://issues.apache.org/jira/browse/KAFKA-15760

Already existing jira, flaky test already known: Build / JDK 17 and Scala 2.13 / testReplicateSourceDefault() – org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest
1m 50s: https://issues.apache.org/jira/browse/KAFKA-15292

Created jira, unable to reproduce locally: Build / JDK 21 and Scala 2.13 / testOutdatedCoordinatorAssignment() – org.apache.kafka.clients.consumer.internals.EagerConsumerCoordinatorTest
<1s: https://issues.apache.org/jira/browse/KAFKA-15900

Already existing jira, flaky test already known: Build / JDK 21 and Scala 2.13 / testFenceMultipleBrokers() – org.apache.kafka.controller.QuorumControllerTest
47s: https://issues.apache.org/jira/browse/KAFKA-15898

Already existing jira, flaky test already known: Build / JDK 21 and Scala 2.13 / testTaskRequestWithOldStartMsGetsUpdated() – org.apache.kafka.trogdor.coordinator.CoordinatorTest
2m 0s: https://issues.apache.org/jira/browse/KAFKA-15760

Already existing jira, flaky test already known: Build / JDK 8 and Scala 2.12 / shouldUpgradeFromEosAlphaToEosV2[true] – org.apache.kafka.streams.integration.EosV2UpgradeIntegrationTest
4m 23s: https://issues.apache.org/jira/browse/KAFKA-15797

Already existing jira, flaky test already known: Build / JDK 8 and Scala 2.12 / [1] Type=Raft-Combined, Name=testDescribeQuorumReplicationSuccessful, MetadataVersion=3.7-IV1, Security=PLAINTEXT – org.apache.kafka.tools.MetadataQuorumCommandTest
1m 7s: https://issues.apache.org/jira/browse/KAFKA-15104

Already existing jira, flaky test already known: Build / JDK 8 and Scala 2.12 / [5] Type=Raft-Combined, Name=testDescribeQuorumStatusSuccessful, MetadataVersion=3.7-IV1, Security=PLAINTEXT – org.apache.kafka.tools.MetadataQuorumCommandTest: https://issues.apache.org/jira/browse/KAFKA-15104

@apoorvmittal10
Copy link
Contributor Author

@xvrl @mjsax Please help closing the PR, do we need to address any concern in the PR or we can merge. If required we can have a follow up PR as well.

@mjsax mjsax merged commit 009b57d into apache:trunk Nov 29, 2023
1 check failed
@mjsax
Copy link
Member

mjsax commented Nov 29, 2023

Was actually hoping to get an official approval from @xvrl before merging, but I think the PR is ok and feature freeze deadline is close. Merged as-is. If there is more comments from Xavier, we can do a follow up PR to address them.

@apoorvmittal10
Copy link
Contributor Author

Was actually hoping to get an official approval from @xvrl before merging, but I think the PR is ok and feature freeze deadline is close. Merged as-is. If there is more comments from Xavier, we can do a follow up PR to address them.

Thanks @mjsax, make sense. I ll watch the PR if @xvrl's feedback arrives further.

wcarlson5 pushed a commit that referenced this pull request Dec 5, 2023
…KIP-714) (#14843)

The PR adds changes for the client APIs to register ClientTelemetryReporter, if enabled, and periodically report client metrics. The changes include front facing API changes with NetworkCLient issuing telemetry APIs.

The PR build is dependent on: #14620, #14724

Reviewers: Philip Nee <pnee@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Walker Carlson <wcarlson@apache.org>
ex172000 pushed a commit to ex172000/kafka that referenced this pull request Dec 15, 2023
…apache#14620)

The PR outlines classes to collect metrics for client by KafkaMetricsCollector implementation. The MetricsCollector defines mechanism to collect client metrics in sum and gauge metrics format. This requires to define cumulative and delta telemetry metrics while collecting raw metrics.

Singl point metric class helps creating OTLP format Metric object wrapped over Single point metric class itself.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Xavier Léauté <xavier@confluent.io>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>
ex172000 pushed a commit to ex172000/kafka that referenced this pull request Dec 15, 2023
…KIP-714) (apache#14843)

The PR adds changes for the client APIs to register ClientTelemetryReporter, if enabled, and periodically report client metrics. The changes include front facing API changes with NetworkCLient issuing telemetry APIs.

The PR build is dependent on: apache#14620, apache#14724

Reviewers: Philip Nee <pnee@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Walker Carlson <wcarlson@apache.org>
@apoorvmittal10 apoorvmittal10 deleted the kip-714-ak-collector branch January 5, 2024 12:47
gaurav-narula pushed a commit to gaurav-narula/kafka that referenced this pull request Jan 24, 2024
…KIP-714) (apache#14843)

The PR adds changes for the client APIs to register ClientTelemetryReporter, if enabled, and periodically report client metrics. The changes include front facing API changes with NetworkCLient issuing telemetry APIs.

The PR build is dependent on: apache#14620, apache#14724

Reviewers: Philip Nee <pnee@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Walker Carlson <wcarlson@apache.org>
yyu1993 pushed a commit to yyu1993/kafka that referenced this pull request Feb 15, 2024
…apache#14620)

The PR outlines classes to collect metrics for client by KafkaMetricsCollector implementation. The MetricsCollector defines mechanism to collect client metrics in sum and gauge metrics format. This requires to define cumulative and delta telemetry metrics while collecting raw metrics.

Singl point metric class helps creating OTLP format Metric object wrapped over Single point metric class itself.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Xavier Léauté <xavier@confluent.io>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>
yyu1993 pushed a commit to yyu1993/kafka that referenced this pull request Feb 15, 2024
…KIP-714) (apache#14843)

The PR adds changes for the client APIs to register ClientTelemetryReporter, if enabled, and periodically report client metrics. The changes include front facing API changes with NetworkCLient issuing telemetry APIs.

The PR build is dependent on: apache#14620, apache#14724

Reviewers: Philip Nee <pnee@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Walker Carlson <wcarlson@apache.org>
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Feb 16, 2024
…apache#14620)

The PR outlines classes to collect metrics for client by KafkaMetricsCollector implementation. The MetricsCollector defines mechanism to collect client metrics in sum and gauge metrics format. This requires to define cumulative and delta telemetry metrics while collecting raw metrics.

Singl point metric class helps creating OTLP format Metric object wrapped over Single point metric class itself.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Xavier Léauté <xavier@confluent.io>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Feb 16, 2024
…KIP-714) (apache#14843)

The PR adds changes for the client APIs to register ClientTelemetryReporter, if enabled, and periodically report client metrics. The changes include front facing API changes with NetworkCLient issuing telemetry APIs.

The PR build is dependent on: apache#14620, apache#14724

Reviewers: Philip Nee <pnee@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Walker Carlson <wcarlson@apache.org>
clolov pushed a commit to clolov/kafka that referenced this pull request Apr 5, 2024
…apache#14620)

The PR outlines classes to collect metrics for client by KafkaMetricsCollector implementation. The MetricsCollector defines mechanism to collect client metrics in sum and gauge metrics format. This requires to define cumulative and delta telemetry metrics while collecting raw metrics.

Singl point metric class helps creating OTLP format Metric object wrapped over Single point metric class itself.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Xavier Léauté <xavier@confluent.io>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>
clolov pushed a commit to clolov/kafka that referenced this pull request Apr 5, 2024
…KIP-714) (apache#14843)

The PR adds changes for the client APIs to register ClientTelemetryReporter, if enabled, and periodically report client metrics. The changes include front facing API changes with NetworkCLient issuing telemetry APIs.

The PR build is dependent on: apache#14620, apache#14724

Reviewers: Philip Nee <pnee@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Walker Carlson <wcarlson@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consumer kip Requires or implements a KIP producer
Projects
None yet
6 participants