KAFKA-14114: Add Metadata Error Related Metrics #12496

Closed

@cmccabe cmccabe commented Aug 8, 2022

This PR adds the three metrics described in KIP-859:
kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count
kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count
kafka.controller:type=KafkaController,name=MetadataErrorCount
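For anyone wiring these into monitoring, here is a minimal sketch of how the three names above break down as JMX object names. It uses only `javax.management.ObjectName` from the JDK. The class name `MetadataErrorMetricNames` and the helper `parse` are hypothetical and are not part of this PR.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import java.util.List;

// Hypothetical helper, not part of Kafka: parses the metric names listed above.
class MetadataErrorMetricNames {
    static final List<String> NAMES = List.of(
        "kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count",
        "kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count",
        "kafka.controller:type=KafkaController,name=MetadataErrorCount");

    // Wrap the checked exception so callers can use this in simple loops.
    static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad JMX name: " + name, e);
        }
    }

    public static void main(String[] args) {
        for (String name : NAMES) {
            ObjectName on = parse(name);
            // Print the JMX domain plus the "type" and "name" key properties.
            System.out.println(on.getDomain() + " | "
                + on.getKeyProperty("type") + " | "
                + on.getKeyProperty("name"));
        }
    }
}
```

The two broker metrics share a domain and type and differ only in the `name` property, while the controller metric lives under `kafka.controller`.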

Fault handlers increment these metrics when the corresponding fault occurs. Broker-side load
errors happen in BrokerMetadataListener, and broker-side apply errors happen in
BrokerMetadataPublisher. The controller metric is incremented when a standby (non-active)
controller encounters a metadata error.

In BrokerMetadataPublisher, try to limit the damage caused by an exception by introducing more
catch blocks. The only fatal failures here are those that happen during initialization, when we
initialize the manager objects (these would also be fatal in ZK mode).
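The idea of the extra catch blocks can be sketched roughly as follows. This is an illustration only, not the actual BrokerMetadataPublisher code: `PublisherSketch`, `runStep`, and the step names are all hypothetical. Each step is wrapped individually so that a failure in one is counted and logged without preventing the later steps from running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a publisher that applies metadata in several steps.
class PublisherSketch {
    final AtomicInteger applyErrors = new AtomicInteger();
    final List<String> applied = new ArrayList<>();

    interface Step {
        void run() throws Exception;
    }

    // Run one step in its own try/catch so a failure stays local to that step.
    void runStep(String name, Step step) {
        try {
            step.run();
            applied.add(name);
        } catch (Exception e) {
            applyErrors.incrementAndGet();
            System.err.println("Error while " + name + ": " + e);
        }
    }

    void publish() {
        runStep("updating topics", () -> { });
        // Simulated failure: the steps after this one must still run.
        runStep("updating client quotas", () -> {
            throw new IllegalStateException("simulated apply failure");
        });
        runStep("updating ACLs", () -> { });
    }

    public static void main(String[] args) {
        PublisherSketch publisher = new PublisherSketch();
        publisher.publish();
        System.out.println("applied=" + publisher.applied
            + " errors=" + publisher.applyErrors.get());
    }
}
```

With a single outer try/catch, the simulated failure in the second step would have skipped the third step as well.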

In BrokerMetadataListener, try to improve the logging of faults, especially ones that happen when
replaying a snapshot. Try to limit the damage caused by an exception.

Replace MetadataFaultHandler with LoggingFaultHandler, which is more flexible and takes a Runnable
argument. Add LoggingFaultHandlerTest.
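A minimal sketch of the pattern, assuming a handler that logs the fault and then runs a caller-supplied Runnable. `SketchLoggingFaultHandler` is a hypothetical simplification, not the class added by this PR; the real signatures and logging may differ. Here the Runnable bumps a counter, which is how a fault handler can drive an error-count metric.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simplification: log the fault, then run a caller-supplied action.
class SketchLoggingFaultHandler {
    private final String type;
    private final Runnable action;

    SketchLoggingFaultHandler(String type, Runnable action) {
        this.type = type;
        this.action = action;
    }

    // Returns an exception the caller may throw; the handler itself never throws.
    RuntimeException handleFault(String message, Throwable cause) {
        System.err.println("Encountered " + type + " fault: " + message
            + (cause == null ? "" : " (" + cause + ")"));
        try {
            action.run();
        } catch (Throwable t) {
            // A failing action must not mask the original fault.
            System.err.println("Fault handler action failed: " + t);
        }
        return new RuntimeException(message, cause);
    }

    public static void main(String[] args) {
        AtomicLong loadErrors = new AtomicLong();
        SketchLoggingFaultHandler handler =
            new SketchLoggingFaultHandler("metadata loading", loadErrors::incrementAndGet);
        handler.handleFault("error replaying record", new IllegalStateException("simulated"));
        System.out.println("load errors so far: " + loadErrors.get());
    }
}
```

Passing the action as a Runnable keeps the handler independent of any particular metrics class, so the same handler type can serve both the broker and the controller.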

Make QuorumControllerMetricsTest stricter. Fix a bug in QuorumControllerMetrics where some metrics
were not removed from the Yammer registry on close.
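The class of bug and the stricter check can be sketched like this. To stay self-contained, a plain Map stands in for the Yammer registry, and `SketchControllerMetrics` plus the metric name `SketchGauge` are hypothetical. The point is that every metric registered at construction is tracked, so close() can remove all of them and a test can assert that nothing is left behind.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical metrics holder; the Map is a stand-in for a real metrics registry.
class SketchControllerMetrics implements AutoCloseable {
    private final Map<String, Object> registry;
    private final List<String> registeredNames = new ArrayList<>();

    SketchControllerMetrics(Map<String, Object> registry) {
        this.registry = registry;
        register("MetadataErrorCount", new AtomicLong());
        register("SketchGauge", new AtomicLong());
    }

    // Track every name we register so close() cannot forget one.
    private void register(String name, Object metric) {
        registry.put(name, metric);
        registeredNames.add(name);
    }

    @Override
    public void close() {
        for (String name : registeredNames) {
            registry.remove(name);
        }
        registeredNames.clear();
    }

    public static void main(String[] args) {
        Map<String, Object> registry = new HashMap<>();
        try (SketchControllerMetrics metrics = new SketchControllerMetrics(registry)) {
            System.out.println("registered: " + registry.keySet());
        }
        // The stricter check: nothing may remain in the registry after close.
        if (!registry.isEmpty()) {
            throw new AssertionError("metrics leaked: " + registry.keySet());
        }
    }
}
```

A test that only checks for the presence of expected metrics would not catch a leak; asserting that the registry is empty after close does.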

Co-author: Niket Goel <ngoel@confluent.io>

@cmccabe cmccabe force-pushed the kafka-14114-kraft-error-metrics-II branch 2 times, most recently from bc22147 to 9e0d2ed Compare August 9, 2022 00:02
@cmccabe cmccabe force-pushed the kafka-14114-kraft-error-metrics-II branch from 9e0d2ed to 038cffb Compare August 9, 2022 19:01
@cmccabe cmccabe closed this Aug 9, 2022
@cmccabe cmccabe deleted the kafka-14114-kraft-error-metrics-II branch August 9, 2022 22:31
2 participants