GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al… #2730

Closed
wants to merge 5 commits into from

@sv2000 sv2000 commented Sep 9, 2019

…low consumer metrics to be reported.

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    Newer Kafka consumers expose a metrics() API that reports a number of consumer metrics, such as lag and latency, which are very useful for monitoring and debugging. We expose a metrics() API in GobblinKafkaConsumerClient to allow these consumer metrics to be reported.
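
    For illustration, here is a minimal sketch of how such an API could be added without breaking existing clients. It assumes the interface method is a default method that returns an empty map; the actual default body is not shown in this conversation. All type names here (SimpleGauge, ConsumerClientSketch, LegacyClientSketch, NewerClientSketch) are hypothetical stand-ins, not Gobblin or Coda Hale classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics-library gauge type.
interface SimpleGauge {
  Object getValue();
}

// Hypothetical, simplified consumer-client interface.
interface ConsumerClientSketch {
  // Assumed default: clients that cannot report metrics return an empty map.
  default Map<String, SimpleGauge> metrics() {
    return Collections.emptyMap();
  }
}

// A client built on an older consumer inherits the empty default.
class LegacyClientSketch implements ConsumerClientSketch {
}

// A client built on a newer consumer overrides the method.
class NewerClientSketch implements ConsumerClientSketch {
  @Override
  public Map<String, SimpleGauge> metrics() {
    Map<String, SimpleGauge> metrics = new HashMap<>();
    metrics.put("records-lag-max", () -> 42.0); // placeholder value for the sketch
    return metrics;
  }
}
```

    With this shape, callers can invoke metrics() on any client and simply get nothing back from clients that do not support it.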

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    NA.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@codecov-io codecov-io commented Sep 9, 2019

Codecov Report

Merging #2730 into master will increase coverage by 0.15%.
The diff coverage is 4%.

@@             Coverage Diff              @@
##             master    #2730      +/-   ##
============================================
+ Coverage     44.99%   45.15%   +0.15%     
- Complexity     8742     8801      +59     
============================================
  Files          1884     1890       +6     
  Lines         70295    70620     +325     
  Branches       7715     7747      +32     
============================================
+ Hits          31629    31886     +257     
- Misses        35735    35780      +45     
- Partials       2931     2954      +23

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| .../kafka/client/AbstractBaseKafkaConsumerClient.java | 0% <0%> (ø) | 0 <0> (ø) ⬇️ |
| ...bblin/kafka/client/GobblinKafkaConsumerClient.java | 0% <0%> (ø) | 0 <0> (?) |
| ...he/gobblin/kafka/client/Kafka09ConsumerClient.java | 76.81% <8.33%> (-14.42%) | 8 <1> (ø) |
| ...in/java/org/apache/gobblin/cluster/SingleTask.java | 73.58% <0%> (-7.55%) | 9% <0%> (ø) |
| ...lin/util/filesystem/FileSystemInstrumentation.java | 85.71% <0%> (-7.15%) | 3% <0%> (ø) |
| ...pache/gobblin/cluster/JobConfigurationManager.java | 90% <0%> (-4.45%) | 11% <0%> (ø) |
| ...a/org/apache/gobblin/cluster/GobblinHelixTask.java | 76.08% <0%> (-4.35%) | 5% <0%> (ø) |
| ...bblin/compaction/mapreduce/orc/OrcValueMapper.java | 78.87% <0%> (-2.38%) | 16% <0%> (+11%) |
| ...apache/gobblin/runtime/local/LocalJobLauncher.java | 61.81% <0%> (-2.34%) | 5% <0%> (ø) |
| ...ache/gobblin/couchbase/writer/CouchbaseWriter.java | 64.39% <0%> (-1.89%) | 15% <0%> (+4%) |

... and 41 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a409908...cf6cf7b.

@autumnust autumnust left a comment

Overall LGTM. Minor comments

* @param kafkaMetric
* @return
*/
private Metric kafkaToCodaHaleMetric(final KafkaMetric kafkaMetric) {

Isn't this likely to be reused by other implementations in the same package? I suggest making it package-private.

Contributor Author

Good point. Unfortunately, the KafkaMetric class is specific to a given Kafka version, making code duplication necessary. So it seems that each consumer client version will have its own implementation of the method.
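
To illustrate the point, a per-version adapter might look roughly like the sketch below. It is an assumption-laden sketch: GaugeSketch stands in for a Coda Hale style gauge, Kafka09MetricSketch stands in for one Kafka version's metric class, and both are hypothetical types defined here so the example is self-contained. The one idea it shows is that the gauge reads the Kafka metric lazily, so each consumer client version needs its own small wrapper around its own metric type.

```java
import java.util.function.DoubleSupplier;

// Hypothetical stand-in for a Coda Hale style gauge; not the real library type.
interface GaugeSketch<T> {
  T getValue();
}

// Hypothetical stand-in for the metric class of one specific Kafka version.
class Kafka09MetricSketch {
  private final DoubleSupplier reader;

  Kafka09MetricSketch(DoubleSupplier reader) {
    this.reader = reader;
  }

  // Returns the metric's current value at the time of the call.
  double value() {
    return reader.getAsDouble();
  }
}

class Kafka09ClientSketch {
  // Wraps the version-specific metric in a gauge that reads it on demand.
  static GaugeSketch<Double> kafkaToGauge(final Kafka09MetricSketch kafkaMetric) {
    return () -> kafkaMetric.value();
  }
}
```

A client for another Kafka version would repeat this wrapper against its own metric class, which is the duplication described above.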

* @param metricName the name of the Kafka metric e.g. "records-lag-max", "fetch-throttle-time-max" etc.
* @return the canonicalized metric name.
*/
public String canonicalMetricName(String metricGroup, Collection<String> metricTags, String metricName) {

Shall we make it protected or package private?

Contributor Author

Will make the change.

@shirshanka

@sv2000 : why is codecov so red? Something wrong with the codecov plugin?

@shirshanka shirshanka left a comment

Some clarifying questions and suggestions...

return gauge;
}

private String canonicalMerticName(KafkaMetric kafkaMetric) {

typo on Mertic

Contributor Author

Fixed. Thanks!

* @param metricName the name of the Kafka metric e.g. "records-lag-max", "fetch-throttle-time-max" etc.
* @return the canonicalized metric name.
*/
String canonicalMetricName(String metricGroup, Collection<String> metricTags, String metricName) {

Who should call it, and when? Is it a good idea to call it many times with the same input?

Contributor Author

Added a clarifying comment.


private String canonicalMerticName(KafkaMetric kafkaMetric) {
MetricName name = kafkaMetric.metricName();
return canonicalMetricName(name.group(), name.tags().values(), name.name());

Since this method will be called repeatedly with the same input values (the Kafka metric names) over the lifetime of the process, does it make sense to cache the mapping of KafkaMetricName -> canonicalMetricName and reuse it?

Contributor Author

Yeah, thought about it. Even if we cache the metric name mappings, we need to compute the key for the map every time. It seems the key needs to be derived from the metric group, metric name, and tags to make the metric unique, which is essentially the canonical representation of the Kafka metric.
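
A rough sketch of that argument, under stated assumptions: the "." separator and the group, tags, name ordering below are guesses, since the real method body is not shown in this conversation, and MetricNameSketch is a hypothetical class. The point is only that a unique cache key would be built from the same three inputs as the canonical name itself.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class MetricNameSketch {
  // Joins group, tag values, and metric name into one string.
  // The separator and ordering are assumptions made for this sketch.
  static String canonicalMetricName(String metricGroup, Collection<String> metricTags, String metricName) {
    List<String> parts = new ArrayList<>();
    parts.add(metricGroup);
    parts.addAll(metricTags);
    parts.add(metricName);
    return String.join(".", parts);
  }
}
```

A key that distinguishes two metrics sharing a name but differing in group or tags would need the same concatenation, so a cache lookup would cost about as much as recomputing the name.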

@sv2000 sv2000 commented Sep 9, 2019

> @sv2000 : why is codecov so red? Something wrong with the codecov plugin?

Not sure how code coverage is computed. It shows my changes impacting classes that seem totally unrelated.

@autumnust autumnust left a comment

LGTM.

* org.apache.kafka.common.Metric to Coda Hale Metrics.
* @return
*/
public default Map<String, Metric> metrics() {

since metrics is not a verb... maybe this method should be called getMetrics()?

@Override
public Map<String, Metric> metrics() {
Map<MetricName, KafkaMetric> kafkaMetrics = (Map<MetricName, KafkaMetric>) this.consumer.metrics();
Map<String, Metric> codaHaleMetricMap = new HashMap<>();

do we need a new instance every time?
This diff doesn't show when and how this method is being called, so it's hard to say whether creating a new instance and copying on each call to metrics is a good idea or not.

@shirshanka shirshanka left a comment

I think the PR can be improved a bit by

  1. Changing the name of the metrics method to getMetrics

  2. Not creating extra instances of CodaHale metrics

    2 is less of a show-stopper than 1

@sv2000 sv2000 commented Sep 25, 2019

> I think the PR can be improved a bit by
>
>   1. Changing the name of the metrics method to getMetrics
>   2. Not creating extra instances of CodaHale metrics
>     2 is less of a show-stopper than 1

Thanks @shirshanka ! Changed metrics method to getMetrics per your suggestion. For 2, the possibility of TopicPartition reassignments to the underlying Kafka consumer makes it difficult to retain the same map, unless we compute a diff to obtain new metrics on each call to KafkaConsumer#metrics(). For simplicity and due to the fact that this method will be called inside a scheduled thread (typically, once every 30 s or more), the relative overhead of instantiating a new map should be insignificant. As a result, I am keeping the implementation as is.
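
As a hedged illustration of that trade-off, the sketch below rebuilds a fresh snapshot map on every poll and drives the poll from a scheduled thread. MetricsPollerSketch, the Supplier-based metric source, and the println reporting are all assumptions made for the example; they are not the Gobblin reporting code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class MetricsPollerSketch {
  // Builds a new map on each call, so metrics that appear or disappear
  // (for example after a partition reassignment) are picked up without diffing.
  static Map<String, Double> snapshot(Supplier<Map<String, Double>> source) {
    return new HashMap<>(source.get());
  }

  // Polls the source at a fixed period on a single background thread.
  // The caller is responsible for shutting the returned scheduler down.
  static ScheduledExecutorService start(Supplier<Map<String, Double>> source, long periodSeconds) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(
        () -> System.out.println(snapshot(source)), 0, periodSeconds, TimeUnit.SECONDS);
    return scheduler;
  }
}
```

At a period of 30 seconds or more, one small map allocation per poll is negligible next to the polling interval, which matches the reasoning given above for keeping the implementation simple.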

@shirshanka shirshanka left a comment

LGTM!

@asfgit asfgit closed this in 9389a4b Sep 26, 2019
jhsenjaliya pushed a commit to jhsenjaliya/incubator-gobblin that referenced this pull request Apr 26, 2020
…erClient to al…

Closes apache#2730 from sv2000/kafkaConsumerMetrics