[FLINK-4389] Expose metrics to WebFrontend #2363
The test case
Fixed the failing test.
 * Abstract request handler that returns a list of all available metrics or the values for a set of metrics.
 *
 * If the query parameters do not contain a "get" parameter the list of all metrics is returned.
 * {@code {"available": [ { "name" : "X", "id" : "X" } ] } }
"name" : "X" won't be written, will it?
That javadoc is a bit outdated, it should be {@code [ { "id" : "X" } ] }
Thanks for your contribution @zentol. I've gone over the code and made some inline comments. My main concern/question is actually the representation of the metrics' type and hierarchy information. I think that encoding it in a string and then re-parsing it on the receiver side to reconstruct the information is rather fragile and error-prone, especially with respect to maintainability. Maybe you can give me some background on why you decided to do it this way. Apart from that, I think the code contains many tests, which I really like :-)
@tillrohrmann I've addressed most of your comments. Excluded are calling
But we still send metric data as strings encoded over the wire and have no checks that the histogram field order is actually correct, right?
Only Gauge values are sent as strings.
Sorry, I meant that the hierarchy information is still encoded in a string and then re-parsed. Furthermore, the histogram data is sent as an object array without any information about the field orderings.
Well... currently that is still done. Whether it will be done once this is merged is up in the air.
I think this should be addressed (either way) before merging this PR.
Regarding hierarchy: I'm close to being done with a container for the scope information.
Great to hear @zentol 👍
986127f to 0122061
I've updated and rebased the PR. The scope information is now stored in a Metrics, or rather their scope, name and value(s), are now serialized with a new On the other end we have the Neither the There is no encoding for field orderings but tests that verify that the fields are assigned correctly. If a developer were to change the order of fields a test would fail, and the only way for this to make it into master would be if a) the test is simply changed to give a green light and b) it isn't noticed in the review, at which point all bets are off anyway. So I decided to keep it a bit simpler. The |
b4ffee2 to 202c823
}

switch (info.getCategory()) {
case INFO_CATEGORY_JM:
What's the benefit of having an explicit type field over using instanceof? I think encoding the type via the actual type has the advantage that you don't mix up classes with wrong category types.
eh, seemed like the proper way of handling it. Also, (up to) 4 comparisons vs a jump.
That is true. Performance-wise it is the more efficient way to execute it, no doubt. I was just wondering whether this is not a case of premature optimization at the price of harder maintainability.
On the other hand, it does not seem so complicated as to be unmaintainable. With that in mind, my other comments are mainly obsolete.
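For readers following this thread, the two dispatch styles being compared can be sketched roughly as below. Only `getCategory()` and `INFO_CATEGORY_JM` appear in the diff; every class name, the `INFO_CATEGORY_TM` constant and the returned strings are hypothetical stand-ins and not the code of this PR.

```java
/** Hypothetical sketch of the two dispatch styles discussed above; not the PR's classes. */
class DispatchSketch {
    // Category constants; only INFO_CATEGORY_JM appears in the diff, the other is assumed.
    static final byte INFO_CATEGORY_JM = 0;
    static final byte INFO_CATEGORY_TM = 1;

    /** Hypothetical base class carrying an explicit category field. */
    abstract static class ScopeInfoSketch {
        abstract byte getCategory();
    }

    static class JobManagerInfoSketch extends ScopeInfoSketch {
        @Override
        byte getCategory() {
            return INFO_CATEGORY_JM;
        }
    }

    static class TaskManagerInfoSketch extends ScopeInfoSketch {
        @Override
        byte getCategory() {
            return INFO_CATEGORY_TM;
        }
    }

    /** Dispatch on the explicit category: a single switch over a byte. */
    static String byCategory(ScopeInfoSketch info) {
        switch (info.getCategory()) {
            case INFO_CATEGORY_JM:
                return "jobmanager";
            case INFO_CATEGORY_TM:
                return "taskmanager";
            default:
                throw new IllegalArgumentException("Unknown category: " + info.getCategory());
        }
    }

    /** Dispatch on the runtime type: one instanceof check per candidate class. */
    static String byType(ScopeInfoSketch info) {
        if (info instanceof JobManagerInfoSketch) {
            return "jobmanager";
        } else if (info instanceof TaskManagerInfoSketch) {
            return "taskmanager";
        }
        throw new IllegalArgumentException("Unknown type: " + info.getClass());
    }
}
```

The switch variant relies on each subclass returning the right constant, which is the mix-up risk raised in the review; the instanceof variant cannot mismatch but performs up to one comparison per class.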
I think the changes look good. Thanks for your work @zentol :-) I only had a minor question whether we can substitute the explicit category information by the type information of the metric dumps and the
I'll address the checkNotNull/comment formatting while merging, which I'm doing now. Thank you for looking over it again @tillrohrmann.
af21eb4 to e62a02b
1f0c779 to 26f91a4
282122d to 2363530
This PR exposes metrics to the WebFrontend, as proposed in FLIP-7.
This PR builds on top of #2300, meaning that 2866f56 is not part of the PR.
I've split the implementation into 5 commits that implement
MetricQueryService
The MetricQueryService is an actor running inside the MetricRegistry acting like an unscheduled reporter that is queried from the outside for a report. The MetricRegistry notifies it of added/removed metrics whereas the MetricFetcher sends report requests to the JM/TM which are then forwarded to the MetricQueryService, which answers directly to the MetricFetcher.
The report is one big Object[], which contains for each metric
MetricStore
The MetricStore is a relatively simple nested data-structure that contains one HashMap<String, Object> for every JM/TM/job/task. Received metrics are added to these HashMaps based on the format string. There is only a single MetricStore instance in the WebInterface.
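As a rough illustration of such a nested structure, a minimal sketch might look like the following. The class name, the field layout (tasks nested under jobs) and the explicit `add*` methods are assumptions made for this example; the PR routes metrics based on the format string instead, which is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a nested metric store; not the MetricStore class from this PR. */
class MetricStoreSketch {
    /** Metrics of the JobManager, keyed by metric name. */
    final Map<String, Object> jobManager = new HashMap<>();
    /** One metric map per TaskManager, keyed by an assumed TaskManager id. */
    final Map<String, Map<String, Object>> taskManagers = new HashMap<>();
    /** One store per job, keyed by an assumed job id. */
    final Map<String, JobStore> jobs = new HashMap<>();

    /** Metrics of a single job plus one metric map per task (nesting is an assumption). */
    static class JobStore {
        final Map<String, Object> metrics = new HashMap<>();
        final Map<String, Map<String, Object>> tasks = new HashMap<>();
    }

    void addJobManagerMetric(String name, Object value) {
        jobManager.put(name, value);
    }

    void addTaskManagerMetric(String tmId, String name, Object value) {
        taskManagers.computeIfAbsent(tmId, k -> new HashMap<>()).put(name, value);
    }

    void addJobMetric(String jobId, String name, Object value) {
        jobs.computeIfAbsent(jobId, k -> new JobStore()).metrics.put(name, value);
    }

    void addTaskMetric(String jobId, String taskId, String name, Object value) {
        jobs.computeIfAbsent(jobId, k -> new JobStore())
            .tasks.computeIfAbsent(taskId, k -> new HashMap<>())
            .put(name, value);
    }
}
```

Creating the inner maps lazily with `computeIfAbsent` keeps the store empty until the first metric for a given JM/TM/job/task arrives.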
MetricFetcher
The MetricFetcher initiates the transfer and cleanup of metrics. It contains the MetricStore instance, which is accessed by MetricHandlers. Fetching is only done when a handler asks for it, with a minimum interval of 10 seconds between updates. As such, no fetching will be done if the metrics are not accessed via REST calls.
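The on-demand throttling described here could be sketched as follows. This covers only the 10-second guard, not the actual fetching from JM/TM; the class name, the `update(long)` signature and the injected `Runnable` are hypothetical and chosen to keep the example self-contained.

```java
/** Hypothetical sketch of on-demand, rate-limited fetching; not the MetricFetcher from this PR. */
class ThrottledFetcherSketch {
    /** Minimum time between two fetches, as stated in the description. */
    private static final long MIN_INTERVAL_MS = 10_000L;

    private final Runnable fetch;
    private long lastUpdateMs;
    private boolean fetchedOnce;

    ThrottledFetcherSketch(Runnable fetch) {
        this.fetch = fetch;
    }

    /**
     * Called by a handler before it reads the store.
     * The current time is passed in (an assumption of this sketch) so the logic is easy to test.
     *
     * @return true if a fetch was triggered, false if the last one is recent enough
     */
    synchronized boolean update(long nowMs) {
        if (fetchedOnce && nowMs - lastUpdateMs < MIN_INTERVAL_MS) {
            return false;
        }
        fetchedOnce = true;
        lastUpdateMs = nowMs;
        fetch.run();
        return true;
    }
}
```

A handler would call something like `fetcher.update(System.currentTimeMillis())` on each request; if no handler is ever invoked, `fetch` never runs.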
The fetching procedure can be summed up in pseudo-code as follows:
MetricsHandler
The MetricsHandlers deal with two requests:
A request without a get query parameter is treated as a request for all available metrics for a given JM/TM/job/task, denoted by the REST path. The reply will be a JSON array, for example: [{"id":"metric_1"},{"id":"metric_2"}]
A request with a get query parameter is treated as a request for the values of the metrics named in it. The reply will be a JSON array of id:value pairs, for example: [{"id":"metric_1", "value":"4"}] or an empty string if an error occurred.
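The two reply shapes can be illustrated with a small sketch that builds the JSON by hand. The class and method names are hypothetical, and two details are assumptions not stated in the description: that the get parameter is a comma-separated list of metric ids, and that unknown ids are silently skipped.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of the two reply formats; not the handler code from this PR. */
class MetricsReplySketch {
    /** Minimal JSON string escaping for backslashes and double quotes. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Reply for a request without a get parameter: the ids of all available metrics. */
    static String listAvailable(Map<String, Object> metrics) {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (String id : metrics.keySet()) {
            json.add("{\"id\":\"" + escape(id) + "\"}");
        }
        return json.toString();
    }

    /** Reply for a request with a get parameter (assumed comma-separated): id/value pairs. */
    static String getValues(Map<String, Object> metrics, String getParameter) {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (String id : getParameter.split(",")) {
            Object value = metrics.get(id);
            if (value != null) { // unknown ids are skipped (assumption)
                json.add("{\"id\":\"" + escape(id) + "\",\"value\":\""
                        + escape(value.toString()) + "\"}");
            }
        }
        return json.toString();
    }
}
```

Values are always written as JSON strings here, matching the "value":"4" example above.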