STORM-3101: Fix unexpected metrics registration in StormMetricsRegistry. #2714
Some questions:
@@ -73,6 +130,12 @@ private static void startMetricsReporter(PreparableReporter reporter, Map<String
    }

    private static <T extends Metric> T register(String name, T metric) {
        if (source != null && !name.startsWith(source)) {
I think it would be better to register with a Daemon type enum and metric name. This would allow both daemon-specific metrics and general metrics to work for a given daemon.
I originally thought so too, but then I found that some other components that use metrics, such as logviewer and RocksDB, are not listed as a 'daemon' in DaemonType. I wasn't sure how to handle that case, so internally I convert them all to String. It's possible to support registering both with a Daemon and with a String name.
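To make the two registration styles discussed here concrete, the following is a minimal sketch, not code from this PR. `SketchDaemonType`, `SourceAwareRegistry`, the `:` separator, and the metric names used with it are all hypothetical stand-ins; the real `DaemonType` enum and `StormMetricsRegistry` may look different.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Storm's DaemonType; the real constants may differ.
enum SketchDaemonType { NIMBUS, SUPERVISOR, DRPC }

// Hypothetical registry, not the actual StormMetricsRegistry.
final class SourceAwareRegistry {
    private final String source; // owner of the running process, e.g. "supervisor"
    private final Map<String, Object> metrics = new HashMap<>();

    SourceAwareRegistry(SketchDaemonType daemon) {
        this(daemon.name().toLowerCase(Locale.ROOT));
    }

    SourceAwareRegistry(String source) {
        this.source = source;
    }

    // Daemon-specific metric: kept only when its owner is the running daemon.
    <T> T register(SketchDaemonType owner, String name, T metric) {
        return register(owner.name().toLowerCase(Locale.ROOT), name, metric);
    }

    // String variant for components (e.g. logviewer) with no enum constant.
    <T> T register(String owner, String name, T metric) {
        if (owner.equals(source)) {
            metrics.put(owner + ":" + name, metric);
        }
        return metric;
    }

    // General metric: registered regardless of which daemon is running.
    <T> T registerGeneral(String name, T metric) {
        metrics.put(name, metric);
        return metric;
    }

    boolean isRegistered(String fullName) {
        return metrics.containsKey(fullName);
    }
}
```

In this sketch the enum overload simply delegates to the String overload, which mirrors the "convert them all to String internally" approach described above.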
Thanks for the clarification. Does anyone else have suggestions on the implementation?
Additional questions:
Thanks for the contribution. It's good to make more use of the metrics functionality. I don't see a real need for adding filters, since it's all Storm-internal code. But if you think it's better to have them, I would suggest using a filter layer to make this more flexible. For example, create a filter interface with a filter function, e.g.
then you can have a function (e.g. addFilter) in StormMetricsRegistryFilter to add a real implementation of the filter interface before StormMetricsRegistry starts to registerMeters(). The above is a very simple interface and might not be able to do much except filtering based on the I think the current implementation in this PR won't work because you are calling
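As an illustration of the filter-layer idea, here is a minimal sketch under stated assumptions. `MetricNameFilter`, its `accept` method, and `FilteringRegistry` are hypothetical names chosen for this example; only `addFilter` is taken from the comment, and the signature given to it here is a guess.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical filter interface; the name and signature are assumptions.
@FunctionalInterface
interface MetricNameFilter {
    boolean accept(String metricName);
}

// Hypothetical registry with a filter layer, not the real StormMetricsRegistry.
final class FilteringRegistry {
    private final List<MetricNameFilter> filters = new ArrayList<>();
    private final Map<String, Object> metrics = new HashMap<>();

    // Filters only affect registrations made after they are added.
    void addFilter(MetricNameFilter filter) {
        filters.add(filter);
    }

    // Registers the metric only if every filter accepts its name.
    <T> T register(String name, T metric) {
        for (MetricNameFilter filter : filters) {
            if (!filter.accept(name)) {
                return metric; // rejected: returned to the caller unregistered
            }
        }
        metrics.put(name, metric);
        return metric;
    }

    boolean contains(String name) {
        return metrics.containsKey(name);
    }
}
```

A daemon could then install something like `registry.addFilter(name -> name.startsWith("supervisor"))` at startup. Since the interface only sees the metric name, it cannot filter on anything else.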
I talked with @zd-project offline and now have a better understanding of the problem he is trying to solve, as explained in https://issues.apache.org/jira/browse/STORM-3101. It's a really good catch.
Another implementation would be to require a class's metrics to be registered only once at least one instance of that class has been instantiated. The metric variables could then no longer be final, but we can encapsulate all the assignments in a static method and invoke it at the appropriate time. For example,
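One possible shape for this deferred registration is sketched below. It is an illustration only, not the commenter's own example: `LazyDemoRegistry`, `LazyComponent`, and the metric name are hypothetical, and a plain `AtomicLong` stands in for a real meter.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a shared static registry.
final class LazyDemoRegistry {
    static final Map<String, AtomicLong> METRICS = new ConcurrentHashMap<>();

    static AtomicLong registerCounter(String name) {
        return METRICS.computeIfAbsent(name, k -> new AtomicLong());
    }
}

// Hypothetical component whose metric is registered on first instantiation
// rather than when the class is loaded.
final class LazyComponent {
    // Cannot be final because it is assigned lazily.
    private static volatile AtomicLong launches;

    // All metric assignments live in one static method.
    private static synchronized void registerMetrics() {
        if (launches == null) {
            launches = LazyDemoRegistry.registerCounter("supervisor:num-launches");
        }
    }

    LazyComponent() {
        registerMetrics();
    }

    void launch() {
        launches.incrementAndGet();
    }

    // Calling a static helper loads the class but registers nothing.
    static String describe() {
        return "LazyComponent";
    }
}
```

With this layout, a daemon that only touches static members of `LazyComponent` never causes its metrics to appear in the registry.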
It might be good to consider whether we could move away from We're already kind of using StormMetricsRegistry/StormMetricRegistry as if they belong to the daemon classes, because the daemons are responsible for calling I think for many of the daemons (e.g. Nimbus, Supervisor, DRPCServer), we could probably instantiate a StormMetricsRegistry in the main method or main class constructor, and pass that down to any other classes that need it. That way we could move metrics registration to be non-static, so we don't run into this kind of problem. What do you think @zd-project? |
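The instance-based approach suggested here could look roughly like the sketch below. All class names (`InstanceRegistry`, `SlotManager`, `DemoSupervisor`) and the metric name are hypothetical, and an `AtomicLong` stands in for a real metric type; this is not how the Storm daemons are currently wired.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical instance-based registry; a stand-in, not Storm's class.
final class InstanceRegistry {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    AtomicLong counter(String name) {
        return counters.computeIfAbsent(name, k -> new AtomicLong());
    }

    int size() {
        return counters.size();
    }
}

// Hypothetical helper class: receives the registry instead of using statics.
final class SlotManager {
    private final AtomicLong slotsAssigned;

    SlotManager(InstanceRegistry registry) {
        this.slotsAssigned = registry.counter("supervisor:num-slots-assigned");
    }

    void assign() {
        slotsAssigned.incrementAndGet();
    }
}

// Hypothetical daemon: creates one registry and passes it down.
final class DemoSupervisor {
    final InstanceRegistry registry = new InstanceRegistry();
    final SlotManager slots = new SlotManager(registry);
}
```

Because nothing is static, a metric exists only in the registry of a daemon that actually constructed the class owning it, and two daemons in one JVM (for example in tests) do not share state.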
@srdo |
@danny0405 A quick search for Role or MetricsGroup in the flink repo didn't turn up anything. Could you elaborate on what you mean, and why/how we could use Flink's mechanism here? |
@srdo |
@danny0405 Okay, thanks. I think the MetricRegistryImpl class looks nice https://github.com/apache/flink/blob/16ec3d7ea12c520c5c86f0721553355cc938c2ae/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java. I'd think we could do something similar with StormMetricsRegistry, where we create one as part of e.g. starting Nimbus, and then that instance is passed around via injection (e.g. as in https://github.com/apache/flink/blob/4de72bbee189ab357e4d9e6fea33e27ff1ab233f/flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java#L239). I assume this is what you mean? |
@srdo Yeah, this would be a valuable improvement for Storm metrics, and the implementation will be much cleaner.
We have 4 separate metrics APIs right now: 2 for daemons and, similarly, 2 for the workers. IClusterMetricsConsumer and IMetricsConsumer are way too open, so we deprecated them, or should do so. They allowed you to send anything (literally an Object), but they had a lot of context about the metrics, a.k.a. dimensions: which process it came from, what component, etc. The new APIs fix the Object issue, because they are based off of the dropwizard/yammer/codahale API, but that API does not support dimensions. The Flink APIs appear to more or less fix the dimensions issue, but they are a fork of codahale/yammer. They have no external dependencies. So are you proposing that we fork just like Flink does? Do you want us to use flink-metrics-core as a backend instead? Either of these choices will require that we have, at a minimum, a new API for reporting metrics to other systems, and possibly a 3rd API for registering metrics. This is not a small amount of work, and it is possibly going to be painful to our users, who may have spent some time trying to move to the new APIs. I am not opposed to it, as it will solve some problems and ugliness that we currently have; I just want to be sure that we understand the ramifications of a move like this. On the plus side, it would make it so we could have a single API for all metrics.
Dropwizard Metrics seems to have |
It looks like StormMetricRegistry already has the filtering-by-daemon-type feature that we need. It also supports reporting at different intervals and such. Apart from the injection suggestion, do you think it's better to consider moving towards this registry in the future, instead of StormMetricsRegistry? (Mind the 's' in the class name lol.)
Another thought on improving StormMetricsRegistry in general: we should have something that works better with module dependencies as well. In the latest commit to #2710, which addresses metrics for exceptions from external shell commands, I had to register metrics from the storm-client component with StormMetricsRegistry, which is in storm-server. To bypass the dependency I ended up declaring the meter in the ShellUtil class but registering it when launching the supervisor. This prompted me to think about decoupling declaration from registration. Maybe we can have an Enum class that declares all metrics (since all of them are singletons anyway); components can reference them, and daemons can register/unregister them on demand when launching. It would also make maintaining existing metrics a bit easier, as we would then have one place to declare and organize all metrics.
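A rough sketch of the declare-in-an-enum idea follows, assuming simple counters only. `DeclaredMeter`, `OnDemandRegistry`, and the metric names are hypothetical, and an `AtomicLong` stands in for a real meter.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical catalogue of metrics; constants and names are illustrative only.
enum DeclaredMeter {
    SHELL_EXCEPTIONS("num-shell-exceptions"),
    SLOTS_LAUNCHED("num-launched");

    private final String metricName;
    private final AtomicLong count = new AtomicLong();

    DeclaredMeter(String metricName) {
        this.metricName = metricName;
    }

    String metricName() {
        return metricName;
    }

    // Components mark the meter without knowing whether anyone reports it.
    void mark() {
        count.incrementAndGet();
    }

    long count() {
        return count.get();
    }
}

// Hypothetical daemon-side registry that decides which meters are reported.
final class OnDemandRegistry {
    private final Set<DeclaredMeter> registered = EnumSet.noneOf(DeclaredMeter.class);

    synchronized void register(DeclaredMeter meter) {
        registered.add(meter);
    }

    synchronized void unregister(DeclaredMeter meter) {
        registered.remove(meter);
    }

    synchronized boolean isReported(DeclaredMeter meter) {
        return registered.contains(meter);
    }
}
```

In this layout the enum could live in a low-level module, so a class there can call `DeclaredMeter.SHELL_EXCEPTIONS.mark()` while only the daemon, in a higher-level module, decides whether to register it.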
@zd-project I think the Enum idea sounds nice, but I'm not sure whether it will be so easy to implement. Some metrics depend on state in the class they're declared in, e.g. the two gauges in Container. I'm not sure how you could move those to an enum without making the fields in Container public. I'd still like to investigate whether we can make the metrics registries non-static. Would you mind if I played around with it a bit?
This should be closed after #2805 is resolved. |
The current implementation is based on the idea of setting the source to the running daemon. All metrics not belonging to the source (e.g., supervisor) will be removed or rejected. This implementation also adds a utility method for naming a metric.
Please refer to the Apache issue page for the purpose of this improvement. This is actually a band-aid fix. I'm wondering if there's a better approach.
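The behaviour described in this summary can be sketched as follows. Only the `source != null && !name.startsWith(source)` check comes from the diff in this PR; `PrefixFilteringRegistry`, `setSource`, the `name` helper, and the `:` separator are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and the ":" separator are assumptions, not Storm's API.
final class PrefixFilteringRegistry {
    private final Map<String, Object> metrics = new ConcurrentHashMap<>();
    private volatile String source; // null means "accept everything"

    // Utility for naming a metric so that it carries its source as a prefix.
    static String name(String source, String metricName) {
        return source + ":" + metricName;
    }

    // Sets the running daemon and removes metrics registered for other sources.
    void setSource(String source) {
        this.source = source;
        metrics.keySet().removeIf(n -> !n.startsWith(source));
    }

    // Mirrors the check in the diff: reject names that do not start with the source.
    <T> T register(String name, T metric) {
        String current = source;
        if (current != null && !name.startsWith(current)) {
            return metric; // rejected: handed back unregistered
        }
        metrics.putIfAbsent(name, metric);
        return metric;
    }

    boolean contains(String name) {
        return metrics.containsKey(name);
    }
}
```

Metrics registered before the source is known are accepted at first and then removed once `setSource` runs, which corresponds to the "removed or rejected" wording above.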