
STORM-3101: Fix unexpected metrics registration in StormMetricsRegistry. #2714

Closed
wants to merge 1 commit into from

Conversation


@zd-project zd-project commented Jun 13, 2018

The current implementation is based on the idea of setting the source to the running daemon. All metrics that do not belong to the source (e.g., supervisor) are removed or rejected. This implementation also adds a utility method for naming a metric.

Please refer to the Apache issue page for the purpose of this improvement. This is really a band-aid fix; I'm wondering if there's a better approach.
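A minimal sketch of the prefix-based approach this PR describes (the class, field, and method names below are illustrative stand-ins, not the actual Storm code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the PR's idea: a daemon sets its "source" once,
// and any metric whose name does not start with that source is rejected.
class MetricsRegistrySketch {
    private static String source;
    private static final Map<String, Object> METRICS = new ConcurrentHashMap<>();

    static void setSource(String s) {
        source = s;
    }

    static <T> T register(String name, T metric) {
        if (source != null && !name.startsWith(source)) {
            // In the PR this logs a warning; the caller gets no error.
            return metric;
        }
        METRICS.put(name, metric);
        return metric;
    }

    static boolean isRegistered(String name) {
        return METRICS.containsKey(name);
    }
}
```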

@zd-project

Some questions:

  1. Currently, when rejecting a metric, StormMetricsRegistry simply discards it and logs a warning, without throwing any error to the calling method/class. Is this an appropriate practice?
  2. Should we instead use a secondary registry for filtering and reporting only, separate from the primary one, which would include all registrations?
  3. I'm not sure how to handle RocksDB metrics, as RocksDB isn't technically a daemon.

@@ -73,6 +130,12 @@ private static void startMetricsReporter(PreparableReporter reporter, Map<String
}

private static <T extends Metric> T register(String name, T metric) {
if (source != null && !name.startsWith(source)) {
Contributor:
I think it would be better to register with a DaemonType enum and a metric name. That would allow both daemon-specific and general metrics to work for a given daemon.

Contributor Author (@zd-project):

I was originally thinking so, but then I found that some other components not listed as daemons in DaemonType also use metrics, such as the logviewer and RocksDB. I wasn't sure how to handle that case, so internally I convert them all to Strings. It's possible to support registering both with a DaemonType and with a String name.

Contributor Author (@zd-project):

Thanks for the clarification. Does anyone else have suggestions on the implementation?

@zd-project

Additional questions:

  1. Is the worker considered a separate daemon from the supervisor? How should we differentiate worker metrics from supervisor metrics? For example, a worker's kill/restart operations are handled by the supervisor. Does that make them supervisor metrics?
  2. Are DRPC and LogViewer considered server daemons? Can I add them to the DaemonType enum?


Ethanlm commented Jun 14, 2018

Thanks for the contribution. It's good to explore the metrics functionality further.

I don't see a real need for adding filters, since it's all Storm-internal code. But if you think it's better to have them, I would suggest using a filter layer to make this more flexible.

For example,

Create a filter interface with a filter function. e.g.

public interface IStormMetricsRegistryFilter {
    // Return true if the metric should be filtered out; no-op by default.
    default boolean filter(String metricName) {
        return false;
    }
}

Then you can have a method (e.g. addFilter) in StormMetricsRegistry to add a real implementation of the filter interface before StormMetricsRegistry starts to registerMeters().

The above is a very simple interface and might not be able to do much beyond filtering based on the String parameter. You can think more about this.
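As a rough, self-contained illustration of that addFilter idea, with plain Predicates standing in for the proposed interface (none of these names are real Storm APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of a filter layer: filters are added before registration starts,
// and a metric is dropped when any filter matches its name.
class FilteringRegistry {
    private final List<Predicate<String>> filters = new ArrayList<>();
    private final List<String> registered = new ArrayList<>();

    void addFilter(Predicate<String> filter) {
        filters.add(filter);
    }

    // Returns false when the metric was filtered out.
    boolean register(String metricName) {
        for (Predicate<String> f : filters) {
            if (f.test(metricName)) {
                return false;
            }
        }
        registered.add(metricName);
        return true;
    }
}
```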

I think the current implementation in this PR won't work because you are calling setSource() too late.

@zd-project zd-project changed the title STORM-3101: Add filter to StormMetricsRegistry. STORM-3101: Fix accidental metrics registration in StormMetricsRegistry. Jun 14, 2018
@zd-project zd-project changed the title STORM-3101: Fix accidental metrics registration in StormMetricsRegistry. STORM-3101: Fix unexpected metrics registration in StormMetricsRegistry. Jun 14, 2018

Ethanlm commented Jun 14, 2018

I talked with @zd-project offline and now have a better understanding of the problem he is trying to solve, as explained in https://issues.apache.org/jira/browse/STORM-3101. It's a really good catch.
But we can discuss more about how to address it.


zd-project commented Jun 14, 2018

Another implementation would be to require that metrics only be registered once at least one instance of their class has been instantiated. This would mean the metrics variables can no longer be final, but we can encapsulate all the assignments in a static method and invoke it at the appropriate time.

For example,

class Scratch {
    private static Integer x = null;

    public Scratch() {
        // Lazily initialize the static field the first time an instance
        // is created. (Not thread-safe as written; shown for the idea only.)
        if (Scratch.x == null) {
            Scratch.x = 7;
        }
    }
}


srdo commented Jun 17, 2018

It might be good to consider whether we could move away from statics for metrics and do dependency injection instead. The cause of the issue you mention on JIRA (accidentally loading metrics) seems to me to be that metrics registration is static.

We're already kind of using StormMetricsRegistry/StormMetricRegistry as if they belong to the daemon classes, because the daemons are responsible for calling start/stop on them.

I think for many of the daemons (e.g. Nimbus, Supervisor, DRPCServer), we could probably instantiate a StormMetricsRegistry in the main method or main class constructor, and pass that down to any other classes that need it. That way we could move metrics registration to be non-static, so we don't run into this kind of problem.
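A hedged sketch of what that injection could look like; all class and method names here are made up for illustration, not the real StormMetricsRegistry API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative non-static registry: each daemon creates its own instance
// in main() and passes it to the classes that need it.
class InjectedMetricsRegistry {
    private final Map<String, Supplier<?>> gauges = new HashMap<>();

    void registerGauge(String name, Supplier<?> gauge) {
        gauges.put(name, gauge);
    }

    int size() {
        return gauges.size();
    }
}

// A daemon-like class that receives the registry via its constructor,
// so only this process's metrics ever reach this registry.
class SupervisorSketch {
    SupervisorSketch(InjectedMetricsRegistry metrics) {
        metrics.registerGauge("supervisor:num-slots", () -> 4);
    }
}
```

Because nothing is static, loading an unrelated class can no longer register its metrics by accident.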

What do you think @zd-project?

@danny0405

@srdo
I have looked at the Apache Flink metrics system. It has an internal metrics layer on top of Yammer Metrics: every Role in Flink has a MetricsGroup, and one MetricsGroup has many Metrics registered in it. The metrics group can be passed around along with the Role, and users can choose to register their metrics in it very easily.
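A rough, stdlib-only model of that hierarchical grouping (illustrative names only; see Flink's MetricGroup for the real API):

```java
import java.util.ArrayList;
import java.util.List;

// Rough model of the Flink-style idea: each role owns a metric group, and
// metrics registered through a group inherit its scope as a name prefix.
class MetricGroupSketch {
    private final String scope;
    private final List<String> metricNames = new ArrayList<>();

    MetricGroupSketch(String scope) {
        this.scope = scope;
    }

    // Sub-groups extend the scope, e.g. "supervisor" -> "supervisor.worker".
    MetricGroupSketch addGroup(String name) {
        return new MetricGroupSketch(scope + "." + name);
    }

    // Registration returns the fully scoped metric name.
    String register(String metricName) {
        String fullName = scope + "." + metricName;
        metricNames.add(fullName);
        return fullName;
    }
}
```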


srdo commented Jun 18, 2018

@danny0405 A quick search for Role or MetricsGroup in the flink repo didn't turn up anything. Could you elaborate on what you mean, and why/how we could use Flink's mechanism here?

@danny0405

@srdo
Sorry, it's MetricGroup and Metric.


srdo commented Jun 19, 2018

@danny0405 Okay, thanks. I think the MetricRegistryImpl class looks nice https://github.com/apache/flink/blob/16ec3d7ea12c520c5c86f0721553355cc938c2ae/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java. I'd think we could do something similar with StormMetricsRegistry, where we create one as part of e.g. starting Nimbus, and then that instance is passed around via injection (e.g. as in https://github.com/apache/flink/blob/4de72bbee189ab357e4d9e6fea33e27ff1ab233f/flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java#L239). I assume this is what you mean?

@danny0405

@srdo Yeah, this would be a valuable improvement for Storm metrics, and the implementation would be much cleaner.


revans2 commented Jun 20, 2018

We have 4 separate metrics APIs right now.

2 for daemons:
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/metric/api/IClusterMetricsConsumer.java
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/metric/StormMetricsRegistry.java

and similarly 2 for the workers:
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/metric/api/IMetricsConsumer.java
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/metrics2/StormMetricRegistry.java

IClusterMetricsConsumer and IMetricsConsumer are way too open, so we deprecated them (or should do so). They allowed you to send anything (literally an Object), but they carried a lot of context about the metrics, a.k.a. dimensions: which process it came from, what component, etc. The new APIs fix the Object issue, because they are based on the dropwizard/yammer/codahale API, but that API does not support dimensions.

The Flink API appears to more or less fix the dimensions issue, but it is a fork of codahale/yammer.

https://github.com/apache/flink/blob/16ec3d7ea12c520c5c86f0721553355cc938c2ae/flink-metrics/flink-metrics-core/pom.xml

They have no external dependencies.

So are you proposing that we fork just like Flink does? Or do you want us to use flink-metrics-core as a backend instead? Either of these choices will require, at a minimum, a new API for reporting metrics to other systems, and possibly a third API for registering metrics. This is not a small amount of work, and it is possibly going to be painful for our users, who may have spent some time trying to move to the new APIs. I am not opposed to it, as it will solve some problems and ugliness that we currently have; I just want to be sure that we understand the ramifications of a move like this.

On the plus side it would make it so we could have a single API for all metrics.


zd-project commented Jul 9, 2018

Dropwizard Metrics seems to have a built-in MetricFilter as well as removeMatching. I think we can leverage those for filtering out unwanted metrics, which would be similar to my original proposal but easier to maintain on our side.
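A toy model of how removeMatching behaves, mirroring only the semantics (the real MetricFilter and MetricRegistry live in com.codahale.metrics; a BiPredicate stands in for MetricFilter here):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Toy registry mirroring the semantics of Dropwizard's
// MetricRegistry.removeMatching(MetricFilter): every registered metric
// whose (name, metric) pair matches the filter is removed.
class ToyRegistry {
    private final Map<String, Object> metrics = new HashMap<>();

    void register(String name, Object metric) {
        metrics.put(name, metric);
    }

    void removeMatching(BiPredicate<String, Object> filter) {
        metrics.entrySet().removeIf(e -> filter.test(e.getKey(), e.getValue()));
    }

    int size() {
        return metrics.size();
    }
}
```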

@zd-project

Looks like StormMetricRegistry already has the filtering-by-daemon-type feature that we need. It also supports reporting at different intervals and such. Apart from the injection suggestion, do you think it's worth considering moving towards this registry in the future, instead of StormMetricsRegistry? (Mind the 's' in the class name, lol.)

@zd-project

Another thought on improving StormMetricsRegistry in general: we should have something that handles dependencies better as well. In the latest commit to #2710, which addresses metrics for exceptions from external shell commands, I have to register metrics from the storm-client component with StormMetricsRegistry, which is in storm-server. To bypass the dependency, I ended up declaring the meter in the ShellUtil class but registering it when launching the supervisor.

This prompted me to think about decoupling declaration from registration. Maybe we can have an enum class that declares all metrics (since all of them are singletons anyway); components can reference them, and daemons can register/unregister them on demand when launching. It would also make maintaining existing metrics a bit easier, as we would have one place to declare and organize all metrics.
@srdo @danny0405 @revans2
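The enum idea above could be sketched like this (all metric names and classes here are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical central declaration of all daemon metrics.
enum DaemonMetric {
    SUPERVISOR_SHELL_EXCEPTIONS("supervisor:num-shell-exceptions"),
    NIMBUS_UPTIME_SECS("nimbus:uptime-secs");

    private final String metricName;

    DaemonMetric(String metricName) {
        this.metricName = metricName;
    }

    String metricName() {
        return metricName;
    }
}

// Each daemon registers only the metrics it owns when it launches.
class DaemonLauncherSketch {
    private final Set<String> registered = new HashSet<>();

    void register(DaemonMetric... toRegister) {
        for (DaemonMetric m : toRegister) {
            registered.add(m.metricName());
        }
    }

    boolean has(DaemonMetric m) {
        return registered.contains(m.metricName());
    }
}
```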


srdo commented Jul 29, 2018

@zd-project I think the enum idea sounds nice, but I'm not sure it will be so easy to implement. Some metrics depend on state in the class in which they're declared, e.g. the two gauges in Container. I'm not sure how you could move those to an enum without making the fields in Container public.

I'd still like to investigate whether we can make the metrics registries non-static. Would you mind if I played around with it a bit?

@zd-project

This should be closed after #2805 is resolved.


srdo commented Apr 15, 2019

@srdo srdo closed this Apr 15, 2019