-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Emitter monitoring #4973
Add Emitter monitoring #4973
Conversation
@LazySingleton | ||
public class EmitterMonitorProvider | ||
{ | ||
private Monitor emitterMontor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a missing "I"? Should be "emitterMonitor" instead of "emitterMontor"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed
emitter, | ||
ImmutableMap.of() | ||
); | ||
emitterMonitorProvider.setEmitterMontor(emitterMonitor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an opinion: should we inject the emitter monitor in rather than creating it in a singleton provider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do this, but didn't find a way. Emitter is not injected statically, it's "chosen" in runtime. Simple injection leads to "duplicate injection" errors. If monitors are also provided via @Provider
methods along with emitters, emitters still should dynamically "decide" which monitor to use. But I don't see how this approach is better than the current solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about the @Provider
way.
Anyway, my concern is on the order in which the dependencies are injected, so would there be a situation where MetricsModule
gets loaded before the emitter Module
s such that emitterMonitorProvider
returns null
because it is not set yet? If this is not an issue, then I guess your method would simpler and more straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not possible, because in the method where EmitterMonitorProvider is injected, ServiceEmitter is injected as a parameter as well. ServiceEmitter depends on concrete Emitter. So there is a direct dependency chain and Guice will call one of Emitter providers first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it 👍
|
Btw, do you suspect bugs in the emitter that caused you to add these metrics? How do these metrics get emitted if the emitter is not working right? |
@@ -96,6 +98,10 @@ public MonitorScheduler getMonitorScheduler( | |||
|
|||
monitors.add(monitor); | |||
} | |||
Monitor emitterMontor = emitterMonitorProvider.getEmitterMonitor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emitterMonitor
(spelling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
config.get(), | ||
HttpClientInit.createClient(builder.build(), LifecycleUtils.asMmxLifecycle(lifecycle)), | ||
jsonMapper | ||
); | ||
ParametrizedUriEmitterMonitor emitterMonitor = new ParametrizedUriEmitterMonitor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "parameterized" is the proper spelling. But the class is from another package so I guess we can't change it right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, It was called that early on not by me, I preserved. Dictionary says that this is acceptable spelling also.
FeedDefiningMonitor.DEFAULT_METRICS_FEED, | ||
emitter | ||
); | ||
emitterMonitorProvider.setEmitterMonitor(emitterMonitor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a guarantee that this will be called before MetricsModule's getMonitorScheduler is called to provide a MonitorScheduler? I guess it's just because that uses a ServiceEmitter, which needs an Emitter, which must call this method. If so, it's sort of "indirect" and seems like something that is worth a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment
The workflow is that if a buffer is failed to be emitted, it goes to a separate "fail queue" (reflected by Also, I fixed description of
I would say not bugs, but the way the code is written, may lead to uncontrolled growth of the amount of buffers, if there are network problems with emitting. We are exploring that at the moment.
Thanks, I didn't think about this before. At least HttpPostEmitter logs something when it couldn't emit a buffer, so it could be spotted in logs. Also in case of network problems, emitting old events doesn't block the new events, so if problems are gone, we could see new events. But in this case the information about problems could be lost, if we don't have gauge |
Thanks for the additional comments @leventov. I can review again when this is no longer WIP. |
Adding to |
Looks that the blocker is the bug you mentioned not the Emitter monitoring feature. Suggest to split this PR into two PRs which are for upgrading java-util and adding Emitter monitoring. |
@jihoonson well, this PR would look like it only adds monitoring, but it automatically would fix emitter bug as well, because those changes are in the same java-util version |
Making PR smaller for backport by just bumping java-util version is not possible, because it breaks backward compatibility of HttpPostEmitter as well, so need the code changes |
@leventov ah ok. |
@leventov is this still WIP or is it ready to review? |
@gianm still WIP, we just found new issues with java-util |
* testing new emitter * fix on failed test
import javax.annotation.Nullable; | ||
|
||
@LazySingleton | ||
public class EmitterMonitorProvider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? Can a Monitor simply be added to druid.monitoring.monitors
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so basically my suggestion is adding in some monitors that can be used in druid.monitoring.monitors
for the emitters in question, rather than having a special monitor case in the workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #4973 (comment). Also I think that more complicated code is better than requirement for extra config. I think that minimizing the amount of config and "good by default" behaviour is the top priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think most users would be able to act on these metrics in any meaningful way. Unless there's some kind of action people can take, then it is a set of metrics mostly used for debugging purposes or monitoring by advanced setups. Plus there's a lot of monitors that are very handy for standard use that we don't include by default.
There's also the consideration of code maintainability. Having special cases for different kinds of monitors in https://github.com/druid-io/druid/pull/4973/files#diff-cba74f4106709a96941c9a365e22eddcR104 is against the spirit of what that class is trying to do. The monitoring approach here also has a reverse-injection kind of approach where the thing which is used to inject is injected so the injection can back propagate to where it is used, which is a bit of a tangle for maintainability.
A cleaner impl would have a monitor which checks the injected emitter to see if it can appropriately monitor it, and if not just ignore any monitoring (maybe with a warning).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drcrallen there will be a problem, that such emitter will be able to support only emitters, included into the druid codebase, namely HttpPostEmitter and ParametrizedUriEmitter, but not emitters, defined in private extensions. Do you think it's ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drcrallen there will be a problem, that such emitter will be able to support only emitters, included into the druid codebase, namely HttpPostEmitter and ParametrizedUriEmitter, but not emitters, defined in private extensions. Do you think it's ok?
Are you asking if it's ok if a builtin emitter-monitor only works on builtin emitters? Just making sure I understand the question.
If so, I think that's fine, as long as extensions defining their own emitters could also define their own emitter-monitors. (I think this should be possible already, even before this patch, but maybe I'm missing something)
pom.xml
Outdated
<artifactId>server-metrics</artifactId> | ||
<version>0.5.2</version> | ||
<artifactId>java-util</artifactId> | ||
<version>1.3.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still the best version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@gianm do you have more comments? |
} | ||
return new HttpPostEmitter(config.get(), HttpClientInit.createClient(builder.build(), LifecycleUtils.asMmxLifecycle(lifecycle)), jsonMapper); | ||
final AsyncHttpClient client = new DefaultAsyncHttpClient(builder.build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The emitter doesn't use the metamx http-client anymore? Out of curiosity why is this?
(ParametrizedUriEmitter) emitter | ||
); | ||
} else { | ||
throw new IllegalStateException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow up, I'd suggest detecting ComposingEmitter
and unpacking that to monitor the emitters within. And probably in that case, we'll want this to be a warning rather than an error if there is at least one emitter that can be monitored.
Does not need to block this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would a ComposingEmitterMonitor
be a better place for such functionality?
* Add Emitter monitoring * Fix typo * Fixes * testing new emitter * Fix failed test (#71) * testing new emitter * fix on failed test * Remove emitter's readTimeout from docs * Update docs * Add HttpEmittingMonitor * Update java-util to 1.3.2
I'm not really sure but it appears that after this PR, sometimes peon process jvm fails to exit due to possibly following non-daemon thread
did we add a new non-daemon thread pool somewhere in java-util or http client etc ? |
also I have seen errors like
again not sure if really related to this PR |
@himanshug which version of netty do you use? |
Seen this as well, again not sure if its related
|
we're getting following netty related jars bundled right now from druid master...
|
…#6425) @gianm gauge `emitter/buffers/failed` and `emitter/buffers/totalAllocated` metrics proved to be very inconvenient in practice: they are not additive across a fleet of nodes, are randomly reset when a JVM restarts (that may be completely independent from metrics emitting issues). Regarding [problem identification](#4973 (comment)), `emitter/buffers/dropped` serves this well. Currently in this PR I added `/delta` vs. `/total` metrics, but if there are no objections I would remove `/total` and so align `emitter/buffers/allocated` and `/failed` with `/dropped` and `/events/emitted` which are already delta-only.
com.metamx:java-util:1.3.2
, that includes everything that was formerly scattered amongjava-util
,http-client
,emitter
andserver-metrics
libraries. Also, this update fixes this defect: Uncontrolled build up of buffers in HttpPostEmitter metamx/java-util#65HttpPostEmitter
andParametrizedUriEmitter
(depends on what is configured), with the following metrics:emitter/buffers/totalAllocated
-- gauge, normally the sum of it since a Druid instance startup should be < 5, but if this number grows indefinitely, something is wrong with your network to the emitting endpoint, likely.emitter/buffers/emitQueue
-- gauge, normally should stay close to zero, 1-2.emitter/buffers/failed
-- gauge, normally should be 0.emitter/buffers/dropped
- delta, normally should be 0.emitter/buffers/reuseQueue
-- gauge, normally should be < 5.emitter/events/emitted
-- delta, number of events emitted in the given period.emitter/events/emitQueue
-- gauge, related toemitter/buffers/emitQueue
emitter/events/large/emitQueue
-- gauge, related toemitter/buffers/emitQueue
emitter/batchFilling/timeMsSum
- delta, total time spend in filling emitter batches, in milliseconds.emitter/batchFilling/count
- delta, number of batches filled. timeMsSum / count gives "average batch filling time".emitter/batchFilling/maxTimeMs
- maximum time spend in filling one batch, since the last emitted.emitter/batchFilling/minTimeMs
- minimum time spend in filling one batch, since the last emitted.And 8 more metrics, similar to the 4 metrics above, with prefixes
emitter/successfulSending/
andemitter/failedSending/
. Which mean the successful and failed batch sending over HTTP to the emitter endpoint, respectively.The metrics of
emitter/successfulSending/
could help to determine the good value forminHttpTimeoutMillis
config, which is added tohttp
andparametrized
emitters.