-
Notifications
You must be signed in to change notification settings - Fork 201
Adds Tracing.getExportComponent().flushAndShutdown() for use within application shutdown hooks. #1141
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1141 +/- ##
============================================
- Coverage 82.1% 81.87% -0.24%
- Complexity 1228 1229 +1
============================================
Files 192 192
Lines 5941 5975 +34
Branches 551 553 +2
============================================
+ Hits 4878 4892 +14
- Misses 916 935 +19
- Partials 147 148 +1
Continue to review full report at Codecov.
|
/cc @bogdandrutu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this feature! I have a few comments, mainly about the behavior of the Disruptor. I also added some suggestions on testing the two main parts of this change.
@@ -66,6 +66,8 @@ public static ExportComponent newNoopExportComponent() { | |||
*/ | |||
public abstract SampledSpanStore getSampledSpanStore(); | |||
|
|||
public abstract void flushAndShutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a Javadoc with a @since
tag? Also, I would expect a shutdown combined with a flush to be a more common use case than an immediate shutdown, so I think this method could just be named "shutdown". That name is also consistent with ExecutorService.shutdown, which finishes executing the existing tasks.
@@ -66,6 +66,8 @@ public static ExportComponent newNoopExportComponent() { | |||
*/ | |||
public abstract SampledSpanStore getSampledSpanStore(); | |||
|
|||
public abstract void flushAndShutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should be concrete, for backwards-compatibility. The same applies to all of the new methods on public non-final classes in the api
directory.
* Shuts down the underlying disruptor. | ||
* | ||
* <p>Unfortunately there is no underlying public flush mechanism, without it there is a race | ||
* condition in the ring buffer where it can hold events into the jvm shutdown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately there is no underlying public flush mechanism, without it there is a race
condition in the ring buffer where it can hold events into the jvm shutdown.
I'm not sure I understand this comment. Is this an explanation for why DisruptorEventQueue can't support flushing without also shutting down the disruptor, or does this implementation contain a race condition? If there is a race condition, what effect could it have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the comment. It was lamenting the lack of a flush vs a shutdown, unclearly.
@@ -132,6 +132,13 @@ static SampledSpanStore newNoopSampledSpanStore() { | |||
@PublicForTesting | |||
public abstract Set<String> getRegisteredSpanNamesForCollection(); | |||
|
|||
/** | |||
* This forces any underlying event queue to flush any pending events and shutdown and handlers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say "any handlers"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the underlying implementation is simply shutting down the event queue. there are no handlers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand the part of the Javadoc that mentions handlers. Should that part be removed?
@Override | ||
public void flushAndShutdown() { | ||
sampledSpanStore.shutdown(); | ||
spanExporter.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this method also shut down the SpanExporter thread?
*/ | ||
@Override | ||
public void shutdown() { | ||
disruptor.shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for Disruptor.shutdown says "It is critical that publishing to the ring buffer has stopped before calling this method, otherwise it may never return." Should this method stop the DisruptorEventQueue from accepting more Entries before calling Disruptor.shutdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, but this would require a check on every publish/enqueue call from what I can tell, unless we swap something out with a Noop impl.
but this is a shutdown, its intended to be run during the jvm shutdown, so new spans should stop at some point naturally. thus this continuing to accept spans until there are no more might be considered a virtue.
if the jvm never shuts down, there is clearly an issue elsewhere (the thing accepting requests that spawns spans is not itself shutdown).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that I don't think that this class should continue to enqueue Entries on the Disruptor after the Disruptor has been shut down. That could cause the thread that recorded the instrumentation data to block forever. I think it would be better to discard the data that cannot be processed, and log an error, to avoid interfering with the application.
I'm not sure how we could make this class continue to accept data after shutdown without a method supporting similar behavior on the Disruptor.
/cc @bogdandrutu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is only danger if users misuse the shutdown()
call. that is, try to shut down while the application is running, not when the application is shutting down.
the two dangers are the shutdown()
call not returning if spans are added at a frequency greater than the cycle frequency, preventing the hasBacklog()
call from ever returning false.
and the enqueue()
call not discarding entries causing, I presume, a memory/resource leak. I haven’t tested this theory.
as I see it, we document this behavior. or within DisruptorEventQueue#enqueue()
we wrap the enqueue logic in an anonymous class, and swap it out for a Noop version at shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the number of entries is limited by the buffer size, but I want to avoid having the call to enqueue
not return when the buffer is full. I prefer your suggestion to swap the disruptor for a no-op version. The no-op enqueue
method could also log a warning the first time it is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took a shot at both.
@@ -188,5 +193,18 @@ public void run() { | |||
} | |||
} | |||
} | |||
|
|||
void flush() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could test this method by creating a SpanExporterImpl with a very long scheduleDelay
, adding a few spans (less than the buffer size), calling flush(), and then asserting that the spans were exported, as in exportNotSampledSpans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i left this out for now. looks like there are three options
- copy the test class, and change the delay (removing the other tests)
- add a method on SpanExporter to change the delay via a test visible method, change the delay in the new test
- refactor the test class so the setup is in setup() and subclass to change the delay and other params
i’m sure you have something better in mind, so let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the new test could probably go in the existing test class, but the initialization code would need to be refactored to allow the scheduleDelay
to be customized for each test. I think you could refactor it with these steps:
- Move
spanExporter
andstartEndHandler
into the tests that need them, as local variables. - Move the call to
spanExporter.registerHandler
into the tests that need the SpanExporter. - Make
createSampledEndedSpan
andcreateNotSampledEndedSpan
each take theStartEndHandler
as an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any comments on how I should implement the test?
* condition in the ring buffer where it can hold events into the jvm shutdown. | ||
*/ | ||
@Override | ||
public void shutdown() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems difficult to test, because the Disruptor is a singleton. I'm fine with adding a TODO to test it after we refactor the class to avoid the singleton.
* | ||
* @since 0.13 | ||
*/ | ||
public abstract void shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not so sure if we should expose this API in RunningSpanStore
.
For me, adding a shutdown()
in ExportComponent
is fine - that could be the entry for users and it hides all the details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would a shutdown method do in RunningSpanStore? I don't think it has a separate thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make it protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebright sorry, I meant SampledSpanStore
in the first place. But for RunningSpanStore
, a shutdown
does make sense. For a RunningSpanStore
, one might want to close, set proper status and sample option and export some important running spans before the program exit unexpectedly - I made that up :)
Back to the topic, what I originally meant was that we should probably only expose ExportComponent.shutdown()
. Other subsequent shutdown
methods should be hide from users (by making them protected, or mark as Internal
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I like that idea. Users probably wouldn't need to call shutdown on only one part of the TraceComponent. I think that the new methods on SampledSpanStore and SpanExporter are only called from the implementation anyway, so they could be completely removed from the superclasses under api/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i’ll wait on guidance after you have review the new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to avoid exposing new APIs until they are needed. Did you only add the methods so that they could be called by ExportComponent.shutdown
, or is there a use case for only calling shutdown
or flush
on a subset of the components?
pushed changes per some of the comments. still needs a test, see above. unsure if we should still hide the shutdown call chain. waiting on guidance. |
ok, added the test |
} | ||
|
||
@Test | ||
public void exportNotSampledSpansFlushed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test could use a JUnit explicit timeout: https://github.com/junit-team/junit4/wiki/timeout-for-tests#timeout-parameter-on-test-annotation-applies-to-test-method
@Test | ||
public void exportNotSampledSpansFlushed() { | ||
// Set the export delay to a very long value in order to confirm the #flush() below works | ||
SpanExporterImpl spanExporter = SpanExporterImpl.create(4, Duration.create(Long.MAX_VALUE, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Duration.create
has a limit on the number of seconds, and it uses zero when the value is greater than 315576000000L:
opencensus-java/api/src/main/java/io/opencensus/common/Duration.java
Lines 55 to 56 in 773c763
if (seconds < -MAX_SECONDS || seconds > MAX_SECONDS) { | |
return ZERO; |
I created an issue to improve the API: #1179
spanExporter.registerHandler("test.service", serviceHandler); | ||
|
||
SpanImpl span1 = createNotSampledEndedSpan(startEndHandler, SPAN_NAME_1); | ||
SpanImpl span2 = createSampledEndedSpan(startEndHandler, SPAN_NAME_2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test could be simplified by removing the non-sampled span, since that was tested in exportNotSampledSpans
.
cleaned up test. if looks good, I can squash and rebase to resolve any conflicts. |
The test looks good. I don't mind if you rebase. I think that there are only two remaining review comments, related to removing the public methods on SpanExporter and SampledSpanStore (#1141 (comment)), and preventing threads from blocking when enqueueing events after Disruptor shutdown (#1141 (comment)). |
lowered the visibility on the shutdown methods. and see comments above #1141 (comment) |
pushed a noop handler for the enqueuing. let me know if I missed anything else. |
…or javadoc changes
rebased, didn’t squash. went w/ AtomicBoolean, since it isn’t initialized unless |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I think the only comment left is the one about making the DisruptorEnqueuer volatile.
CHANGELOG.md
Outdated
- Map http attributes to Stackdriver format (fix [#1153](https://github.com/census-instrumentation/opencensus-java/issues/1153)). | ||
|
||
## 0.13.1 - 2018-05-02 | ||
- Fix a typo on displaying Aggregation Type for a View on StatsZ page. | ||
- Set bucket bounds as "le" labels for Prometheus Stats exporter. | ||
|
||
## 0.13.0 - 2018-04-27 | ||
======= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this left over from the merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
CHANGELOG.md
Outdated
@@ -1,13 +1,15 @@ | |||
## Unreleased | |||
|
|||
## 0.13.2 - 2018-05-08 | |||
- Adds Tracing.getExportComponent().shutdown() for use within application shutdown hooks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.13.2 was already released, so this should go under unreleased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, sorry, wasn’t paying attention. fixed.
Thanks! I'll merge it when the build passes. Do you mind if I squash it? |
please do squash it. i didn’t for fear of losing track of all the pending change comments. |
I think the nullness checker error is a false positive. You could fix the build by adding |
ok, if i change i’m using both the intellij google-java-format plugin and google java format style, which have slightly different results, but neither fixes the above issue. i’m unsure how to make verGJF more informative on the failure. |
I'm surprised that I tried running diff --git a/impl/src/main/java/io/opencensus/impl/internal/DisruptorEventQueue.java b/impl/src/main/java/io/opencensus/impl/internal/DisruptorEventQueue.java
index 95c5d14d1..5145ca3b7 100644
--- a/impl/src/main/java/io/opencensus/impl/internal/DisruptorEventQueue.java
+++ b/impl/src/main/java/io/opencensus/impl/internal/DisruptorEventQueue.java
@@ -157,13 +157,11 @@ public final class DisruptorEventQueue implements EventQueue {
@Override
public void enqueue(Entry entry) {
enqueuer.enqueue(entry);
}
- /**
- * Shuts down the underlying disruptor.
- */
+ /** Shuts down the underlying disruptor. */
@Override
public void shutdown() {
enqueuer =
new DisruptorEnqueuer() {
final AtomicBoolean logged = new AtomicBoolean(false);
@@ -188,12 +186,11 @@ public final class DisruptorEventQueue implements EventQueue {
// An event in the {@link EventQueue}. Just holds a reference to an EventQueue.Entry.
private static final class DisruptorEvent {
// TODO(bdrutu): Investigate if volatile is needed. This object is shared between threads so
// intuitively this variable must be volatile.
- @Nullable
- private volatile Entry entry = null;
+ @Nullable private volatile Entry entry = null;
// Sets the EventQueueEntry associated with this DisruptorEvent.
void setEntry(@Nullable Entry entry) {
this.entry = entry;
} |
ah, goGJF, new to me. will try that. |
Thanks! |
This allows a developer to force a flush from within a shutdown hook or other means.
Unfortunately the underlying Disruptor instance only provides a #shutdown() call, not a flush, or a public method for testing for backlog. Thus shutdown has propagated up to the above api call.
I did not add a test case, happy to discuss how this would be implemented reliably.