This repository has been archived by the owner on Dec 23, 2023. It is now read-only.

Clock Skew(?) can cause the Disruptor thread to crash #2068

Closed
steveniemitz opened this issue Dec 10, 2020 · 0 comments · Fixed by #2071

Please answer these questions before submitting a bug report.

What version of OpenCensus are you using?

0.24.0, but looks to be present in all versions

What JVM are you using (java -version)?

1.8.0

Occasionally, at process start, we'll see this error in our logs:

Exception in thread "OpenCensus.Disruptor-0" java.lang.RuntimeException: java.lang.IllegalArgumentException: Current time must be within or after the last bucket.
	at com.lmax.disruptor.FatalExceptionHandler.handleEventException(FatalExceptionHandler.java:45)
	at com.lmax.disruptor.dsl.ExceptionHandlerWrapper.handleEventException(ExceptionHandlerWrapper.java:18)
	at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:187)
	at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Current time must be within or after the last bucket.
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
	at io.opencensus.implcore.stats.MutableViewData$IntervalMutableViewData.refreshBucketList(MutableViewData.java:308)
	at io.opencensus.implcore.stats.MutableViewData$IntervalMutableViewData.record(MutableViewData.java:262)
	at io.opencensus.implcore.stats.MeasureToViewMap.record(MeasureToViewMap.java:160)
	at io.opencensus.implcore.stats.StatsManager$StatsEvent.process(StatsManager.java:101)
	at io.opencensus.impl.internal.DisruptorEventQueue$DisruptorEventHandler.onEvent(DisruptorEventQueue.java:229)
	at io.opencensus.impl.internal.DisruptorEventQueue$DisruptorEventHandler.onEvent(DisruptorEventQueue.java:222)
	at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168)
	... 2 more

If this happens, all publisher threads will eventually hang forever, because the LMAX Disruptor queue no longer has a consumer. The cause appears to be clock skew that makes a measurement's timestamp fall before the start of the last bucket, and there is even a TODO in the code to handle this:
https://github.com/census-instrumentation/opencensus-java/blob/master/impl_core/src/main/java/io/opencensus/implcore/stats/MutableViewData.java#L307
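
Independently of the skew itself, the hang comes from one bad event killing the only consumer thread. A minimal sketch of a guarded consumer is below. It uses only the JDK, and `GuardedEventHandler` and its methods are hypothetical names for illustration, not OpenCensus or Disruptor classes. With the real Disruptor, a similar effect could presumably be achieved by installing a non-fatal `ExceptionHandler` in place of the `FatalExceptionHandler` seen in the trace.

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical consumer-side guard: a failing event is counted and skipped
 * instead of propagating and terminating the consumer thread.
 */
final class GuardedEventHandler {
    private final AtomicLong failures = new AtomicLong();

    /** Runs one event; returns false if it threw a RuntimeException. */
    boolean onEvent(Runnable event) {
        try {
            event.run();
            return true;
        } catch (RuntimeException e) {
            // A real handler would log the exception here; this sketch only counts it.
            failures.incrementAndGet();
            return false;
        }
    }

    /** Drains the queue and returns how many events completed without throwing. */
    int drain(Queue<Runnable> queue) {
        int succeeded = 0;
        Runnable event;
        while ((event = queue.poll()) != null) {
            if (onEvent(event)) {
                succeeded++;
            }
        }
        return succeeded;
    }

    long failureCount() {
        return failures.get();
    }
}
```

The trade-off is that failures become silent unless they are logged or exported as a metric, so the counter (or a log line) matters in practice.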

I'm not sure what the correct behavior should be when the measurement timestamp is before the bucket start, though. Possibly drop the event?
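
To make the "drop the event" option concrete, here is a rough sketch of what it could look like. `SkewTolerantBuckets` is a hypothetical stand-in, not the actual `IntervalMutableViewData` code, and the fixed-width bucket layout is an assumption. The only point is that a timestamp earlier than the last bucket's start is counted and ignored rather than failing a precondition.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical interval bucket list that drops out-of-order measurements. */
final class SkewTolerantBuckets {
    /** One fixed-width time bucket holding a running sum and count. */
    private static final class Bucket {
        final long startMillis;
        double sum;
        long count;

        Bucket(long startMillis) {
            this.startMillis = startMillis;
        }
    }

    private final long bucketWidthMillis;
    private final int maxBuckets;
    private final Deque<Bucket> buckets = new ArrayDeque<>();
    private long dropped;

    SkewTolerantBuckets(long startMillis, long bucketWidthMillis, int maxBuckets) {
        if (bucketWidthMillis <= 0 || maxBuckets <= 0) {
            throw new IllegalArgumentException("width and maxBuckets must be positive");
        }
        this.bucketWidthMillis = bucketWidthMillis;
        this.maxBuckets = maxBuckets;
        buckets.addLast(new Bucket(startMillis));
    }

    /** Records a value; returns false if the timestamp precedes the last bucket. */
    boolean record(long nowMillis, double value) {
        Bucket last = buckets.peekLast();
        if (nowMillis < last.startMillis) {
            // Clock went backwards relative to the last bucket: drop, do not throw.
            dropped++;
            return false;
        }
        // Append buckets until nowMillis falls inside the last one (fine for
        // small gaps; a real implementation would skip ahead on large gaps).
        while (nowMillis - last.startMillis >= bucketWidthMillis) {
            last = new Bucket(last.startMillis + bucketWidthMillis);
            buckets.addLast(last);
            if (buckets.size() > maxBuckets) {
                buckets.removeFirst();
            }
        }
        last.sum += value;
        last.count++;
        return true;
    }

    long droppedCount() { return dropped; }

    int bucketCount() { return buckets.size(); }

    double lastBucketSum() { return buckets.peekLast().sum; }
}
```

Dropping loses a small amount of data during the skew window. An alternative would be to attribute the measurement to the last bucket anyway, which keeps the count but slightly misplaces it in time.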
