
Backport rescale bug #1046

Merged

merged 2 commits into 3.1-maintenance from backport_rescale_bug on Feb 8, 2017

3 participants
arteam (Member) commented Jan 23, 2017

Backport of #1033 to the 3.1.* branch.

@arteam arteam added the bug label Jan 23, 2017

@arteam arteam added this to the 3.1.3 milestone Jan 23, 2017

ryantenney (Member) commented Jan 23, 2017

I'd like to look into this further before we backport it.

arteam (Member) commented Jan 23, 2017

Thanks, ping me when you have more thoughts on the matter.

ryantenney (Member) commented Jan 23, 2017

While it does seem to correct an issue, I have concerns about it blocking all threads on the rescale once every hour.


ho3rexqj (Contributor) commented Jan 28, 2017

In both the original and updated implementations, while a thread is performing the rescale operation (within the write lock), all concurrent updates block until the rescale operation is complete. There are behavioral differences, but nothing fundamental changes with respect to the impact on other concurrent updates while a rescale is in progress - could you clarify your concern there?

Note that the race condition the patch fixes allowed updates to sometimes be added to the reservoir outside of the rescale window (that is, without a preceding rescale operation). The fix merely enforces that requirement: updates outside of the rescale window must wait for the rescale operation to complete before updating the reservoir's state. Other implementations could accomplish the same thing - the write-lock barrier is more than is strictly required here:

One approach would be to add a second atomic to rescale (or similar) to guard the call to lockForRescale:

// rescaleGuard is a second atomic; the first thread to claim this rescale
// window wins and performs the rescale under the write lock
if (rescaleGuard.getAndSet(next) != next) {
    lockForRescale();
    try {
        if (nextScaleTime.compareAndSet(next, now + RESCALE_THRESHOLD)) {
            // ... perform the actual rescale ...
        }
    } finally {
        unlockForRescale();
    }
} else {
    // losing threads wait until the winner bumps nextScaleTime;
    // a spin lock is used in this illustration - ideally something else
    // would be used in practice
    while (nextScaleTime.get() == next)
        ; // busy-wait until the rescale completes
}

Alternatively, rather than enforcing the above requirement, it could be relaxed and the reservoir refactored accordingly. For example, a cache could be added to the reservoir to collect updates that arrive in update outside of the rescale window (value/timestamp pairs) - those would then need to be processed as part of the rescale operation, and possibly as part of getSnapshot (although that could likely be omitted).

Note, however, both of the above would add significant complexity to the reservoir. I'm not certain it's at all worthwhile given that the window during which multiple concurrent calls to update will block on lockForRescale should either be small (two concurrent calls to update) or involve three or more competing threads (one thread executing getSnapshot and two or more executing update concurrently).
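For contrast with the guard-based snippet above, the write-lock approach the discussion centers on can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual Metrics source: the class name, the rescaleCount counter, and the second-based time values are invented for the demo; only the lock/CAS shape mirrors the pattern under discussion.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RescaleSketch {
    static final long RESCALE_THRESHOLD = 3600; // e.g. one hour, in seconds

    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final AtomicLong nextScaleTime = new AtomicLong(RESCALE_THRESHOLD);
    long rescaleCount = 0; // illustrative: counts completed rescales

    void update(long now) {
        // Any update past the deadline first waits for (or performs) the
        // rescale, so no sample lands "outside the rescale window".
        rescaleIfNeeded(now);
        lock.readLock().lock();
        try {
            // ... add the sample to the reservoir ...
        } finally {
            lock.readLock().unlock();
        }
    }

    void rescaleIfNeeded(long now) {
        long next = nextScaleTime.get();
        if (now < next) {
            return; // still inside the current rescale window
        }
        lock.writeLock().lock(); // blocks until in-flight updates drain
        try {
            // Re-check under the lock: only the first thread through
            // performs the rescale; latecomers see the bumped deadline.
            if (nextScaleTime.compareAndSet(next, now + RESCALE_THRESHOLD)) {
                rescaleCount++; // ... rescale the reservoir's weights ...
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        RescaleSketch r = new RescaleSketch();
        r.update(100);  // inside the window: no rescale
        r.update(4000); // past the deadline: triggers exactly one rescale
        r.update(4001); // new window already in effect: no rescale
        System.out.println(r.rescaleCount); // prints 1
    }
}
```

The cost being debated is visible in rescaleIfNeeded: every thread that arrives past the deadline contends on the write lock once per window, even though only one of them ends up doing the rescale.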

ho3rexqj added some commits Dec 25, 2016

@arteam arteam merged commit b97a09c into 3.1-maintenance Feb 8, 2017

2 checks passed

continuous-integration/travis-ci/pr - The Travis CI build passed
continuous-integration/travis-ci/push - The Travis CI build passed

@arteam arteam deleted the backport_rescale_bug branch Feb 8, 2017

arteam added a commit that referenced this pull request Feb 15, 2017

Merge changes from the 3.1 branch to 3.2 (#1071)
* Support for publishing metrics of BigInteger and BigDecimal type

* Update third-party.rst

New reporter - metrics-munin-reporter

* Add links to more third party libs

* Update third-party to include metrics-circonus

* Fix spelling in core.rst

* #814 always close socket

(cherry picked from commit 2762ce5)

* Revert "Add log4j2 xml-based config support"

This reverts commit 6667014.

* Revert "Added log4j2 InstrumentedAppender xml support to docs"

This reverts commit 26c4cd2.

* Revert "Fix javadoc link"

This reverts commit 28e6556.

* Revert "Add CPU profiling support to the AdminServlet (fixes #927)"

This reverts commit bbb52c1.

* Revert "Support for publishing metrics of BigInteger and BigDecimal type"

This reverts commit 7cfc23c.

* Suppress all kinds of Exceptions raised by report() (#1049) (#1056)

* Support the JDK's ThreadLocalRandom (#1052)

Add a proxy which provides a ThreadLocalRandom implementation depending on its
availability at runtime. If the JDK provides ThreadLocalRandom, use it; otherwise,
fall back to the internal implementation.

* Support JDK's LongAdder (#1055)

* Support the JDK's LongAdder

Add the ability to use the JDK's implementation if it's available at
runtime. Use a proxy for creating LongAdders which tries to use the JDK's
implementation and falls back to the internal one if the JDK doesn't
provide any.

Unfortunately, `LongAdder`s don't have a common interface, therefore we
introduce `LongAdderAdapter` as a common interface for `LongAdder`s. It
will be used in other components as an interface, but a concrete
implementation of it will be created by `LongAdderProxy`.

This pattern allows us to still run the code on older JREs, but enjoy the
benefits of the stock `LongAdder` on modern JDKs (fewer bugs, better
performance).
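The adapter-plus-factory idea described above can be sketched as follows. This is an illustrative reconstruction, not the actual LongAdderProxy source: the class and method names outside of `LongAdderAdapter` and `java.util.concurrent.atomic.LongAdder` are invented, and reflection stands in for whatever detection mechanism the real code uses.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LongAdderSketch {
    // Common interface, since the JDK and internal LongAdders share none.
    interface LongAdderAdapter {
        void add(long x);
        long sum();
    }

    static LongAdderAdapter create() {
        try {
            // Probe for the JDK 8+ class without a compile-time dependency.
            final Class<?> cls = Class.forName("java.util.concurrent.atomic.LongAdder");
            final Object adder = cls.getDeclaredConstructor().newInstance();
            final java.lang.reflect.Method add = cls.getMethod("add", long.class);
            final java.lang.reflect.Method sum = cls.getMethod("sum");
            return new LongAdderAdapter() {
                public void add(long x) {
                    try { add.invoke(adder, x); } catch (Exception e) { throw new RuntimeException(e); }
                }
                public long sum() {
                    try { return (Long) sum.invoke(adder); } catch (Exception e) { throw new RuntimeException(e); }
                }
            };
        } catch (Throwable t) {
            // JDK 7 or lookup failure: fall back to an internal implementation
            // (a plain AtomicLong here, purely for illustration).
            final AtomicLong value = new AtomicLong();
            return new LongAdderAdapter() {
                public void add(long x) { value.addAndGet(x); }
                public long sum() { return value.get(); }
            };
        }
    }

    public static void main(String[] args) {
        LongAdderAdapter counter = create();
        counter.add(40);
        counter.add(2);
        System.out.println(counter.sum()); // prints 42
    }
}
```

Callers only ever see `LongAdderAdapter`, so the choice of backing implementation stays an internal detail of the factory.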

* Run tests only on JDK8

We need a compile-time dependency on the JDK's LongAdder, which is not
available in JDK 7.

* Retry DNS lookups (#1064)

Retry DNS lookups, which are currently done only in the constructor of
InetSocketAddress.

The reason might be that in some environments DNS is very dynamic, so a name
might sometimes be resolvable and sometimes not. As currently implemented, the
InetSocketAddress is always cached, which in turn caches the
return value of the last DNS lookup, even if that failed.
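The fix described above amounts to resolving the name again on each attempt instead of freezing one lookup in the constructor. A minimal sketch, assuming a hypothetical sender class (the name DnsRetrySketch and the host/port values are placeholders, not the actual reporter code):

```java
import java.net.InetSocketAddress;

public class DnsRetrySketch {
    private final String hostname;
    private final int port;

    DnsRetrySketch(String hostname, int port) {
        // Store only the raw host and port; do NOT resolve here, otherwise
        // a single failed (or stale) DNS lookup is cached forever.
        this.hostname = hostname;
        this.port = port;
    }

    InetSocketAddress currentAddress() {
        // Constructing a fresh InetSocketAddress triggers a new DNS lookup;
        // if the name cannot be resolved right now, isUnresolved() reports
        // it and the next report cycle simply retries.
        return new InetSocketAddress(hostname, port);
    }

    public static void main(String[] args) {
        DnsRetrySketch sender = new DnsRetrySketch("localhost", 2003);
        InetSocketAddress addr = sender.currentAddress();
        System.out.println(addr.getPort()); // prints 2003
    }
}
```

The design choice is simply to move resolution from construction time to use time, so transient DNS failures cost one report instead of poisoning the sender permanently.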

* Backport rescale bug (#1046)

* Added ExponentiallyDecayingReservoir test illustrating the rescale race condition.

(cherry picked from commit d9af3c1)

* Fixed the race condition in ExponentiallyDecayingReservoir's rescale method.

(cherry picked from commit b022f04)