Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #1138 Replacement of sliding time window. #1139

Merged
merged 13 commits into from Jun 14, 2017
Merged

Issue #1138 Replacement of sliding time window. #1139

merged 13 commits into from Jun 14, 2017

Conversation

storozhukBM
Copy link
Contributor

Hi metrics team,

This PR provides drop-in replacement of SlidingTimeWindowReservoir called SlidingTimeWindowArrayReservoir. This implementation based uppon ChunkedAssociativeLongArray structure which stores all values in long arrays that are allocated and freed by big chunks. This way of storing reservoir values gives us drastically lower memory overhead over ConcurrentSkipListMap used by current implementation.

There is also new metrics-jcstress module intoduced within this PR. I've used OpenJDK Java Concurrency Stress tests to prove thread safety of new reservoir.

As for benchmarks:

  • On already existing benchmark called ReservoirBenchmark new implementation has 4.2x lower latency than old one. And most importantly SlidingTimeWindowArrayReservoir spends only 271 [ms] in GC during benchmark while previous implementation spends 13428 [ms]. With all this changes SlidingTimeWindowArrayReservoir now has performance comparable with ExponentiallyDecayingReservoir.

  • I also created new benchmark called SlidingTimeWindowReservoirsBenchmark with different scenario for this reservoirs. New benchmark has 3 threads that update target reservoir and 1 thread that makes snapshots. In such scenario new implementation has 2.5x faster writes, 3.5x faster snapshots and spends 40x less time in GC.

Small (1 hour long) load tests for two implementations. Load test also has 3 writers and 1 reader. Frequency of writes changes by periodic function and periods become shorter with time.

Results bellow:
1h_comparison

If you're OK with this changes we've basically two ways to merge this PR. We can leave this two separate reservoirs in library and give our users possibility to choose one. Or we can completely replace old implementation with new one.

newTick = getTick();
boolean longOverflow = newTick < lastTick;
if (longOverflow) {
measurements.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous implementation was able to handle nanotime overflow without loosing anything.

if (currentChunk.tailChunk == null) {
break;
}
currentChunk = currentChunk.tailChunk;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading mutable fields without synchronization. trim and put do mutation of:

  • cursor
  • startIndex
  • tailChunk
    So result can be totaly wrong, even negative.

@vladimir-bukhtoyarov
Copy link
Contributor

vladimir-bukhtoyarov commented Jun 13, 2017

General comment:
ChunkedAssociativeLongArray has only one properly synchronized method - put. The rest of methods: traverse, values, out, trim, clear and others are not properly synchronized, because do access to mutable fields of class Chunk without any synchronization.

@storozhukBM
Copy link
Contributor Author

Hi @vladimir-bukhtoyarov,
Thank you for review. I understand all concerns that you have but that is only after first look.
All operations in ChunkedAssociativeLongArray are designed to opreate without additional synchronization and recover from possible partial reads.
There is JCStress tests created to reproduce this concurrent writes and reads and prove that SlidingTimeWindowArrayReservoir behaves consistently.
So if you will be able to reproduce any situation that leads to data loss or run time errors it would be really helpfull, but I can assure you that I've done my best to prevent any such case.

As for long overflows:
Yes that is definitelly disadvantage of ChunkedAssociativeLongArray structure because we can't resort all values inside to recover from overflow, so we basically should clear all values from reservoir. But I have a compromise solution where we can delay such overflow by 416 days from reservoir creation.
I changed comparisonResultsTest to reproduce overflow situation and then made overflow delay change to fix this test.

@vladimir-bukhtoyarov
Copy link
Contributor

Hi Bogdan,

You wrote that ChunkedAssociativeLongArray are designed to operate without additional synchronization. But after detailed look, I did not find any design solution in this class, which help to eliminate negative effect of evil dataraces which presented in ChunkedAssociativeLongArray multiple times. Nothing to prevent to concurrent mutation of same chunk, and concurrent reading and writing of same chunk, so I continue to blame ChunkedAssociativeLongArray as not properly synchronized code which eventually will crash monitoring. Just run following benchmark to observe how dataraces are evil:

package com.codahale.metrics.benchmarks;

import com.codahale.metrics.SlidingTimeWindowArrayReservoir;
import com.codahale.metrics.Snapshot;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;


@State(Scope.Benchmark)
public class SlidingTimeWindowReservoirsDataRaceBenchmark {

    private final SlidingTimeWindowArrayReservoir reservoir = new SlidingTimeWindowArrayReservoir(200, TimeUnit.NANOSECONDS);
    private AtomicLong sequence = new AtomicLong(1);

    @Benchmark
    @Group("contention")
    @GroupThreads(2)
    public void update() {
        reservoir.update(sequence.incrementAndGet());
    }

    @Benchmark
    @Group("contention")
    @GroupThreads(1)
    public int getSize() {
        int size = reservoir.size();
        if (size < 0) {
            throw new IllegalStateException("size must not be negative");
        }
        return size;
    }

    @Benchmark
    @Group("contention")
    @GroupThreads(1)
    public Snapshot getSnapshot() {
        Snapshot snapshot = reservoir.getSnapshot();
        if (snapshot.size() != 0 && snapshot.getMin() == 0) {
            throw new IllegalStateException("Zero was never recorded and should not present in snapshot");
        }
        return snapshot;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(".*" + SlidingTimeWindowReservoirsDataRaceBenchmark.class.getSimpleName() + ".*")
            .threads(4)
            .warmupIterations(10)
            .measurementIterations(10)
            .addProfiler(GCProfiler.class)
            .measurementTime(TimeValue.seconds(10))
            .timeUnit(TimeUnit.MICROSECONDS)
            .mode(Mode.AverageTime)
            .forks(1)
            .build();

        new Runner(opt).run();
    }
}

Benchmark crashes in few milliseconds.

# VM invoker: C:\Program Files\Java\jdk1.8.0_101\jre\bin\java.exe
# VM options: -Didea.launcher.port=7534 -Didea.launcher.bin.path=D:\dev\bin\gui-tools\IDEA\IntelliJ IDEA 2016.3\bin -Dfile.encoding=UTF-8
# Warmup: 10 iterations, 1 s each
# Measurement: 10 iterations, 10 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.codahale.metrics.benchmarks.SlidingTimeWindowReservoirsDataRaceBenchmark.contention

# Run progress: 0,00% complete, ETA 00:01:50
# Fork: 1 of 1
# Warmup Iteration   1: 0,619 ±(99.9%) 1,188 us/op
# Warmup Iteration   2: <failure>

java.lang.NegativeArraySizeException
	at com.codahale.metrics.ChunkedAssociativeLongArray.values(ChunkedAssociativeLongArray.java:76)
	at com.codahale.metrics.SlidingTimeWindowArrayReservoir.getSnapshot(SlidingTimeWindowArrayReservoir.java:73)
	at com.codahale.metrics.benchmarks.SlidingTimeWindowReservoirsDataRaceBenchmark.getSnapshot(SlidingTimeWindowReservoirsDataRaceBenchmark.java:47)
	at com.codahale.metrics.benchmarks.generated.SlidingTimeWindowReservoirsDataRaceBenchmark_contention.getSnapshot_avgt_jmhStub(SlidingTimeWindowReservoirsDataRaceBenchmark_contention.java:371)
	at com.codahale.metrics.benchmarks.generated.SlidingTimeWindowReservoirsDataRaceBenchmark_contention.contention_AverageTime(SlidingTimeWindowReservoirsDataRaceBenchmark_contention.java:282)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.openjdk.jmh.runner.LoopBenchmarkHandler$BenchmarkTask.call(LoopBenchmarkHandler.java:210)
	at org.openjdk.jmh.runner.LoopBenchmarkHandler$BenchmarkTask.call(LoopBenchmarkHandler.java:192)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

# Run complete. Total time: 00:00:02

According to jsstress - it is addional tool which can be helpful sometimes, but primary tool of any java programmer who creates concurrent code should be his brain and formal loggic described in JMM specification. If jcstress does not show errors, but code is not well formed synchronized according to JMM rules, that final decision about code correctnes should be taken according to JMM, violations of formal rules are quit enough to blame any code.

And general note about dataraces:
In very few very limited cases, dataraces are acceptable, for example hashcode method of java.lang.String, But each place where datarace exists, should be well documented by tons of comments which describe why each particular datarace does not damage your code, especially in opensource, else nobody can mantain library in future.

Sorry if I'm not polite.

@storozhukBM
Copy link
Contributor Author

storozhukBM commented Jun 14, 2017

Hi Vladimir,
Thank you one more time for review and exhaustive comment.
I must admin that it was stupid to try to hack the system 😶.
My latest PR makes everything simple and straightforward.

Of course this change brings some different numbers but they are still impressive and GC pauses are even lower (ReentrantLock creates synchronization helper objects).
In ReservoirBenchmark: updates are 2.7x faster and 80x less time spent in GC.
In SlidingTimeWindowReservoirsBenchmark: updates are 2.4x faster, reads - 5.6x faster, 57x less time spent in GC.

@vladimir-bukhtoyarov
Copy link
Contributor

Now, It looks correct!

@storozhukBM
Copy link
Contributor Author

OK, so should we replace the existing SlidingTimeWindowReservoir?

@vladimir-bukhtoyarov
Copy link
Contributor

@arteam please take a look to this request.

@arteam
Copy link
Member

arteam commented Jun 14, 2017

I will review tonight. Thank you!

@arteam
Copy link
Member

arteam commented Jun 14, 2017

OK, so should we replace the existing SlidingTimeWindowReservoir?

I reviewed and the implementation looks good (or I can't spot obvious concurrency bugs). Let's replace SlidingTimeWindowReservoir and I will merge the pull request!

@ryantenney
Copy link
Contributor

I would have suggested adding it as part of 3.2 and letting users opt into it, and then replace the other one in 3.3 or 4.

@arteam
Copy link
Member

arteam commented Jun 14, 2017

On second thought, that's probably the best option from the point of reducing the risk of production bugs for users of 3.2. The users who will want to test this reservoir, can test use it in 3.2.3. If everything goes right, then we will make it the default one.

@storozhukBM
Copy link
Contributor Author

storozhukBM commented Jun 14, 2017

I reviewed and the implementation looks good (or I can't spot obvious concurrency bugs).

Don't worry, everything should be OK now. 😄

@arteam arteam merged commit 0e7718b into dropwizard:3.2-development Jun 14, 2017
@arteam
Copy link
Member

arteam commented Jun 14, 2017

Thank you very much for the contribution!

@storozhukBM
Copy link
Contributor Author

🎉
Thank you too, guys. Do you have any plans for release date?

@arteam
Copy link
Member

arteam commented Jun 14, 2017

I thought about making a release on Monday. There is a couple of useful contributions in the 3.2-development branch, which would be nice to release.

@jplock jplock added this to the 3.2.3 milestone Jun 14, 2017
@arteam
Copy link
Member

arteam commented Jun 20, 2017

Sorry for the delay, I am busy this week. Will try to make code reviews and cut a release this week.

bulldozer-bot bot pushed a commit to palantir/tritium that referenced this pull request Apr 23, 2019
dmichel1 added a commit to spotify/heroic that referenced this pull request Apr 27, 2020
Rather than using a custom reservoir in a few places use the new SlidingTimeWindowArrayReservoir everywhere.

This will ensure all histograms values decay in a timely fashion with low traffic and also accurately capture min and max values.

https://medium.com/expedia-group-tech/your-latency-metrics-could-be-misleading-you-how-hdrhistogram-can-help-9d545b598374 goes more into the details about what is wrong with the current histogram implantation.

dropwizard/metrics#1139 and https://metrics.dropwizard.io/4.0.0/manual/core.html#sliding-time-window-reservoirs
dmichel1 added a commit to spotify/heroic that referenced this pull request Apr 27, 2020
Rather than using a custom reservoir in a few places use the new SlidingTimeWindowArrayReservoir everywhere.

This will ensure all histograms values decay in a timely fashion with low traffic and also accurately capture min and max values.

https://medium.com/expedia-group-tech/your-latency-metrics-could-be-misleading-you-how-hdrhistogram-can-help-9d545b598374 goes more into the details about what is wrong with the current histogram implantation.

dropwizard/metrics#1139 and https://metrics.dropwizard.io/4.0.0/manual/core.html#sliding-time-window-reservoirs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants