New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LOG4J2-928] Mostly wait-free MemoryMappedFileManager #87
[LOG4J2-928] Mostly wait-free MemoryMappedFileManager #87
Conversation
…d MemoryMappedFileManager.flush()
…ileManager.Region.tryLock() wait-free
…fNeeded() after the unlock
…ionWhileLocked(), change to wait-free update
52e3588
to
b193120
Compare
boolean okToInvokeMHCaller; | ||
|
||
try { | ||
Class.forName("java.lang.invoke.MethodHandle"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things here:
- We have a LoaderUtil class for loading classes which tends to work a bit better than using
Class.forName
. - We're on Java 7 minimum, so this class is for sure available. No need for reflection here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jvz thanks for note, removed ThreadHintsMH
Roman, I started to review your patch. Initial feedback below (still reading):
... still reading, also looking forward to tests & perf results. On the topic of perf testing: I'd be interested to see a throughput comparison of the new MemoryMappedFileAppender vs. RandomAccessFileAppender and FileAppender with a varying number of logging threads (1, 2, 4, 8, 16, 32, 64 if possible). A throughput comparison with Async Loggers would also be interesting but be aware that JMH quickly fills up the disruptor ringbuffer. To solve this, configure with a non-existing appender (so the background thread essentially logs to /dev/null). See /log4j-perf/src/main/java/org/apache/logging/log4j/perf/jmh/AsyncLoggersBenchmark.java Some indication of latency behaviour would also be interesting. |
Roman, one more thing: since this is a non-trivial patch, would you mind submitting an ICLA? |
…r() and add MemoryMappedFileManagerConcurrentWriteTest
I follow the approach of java.util.concurrent internals here, where
I don't want to expose them, because these classes are used just inside
It will go with the locking path, rather than wait-free. |
@remkop note that this PR is primarily about latency, not throughput. I will do some testing, but if the throughput degrades (that I expect in the single-thread case), and latency improves, I will consider this a success. |
…ble), make logger type a parameter, rename to Log4j2AppenderThroughputBenchmark, add Log4j2AppenderLatencyBenchmark
Latency comparison
On my MBP with 4 cores and SSD. Tail latency is much better than it used to be, but worse than I expected. Calling
|
Throughput comparisonFirst number is the number of threads, "150" is the message size.
With this PR
On the same MacBook with 4 cores. You can see that with 1 or 2 threads older version shows higher thoughput, but it's the limit of it's scalability, and with 3 or 4 threads the wait-free version takes over. |
I'll try to run those benchmarks on an AWS instance next week, though that's not an ideal place to benchmark due to noisy neighbours and that sort of thing. |
I tried to run:
but got this result:
|
@mikaelstaldal it is configured for at most 1 second latency. Maybe on your machine some pause takes more. BTW jvm crash is not a mystery anymore, it was because of reads after buffer unmapping. |
Then I suggest that you raise the maximum latency, since it can obviously happen. And maybe this long latency on my machine is a flaw in the implementation? I consistently get this issue on my 64 bit Linux machine. With both Java 7 and Java 8. I have a mechanical HDD. |
BTW, the test fails on Java 9:
|
Roman, thanks for the feedback and the benchmarks.
About testing latency: Please take a look at /log4j-core/src/test/java/org/apache/logging/log4j/core/async/perftest/ResponseTimeTest.java ResponseTimeTest incorporates various lessons I learned from Gil Tene's presentations on measuring latency as well as discussions on the Mechanical Sympathy mailing list. (The inner Pacer class is Gil's.) It distinguishes between response time and service time (see the sidebar on the Log4j perf page), and it allows us to control the load. This is important because latency behaviour is often different under various workloads. I am not a fan of using JMH for measuring latency because it only measures service time, not response time, and there is no way to control the workload, it just runs at max speed. (Gil explicitly mentioned we probably don't want to measure latency under maximum load, when the system is under pressure and latency becomes unpredictable). Also, it looks like Log4j2AppenderLatencyBenchmark also measures how long it takes to generate a random number, which is probably not the intention. The throughput numbers look very promising, I will try to run these tests on enterprise hardware this week when I have a chance, and report back the results. All in all, this is great stuff! Keep 'em coming! |
P.S. Roman, are you okay with submitting an ICLA? I am still reviewing the patch but we need the ICLA before it can be merged. If you put "Apache Logging" in the form field "notify project", the secretary will let us know when your ICLA is on file. |
As promised, here are the throughput results on enterprise hardware. Hardware: total 64 cores (4x 16-core) Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz Results look good: as soon as we introduce concurrency, the lock-free MMap appender starts outperforming the other file appenders. 1 thread
4 threads
8 threads
16 threads
32 threads
64 threads
|
Thank you for running the benchmarks. Is this worth doing over on the
_current_ Java 8 runtime (_131)?
Gary
…On Wed, Jul 5, 2017 at 6:59 AM, Remko Popma ***@***.***> wrote:
As promised, here are the throughput results on enterprise hardware.
Hardware: total 64 cores (4x 16-core) Intel(R) Xeon(R) CPU E5-2697A v4 @
2.60GHz
Java Hotspot 64-bit server 1.8.0_112
Results look good: as soon as we introduce concurrency, the lock-free MMap
appender starts outperforming the other file appenders.
*1 thread*
{code}
-sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10
Benchmark (loggerType) (messageSizeBytes) Mode Samples Score Error Units
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender File 150 thrpt 10 1344494.961 ± 221110.545 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender RAF 150 thrpt 10 1563396.748 ± 12497.136 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender MMap 150 thrpt 10 1223215.440 ± 14661.487 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end File 150 thrpt 10 1008537.505 ± 9213.181 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end RAF 150 thrpt 10 998564.075 ± 12207.661 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end MMap 150 thrpt 10 785114.881 ± 8827.768 ops/s
{code}
*4 threads*
{code}
-sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 4
Benchmark (loggerType) (messageSizeBytes) Mode Samples Score Error Units
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender File 150 thrpt 10 1678485.793 ± 260632.925 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender RAF 150 thrpt 10 1412574.559 ± 447410.040 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender MMap 150 thrpt 10 1386414.915 ± 333117.992 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end File 150 thrpt 10 959662.994 ± 333541.548 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end RAF 150 thrpt 10 929405.390 ± 66673.436 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end MMap 150 thrpt 10 1013681.438 ± 155916.949 ops/s
{code}
*8 threads*
{code}
-sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 8
Benchmark (loggerType) (messageSizeBytes) Mode Samples Score Error Units
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender File 150 thrpt 10 1164538.492 ± 62482.965 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender RAF 150 thrpt 10 1194865.200 ± 72251.118 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender MMap 150 thrpt 10 1338960.644 ± 71104.991 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end File 150 thrpt 10 944670.900 ± 69190.255 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end RAF 150 thrpt 10 930248.735 ± 28523.086 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end MMap 150 thrpt 10 998676.286 ± 36327.805 ops/s
{code}
*16 threads*
{code}
-sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 16
Benchmark (loggerType) (messageSizeBytes) Mode Samples Score Error Units
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender File 150 thrpt 10 1214786.965 ± 7217.940 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender RAF 150 thrpt 10 1192036.036 ± 13696.830 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender MMap 150 thrpt 10 1548961.297 ± 27141.206 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end File 150 thrpt 10 1003235.059 ± 10365.921 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end RAF 150 thrpt 10 994834.287 ± 9057.470 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end MMap 150 thrpt 10 1323043.186 ± 19960.127 ops/s
{code}
*32 threads*
{code}
-sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 32
Benchmark (loggerType) (messageSizeBytes) Mode Samples Score Error Units
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender File 150 thrpt 10 1372729.015 ± 40762.524 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender RAF 150 thrpt 10 1185351.707 ± 16992.542 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender MMap 150 thrpt 10 1784377.615 ± 179359.793 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end File 150 thrpt 10 918029.971 ± 38434.750 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end RAF 150 thrpt 10 1122310.566 ± 152840.084 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end MMap 150 thrpt 10 1898762.678 ± 81312.274 ops/s
{code}
*64 threads*
{code}
-sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 64
Benchmark (loggerType) (messageSizeBytes) Mode Samples Score Error Units
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender File 150 thrpt 10 1193371.846 ± 32893.485 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender RAF 150 thrpt 10 1231136.529 ± 26895.411 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender MMap 150 thrpt 10 1409277.639 ± 91900.204 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end File 150 thrpt 10 881261.726 ± 35023.552 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end RAF 150 thrpt 10 929691.738 ± 22616.873 ops/s
o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end MMap 150 thrpt 10 1216108.561 ± 41321.530 ops/s
{code}
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#87 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABIfN9oh2p20v8fnX4yBVzi-pzelEgr1ks5sK5bPgaJpZM4OEhNj>
.
|
I don't have access to hardware with that version of Java. I am fairly sure that 131 doesn't have performance improvements (we checked whether we should upgrade but it was too close to our go-live and 131 didn't have anything particular we needed). |
@remkop I have not competed this PR yet, there are a few more things need to be done. I will try to complete PR in the next weekend. Also I've just sent ICLA to secretary@apache.org |
Thanks, we got notification of the ICLA. |
New latency benchmarking results "done right" (with Pacer), On 4-core MBP with SSD, benchmarking with 3 threads:
I still use JMH to piggy-back threading, warmup, setup/teardown, parameter setting, etc. |
Things still TODO / TBD in this PR:
|
Also recent addition of "write touch" seems to improve throughput considerably, so the numbers posted above are irrelevant now. |
The Java 9 stuff sounds interesting. We may need to add a multi-version JAR build for log4j-core as well (we have one for log4j-api so far) if it can't reasonably done via reflection. |
Following new developments with interest. Looking forward to javadoc on the WriteTouch stuff. I just very briefly scanned the commit on my phone while commuting to work, but didn't see anything Java 9-ish, perhaps I misunderstood. What does this do? Good point about StringBuilderEncoder and MemoryMappedFileManager buffer sizes. We have several options here: we can match up the buffer sizes so they are always the same, or we can document in the MemoryMappedFile appender manual that it is advisable for users to increase the StringBuilderEncoder buffer size (I believe there is a system property). Any preference anyone? |
Most folks are unlikely to look to closely at the docs. IMO we should make the docs as good as we can AND come up with default values that cover 80% of the cases. |
@leventov We may have a Log4j 2.9 release starting as soon as this weekend. Let me know if you want to make the remaining changes to complete this PR for merging into master before that release. |
@leventov Are you still interested in contributing this? |
@rgoers in principle, I would like, but I won't have time to do this in foreseeable future. |
@leventov @remkop @jvz Is there value in accepting this as-is. It has merge conflicts that have to be resolved. I have not checked to see how many there are or how difficult they would be to resolve. It is targeted at Master and I don't believe I would try to backport it to release-2.x due to its complexity. If no action is going to be taken then we should close it as the last activity was 3 years ago. |
Closing since no one responded to my last inquiry. |
A pull request for https://issues.apache.org/jira/browse/LOG4J2-928.
This PR is not complete - it lacks tests and benchmarks (I'm going to add them in the next few weeks), but the core implementation is reviewable already.