Customization of Netty "initialBufferSize" #1750 #1881
Conversation
This PR adds the new option suggested here: #1750. @tsegismont mentioned in the issue that performance tests would be a plus, but I haven't put them in place yet. I was thinking of modifying https://github.com/vert-x3/vertx-perf to add a JMH test there to validate the performance gains brought by this PR. Is that a good strategy?
Motivation:
Under load and with HTTP requests or responses carrying multiple headers, a lot of CPU time can be spent in AppendableCharSequence.append, which always has to grow and therefore copy char arrays.
Modifications:
Changed HttpClientOptions and HttpServerOptions to allow the user to pass an initial buffer size for Netty's HTTP decoder that fits their needs when the default initial buffer size of 128 bytes is not suitable. Also fixed some typos in the documentation.
Result:
The initial buffer size for the HTTP decoder is customisable.
Signed-off-by: Leonardo FREITAS GOMES <leonardo.f.gomes@gmail.com>
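For reference, a minimal sketch of how the new option could be used on the server side, assuming the Vert.x 3.x API of this PR's era. The accessor name setDecoderInitialBufferSize is an assumption drawn from the description above (the thread itself does not quote the method name), and 4096 is just an illustrative value:

```java
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpServerOptions;

public class InitialBufferSizeExample {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();

    // Assumed accessor name; raises the decoder's initial buffer from the
    // Netty default of 128 bytes to 4096 bytes for header-heavy workloads.
    HttpServerOptions options = new HttpServerOptions()
        .setDecoderInitialBufferSize(4096);

    vertx.createHttpServer(options)
        .requestHandler(req -> req.response().end("ok"))
        .listen(8080);
  }
}
```

The PR description indicates the same kind of option is exposed on HttpClientOptions for the client-side decoder as well.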
I don't believe it's a good way to test your changes. You'd better create a simple application which replies to requests and measure the response time improvement (with Gatling or JMeter). The injector and the app should run on separate machines.
@tsegismont OK, I will put that in place. Should it be on a specific existing project, or should I create a project on my side to demonstrate it and give you guys a way to execute it? Just to note, when I mentioned JMH I was thinking of something along the lines of what's done here: https://github.com/netty/netty/tree/4.1/microbench/src/main/java/io/netty/handler/codec/http2
A project of your own on GH would be perfect.
I have done some tests with Gatling. Results are available here: https://github.com/leogomes/vertx-benchmark/tree/master/results. I also profiled the app with JFR and generated flamegraphs, which you can find in the same folder, for the different initial buffer sizes. While I couldn't notice any significant difference in response time (or throughput) in my particular tests, the flamegraphs show that HttpObjectDecoder$HeaderParser.parse() goes down from 69.49% of total CPU time to 51.22% when you change the initial buffer from 128 to 4096 bytes. I've written a very basic app and a Gatling scenario that sends a request with a couple of quite big headers. If you have any suggestions for a better test case, please let me know.
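For readers who don't open the linked scenario, here is a rough Java sketch of the kind of traffic it generates: a plain GET carrying a couple of large headers, written here against the Vert.x 3.x core HttpClient. The header names and the ~500-character values are made up for illustration; the real benchmark drives this load from Gatling, not from a Vert.x client:

```java
import java.util.Collections;

import io.vertx.core.Vertx;
import io.vertx.core.http.HttpClient;
import io.vertx.core.http.HttpClientOptions;

public class BigHeaderRequest {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();

    // Hypothetical header payload: a ~500-character value, in the spirit of
    // the "couple of quite big headers" the Gatling scenario sends.
    String bigValue = String.join("", Collections.nCopies(50, "abcdefghij"));

    HttpClient client = vertx.createHttpClient(new HttpClientOptions()
        .setDefaultHost("localhost")
        .setDefaultPort(8080));

    client.get("/")
        .putHeader("X-Big-Header-1", bigValue)
        .putHeader("X-Big-Header-2", bigValue)
        .handler(resp -> System.out.println("status: " + resp.statusCode()))
        .end();
  }
}
```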
Seems really great to me. I think you did not notice a big difference because you have only tested the HttpServer side. My use case was an HTTP proxy handling thousands of requests with 15+ headers, some of them quite large (at least 500 chars), in both directions: Client <-> (HttpServer) Vert.x (HttpClient) <-> Backend server, ending with nearly 2K of headers.
@gmagniez could you try the change and give some feedback?
I have already patched Vert.x to implement this behavior and, as @leogomes pointed out, I have also noticed lower CPU consumption due to request parsing, even at a bigger scale, since my proxy was using both HttpServer and HttpClient.
In a couple of weeks, I should have time to improve my current test scenario (write a proxy-like app with the server and client part, put 15+ headers as described by @gmagniez) and run a better performance test. I will keep you posted.
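A rough sketch of what such a proxy-like test app could look like, with the option applied on both the HttpServer and HttpClient sides, using the Vert.x 3.x core API. The accessor name, backend host/port and buffer value are assumptions for illustration; a real proxy would also relay response headers and handle streaming bodies and hop-by-hop headers:

```java
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpClient;
import io.vertx.core.http.HttpClientOptions;
import io.vertx.core.http.HttpClientRequest;
import io.vertx.core.http.HttpServerOptions;

public class HeaderHeavyProxy {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();

    // Assumed accessor name and illustrative values; larger initial buffers on
    // both sides, since requests and backend responses carry 15+ headers.
    HttpClient client = vertx.createHttpClient(new HttpClientOptions()
        .setDefaultHost("backend.example.com")
        .setDefaultPort(8080)
        .setDecoderInitialBufferSize(2048));

    vertx.createHttpServer(new HttpServerOptions().setDecoderInitialBufferSize(2048))
        .requestHandler(req -> {
          // Forward the incoming request (and its headers) to the backend,
          // then relay the backend's status code and body. Assumes header-only
          // (GET-style) requests, like the ones used in the benchmark.
          HttpClientRequest backendReq =
              client.request(req.method(), req.uri(), backendResp ->
                  backendResp.bodyHandler(body ->
                      req.response().setStatusCode(backendResp.statusCode()).end(body)));
          backendReq.headers().setAll(req.headers());
          backendReq.end();
        })
        .listen(8081);
  }
}
```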
So, my new scenario has:
With that scenario, the difference in performance starts to be more visible:
Throughput and response time are better as well (as one would expect):
Source code and results are available here: https://github.com/leogomes/vertx-benchmark/tree/master/results
Gatling scenario: https://github.com/leogomes/vertx-benchmark/blob/master/src/gatling/vertx/VertxInitialBufferSimulation.scala
Cheers,
@leogomes the results are interesting: the latency improvement at the 99th percentile plus the throughput improvement look good. Can you give some details of the actual workload for this benchmark (machines, network)?
@vietj Here are the CPU details for the two machines:
They're running 64-bit Linux (kernel 3.0.101-63-default). They're on the same rack and have a 1 Gbps network adapter. The request I'm sending is the one you can find in the Gatling scenario.
The measurement itself is pretty short (< 3 min), because I haven't had much time to dedicate to it, and also because we're not actually modifying a default value here, but only exposing the underlying Netty setting, so that people with this sort of workload (heavy HTTP headers) can customise the HttpDecoder. So I was really just trying to build an example that could easily reproduce the scenario. For sure, a more serious performance test would need to run for much longer. With that setting exposed, people out there may be able to run it with different values and their own use cases to find their sweet spot.
Finally, the last results I posted were taken while profiling with JFR in order to get the flamegraphs. I have now run a new test without any profiling and have been able to reproduce the same results.
Looks good and well elaborated to me, @leogomes.
@tsegismont WDYT?
@leogomes LGTM