[STORM-1038] Upgraded netty to 4.x #728
Good suggestion. I will get to work on them as soon as I get some time. I am also curious to verify the memory efficiency claimed by 4.x.
Upgraded to the latest 4.0.31.Final and changed the buffer allocation to let Netty choose the best default implementations based on the platform. The implementations are determined by the following properties:
@hsun-cnnxty, the changes look good to me. Please upmerge to the latest code. Then I would like to run some performance tests on it to see how it compares to the current code. Please also look at shading. In storm-core we are shading netty now, and I would like to be sure that it is still shaded correctly. If you have any questions on how to do this please let me know and I will be happy to help out.
@revans2, just merged with the latest master. I don't have a decent Storm cluster for performance testing. With a small local cluster on a single machine, I tried https://github.com/yahoo/storm-perf-test and did not see any difference in memory consumption after the upgrade (it was configured to make sure there is inter-worker communication using Netty). I hope your test can reveal more information. The shading config is updated in pom.xml with
Hope that's all I need to do. -thanks
import org.jboss.netty.util.Timeout;
import org.jboss.netty.util.TimerTask;
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.*;
Could you please expand the imports?
Also below.
* apache/master: (47 commits) Added STORM-706 to Changelog Added STORM-1396 to Changelog Add myself to the committer list. adding back accidentally deleted metrics Added STORM-695 to Changelog storm-starter: Guide JDK version to later than 7 This closes apache#281 (STORM-517 is a dupe of STORM-833) Added STORM-1416 to Changelog Add STORM-1426 to Changelog Added STORM-1417 to Changelog STORM-1422 Added STORM-1429 to Changelog AvroGenericRecordBolt instead of SequenceFileBolt Added STORM-1401 to Changelog Added STORM-1424 to Changelog Add STORM-1427 to Changelog Added STORM-1413 to Changelog [STORM-1416] Documentation for state store Added STORM-1412 to Changelog Added STORM-1210 to Changelog ...
public interface ISaslClient {
    void channelConnected(Channel channel);

    // void channelConnected(Channel channel);
If this is not needed we should remove it, not comment it out.
Everything looks really good now. Just a few minor nits. I still have not found time to run some performance tests, but I will try to do that today.
I just ran some performance tests using https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/ThroughputVsLatency.java. I ran with 4 workers on a MacBook Pro and the numbers don't look good for netty4. With the older code I was able to do 20,000 sentences/second at CPU Utilization over 30 second interval: But with the netty4 patch it could only handle 17,000/sec and the latency was much worse CPU: To have the netty4 patch have similar latency we needed to only run at 14,000 sentences per second. 99%-ile: 38 ms CPU: Unless we can get the numbers to be close to or better than the netty3 implementation I cannot let this in.
Cool, at least we have some numbers to compare. I will see if there is a default setting that needs to be changed for Netty 4.
@hsun-cnnxty I hope it is something like that. You should be able to run the tests yourself too. They are not that complex. I build
I then run a small single node cluster
Once it is all up and ready you can run the test.
It will output metrics about the running test every 30 seconds for 5 mins. Some of the numbers are in nanoseconds and others are in milliseconds so pay attention to them. I like to vary the throughput and look to see when it cannot keep up any more to get an idea of the maximum throughput the setup can handle, and what the latency is for a given throughput so we can see how they compare to each other.
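Since the reported numbers mix nanoseconds and milliseconds, it can help to normalize them before comparing runs. The following is a minimal sketch, not part of the test itself: the class `LatencySummary` and its methods are hypothetical names, the sample values are made up for illustration, and the nearest-rank percentile used here is an assumption about how one might summarize raw samples, not a description of how ThroughputVsLatency computes its figures.

```java
import java.util.Arrays;

// Hypothetical helper for comparing latency samples across runs.
class LatencySummary {

    // Nearest-rank percentile over raw samples given in nanoseconds.
    // Returns the chosen sample, still in nanoseconds.
    static long percentileNanos(long[] latenciesNanos, double percentile) {
        if (latenciesNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile <= 0.0 || percentile > 100.0) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] sorted = latenciesNanos.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Convert nanoseconds to milliseconds so both kinds of metrics share a unit.
    static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Illustrative samples only; these are not measurements from this PR.
        long[] samples = {5_000_000L, 1_000_000L, 38_000_000L, 2_000_000L};
        long p99 = percentileNanos(samples, 99.0);
        System.out.println("99%-ile latency in ms: " + nanosToMillis(p99));
    }
}
```

Running the same summary at several target throughputs for both builds gives directly comparable latency figures in one unit.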
With some refactoring, it can now sustain a throughput of 20,000/sec, which it was not able to before. But latency at 20,000/sec is still much higher than with 3.x (more than 5 times). I will continue to investigate.
* apache/master: (64 commits) add STORM-1496 to CHANGELOG.md fixing sporadic nimbus log failure and topology visualization backport STORM-1484/1478 to 1.0.0, too add STORM-1499 to CHANGELOG.md fix wrong package name for storm trident Added STORM-1463 to Changelog Added STORM-1485 to Changelog Added STORM-1486 to Changelog Added STORM-1214 to Changelog [STORM-1486] Fix storm-kafa documentation CHANGELOG: move STORM-1450 to 1.0.0 (backported) CHANGELOG.md: move 0.10.0-beta2 to 0.10.0 Fix CHANGELOG.md to have 0.10.1 and move issues Fix misplaced CHANGELOGs add STORM-1452 to changelog add STORM-1406 to changelog adds comments about licensing the profiler feature Fixes profiling/debugging out of the box add storm-mqtt to binary distribution Fixing minor code comments ...
Just merged code from master, and there seems to be a performance degradation with recent changes. I noticed that it affects not only this branch but also the master branch. Here is the comparison on my laptop for running the perf test with only 5,000/sec throughput with code from master. Before (git hash: 3db9680) Now (git hash: bd396b3) Note: when running the latest code, my CPU is almost 100% busy, which may explain why it was so much worse. When running the "old" code, my CPU still had 10% idle time. -thanks
Yes, I am aware of it. We are in the process of merging with the JStorm project, and part of this merger involves moving most of the Clojure code to Java. In all likelihood the degradation is due to reflection happening on the critical path, because many new places in the code are calling into Java that were not doing so before. I plan on doing some profiling soon and submitting some fixes, but it is very much a moving target so I have not been very motivated to do so. The code for the netty messaging layer should stay mostly the same between 1.0 and 2.0, so if you want to base your work on the 1.0 branch and show that the performance there is the same, I would totally accept that as proof that the code is good.
@hsun-cnnxty we would like to get this into 1.x-branch as well as master. Did you get a chance to look at @revans2's comment above? It would be great if you could address the comment and upmerge your patch.
I am currently on vacation and will be back in two weeks. Will work on it as soon as I am back home. -thanks
After rebasing, could you do the performance test against the 1.x branch? The master branch is a work in progress, so we would be more comfortable with results from the 1.x branch.
Sure.
As this PR is for master, new PR #1591 is created for 1.x-branch. Performance tests to be done soon.
I posted performance test results on #1591.
@hsun-cnnxty, I think this is a great change to have. Is there any way we can implement this as a plugin so that we can switch between it and the current implementation? It took a substantial amount of time to get the current version stabilized, so making it plug and play would help us switch between implementations in case of issues.
@kishorvpatil that's an interesting idea. You mean a feature flag to toggle between 3.x and 4.x? I will investigate the possibility. Btw, I have moved the work to #1591. -thanks
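To make the feature-flag idea concrete, here is a minimal sketch of selecting a transport implementation from a configuration map. Everything in it is hypothetical: the `MessagingTransport` interface, the two stub classes, and the `example.messaging.transport` key are illustrative names, not Storm's actual interfaces or configuration, and this is not how either PR implements the messaging layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical abstraction over the messaging layer.
interface MessagingTransport {
    String name();
}

// Stub standing in for the existing, stabilized 3.x-based implementation.
final class Netty3Transport implements MessagingTransport {
    public String name() { return "netty3"; }
}

// Stub standing in for the new 4.x-based implementation.
final class Netty4Transport implements MessagingTransport {
    public String name() { return "netty4"; }
}

final class TransportSelector {
    // Hypothetical config key; a real project would define its own.
    static final String TRANSPORT_KEY = "example.messaging.transport";
    // Default to the stabilized implementation so the new one is opt-in.
    static final String DEFAULT_TRANSPORT = "netty3";

    private static final Map<String, Supplier<MessagingTransport>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("netty3", Netty3Transport::new);
        REGISTRY.put("netty4", Netty4Transport::new);
    }

    // Pick the implementation named in the config, failing fast on unknown names.
    static MessagingTransport select(Map<String, Object> conf) {
        Object configured = conf.getOrDefault(TRANSPORT_KEY, DEFAULT_TRANSPORT);
        Supplier<MessagingTransport> factory = REGISTRY.get(String.valueOf(configured));
        if (factory == null) {
            throw new IllegalArgumentException("Unknown transport: " + configured);
        }
        return factory.get();
    }
}
```

Defaulting to the older implementation keeps existing deployments unchanged, and switching back after a problem becomes a one-line config change rather than a rollback.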
We are closing stale Pull Requests to make the list more manageable. Please re-open any Pull Request that has been closed in error. Closes apache#608 Closes apache#639 Closes apache#640 Closes apache#648 Closes apache#662 Closes apache#668 Closes apache#692 Closes apache#705 Closes apache#724 Closes apache#728 Closes apache#730 Closes apache#753 Closes apache#803 Closes apache#854 Closes apache#922 Closes apache#986 Closes apache#992 Closes apache#1019 Closes apache#1040 Closes apache#1041 Closes apache#1043 Closes apache#1046 Closes apache#1051 Closes apache#1078 Closes apache#1146 Closes apache#1164 Closes apache#1165 Closes apache#1178 Closes apache#1213 Closes apache#1225 Closes apache#1258 Closes apache#1259 Closes apache#1268 Closes apache#1272 Closes apache#1277 Closes apache#1278 Closes apache#1288 Closes apache#1296 Closes apache#1328 Closes apache#1342 Closes apache#1353 Closes apache#1370 Closes apache#1376 Closes apache#1391 Closes apache#1395 Closes apache#1399 Closes apache#1406 Closes apache#1410 Closes apache#1422 Closes apache#1427 Closes apache#1443 Closes apache#1462 Closes apache#1468 Closes apache#1483 Closes apache#1506 Closes apache#1509 Closes apache#1515 Closes apache#1520 Closes apache#1521 Closes apache#1525 Closes apache#1527 Closes apache#1544 Closes apache#1550 Closes apache#1566 Closes apache#1569 Closes apache#1570 Closes apache#1575 Closes apache#1580 Closes apache#1584 Closes apache#1591 Closes apache#1600 Closes apache#1611 Closes apache#1613 Closes apache#1639 Closes apache#1703 Closes apache#1711 Closes apache#1719 Closes apache#1737 Closes apache#1760 Closes apache#1767 Closes apache#1768 Closes apache#1785 Closes apache#1799 Closes apache#1822 Closes apache#1824 Closes apache#1844 Closes apache#1874 Closes apache#1918 Closes apache#1928 Closes apache#1937 Closes apache#1942 Closes apache#1951 Closes apache#1957 Closes apache#1963 Closes apache#1964 Closes apache#1965 Closes apache#1967 Closes apache#1968 Closes apache#1971 
Closes apache#1985 Closes apache#1986 Closes apache#1998 Closes apache#2031 Closes apache#2032 Closes apache#2071 Closes apache#2076 Closes apache#2108 Closes apache#2119 Closes apache#2128 Closes apache#2142 Closes apache#2174 Closes apache#2206 Closes apache#2297 Closes apache#2322 Closes apache#2332 Closes apache#2341 Closes apache#2377 Closes apache#2414 Closes apache#2469
Upgraded the netty transportation layer to 4.x to take advantage of its memory management efficiency.