Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STORM-1038] Upgraded netty to 4.x #728

Closed
wants to merge 23 commits into from
Closed

Conversation

ooasis
Copy link

@ooasis ooasis commented Sep 9, 2015

Upgraded the netty transportation layer to 4.x to take advantage of its memory management efficiency.

@HeartSaVioR
Copy link
Contributor

@hsun-cnnxty
Great work. Since it can affect performance I recommend you to include benchmark test result.
You can refer #521 (comment) to how to do it.

@HeartSaVioR
Copy link
Contributor

@hsun-cnnxty
Btw, Netty 4.0.31 was released 8 days ago, so we may also want to check this out to test it works.
And we'd like to see memory usages with Netty 3.x / 4.x to confirm 4.x is superior to 3.x.

@ooasis
Copy link
Author

ooasis commented Sep 10, 2015

Good suggestion. I will get to work on them as soon as I get some time. I am also curious to verify the memory efficiency claimed by 4.x

@ooasis
Copy link
Author

ooasis commented Sep 20, 2015

Upgraded to latest 4.0.31.Final and changed the buffer allocation to let Netty choose the best default implementations based on the platform.

The implementations are determined by following properties:
io.netty.noPreferDirect: false to prefer off-heap and true for heap allocation. Default to false
io.netty.allocator.type: default to pooled or unpooled on Android platform

@revans2
Copy link
Contributor

revans2 commented Oct 3, 2015

@hsun-cnnxty, The changes look good to me. Please upmerge to the latest code. Then I would like to run some performance tests on it to see how it compares to the current code. Please also look at shading. In storm-core we are shading netty now, and I would like to be sure that it is still shaded correctly.

If you have any questions on how to do this please let me know and I will be happy to help out.

@ooasis
Copy link
Author

ooasis commented Oct 4, 2015

@revans2, just merged with the latest master. I don't have a decent storm cluster for performance test. With a small local cluster on single machine. I had tried https://github.com/yahoo/storm-perf-test and did not see any difference in memory consumption after the upgrade (It was configured in a way to make sure there is inter-worker communication using Netty). Hope your test can reveal more information.

The shading config is updated in pom.xml with

                     <relocation>
                        <pattern>io.netty</pattern>
                        <shadedPattern>org.apache.storm.netty</shadedPattern>
                    </relocation>

Hope that's all I need to do.

-thanks

import org.jboss.netty.util.Timeout;
import org.jboss.netty.util.TimerTask;
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please expand the imports?

Also below.

* apache/master: (47 commits)
  Added STORM-706 to Changelog
  Added STORM-1396 to Changelog
  Add myself to the committer list.
  adding back accidentally deleted metrics
  Added STORM-695 to Changelog
  storm-starter: Guide JDK version to later than 7
  This closes apache#281 (STORM-517 is a dupe of STORM-833)
  Added STORM-1416 to Changelog
  Add STORM-1426 to Changelog
  Added STORM-1417 to Changelog
  STORM-1422
  Added STORM-1429 to Changelog
  AvroGenericRecordBolt instead of SequenceFileBolt
  Added STORM-1401 to Changelog
  Added STORM-1424 to Changelog
  Add STORM-1427 to Changelog
  Added STORM-1413 to Changelog
  [STORM-1416] Documentation for state store
  Added STORM-1412 to Changelog
  Added STORM-1210 to Changelog
  ...
public interface ISaslClient {
void channelConnected(Channel channel);

// void channelConnected(Channel channel);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not needed we should remove it, not comment it out.

@revans2
Copy link
Contributor

revans2 commented Jan 8, 2016

Everything looks really good now. just a few minor nits. I still have not found time to run some performance tests, but I will try to do that today.

@revans2
Copy link
Contributor

revans2 commented Jan 8, 2016

I just ran some performance tests using https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/ThroughputVsLatency.java

I ran with 4 workers on a MacBookPro and the numbers don't look good for netty4.

With the older code I was able to do 20,000 sentences/second at
99%-ile: 43 ms
99.9%-ile: 99 ms
min: 8.8 ms
max: 177 ms
mean: 16 ms

CPU Utilization over 30 second interval:
user: 159,718 ms
sys: 49,670 ms

But with the netty4 patch it could only handle 17,000/sec and the latency was much worse
99%-lie: 148 ms
99.9%-lie: 230 ms
min: 9.2 ms
max: 332 ms
mean: 32 ms

CPU:
user: 162,984 ms
sys: 55,867 ms

To have the netty4 patch have similar latency we needed to only run at 14,000 sentences per second.

99%-lie: 38 ms
99.9%-lie: 55 ms
min: 8.8 ms
max: 76 ms
mean: 16 ms

CPU:
user: 152,865 ms
sys: 52,535 ms

Unless we can get the numbers to be close to or better than the netty3 implementation I cannot let this in.

@ooasis
Copy link
Author

ooasis commented Jan 8, 2016

Cool, at least we get some numbers to compare. I will see if there is default setting need to be changed for netty 4.

@revans2
Copy link
Contributor

revans2 commented Jan 8, 2016

@hsun-cnnxty I hope it is something like that. You should be able to run the tests yourself too. They are not that complex. I build

mvn clean install -DskipTests
cd storm-dist/binary
mvn clean package
tar -xzvf ./target/apache-storm-0.11.0-SNAPSHOT.tar.gz
cd apache-storm-0.11.0-SNAPSHOT

I then run a small single node cluster

./bin/storm dev-zookeeper &
./bin/storm nimbus &
./bin/storm supervisor &
./bin/storm ui &
./bin/storm logviewer &

Once it is all up and ready you can run the test.

./bin/storm jar ./examples/storm-starter/storm-starter-topologies-0.11.0-SNAPSHOT.jar storm.starter.ThroughputVsLatency $THROUGHPUT 4 5 wc-test | tee results.txt

It will output metrics about the running test every 30 seconds for 5 mins. Some of the numbers are in nanoseconds and others are in milliseconds so pay attention to them. I like to vary the throughput and look to see when it cannot keep up any more to get an idea of the maximum throughput the setup can handle, and what the latency is for a given throughput so we can see how they compare to each other.

@ooasis
Copy link
Author

ooasis commented Jan 21, 2016

With some refactoring, now it can sustain throughput of 20,000 /sec which it was not able to before. But latency at 20,000 /sec is still much higher than 3.x (5+ times). I will continue to investigate.

* apache/master: (64 commits)
  add STORM-1496 to CHANGELOG.md
  fixing sporadic nimbus log failure and topology visualization
  backport STORM-1484/1478 to 1.0.0, too
  add STORM-1499 to CHANGELOG.md
  fix wrong package name for storm trident
  Added STORM-1463 to Changelog
  Added STORM-1485 to Changelog
  Added STORM-1486 to Changelog
  Added STORM-1214 to Changelog
  [STORM-1486] Fix storm-kafa documentation
  CHANGELOG: move STORM-1450 to 1.0.0 (backported)
  CHANGELOG.md: move 0.10.0-beta2 to 0.10.0
  Fix CHANGELOG.md to have 0.10.1 and move issues
  Fix misplaced CHANGELOGs
  add STORM-1452 to changelog
  add STORM-1406 to changelog
  adds comments about licensing the profiler feature
  Fixes profiling/debugging out of the box
  add storm-mqtt to binary distribution
  Fixing minor code comments
  ...
@ooasis
Copy link
Author

ooasis commented Feb 25, 2016

@revans2

Just merged code from master and seems there is performance degradation with recent changes. I noticed that it not only affects this branch, but also the master branch. Here are the comparison on my laptop for running the perf test with only 5,000/sec throughput with code from master.

Before (git hash: 3db9680)
netty3-r5000.txt

Now (git hash: bd396b3)
netty3-new-r5000.txt

Note: when running the latest code, my CPU is almost 100% busy which may explain why it was so much worse. When running the "old" code, my CPU still had 10% idle time.

-thanks

@revans2
Copy link
Contributor

revans2 commented Feb 25, 2016

Yes I am aware of it.  We are in the process of merging with the JStorm project, and part of this merger involves moving most of the clojure code to java.  In all likelihood the degradation is due to reflection happening on the critical path because many new places in the code are calling into java that were not doing so before.  I plan on doing some profiling soon and submit some fixes, but it is very much a moving target so I have not been very motivated to do so.  The code for the netty messaging layer should stay mostly the same between 1.0 and 2.0 for this, so if you want to do your work based off of the 1.0 branch and show performance comparisons there being the same I would totally accept that as proof that the code is good.

@harshach
Copy link
Contributor

@hsun-cnnxty we would like to get this into 1.x-branch as well as master. Did you get a chance to look at @revans2 comment above . It will be great if you can address the comment and up merge your patch.

@ooasis
Copy link
Author

ooasis commented Jul 11, 2016

I am currently on vacation and will be back in two weeks. Will work on it as soon as I am back home.

-thanks

@HeartSaVioR
Copy link
Contributor

After rebasing, could you do the performance test against 1.x branch? The status of master branch is a WIP so we would be more convenient with 1.x branch.

@ooasis
Copy link
Author

ooasis commented Jul 11, 2016

Sure.

@ooasis
Copy link
Author

ooasis commented Jul 26, 2016

As this PR is for master, new PR #1591 is created for 1.x-branch. Performance tests to be done soon.

@ooasis
Copy link
Author

ooasis commented Jul 31, 2016

@HeartSaVioR @harshach

I posted performance test results on #1591.

@kishorvpatil
Copy link
Contributor

@hsun-cnnxty, I think this is great change to have. Any way we can implement this as a plugin for us to switch between current implementation? It took substantial amount to get the current version stabilized. So making it plug and play would help us switch between implementations in case of issues.

@ooasis
Copy link
Author

ooasis commented Sep 7, 2016

@kishorvpatil that's an interesting idea. You mean a feature flag to toggle between 3.x and 4.x? I will investigate the possibility.

Btw, I have moved the work to #1591.

-thanks

srdo pushed a commit to srdo/storm that referenced this pull request May 11, 2018
srdo pushed a commit to srdo/storm that referenced this pull request May 13, 2018
srdo pushed a commit to srdo/storm that referenced this pull request May 14, 2018
srdo pushed a commit to srdo/storm that referenced this pull request May 15, 2018
srdo pushed a commit to srdo/storm that referenced this pull request May 21, 2018
srdo pushed a commit to srdo/storm that referenced this pull request Jun 5, 2018
srdo pushed a commit to srdo/storm that referenced this pull request Jun 8, 2018
zd-project pushed a commit to zd-project/storm that referenced this pull request Jun 18, 2018
d2r pushed a commit to d2r/storm that referenced this pull request Oct 16, 2018
We are closing stale Pull Requests to make the list more manageable.

Please re-open any Pull Request that has been closed in error.

Closes apache#608
Closes apache#639
Closes apache#640
Closes apache#648
Closes apache#662
Closes apache#668
Closes apache#692
Closes apache#705
Closes apache#724
Closes apache#728
Closes apache#730
Closes apache#753
Closes apache#803
Closes apache#854
Closes apache#922
Closes apache#986
Closes apache#992
Closes apache#1019
Closes apache#1040
Closes apache#1041
Closes apache#1043
Closes apache#1046
Closes apache#1051
Closes apache#1078
Closes apache#1146
Closes apache#1164
Closes apache#1165
Closes apache#1178
Closes apache#1213
Closes apache#1225
Closes apache#1258
Closes apache#1259
Closes apache#1268
Closes apache#1272
Closes apache#1277
Closes apache#1278
Closes apache#1288
Closes apache#1296
Closes apache#1328
Closes apache#1342
Closes apache#1353
Closes apache#1370
Closes apache#1376
Closes apache#1391
Closes apache#1395
Closes apache#1399
Closes apache#1406
Closes apache#1410
Closes apache#1422
Closes apache#1427
Closes apache#1443
Closes apache#1462
Closes apache#1468
Closes apache#1483
Closes apache#1506
Closes apache#1509
Closes apache#1515
Closes apache#1520
Closes apache#1521
Closes apache#1525
Closes apache#1527
Closes apache#1544
Closes apache#1550
Closes apache#1566
Closes apache#1569
Closes apache#1570
Closes apache#1575
Closes apache#1580
Closes apache#1584
Closes apache#1591
Closes apache#1600
Closes apache#1611
Closes apache#1613
Closes apache#1639
Closes apache#1703
Closes apache#1711
Closes apache#1719
Closes apache#1737
Closes apache#1760
Closes apache#1767
Closes apache#1768
Closes apache#1785
Closes apache#1799
Closes apache#1822
Closes apache#1824
Closes apache#1844
Closes apache#1874
Closes apache#1918
Closes apache#1928
Closes apache#1937
Closes apache#1942
Closes apache#1951
Closes apache#1957
Closes apache#1963
Closes apache#1964
Closes apache#1965
Closes apache#1967
Closes apache#1968
Closes apache#1971
Closes apache#1985
Closes apache#1986
Closes apache#1998
Closes apache#2031
Closes apache#2032
Closes apache#2071
Closes apache#2076
Closes apache#2108
Closes apache#2119
Closes apache#2128
Closes apache#2142
Closes apache#2174
Closes apache#2206
Closes apache#2297
Closes apache#2322
Closes apache#2332
Closes apache#2341
Closes apache#2377
Closes apache#2414
Closes apache#2469
d2r pushed a commit to d2r/storm that referenced this pull request Oct 16, 2018
We are closing stale Pull Requests to make the list more manageable.

Please re-open any Pull Request that has been closed in error.

Closes apache#608
Closes apache#639
Closes apache#640
Closes apache#648
Closes apache#662
Closes apache#668
Closes apache#692
Closes apache#705
Closes apache#724
Closes apache#728
Closes apache#730
Closes apache#753
Closes apache#803
Closes apache#854
Closes apache#922
Closes apache#986
Closes apache#992
Closes apache#1019
Closes apache#1040
Closes apache#1041
Closes apache#1043
Closes apache#1046
Closes apache#1051
Closes apache#1078
Closes apache#1146
Closes apache#1164
Closes apache#1165
Closes apache#1178
Closes apache#1213
Closes apache#1225
Closes apache#1258
Closes apache#1259
Closes apache#1268
Closes apache#1272
Closes apache#1277
Closes apache#1278
Closes apache#1288
Closes apache#1296
Closes apache#1328
Closes apache#1342
Closes apache#1353
Closes apache#1370
Closes apache#1376
Closes apache#1391
Closes apache#1395
Closes apache#1399
Closes apache#1406
Closes apache#1410
Closes apache#1422
Closes apache#1427
Closes apache#1443
Closes apache#1462
Closes apache#1468
Closes apache#1483
Closes apache#1506
Closes apache#1509
Closes apache#1515
Closes apache#1520
Closes apache#1521
Closes apache#1525
Closes apache#1527
Closes apache#1544
Closes apache#1550
Closes apache#1566
Closes apache#1569
Closes apache#1570
Closes apache#1575
Closes apache#1580
Closes apache#1584
Closes apache#1591
Closes apache#1600
Closes apache#1611
Closes apache#1613
Closes apache#1639
Closes apache#1703
Closes apache#1711
Closes apache#1719
Closes apache#1737
Closes apache#1760
Closes apache#1767
Closes apache#1768
Closes apache#1785
Closes apache#1799
Closes apache#1822
Closes apache#1824
Closes apache#1844
Closes apache#1874
Closes apache#1918
Closes apache#1928
Closes apache#1937
Closes apache#1942
Closes apache#1951
Closes apache#1957
Closes apache#1963
Closes apache#1964
Closes apache#1965
Closes apache#1967
Closes apache#1968
Closes apache#1971
Closes apache#1985
Closes apache#1986
Closes apache#1998
Closes apache#2031
Closes apache#2032
Closes apache#2071
Closes apache#2076
Closes apache#2108
Closes apache#2119
Closes apache#2128
Closes apache#2142
Closes apache#2174
Closes apache#2206
Closes apache#2297
Closes apache#2322
Closes apache#2332
Closes apache#2341
Closes apache#2377
Closes apache#2414
Closes apache#2469
@asfgit asfgit closed this in #2880 Oct 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants