
STORM-2912 Revert optimization of sharing tick tuple #2533

Merged: 1 commit into apache:master from HeartSaVioR:STORM-2912 on Jan 27, 2018

Conversation

HeartSaVioR (Contributor) commented Jan 25, 2018

  • since it incurs side effects and messes up metrics

More detail: https://issues.apache.org/jira/browse/STORM-2912

satishd (Member) left a comment

+1 LGTM except for minor nit. Thanks @HeartSaVioR for the quick fix.

-    final List<AddressedTuple> tickTuple =
-        Lists.newArrayList(new AddressedTuple(AddressedTuple.BROADCAST_DEST, tuple));
-    timerTask.scheduleRecurring(tickTimeSecs, tickTimeSecs, () -> receiveQueue.publish(tickTuple));
+    timerTask.scheduleRecurring(tickTimeSecs, tickTimeSecs, () -> {
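The revert replaces the pre-built, shared tick-tuple list with a lambda that constructs a fresh tuple on every publish. As a rough, Storm-free sketch of why sharing one mutable object across executor queues is hazardous (the class and field names here are illustrative, not Storm's actual `AddressedTuple` API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a tuple carrying mutable per-delivery state
// (e.g. metrics bookkeeping). Not Storm's actual AddressedTuple.
class FakeTuple {
    int timesProcessed = 0;
}

public class SharedTickTupleSketch {
    public static void main(String[] args) {
        // Shared instance: the same object is handed to every executor queue.
        FakeTuple shared = new FakeTuple();
        List<FakeTuple> queueA = new ArrayList<>();
        List<FakeTuple> queueB = new ArrayList<>();
        queueA.add(shared);
        queueB.add(shared);

        // Each executor "processes" its tuple, mutating the shared state.
        for (FakeTuple t : queueA) t.timesProcessed++;
        for (FakeTuple t : queueB) t.timesProcessed++;

        // The one shared tuple was touched twice: any metric that assumes
        // one owner per tuple instance is now double counted.
        System.out.println("shared processed " + shared.timesProcessed + " times");

        // Fresh instance per publish (what the revert restores): each queue
        // gets its own tuple, so per-tuple state stays isolated.
        FakeTuple forA = new FakeTuple();
        FakeTuple forB = new FakeTuple();
        forA.timesProcessed++;
        forB.timesProcessed++;
        System.out.println("fresh tuples processed " + forA.timesProcessed
            + " and " + forB.timesProcessed + " times each");
    }
}
```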

satishd (Member) commented Jan 25, 2018

Please add a code comment explaining why this was changed, with the JIRA number, as was done in the 1.x PR here.

HeartSaVioR (Author, Contributor) commented Jan 25, 2018

Addressed. Commit amended since it is a tiny change.

* since it incurs side effects and messes up metrics

HeartSaVioR force-pushed the HeartSaVioR:STORM-2912 branch from ca16883 to 1b918e5 on Jan 25, 2018
satishd (Member) left a comment

@HeartSaVioR Thanks for addressing review comments.

revans2 (Contributor) left a comment

+1

revans2 (Contributor) commented Jan 25, 2018

The test failure looks unrelated. I have seen it happen before, and I think it is related to and fixed by #2534.

arunmahadevan (Contributor) commented Jan 25, 2018

I ran FastWordCount with and without this patch; the numbers don't look much different. Is FastWordCount expected not to show any difference? Maybe we can wait for Alexandre's results.

Storm 1.2.0 RC1

uptime: 30 acked: 8180 avgLatency: 3218.882640586797 acked/sec: 272.6666666666667 failed: 0
uptime: 60 acked: 8240 avgLatency: 3296.849514563107 acked/sec: 137.33333333333334 failed: 0
uptime: 90 acked: 2380 avgLatency: 478.61344537815125 acked/sec: 26.444444444444443 failed: 0
uptime: 121 acked: 3420 avgLatency: 1493.2397660818713 acked/sec: 28.264462809917354 failed: 0
uptime: 151 acked: 3420 avgLatency: 1493.2397660818713 acked/sec: 22.649006622516556 failed: 0
uptime: 181 acked: 9680 avgLatency: 4966.5268595041325 acked/sec: 53.48066298342541 failed: 0
uptime: 211 acked: 9680 avgLatency: 4966.5268595041325 acked/sec: 45.87677725118483 failed: 0
uptime: 241 acked: 2140 avgLatency: 569.7102803738318 acked/sec: 8.879668049792532 failed: 0
uptime: 271 acked: 5620 avgLatency: 2827.6298932384343 acked/sec: 20.7380073800738 failed: 0
uptime: 301 acked: 5620 avgLatency: 2827.6298932384343 acked/sec: 18.67109634551495 failed: 0

With the patch

uptime: 30 acked: 3280 avgLatency: 717.0731707317074 acked/sec: 109.33333333333333 failed: 0
uptime: 60 acked: 3280 avgLatency: 717.0731707317074 acked/sec: 54.666666666666664 failed: 0
uptime: 90 acked: 3020 avgLatency: 1077.8543046357615 acked/sec: 33.55555555555556 failed: 0
uptime: 120 acked: 11380 avgLatency: 5864.984182776801 acked/sec: 94.83333333333333 failed: 0
uptime: 150 acked: 11380 avgLatency: 5864.984182776801 acked/sec: 75.86666666666666 failed: 0
uptime: 180 acked: 5640 avgLatency: 2464.45390070922 acked/sec: 31.333333333333332 failed: 0
uptime: 210 acked: 5640 avgLatency: 2464.45390070922 acked/sec: 26.857142857142858 failed: 0
uptime: 240 acked: 10120 avgLatency: 3743.3379446640315 acked/sec: 42.166666666666664 failed: 0
uptime: 270 acked: 11600 avgLatency: 5629.23448275862 acked/sec: 42.96296296296296 failed: 0
uptime: 300 acked: 11600 avgLatency: 5629.23448275862 acked/sec: 38.666666666666664 failed: 0
arunmahadevan (Contributor) commented Jan 25, 2018

+1

Apparently the topology needs to use tick tuples to see the issue. I ran RollingTopWords, which uses tick tuples, and could clearly observe the difference.

Before patch

(screenshot taken 2018-01-25 at 2:41:47 pm)

After patch

(screenshot taken 2018-01-25 at 2:42:04 pm)
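For context on why only tick-tuple topologies show the issue: a Storm bolt recognizes tick tuples by their system component and stream ids. A minimal stand-alone sketch of that check (the two string values mirror Storm's `Constants.SYSTEM_COMPONENT_ID` and `Constants.SYSTEM_TICK_STREAM_ID`; the `isTick` helper here is illustrative, since real bolts usually call Storm's `TupleUtils.isTick(tuple)`):

```java
public class TickCheckSketch {
    // These values mirror Storm's Constants.SYSTEM_COMPONENT_ID
    // and Constants.SYSTEM_TICK_STREAM_ID.
    static final String SYSTEM_COMPONENT_ID = "__system";
    static final String SYSTEM_TICK_STREAM_ID = "__tick";

    // Illustrative helper: the same comparison TupleUtils.isTick performs
    // on a tuple's source component and source stream ids.
    static boolean isTick(String sourceComponent, String sourceStream) {
        return SYSTEM_COMPONENT_ID.equals(sourceComponent)
            && SYSTEM_TICK_STREAM_ID.equals(sourceStream);
    }

    public static void main(String[] args) {
        System.out.println(isTick("__system", "__tick"));   // a tick tuple
        System.out.println(isTick("splitter", "default"));  // an ordinary tuple
    }
}
```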

HeartSaVioR (Author, Contributor) commented Jan 25, 2018

I ran FastWordCount with 1.1.1, and it shows similar latency even on 1.1.1. Complete latency is measured from the acker receiving ACKER_INIT to the acker completing the tuple tree, so it includes all latency: tuples waiting in queues, transfer time, etc. If the topology is heavily overloaded, it can therefore be a bit off.
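A toy illustration of that point: because complete latency is measured end to end at the acker, it sums every stage a tuple tree passes through, not just bolt execute time. The per-stage numbers below are made up:

```java
public class CompleteLatencySketch {
    public static void main(String[] args) {
        // Hypothetical per-stage delays for one tuple tree, in milliseconds.
        long queueWaitMs = 1200;  // time spent waiting in executor queues
        long transferMs  = 300;   // inter-worker transfer time
        long executeMs   = 5;     // actual bolt execute time

        // Complete latency spans ACKER_INIT to tuple-tree completion,
        // so queueing and transfer dominate on an overloaded topology.
        long completeLatencyMs = queueWaitMs + transferMs + executeMs;
        System.out.println("complete latency (ms): " + completeLatencyMs);
    }
}
```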

asfgit merged commit 1b918e5 into apache:master on Jan 27, 2018

1 check failed: continuous-integration/travis-ci/pr (The Travis CI build failed)