Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STORM-1057] Add throughput metrics to spouts/bolts and display them on web ui #753

Closed
wants to merge 796 commits into from

Conversation

wangli1426
Copy link
Contributor

The throughputs for the spouts and bolts could help the user to identify the performance bottleneck and detect the load balancing issue. In this RP, I take measurements on the throughput of the executors and display them on web UI.

Summary of Changes

  1. Take throughput measurements on the spouts and bolts;
  2. Add throughput to ExecutorStats;
  3. Display the throughputs on web UI.

Note: If you cannot see the throughputs on your web UI, please clean your browser cache and try again.

Screenshots

screen shot 2015-09-21 at 13 16 01
screen shot 2015-09-21 at 13 17 24
screen shot 2015-09-21 at 13 17 57
screen shot 2015-09-21 at 13 18 49

@wangli1426 wangli1426 force-pushed the throughput-metrics branch 2 times, most recently from 6894d37 to 066e234 Compare September 22, 2015 14:17
@wangli1426
Copy link
Contributor Author

Hi @HeartSaVioR,

Sorry to interrupt, but could please kindly review the code? I am looking forward to your response. Thanks

@HeartSaVioR
Copy link
Contributor

@wangli1426
Sorry to response later.
We're having holidays in South Korea, 'Chuseok', very similar to 'Mid-autumn festival'.
It ends just Today, so it'll take a few days to get back.

@wangli1426
Copy link
Contributor Author

Thank you for your prompt reply. Please review the code when you come back. Wish you have a good time.

On Sep 29, 2015, at 20:30, Jungtaek Lim notifications@github.com wrote:

@wangli1426 https://github.com/wangli1426
Sorry to response later.
We're having holidays in South Korea, 'Chuseok', very similar to 'Mid-autumn festival'.
It ends just Today, so it'll take a few days to get back.


Reply to this email directly or view it on GitHub #753 (comment).

@jerrypeng
Copy link
Contributor

storm-core/src/genthrift.sh permssions should be changed back to -> 644

empty-emit-streak (MutableLong. 0)]

empty-emit-streak (MutableLong. 0)
spout-counts (count task-datas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spout-counts does not appear to be used anywhere.

@revans2
Copy link
Contributor

revans2 commented Sep 30, 2015

@wangli1426 I am done with my first pass through the code. There may be other things in here too that I missed on the first pass.

@revans2
Copy link
Contributor

revans2 commented Sep 30, 2015

Actually I thought about this a bit more and I am not sure what value this adds over dividing the emitted counts that we already have by the time window that they are for. I don't see a reason to add in more code to get a number we already know. I am -1 for this approach. I am fine with doing the calculation for the rate and displaying it in the UI. That would be helpful, but I don't see a reason to try and collect it again, in a different way.

@wangli1426
Copy link
Contributor Author

@revans2,
Thanks for your detailed comments. I understand your concern that we might not need to use a new system to obtain the throughput metric, given that we already have code in stats.clj doing the same thing. I will try to obtain the throughput metric using existing functions in stats.clj and come back when I finish.

@wangli1426
Copy link
Contributor Author

@revans2 ,
I have addressed all your concerns in d552a99. The most important modification is that instead of employing RateTracker, I reuse the stats of emitted and executed to generate the throughput stats for spout and bolt respectively.

Look forward to your response. Thanks

@revans2
Copy link
Contributor

revans2 commented Oct 14, 2015

@wangli1426 sorry it took so long to respond. The code looks a lot simpler. Please upmerge. Stats aggregation has changed places, but it still looks like it is a not too difficult change.

I also would really like to see the thrift code changed so adding in the throughput can be a rolling upgrade.

@wangli1426
Copy link
Contributor Author

@revans2
Thank you for your comment. I will up-merge this PR. However, I can't quite understand your last sentence. What do you mean by "you like to see the thrift code changed"? Could please explain more about it? Thanks.

@revans2
Copy link
Contributor

revans2 commented Oct 15, 2015

@wangli1426

Thrift classes have two options for member variables. required and optional. If you mark a member as required it must be there or thrift will throw an exception before serializing/deserializing it. This becomes a problem if we want to do a rolling upgrade (upgrade the cluster with no downtime). In that case we upgrade one daemon at a time, and there will be a period of time when old clients are talking to new servers and/or new clients are talking to old servers. If we add new required fields to thrift classes then the code will break during the upgrade. However, if we mark them all as optional and write the code in the client so it does not break if it gets a null for this value, then we will be OK.

I don't want to break a rolling upgrade just so we can have a rate in the UI.

@wangli1426
Copy link
Contributor Author

Hi @revans2,
Thank you very much for giving so detailed explanation. Your concern is quite reasonable. I will mark the throughput optional. As a recent commit has made substantial modification to stats.clj, I am afraid I need more time to up-merge this PR. I will come back when I am done. Thanks

@revans2
Copy link
Contributor

revans2 commented Oct 15, 2015

@wangli1426 I totally understand that this is going to take more time. Thank you for your patience.

@wangli1426 wangli1426 force-pushed the throughput-metrics branch 3 times, most recently from b8c2d0b to 81c91cd Compare October 16, 2015 08:22
@wangli1426
Copy link
Contributor Author

Hi @revans2 ,
I up-merged my code successfully. Following your suggestion, I mark throughput in storm.thrift as optional. Look forward to your response. Thanks.

6: optional i64 failed;
5: optional double throughput;
6: optional i64 acked;
7: optional i64 failed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another problem with maintaining binary compatibility with the previous code. You cannot renumber entries. The tags at the beginning are what identify the field in the binary data. By renumbering them the new code and old code will mix up throughput, acked, and failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment and point taken.

HeartSaVioR and others added 23 commits September 12, 2016 12:13
* resolve conflict by Jungtaek Lim (kabhwan@gmail.com)
  * resolve version mismatch and some external modules after added
* Support INNER, LEFT / RIGHT / FULL OUTER JOIN based on Trident
* Limitation of feature is just same to Trident join
  * join is occurred for each batch (not across batches)
* Introduce new parameter on join(): JoinOutFieldsMode
  * default value of mode is COMPACT, which is as same as current
* created PreservingFieldsOrderJoinerMultiReducer
  * this preserves fields order from all streams fields
* modified codes to reflect the changes
* Reflect changes to doc
…RM-1837

STORM-1837: Fix complete-topology and prevent message loss
* join doesn't guarantee preserving order (Trident and SQL itself)
@wangli1426
Copy link
Contributor Author

@harshach I managed to upmerge this PR to 1.x-branch in #1703. Please review. Thanks.

@wangli1426
Copy link
Contributor Author

This new feature has been implemented on master branch in #1719. Please review.

d2r pushed a commit to d2r/storm that referenced this pull request Oct 16, 2018
We are closing stale Pull Requests to make the list more manageable.

Please re-open any Pull Request that has been closed in error.

Closes apache#608
Closes apache#639
Closes apache#640
Closes apache#648
Closes apache#662
Closes apache#668
Closes apache#692
Closes apache#705
Closes apache#724
Closes apache#728
Closes apache#730
Closes apache#753
Closes apache#803
Closes apache#854
Closes apache#922
Closes apache#986
Closes apache#992
Closes apache#1019
Closes apache#1040
Closes apache#1041
Closes apache#1043
Closes apache#1046
Closes apache#1051
Closes apache#1078
Closes apache#1146
Closes apache#1164
Closes apache#1165
Closes apache#1178
Closes apache#1213
Closes apache#1225
Closes apache#1258
Closes apache#1259
Closes apache#1268
Closes apache#1272
Closes apache#1277
Closes apache#1278
Closes apache#1288
Closes apache#1296
Closes apache#1328
Closes apache#1342
Closes apache#1353
Closes apache#1370
Closes apache#1376
Closes apache#1391
Closes apache#1395
Closes apache#1399
Closes apache#1406
Closes apache#1410
Closes apache#1422
Closes apache#1427
Closes apache#1443
Closes apache#1462
Closes apache#1468
Closes apache#1483
Closes apache#1506
Closes apache#1509
Closes apache#1515
Closes apache#1520
Closes apache#1521
Closes apache#1525
Closes apache#1527
Closes apache#1544
Closes apache#1550
Closes apache#1566
Closes apache#1569
Closes apache#1570
Closes apache#1575
Closes apache#1580
Closes apache#1584
Closes apache#1591
Closes apache#1600
Closes apache#1611
Closes apache#1613
Closes apache#1639
Closes apache#1703
Closes apache#1711
Closes apache#1719
Closes apache#1737
Closes apache#1760
Closes apache#1767
Closes apache#1768
Closes apache#1785
Closes apache#1799
Closes apache#1822
Closes apache#1824
Closes apache#1844
Closes apache#1874
Closes apache#1918
Closes apache#1928
Closes apache#1937
Closes apache#1942
Closes apache#1951
Closes apache#1957
Closes apache#1963
Closes apache#1964
Closes apache#1965
Closes apache#1967
Closes apache#1968
Closes apache#1971
Closes apache#1985
Closes apache#1986
Closes apache#1998
Closes apache#2031
Closes apache#2032
Closes apache#2071
Closes apache#2076
Closes apache#2108
Closes apache#2119
Closes apache#2128
Closes apache#2142
Closes apache#2174
Closes apache#2206
Closes apache#2297
Closes apache#2322
Closes apache#2332
Closes apache#2341
Closes apache#2377
Closes apache#2414
Closes apache#2469
d2r pushed a commit to d2r/storm that referenced this pull request Oct 16, 2018
We are closing stale Pull Requests to make the list more manageable.

Please re-open any Pull Request that has been closed in error.

Closes apache#608
Closes apache#639
Closes apache#640
Closes apache#648
Closes apache#662
Closes apache#668
Closes apache#692
Closes apache#705
Closes apache#724
Closes apache#728
Closes apache#730
Closes apache#753
Closes apache#803
Closes apache#854
Closes apache#922
Closes apache#986
Closes apache#992
Closes apache#1019
Closes apache#1040
Closes apache#1041
Closes apache#1043
Closes apache#1046
Closes apache#1051
Closes apache#1078
Closes apache#1146
Closes apache#1164
Closes apache#1165
Closes apache#1178
Closes apache#1213
Closes apache#1225
Closes apache#1258
Closes apache#1259
Closes apache#1268
Closes apache#1272
Closes apache#1277
Closes apache#1278
Closes apache#1288
Closes apache#1296
Closes apache#1328
Closes apache#1342
Closes apache#1353
Closes apache#1370
Closes apache#1376
Closes apache#1391
Closes apache#1395
Closes apache#1399
Closes apache#1406
Closes apache#1410
Closes apache#1422
Closes apache#1427
Closes apache#1443
Closes apache#1462
Closes apache#1468
Closes apache#1483
Closes apache#1506
Closes apache#1509
Closes apache#1515
Closes apache#1520
Closes apache#1521
Closes apache#1525
Closes apache#1527
Closes apache#1544
Closes apache#1550
Closes apache#1566
Closes apache#1569
Closes apache#1570
Closes apache#1575
Closes apache#1580
Closes apache#1584
Closes apache#1591
Closes apache#1600
Closes apache#1611
Closes apache#1613
Closes apache#1639
Closes apache#1703
Closes apache#1711
Closes apache#1719
Closes apache#1737
Closes apache#1760
Closes apache#1767
Closes apache#1768
Closes apache#1785
Closes apache#1799
Closes apache#1822
Closes apache#1824
Closes apache#1844
Closes apache#1874
Closes apache#1918
Closes apache#1928
Closes apache#1937
Closes apache#1942
Closes apache#1951
Closes apache#1957
Closes apache#1963
Closes apache#1964
Closes apache#1965
Closes apache#1967
Closes apache#1968
Closes apache#1971
Closes apache#1985
Closes apache#1986
Closes apache#1998
Closes apache#2031
Closes apache#2032
Closes apache#2071
Closes apache#2076
Closes apache#2108
Closes apache#2119
Closes apache#2128
Closes apache#2142
Closes apache#2174
Closes apache#2206
Closes apache#2297
Closes apache#2322
Closes apache#2332
Closes apache#2341
Closes apache#2377
Closes apache#2414
Closes apache#2469
@asfgit asfgit closed this in #2880 Oct 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.