-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-14818][benchmark] Fix receiving InputGate setup of StreamNetworkBenchmarkEnvironment. #11155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit fee93ce (Thu Feb 20 10:25:48 UTC 2020) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. DetailsThe Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
cfc691c to
21b36d4
Compare
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGateFactory.java
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/flink/runtime/io/network/partition/consumer/LocalInputChannel.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
zhijiangW
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this adjustment @wsry .
I think it is reasonable to fix the invalid number of input gates in benchmark, especially for the buffer amount sensitive scenarios.
It is generally good to me, only left some inline comments. Besides that, I am curious of how this change would affect the existing benchmark results. Could you provide the comparison results?
|
@pnowojski I guess you might also be interested in taking a look. :) |
bcba3ea to
8fe5e3d
Compare
|
I have updated the PR. @zhijiangW |
pnowojski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good change :) I'm happy that you managed to solve this issue and the change is not that complex.
+1 for the @zhijiangW's doubts about the benchmark code being fragile. If you can figure it out how to make it more stable, that would be great, but if you decide it's too hard/not worth, I'm fine with it.
I would also like to see the results changes (preferably from the benchmarking machine as well)
.../src/main/java/org/apache/flink/runtime/io/network/partition/consumer/LocalInputChannel.java
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Show resolved
Hide resolved
.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java
Outdated
Show resolved
Hide resolved
...st/java/org/apache/flink/streaming/runtime/io/benchmark/SingleInputGateBenchmarkFactory.java
Show resolved
Hide resolved
...st/java/org/apache/flink/streaming/runtime/io/benchmark/SingleInputGateBenchmarkFactory.java
Outdated
Show resolved
Hide resolved
...st/java/org/apache/flink/streaming/runtime/io/benchmark/SingleInputGateBenchmarkFactory.java
Outdated
Show resolved
Hide resolved
|
I ran the micro benchmark and found that some of the performance results become better and some go worse. The results are as follows: And after the change: |
07c2811 to
8f263fe
Compare
|
Thanks for the review and comments. @zhijiangW @pnowojski |
|
I find that the number of tcp connections is one of the factors which influence the performance. If I change the connectionIndex from gateIndex to channelIndex which means it remains unchanged compared with the behavior before this change (setup 1000 tcp connections per gate if the number of channels is 1000), the performance results are as follows. From the results, we can see that the performance goes down a bit compared with the results before this change, which should be caused by the decrease of floating buffers. Then what do you think? Should we keep the number of connections unchanged in this PR? @zhijiangW @pnowojski |
|
Thanks for the updates @wsry . As for the issue of connection index, I prefer to keeping it as previous value, then we can only measure the performance effect caused by gate/channel amount. If we want to adjust the connection amount via If the performance regression still exists after getting ride of connection factor, caused by total buffers reduction as you analyzed above, I am a bit torn whether we should increase the default floating buffer setting for these benchmarks in order to keep the performance same as before. Otherwise we might have a bad/invalid curve to trace the performance changes in history term. WDYT? @pnowojski |
|
Also attach the comparison results executed in benchmark machine:
|
zhijiangW
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @wsry and LGTM now!
I would double execute it in our benchmark machine before merging.
|
@wsry I also updated the benchmark results with the same connection amount in link. The conclusion is that all the cases for network throughput have regression except for 100, 100ms. I am not sure whether we can explain this corner case properly. But I guess it would not block merging since it should be the right way to go. |
…nchmarkEnvironment Before this change, in network benchmark (for example 1000 channels benchmark with 4 record writers) StreamNetworkBenchmarkEnvironment#createInputGate was creating 1000 input gates with 4 input channels each, which doesn't make much sense. This commit is changing that to a single receiver with 4 input gates and each with 1000 channels. It is achieved by providing testing implementations of InputChannels, which are using channel index for requesting subpartitions from ResultPartition, instead of subpartition index. Thanks to that, we can map a single ResultPartition with N subpartitions, to a single instance of InputGate with N channels. The change also influences the benchmark results, overall, the performance goes down a bit because of the decrease of floating buffers and the followings are benchmark results before and after this change: ------------------------------------------------------------------Before---------------------------------------------------------------------- Benchmark (channelsFlushTimeout) Mode Cnt Score Error Units DataSkewStreamNetworkThroughputBenchmarkExecutor.networkSkewedThroughput N/A thrpt 30 17079.534 ± 830.532 ops/ms StreamNetworkBroadcastThroughputBenchmarkExecutor.networkBroadcastThroughput N/A thrpt 30 599.664 ± 13.325 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 100,100ms thrpt 30 45629.898 ± 1623.455 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 100,100ms,SSL thrpt 30 9817.421 ± 216.075 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 1000,1ms thrpt 30 25442.152 ± 968.340 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 1000,100ms thrpt 30 27944.285 ± 518.106 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 1000,100ms,SSL thrpt 30 7820.549 ± 895.862 ops/ms StreamNetworkLatencyBenchmarkExecutor.networkLatency1to1 N/A avgt 30 13.184 ± 0.093 ms/op ------------------------------------------------------------------After----------------------------------------------------------------------- Benchmark (channelsFlushTimeout) Mode Cnt Score Error Units DataSkewStreamNetworkThroughputBenchmarkExecutor.networkSkewedThroughput N/A thrpt 30 17345.574 ± 370.647 ops/ms StreamNetworkBroadcastThroughputBenchmarkExecutor.networkBroadcastThroughput N/A thrpt 30 608.881 ± 12.054 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 100,100ms thrpt 30 41732.518 ± 1109.436 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 100,100ms,SSL thrpt 30 9689.525 ± 202.895 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 1000,1ms thrpt 30 24106.705 ± 2952.364 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 1000,100ms thrpt 30 27509.665 ± 3246.965 ops/ms StreamNetworkThroughputBenchmarkExecutor.networkThroughput 1000,100ms,SSL thrpt 30 7691.287 ± 927.775 ops/ms StreamNetworkLatencyBenchmarkExecutor.networkLatency1to1 N/A avgt 30 12.758 ± 0.147 ms/op
|
@flinkbot run travis re-run the last Travis build |
|
@flinkbot run travis |
|
The failure case in travis is unrelated to this PR, merging |
What is the purpose of the change
In network benchmark (for example 1000 channels benchmark with 4 record writers) StreamNetworkBenchmarkEnvironment#createInputGate creates 1000 input gates with 4 input channels each, which doesn't make much sense. It is expected that either having 4 receivers with single input gate with 1000 channels each, or a single receiver with 4 input gates, with 1000 channels each.
Brief change log
Verifying this change
This change is already covered by existing tests, such as StreamNetworkThroughputBenchmarkTest.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)Documentation