This repository has been archived by the owner on Aug 3, 2020. It is now read-only.
[FLINK-15171] fix issue with netty shuffle buffer allocation skewing benchmark results #43
This is a follow-up to the upstream FLINK-15171 PR.

Currently most of the benchmarks use a single `FlinkEnvironmentContext` for running test jobs. While running the `SerializationFrameworkMiniBenchmarks` suite, I found that quite a lot of time is still spent on cluster initialization inside the benchmarking code itself. The following image shows `SerializationFrameworkMiniBenchmarks.serializerTuple` running under async-profiler. The giant hill on the left side is actually nothing more than netty shuffle buffer allocation (1gb by default), and it takes quite a lot of time.
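To put that allocation cost in perspective, here is a back-of-the-envelope sketch. It assumes Flink's default 32kb network buffer segment size (`taskmanager.memory.segment-size`), which is not stated in this PR: a 1gb pool is carved into tens of thousands of buffers up front, while an 8m pool needs only a few hundred.

```java
public class ShuffleBufferMath {
    public static void main(String[] args) {
        // Flink's default network buffer segment size is 32kb.
        final long segmentSize = 32L * 1024;
        // Default netty shuffle buffer pool allocated at cluster start: 1gb.
        final long defaultPool = 1024L * 1024 * 1024;
        // Pool size proposed in this PR: 8m.
        final long proposedPool = 8L * 1024 * 1024;

        // Number of buffers eagerly allocated in each case.
        System.out.println("default:  " + defaultPool / segmentSize + " buffers");  // 32768
        System.out.println("proposed: " + proposedPool / segmentSize + " buffers"); // 256
    }
}
```

Allocating 128x fewer buffers up front is what removes the initialization hill from the flame graph.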
I propose lowering the shuffle buffer size in the `FlinkEnvironmentContext` from the default 1gb to something more reasonable like 8m, which eliminates this skew in the benchmark results. After this PR, the flame graph for `SerializationFrameworkMiniBenchmarks.serializerTuple` looks much more representative, and the cluster init that previously took 10% of the time is gone.

Another option would be to rework the `FlinkEnvironmentContext` so that it directly invokes the `LocalExecutor`, starting the `MiniCluster` once for the whole microbenchmark suite. However, that looks like it may require changes in the upstream `LocalExecutor` code.