[FLINK-11986] [state backend, tests] Add micro benchmark for state operations #13
Conversation
Could you paste/link the example full output of the benchmark run? Benchmark scores, including the output of all of the measurement/warm up iterations etc? The things that I would like to know are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work @carp84, I think this is a very good addition to the performance tests. I had a couple of comments, mostly smaller. I think the most imporant ones are about iterating the list state in get and thinking about adding read-modify-write cycle benchmarks. I also had a question about the targeted scenarios (cache/mem/disk). Furthermore, I wonder if it would make sense to extend the tests in the future to include: timer service performances, check/savepointpoint performances, operational performances with concurrently running checkpoints.
src/main/java/org/apache/flink/state/benchmark/HeapListStateBenchmark.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/flink/state/benchmark/ValueStateBenchmarkBase.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/flink/state/benchmark/ListStateBenchmarkBase.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/flink/state/benchmark/ListStateBenchmarkBase.java
Outdated
Show resolved
Hide resolved
backend.setCurrentKey(random.nextLong(setupKeyCount)); | ||
valueState.value(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I wonder if adding read-modify-write tests would not be valueable for additional insight.
src/main/java/org/apache/flink/state/benchmark/MapStateBenchmarkBase.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/flink/state/benchmark/RocksDBValueStateBenchmark.java
Outdated
Show resolved
Hide resolved
class StateBenchmarkConstants { | ||
static final int mapKeyCount = 10; | ||
static final int listValueCount = 100; | ||
static final int setupKeyCount = 500_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about the choice of values: what are we targeting in the benchmark for heap and rock? For example, for heap are would we expect that random op will be answered from memory or from L2/L3 because the whole dataset can still fit there? Similar question for Rocks, do we measure performance for cached blocks or when hitting disk? Should we target the different alternatives, because the message can be very different - for example what if or Rocks benchmarks look all good but never hit the disk and then there are seeks involved to become terrible for users once they hit the disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We set the values large enough to trigger disk operations for rocksdb, and confirm this by checking rocksdb log to see whether there's flush/compaction happened. However, whether we should check the all-fit-in-memory case for rocksdb is a question. What's your opinion? Thanks.
Thanks for review @pnowojski. Let me get the data after resolving Stefan's review comments since possibly the change will affect the result. |
We also have internal benchmarks for timer service and checkpoint performance, and let me do upstreaming one by one (smile). For performance with concurrently running checkpoints I agree we need to add one. |
Here comes the total time and stability of the results from 2 rounds:
|
@carp84 that's a lot of benchmarks :) What is going on with I'm thinking about modifying our speed center setup, so that it won't be flooded/overloaded with number of benchmarks. I think a solution might be to start using
It looks like this would allow us to group the benchmark & results together. In order to make it fully work, we would need couple of more things:
we would need to research how can we modify the first command to execute either Thanks to that we might be able to have two independent jenkins jobs - one for Do you think it makes sense? |
Didn't notice this and probably due to environment variance. Let me double check.
Agreed to use separate projects. Only that the naming seems a little bit strange since backends also belong to Flink (smile). And internally we also have micro benchmarks for checkpoint and timer-service and plan to upstream later (as mentioned above), so maybe it worth some efforts on categorizing.
Please check the 3rd commit here, which generate a shaded jmh jar and we could use command like |
Agree :( However we are re-using here a code speed tool from PyPy project, that is not widely adopted and seems tailor suited just for their single use case, so unless we want to develop UI from scratch or modify code speed, we have to dance around those kind of issues 😒 I'm open to other suggestions.
This looks ok as long as we will be able to integrate this command with jenkins job running the benchmarks. |
Have setup a demo in our speed center and it successfully reflected the effect of a recent improvement: |
@StefanRRichter @pnowojski Mind take a look at the latest commit and let me know if any comments? Thanks. And if current codes look good, I plan to remove the numbering in method names (like from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @carp84 for the update and integration with Jenkins :) It looks good. Couple of comments from my side.
Also I would actually also drop test
prefix from the test names (rename test1ListUpdate
to just listUpdate
). I know that this doesn't follow the usual (and ours) Java coding style convention, but in the UI this test
prefix doesn't help with anything and just takes more place.
src/main/java/org/apache/flink/state/benchmark/ListStateBenchmarkBase.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/flink/state/benchmark/ListStateBenchmarkBase.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the change LGTM. Unfortunately we do not have any CI hooked in here, so I assume everything compiles and works well with Jenkins? :)
@StefanRRichter do you have some more comments? (you have pending "changes requested").
@pnowojski No, my request have been addressed. |
Yes, please refer to this Jenkins job which is a dry run. :-) And thanks all for review! @pnowojski @StefanRRichter |
Ok :) One last comment. Can you @carp84 squash the commits together, except of |
…c case in command line With this change we could run below command in shell instead of updating the pom file manually: java -jar target/benchmarks.jar -rf csv org.apache.flink.state.benchmark.*
Updated, please check and let me know whether it looks good, thanks. @pnowojski |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, merging. Thanks for the big contribution @carp84 !
Currently we already have benchmarks for the whole backend, but none for finer grained state operations, and here we propose to add more benchmarks, including (but not limited to):
And we will create benchmark for
HeapKeyedStateBackend
andRocksDBKeyedStateBackend
separately.