This repository has been archived by the owner on Aug 3, 2020. It is now read-only.

[FLINK-11986] [state backend, tests] Add micro benchmark for state operations #13

Merged
merged 2 commits into dataArtisans:master on Apr 30, 2019

Conversation

@carp84 (Contributor) commented Mar 20, 2019

We already have benchmarks for the whole backend, but none for finer-grained state operations. Here we propose to add more benchmarks, including (but not limited to):

  • ValueState
    • testPut
    • testGet
  • ListState
    • testUpdate
    • testGet
    • testAddAll
  • MapState
    • testPut
    • testGet
    • testContains
    • testKeys
    • testValues
    • testEntries
    • testIterator
    • testRemove
    • testPutAll

We will create separate benchmarks for HeapKeyedStateBackend and RocksDBKeyedStateBackend.
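To illustrate the shape these benchmarks take, here is a minimal JMH sketch for the ValueState case. The class and the BenchmarkHelpers utility are illustrative placeholders, not the actual classes added by this PR:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.runtime.state.KeyedStateBackend;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ValueStateBenchmarkSketch {

    // Mirrors the setupKeyCount constant discussed later in the review; illustrative value.
    private static final long SETUP_KEY_COUNT = 500_000L;

    private KeyedStateBackend<Long> backend;
    private ValueState<Long> valueState;

    @Setup
    public void setUp() throws Exception {
        // BenchmarkHelpers is a hypothetical utility standing in for whatever the PR
        // uses to build a HeapKeyedStateBackend/RocksDBKeyedStateBackend and to
        // register a ValueState against it.
        backend = BenchmarkHelpers.createHeapKeyedStateBackend();
        valueState = BenchmarkHelpers.createValueState(backend, "valueState");
        for (long i = 0; i < SETUP_KEY_COUNT; i++) {
            backend.setCurrentKey(i);
            valueState.update(i);
        }
    }

    @Benchmark
    public Long valueGet() throws Exception {
        // testGet: switch to a random key, then read its value.
        backend.setCurrentKey(ThreadLocalRandom.current().nextLong(SETUP_KEY_COUNT));
        return valueState.value();
    }

    @Benchmark
    public void valuePut() throws Exception {
        // testPut: switch to a random key, then overwrite its value.
        long key = ThreadLocalRandom.current().nextLong(SETUP_KEY_COUNT);
        backend.setCurrentKey(key);
        valueState.update(key);
    }

    @TearDown
    public void tearDown() throws Exception {
        // Also a hypothetical helper; cleans up the backend and any temporary files.
        BenchmarkHelpers.disposeBackend(backend);
    }
}
```

The ListState and MapState benchmarks follow the same pattern, only exercising the respective state primitives.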

@pnowojski (Contributor):

Could you paste/link an example of the full output of the benchmark run? Benchmark scores, including the output of all of the measurement/warm-up iterations, etc.? The things that I would like to know are:

  1. total time required to execute those new benchmarks.
  2. stability of the results

@StefanRRichter (Contributor) left a comment


Thanks for the work @carp84, I think this is a very good addition to the performance tests. I had a couple of comments, mostly smaller ones. I think the most important ones are about iterating the list state in get and about adding read-modify-write cycle benchmarks. I also had a question about the targeted scenarios (cache/mem/disk). Furthermore, I wonder if it would make sense to extend the tests in the future to include: timer service performance, checkpoint/savepoint performance, and operational performance with concurrently running checkpoints.

backend.setCurrentKey(random.nextLong(setupKeyCount));
valueState.value();
}
}

@StefanRRichter (Contributor):

In general, I wonder whether adding read-modify-write tests would be valuable for additional insight.
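For example, such a read-modify-write benchmark could look roughly like this. An illustrative sketch only, reusing the backend, valueState, and SETUP_KEY_COUNT fields from the ValueState sketch above, not code from this PR:

```java
@Benchmark
public void valueReadModifyWrite() throws Exception {
    // Read the current value for a random key, modify it, and write it back,
    // so one invocation covers a full get + update cycle on the backend.
    backend.setCurrentKey(ThreadLocalRandom.current().nextLong(SETUP_KEY_COUNT));
    Long current = valueState.value();
    long next = (current == null ? 0L : current) + 1L;
    valueState.update(next);
}
```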

class StateBenchmarkConstants {
    static final int mapKeyCount = 10;
    static final int listValueCount = 100;
    static final int setupKeyCount = 500_000;

@StefanRRichter (Contributor):

I have a question about the choice of values: what are we targeting in the benchmark for heap and RocksDB? For example, for heap, would we expect that a random op is answered from main memory or from L2/L3 cache because the whole dataset still fits there? A similar question for RocksDB: do we measure performance for cached blocks or when hitting disk? Should we target the different alternatives? The message can be very different: for example, what if our RocksDB benchmarks all look good but never hit the disk, and then seeks get involved and performance becomes terrible for users once they do hit the disk?

@carp84 (Contributor, Author):

We set the values large enough to trigger disk operations for RocksDB, and confirm this by checking the RocksDB log to see whether flush/compaction happened. However, whether we should also cover the all-fit-in-memory case for RocksDB is an open question. What's your opinion? Thanks.

@carp84 (Contributor, Author) commented Mar 22, 2019

Could you paste/link an example of the full output of the benchmark run? Benchmark scores, including the output of all of the measurement/warm-up iterations, etc.? The things that I would like to know are:

  1. total time required to execute those new benchmarks.
  2. stability of the results

Thanks for the review @pnowojski. Let me get the data after resolving Stefan's review comments, since the changes will possibly affect the results.

@carp84 (Contributor, Author) commented Mar 22, 2019

Furthermore, I wonder if it would make sense to extend the tests in the future to include: timer service performance, checkpoint/savepoint performance, and operational performance with concurrently running checkpoints.

We also have internal benchmarks for the timer service and checkpoint performance; let me upstream them one by one (smile). For performance with concurrently running checkpoints, I agree we need to add one.

@carp84 (Contributor, Author) commented Mar 26, 2019

Here are the total time and the stability of the results from 2 rounds:

  • JMH configuration:
JMH version: 1.19
VM version: JDK 1.8.0_102, VM 25.102-b52
VM options: -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.ssl
Warmup: 10 iterations, 1 s each
Measurement: 10 iterations, 1 s each
Timeout: 10 min per iteration
Threads: 1 thread, will synchronize iterations
Benchmark mode: Throughput, ops/time
  • Round 1 time: Run complete. Total time: 00:43:34

  • Round 2 time: Run complete. Total time: 00:43:44

  • Result stability

Benchmark Mode Cnt Score Error Units
HeapListStateBenchmark.test1Update#1 thrpt 30 2439.935 ±115.324 ops/ms
HeapListStateBenchmark.test1Update#2 thrpt 30 2415.013 ±122.071 ops/ms
HeapListStateBenchmark.test2Add#1 thrpt 30 3140.404 ±152.354 ops/ms
HeapListStateBenchmark.test2Add#2 thrpt 30 3051.865 ±244.304 ops/ms
HeapListStateBenchmark.test3Get#1 thrpt 30 1940.100 ±71.786 ops/ms
HeapListStateBenchmark.test3Get#2 thrpt 30 1869.135 ±109.670 ops/ms
HeapListStateBenchmark.test4GetAndIterate#1 thrpt 30 1857.606 ±79.844 ops/ms
HeapListStateBenchmark.test4GetAndIterate#2 thrpt 30 1788.352 ±93.340 ops/ms
HeapListStateBenchmark.test5AddAll#1 thrpt 30 294.972 ±122.322 ops/ms
HeapListStateBenchmark.test5AddAll#2 thrpt 30 285.633 ±113.834 ops/ms
HeapMapStateBenchmark.test1Add#1 thrpt 30 1991.866 ±109.012 ops/ms
HeapMapStateBenchmark.test1Add#2 thrpt 30 2033.461 ±48.315 ops/ms
HeapMapStateBenchmark.test1Update#1 thrpt 30 1409.441 ±80.476 ops/ms
HeapMapStateBenchmark.test1Update#2 thrpt 30 1527.825 ±53.739 ops/ms
HeapMapStateBenchmark.test2Get#1 thrpt 30 1396.715 ±42.510 ops/ms
HeapMapStateBenchmark.test2Get#2 thrpt 30 1413.608 ±73.645 ops/ms
HeapMapStateBenchmark.test3Contains#1 thrpt 30 1507.149 ±54.452 ops/ms
HeapMapStateBenchmark.test3Contains#2 thrpt 30 1550.425 ±76.011 ops/ms
HeapMapStateBenchmark.test4Keys#1 thrpt 30 9311.358 ±236.535 ops/ms
HeapMapStateBenchmark.test4Keys#2 thrpt 30 9593.196 ±369.914 ops/ms
HeapMapStateBenchmark.test5Values#1 thrpt 30 9100.976 ±267.255 ops/ms
HeapMapStateBenchmark.test5Values#2 thrpt 30 9288.619 ±209.923 ops/ms
HeapMapStateBenchmark.test6Entries#1 thrpt 30 8723.019 ±186.056 ops/ms
HeapMapStateBenchmark.test6Entries#2 thrpt 30 9448.083 ±273.444 ops/ms
HeapMapStateBenchmark.test7Iterator#1 thrpt 30 9522.637 ±344.523 ops/ms
HeapMapStateBenchmark.test7Iterator#2 thrpt 30 9351.307 ±210.986 ops/ms
HeapMapStateBenchmark.test8Remove#1 thrpt 30 1743.482 ±74.815 ops/ms
HeapMapStateBenchmark.test8Remove#2 thrpt 30 1903.833 ±105.975 ops/ms
HeapMapStateBenchmark.test9PutAll#1 thrpt 30 756.448 ±26.231 ops/ms
HeapMapStateBenchmark.test9PutAll#2 thrpt 30 786.871 ±38.946 ops/ms
HeapValueStateBenchmark.test1Update#1 thrpt 30 2048.441 ±101.184 ops/ms
HeapValueStateBenchmark.test1Update#2 thrpt 30 2022.074 ±178.275 ops/ms
HeapValueStateBenchmark.test2Add#1 thrpt 30 5704.857 ±442.342 ops/ms
HeapValueStateBenchmark.test2Add#2 thrpt 30 5264.967 ±938.853 ops/ms
HeapValueStateBenchmark.test3Get#1 thrpt 30 2001.197 ±78.446 ops/ms
HeapValueStateBenchmark.test3Get#2 thrpt 30 2099.319 ±96.104 ops/ms
RocksDBListStateBenchmark.test1Update#1 thrpt 30 187.203 ±11.839 ops/ms
RocksDBListStateBenchmark.test1Update#2 thrpt 30 189.827 ±11.974 ops/ms
RocksDBListStateBenchmark.test2Add#1 thrpt 30 191.165 ±10.647 ops/ms
RocksDBListStateBenchmark.test2Add#2 thrpt 30 193.191 ±9.661 ops/ms
RocksDBListStateBenchmark.test3Get#1 thrpt 30 428.343 ±40.097 ops/ms
RocksDBListStateBenchmark.test3Get#2 thrpt 30 403.014 ±47.031 ops/ms
RocksDBListStateBenchmark.test4GetAndIterate#1 thrpt 30 421.022 ±36.899 ops/ms
RocksDBListStateBenchmark.test4GetAndIterate#2 thrpt 30 433.174 ±40.616 ops/ms
RocksDBListStateBenchmark.test5AddAll#1 thrpt 30 102.472 ±55.705 ops/ms
RocksDBListStateBenchmark.test5AddAll#2 thrpt 30 106.704 ±54.815 ops/ms
RocksDBMapStateBenchmark.test1Add#1 thrpt 30 349.681 ±36.756 ops/ms
RocksDBMapStateBenchmark.test1Add#2 thrpt 30 342.682 ±40.899 ops/ms
RocksDBMapStateBenchmark.test1Update#1 thrpt 30 350.764 ±31.309 ops/ms
RocksDBMapStateBenchmark.test1Update#2 thrpt 30 354.110 ±37.487 ops/ms
RocksDBMapStateBenchmark.test2Get#1 thrpt 30 45.117 ±0.729 ops/ms
RocksDBMapStateBenchmark.test2Get#2 thrpt 30 45.715 ±0.820 ops/ms
RocksDBMapStateBenchmark.test3Contains#1 thrpt 30 45.824 ±0.620 ops/ms
RocksDBMapStateBenchmark.test3Contains#2 thrpt 30 46.671 ±0.592 ops/ms
RocksDBMapStateBenchmark.test4Keys#1 thrpt 30 323.453 ±12.886 ops/ms
RocksDBMapStateBenchmark.test4Keys#2 thrpt 30 318.254 ±10.652 ops/ms
RocksDBMapStateBenchmark.test5Values#1 thrpt 30 320.683 ±9.605 ops/ms
RocksDBMapStateBenchmark.test5Values#2 thrpt 30 316.464 ±13.635 ops/ms
RocksDBMapStateBenchmark.test6Entries#1 thrpt 30 239.602 ±11.713 ops/ms
RocksDBMapStateBenchmark.test6Entries#2 thrpt 30 240.108 ±13.358 ops/ms
RocksDBMapStateBenchmark.test7Iterator#1 thrpt 30 322.243 ±9.507 ops/ms
RocksDBMapStateBenchmark.test7Iterator#2 thrpt 30 316.570 ±11.002 ops/ms
RocksDBMapStateBenchmark.test8Remove#1 thrpt 30 356.717 ±25.874 ops/ms
RocksDBMapStateBenchmark.test8Remove#2 thrpt 30 351.623 ±29.733 ops/ms
RocksDBMapStateBenchmark.test9PutAll#1 thrpt 30 88.432 ±6.160 ops/ms
RocksDBMapStateBenchmark.test9PutAll#2 thrpt 30 87.454 ±5.779 ops/ms
RocksDBValueStateBenchmark.test1Update#1 thrpt 30 343.421 ±23.262 ops/ms
RocksDBValueStateBenchmark.test1Update#2 thrpt 30 337.099 ±23.011 ops/ms
RocksDBValueStateBenchmark.test2Add#1 thrpt 30 351.838 ±19.350 ops/ms
RocksDBValueStateBenchmark.test2Add#2 thrpt 30 330.423 ±27.198 ops/ms
RocksDBValueStateBenchmark.test3Get#1 thrpt 30 556.239 ±25.828 ops/ms

@pnowojski (Contributor):

@carp84 that's a lot of benchmarks :)

What is going on with HeapValueStateBenchmark.test2Add#2? (It had a huge spread)

I'm thinking about modifying our speed center setup so that it won't be flooded/overloaded with the number of benchmarks. I think a solution might be to start using projects. We could keep the current benchmarks in a Flink project, while adding all of the benchmarks from this PR to a State Backends project. I have played around with this by manually adding some results, and you can see the result here on the left:

Executable
  Flink // <----- project #1
    - Flink 
  State Backends // <------ project #2
    - State Backends

It looks like this would allow us to group the benchmarks & results together.

In order to make it fully work, we would need a couple more things:

  1. Modify the save_jmh_result.py script and add optional parameters for project and executable (lines 61 & 62).
  2. Research how we could differentiate between various types of benchmarks here. Currently there is a Jenkins job that runs all benchmarks defined in this repository and then uploads them using these two commands:
sh "mvn -Dflink.version=`cat ../flink-version` clean install exec:exec"
sh 'python save_jmh_result.py --environment Hetzner --branch master --commit COMMIT --codespeed URL'

We would need to research how we can modify the first command to execute either the Flink or the State Backends benchmarks. Then we would upload them using the second command (two different executions), passing the correct values for the parameters added in step 1.

Thanks to that, we might be able to have two independent Jenkins jobs: one for State Backends and another for Flink benchmarks. That could come in handy if you do not want to wait for all of the benchmarks to complete and are only interested in one of them.

Do you think it makes sense?

@carp84 (Contributor, Author) commented Apr 16, 2019

What is going on with HeapValueStateBenchmark.test2Add#2?

I didn't notice this; it's probably due to environment variance. Let me double check.

We could keep the current benchmarks in a Flink project, while adding all of the benchmarks from this PR to a State Backends project.

Agreed on using separate projects. It's only that the naming seems a little bit strange, since backends also belong to Flink (smile). And internally we also have micro benchmarks for checkpoint and timer service that we plan to upstream later (as mentioned above), so maybe it is worth some effort on categorizing.

We would need to research how we can modify the first command to execute either the Flink or the State Backends benchmarks.

Please check the 3rd commit here, which generates a shaded JMH jar. With it we can run a command like java -jar target/benchmarks.jar -rf csv org.apache.flink.state.benchmark.* to run different benchmarks (and save the results separately) against the shaded jar, which is how I generated the above results.
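For reference, the same selection can also be expressed programmatically through the JMH Runner API. A sketch is below; it is not part of this PR, and the shell command above is what we actually use:

```java
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class StateBenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                // Only run the state benchmarks, equivalent to passing the
                // org.apache.flink.state.benchmark.* pattern on the command line.
                .include("org.apache.flink.state.benchmark.*")
                // Write results as CSV, equivalent to the -rf csv flag.
                .resultFormat(ResultFormatType.CSV)
                .build();
        new Runner(options).run();
    }
}
```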

@pnowojski (Contributor):

It's only that the naming seems a little bit strange, since backends also belong to Flink (smile).

Agree :( However, we are re-using the Codespeed tool from the PyPy project here, which is not widely adopted and seems tailored to just their single use case, so unless we want to develop a UI from scratch or modify Codespeed, we have to dance around those kinds of issues 😒 I'm open to other suggestions.

java -jar target/benchmarks.jar -rf csv org.apache.flink.state.benchmark.*

This looks OK as long as we are able to integrate this command with the Jenkins job running the benchmarks.

@carp84 (Contributor, Author) commented Apr 29, 2019

I have set up a demo in our speed center, and it successfully reflects the effect of a recent improvement:
[screenshots: speed center charts showing the improvement]

@carp84 (Contributor, Author) commented Apr 29, 2019

@StefanRRichter @pnowojski Mind taking a look at the latest commit and letting me know if you have any comments? Thanks.

And if the current code looks good, I plan to remove the numbering in method names (e.g. from test1ListUpdate to testListUpdate) and clean up all demo data on our Codespeed center.

@pnowojski (Contributor) left a comment

Thanks @carp84 for the update and the integration with Jenkins :) It looks good. A couple of comments from my side.

I would actually also drop the test prefix from the test names (rename test1ListUpdate to just listUpdate). I know that this doesn't follow the usual (and our own) Java coding style convention, but in the UI this test prefix doesn't help with anything and just takes up more space.

@pnowojski (Contributor) left a comment

I think the change LGTM. Unfortunately we do not have any CI hooked up here, so I assume everything compiles and works well with Jenkins? :)

@StefanRRichter do you have some more comments? (you have pending "changes requested").

@StefanRRichter (Contributor):

@pnowojski No, my requests have been addressed.

@carp84 (Contributor, Author) commented Apr 30, 2019

so I assume everything compiles and works well with Jenkins?

Yes, please refer to this Jenkins job, which is a dry run. :-)

And thanks all for the review! @pnowojski @StefanRRichter

@pnowojski (Contributor):

Ok :)

One last comment. Can you @carp84 squash the commits together, except for Add support to generate shaded benchmark package to allow run specifi..., and can you update the README.md to document the new way to run only selected benchmarks? (Basically the java -jar target/benchmarks.jar -rf csv org.apache.flink.state.benchmark.* command.)

(Commit: Add support to generate shaded benchmark package to allow run specific case in command line)

With this change we can run the command below in a shell instead of updating
the pom file manually:
java -jar target/benchmarks.jar -rf csv org.apache.flink.state.benchmark.*

@carp84 (Contributor, Author) commented Apr 30, 2019

Updated. Please check and let me know whether it looks good, thanks. @pnowojski

@pnowojski (Contributor) left a comment

LGTM, merging. Thanks for the big contribution @carp84 !

pnowojski merged commit 8fad36e into dataArtisans:master on Apr 30, 2019
Myasuka pushed a commit to Myasuka/flink-benchmarks that referenced this pull request Jul 15, 2021