
test(broker): add state controller performance tests #13884

Merged: 4 commits merged into main on Aug 16, 2023

Conversation

npepinpe (Member) commented Aug 14, 2023

Description

This PR adds a new performance test to track the StateController#recover process, which includes copying a snapshot and opening the underlying RocksDB state. Specifically, it measures the case where the state is considered "large", up to 6 GB.

Note
The way to compute the snapshot size while we're generating it is far from accurate: since RocksDB is working in the background (compaction, flushing, etc.), the size is constantly fluctuating. This typically means the final state size is less than the desired target size. We could add some factor as a heuristic, but it's unclear to me whether that's very useful.

For now, we log the desired and actual snapshot size. I'm a bit worried, however, that this will significantly skew results, making it hard to assign a correct reference score and maximum deviation.

To simplify adding this test and future ones, a new JMHTestCase utility is added, which lets you easily write a JUnit test that runs a benchmark and asserts its result. Additionally, a new JMHTest annotation is added, coupled with a JUnit 5 extension that injects a pre-configured JMHTestCase into a test method as a parameter. The annotation ensures all JMH tests share an appropriate tag, so they are excluded from the main CI pipeline but included in the performance test job.
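
As a rough illustration, a test written with these utilities might look like the sketch below. This is a hypothetical sketch, not the PR's actual code: the JMHTestCase method names and the benchmark name are assumptions.

    final class StateRecoveryBenchmarkTest {

      // The JUnit 5 extension behind @JMHTest injects a pre-configured JMHTestCase;
      // the annotation also tags the test so CI routes it to the performance test job.
      @JMHTest("measureRecovery") // hypothetical benchmark method name
      void shouldRecoverWithinExpectedDeviation(final JMHTestCase testCase) {
        // reference score and maximum deviation are illustrative values
        final double referenceScore = 0.5; // ops/s
        final double maxDeviation = 0.2; // i.e. 20%

        // run the benchmark and assert on its resulting score
        testCase.run().isWithinDeviationOf(referenceScore, maxDeviation);
      }
    }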

Note
All of this can be expanded later as we need more stuff.

There are some notable downsides to this approach of running JMH tests, however:

  1. Since JMH forks, it's not easy to debug the tests locally. To do so, you have to disable forking (see the sketch after this list). Not too difficult, but not obvious when you're not familiar with it.
  2. The reference score typically differs between local runs and CI, and we already see that it sometimes fluctuates randomly on CI. It would be nice to find a better approach.
  3. When a benchmark fails with an error, the actual assertion error is just "no benchmarks were run". This is pretty bad; luckily, the benchmark logs are present in the test output, but it means you have to wait for the complete workflow to finish before you can grab them.
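
For downside 1, plain JMH already supports this: run with zero forks so the benchmark stays in the current JVM and a debugger can attach. A minimal sketch (the benchmark class name is hypothetical; the JMH API calls are standard):

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public final class DebugBenchmarkRunner {
      public static void main(final String[] args) throws RunnerException {
        final var options =
            new OptionsBuilder()
                .include("StateRecoveryBenchmark") // hypothetical benchmark class
                .forks(0) // run in the current JVM so IDE breakpoints are hit
                .build();
        new Runner(options).run();
      }
    }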

Related issues

related #13775

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Other teams:
If the change impacts another team an issue has been created for this team, explaining what they need to do to support this change.

Please refer to our review guidelines.

@npepinpe npepinpe requested a review from Zelldon August 14, 2023 14:43
npepinpe (Member Author) commented Aug 14, 2023

@Zelldon - for now just check that the JMH test makes sense.

It creates a ZeebeDb with all 57 column families and adds 1,000 (or 10,000 in CI) keys to each, with random keys and values (default size of 4 KB per key and per value, so 8 KB per entry), roughly as sketched below. In CI this is about 4 GB of state; locally it's a few hundred MB.
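
The generation is essentially nested loops over column families and random entries; a simplified sketch (the put API shown is a stand-in for ZeebeDb's actual typed interface):

    final var random = new Random();
    final var key = new byte[4 * 1024]; // 4 KB
    final var value = new byte[4 * 1024]; // 4 KB

    // 57 column families x 10,000 entries x 8 KB per entry ~= 4.3 GB of raw data in CI
    for (final var columnFamily : allColumnFamilies) { // 57 in total
      for (int i = 0; i < 10_000; i++) {
        random.nextBytes(key);
        random.nextBytes(value);
        columnFamily.put(key, value); // simplified stand-in for the real typed API
      }
    }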

I think for this specific test we don't really care about the actual data, so this should be fine, though of course RocksDB internals could change things slightly. I don't expect a significant difference, though.

This doesn't include any actual fix yet, by the way 🙃

@npepinpe npepinpe linked an issue Aug 14, 2023 that may be closed by this pull request
npepinpe (Member Author)

Early tests with a dumb, hacky fix (see below) show a major improvement. With the current approach, you can recover a 4 GB state about once every two seconds (i.e., 0.5 times per second). With the hack, you can do it about 6 times per second.

Will be interesting to push further with bigger sizes, but locally I didn't want to test with 20-30 GB 😄

Here's the hack:

    final var snapshotFile = snapshot.getPath().toFile();
    try {
      // open the snapshot read-only and create a RocksDB checkpoint in the
      // runtime directory instead of copying the files one by one
      final var db = RocksDB.openReadOnly(snapshotFile.getAbsolutePath());
      Checkpoint.create(db).createCheckpoint(runtimeDirectory.toString());
      openDb(future);
    } catch (final RocksDBException e) {
      future.completeExceptionally(
          new RuntimeException(
              String.format("Failed to recover from snapshot %s", snapshot.getId()), e));
    }

That replaces the snapshotStore.copy call and callback. The speedup comes from RocksDB checkpoints hard-linking the SST files when source and target are on the same filesystem, rather than physically copying them.

Zelldon (Member) left a comment:

Great! Thanks @npepinpe, it is cool that we are adding more of such tests 👍🏼 I just had some smaller comments and thoughts.

npepinpe (Member Author)

With 18bcf57, I added some utilities to easily create JUnit tests which run JMH benchmarks.

You can see examples in c4c061f.

While I moved the test method to the same class as the benchmark, it can still be moved out, and there are certainly some arguments for keeping them separate. I'm not strongly for or against; let me know what you think. We could still get rid of the JUnit extension and use only the utilities, but IMO the extension is nice to ensure all the tests share the same tag and as such are included in the appropriate job (while remaining runnable locally).

npepinpe (Member Author) commented Aug 15, 2023

@Zelldon - I think I addressed everything. Let me know if there's anything else. I'll get started on the actual fix now, though to be honest we could already merge this PR without the fix, to avoid the fix PR becoming too big.

Zelldon (Member) commented Aug 15, 2023

Will take a look tomorrow

npepinpe (Member Author)

Note: switching from environment variables to tags for enabling groups of tests (the usual JUnit 5 way) has the downside that running all tests for a module in your IDE will also run the performance tests. Before, I believe it would skip the class. (See the sketch below for how the tag mechanism works.)
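
For reference, a tag-carrying meta-annotation in JUnit 5 could look roughly like the sketch below; @Test, @Tag, and @ExtendWith are standard JUnit 5 API, while the extension class and the annotation body are assumptions:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;
    import org.junit.jupiter.api.Tag;
    import org.junit.jupiter.api.Test;
    import org.junit.jupiter.api.extension.ExtendWith;

    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    @Test
    @Tag("performance") // the build can include/exclude this group per job
    @ExtendWith(JMHTestExtension.class) // hypothetical extension injecting JMHTestCase
    public @interface JMHTest {
      String value(); // hypothetical: name of the benchmark method to run
    }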

npepinpe (Member Author)

Another downside right now: the performance test job is quite slow. Since the broker depends on the engine, the tests always run sequentially. The engine test takes 6 minutes, and now the broker one takes 3 minutes. That's 9 minutes alone, so it's often the last job to finish 😅

@npepinpe npepinpe marked this pull request as ready for review August 15, 2023 19:49
npepinpe (Member Author)

One way to get a more accurate snapshot size is to manually pause and resume background work in RocksDB, as sketched below. However, that somewhat breaks our ZeebeDb abstraction (maybe?)
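
The RocksDB Java API exposes this directly via pauseBackgroundWork/continueBackgroundWork; the sketch below assumes direct access to the underlying org.rocksdb.RocksDB handle, which is exactly what would leak through the ZeebeDb abstraction:

    // Sketch: measure a stable on-disk size by pausing RocksDB background work.
    private long measureStableSize(final RocksDB db, final Path dbDirectory)
        throws RocksDBException {
      db.pauseBackgroundWork(); // no compactions or flushes while paused
      try {
        return computeSnapshotSize(dbDirectory); // the file-walking helper from this PR
      } finally {
        db.continueBackgroundWork();
      }
    }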

npepinpe (Member Author)

Yeah, the way I set up building the state to a certain size is too inaccurate, leading to too much variation. 🤔

npepinpe (Member Author) commented Aug 16, 2023

I see a few options:

  1. Allow pausing/resuming background work for the DB. This ensures we compute proper sizes without having to worry about compaction/flushing. I'm not too keen on this because it means modifying the production API to suit tests =/
  2. Close the DB between batch inserts, which gives us an idea of the "final" size. This is a bit better, but increases the setup time. It's my preferred approach: it's clean and simple, and it's really only inefficient once, when the state is being set up.
  3. Instead of asserting the deviation, assert that the score is above a minimum target. This risks missing future improvements, but since what we're testing is fairly small in scope, it might be fine.

EDIT: I tested closing/opening (sketched below), and got it to be pretty accurate. It takes about 30 seconds to build a 4 GB state, which I think is fine.
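
A minimal sketch of what was tested, assuming hypothetical open/writeBatch helpers; closing the DB flushes the memtables and lets background work settle, so the measured directory size is stable:

    var db = open(dbDirectory); // hypothetical helper opening the ZeebeDb
    long currentSize = 0;
    while (currentSize < targetSize) {
      writeBatch(db); // hypothetical: insert one batch of random entries
      db.close(); // flush to disk so the on-disk size stops fluctuating
      currentSize = computeSnapshotSize(dbDirectory); // sum of file sizes on disk
      db = open(dbDirectory);
    }
    db.close(); // final on-disk state is now approximately targetSize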

npepinpe (Member Author)

So it keeps fluctuating between 0.5 and 0.65 ops/s. That's a pretty big deviation, and this time the state size is stable, so I don't know where the deviation is coming from. That said, while it's relatively big, in absolute terms it means a difference of 300ms, which isn't crazy high.

Zelldon (Member) left a comment:

Great! Thanks @npepinpe, I had some smaller comments, but I think we are in a pretty good state.

Is the test still flaky?

      data.wrapBuffer(new UnsafeBuffer(buffer));

      return data;
    }

    private static long computeSnapshotSize(final Path root) {
      try (final var files = Files.walk(root)) {
        // sum the sizes of all regular files under the snapshot directory
        return files.filter(Files::isRegularFile).mapToLong(file -> file.toFile().length()).sum();
      } catch (final IOException e) {
        throw new UncheckedIOException(e);
      }
    }
Zelldon (Member) commented:

💭 I guess taking checkpoints would also cause a flush and make the size easier to estimate. Or isn't there also a statistics property for this? 🤔

npepinpe (Member Author) replied:

The property is (according to their wiki) an estimate of the live SST data. I have no idea how accurate it is; that's why I went with checking the file size 🤷 Since it includes L0, I expect it would fluctuate just as much if we don't stop background work somehow.
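
For reference, querying that property via the RocksDB Java API looks roughly like this; getProperty and the rocksdb.estimate-live-data-size property are real RocksDB API, while the path is a placeholder:

    try (final var db = RocksDB.openReadOnly("/path/to/snapshot")) {
      // returns the estimate as a decimal string, in bytes
      final var estimatedBytes = db.getProperty("rocksdb.estimate-live-data-size");
      System.out.println("Estimated live SST data: " + estimatedBytes + " bytes");
    } catch (final RocksDBException e) {
      throw new RuntimeException("Failed to read size estimate", e);
    }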

npepinpe (Member Author) replied:

Good point about taking a checkpoint, but here we're not taking a checkpoint, so it shouldn't have any impact.

.github/workflows/ci.yml (review thread resolved)
npepinpe (Member Author)

With a 20% max deviation it's not flaky, as results fall within these bounds. Though considering the engine test is sometimes flaky, I'm also not 100% confident.

As I suggested above, we may want a different approach than measuring deviation from a reference score. For example, we could only assert that we meet a specific target (e.g. the score is greater than some fixed lower bound, as sketched below). The downside is that thresholds differ between environments, so a test may pass locally but fail in CI. I guess that's still the case with the max deviation, though 😅
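
A sketch of that assertion style, reusing the hypothetical JMHTestCase API sketched earlier (the bound is illustrative):

    @JMHTest("measureRecovery")
    void shouldRecoverFasterThanLowerBound(final JMHTestCase testCase) {
      // assert an absolute floor instead of a deviation from a reference score;
      // 0.4 ops/s is illustrative and would differ per environment
      testCase.run().isAtLeast(0.4);
    }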

npepinpe (Member Author)

The only remaining thing, IMO, is that the job is now the slowest of all 😄 And the more tests we add, the slower it's going to get. We should eventually look into speeding it up, since each test takes minutes to execute and they likely run sequentially...

npepinpe (Member Author)

bors merge

zeebe-bors-camunda (Contributor)

Build succeeded:

zeebe-bors-camunda bot merged commit 568f246 into main on Aug 16, 2023; 32 checks passed. The np-13775-state-recover-perf branch was deleted on August 16, 2023 at 13:29.
Successfully merging this pull request may close these issues:

Improve ZeebePartition recovery time with large state