Allow configuring runtime directory #11772

deepthidevaki · 2023-02-21T13:33:59Z

Description

Add new configuration for specifying runtime directory.

zeebe:
   broker:
      data:
         directory: data
         runtime: runtime

The default value of zeebe.broker.data.runtime is null. When it is null, the runtime is stored in the same location as before, inside the data directory. When any other value is configured, that directory is used with the following structure:

runtime/
├─ 1/
|  ├─ 001.sst/
|  ├─ 002.sst/
├─ 2/
|  ├─ 001.sst/
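
The fallback described above could be sketched roughly as follows. This is only an illustration, not the code from this PR: the class name, the method, and the assumption that the default location is a `runtime` folder inside each partition's data folder are hypothetical, and the numbered folders in the tree are assumed to be partition ids.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: these names are hypothetical and not taken from the Zeebe code base. */
final class RuntimeDirectorySketch {

  /**
   * Returns the directory that would hold the runtime of one partition.
   *
   * @param partitionDataDirectory the partition's folder inside the data directory (assumed layout)
   * @param runtimeDirectory the configured zeebe.broker.data.runtime value, or null if unset
   * @param partitionId the id of the partition (assumed to name the subfolder)
   */
  static Path resolve(Path partitionDataDirectory, Path runtimeDirectory, int partitionId) {
    if (runtimeDirectory == null) {
      // Default: keep the runtime inside the data directory, as before.
      return partitionDataDirectory.resolve("runtime");
    }
    // Configured: runtime/<partitionId>/, matching the tree above.
    return runtimeDirectory.resolve(String.valueOf(partitionId));
  }
}
```

With `runtime: runtime` configured, partition 2 would resolve to `runtime/2` in this sketch; with no value configured, the path stays under the data directory.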

Related issues

closes #6044

This allows users to configure a separate runtime directory. To keep the default behavior, if no runtime directory is configured, it will use the same old location in the data directory.

github-actions · 2023-02-21T13:49:36Z

Test Results

996 files (+1), 996 suites (+1), duration 1h 50m 1s ⏱️ (+7m 40s)
8 186 tests (+372): 8 177 ✔️ (+372), 9 💤 (±0), 0 ❌ (±0)
8 382 runs (+372): 8 373 ✔️ (+372), 9 💤 (±0), 0 ❌ (±0)

Results for commit 575e302. ± Comparison against base commit fce9b13.

This pull request removes 432 and adds 804 tests. Note that renamed tests count towards both. The first list below shows removed tests, the second shows added tests.

DmnEvaluationTest If successfully evaluated, the output ‑ Should return a message pack output[6] value={z=[1, 2, 3], y=true, x=1}
io.camunda.zeebe.engine.processing.bpmn.activity.OutputMappingTest ‑ shouldApplyOutputMapping[0: io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@42675041]
io.camunda.zeebe.engine.processing.bpmn.activity.OutputMappingTest ‑ shouldApplyOutputMapping[1: io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@81cba04]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=BUSINESS_RULE_TASK, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@6ddc9001, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=CALL_ACTIVITY, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@67201002, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=END_EVENT, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@6a7b4ff5, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=EVENT_BASED_GATEWAY, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@259ae1a9, variables={correlationKey=value}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=EVENT_SUB_PROCESS, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@5ac35b17, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=EXCLUSIVE_GATEWAY, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@4ed492df, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=INTERMEDIATE_CATCH_EVENT, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@6eaf6ced, variables={correlationKey=value}]]
…

DmnEvaluationTest If successfully evaluated, the output ‑ Should return a message pack output[6] value={x=1, z=[1, 2, 3], y=true}
io.camunda.zeebe.broker.system.partitions.BrokerDifferentRuntimeDirectoryTest ‑ shouldUseConfiguredRuntimeDirectory
io.camunda.zeebe.engine.processing.bpmn.activity.OutputMappingTest ‑ shouldApplyOutputMapping[0: io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@5817b0d1]
io.camunda.zeebe.engine.processing.bpmn.activity.OutputMappingTest ‑ shouldApplyOutputMapping[1: io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@6ea1fc9]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=BUSINESS_RULE_TASK, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@613bebb9, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=CALL_ACTIVITY, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@44bbf5e3, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=END_EVENT, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@41843b88, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=EVENT_BASED_GATEWAY, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@3b538e2a, variables={correlationKey=value}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=EVENT_SUB_PROCESS, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@72f4232, variables={}]]
io.camunda.zeebe.engine.processing.processinstance.CreateProcessInstanceSupportedElementTest ‑ testProcessInstanceCanStartAtElementType[Scenario[type=EXCLUSIVE_GATEWAY, modelInstance=io.camunda.zeebe.model.bpmn.impl.BpmnModelInstanceImpl@38d1c7d2, variables={}]]
…

♻️ This comment has been updated with latest results.

deepthidevaki · 2023-02-22T08:11:59Z

...test/java/io/camunda/zeebe/broker/system/partitions/BrokerDifferentRuntimeDirectoryTest.java

+import org.junit.Rule;
+import org.junit.Test;
+
+public class BrokerDifferentRuntimeDirectoryTest {


❓ It seems overkill to start the whole broker to test this feature. But I'm not sure how else to test it. Unit testing PartitionFactory is not enough because we don't know if anywhere else runtime location is assumed to be in data directory. If you have other ideas to test please let me know.

npepinpe

👍 Thanks! Feature seems good, I tested it out and it works as expected. Still have to test it with an actual different volume - did you try a manual benchmark with a separate volume mount?

❌ Could you create a documentation issue around this? It should cover the expected use case (e.g. with a separate in-memory volume), a sample for Kubernetes (should we do general or target the Helm chart?), and the fact that after any configuration change, if the previous directory was on a shared volume (e.g. the data one), the old runtime needs to be cleaned up.

npepinpe · 2023-02-24T17:29:39Z

dist/src/main/config/broker.standalone.yaml.template

@@ -191,20 +191,28 @@
      # This section allows to configure Zeebe's data storage. Data is stored in
      # "partition folders". A partition folder has the following structure:
      #
-      # partition-0                       (root partition folder)
-      # ├── partition.json                (metadata about the partition)


💭 Oh wow, this was old 😄

npepinpe · 2023-02-24T17:31:26Z

dist/src/main/config/broker.standalone.yaml.template

+      #    	 └── runtime
+      #          └── yy.sst


Suggested change

# └── runtime
# └── yy.sst

I guess same below. Super minor 🙃

npepinpe · 2023-02-24T19:03:40Z

...test/java/io/camunda/zeebe/broker/system/partitions/BrokerDifferentRuntimeDirectoryTest.java

+import org.junit.Rule;
+import org.junit.Test;
+
+public class BrokerDifferentRuntimeDirectoryTest {


💭

> Unit testing PartitionFactory is not enough because we don't know if anywhere else runtime location is assumed to be in data directory

We can't predict how configuration is used everywhere, or if anyone has hard-coded paths, that's true. We hard-code that path in our test utilities, for example. At the same time, I don't know if this is what should prevent us from writing a unit test. The PartitionFactory is the only place where we create the StateController instance that is used to load the state.

I guess we can't easily enforce that - that only there, we create the StateController, and the location of the state is only accessed through that. If we could, would you feel more comfortable unit testing the PartitionFactory?

Couldn't we say the same about the general data directory? I guess you could argue our QA tests ensure that, except since we use default paths, any hard-coded paths would go unnoticed 😅

I think it would be OK to unit test the PartitionFactory to assert only that the state path is what's expected.

I get what you mean though. We could still have a QA which tests the user feature - that is, using a tmpfs volume for the runtime. That's the main usage goal, right? Although verifying that it's used appropriately is still breaking our abstraction.

The test below is also not testing everything anyway. It only checks the directory is not empty, but doesn't really assert that things are used properly, by the right partitions, etc. We can do a quick sync when you're back to make a final decision 👍

npepinpe · 2023-02-24T19:08:05Z

...test/java/io/camunda/zeebe/broker/system/partitions/BrokerDifferentRuntimeDirectoryTest.java

+import org.junit.Test;
+
+public class BrokerDifferentRuntimeDirectoryTest {
+  private static final String STATE = "state";


❌ Please make this temporary. If I run this locally it will create a state directory in my working directory, i.e. broker/state 😅 So even if the feature were broken, the test would still pass as long as I had run it at some point before. Fine for CI I guess, but it will leave garbage on local machines.

With 98db761, the runtime directory is resolved from the broker base, so there is no need to create another temp directory in the test.
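
To illustrate the concern about leftover directories, here is a minimal JDK-only sketch of a check that runs entirely inside a temporary broker base and removes it afterwards. It is not the test from this PR: the class and method names are hypothetical, and the two `Files.create...` calls merely stand in for a broker writing its runtime files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Illustrative only: hypothetical helper, not the test added in this PR. */
final class TemporaryRuntimeDirectorySketch {

  /** True if dir exists, is a directory, and contains at least one entry. */
  static boolean isNonEmptyDirectory(Path dir) {
    if (!Files.isDirectory(dir)) {
      return false;
    }
    try (Stream<Path> entries = Files.list(dir)) {
      return entries.findAny().isPresent();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Deletes root and everything below it, children before parents. */
  static void deleteRecursively(Path root) {
    try (Stream<Path> paths = Files.walk(root)) {
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Runs the check inside a fresh temporary broker base and always cleans up. */
  static boolean runtimeIsPopulated() {
    Path brokerBase = null;
    try {
      brokerBase = Files.createTempDirectory("broker-base");
      Path partitionRuntime = brokerBase.resolve("runtime").resolve("1");
      // Stand-in for the broker writing runtime files for partition 1.
      Files.createDirectories(partitionRuntime);
      Files.createFile(partitionRuntime.resolve("001.sst"));
      return isNonEmptyDirectory(brokerBase.resolve("runtime"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      if (brokerBase != null) {
        deleteRecursively(brokerBase);
      }
    }
  }
}
```

Because every run starts from a fresh directory, a stale `state` folder from an earlier run cannot make the check pass by accident.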

npepinpe · 2023-02-24T22:03:47Z

I've approved anyway, I don't see a big blocker, but let's sync when you're back before merging.

deepthidevaki · 2023-02-28T09:35:23Z

I ran a benchmark with the runtime directory mounted on an emptyDir volume.

This tests the impact of a growing snapshot size. In the last 10 minutes, only one partition is taking snapshots, and there is almost zero load on the system: a timer start event creates an instance every 1s on partition 1.

deepthidevaki · 2023-02-28T09:38:19Z

broker/src/main/java/io/camunda/zeebe/broker/system/configuration/DataCfg.java

@@ -42,6 +42,9 @@ public final class DataCfg implements ConfigurationEntry {
  @Override
  public void init(final BrokerCfg globalConfig, final String brokerBase) {
    directory = ConfigurationUtil.toAbsolutePath(directory, brokerBase);
+    if (runtimeDirectory != null) {
+      runtimeDirectory = ConfigurationUtil.toAbsolutePath(runtimeDirectory, brokerBase);
+    }


@npepinpe This was added after your review. I think it makes sense to add this to be consistent with the data directory configuration.

Good catch!
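
The null-guarded initialization in the diff above can be pictured with the following sketch. The `toAbsolutePath` helper here is a hypothetical stand-in written for illustration; the real `ConfigurationUtil.toAbsolutePath` is not shown in this thread and may behave differently.

```java
import java.nio.file.Paths;

/** Illustrative only: a stand-in for the path handling discussed above. */
final class RuntimePathSketch {

  /** Hypothetical stand-in: resolves path against base unless path is already absolute. */
  static String toAbsolutePath(String path, String base) {
    // Path.resolve returns the argument unchanged when it is already absolute.
    return Paths.get(base).resolve(path).toAbsolutePath().normalize().toString();
  }

  /** Mirrors the guard in the diff: an unset (null) runtime directory stays null. */
  static String initRuntimeDirectory(String runtimeDirectory, String brokerBase) {
    if (runtimeDirectory == null) {
      return null;
    }
    return toAbsolutePath(runtimeDirectory, brokerBase);
  }
}
```

Keeping null as null matters here, since null is what signals that the runtime should stay in its old location inside the data directory.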

deepthidevaki · 2023-02-28T09:47:54Z

I forgot to include write I/O. As we can see, snapshotting can consume more read and write I/O, because it now has to copy files to a different disk.

npepinpe · 2023-02-28T09:49:35Z

Would be interesting to see its impact with normal loads. If it pushes write I/O spikes, then I expect it will impact the general system under load. Nothing too unexpected, but good to confirm. Do we have some idea for a threshold at which the impact becomes noticeable? Though I expect these things will always be relative 🤷

deepthidevaki · 2023-02-28T10:00:40Z

> Do we have some idea for a threshold at which the impact becomes noticeable?

Not easy to determine. The run hits the throughput drop caused by the growing RocksDB state.

deepthidevaki · 2023-02-28T10:04:20Z

Also, in the normal benchmark workload with no growing state, using a different runtime disk and snapshotting don't have any noticeable impact on throughput.

The potential impact of configuring a different disk is already documented in the config templates.

deepthidevaki · 2023-02-28T10:05:16Z

bors merge

zeebe-bors-camunda · 2023-02-28T10:38:23Z

Build succeeded:

Test summary

deepthidevaki added 2 commits February 21, 2023 14:29

feat(broker): add configuration for seperate runtime directory

878ccc3

This allows users to configure a separate runtime directory. To keep the default behavior, if no runtime directory is configured, it will use the same old location in the data directory.

feat(broker): use configured runtime directory if available

edeb48a

deepthidevaki requested a review from npepinpe February 22, 2023 08:10

deepthidevaki marked this pull request as ready for review February 22, 2023 08:10

deepthidevaki commented Feb 22, 2023

deepthidevaki force-pushed the dd-6044-state-directory branch from d1b74ce to 5362f47 Compare February 22, 2023 08:13

npepinpe approved these changes Feb 24, 2023

deepthidevaki added 3 commits February 27, 2023 17:40

docs(dist): add runtime directory to configuration templates

ec567c0

fix(broker): initialize runtime directory from brokerBase

98db761

test(broker): verify configured runtime directory is used

3302e73

deepthidevaki force-pushed the dd-6044-state-directory branch from 5362f47 to 3302e73 Compare February 27, 2023 16:40

deepthidevaki commented Feb 28, 2023

docs(broker): fix spacing

575e302

zeebe-bors-camunda bot merged commit 39eefa1 into main Feb 28, 2023

zeebe-bors-camunda bot deleted the dd-6044-state-directory branch February 28, 2023 10:38

Zelldon mentioned this pull request Apr 3, 2023

[EPIC] Support stable performance for new instances even on larger state #12033

Closed
