
Deployment distribution via InterPartitionCommandSender #9858

Merged

Conversation


@lenaschoenburg lenaschoenburg commented Jul 21, 2022

Description

  • Uses the InterPartitionCommandSender for deployment distribution.
  • There are no changes to the overall distribution mechanism, the same records as before are written.
  • Instead of retrying redistribution on failed network requests and on restore, the DeploymentRedistributor periodically checks for pending deployments and retries them, similar to what's being done for message subscriptions (see the sketch below).
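
As a rough sketch of that retry loop (names mirror the snippets quoted further down in this PR; the exact callback shape of foreachPendingDeploymentDistribution is simplified here and should be treated as an assumption):

// Sketch only: periodically walk all pending deployment distributions and
// re-send the DISTRIBUTE command to the receiving partition.
schedulingService.runAtFixedRate(
    DEPLOYMENT_REDISTRIBUTION_INTERVAL,
    () ->
        deploymentState.foreachPendingDeploymentDistribution(
            (receiverPartitionId, deploymentKey, deploymentRecord) ->
                sender.sendCommand(
                    receiverPartitionId,
                    ValueType.DEPLOYMENT,
                    DeploymentIntent.DISTRIBUTE,
                    deploymentKey,
                    deploymentRecord)));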

Related issues

closes #9625

Definition of Done

Depending on the issue and the pull request, not all items need to be done.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug, then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR; if that fails, you need to create the backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

Comment on lines +31 to +36
sender.sendCommand(
receiverPartitionId,
ValueType.DEPLOYMENT,
DeploymentIntent.DISTRIBUTE,
key,
deploymentRecord);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [InterPartitionCommandSender.sendCommand](1) should be avoided because it has been deprecated.
Comment on lines +42 to +47
sender.sendCommand(
DEPLOYMENT_PARTITION,
ValueType.DEPLOYMENT_DISTRIBUTION,
DeploymentDistributionIntent.COMPLETE,
deploymentKey,
distributionRecord);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [InterPartitionCommandSender.sendCommand](1) should be avoided because it has been deprecated.

github-actions bot commented Jul 21, 2022

Unit Test Results

   793 files  ±0      793 suites  ±0   1h 39m 51s ⏱️ +1m 40s
6 365 tests  +281  6 355 ✔️ +280   9 💤 ±0   1 ❌ +1
6 526 runs   +273  6 516 ✔️ +272   9 💤 ±0   1 ❌ +1

For more details on these failures, see this check.

Results for commit 0d1178b. ± Comparison against base commit 6cc54ae.

♻️ This comment has been updated with latest results.

@lenaschoenburg lenaschoenburg force-pushed the 9625-deployment-over-inter-partition-command-sender branch from 4425e00 to 60eec67 on July 21, 2022 14:30
@lenaschoenburg lenaschoenburg marked this pull request as ready for review July 21, 2022 15:16
@remcowesterhoud remcowesterhoud self-requested a review July 21, 2022 15:47
@lenaschoenburg lenaschoenburg force-pushed the 9625-deployment-over-inter-partition-command-sender branch from 60eec67 to 3ff5b03 on July 25, 2022 06:27
Contributor

@remcowesterhoud remcowesterhoud left a comment

Great work @oleschoenburg! Very nice cleanup, it's a lot simpler this way 🧹

❓Is it fair to deprecate the sendCommand method that takes a recordKey as parameter? From the discussion we had about it I don't think this will be changed anytime soon. Seems a bit strange to have it be deprecated for removal to me.

🔧 Please update https://github.com/camunda/zeebe/blob/main/docs/assets/deployment_distribution.png to match the new flow. Would be a shame if it's outdated within 3 weeks 😄 The .bpmn file of the model is also stored in the repo.

Comment on lines +540 to +514
// one record writer per partition; partition ids are contiguous, starting at PARTITION_ID
for (int i = PARTITION_ID; i < PARTITION_ID + partitionCount; i++) {
  writers.put(i, environmentRule.newLogStreamRecordWriter(i));
}
Contributor

❓ Is it safe to assume that multiple partitions will always be PARTITION_ID + n?

Member Author

Good point. Yes, this is correct and safe to assume. EngineRule wraps StreamProcessorRule which creates the log streams here:

https://github.com/camunda/zeebe/blob/fe9021349a640362452b801bba8974266883a4c0/engine/src/test/java/io/camunda/zeebe/engine/util/StreamProcessorRule.java#L339-L345

So yes, the partition ids are continuous, starting from PARTITION_ID == DEPLOYMENT_PARTITION == 1.

Comment on lines 37 to 47
final var schedulingService = context.getScheduleService();
// periodically walk all pending deployment distributions and retry them
schedulingService.runAtFixedRate(
    DEPLOYMENT_REDISTRIBUTION_INTERVAL,
    () ->
        deploymentState.foreachPendingDeploymentDistribution(
            deploymentDistributionBehavior::distributeDeploymentToPartition));
Contributor

💭 I wonder if it's worth keeping the pending deployments cached in memory, filling it once on startup, instead of having to query every 10 seconds.

Member Author

IMO this is fine - usually there are zero pending deployments, so querying should be fast. In-memory caching adds a risk that we don't invalidate/populate the cache properly, while adding little value. Let's let RocksDB do the caching for us :)

Member Author

Although I do wonder if we shouldn't increase the interval from 10 seconds to something like 30. WDYT?

Contributor

What was the interval before? I would keep it the same.

Member Author

It was retried when the network request wasn't answered within 15 seconds. That's a bit different from the current behavior, though, because now we actually need to process the record on the receiving partition before even attempting to acknowledge the distribution. I think that's a good argument for increasing this slightly to 30 seconds as a buffer for partitions that are processing slowly.

Now that I think about it: We should really use a backoff here.

Member Author

@remcowesterhoud I've added one more commit that implements a simple exponential backoff scheme.

Member Author

I did consider adding something like resilience4j or exposing methods in SchedulingService that make use of our RetryStrategy implementations, but I thought that a simple exponential backoff is easy enough to just implement ad-hoc. Let me know if you think otherwise 🙇
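
For illustration, a minimal self-contained sketch of such an ad-hoc exponential backoff (a hypothetical helper class, not the actual Zeebe implementation; the constants match the commit message quoted further below: start at 10 seconds, double on every retry, cap at 5 minutes):

import java.time.Duration;

// Hypothetical helper sketching the backoff described in the commit message
// below. The retry count lives in memory only, so the backoff resets when
// the deployment partition is recovered.
final class ExponentialBackoff {
  private static final Duration INITIAL_DELAY = Duration.ofSeconds(10);
  private static final Duration MAX_DELAY = Duration.ofMinutes(5);
  private int retries = 0;

  Duration nextDelay() {
    // INITIAL_DELAY * 2^retries, capped at MAX_DELAY; the shift is capped
    // to avoid overflowing long before the comparison.
    final var delay = INITIAL_DELAY.multipliedBy(1L << Math.min(retries, 30));
    retries++;
    return delay.compareTo(MAX_DELAY) > 0 ? MAX_DELAY : delay;
  }
}

With these constants the first retry happens after 10 s, the second after 20 s, and so on, until every further retry waits the full 5 minutes.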

@lenaschoenburg
Member Author

❓ Is it fair to deprecate the sendCommand method that takes a recordKey as parameter? From the discussion we had about it I don't think this will be changed anytime soon. Seems a bit strange to have it be deprecated for removal to me.

Good point. I'd prefer to leave it as is. It discourages and warns us about new usage of that method. The query API was also implemented and immediately marked as deprecated.

🔧 Please update https://github.com/camunda/zeebe/blob/main/docs/assets/deployment_distribution.png to match the new flow. Would be a shame if it's outdated within 3 weeks 😄 The .bpmn file of the model is also stored in the repo.

Good point 👍
I'll update the model and then rebase once again to fix merge conflicts.

@lenaschoenburg lenaschoenburg force-pushed the 9625-deployment-over-inter-partition-command-sender branch from 3ff5b03 to f302156 on July 25, 2022 10:28
Contributor

@remcowesterhoud remcowesterhoud left a comment

🔧 The new model reads wrong. Now it looks like the Deployment Distribute command is written on partition 2. If I understand the change correctly, partition 1 will write this on partition 2, sort of. I think we should get rid of the Deployment Distribute and the Deployment Distribution Complete tasks and just add these as names to the message receive events.

@lenaschoenburg
Member Author

Now it looks like the Deployment Distribute command is written on partition 2. If I understand the change correctly partition 1 will write this on partition 2, sort of

Well, yes and no 😅. The Deployment Distribute command is written on partition 2. Partition 1 just prepares this record and sends it to partition 2 via InterPartitionCommandSender. I'd argue that the new diagram is correct as it is: partition 2 accepts a message (the inter-partition message), then writes a command to its log and sends a message back to partition 1.

We could of course try and hide the messaging and say that conceptually partition 1 directly writes a command on partition 2. I just think that this would hide important information that explains why, for example, distribution has a retry loop: because sending/receiving can fail whereas we usually treat writing to the log as infallible.

Does that make sense to you? It's your (PAT) docs after all, so whatever makes sense to you 👍

@remcowesterhoud
Contributor

That's why I wrote the "sort of" 😄 It does make sense to me, so I'm happy with it either way. Thanks for updating!

Since we have changed the behavior slightly and now only acknowledge
the distribution after processing the command on the receiver partition,
we need to make sure that we don't overload slow partitions by retrying
distribution too frequently.

Here we keep track of the retry count to implement a simple exponential
backoff, statically configured to start off at 10 seconds until it
reaches a maximum of 5 minutes, doubling every time.

Since we keep the retry counter in-memory, the backoff is reset when
the deployment partition is recovered. Whether that is good behavior
depends on the specific reason why a distribution is pending. For
network issues this might be beneficial as a new leader for the
deployment partition might not have the same network issues. If the
receiving partition is unavailable or lagging, restarting with no
backoff is bad but acceptable.
@lenaschoenburg
Member Author

Thanks @remcowesterhoud, that was a very productive review! 🙇

bors r+

@zeebe-bors-camunda
Contributor

Build succeeded:

@zeebe-bors-camunda zeebe-bors-camunda bot merged commit ec7ea28 into main Jul 25, 2022
@zeebe-bors-camunda zeebe-bors-camunda bot deleted the 9625-deployment-over-inter-partition-command-sender branch July 25, 2022 16:05
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close issue #9625 (Refactor inter-partition communication to use a common sender and receiver).
2 participants