
Deployment distribution via InterPartitionCommandSender #9858

Merged

Conversation


@lenaschoenburg lenaschoenburg commented Jul 21, 2022

Description

  • Uses the InterPartitionCommandSender for deployment distribution.
  • There are no changes to the overall distribution mechanism, the same records as before are written.
  • Instead of retrying redistribution on failed network requests and on restore, the DeploymentRedistributor periodically checks for pending deployments and retries them, similar to what's being done for message subscriptions (see the sketch below).
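
As a rough sketch of that retry loop (names mirror the snippets quoted further down in this PR; the exact callback shape of foreachPendingDeploymentDistribution is simplified here and should be treated as an assumption):

// Sketch only: periodically walk all pending deployment distributions and
// re-send the DISTRIBUTE command to the receiving partition.
schedulingService.runAtFixedRate(
    DEPLOYMENT_REDISTRIBUTION_INTERVAL,
    () ->
        deploymentState.foreachPendingDeploymentDistribution(
            (receiverPartitionId, deploymentKey, deploymentRecord) ->
                sender.sendCommand(
                    receiverPartitionId,
                    ValueType.DEPLOYMENT,
                    DeploymentIntent.DISTRIBUTE,
                    deploymentKey,
                    deploymentRecord)));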

Related issues

closes #9625

Definition of Done

Depending on the issue and the pull request, not all items need to be done.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug, then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR; if that fails, you need to create the backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

Comment on lines +31 to +36
sender.sendCommand(
receiverPartitionId,
ValueType.DEPLOYMENT,
DeploymentIntent.DISTRIBUTE,
key,
deploymentRecord);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [InterPartitionCommandSender.sendCommand](1) should be avoided because it has been deprecated.
Comment on lines +42 to +47
sender.sendCommand(
DEPLOYMENT_PARTITION,
ValueType.DEPLOYMENT_DISTRIBUTION,
DeploymentDistributionIntent.COMPLETE,
deploymentKey,
distributionRecord);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [InterPartitionCommandSender.sendCommand](1) should be avoided because it has been deprecated.

github-actions bot commented Jul 21, 2022

Unit Test Results

   793 files  ±0      793 suites  ±0   1h 39m 51s ⏱️ +1m 40s
6 365 tests  +281  6 355 ✔️ +280   9 💤 ±0   1 ❌ +1
6 526 runs   +273  6 516 ✔️ +272   9 💤 ±0   1 ❌ +1

For more details on these failures, see this check.

Results for commit 0d1178b. ± Comparison against base commit 6cc54ae.

♻️ This comment has been updated with latest results.

@lenaschoenburg lenaschoenburg force-pushed the 9625-deployment-over-inter-partition-command-sender branch from 4425e00 to 60eec67 on July 21, 2022 14:30
@lenaschoenburg lenaschoenburg marked this pull request as ready for review July 21, 2022 15:16
@remcowesterhoud remcowesterhoud self-requested a review July 21, 2022 15:47
@lenaschoenburg lenaschoenburg force-pushed the 9625-deployment-over-inter-partition-command-sender branch from 60eec67 to 3ff5b03 on July 25, 2022 06:27
Contributor

@remcowesterhoud remcowesterhoud left a comment

Great work @oleschoenburg! Very nice cleanup, it's a lot simpler this way 🧹

❓Is it fair to deprecate the sendCommand method that takes a recordKey as parameter? From the discussion we had about it I don't think this will be changed anytime soon. Seems a bit strange to have it be deprecated for removal to me.

🔧 Please update https://github.com/camunda/zeebe/blob/main/docs/assets/deployment_distribution.png to match the new flow. Would be a shame if it's outdated within 3 weeks 😄 The .bpmn file of the model is also stored in the repo.

Comment on lines +540 to +514
// one record writer per partition; partition ids are contiguous, starting at PARTITION_ID
for (int i = PARTITION_ID; i < PARTITION_ID + partitionCount; i++) {
  writers.put(i, environmentRule.newLogStreamRecordWriter(i));
}
Contributor

❓ Is it safe to assume that multiple partitions will always be PARTITION_ID + n?

Member Author

Good point. Yes, this is correct and safe to assume. EngineRule wraps StreamProcessorRule which creates the log streams here:

https://github.com/camunda/zeebe/blob/fe9021349a640362452b801bba8974266883a4c0/engine/src/test/java/io/camunda/zeebe/engine/util/StreamProcessorRule.java#L339-L345

So yes, the partition ids are continuous, starting from PARTITION_ID == DEPLOYMENT_PARTITION == 1.

Comment on lines 37 to 47
final var schedulingService = context.getScheduleService();
// periodically walk all pending deployment distributions and retry them
schedulingService.runAtFixedRate(
    DEPLOYMENT_REDISTRIBUTION_INTERVAL,
    () ->
        deploymentState.foreachPendingDeploymentDistribution(
            deploymentDistributionBehavior::distributeDeploymentToPartition));
Contributor

💭 I wonder if it's worth keeping the pending deployments cached in memory, filling it once on startup, instead of having to query every 10 seconds.

Member Author

IMO this is fine - usually there are zero pending deployments, so querying should be fast. In-memory caching adds a risk that we don't invalidate/populate the cache properly, while adding little value. Let's let RocksDB do the caching for us :)

Member Author

Although I do wonder if we shouldn't increase the interval from 10 seconds to something like 30. WDYT?

Contributor

What was the interval before? I would keep it the same.

Member Author

It was retried when the network request wasn't answered within 15 seconds. That's a bit different from the current behavior, though, because now we actually need to process the record on the receiving partition before even attempting to acknowledge the distribution. I think that's a good argument for increasing this slightly to 30 seconds as a buffer for partitions that are processing slowly.

Now that I think about it: We should really use a backoff here.

Member Author

@remcowesterhoud I've added one more commit that implements a simple exponential backoff scheme.

Member Author

I did consider adding something like resilience4j or exposing methods in SchedulingService that make use of our RetryStrategy implementations, but I thought that a simple exponential backoff is easy enough to just implement ad-hoc. Let me know if you think otherwise 🙇
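
For illustration, a minimal self-contained sketch of such an ad-hoc exponential backoff (a hypothetical helper class, not the actual Zeebe implementation; the constants match the commit message quoted further below: start at 10 seconds, double on every retry, cap at 5 minutes):

import java.time.Duration;

// Hypothetical helper sketching the backoff described in the commit message
// below. The retry count lives in memory only, so the backoff resets when
// the deployment partition is recovered.
final class ExponentialBackoff {
  private static final Duration INITIAL_DELAY = Duration.ofSeconds(10);
  private static final Duration MAX_DELAY = Duration.ofMinutes(5);
  private int retries = 0;

  Duration nextDelay() {
    // INITIAL_DELAY * 2^retries, capped at MAX_DELAY; the shift is capped
    // to avoid overflowing long before the comparison.
    final var delay = INITIAL_DELAY.multipliedBy(1L << Math.min(retries, 30));
    retries++;
    return delay.compareTo(MAX_DELAY) > 0 ? MAX_DELAY : delay;
  }
}

With these constants the first retry happens after 10 s, the second after 20 s, and so on, until every further retry waits the full 5 minutes.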

@lenaschoenburg
Member Author

❓ Is it fair to deprecate the sendCommand method that takes a recordKey as parameter? From the discussion we had about it I don't think this will be changed anytime soon. Seems a bit strange to have it be deprecated for removal to me.

Good point. I'd prefer to leave it as is. It discourages and warns us about new usage of that method. The query API was also implemented and immediately marked as deprecated.

🔧 Please update https://github.com/camunda/zeebe/blob/main/docs/assets/deployment_distribution.png to match the new flow. Would be a shame if it's outdated within 3 weeks 😄 The .bpmn file of the model is also stored in the repo.

Good point 👍
I'll update the model and then rebase once again to fix merge conflicts.

@lenaschoenburg lenaschoenburg force-pushed the 9625-deployment-over-inter-partition-command-sender branch from 3ff5b03 to f302156 on July 25, 2022 10:28
Contributor

@remcowesterhoud remcowesterhoud left a comment

🔧 The new model reads wrong. Now it looks like the Deployment Distribute command is written on partition 2. If I understand the change correctly, partition 1 will write this on partition 2, sort of. I think we should get rid of the Deployment Distribute and the Deployment Distribution Complete tasks and just add these as names to the message receive events.

@lenaschoenburg
Member Author

Now it looks like the Deployment Distribute command is written on partition 2. If I understand the change correctly partition 1 will write this on partition 2, sort of

Well, yes and no 😅. The Deployment Distribute command is written on partition 2. Partition 1 just prepares this record and sends it to partition 2 via InterPartitionCommandSender. I'd argue that the new diagram is correct as it is: partition 2 accepts a message (the inter-partition message), then writes a command to its log and sends a message back to partition 1.

We could of course try and hide the messaging and say that conceptually partition 1 directly writes a command on partition 2. I just think that this would hide important information that explains why, for example, distribution has a retry loop: because sending/receiving can fail whereas we usually treat writing to the log as infallible.

Does that make sense to you? It's your (PAT) docs after all, so whatever makes sense to you 👍

@remcowesterhoud
Contributor

That's why I wrote the "sort of" 😄 It does make sense to me, so I'm happy with it either way. Thanks for updating!

Since we have changed the behavior slightly and now only acknowledge
the distribution after processing the command on the receiver partition,
we need to make sure that we don't overload slow partitions by retrying
distribution too frequently.

Here we keep track of the retry count to implement a simple exponential
backoff, statically configured to start off at 10 seconds until it
reaches a maximum of 5 minutes, doubling every time.

Since we keep the retry counter in-memory, the backoff is reset when
the deployment partition is recovered. Whether that is good behavior
depends on the specific reason why a distribution is pending. For
network issues this might be beneficial as a new leader for the
deployment partition might not have the same network issues. If the
receiving partition is unavailable or lagging, restarting with no
backoff is bad but acceptable.
@lenaschoenburg
Member Author

Thanks @remcowesterhoud, that was a very productive review! 🙇

bors r+

@zeebe-bors-camunda
Contributor

Build succeeded:

@zeebe-bors-camunda zeebe-bors-camunda bot merged commit ec7ea28 into main Jul 25, 2022
@zeebe-bors-camunda zeebe-bors-camunda bot deleted the 9625-deployment-over-inter-partition-command-sender branch July 25, 2022 16:05
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close issue #9625 (Refactor inter-partition communication to use a common sender and receiver).
2 participants