Deployment distribution via InterPartitionCommandSender
#9858
Conversation
```java
sender.sendCommand(
    receiverPartitionId,
    ValueType.DEPLOYMENT,
    DeploymentIntent.DISTRIBUTE,
    key,
    deploymentRecord);
```
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
```java
sender.sendCommand(
    DEPLOYMENT_PARTITION,
    ValueType.DEPLOYMENT_DISTRIBUTION,
    DeploymentDistributionIntent.COMPLETE,
    deploymentKey,
    distributionRecord);
```
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
Unit Test Results: 793 files ±0, 793 suites ±0, 1h 39m 51s ⏱️ +1m 40s. For more details on these failures, see this check. Results for commit 0d1178b. ± Comparison against base commit 6cc54ae.
Great work @oleschoenburg! Very nice cleanup, it's a lot simpler this way 🧹

❓ Is it fair to deprecate the `sendCommand` method that takes a `recordKey` as parameter? From the discussion we had about it, I don't think this will be changed anytime soon. It seems a bit strange to me to have it deprecated for removal.

🔧 Please update https://github.com/camunda/zeebe/blob/main/docs/assets/deployment_distribution.png to match the new flow. Would be a shame if it's outdated within 3 weeks 😄 The .bpmn file of the model is also stored in the repo.
```java
for (int i = PARTITION_ID; i < PARTITION_ID + partitionCount; i++) {
  writers.put(i, environmentRule.newLogStreamRecordWriter(i));
}
```
❓ Is it safe to assume that multiple partitions will always be `PARTITION_ID + n`?
Good point. Yes, this is correct and safe to assume. EngineRule wraps StreamProcessorRule, which creates the log streams here:

So yes, the partition ids are continuous, starting from `PARTITION_ID == DEPLOYMENT_PARTITION == 1`.
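As an illustrative sketch (not actual Zeebe code — the class and helper name here are made up), the invariant being confirmed is simply that partition ids form a contiguous range starting at the deployment partition:

```java
import java.util.stream.IntStream;

public class PartitionIds {
  // Sketch: the test engine creates contiguous partition ids starting at
  // the deployment partition, i.e. PARTITION_ID == DEPLOYMENT_PARTITION == 1.
  public static int[] partitionIds(final int deploymentPartition, final int partitionCount) {
    return IntStream.range(0, partitionCount)
        .map(i -> deploymentPartition + i)
        .toArray();
  }
}
```

With three partitions this yields the ids 1, 2, 3, which is why iterating from `PARTITION_ID` to `PARTITION_ID + partitionCount` in the test setup above covers every partition.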
```java
final var schedulingService = context.getScheduleService();
schedulingService.runAtFixedRate(
    DEPLOYMENT_REDISTRIBUTION_INTERVAL,
    () ->
        deploymentState.foreachPendingDeploymentDistribution(
            deploymentDistributionBehavior::distributeDeploymentToPartition));
```
💭 I wonder if it's worth keeping the pending deployments cached in memory, filling it once on startup, instead of having to query every 10 seconds.
IMO this is fine - usually there are 0 pending deployments so querying should be fast and in-memory caching adds a risk that we don't invalidate/populate the cache properly while adding little value. Let's let RocksDB do the caching for us :)
Although I do wonder if we shouldn't increase the interval from 10 seconds to something like 30. WDYT?
What was the interval before? I would keep it the same
It was retried when the network request wasn't answered within 15 seconds. That's a bit different from the current behavior, though, because now we actually need to process the record on the receiving partition before even attempting to acknowledge the distribution. I think that's a good argument for increasing this slightly to 30 seconds, as a buffer for partitions that are processing slowly.
Now that I think about it: We should really use a backoff here.
@remcowesterhoud I've added one more commit that implements a simple exponential backoff scheme.
I did consider adding something like resilience4j, or exposing methods in `SchedulingService` that make use of our `RetryStrategy`s, but I thought that exponential backoff is simple enough to just implement ad-hoc. Let me know if you think otherwise 🙇
Good point. I'd prefer to leave it as is. It discourages and warns us about new usage of that method. The query API was also implemented and immediately marked as deprecated.
Good point 👍
Previously, the test implementation simply used the writer from the environmentRule, which was not thread-safe. Now we pre-initialize dedicated writers.
🔧 The new model reads wrong. Now it looks like the Deployment Distribute command is written on partition 2. If I understand the change correctly, partition 1 will write this on partition 2, sort of. I think we should get rid of the Deployment Distribute and Deployment Distribution Complete tasks and just add these as names to the message receive events.
Well, yes and no 😅. The Deployment Distribute command is written on partition 2. Partition 1 just prepares this record and sends it to partition 2 via the `InterPartitionCommandSender`.

We could of course try to hide the messaging and say that conceptually partition 1 directly writes a command on partition 2. I just think that this would hide important information that explains why, for example, distribution has a retry loop: because sending/receiving can fail, whereas we usually treat writing to the log as infallible. Does that make sense to you? It's your (PAT) docs after all, so whatever makes sense to you 👍
That's why I wrote the sort of 😄 It does make sense to me, so I'm happy with it either way. Thanks for updating!
Since we have changed the behavior slightly and now only acknowledge the distribution after processing the command on the receiver partition, we need to make sure that we don't overload slow partitions by retrying distribution too frequently. Here we keep track of the retry count to implement a simple exponential backoff, statically configured to start off at 10 seconds and double on every retry until it reaches a maximum of 5 minutes.

Since we keep the retry counter in-memory, the backoff is reset when the deployment partition is recovered. Whether that is good behavior depends on the specific reason why a distribution is pending. For network issues this might be beneficial, as a new leader for the deployment partition might not have the same network issues. If the receiving partition is unavailable or lagging, restarting with no backoff is bad but acceptable.
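A minimal sketch of such a backoff scheme (hypothetical class and method names, not the actual PR code, using the parameters from the commit message: 10-second start, doubling per retry, 5-minute cap, counters kept in memory):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class DistributionBackoff {
  private static final Duration MIN_DELAY = Duration.ofSeconds(10);
  private static final Duration MAX_DELAY = Duration.ofMinutes(5);

  // In-memory retry counters, keyed by deployment key; lost (and thus
  // reset) when the deployment partition's leader changes.
  private final Map<Long, Integer> retryCounts = new HashMap<>();

  /** Returns the delay before the next retry: 10s, 20s, 40s, ... capped at 5 minutes. */
  public Duration nextDelay(final long deploymentKey) {
    final int attempt = retryCounts.merge(deploymentKey, 1, Integer::sum);
    // Cap the shift amount to avoid overflow for very large attempt counts.
    final Duration delay = MIN_DELAY.multipliedBy(1L << Math.min(attempt - 1, 30));
    return delay.compareTo(MAX_DELAY) > 0 ? MAX_DELAY : delay;
  }

  /** Clears the counter once the distribution is acknowledged. */
  public void reset(final long deploymentKey) {
    retryCounts.remove(deploymentKey);
  }
}
```

Keeping the counter in a plain map mirrors the trade-off discussed above: simple, but the state (and therefore the backoff) does not survive a leader change.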
Thanks @remcowesterhoud, that was a very productive review! 🙇

bors r+
Build succeeded:
Description

- Use `InterPartitionCommandSender` for deployment distribution.
- `DeploymentRedistributor` periodically checks for pending deployments and retries them, similar to what's being done for message subscriptions.

Related issues
closes #9625
Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:
- add the backport label (`backport stable/1.3`) to the PR; in case that fails you need to create backports manually.

Testing:

Documentation:

Please refer to our review guidelines.