-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix broken job stream aggregation across stream restarts #17545
Conversation
See #17546 as follow up to improve aggregation/retry logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! Pre-approving assuming you remove the stdout print :D
I wonder if we could actually make the assignment of tenantIds a bit simpler, but we could also follow-up on that within ZPA.
dist/src/main/java/io/camunda/zeebe/shared/management/JobStreamEndpoint.java
Show resolved
Hide resolved
...e/clients/java/src/main/java/io/camunda/zeebe/client/impl/command/StreamJobsCommandImpl.java
Outdated
Show resolved
Hide resolved
.../qa/integration-tests/src/test/java/io/camunda/zeebe/it/clustering/JobStreamLifecycleIT.java
Show resolved
Hide resolved
@@ -114,7 +114,7 @@ public FinalCommandStep<ActivateJobsResponse> requestTimeout(final Duration requ | |||
|
|||
@Override | |||
public ZeebeFuture<ActivateJobsResponse> send() { | |||
|
|||
builder.clearTenantIds(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 I wonder if we could get rid of the complexity to juggle between two source sets, so the #tenantIds
method could just always directly update the builder field, like for variables? The default value we could just initialise in the ctor, wdyt? Also happy to extract that to a follow-up though to not block the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point. Unfortunately we can't really enforce this in the future anyway, so I was hoping to spark some discussion with the ZPA team on this. But for now we can also just do it in the setter 👍
IIRC, the idea is that if there's no tenants when you call send (for example, the user called tenantIds(Collections.emptyList())
), we want to send the default tenant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I took a crack at it, and the downside of using the builder directly is that you need to check every time for duplicate tenant IDs. By having two sets, you always guarantee the uniqueness of each tenant ID in the final list - if you use the builder directly, it's just a list, so you have to make sure they're unique manually.
So the upside is no mutability in the send method, but the downside is more complex setters - enforce uniqueness, deal with overwriting the default tenant ID on the first custom tenant ID, etc. 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the last commit for a working solution, but honestly I'm not sure if it's much better 🤷 I think we trade complexity in the send vs complexity in the setters (and their interactions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would lean towards reverting the last commit, I think it's safer to handle clearing and so on in a single place, even if it's unfortunately the send command =/ wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, let's revert
8f9a230
to
3dbfc12
Compare
3dbfc12
to
8022a50
Compare
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin stable/8.3
git worktree add -d .worktree/backport-17545-to-stable/8.3 origin/stable/8.3
cd .worktree/backport-17545-to-stable/8.3
git switch --create backport-17545-to-stable/8.3
git cherry-pick -x c43b6823662bd6bef247b0707865483ca1f0dec2 ba0e04a8a5ffaa78924d75ed1e2ab7d45ee3c209 0a9702fb8f0f5add27d0f8ca9743c263d2af3da6 8022a50643088b7572a0de96bca1314e44e740df |
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin stable/8.4
git worktree add -d .worktree/backport-17545-to-stable/8.4 origin/stable/8.4
cd .worktree/backport-17545-to-stable/8.4
git switch --create backport-17545-to-stable/8.4
git cherry-pick -x c43b6823662bd6bef247b0707865483ca1f0dec2 ba0e04a8a5ffaa78924d75ed1e2ab7d45ee3c209 0a9702fb8f0f5add27d0f8ca9743c263d2af3da6 8022a50643088b7572a0de96bca1314e44e740df |
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin stable/8.3
git worktree add -d .worktree/backport-17545-to-stable/8.3 origin/stable/8.3
cd .worktree/backport-17545-to-stable/8.3
git switch --create backport-17545-to-stable/8.3
git cherry-pick -x c43b6823662bd6bef247b0707865483ca1f0dec2 ba0e04a8a5ffaa78924d75ed1e2ab7d45ee3c209 0a9702fb8f0f5add27d0f8ca9743c263d2af3da6 8022a50643088b7572a0de96bca1314e44e740df |
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin stable/8.4
git worktree add -d .worktree/backport-17545-to-stable/8.4 origin/stable/8.4
cd .worktree/backport-17545-to-stable/8.4
git switch --create backport-17545-to-stable/8.4
git cherry-pick -x c43b6823662bd6bef247b0707865483ca1f0dec2 ba0e04a8a5ffaa78924d75ed1e2ab7d45ee3c209 0a9702fb8f0f5add27d0f8ca9743c263d2af3da6 8022a50643088b7572a0de96bca1314e44e740df |
Successfully created backport PR for |
Git push to origin failed for stable/8.5 with exitcode 1 |
Description
Adding tenants to the and job activation and streaming commands can end up accumulating the same tenant IDs over and over if the command builder is reuse to send new commands. This is because the commands keep adding the tenant IDs to the underlying gRPC request, without ever clearing them. So each successive
send()
call would end up accumulating IDs.Say you use only the default tenant ID (
<default>
), then on the first send it would be[<default>]
. On the second,[<default>, <default>]
, and so on.For job streaming, this would break the aggregation as the tenant IDs are part of the stream aggregation.
This highlights that the aggregation for job streaming is not ideal - for example, if one stream is tenants
foo, bar
, and the otherfoo
, then technically any job of typefoo
can go to either of them. This is out of scope for now, but I will open another issue to improve the aggregation.Related issues
closes #17513