Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken job stream aggregation across stream restarts #17545

Merged
merged 4 commits into from
Apr 18, 2024

Conversation

npepinpe
Copy link
Member

Description

Adding tenants to the and job activation and streaming commands can end up accumulating the same tenant IDs over and over if the command builder is reuse to send new commands. This is because the commands keep adding the tenant IDs to the underlying gRPC request, without ever clearing them. So each successive send() call would end up accumulating IDs.

Say you use only the default tenant ID (<default>), then on the first send it would be [<default>]. On the second, [<default>, <default>], and so on.

For job streaming, this would break the aggregation as the tenant IDs are part of the stream aggregation.

This highlights that the aggregation for job streaming is not ideal - for example, if one stream is tenants foo, bar, and the other foo, then technically any job of type foo can go to either of them. This is out of scope for now, but I will open another issue to improve the aggregation.

Related issues

closes #17513

@github-actions github-actions bot added the component/zeebe Related to the Zeebe component/team label Apr 17, 2024
@npepinpe
Copy link
Member Author

See #17546 as follow up to improve aggregation/retry logic.

Copy link
Contributor

@megglos megglos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Pre-approving assuming you remove the stdout print :D

I wonder if we could actually make the assignment of tenantIds a bit simpler, but we could also follow-up on that within ZPA.

@@ -114,7 +114,7 @@ public FinalCommandStep<ActivateJobsResponse> requestTimeout(final Duration requ

@Override
public ZeebeFuture<ActivateJobsResponse> send() {

builder.clearTenantIds();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I wonder if we could get rid of the complexity to juggle between two source sets, so the #tenantIds method could just always directly update the builder field, like for variables? The default value we could just initialise in the ctor, wdyt? Also happy to extract that to a follow-up though to not block the fix.

Copy link
Member Author

@npepinpe npepinpe Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. Unfortunately we can't really enforce this in the future anyway, so I was hoping to spark some discussion with the ZPA team on this. But for now we can also just do it in the setter 👍

IIRC, the idea is that if there's no tenants when you call send (for example, the user called tenantIds(Collections.emptyList())), we want to send the default tenant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I took a crack at it, and the downside of using the builder directly is that you need to check every time for duplicate tenant IDs. By having two sets, you always guarantee the uniqueness of each tenant ID in the final list - if you use the builder directly, it's just a list, so you have to make sure they're unique manually.

So the upside is no mutability in the send method, but the downside is more complex setters - enforce uniqueness, deal with overwriting the default tenant ID on the first custom tenant ID, etc. 🤷

Copy link
Member Author

@npepinpe npepinpe Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the last commit for a working solution, but honestly I'm not sure if it's much better 🤷 I think we trade complexity in the send vs complexity in the setters (and their interactions).

8f9a230

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would lean towards reverting the last commit, I think it's safer to handle clearing and so on in a single place, even if it's unfortunately the send command =/ wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's revert

@npepinpe npepinpe force-pushed the np-17513-duplicate-tenantIds branch from 8f9a230 to 3dbfc12 Compare April 18, 2024 08:41
@npepinpe npepinpe added backport stable/8.3 Backport a pull request to 8.3.x backport stable/8.4 Backport a pull request to 8.4.x backport stable/8.5 Backport a pull request to stable/8.5 labels Apr 18, 2024
@npepinpe npepinpe force-pushed the np-17513-duplicate-tenantIds branch from 3dbfc12 to 8022a50 Compare April 18, 2024 11:55
@npepinpe npepinpe enabled auto-merge April 18, 2024 11:55
@npepinpe npepinpe added this pull request to the merge queue Apr 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 18, 2024
@npepinpe npepinpe added this pull request to the merge queue Apr 18, 2024
Merged via the queue into main with commit 3edae67 Apr 18, 2024
71 checks passed
@npepinpe npepinpe deleted the np-17513-duplicate-tenantIds branch April 18, 2024 14:09
@backport-action
Copy link
Collaborator

Backport failed for stable/8.3, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin stable/8.3
git worktree add -d .worktree/backport-17545-to-stable/8.3 origin/stable/8.3
cd .worktree/backport-17545-to-stable/8.3
git switch --create backport-17545-to-stable/8.3
git cherry-pick -x c43b6823662bd6bef247b0707865483ca1f0dec2 ba0e04a8a5ffaa78924d75ed1e2ab7d45ee3c209 0a9702fb8f0f5add27d0f8ca9743c263d2af3da6 8022a50643088b7572a0de96bca1314e44e740df

@backport-action
Copy link
Collaborator

Backport failed for stable/8.4, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin stable/8.4
git worktree add -d .worktree/backport-17545-to-stable/8.4 origin/stable/8.4
cd .worktree/backport-17545-to-stable/8.4
git switch --create backport-17545-to-stable/8.4
git cherry-pick -x c43b6823662bd6bef247b0707865483ca1f0dec2 ba0e04a8a5ffaa78924d75ed1e2ab7d45ee3c209 0a9702fb8f0f5add27d0f8ca9743c263d2af3da6 8022a50643088b7572a0de96bca1314e44e740df

Copy link
Contributor

Backport failed for stable/8.3, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin stable/8.3
git worktree add -d .worktree/backport-17545-to-stable/8.3 origin/stable/8.3
cd .worktree/backport-17545-to-stable/8.3
git switch --create backport-17545-to-stable/8.3
git cherry-pick -x c43b6823662bd6bef247b0707865483ca1f0dec2 ba0e04a8a5ffaa78924d75ed1e2ab7d45ee3c209 0a9702fb8f0f5add27d0f8ca9743c263d2af3da6 8022a50643088b7572a0de96bca1314e44e740df

Copy link
Contributor

Backport failed for stable/8.4, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin stable/8.4
git worktree add -d .worktree/backport-17545-to-stable/8.4 origin/stable/8.4
cd .worktree/backport-17545-to-stable/8.4
git switch --create backport-17545-to-stable/8.4
git cherry-pick -x c43b6823662bd6bef247b0707865483ca1f0dec2 ba0e04a8a5ffaa78924d75ed1e2ab7d45ee3c209 0a9702fb8f0f5add27d0f8ca9743c263d2af3da6 8022a50643088b7572a0de96bca1314e44e740df

@backport-action
Copy link
Collaborator

Successfully created backport PR for stable/8.5:

Copy link
Contributor

Git push to origin failed for stable/8.5 with exitcode 1

github-merge-queue bot pushed a commit that referenced this pull request Apr 18, 2024
… restarts (#17580)

# Description
Backport of #17545 to `stable/8.5`.

relates to #17513
original author: @npepinpe
github-merge-queue bot pushed a commit that referenced this pull request Apr 23, 2024
… restarts (#17601)

# Description
Backport of #17545 to `stable/8.3`.

> [!Note]
> There was a very small conflict in the `JobStreamEndpoint` actuator
around the `Metadata` record, I guess because of the annotations, but
nothing major.

relates to #17513
original author: @npepinpe
github-merge-queue bot pushed a commit that referenced this pull request Apr 23, 2024
… restarts (#17600)

# Description
Backport of #17545 to `stable/8.4`.

> [!Note]
> There was a very small conflict in the `JobStreamEndpoint` actuator
around the `Metadata` record, I guess because of the annotations, but
nothing major.

relates to #17513
original author: @npepinpe
@Zelldon Zelldon added version:8.4.7 Marks an issue as being completely or in parts released in 8.4.7 version:8.5.1 Marks an issue as being completely or in parts released in 8.5.1 version:8.3.11 Marks an issue as being completely or in parts released in 8.3.11 labels May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport stable/8.3 Backport a pull request to 8.3.x backport stable/8.4 Backport a pull request to 8.4.x backport stable/8.5 Backport a pull request to stable/8.5 component/zeebe Related to the Zeebe component/team version:8.3.11 Marks an issue as being completely or in parts released in 8.3.11 version:8.4.7 Marks an issue as being completely or in parts released in 8.4.7 version:8.5.1 Marks an issue as being completely or in parts released in 8.5.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken job stream aggregation
4 participants