Make activated jobs which were not sent to clients re-activatable #3631
Comments
Hi, we are hitting errors caused by this bug with Zeebe 0.26.1 in our production environment. Do you know when a fix will be planned? Thank you in advance.
This is currently in the backlog. However, to prioritize it, could you give us more context?
We identified it in this forum thread. It happens occasionally, but our business logic doesn't allow retries, so we can't decrease the jobTimeout, and when the issue happens we lose a customer in our e-commerce due to the timeout.
@npepinpe I want to bring some additional attention to this issue. I regularly see community members run into some form of #5387, for which this can be a solution. Having discussed it with @saig0, we think the idea is good, but an explicit JobIntent would be necessary to make it clear that this is not a user sending a FailJob.
I have given some thought to potential JobIntents:
What is the expected impact of fixing this, in concrete terms? Also, I'm not sure how this relates to the call closed. I imagine it is for some calls, but do we know anything more about that?
I believe this is what users experience when they say that jobs sometimes get lost while being delivered to the client, and I believe this would resolve that problem. For example, it already happens when a client sends an activate jobs request, the gateway has long polling enabled, and the client's request times out before a new job is activated in the broker. Once the job is available and is activated, the client no longer has a connection with the gateway, and the gateway can't deliver the job.
Fair point. I'll bring it up again during planning, but it looks like our engine team is pretty busy this quarter.
Hi @npepinpe.
@npepinpe
Description
Sometimes the client connection gets closed (long polling timed out, network interruption, client died, etc.) before the gateway can send the activate job response back to the client. As a result, the job is marked as activated in the broker but is not picked up by any client until the specified job timeout expires. This might be acceptable in some cases, but if the job timeout is high, clients might observe a huge latency between the time a job is created and the time it is completed.
This can happen at any time, but it is more frequent with long polling (see #3585).
Proposal
When the gateway realizes that the response cannot be sent to the client, it can send a request to cancel activation to the broker.
related to SUPPORT-13198
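To make the proposal more concrete, here is a minimal sketch of the idea in Java. It is a simplified in-memory model, not Zeebe's actual code: `Broker`, `Gateway`, `ClientStream`, `cancelActivation`, and the `cancelOnFailure` flag are all hypothetical names chosen for illustration. It only shows that a job stays activated until its deadline when delivery fails, and that it becomes activatable again right away if the gateway reports the failed delivery.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the proposal. None of these types are Zeebe's real API.
class ActivationSketch {
    enum JobState { ACTIVATABLE, ACTIVATED }

    // Stand-in for the broker's view of job state.
    static class Broker {
        private final Map<Long, JobState> jobs = new HashMap<>();
        private final Map<Long, Long> deadlines = new HashMap<>();

        void createJob(long key) {
            jobs.put(key, JobState.ACTIVATABLE);
        }

        // Marks the job as activated until nowMillis + timeoutMillis.
        boolean activate(long key, long nowMillis, long timeoutMillis) {
            if (jobs.get(key) != JobState.ACTIVATABLE) {
                return false;
            }
            jobs.put(key, JobState.ACTIVATED);
            deadlines.put(key, nowMillis + timeoutMillis);
            return true;
        }

        // Assumed "cancel activation" request: makes the job activatable again immediately.
        void cancelActivation(long key) {
            if (jobs.get(key) == JobState.ACTIVATED) {
                jobs.put(key, JobState.ACTIVATABLE);
                deadlines.remove(key);
            }
        }

        // Without a cancel, a job only becomes activatable again after its deadline.
        void expireTimedOutJobs(long nowMillis) {
            deadlines.entrySet().removeIf(entry -> {
                if (entry.getValue() <= nowMillis) {
                    jobs.put(entry.getKey(), JobState.ACTIVATABLE);
                    return true;
                }
                return false;
            });
        }

        JobState stateOf(long key) {
            return jobs.get(key);
        }
    }

    // Stand-in for the connection to the client; send fails if the connection is closed.
    interface ClientStream {
        void send(long jobKey) throws IOException;
    }

    static class Gateway {
        private final Broker broker;
        private final boolean cancelOnFailure;

        Gateway(Broker broker, boolean cancelOnFailure) {
            this.broker = broker;
            this.cancelOnFailure = cancelOnFailure;
        }

        // Activates the job, then tries to deliver it; returns true only if the client received it.
        boolean deliver(long jobKey, long nowMillis, long timeoutMillis, ClientStream client) {
            if (!broker.activate(jobKey, nowMillis, timeoutMillis)) {
                return false;
            }
            try {
                client.send(jobKey);
                return true;
            } catch (IOException e) {
                if (cancelOnFailure) {
                    // The response could not be sent, so ask the broker to undo the activation.
                    broker.cancelActivation(jobKey);
                }
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ClientStream closed = key -> { throw new IOException("connection closed"); };

        Broker current = new Broker();
        current.createJob(1);
        new Gateway(current, false).deliver(1, 0, 300_000, closed);
        System.out.println("without cancel: " + current.stateOf(1));

        Broker proposed = new Broker();
        proposed.createJob(1);
        new Gateway(proposed, true).deliver(1, 0, 300_000, closed);
        System.out.println("with cancel: " + proposed.stateOf(1));
    }
}
```

In this model the cancel is a dedicated operation rather than a failed job, which mirrors the point raised in the comments that an explicit JobIntent would keep it distinguishable from a user sending a FailJob.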