Considerable amount of GRPC error details can trigger ingress response header limit #11284
See related discussion and investigation in Slack: https://camunda.slack.com/archives/CT702EPFH/p1671124412586789
@megglos Can you clarify for us what the chances are of running into this situation? And what is the impact? Does the user receive a response at all?
@jfriedenstab 300 out of how many total deployment requests? 🤓 Would you consider a truncated error, like suggested here, to be a good solution to educate the user about the modeling errors? The user would at least be aware of some specific errors, and once those are resolved could learn about the remaining ones. On the other hand, wouldn't it be better to raise those issues directly in the Modeler via validation, to provide more helpful errors right where they occur (e.g. on the affected elements)? If I'm not mistaken, we have such validation already, and I wonder how users can still deploy invalid models from the Modeler 🤔
Thanks a lot for your quick reply, @megglos!
I can't tell you the total number of requests in the 90-day period, but in the last 24 hours alone there were ~1200 deployment requests. Ok, so the ratio of requests that failed with the
Yes, I'd say it's a good solution 👍🏻.
I agree that, ideally, we would already show the user all the model errors before they try to deploy. However, I think the linting/validation rules that are in place are not able to catch all the possible error cases. So, some errors would only be detected when the model is actually deployed.
True, we could prevent deployments if the linting detects errors. If I remember correctly though, we took a conscious decision a while ago to allow deployments anyway. In any case, models can also be deployed from outside the Web Modeler.
Hi @megglos,
Thanks @jfriedenstab. I've moved it back into our inbox for triage. As I understand, the problem leads to errors in the reverse proxy and the client not receiving the error response. That sounds like a bug to me. It's also high severity as there is no workaround available. I find it difficult to specify a likelihood for this.
Thank you @korthout! FYI: The Zeebe Java client will receive the following error response:
Regarding the likelihood: the problem seems to occur regularly. Sentry (which we use for error tracking in Web Modeler) recorded 433 events from 75 different users over the last 90 days.
ZPA-Triage:
Maybe something to consider for the future when you add a deploy endpoint to the C8/Zeebe REST API (getting the errors back in a more structured format would also be nice) 🙃.
Description
In cases like an invalid BPMN model, a considerable number of errors can occur. These are passed back to the client via the `grpc-status-details-bin` HTTP header. When Zeebe is accessed through a reverse proxy/ingress such as nginx, this behavior can exceed the proxy's response header limits, resulting in error logs like the following,
leading to a 502 response to the client by the proxy.
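On the proxy side, the buffer nginx uses for reading the upstream response header can be enlarged as a stop-gap, though this only pushes the limit out rather than removing it. A minimal sketch assuming nginx's stock gRPC module; the upstream address and buffer size are illustrative, not values from this issue:

```nginx
# Assumed nginx gRPC proxy location for the Zeebe gateway.
location / {
    grpc_pass grpc://zeebe-gateway:26500;

    # Default is 4k|8k; a large grpc-status-details-bin header needs more.
    # This only raises the ceiling; very large error lists can still exceed it.
    grpc_buffer_size 64k;
}
```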
We may consider limiting the number of error details returned, e.g. to a limit of 5 entries, followed by a summary entry ("and N more errors") if there are more. This would allow a response header limit to still be enforced on any client-facing proxy while making sure the client gets back useful information to actually fix the model errors, no matter how many errors they provoked.
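The proposed truncation could look roughly like the sketch below. `ErrorDetailTruncator`, its method name, and the limit of 5 are hypothetical illustrations of the idea, not Zeebe gateway code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper sketching the proposal: cap the number of error detail
// entries and append a summary entry, so the serialized
// grpc-status-details-bin header stays bounded.
public class ErrorDetailTruncator {

  // Assumed limit, as suggested in the issue.
  static final int MAX_DETAILS = 5;

  static List<String> truncate(final List<String> errors) {
    if (errors.size() <= MAX_DETAILS) {
      return errors;
    }
    // Keep the first MAX_DETAILS entries and summarize the rest.
    final List<String> truncated = new ArrayList<>(errors.subList(0, MAX_DETAILS));
    truncated.add("and " + (errors.size() - MAX_DETAILS) + " more errors");
    return truncated;
  }

  public static void main(final String[] args) {
    final List<String> errors = new ArrayList<>();
    for (int i = 1; i <= 12; i++) {
      errors.add("element_" + i + ": validation failed");
    }
    final List<String> result = truncate(errors);
    System.out.println(result.size()); // 6: five details plus the summary
    System.out.println(result.get(5)); // "and 7 more errors"
  }
}
```

The client still learns about concrete, fixable errors, while the proxy-facing header size is bounded regardless of how broken the model is.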
SUPPORT-15410