
schema_registry_encode Race Condition #2497

Closed · hendoxc opened this issue Apr 8, 2024 · 5 comments
Labels: bug · needs investigation (it looks as though we have all the information needed but investigation is required) · processors (any tasks or issues relating specifically to processors)

Comments

hendoxc commented Apr 8, 2024

Hey, I'm hitting a very strange bug with schema_registry_encode.

My application intermittently fails to encode some percentage of messages. When I check the logs to see why, I see

could not decode any json data in input

followed by the JSON message it was trying to encode, then for key xyz.

Okay, fine, that makes sense, but when I actually look at the JSON input printed in the error message, I can see that every time the leading {"key_name": is missing/truncated for some reason.

For example, if it was receiving a message like

{"key_1": "value", "key_2": 12}

the error message would look like:

cannot decode textual record \"com.data\": could not decode any json data in input "value", "key_2": 12} for key "key_1"

with the leading {"key_1": truncated from the JSON.

I know the message in the Benthos pipeline is fine, because I'm catching the error, logging the message that caused it, and sending the message to a DLQ to inspect further; there the JSON is fine and contains all the expected keys.

This happens on a high-throughput topic, and the error frequency seems tied to the refresh period set on schema_registry_encode: the smaller the refresh period, the more often the application intermittently throws these errors for a percentage of messages before returning to normal operation.
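One way to get that catch-and-DLQ behaviour is the documented errored() routing pattern on the output. The sketch below only shows the shape described above, not the reporter's actual config; broker addresses, topic names, and the log line are placeholders:

```yaml
output:
  switch:
    cases:
      # Messages that failed a processing step (e.g. the encode) still carry
      # their error flag here, so route them to a DLQ topic and log them.
      - check: errored()
        output:
          kafka_franz:
            seed_brokers: [ "localhost:9092" ]   # placeholder
            topic: my_topic_dlq                  # placeholder DLQ topic
          processors:
            - log:
                level: ERROR
                message: 'encode failed: ${! error() } content: ${! content() }'
      # Everything else continues to the normal destination.
      - output:
          kafka_franz:
            seed_brokers: [ "localhost:9092" ]   # placeholder
            topic: my_topic_encoded              # placeholder
```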

Jeffail added the bug, processors, and needs investigation labels on Apr 8, 2024
Jeffail (Collaborator) commented Apr 8, 2024

Hey @hendoxc, I'm taking a look for any race conditions within the processor itself. It would also help if you could give a general overview of what your config is doing, including:

  • The input type that is producing the messages that are failing
  • Any processors that come specifically before the schema registry processor
  • Whether any custom plugins are a part of the build you're running

Jeffail (Collaborator) commented Apr 8, 2024

I managed to find a race condition and have a fix for it: 3c301bb

Let's call this a speculative fix; are you able/willing to run a nightly build to try it out?
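For readers following along, here is a minimal Go sketch of the general class of bug being discussed: a background refresh (the sort of thing a refresh period timer drives) mutating cached schema state while concurrent encode calls read it. The types and names are hypothetical; this is not the processor's actual code or the change in 3c301bb, it only illustrates why guarding the shared state (here with a sync.RWMutex) matters.

```go
// Hypothetical illustration of a cache-refresh race; not the Benthos
// processor code and not the fix in 3c301bb.
package main

import (
	"fmt"
	"sync"
	"time"
)

// encoder caches schema state that a background refresher replaces.
type encoder struct {
	mu       sync.RWMutex // guards schemaID; without it `go run -race` flags the access
	schemaID int
}

// refresh periodically swaps in fresh schema state, as a refresh timer would.
func (e *encoder) refresh(stop <-chan struct{}) {
	ticker := time.NewTicker(5 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			e.mu.Lock()
			e.schemaID++ // unguarded, this write races with encode()
			e.mu.Unlock()
		}
	}
}

// encode reads the cached state; the read lock keeps the snapshot
// consistent for the duration of the call.
func (e *encoder) encode(msg string) string {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return fmt.Sprintf("schema %d: %s", e.schemaID, msg)
}

func main() {
	e := &encoder{}
	stop := make(chan struct{})
	go e.refresh(stop)

	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				_ = e.encode(`{"key_1": "value", "key_2": 12}`)
			}
		}()
	}
	wg.Wait()
	close(stop)
	fmt.Println("done")
}
```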

hendoxc (Author) commented Apr 9, 2024

Yes, I can pull in the latest Benthos commit and try it out.

input: kafka_franz

As for the processors beforehand, I'm just doing a schema_registry_decode, then some mapping processors.
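A sketch of a pipeline with that shape, using the documented kafka_franz input and schema_registry_* processors; the broker addresses, topics, registry URL, subject, and refresh period value are placeholders, and refresh_period is the schema_registry_encode field that corresponds to the refresh period mentioned earlier:

```yaml
input:
  kafka_franz:
    seed_brokers: [ "localhost:9092" ]   # placeholder
    topics: [ my_topic ]                 # placeholder
    consumer_group: my_group             # placeholder

pipeline:
  processors:
    # Decode the incoming payload using the schema registry.
    - schema_registry_decode:
        url: http://localhost:8081       # placeholder registry URL

    # Arbitrary Bloblang mapping(s) applied between decode and encode.
    - mapping: |
        root = this

    # Re-encode against a subject; refresh_period controls how often the
    # cached schema is re-fetched from the registry.
    - schema_registry_encode:
        url: http://localhost:8081       # placeholder registry URL
        subject: my_topic-value          # placeholder subject
        refresh_period: 10m
```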

hendoxc (Author) commented Apr 10, 2024

@Jeffail I've been running a couple of applications with the commit SHA provided and no longer see the issue; I'd say this is resolved.

Jeffail (Collaborator) commented Apr 11, 2024

Awesome, thanks @hendoxc!
