Fix uncaught ParseException when reading Avro from Kafka#14183
Merged
abhishekagarwal87 merged 10 commits intoapache:masterfrom May 4, 2023
…ndling & parse handler.
abhishekagarwal87
approved these changes
Apr 28, 2023
Contributor
abhishekagarwal87
left a comment
thank you for fixing this.
Contributor
Author
There is the usual flaky Kafka slow IT failure, unrelated to this change. Hopefully re-running should fix it.
gianm
reviewed
Apr 28, 2023
Contributor
gianm
left a comment
The production changes and new tests LGTM. I suggest also adding a test for Kafka indexing itself, to future-proof this bug fix against potential future refactors or entire replacements of StreamChunkParser. That test doesn't need to be Avro-specific.
...ions/src/main/java/org/apache/druid/data/input/avro/SchemaRegistryBasedAvroBytesDecoder.java
Contributor
Author
@gianm, thanks for the review. Added a UT to Kafka indexing as well.
abhishekrb19
added a commit
to abhishekrb19/incubator-druid
that referenced
this pull request
May 5, 2023
) In StreamChunkParser#parseWithInputFormat, we call byteEntityReader.read() without handling a potential ParseException, which is thrown during this function call by the delegate AvroStreamReader#intermediateRowIterator. A ParseException can be thrown if an Avro stream has corrupt data, data that doesn't conform to the specified schema, or for other decoding reasons. This exception, if uncaught, can cause ingestion to fail.
abhishekagarwal87
pushed a commit
that referenced
this pull request
May 5, 2023
…14212) In StreamChunkParser#parseWithInputFormat, we call byteEntityReader.read() without handling a potential ParseException, which is thrown during this function call by the delegate AvroStreamReader#intermediateRowIterator. A ParseException can be thrown if an Avro stream has corrupt data, data that doesn't conform to the specified schema, or for other decoding reasons. This exception, if uncaught, can cause ingestion to fail.
Fixes #13894, #14041.
In `StreamChunkParser#parseWithInputFormat`, we call `byteEntityReader.read()` without handling a potential `ParseException`, which is thrown during this function call by the delegate `AvroStreamReader#intermediateRowIterator`.

A `ParseException` can be thrown if an Avro stream has corrupt data, data that doesn't conform to the specified schema, or for other decoding reasons. This exception, if uncaught, can cause ingestion to fail.

Primary code changes:
- Handle `ParseException` in the call stack of `StreamChunkParser`.
- Refactor `StreamChunkParser` to facilitate unit testing. This allows dependencies to be passed in by the caller, making testing easier.

Unit test changes:
- Add tests to `StreamChunkParserTest` and `KafkaIndexTaskTest` to cover this exception handling behavior, with the maximum allowed parse exceptions set to different thresholds.
- Replace `NoopRowIngestionMeters` and a mocked `RowIngestionMeters` with a more realistic implementation, `SimpleRowIngestionMeters`, used to validate any parse exceptions.

Release note
Fixes #13894, #14041.
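For readers unfamiliar with the failure mode, the general pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Druid code: `RecordParseException`, `RecordDecoder`, `ParseFailureTracker`, `parseChunk`, and `TOY_DECODER` are all hypothetical names invented for this sketch. It assumes a decoder that may throw while reading a record, and a tracker that tolerates a configurable number of unparseable records before failing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: none of these types are Druid classes. */
class ParseExceptionSketch {

  /** Hypothetical stand-in for a parse failure raised while decoding a record. */
  static class RecordParseException extends RuntimeException {
    RecordParseException(String message) {
      super(message);
    }
  }

  /** Hypothetical decoder: turns one raw record into rows, or throws on bad input. */
  interface RecordDecoder {
    List<String> decode(String rawRecord);
  }

  /** Hypothetical tracker: counts unparseable records and enforces a ceiling. */
  static class ParseFailureTracker {
    private final int maxAllowed;
    private int unparseable;

    ParseFailureTracker(int maxAllowed) {
      this.maxAllowed = maxAllowed;
    }

    void record(RecordParseException e) {
      unparseable++;
      if (unparseable > maxAllowed) {
        // Only now does a parse failure abort the whole run.
        throw new IllegalStateException("Max parse exceptions [" + maxAllowed + "] exceeded", e);
      }
    }

    int unparseableCount() {
      return unparseable;
    }
  }

  /** Toy decoder (assumption): records starting with '!' are treated as corrupt. */
  static final RecordDecoder TOY_DECODER = raw -> {
    if (raw.startsWith("!")) {
      throw new RecordParseException("Cannot decode record: " + raw);
    }
    return Collections.singletonList(raw.toUpperCase());
  };

  /**
   * Decodes each record, catching parse failures at the call site so that a
   * single bad record is counted by the tracker instead of propagating.
   */
  static List<String> parseChunk(List<String> rawRecords, RecordDecoder decoder, ParseFailureTracker tracker) {
    List<String> rows = new ArrayList<>();
    for (String raw : rawRecords) {
      try {
        rows.addAll(decoder.decode(raw));
      } catch (RecordParseException e) {
        tracker.record(e);
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    ParseFailureTracker tracker = new ParseFailureTracker(1);
    List<String> rows = parseChunk(Arrays.asList("a", "!corrupt", "b"), TOY_DECODER, tracker);
    System.out.println(rows + " unparseable=" + tracker.unparseableCount());
  }
}
```

Passing the decoder and the tracker in as arguments mirrors the refactoring goal mentioned above: a test can supply a decoder that always throws and a tracker with a chosen threshold, without needing a real Kafka or Avro setup.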
This PR has: