[MINOR] Handle cases of malformed records when converting to json #10943
danny0405 merged 5 commits into apache:master from
// NullPointerException will be thrown in cases where the field values are missing
// ClassCastException will be thrown in cases where the field values do not match the schema type
// Fallback to using `toString` which also returns json but without a pretty-print option
out.write(record.toString().getBytes(StandardCharsets.UTF_8));
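To make the two failure modes and the fallback concrete, here is a stdlib-only sketch. It does not use Avro or Hudi: `LooseRecord`, `strictToJson`, `writeJson`, and the `Map<String, Class<?>>` "schema" are hypothetical stand-ins. The strict encoder throws `NullPointerException` for a missing value and `ClassCastException` for a wrongly typed one, and the fallback renders the record without any validation, roughly as `record.toString()` does in the excerpt.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class JsonFallbackSketch {

  // Hypothetical record type: holds field values without enforcing any schema.
  static final class LooseRecord {
    final Map<String, Object> values = new LinkedHashMap<>();

    LooseRecord put(String name, Object value) {
      values.put(name, value);
      return this;
    }

    // Lenient JSON-like rendering with no type or nullability checks,
    // playing the role of record.toString() in the excerpt.
    @Override
    public String toString() {
      StringBuilder sb = new StringBuilder("{");
      for (Map.Entry<String, Object> e : values.entrySet()) {
        appendField(sb, e.getKey(), e.getValue());
      }
      return sb.append('}').toString();
    }
  }

  // Appends "name": value, separating fields with ", ". Strings are quoted
  // (without escaping, for brevity); other values, including null, print as-is.
  static void appendField(StringBuilder sb, String name, Object value) {
    if (sb.length() > 1) {
      sb.append(", ");
    }
    sb.append('"').append(name).append("\": ");
    if (value instanceof String) {
      sb.append('"').append(value).append('"');
    } else {
      sb.append(value);
    }
  }

  // Hypothetical strict encoder: every schema field is required and must have
  // the declared Java type, so bad data fails in the two ways the excerpt describes.
  static String strictToJson(Map<String, Class<?>> schema, LooseRecord record) {
    StringBuilder sb = new StringBuilder("{");
    for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
      // Missing value -> NullPointerException
      Object value = Objects.requireNonNull(record.values.get(field.getKey()), field.getKey());
      // Value of the wrong type -> ClassCastException
      appendField(sb, field.getKey(), field.getValue().cast(value));
    }
    return sb.append('}').toString();
  }

  // Try the strict path first; on either failure, fall back to the unvalidated toString().
  static void writeJson(OutputStream out, Map<String, Class<?>> schema, LooseRecord record)
      throws IOException {
    String json;
    try {
      json = strictToJson(schema, record);
    } catch (NullPointerException | ClassCastException e) {
      json = record.toString();
    }
    out.write(json.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    Map<String, Class<?>> schema = new LinkedHashMap<>();
    schema.put("id", Integer.class);
    schema.put("name", String.class);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // "name" is present but null, so the strict path throws and the fallback is used.
    writeJson(out, schema, new LooseRecord().put("id", 1).put("name", null));
    System.out.println(out.toString(StandardCharsets.UTF_8.name())); // prints {"id": 1, "name": null}
  }
}
```

In this sketch the JSON string is built before anything is written, so a failed strict attempt never leaves partial bytes in `out`; whether the actual Hudi code has that property is not visible from the excerpt.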
What do we get for record.toString when NullPointerException is thrown?
Hmm, it seems to be a null constant for the empty field.
So when the transformed JSON string got converted back into Avro, the schema could change, right?
You get a string that represents the object as JSON; it does not do any validation of types or nullability. See the added tests for an example.
> So when the transformed JSON string got converted back into Avro, the schema could change, right?
The case here is when you have some data, try to convert it to Avro, and the conversion fails: https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerUtils.java#L164
OK, I'm not familiar with that flow. If this breaks that flow, I can just make a new method.
If the schema does not really change in that case, it is okay. Maybe we can add some use cases for it.
One concern I have is that this could hide an exception, so we might miss a problem during initial testing of some more critical, timeline-related flow. I think I've convinced myself there should just be a new method, something like "safeToJson", that does not throw an exception and that we use only in the error table/writer cases, since those are less critical to Hudi.
> think I've convinced myself there should just be a new method like "safeToJson" that does not throw an exception that we use in the error table/writer cases since those are not as critical to Hudi.
+1
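The split agreed on here (keep a strict conversion that still throws, and add a non-throwing variant used only by the error table/writer) could look roughly like the sketch below. The pushed implementation is not shown on this page, so `toJson`, the signature of `safeToJson`, and the `Function`-based encoder parameter are all hypothetical; the fallback catches only the two exceptions named in the code excerpt.

```java
import java.util.Objects;
import java.util.function.Function;

public final class SafeJsonSketch {

  private SafeJsonSketch() {}

  // Strict variant: any exception from the encoder propagates, so malformed data
  // still fails loudly in critical flows and in tests.
  static <T> String toJson(T record, Function<T, String> strictEncoder) {
    return strictEncoder.apply(record);
  }

  // Hypothetical "safeToJson": swallows the two failure modes from the excerpt
  // (NullPointerException, ClassCastException) and returns an unvalidated
  // String.valueOf(record) instead. Meant only for best-effort paths such as
  // the error table writer.
  static <T> String safeToJson(T record, Function<T, String> strictEncoder) {
    try {
      return strictEncoder.apply(record);
    } catch (NullPointerException | ClassCastException e) {
      return String.valueOf(record);
    }
  }

  public static void main(String[] args) {
    // Hypothetical strict encoder: requires a non-null Integer.
    Function<Object, String> strict =
        v -> "{\"id\": " + (Integer) Objects.requireNonNull(v) + "}";

    System.out.println(toJson(7, strict));         // prints {"id": 7}
    System.out.println(safeToJson("abc", strict)); // prints abc (fallback)
    try {
      toJson("abc", strict);
    } catch (ClassCastException e) {
      System.out.println("strict variant threw " + e.getClass().getSimpleName());
    }
  }
}
```

Because the strict method is left unchanged, a regression in a critical flow still surfaces as an exception; only callers that explicitly opt in to the safe variant get the fallback.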
I've pushed an update.
Change Logs
Handles cases of missing required fields and bad input values when converting to JSON. This conversion is used together with the error table, so you cannot assume that the records are properly formatted.
Impact
Avoids exceptions being thrown when malformed input data is sent to the error table writer.
Risk level (write none, low, medium or high below)
None
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make changes to the website.
Contributor's checklist