[MINOR] Removing dead code in HiveAvroSerializer#11577
[MINOR] Removing dead code in HiveAvroSerializer#11577voonhous wants to merge 2 commits intoapache:masterfrom
Conversation
|
Thanks for the contribution, can you check the Azure CI failures? |
|
Will check on it and ping you again. Seems like my initial claim of this section being no longer required is wrong, let me confirm this. Stack trace of test failure: 2024-07-05T12:00:22.7907465Z [ERROR] Tests run: 296, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 1,364.399 s <<< FAILURE! - in JUnit Vintage
2024-07-05T12:00:22.7921852Z [ERROR] [1] true, true(testWriteLogDuringCompaction(boolean, boolean)) Time elapsed: 3.232 s <<< ERROR!
2024-07-05T12:00:22.7922367Z java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DateWritable cannot be cast to org.apache.hadoop.io.IntWritable
2024-07-05T12:00:22.7922891Z at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:46)
2024-07-05T12:00:22.7923390Z at org.apache.hudi.hadoop.utils.HiveAvroSerializer.serializePrimitive(HiveAvroSerializer.java:314)
2024-07-05T12:00:22.7923796Z at org.apache.hudi.hadoop.utils.HiveAvroSerializer.serialize(HiveAvroSerializer.java:210)
2024-07-05T12:00:22.7924221Z at org.apache.hudi.hadoop.utils.HiveAvroSerializer.setUpRecordFieldFromWritable(HiveAvroSerializer.java:119)
2024-07-05T12:00:22.7924909Z at org.apache.hudi.hadoop.utils.HiveAvroSerializer.serialize(HiveAvroSerializer.java:106)
2024-07-05T12:00:22.7925499Z at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.convertArrayWritableToHoodieRecord(RealtimeCompactedRecordReader.java:203)
2024-07-05T12:00:22.7925986Z at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.mergeRecord(RealtimeCompactedRecordReader.java:188)
2024-07-05T12:00:22.7926448Z at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:128)
2024-07-05T12:00:22.7926893Z at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:55)
2024-07-05T12:00:22.7927332Z at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:86)
2024-07-05T12:00:22.7927751Z at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:36)
2024-07-05T12:00:22.7928213Z at org.apache.hudi.testutils.HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(HoodieMergeOnReadTestUtils.java:159)
2024-07-05T12:00:22.7928698Z at org.apache.hudi.testutils.HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(HoodieMergeOnReadTestUtils.java:112)
2024-07-05T12:00:22.7929171Z at org.apache.hudi.testutils.HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(HoodieMergeOnReadTestUtils.java:106)
2024-07-05T12:00:22.7929628Z at org.apache.hudi.testutils.HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(HoodieMergeOnReadTestUtils.java:101)
2024-07-05T12:00:22.7930142Z at org.apache.hudi.table.functional.TestHoodieSparkMergeOnReadTableCompaction.readTableTotalRecordsNum(TestHoodieSparkMergeOnReadTableCompaction.java:197)
2024-07-05T12:00:22.7930707Z at org.apache.hudi.table.functional.TestHoodieSparkMergeOnReadTableCompaction.testWriteLogDuringCompaction(TestHoodieSparkMergeOnReadTableCompaction.java:189) |
|
@danny0405 I checked the code and realised that the reason why the tests were failing because the In our productionised tests environment, the HIVE_TYPES for dates are passed by On top of that, if values are indeed passed as INT, we should be seeing the same error as what was described in #11576. The return types for are as such: Hive2org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDateObjectInspector#getPrimitiveJavaObject Hive3org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDateObjectInspector#getPrimitiveJavaObject |
Great findings, maybe we can just keep the pattern match for |
|
Sure, will close this then. I'll revisit this again when i have time. |




Change Logs
Removing the case
INT+ DATE if - block as DATE types from the Hive related entrypoints will have a DATEPrmitiveFieldObjectInspectortype.From my understanding, this logic is mainly copied over from:
hudi/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java
Lines 178 to 181 in 70f44ef
Of which, was introduced in this PR here:
https://github.com/apache/hudi/pull/2634/files
The
switch-casefor this is applied on a different entrypoint (switching an Avro Schema type). For such cases, DATE types will most definitely enter theCASE-INTblock.However, in
HiveAvroSerializer, the code is switching byPrimitiveObjectInspector.PrimitiveCategoryinstead. When reading DATE types, theCASE-DATEwill be used instead ofCASE-INT, making the block that we are deleting here a piece of code that will never be used.On top of that, the entire
CASE-INTblock can be removed as it is already handled by thedefaultblockTherefore, removing it to avoid further confusion.
TLDR: Remove case-blocks as we are switching by different types and entrypoints are different
Impact
None
Risk level (write none, low medium or high below)
None
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist