
[MINOR] Removing dead code in HiveAvroSerializer#11577

Closed
voonhous wants to merge 2 commits into apache:master from voonhous:minor_remove_dead_code

Conversation

@voonhous
Member

@voonhous voonhous commented Jul 5, 2024

Change Logs

Removing the CASE-INT + DATE if-block, as DATE types coming from the Hive-related entrypoints will have a DATE PrimitiveObjectInspector type.


From my understanding, this logic is mainly copied over from:

case INT:
if (schema.getLogicalType() != null && schema.getLogicalType().getName().equals("date")) {
return HoodieHiveUtils.getDateWriteable((Integer) value);
}

This logic was introduced in this PR:
https://github.com/apache/hudi/pull/2634/files

That switch-case is applied at a different entrypoint (switching on an Avro Schema type). In that case, DATE types will indeed enter the CASE-INT block.

However, in HiveAvroSerializer, the code switches on PrimitiveObjectInspector.PrimitiveCategory instead. When reading DATE types, the CASE-DATE branch is taken instead of CASE-INT, making the block being deleted here code that will never be executed.

On top of that, the entire CASE-INT block can be removed, as it is already handled by the default block.

Therefore, removing it to avoid further confusion.

TL;DR: Remove the case-blocks, as we are switching on different types and the entrypoints are different.
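To illustrate the difference between the two dispatches, here is a minimal, self-contained Java sketch. The enums and method names below are simplified stand-ins, not the real Avro/Hive classes: in Avro, DATE is a logical type carried on an INT schema, while in Hive's PrimitiveCategory, DATE is its own category.

```java
// Self-contained sketch (NOT the real Hudi/Hive classes) of why the two
// switch statements behave differently for DATE values.
public class DispatchSketch {

  // Stand-in for org.apache.avro.Schema.Type: a date column arrives as INT
  // plus a "date" logical type, so the Avro-side switch sees INT.
  enum AvroType { INT, LONG, STRING }

  // Stand-in for PrimitiveObjectInspector.PrimitiveCategory: DATE is its own
  // category, so the Hive-side switch never reaches the INT branch for dates.
  enum PrimitiveCategory { INT, DATE, LONG, STRING }

  // Avro-side dispatch: the date check inside CASE-INT is reachable here.
  static String avroDispatch(AvroType type, String logicalType) {
    switch (type) {
      case INT:
        if ("date".equals(logicalType)) {
          return "date-handling inside CASE-INT";
        }
        return "plain int";
      default:
        return "other";
    }
  }

  // Hive-side dispatch (the HiveAvroSerializer situation): DATE values take
  // CASE-DATE, so a date check inside CASE-INT would be dead code.
  static String hiveDispatch(PrimitiveCategory category) {
    switch (category) {
      case DATE:
        return "CASE-DATE";
      case INT:
        return "CASE-INT";
      default:
        return "default";
    }
  }
}
```

The sketch shows the crux of the argument: the same "date" value hits different branches depending on which type system the switch is written against.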

Impact

None

Risk level (write none, low, medium or high below)

None

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:XS PR with lines of changes in <= 10 label Jul 5, 2024
@danny0405
Contributor

Thanks for the contribution, can you check the Azure CI failures?

@voonhous
Member Author

voonhous commented Jul 6, 2024

Will check on it and ping you again. It seems my initial claim that this section is no longer required is wrong; let me confirm this.

Stack trace of test failure:

2024-07-05T12:00:22.7907465Z [ERROR] Tests run: 296, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 1,364.399 s <<< FAILURE! - in JUnit Vintage
2024-07-05T12:00:22.7921852Z [ERROR] [1] true, true(testWriteLogDuringCompaction(boolean, boolean))  Time elapsed: 3.232 s  <<< ERROR!
2024-07-05T12:00:22.7922367Z java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DateWritable cannot be cast to org.apache.hadoop.io.IntWritable
2024-07-05T12:00:22.7922891Z 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:46)
2024-07-05T12:00:22.7923390Z 	at org.apache.hudi.hadoop.utils.HiveAvroSerializer.serializePrimitive(HiveAvroSerializer.java:314)
2024-07-05T12:00:22.7923796Z 	at org.apache.hudi.hadoop.utils.HiveAvroSerializer.serialize(HiveAvroSerializer.java:210)
2024-07-05T12:00:22.7924221Z 	at org.apache.hudi.hadoop.utils.HiveAvroSerializer.setUpRecordFieldFromWritable(HiveAvroSerializer.java:119)
2024-07-05T12:00:22.7924909Z 	at org.apache.hudi.hadoop.utils.HiveAvroSerializer.serialize(HiveAvroSerializer.java:106)
2024-07-05T12:00:22.7925499Z 	at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.convertArrayWritableToHoodieRecord(RealtimeCompactedRecordReader.java:203)
2024-07-05T12:00:22.7925986Z 	at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.mergeRecord(RealtimeCompactedRecordReader.java:188)
2024-07-05T12:00:22.7926448Z 	at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:128)
2024-07-05T12:00:22.7926893Z 	at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:55)
2024-07-05T12:00:22.7927332Z 	at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:86)
2024-07-05T12:00:22.7927751Z 	at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:36)
2024-07-05T12:00:22.7928213Z 	at org.apache.hudi.testutils.HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(HoodieMergeOnReadTestUtils.java:159)
2024-07-05T12:00:22.7928698Z 	at org.apache.hudi.testutils.HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(HoodieMergeOnReadTestUtils.java:112)
2024-07-05T12:00:22.7929171Z 	at org.apache.hudi.testutils.HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(HoodieMergeOnReadTestUtils.java:106)
2024-07-05T12:00:22.7929628Z 	at org.apache.hudi.testutils.HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(HoodieMergeOnReadTestUtils.java:101)
2024-07-05T12:00:22.7930142Z 	at org.apache.hudi.table.functional.TestHoodieSparkMergeOnReadTableCompaction.readTableTotalRecordsNum(TestHoodieSparkMergeOnReadTableCompaction.java:197)
2024-07-05T12:00:22.7930707Z 	at org.apache.hudi.table.functional.TestHoodieSparkMergeOnReadTableCompaction.testWriteLogDuringCompaction(TestHoodieSparkMergeOnReadTableCompaction.java:189)
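The ClassCastException at the top of the trace comes from an unconditional cast: the int inspector assumes it is handed an IntWritable, but the record actually carries a DateWritable. A minimal, self-contained reproduction of that pattern (with stand-in classes, not the real Hadoop writables):

```java
// Stand-ins for org.apache.hadoop.io.IntWritable and
// org.apache.hadoop.hive.serde2.io.DateWritable: both share a Writable-like
// interface but are otherwise unrelated types.
public class CastSketch {
  interface Writable {}

  static class IntWritable implements Writable {
    final int value;
    IntWritable(int value) { this.value = value; }
    int get() { return value; }
  }

  static class DateWritable implements Writable {
    final int daysSinceEpoch;
    DateWritable(int daysSinceEpoch) { this.daysSinceEpoch = daysSinceEpoch; }
  }

  // Mirrors the shape of WritableIntObjectInspector.getPrimitiveJavaObject:
  // it casts straight to IntWritable, so handing it a DateWritable throws
  // ClassCastException -- the same failure mode as in the CI stack trace.
  static Object getPrimitiveJavaObject(Writable o) {
    return ((IntWritable) o).get();
  }
}
```

This is why declaring the date column as int in the test schema routes a DateWritable through the int inspector and blows up at runtime rather than at schema-resolution time.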

@voonhous
Member Author

voonhous commented Jul 8, 2024

@danny0405 I checked the code and realised that the reason the tests were failing is that the DATE column is defined with an INT type in org.apache.hudi.common.testutils.HoodieTestDataGenerator#TRIP_HIVE_COLUMN_TYPES.

public static final String TRIP_HIVE_COLUMN_TYPES = "bigint,string,string,string,string,string,double,double,double,double,int,bigint,float,binary,int,bigint,decimal(10,6),"
+ "map<string,string>,struct<amount:double,currency:string>,array<struct<amount:double,currency:string>>,boolean";

In our productionised test environments, the HIVE_TYPES for dates are passed as DATE instead of INT types.

On top of that, if values were indeed passed as INT, we should be seeing the same error as described in #11576.
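As a toy illustration (hypothetical helper, not Hudi/Hive code) of why the declared column type matters: the object inspector is selected from the declared Hive type string, so a date column declared as "int" in TRIP_HIVE_COLUMN_TYPES gets an int inspector even though the runtime value is a date writable.

```java
import java.util.Map;

// Toy sketch: inspector choice is driven by the *declared* Hive column type,
// not by the runtime writable. Inspector names mirror the real Hive classes,
// but the mapping itself is a simplified illustration.
public class InspectorSketch {
  static String inspectorFor(String declaredHiveType) {
    Map<String, String> byType = Map.of(
        "int", "WritableIntObjectInspector",
        "date", "WritableDateObjectInspector",
        "bigint", "WritableLongObjectInspector");
    return byType.getOrDefault(declaredHiveType, "unknown");
  }
}
```

Under this mapping, declaring the column as "date" would have selected the date inspector and avoided the cast failure seen in the CI run.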

The return types are as follows:

Hive2

org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDateObjectInspector#getPrimitiveJavaObject


Hive3

org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDateObjectInspector#getPrimitiveJavaObject


@danny0405
Contributor


Great findings, maybe we can just keep the pattern match for INT in case there are some rare cases.

@hudi-bot
Collaborator

hudi-bot commented Jul 8, 2024

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@voonhous
Member Author

voonhous commented Jul 8, 2024

Sure, will close this then. I'll revisit this again when I have time.

@voonhous voonhous closed this Jul 8, 2024
@voonhous voonhous deleted the minor_remove_dead_code branch December 20, 2025 10:09
