
[SUPPORT] HoodieRealtimeRecordReader can only work on RealtimeSplit and not with hdfs://111.parquet:0+4 #2813

Closed
MyLanPangzi opened this issue Apr 13, 2021 · 13 comments
Labels
hive Issues related to hive

@MyLanPangzi
Contributor

MyLanPangzi commented Apr 13, 2021

Describe the problem you faced

Flink writes to a MOR table, but the newest data cannot be read via Hive aggregate queries.

To Reproduce

Steps to reproduce the behavior:

1. Write to a MOR table with Flink.
2. Create a Hive external table using org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat:

CREATE EXTERNAL TABLE dwd_sale_sale_detail_rt
(
    `_hoodie_commit_time`    STRING,
    `_hoodie_commit_seqno`   STRING,
    `_hoodie_record_key`     STRING,
    `_hoodie_partition_path` STRING,
    `_hoodie_file_name`      STRING,
    shopid                   STRING,
    salevalue                DECIMAL(10, 2) -- precision assumed; DECIMAL(1,2) is invalid since scale cannot exceed precision
) partitioned by (`sdt` string)
    ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT
        'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
        OUTPUTFORMAT
            'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION 'hdfs://nameservice1/user/xiebo/hudi/dwd/dwd_sale_sale_detail_rt';

3. Run an aggregate query from the Hive shell; the error below occurs.

Expected behavior

Hive queries the MOR table correctly and returns the aggregate result.

Environment Description

  • Hudi version : 0.9.0

  • Spark version :

  • Hive version : 1.1 cdh 5.6.12

  • Hadoop version : 2.6 cdh 5.6.12

  • Storage (HDFS/S3/GCS..) : hdfs

  • Running on Docker? (yes/no) : no


Stacktrace

2021-04-13 17:05:45,815 INFO [IPC Server handler 6 on 46363] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1594105654926_12624744_m_000012_0 is : 0.0
2021-04-13 17:05:45,818 FATAL [IPC Server handler 8 on 46363] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1594105654926_12624744_m_000012_0 - exited : java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:267)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:213)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:334)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:734)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:253)
... 11 more
Caused by: java.lang.IllegalArgumentException: HoodieRealtimeRecordReader can only work on RealtimeSplit and not with hdfs://nameservice1/user/hudi/dwd/dwd_sale_sale_detail_rt/20210413/ab5a8ff3-4647-46ae-ba13-7b6eb7914516_8-10-0_20210413170058.parquet:0+57883047
at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:117)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:68)

@bvaradar
Contributor

cc @n3nash

Can you try setting hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat and running the query again?
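In the Hive shell, that suggestion would look like the following (a sketch; the aggregate query itself is illustrative, reusing the table from the report):

```sql
-- Session-level setting, run before the query (value from the suggestion above)
SET hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;

-- Illustrative aggregate query against the table from the report
SELECT sdt, SUM(salevalue)
FROM dwd_sale_sale_detail_rt
GROUP BY sdt;
```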

@n3nash n3nash self-assigned this Apr 13, 2021
@n3nash n3nash added awaiting-community-help hive Issues related to hive labels Apr 13, 2021
@qianjiangbing

I tried setting hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat, but another error occurred:

Diagnostic Messages for this Task:
Error: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch cannot be cast to org.apache.hadoop.io.ArrayWritable
at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.createValue(RealtimeCompactedRecordReader.java:150)
at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.createValue(RealtimeCompactedRecordReader.java:43)
at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.createValue(HoodieRealtimeRecordReader.java:89)
at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.createValue(HoodieRealtimeRecordReader.java:36)
at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.createValue(RealtimeCompactedRecordReader.java:150)
at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.createValue(RealtimeCompactedRecordReader.java:43)
at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.createValue(HoodieRealtimeRecordReader.java:89)
at org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader.createValue(HoodieCombineRealtimeRecordReader.java:87)
at org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader.createValue(HoodieCombineRealtimeRecordReader.java:41)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:186)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

@n3nash
Contributor

n3nash commented Apr 14, 2021

@qianjiangbing @bvaradar There is a ticket created for this -> https://issues.apache.org/jira/browse/HUDI-1036. I will look into this later this week. This looks like a legitimate issue

@MyLanPangzi
Contributor Author

I've updated the issue description. The error occurs when I run an aggregate query on the MOR table using org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.

@qianjiangbing

I tested with Hive 2.3.8 and it works.

@n3nash
Contributor

n3nash commented Apr 26, 2021

@qianjiangbing Thanks for confirming.

@MyLanPangzi I just noticed that you were using Hive version : 1.1 cdh 5.6.12. This is a very old version of Hive. The latest Hudi builds only work with Hive 2.x+ versions. Are you able to migrate to a higher version of Hive ?

@MyLanPangzi
Contributor Author

> @qianjiangbing Thanks for confirming.
>
> @MyLanPangzi I just noticed that you were using Hive version : 1.1 cdh 5.6.12. This is a very old version of Hive. The latest Hudi builds only work with Hive 2.x+ versions. Are you able to migrate to a higher version of Hive ?

Sorry, I can't upgrade the cluster version, so the only option in my cluster is to use org.apache.hudi.hadoop.HoodieParquetInputFormat for MOR tables.
Can I close this issue if there is no fix planned?
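For reference, a read-optimized fallback DDL along those lines could look like this (a sketch: the `_ro` table name and DECIMAL precision are assumptions, the column list is abbreviated, and queries through this input format see only compacted base files, not the latest log data):

```sql
-- Same table redefined with the read-optimized input format
-- (dwd_sale_sale_detail_ro is an illustrative name)
CREATE EXTERNAL TABLE dwd_sale_sale_detail_ro
(
    `_hoodie_commit_time` STRING,
    shopid                STRING,
    salevalue             DECIMAL(10, 2)
) PARTITIONED BY (`sdt` STRING)
    ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT
        'org.apache.hudi.hadoop.HoodieParquetInputFormat'
    OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION 'hdfs://nameservice1/user/xiebo/hudi/dwd/dwd_sale_sale_detail_rt';
```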

@nsivabalan
Contributor

CC @n3nash

@stayrascal
Contributor

@n3nash @nsivabalan May I check whether this feature (aggregate queries on the RT table) works in Hive 3.1.2? I hit the same class-cast exception, as shown below:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch cannot be cast to org.apache.hadoop.io.ArrayWritable
	at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.createValue(RealtimeCompactedRecordReader.java:183)
	at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.createValue(RealtimeCompactedRecordReader.java:47)
	at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.createValue(HoodieRealtimeRecordReader.java:89)
	at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.createValue(HoodieRealtimeRecordReader.java:36)
	at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:58)
	at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:33)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.createValue(TezGroupedSplitsInputFormat.java:160)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:168)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:83)
	at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:706)
	at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:665)
	at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
	at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:525)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:171)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)

I'm not sure whether this feature has been verified on Hive 3; I'm using Hive 3.1.2 and Hudi 0.12.2.

@danny0405
Contributor

You may need to turn off vectorized execution in Hive.
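In Hive this is controlled by the vectorized-execution properties, e.g. at the session level (a sketch of the suggested workaround):

```sql
-- Disable vectorized execution so the realtime reader receives
-- ArrayWritable rows instead of VectorizedRowBatch
SET hive.vectorized.execution.enabled=false;
SET hive.vectorized.execution.reduce.enabled=false;
```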

@stayrascal
Contributor

thanks @danny0405, it works.

@stayrascal
Contributor

stayrascal commented Feb 24, 2023

@danny0405 One more question: after disabling vectorization, count(*) on the RT table works, but select * returns empty content for an RT table that has not been compacted yet (only log files, no base file). Is there any parameter I need to set?

0: jdbc:hive2://xxxxxx-1:10000/> select count(*) from flink_hudi_mor_tbl_rt;
+------+
| _c0  |
+------+
| 16   |
+------+
1 row selected (6.141 seconds)

0: jdbc:hive2://xxxxxx-1:10000/> select * from flink_hudi_mor_tbl_rt;
+--------------------------------------------+---------------------------------------------+-------------------------------------------+-----------------------------------------------+------------------------------------------+-----------------------------+-----------------------------+----------------------------+---------------------------+----------------------------------+
| flink_hudi_mor_tbl_rt._hoodie_commit_time  | flink_hudi_mor_tbl_rt._hoodie_commit_seqno  | flink_hudi_mor_tbl_rt._hoodie_record_key  | flink_hudi_mor_tbl_rt._hoodie_partition_path  | flink_hudi_mor_tbl_rt._hoodie_file_name  | flink_hudi_mor_tbl_rt.uuid  | flink_hudi_mor_tbl_rt.name  | flink_hudi_mor_tbl_rt.age  | flink_hudi_mor_tbl_rt.ts  | flink_hudi_mor_tbl_rt.partition  |
+--------------------------------------------+---------------------------------------------+-------------------------------------------+-----------------------------------------------+------------------------------------------+-----------------------------+-----------------------------+----------------------------+---------------------------+----------------------------------+
+--------------------------------------------+---------------------------------------------+-------------------------------------------+-----------------------------------------------+------------------------------------------+-----------------------------+-----------------------------+----------------------------+---------------------------+----------------------------------+
No rows selected (0.137 seconds)

0: jdbc:hive2://xxxx-1:10000/> set hive.input.format;
+----------------------------------------------------+
|                        set                         |
+----------------------------------------------------+
| hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat |
+----------------------------------------------------+
1 row selected (0.006 seconds)
0: jdbc:hive2://xxxx-1:10000/> set hive.tez.input.format;
+----------------------------------------------------+
|                        set                         |
+----------------------------------------------------+
| hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat |
+----------------------------------------------------+
1 row selected (0.006 seconds)

It seems that select * from flink_hudi_mor_tbl_rt; didn't submit any job, but it works when querying an RT table that has had some compaction commits.

@danny0405
Contributor

An RT table backed by pure log files is not well supported for Hive queries; you may need to switch to the RO table instead.
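Assuming the sync also registered a read-optimized companion table (the `_ro` suffix is the usual Hudi naming convention; the exact table name here is an assumption), the query would be:

```sql
-- Query the read-optimized view, which reads only compacted base files
SELECT * FROM flink_hudi_mor_tbl_ro;
```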
