area:table-service (Table services), engine:flink (Flink integration), issue:data-consistency (Data consistency issues: duplicates/phantoms), priority:high (Significant impact; potential bugs)
Description
Describe the problem you faced
In a Flink job writing to Hudi, an HDFS jitter caused the Flink tasks to fail over, and we then see the error below.
To Reproduce
Steps to reproduce the behavior:
(Reproduces intermittently.)
1. Run a Flink write job with online clustering enabled.
2. A task fails over while StreamWriteOperatorCoordinator is starting a new instant.
3. The task fails over again.
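For reference, online clustering for a Flink Hudi sink is typically enabled through table options like the sketch below. The option names are taken from Hudi's Flink configuration documentation as I recall them; verify them against your Hudi version (table name and paths here are placeholders):

```sql
-- Hypothetical sink definition; 'hudi_table' and the path are placeholders.
CREATE TABLE hudi_table (
  id BIGINT,
  region STRING,
  `date` STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_table',
  'table.type' = 'COPY_ON_WRITE',
  -- schedule clustering plans from the write job
  'clustering.schedule.enabled' = 'true',
  -- execute clustering inline in the same pipeline (the setup in step 1)
  'clustering.async.enabled' = 'true'
);
```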
Expected behavior
No error; the job should recover cleanly after failover and clustering should complete.
Environment Description
- Hudi version: 0.12
- Storage (HDFS/S3/GCS..): HDFS
- Running on Docker? (yes/no): no
Additional context
Stacktrace
2023-05-04 12:18:05,125 ERROR org.apache.hudi.sink.clustering.ClusteringOperator [] - Executor executes action [Execute clustering for instant 20230504110903788 from task 0] error
org.apache.hudi.exception.HoodieException: unable to read next record from parquet file
at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:53) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811) ~[?:1.8.0_252]
at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:295) ~[?:1.8.0_252]
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207) ~[?:1.8.0_252]
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162) ~[?:1.8.0_252]
at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301) ~[?:1.8.0_252]
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) ~[?:1.8.0_252]
at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at org.apache.hudi.sink.clustering.ClusteringOperator.doClustering(ClusteringOperator.java:273) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$processElement$0(ClusteringOperator.java:196) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.lang.RuntimeException: hdfs://A1/projects/hive/dev_db/hudi_table/region=ctry_2/date=20230504/subp=subp_1/22903630-b6dc-42c1-97f5-fbd6d6e7fbff-9_1-2-1_20230504102509969.parquet is not a Parquet file (length is too low: 0)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:539) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:776) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135) ~[175020faf1b549f5886323df8573cb93:0.12.1]
at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:48) ~[175020faf1b549f5886323df8573cb93:0.12.1]
... 13 more