
[SUPPORT]xxx.parquet is not a Parquet file (length is too low: 0) #8674

@hbgstc123

Description

Describe the problem you faced

We have a Flink job writing to a Hudi table. HDFS jitter caused the Flink tasks to fail over, after which we saw the error below.

To Reproduce

Steps to reproduce the behavior:

(May only reproduce intermittently.)

1. Run a Flink write job with online clustering enabled.
2. A task fails over while StreamWriteOperatorCoordinator is starting a new instant.
3. The task fails over again.
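The symptom suggests that the interrupted write left a zero-byte parquet file behind, so the later clustering read finds no footer to parse. As a hedged illustration only (this is not Hudi's actual recovery logic; the class name and threshold below are assumptions), a reader could screen out such files before handing them to the Parquet reader:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ParquetLengthGuard {
    // A valid Parquet file needs at least the 4-byte "PAR1" header magic,
    // a 4-byte footer length, and the 4-byte footer magic: 12 bytes minimum.
    static final long MIN_PARQUET_LENGTH = 12;

    // Hypothetical guard: report files too short to hold a Parquet footer,
    // instead of letting the footer read throw mid-clustering.
    static boolean isReadableParquet(Path file) throws IOException {
        return Files.exists(file) && Files.size(file) >= MIN_PARQUET_LENGTH;
    }

    public static void main(String[] args) throws IOException {
        // A freshly created temp file is zero-length, like the failed write.
        Path empty = Files.createTempFile("empty", ".parquet");
        System.out.println(isReadableParquet(empty)); // prints false
        Files.delete(empty);
    }
}
```

Whether such a file should be skipped, retried, or treated as data loss depends on whether the write that produced it was ever committed to the Hudi timeline.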

Expected behavior

No error; clustering should complete successfully.

Environment Description

  • Hudi version : 0.12

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no

Additional context

Stacktrace

2023-05-04 12:18:05,125 ERROR org.apache.hudi.sink.clustering.ClusteringOperator           [] - Executor executes action [Execute clustering for instant 20230504110903788 from task 0] error
org.apache.hudi.exception.HoodieException: unable to read next record from parquet file 
	at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:53) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811) ~[?:1.8.0_252]
	at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:295) ~[?:1.8.0_252]
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207) ~[?:1.8.0_252]
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162) ~[?:1.8.0_252]
	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301) ~[?:1.8.0_252]
	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) ~[?:1.8.0_252]
	at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at org.apache.hudi.sink.clustering.ClusteringOperator.doClustering(ClusteringOperator.java:273) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$processElement$0(ClusteringOperator.java:196) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.lang.RuntimeException: hdfs://A1/projects/hive/dev_db/hudi_table/region=ctry_2/date=20230504/subp=subp_1/22903630-b6dc-42c1-97f5-fbd6d6e7fbff-9_1-2-1_20230504102509969.parquet is not a Parquet file (length is too low: 0)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:539) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:776) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:48) ~[175020faf1b549f5886323df8573cb93:0.12.1]
	... 13 more
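For context, the "is not a Parquet file (length is too low: 0)" message comes from parquet-mr's footer validation, which rejects the file before any record is read: a valid Parquet file must contain at least the 4-byte "PAR1" header magic, a 4-byte footer length, and the 4-byte footer magic. A rough sketch of that check (simplified; the class name here is hypothetical, not the real ParquetFileReader code):

```java
import java.nio.charset.StandardCharsets;

public class FooterCheck {
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Simplified version of the length validation done before reading the
    // footer: header magic + 4-byte footer-length int + footer magic.
    // A zero-length file fails immediately with this message.
    static void checkLength(String path, long fileLen) {
        if (fileLen < MAGIC.length + 4 + MAGIC.length) {
            throw new RuntimeException(
                path + " is not a Parquet file (length is too low: " + fileLen + ")");
        }
    }

    public static void main(String[] args) {
        try {
            checkLength("example.parquet", 0);
        } catch (RuntimeException e) {
            // prints: example.parquet is not a Parquet file (length is too low: 0)
            System.out.println(e.getMessage());
        }
    }
}
```

So the stacktrace indicates the file on HDFS is genuinely empty, not that the reader is misconfigured.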

Status: ⏳ Awaiting Triage