[SUPPORT] spark executors died due to underestimated record size #5939
Labels
priority:major, spark, writer-core
Describe the problem you faced
Hi Hudi team! Some of my Spark executors intermittently die. When I look into the tasks assigned to the dead executors, those tasks were writing parquet files over 320MB, according to the logs of other executors that later completed the same tasks. However, our PARQUET_MAX_FILE_SIZE is set to 100MB. I also noticed "AvgRecordSize => 26" in the driver log on runs where executors die, while AvgRecordSize is usually above 100 on runs where they don't. My guess is that the underestimated record size makes Hudi load more records into memory than an executor can handle, so it dies from an out-of-memory error.
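For intuition, here is a small self-contained sketch (not Hudi code) of why I suspect the estimate. Assuming insert sizing works out to roughly maxFileSize / avgRecordSize records per file, as the driver log suggests, the numbers below are illustrative only:

```scala
// Illustrative sketch: how an underestimated average record size inflates
// the number of records packed into one file group. The sizing formula is
// an assumption based on the driver log, not Hudi's actual implementation.
object RecordSizeEstimateDemo {
  // 100MB cap, matching our PARQUET_MAX_FILE_SIZE
  val maxFileSizeBytes: Long = 100L * 1024 * 1024

  // Approximate records assigned to a file under a given size estimate
  def recordsPerFile(avgRecordSizeBytes: Long): Long =
    maxFileSizeBytes / avgRecordSizeBytes

  def main(args: Array[String]): Unit = {
    println(s"avgRecordSize=100 -> ${recordsPerFile(100)} records per file")
    println(s"avgRecordSize=26  -> ${recordsPerFile(26)} records per file")
    // ~3.8x more records under the low estimate, which is consistent with
    // the >320MB files we observed against the 100MB cap.
  }
}
```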
So I took two steps here.
My setup:
Expected behavior
Hudi should estimate record sizes accurately enough that Spark executors are not killed by out-of-memory errors.
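If the estimate is indeed the cause, one possible mitigation might be to pin the average record size explicitly so Hudi does not derive a low dynamic estimate from commit metadata. A minimal sketch, assuming the standard Hudi Spark datasource options; the table name and the 100-byte value are illustrative, and I have not verified this against our workload:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object PinnedEstimateWrite {
  def write(df: DataFrame, basePath: String): Unit = {
    df.write
      .format("hudi")
      .option("hoodie.table.name", "my_table")             // hypothetical name
      .option("hoodie.parquet.max.file.size", "104857600") // 100MB, as configured
      // Pin the average record size (bytes) so Hudi skips the dynamic
      // estimate (e.g. 26); 100 matches our healthy runs.
      .option("hoodie.copyonwrite.record.size.estimate", "100")
      .mode(SaveMode.Append)
      .save(basePath)
  }
}
```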
Environment Description
Hudi version : 0.11.0
Spark version : 3.1.2
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No