
[SUPPORT] Upsert job failing while upgrading from 0.7 to 0.10.1 #7574

@amitbans

Description



Describe the problem you faced

We are upgrading Hudi from 0.7 to 0.10.1 (as part of moving from EMR 5.33.1 to EMR 5.36.0) and are seeing stage failures at the stage "Doing partition and writing data: isEmpty at HoodieSparkSqlWriter.scala:627". We have tried increasing executor memory from 30g to 50g, but the error persists. We have set Spark parallelism, shuffle partitions, and hoodie.upsert.shuffle.parallelism to 200, but this particular stage seems to compute fewer tasks than that, leading to OOM.
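Since the actual write code isn't included in the report, here is a hedged sketch of what the upsert in question presumably looks like, with the parallelism settings mentioned above applied. The option keys are real Hudi 0.10.x configs, but the record key / precombine fields, table name, and S3 path are placeholders, not taken from the report:

```scala
// Sketch only: assumes a SparkSession `spark` and an input DataFrame `df`.
// Field names, table name, and path are hypothetical placeholders.
df.write.format("hudi")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "id")   // placeholder key field
  .option("hoodie.datasource.write.precombine.field", "ts")  // placeholder precombine field
  .option("hoodie.upsert.shuffle.parallelism", "200")        // setting from the report
  .option("hoodie.table.name", "my_table")                   // placeholder table name
  .mode("append")
  .save("s3://bucket/path")                                  // placeholder path
```

Note that in 0.10.x the `isEmpty` stage's task count is driven by the input's partitioning rather than by `hoodie.upsert.shuffle.parallelism` alone, so repartitioning the input DataFrame before the write may also be relevant here.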



Environment Description

  • Hudi version : 0.10.1

  • Spark version : Spark 2.4.8

  • Hive version : Hive 2.3.9

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no


Metadata

Assignees

No one assigned

    Labels

    • area:performance — Performance optimizations
    • area:writer — Write client and core write operations
    • priority:high — Significant impact; potential bugs


    Projects

    Status

    👤 User Action
