[SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue) #5456

@Neuw84

Describe the problem you faced

When trying to run HoodieDeltaStreamer on AWS Glue, I found that the Spark master cannot be inherited from the environment because it defaults to local[2]. In this kind of serverless environment, where you do not have access to the master, this configuration should be inherited.

This can be seen on line 329 of HoodieDeltaStreamer:

public String sparkMaster = "local[2]";

This should be changed to support this kind of scenario: there should be a way to create the JavaSparkContext without a Spark master being defined.
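To make the idea concrete, here is a minimal sketch of what I have in mind. It is not Hudi code: `SparkMasterSketch` and `buildConf` are hypothetical names, and a plain `Map` stands in for `SparkConf` so the snippet runs without Spark on the classpath. The assumption is that the master option would default to null instead of local[2], and that `spark.master` would only be set when a value was actually passed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi. A plain Map stands in for SparkConf
// so the sketch has no Spark dependency.
final class SparkMasterSketch {

    // Builds the settings that would be handed to SparkConf.
    // cliMaster is the user-supplied master, or null when none was given
    // (assumes the field no longer defaults to "local[2]").
    static Map<String, String> buildConf(String appName, String cliMaster) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.app.name", appName);
        // Set spark.master only when the user supplied one; otherwise leave it
        // unset so the runtime (for example AWS Glue) can provide it.
        if (cliMaster != null && !cliMaster.trim().isEmpty()) {
            conf.put("spark.master", cliMaster.trim());
        }
        return conf;
    }
}
```

With this shape, `buildConf("delta-streamer", null)` leaves `spark.master` out entirely, while `buildConf("delta-streamer", "local[2]")` keeps today's behavior for anyone who asks for it explicitly. In the real code the same check would presumably guard the call to `SparkConf.setMaster`.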

Expected behavior

The Spark master shouldn't have a default, as there are some environments (usually serverless ones such as AWS Glue) where it will be inherited.

Environment Description

  • Hudi version : 0.9.0

  • Spark version : Spark 3.1.1

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

Additional context

If required, I think I could work on this, as I have quite good Java experience.
