
[SPARK-45756][CORE] Support spark.master.useAppNameAsAppId.enabled #43743

Closed · wants to merge 2 commits

Conversation

@dongjoon-hyun (Member) commented Nov 10, 2023

What changes were proposed in this pull request?

This PR aims to support `spark.master.useAppNameAsAppId.enabled` as an experimental feature in the Spark Standalone cluster.

Why are the changes needed?

This allows users to control the appID completely.

![Screenshot 2023-11-09 at 5 33 45 PM](https://github.com/apache/spark/assets/9700541/ad2b89ce-9d7d-4144-bd52-f29b94051103)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual tests with the following procedure.

```
$ SPARK_MASTER_OPTS="-Dspark.master.useAppNameAsAppId.enabled=true" sbin/start-master.sh
$ bin/spark-shell --master spark://max.local:7077
```
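One hedged way to confirm the effect from the shell session above (assuming spark-shell's default app name, "Spark shell", and the normalization in this PR):

```scala
// In the spark-shell started above, the application ID should follow the
// app name instead of the usual app-yyyyMMddHHmmss-NNNN scheme.
// The value below is an expectation, not a captured output.
scala> sc.applicationId
res0: String = sparkshell
```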

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions bot added the CORE label Nov 10, 2023
@dongjoon-hyun (Member, Author)

Could you review this experimental feature, @viirya?

Comment on lines +1045 to +1049

```scala
val appId = if (useAppNameAsAppId) {
  desc.name.toLowerCase().replaceAll("\\s+", "")
} else {
  newApplicationId(date)
}
```
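A minimal sketch of what this derivation yields, assuming the same normalization (the values are illustrative):

```scala
// Illustrative only: the appName is lowercased and all whitespace is removed,
// so identically (or near-identically) named applications collapse to one ID.
val derive = (name: String) => name.toLowerCase().replaceAll("\\s+", "")
derive("My Spark App")  // "mysparkapp"
derive("my spark APP")  // "mysparkapp" -- the same ID, hence the question below
```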
Member
Hmm, how does Spark respond to a duplicate appId, if two submitted applications have the same app name?

Member
Should we add `nextAppNumber` as a suffix to the appId, as usual?

Member Author
This experimental design is for advanced users who maintain unique appNames.

@viirya (Member) left a comment
Looks good, especially since this is still an internal config.

@dongjoon-hyun (Member, Author)

Thank you, @viirya!

dongjoon-hyun changed the title from [SPARK-45756][CORE] Support spark.master.useAppNameAsAppId.enabled to [SPARK-45869][CORE] Support spark.master.useAppNameAsAppId.enabled Nov 10, 2023
@dongjoon-hyun (Member, Author)

Oops. I used the umbrella JIRA id.

dongjoon-hyun deleted the SPARK-45756 branch November 10, 2023 02:36
dongjoon-hyun changed the title from [SPARK-45869][CORE] Support spark.master.useAppNameAsAppId.enabled back to [SPARK-45756][CORE] Support spark.master.useAppNameAsAppId.enabled Nov 10, 2023
@dongjoon-hyun (Member, Author)

Let me switch JIRAs.

@mridulm (Contributor) commented Nov 10, 2023

What is the behavior if this is turned on and there is a conflict? Will it result in an application submission failure? Overwriting the running app state with the new app (or vice versa)?
In other words, what is the benefit of doing this?

@dongjoon-hyun (Member, Author)

Thank you for the review, @mridulm.

> What is the behavior if this is turned on and there is a conflict? Will it result in an application submission failure? Overwriting the running app state with the new app (or vice versa)?

Yes, it will. Nothing will work correctly. It's the user's responsibility to keep it unique as an ID.

> In other words, what is the benefit of doing this?

This feature gives users a lot of control. For simple examples:

  1. Human-recognizable App IDs (which was the goal of App Name).
  2. Human-recognizable event log file names based on (1) in remote storage like S3.
  3. Human-recognizable driver log file names based on (1) in remote storage like S3.

Based on (1), (2), and (3), they can easily build their services by linking jobs and logs (event/driver) with their own patterns.

@mridulm (Contributor) commented Nov 10, 2023

Why not continue to use the existing scheme (IIRC prefix + time; I am AFK, so can't confirm), but make the prefix customizable instead?
That keeps it human-readable while minimizing conflicts.

Pushing responsibility to the user and documenting it as such would still result in hard-to-debug failure modes :-)

I don't use standalone mode, so I'm not sure how common this can become, but it's something to think about.

@dongjoon-hyun (Member, Author)

@mridulm, it seems you mean the SPARK-45754 (Support `spark.deploy.appIdPattern`) feature, where I already delivered that level of support.

> Why not continue to use the existing scheme (IIRC prefix + time; I am AFK, so can't confirm), but make the prefix customizable instead?

Let me give you the context, @mridulm. There is an umbrella JIRA, and most customizable configurations have already been delivered.

[Screenshot 2023-11-10 at 3 40 12 PM]
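For reference, a sketch of that earlier pattern-based customization (assuming SPARK-45754's `spark.deploy.appIdPattern`, whose default is `app-%s-%04d` with a timestamp and a counter; the values here are illustrative):

```scala
// Hypothetical illustration: the generated scheme (date + counter) stays,
// but its shape is customizable, e.g. via -Dspark.deploy.appIdPattern=...
val pattern = "myservice-%s-%05d"                   // custom prefix and counter width
val appId   = pattern.format("20231110154012", 42)  // "myservice-20231110154012-00042"
```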

@mridulm (Contributor) commented Nov 13, 2023

This is not answering my query, @dongjoon-hyun - as to why we are allowing explicitly setting an application ID and introducing hard-to-debug modes, instead of relying on minimizing the issue.
I was not specifically referring to SPARK-45754, but that is indeed a good option as well.

@dongjoon-hyun (Member, Author)

To @mridulm: the above was the exact answer to your following question. What I meant is that Apache Spark already provides more patterns, including a customizable prefix.

> Why not continue to use the existing scheme (IIRC prefix + time; I am AFK, so can't confirm), but make the prefix customizable instead?

For the following question, that is the exact design goal of this PR: an upper service system uses an Apache Spark Standalone cluster as an execution subsystem. The upper service layer already provides many features, like unique appNames and security layers. So, there is no such thing as "hard to debug" or "minimizing the issue" there.

> why we are allowing explicitly setting an application ID and introducing hard-to-debug modes, instead of relying on minimizing the issue.

In addition, I already answered with examples. Let me reiterate: you can generate an SHS link like https://shs/history/app-202311140014-0204/ with an appId, while you cannot with an appName. The appId is used as a foreign key across Apache Spark systems (storage like S3 event logs and Spark driver logs, SHS links, Spark Master links). It opens a wide range of opportunities for us.
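A sketch of that linking pattern (hosts, buckets, and paths are hypothetical):

```scala
// With a user-controlled appId, related artifacts can be located by plain
// string construction instead of a lookup service.
val appId     = "mysparkapp"                        // == normalized appName
val shsLink   = s"https://shs/history/$appId/"      // Spark History Server UI
val eventLog  = s"s3://bucket/spark-events/$appId"  // event log in remote storage
val driverLog = s"s3://bucket/driver-logs/$appId"   // driver log in remote storage
```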

Lastly, this configuration is introduced as Experimental, as you can see, because we are already aware of the risk. We don't recommend this configuration for novice users; it is aimed more at automated subsystems in industrial production usage.

```scala
.doc("(Experimental) If true, Spark master uses the user-provided appName for appId.")
```
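For context, a sketch of how such a config is declared with Spark's internal ConfigBuilder DSL (the config key and doc string come from this PR; the field name, version, and placement are assumptions):

```scala
// Sketch only; see the PR diff for the actual declaration.
private[spark] val MASTER_USE_APP_NAME_AS_APP_ID =
  ConfigBuilder("spark.master.useAppNameAsAppId.enabled")
    .internal()  // not listed in the public documentation
    .doc("(Experimental) If true, Spark master uses the user-provided appName for appId.")
    .version("4.0.0")
    .booleanConf
    .createWithDefault(false)
```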

@mridulm (Contributor) commented Nov 14, 2023

In a specific ecosystem, this can be designed away from being an issue (a narrow-waist submission layer above Spark which ensures uniqueness, among other things) - and so there is no risk to debugging in that ecosystem.
Features in Apache Spark are used by all users - others might not have such guarantees or might not even be aware of these requirements, and my comment about debuggability was in that context: how can a Spark user leverage this experimental feature? What should they be aware of? Currently, if they use this feature and there is an issue, as you mentioned, the behavior would be undefined.
At a minimum, we should document the expectations/risks.

Thoughts?

@dongjoon-hyun (Member, Author)

Sure, I'd love to provide more documentation on this, because I want to promote the Spark Standalone cluster in Apache Spark 4.0.0 more.

Currently, I'm revisiting the documentation on the Spark Standalone cluster. The first PR was #43774, and I'll make a few more, including one for this.

Does that meet your requirement, @mridulm?

@mridulm (Contributor) commented Nov 14, 2023

Sounds good to me, thanks @dongjoon-hyun !

szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024

Closes apache#43743 from dongjoon-hyun/SPARK-45756.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>