[SPARK-49760][YARN] Correct handling of SPARK_USER env variable override in app master #48214
Closed
cnauroth wants to merge 1 commit into apache:master
### What changes were proposed in this pull request?
This patch corrects handling of a user-supplied `SPARK_USER` environment variable in the YARN app master. Currently, the user-supplied value gets appended to the default, like a classpath entry. The patch fixes it by using only the user-supplied value.
### Why are the changes needed?
Overriding the `SPARK_USER` environment variable in the YARN app master with the configuration property `spark.yarn.appMasterEnv.SPARK_USER` currently results in an incorrect value. `Client#setupLaunchEnv` first sets a default in the environment map using the Hadoop user. After that, `YarnSparkHadoopUtil.addPathToEnvironment` sees the existing value in the map and treats the user-supplied value as something to append, like a classpath entry. The end result is the Hadoop user followed by the classpath delimiter and the user-supplied value, e.g. `cnauroth:overrideuser`.
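To make the sequence concrete, here is a minimal Python model of the behavior described above. It is not the Spark source: the function names are hypothetical stand-ins for `Client#setupLaunchEnv` and `YarnSparkHadoopUtil.addPathToEnvironment`, and the `:` separator is an assumption taken from the `cnauroth:overrideuser` example.

```python
# Simplified model of the pre-patch behavior; NOT the actual Spark code.
SEPARATOR = ":"  # assumption: the classpath delimiter seen in the example value


def add_path_to_environment(env, key, value):
    """Append `value` to env[key] if the key already exists, else set it."""
    if key in env:
        env[key] = env[key] + SEPARATOR + value
    else:
        env[key] = value


def setup_launch_env_before_patch(hadoop_user, app_master_env):
    """Hypothetical stand-in for the launch environment setup."""
    env = {}
    # Step 1: the default SPARK_USER is set from the Hadoop user.
    env["SPARK_USER"] = hadoop_user
    # Step 2: every user-supplied appMasterEnv entry is treated as a path entry,
    # so an existing SPARK_USER gets appended to rather than replaced.
    for key, value in app_master_env.items():
        add_path_to_environment(env, key, value)
    return env


buggy_env = setup_launch_env_before_patch("cnauroth", {"SPARK_USER": "overrideuser"})
print(buggy_env["SPARK_USER"])  # cnauroth:overrideuser
```

Append semantics are reasonable for path-like variables, but `SPARK_USER` is a single identity, so the combined string is not a valid user name.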
### Does this PR introduce _any_ user-facing change?
Yes, the app master now uses the user-supplied `SPARK_USER` if specified. (The default is still the Hadoop user.)
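The intended outcome can be sketched as follows. This is only an illustration of the user-facing behavior stated above, not the patch itself; `build_app_master_env` is a hypothetical name, and how the real change is implemented inside Spark may differ.

```python
# Illustrative model of the post-patch outcome for SPARK_USER; NOT the actual patch.
def build_app_master_env(hadoop_user, app_master_env):
    """Return an environment map where a user-supplied SPARK_USER replaces the default."""
    # The default is still the Hadoop user.
    env = {"SPARK_USER": hadoop_user}
    # A user-supplied SPARK_USER wins outright instead of being appended.
    if "SPARK_USER" in app_master_env:
        env["SPARK_USER"] = app_master_env["SPARK_USER"]
    return env


default_env = build_app_master_env("cnauroth", {})
override_env = build_app_master_env("cnauroth", {"SPARK_USER": "sparkuser_appMaster"})
print(default_env["SPARK_USER"])   # cnauroth
print(override_env["SPARK_USER"])  # sparkuser_appMaster
```

The sketch covers only `SPARK_USER`; handling of other variables is deliberately left out.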
### How was this patch tested?
* Existing unit tests pass.
* Added new unit tests covering default and overridden `SPARK_USER` for the app master. The override test fails without this patch, and then passes after the patch is applied.
* Manually tested in a live YARN cluster as shown below.
Manual testing used the `DFSReadWriteTest` job with overrides of `SPARK_USER`:
```
spark-submit \
--deploy-mode cluster \
--files all-lines.txt \
--class org.apache.spark.examples.DFSReadWriteTest \
--conf spark.yarn.appMasterEnv.SPARK_USER=sparkuser_appMaster \
--conf spark.driverEnv.SPARK_USER=sparkuser_driver \
--conf spark.executorEnv.SPARK_USER=sparkuser_executor \
/usr/lib/spark/examples/jars/spark-examples.jar \
all-lines.txt /tmp/DFSReadWriteTest
```
Before the patch, we can see the app master's `SPARK_USER` mishandled by looking at the `_SUCCESS` file in HDFS:
```
hdfs dfs -ls -R /tmp/DFSReadWriteTest
drwxr-xr-x - cnauroth:sparkuser_appMaster hadoop 0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r-- 1 cnauroth:sparkuser_appMaster hadoop 0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r-- 1 sparkuser_executor hadoop 2295080 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00000
-rw-r--r-- 1 sparkuser_executor hadoop 2288718 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00001
```
After the patch, we can see it working correctly:
```
hdfs dfs -ls -R /tmp/DFSReadWriteTest
drwxr-xr-x - sparkuser_appMaster hadoop 0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r-- 1 sparkuser_appMaster hadoop 0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r-- 1 sparkuser_executor hadoop 2295080 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00000
-rw-r--r-- 1 sparkuser_executor hadoop 2288718 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00001
```
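The comparison of the two listings can also be checked mechanically. The sketch below is an assumption-laden helper, not part of the PR: it assumes each entry in `hdfs dfs -ls -R` output has eight whitespace-separated fields (permissions, replication, owner, group, size, date, time, path), as in the listings above, and flags owners that contain the `:` delimiter. The function name `malformed_owners` is hypothetical.

```python
# Hypothetical helper for spotting mangled owners in `hdfs dfs -ls -R` output.
BEFORE = """\
drwxr-xr-x - cnauroth:sparkuser_appMaster hadoop 0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r-- 1 cnauroth:sparkuser_appMaster hadoop 0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r-- 1 sparkuser_executor hadoop 2295080 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00000
"""

AFTER = """\
drwxr-xr-x - sparkuser_appMaster hadoop 0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r-- 1 sparkuser_appMaster hadoop 0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
"""


def malformed_owners(listing):
    """Return (path, owner) pairs whose owner field contains a ':' delimiter."""
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank lines and anything that is not a listing entry
        owner, path = fields[2], fields[7]
        if ":" in owner:
            flagged.append((path, owner))
    return flagged


print(len(malformed_owners(BEFORE)))  # 2
print(len(malformed_owners(AFTER)))   # 0
```

Paths containing spaces would break the simple `split()` parsing, so this is suitable only for listings shaped like the ones shown here.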
### Was this patch authored or co-authored using generative AI tooling?
No.
Contributor, Author
I'm interested in getting this into master and branch-3.5. It won't cherry-pick cleanly though, so I'll put together a separate branch-3.5 pull request if this gets approved.
Member
BTW, according to the original code, this seems to be a very old bug, right? Then, we need to backport this to
Contributor, Author
Sounds good! I filed #48216 for branch-3.5. The same patch cherry-picks to branch-3.4, and I'm confirming tests on that branch now.
yaooqinn approved these changes on Sep 24, 2024
Contributor, Author
@dongjoon-hyun and @yaooqinn, thanks very much for the review and commit!