Skip to content

[SPARK-49760][YARN] Correct handling of SPARK_USER env variable override in app master#48214

Closed
cnauroth wants to merge 1 commit intoapache:masterfrom
cnauroth:SPARK-49760
Closed

[SPARK-49760][YARN] Correct handling of SPARK_USER env variable override in app master#48214
cnauroth wants to merge 1 commit intoapache:masterfrom
cnauroth:SPARK-49760

Conversation

@cnauroth
Copy link
Contributor

@cnauroth cnauroth commented Sep 23, 2024

What changes were proposed in this pull request?

This patch corrects handling of a user-supplied SPARK_USER environment variable in the YARN app master. Currently, the user-supplied value gets appended to the default, like a classpath entry. The patch fixes it by using only the user-supplied value.

Why are the changes needed?

Overriding the SPARK_USER environment variable in the YARN app master with configuration property spark.yarn.appMasterEnv.SPARK_USER currently results in an incorrect value. Client#setupLaunchEnv first sets a default in the environment map using the Hadoop user. After that, YarnSparkHadoopUtil.addPathToEnvironment sees the existing value in the map and interprets the user-supplied value as needing to be appended like a classpath entry. The end result is the Hadoop user appended with the classpath delimiter and user-supplied value, e.g. cnauroth:overrideuser.

Does this PR introduce any user-facing change?

Yes, the app master now uses the user-supplied SPARK_USER if specified. (The default is still the Hadoop user.)

How was this patch tested?

  • Existing unit tests pass.
  • Added new unit tests covering default and overridden SPARK_USER for the app master. The override test fails without this patch, and then passes after the patch is applied.
  • Manually tested in a live YARN cluster as shown below.

Manual testing used the DFSReadWriteTest job with overrides of SPARK_USER:

spark-submit \
    --deploy-mode cluster \
    --files all-lines.txt \
    --class org.apache.spark.examples.DFSReadWriteTest \
    --conf spark.yarn.appMasterEnv.SPARK_USER=sparkuser_appMaster \
    --conf spark.driverEnv.SPARK_USER=sparkuser_driver \
    --conf spark.executorEnv.SPARK_USER=sparkuser_executor \
    /usr/lib/spark/examples/jars/spark-examples.jar \
    all-lines.txt /tmp/DFSReadWriteTest

Before the patch, we can see the app master's SPARK_USER mishandled by looking at the _SUCCESS file in HDFS:

hdfs dfs -ls -R /tmp/DFSReadWriteTest

drwxr-xr-x   - cnauroth:sparkuser_appMaster hadoop          0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r--   1 cnauroth:sparkuser_appMaster hadoop          0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r--   1 sparkuser_executor                      hadoop    2295080 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00000
-rw-r--r--   1 sparkuser_executor                      hadoop    2288718 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00001

After the patch, we can see it working correctly:

hdfs dfs -ls -R /tmp/DFSReadWriteTest
drwxr-xr-x   - sparkuser_appMaster hadoop          0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r--   1 sparkuser_appMaster hadoop          0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r--   1 sparkuser_executor  hadoop    2295080 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00000
-rw-r--r--   1 sparkuser_executor  hadoop    2288718 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00001

Was this patch authored or co-authored using generative AI tooling?

No.

…able override in app master.

### What changes were proposed in this pull request?

This patch corrects handling of a user-supplied `SPARK_USER` environment variable in the YARN app master. Currently, the user-supplied value gets appended to the default, like a classpath entry. The patch fixes it by using only the user-supplied value.

### Why are the changes needed?

Overriding the `SPARK_USER` environment variable in the YARN app master with configuration property `spark.yarn.appMasterEnv.SPARK_USER` currently results in an incorrect value. `Client#setupLaunchEnv` first sets a default in the environment map using the Hadoop user. After that, `YarnSparkHadoopUtil.addPathToEnvironment` sees the existing value in the map and interprets the user-supplied value as needing to be appended like a classpath entry. The end result is the Hadoop user appended with the classpath delimiter and user-supplied value, e.g. `cnauroth:overrideuser`.

### Does this PR introduce _any_ user-facing change?

Yes, the app master now uses the user-supplied `SPARK_USER` if specified. (The default is still the Hadoop user.)

### How was this patch tested?

* Existing unit tests pass.
* Added new unit tests covering default and overridden `SPARK_USER` for the app master. The override test fails without this patch, and then passes after the patch is applied.
* Manually tested in a live YARN cluster as shown below.

Manual testing used the `DFSReadWriteTest` job with overrides of `SPARK_USER`:

```
spark-submit \
    --deploy-mode cluster \
    --files all-lines.txt \
    --class org.apache.spark.examples.DFSReadWriteTest \
    --conf spark.yarn.appMasterEnv.SPARK_USER=sparkuser_appMaster \
    --conf spark.driverEnv.SPARK_USER=sparkuser_driver \
    --conf spark.executorEnv.SPARK_USER=sparkuser_executor \
    /usr/lib/spark/examples/jars/spark-examples.jar \
    all-lines.txt /tmp/DFSReadWriteTest
```

Before the patch, we can see the app master's `SPARK_USER` mishandled by looking at the `_SUCCESS` file in HDFS:

```
hdfs dfs -ls -R /tmp/DFSReadWriteTest

drwxr-xr-x   - cnauroth:sparkuser_appMaster hadoop          0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r--   1 cnauroth:sparkuser_appMaster hadoop          0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r--   1 sparkuser_executor                      hadoop    2295080 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00000
-rw-r--r--   1 sparkuser_executor                      hadoop    2288718 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00001
```

After the patch, we can see it working correctly:

```
hdfs dfs -ls -R /tmp/DFSReadWriteTest
drwxr-xr-x   - sparkuser_appMaster hadoop          0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r--   1 sparkuser_appMaster hadoop          0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r--   1 sparkuser_executor  hadoop    2295080 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00000
-rw-r--r--   1 sparkuser_executor  hadoop    2288718 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-00001
```

### Was this patch authored or co-authored using generative AI tooling?

No.
@github-actions github-actions bot added the YARN label Sep 23, 2024
@cnauroth
Copy link
Contributor Author

I'm interested in getting this into master and branch-3.5. It won't cherry-pick cleanly though, so I'll put together a separate branch-3.5 pull request if this gets approved.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49760][YARN] Correct handling of SPARK_USER environment vari… [SPARK-49760][YARN] Correct handling of SPARK_USER env variable override in app master Sep 23, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @cnauroth .

cc @mridulm , too.

@dongjoon-hyun
Copy link
Member

BTW, according to the original code, this seems to be very old bug, right? Then, we need to backport this to branch-3.4 too.

@cnauroth
Copy link
Contributor Author

BTW, according to the original code, this seems to be very old bug, right? Then, we need to backport this to branch-3.4 too.

Sounds good! I filed #48216 for branch-3.5. The same patch cherry-picks to branch-3.4, and I'm confirming tests on that branch now.

@dongjoon-hyun
Copy link
Member

Thank you, @cnauroth and @yaooqinn .
Merged to master.

@cnauroth
Copy link
Contributor Author

@dongjoon-hyun and @yaooqinn , thanks very much for the review and commit!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants