[SPARK-44976][CORE] Preserve full principal user name on executor side #44244

Closed
wants to merge 1 commit into from


@eubnara eubnara commented Dec 8, 2023

What changes were proposed in this pull request?

Use the full principal name as the Spark user name so that hadoop.security.auth_to_local is respected when accessing a non-Kerberized HDFS from a Kerberized Hadoop cluster.

Why are the changes needed?

Since SPARK-6558 (https://issues.apache.org/jira/browse/SPARK-6558), Spark has used the short user name, so hadoop.security.auth_to_local on the NameNode of a non-Kerberized Hadoop cluster is not respected.
Also, if a user provides the --principal and --keytab options when submitting a Spark job on a Kerberized cluster and writes output to a non-Kerberized HDFS, file and directory ownership is inconsistent. In the listing below, for example, _SUCCESS is owned by _ex_eub while the part files are owned by eub.



$ hdfs dfs -ls hdfs:///user/eub/some/path/20230510/23
Found 52 items
-rw-rw-rw-   3 _ex_eub hdfs          0 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/_SUCCESS
-rw-r--r--   3 eub      hdfs  134418857 2023-05-11 00:15 hdfs:///user/eub/some/path/20230510/23/part-00000-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
-rw-r--r--   3 eub      hdfs  153410049 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-00001-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
-rw-r--r--   3 eub      hdfs  157260989 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-00002-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
-rw-r--r--   3 eub      hdfs  156222760 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-00003-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
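
One possible reading of the listing above, as a minimal sketch rather than the actual Hadoop or Spark code path: if the NameNode has a mapping rule that rewrites full principals, a writer that sends the full principal and a writer that sends only the short name end up as different owners. The realm `EXAMPLE.COM`, the `_ex_` prefix rule, and the helper names `map_on_namenode` and `short_name` are all assumptions made for illustration; they are not Hadoop or Spark APIs, and the real auth_to_local rule syntax is richer than this regex.

```python
import re

# Hypothetical NameNode-side rule, loosely modeled on what a
# hadoop.security.auth_to_local rule could do: principals in EXAMPLE.COM
# get an "_ex_" prefix. Realm and rule are assumptions for illustration.
RULE_PATTERN = re.compile(r"^(?P<user>[^@/]+)@EXAMPLE\.COM$")


def map_on_namenode(name: str) -> str:
    """Simplified stand-in for the name mapping done on the NameNode."""
    match = RULE_PATTERN.match(name)
    if match:
        return "_ex_" + match.group("user")
    # In this sketch, names that do not match (e.g. already-short names)
    # pass through unchanged.
    return name


def short_name(principal: str) -> str:
    """Drop the host and realm parts, keeping only the first component."""
    return re.split(r"[/@]", principal, maxsplit=1)[0]


principal = "eub@EXAMPLE.COM"

# Writer that sends the full principal: the rule applies.
owner_full = map_on_namenode(principal)
# Writer that sends only the short name: the rule never sees the realm.
owner_short = map_on_namenode(short_name(principal))

print(owner_full)
print(owner_short)
```

Under these assumptions the two writers resolve to `_ex_eub` and `eub` respectively, which matches the mixed ownership in the listing. Sending the full principal from both sides would let the same rule apply to every file.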

A more detailed description is available at https://issues.apache.org/jira/browse/SPARK-44976.

Does this PR introduce any user-facing change?

Output files and directories written to a non-Kerberized HDFS cluster by a Spark job running in a Kerberized cluster will have consistent ownership.

How was this patch tested?

Manually tested.

Was this patch authored or co-authored using generative AI tooling?

No.


eubnara commented Dec 8, 2023

I cannot reopen #42690, so I recreated the PR.

@HyukjinKwon HyukjinKwon changed the title [SPARK-44976] Preserve full principal user name on executor side [SPARK-44976][CORE[ Preserve full principal user name on executor side Dec 8, 2023
@HyukjinKwon HyukjinKwon changed the title [SPARK-44976][CORE[ Preserve full principal user name on executor side [SPARK-44976][CORE] Preserve full principal user name on executor side Dec 8, 2023

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 18, 2024

eubnara commented Mar 18, 2024

Could anyone review this?

@github-actions github-actions bot closed this Mar 19, 2024