[SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script #28731
```diff
 CLASS="org.apache.hive.beeline.BeeLine"
-exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@"
+exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@"
```
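Because `$SPARK_SUBMIT_OPTS` is unquoted in the patched exec line, the shell splits it on whitespace into separate arguments that land before `$CLASS`. Below is a minimal sketch of that expansion, assuming bash. `print_args` is a hypothetical stand-in for spark-class that only echoes what it receives, and the second `-D` option and the JDBC URL are illustrative values that do not come from the PR:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for bin/spark-class: it prints each argument it
# receives on its own line, wrapped in brackets, so the splitting is visible.
print_args() {
  local arg
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

CLASS="org.apache.hive.beeline.BeeLine"
# Illustrative value: two JVM options separated by a space.
SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf-custom -Dsun.security.krb5.debug=true"

# Simulate the arguments a user would pass to beeline (illustrative).
set -- -u "jdbc:hive2://localhost:10000"

# Same shape as the patched line: the unquoted variable becomes one argument
# per option, followed by the class name and then the user's arguments.
print_args $SPARK_SUBMIT_OPTS $CLASS "$@"
```

This only shows the shell-level expansion; how spark-class and its launcher then treat arguments that appear before the class name is not demonstrated here. Leaving the variable unquoted also means an option value containing spaces cannot be passed this way, and any glob characters in it would be expanded by the shell.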
It executes Hive's beeline. Does it respect Spark submit options?
Hi Hyukjin, there was an earlier discussion about the krb5.conf location here: https://issues.apache.org/jira/browse/SPARK-12050. In that discussion, people suggested using SPARK_SUBMIT_OPTS to pass a non-standard krb5.conf location via -Djava.security.krb5.conf=/etc/krb5.conf-custom, but that approach does not work in this case.
Was that JIRA about Hive's beeline? My point is that this is Hive's beeline, not Spark's.
From my understanding, this is not an issue with Hive's beeline; it is about an environment variable. This might not be the best solution, but it does solve the issue here. I have also tested Hive's beeline, and it works because it reads HADOOP_OPTS. In Spark, however, whatever is specified in SPARK_SUBMIT_OPTS is not passed through to execution by the spark-class script, so I have to pass it in explicitly for Spark's beeline.
Let's avoid a band-aid fix here. Can we make HADOOP_OPTS work with the beeline in Spark?
I took a look at how Hive's beeline works, and the suggested fix for Spark's beeline seems to align well with it.
Also, Spark's beeline calls spark-class (as does some other code). After tracing those callers one by one, the suggested fix also seems to be the safest option.
@kdzhao, can you show some code pointers? I still think it's odd that SPARK_SUBMIT_OPTS should take effect in Hive's beeline.
I think I misspoke. What I meant is that in Hive, the beeline command appears to just call the hive script with different parameters:
https://github.com/apache/hive/blob/branch-1.2/bin/beeline
https://github.com/apache/hive/blob/branch-1.2/bin/hive
I agree that Hive's beeline doesn't read Spark parameters; I assume it reads its own (I saw HADOOP_CLIENT_OPTS, among others, in the scripts above).
Back to Spark: I agree that fixing this in spark-class might cover more cases. On the other hand, so far only beeline has this issue, so a simple fix in the beeline script also makes sense as a stopgap.
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Made the SPARK_SUBMIT_OPTS environment variable available to beeline.
Why are the changes needed?
beeline does not pick up the krb5.conf location specified in SPARK_SUBMIT_OPTS in spark-env.sh.
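For reference, a minimal sketch of the kind of setting this description refers to, assuming it is exported from conf/spark-env.sh. The krb5.conf path is the example quoted in the discussion above; whether a value set in spark-env.sh actually reaches beeline's exec line depends on when that file is sourced, which this fragment does not show:

```shell
# conf/spark-env.sh (illustrative fragment)
# Point the JVM at a non-default Kerberos configuration file.
# The path is the example from the discussion; adjust for your environment.
export SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf-custom"
```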
Does this PR introduce any user-facing change?
No.
How was this patch tested?
./dev/run-tests