[SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script #28731

therealJacobWu

What changes were proposed in this pull request?

Make the SPARK_SUBMIT_OPTS environment variable available to beeline.

Why are the changes needed?

beeline does not pick up the krb5.conf location specified through SPARK_SUBMIT_OPTS in spark-env.sh.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

./dev/run-tests

HyukjinKwon changed the title from "[SPARK-31909][CORE] Add SPARK_SUBMIT_OPTS to Beeline Script" to "[SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script" on Jun 5, 2020

Proposed change to bin/beeline:
 CLASS="org.apache.hive.beeline.BeeLine"
-exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@"
+exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@"

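The patched line leaves $SPARK_SUBMIT_OPTS unquoted, so the shell splits its value on whitespace and each option becomes a separate argument placed before the class name. A minimal sketch of that word-splitting, assuming a hypothetical stand-in function `fake_spark_class` in place of the real `spark-class` script and example option values:

```shell
#!/usr/bin/env bash
# Sketch only: fake_spark_class is a hypothetical stand-in for bin/spark-class.
# It prints each argument it receives on its own line.
fake_spark_class() {
  printf '%s\n' "$@"
}

# Example values (assumed for illustration).
SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf-custom -Dsun.security.krb5.debug=true"
CLASS="org.apache.hive.beeline.BeeLine"

# Unquoted expansion, as in the proposed bin/beeline line: each -D option
# becomes its own argument ahead of the class name and beeline's own arguments.
fake_spark_class $SPARK_SUBMIT_OPTS $CLASS -u "jdbc:hive2://localhost:10000/default"
```

If SPARK_SUBMIT_OPTS is unset or empty, it expands to nothing and the command is identical to the original line. The flip side of leaving it unquoted is that an option whose value itself contains spaces would be split apart.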
Member

It executes Hive's beeline. Does it respect Spark submit options?

Author

Hi Hyukjin, there was an earlier discussion about the krb5.conf location here: https://issues.apache.org/jira/browse/SPARK-12050. The conclusion there was to pass a non-standard krb5.conf location through SPARK_SUBMIT_OPTS, e.g. -Djava.security.krb5.conf=/etc/krb5.conf-custom, but that does not work in this case.
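For concreteness, the setup being described would look roughly like this in conf/spark-env.sh, which Spark's launch scripts source before starting a JVM. This is a sketch; the path is the example value from the comment. spark-submit adds SPARK_SUBMIT_OPTS to the driver JVM command it builds, whereas, according to this PR, bin/beeline (which calls spark-class with the BeeLine class directly) does not pick it up.

```shell
# conf/spark-env.sh (sourced by Spark's launch scripts)
# Point the JVM at a non-default Kerberos configuration file.
export SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf-custom"
```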

Member

Was that JIRA about Hive's beeline? My point is that this is Hive's beeline, not Spark's.

Author

As I understand it, this is not an issue with Hive's beeline; it is about an environment variable. This might not be the best solution, but it does solve the problem. I have also tested Hive's beeline, and it works there because it reads HADOOP_OPTS. In Spark, however, whatever is specified in SPARK_SUBMIT_OPTS is not passed through by the spark-class script, so I have to pass it in explicitly for Spark's beeline.

Member

Let's avoid a band-aid fix here. Can we make HADOOP_OPTS work with Spark's beeline?

I took a look at how Hive's beeline works, and the suggested fix for Spark's beeline seems to align well with it.
Also, Spark's beeline actually calls spark-class (as does some other code). After tracing those callers one by one, the suggested fix also seems to be the safest option.

Member

@kdzhao, can you show some code pointers? I still think it's odd that SPARK_SUBMIT_OPTS should take effect in Hive's beeline.

kdzhao, Jun 22, 2020

I think I misspoke. What I meant is that in Hive, the beeline command looks like it just calls hive with different parameters:
https://github.com/apache/hive/blob/branch-1.2/bin/beeline
https://github.com/apache/hive/blob/branch-1.2/bin/hive
I agree that Hive's beeline doesn't read Spark parameters; I would assume it reads its own (I saw HADOOP_CLIENT_OPTS etc. in the scripts above).
Back to Spark: I agree that fixing this in spark-class might cover more cases. On the other hand, so far only beeline has this issue, so a simple fix in the beeline script also makes sense as a stopgap.

AmplabJenkins

Can one of the admins verify this patch?

github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label Oct 27, 2020
github-actions bot closed this Oct 28, 2020
4 participants