
[SPARK-17387][PYSPARK] Allow passing of args to gateway to configure JVM #15019

Closed

Conversation

BryanCutler
Member

What changes were proposed in this pull request?

When not using spark-submit, user configuration for the JVM is ignored when creating the SparkContext, because the JVM is started before the SparkConf object is initialized. This change allows the user to specify --conf arguments on the command line that are used when launching the java gateway, so the JVM can be configured.

How was this patch tested?

Started a plain Python shell with this command and verified the JVM was configured with the correct amount of memory.

SPARK_HOME=$PWD PYTHONPATH=python:python/lib/py4j-0.10.3-src.zip python - --conf spark.driver.memory=4g

Note: the additional '-' is needed so that the command-line args are not processed by the interpreter itself when it is invoked.
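To make the mechanism concrete, here is a minimal sketch of how `--conf key=value` pairs could be collected from the command line before the JVM starts. The function name `parse_conf_args` is hypothetical, for illustration only; it is not the actual helper added by this PR.

```python
import sys

def parse_conf_args(argv):
    """Collect --conf key=value pairs from command-line args.

    Hypothetical helper illustrating how a gateway launcher could
    pick up JVM configuration (e.g. spark.driver.memory) before
    the JVM is started, when spark-submit is not used.
    """
    conf = {}
    it = iter(argv)
    for arg in it:
        if arg == "--conf":
            try:
                pair = next(it)
            except StopIteration:
                break  # trailing --conf with no value; ignore it
            key, sep, value = pair.partition("=")
            if sep:  # only accept well-formed key=value pairs
                conf[key] = value
    return conf

# Args as they would appear after the extra '-' separator
args = ["-", "--conf", "spark.driver.memory=4g"]
print(parse_conf_args(args))  # {'spark.driver.memory': '4g'}
```

A launcher would then translate entries like `spark.driver.memory` into the corresponding JVM options when spawning the gateway process.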

@BryanCutler
Member Author

@vanzin and @zjffdu , this is my take on a simplified solution that allows configuring the JVM from the command line. You still won't be able to use these confs in the shell when creating the SparkContext, but I believe that is also how PySpark and spark-shell behave.

@vanzin
Contributor

vanzin commented Sep 9, 2016

@BryanCutler sorry, this is not what I meant. I guess I was confused by your comment on the bug.

See my original code on the bug:

>>> from pyspark import SparkContext
>>> from pyspark import SparkConf
>>> conf = SparkConf().set("spark.driver.memory", "4g")
>>> sc = SparkContext(conf=conf)

That's what I want to work.

@BryanCutler
Member Author

Hmm, that won't work in the PySpark shell either, even if you restart the SparkContext. If that's what you want, though, @zjffdu 's PR addresses it.

@vanzin
Contributor

vanzin commented Sep 9, 2016

that won't work in the PySpark shell either though, even if you restart the SparkContext

That's a separate issue which should also be fixed (stopping a python SparkContext does not stop the underlying JVM), and kinda depends on the fix for this one too.

@BryanCutler
Member Author

Ok, I'll close this one then.

@BryanCutler BryanCutler closed this Sep 9, 2016
@SparkQA

SparkQA commented Sep 9, 2016

Test build #65125 has finished for PR 15019 at commit 100e442.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler BryanCutler deleted the python-gateway-conf-SPARK-17387 branch December 2, 2016 01:01