
[SPARK-9270] [PySpark] allow --name option in pyspark #7610

Closed
wants to merge 2 commits

Conversation

piaozhexiu

This is continuation of #7512 which added --name option to spark-shell. This PR adds the same option to pyspark.

Note that --conf spark.app.name on the command line has no effect in spark-shell and pyspark; --name must be used instead. This is in fact inconsistent with spark-sql, which doesn't accept the --name option but does accept --conf spark.app.name. I am not fixing this inconsistency in this PR. IMO only one of --name and --conf spark.app.name is needed, not both, but since I cannot decide which to choose, I am not making any change here.

@srowen
Member

srowen commented Jul 23, 2015

That seems fine if it still sets the app name as desired and improves consistency.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38189 has finished for PR 7610 at commit 190638b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Member

sarutak commented Jul 23, 2015

retest this please.


@vanzin
Contributor

vanzin commented Jul 23, 2015

Note that --conf spark.app.name in command-line has no effect in spark-shell and pyspark. Instead, --name must be used.

This is not critical, but not great either; it points at some inconsistency in the code somewhere. From looking at SparkSubmit.scala, it should work, but I haven't explicitly tested it.

spark-sql which doesn't accept --name

Is that true? SparkSQLEnv has code to explicitly handle app names set by the user. If that doesn't work, it sounds like a bug.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38224 has finished for PR 7610 at commit 190638b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@piaozhexiu
Author

@vanzin thank you for your comments!

  1. As for spark-sql, sorry I was wrong. spark-sql works with both --name and --conf spark.app.name. So we're good here.

  2. As to why --conf spark.app.name doesn't work in pyspark, if you look at context.py, line #135 overwrites spark.app.name in conf if appName is passed via a constructor parameter.

So if you run the following command:

pyspark --conf spark.app.name=MyName

it invokes the following command, because --name PySparkShell is added by default:

spark-submit --name PySparkShell --conf spark.app.name=MyName

Since PySparkShell ends up being passed as a parameter to the SparkContext constructor, it overwrites spark.app.name. So when we get to line #153 later, spark.app.name is no longer MyName but PySparkShell.

In summary, --name always wins and is always added to the command, so --conf spark.app.name never takes effect.
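The precedence described above can be sketched in plain Python (this is illustrative only, not the actual context.py code; SparkConfSketch and SparkContextSketch are hypothetical stand-ins for SparkConf and SparkContext):

```python
# Minimal sketch of the behavior described above: an explicit appName
# constructor argument overwrites any spark.app.name already present
# in the conf, which is why the launcher's --name PySparkShell wins.

class SparkConfSketch:
    """Hypothetical stand-in for SparkConf: a simple key/value store."""
    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        self._settings[key] = value
        return self

    def get(self, key, default=None):
        return self._settings.get(key, default)

class SparkContextSketch:
    """Hypothetical stand-in for SparkContext's conf handling."""
    def __init__(self, appName=None, conf=None):
        self._conf = conf or SparkConfSketch()
        if appName is not None:
            # Mirrors the overwrite: appName replaces spark.app.name
            self._conf.set("spark.app.name", appName)

# pyspark --conf spark.app.name=MyName effectively becomes:
conf = SparkConfSketch().set("spark.app.name", "MyName")
sc = SparkContextSketch(appName="PySparkShell", conf=conf)
print(sc._conf.get("spark.app.name"))  # PySparkShell, not MyName
```

The user's --conf value is present in the conf but is clobbered before it can ever be read back.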

@srowen
Member

srowen commented Jul 23, 2015

So the Python part is OK but now does spark-sql need a similar treatment?

@piaozhexiu
Author

@srowen spark-sql actually already works with both --name and --conf spark.app.name. I was mistaken earlier.

In short, --name works with all of them (spark-shell, spark-sql, pyspark), while --conf spark.app.name only works with spark-sql.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38232 has finished for PR 7610 at commit ccef949.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@piaozhexiu
Author

Hmm, Jenkins unstable?

@vanzin
Contributor

vanzin commented Jul 23, 2015

As to why --conf spark.app.name doesn't work in pyspark, if you look at context.py, line #135

Hmm, it may not work, but I don't think that's the cause. With your changes, that line should never be reached when starting the shell. What I think is happening is:

  • SparkSubmit sets the app name in SparkSubmit.scala, L470 (processing the option defined in L405).
  • In SparkSubmit.scala L555, the conf is read but at that point spark.app.name is already set, so it's not overwritten.

So it seems like an ordering issue in SparkSubmit.scala. In any case, it doesn't seem important enough to change just for this particular edge case. The change LGTM.
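The ordering issue suggested above can be sketched as a first-writer-wins merge (a hedged illustration only; build_submit_props is a hypothetical stand-in, not SparkSubmit's actual code):

```python
# Sketch of the ordering described above: the --name option is applied
# first, and --conf entries are then merged without overwriting keys
# that are already set, so spark.app.name keeps the --name value.

def build_submit_props(name_option, conf_options):
    """Hypothetical model of property resolution order."""
    props = {}
    if name_option is not None:
        props["spark.app.name"] = name_option   # option processed first
    for key, value in conf_options.items():
        props.setdefault(key, value)            # existing keys are kept
    return props

props = build_submit_props("PySparkShell", {"spark.app.name": "MyName"})
print(props["spark.app.name"])  # PySparkShell
```

Under this model, --conf spark.app.name only takes effect when no --name was supplied at all.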

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38256 has finished for PR 7610 at commit 763e86d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -82,4 +82,4 @@ fi
 
 export PYSPARK_DRIVER_PYTHON
 export PYSPARK_DRIVER_PYTHON_OPTS
-exec "$SPARK_HOME"/bin/spark-submit pyspark-shell-main "$@"
+exec "$SPARK_HOME"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"
Contributor

Should we support bin/pyspark --name MyName? spark-shell supports that.

Author

--name MyName works. So the final command will be like this:

spark-submit pyspark-shell-main --name "PySparkShell" --name "MyName" <other args>

Then, MyName takes effect since it comes later than PySparkShell.
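That last-occurrence-wins behavior can be sketched with a tiny left-to-right scan (illustrative only; parse_name is a hypothetical helper, not spark-submit's actual parser):

```python
# Sketch of why the second --name wins: scanning the arguments left to
# right, each occurrence of --name overwrites the previous value, so
# the user's --name MyName (appended after the script's default
# --name PySparkShell) is the one that remains.

def parse_name(argv):
    """Hypothetical parser: return the value of the last --name flag."""
    name = None
    it = iter(argv)
    for arg in it:
        if arg == "--name":
            name = next(it)   # later occurrences overwrite earlier ones
    return name

argv = ["pyspark-shell-main", "--name", "PySparkShell", "--name", "MyName"]
print(parse_name(argv))  # MyName
```

This is why prepending the default --name in the launcher script still lets a user-supplied --name take effect.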

Author

Just to be clear, this is how --name is supported in spark-shell too (#7512). I am following the pattern introduced by that patch.

@vanzin
Contributor

vanzin commented Jul 24, 2015

Merged to master. Thanks!

asfgit closed this in 9a11396 on Jul 24, 2015