[SPARK-26489][CORE] Use ConfigEntry for hardcoded configs for python/r categories #23428

Closed · wants to merge 4 commits into apache:master from HeartSaVioR:SPARK-26489

Conversation

HeartSaVioR (Contributor)

What changes were proposed in this pull request?

This PR converts the hardcoded configs below to use ConfigEntry.

  • spark.pyspark
  • spark.python
  • spark.r

This patch doesn't change configs that are not relevant to SparkConf (e.g., system properties, Python source code).
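
For illustration, a minimal sketch of the conversion pattern, using the PYTHON_WORKER_REUSE entry from the diff below (the call sites shown are assumptions for illustration, not lines from the patch):

```scala
// Before: every call site repeats the raw key and its default value.
val reuse = conf.getBoolean("spark.python.worker.reuse", true)

// After: key, type, and default are declared once as a typed ConfigEntry...
private[spark] val PYTHON_WORKER_REUSE = ConfigBuilder("spark.python.worker.reuse")
  .booleanConf
  .createWithDefault(true)

// ...and call sites read it through the entry, getting the type and
// default for free.
val reuse = conf.get(PYTHON_WORKER_REUSE)
```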

How was this patch tested?

Existing tests.

@SparkQA commented Jan 2, 2019

Test build #100640 has finished for PR 23428 at commit b0fba98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -733,4 +733,45 @@ package object config {
.stringConf
.toSequence
.createWithDefault(Nil)

private[spark] val PYTHON_WORKER_REUSE = ConfigBuilder("spark.python.worker.reuse")
  .booleanConf
  .createWithDefault(true)
Contributor (vanzin)

I'd rather have separate files for these (e.g. Python.scala, R.scala) to avoid polluting this object. See other examples in the config package.

Contributor Author (HeartSaVioR)

Thanks for the nice suggestion. I was thinking the config package object is already all-in-one (except History, Kafka, Status), so adding these configs here would be OK and we might get a chance to sort everything out later, but I agree we can do this now for clear-cut cases. Will address.
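
For reference, a minimal sketch of what such a split-out file could look like, following the per-topic objects already in the config package (the file path and object layout are assumptions; only PYTHON_WORKER_REUSE is taken from this PR's diff):

```scala
// Sketch of core/src/main/scala/org/apache/spark/internal/config/Python.scala:
// spark.python.* entries grouped in their own object instead of the
// catch-all `config` package object.
package org.apache.spark.internal.config

private[spark] object Python {
  val PYTHON_WORKER_REUSE = ConfigBuilder("spark.python.worker.reuse")
    .booleanConf
    .createWithDefault(true)
}
```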

@SparkQA commented Jan 3, 2019

Test build #100670 has finished for PR 23428 at commit f82d269.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 3, 2019

Test build #100672 has finished for PR 23428 at commit 85ea372.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 3, 2019

Test build #100675 has finished for PR 23428 at commit d0a345d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

retest this please

@@ -47,10 +48,8 @@ private[spark] class RBackend {

def init(): (Int, RAuthHelper) = {
val conf = new SparkConf()
val backendConnectionTimeout = conf.getInt(
"spark.r.backendConnectionTimeout", SparkRDefaults.DEFAULT_CONNECTION_TIMEOUT)
Member (HyukjinKwon)

The whole SparkRDefaults.scala file doesn't seem to be referenced anymore. Can we delete this file?

Contributor Author (HeartSaVioR)

Nice find! Will remove.
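
For context, the replacement pattern looks roughly like this (R_BACKEND_CONNECTION_TIMEOUT is an illustrative name for the new entry, and the default is assumed to mirror SparkRDefaults.DEFAULT_CONNECTION_TIMEOUT):

```scala
// The default now lives on the typed entry itself, so the constants in
// SparkRDefaults become dead code and the file can be deleted.
val R_BACKEND_CONNECTION_TIMEOUT = ConfigBuilder("spark.r.backendConnectionTimeout")
  .intConf
  .createWithDefault(6000) // assumed value of SparkRDefaults.DEFAULT_CONNECTION_TIMEOUT

// RBackend.init then reads the timeout through the entry:
val backendConnectionTimeout = conf.get(R_BACKEND_CONNECTION_TIMEOUT)
```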

@@ -733,4 +729,5 @@ package object config {
.stringConf
.toSequence
.createWithDefault(Nil)

Member (HyukjinKwon)

not a big deal at all but I'd remove this line.

Contributor Author (HeartSaVioR)

OK, will address.

@HyukjinKwon (Member) left a comment

Hm, shouldn't we move the other Python/R configurations into Python.scala and R.scala too? For instance, I'm seeing spark.pyspark.driver.python and spark.executor.pyspark.memory.

@SparkQA commented Jan 3, 2019

Test build #100677 has finished for PR 23428 at commit d0a345d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor Author) commented Jan 3, 2019

> shouldn't we move other Python/R configurations into Python.scala and R.scala?

I'm being a bit careful to avoid conflicts with another PR (#23415 addresses spark.executor). If #23415 is merged sooner, I can rebase and move the other entries as well.

@SparkQA commented Jan 3, 2019

Test build #100691 has finished for PR 23428 at commit c919a1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor) commented Jan 3, 2019

Looks good. Merging to master.

@asfgit closed this in 05372d1 on Jan 3, 2019
@HeartSaVioR (Contributor Author)

Thanks @vanzin and @HyukjinKwon for reviewing and merging!

@HeartSaVioR deleted the SPARK-26489 branch on January 3, 2019 at 22:43
@HyukjinKwon (Member)

LGTM too

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request on Feb 18, 2019:

[SPARK-26489][CORE] Use ConfigEntry for hardcoded configs for python/r categories


Closes apache#23428 from HeartSaVioR/SPARK-26489.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>