
[SPARK-28776][ML] SparkML Writer gets hadoop conf from session state #25505

Closed
wants to merge 1 commit

Conversation


@helenyugithub commented Aug 19, 2019

What changes were proposed in this pull request?

The SparkML writer gets the Hadoop conf from the session state instead of the Spark context.

Why are the changes needed?

This allows multiple sessions in the same context to have different Hadoop configurations.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested via pyspark.ml.tests.test_persistence.PersistenceTest.test_default_read_write
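
For illustration, a minimal sketch of the intended pattern (the helper and parameter names below are assumptions for this summary, not the literal diff from the ML writer code):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Hypothetical helper showing where the Hadoop conf now comes from.
def resolveFileSystem(spark: SparkSession, path: String): FileSystem = {
  // Before: the shared, context-wide configuration was used
  //   val hadoopConf = spark.sparkContext.hadoopConfiguration
  // After: the session state builds a conf that folds in per-session overrides
  val hadoopConf = spark.sessionState.newHadoopConf()
  new Path(path).getFileSystem(hadoopConf)
}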

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28776] SparkML Writer gets hadoop conf from session state [SPARK-28776][ML] SparkML Writer gets hadoop conf from session state Aug 19, 2019
@helenyugithub
Author

@jkbradley you seem to have reviewed previous PRs to this file, like #18742. Would you be able to review this PR, or suggest someone else who could?

@srowen
Member

srowen commented Aug 20, 2019

Just for my reference, does this make it consistent with other similar code? I just want to understand the argument for why this needs to change: what problem does the current code cause?

@helenyugithub
Author

Hi @srowen,
I believe the general convention is to use the session state's Hadoop conf, e.g.:

val fs = FileSystem.get(spark.sessionState.newHadoopConf())

This also prevents correctness issues when multiple Spark sessions share the same context but want their own Hadoop conf (for example, one containing an authentication key that shouldn't be shared across users).
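
To make the multi-session concern concrete, here is a minimal sketch; the fs.s3a.access.key setting is just a placeholder for any per-user credential:

import org.apache.spark.sql.SparkSession

val base = SparkSession.builder().master("local[*]").getOrCreate()
// newSession() shares the SparkContext but gets its own session state and conf.
val userA = base.newSession()
val userB = base.newSession()

// Each session sets a credential that must not leak to the other user.
userA.conf.set("fs.s3a.access.key", "key-for-user-a")
userB.conf.set("fs.s3a.access.key", "key-for-user-b")

// sessionState.newHadoopConf() copies the context's Hadoop conf and then folds in
// the session's own conf entries, so each session sees only its own key;
// the shared sparkContext.hadoopConfiguration alone would not reflect these values.
assert(userA.sessionState.newHadoopConf().get("fs.s3a.access.key") == "key-for-user-a")
assert(userB.sessionState.newHadoopConf().get("fs.s3a.access.key") == "key-for-user-b")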

Member

@srowen left a comment

Sounds good to me. There are probably some more cases where a similar fix should be made then, but OK to address this isolated one.

@SparkQA

SparkQA commented Aug 20, 2019

Test build #4837 has finished for PR 25505 at commit 9c8f93c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@helenyugithub
Author

Thanks @srowen! Now that the tests have passed, is this PR good to merge?

@srowen
Member

srowen commented Aug 21, 2019

(I usually leave it open at least a day to catch any more comments)

bulldozer-bot bot pushed a commit to palantir/spark that referenced this pull request Aug 21, 2019
## Upstream SPARK-28776 ticket and PR link (if not applicable, explain)
apache#25505
## What changes were proposed in this pull request?
SparkML writer gets hadoop conf from session state, instead of the spark context.

## How was this patch tested?
Tested in pyspark.ml.tests.test_persistence.PersistenceTest test_default_read_write

Please review http://spark.apache.org/contributing.html before opening a pull request.
@srowen closed this in fb1f868 Aug 22, 2019
@srowen
Member

srowen commented Aug 22, 2019

Merged to master
