[SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources do not respect SessionConfigSupport #22529

HyukjinKwon · 2018-09-23T02:33:06Z

What changes were proposed in this pull request?

This PR proposes to backport SPARK-25460 to branch-2.4:

This PR proposes to respect SessionConfigSupport in SS data sources as well. Currently, session configs are only respected in batch sources:

spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

Lines 198 to 203 in e06da95

    val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
      ds = ds, conf = sparkSession.sessionState.conf)
    val pathsOption = {
      val objectMapper = new ObjectMapper()
      DataSourceOptions.PATHS_KEY -> objectMapper.writeValueAsString(paths.toArray)
    }

spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

Lines 244 to 249 in e06da95

    val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
      source,
      df.sparkSession.sessionState.conf)
    val options = sessionOptions ++ extraOptions
    val relation = DataSourceV2Relation.create(source, options)
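In the writer excerpt, `sessionOptions ++ extraOptions` merges two Scala maps. For a key present in both, `++` keeps the value from the right-hand map, so an option the user sets explicitly should override the matching session config. The following minimal, Spark-free sketch shows only that precedence rule; the `url` and `timeout` keys and their values are made up for illustration.

```scala
// Spark-free sketch of the precedence implied by `sessionOptions ++ extraOptions`.
// The keys "url" and "timeout" and their values are hypothetical examples.

// Options derived from session configs (lower precedence).
val sessionOptions: Map[String, String] =
  Map("url" -> "http://session.example", "timeout" -> "30")

// Options passed explicitly by the user (higher precedence).
val extraOptions: Map[String, String] =
  Map("url" -> "http://explicit.example")

// For duplicate keys, Map ++ keeps the value from the right-hand map.
val options: Map[String, String] = sessionOptions ++ extraOptions

println(options("url"))     // http://explicit.example
println(options("timeout")) // 30
```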

If a developer writes a DataSourceV2 source that supports both Structured Streaming and batch jobs, batch jobs respect a source-specific session configuration (say, a URL to connect to and fetch data from, which end users might not be aware of). Structured Streaming, however, does not pick it up, so the value has to be set explicitly in the options.
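Roughly, a source implementing SessionConfigSupport declares a key prefix, and session configs named `spark.datasource.<prefix>.<option>` are forwarded to the source as `<option>`, with the prefix stripped. The sketch below approximates what `DataSourceV2Utils.extractSessionConfigs` does, without depending on Spark. `SessionConfigLike`, `extractSessionConfigsSketch`, `DemoSource` and the `demo` prefix are hypothetical names. Quoting the prefix with `Pattern.quote` is a small safety choice made here and may differ from Spark's own code.

```scala
import java.util.regex.Pattern

// Hypothetical stand-in for Spark's SessionConfigSupport mix-in.
trait SessionConfigLike {
  // Prefix that scopes this source's session configs, e.g. "demo".
  def keyPrefix: String
}

// Spark-free approximation of DataSourceV2Utils.extractSessionConfigs.
def extractSessionConfigsSketch(
    source: Any,
    sessionConf: Map[String, String]): Map[String, String] = source match {
  case cs: SessionConfigLike =>
    // Matches the whole key spark.datasource.<keyPrefix>.<option>; the prefix is quoted literally.
    val ConfKey = ("spark\\.datasource\\." + Pattern.quote(cs.keyPrefix) + "\\.(.+)").r
    // Forward only matching keys, keeping the part after the prefix as the option name.
    sessionConf.collect { case (ConfKey(optionKey), value) => optionKey -> value }
  case _ =>
    // Sources without session config support get no extra options.
    Map.empty
}

object DemoSource extends SessionConfigLike {
  val keyPrefix = "demo"
}

val conf = Map(
  "spark.datasource.demo.url" -> "http://demo.example",
  "spark.datasource.other.url" -> "http://other.example",
  "spark.sql.shuffle.partitions" -> "5")

println(extractSessionConfigsSketch(DemoSource, conf)) // Map(url -> http://demo.example)
```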

How was this patch tested?

Unit tests were added.

…igSupport

This PR proposes to respect `SessionConfigSupport` in SS datasources as well. Currently these are only respected in batch sources:

https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L198-L203
https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L244-L249

If a developer makes a datasource V2 that supports both structured streaming and batch jobs, batch jobs respect a specific configuration, let's say, URL to connect and fetch data (which end users might not be aware of); however, structured streaming ends up with not supporting this (and should explicitly be set into options).

Unit tests were added.

Closes apache#22462 from HyukjinKwon/SPARK-25460.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

SparkQA · 2018-09-23T02:59:34Z

Test build #96480 has finished for PR 22529 at commit b6f8880.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-09-23T07:05:01Z

Test build #96484 has finished for PR 22529 at commit b080b0d.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-09-23T07:52:56Z

retest this please

SparkQA · 2018-09-23T11:49:17Z

Test build #96486 has finished for PR 22529 at commit b080b0d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-09-23T22:39:15Z

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala

        var tempReader: MicroBatchReader = null
        val schema = try {
          tempReader = s.createMicroBatchReader(
            Optional.ofNullable(userSpecifiedSchema.orNull),
            Utils.createTempDir(namePrefix = s"temporaryReader").getCanonicalPath,
-            options)
+            dataSourceOptions)


So, this part is the difference, isn't it?

HyukjinKwon

Yup. The conflict is mainly because of the renaming.
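The diff above is in the micro-batch path of DataStreamReader.scala. After this change, the reader is created with options that include the source's session configs, not only the options the user passed. The sketch below contrasts the two behaviours without Spark. `StreamingSourceSketch`, `createReader`, `sessionOptionsFor` and the `demo` prefix are hypothetical. Prefix stripping uses a plain `startsWith` check rather than Spark's own helper.

```scala
// Hypothetical stand-in for a streaming source with a session config prefix.
final class StreamingSourceSketch(val keyPrefix: String) {
  // Pretends to create a reader; just reports the URL it would connect to.
  def createReader(options: Map[String, String]): String =
    options.getOrElse("url", "<no url configured>")
}

// Keep only keys under spark.datasource.<keyPrefix>. and strip that prefix.
def sessionOptionsFor(
    source: StreamingSourceSketch,
    conf: Map[String, String]): Map[String, String] = {
  val prefix = s"spark.datasource.${source.keyPrefix}."
  conf.collect {
    case (k, v) if k.startsWith(prefix) && k.length > prefix.length =>
      k.stripPrefix(prefix) -> v
  }
}

// Before the fix: the streaming path ignored session configs (conf is unused).
def readerBefore(
    source: StreamingSourceSketch,
    conf: Map[String, String],
    extraOptions: Map[String, String]): String =
  source.createReader(extraOptions)

// After the fix: session configs are merged in, and explicit options still win.
def readerAfter(
    source: StreamingSourceSketch,
    conf: Map[String, String],
    extraOptions: Map[String, String]): String =
  source.createReader(sessionOptionsFor(source, conf) ++ extraOptions)

val source = new StreamingSourceSketch("demo")
val conf = Map("spark.datasource.demo.url" -> "http://demo.example")

println(readerBefore(source, conf, Map.empty)) // <no url configured>
println(readerAfter(source, conf, Map.empty))  // http://demo.example
println(readerAfter(source, conf, Map("url" -> "http://explicit.example"))) // http://explicit.example
```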

dongjoon-hyun

+1, LGTM.

dongjoon-hyun · 2018-09-23T22:45:29Z

cc @cloud-fan and @gatorsmile

cloud-fan · 2018-09-24T12:59:07Z

LGTM

dongjoon-hyun · 2018-09-24T15:48:46Z

Merged to branch-2.4.

… SessionConfigSupport

Closes #22529 from HyukjinKwon/SPARK-25460-backport.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

dongjoon-hyun · 2018-09-24T15:51:09Z

Thank you, @HyukjinKwon and @cloud-fan.

HyukjinKwon · 2018-09-24T15:59:51Z

Thanks, @dongjoon-hyun and @cloud-fan.

HyukjinKwon mentioned this pull request Sep 23, 2018

[SPARK-25460][SS] DataSourceV2: SS sources do not respect SessionConfigSupport #22462

Closed

One missed rename · b080b0d

dongjoon-hyun reviewed Sep 23, 2018

dongjoon-hyun approved these changes Sep 23, 2018

HyukjinKwon closed this Sep 24, 2018

HyukjinKwon deleted the SPARK-25460-backport branch October 16, 2018 12:43