
[SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing #33922

Conversation

Contributor

@planga82 planga82 commented Sep 6, 2021

What changes were proposed in this pull request?

Currently, only DataSource V1 is supported in the CreateTempViewUsing command. This PR refactors DataFrameReader to reuse its code for creating a DataFrame from a DataSource V2.
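For reference, this is the kind of statement the command handles (the provider class name below is hypothetical; with this change the USING clause can resolve a DataSource V2 TableProvider, not only a V1 source):

```sql
-- Hypothetical DSv2 provider class registered on the classpath.
CREATE OR REPLACE TEMPORARY VIEW my_view
USING org.example.MyDataSourceV2
OPTIONS (path '/tmp/data');

SELECT * FROM my_view;
```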

Why are the changes needed?

Improves support for DataSource V2 in this command.

Does this PR introduce any user-facing change?

It does not change current behavior; it only adds new functionality.

How was this patch tested?

Unit testing

@github-actions github-actions bot added the SQL label Sep 6, 2021
@planga82
Contributor Author

planga82 commented Sep 6, 2021

@cloud-fan what do you think? I'm not sure if I'm missing something. Thanks!

@HyukjinKwon
Member

ok to test

Comment on lines 94 to 96
```scala
def loadV2Source(sparkSession: SparkSession, provider: TableProvider,
    userSpecifiedSchema: Option[StructType], extraOptions: CaseInsensitiveMap[String],
    source: String, paths: String*): Option[DataFrame] = {
```
Member
Suggested change:

```scala
def loadV2Source(
    sparkSession: SparkSession,
    provider: TableProvider,
    userSpecifiedSchema: Option[StructType],
    extraOptions: CaseInsensitiveMap[String],
    source: String,
    paths: String*): Option[DataFrame] = {
```
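The suggestion applies the Spark Scala style for signatures that overflow one line: each parameter on its own line with a four-space indent. A trivial, self-contained sketch of the same convention (all names are made up for illustration):

```scala
// Spark-style multi-line signature: one parameter per line,
// four-space indent, return type on the last parameter line.
object StyleExample {
  def buildLabel(
      prefix: String,
      name: String,
      tags: Seq[String]): String = {
    ((prefix + ":" + name) +: tags).mkString(",")
  }

  def main(args: Array[String]): Unit = {
    println(buildLabel("ds", "v2", Seq("temp", "view")))  // ds:v2,temp,view
  }
}
```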

```scala
          dsOptions)
        (catalog.loadTable(ident), Some(catalog), Some(ident))
      case _ =>
        // TODO: Non-catalog paths for DSV2 are currently not well defined.
```
Member

I know this comment already existed, but I wanted to make a note: this isn't a good example of a comment. There's no JIRA ticket, and we don't know what is not well defined.

Contributor Author

Yes, I think the same. When I read it, I tried to understand what was missing, but I couldn't figure it out. Shall we delete it?

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47533/

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47533/

@SparkQA

SparkQA commented Sep 7, 2021

Test build #143031 has finished for PR 33922 at commit 181c5d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment on lines 132 to 133
```scala
private def getOptionsWithPaths(extraOptions: CaseInsensitiveMap[String],
    paths: String*): CaseInsensitiveMap[String] = {
```
Contributor

Suggested change:

```scala
private def getOptionsWithPaths(
    extraOptions: CaseInsensitiveMap[String],
    paths: String*): CaseInsensitiveMap[String] = {
```

```scala
val analyzedPlan = Dataset.ofRows(
  sparkSession, LogicalRelation(dataSource.resolveRelation())).logicalPlan
val analyzedPlan = DataSource.lookupDataSourceV2(provider, sparkSession.sessionState.conf)
  .map { tblProvider =>
```
Contributor

Suggested change:

```diff
-  .map { tblProvider =>
+  .flatMap { tblProvider =>
```
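For context on why `.flatMap` is appropriate here: when the function applied inside itself returns an `Option`, `.map` yields a nested `Option[Option[_]]`, while `.flatMap` flattens it to a single `Option`. A self-contained sketch (names hypothetical, unrelated to Spark internals):

```scala
// map vs flatMap on Option when the mapped function itself returns an Option.
object FlatMapExample {
  def lookup(name: String): Option[Int] =
    Map("a" -> 1, "b" -> 2).get(name)

  def main(args: Array[String]): Unit = {
    val provider: Option[String] = Some("a")
    val nested: Option[Option[Int]] = provider.map(lookup)  // Some(Some(1))
    val flat: Option[Int] = provider.flatMap(lookup)        // Some(1)
    println(nested)
    println(flat)
  }
}
```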

```scala
test("SPARK-35803: Support datasource V2 in CREATE VIEW USING") {
  Seq(classOf[SimpleDataSourceV2], classOf[JavaSimpleDataSourceV2]).foreach { cls =>
    withClue(cls.getName) {
      sql(s"CREATE or REPLACE GLOBAL TEMPORARY VIEW s1 USING ${cls.getName}")
```
Contributor

We should test with a normal temp view unless there is something special about the global temp view.

Contributor Author

Yes, there is no reason to use a global temp view in this test.

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47564/

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47564/

@SparkQA

SparkQA commented Sep 8, 2021

Test build #143061 has finished for PR 33922 at commit 098c485.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in feba05f Sep 8, 2021