
[SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing #33922

Conversation

Contributor

@planga82 planga82 commented Sep 6, 2021

What changes were proposed in this pull request?

Currently, only DataSource V1 is supported in the CreateTempViewUsing command. This PR refactors DataFrameReader to reuse its code for creating a DataFrame from a DataSource V2.
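For reference, this is the kind of statement the command handles (the provider class name below is hypothetical; with this change the USING clause can resolve a DataSource V2 TableProvider, not only a V1 source):

```sql
-- Hypothetical DSv2 provider class registered on the classpath.
CREATE OR REPLACE TEMPORARY VIEW my_view
USING org.example.MyDataSourceV2
OPTIONS (path '/tmp/data');

SELECT * FROM my_view;
```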

Why are the changes needed?

Improves support for DataSource V2 in this command.

Does this PR introduce any user-facing change?

It does not change current behavior; it only adds new functionality.

How was this patch tested?

Unit testing

@github-actions github-actions bot added the SQL label Sep 6, 2021
@planga82
Contributor Author

planga82 commented Sep 6, 2021

@cloud-fan what do you think? I'm not sure if I'm missing something. Thanks!

@HyukjinKwon
Member

ok to test

Comment on lines 94 to 96
```scala
def loadV2Source(sparkSession: SparkSession, provider: TableProvider,
    userSpecifiedSchema: Option[StructType], extraOptions: CaseInsensitiveMap[String],
    source: String, paths: String*): Option[DataFrame] = {
```
Member
Suggested change:

```scala
def loadV2Source(
    sparkSession: SparkSession,
    provider: TableProvider,
    userSpecifiedSchema: Option[StructType],
    extraOptions: CaseInsensitiveMap[String],
    source: String,
    paths: String*): Option[DataFrame] = {
```
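The suggestion applies the Spark Scala style for signatures that overflow one line: each parameter on its own line with a four-space indent. A trivial, self-contained sketch of the same convention (all names are made up for illustration):

```scala
// Spark-style multi-line signature: one parameter per line,
// four-space indent, return type on the last parameter line.
object StyleExample {
  def buildLabel(
      prefix: String,
      name: String,
      tags: Seq[String]): String = {
    ((prefix + ":" + name) +: tags).mkString(",")
  }

  def main(args: Array[String]): Unit = {
    println(buildLabel("ds", "v2", Seq("temp", "view")))  // ds:v2,temp,view
  }
}
```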

```scala
          dsOptions)
        (catalog.loadTable(ident), Some(catalog), Some(ident))
      case _ =>
        // TODO: Non-catalog paths for DSV2 are currently not well defined.
```
Member

I know this comment already existed, but I wanted to make a note: this isn't a good example of a comment. There's no JIRA ticket, and we don't know what is not well defined.

Contributor Author

Yes, I think the same. When I read it, I tried to understand what was missing, but I couldn't figure it out. Shall we delete it?

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47533/

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47533/

@SparkQA

SparkQA commented Sep 7, 2021

Test build #143031 has finished for PR 33922 at commit 181c5d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment on lines 132 to 133
```scala
private def getOptionsWithPaths(extraOptions: CaseInsensitiveMap[String],
    paths: String*): CaseInsensitiveMap[String] = {
```
Contributor

Suggested change:

```scala
private def getOptionsWithPaths(
    extraOptions: CaseInsensitiveMap[String],
    paths: String*): CaseInsensitiveMap[String] = {
```

```scala
val analyzedPlan = Dataset.ofRows(
  sparkSession, LogicalRelation(dataSource.resolveRelation())).logicalPlan
val analyzedPlan = DataSource.lookupDataSourceV2(provider, sparkSession.sessionState.conf)
  .map { tblProvider =>
```
Contributor

Suggested change:

```diff
-  .map { tblProvider =>
+  .flatMap { tblProvider =>
```
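For context on why `.flatMap` is appropriate here: when the function applied inside itself returns an `Option`, `.map` yields a nested `Option[Option[_]]`, while `.flatMap` flattens it to a single `Option`. A self-contained sketch (names hypothetical, unrelated to Spark internals):

```scala
// map vs flatMap on Option when the mapped function itself returns an Option.
object FlatMapExample {
  def lookup(name: String): Option[Int] =
    Map("a" -> 1, "b" -> 2).get(name)

  def main(args: Array[String]): Unit = {
    val provider: Option[String] = Some("a")
    val nested: Option[Option[Int]] = provider.map(lookup)  // Some(Some(1))
    val flat: Option[Int] = provider.flatMap(lookup)        // Some(1)
    println(nested)
    println(flat)
  }
}
```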

```scala
test("SPARK-35803: Support datasource V2 in CREATE VIEW USING") {
  Seq(classOf[SimpleDataSourceV2], classOf[JavaSimpleDataSourceV2]).foreach { cls =>
    withClue(cls.getName) {
      sql(s"CREATE or REPLACE GLOBAL TEMPORARY VIEW s1 USING ${cls.getName}")
```
Contributor

We should test with a normal temp view unless there is something special about the global temp view.

Contributor Author

Yes, there is no reason to use a global temp view in this test.

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47564/

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47564/

@SparkQA

SparkQA commented Sep 8, 2021

Test build #143061 has finished for PR 33922 at commit 098c485.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in feba05f Sep 8, 2021