Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35535][SQL] New data source V2 API: LocalScan #32678

Closed
wants to merge 3 commits into from

Conversation

gengliangwang
Copy link
Member

What changes were proposed in this pull request?

Add a new data source V2 API: LocalScan. It is a special Scan that will happen on Driver locally instead of Executors.

Why are the changes needed?

The new API improves the flexibility of the DSV2 API. It allows developers to implement connectors for data sources of small data sizes.
For example, we can build a data source for Spark History applications from Spark History Server RESTFUL API. The result set is small and fetching all the results from the Spark driver is good enough. Making it a data source allows us to operate SQL queries with filters or table joins.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test

@SparkQA
Copy link

SparkQA commented May 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43505/

@SparkQA
Copy link

SparkQA commented May 26, 2021

Test build #138986 has finished for PR 32678 at commit 823a811.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @gengliangwang .

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me

@SparkQA
Copy link

SparkQA commented May 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43526/

@SparkQA
Copy link

SparkQA commented May 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43526/

@SparkQA
Copy link

SparkQA commented May 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43532/

@SparkQA
Copy link

SparkQA commented May 27, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43532/

@SparkQA
Copy link

SparkQA commented May 27, 2021

Test build #139008 has finished for PR 32678 at commit a7e03aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@SparkQA
Copy link

SparkQA commented May 27, 2021

Test build #139015 has finished for PR 32678 at commit b96b6c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

+1, late LGTM. Thanks!

huaxingao pushed a commit to huaxingao/spark that referenced this pull request Jun 9, 2021
### What changes were proposed in this pull request?

Add a new data source V2 API: `LocalScan`. It is a special Scan that will happen on Driver locally instead of Executors.

### Why are the changes needed?

The new API improves the flexibility of the DSV2 API. It allows developers to implement connectors for data sources of small data sizes.
For example, we can build a data source for Spark History applications from Spark History Server RESTFUL API. The result set is small and fetching all the results from the Spark driver is good enough. Making it a data source allows us to operate SQL queries with filters or table joins.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test

Closes apache#32678 from gengliangwang/LocalScan.

Lead-authored-by: Gengliang Wang <ltnwgl@gmail.com>
Co-authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 5bcd1c2)
dongjoon-hyun pushed a commit that referenced this pull request Aug 24, 2021
### What changes were proposed in this pull request?

This is a follow-up of #32678. It moves `LocalScan` from SQL core package to Catalyst package.

### Why are the changes needed?

There are two packages for `org.apache.spark.sql.connector`
SQL Core: https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/connector
Catalyst: https://github.com/apache/spark/tree/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector

As `LocalScan` doesn't depend on the classes of SQL Core, we should move it to catalyst.
### Does this PR introduce _any_ user-facing change?

No, the trait is not released yet.

### How was this patch tested?

Existing UT.

Closes #33826 from gengliangwang/moveLocalScan.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Aug 24, 2021
### What changes were proposed in this pull request?

This is a follow-up of #32678. It moves `LocalScan` from SQL core package to Catalyst package.

### Why are the changes needed?

There are two packages for `org.apache.spark.sql.connector`
SQL Core: https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/connector
Catalyst: https://github.com/apache/spark/tree/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector

As `LocalScan` doesn't depend on the classes of SQL Core, we should move it to catalyst.
### Does this PR introduce _any_ user-facing change?

No, the trait is not released yet.

### How was this patch tested?

Existing UT.

Closes #33826 from gengliangwang/moveLocalScan.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5b4c216)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants