[SPARK-35535][SQL] New data source V2 API: LocalScan #32678
Conversation
Kubernetes integration test unable to build dist. exiting with code: 1

Test build #138986 has finished for PR 32678 at commit
Thank you for pinging me, @gengliangwang.
The changes look good to me
sql/core/src/main/java/org/apache/spark/sql/connector/read/LocalScan.java
Kubernetes integration test starting
Kubernetes integration test status failure
...core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
Kubernetes integration test starting
Kubernetes integration test status success
Test build #139008 has finished for PR 32678 at commit
Merged to master.
Test build #139015 has finished for PR 32678 at commit
+1, late LGTM. Thanks!
### What changes were proposed in this pull request?

Add a new data source V2 API: `LocalScan`. It is a special `Scan` that runs locally on the driver instead of on executors.

### Why are the changes needed?

The new API improves the flexibility of the DSV2 API. It allows developers to implement connectors for data sources with small data sizes. For example, we can build a data source for Spark History applications on top of the Spark History Server REST API. The result set is small, so fetching all the results on the Spark driver is good enough, and making it a data source lets us run SQL queries with filters or table joins over it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes apache#32678 from gengliangwang/LocalScan.

Lead-authored-by: Gengliang Wang <ltnwgl@gmail.com>
Co-authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 5bcd1c2)
### What changes were proposed in this pull request?

This is a follow-up of #32678. It moves `LocalScan` from the SQL core package to the Catalyst package.

### Why are the changes needed?

There are two packages for `org.apache.spark.sql.connector`:

- SQL core: https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/connector
- Catalyst: https://github.com/apache/spark/tree/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector

Since `LocalScan` doesn't depend on any classes in SQL core, we should move it to Catalyst.

### Does this PR introduce _any_ user-facing change?

No, the trait is not released yet.

### How was this patch tested?

Existing UT.

Closes #33826 from gengliangwang/moveLocalScan.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5b4c216)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

Add a new data source V2 API: `LocalScan`. It is a special `Scan` that runs locally on the driver instead of on executors.

### Why are the changes needed?

The new API improves the flexibility of the DSV2 API. It allows developers to implement connectors for data sources with small data sizes.

For example, we can build a data source for Spark History applications on top of the Spark History Server REST API. The result set is small, so fetching all the results on the Spark driver is good enough, and making it a data source lets us run SQL queries with filters or table joins over it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.
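To make the idea concrete, here is a small, self-contained model of a driver-local scan. This is an illustration only, not the real Spark interfaces: the names `Scan`, `LocalScan`, and `HistoryAppScan` mirror the DSV2 concepts described above, but the row and schema types are plain Java stand-ins (the actual API works with Spark's `InternalRow` and `StructType`).

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the DSV2 Scan concept; readSchema() returns column names
// instead of a real StructType.
interface Scan {
    List<String> readSchema();
}

// The key idea of LocalScan: the whole (small) result set is produced
// directly on the driver, so the scan exposes the rows themselves
// rather than partitions for executors to read.
interface LocalScan extends Scan {
    List<List<Object>> rows();
}

// Hypothetical source: a handful of applications fetched from a REST
// endpoint such as the Spark History Server, small enough to
// materialize entirely on the driver.
class HistoryAppScan implements LocalScan {
    @Override
    public List<String> readSchema() {
        return Arrays.asList("appId", "name");
    }

    @Override
    public List<List<Object>> rows() {
        return Arrays.asList(
            Arrays.asList("app-001", "nightly-etl"),
            Arrays.asList("app-002", "ad-hoc-report"));
    }
}

public class LocalScanDemo {
    public static void main(String[] args) {
        LocalScan scan = new HistoryAppScan();
        System.out.println(scan.readSchema());
        scan.rows().forEach(System.out::println);
    }
}
```

In the real implementation, the planner recognizes a `LocalScan` (see the `DataSourceV2Strategy` change in this PR) and turns it into a local table scan on the driver, so filters and joins still apply even though no executor tasks read the source.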