-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-36814][SQL] Make class ColumnarBatch extendable #34054
Conversation
ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making a PR, @flyrain .
cc @aokolnychyi , @cloud-fan , @tgravescs , @sunchao , @viirya
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This adds the flexibility to extend ColumnarBatch
with minimal change. Pending CI.
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #143470 has finished for PR 34054 at commit
|
Can one of the admins verify this patch? |
Merged into master. Thanks. Can you create a followup PR to add tests to show how you want to extend this so future PRs will not break it? |
Thanks all for review! Thanks @dbtsai for the commit. Will file a followup PR. |
### What changes were proposed in this pull request? A follow up of #34054. Three things changed: 1. Add a test for extendable class `ColumnarBatch` 2. Make `ColumnarBatchRow` public. 3. Change private fields to protected fields. ### Why are the changes needed? A follow up of #34054. Class ColumnarBatch need to be extendable to support better vectorized reading in multiple data sources. For example, Iceberg needs to filter out deleted rows in a batch before Spark consumes it, to support row-level delete( apache/iceberg#3141) in vectorized read. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A new test is added Closes #34087 from flyrain/SPARK-36821. Authored-by: Yufei Gu <yufei_gu@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request? A follow up of apache/spark#34054. Three things changed: 1. Add a test for extendable class `ColumnarBatch` 2. Make `ColumnarBatchRow` public. 3. Change private fields to protected fields. ### Why are the changes needed? A follow up of apache/spark#34054. Class ColumnarBatch need to be extendable to support better vectorized reading in multiple data sources. For example, Iceberg needs to filter out deleted rows in a batch before Spark consumes it, to support row-level delete( apache/iceberg#3141) in vectorized read. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A new test is added Closes #34087 from flyrain/SPARK-36821. Authored-by: Yufei Gu <yufei_gu@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request? A follow up of apache#34054. Three things changed: 1. Add a test for extendable class `ColumnarBatch` 2. Make `ColumnarBatchRow` public. 3. Change private fields to protected fields. ### Why are the changes needed? A follow up of apache#34054. Class ColumnarBatch need to be extendable to support better vectorized reading in multiple data sources. For example, Iceberg needs to filter out deleted rows in a batch before Spark consumes it, to support row-level delete( apache/iceberg#3141) in vectorized read. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A new test is added
What changes were proposed in this pull request?
Change class ColumnarBatch to a non-final class
Why are the changes needed?
To support better vectorized reading in multiple data source, ColumnarBatch need to be extendable. For example, To support row-level delete( apache/iceberg#3141) in Iceberg's vectorized read, we need to filter out deleted rows in a batch, which requires ColumnarBatch to be extendable.
Does this PR introduce any user-facing change?
No
How was this patch tested?
No test needed.