
Add datafusion-substrait crate #4543

Merged
merged 38 commits into from
Jan 12, 2023

@andygrove (Member) commented Dec 7, 2022

Which issue does this PR close?

Closes #4536

Rationale for this change

Move development of datafusion-substrait into the main DataFusion repo.

What changes are included in this PR?

Are these changes tested?

Tests are included.

Are there any user-facing changes?

No

andygrove and others added 30 commits March 5, 2022 09:47
* use substrait 0.1

* use datafusion 8.0
Added fetch to consumer

Added limit to producer

Added unit tests for limit

Added roundtrip_fill_none() for testing when None input can be converted to 0

Update src/consumer.rs

Co-authored-by: Andy Grove <andygrove73@gmail.com>

Co-authored-by: Andy Grove <andygrove73@gmail.com>
Add consumer

Add producer and test

Modified error string
* Add plan and function extension support

* Removed unwraps
* Add consumer, producer and tests for aggregate relation

Change function extension registration from absolute to relative anchor
(reference)

Remove operator to/from reference

* Fixed function registration bug

* Add test

* Addressed PR comments
* Changed field reference from masked reference to direct reference

* Handle unsupported case (struct with child)
Fixed aggregate function register bug
* Implement CASE WHEN

* Add more case to test

* Addressed comments
* feat: support explicit catalog/schema names in ReadRel

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: use re-exported expr crate

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@andygrove
Member Author

@waynexia @nseekhao @JanKaul I need to go through the ASF IP clearance process to have this code moved into this repo. Could you please follow the instructions at https://www.apache.org/licenses/contributor-agreements.html to submit an ICLA (Individual Contributor License Agreement) if you do not already have one on file? Thanks in advance for helping with this!

@andygrove
Member Author

Thanks @nseekhao and @JanKaul for filing ICLAs.

@waynexia Will you also be able to file an ICLA?

@waynexia
Member

waynexia commented Dec 15, 2022

Sorry for the delay 🥲 I just got feedback from the secretary. I'll fix the issue and resubmit it today.

Update: it's done 🥳

@andygrove
Member Author

@waynexia @JanKaul @nseekhao @Dandandan Thanks everyone for filing ICLAs. As part of the IP clearance process, I have to remind active committers that they are responsible for ensuring that a Corporate CLA is recorded if such is required to authorize their contributions under their individual CLA.

I have filled out the necessary "paperwork" and will file this in the next day or two and start a vote on accepting this donation. Thanks for your patience.

@waynexia
Member

waynexia commented Jan 5, 2023

Thanks for handling this @andygrove

@andygrove
Member Author

The Apache Arrow PMC has voted to accept this contribution:

https://lists.apache.org/thread/qcbdt8y2vkwvwkjmmljn4jzvggzb2fkk

A lazy consensus incubator vote has now started:

https://lists.apache.org/thread/m5q4qxr32xzhotljv6z7mg3dofl46rv6

@andygrove andygrove changed the title from "[DO NOT MERGE] Add datafusion-substrait crate" to "Add datafusion-substrait crate" Jan 11, 2023
@andygrove andygrove marked this pull request as ready for review January 12, 2023 15:10
@andygrove
Member Author

The incubator vote passed. Given that the PMC has already voted to accept this donation, I plan on merging this PR later today.

Contributor

@alamb alamb left a comment


It is great to see this finally land @andygrove -- thank you!

@andygrove andygrove merged commit 0d27fcb into apache:master Jan 12, 2023
@andygrove andygrove deleted the substrait branch January 12, 2023 22:18
@ursabot

ursabot commented Jan 12, 2023

Benchmark runs are scheduled for baseline = eb19a67 and contender = 0d27fcb. 0d27fcb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Linked issue: Move datafusion-substrait project into arrow-datafusion repo

8 participants