Implement optimization rule for broadcast/CollectLeft hash join #28

Open
Dandandan opened this issue Apr 21, 2021 · 0 comments

When the left side of a join is very small compared to the right side, it is better to load the left side once (and, in Ballista, broadcast it to the executors) instead of hash-partitioning both inputs.
In DataFusion (DF) this avoids hash partitioning; in Ballista it also avoids shuffling data.
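
A rough sketch of the decision such a rule might make. Everything here is an assumption for illustration: `PartitionMode` is a local stand-in named after the modes in the issue title, and `InputStats`, `choose_partition_mode`, `max_build_bytes`, and `min_ratio` are hypothetical names, not DataFusion or Ballista APIs. It only shows the "left side is small, in absolute terms and relative to the right side" check, and falls back to hash partitioning whenever a size is unknown.

```rust
/// Local stand-in for the join strategies named in the issue title
/// (hypothetical, not the DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionMode {
    /// Hash-partition both inputs on the join keys.
    Partitioned,
    /// Load the whole left (build) side once and reuse it for every
    /// right-side partition (broadcast it in a distributed setting).
    CollectLeft,
}

/// Hypothetical size estimate for one join input; `None` means unknown.
#[derive(Debug, Clone, Copy)]
pub struct InputStats {
    pub total_bytes: Option<u64>,
}

/// Pick `CollectLeft` only when the left side is known to be small, both
/// in absolute terms (`max_build_bytes`) and relative to the right side
/// (at least `min_ratio` times smaller). Both thresholds are assumptions.
pub fn choose_partition_mode(
    left: InputStats,
    right: InputStats,
    max_build_bytes: u64,
    min_ratio: u64,
) -> PartitionMode {
    match (left.total_bytes, right.total_bytes) {
        (Some(l), Some(r))
            if l <= max_build_bytes && l.saturating_mul(min_ratio) <= r =>
        {
            PartitionMode::CollectLeft
        }
        // Unknown or unfavourable sizes: keep the hash-partitioned join.
        _ => PartitionMode::Partitioned,
    }
}
```

With a 1 KB left input, a 1 MB right input, `max_build_bytes = 10_000`, and `min_ratio = 10`, this picks `CollectLeft`. Swapping the inputs, or leaving either size unknown, keeps `Partitioned`. A real rule would also have to decide whether to swap the join sides first, which this sketch leaves out.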

alamb pushed a commit that referenced this issue Dec 26, 2022
* Sort Removal rule initial commit

* move ordering satisfy to the util

* update test and change repartition maintain_input_order impl

* simplifications

* partition by refactor (#28)

* partition by refactor

* minor changes

* Unnecessary tuple to Range conversion is removed

* move transpose under common

* Add naive sort removal rule

* Add todo for finer Sort removal handling

* Refactors to improve readability and reduce nesting

* reverse expr returns Option (no need for support check)

* fix tests

* partition by and order by no longer ends up at the same window group

* Refactor to simplify code

* Better comments, change method names

* Resolve errors introduced by syncing

* address reviews

* address reviews

* Rename to less confusing OptimizeSorts

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
alamb pushed a commit to alamb/datafusion that referenced this issue Jan 4, 2023
* partition by refactor

* minor changes

* Unnecessary tuple to Range conversion is removed

* move transpose under common
alamb pushed a commit that referenced this issue Jan 4, 2023
* Sort Removal rule initial commit

* move ordering satisfy to the util

* update test and change repartition maintain_input_order impl

* simplifications

* partition by refactor (#28)

* partition by refactor

* minor changes

* Unnecessary tuple to Range conversion is removed

* move transpose under common

* Add naive sort removal rule

* Add todo for finer Sort removal handling

* Refactors to improve readability and reduce nesting

* reverse expr returns Option (no need for support check)

* fix tests

* partition by and order by no longer ends up at the same window group

* Bounded window exec

* solve merge problems

* Refactor to simplify code

* Better comments, change method names

* resolve merge conflicts

* Resolve errors introduced by syncing

* remove set_state, make ntile debuggable

* remove locked flag

* address reviews

* address reviews

* Resolve merge conflict

* address reviews

* address reviews

* address reviews

* Add new tests

* Update tests

* add support for bounded min max

* address reviews

* rename sort rule

* Resolve merge conflicts

* refactors

* Update fuzzy tests + minor changes

* Simplify code and improve comments

* Fix imports, make create_schema more functional

* address reviews

* undo yml change

* minor change to pass from CI

* resolve merge conflicts

* rename some members

* Move rule to physical planning

* Minor stylistic/comment changes

* Simplify batch-merging utility functions

* Remove unnecessary clones, simplify code

* update cargo lock file

* address reviews

* update comments

* resolve linter error

* Tidy up comments after final review

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
andygrove added a commit that referenced this issue Jan 12, 2023
* Initial commit

* initial commit

* failing test

* table scan projection

* closer

* test passes, with some hacks

* use DataFrame (#2)

* update README

* update dependency

* code cleanup (#3)

* Add support for Filter operator and BinaryOp expressions (#4)

* GitHub action (#5)

* Split code into producer and consumer modules (#6)

* Support more functions and scalar types (#7)

* Use substrait 0.1 and datafusion 8.0 (#8)

* use substrait 0.1

* use datafusion 8.0

* update datafusion to 10.0 and substrait to 0.2 (#11)

* Add basic join support (#12)

* Added fetch support (#23)

Added fetch to consumer

Added limit to producer

Added unit tests for limit

Added roundtrip_fill_none() for testing when None input can be converted to 0

Update src/consumer.rs

Co-authored-by: Andy Grove <andygrove73@gmail.com>

* Upgrade to DataFusion 13.0.0 (#25)

* Add sort consumer and producer (#24)

Add consumer

Add producer and test

Modified error string

* Add serializer/deserializer (#26)

* Add plan and function extension support (#27)

* Add plan and function extension support

* Removed unwraps

* Implement GROUP BY (#28)

* Add consumer, producer and tests for aggregate relation

Change function extension registration from absolute to relative anchor (reference)

Remove operator to/from reference

* Fixed function registration bug

* Add test

* Addressed PR comments

* Changed field reference from mask to direct reference (#29)

* Changed field reference from masked reference to direct reference

* Handle unsupported case (struct with child)

* Handle SubqueryAlias (#30)

Fixed aggregate function register bug

* Add support for SELECT DISTINCT (#31)

Add test case

* Implement BETWEEN (#32)

* Add case (#33)

* Implement CASE WHEN

* Add more case to test

* Addressed comments

* feat: support explicit catalog/schema names in ReadRel (#34)

* feat: support explicit catalog/schema names in ReadRel

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: use re-exported expr crate

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* move files to subfolder

* RAT

* remove rust.yaml

* revert .gitignore changes

* tomlfmt

* tomlfmt

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: JanKaul <jankaul@mailbox.org>
Co-authored-by: nseekhao <37189615+nseekhao@users.noreply.github.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>