Upgrade DataFusion to latest, to include fixes for aggregation #216

Merged
Dandandan merged 85 commits into v32 from upgrade_df_agg
Nov 9, 2023

Conversation

Dandandan
Collaborator

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

mustafasrepo and others added 30 commits October 25, 2023 16:43
* Initial commit

* Address todos

* Update comments

* Simplifications

* Minor simplifications

* Address reviews

* Add TableScan constructor

* Minor changes

* make try_new_with_schema method of Aggregate private

* Use projection try_new instead of try_new_schema

* Simplifications, add comment

* Review changes

* Improve comments

* Move get_wider_type to type_coercion module

* Clean up type coercion file

---------

Co-authored-by: berkaysynnada <berkay.sahin@synnada.ai>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
…#7655)

* merge main

* fixes and cmt

* review comments, tuning parameters, updating docs

* cargo fmt

* reduce default buffer size to 2 and update docs
…e#7821)

* feat: implement read bloom filter support

* test: add unit test for read bloom filter

* Simplify bloom filter application

* test: add unit test for bloom filter with sql `in`

* fix: improve bloom filter match express

* fix: add more test for bloom filter

* ci: rollback dependencies

* ci: merge main branch

* fix: unit tests for bloom filter

* ci: cargo clippy

* ci: cargo clippy

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fix: don't push down volatile predicates in projection

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* add suggestions

* fix

* fix doc

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
…itional (apache#7745)

* Make parquet an option by adding multiple cfg attributes without significant code changes.

* Extract parquet logic into submodule from execution::context

* Extract parquet logic into submodule from datafusion_core::dataframe

* Extract more logic into submodule from execution::context

* Move tests from execution::context

* Rename submodules
* Initial commit

* Simplifications

* Cleanup imports

* Review

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
…noseconds, add `to_timestamp_nanos` (apache#7844)

* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns
* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns

* Fix CI `to_timestamp()` failed

* Update datafusion/expr/src/built_in_function.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix typo

* fix

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* minor: cast the updated value to the data type of target column

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* fix tests

---------

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
* Add simple exclude all columns test to sqllogictest

* Add more exclude test cases
…pache#7896)

* support dictionary encoded string columns for partition cols

* remove debug prints

* cargo fmt

* generic dictionary cast and dict encoded test

* updates from review

* force retry checks

* try checks again
* remove array

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup others

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* clippy

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup cast

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fmt

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup cast

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
…trait (apache#7965)

* Lower &mut SessionContext in substrait

* rm mut ctx in tests
* Minor: Improve `HashJoinExec` documentation

* Apply suggestions from code review

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
…he#7970)

* Add README.md to `core`, `execution` and `physical-plan` crates

* prettier

* Update datafusion/physical-plan/README.md

* Update datafusion/wasmtest/README.md

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>
…7936)

* Move source repartitioning into ExecutionPlan::repartition

* cleanup

* update test

* update test

* refine docs

* fix merge
* minor: fix broken links in README.md

* fix proto link
* Minor: Update the sqllogictest crate README

* prettier

* Apply suggestions from code review

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>

---------

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
* Fix try_from_array data type for NULL value in ListArray

* Fix

* Explicitly assert the datatype

* For review
andygrove and others added 19 commits November 6, 2023 07:44
* changelog

* update version

* update changelog
…() (apache#8059)

* deprecate BuiltinScalarFunction::supports_zero_argument()

* unify old supports_zero_argument() impl
* feat: add example to ci

* nit

* addr comments

---------

Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>
Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version.
- [Release notes](https://github.com/substrait-io/substrait-rs/releases)
- [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md)
- [Commits](substrait-io/substrait-rs@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: substrait
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
…ppy` as well as `foo.parquet` (apache#7972)

* feat: read files based on the file extension

* fix: some file extensions start with `.` and some do not

* fix: rename extention to extension

* chore: use exec_err

* chore: rename extention to extension

* chore: rename extention to extension

* chore: simplify the code

* fix: check table is empty

* ci: fix test

* fix: add err info

* refactor: extract the logic to infer_types

* fix: add tests for different extensions

* fix: ci clippy

* fix: add more tests

* fix: simplify the logic

* fix: ci
* Minor: Improve HashJoinStream docstrings

* fix comments

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <comphead@users.noreply.github.com>

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: comphead <comphead@users.noreply.github.com>
* Fixing broken link

* Update docs/source/contributor-guide/index.md

Thanks for spotting this as well

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* fix: DataFusion suggests invalid functions

* update test

* Add test for BuiltInWindowFunction
* General array repeat

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* done

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* remove test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add comment

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fm

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
… rule (apache#8061)

* Minor: remove unnecessary projection

* fix ci
…parquet dependencies (apache#8095)

* remove duplicate version numbers for arrow, object_store, and parquet dependencies

* cargo update

* use default features in parquet crate

* disable default parquet features in wasmtest
* Protobuf serde for Json file sink

* Fix tests

* Fix test
Dandandan merged commit ca4b6ee into v32 on Nov 9, 2023
46 checks passed
Dandandan deleted the upgrade_df_agg branch on November 9, 2023 09:36