Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Datafusion to v37.1.0 #669

Merged
merged 9 commits into from
May 8, 2024

Conversation

Michael-J-Ward
Copy link
Contributor

@Michael-J-Ward Michael-J-Ward commented May 8, 2024

NOTE: While each commit is a self-contained "logical" change, the project needs all the commits to compile.

Which issue does this PR close?

Closes #663. (hopefully)

This is a cleaned up version of #662.

Importantly, the final failing test is not commented out, so there is 1 failing test.

The error and test case are highlighted in this commit from previous PR: ecff357

What changes are included in this PR?

  • Upgrades the dependencies
  • All minimal ports for the code to compile and all tests (modulo one failure) to pass
  • It does not answer the API questions highlighted in the Tracking Upgrade to Datafusion 37 #663. All choices were w/ the goal of "getting to passing tests".

Are there any user-facing changes?

Much of src/functions.rs have been updated per the upstream epic: apache/datafusion#9285

The method was removed upstream but is used in many tests for `datafusion-python`.

Ref: apache/datafusion#9627
The options to write_parquet changed.

write_json has a new argument that I defaulted to None. We can expose that config later.

Ref: apache/datafusion#9382
- `WindowFunction` and `AggregateFunction` have `null_treatment` options.
- `ScalarValue` and `DataType` have new variants
- `SchemaProvider::table` now returns a `Result`
`datafusion` completed an Epic that ported many of the `BuiltInFunctions` enum to `SclarUDF`.

I created new macros to simplify the port, and used these macros to refactor a few existing functions.

Ref: apache/datafusion#9285
@Michael-J-Ward
Copy link
Contributor Author

One follow-on that I'd like to do is parametrize test_array_functions().

I didn't want to alter the test-cases unnecessarily while doing the upgrade.

@slyons
Copy link

slyons commented May 8, 2024

Note this also closes #656 , #655, #654, #652, #651, #650 and #649

This is a bug upstream in datafusion

FAILED datafusion/tests/test_functions.py::test_array_functions - pyo3_runtime.PanicException: range end index 9 out of range for slice of length 8
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for taking this on @Michael-J-Ward!

The conda failures are happening in main too, so will ignore those for now.

@andygrove andygrove merged commit ee93cdd into apache:main May 8, 2024
12 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking Upgrade to Datafusion 37
3 participants