Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The tests added in #2582 to keep Ballista and DataFusion in sync add a significant burden to DataFusion development, and I propose removing them, at least temporarily.
Here is a description of the process:
https://github.com/apache/arrow-datafusion/blob/907504c/.github/pull_request_template.md?plain=1#L31-L47
I think the rationale for the new CI check was to add friction to DataFusion API changes in order to encourage a more stable API. However, with the ongoing efforts to rework the object store and parquet reading, I think all the process is doing is slowing things down.
The alternative, having Ballista keep up with changes in DataFusion, sounds daunting at first, but my firsthand experience suggests it is not that bad. Specifically, my project, https://github.com/influxdata/influxdb_iox, uses DataFusion much as Ballista does (as the core query engine) and pins DataFusion directly to a commit on master. Instead of impinging on the DataFusion development process, we keep IOx up to date by manually updating the DataFusion pin in IOx about once a week and sorting out any API changes.
This does take time, but it is mostly mechanical. We do occasionally find bugs that were introduced into DataFusion, as happened most recently in https://github.com/influxdata/influxdb_iox/pull/4743, and we then contribute a fix back upstream (e.g. #2674).
I would be interested to hear how others keep up with pre-release DataFusion as well (maybe @ovr and cube-js?)
Describe the solution you'd like
I propose removing the Ballista CI check in DataFusion.
Specifically this check: https://github.com/apache/arrow-datafusion/blob/907504c5aa768601f9d70ad2c8f928bedfa9b069/.github/workflows/rust.yml#L128-L172
I also propose writing up instructions (and perhaps automation) for manually upgrading the DataFusion pin in Ballista.
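As a rough illustration of what such automation might look like, here is a minimal sketch that rewrites a git `rev` pin in the text of a `Cargo.toml`. It assumes the dependency is declared as a single-line inline table with `git` and `rev` keys; Ballista's actual manifest layout may differ, and the function name `bump_git_pin` and the sample manifest are hypothetical.

```python
import re


def bump_git_pin(cargo_toml: str, crate: str, new_rev: str) -> str:
    """Return the manifest text with the `rev` of `crate`'s git pin replaced.

    Assumes (hypothetically) a single-line inline table such as:
        datafusion = { git = "...", rev = "907504c" }
    """
    pattern = re.compile(
        r"^(" + re.escape(crate) + r'\s*=\s*\{[^}\n]*\brev\s*=\s*")[0-9a-fA-F]+(")',
        re.MULTILINE,
    )
    # A callable replacement avoids new_rev being read as a backreference.
    updated, count = pattern.subn(
        lambda m: m.group(1) + new_rev + m.group(2), cargo_toml
    )
    if count == 0:
        raise ValueError(f"no git rev pin found for {crate!r}")
    return updated


# Sample manifest fragment, for illustration only.
sample = (
    'datafusion = { git = "https://github.com/apache/arrow-datafusion", rev = "907504c" }\n'
    'serde = "1"\n'
)
bumped = bump_git_pin(sample, "datafusion", "abc1234")
```

After bumping the pin, running `cargo check` in the downstream repository would surface any API changes that still need to be sorted out by hand.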
Describe alternatives you've considered
- Do nothing
- Bring Ballista back into the DataFusion repository
Additional context
The move of Ballista to a new repo is tracked in: #2502
There are several discussions about this pain:
cc @andygrove @thinkharderdev @ming535 @Ted-Jiang @xudong963 @tustvold @korowa