Proposal: remove automated ballista CI checks from DataFusion #2679

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The tests added in #2582 to keep Ballista and DataFusion in sync add a significant burden to DataFusion development, and I propose removing them, at least temporarily.

Here is a description of the process:
https://github.com/apache/arrow-datafusion/blob/907504c/.github/pull_request_template.md?plain=1#L31-L47

I think the rationale for the new CI was to add friction to DataFusion API changes to encourage a more stable API. However, with the ongoing efforts to rework the object store and parquet reading, I think all the process is doing is slowing things down.

The alternative, having Ballista keep up with changes in DataFusion, sounds daunting at first, but my firsthand experience suggests it is not that bad. Specifically, my project, https://github.com/influxdata/influxdb_iox, uses DataFusion similarly to Ballista (as the core query engine) and uses a DataFusion pin directly from master. Instead of impinging on the DataFusion development process, we keep IOx up to date with DataFusion by manually updating the DataFusion pin in IOx about once a week and sorting out any API changes.

This does take time, but it is mostly mechanical. We do occasionally find bugs that were introduced into DataFusion, as happened most recently with https://github.com/influxdata/influxdb_iox/pull/4743, and we then contribute a fix back upstream (e.g. #2674).
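For anyone unfamiliar with this kind of pin, here is a minimal sketch of what it can look like in a downstream project's `Cargo.toml`, using Cargo's standard git dependency syntax. The `rev` value is a placeholder and not a real commit, and the actual manifests in IOx or Ballista may be organized differently:

```toml
[dependencies]
# Pin DataFusion to one specific upstream commit instead of a crates.io release.
# "<commit-sha>" is a placeholder: replace it with the commit you want to track.
datafusion = { git = "https://github.com/apache/arrow-datafusion", rev = "<commit-sha>" }
```

Under this assumption, upgrading means changing `rev` to a newer commit, running `cargo check`, and fixing whatever API changes the compiler reports.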

I would be interested to hear how others keep up with pre-release DataFusion as well (maybe @ovr and cube-js?)

Describe the solution you'd like
I propose removing the Ballista CI check in DataFusion.

Specifically this check: https://github.com/apache/arrow-datafusion/blob/907504c5aa768601f9d70ad2c8f928bedfa9b069/.github/workflows/rust.yml#L128-L172

I also propose writing up instructions (maybe even automation) on how to upgrade the DataFusion pin in Ballista manually.

Describe alternatives you've considered

  • Do nothing
  • Bring Ballista back into the DataFusion repository

Additional context
The move of Ballista to a new repo is tracked in: #2502

There are several discussions about this pain:

cc @andygrove @thinkharderdev @ming535 @Ted-Jiang @xudong963 @tustvold @korowa
