@sunng87
Member

sunng87 commented Oct 24, 2025

No description provided.

@sunng87 sunng87 requested a review from WenyXu October 24, 2025 03:22
@WenyXu
Collaborator

WenyXu commented Oct 24, 2025

Let's release a new version.

@sunng87
Member Author

sunng87 commented Oct 24, 2025

@WenyXu Yes, this time it should be 0.7.0.

@WenyXu WenyXu merged commit df57e64 into main Oct 24, 2025
12 checks passed
@WenyXu WenyXu deleted the arrow-57 branch October 24, 2025 03:27
@progval
Collaborator

progval commented Oct 24, 2025

Why require 57.0 instead of >=56.0, <58.0?

@sunng87
Member Author

sunng87 commented Oct 24, 2025

@progval We will create a minor version for each major release of arrow, so users of this library will not silently upgrade to a new major arrow version.

@progval
Collaborator

progval commented Oct 24, 2025

I don't follow.

Users with a lockfile won't silently upgrade either way, because the arrow version is in the lockfile.

Users without a lockfile will (assuming they have a semver dependency like orc-rust = "0.6.0") silently upgrade either way because they'll automatically get the latest minor version of orc-rust within the major version.

And coupling orc-rust versions to arrow versions has the issue that anyone who wants a bug fix is unnecessarily forced onto a new Arrow version. For example, #56 is only in orc-rust 0.7.0, along with this version bump.
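
For reference, here is a minimal sketch of the caret rule that Cargo applies to a bare requirement such as orc-rust = "0.6.0", assuming the documented behavior that the leftmost non-zero component marks the compatibility boundary. The `Version` alias and the `caret_matches` function are illustrative names, not part of Cargo or orc-rust, and pre-release tags are ignored.

```rust
/// A version as (major, minor, patch). Illustrative alias, not a Cargo type.
type Version = (u64, u64, u64);

/// Simplified model of Cargo's default (caret) requirement.
/// Returns true when `candidate` lies in `req <= candidate < upper`, where
/// `upper` bumps the leftmost non-zero component of `req`.
fn caret_matches(req: Version, candidate: Version) -> bool {
    // Tuples compare lexicographically, which matches version ordering here.
    if candidate < req {
        return false;
    }
    let (major, minor, patch) = req;
    let upper: Version = if major > 0 {
        (major + 1, 0, 0) // "56.0.0" allows anything below 57.0.0
    } else if minor > 0 {
        (0, minor + 1, 0) // "0.6.0" allows anything below 0.7.0
    } else {
        (0, 0, patch + 1) // "0.0.3" allows anything below 0.0.4
    };
    candidate < upper
}
```

Under this model, `caret_matches((0, 6, 0), (0, 6, 4))` holds but `caret_matches((0, 6, 0), (0, 7, 0))` does not, so a 0.6 requirement would stay on the 0.6.x line until the manifest is edited.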

@sunng87
Member Author

sunng87 commented Oct 24, 2025

I got your point.

For example, in datafusion-orc we have datafusion 50, which uses arrow 56 in its interface. If we have >=56.0, <58 for orc-rust, cargo will pull both arrow 56 and 57 when doing a clean build of datafusion-orc, and we end up with two incompatible versions of arrow:

❯ cargo c
    Updating crates.io index
     Locking 317 packages to latest compatible versions
      Adding arrow v56.2.0 (available: v57.0.0)
      Adding async-compression v0.4.19 (available: v0.4.32)
      Adding comfy-table v7.1.2 (available: v7.2.1)
  Downloaded datafusion-functions-table v50.3.0
  Downloaded datafusion-doc v50.3.0
  Downloaded datafusion-physical-expr-common v50.3.0
  Downloaded datafusion-macros v50.3.0
  Downloaded datafusion-functions-window-common v50.3.0
  Downloaded datafusion-datasource-json v50.3.0
  Downloaded datafusion-functions-nested v50.3.0
  Downloaded datafusion-session v50.3.0
  Downloaded datafusion-common-runtime v50.3.0
  Downloaded datafusion-functions-window v50.3.0
  Downloaded datafusion-physical-expr-adapter v50.3.0
  Downloaded datafusion-datasource-csv v50.3.0
  Downloaded datafusion-catalog-listing v50.3.0
  Downloaded datafusion-functions-aggregate-common v50.3.0
  Downloaded datafusion-pruning v50.3.0
  Downloaded datafusion-physical-optimizer v50.3.0
  Downloaded datafusion-execution v50.3.0
  Downloaded datafusion-catalog v50.3.0
  Downloaded bigdecimal v0.4.9
  Downloaded datafusion-functions-aggregate v50.3.0
  Downloaded datafusion-expr-common v50.3.0
  Downloaded datafusion-datasource-parquet v50.3.0
  Downloaded datafusion-datasource v50.3.0
  Downloaded datafusion-functions v50.3.0
  Downloaded datafusion-sql v50.3.0
  Downloaded datafusion-optimizer v50.3.0
  Downloaded datafusion-physical-expr v50.3.0
  Downloaded datafusion-common v50.3.0
  Downloaded datafusion-expr v50.3.0
  Downloaded datafusion v50.3.0
  Downloaded datafusion-physical-plan v50.3.0
  Downloaded 31 crates (3.1MiB) in 0.34s
   Compiling proc-macro2 v1.0.103
   Compiling unicode-ident v1.0.20
    Checking bitflags v2.10.0
   Compiling cc v1.2.42
    Checking indexmap v2.12.0
    Checking flate2 v1.1.5
    Checking datafusion-doc v50.3.0
   Compiling bigdecimal v0.4.9
    Checking arrow-schema v57.0.0
    Checking flatbuffers v25.9.23
    Checking rustix v1.1.2
    Checking petgraph v0.8.3
   Compiling quote v1.0.41
   Compiling syn v2.0.108
   Compiling zstd-sys v2.0.16+zstd.1.5.7
   Compiling psm v0.1.27
   Compiling stacker v0.1.22
   Compiling ring v0.17.14
   Compiling blake3 v1.8.2
   Compiling bzip2-sys v0.1.13+1.0.8
   Compiling lzma-sys v0.1.20
    Checking tempfile v3.23.0
    Checking bzip2 v0.5.2
    Checking xz2 v0.1.7
   Compiling synstructure v0.13.2
   Compiling zerocopy-derive v0.8.27
   Compiling strum_macros v0.26.4
   Compiling serde_derive v1.0.228
   Compiling zerofrom-derive v0.1.6
   Compiling tokio-macros v2.6.0
   Compiling yoke-derive v0.8.0
   Compiling zerovec-derive v0.11.1
   Compiling displaydoc v0.2.5
   Compiling sqlparser_derive v0.3.0
   Compiling futures-macro v0.3.31
   Compiling async-trait v0.1.89
   Compiling thiserror-impl v2.0.17
   Compiling tracing-attributes v0.1.30
   Compiling recursive-proc-macro-impl v0.1.1
   Compiling thiserror-impl v1.0.69
   Compiling snafu-derive v0.8.9
   Compiling prost-derive v0.13.5
    Checking recursive v0.1.1
    Checking sqlparser v0.58.0
   Compiling zstd-safe v7.2.4
    Checking futures-util v0.3.31
    Checking tokio v1.48.0
    Checking zerofrom v0.1.6
    Checking thiserror v2.0.17
    Checking zstd v0.13.3
    Checking yoke v0.8.0
    Checking zerovec v0.11.4
    Checking zerotrie v0.2.2
    Checking thiserror v1.0.69
    Checking lzokay-native v0.1.0
    Checking zerocopy v0.8.27
    Checking tracing v0.1.41
    Checking prost v0.13.5
    Checking comfy-table v7.1.2
    Checking snafu v0.8.9
    Checking tinystr v0.8.1
    Checking potential_utf v0.1.3
    Checking icu_collections v2.0.0
    Checking icu_locale_core v2.0.0
    Checking half v2.7.1
    Checking ahash v0.8.12
    Checking ppv-lite86 v0.2.21
    Checking futures-executor v0.3.31
    Checking hashbrown v0.14.5
    Checking futures v0.3.31
    Checking rand_chacha v0.9.0
    Checking icu_provider v2.0.0
    Checking arrow-buffer v56.2.0
    Checking arrow-buffer v57.0.0
    Checking serde v1.0.228
    Checking rand v0.9.2
    Checking icu_properties v2.0.1
    Checking icu_normalizer v2.0.0
    Checking dashmap v6.1.0
    Checking arrow-schema v56.2.0
    Checking arrow-data v57.0.0
   Compiling arrow-data v56.2.0
    Checking datafusion-common-runtime v50.3.0
    Checking async-compression v0.4.19
    Checking tokio-util v0.7.16
    Checking idna_adapter v1.2.1
    Checking idna v1.1.0
    Checking arrow-array v57.0.0
    Checking url v2.5.7
    Checking arrow-array v56.2.0
    Checking object_store v0.12.4
    Checking arrow-select v57.0.0
    Checking arrow-arith v57.0.0
    Checking arrow-row v57.0.0
    Checking arrow-select v56.2.0
    Checking arrow-arith v56.2.0
    Checking arrow-row v56.2.0
    Checking arrow-cast v57.0.0
    Checking arrow-ipc v57.0.0
    Checking arrow-ord v57.0.0
    Checking arrow-string v57.0.0
    Checking arrow-cast v56.2.0
    Checking arrow-ipc v56.2.0
    Checking arrow-string v56.2.0
    Checking arrow-ord v56.2.0
    Checking arrow-csv v56.2.0
    Checking arrow-json v56.2.0
    Checking arrow-json v57.0.0
    Checking arrow-csv v57.0.0
    Checking parquet v56.2.0
    Checking arrow v56.2.0
    Checking arrow v57.0.0
    Checking orc-rust v0.7.0 (/home/nsun/workspace/orc-rust)
   Compiling datafusion-common v50.3.0
    Checking datafusion-expr-common v50.3.0
    Checking datafusion-physical-expr-common v50.3.0
    Checking datafusion-functions-aggregate-common v50.3.0
    Checking datafusion-functions-window-common v50.3.0
    Checking datafusion-expr v50.3.0
    Checking datafusion-physical-expr v50.3.0
    Checking datafusion-execution v50.3.0
    Checking datafusion-sql v50.3.0
    Checking datafusion-physical-plan v50.3.0
    Checking datafusion-optimizer v50.3.0
   Compiling datafusion-macros v50.3.0
    Checking datafusion-functions v50.3.0
    Checking datafusion-functions-aggregate v50.3.0
    Checking datafusion-functions-window v50.3.0
    Checking datafusion-session v50.3.0
    Checking datafusion-physical-expr-adapter v50.3.0
    Checking datafusion-functions-nested v50.3.0
    Checking datafusion-datasource v50.3.0
    Checking datafusion-catalog v50.3.0
    Checking datafusion-pruning v50.3.0
    Checking datafusion-physical-optimizer v50.3.0
    Checking datafusion-functions-table v50.3.0
    Checking datafusion-datasource-csv v50.3.0
    Checking datafusion-catalog-listing v50.3.0
    Checking datafusion-datasource-json v50.3.0
    Checking datafusion-datasource-parquet v50.3.0
    Checking datafusion v50.3.0
    Checking datafusion-orc v0.5.0 (/home/nsun/workspace/datafusion-orc)
error[E0308]: mismatched types
   --> src/file_format.rs:53:19
    |
53  |     Ok((loc_path, schema))
    |                   ^^^^^^ expected `arrow::datatypes::Schema`, found `arrow_schema::schema::Schema`
    |
note: two different versions of crate `arrow_schema` are being used; two types coming from two different versions of the same crate are different types even if they look the same
   --> /home/nsun/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-schema-56.2.0/src/schema.rs:187:1
    |
187 | pub struct Schema {
    | ^^^^^^^^^^^^^^^^^ this is the expected type `arrow::datatypes::Schema`
    |
   ::: /home/nsun/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-schema-57.0.0/src/schema.rs:187:1
    |
187 | pub struct Schema {
    | ^^^^^^^^^^^^^^^^^ this is the found type `arrow_schema::schema::Schema`
    |
   ::: src/lib.rs:47:5
    |
47  | use datafusion::arrow::datatypes::SchemaRef;
    |     ---------- one version of crate `arrow_schema` used here, as a dependency of crate `arrow_data`
    |
   ::: src/file_format.rs:32:5
    |
32  | use orc_rust::reader::metadata::read_metadata_async;
    |     -------- one version of crate `arrow_schema` used here, as a dependency of crate `arrow_data`
    = help: you can use `cargo tree` to explore your dependency tree

error[E0277]: the trait bound `arrow::error::ArrowError: std::convert::From<OrcError>` is not satisfied
  --> src/physical_exec.rs:67:31
   |
67 |               let mut builder = ArrowReaderBuilder::try_new_async(reader)
   |  _______________________________^
68 | |                 .await
69 | |                 .map_err(ArrowError::from)?;
   | |__________________________________________^ the trait `std::convert::From<OrcError>` is not implemented for `arrow::error::ArrowError`
   |
   = help: the following other types implement trait `std::convert::From<T>`:
             `arrow::error::ArrowError` implements `std::convert::From<DataFusionError>`
             `arrow::error::ArrowError` implements `std::convert::From<FromUtf8Error>`
             `arrow::error::ArrowError` implements `std::convert::From<IntoInnerError<W>>`
             `arrow::error::ArrowError` implements `std::convert::From<Utf8Error>`
             `arrow::error::ArrowError` implements `std::convert::From<datafusion::parquet::errors::ParquetError>`
             `arrow::error::ArrowError` implements `std::convert::From<std::io::Error>`

error[E0277]: the trait bound `arrow::error::ArrowError: std::convert::From<OrcError>` is not satisfied
  --> src/physical_exec.rs:69:26
   |
69 |                 .map_err(ArrowError::from)?;
   |                          ^^^^^^^^^^ the trait `std::convert::From<OrcError>` is not implemented for `arrow::error::ArrowError`
   |
   = help: the following other types implement trait `std::convert::From<T>`:
             `arrow::error::ArrowError` implements `std::convert::From<DataFusionError>`
             `arrow::error::ArrowError` implements `std::convert::From<FromUtf8Error>`
             `arrow::error::ArrowError` implements `std::convert::From<IntoInnerError<W>>`
             `arrow::error::ArrowError` implements `std::convert::From<Utf8Error>`
             `arrow::error::ArrowError` implements `std::convert::From<datafusion::parquet::errors::ParquetError>`
             `arrow::error::ArrowError` implements `std::convert::From<std::io::Error>`

error[E0271]: expected `{async block@src/physical_exec.rs:66:21: 66:31}` to be a future that resolves to `Result<Pin<Box<dyn Stream<Item = Result<RecordBatch, DataFusionError>> + Send>>, DataFusionError>`, but it resolves to `Result<Pin<Box<dyn Stream<Item = Result<RecordBatch, _>> + Send>>, _>`
   --> src/physical_exec.rs:66:12
    |
66  |           Ok(Box::pin(async move {
    |  ____________^
67  | |             let mut builder = ArrowReaderBuilder::try_new_async(reader)
68  | |                 .await
69  | |                 .map_err(ArrowError::from)?;
...   |
90  | |             Ok(reader.map_err(Into::into).boxed())
91  | |         }))
    | |__________^ expected `Result<Pin<Box<...>>, ...>`, found `Result<Pin<Box<...>>, _>`
    |
note: two different versions of crate `arrow_array` are being used; two types coming from two different versions of the same crate are different types even if they look the same
   --> /home/nsun/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/record_batch.rs:202:1
    |
202 | pub struct RecordBatch {
    | ^^^^^^^^^^^^^^^^^^^^^^ this is the expected type `arrow::array::RecordBatch`
    |
   ::: /home/nsun/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-57.0.0/src/record_batch.rs:202:1
    |
202 | pub struct RecordBatch {
    | ^^^^^^^^^^^^^^^^^^^^^^ this is the found type `arrow_array::record_batch::RecordBatch`
    |
   ::: src/lib.rs:47:5
    |
47  | use datafusion::arrow::datatypes::SchemaRef;
    |     ---------- one version of crate `arrow_array` used here, as a dependency of crate `arrow`
    |
   ::: src/file_format.rs:32:5
    |
32  | use orc_rust::reader::metadata::read_metadata_async;
    |     -------- one version of crate `arrow_array` used here, as a dependency of crate `arrow`
    = help: you can use `cargo tree` to explore your dependency tree
    = note: required for the cast from `Pin<Box<{async block@src/physical_exec.rs:66:21: 66:31}>>` to `Pin<Box<dyn Future<Output = Result<Pin<Box<dyn Stream<Item = ...> + Send>>, ...>> + Send>>`
    = note: the full name for the type has been written to '/home/nsun/workspace/datafusion-orc/target/debug/deps/datafusion_orc-480e1d5e4f2d63d8.long-type-2049530398898348610.txt'
    = note: consider using `--verbose` to print the full type name to the console

Some errors have detailed explanations: E0271, E0277, E0308.
For more information about an error, try `rustc --explain E0271`.
error: could not compile `datafusion-orc` (lib) due to 4 previous errors

But I understand that the issue you point out is a real one. We need to create a 0.6.x patch release for arrow 56 to include the bug fixes between the current 0.6.4 and 0.7.0.
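
The E0308 note in the log above ("two types coming from two different versions of the same crate are different types even if they look the same") can be reproduced in miniature. The sketch below uses two modules as stand-ins for the two arrow-schema versions. All module, struct, and function names are hypothetical and do not mirror the real arrow, datafusion, or orc-rust APIs.

```rust
// Stand-in for arrow-schema 56.x (hypothetical module name).
mod arrow_schema_v56 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Schema {
        pub fields: Vec<String>,
    }
}

// Stand-in for arrow-schema 57.x: same definition, but a distinct type.
mod arrow_schema_v57 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Schema {
        pub fields: Vec<String>,
    }
}

// Plays the role of orc-rust built against the newer version.
fn read_schema() -> arrow_schema_v57::Schema {
    arrow_schema_v57::Schema {
        fields: vec!["id".to_string()],
    }
}

// Plays the role of datafusion, which expects the older version.
fn register(schema: arrow_schema_v56::Schema) -> usize {
    schema.fields.len()
}

// `register(read_schema())` is rejected with E0308 (mismatched types),
// so crossing the boundary needs an explicit conversion like this one.
fn downgrade(schema: arrow_schema_v57::Schema) -> arrow_schema_v56::Schema {
    arrow_schema_v56::Schema {
        fields: schema.fields,
    }
}
```

In this toy case a field-by-field conversion is enough. For the real arrow types it is not practical, which is why keeping a single arrow version in the dependency graph is the usual fix.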

@progval
Collaborator

progval commented Oct 24, 2025

> cargo will pull both arrow 56 and 57 when doing a clean build of datafusion-orc

Hmm, that's disappointing behavior from Cargo. Users can work around it with cargo update arrow@57.0.0 --precise 56.0.0, but most people don't know how to do that.
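
A rough model of why both versions appear, assuming (as the "Locking 317 packages to latest compatible versions" line in the log suggests) that each requirement is satisfied with the highest matching version. This is a deliberately simplified sketch, not Cargo's actual resolver, and `Req` and `pick_highest` are hypothetical names.

```rust
/// A version as (major, minor, patch). Illustrative alias only.
type Version = (u64, u64, u64);

/// Half-open requirement: lower <= version < upper.
struct Req {
    lower: Version,
    upper: Version,
}

/// Pick the highest available version that satisfies `req`, if any.
fn pick_highest(available: &[Version], req: &Req) -> Option<Version> {
    available
        .iter()
        .copied()
        .filter(|v| *v >= req.lower && *v < req.upper)
        .max()
}
```

With 56.2.0 and 57.0.0 available, a datafusion-style requirement of 56.0.0 up to 57.0.0 yields 56.2.0, while a range of 56.0.0 up to 58.0.0 yields 57.0.0, so the two dependents end up on different major versions unless one of them is pinned.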

> We need to create a 0.6.x patch release for arrow 56 to include the bug fixes between the current 0.6.4 and 0.7.0.

I'm all for that, but it can be significant work to maintain old branches so I'd understand if no one wants to do it.

@sunng87
Member Author

sunng87 commented Oct 24, 2025

Yes, I think the release process can be:

  • We will release a minor version (until we reach 1.0) for each arrow major release.
  • We will create a patch or minor release for accumulated bug fixes or features before each arrow update, to ensure they ship with the arrow version they were originally developed on. (Unfortunately, this time we won't get the chance to include feat: Implement RowSelection API for efficient row filtering (Phase 1) #59 and feat: Implement Efficient Skipping for RowSelection (Phase 2) #60 for a 0.7 on arrow 56, because I missed them 😢)
  • Anyone from the community who sticks with a particular version of arrow is encouraged to maintain a branch and pick bug fixes into it. (I might maintain the 0.6 branch, because I will be on arrow 56 for a while.)
