chore(deps): bump the datafusion-arrow-parquet group across 1 directory with 22 updates#797
…ry with 22 updates Bumps the datafusion-arrow-parquet group with 19 updates in the / directory: | Package | From | To | | --- | --- | --- | | [arrow](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` | | [arrow-array](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` | | [arrow-cast](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` | | [arrow-ipc](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` | | [arrow-json](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` | | [datafusion](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-catalog](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-common](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-common-runtime](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-datasource](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-datasource-parquet](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-execution](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-expr](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-ffi](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-optimizer](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-physical-expr](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-physical-plan](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [datafusion-pruning](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` | | [parquet](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` | Updates `arrow` from 57.1.0 to 58.1.0 - [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md) - [Commits](apache/arrow-rs@57.1.0...58.1.0) Updates `arrow-array` from 57.1.0 to 58.1.0 - [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md) - 
[Commits](apache/arrow-rs@57.1.0...58.1.0) Updates `arrow-buffer` from 57.1.0 to 57.3.0 - [Changelog](https://github.com/apache/arrow-rs/blob/57.3.0/CHANGELOG.md) - [Commits](apache/arrow-rs@57.1.0...57.3.0) Updates `arrow-cast` from 57.1.0 to 58.1.0 - [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md) - [Commits](apache/arrow-rs@57.1.0...58.1.0) Updates `arrow-data` from 57.1.0 to 57.3.0 - [Changelog](https://github.com/apache/arrow-rs/blob/57.3.0/CHANGELOG.md) - [Commits](apache/arrow-rs@57.1.0...57.3.0) Updates `arrow-ipc` from 57.1.0 to 58.1.0 - [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md) - [Commits](apache/arrow-rs@57.1.0...58.1.0) Updates `arrow-json` from 57.1.0 to 58.1.0 - [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md) - [Commits](apache/arrow-rs@57.1.0...58.1.0) Updates `arrow-schema` from 57.1.0 to 57.3.0 - [Changelog](https://github.com/apache/arrow-rs/blob/57.3.0/CHANGELOG.md) - [Commits](apache/arrow-rs@57.1.0...57.3.0) Updates `datafusion` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-catalog` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-common` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-common-runtime` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-datasource` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-datasource-parquet` from 51.0.0 to 52.5.0 - 
[Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-execution` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-expr` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-ffi` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-optimizer` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-physical-expr` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-physical-plan` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `datafusion-pruning` from 51.0.0 to 52.5.0 - [Changelog](https://github.com/apache/datafusion/blob/main/CHANGELOG.md) - [Commits](apache/datafusion@51.0.0...52.5.0) Updates `parquet` from 57.1.0 to 58.1.0 - [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md) - [Commits](apache/arrow-rs@57.1.0...58.1.0) --- updated-dependencies: - dependency-name: arrow dependency-version: 58.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: arrow-array dependency-version: 58.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: arrow-buffer dependency-version: 57.3.0 dependency-type: direct:production update-type: 
version-update:semver-minor dependency-group: datafusion-arrow-parquet - dependency-name: arrow-cast dependency-version: 58.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: arrow-data dependency-version: 57.3.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: datafusion-arrow-parquet - dependency-name: arrow-ipc dependency-version: 58.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: arrow-json dependency-version: 58.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: arrow-schema dependency-version: 57.3.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: datafusion-arrow-parquet - dependency-name: datafusion dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-catalog dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-common dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-common-runtime dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-datasource dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-datasource-parquet dependency-version: 52.5.0 dependency-type: direct:production update-type: 
version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-execution dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-expr dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-ffi dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-optimizer dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-physical-expr dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-physical-plan dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: datafusion-pruning dependency-version: 52.5.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet - dependency-name: parquet dependency-version: 58.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: datafusion-arrow-parquet ... Signed-off-by: dependabot[bot] <support@github.com>
ca5495f to
d20c41c
Compare
Co-authored-by: Copilot <copilot@github.com>
| let ctx = Arc::new(SessionContext::new()) as Arc<dyn TaskContextProvider>; | ||
| let ffi_provider = FFI_TableProvider::new( | ||
| provider, | ||
| true, | ||
| Some(self.runtime.handle().clone()), | ||
| &ctx, | ||
| None, | ||
| ); |
A few FFI details changed. We currently only use this when there are multiple SessionContexts and we pass data frames between them (and we should probably find a better way to do that).
| fn try_pushdown_projection( | ||
| &self, | ||
| projection: &ProjectionExprs, | ||
| ) -> Result<Option<Arc<dyn FileSource>>> { | ||
| // Use SplitProjection to handle any projection: | ||
| // - file_indices provides column pruning (always works) | ||
| // - ProjectionOpener handles reordering/expressions/renames after reading | ||
| let split_projection = SplitProjection::new(self.table_schema.file_schema(), projection); |
This projection pushdown piece is a key change in DataFusion 52. For the generic datasource (which we use for GDAL input), we can use the built-in pattern for bridging expression-based projections with the previous approach (simple integer column indices).
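To illustrate the bridging idea, here is a minimal, standalone sketch of splitting a projection into "which file columns to read" plus "what to compute afterwards". It does not use DataFusion: `ProjExpr`, `Split`, `split_projection`, and `evaluate` are hypothetical names made up for this example, and the real `SplitProjection` may behave differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a projection expression (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
enum ProjExpr {
    /// Reference to a file-schema column by index.
    Column(usize),
    /// Sum of two columns, standing in for "any expression".
    Add(usize, usize),
}

/// Hypothetical result of splitting a projection.
#[derive(Debug, PartialEq)]
struct Split {
    /// File columns to read (sorted, deduplicated): the "simple integers".
    file_indices: Vec<usize>,
    /// Original expressions rewritten to index into the pruned batch.
    remapped: Vec<ProjExpr>,
}

fn split_projection(exprs: &[ProjExpr]) -> Split {
    // Collect every referenced file column once, in ascending order.
    let mut needed = BTreeSet::new();
    for e in exprs {
        match e {
            ProjExpr::Column(i) => {
                needed.insert(*i);
            }
            ProjExpr::Add(a, b) => {
                needed.insert(*a);
                needed.insert(*b);
            }
        }
    }
    let file_indices: Vec<usize> = needed.into_iter().collect();

    // Position of a file column inside the pruned batch.
    let pos = |i: usize| file_indices.iter().position(|&c| c == i).unwrap();

    let remapped: Vec<ProjExpr> = exprs
        .iter()
        .map(|e| match e {
            ProjExpr::Column(i) => ProjExpr::Column(pos(*i)),
            ProjExpr::Add(a, b) => ProjExpr::Add(pos(*a), pos(*b)),
        })
        .collect();

    Split { file_indices, remapped }
}

/// Apply the remapped expressions to one row of the pruned batch,
/// which handles reordering and computed columns after reading.
fn evaluate(remapped: &[ProjExpr], row: &[i64]) -> Vec<i64> {
    remapped
        .iter()
        .map(|e| match e {
            ProjExpr::Column(i) => row[*i],
            ProjExpr::Add(a, b) => row[*a] + row[*b],
        })
        .collect()
}
```

In this sketch a reader that only understands integer indices receives `file_indices`, and reordering or expression evaluation happens on the pruned rows afterwards.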
| /// Apply a [SchemaAdapterFactory] to the inner [ParquetSource] | ||
| pub fn with_schema_adapter_factory( | ||
| &self, | ||
| schema_adapter_factory: Arc<dyn SchemaAdapterFactory>, | ||
| ) -> Self { |
The schema adapter factory is no more (there is something called a physical expression adapter factory that we may need to add support for later).
| // DataFusion 52 has an issue where field metadata (like ARROW:extension:name) | ||
| // is stripped when evaluating embedded projections in ParquetOpener. This is | ||
| // because the batch schema comes from the parquet reader (which doesn't have | ||
| // extension metadata), and Column::return_field() looks up fields from that schema. | ||
| // This isn't a bug in DataFusion because we're the ones that advertised the table | ||
| // schema as having metadata'd expressions in the first place. | ||
| // | ||
| // We fix this by wrapping Column expressions with MetadataPreservingColumn, | ||
| // which stores the correct field from the file schema and returns it from | ||
| // return_field() regardless of the input schema. | ||
| let transformed_projection = wrap_columns_with_metadata_preserving( | ||
| projection.clone(), | ||
| self.inner.table_schema().table_schema(), | ||
| )?; |
This is a somewhat involved change. Opus did a great job root-causing it but a very bad job solving it, and it took me some time to fix. Because we advertise a file schema that is not the underlying Parquet file schema, we have expressions whose columns refer to a schema that the inner opener assumes is identical to the file schema (but it is not!). I hope we can remove this wrapper at some point and/or simplify our Parquet wrapping strategy, but for now this is a minimally invasive strategy that ensures we don't lose the benefits of the wrapped Parquet implementation.
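A toy model of the problem and the workaround, for readers who have not hit it yet. This is a sketch with made-up types, not DataFusion code: `Field`, `Expr`, `Column`, and `PreservingColumn` are hypothetical stand-ins, and the `geoarrow.wkb` value is only an illustrative extension name.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified field: a name plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

/// Small helper for building fields in the example.
fn field(name: &str, metadata: &[(&str, &str)]) -> Field {
    Field {
        name: name.to_string(),
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

/// Hypothetical expression interface.
trait Expr {
    /// Index of the input column the expression reads.
    fn column_index(&self) -> usize;
    /// Output field, given the schema of the incoming batch.
    fn return_field(&self, input_schema: &[Field]) -> Field;
}

/// Plain column reference: the output field is looked up in whatever
/// schema the batch arrives with, so metadata missing there is lost.
struct Column {
    index: usize,
}

impl Expr for Column {
    fn column_index(&self) -> usize {
        self.index
    }
    fn return_field(&self, input_schema: &[Field]) -> Field {
        input_schema[self.index].clone()
    }
}

/// Wrapper that remembers the advertised field and returns it
/// regardless of the input schema.
struct PreservingColumn {
    inner: Column,
    field: Field,
}

impl Expr for PreservingColumn {
    fn column_index(&self) -> usize {
        // Data is still read from the same position as the wrapped column.
        self.inner.column_index()
    }
    fn return_field(&self, _input_schema: &[Field]) -> Field {
        self.field.clone()
    }
}
```

With a reader schema that carries no metadata, the plain `Column` reports a bare field, while the wrapper keeps reporting the advertised field with its extension metadata.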
| // Wrap with ProjectionOpener to handle reordering/expressions | ||
| if let Some(split_projection) = &self.split_projection { | ||
| ProjectionOpener::try_new( | ||
| split_projection.clone(), | ||
| inner_opener, | ||
| self.table_schema.file_schema(), | ||
| ) | ||
| } else { | ||
| Ok(inner_opener) | ||
| } |
cc @b4l: this is to keep up with the changes for DataFusion 52. The current tests pass, but I didn't add new ones with more complex projections to check. (There are some for the sedona-datasource that did trigger a failure, and this approach fixed the issue there.)
Bumps the datafusion-arrow-parquet group with 19 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [arrow](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` |
| [arrow-array](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` |
| [arrow-cast](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` |
| [arrow-ipc](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` |
| [arrow-json](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` |
| [datafusion](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-catalog](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-common](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-common-runtime](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-datasource](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-datasource-parquet](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-execution](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-expr](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-ffi](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-optimizer](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-physical-expr](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-physical-plan](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [datafusion-pruning](https://github.com/apache/datafusion) | `51.0.0` | `52.5.0` |
| [parquet](https://github.com/apache/arrow-rs) | `57.1.0` | `58.1.0` |

Updates `arrow` from 57.1.0 to 58.1.0

Changelog

Sourced from arrow's changelog.

... (truncated)

Commits

- 6cadf3b Prepare for 58.1.0 Release (#9573)
- 322f9ce [Variant] Add unshred_variant support for Binary and LargeBinary types (#9576)
- bc74c71 feat(parquet): add content defined chunking for arrow writer (#9450)
- 39dda22 Make Sbbf Constructers Public (#9569)
- d53df60 feat: Optimize from_bitwise_binary_op with 64-bit alignment (#9441)
- 44f5dfc perf: Coalesce page fetches when RowSelection selects all rows (#9578)
- 14f1eb9 pyarrow: Cache the imported classes to avoid importing them each time (#9439)
- 55a7768 [Variant] Add `variant_to_arrow` `Struct` type support (#9572)
- 42ab0bc fix: Used `checked_add` for bounds checks to avoid UB (#9568)
- 88422cb arrow-flight: generate dict_ids for dicts nested inside complex types (#9556)

Updates `arrow-array` from 57.1.0 to 58.1.0

Changelog

Sourced from arrow-array's changelog.

... (truncated)

Commits

- Same ten commits as listed for `arrow` (6cadf3b through 88422cb).

Updates `arrow-buffer` from 57.1.0 to 57.3.0

Changelog

Sourced from arrow-buffer's changelog.

Commits

- 7505005 [57_maintenance] Update version to 57.3.0, add changelog (#9333)
- 6bbfb99 [maintenance_57] Fix string array equality when the values buffer is the same...
- 505eb8e [57_maintenance] Revert "Seal Array trait (#9092)", mark `Array` as `unsafe`...
- 74cf914 [57_maintenance] Mark `BufferBuilder::new_from_buffer` as unsafe (#9292) (#9312)
- 25cc1ac [57_maintenance] fix: ensure `BufferBuilder::truncate` doesn't overset length...
- 9fc2fbb [57_maintenance[Parquet] Provide only encrypted column stats in plaintext foo...
- 3df3157 [57_maintenance] [regression] Error with adaptive predicate pushdown: "Invali...
- 9e822e0 Update version to `57.2.0`, add CHANGELOG (#9103)
- 28f66f9 Add Union encoding documentation (#9102)
- a8346be Minor: make it clear cache array reader is not cloning arrays (#9057)

Updates `arrow-cast` from 57.1.0 to 58.1.0

Changelog

Sourced from arrow-cast's changelog.

... (truncated)

Commits

- Same ten commits as listed for `arrow` (6cadf3b through 88422cb).

Updates `arrow-data` from 57.1.0 to 57.3.0

Changelog

Sourced from arrow-data's changelog.

Commits

- Same ten commits as listed for `arrow-buffer` (7505005 through a8346be).

Updates `arrow-ipc` from 57.1.0 to 58.1.0

Changelog

Sourced from arrow-ipc's changelog.

... (truncated)

Commits

- Same ten commits as listed for `arrow` (6cadf3b through 88422cb).

Updates `arrow-json` from 57.1.0 to 58.1.0

Changelog

Sourced from arrow-json's changelog.

... (truncated)

Commits

- Same ten commits as listed for `arrow` (6cadf3b through 88422cb).

Description has been truncated