-
Notifications
You must be signed in to change notification settings - Fork 36
fix(rust/sedona-expr): Fix GeoParquet pruning when number of final columns is less than the geometry column index #385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rust/sedona-expr): Fix GeoParquet pruning when number of final columns is less than the geometry column index #385
Conversation
| /// name <https://github.com/apache/sedona-db/pull/385>. This option may | ||
| /// be removed if the incorrect index can be resolved upstream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have any (and I mean any) hunches of what the exact behavior causing this in DataFusion is, now would be a good time to mention it while you're fresh off your investigation. Not top priority, but I'd eventually like to narrow it down enough to submit an issue at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the continued prods to investigate this more deeply...this particular prod I think resulted in me finding the root cause here (not a datafusion issue 😬 ). I'll push a (hopefully) fix shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Peter Nguyen <petern0408@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug in GeoParquet pruning where spatial filters fail when the number of projected columns is less than the geometry column's index. The fix introduces TableGeoStatistics to handle column resolution by name instead of position, avoiding index out-of-bounds errors.
- Introduces
TableGeoStatisticsenum to resolve statistics by name or position - Updates
SpatialFilter::evaluateto returnResult<bool>and useTableGeoStatistics - Refactors statistics handling in GeoParquet file opener to use the new approach
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| rust/sedona-geoparquet/src/file_opener.rs | Updates spatial filter evaluation calls to use TableGeoStatistics and handle new Result return type |
| rust/sedona-expr/src/statistics.rs | Makes GeoStatistics::unspecified() a const for efficiency |
| rust/sedona-expr/src/spatial_filter.rs | Introduces TableGeoStatistics enum and refactors SpatialFilter::evaluate to return Result |
| rust/sedona-datasource/src/spec.rs | Adds documentation clarifying that filter Column indices are relative to file_projection |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…lumns is less than the geometry column index (#385) Co-authored-by: Peter Nguyen <petern0408@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…lumns is less than the geometry column index (apache#385) Co-authored-by: Peter Nguyen <petern0408@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The issue identified by the GDAL filter pushdown in #384 / #380 also affects GeoParquet reads. The following query wouldn't complete for me before this change (after this change it completes in ~40s on a debug build).
The changes I tried to mostly isolate to
sedona-exprto ensure this change is more tightly scoped. I opened #389 to more holistically solve this since we're about to revisit that code shortly when we update DataFusion to provide Geometry/Geography parquet support (unless there's consensus that we should just do that now).