GH-47252: [C++][Compute] Fix sort_indices for temporal types in arrow::Table#50270
Conversation
|
|
There was a problem hiding this comment.
Pull request overview
Fixes a crash in compute::SortIndices / Table::sort_by when sorting arrow::Table columns with temporal logical types (e.g., timestamp), by ensuring per-batch sort-key arrays are converted to their physical representation before comparator-based merges.
Changes:
- Convert flattened per-record-batch sort-key arrays to the physical type (
GetPhysicalType+GetPhysicalArray) when constructingResolvedTableSortKeychunks. - Add a regression test covering
Tablemulti-key sorting on timestamp columns across multiple JSON chunks (record batches).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| cpp/src/arrow/compute/kernels/vector_sort_internal.h | Ensure table sort-key chunks use physical array types to prevent comparator downcast crashes for temporal logical types. |
| cpp/src/arrow/compute/kernels/vector_sort_test.cc | Add regression coverage for Table sort indices on timestamp keys across multiple chunks and null placement variants. |
pitrou
left a comment
There was a problem hiding this comment.
(just another review to add a suggestion)
|
CI failures are unrelated, I'll merge. |
|
|
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit bf108d4. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
welcome |
Rationale for this change
I was unable to use compute::SortIndices with timestamp type because of crash.
What changes are included in this PR?
Fix. The issue was that comparator for merging record batches was not converting timestamp type to its physical variant.
so it crashed on null pointer reference on checked_cast result
Are these changes tested?
yes
Are there any user-facing changes?
This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)
i've already provided, i suppose