GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods (#35197)
### Rationale for this change

The current `FieldPath::Get` methods, when extracting nested child values, don't combine the child's null bitmap with higher-level parent bitmaps. While this is often preferable (it allows for zero-copy), there are cases where a higher-level "flattening" version is useful, e.g. when pre-processing sort keys for structs.
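To make the distinction concrete, here is a minimal sketch of what "flattening" means for validity, using a toy model rather than Arrow's real classes. `ToyStructColumn` and `FlattenedChildValidity` are hypothetical names invented for this illustration, and plain `std::vector<bool>` stands in for Arrow's null bitmaps.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a struct column with a single child field.
// This is NOT Arrow's data layout; it only models the validity logic.
struct ToyStructColumn {
  std::vector<bool> parent_valid;  // true where the struct row itself is non-null
  std::vector<bool> child_valid;   // true where the child value is non-null
  std::vector<int> child_values;   // child payload (ignored where invalid)
};

// Non-flattening view: the child's validity is returned untouched,
// so a child slot can look valid even when its parent row is null.
std::vector<bool> ChildValidity(const ToyStructColumn& col) {
  return col.child_valid;
}

// Flattening view: a child slot is valid only if both the parent row
// and the child slot are valid (logical AND of the two bitmaps).
std::vector<bool> FlattenedChildValidity(const ToyStructColumn& col) {
  std::vector<bool> out(col.child_valid.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = col.parent_valid[i] && col.child_valid[i];
  }
  return out;
}
```

The non-flattening view can hand back the child's existing bitmap unchanged, while the flattening view has to build a new one, which is the trade-off the rationale describes.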

### What changes are included in this PR?

- Adds `FieldPath::GetFlattened` methods alongside the existing `FieldPath::Get` overloads
- Adds `GetAllFlattened`, `GetOneFlattened` and `GetOneOrNoneFlattened` methods to `FieldRef`
- Adds a couple of internal helpers for dealing with both `Get` variants in templates
- Overhauls the `FieldPath` tests in an effort to improve coverage and generalize across the supported input types

More significantly, this alters the `FieldPathGetImpl` internals to use a new `NestedSelector` class. The reason for this is that the prior method required presenting a vector of instantiated child values for each depth level prior to selection. With support for flattening (and recently, `ChunkedArrays`), this becomes a problem since we need to explicitly create all of those child values for each depth level despite the fact that we're only going to select one. So these changes allow any expensive instantiations to be deferred to selection time.
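The difference between the prior approach and deferred selection can be sketched as follows. This is only an illustration under assumed names (`ToyColumn`, `ToyChildFactory`, `SelectEager`, `SelectDeferred` are all hypothetical) and does not reflect how `NestedSelector` is actually written.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-ins: a "column" is a vector of ints, and each child is
// represented by a factory that performs the (possibly expensive) build.
using ToyColumn = std::vector<int>;
using ToyChildFactory = std::function<ToyColumn()>;

// Eager selection: every child at this depth level is instantiated,
// even though only one of them is returned.
ToyColumn SelectEager(const std::vector<ToyChildFactory>& children,
                      std::size_t index) {
  std::vector<ToyColumn> built;
  for (const auto& make_child : children) {
    built.push_back(make_child());
  }
  return built.at(index);
}

// Deferred selection: only the requested child is instantiated,
// and only at the moment it is selected.
ToyColumn SelectDeferred(const std::vector<ToyChildFactory>& children,
                         std::size_t index) {
  return children.at(index)();
}
```

With N children at a level, the eager version pays for N instantiations per level, whereas the deferred version pays for one.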

This also indirectly solves an issue that surfaced in the new tests, which is that `FieldPath::Get` would return incorrect nested values when sliced `Array`s are involved. This is because the underlying child data's offset/length weren't being adjusted based on the parent.
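The slicing issue can be illustrated with a toy model in which a sliced parent shares its child's unsliced storage. `ToyStructSlice` and both accessor functions are hypothetical names for this sketch, not Arrow types or APIs.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a sliced struct array: the slice is described by an
// offset and length, while the child storage is shared and unsliced.
struct ToyStructSlice {
  std::size_t offset;              // first parent row included in the slice
  std::size_t length;              // number of parent rows in the slice
  std::vector<int> child_storage;  // child values for ALL rows, not just the slice
};

// Ignores the parent's slice, so it returns rows outside the slice
// and the wrong length.
std::vector<int> GetChildIgnoringSlice(const ToyStructSlice& s) {
  return s.child_storage;
}

// Applies the parent's offset/length to the child before returning it.
std::vector<int> GetChildRespectingSlice(const ToyStructSlice& s) {
  auto begin = s.child_storage.begin() + static_cast<std::ptrdiff_t>(s.offset);
  auto end = begin + static_cast<std::ptrdiff_t>(s.length);
  return std::vector<int>(begin, end);
}
```

For a slice with offset 1 and length 2 over child storage `{10, 20, 30, 40}`, the first accessor yields all four values while the second yields only `{20, 30}`, which matches the rows the parent slice actually covers.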

### Are these changes tested?

Yes (tests are included)

### Are there any user-facing changes?

Yes, this adds methods to a public API
* Closes: #14946

Lead-authored-by: benibus <bpharks@gmx.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Ben Harkins <60872452+benibus@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
3 people committed May 22, 2023
1 parent 2216a0a commit f3500f6
Showing 5 changed files with 936 additions and 597 deletions.
2 changes: 1 addition & 1 deletion cpp/src/arrow/CMakeLists.txt
@@ -796,7 +796,7 @@ set_source_files_properties(public_api_test.cc PROPERTIES SKIP_PRECOMPILE_HEADER
                             SKIP_UNITY_BUILD_INCLUSION ON)
 
 add_arrow_test(scalar_test)
-add_arrow_test(type_test)
+add_arrow_test(type_test SOURCES field_ref_test.cc type_test.cc)
 
 add_arrow_test(table_test
                SOURCES