GH-14939: [C++] Support Table lookups in FieldRef and FieldPath #34537
- add overloaded implementations of FieldRef::FindAll and FieldPath::Get for the Table type
- add unit tests for FieldRef::FindAll and FieldPath::Get for both RecordBatch and Table types
- fix if-statement formatting
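To make the name lookup concrete, here is a minimal, self-contained C++ sketch of a FindAll-style search over a nested schema. It does not use Arrow: ToyField, ToyFieldPath, FindAll and FindAllImpl are hypothetical stand-ins for Field, FieldPath and FieldRef::FindAll. It assumes that a Table lookup can be answered from the table's schema alone, just as for a RecordBatch, since both carry the same kind of schema.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema field: a name plus struct children.
struct ToyField {
  std::string name;
  std::vector<ToyField> children;  // empty for non-struct fields
};

// Hypothetical stand-in for a field path: one child index per nesting level.
using ToyFieldPath = std::vector<int>;

// Depth-first search that matches names[depth] against `fields` and records
// every complete match. Duplicate names at one level produce several matches.
void FindAllImpl(const std::vector<ToyField>& fields,
                 const std::vector<std::string>& names, std::size_t depth,
                 ToyFieldPath& prefix, std::vector<ToyFieldPath>& out) {
  assert(depth < names.size());
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (fields[i].name != names[depth]) continue;
    prefix.push_back(static_cast<int>(i));
    if (depth + 1 == names.size()) {
      out.push_back(prefix);  // the full name sequence matched
    } else {
      FindAllImpl(fields[i].children, names, depth + 1, prefix, out);
    }
    prefix.pop_back();
  }
}

// Returns all paths matching a nested name reference such as {"s", "y"}.
// A Table or RecordBatch lookup would pass its schema's top-level fields.
std::vector<ToyFieldPath> FindAll(const std::vector<ToyField>& schema,
                                  const std::vector<std::string>& names) {
  std::vector<ToyFieldPath> out;
  ToyFieldPath prefix;
  if (!names.empty()) FindAllImpl(schema, names, 0, prefix, out);
  return out;
}
```

Because a schema may contain duplicate field names, looking up "a" in a schema whose top-level fields are a, s, a yields the two paths {0} and {2}; that is why a FindAll-style lookup returns a vector of paths rather than a single one.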
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub (https://github.com/apache/arrow/issues/new/choose)? Opening GitHub issues ahead of time contributes to the openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA, the title also supports:
See also:
@benibus please review
Thanks @dinimar! I just did a first pass and left some comments.
@dinimar I pushed a commit that reverts some of the formatting. I'm not entirely sure why the auto-formatter is still picking up … I'm going to add some more commits addressing the nested-field tests soon. Then I'll do another round of review :-)
@zeroshade @westonpace Looking at this a little deeper... do you know if … Achieving the same thing for …
Regarding formatting, it's possible to fix it by rebasing onto the last successful commit (after all the work is done, of course), for example yours, 4487be, which passed all workflows including the formatting check.
Yes, …
I've pushed commits that address the tests and the zero-copy requirement (the latter required some implementation changes, but it's similar in principle). I also added overloads for ChunkedArrays for good measure.
Hopefully the tests better clarify what I was getting at regarding nested fields, but let me know if you have any questions!
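The zero-copy idea can be illustrated with a similar toy model. This is not Arrow code: ToyArray, ToyChunkedArray, ToyTable and Get are hypothetical. The sketch assumes a table is a list of chunked columns, that the first path index selects a column, and that each remaining index selects a struct child inside every chunk. Only shared_ptr handles are copied, so the result aliases the input chunks. Real Arrow struct arrays can carry offsets that a full implementation has to account for; this sketch deliberately ignores that.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an array; struct arrays expose child arrays.
struct ToyArray {
  std::vector<int> values;                          // leaf data
  std::vector<std::shared_ptr<ToyArray>> children;  // struct children
};

using ToyChunkedArray = std::vector<std::shared_ptr<ToyArray>>;
using ToyTable = std::vector<ToyChunkedArray>;  // one chunked column per field
using ToyFieldPath = std::vector<int>;

// Resolves `path` against `table`: path[0] picks a column, and every later
// index picks a struct child inside each chunk of that column. Only
// shared_ptr handles are copied, so the result aliases the original chunks.
ToyChunkedArray Get(const ToyFieldPath& path, const ToyTable& table) {
  if (path.empty()) throw std::invalid_argument("empty field path");
  const ToyChunkedArray& column = table.at(path[0]);  // throws if out of range
  ToyChunkedArray out;
  out.reserve(column.size());
  for (const auto& chunk : column) {
    std::shared_ptr<ToyArray> current = chunk;
    for (std::size_t depth = 1; depth < path.size(); ++depth) {
      assert(current != nullptr);
      current = current->children.at(path[depth]);
    }
    out.push_back(current);  // same object as in the input: zero-copy
  }
  return out;
}
```

The design choice here is that path resolution is applied chunk by chunk, so the result keeps the input's chunk layout instead of concatenating anything.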
A few minor suggestions. Marking "Request Changes" because it isn't clear to me that a follow-up issue for creating generic tests has been created yet. Once we do that, I think we can move forward with this.
cpp/src/arrow/type.cc
```cpp
Result<std::shared_ptr<ChunkedArray>> FieldPath::Get(
    const ChunkedArray& chunked_array) const {
  if (chunked_array.num_chunks() < 1) {
    return Status::Invalid("Chunked array must have at least one chunk");
```
Why?
I was under the impression that it would be considered invalid, since ChunkedArray::MakeEmpty always allocates one chunk: https://github.com/dinimar/arrow/blob/ec6caeff8477af08679de7a7f12b90d4ba302457/cpp/src/arrow/chunked_array.h#L95-L96
Whether that's reasonable or not, I'm open to just returning an empty ChunkedArray here (based on its type or storage_type) if that makes more sense.
Ah, OK. I very rarely work with chunked arrays, so you may be right. It is possible to create chunked arrays with 0 chunks, though:
```python
>>> import pyarrow as pa
>>> pa.chunked_array([], pa.int8()).num_chunks
0
```
My comment was mostly for understanding. I thought maybe a chunk was needed for some kind of inference.
I think I would slightly prefer returning an empty chunked array (with 1 chunk, for consistency with ChunkedArray::MakeEmpty).
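Here is a sketch of the empty-input behaviour preferred above, again as a toy model rather than Arrow code (ToyType, ToyArray, ToyChunkedArray, MakeEmpty and Get are hypothetical). The observation it illustrates is that the output type can be resolved from the input's type even when there are no chunks, so Get can return an empty result holding a single chunk, mirroring what the thread says ChunkedArray::MakeEmpty does, instead of returning an error.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy type: a name plus struct children (empty for leaf types).
struct ToyType {
  std::string name;
  std::vector<ToyType> children;
};

// Hypothetical toy array: its type plus struct child arrays (leaf data omitted).
struct ToyArray {
  ToyType type;
  std::vector<std::shared_ptr<ToyArray>> children;
};

struct ToyChunkedArray {
  ToyType type;
  std::vector<std::shared_ptr<ToyArray>> chunks;
};

using ToyFieldPath = std::vector<int>;

// Modeled on the thread's description of ChunkedArray::MakeEmpty: the empty
// result still holds exactly one (empty) chunk of the requested type.
ToyChunkedArray MakeEmpty(const ToyType& type) {
  return {type, {std::make_shared<ToyArray>(ToyArray{type, {}})}};
}

// Resolves `path` inside a chunked struct column. The output type is derived
// from the input's type, so zero chunks yield an empty result, not an error.
ToyChunkedArray Get(const ToyFieldPath& path, const ToyChunkedArray& input) {
  const ToyType* out_type = &input.type;
  for (int i : path) out_type = &out_type->children.at(i);
  if (input.chunks.empty()) return MakeEmpty(*out_type);

  ToyChunkedArray out{*out_type, {}};
  for (const auto& chunk : input.chunks) {
    std::shared_ptr<ToyArray> current = chunk;
    for (int i : path) current = current->children.at(i);
    assert(current->type.name == out_type->name);
    out.chunks.push_back(current);  // shares the child array, no copy
  }
  return out;
}
```

Walking the type with a pointer, rather than reassigning a ToyType value from one of its own children, avoids an aliasing hazard during the copy.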
@benibus please resolve this conversation. At the moment, it's the last thing to be done before the second review from Weston Pace.
I've created an issue for the tests: #34830
@westonpace please review
Thank you both for working on this.
Benchmark runs are scheduled for baseline = 1bda003 and contender = 1c06854. 1c06854 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
…apache#34537)

### Rationale for this change
Described in the issue

### What changes are included in this PR?
- added implementations for `FieldPath::Get(const Table& table)` and `FindAll(const Table& table)`
- added unit tests for functions mentioned above both for `Table` and `RecordBatch` classes

### Are these changes tested?
Yes, by unit tests

### Are there any user-facing changes?
Most probably, no

* Closes: apache#14939

Lead-authored-by: Dinir Imameev <dinir.imameev@gmail.com>
Co-authored-by: benibus <bpharks@gmx.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Rationale for this change
Described in the issue (GH-14939).
What changes are included in this PR?
- added implementations for FieldPath::Get(const Table& table) and FindAll(const Table& table)
- added unit tests for the functions mentioned above, for both the Table and RecordBatch classes
Are these changes tested?
Yes, by unit tests.
Are there any user-facing changes?
Probably not.