-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-37562: [Ruby] Add support for table.each_raw_record.to_a #37600
GH-37562: [Ruby] Add support for table.each_raw_record.to_a #37600
Conversation
…atch#each_raw_record without block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the following PR #37137, I missed the following warnings happened. So I fixed it in this PR.
red-arrow % bundle exec env DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/tmp/local/lib $(rbenv which ruby) test/run-test.rb
/Users/otegami/dev/project/arrow/ruby/red-arrow/test/raw-records/test-struct-array.rb:518: warning: method redefined; discarding old build
/Users/otegami/dev/project/arrow/ruby/red-arrow/test/each-raw-record/test-struct-array.rb:658: warning: previous definition of build was here
/Users/otegami/dev/project/arrow/ruby/red-arrow/test/raw-records/test-struct-array.rb:526: warning: method redefined; discarding old build
/Users/otegami/dev/project/arrow/ruby/red-arrow/test/each-raw-record/test-struct-array.rb:666: warning: previous definition of build was here
@@ -306,6 +306,10 @@ namespace red_arrow { | |||
|
|||
VALUE | |||
record_batch_each_raw_record(VALUE rb_record_batch){ | |||
if (!rb_block_given_p()) { | |||
return rb_funcall(rb_record_batch, rb_intern("to_enum"), 1, ID2SYM(rb_intern("each_raw_record"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a good way to return a Enumerator object, could you teach me?🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use RETURN_SIZED_ENUMERATOR()
:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix: f59e5ff
Thanks. I tried to use RETURN_SIZED_ENUMERATOR()
🙏
@@ -178,5 +178,29 @@ def setup | |||
@record_batch[[:c, "a", -1, 3..4]]) | |||
end | |||
end | |||
|
|||
sub_test_case("#each_raw_record") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was uncertain about the appropriate location to add the following test cases.
I added it here because each_raw_record
is one of the methods for Arrow::Table or Arrow::RecordBatch and est/each-raw-record/*
focuses on the test related to arrays.
- ruby/red-arrow/test/each-raw-record/*
- ruby/red-arrow/test/test-record-batch.rb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always use no block version like this:
diff --git a/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb b/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
index dbbbd79ee..05f0e90ef 100644
--- a/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
+++ b/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
@@ -22,12 +22,8 @@ module EachRawRecordBasicArraysTests
[nil],
[nil],
]
- iterated_records = []
target = build({column: :null}, records)
- target.each_raw_record do |record|
- iterated_records << record
- end
- assert_equal(records, iterated_records)
+ assert_equal(records, target.each_raw_record.to_a)
end
def test_boolean
Because a returned Enumerator
uses with-block version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix: bc23b9c
I misunderstood that we explicitly had to call the 'each_raw_record' with a block because I didn't understand how Enumerable#to_a worked.
I will unify tests with raw_records and each_raw_record at the following issue.
@@ -306,6 +306,10 @@ namespace red_arrow { | |||
|
|||
VALUE | |||
record_batch_each_raw_record(VALUE rb_record_batch){ | |||
if (!rb_block_given_p()) { | |||
return rb_funcall(rb_record_batch, rb_intern("to_enum"), 1, ID2SYM(rb_intern("each_raw_record"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use RETURN_SIZED_ENUMERATOR()
:
@@ -306,6 +306,10 @@ namespace red_arrow { | |||
|
|||
VALUE | |||
record_batch_each_raw_record(VALUE rb_record_batch){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record_batch_each_raw_record(VALUE rb_record_batch){ | |
record_batch_each_raw_record(VALUE rb_record_batch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix: 781f547
@@ -178,5 +178,29 @@ def setup | |||
@record_batch[[:c, "a", -1, 3..4]]) | |||
end | |||
end | |||
|
|||
sub_test_case("#each_raw_record") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always use no block version like this:
diff --git a/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb b/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
index dbbbd79ee..05f0e90ef 100644
--- a/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
+++ b/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
@@ -22,12 +22,8 @@ module EachRawRecordBasicArraysTests
[nil],
[nil],
]
- iterated_records = []
target = build({column: :null}, records)
- target.each_raw_record do |record|
- iterated_records << record
- end
- assert_equal(records, iterated_records)
+ assert_equal(records, target.each_raw_record.to_a)
end
def test_boolean
Because a returned Enumerator
uses with-block version.
This change makes the code more efficient and readable by directly returning an enumerator with a known size, rather than calling `to_enum` and `each_raw_record` methods via `rb_funcall`.
@kou |
auto garrow_record_batch = GARROW_RECORD_BATCH(RVAL2GOBJ(rb_record_batch)); | ||
auto record_batch = garrow_record_batch_get_raw(garrow_record_batch).get(); | ||
RETURN_SIZED_ENUMERATOR(rb_record_batch, 0, 0, record_batch->num_rows()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use nullptr
for a null pointer in C++:
RETURN_SIZED_ENUMERATOR(rb_record_batch, 0, 0, record_batch->num_rows()); | |
RETURN_SIZED_ENUMERATOR(rb_record_batch, 0, nullptr, record_batch->num_rows()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much!
fix: 3563046
* @param[in] argc Number of objects of `argv`.
* @param[in] argv Arguments passed to the current method.
We can pass the 0 to argc
because it is literally the number of objects of argv
.
On the other hand, we cannot pass the 0. to argv
because we want to say there is no passed method pointer.
https://github.com/ruby/ruby/blob/05aaff2191cbe777d1efb915ab9652eeaa1c16b8/include/ruby/internal/intern/enumerator.h#L201C4-L202
…assed to the current method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit e56726c. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…ache#37600) ### Rationale for this change This change aligns the behavior of `each_raw_record` with standard Ruby practices by returning an enumerator when no block is provided ### What changes are included in this PR? - Made `Arrow::Table#each_raw_record` and `Arrow::RecordBatch#each_raw_record` return Enumerator when it was called without block. - Added related tests - Resolved warnings related to duplicate test classes which were caused by apache#37137 ### Are these changes tested? Yes. ### Are there any user-facing changes? No * Closes: apache#37562 Authored-by: otegami <a.s.takuya1026@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…ache#37600) ### Rationale for this change This change aligns the behavior of `each_raw_record` with standard Ruby practices by returning an enumerator when no block is provided ### What changes are included in this PR? - Made `Arrow::Table#each_raw_record` and `Arrow::RecordBatch#each_raw_record` return Enumerator when it was called without block. - Added related tests - Resolved warnings related to duplicate test classes which were caused by apache#37137 ### Are these changes tested? Yes. ### Are there any user-facing changes? No * Closes: apache#37562 Authored-by: otegami <a.s.takuya1026@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
This change aligns the behavior of
each_raw_record
with standard Ruby practices by returning an enumerator when no block is providedWhat changes are included in this PR?
Arrow::Table#each_raw_record
andArrow::RecordBatch#each_raw_record
return Enumerator when it was called without block.Are these changes tested?
Yes.
Are there any user-facing changes?
No