Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37562: [Ruby] Add support for table.each_raw_record.to_a #37600

Merged
merged 7 commits into from
Sep 11, 2023

Conversation

otegami
Copy link
Contributor

@otegami otegami commented Sep 6, 2023

Rationale for this change

This change aligns the behavior of each_raw_record with standard Ruby practices by returning an enumerator when no block is provided

What changes are included in this PR?

Are these changes tested?

Yes.

Are there any user-facing changes?

No

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the following PR #37137, I missed the following warnings happened. So I fixed it in this PR.

red-arrow % bundle exec env DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/tmp/local/lib $(rbenv which ruby) test/run-test.rb
/Users/otegami/dev/project/arrow/ruby/red-arrow/test/raw-records/test-struct-array.rb:518: warning: method redefined; discarding old build
/Users/otegami/dev/project/arrow/ruby/red-arrow/test/each-raw-record/test-struct-array.rb:658: warning: previous definition of build was here
/Users/otegami/dev/project/arrow/ruby/red-arrow/test/raw-records/test-struct-array.rb:526: warning: method redefined; discarding old build
/Users/otegami/dev/project/arrow/ruby/red-arrow/test/each-raw-record/test-struct-array.rb:666: warning: previous definition of build was here

@@ -306,6 +306,10 @@ namespace red_arrow {

VALUE
record_batch_each_raw_record(VALUE rb_record_batch){
if (!rb_block_given_p()) {
return rb_funcall(rb_record_batch, rb_intern("to_enum"), 1, ID2SYM(rb_intern("each_raw_record")));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a good way to return a Enumerator object, could you teach me?🙏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: f59e5ff
Thanks. I tried to use RETURN_SIZED_ENUMERATOR()🙏

@@ -178,5 +178,29 @@ def setup
@record_batch[[:c, "a", -1, 3..4]])
end
end

sub_test_case("#each_raw_record") do
Copy link
Contributor Author

@otegami otegami Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was uncertain about the appropriate location to add the following test cases.
I added it here because each_raw_record is one of the methods for Arrow::Table or Arrow::RecordBatch and est/each-raw-record/* focuses on the test related to arrays.

  • ruby/red-arrow/test/each-raw-record/*
  • ruby/red-arrow/test/test-record-batch.rb

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always use no block version like this:

diff --git a/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb b/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
index dbbbd79ee..05f0e90ef 100644
--- a/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
+++ b/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
@@ -22,12 +22,8 @@ module EachRawRecordBasicArraysTests
       [nil],
       [nil],
     ]
-    iterated_records = []
     target = build({column: :null}, records)
-    target.each_raw_record do |record|
-      iterated_records << record
-    end
-    assert_equal(records, iterated_records)
+    assert_equal(records, target.each_raw_record.to_a)
   end
 
   def test_boolean

Because a returned Enumerator uses with-block version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: bc23b9c

I misunderstood that we explicitly had to call the 'each_raw_record' with a block because I didn't understand how Enumerable#to_a worked.

I will unify tests with raw_records and each_raw_record at the following issue.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 6, 2023
@@ -306,6 +306,10 @@ namespace red_arrow {

VALUE
record_batch_each_raw_record(VALUE rb_record_batch){
if (!rb_block_given_p()) {
return rb_funcall(rb_record_batch, rb_intern("to_enum"), 1, ID2SYM(rb_intern("each_raw_record")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -306,6 +306,10 @@ namespace red_arrow {

VALUE
record_batch_each_raw_record(VALUE rb_record_batch){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
record_batch_each_raw_record(VALUE rb_record_batch){
record_batch_each_raw_record(VALUE rb_record_batch) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: 781f547

@@ -178,5 +178,29 @@ def setup
@record_batch[[:c, "a", -1, 3..4]])
end
end

sub_test_case("#each_raw_record") do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always use no block version like this:

diff --git a/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb b/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
index dbbbd79ee..05f0e90ef 100644
--- a/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
+++ b/ruby/red-arrow/test/each-raw-record/test-basic-arrays.rb
@@ -22,12 +22,8 @@ module EachRawRecordBasicArraysTests
       [nil],
       [nil],
     ]
-    iterated_records = []
     target = build({column: :null}, records)
-    target.each_raw_record do |record|
-      iterated_records << record
-    end
-    assert_equal(records, iterated_records)
+    assert_equal(records, target.each_raw_record.to_a)
   end
 
   def test_boolean

Because a returned Enumerator uses with-block version.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Sep 7, 2023
This change makes the code more efficient and readable by directly returning an enumerator with a known size,
rather than calling `to_enum` and `each_raw_record` methods via `rb_funcall`.
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 10, 2023
@otegami
Copy link
Contributor Author

otegami commented Sep 10, 2023

@kou
Thank you for reviewing 🙏 I've fixed all the points you commented on.

@otegami otegami requested a review from kou September 10, 2023 23:59
auto garrow_record_batch = GARROW_RECORD_BATCH(RVAL2GOBJ(rb_record_batch));
auto record_batch = garrow_record_batch_get_raw(garrow_record_batch).get();
RETURN_SIZED_ENUMERATOR(rb_record_batch, 0, 0, record_batch->num_rows());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use nullptr for a null pointer in C++:

Suggested change
RETURN_SIZED_ENUMERATOR(rb_record_batch, 0, 0, record_batch->num_rows());
RETURN_SIZED_ENUMERATOR(rb_record_batch, 0, nullptr, record_batch->num_rows());

Copy link
Contributor Author

@otegami otegami Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much!
fix: 3563046

 * @param[in]  argc     Number of objects of `argv`.
 * @param[in]  argv     Arguments passed to the current method.

We can pass the 0 to argc because it is literally the number of objects of argv.
On the other hand, we cannot pass the 0. to argv because we want to say there is no passed method pointer.
https://github.com/ruby/ruby/blob/05aaff2191cbe777d1efb915ab9652eeaa1c16b8/include/ruby/internal/intern/enumerator.h#L201C4-L202

ruby/red-arrow/ext/arrow/raw-records.cpp Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting review Awaiting review awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 11, 2023
@github-actions github-actions bot added awaiting review Awaiting review awaiting committer review Awaiting committer review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels Sep 11, 2023
@otegami otegami requested a review from kou September 11, 2023 04:27
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit e56726c into apache:main Sep 11, 2023
12 checks passed
@kou kou removed the awaiting committer review Awaiting committer review label Sep 11, 2023
@github-actions github-actions bot added the awaiting merge Awaiting merge label Sep 11, 2023
@otegami otegami deleted the ruby-support-each-raw-record-no-block-given branch September 11, 2023 05:15
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit e56726c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…ache#37600)

### Rationale for this change

This change aligns the behavior of `each_raw_record` with standard Ruby practices by returning an enumerator when no block is provided

### What changes are included in this PR?

- Made `Arrow::Table#each_raw_record` and `Arrow::RecordBatch#each_raw_record` return Enumerator when it was called without block.
- Added related tests
- Resolved warnings related to duplicate test classes which were caused by apache#37137

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No

* Closes: apache#37562

Authored-by: otegami <a.s.takuya1026@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ache#37600)

### Rationale for this change

This change aligns the behavior of `each_raw_record` with standard Ruby practices by returning an enumerator when no block is provided

### What changes are included in this PR?

- Made `Arrow::Table#each_raw_record` and `Arrow::RecordBatch#each_raw_record` return Enumerator when it was called without block.
- Added related tests
- Resolved warnings related to duplicate test classes which were caused by apache#37137

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No

* Closes: apache#37562

Authored-by: otegami <a.s.takuya1026@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Ruby] Add support for table.each_raw_record.to_a
2 participants