[Ruby] Multiple filter conditions in Arrow::Table.load #35915

Closed
collimarco opened this issue Jun 5, 2023 · 5 comments · Fixed by #35927

Comments

@collimarco

Describe the enhancement requested

I have this code:

table = Arrow::Table.load(s3_uri, format: :parquet)
puts table.slice { |slicer| (slicer['status'] == 200) & (slicer['message'].match_substring? 'foo') }

Now I would like to rewrite it more efficiently using condition pushdown:

table = Arrow::Table.load(s3_uri, format: :parquet, filter: [[:equal, :status, 200], [:match_substring, :message, 'foo']])
puts table

However, this code doesn't work (invalid argument Array for the filter).

Any idea how to rewrite it correctly? I can't find any documentation about multiple filter conditions.

Component(s)

Ruby

@kou
Member

kou commented Jun 5, 2023

Could you try filter: [:and, [:equal, :status, 200], [:match_substring, :message, 'foo']]?
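
The suggestion above nests the conditions under a single :and call. When there are more than two conditions, a small helper can build that nesting. This is a minimal sketch, assuming :and maps to a two-argument function, so extra conditions are folded pairwise. The helper name and_filter and the :method column are hypothetical and not part of red-arrow.

```ruby
# Hypothetical helper (not part of red-arrow): folds any number of
# filter conditions into nested two-argument :and expressions.
def and_filter(*conditions)
  raise ArgumentError, "at least one condition is required" if conditions.empty?

  # With a single condition, reduce returns it unchanged.
  conditions.reduce { |combined, condition| [:and, combined, condition] }
end

two = and_filter([:equal, :status, 200],
                 [:match_substring, :message, { pattern: "foo" }])

# :method is a made-up column used only to show the nesting.
three = and_filter([:equal, :status, 200],
                   [:match_substring, :message, { pattern: "foo" }],
                   [:equal, :method, "GET"])

p two
p three
```

The resulting array could then be passed as the filter: argument in the Arrow::Table.load call from the original question.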

@collimarco
Author

@kou It seems to work, but there is another problem with the function in the filter. Now I get this error:

/Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/gobject-introspection-4.1.6/lib/gobject-introspection/loader.rb:705:in `invoke': 
[scanner-builder][filter][set]: Invalid: Function 'match_substring' accepts 1 arguments but 2 passed (Arrow::Error::Invalid)

@kou
Member

kou commented Jun 5, 2023

Ah, we need to create an option object for match_substring:

match_substring_options = Arrow::MatchSubstringOptions.new
match_substring_options.pattern = 'foo'
table = Arrow::Table.load(s3_uri, format: :parquet, filter: [:and, [:equal, :status, 200], [:match_substring, :message, match_substring_options]])

But a shortcut such as filter: [:and, [:equal, :status, 200], [:match_substring, :message, {pattern: 'foo'}]] should be implemented.
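
The earlier error ("accepts 1 arguments but 2 passed") is consistent with the plain string 'foo' being counted as a second function argument, whereas an options value is kept separate from the arguments. The following is an illustrative model of that distinction only. It is not red-arrow's implementation, and split_call is a hypothetical name.

```ruby
# Illustrative model only; this is NOT how red-arrow is implemented.
# split_call (hypothetical) separates a filter array into the function
# name, its arguments, and a trailing Hash treated as options.
def split_call(expression)
  name, *arguments = expression
  options = arguments.last.is_a?(Hash) ? arguments.pop : nil
  { name: name, arguments: arguments, options: options }
end

plain = split_call([:match_substring, :message, "foo"])
with_options = split_call([:match_substring, :message, { pattern: "foo" }])

puts plain[:arguments].size         # 2: "foo" counts as an argument
puts with_options[:arguments].size  # 1: the pattern travels as options
```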

@collimarco
Author

@kou Unfortunately the new code gives a segmentation fault:

searchparquet.rb: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-darwin22]

kou added a commit to kou/arrow that referenced this issue Jun 6, 2023
…om Hash automatically

This also fixes a crash bug with `CallExpression.new(name, args,
Arrow::MatchSubstringOptions.new)`. `Arrow::MatchSubstringOptions.new`
is freed multiple times.
@kou
Member

kou commented Jun 6, 2023

Oh, #35927 fixes it.

kou closed this as completed in #35927 on Jun 9, 2023
kou added this to the 13.0.0 milestone on Jun 9, 2023
kou added a commit that referenced this issue Jun 9, 2023
…h automatically (#35927)

### Rationale for this change

It's convenient.

### What changes are included in this PR?

This also fixes a crash bug with `CallExpression.new(name, args, Arrow::MatchSubstringOptions.new)`: the options object returned by `Arrow::MatchSubstringOptions.new` was freed multiple times.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* Closes: #35915

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>