[Ruby] Random segfault in hash join #35819
Comments
Hmm. I want to check whether this is related to Ruby's GC or not. Could you try with |
It was harder to make it crash with
It is still very random, and with |
Thanks. Then it seems that this isn't related to Ruby; it is probably related to Acero. @westonpace Have you seen this problem? (A join running with multiple threads crashes.) |
I refactored our code to call I noticed that in 1c97ab0 you have added a call to However I did manage to crash it with |
I'm not aware of any current issues like this. That being said, the join code is pretty complex, so I would not be surprised if it was the culprit. Getting some kind of reproducer will be essential though. @stenlarsson , you mention that you have several joins. Are they chained together or parallel? For example, chained would be "join a and b and then join c to the result and then join d to the result" and parallel would be something like "join a and b and also join c and d and then join the two results together". Also, what kinds of joins are these? Are they left inner joins? |
They are chained left outer joins. |
I don't understand the problem you described. Could you provide a small script that shows the problem? |
Here is an example:

```ruby
require 'arrow'

def build_join(plan)
  left_array = Arrow::Int64DataType.new.build_array(0..10)
  left_table = Arrow::Table.new('col1' => left_array)
  left_node = plan.build_source_node(left_table)
  right_array = Arrow::Int64DataType.new.build_array(0..5)
  right_table = Arrow::Table.new('col2' => right_array)
  right_node = plan.build_source_node(right_table)
  plan.build_hash_join_node(left_node, right_node, Arrow::HashJoinNodeOptions.new(:left_outer, ['col1'], ['col2']))
end

def get_result(plan, node)
  sink_node_options = Arrow::SinkNodeOptions.new
  plan.build_sink_node(node, sink_node_options)
  plan.validate
  plan.start
  plan.wait
  reader = sink_node_options.get_reader(node.output_schema)
  reader.read_all
end

plan = Arrow::ExecutePlan.new
node = build_join(plan)
GC.start
result = get_result(plan, node)
p result
```

This prints an empty table on my computer (!?), but if I remove the line with |
…ePlan

If we don't refer them, GC may free them unexpectedly.

Relations:

* `GArrowExecutePlan` -> `GArrowExecuteNode`s
* `GArrowExecuteNode` -> `GArrowExecuteOptions`
* `GArrowSourceNodeOptions` -> `GArrowRecordBatchReader` or `GArrowRecordBatch`
* `GArrowRecordBatchReader` -> `GArrowRecordBatch`s or `GArrowTable`
Thanks. GH-35963 will fix this. |
Great news, thanks! |
…35963)

### Rationale for this change

If we don't refer them, GC may free them unexpectedly.

Relations:

* `GArrowExecutePlan` -> `GArrowExecuteNode`s
* `GArrowExecuteNode` -> `GArrowExecuteOptions`
* `GArrowSourceNodeOptions` -> `GArrowRecordBatchReader` or `GArrowRecordBatch`
* `GArrowRecordBatchReader` -> `GArrowRecordBatch`s or `GArrowTable`

### What changes are included in this PR?

Add missing references in GLib and mark dependency container explicitly in Ruby. Because we can't mark dependency container automatically in Ruby.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #35819

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
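
The ownership relations listed in the commit message can be illustrated with a small pure-Ruby sketch. The `Plan` and `Node` classes below are hypothetical stand-ins, not the Red Arrow or Arrow GLib API. The sketch only shows the general idea: when a container holds strong references to the objects it depends on, they stay reachable and the GC cannot free them while the container is alive.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT the real
# Arrow::ExecutePlan / Arrow::ExecuteNode classes.
class Node
  attr_reader :name, :options

  def initialize(name, options)
    @name = name
    @options = options # node -> options reference keeps the options alive
  end
end

class Plan
  attr_reader :nodes

  def initialize
    @nodes = []
  end

  # Build a node and remember it, so that the plan -> node reference
  # keeps the node reachable even if the caller drops its own reference.
  def build_node(name, options)
    node = Node.new(name, options)
    @nodes << node
    node
  end
end

plan = Plan.new
plan.build_node("source", { rows: 11 }) # return value deliberately discarded
GC.start                                # the node is still reachable via plan
retained = plan.nodes.map(&:name)
```

Without the `@nodes << node` line, nothing would reference the node after `build_node` returns, and the GC would be free to collect it, which is the kind of situation the commit message describes.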
Describe the bug, including details regarding any error messages, version, and platform.
We have some (complicated) Ruby code that joins a lot of tables, and it randomly crashes. Sometimes you get a stack trace, but sometimes the Ruby crash handler in turn crashes. This indicates that garbage has been written to the stack. I have also seen cases where it doesn't crash, but there is some random garbage in the result.
I installed the `libarrow-acero1200-dbgsym` and `libarrow1200-dbgsym` packages in our Docker image, and ran Ruby through GDB. This way I'm (sometimes) able to capture a stack trace. Here is one example:

Because this is so random I have unfortunately not been able to create a small reproducible test that I can share with you. I tried setting `OMP_NUM_THREADS=1`, but I couldn't get it to crash. Please let me know if there is anything more I can do to troubleshoot this.

This problem does not happen in Arrow 10, but I have been able to reproduce it in both Arrow 11 and 12, on both amd64 and arm64, and on both macOS and Linux.
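
For anyone trying to repeat the single-threaded experiment, one way to run a script with `OMP_NUM_THREADS=1` is to launch it in a child process with that variable set. This is only a sketch: the helper name `run_single_threaded` is made up for illustration, and it assumes the library reads the variable at startup, as the report implies.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a Ruby snippet in a child process with
# OMP_NUM_THREADS=1 set in its environment.
# Returns the child's stdout and whether it exited successfully.
def run_single_threaded(code)
  env = { 'OMP_NUM_THREADS' => '1' }
  output, status = Open3.capture2(env, RbConfig.ruby, '-e', code)
  [output, status.success?]
end

# Demonstration: the child process sees the variable.
output, ok = run_single_threaded('print ENV["OMP_NUM_THREADS"]')
```

Replacing the `-e` snippet with the path to the failing script would let the same code run repeatedly with and without the variable, to compare crash rates.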
Component(s)
Ruby