[C++] Segmentation fault on arrow-compute-hash-join-node-test on macos nightlies #32570

Closed
asfimport opened this issue Aug 3, 2022 · 16 comments · Fixed by #39234
Labels: Component: C++, Critical Fix, Priority: Critical, Type: bug

Comments

@asfimport
Collaborator

Some of our nightly builds are failing due to a segmentation fault on hash-join tests:

 33/90 Test #35: arrow-compute-hash-join-node-test .........***Failed    1.21 sec
Running arrow-compute-hash-join-node-test, redirecting output into /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.W72iCJcj/cpp-build/build/test-logs/arrow-compute-hash-join-node-test.txt (attempt 1/1)
/Users/runner/work/crossbow/crossbow/arrow/cpp/build-support/run-test.sh: line 88: 78018 Segmentation fault: 11  $TEST_EXECUTABLE "$@" > $LOGFILE.raw 2>&1
Running main() from /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.W72iCJcj/cpp-build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
[==========] Running 29 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 10 tests from HashJoin
[ RUN      ] HashJoin.Suffix
[       OK ] HashJoin.Suffix (4 ms)
[ RUN      ] HashJoin.Random
/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.W72iCJcj/cpp-build/src/arrow/compute/exec 

The failures can be seen. Judging from the failed jobs, it seems to affect only macOS:
verify-rc-source-cpp-macos-conda-amd64
verify-rc-source-integration-macos-conda-amd64
verify-rc-source-python-macos-amd64

Reporter: Raúl Cumplido / @raulcd
Assignee: Vibhatha Lakmal Abeykoon / @vibhatha

Note: This issue was originally created as ARROW-17292. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Raúl Cumplido / @raulcd:
This still seems to happen in some nightly builds, for example on 21st August on verify-rc-source-python-macos-amd64:

36/90 Test #35: arrow-compute-hash-join-node-test .........***Failed    3.62 sec
Running arrow-compute-hash-join-node-test, redirecting output into /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.6JD4aZtZ/cpp-build/build/test-logs/arrow-compute-hash-join-node-test.txt (attempt 1/1)
/Users/runner/work/crossbow/crossbow/arrow/cpp/build-support/run-test.sh: line 88: 88684 Segmentation fault: 11  $TEST_EXECUTABLE "$@" > $LOGFILE.raw 2>&1
Running main() from /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.6JD4aZtZ/cpp-build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
[==========] Running 29 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 10 tests from HashJoin
[ RUN      ] HashJoin.Suffix
[       OK ] HashJoin.Suffix (5 ms)
[ RUN      ] HashJoin.Random
/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.6JD4aZtZ/cpp-build/src/arrow/compute/exec 

@asfimport
Collaborator Author

Vibhatha Lakmal Abeykoon / @vibhatha:
While running more tests (using crossbow), I got this CI failure.


33/89 Test #31: arrow-compute-aggregate-test ..............   Passed    1.44 sec
      Start 36: arrow-compute-asof-join-node-test
34/89 Test #36: arrow-compute-asof-join-node-test .........***Failed    0.37 sec
Running arrow-compute-asof-join-node-test, redirecting output into /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.VG4icjS7/cpp-build/build/test-logs/arrow-compute-asof-join-node-test.txt (attempt 1/1)
/Users/runner/work/crossbow/crossbow/arrow/cpp/build-support/run-test.sh: line 88: 95323 Segmentation fault: 11  $TEST_EXECUTABLE "$@" > $LOGFILE.raw 2>&1
Running main() from /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.VG4icjS7/cpp-build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
[==========] Running 80 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 14 tests from AsofJoinTest
[ RUN      ] AsofJoinTest.TestUnsupportedOntype
[       OK ] AsofJoinTest.TestUnsupportedOntype (4 ms)
[ RUN      ] AsofJoinTest.TestUnsupportedBytype
[       OK ] AsofJoinTest.TestUnsupportedBytype (0 ms)
[ RUN      ] AsofJoinTest.TestUnsupportedDatatype
[       OK ] AsofJoinTest.TestUnsupportedDatatype (0 ms)
[ RUN      ] AsofJoinTest.TestMissingKeys
[       OK ] AsofJoinTest.TestMissingKeys (0 ms)
[ RUN      ] AsofJoinTest.TestUnsupportedTolerance
[       OK ] AsofJoinTest.TestUnsupportedTolerance (0 ms)
[ RUN      ] AsofJoinTest.TestMissingOnKey
[       OK ] AsofJoinTest.TestMissingOnKey (0 ms)
[ RUN      ] AsofJoinTest.TestMissingByKey
[       OK ] AsofJoinTest.TestMissingByKey (0 ms)
[ RUN      ] AsofJoinTest.TestNestedOnKey
[       OK ] AsofJoinTest.TestNestedOnKey (0 ms)
[ RUN      ] AsofJoinTest.TestNestedByKey
[       OK ] AsofJoinTest.TestNestedByKey (0 ms)
[ RUN      ] AsofJoinTest.TestAmbiguousOnKey
[       OK ] AsofJoinTest.TestAmbiguousOnKey (0 ms)
[ RUN      ] AsofJoinTest.TestAmbiguousByKey
[       OK ] AsofJoinTest.TestAmbiguousByKey (0 ms)
[ RUN      ] AsofJoinTest.TestLeftUnorderedOnKey
/Users/runner/work/crossbow/crossbow/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:58: Plan was destroyed before finishing
[       OK ] AsofJoinTest.TestLeftUnorderedOnKey (1 ms)
[ RUN      ] AsofJoinTest.TestRightUnorderedOnKey
/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.VG4icjS7/cpp-build/src/arrow/compute/exec

      Start 37: arrow-compute-tpch-node-test
35/89 Test #34: arrow-compute-plan-test ...................   Passed    2.71 sec
      Start 38: arrow-compute-union-node-test

It looks very much the same, but I only reproduced it once in 37 CI runs.

@asfimport
Collaborator Author

Weston Pace / @westonpace:
The asof join test failure is very useful. I dug into it further and unearthed ARROW-18018. It's possible that ARROW-18018 is the cause for the hash join test failure as well. By introducing delays and stress I was able to trigger the hash join test to segfault as a result of ARROW-18018. However, without being able to reproduce it and get a stack trace, it is pretty much impossible to tell for sure. As a result, I have left ARROW-18018 as a separate JIRA.

Once it merges in, we should see if this failure continues to occur.

Either way, we have likely instances of this failure going back as far as I can go. E.g. we started tracking nightly failures in Zulip in May and I still see this test failure sporadically though I cannot confirm the cause because Github no longer has the logs. My suspicion is that, whatever this bug is, we have probably already released several releases with it, and it should not be a blocker for 10.0.0.

@asfimport
Collaborator Author

Alessandro Molina / @amol-:
I'm lowering this from Blocker to Critical, as it has already been present in past releases. But we should keep it under close attention; the nightlies should, by the way, prevent us from forgetting that it's a problem we need to fix.

@asfimport asfimport added this to the 11.0.0 milestone Jan 11, 2023
@raulcd raulcd removed this from the 11.0.0 milestone Jan 11, 2023
@raulcd
Member

raulcd commented Jan 16, 2023

I've seen this on the latest nightly builds, but it seems to be a flaky failure that has been happening for a long time:
verify-rc-source-python-macos-amd64
verify-rc-source-python-macos-arm64
It's not a blocker for the release, but I'm adding a comment to flag that it still seems to happen occasionally:

 36/91 Test #37: arrow-compute-hash-join-node-test .........***Failed    3.05 sec
Running arrow-compute-hash-join-node-test, redirecting output into /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.EURZ3mxJ/cpp-build/build/test-logs/arrow-compute-hash-join-node-test.txt (attempt 1/1)
/Users/runner/work/crossbow/crossbow/arrow/cpp/build-support/run-test.sh: line 88: 98274 Segmentation fault: 11  $TEST_EXECUTABLE "$@" > $LOGFILE.raw 2>&1
Running main() from /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.EURZ3mxJ/cpp-build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
[==========] Running 34 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 12 tests from HashJoin
[ RUN      ] HashJoin.Suffix
[       OK ] HashJoin.Suffix (4 ms)
[ RUN      ] HashJoin.Random
/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/arrow-HEAD.XXXXX.EURZ3mxJ/cpp-build/src/arrow/compute/exec

@assignUser
Member

@westonpace
Member

Captured a stack trace. Leaving it here for future reference for myself:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HashJoin
[ RUN      ] HashJoin.Random
Process 1223 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x100944000)
    frame #0: 0x00000001025a723c libarrow_acero.1200.dylib`std::__1::enable_if<std::is_trivially_copyable_v<unsigned long long>, unsigned long long>::type arrow::util::SafeLoad<unsigned long long>(unaligned=0x0000000100943ffa) at ubsan.h:66:3
   63  	template <typename T>
   64  	inline std::enable_if_t<std::is_trivially_copyable_v<T>, T> SafeLoad(const T* unaligned) {
   65  	  std::remove_const_t<T> ret;
-> 66  	  std::memcpy(&ret, unaligned, sizeof(T));
   67  	  return ret;
   68  	}
   69  	
Target 0: (arrow-acero-hash-join-node-test) stopped.
(lldb) bt
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x100944000)
  * frame #0: 0x00000001025a723c libarrow_acero.1200.dylib`std::__1::enable_if<std::is_trivially_copyable_v<unsigned long long>, unsigned long long>::type arrow::util::SafeLoad<unsigned long long>(unaligned=0x0000000100943ffa) at ubsan.h:66:3
    frame #1: 0x000000010eee6ea9 libarrow.1200.dylib`arrow::compute::ExecBatchBuilder::AppendSelected(this=0x0000700004c3f760, i=55, ptr="fafPS]by^", num_bytes=9)::$_7::operator()(int, unsigned char const*, unsigned int) const at light_array.cc:605:56
    frame #2: 0x000000010eedc680 libarrow.1200.dylib`void arrow::compute::ExecBatchBuilder::Visit<arrow::compute::ExecBatchBuilder::AppendSelected(std::__1::shared_ptr<arrow::ArrayData> const&, arrow::compute::ResizableArrayData*, int, unsigned short const*, arrow::MemoryPool*)::$_7>(column=std::__1::shared_ptr<arrow::ArrayData>::element_type @ 0x0000600002d15c98 strong=2 weak=1, num_rows=59, row_ids=0x000000010c0be7b8, process_value_fn=(unnamed class) @ 0x0000700004c3f760)::$_7) at light_array.cc:472:7
    frame #3: 0x000000010eedaaea libarrow.1200.dylib`arrow::compute::ExecBatchBuilder::AppendSelected(source=std::__1::shared_ptr<arrow::ArrayData>::element_type @ 0x0000600002d15c98 strong=2 weak=1, target=0x0000000108a04800, num_rows_to_append=60, row_ids=0x000000010c0be7b8, pool=0x00000001108a7f48) at light_array.cc:598:5
    frame #4: 0x000000010eedd47e libarrow.1200.dylib`arrow::compute::ExecBatchBuilder::AppendSelected(this=0x00000001018627d0, pool=0x00000001108a7f48, batch=0x0000700004c40420, num_rows_to_append=60, row_ids=0x000000010c0be7b8, num_cols=4, col_ids=0x0000600000034340) at light_array.cc:710:5
    frame #5: 0x0000000102581851 libarrow_acero.1200.dylib`arrow::acero::JoinResultMaterialize::Append(this=0x0000000101862780, key_and_payload=0x0000700004c40420, num_rows_to_append=60, row_ids=0x000000010c0be7b8, key_ids=0x000000010c0bf008, payload_ids=0x000000010c0c0058, num_rows_appended=0x0000700004c3fe10) at swiss_join.cc:1640:5
    frame #6: 0x00000001025bb4f2 libarrow_acero.1200.dylib`arrow::Status arrow::acero::JoinResultMaterialize::Append<arrow::acero::JoinProbeProcessor::OnNextBatch(long long, arrow::compute::ExecBatch const&, arrow::util::TempVectorStack*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray> >*)::$_18>(this=0x0000700004c3fe58, num_rows_to_append_left=60, offset=0, num_rows_appended=0x0000700004c3fe10)::$_18)::'lambda'(int, int, int*)::operator()(int, int, int*) const at swiss_join_internal.h:613:18
    frame #7: 0x00000001025bb0b3 libarrow_acero.1200.dylib`arrow::Status arrow::acero::JoinResultMaterialize::AppendAndOutput<arrow::Status arrow::acero::JoinResultMaterialize::Append<arrow::acero::JoinProbeProcessor::OnNextBatch(long long, arrow::compute::ExecBatch const&, arrow::util::TempVectorStack*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray> >*)::$_18>(arrow::compute::ExecBatch const&, int, unsigned short const*, unsigned int const*, unsigned int const*, arrow::acero::JoinProbeProcessor::OnNextBatch(long long, arrow::compute::ExecBatch const&, arrow::util::TempVectorStack*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray> >*)::$_18)::'lambda'(int, int, int*), arrow::acero::JoinProbeProcessor::OnNextBatch(long long, arrow::compute::ExecBatch const&, arrow::util::TempVectorStack*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray> >*)::$_18>(this=0x0000000101862780, num_rows_to_append=60, append_rows_fn=0x0000700004c3fe58, output_batch_fn=0x0000700004c3fec8)::$_18 const&, arrow::acero::JoinProbeProcessor::OnNextBatch(long long, arrow::compute::ExecBatch const&, arrow::util::TempVectorStack*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray> >*)::$_18 const&) at swiss_join_internal.h:567:7
    frame #8: 0x00000001025848d8 libarrow_acero.1200.dylib`arrow::Status arrow::acero::JoinResultMaterialize::Append<arrow::acero::JoinProbeProcessor::OnNextBatch(long long, arrow::compute::ExecBatch const&, arrow::util::TempVectorStack*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray> >*)::$_18>(this=0x0000000101862780, key_and_payload=0x0000700004c40420, num_rows_to_append=60, row_ids=0x000000010c0be7b8, key_ids=0x000000010c0bf008, payload_ids=0x000000010c0c0058, output_batch_fn=(unnamed class) @ 0x0000700004c3fec8)::$_18) at swiss_join_internal.h:610:12
    frame #9: 0x000000010258429f libarrow_acero.1200.dylib`arrow::acero::JoinProbeProcessor::OnNextBatch(this=0x0000000101860d70, thread_id=4, keypayload_batch=0x0000700004c40420, temp_stack=0x0000000100e1de38, temp_column_arrays=0x0000000101862838 size=3) at swiss_join.cc:1993:9
    frame #10: 0x000000010258bcf4 libarrow_acero.1200.dylib`arrow::acero::SwissJoin::ProbeSingleBatch(this=0x0000000101860600, thread_index=4, batch=ExecBatch @ 0x0000700004c40578) at swiss_join.cc:2144:26
    frame #11: 0x00000001024a3ea9 libarrow_acero.1200.dylib`arrow::acero::HashJoinNode::Init(this=0x0000000101864588, thread_index=4, task_id=276)::'lambda'(unsigned long, long long)::operator()(unsigned long, long long) const at hash_join_node.cc:964:25
    frame #12: 0x00000001024a3dd9 libarrow_acero.1200.dylib`decltype(__f=0x0000000101864588, __args=0x0000700004c407a8, __args=0x0000700004c407a0)::'lambda'(unsigned long, long long)&>(fp)(static_cast<unsigned long>(fp0), static_cast<long long>(fp0))) std::__1::__invoke<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)&, unsigned long, long long>(arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)&, unsigned long&&, long long&&) at type_traits:3918:1
    frame #13: 0x00000001024a3d6a libarrow_acero.1200.dylib`arrow::Status std::__1::__invoke_void_return_wrapper<arrow::Status, false>::__call<arrow::acero::HashJoinNode::Init(__args=0x0000000101864588, __args=0x0000700004c407a8, __args=0x0000700004c407a0)::'lambda'(unsigned long, long long)&, unsigned long, long long>(arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)&, unsigned long&&, long long&&) at invoke.h:30:16
    frame #14: 0x00000001024a3cfa libarrow_acero.1200.dylib`std::__1::__function::__alloc_func<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long), std::__1::allocator<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)>, arrow::Status (unsigned long, long long)>::operator(this=0x0000000101864588, __arg=0x0000700004c407a8, __arg=0x0000700004c407a0)(unsigned long&&, long long&&) at function.h:178:16
    frame #15: 0x00000001024a29d9 libarrow_acero.1200.dylib`std::__1::__function::__func<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long), std::__1::allocator<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)>, arrow::Status (unsigned long, long long)>::operator(this=0x0000000101864580, __arg=0x0000700004c407a8, __arg=0x0000700004c407a0)(unsigned long&&, long long&&) at function.h:352:12
    frame #16: 0x00000001025c2192 libarrow_acero.1200.dylib`std::__1::__function::__value_func<arrow::Status (unsigned long, long long)>::operator(this=0x0000000101864580, __args=0x0000700004c407a8, __args=0x0000700004c407a0)(unsigned long&&, long long&&) const at function.h:505:16
    frame #17: 0x00000001025bd575 libarrow_acero.1200.dylib`std::__1::function<arrow::Status (unsigned long, long long)>::operator(this= Lambda in File hash_join_node.cc at Line 944, __arg=4, __arg=276)(unsigned long, long long) const at function.h:1182:12
    frame #18: 0x00000001025bd414 libarrow_acero.1200.dylib`arrow::acero::TaskSchedulerImpl::ExecuteTask(this=0x0000000100e1cc10, thread_id=4, group_id=5, task_id=276, task_group_finished=0x0000700004c40893) at task_util.cc:215:5
    frame #19: 0x00000001025c4be0 libarrow_acero.1200.dylib`arrow::acero::TaskSchedulerImpl::ScheduleMore(this=0x000060000216c168, thread_id=4)::$_0::operator()(unsigned long) const at task_util.cc:366:5
    frame #20: 0x00000001025c4ace libarrow_acero.1200.dylib`decltype(__f=0x000060000216c168, __args=0x0000700004c40a28)::$_0&>(fp)(static_cast<unsigned long>(fp0))) std::__1::__invoke<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0&, unsigned long>(arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0&, unsigned long&&) at type_traits:3918:1
    frame #21: 0x00000001025c4a75 libarrow_acero.1200.dylib`arrow::Status std::__1::__invoke_void_return_wrapper<arrow::Status, false>::__call<arrow::acero::TaskSchedulerImpl::ScheduleMore(__args=0x000060000216c168, __args=0x0000700004c40a28)::$_0&, unsigned long>(arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0&, unsigned long&&) at invoke.h:30:16
    frame #22: 0x00000001025c4a25 libarrow_acero.1200.dylib`std::__1::__function::__alloc_func<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0, std::__1::allocator<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0>, arrow::Status (unsigned long)>::operator(this=0x000060000216c168, __arg=0x0000700004c40a28)(unsigned long&&) at function.h:178:16
    frame #23: 0x00000001025c3704 libarrow_acero.1200.dylib`std::__1::__function::__func<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0, std::__1::allocator<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0>, arrow::Status (unsigned long)>::operator(this=0x000060000216c160, __arg=0x0000700004c40a28)(unsigned long&&) at function.h:352:12
    frame #24: 0x00000001000d6a7d arrow-acero-hash-join-node-test`std::__1::__function::__value_func<arrow::Status (unsigned long)>::operator(this=0x000060000216c160, __args=0x0000700004c40a28)(unsigned long&&) const at function.h:505:16
    frame #25: 0x00000001000d67e0 arrow-acero-hash-join-node-test`std::__1::function<arrow::Status (unsigned long)>::operator(this=0x000060000216c160, __arg=4)(unsigned long) const at function.h:1182:12
    frame #26: 0x0000000102520fe4 libarrow_acero.1200.dylib`arrow::acero::QueryContext::ScheduleTask(this=0x000060000216c150)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2::operator()() const at query_context.cc:73:12
    frame #27: 0x0000000102520f83 libarrow_acero.1200.dylib`decltype(__f=0x000060000216c150)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2&>(fp)()) std::__1::__invoke<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2&>(arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2&) at type_traits:3918:1
    frame #28: 0x0000000102520f30 libarrow_acero.1200.dylib`arrow::Status std::__1::__invoke_void_return_wrapper<arrow::Status, false>::__call<arrow::acero::QueryContext::ScheduleTask(__args=0x000060000216c150)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2&>(arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2&) at invoke.h:30:16
    frame #29: 0x0000000102520ef0 libarrow_acero.1200.dylib`std::__1::__function::__alloc_func<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2, std::__1::allocator<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2>, arrow::Status ()>::operator(this=0x000060000216c150)() at function.h:178:16
    frame #30: 0x000000010251fc37 libarrow_acero.1200.dylib`std::__1::__function::__func<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2, std::__1::allocator<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)::$_2>, arrow::Status ()>::operator(this=0x000060000216c140)() at function.h:352:12
    frame #31: 0x00000001022ecfb5 libarrow_acero.1200.dylib`std::__1::__function::__value_func<arrow::Status ()>::operator(this=0x000060000263ddd0)() const at function.h:505:16
    frame #32: 0x00000001022e8b23 libarrow_acero.1200.dylib`std::__1::function<arrow::Status ()>::operator(this=0x000060000263ddd0)() const at function.h:1182:12
    frame #33: 0x000000010251d9ed libarrow_acero.1200.dylib`std::__1::enable_if<((!(std::is_void<arrow::Status>::value)) && (!(is_future<arrow::Status>::value))) && ((!(arrow::Future<arrow::internal::Empty>::is_empty)) || (std::is_same<arrow::Status, arrow::Status>::value)), void>::type arrow::detail::ContinueFuture::operator(this=0x000060000263ddb0, next=Future<arrow::internal::Empty> @ 0x0000700004c40c38, f=0x000060000263ddd0)<std::__1::function<arrow::Status ()>&, arrow::Status, arrow::Future<arrow::internal::Empty> >(arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()>&) const at future.h:150:23
    frame #34: 0x000000010251d90e libarrow_acero.1200.dylib`decltype(__f=0x000060000263ddb0, __args=0x000060000263ddc0, __args=0x000060000263ddd0)(static_cast<arrow::Future<arrow::internal::Empty>&>(fp0), static_cast<std::__1::function<arrow::Status ()>&>(fp0))) std::__1::__invoke<arrow::detail::ContinueFuture&, arrow::Future<arrow::internal::Empty>&, std::__1::function<arrow::Status ()>&>(arrow::detail::ContinueFuture&, arrow::Future<arrow::internal::Empty>&, std::__1::function<arrow::Status ()>&) at type_traits:3918:1
    frame #35: 0x000000010251d8ca libarrow_acero.1200.dylib`std::__1::__bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()> >, std::__1::tuple<>, __is_valid_bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()> >, std::__1::tuple<> >::value>::type std::__1::__apply_functor<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status (__f=0x000060000263ddb0, __bound_args=size=2, (null)=__tuple_indices<0, 1> @ 0x0000700004c40c98, __args=size=0)> >, 0ul, 1ul, std::__1::tuple<> >(arrow::detail::ContinueFuture&, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()> >&, std::__1::__tuple_indices<0ul, 1ul>, std::__1::tuple<>&&) at bind.h:257:12
    frame #36: 0x000000010251d860 libarrow_acero.1200.dylib`std::__1::__bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()> >, std::__1::tuple<>, __is_valid_bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()> >, std::__1::tuple<> >::value>::type std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<arrow::internal::Empty>&, std::__1::function<arrow::Status (this=0x000060000263ddb0)> >::operator()<>() at bind.h:292:20
    frame #37: 0x000000010251d7f1 libarrow_acero.1200.dylib`arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<arrow::internal::Empty>&, std::__1::function<arrow::Status (this=0x000060000263dda0)> > >::invoke() at functional.h:152:42
    frame #38: 0x000000010ec7140a libarrow.1200.dylib`arrow::internal::FnOnce<void ()>::operator(this=0x0000700004c40e38)() && at functional.h:140:17
    frame #39: 0x000000010ec82c91 libarrow.1200.dylib`arrow::internal::WorkerLoop(state=std::__1::shared_ptr<arrow::internal::ThreadPool::State>::element_type @ 0x0000000100e1dc28 strong=25 weak=2, it=std::__1::list<std::__1::thread, std::__1::allocator<std::__1::thread> >::iterator @ 0x0000700004c40ed8) at thread_pool.cc:269:11
    frame #40: 0x000000010ec8293d libarrow.1200.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(this=0x0000600000c43548)::$_6::operator()() const at thread_pool.cc:430:7
    frame #41: 0x000000010ec828b5 libarrow.1200.dylib`decltype(__f=0x0000600000c43548)::$_6>(fp)()) std::__1::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) at type_traits:3918:1
    frame #42: 0x000000010ec82855 libarrow.1200.dylib`void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(__t=size=2, (null)=__tuple_indices<> @ 0x0000700004c40f68)::$_6>&, std::__1::__tuple_indices<>) at thread:287:5
    frame #43: 0x000000010ec82082 libarrow.1200.dylib`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6> >(__vp=0x0000600000c43540) at thread:298:5
    frame #44: 0x00007ff800bd04e1 libsystem_pthread.dylib`_pthread_start + 125
    frame #45: 0x00007ff800bcbf6b libsystem_pthread.dylib`thread_start + 15

@zanmato1984
Collaborator

I happen to be looking at this issue. I tried several ways to reproduce it and found that with all allocators (jemalloc and mimalloc) disabled, ASAN reports the issue almost every time, with exactly the same call stack (I guess the allocators might prevent ASAN from detecting this issue). I also confirmed that this issue still exists on the main branch.

Now I'm able to take a deeper look and hopefully will come up with the root cause and probably a fix soon.

cc @westonpace

@zanmato1984
Collaborator

zanmato1984 commented Dec 14, 2023

I actually found two bugs. Bug 1 is the direct cause of this issue. Bug 2 causes later failures once bug 1 is fixed.

Bug 1

In ExecBatchBuilder::AppendSelected, we copy data word by word:

    int num_rows_to_process =
        num_rows_to_append -
        NumRowsToSkip(source, num_rows_to_append, row_ids, sizeof(uint64_t));
    Visit(source, num_rows_to_process, row_ids,
          [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
            uint64_t* dst = reinterpret_cast<uint64_t*>(target->mutable_data(2) +
                                                        offsets[num_rows_before + i]);
            const uint64_t* src = reinterpret_cast<const uint64_t*>(ptr);
            for (uint32_t word_id = 0;
                 word_id < bit_util::CeilDiv(num_bytes, sizeof(uint64_t)); ++word_id) {
              util::SafeStore<uint64_t>(dst + word_id, util::SafeLoad(src + word_id));
            }
          });

so we need to calculate the number of tail rows to skip. The skipped rows (or bytes) are instead copied byte by byte:

    Visit(source, num_rows_to_append - num_rows_to_process, row_ids + num_rows_to_process,
          [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
            uint64_t* dst = reinterpret_cast<uint64_t*>(
                target->mutable_data(2) +
                offsets[num_rows_before + num_rows_to_process + i]);
            const uint64_t* src = reinterpret_cast<const uint64_t*>(ptr);
            memcpy(dst, src, num_bytes);
          });

But the calculation neglects the case where the same row occurs multiple times, which results in word-wise access to the tail row that should have been skipped. The access is by word and possibly unaligned, so if the tail row doesn't span full words, the unaligned word access will eventually exceed the buffer boundary (segmentation fault).

Note that for this bug to trigger, the tail row needs to be close to the end of the 64-byte aligned memory region (the minimum for an Arrow buffer). Otherwise the access is still valid from the OS's perspective, even if it exceeds the "logical" boundary.
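The failure mode described above can be modeled in a few lines. The following is a minimal sketch and not Arrow's actual code: the function names, the `kWordBytes` constant, and the bookkeeping are hypothetical simplifications of what the comment describes. It assumes a variable-length column whose data buffer ends exactly at the last offset.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model: offsets[i]..offsets[i + 1] delimit row i in a data buffer
// that ends exactly at offsets.back(). None of these names come from Arrow.
constexpr int kWordBytes = 8;

// Naive variant: every occurrence of a row id adds that row's length to the
// number of tail bytes believed to be protected by the skip.
int NumTailRowsToSkipNaive(const std::vector<int32_t>& offsets,
                           const std::vector<uint16_t>& row_ids) {
  int num_rows_left = static_cast<int>(row_ids.size());
  int num_bytes_skipped = 0;
  while (num_rows_left > 0 && num_bytes_skipped < kWordBytes) {
    int last = row_ids[num_rows_left - 1];
    num_bytes_skipped += offsets[last + 1] - offsets[last];
    --num_rows_left;
  }
  return static_cast<int>(row_ids.size()) - num_rows_left;
}

// Deduplicating variant: consecutive occurrences of the same id count once,
// and all of them are skipped together.
int NumTailRowsToSkipDedup(const std::vector<int32_t>& offsets,
                           const std::vector<uint16_t>& row_ids) {
  int num_rows_left = static_cast<int>(row_ids.size());
  int num_bytes_skipped = 0;
  while (num_rows_left > 0 && num_bytes_skipped < kWordBytes) {
    int last = row_ids[num_rows_left - 1];
    num_bytes_skipped += offsets[last + 1] - offsets[last];
    while (num_rows_left > 0 && row_ids[num_rows_left - 1] == last) {
      --num_rows_left;
    }
  }
  return static_cast<int>(row_ids.size()) - num_rows_left;
}

// Returns true if copying the non-skipped rows in whole 8-byte words would
// read past the end of the data buffer.
bool WordCopyOverruns(const std::vector<int32_t>& offsets,
                      const std::vector<uint16_t>& row_ids, int num_rows_to_skip) {
  const int32_t buffer_end = offsets.back();
  const int num_word_rows = static_cast<int>(row_ids.size()) - num_rows_to_skip;
  for (int i = 0; i < num_word_rows; ++i) {
    const int id = row_ids[i];
    const int32_t len = offsets[id + 1] - offsets[id];
    const int32_t num_words = (len + kWordBytes - 1) / kWordBytes;
    if (offsets[id] + num_words * kWordBytes > buffer_end) return true;
  }
  return false;
}
```

With offsets `{0, 8, 16, 21}` (a 5-byte tail row) and row ids `{0, 1, 2, 2, 2}`, the naive count stops after two occurrences of row 2, so the remaining occurrence is word-copied and reads 3 bytes past the buffer end. The deduplicating count skips all three occurrences, and no overrun occurs in this model.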

I've simplified this bug into a unit test case in https://github.com/apache/arrow/pull/39234/files#diff-f33777ab347090e9e661ef1d9260ed1488902627f69a703097d73c6d186bca69.

Another thing I want to mention is that I only fixed the non-fixed-length case, because I think all fixed-length cases will be fine (disclaimer: I could be wrong), based on the following facts:

  1. All fixed lengths <= 8 bytes use direct copy, so they don't need to skip any rows:
    uint32_t fixed_length = column_metadata.fixed_length;
    switch (fixed_length) {
      case 0:
        CollectBits(source->buffers[1]->data(), source->offset, target->mutable_data(1),
                    num_rows_before, num_rows_to_append, row_ids);
        break;
      case 1:
        Visit(source, num_rows_to_append, row_ids,
              [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
                target->mutable_data(1)[num_rows_before + i] = *ptr;
              });
        break;
      case 2:
        Visit(
            source, num_rows_to_append, row_ids,
            [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
              reinterpret_cast<uint16_t*>(target->mutable_data(1))[num_rows_before + i] =
                  *reinterpret_cast<const uint16_t*>(ptr);
            });
        break;
      case 4:
        Visit(
            source, num_rows_to_append, row_ids,
            [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
              reinterpret_cast<uint32_t*>(target->mutable_data(1))[num_rows_before + i] =
                  *reinterpret_cast<const uint32_t*>(ptr);
            });
        break;
      case 8:
        Visit(
            source, num_rows_to_append, row_ids,
            [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
              reinterpret_cast<uint64_t*>(target->mutable_data(1))[num_rows_before + i] =
                  *reinterpret_cast<const uint64_t*>(ptr);
            });
        break;
  2. The remaining fixed lengths (> 8 bytes) do skip rows:
    NumRowsToSkip(source, num_rows_to_append, row_ids, sizeof(uint64_t));
    But they'll be fine because they'd be multiples of 8 bytes.
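The arithmetic behind the second point can be checked with a small sketch. The helper name `BytesTouchedByWordCopy` is hypothetical and not part of Arrow. It only illustrates that a word-wise copy touches exactly the row's bytes when the row width is a multiple of 8, which is the premise the comment relies on. A width that is not a multiple of 8 would be rounded up.

```cpp
#include <cstdint>

// Hypothetical helper: number of bytes read when num_bytes of payload are
// copied in whole 8-byte words, i.e. CeilDiv(num_bytes, 8) * 8.
constexpr uint32_t BytesTouchedByWordCopy(uint32_t num_bytes) {
  constexpr uint32_t kWord = sizeof(uint64_t);
  return (num_bytes + kWord - 1) / kWord * kWord;
}

// A 16-byte row is read exactly; no bytes beyond the row are touched.
static_assert(BytesTouchedByWordCopy(16) == 16, "multiple of 8: no over-read");
// A 9-byte row would be rounded up to two words, 7 bytes past its end.
static_assert(BytesTouchedByWordCopy(9) == 16, "non-multiple of 8: over-read");
```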

Bug 2

After fixing bug 1, the HashJoin.Random case succeeds more often (with bug 1 it fails almost instantly). But I still observe rare failures. The cause turns out to be very straightforward, though REALLY HARD to uncover. We fail to add the column's offset when calculating how many rows to skip: https://github.com/apache/arrow/pull/39234/files#diff-b22f5dc5e08b62b5c4f3447570063a28aac2c4de3112f11d7509d37f5f47da5cR398.
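To illustrate why a neglected column offset matters, here is a minimal sketch under stated assumptions. `SlicedOffsets` and both functions are hypothetical stand-ins for a sliced column's offsets buffer and are not Arrow's types. The model assumes logical row `i` of a sliced column lives at physical index `offset + i`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the offsets buffer of a sliced variable-length
// column: logical row i corresponds to physical index offset + i.
struct SlicedOffsets {
  std::vector<int32_t> raw;  // physical offsets buffer
  int64_t offset;            // start of the logical slice
};

// Ignores the slice offset, so it measures the wrong physical row.
int32_t RowLengthIgnoringOffset(const SlicedOffsets& col, int row_id) {
  return col.raw[row_id + 1] - col.raw[row_id];
}

// Applies the slice offset before indexing.
int32_t RowLength(const SlicedOffsets& col, int row_id) {
  const int32_t* offsets = col.raw.data() + col.offset;
  return offsets[row_id + 1] - offsets[row_id];
}
```

For a physical offsets buffer `{0, 10, 12, 13}` sliced at offset 1, logical row 0 is 2 bytes long, but the offset-ignoring variant reports 10. In a skip calculation like the one described above, such an overestimate would make the tail look better protected than it is, so too few rows would be skipped.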

PR #39234 filed to fix them both.

@assignUser
Member

Wow, thanks for picking this up and documenting your results so thoroughly!

@amoeba amoeba added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Dec 20, 2023
pitrou pushed a commit that referenced this issue Dec 21, 2023
…nsecutive tail rows with the same id may exceed buffer boundary (#39234)

### Rationale for this change

Addressed in #32570 (comment)

### What changes are included in this PR?

1. Skip consecutive rows with the same id when calculating rows to skip when appending to `ExecBatchBuilder`.
2. Fix the bug that column offset is neglected when calculating rows to skip.

### Are these changes tested?

Yes. New UT included and the change is also protected by the existing case mentioned in the issue.

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".**

Because #32570 is labeled critical, and causes a crash even when the API contract is upheld.

* Closes: #32570

Authored-by: zanmato <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 15.0.0 milestone Dec 21, 2023
@zanmato1984
Collaborator

Another thing I want to mention is that I only fixed the non-fixed-length case, because I think all fixed-length cases will be fine (disclaimer: I could be wrong), based on the following facts:

  1. All fixed-length types of at most 8 bytes use a direct copy, so they don't need to skip any rows:
    uint32_t fixed_length = column_metadata.fixed_length;
    switch (fixed_length) {
      case 0:
        CollectBits(source->buffers[1]->data(), source->offset, target->mutable_data(1),
                    num_rows_before, num_rows_to_append, row_ids);
        break;
      case 1:
        Visit(source, num_rows_to_append, row_ids,
              [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
                target->mutable_data(1)[num_rows_before + i] = *ptr;
              });
        break;
      case 2:
        Visit(
            source, num_rows_to_append, row_ids,
            [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
              reinterpret_cast<uint16_t*>(target->mutable_data(1))[num_rows_before + i] =
                  *reinterpret_cast<const uint16_t*>(ptr);
            });
        break;
      case 4:
        Visit(
            source, num_rows_to_append, row_ids,
            [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
              reinterpret_cast<uint32_t*>(target->mutable_data(1))[num_rows_before + i] =
                  *reinterpret_cast<const uint32_t*>(ptr);
            });
        break;
      case 8:
        Visit(
            source, num_rows_to_append, row_ids,
            [&](int i, const uint8_t* ptr, uint32_t num_bytes) {
              reinterpret_cast<uint64_t*>(target->mutable_data(1))[num_rows_before + i] =
                  *reinterpret_cast<const uint64_t*>(ptr);
            });
        break;
  2. The remaining fixed-length types (> 8 bytes) do skip rows, but they'll be fine because their lengths would be multiples of 8 bytes:
    NumRowsToSkip(source, num_rows_to_append, row_ids, sizeof(uint64_t));

Update: this analysis of fixed-size types is wrong. See #39583 and the subsequent fix #39585 for details.

pitrou pushed a commit that referenced this issue Jan 17, 2024
…ecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585)

### Rationale for this change

#39583 is a follow-up issue to #32570 (fixed by #39234). The previous issue and its fix only resolved variable-length types. It turns out fixed-size types have the same issue.

### What changes are included in this PR?

Apply the same fix as #39234 to fixed-size types.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

* Closes: #39583

Authored-by: zanmato1984 <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
idailylife pushed a commit to idailylife/arrow that referenced this issue Jan 18, 2024
…g consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (apache#39585)

clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
…ing consecutive tail rows with the same id may exceed buffer boundary (apache#39234)

clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
…g consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (apache#39585)

@mpimenov

@zanmato1984 I'd like to report that while I'm seeing crashes with stacks very similar to the ones in the reference post above, the patches you provided do not fix them. Probably some cases remain that have not been handled.

@zanmato1984
Collaborator

Would you mind providing the stack trace you are seeing? That would be helpful. Thanks.

@mpimenov

mpimenov commented Feb 1, 2024

Sure, here it is. The line numbers may be slightly off because I had to manually apply your patch to a then-fresh version of the repository (a couple of weeks before the Arrow 15 release date, when your second PR was still in review).

Visit<(lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:332:9)> at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:133 [0x2e57685]
DecodeSelected at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:330 [0x2e57685]
FlushBuildColumn at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1672 [0x2e5f9f6]
Flush at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1716 [0x2e60178]
AppendAndOutput<(lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join_internal.h:612:9), (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1993:9)> at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join_internal.h:570 [0x2e60dbc]
Append<(lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1993:9)> at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join_internal.h:610 [0x2e60dbc]
OnNextBatch at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1993 [0x2e60dbc]
ProbeSingleBatch at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:2144 [0x2e653e7]
OnProbeSideBatch at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:818 [0x2e0766d]
InputReceived at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:891 [0x2e06491]
OutputBatchCallback at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:1004 [0x2e0a3af]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:947 [0x2e0a3af]
__invoke_impl<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:947:5) &, long, arrow::compute::ExecBatch> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:61 [0x2e0a1fb]
__invoke_r<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:947:5) &, long, arrow::compute::ExecBatch> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:114 [0x2e0a1fb]
_M_invoke at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:290 [0x2e0a1fb]
operator() at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:591 [0x2e60eb5]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1993 [0x2e60eb5]
AppendAndOutput<(lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join_internal.h:612:9), (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1993:9)> at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join_internal.h:571 [0x2e60eb5]
Append<(lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1993:9)> at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join_internal.h:610 [0x2e60eb5]
OnNextBatch at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:1993 [0x2e60eb5]
ProbeSingleBatch at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:2144 [0x2e653e7]
OnProbeSideBatch at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:818 [0x2e0766d]
InputReceived at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:891 [0x2e06491]
OutputBatchCallback at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:1004 [0x2e0a3af]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:947 [0x2e0a3af]
__invoke_impl<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:947:5) &, long, arrow::compute::ExecBatch> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:61 [0x2e0a1fb]
__invoke_r<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:947:5) &, long, arrow::compute::ExecBatch> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:114 [0x2e0a1fb]
_M_invoke at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:290 [0x2e0a1fb]
operator() at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:591 [0x2e618ef]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:2039 [0x2e618ef]
Flush<(lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:2039:5)> at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join_internal.h:626 [0x2e618ef]
OnFinished at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:2039 [0x2e618ef]
OnScanHashTableFinished at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:2425 [0x2e683ff]
StartScanHashTable at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:2323 [0x2e68ff0]
ProbingFinished at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/swiss_join.cc:2153 [0x2e655f9]
OnQueuedBatchesProbed at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:876 [0x2e0a61b]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:968 [0x2e0a61b]
__invoke_impl<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:967:9) &, unsigned long> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:61 [0x2e0a61b]
__invoke_r<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/hash_join_node.cc:967:9) &, unsigned long> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:114 [0x2e0a61b]
_M_invoke at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:290 [0x2e0a61b]
operator() at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:591 [0x2e695f5]
OnTaskGroupFinished at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/task_util.cc:252 [0x2e695f5]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/task_util.cc:371 [0x2e6a313]
__invoke_impl<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/task_util.cc:371:5) &, unsigned long> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:61 [0x2e6a313]
__invoke_r<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/task_util.cc:371:5) &, unsigned long> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:114 [0x2e6a313]
_M_invoke at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:290 [0x2e6a313]
operator() at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:591 [0x2e3279f]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/query_context.cc:82 [0x2e3279f]
__invoke_impl<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/query_context.cc:80:40) &> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:61 [0x2e3279f]
__invoke_r<arrow::Status, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/acero/query_context.cc:80:40) &> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:114 [0x2e3279f]
_M_invoke at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:290 [0x2e3279f]
operator() at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_function.h:591 [0x2e3355e]
operator()<std::function<arrow::Status ()> &, arrow::Status, arrow::Future<arrow::internal::Empty> > at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/util/future.h:150 [0x2e3355e]
__invoke_impl<void, arrow::detail::ContinueFuture &, arrow::Future<arrow::internal::Empty> &, std::function<arrow::Status ()> &> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:61 [0x2e3355e]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/util/functional.h:140 [0x3376417]
WorkerLoop at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/util/thread_pool.cc:457 [0x3376417]
operator() at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/util/thread_pool.cc:618 [0x3376417]
__invoke_impl<void, (lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/util/thread_pool.cc:616:23)> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:61 [0x3376417]
__invoke<(lambda at /tmp/source-root/.conan2/p/b/arrow19d6b0dc5db3a/b/src/cpp/src/arrow/util/thread_pool.cc:616:23)> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/invoke.h:96 [0x3376417]
_M_invoke<0UL> at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_thread.h:292 [0x3376417]
operator() at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_thread.h:299 [0x3376417]
_M_run at /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/std_thread.h:244 [0x3376417]

@zanmato1984
Collaborator


This stack seems quite different from the one in #32570 (comment). I think it is a different problem, one that my previous two PRs do not address.

Could you file a new issue with the stack trace you provided, so that we can continue the discussion there?

Thanks.

@mpimenov

mpimenov commented Feb 5, 2024

On closer inspection, I think you are right. Previously this code crashed with exactly the same stack as in the post above, and after your fixes the stack is different. I've created a new issue as you suggested: #39951

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…ing consecutive tail rows with the same id may exceed buffer boundary (apache#39234)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…g consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (apache#39585)

raulcd pushed a commit that referenced this issue Feb 20, 2024
…ecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585)

zanmato1984 added a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…g consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (apache#39585)

thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024
…g consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (apache#39585)
