[C++][Acero] ASAN reports heap buffer overflow in arrow::compute::KeyCompare::CompareBinaryColumnToRow #39577

Closed
zanmato1984 opened this issue Jan 12, 2024 · 2 comments · Fixed by #39606
Labels: Component: C++, Critical Fix, Type: bug

zanmato1984 commented Jan 12, 2024

Describe the bug, including details regarding any error messages, version, and platform.

Hardware

Apple M1 Pro

OS

macOS Sonoma 14.1.1 (23B81)

Version

3cc04f1

Reproduce

Change the HashJoin.Random test to run more iterations, e.g. 1000 instead of the current 25:

const int num_tests = 25;

Build with ASAN enabled and the custom allocators (jemalloc, mimalloc) disabled, so that the system allocator is used:

cmake --preset ninja-debug -DARROW_USE_ASAN=ON -DARROW_JEMALLOC=OFF -DARROW_MIMALLOC=OFF ..
ninja -j8

Run specific test:

./debug/arrow-acero-hash-join-node-test --gtest_filter=HashJoin.Random

Result:

Note: Google Test filter = HashJoin.Random
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HashJoin
[ RUN      ] HashJoin.Random
=================================================================
==70601==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x000115ecf4c0 at pc 0x000106da0b20 bp 0x00016b01a450 sp 0x00016b019c00
READ of size 8 at 0x000115ecf4c0 thread T0
    #0 0x106da0b1c in __asan_memcpy+0x394 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x50b1c)
    #1 0x108d4f23c in std::__1::enable_if<std::is_trivially_copyable_v<unsigned long long>, unsigned long long>::type arrow::util::SafeLoad<unsigned long long>(unsigned long long const*) ubsan.h:66
    #2 0x119608950 in void arrow::compute::KeyCompare::CompareBinaryColumnToRow<false>(unsigned int, unsigned int, unsigned short const*, unsigned int const*, arrow::compute::LightContext*, arrow::compute::KeyColumnArray const&, arrow::compute::RowTableImpl const&, unsigned char*)::'lambda4'(unsigned char const*, unsigned char const*, unsigned int, unsigned int)::operator()(unsigned char const*, unsigned char const*, unsigned int, unsigned int) const compare_internal.cc:227
    #3 0x119607dec in void arrow::compute::KeyCompare::CompareBinaryColumnToRowHelper<false, void arrow::compute::KeyCompare::CompareBinaryColumnToRow<false>(unsigned int, unsigned int, unsigned short const*, unsigned int const*, arrow::compute::LightContext*, arrow::compute::KeyColumnArray const&, arrow::compute::RowTableImpl const&, unsigned char*)::'lambda4'(unsigned char const*, unsigned char const*, unsigned int, unsigned int)>(unsigned int, unsigned int, unsigned int, unsigned short const*, unsigned int const*, arrow::compute::LightContext*, arrow::compute::KeyColumnArray const&, arrow::compute::RowTableImpl const&, unsigned char*, void arrow::compute::KeyCompare::CompareBinaryColumnToRow<false>(unsigned int, unsigned int, unsigned short const*, unsigned int const*, arrow::compute::LightContext*, arrow::compute::KeyColumnArray const&, arrow::compute::RowTableImpl const&, unsigned char*)::'lambda4'(unsigned char const*, unsigned char const*, unsigned int, unsigned int)) compare_internal.cc:109
    #4 0x119600ad0 in void arrow::compute::KeyCompare::CompareBinaryColumnToRow<false>(unsigned int, unsigned int, unsigned short const*, unsigned int const*, arrow::compute::LightContext*, arrow::compute::KeyColumnArray const&, arrow::compute::RowTableImpl const&, unsigned char*) compare_internal.cc:201
    #5 0x1195fe0dc in arrow::compute::KeyCompare::CompareColumnsToRows(unsigned int, unsigned short const*, unsigned int const*, arrow::compute::LightContext*, unsigned int*, unsigned short*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray>> const&, arrow::compute::RowTableImpl const&, bool, unsigned char*) compare_internal.cc:382
    #6 0x108cd2d74 in arrow::acero::RowArray::Compare(arrow::compute::ExecBatch const&, int, int, int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, long long, arrow::util::TempVectorStack*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray>>&, unsigned char*) swiss_join.cc:252
    #7 0x108ce7674 in arrow::acero::SwissTableWithKeys::EqualCallback(int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*) swiss_join.cc:923
    #8 0x108d67768 in arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11::operator()(int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*) const swiss_join.cc:969
    #9 0x108d676d4 in decltype(std::declval<arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11&>()(std::declval<int>(), std::declval<unsigned short const*>(), std::declval<unsigned int const*>(), std::declval<unsigned int*>(), std::declval<unsigned short*>(), std::declval<void*>())) std::__1::__invoke[abi:v160006]<arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11&, int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*>(arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11&, int&&, unsigned short const*&&, unsigned int const*&&, unsigned int*&&, unsigned short*&&, void*&&) invoke.h:394
    #10 0x108d674a8 in void std::__1::__invoke_void_return_wrapper<void, true>::__call<arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11&, int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*>(arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11&, int&&, unsigned short const*&&, unsigned int const*&&, unsigned int*&&, unsigned short*&&, void*&&) invoke.h:487
    #11 0x108d67454 in std::__1::__function::__alloc_func<arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11, std::__1::allocator<arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11>, void (int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*)>::operator()[abi:v160006](int&&, unsigned short const*&&, unsigned int const*&&, unsigned int*&&, unsigned short*&&, void*&&) function.h:185
    #12 0x108d632d8 in std::__1::__function::__func<arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11, std::__1::allocator<arrow::acero::SwissTableWithKeys::InitCallbacks()::$_11>, void (int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*)>::operator()(int&&, unsigned short const*&&, unsigned int const*&&, unsigned int*&&, unsigned short*&&, void*&&) function.h:356
    #13 0x118e964d8 in std::__1::__function::__value_func<void (int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*)>::operator()[abi:v160006](int&&, unsigned short const*&&, unsigned int const*&&, unsigned int*&&, unsigned short*&&, void*&&) const function.h:510
    #14 0x118e8dd98 in std::__1::function<void (int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*)>::operator()(int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*) const function.h:1156
    #15 0x118e8d530 in arrow::compute::SwissTable::run_comparisons(int, unsigned short const*, unsigned char const*, unsigned int const*, int*, unsigned short*, std::__1::function<void (int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*)> const&, void*) const key_map.cc:364
    #16 0x118e8e2f0 in arrow::compute::SwissTable::find(int, unsigned int const*, unsigned char*, unsigned char const*, unsigned int*, arrow::util::TempVectorStack*, std::__1::function<void (int, unsigned short const*, unsigned int const*, unsigned int*, unsigned short*, void*)> const&, void*) const key_map.cc:466
    #17 0x108cea49c in arrow::acero::SwissTableWithKeys::Map(arrow::acero::SwissTableWithKeys::Input*, bool, unsigned int const*, unsigned char*, unsigned int*) swiss_join.cc:1040
    #18 0x108ce9824 in arrow::acero::SwissTableWithKeys::MapReadOnly(arrow::acero::SwissTableWithKeys::Input*, unsigned int const*, unsigned char*, unsigned int*) swiss_join.cc:991
    #19 0x108cfe820 in arrow::acero::JoinProbeProcessor::OnNextBatch(long long, arrow::compute::ExecBatch const&, arrow::util::TempVectorStack*, std::__1::vector<arrow::compute::KeyColumnArray, std::__1::allocator<arrow::compute::KeyColumnArray>>*) swiss_join.cc:1911
    #20 0x108d05990 in arrow::acero::SwissJoin::ProbeSingleBatch(unsigned long, arrow::compute::ExecBatch) swiss_join.cc:2144
    #21 0x108b29070 in arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)::operator()(unsigned long, long long) const hash_join_node.cc:964
    #22 0x108b28e18 in decltype(std::declval<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)&>()(std::declval<unsigned long>(), std::declval<long long>())) std::__1::__invoke[abi:v160006]<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)&, unsigned long, long long>(arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)&, unsigned long&&, long long&&) invoke.h:394
    #23 0x108b28d24 in arrow::Status std::__1::__invoke_void_return_wrapper<arrow::Status, false>::__call<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)&, unsigned long, long long>(arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)&, unsigned long&&, long long&&) invoke.h:478
    #24 0x108b28ce8 in std::__1::__function::__alloc_func<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long), std::__1::allocator<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)>, arrow::Status (unsigned long, long long)>::operator()[abi:v160006](unsigned long&&, long long&&) function.h:185
    #25 0x108b24bb8 in std::__1::__function::__func<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long), std::__1::allocator<arrow::acero::HashJoinNode::Init()::'lambda'(unsigned long, long long)>, arrow::Status (unsigned long, long long)>::operator()(unsigned long&&, long long&&) function.h:356
    #26 0x108db0828 in std::__1::__function::__value_func<arrow::Status (unsigned long, long long)>::operator()[abi:v160006](unsigned long&&, long long&&) const function.h:510
    #27 0x108da1fd4 in std::__1::function<arrow::Status (unsigned long, long long)>::operator()(unsigned long, long long) const function.h:1156
    #28 0x108da1adc in arrow::acero::TaskSchedulerImpl::ExecuteTask(unsigned long, int, long long, bool*) task_util.cc:216
    #29 0x108db80e8 in arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0::operator()(unsigned long) const task_util.cc:371
    #30 0x108db7c80 in decltype(std::declval<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0&>()(std::declval<unsigned long>())) std::__1::__invoke[abi:v160006]<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0&, unsigned long>(arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0&, unsigned long&&) invoke.h:394
    #31 0x108db7bd8 in arrow::Status std::__1::__invoke_void_return_wrapper<arrow::Status, false>::__call<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0&, unsigned long>(arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0&, unsigned long&&) invoke.h:478
    #32 0x108db7ba4 in std::__1::__function::__alloc_func<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0, std::__1::allocator<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0>, arrow::Status (unsigned long)>::operator()[abi:v160006](unsigned long&&) function.h:185
    #33 0x108db3d48 in std::__1::__function::__func<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0, std::__1::allocator<arrow::acero::TaskSchedulerImpl::ScheduleMore(unsigned long, int)::$_0>, arrow::Status (unsigned long)>::operator()(unsigned long&&) function.h:356
    #34 0x10510623c in std::__1::__function::__value_func<arrow::Status (unsigned long)>::operator()[abi:v160006](unsigned long&&) const function.h:510
    #35 0x105105798 in std::__1::function<arrow::Status (unsigned long)>::operator()(unsigned long) const function.h:1156
    #36 0x108c005b4 in arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2::operator()() const query_context.cc:82
    #37 0x108c0052c in decltype(std::declval<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2&>()()) std::__1::__invoke[abi:v160006]<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2&>(arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2&) invoke.h:394
    #38 0x108c004dc in arrow::Status std::__1::__invoke_void_return_wrapper<arrow::Status, false>::__call<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2&>(arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2&) invoke.h:478
    #39 0x108c004b0 in std::__1::__function::__alloc_func<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2, std::__1::allocator<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2>, arrow::Status ()>::operator()[abi:v160006]() function.h:185
    #40 0x108bfc1b4 in std::__1::__function::__func<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2, std::__1::allocator<arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status (unsigned long)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>)::$_2>, arrow::Status ()>::operator()() function.h:356
    #41 0x10865bf88 in std::__1::__function::__value_func<arrow::Status ()>::operator()[abi:v160006]() const function.h:510
    #42 0x10864ce20 in std::__1::function<arrow::Status ()>::operator()() const function.h:1156
    #43 0x108bf8908 in std::__1::enable_if<!std::is_void<arrow::Status>::value && !is_future<arrow::Status>::value && (!arrow::Future<arrow::internal::Empty>::is_empty || std::is_same<arrow::Status, arrow::Status>::value), void>::type arrow::detail::ContinueFuture::operator()<std::__1::function<arrow::Status ()>&, arrow::Status, arrow::Future<arrow::internal::Empty>>(arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()>&) const future.h:150
    #44 0x108bf8630 in decltype(std::declval<arrow::detail::ContinueFuture&>()(std::declval<arrow::Future<arrow::internal::Empty>&>(), std::declval<std::__1::function<arrow::Status ()>&>())) std::__1::__invoke[abi:v160006]<arrow::detail::ContinueFuture&, arrow::Future<arrow::internal::Empty>&, std::__1::function<arrow::Status ()>&>(arrow::detail::ContinueFuture&, arrow::Future<arrow::internal::Empty>&, std::__1::function<arrow::Status ()>&) invoke.h:394
    #45 0x108bf84d0 in std::__1::__bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()>>, std::__1::tuple<>, __is_valid_bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()>>, std::__1::tuple<>>::value>::type std::__1::__apply_functor[abi:v160006]<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()>>, 0ul, 1ul, std::__1::tuple<>>(arrow::detail::ContinueFuture&, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()>>&, std::__1::__tuple_indices<0ul, 1ul>, std::__1::tuple<>&&) bind.h:263
    #46 0x108bf83d4 in std::__1::__bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()>>, std::__1::tuple<>, __is_valid_bind_return<arrow::detail::ContinueFuture, std::__1::tuple<arrow::Future<arrow::internal::Empty>, std::__1::function<arrow::Status ()>>, std::__1::tuple<>>::value>::type std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<arrow::internal::Empty>&, std::__1::function<arrow::Status ()>>::operator()[abi:v160006]<>() bind.h:295
    #47 0x108bf8210 in arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<arrow::internal::Empty>&, std::__1::function<arrow::Status ()>>>::invoke() functional.h:152
    #48 0x1183acf00 in arrow::internal::FnOnce<void ()>::operator()() && functional.h:140
    #49 0x1183ac674 in arrow::internal::SerialExecutor::RunLoop() thread_pool.cc:252
    #50 0x108992068 in arrow::Future<arrow::acero::BatchesWithCommonSchema> arrow::internal::SerialExecutor::Run<arrow::acero::BatchesWithCommonSchema, arrow::Result<arrow::acero::BatchesWithCommonSchema>>(arrow::internal::FnOnce<arrow::Future<arrow::acero::BatchesWithCommonSchema> (arrow::internal::Executor*)>) thread_pool.h:420
    #51 0x1089911d0 in arrow::Result<arrow::acero::BatchesWithCommonSchema> arrow::internal::SerialExecutor::RunInSerialExecutor<arrow::acero::BatchesWithCommonSchema, arrow::Future<arrow::acero::BatchesWithCommonSchema>, arrow::Result<arrow::acero::BatchesWithCommonSchema>>(arrow::internal::FnOnce<arrow::Future<arrow::acero::BatchesWithCommonSchema> (arrow::internal::Executor*)>) thread_pool.h:300
    #52 0x1088df00c in arrow::Future<arrow::acero::BatchesWithCommonSchema>::SyncType arrow::internal::RunSynchronously<arrow::Future<arrow::acero::BatchesWithCommonSchema>, arrow::acero::BatchesWithCommonSchema>(arrow::internal::FnOnce<arrow::Future<arrow::acero::BatchesWithCommonSchema> (arrow::internal::Executor*)>, bool) thread_pool.h:590
    #53 0x1088dec6c in arrow::acero::DeclarationToExecBatches(arrow::acero::Declaration, bool, arrow::MemoryPool*, arrow::compute::FunctionRegistry*) exec_plan.cc:878
    #54 0x104f100c8 in arrow::acero::HashJoinWithExecPlan(arrow::acero::Random64Bit&, bool, arrow::acero::HashJoinNodeOptions const&, std::__1::shared_ptr<arrow::Schema> const&, std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array>>> const&, std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array>>> const&, int, int) hash_join_node_test.cc:920
    #55 0x104f1a1d4 in arrow::acero::HashJoin_Random_Test::TestBody() hash_join_node_test.cc:1154
    #56 0x105f6b718 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2607
    #57 0x105f29a74 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2643
    #58 0x105f299c4 in testing::Test::Run() gtest.cc:2682
    #59 0x105f2aa68 in testing::TestInfo::Run() gtest.cc:2861
    #60 0x105f2bb58 in testing::TestSuite::Run() gtest.cc:3015
    #61 0x105f398d4 in testing::internal::UnitTestImpl::RunAllTests() gtest.cc:5855
    #62 0x105f72ab0 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) gtest.cc:2607
    #63 0x105f392a8 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) gtest.cc:2643
    #64 0x105f39194 in testing::UnitTest::Run() gtest.cc:5438
    #65 0x105a9bee4 in RUN_ALL_TESTS() gtest.h:2490
    #66 0x105a9bec8 in main gtest_main.cc:52
    #67 0x18dc050dc  (<unknown module>)

0x000115ecf4c0 is located 0 bytes after 3136-byte region [0x000115ece880,0x000115ecf4c0)
allocated by thread T0 here:
    #0 0x106da3308 in wrap_posix_memalign+0xa4 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x53308)
    #1 0x117a3fb44 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long long, long long, unsigned char**) memory_pool.cc:323
    #2 0x117a451d0 in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long long, long long, unsigned char**) memory_pool.cc:465
    #3 0x117a4e51c in arrow::PoolBuffer::Reserve(long long) memory_pool.cc:867
    #4 0x117a4da58 in arrow::PoolBuffer::Resize(long long, bool) memory_pool.cc:891
    #5 0x117a334c8 in arrow::Result<std::__1::unique_ptr<arrow::ResizableBuffer, std::__1::default_delete<arrow::ResizableBuffer>>> arrow::(anonymous namespace)::ResizePoolBuffer<std::__1::unique_ptr<arrow::ResizableBuffer, std::__1::default_delete<arrow::ResizableBuffer>>, std::__1::unique_ptr<arrow::PoolBuffer, std::__1::default_delete<arrow::PoolBuffer>>>(std::__1::unique_ptr<arrow::PoolBuffer, std::__1::default_delete<arrow::PoolBuffer>>&&, long long) memory_pool.cc:931
    #6 0x117a331f4 in arrow::AllocateResizableBuffer(long long, long long, arrow::MemoryPool*) memory_pool.cc:957
    #7 0x104e2a5a4 in arrow::BufferBuilder::Resize(long long, bool) buffer_builder.h:78
    #8 0x10657a9c0 in arrow::BufferBuilder::Reserve(long long) buffer_builder.h:98
    #9 0x10657a0a8 in arrow::TypedBufferBuilder<unsigned char, void>::Reserve(long long) buffer_builder.h:291
    #10 0x1194dfec8 in arrow::Status arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl::GenerateOutput<arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>::FilterAdapter>() vector_selection_internal.cc:581
    #11 0x1194df048 in arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>::ExecFilter() vector_selection_internal.cc:469
    #12 0x1194d8210 in arrow::Status arrow::compute::internal::(anonymous namespace)::FilterExec<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl>(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*) vector_selection_internal.cc:897
    #13 0x1194d7fd8 in arrow::compute::internal::FSBFilterExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*) vector_selection_internal.cc:903
    #14 0x118d4f334 in arrow::compute::detail::(anonymous namespace)::VectorExecutor::Exec(arrow::compute::ExecSpan const&, arrow::compute::detail::ExecListener*) exec.cc:1109
    #15 0x118d4d308 in arrow::compute::detail::(anonymous namespace)::VectorExecutor::Execute(arrow::compute::ExecBatch const&, arrow::compute::detail::ExecListener*) exec.cc:1049
    #16 0x118e28514 in arrow::compute::detail::FunctionExecutorImpl::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum>> const&, long long) function.cc:277
    #17 0x118e02aac in arrow::compute::(anonymous namespace)::ExecuteInternal(arrow::compute::Function const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum>>, long long, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) function.cc:342
    #18 0x118e02168 in arrow::compute::Function::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum>> const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const function.cc:349
    #19 0x118d18b24 in arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum>> const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) exec.cc:1369
    #20 0x1194c8ae0 in arrow::compute::internal::(anonymous namespace)::FilterMetaFunction::ExecuteImpl(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum>> const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const vector_selection_filter_internal.cc:1026
    #21 0x118e06cf8 in arrow::compute::MetaFunction::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum>> const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const function.cc:482
    #22 0x118d18b24 in arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum>> const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) exec.cc:1369
    #23 0x118c641a0 in arrow::compute::Filter(arrow::Datum const&, arrow::Datum const&, arrow::compute::FilterOptions const&, arrow::compute::ExecContext*) api_vector.cc:365
    #24 0x108ae3880 in arrow::acero::BloomFilterPushdownContext::FilterSingleBatch(unsigned long, arrow::compute::ExecBatch*) hash_join_node.cc:597
    #25 0x108b87974 in arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3::operator()(unsigned long, long long) const hash_join_node.cc:1073
    #26 0x108b878d4 in decltype(std::declval<arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3&>()(std::declval<unsigned long>(), std::declval<long long>())) std::__1::__invoke[abi:v160006]<arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3&, unsigned long, long long>(arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3&, unsigned long&&, long long&&) invoke.h:394
    #27 0x108b877e0 in arrow::Status std::__1::__invoke_void_return_wrapper<arrow::Status, false>::__call<arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3&, unsigned long, long long>(arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3&, unsigned long&&, long long&&) invoke.h:478
    #28 0x108b877a4 in std::__1::__function::__alloc_func<arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3, std::__1::allocator<arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3>, arrow::Status (unsigned long, long long)>::operator()[abi:v160006](unsigned long&&, long long&&) function.h:185
    #29 0x108b83674 in std::__1::__function::__func<arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3, std::__1::allocator<arrow::acero::BloomFilterPushdownContext::Init(arrow::acero::HashJoinNode*, unsigned long, std::__1::function<int (std::__1::function<arrow::Status (unsigned long, long long)>, std::__1::function<arrow::Status (unsigned long)>)>, std::__1::function<arrow::Status (int, long long)>, std::__1::function<arrow::Status (unsigned long)>, bool, bool)::$_3>, arrow::Status (unsigned long, long long)>::operator()(unsigned long&&, long long&&) function.h:356

SUMMARY: AddressSanitizer: heap-buffer-overflow (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x50b1c) in __asan_memcpy+0x394
Shadow bytes around the buggy address:
  0x000115ecf200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000115ecf280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000115ecf300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000115ecf380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000115ecf400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x000115ecf480: 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa fa
  0x000115ecf500: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000115ecf580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000115ecf600: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000115ecf680: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000115ecf700: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==70601==ABORTING
[1]    70601 abort      ./debug/arrow-acero-hash-join-node-test --gtest_filter=HashJoin.Random

Component(s)

C++

@zanmato1984 (Collaborator, Author):

The cause seems to be that the default buffer alignment (64 bytes) doesn't guarantee that the tail bytes are safe to read when doing word-by-word operations on long fixed-size types.

After some debugging, I found that in this particular case an encoded row took 19 bytes and there were 165 rows. Together they took 19 * 165 = 3135 bytes, so the actual buffer size, aligned to 64 bytes, is 3136 bytes. The last row starts at byte 3116 and is accessed as 3 words (24 bytes), so the read ends at byte 3140 and overruns the 3136-byte buffer by 4 bytes.
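
The arithmetic in this comment can be checked with a small standalone sketch. The names `RoundUp`, `TailOverrun`, and `ComputeTailOverrun` are hypothetical helpers written for illustration and are not part of Arrow; the sketch assumes rows are packed back to back and that each row is read as whole 8-byte words.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round n up to the next multiple of alignment.
inline int64_t RoundUp(int64_t n, int64_t alignment) {
  assert(alignment > 0);
  return (n + alignment - 1) / alignment * alignment;
}

// Hypothetical summary of how far a word-wise read of the last row
// runs past the end of the allocated buffer.
struct TailOverrun {
  int64_t payload_bytes;    // bytes actually occupied by the rows
  int64_t allocated_bytes;  // payload rounded up to the alignment
  int64_t last_row_offset;  // byte offset where the last row starts
  int64_t last_read_end;    // one past the last byte touched by the read
  int64_t overrun_bytes;    // bytes read beyond the allocation (0 if none)
};

inline TailOverrun ComputeTailOverrun(int64_t row_width, int64_t num_rows,
                                      int64_t alignment, int64_t word_size) {
  assert(row_width > 0 && num_rows > 0 && word_size > 0);
  TailOverrun r{};
  r.payload_bytes = row_width * num_rows;
  r.allocated_bytes = RoundUp(r.payload_bytes, alignment);
  r.last_row_offset = row_width * (num_rows - 1);
  // Assumption: every row is read as ceil(row_width / word_size) full words.
  const int64_t words_per_row = (row_width + word_size - 1) / word_size;
  r.last_read_end = r.last_row_offset + words_per_row * word_size;
  r.overrun_bytes = r.last_read_end > r.allocated_bytes
                        ? r.last_read_end - r.allocated_bytes
                        : 0;
  return r;
}
```

With the values from this report, `ComputeTailOverrun(19, 165, 64, 8)` gives a payload of 3135 bytes, an allocation of 3136 bytes, a last-row offset of 3116, and a read that ends at byte 3140, i.e. 4 bytes past the buffer.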

I’m working on a fix.

@zanmato1984
Collaborator Author

Take

pitrou pushed a commit that referenced this issue Jan 16, 2024
…eBinaryColumnToRow` (#39606)

### Rationale for this change

Default buffer alignment (64 bytes) doesn't guarantee the safety of tail-word access in `KeyCompare::CompareBinaryColumnToRow`. Comment #39577 (comment) is a concrete example.

### What changes are included in this PR?

Make `KeyCompare::CompareBinaryColumnToRow` tail-word safe.

### Are these changes tested?

Unit test included.

### Are there any user-facing changes?

No.

* Closes: #39577

Authored-by: zanmato1984 <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
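
For readers unfamiliar with the term, "tail-word safe" means that reading the last partial word of a value never touches bytes beyond that value. The sketch below is one generic way to do this and is not taken from #39606; `LoadUpTo8Bytes` and `FixedWidthEqual` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy `num_bytes` (0..8) bytes starting at `src`
// into a zero-initialized 64-bit word. No byte at or beyond
// `src + num_bytes` is read; bytes of the word not copied stay zero.
inline uint64_t LoadUpTo8Bytes(const uint8_t* src, int num_bytes) {
  assert(num_bytes >= 0 && num_bytes <= 8);
  uint64_t word = 0;
  std::memcpy(&word, src, static_cast<std::size_t>(num_bytes));
  return word;
}

// Compare two fixed-width values of `width` bytes word by word, using
// the bounded load for the final partial word instead of a full 8-byte read.
inline bool FixedWidthEqual(const uint8_t* left, const uint8_t* right,
                            int width) {
  int offset = 0;
  for (; offset + 8 <= width; offset += 8) {
    uint64_t l, r;
    std::memcpy(&l, left + offset, 8);
    std::memcpy(&r, right + offset, 8);
    if (l != r) return false;
  }
  const int tail = width - offset;  // 0..7 remaining bytes
  return LoadUpTo8Bytes(left + offset, tail) ==
         LoadUpTo8Bytes(right + offset, tail);
}
```

For a 19-byte value this performs two full-word loads and one 3-byte load, so the comparison stays within the 19 bytes regardless of how much padding follows the buffer.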
@pitrou pitrou added this to the 16.0.0 milestone Jan 16, 2024
@raulcd raulcd modified the milestones: 16.0.0, 15.0.1 Jan 18, 2024
idailylife pushed a commit to idailylife/arrow that referenced this issue Jan 18, 2024
…CompareBinaryColumnToRow` (apache#39606)

clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
…CompareBinaryColumnToRow` (apache#39606)

@zanmato1984 zanmato1984 changed the title from "[C++][Acero] ASAN reports heap buffer overflow in hash join test" to "[C++][Acero] ASAN reports heap buffer overflow in arrow::compute::KeyCompare::CompareBinaryColumnToRow" Jan 24, 2024
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…CompareBinaryColumnToRow` (apache#39606)

raulcd pushed a commit that referenced this issue Feb 20, 2024
…eBinaryColumnToRow` (#39606)

@amoeba amoeba added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Feb 28, 2024
zanmato1984 added a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…CompareBinaryColumnToRow` (apache#39606)

thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024
…CompareBinaryColumnToRow` (apache#39606)
