An oversized Bitmap causes the BE to crash #5849

Closed
xqinghu opened this issue May 20, 2021 · 3 comments · Fixed by #5857 or #5893

Comments

xqinghu commented May 20, 2021

Describe the bug
After importing data in which a single ad_channel_id has a pv of about 160 million, querying report_test1 crashes all BEs, and compaction on report_test2 crashes the BE.

version: https://github.com/baidu-doris/incubator-doris/releases/tag/DORIS-0.13.15-release

report_test1:
SELECT ad_channel_id,bitmap_union_count(pv) FROM report_test1 WHERE ad_channel_id=111 GROUP BY ad_channel_id

Table schema:
CREATE TABLE report_test1 (
ad_channel_id int(11) NOT NULL COMMENT "",
aid bigint(20) NOT NULL COMMENT "",
n bigint(20) SUM NULL DEFAULT "0" COMMENT "",
pv bitmap BITMAP_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(ad_channel_id, aid)
COMMENT "OLAP"
DISTRIBUTED BY HASH(aid) BUCKETS 32;

CREATE TABLE report_test2 (
ad_channel_id int(11) NOT NULL COMMENT "",
n bigint(20) SUM NULL DEFAULT "0" COMMENT "",
pv bitmap BITMAP_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(ad_channel_id)
COMMENT "OLAP"
DISTRIBUTED BY HASH(ad_channel_id) BUCKETS 32;

Error log from the query crash:
PC: @ 0x7fa43730b866 __memcpy_ssse3_back
*** SIGSEGV (@0x0) received by PID 22647 (TID 0x7fa3ed56e700) from PID 0; stack trace: ***
@ 0x1bb6aa1 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fa437eb65d0 (unknown)
@ 0x7fa43730b866 __memcpy_ssse3_back
@ 0x2127b3f array_container_clone
@ 0x2122e78 ra_copy
@ 0xf2934f Roaring::Roaring()
@ 0xf298e9 doris::BitmapValue::write()
@ 0xf26a2c doris::BitmapFunctions::bitmap_serialize()
@ 0x16d530f doris::NewAggFnEvaluator::SerializeOrFinalize()
@ 0x164a495 doris::PartitionedAggregationNode::GetOutputTuple()
@ 0x1650600 doris::PartitionedAggregationNode::GetRowsFromPartition()
@ 0x1650ac6 doris::PartitionedAggregationNode::GetNextInternal()
@ 0x1650c3f doris::PartitionedAggregationNode::get_next()
@ 0x11cdcb9 doris::PlanFragmentExecutor::get_next_internal()
@ 0x11ce778 doris::PlanFragmentExecutor::open_internal()
@ 0x11ceeff doris::PlanFragmentExecutor::open()
@ 0x115791e doris::FragmentExecState::execute()
@ 0x115a556 doris::FragmentMgr::_exec_actual()
@ 0x115fa1d std::_Function_handler<>::_M_invoke()
@ 0x12a1f42 doris::ThreadPool::dispatch_thread()
@ 0x129bc65 doris::Thread::supervise_thread()
@ 0x7fa437eaedd5 start_thread
@ 0x7fa4372b402d __clone

Core stack from the query crash:
[New LWP 28096]
[New LWP 28261]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/doris/be/lib/palo_be'.
Program terminated with signal 6, Aborted.
#0 0x00007f232aa7f2c7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64 libgcc-4.8.5-39.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f232aa7f2c7 in raise () from /lib64/libc.so.6
#1 0x00007f232aa809b8 in abort () from /lib64/libc.so.6
#2 0x00007f232aa780e6 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f232aa78192 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000002129da5 in container_clone ()
#5 0x0000000002122e78 in ra_copy ()
#6 0x0000000000f2934f in Roaring::Roaring (this=0x7f22e0768eb8, r=...) at /data/tmp/incubator-doris-DORIS-0.13.15-release/thirdparty/installed/include/roaring/roaring.h:52
#7 0x0000000000f298e9 in pair<unsigned int const, Roaring> (__p=..., this=0x7f22e0768eb0) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:596
#8 for_each<std::_Rb_tree_const_iterator<std::pair<unsigned int const, Roaring> >, doris::detail::Roaring64Map::write(char*) const::<lambda(const std::pair<unsigned int, Roaring>&)> > (__f=..., __last=..., __first=...)
at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_algo.h:3882
#9 write (buf=0x1004ab2b "", this=0x10e9a508) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:603
#10 doris::BitmapValue::write (this=0x10e9a500, dst=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:1158
#11 0x0000000000f26a2c in serialize (value=0x10e9a500, ctx=0x12aa9390) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/bitmap_function.cpp:160
#12 doris::BitmapFunctions::bitmap_serialize (ctx=0x12aa9390, src=...) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/bitmap_function.cpp:380
#13 0x00000000016d530f in doris::NewAggFnEvaluator::SerializeOrFinalize (this=, src=src@entry=0x2ff42c000, dst_slot_desc=..., dst=dst@entry=0x2ff42c000, fn=)
at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/new_agg_fn_evaluator.cc:616
#14 0x000000000164a495 in Serialize (tuple=0x2ff42c000, this=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/agg_fn.h:118
#15 Serialize (dst=, evals=...) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/new_agg_fn_evaluator.h:310
#16 doris::PartitionedAggregationNode::GetOutputTuple (this=this@entry=0x11c35600, agg_fn_evals=..., tuple=0x2ff42c000, pool=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exec/partitioned_aggregation_node.cc:1048
#17 0x0000000001650600 in doris::PartitionedAggregationNode::GetRowsFromPartition(doris::RuntimeState*, doris::RowBatch*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/row_batch.h:236
#18 0x0000000001650ac6 in doris::PartitionedAggregationNode::GetNextInternal(doris::RuntimeState*, doris::RowBatch*, bool*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exec/partitioned_aggregation_node.cc:429
#19 0x0000000001650c3f in doris::PartitionedAggregationNode::get_next(doris::RuntimeState*, doris::RowBatch*, bool*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exec/partitioned_aggregation_node.cc:349
#20 0x00000000011cdcb9 in doris::PlanFragmentExecutor::get_next_internal(doris::RowBatch**) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp:476
#21 0x00000000011ce778 in doris::PlanFragmentExecutor::open_internal() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp:287
#22 0x00000000011ceeff in doris::PlanFragmentExecutor::open() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp:253
#23 0x000000000115791e in doris::FragmentExecState::execute() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/fragment_mgr.cpp:219
#24 0x000000000115a556 in doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/fragment_mgr.cpp:422
#25 0x000000000115fa1d in __invoke_impl<void, void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__t=@0x10e9b490: 0xcdffe00, __f=
@0x10e9b450: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0x115a530 <doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/atomicity.h:96
#26 __invoke<void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__fn=
@0x10e9b450: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0x115a530 <doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/invoke.h:95
#27 __call<void, 0, 1, 2> (__args=..., this=0x10e9b450) at /opt/rh/devtoolset-8/root/usr/include/c++/8/functional:565
#28 operator()<> (this=0x10e9b450) at /opt/rh/devtoolset-8/root/usr/include/c++/8/functional:651
#29 std::_Function_handler<void (), std::_Bind_result<void, void (doris::FragmentMgr::*(doris::FragmentMgr*, std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>))(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)> >::_M_invoke(std::_Any_data const&) (__functor=...) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:297
#30 0x00000000012a1f42 in operator() (this=0x11b59b98) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#31 run (this=0x11b59b90) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/threadpool.cpp:41
#32 doris::ThreadPool::dispatch_thread() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/threadpool.cpp:545
#33 0x000000000129bc65 in operator() (this=0x6bd7ed8) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#34 doris::Thread::supervise_thread(void*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/thread.cpp:386
#35 0x00007f232b741dd5 in start_thread () from /lib64/libpthread.so.0
#36 0x00007f232ab4702d in clone () from /lib64/libc.so.6
(gdb) f 20
#20 0x00000000011cdcb9 in doris::PlanFragmentExecutor::get_next_internal(doris::RowBatch**) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp:476
476 /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp: No such file or directory.
(gdb)

Error log from the compaction crash:
tcmalloc: large alloc 1413791744 bytes == 0x1539ac000 @ 0x26391f0 0x27a31a4 0x1088384 0x1737151 0x173b580 0x173bec0 0x173bf8f 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5
tcmalloc: large alloc 2147483648 bytes == 0x1a7df8000 @ 0x26391f0 0x27a3994 0x27a3d3c 0x1202951 0x11b9be6 0x1142856 0x109e8be 0x173c06c 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5
tcmalloc: large alloc 2153267200 bytes == 0x267556000 @ 0x26391f0 0x27a31a4 0x1088384 0x1737151 0x173b580 0x173bec0 0x173bf8f 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5
tcmalloc: large alloc 4294967296 bytes == 0x3182a2000 @ 0x26391f0 0x27a3994 0x27a3d3c 0x1202951 0x11b9be6 0x1142856 0x109e8be 0x173c06c 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5
I failed to find one of the right cookies. Found 16
terminate called after throwing an instance of 'std::runtime_error'
what(): failed alloc while reading
*** Aborted at 1621478894 (unix time) try "date -d @1621478894" if you are using GNU date ***
PC: @ 0x7f43eb9df2c7 __GI_raise
*** SIGABRT (@0x5756) received by PID 22358 (TID 0x7f4391d4e700) from PID 22358; stack trace: ***
@ 0x1bb6aa1 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f43ec6a95d0 (unknown)
@ 0x7f43eb9df2c7 __GI_raise
@ 0x7f43eb9e09b8 __GI_abort
@ 0xd1639e __gnu_cxx::__verbose_terminate_handler()
@ 0x2709126 __cxxabiv1::__terminate()
@ 0x2709161 std::terminate()
@ 0x2707f83 __cxa_throw
@ 0x1051f91 doris::AggregateFuncTraits<>::update()
@ 0x1035b17 doris::Reader::_agg_key_next_row()
@ 0x1026a33 doris::Merger::merge_rowsets()
@ 0x1018a7f doris::Compaction::do_compaction_impl()
@ 0x1019d9c doris::Compaction::do_compaction()
@ 0x101ac8b doris::CumulativeCompaction::execute_compact_impl()
@ 0x1016dd2 doris::Compaction::execute_compact()
@ 0xfaac9f doris::Tablet::execute_compaction()
@ 0xf8859c _ZNSt17_Function_handlerIFvvEZN5doris13StorageEngine35_compaction_tasks_producer_callbackEvEUlvE0_E9_M_invokeERKSt9_Any_data
@ 0x12a1f42 doris::ThreadPool::dispatch_thread()
@ 0x129bc65 doris::Thread::supervise_thread()
@ 0x7f43ec6a1dd5 start_thread
@ 0x7f43ebaa702d __clone


Core stack from the compaction crash:
[New LWP 12100]
[New LWP 12099]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/doris/be/lib/palo_be'.
Program terminated with signal 6, Aborted.
#0 0x00007f485474e2c7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64 libgcc-4.8.5-39.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f485474e2c7 in raise () from /lib64/libc.so.6
#1 0x00007f485474f9b8 in abort () from /lib64/libc.so.6
#2 0x0000000000d1639e in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] ()
#3 0x0000000002709126 in __cxxabiv1::__terminate(void (*)()) ()
#4 0x0000000002709161 in std::terminate() ()
#5 0x0000000002707f83 in __cxa_throw ()
#6 0x0000000001051f91 in read (portable=true, buf=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/thirdparty/installed/include/roaring/roaring.hh:455
#7 read (buf=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:636
#8 deserialize (src=, this=0x7f47fb34ae10) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:1182
#9 deserialize (src=, this=0x7f47fb34ae10) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:1165
#10 BitmapValue (src=, this=0x7f47fb34ae10) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:969
#11 doris::AggregateFuncTraits<(doris::FieldAggregationMethod)7, (doris::FieldType)25>::update (dst=, src=..., mem_pool=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/aggregate_func.h:528
#12 0x0000000001035b17 in update (this=, mem_pool=0x0, src=..., dst=0x7f47fb34aed8) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/aggregate_func.h:62
#13 agg_update (this=, mem_pool=0x0, src=..., dest=0x7f47fb34aed8) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/field.h:72
#14 agg_update_row<doris::RowCursor, doris::RowCursor> (src=..., dst=0x7f47fb34afb0, cids=...) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/row.h:185
#15 doris::Reader::_agg_key_next_row(doris::RowCursor*, doris::MemPool*, doris::ObjectPool*, bool*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/reader.cpp:224
#16 0x0000000001026a33 in next_row_with_aggregation (eof=0x7f47fb34af5e, agg_pool=0x7f47fb34af90, mem_pool=0x25ad49d80, row_cursor=0x7f47fb34afb0, this=0x7f47fb34b0a0) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/reader.h:98
#17 doris::Merger::merge_rowsets(std::shared_ptr<doris::Tablet>, doris::ReaderType, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > > const&, doris::RowsetWriter*, doris::Merger::Statistics*) ()
at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/merger.cpp:60
#18 0x0000000001018a7f in doris::Compaction::do_compaction_impl(long) () at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/atomicity.h:96
#19 0x0000000001019d9c in doris::Compaction::do_compaction (this=this@entry=0x8f285b550, permits=5) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/compaction.cpp:58
#20 0x000000000101194c in doris::BaseCompaction::execute_compact_impl() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/base_compaction.cpp:69
#21 0x0000000001016dd2 in doris::Compaction::execute_compact (this=0x8f285b550) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/compaction.cpp:47
#22 0x0000000000faa9a1 in doris::Tablet::execute_compaction(doris::CompactionType) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/tablet.cpp:1417
#23 0x0000000000f8859c in operator() (__closure=0xc5a9fc150) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/olap_server.cpp:364
#24 std::_Function_handler<void (), doris::StorageEngine::_compaction_tasks_producer_callback()::{lambda()#2}>::_M_invoke(std::_Any_data const&) () at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:297
#25 0x00000000012a1f42 in operator() (this=0x25ad49698) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#26 run (this=0x25ad49690) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/threadpool.cpp:41
#27 doris::ThreadPool::dispatch_thread() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/threadpool.cpp:545
#28 0x000000000129bc65 in operator() (this=0xedd09d8) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#29 doris::Thread::supervise_thread(void*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/thread.cpp:386
#30 0x00007f4855410dd5 in start_thread () from /lib64/libpthread.so.0
#31 0x00007f485481602d in clone () from /lib64/libc.so.6

xqinghu changed the title from "Bitmap过大导致Be挡掉" to "Bitmap过大导致Be宕掉" (a typo fix in the Chinese title) on May 20, 2021
Author

xqinghu commented May 24, 2021

I tested this commit and the problem is not fixed. Please reopen this issue. @morningman @stdpain

Contributor

stdpain commented May 24, 2021

@xqinghu try running this command:

addr2line -e lib/palo_be 0x26391f0 0x27a3994 0x27a3d3c 0x1202951 0x11b9be6 0x1142856 0x109e8be 0x173c06c 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5 
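If it helps with the other `tcmalloc: large alloc` lines, here is a small sketch that builds the same kind of command from a log line. The helper name `addr2line_command` is made up for illustration, and the default `lib/palo_be` path is just the one from the command above. It assumes the frame addresses are the hex tokens that follow the `@` in the log line, and it only builds the argument list without running anything.

```python
import shlex


def addr2line_command(log_line, binary="lib/palo_be"):
    """Build an addr2line invocation from the frame addresses after '@'.

    Hypothetical helper: everything before '@' (including the allocation
    address after '==') is ignored; only the stack frame addresses are kept.
    """
    _, _, frames = log_line.partition("@")
    addresses = [tok for tok in frames.split() if tok.startswith("0x")]
    return ["addr2line", "-e", binary, *addresses]


# Shortened sample in the same shape as the log lines in this issue.
sample = (
    "tcmalloc: large alloc 2147483648 bytes == 0x1a7df8000 "
    "@ 0x26391f0 0x27a3994 0x27a3d3c"
)
command = addr2line_command(sample)
print(shlex.join(command))
```

Adding addr2line's `-f` and `-C` options should also print demangled function names, provided the binary still has its symbols.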

Contributor

stdpain commented May 24, 2021

I looked at the CRoaring bitmap implementation. Its memory usage inflates, especially for values larger than int32, and that is probably the cause here: if your values are spread across the int32 to int64 range, you are likely to hit this problem.

Try to keep your values as close together as possible and you should be fine. Hashing spreads the values too sparsely.
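To make the point about sparse values concrete, here is a rough counting sketch. It is not CRoaring's actual memory accounting. It assumes the layout suggested by the `std::pair<unsigned int const, Roaring>` frames in the core stack: one inner 32-bit bitmap per distinct high 32 bits of a value, and inside each of those one container per distinct 16-bit prefix. The helper names `layout` and `hashed_id`, and the use of SHA-256 as a stand-in hash, are illustrative choices only.

```python
import hashlib


def layout(values):
    """Count inner 32-bit bitmaps and 16-bit containers under the assumed layout."""
    values = set(values)
    inner_bitmaps = {v >> 32 for v in values}  # one per distinct high 32 bits
    containers = {v >> 16 for v in values}     # one per distinct 48-bit prefix
    return len(inner_bitmaps), len(containers)


def hashed_id(i):
    """Stand-in for a 64-bit hash of an ID (illustrative, not what Doris uses)."""
    digest = hashlib.sha256(str(i).encode()).digest()
    return int.from_bytes(digest[:8], "big")


N = 100_000
dense_layout = layout(range(N))                           # IDs packed together
hashed_layout = layout(hashed_id(i) for i in range(N))    # IDs scattered by hashing

print("dense  (inner bitmaps, containers):", dense_layout)
print("hashed (inner bitmaps, containers):", hashed_layout)
```

With dense IDs, all 100,000 values fit in one inner bitmap and two containers. With hashed IDs, almost every value lands in its own inner bitmap, so the per-bitmap and per-container overhead is paid once per value.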
