
[Bug] Doris BE exits automatically #28881

@haoyan19881215

Description

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.0.2

What's Wrong?

There are 8 BEs in my Doris cluster, but 2 of them stop automatically after running for 1 to 2 hours. I cannot find any exception in the logs.
be.out
```
start time: Fri Dec 22 09:47:07 CST 2023
INFO: java_cmd /opt/jdk1.8.0_191/bin/java
INFO: jdk_version 8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/java_extensions/preload-extensions/preload-extensions-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/java_extensions/java-udf/java-udf-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/hadoop_hdfs/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/hadoop_hdfs/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
```
be.INFO
```
I1222 11:57:49.784793 24196 olap_server.cpp:1064] cooldown producer get tablet num: 0
I1222 11:57:54.344010 24947 heartbeat_server.cpp:61] get heartbeat from FE.host:10.237.22.118, port:9020, cluster id:1914286766, counter:1561, BE start time: 1703209629783
I1222 11:57:57.641873 24283 task_worker_pool.cpp:1068] successfully report TASK|host=10.237.22.118|port=9020
I1222 11:58:09.578222 24038 load_channel_mgr.cpp:250] cleaning timed out load channels
I1222 11:58:09.578332 24038 load_channel_mgr.cpp:282] load mem consumption(bytes). limit: 13420866764, current: 0, peak: 0, total running load channels: 0
I1222 11:58:09.785163 24196 olap_server.cpp:1064] cooldown producer get tablet num: 0
I1222 11:58:10.643011 24283 task_worker_pool.cpp:1068] successfully report TASK|host=10.237.22.118|port=9020
I1222 11:58:23.643994 24283 task_worker_pool.cpp:1068] successfully report TASK|host=10.237.22.118|port=9020
I1222 11:58:29.785368 24196 olap_server.cpp:1064] cooldown producer get tablet num: 0
I1222 11:58:31.761829 24285 tablet_manager.cpp:1016] find expired transactions for 0 tablets
I1222 11:58:31.761873 24285 tablet_manager.cpp:1048] success to build all report tablets info. tablet_count=0
I1222 11:58:31.762550 24285 task_worker_pool.cpp:1068] successfully report TABLET|host=10.237.22.118|port=9020
I1222 11:58:36.864832 23487 daemon.cpp:397] doris start to exit
I1222 11:58:38.206190 24284 data_dir.cpp:810] path: /opt/apache-doris-2.0.2-bin-x64-noavx2/be/storage total capacity: 1022174953472, available capacity: 1013264003072
I1222 11:58:38.206262 24284 storage_engine.cpp:383] get root path info cost: 0 ms. tablet counter: 0
I1222 11:58:38.207151 24284 task_worker_pool.cpp:1068] successfully report DISK|host=10.237.22.118|port=9020
I1222 11:58:38.644892 24283 task_worker_pool.cpp:1068] successfully report TASK|host=10.237.22.118|port=9020
I1222 11:58:39.867292 23487 server.cpp:1167] Server[doris::PInternalServiceImpl] is going to quit
I1222 11:58:39.892066 24882 thrift_server.cpp:170] ThriftServer heartbeat exited
I1222 11:58:39.892594 24289 thrift_server.cpp:170] ThriftServer backend exited
I1222 11:58:39.892784 23487 storage_engine.cpp:546] begin stopping storage engine
I1222 11:58:39.892843 24190 olap_server.cpp:364] try to perform path gc by rowsetid!
I1222 11:58:39.893136 23487 storage_engine.cpp:566] start join garbage sweeper thread
I1222 11:58:39.893158 23487 storage_engine.cpp:568] end join garbage sweeper thread
I1222 11:58:39.893258 23487 storage_engine.cpp:588] end stopping storage engine
I1222 11:58:40.893028 24284 data_dir.cpp:810] path: /opt/apache-doris-2.0.2-bin-x64-noavx2/be/storage total capacity: 1022174953472, available capacity: 1013264003072
I1222 11:58:40.893131 24284 storage_engine.cpp:383] get root path info cost: 0 ms. tablet counter: 0
I1222 11:58:40.893143 24285 tablet_manager.cpp:1016] find expired transactions for 0 tablets
I1222 11:58:40.893206 24285 tablet_manager.cpp:1048] success to build all report tablets info. tablet_count=0
I1222 11:58:40.893572 24283 task_worker_pool.cpp:1068] successfully report TASK|host=10.237.22.118|port=9020
I1222 11:58:40.893594 24284 task_worker_pool.cpp:1068] successfully report DISK|host=10.237.22.118|port=9020
I1222 11:58:40.893640 24285 task_worker_pool.cpp:1068] successfully report TABLET|host=10.237.22.118|port=9020
I1222 11:58:40.898120 23487 task_scheduler.cpp:63] Start shutdown BlockedTaskScheduler
I1222 11:58:40.898229 23800 task_scheduler.cpp:193] BlockedTaskScheduler schedule thread stop
I1222 11:58:40.898555 23487 task_scheduler.cpp:63] Start shutdown BlockedTaskScheduler
I1222 11:58:40.898597 23809 task_scheduler.cpp:193] BlockedTaskScheduler schedule thread stop
I1222 11:58:40.899576 23810 fragment_mgr.cpp:1080] FragmentMgr cancel worker is going to exit.
I1222 11:58:40.901443 23888 result_buffer_mgr.cpp:172] result buffer manager cancel thread finish.
I1222 11:58:40.904731 23487 routine_load_task_executor.cpp:83] 0 not executed tasks left, cleanup
I1222 11:58:40.921046 23487 olap_meta.cpp:68] [Rocksdb] [db/db_impl.cc:252] Shutdown: canceling all background work
I1222 11:58:40.921134 23487 olap_meta.cpp:68] [Rocksdb] [db/db_impl.cc:252] Shutdown: canceling all background work
I1222 11:58:40.921468 23487 olap_meta.cpp:68] [Rocksdb] [db/db_impl.cc:398] Shutdown complete
I1222 11:58:40.921481 23487 olap_meta.cpp:106] finish close rocksdb for OlapMeta
I1222 11:58:40.929057 23487 stream_load_recorder.cpp:56] finish close rocksdb for ~StreamLoadRecorder
```
be.WARNING
```
W1222 11:47:09.999689 23889 status.h:383] meet error status: [IO_ERROR]failed to list /opt/apache-doris-2.0.2-bin-x64-noavx2/be/storage/mini_download: (2), No such file or directory
0. /root/src/doris-2.0/be/src/common/stack_trace.cpp:302: StackTrace::tryCapture() @ 0x000000000b9e64c7 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
1. /root/src/doris-2.0/be/src/common/stack_trace.h:0: doris::get_stack_trace[abi:cxx11]() @ 0x000000000b9e4ae5 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
2. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173: doris::Status doris::Status::Error<true, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(int, std::basic_string_view<char, std::char_traits<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) @ 0x000000000aecc168 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
3. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187: doris::io::LocalFileSystem::list_impl(std::filesystem::__cxx11::path const&, bool, std::vector<doris::io::FileInfo, std::allocator<doris::io::FileInfo> >, bool) @ 0x000000000aec6eac in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
4. /root/src/doris-2.0/be/src/common/status.h:348: doris::io::FileSystem::list(std::filesystem::__cxx11::path const&, bool, std::vector<doris::io::FileInfo, std::allocator<doris::io::FileInfo> >, bool) @ 0x000000000aec0f6c in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
5. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360: doris::LoadPathMgr::clean_one_path(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) @ 0x000000000b83cd40 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
6. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_iterator.h:1034: std::_Function_handler<void (), doris::LoadPathMgr::init()::$_0>::_M_invoke(std::_Any_data const&) @ 0x000000000b83e218 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
7. /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562: doris::Thread::supervise_thread(void*) @ 0x000000000ba1819a in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
8. start_thread @ 0x0000000000007dd5 in /usr/lib64/libpthread-2.17.so
9. clone @ 0x00000000000fdead in /usr/lib64/libc-2.17.so
W1222 11:47:09.999835 23889 file_system.cpp:72] [IO_ERROR]failed to list /opt/apache-doris-2.0.2-bin-x64-noavx2/be/storage/mini_download: (2), No such file or directory
0. /root/src/doris-2.0/be/src/common/stack_trace.cpp:302: StackTrace::tryCapture() @ 0x000000000b9e64c7 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
1. /root/src/doris-2.0/be/src/common/stack_trace.h:0: doris::get_stack_trace[abi:cxx11]() @ 0x000000000b9e4ae5 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
2. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173: doris::Status doris::Status::Error<true, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(int, std::basic_string_view<char, std::char_traits<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) @ 0x000000000aecc168 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
3. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187: doris::io::LocalFileSystem::list_impl(std::filesystem::__cxx11::path const&, bool, std::vector<doris::io::FileInfo, std::allocator<doris::io::FileInfo> >, bool) @ 0x000000000aec6eac in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
4. /root/src/doris-2.0/be/src/common/status.h:348: doris::io::FileSystem::list(std::filesystem::__cxx11::path const&, bool, std::vector<doris::io::FileInfo, std::allocator<doris::io::FileInfo> >, bool) @ 0x000000000aec0f6c in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
5. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360: doris::LoadPathMgr::clean_one_path(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) @ 0x000000000b83cd40 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
6. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_iterator.h:1034: std::_Function_handler<void (), doris::LoadPathMgr::init()::$_0>::_M_invoke(std::_Any_data const&) @ 0x000000000b83e218 in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
7. /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562: doris::Thread::supervise_thread(void*) @ 0x000000000ba1819a in /opt/apache-doris-2.0.2-bin-x64-noavx2/be/lib/doris_be
8. start_thread @ 0x0000000000007dd5 in /usr/lib64/libpthread-2.17.so
9. clone @ 0x00000000000fdead in /usr/lib64/libc-2.17.so
```
I ran `dmesg -T`, but its output contained no information about Doris.
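To narrow down where the BE began shutting down, it can help to pull the `daemon.cpp:397] doris start to exit` entry out of be.INFO together with the few entries logged just before it. Below is a minimal Python sketch, assuming the glog-style prefix visible in the be.INFO output (severity letter, `MMDD`, time with microseconds, thread id, `file:line]`). `parse_entries`, `shutdown_context` and `SAMPLE` are hypothetical helpers written for illustration only; they are not part of Doris. The regex does not rely on newlines, so it also copes with entries that were pasted onto a single line.

```python
import re
from typing import List, NamedTuple

# Assumed glog-style prefix, as seen in be.INFO:
#   <severity><MMDD> <HH:MM:SS.micros> <thread id> <file:line>] <message>
# The message is matched lazily up to the next prefix (or the end of the text),
# so entries are split correctly even when they share one line.
GLOG_ENTRY = re.compile(
    r"(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*?)"
    r"(?=\s*[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} |\s*$)",
    re.S,
)


class Entry(NamedTuple):
    severity: str
    date: str
    time: str
    thread_id: int
    source: str
    message: str


def parse_entries(text: str) -> List[Entry]:
    """Split raw be.INFO text into individual glog entries."""
    return [
        Entry(m["sev"], m["date"], m["time"], int(m["tid"]), m["src"], m["msg"].strip())
        for m in GLOG_ENTRY.finditer(text)
    ]


def shutdown_context(entries: List[Entry], marker: str = "doris start to exit",
                     before: int = 3) -> List[Entry]:
    """Return the first entry containing `marker` plus up to `before` entries preceding it."""
    for i, entry in enumerate(entries):
        if marker in entry.message:
            return entries[max(0, i - before): i + 1]
    return []


# Excerpt copied from the be.INFO output in this issue.
SAMPLE = (
    "I1222 11:58:31.762550 24285 task_worker_pool.cpp:1068] successfully report "
    "TABLET|host=10.237.22.118|port=9020 "
    "I1222 11:58:36.864832 23487 daemon.cpp:397] doris start to exit "
    "I1222 11:58:38.206190 24284 data_dir.cpp:810] path: "
    "/opt/apache-doris-2.0.2-bin-x64-noavx2/be/storage total capacity: 1022174953472, "
    "available capacity: 1013264003072"
)

if __name__ == "__main__":
    for e in shutdown_context(parse_entries(SAMPLE)):
        print(e.time, e.thread_id, e.source, e.message)
```

To run it on a whole log, pass the contents of be.INFO to `parse_entries` instead of `SAMPLE`; the `before` parameter controls how many preceding entries are shown.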

What You Expected?

I would like to know the cause and how to fix it.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct
