[Fix](hdfs-writer) Fix hdfs file writer core with check failed: _ref_cnt == 0 in dtor of HdfsFileWriter.
#33959
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TeamCity be ut coverage result:
TPC-H: Total hot run time: 38297 ms
TPC-DS: Total hot run time: 185925 ms
ClickBench: Total hot run time: 30.02 s
if (!handle->invalid()) {
    handle->inc_ref();
    *fs_handle = handle;
    *fs_handle = std::move(handle);
Why do we still have to deal with inc_ref() even with shared_ptr support?
Can we integrate the process (inc_ref, dec_ref) into shared_ptr?
It is for fs handle cache.
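To make the question above concrete, here is a minimal single-threaded sketch of how a handle cache could lean on the `shared_ptr` use count instead of manual `inc_ref()`/`dec_ref()` calls. `Handle`, `HandleCache`, and `evict_unused` are hypothetical names for illustration only; this is not the Doris `HdfsFileSystemCache`, and a real cache would also need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a cached HDFS connection (not the Doris HdfsHandler).
struct Handle {
    explicit Handle(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Hypothetical cache: the shared_ptr control block does the counting that
// inc_ref()/dec_ref() would otherwise do by hand.
class HandleCache {
public:
    // Return the cached handle for `name`, creating it on a miss.
    std::shared_ptr<Handle> get(const std::string& name) {
        auto it = _cache.find(name);
        if (it == _cache.end()) {
            it = _cache.emplace(name, std::make_shared<Handle>(name)).first;
        }
        assert(it->second != nullptr);
        return it->second;  // returning a copy bumps the use count
    }

    // Drop every entry that only the cache itself still references.
    std::size_t evict_unused() {
        std::size_t evicted = 0;
        for (auto it = _cache.begin(); it != _cache.end();) {
            if (it->second.use_count() == 1) {
                it = _cache.erase(it);
                ++evicted;
            } else {
                ++it;
            }
        }
        return evicted;
    }

    std::size_t size() const { return _cache.size(); }

private:
    std::map<std::string, std::shared_ptr<Handle>> _cache;
};
```

In this sketch an entry that a caller still holds survives eviction, and an entry nobody holds is released, without any explicit reference counting in user code.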
be/src/io/fs/hdfs_file_writer.cpp
Outdated
HdfsFileWriter::HdfsFileWriter(Path path, std::shared_ptr<HdfsHandler> handler, hdfsFile hdfs_file,
                               std::string fs_name, const FileWriterOptions* opts)
        : _path(std::move(path)),
          _hdfs_handler(handler),
- _hdfs_handler(handler),
+ _hdfs_handler(std::move(handler)),
ditto:
return std::make_unique<HdfsFileWriter>(std::move(path), std::move(handler), hdfs_file, fs_name, opts);
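The effect of the two `std::move` suggestions above can be sketched as follows. `Handler` and `Writer` are hypothetical placeholders, not the Doris classes; the point is only that moving a `shared_ptr` transfers ownership without adding an owner, while copying leaves the caller holding a second reference.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Handler {};  // hypothetical placeholder for HdfsHandler

class Writer {  // hypothetical placeholder for HdfsFileWriter
public:
    // Take the pointer by value, then move it into the member:
    // no extra reference-count increment on this path.
    explicit Writer(std::shared_ptr<Handler> handler) : _handler(std::move(handler)) {
        assert(_handler != nullptr);
    }

    long owners() const { return _handler.use_count(); }

private:
    std::shared_ptr<Handler> _handler;
};

// The caller gives up its reference: the writer is the only owner.
long owners_after_move() {
    auto handler = std::make_shared<Handler>();
    Writer writer(std::move(handler));
    return writer.owners();
}

// The caller keeps its reference: two owners while `handler` is in scope.
long owners_after_copy() {
    auto handler = std::make_shared<Handler>();
    Writer writer(handler);
    return writer.owners();
}
```

Both forms are correct; the moved form simply avoids an atomic increment and decrement when the caller no longer needs the pointer.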
// do not use std::shared_ptr or std::unique_ptr
// _fs_handle is managed by HdfsFileSystemCache
HdfsHandler* _fs_handle = nullptr;
std::shared_ptr<HdfsHandler> _fs_handle = nullptr;
- std::shared_ptr<HdfsHandler> _fs_handle = nullptr;
+ std::shared_ptr<HdfsHandler> _fs_handle;
There is no need to write = nullptr explicitly; smart pointers are default-initialized to nullptr.
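A small check of that claim, using a hypothetical `Holder` struct rather than the Doris class: a default-constructed `std::shared_ptr` compares equal to `nullptr` and owns nothing, so the explicit initializer changes nothing.

```cpp
#include <cassert>
#include <memory>

struct Handler {};  // hypothetical placeholder for HdfsHandler

struct Holder {
    std::shared_ptr<Handler> implicit_null;            // default-constructed
    std::shared_ptr<Handler> explicit_null = nullptr;  // same state, more typing
};

// Returns true when both members are empty after default construction.
bool members_start_empty() {
    Holder holder;
    assert(holder.implicit_null.use_count() == 0);
    return holder.implicit_null == nullptr && holder.explicit_null == nullptr;
}
```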
be/src/io/hdfs_util.cpp
Outdated
}
if (_cache.size() < MAX_CACHE_HANDLE) {
    std::unique_ptr<HdfsHandler> handle = std::make_unique<HdfsHandler>(hdfs_fs, true);
    std::shared_ptr<HdfsHandler> handle = std::make_shared<HdfsHandler>(hdfs_fs, true);
- std::shared_ptr<HdfsHandler> handle = std::make_shared<HdfsHandler>(hdfs_fs, true);
+ auto handle = std::make_shared<HdfsHandler>(hdfs_fs, true);
clang-tidy made some suggestions
_last_access_time = std::chrono::duration_cast<std::chrono::milliseconds>(
                            std::chrono::system_clock::now().time_since_epoch())
                            .count();
void update_last_access_time() {
warning: method 'update_last_access_time' can be made const [readability-make-member-function-const]
- void update_last_access_time() {
+ void update_last_access_time() const {
60e60ff to ffc4826
ffc4826 to c95dd69
run buildall
TeamCity be ut coverage result:
morningman
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…_cnt == 0` in dtor of `HdfsFileWriter`. (#33959)

## Issue:

```
F20240421 17:14:37.494115 184986 hdfs_util.h:65] Check failed: _ref_cnt == 0
*** Check failure stack trace: ***
F20240421 17:14:37.505879 185108 hdfs_util.h:65] Check failed: _ref_cnt == 0
*** Check failure stack trace: ***
    @ 0x556f5236d316 google::LogMessageFatal::~LogMessageFatal()
    @ 0x556f5236d316 google::LogMessageFatal::~LogMessageFatal()
    @ 0x556f2830e200 doris::io::HdfsFileWriter::~HdfsFileWriter()
    @ 0x556f2830e200 doris::io::HdfsFileWriter::~HdfsFileWriter()
    @ 0x556f2830e21e doris::io::HdfsFileWriter::~HdfsFileWriter()
    @ 0x556f2830e21e doris::io::HdfsFileWriter::~HdfsFileWriter()
    @ 0x556f507893b0 doris::vectorized::VHivePartitionWriter::~VHivePartitionWriter()
    @ 0x556f507893b0 doris::vectorized::VHivePartitionWriter::~VHivePartitionWriter()
    @ 0x556f506c005e std::_Hashtable<>::clear()
    @ 0x556f506c005e std::_Hashtable<>::clear()
    @ 0x556f50780f4f doris::vectorized::VHiveTableWriter::close()
    @ 0x556f50780f4f doris::vectorized::VHiveTableWriter::close()
    @ 0x556f5072bc4f doris::vectorized::AsyncResultWriter::process_block()
    @ 0x556f5072bc4f doris::vectorized::AsyncResultWriter::process_block()
    @ 0x556f5072cd01 std::_Function_handler<>::_M_invoke()
    @ 0x556f5072cd01 std::_Function_handler<>::_M_invoke()
    @ 0x556f2b02b73d doris::ThreadPool::dispatch_thread()
    @ 0x556f2b02b73d doris::ThreadPool::dispatch_thread()
    @ 0x556f2b008d59 doris::Thread::supervise_thread()
    @ 0x556f2b008d59 doris::Thread::supervise_thread()
    @ 0x7f2c2bfb4609 start_thread
    @ 0x7f2c2bfb4609 start_thread
    @ 0x7f2c2c261133 clone
    @ 0x7f2c2c261133 clone
    @ (nil) (unknown)
*** Query id: ac4f457c003d4489-b04ac56ef05b12f0 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1713690877 (unix time) try "date -d @1713690877" if you are using GNU date ***
*** Current BE git commitID: e6f4a2f ***
*** SIGABRT unknown detail explain (@0x38a6) received by PID 14502 (TID 184986 OR 0x7f21d0614700) from PID 14502; stack trace: ***
    @ (nil) (unknown)
F20240421 17:14:37.505879 185108 hdfs_util.h:65] Check failed: _ref_cnt == 0
F20240421 17:14:37.887202 185110 hdfs_util.h:65] Check failed: _ref_cnt == 0
*** Check failure stack trace: ***
    @ 0x556f5236d316 google::LogMessageFatal::~LogMessageFatal()
    @ 0x556f2830e200 doris::io::HdfsFileWriter::~HdfsFileWriter()
    @ 0x556f2830e21e doris::io::HdfsFileWriter::~HdfsFileWriter()
    @ 0x556f507893b0 doris::vectorized::VHivePartitionWriter::~VHivePartitionWriter()
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
```

The root cause: when a request cannot be served from the cache (for example, when the cache is full), a new `fs_handler` is created, and its life cycle is managed by the caller. The HDFS writer has been separated out and can create an `fs_handler` on its own, so both `HdfsFileSystem` and `HdfsFileWriter` may be callers. In `HdfsFileSystem`, the `fs_handler` used by `HdfsFileWriter` is shared, so it needs to be changed to `shared_ptr`.

### Solution

Fix the HDFS file writer core dump with `check failed: _ref_cnt == 0` in the destructor of `HdfsFileWriter`. Change the `fs_handler` pointer to `shared_ptr` and remove the ref count operations.
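The ownership model described in the solution can be illustrated with a minimal sketch. All names here (`Handler`, `FileSystem`, `Writer`, `g_teardowns`) are hypothetical and only mirror the roles of `HdfsHandler`, `HdfsFileSystem`, and `HdfsFileWriter`; the actual Doris code is not reproduced. The sketch shows that when both owners hold a `shared_ptr`, the handler is torn down exactly once, after the last owner goes away, regardless of destruction order.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Counts handler teardowns; with correct shared ownership this goes up
// exactly once per handler.
static int g_teardowns = 0;

// Hypothetical stand-in for HdfsHandler.
struct Handler {
    ~Handler() { ++g_teardowns; }
};

// Hypothetical stand-in for HdfsFileSystem: one owner of the handler.
class FileSystem {
public:
    explicit FileSystem(std::shared_ptr<Handler> handler) : _handler(std::move(handler)) {}
    std::shared_ptr<Handler> handler() const { return _handler; }

private:
    std::shared_ptr<Handler> _handler;
};

// Hypothetical stand-in for HdfsFileWriter: a second owner.
class Writer {
public:
    explicit Writer(std::shared_ptr<Handler> handler) : _handler(std::move(handler)) {}

private:
    std::shared_ptr<Handler> _handler;
};

// Returns {teardowns after the file system is gone, teardowns after the writer is gone}.
std::pair<int, int> teardowns_when_writer_outlives_fs() {
    g_teardowns = 0;
    std::unique_ptr<Writer> writer;
    {
        FileSystem fs(std::make_shared<Handler>());
        writer = std::make_unique<Writer>(fs.handler());
    }  // fs is destroyed here; the writer still keeps the handler alive
    const int after_fs_destroyed = g_teardowns;
    assert(after_fs_destroyed == 0);
    writer.reset();  // last owner released: the handler is destroyed now
    return {after_fs_destroyed, g_teardowns};
}
```

With this arrangement neither destructor needs to check or adjust a manual reference count, which is the kind of check that failed in the log above.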