Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.0] extend life of http_plugin_state to ensure no invalid memory accesses during nodeos shutdown #2042

Merged
merged 2 commits into from
Jan 16, 2024

Conversation

spoonincode
Copy link
Member

During the destruction of beast_http_session,

virtual ~beast_http_session() {
is_send_exception_response_ = false;
plugin_state_.requests_in_flight -= 1;
if(plugin_state_.get_logger().is_enabled(fc::log_level::all)) {
auto session_time = steady_clock::now() - session_begin_;
auto session_time_us = std::chrono::duration_cast<std::chrono::microseconds>(session_time).count();
plugin_state_.get_logger().log(FC_LOG_MESSAGE(all, "session time ${t}", ("t", session_time_us)));
plugin_state_.get_logger().log(FC_LOG_MESSAGE(all, " read ${t}", ("t", read_time_us_)));
plugin_state_.get_logger().log(FC_LOG_MESSAGE(all, " handle ${t}", ("t", handle_time_us_)));
plugin_state_.get_logger().log(FC_LOG_MESSAGE(all, " write ${t}", ("t", write_time_us_)));
}
}

there are accesses to plugin_state_ which is simply a reference to an object that lives in http_plugin_impl,

Additionally, notice that beast_http_session's socket_ has been constructed with the io_context from http_plugin_impl::plugin_state's thread_pool. So destruction of the socket_ will also require valid access to the plugin_state object.

But it is possible for a beast_http_session to outlive http_plugin which causes these accesses during destruction to be invalid. Consider a situation where appbase's executor (priority_queue_executor) has a queued callback holding the sole reference count of a shared_ptr<beast_http_session> and appbase is commanded to quit(). appbase's exec() upon exiting its run loop will then call appbase's shutdown(). application_base::shutdown() will then effectively call http_plugin::plugin_shutdown() (stopping http threads) and then destroy http_plugin. However, priority_queue_executor will still be keeping alive a beast_http_session via some pending callback. As appbase continues to destruct eventually priority_queue_executor is destroyed which destroys all pending callbacks. It's at that time the last reference to the shared_ptr<beast_http_session> is removed causing it to be destroyed and expecting access to http_plugin_impl::plugin_state.

But wait! When appbase's io_context stops it first clears its pending queue before application_base::shutdown(), right?
https://github.com/AntelopeIO/appbase/blob/b75b31e14f966fa3de6246e120dcba36c6ce5264/include/appbase/application_base.hpp#L148-L150
afaict this is racy: there is nothing to stop code in some other non-main thread (like http_plugin's threads!) from queueing more callbacks on to appbase's executor between the time exec.clear() is called and, say, http_plugin::shutdown() is called which halts its threads.

Not immediately seeing any good approaches to resolve this issue (especially in a patch release) beyond migrating http_plugin_impl::plugin_state to a shared_ptr so that every beast_http_session can keep it alive as long as it needs to. The socket_ needing http_plugin_impl::plugin_state's thread_pool to remain alive is seemingly what makes this non-negotiable without more significant refactoring.

I am considering this resolving #2022 because while I can't get the exact fatal malloc error to print, the failure I can reproduce here is at the same point in time.

Was unable to reproduce the issue in 3.2 or 4.0. Curiously, those versions already do have a std::shared_ptr<http_plugin_state> so maybe that's why.

Copy link
Member

@linh2931 linh2931 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like your detailed analysis.

@ericpassmore
Copy link
Contributor

Note:start
group: STABILITY
category: INTERNALS
summary: Extend life of http_plugin_state to ensure no invalid memory accesses during nodeos shutdown.
Note:end

@ericpassmore
Copy link
Contributor

Note:start
group: STABILITY
category: BUG
summary: Prevent error on shutdown by extending the life of http_plugin_state thereby ensuring no invalid memory accesses.
Note:end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5.0.0-rc3 WAX: Error on shutdown: corrupted size vs. prev_size
5 participants