[5.0] extend life of http_plugin_state
to ensure no invalid memory accesses during nodeos shutdown
#2042
During the destruction of `beast_http_session` (leap/plugins/http_plugin/include/eosio/http_plugin/beast_http_session.hpp, lines 270 to 281 in 04774eb) there are accesses to `plugin_state_`, which is simply a reference to an object that lives in `http_plugin_impl` (leap/plugins/http_plugin/include/eosio/http_plugin/beast_http_session.hpp, line 86 in 04774eb).
Additionally, notice that `beast_http_session`'s `socket_` has been constructed with the io_context from `http_plugin_impl::plugin_state`'s `thread_pool`. So destruction of the `socket_` will also require valid access to the `plugin_state` object.

But it is possible for a `beast_http_session` to outlive `http_plugin`, which causes these accesses during destruction to be invalid. Consider a situation where appbase's executor (`priority_queue_executor`) has a queued callback holding the sole reference count of a `shared_ptr<beast_http_session>` and appbase is commanded to `quit()`. appbase's `exec()`, upon exiting its run loop, will then call appbase's `shutdown()`. `application_base::shutdown()` will then effectively call `http_plugin::plugin_shutdown()` (stopping http threads) and then destroy `http_plugin`. However, `priority_queue_executor` will still be keeping alive a `beast_http_session` via some pending callback. As appbase continues to destruct, eventually `priority_queue_executor` is destroyed, which destroys all pending callbacks. It's at that time the last reference to the `shared_ptr<beast_http_session>` is removed, causing it to be destroyed and expecting access to `http_plugin_impl::plugin_state`.

But wait! When appbase's io_context stops it first clears its pending queue before `application_base::shutdown()`, right?
https://github.com/AntelopeIO/appbase/blob/b75b31e14f966fa3de6246e120dcba36c6ce5264/include/appbase/application_base.hpp#L148-L150
afaict this is racy: there is nothing to stop code in some other non-main thread (like http_plugin's threads!) from queueing more callbacks on to appbase's executor between the time `exec.clear()` is called and, say, `http_plugin::shutdown()` is called which halts its threads.

Not immediately seeing any good approaches to resolve this issue (especially in a patch release) beyond migrating `http_plugin_impl::plugin_state` to a `shared_ptr` so that every `beast_http_session` can keep it alive as long as it needs to. The `socket_` needing `http_plugin_impl::plugin_state`'s `thread_pool` to remain alive is seemingly what makes this non-negotiable without more significant refactoring.

I am considering this resolving #2022 because while I can't get the exact fatal malloc error to print, the failure I can reproduce here is at the same point in time.
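For illustration only, here is a minimal, single-threaded sketch of the `shared_ptr` approach described above. It is not the actual leap code: `plugin_state`, `session`, `plugin_impl`, `callback_queue`, and `run_shutdown_scenario` are hypothetical stand-ins for `http_plugin_state`, `beast_http_session`, `http_plugin_impl`, and appbase's executor queue, and the real thread pool and socket are left out. The sketch only models the ordering: the plugin is destroyed first, and the queue later drops the last session reference.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for http_plugin_state.
struct plugin_state {
    int requests_in_flight = 0;
    std::vector<std::string>* log = nullptr;  // non-owning; outlives the state in this demo
    ~plugin_state() { if (log) log->push_back("state destroyed"); }
};

// Hypothetical stand-in for beast_http_session. It shares ownership of the
// state instead of holding a plain reference, so its destructor can always
// touch the state safely.
class session {
public:
    explicit session(std::shared_ptr<plugin_state> s) : state_(std::move(s)) {
        assert(state_ && "a session needs a live state");
        ++state_->requests_in_flight;
    }
    ~session() {
        --state_->requests_in_flight;  // valid: state_ keeps the object alive
        state_->log->push_back("session destroyed");
    }
private:
    std::shared_ptr<plugin_state> state_;
};

// Hypothetical stand-in for http_plugin_impl: one owner among several.
struct plugin_impl {
    std::shared_ptr<plugin_state> state = std::make_shared<plugin_state>();
};

// Hypothetical stand-in for the executor's queue of pending callbacks.
using callback_queue = std::vector<std::function<void()>>;

// Replays the shutdown order from the description and returns the event log.
std::vector<std::string> run_shutdown_scenario() {
    std::vector<std::string> log;
    callback_queue queue;
    auto plugin = std::make_unique<plugin_impl>();
    plugin->state->log = &log;

    {
        // A pending callback ends up holding the sole reference to a session.
        auto s = std::make_shared<session>(plugin->state);
        queue.emplace_back([s] { /* would deliver a response here */ });
    }

    log.push_back("plugin destroyed");
    plugin.reset();  // the state survives because the session still shares it

    log.push_back("queue destroyed");
    queue.clear();   // last session reference dropped; state is released after it
    return log;
}
```

In this sketch the log reads "plugin destroyed", "queue destroyed", "session destroyed", "state destroyed": the state is released only after the last session is gone, regardless of when the plugin itself is torn down. With a plain reference member instead of `state_`, the session destructor would touch an already destroyed object.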
Was unable to reproduce the issue in 3.2 or 4.0. Curiously, those versions already do have a `std::shared_ptr<http_plugin_state>` so maybe that's why.