[4.0] hardening resource monitor manager plugin shutdown handling #1774

Merged
linh2931 merged 5 commits into release/4.0 from resmon_plugin_shutdown_hardening
Oct 16, 2023

linh2931

#1485 reports occasional resource manager plugin test failures. The failure is shown as
/__w/leap/leap/plugins/resource_monitor_plugin/test/test_resmon_plugin.cpp(147): fatal error: in "resmon_plugin_tests/startupNormal": unexpected exception thrown by plugin_startup({"/tmp"})
without any additional information.

Further investigation reveals that the scheduled monitor timer task is not cancelled during plugin shutdown, so the monitor task might still be running before the thread exits. This PR hardens the plugin shutdown process.

Resolves #1485

Those tests were intended to verify the duration of space_monitor_loop.
In essence they tested Boost's expires_from_now, which was not
necessary.
The tests themselves were hacky and took an unnecessary 50 seconds.
* removes unnecessary check of info logging "Creating and starting monitor thread"
* reduces time to wait for nodeos startup from 120 seconds to 10 seconds
boost::system::error_code ec;
timer.cancel(ec);
}
thread_pool.stop();

Once stop() returns, it's impossible for timer to be accessed from the thread_pool, so I'm not immediately seeing a need for this mutex: just stop the thread_pool and let timer be dtor'ed on the main thread, as it is already.

If you do that, I also suspect you don't even need

if ( ec != boost::asio::error::operation_aborted ) { // not cancelled

because once the thread_pool is stopped no callbacks will be run, ergo the cancellation callback of the timer in its dtor will never be run.


Thanks for your insight. I updated the code.

* no need to cancel the timer explicitly.
* no need to use a mutex for the timer: by the time the timer is used on
  the main thread, the resource monitor thread has already stopped.
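
The ordering settled on above (stop the worker first, and only then touch or destroy shared state on the main thread) can be sketched with the standard library alone. This is a hedged analogue, not the plugin's code: `periodic_monitor`, `checks_run`, and `demo_checks_after_shutdown` are hypothetical names, and a `std::thread` with a condition variable stands in for the Boost thread pool and timer.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a periodic monitor; NOT the actual plugin code.
class periodic_monitor {
public:
    explicit periodic_monitor(std::chrono::milliseconds interval)
        : interval_(interval) {}

    ~periodic_monitor() { shutdown(); }

    // Assumption: called at most once per object.
    void startup() {
        worker_ = std::thread([this] { loop(); });
    }

    // Stop the worker first. After join() returns, no monitor task can be
    // running, so worker-owned state may be used without further locking.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stopping_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    // Only safe to call after shutdown(): the worker no longer exists.
    int checks_run() const { return checks_; }

private:
    void loop() {
        for (;;) {
            ++checks_;  // stands in for one space check; worker-only while running
            std::unique_lock<std::mutex> lock(mtx_);
            // Sleep for one interval, waking early if shutdown was requested.
            if (cv_.wait_for(lock, interval_, [this] { return stopping_; })) {
                return;
            }
        }
    }

    std::chrono::milliseconds interval_;
    std::mutex mtx_;               // guards stopping_ only
    std::condition_variable cv_;
    bool stopping_ = false;
    int checks_ = 0;
    std::thread worker_;
};

// Runs the monitor briefly and returns the check count read after shutdown.
int demo_checks_after_shutdown() {
    periodic_monitor m(std::chrono::milliseconds(5));
    m.startup();
    std::this_thread::sleep_for(std::chrono::milliseconds(30));
    m.shutdown();
    const int first = m.checks_run();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    assert(first == m.checks_run());  // nothing runs once the worker has stopped
    return first;
}
```

Calling `demo_checks_after_shutdown()` from a `main` function should return at least 1, because the loop performs one check before its first wait. The mutex in this sketch only protects the stop flag; the counter needs no lock after `join()`, which mirrors the reasoning in the review comment about the timer.
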
@linh2931 linh2931 merged commit 79fc096 into release/4.0 Oct 16, 2023
16 checks passed
@linh2931 linh2931 deleted the resmon_plugin_shutdown_hardening branch October 16, 2023 21:44