
valgrind inspired fixes #1693

Merged
merged 4 commits into develop on Apr 10, 2019

Conversation

@crypto-ape
Contributor

commented Apr 2, 2019

Hey Monkeys!

I had some troubles and ran the witness_node under valgrind (the memcheck tool). As a result, I'm providing some minor fixes here.
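For reference, the memcheck run looks roughly like this (the build path and node options reflect my local setup; adjust them to yours):

    valgrind --tool=memcheck --leak-check=full --track-origins=yes \
        ./programs/witness_node/witness_node --data-dir=witness_node_data_dir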

Ape out!

@oxarbitrage

Member

commented Apr 2, 2019

very nice, i wanted to run valgrind a while back and was unsuccessful. care to explain a bit the errors you were having and how the changes fix them? thanks.

@oxarbitrage

Member

commented Apr 2, 2019

Related bitshares/bitshares-fc#118

Some valgrind output added to this pull request would help. I also think we might get better, or at least different, profiling results with valgrind than with gprof.

A wiki on how you are doing this would be very helpful IMHO.

@crypto-ape

Contributor Author

commented Apr 2, 2019

fix referencing local stack variable in async thread

Valgrind reported both invalid reads and invalid writes with traces like this:

Invalid write of size 8
   at 0x3267A14: graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector::starting_execution() (node_impl.hxx:133)
   by 0x32670D7: graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector::actual_execution_measurement_helper::actual_execution_measurement_helper(graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector&) (node_impl.hxx:92)
   by 0x324E5F9: graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}::operator()() const (node.cpp:4999)
   by 0x32622F8: fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}>::run(void*, fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}>) (task.hpp:77)
   by 0x301BC0B: fc::task_base::run_impl() (task.cpp:43)
   by 0x301BB8B: fc::task_base::run() (task.cpp:32)
   by 0x300E85D: fc::thread_d::run_next_task() (thread_d.hpp:513)
   by 0x300EBC2: fc::thread_d::process_tasks() (thread_d.hpp:562)
   by 0x300E30A: fc::thread_d::start_process_tasks(boost::context::detail::transfer_t) (thread_d.hpp:493)
   by 0x33CB9AE: make_fcontext (in /home/crypto-ape/work/valgrind/witness_node)
 Address 0x2b6447e8 is 2,095,016 bytes inside a block of size 2,097,152 alloc'd
   at 0x5022B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

The original code created a local variable on the stack and deferred, into an asynchronous thread, a lambda that captured the local variable by reference. By the time the lambda was invoked, it was using already-invalidated memory. Worse, it could overwrite a stack that was still in use.

I have put the object into a shared_ptr and capture it in the lambda by copy. After the last commit I tested again and compared the valgrind logs; the error is definitely fixed.
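To make the fix easier to follow, here is a minimal stand-alone sketch of the pattern (not the actual node.cpp code; the class is simplified and std::async stands in for the fc thread machinery):

    #include <future>
    #include <memory>

    struct call_statistics_collector {           // simplified stand-in for the real collector
        void starting_execution() { /* record a timestamp */ }
    };

    // Buggy variant: the collector lives on the caller's stack, but the deferred
    // task may run after that stack frame is gone, so the captured reference dangles.
    std::future<void> schedule_buggy() {
        call_statistics_collector collector;
        return std::async(std::launch::async, [&]() {
            collector.starting_execution();      // invalid read/write reported by valgrind
        });
    }

    // Fixed variant: the collector is heap-allocated and the shared_ptr is captured
    // by copy, so the object stays alive at least as long as the task does.
    std::future<void> schedule_fixed() {
        auto collector = std::make_shared<call_statistics_collector>();
        return std::async(std::launch::async, [collector]() {
            collector->starting_execution();
        });
    }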

explicitly cleanup external library facilities

Various memory leaks are detected where malloc is called from curl_easy_init. One example:

256 bytes in 1 blocks are indirectly lost in loss record 41 of 67
   at 0x5022B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5879171: ??? (in /usr/lib/x86_64-linux-gnu/libcurl.so.4.5.0)
   by 0x5886143: curl_easy_init (in /usr/lib/x86_64-linux-gnu/libcurl.so.4.5.0)
   by 0x2A5699A: graphene::elasticsearch::detail::elasticsearch_plugin_impl::elasticsearch_plugin_impl(graphene::elasticsearch::elasticsearch_plugin&) (elasticsearch_plugin.cpp:42)
   by 0x2A51150: graphene::elasticsearch::elasticsearch_plugin::elasticsearch_plugin() (elasticsearch_plugin.cpp:401)
   by 0x223F9E7: void __gnu_cxx::new_allocator<graphene::elasticsearch::elasticsearch_plugin>::construct<graphene::elasticsearch::elasticsearch_plugin>(graphene::elasticsearch::elasticsearch_plugin*) (new_allocator.h:136)
   by 0x223E87A: void std::allocator_traits<std::allocator<graphene::elasticsearch::elasticsearch_plugin> >::construct<graphene::elasticsearch::elasticsearch_plugin>(std::allocator<graphene::elasticsearch::elasticsearch_plugin>&, graphene::elasticsearch::elasticsearch_plugin*) (alloc_traits.h:475)
   by 0x223D05D: std::_Sp_counted_ptr_inplace<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<>(std::allocator<graphene::elasticsearch::elasticsearch_plugin>) (shared_ptr_base.h:526)
   by 0x223B08C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, graphene::elasticsearch::elasticsearch_plugin*, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr_base.h:637)
   by 0x2239266: std::__shared_ptr<graphene::elasticsearch::elasticsearch_plugin, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr_base.h:1295)
   by 0x22374B5: std::shared_ptr<graphene::elasticsearch::elasticsearch_plugin>::shared_ptr<std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr.h:344)
   by 0x2234EEB: std::shared_ptr<graphene::elasticsearch::elasticsearch_plugin> std::allocate_shared<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr.h:691)

The proposed explicit cleanup reduces the number of reported memory leaks, although some of them remain.
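As an illustration of the idea (a minimal sketch with a hypothetical class name, not the exact plugin code): a handle obtained from curl_easy_init() has to be released with curl_easy_cleanup(), otherwise valgrind attributes libcurl's internal allocations to a leak.

    #include <curl/curl.h>

    class es_plugin_impl_sketch {                // hypothetical name, for illustration only
    public:
        es_plugin_impl_sketch() {
            curl = curl_easy_init();             // allocates internal libcurl state
        }
        ~es_plugin_impl_sketch() {
            if (curl)
                curl_easy_cleanup(curl);         // release it explicitly on shutdown
        }
    private:
        CURL* curl = nullptr;
    };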

@crypto-ape

Contributor Author

commented Apr 2, 2019

I will comment the commits in bitshares/bitshares-fc#118 later, probably tomorrow.

@ryanRfox

Member

commented Apr 2, 2019

Does this seem like it can make it into 3.1.0? If yes, please add it to the Project Board and Milestone, else to the Backlog. I don't want to lose track of it.

@oxarbitrage oxarbitrage added this to In development in Feature Release (3.1.0) via automation Apr 2, 2019

@oxarbitrage oxarbitrage added this to the 3.1.0 - Feature Release milestone Apr 2, 2019

@oxarbitrage

Member

commented Apr 2, 2019

I will comment the commits in bitshares/bitshares-fc#118 later, probably tomorrow.

Thank you.

Does this seem like it can make it into 3.1.0? If yes, please add it to the Project Board and Milestone, else to the Backlog. I don't want to lose track of it.

I think it is possible. Added.

@@ -4919,7 +4920,7 @@ namespace graphene { namespace net { namespace detail {
     return _node_delegate->method_name(__VA_ARGS__); \
   } \
   else \
-    return _thread->async([&](){ \
+    return _thread->async([&, statistics_collector](){ \

@pmconrad

pmconrad Apr 3, 2019

Contributor

Please remove the & and capture this instead. Same below.

@crypto-ape

crypto-ape Apr 3, 2019

Author Contributor

I am sorry, but this won't work.

Strictly speaking, it would be enough to capture only this->_node_delegate by copy. But if the implicit by-reference capture (i.e. &) is omitted, the __VA_ARGS__ won't be captured at all.

The simplest alternative would be the implicit capture by copy (i.e. =), but that won't compile, because the functions used with the macro take parameters by reference (see the interface class node_delegate), and at least one uses such a parameter as an additional output parameter.

Furthermore, it is not easy to verify that the original caller provides these references to non-stack-allocated entities. Thus, to be completely bug-proof, all of these parameters would have to be passed as shared_ptr.

From the valgrind output I have produced, this does not seem to be an actual problem at the moment. I suggest accepting the proposed modification as is (it provably solves an existing problem); if a strictly complete solution is deemed necessary, a separate issue should be opened for the required refactor.
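To illustrate the constraint (a standalone sketch, not the real macro): the delegate methods take non-const references, some of which are output parameters, so an implicit by-copy capture cannot forward them.

    #include <functional>
    #include <vector>

    // Stand-in for a node_delegate method that uses a reference as an output parameter.
    void collect_ids(std::vector<int>& out) {
        out.push_back(42);
    }

    std::function<void()> defer_call(std::vector<int>& ids) {
        // return [=]() { collect_ids(ids); };   // does not compile: the by-copy capture
        //                                       // is const inside a non-mutable lambda,
        //                                       // and collect_ids needs a non-const reference
        return [&]() { collect_ids(ids); };      // compiles, but `ids` must outlive the
                                                 // deferred call (the crux of this PR)
    }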

@pmconrad

pmconrad Apr 4, 2019

Contributor

Missed the __VA_ARGS__, sorry.

handle_message, for example, uses a stack-allocated message (originating from message_oriented_connection_impl::read_loop). Normally this lives longer than the handle_message call should take; however, it is re-used when reading the next message (which could lead to data corruption), and it is destroyed when reading fails (e.g. when the connection breaks down). This could even be related to #1256 @jmjatlanta ?

I agree that this should be handled in another ticket.

@pmconrad

Contributor

commented Apr 3, 2019

Nice catch, thanks!

@pmconrad pmconrad merged commit 421a2dd into bitshares:develop Apr 10, 2019

2 checks passed

ci/dockercloud: Your tests passed in Docker Cloud
continuous-integration/travis-ci/pr: The Travis CI build passed

Feature Release (3.1.0) automation moved this from In development to Done Apr 10, 2019
