Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
crimson/net: allow a Message to be resent.
Recetly the `crimson::mon::Client` has started resending messages when session is being reopened which may happen during recovery from a network issue. A crash documented below has been observed in teuthology testing. It looks the fixes in `mon::Client` unveiled a problem in the messenger -- it assumes that a `Message` instance shall not be sent twice. This stays in conflict with the behaviour of `mon::Client` about e.g. `MMonCommand`. ``` INFO 2021-03-02 14:29:01,192 [shard 0] monc - handle_mon_map: renewed tickets DEBUG 2021-03-02 14:29:01,192 [shard 0] ms - [osd.2(client) v2:172.21.15.57:6804/34494@55832 >> mon.0 v2:172.21.15.57:3300/0] <== #2 === auth_reply(proto 2 0 (0) Success) v1 (18) INFO 2021-03-02 14:29:01,192 [shard 0] monc - handle_auth_reply mon v2:172.21.15.57:6804/34494 => v2:172.21.15.57:3300/0 returns auth_reply(proto 2 0 (0) Success) v1: 0 INFO 2021-03-02 14:29:01,192 [shard 0] monc - handle_auth_reply INFO 2021-03-02 14:29:01,192 [shard 0] monc - do_auth_single: mon v2:172.21.15.57:6804/34494 => v2:172.21.15.57:3300/0 returns auth_reply(proto 2 0 (0) Success) v1: 0 ERROR 2021-03-02 14:29:01,192 [shard 0] none - /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-1345-g1dc78fd5/rpm/el8/BUILD/ceph-17.0.0-1345-g1dc78fd5/src/crimson/net/ProtocolV2.cc:1828 : In function 'crimson::net::ProtocolV2::do_sweep_messages(const std::deque<boost::intrusive_ptr<Message> >&, size_t, bool, std::optional<utime_t>, bool)::<lambda(const MessageRef&)>', ceph_assert(%s) !msg->get_seq() && "message already has seq" Aborting on shard 0. Backtrace: 0x00000000013c2bbc 0x0000000001384d10 0x0000000001385012 0x00000000013850d2 /lib64/libpthread.so.0+0x0000000000012b1f /lib64/libc.so.6+0x00000000000377fe /lib64/libc.so.6+0x0000000000021c34 0x00000000005e2e98 0x00000000005e2ee0 0x0000000000dfb215 0x0000000000def167 0x0000000000df0854 0x0000000000df100e 0x0000000000df165c 0x0000000000de97e9 0x0000000000d8c3fd 0x0000000000d8c5bf 0x0000000000d85ba5 0x0000000001381237 0x00000000013815a2 0x00000000013ae735 0x000000000134b1d7 0x0000000000661de5 /lib64/libc.so.6+0x00000000000237b2 0x00000000006b256d daemon-helper: command crashed with signal 6 ``` Processing the backtrace with thes`seastar-addr2line` confirms the hypothesis. ``` [Backtrace ceph#8] ceph::__ceph_assert_fail(ceph::assert_data const&) at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/common/assert.cc:14 [Backtrace ceph#9] operator() at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/ProtocolV2.cc:1828 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/stl_algo.h:3876 (inlined by) crimson::net::ProtocolV2::do_sweep_messages(std::deque<boost::intrusive_ptr<Message>, std::allocator<boost::intrusive_ptr<Message> > > const&, unsigned lo ng, bool, std::optional<utime_t>, bool) at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/ProtocolV2.cc:1848 [Backtrace ceph#10] operator() at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/Protocol.cc:235 [Backtrace ceph#11] crimson::net::Protocol::do_write_dispatch_sweep() at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/future.hh:2135 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/loop.hh:118 (inlined by) crimson::net::Protocol::do_write_dispatch_sweep() at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/Protocol.cc:217 [Backtrace ceph#12] operator() at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/Protocol.cc:312 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/future.hh:2135 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/future.hh:2166 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/gate.hh:126 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/gate.hh:144 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/common/gated.h:38 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/common/gated.h:23 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/Protocol.cc:311 (inlined by) crimson::net::Protocol::write_event() at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/Protocol.cc:298 [Backtrace ceph#13] crimson::net::Protocol::send(boost::intrusive_ptr<Message>) at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/Protocol.cc:97 [Backtrace ceph#14] crimson::net::SocketConnection::send(boost::intrusive_ptr<Message>) at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/net/SocketConnection.cc:75 [Backtrace ceph#15] crimson::mon::Client::send_message(boost::intrusive_ptr<Message>) at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/mon/MonClient.cc:1024 [Backtrace ceph#16] operator()<crimson::mon::Client::mon_command_t> at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/crimson/mon/MonClient.cc:1045 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/future.hh:2135 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/future.hh:2166 (inlined by) ?? at /usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/loop.hh:549 (inlined by) parallel_for_each_impl<std::vector<crimson::mon::Client::mon_command_t>&, crimson::mon::Client::on_session_opened()::<lambda()>::<lambda(auto:82&)> > at $ usr/src/debug/ceph-17.0.0-1345.g1dc78fd5.el8.x86_64/src/seastar/include/seastar/core/loop.hh:594 ``` In classical OSD the `Message` serialization code is free from the assertion: ```cpp ssize_t ProtocolV2::write_message(Message *m, bool more) { FUNCTRACE(cct); ceph_assert(connection->center->in_thread()); m->set_seq(++out_seq); // ... } ``` Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
- Loading branch information