New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
msg/async/rdma: Introduce RDMAConnMgr + Debug prints #14201
Conversation
@yuyuyu101 @amirv @orendu @saritz @DanielBar-On patchset number 4 |
notify_fd = dispatcher->register_qp(qp, this); | ||
dispatcher->perf_logger->inc(l_msgr_rdma_created_queue_pair); | ||
dispatcher->perf_logger->inc(l_msgr_rdma_active_queue_pair); | ||
} | ||
|
||
RDMAConnTCP::~RDMAConnTCP() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be RDMAConnTCP.cc
@@ -88,7 +103,7 @@ void RDMAConnectedSocketImpl::get_wc(std::vector<ibv_wc> &w) | |||
w.swap(wc); | |||
} | |||
|
|||
int RDMAConnectedSocketImpl::activate() | |||
int RDMAConnTCP::activate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitto
submit(false); | ||
} | ||
active = true; | ||
|
||
return 0; | ||
} | ||
|
||
int RDMAConnectedSocketImpl::try_connect(const entity_addr_t& peer_addr, const SocketOptions &opts) { | ||
int RDMAConnTCP::try_connect(const entity_addr_t& peer_addr, const SocketOptions &opts) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitto
void RDMAConnectedSocketImpl::handle_connection() { | ||
ldout(cct, 20) << __func__ << " QP: " << my_msg.qpn << " tcp_fd: " << tcp_fd << " fd: " << notify_fd << dendl; | ||
int r = infiniband->recv_msg(cct, tcp_fd, peer_msg); | ||
void RDMAConnTCP::handle_connection() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitto
@@ -575,7 +590,7 @@ void RDMAConnectedSocketImpl::fin() { | |||
} | |||
} | |||
|
|||
void RDMAConnectedSocketImpl::cleanup() { | |||
void RDMAConnTCP::cleanup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitto
class RDMAConnectedSocketImpl : public ConnectedSocketImpl { | ||
protected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one space before "protected"
}; | ||
|
||
class RDMAServerSocketImpl : public ServerSocketImpl { | ||
protected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
@@ -34,7 +40,7 @@ RDMAServerSocketImpl::RDMAServerSocketImpl(CephContext *cct, Infiniband* i, RDMA | |||
ibdev->init(ibport); | |||
} | |||
|
|||
int RDMAServerSocketImpl::listen(entity_addr_t &sa, const SocketOptions &opt) | |||
int RDMAServerConnTCP::listen(entity_addr_t &sa, const SocketOptions &opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to tcp .cc
virtual void cleanup() = 0; | ||
virtual int try_connect(const entity_addr_t&, const SocketOptions &opt) = 0; | ||
virtual ssize_t submit(bool more); | ||
virtual int remote_qpn() = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this worth to define a virtual interface? I think we could have a variable in RDMAConnectedSocketImpl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct - will change
src/msg/async/rdma/RDMAConnTCP.h
Outdated
class RDMAWorker; | ||
class RDMADispatcher; | ||
|
||
class RDMAConnTCP : public RDMAConnectedSocketImpl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we define a RDMAConnMgr to encapsulate management info exchanges
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inherit from RDMAConnectedSocketImpl makes it unclear. Because tcp or rdmacm is only used for management, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack - adding RDMAConnMgr
I did it on purpose - to make the review easier (and rebase if needed). |
Encapsulate all connection establishment stuff in a new class - RDMAConnMgr and make it a friend class of RDMAConnectedSocketImpl. This class will be inherited for every type of connection establishment - Currently only TCP is supported, very soon CM will be added too. RDMAServerConnImpl which only handle connection establishment became an abstract class and RDMAServerConnTCP is inherting it for connections of type TCP. Some of the code was left in its original file and place, and therefore it looks misplaced. This was done to make it easier to review and rebase. Once it is accepted a cleanup patch will be sent to move the code into the right place. Issue: 995322 Change-Id: I8b0e163525ec80c2452f4b6481bf696968cc1e51 Signed-off-by: Amir Vadai <amir@vadai.me>
Setting RDMA_DEBUG to 1, will enable debug prints for every ibv_* function called. This makes it easier to debug RDMA issues. Issue: 995322 Change-Id: I49d327d5d4386b44264f5619d50f2dbc7d76dcae Signed-off-by: Amir Vadai <amir@vadai.me>
pushed new patchset and changed PR name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor!
@Adirl and @yuyuyu101 this breaks master, see https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/6048//consoleFull
|
and the jenkins "make check" of this PR failed also, see https://jenkins.ceph.com/job/ceph-pull-requests/21036/ |
#14245 is posted to revert the offending commit. |
@Adirl plz proposal again |
Checking and push new PR |
@yuyuyu101 , We'll send a new PR soon... |
No description provided.