New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msg/async/rdma: Introduce RDMAConnMgr + Debug prints #14201

Merged
merged 2 commits into from Mar 30, 2017

Conversation

Projects
None yet
6 participants
@Adirl

Adirl commented Mar 28, 2017

No description provided.

@Adirl

This comment has been minimized.

Adirl commented Mar 28, 2017

@yuyuyu101 @amirv @orendu @saritz @DanielBar-On

patchset number 4
Introduce RDMAConnTCP + Debug prints

notify_fd = dispatcher->register_qp(qp, this);
dispatcher->perf_logger->inc(l_msgr_rdma_created_queue_pair);
dispatcher->perf_logger->inc(l_msgr_rdma_active_queue_pair);
}
RDMAConnTCP::~RDMAConnTCP()

This comment has been minimized.

@yuyuyu101

yuyuyu101 Mar 28, 2017

Member

this should be RDMAConnTCP.cc

@@ -88,7 +103,7 @@ void RDMAConnectedSocketImpl::get_wc(std::vector<ibv_wc> &w)
w.swap(wc);
}
int RDMAConnectedSocketImpl::activate()
int RDMAConnTCP::activate()

This comment has been minimized.

@yuyuyu101
submit(false);
}
active = true;
return 0;
}
int RDMAConnectedSocketImpl::try_connect(const entity_addr_t& peer_addr, const SocketOptions &opts) {
int RDMAConnTCP::try_connect(const entity_addr_t& peer_addr, const SocketOptions &opts) {

This comment has been minimized.

@yuyuyu101
void RDMAConnectedSocketImpl::handle_connection() {
ldout(cct, 20) << __func__ << " QP: " << my_msg.qpn << " tcp_fd: " << tcp_fd << " fd: " << notify_fd << dendl;
int r = infiniband->recv_msg(cct, tcp_fd, peer_msg);
void RDMAConnTCP::handle_connection() {

This comment has been minimized.

@yuyuyu101
@@ -575,7 +590,7 @@ void RDMAConnectedSocketImpl::fin() {
}
}
void RDMAConnectedSocketImpl::cleanup() {
void RDMAConnTCP::cleanup() {

This comment has been minimized.

@yuyuyu101
class RDMAConnectedSocketImpl : public ConnectedSocketImpl {
protected:

This comment has been minimized.

@yuyuyu101

yuyuyu101 Mar 28, 2017

Member

one space before "protected"

};
class RDMAServerSocketImpl : public ServerSocketImpl {
protected:

This comment has been minimized.

@yuyuyu101

yuyuyu101 Mar 28, 2017

Member

one space

This comment has been minimized.

@amirv

amirv Mar 29, 2017

Contributor

ack

@@ -34,7 +40,7 @@ RDMAServerSocketImpl::RDMAServerSocketImpl(CephContext *cct, Infiniband* i, RDMA
ibdev->init(ibport);
}
int RDMAServerSocketImpl::listen(entity_addr_t &sa, const SocketOptions &opt)
int RDMAServerConnTCP::listen(entity_addr_t &sa, const SocketOptions &opt)

This comment has been minimized.

@yuyuyu101

yuyuyu101 Mar 28, 2017

Member

move to tcp .cc

virtual void cleanup() = 0;
virtual int try_connect(const entity_addr_t&, const SocketOptions &opt) = 0;
virtual ssize_t submit(bool more);
virtual int remote_qpn() = 0;

This comment has been minimized.

@yuyuyu101

yuyuyu101 Mar 28, 2017

Member

is this worth to define a virtual interface? I think we could have a variable in RDMAConnectedSocketImpl

This comment has been minimized.

@amirv

amirv Mar 29, 2017

Contributor

Correct - will change

class RDMAWorker;
class RDMADispatcher;
class RDMAConnTCP : public RDMAConnectedSocketImpl {

This comment has been minimized.

@yuyuyu101

yuyuyu101 Mar 28, 2017

Member

Maybe we define a RDMAConnMgr to encapsulate management info exchanges

This comment has been minimized.

@yuyuyu101

yuyuyu101 Mar 28, 2017

Member

Inherit from RDMAConnectedSocketImpl makes it unclear. Because tcp or rdmacm is only used for management, right?

This comment has been minimized.

@amirv

amirv Mar 29, 2017

Contributor

ack - adding RDMAConnMgr

@amirv

This comment has been minimized.

Contributor

amirv commented Mar 29, 2017

this should be RDMAConnTCP.cc

I did it on purpose - to make the review easier (and rebase if needed).
If you don't mind, I prefer leaving this patch as is, and once it is merged I will send a cleanup commit to move code into the right files.

amirv added some commits Mar 22, 2017

msg/async/rdma: Introduce RDMAConnMgr
Encapsulate all connection establishment stuff in a new class -
RDMAConnMgr and make it a friend class of RDMAConnectedSocketImpl.
This class will be inherited for every type of connection establishment
- Currently only TCP is supported, very soon CM will be added too.

RDMAServerConnImpl which only handle connection establishment became an
abstract class and RDMAServerConnTCP is inherting it for connections of
type TCP.

Some of the code was left in its original file and place, and therefore
it looks misplaced. This was done to make it easier to review and rebase.
Once it is accepted a cleanup patch will be sent to move the code into
the right place.

Issue: 995322
Change-Id: I8b0e163525ec80c2452f4b6481bf696968cc1e51
Signed-off-by: Amir Vadai <amir@vadai.me>
msg/async/rdma: Debug prints for ibv_*
Setting RDMA_DEBUG to 1, will enable debug prints for every ibv_*
function called.
This makes it easier to debug RDMA issues.

Issue: 995322
Change-Id: I49d327d5d4386b44264f5619d50f2dbc7d76dcae
Signed-off-by: Amir Vadai <amir@vadai.me>

@Adirl Adirl changed the title from msg/async/rdma: Introduce RDMAConnTCP + Debug prints to msg/async/rdma: Introduce RDMAConnMgr + Debug prints Mar 29, 2017

@Adirl

This comment has been minimized.

Adirl commented Mar 29, 2017

pushed new patchset and changed PR name
tested on vstart + traffic

@yuyuyu101

Nice refactor!

@yuyuyu101 yuyuyu101 merged commit 4100803 into ceph:master Mar 30, 2017

2 of 3 checks passed

default Build finished.
Details
Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Mar 30, 2017

@Adirl and @yuyuyu101 this breaks master, see https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/6048//consoleFull

[ 38%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/net_handler.cc.o
In file included from /mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-1939-g4100803/rpm/el7/BUILD/ceph-12.0.0-1939-g4100803/src/msg/async/rdma/RDMAStack.h:30:0,
                 from /mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-1939-g4100803/rpm/el7/BUILD/ceph-12.0.0-1939-g4100803/src/msg/async/Stack.cc:22:
/mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-1939-g4100803/rpm/el7/BUILD/ceph-12.0.0-1939-g4100803/src/msg/async/rdma/Infiniband.h:21:27: fatal error: rdma/rdma_cma.h: No such file or directory
 #include <rdma/rdma_cma.h>
                           ^
compilation terminated.
@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Mar 30, 2017

and the jenkins "make check" of this PR failed also, see https://jenkins.ceph.com/job/ceph-pull-requests/21036/

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Mar 30, 2017

#14245 is posted to revert the offending commit.

@yuyuyu101

This comment has been minimized.

Member

yuyuyu101 commented Mar 30, 2017

@Adirl plz proposal again

@Adirl

This comment has been minimized.

Adirl commented Mar 30, 2017

Checking and push new PR
Sorry

@amirv

This comment has been minimized.

Contributor

amirv commented Mar 30, 2017

@yuyuyu101 ,
Sorry, an unused 'include "rdma_cma.h"' slipped into the patch.

We'll send a new PR soon...

@Adirl @orendu

@hwchiu

This comment has been minimized.

Hi,
Since the server side will close the event of the tcp_fd (via cleanup) after it receives the ack from client.
Do we have any concern why the client side won't do the same thing as the server after it send the ack successfully.

Thanks.

This comment has been minimized.

Contributor

amirv replied Apr 19, 2017

I guess @yuyuyu101 will give the best answer.
I think that closing immediately after sending an ack is not a good idea because the TCP reset might get before the ack to the server.
Anyway this patch doesn't change anything in this logic.

Amir

This comment has been minimized.

hwchiu replied Apr 19, 2017

Thanks your reply.

Sorry for bothering you, since I don't know where can I ask some questions about the code, I just ask here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment