Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msg/async/rdma: Introduce RDMAConnMgr + Debug prints #14201

Merged
merged 2 commits into from Mar 30, 2017

Conversation

Adirl
Copy link

@Adirl Adirl commented Mar 28, 2017

No description provided.

@Adirl
Copy link
Author

Adirl commented Mar 28, 2017

@yuyuyu101 @amirv @orendu @saritz @DanielBar-On

patchset number 4
Introduce RDMAConnTCP + Debug prints

notify_fd = dispatcher->register_qp(qp, this);
dispatcher->perf_logger->inc(l_msgr_rdma_created_queue_pair);
dispatcher->perf_logger->inc(l_msgr_rdma_active_queue_pair);
}

RDMAConnTCP::~RDMAConnTCP()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be RDMAConnTCP.cc

@@ -88,7 +103,7 @@ void RDMAConnectedSocketImpl::get_wc(std::vector<ibv_wc> &w)
w.swap(wc);
}

int RDMAConnectedSocketImpl::activate()
int RDMAConnTCP::activate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitto

submit(false);
}
active = true;

return 0;
}

int RDMAConnectedSocketImpl::try_connect(const entity_addr_t& peer_addr, const SocketOptions &opts) {
int RDMAConnTCP::try_connect(const entity_addr_t& peer_addr, const SocketOptions &opts) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitto

void RDMAConnectedSocketImpl::handle_connection() {
ldout(cct, 20) << __func__ << " QP: " << my_msg.qpn << " tcp_fd: " << tcp_fd << " fd: " << notify_fd << dendl;
int r = infiniband->recv_msg(cct, tcp_fd, peer_msg);
void RDMAConnTCP::handle_connection() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitto

@@ -575,7 +590,7 @@ void RDMAConnectedSocketImpl::fin() {
}
}

void RDMAConnectedSocketImpl::cleanup() {
void RDMAConnTCP::cleanup() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitto

class RDMAConnectedSocketImpl : public ConnectedSocketImpl {
protected:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one space before "protected"

};

class RDMAServerSocketImpl : public ServerSocketImpl {
protected:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one space

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@@ -34,7 +40,7 @@ RDMAServerSocketImpl::RDMAServerSocketImpl(CephContext *cct, Infiniband* i, RDMA
ibdev->init(ibport);
}

int RDMAServerSocketImpl::listen(entity_addr_t &sa, const SocketOptions &opt)
int RDMAServerConnTCP::listen(entity_addr_t &sa, const SocketOptions &opt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to tcp .cc

virtual void cleanup() = 0;
virtual int try_connect(const entity_addr_t&, const SocketOptions &opt) = 0;
virtual ssize_t submit(bool more);
virtual int remote_qpn() = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this worth to define a virtual interface? I think we could have a variable in RDMAConnectedSocketImpl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct - will change

class RDMAWorker;
class RDMADispatcher;

class RDMAConnTCP : public RDMAConnectedSocketImpl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we define a RDMAConnMgr to encapsulate management info exchanges

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inherit from RDMAConnectedSocketImpl makes it unclear. Because tcp or rdmacm is only used for management, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack - adding RDMAConnMgr

@amirv
Copy link
Contributor

amirv commented Mar 29, 2017

this should be RDMAConnTCP.cc

I did it on purpose - to make the review easier (and rebase if needed).
If you don't mind, I prefer leaving this patch as is, and once it is merged I will send a cleanup commit to move code into the right files.

Encapsulate all connection establishment stuff in a new class -
RDMAConnMgr and make it a friend class of RDMAConnectedSocketImpl.
This class will be inherited for every type of connection establishment
- Currently only TCP is supported, very soon CM will be added too.

RDMAServerConnImpl which only handle connection establishment became an
abstract class and RDMAServerConnTCP is inherting it for connections of
type TCP.

Some of the code was left in its original file and place, and therefore
it looks misplaced. This was done to make it easier to review and rebase.
Once it is accepted a cleanup patch will be sent to move the code into
the right place.

Issue: 995322
Change-Id: I8b0e163525ec80c2452f4b6481bf696968cc1e51
Signed-off-by: Amir Vadai <amir@vadai.me>
Setting RDMA_DEBUG to 1, will enable debug prints for every ibv_*
function called.
This makes it easier to debug RDMA issues.

Issue: 995322
Change-Id: I49d327d5d4386b44264f5619d50f2dbc7d76dcae
Signed-off-by: Amir Vadai <amir@vadai.me>
@Adirl Adirl changed the title msg/async/rdma: Introduce RDMAConnTCP + Debug prints msg/async/rdma: Introduce RDMAConnMgr + Debug prints Mar 29, 2017
@Adirl
Copy link
Author

Adirl commented Mar 29, 2017

pushed new patchset and changed PR name
tested on vstart + traffic

Copy link
Member

@yuyuyu101 yuyuyu101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor!

@yuyuyu101 yuyuyu101 merged commit 4100803 into ceph:master Mar 30, 2017
@tchaikov
Copy link
Contributor

@Adirl and @yuyuyu101 this breaks master, see https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/6048//consoleFull

[ 38%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/net_handler.cc.o
In file included from /mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-1939-g4100803/rpm/el7/BUILD/ceph-12.0.0-1939-g4100803/src/msg/async/rdma/RDMAStack.h:30:0,
                 from /mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-1939-g4100803/rpm/el7/BUILD/ceph-12.0.0-1939-g4100803/src/msg/async/Stack.cc:22:
/mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-1939-g4100803/rpm/el7/BUILD/ceph-12.0.0-1939-g4100803/src/msg/async/rdma/Infiniband.h:21:27: fatal error: rdma/rdma_cma.h: No such file or directory
 #include <rdma/rdma_cma.h>
                           ^
compilation terminated.

@tchaikov
Copy link
Contributor

and the jenkins "make check" of this PR failed also, see https://jenkins.ceph.com/job/ceph-pull-requests/21036/

@tchaikov
Copy link
Contributor

#14245 is posted to revert the offending commit.

@yuyuyu101
Copy link
Member

@Adirl plz proposal again

@Adirl
Copy link
Author

Adirl commented Mar 30, 2017

Checking and push new PR
Sorry

@amirv
Copy link
Contributor

amirv commented Mar 30, 2017

@yuyuyu101 ,
Sorry, an unused 'include "rdma_cma.h"' slipped into the patch.

We'll send a new PR soon...

@Adirl @orendu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants