msg/async/rdma: compile with rdma as default #13901
Issue: 992583
Issue: 992580
Change-Id: I24128b87294d3083c44c934f7d4bed554dd1f8a4
Signed-off-by: DanielBar-On <danielbo@mellanox.com>
@liewegas Thanks.
It's exciting to see RDMA built in.
Thank you.
Hi, do we need to change the systemd config for RDMA, as in #13305, after this change? Thanks.
If you want to use systemctl or ceph-deploy to bring up and control your cluster, then the answer is yes.
Hi, I tried to use this version of Ceph with RDMA and encountered some problems. After I set up the
The following are my notes; I hope they will be useful.
How I installed Ceph
What problems I met
What I did about those problems
void Device::binding_port(CephContext *cct, uint8_t port_num) {
---> port_cnt = device_attr->phys_port_cnt;
  for (uint8_t i = 0; i < port_cnt; ++i) {
    Port *port = new Port(cct, ctxt, i+1);
    ...
You can see the full gdb content here. Thanks.
@Adirl could we deploy RDMA via ceph-deploy? I guess this may be related to device privileges.
Instead of ens4, try putting the RDMA device name in ceph.conf. Here is an example of how to find it for ConnectX-4:
If you have OFED installed, you can run: ibdev2netdev
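If ibdev2netdev is not available, the same mapping can usually be read from sysfs. The following is only a minimal sketch, assuming the kernel exposes each RDMA device under /sys/class/infiniband/<device>/device/net/; the function name list_rdma_netdevs is hypothetical and is not part of Ceph or OFED.

```shell
#!/bin/sh
# Sketch: print "<rdma_device> <net_interface>" pairs by walking sysfs.
# list_rdma_netdevs is a hypothetical helper name, not a Ceph or OFED tool.
# Assumption: RDMA devices appear under /sys/class/infiniband and their
# network interfaces under <device>/device/net/.
list_rdma_netdevs() {
    root="${1:-/sys/class/infiniband}"
    for dev in "$root"/*; do
        # Skip entries without an associated net directory
        # (this also covers the case where the glob matched nothing).
        [ -d "$dev/device/net" ] || continue
        for netdev in "$dev"/device/net/*; do
            [ -e "$netdev" ] || continue
            echo "$(basename "$dev") $(basename "$netdev")"
        done
    done
}

list_rdma_netdevs "$@"
```

The first column is the kind of name that goes into ms_async_rdma_device_name (mlx4_0 later in this thread); the second column is the Ethernet interface name, such as ens4.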
Thanks for your help; I will try replacing the device name.
Hi, thanks for your help, but I still have some problems. After changing ms_async_rdma_device_name to mlx4_0, I can successfully run the command rados lspools
For command:
hwchiu@ceph-1:~/cluster$ sudo rados lspools
rbd
Segmentation fault (core dumped)
The gdb content is here (https://gist.github.com/hwchiu/eadc75c6582588db3a4a8f1faf70f70a); it crashes after the ceph rados osd tree
For command:
/build/ceph-12.0.0-1287-g6f5e6e98/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 7ff4395eb700 time 2017-03-15 15:24:33.581693
/build/ceph-12.0.0-1287-g6f5e6e98/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 218: FAILED assert(!r)
2017-03-15 15:24:33.581678 7ff4395eb700 -1 RDMAConnectedSocketImpl activate failed to transition to RTR state: (22) Invalid argument
2017-03-15 15:24:33.581863 7ff438dea700 -1 RDMAConnectedSocketImpl activate failed to transition to RTR state: (22) Invalid argument
/build/ceph-12.0.0-1287-g6f5e6e98/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 7ff438dea700 time 2017-03-15 15:24:33.581878
/build/ceph-12.0.0-1287-g6f5e6e98/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 218: FAILED assert(!r)
ceph version 12.0.0-1287-g6f5e6e98 (6f5e6e984e19e5ad33552fbb65033c91dbb7a36c)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7ff454a3bcb2]
2: (RDMAConnectedSocketImpl::handle_connection()+0xb4a) [0x7ff454b8536a]
3: (EventCenter::process_events(int)+0x9b1) [0x7ff454b6d5d1]
4: (()+0x3ba091) [0x7ff454b72091]
5: (()+0xb8c80) [0x7ff4542e6c80]
6: (()+0x76ba) [0x7ff4663a16ba]
7: (clone()+0x6d) [0x7ff4660d782d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted (core dumped)
I tried installing different OFED versions, but both MLNX_OFED_LINUX-3.4-2.0.0.0 and MLNX_OFED_LINUX-4.0-1.0.1.0 still have the problem. The ceph.conf is the same as the one I posted before; only ms_async_rdma_device_name changed to mlx4_0. Thanks again for your help.
Please ignore the above message. Thanks.