Questions about QPs and communication #5
Comments
Hi, I think this is related to how to use RDMA. I strongly suggest reading this document first: http://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf. In a short reply:
Thanks.
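Thank you for your answer. My RDMA network is RoCE, and I adjusted the source code for it. During an RDMA write, the poll stage reports the error "got bad completion with status: 0xc, vendor syndrome: 0x81, with error transport retry counter exceeded, qp n:1 t:0". Have you run into this problem, or do you know how to solve it?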
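For reference, the numeric status in that log line can be mapped to its libibverbs name. The table below is a hand-copied sketch of the first entries of the `ibv_wc_status` enum in `<infiniband/verbs.h>`, and the helper name `wc_status_name` is hypothetical. In C, the library's own `ibv_wc_status_str()` serves the same purpose.

```python
# First entries of enum ibv_wc_status, in declaration order,
# so the list index equals the numeric status value.
IBV_WC_STATUS_NAMES = [
    "IBV_WC_SUCCESS",
    "IBV_WC_LOC_LEN_ERR",
    "IBV_WC_LOC_QP_OP_ERR",
    "IBV_WC_LOC_EEC_OP_ERR",
    "IBV_WC_LOC_PROT_ERR",
    "IBV_WC_WR_FLUSH_ERR",
    "IBV_WC_MW_BIND_ERR",
    "IBV_WC_BAD_RESP_ERR",
    "IBV_WC_LOC_ACCESS_ERR",
    "IBV_WC_REM_INV_REQ_ERR",
    "IBV_WC_REM_ACCESS_ERR",
    "IBV_WC_REM_OP_ERR",
    "IBV_WC_RETRY_EXC_ERR",
    "IBV_WC_RNR_RETRY_EXC_ERR",
]


def wc_status_name(status: int) -> str:
    """Return the enum name for a work-completion status code (hypothetical helper)."""
    if 0 <= status < len(IBV_WC_STATUS_NAMES):
        return IBV_WC_STATUS_NAMES[status]
    return f"unknown status {status:#x}"


print(wc_status_name(0xC))
```

Status 0xc is `IBV_WC_RETRY_EXC_ERR`: the requester gave up after retrying without an acknowledgement from the peer. That often points at the addressing information given to the QP rather than at the write itself.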
Hi,
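Wukong's connection code does not currently support RoCE. We plan to add RoCE support to mainstream in about a month.
Establishing a QP on a RoCE network requires extra information, so it needs fairly large changes to the current code structure.
If you need it urgently, you can refer to the connection method in our new RDMA library:
https://ipads.se.sjtu.edu.cn:1312/weixd/libRDMA
The new code supports RoCE. We will add RoCE support to wukong as soon as possible.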
Best,
XingDa Wei
The institute of parallel and distributed systems,
Shanghai Jiao Tong University
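As a rough illustration of the "extra information": on RoCE, peers address each other by GID rather than by LID, so each side needs the other's GID in addition to the QP number. Assuming RoCE v2 over IPv4, the GID is the IPv4-mapped IPv6 form of the interface address. The helper name `ipv4_to_rocev2_gid` and the sample address are hypothetical, and this is not code from wukong or libRDMA.

```python
import ipaddress


def ipv4_to_rocev2_gid(ipv4: str) -> bytes:
    """Build the 16-byte GID for an IPv4 address (assumes RoCE v2, hypothetical helper)."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # IPv4-mapped IPv6 address: ::ffff:a.b.c.d
    mapped = ipaddress.IPv6Address((0xFFFF << 32) | v4)
    return mapped.packed


gid = ipv4_to_rocev2_gid("192.168.1.10")  # sample address, an assumption
print(gid.hex())
```

With libibverbs, the remote side's 16 bytes would typically go into `ah_attr.grh.dgid`, with `ah_attr.is_global` set to 1 and `ah_attr.grh.sgid_index` selecting the local GID, when the QP is moved to RTR.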
On September 25, 2018, at 11:16 AM, PrincessLiu wrote:
My RDMA network is RoCE, and I adjusted the source code for it. During an RDMA write, the poll stage reports the error "got bad completion with status: 0xc, vendor syndrome: 0x81, with error transport retry counter exceeded, qp n:1 t:0". Have you run into this problem, or do you know how to solve it?
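PS: With a status of 0xc, you basically cannot get much other information.
Could you also post the output of ibstatus?
PS: Could you use a tool such as ib_send_bw to test whether the network connection works?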
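I took a look. You rewrote this module with rdma_cm, is that right? I am still working from your original libib and added the GID to it. Since the code runs as far as the poll, does that mean the connection has already been established? A colleague who works on networking suggested capturing packets to check whether the data looks abnormal. What do you mean by the ibstatus output? The wc status is IBV_WC_RETRY_EXC_ERR.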
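Yes. If it is convenient, I suggest you use our new lib, or wait about a month until I port it into wukong (I do not have time to work on this right now).
I think your RoCE changes may not be correct. Our new lib is still based on libibverbs and was not rewritten with rdma cm. The new lib has been fully tested on RoCE networks.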
On September 26, 2018, at 9:15 AM, PrincessLiu wrote:
The output of ibstatus and ib_send_bw is shown below. At a glance, nothing looks wrong:
<https://user-images.githubusercontent.com/19248708/46051547-a265a800-c16c-11e8-8db6-88b3bad38330.png>
<https://user-images.githubusercontent.com/19248708/46051548-a265a800-c16c-11e8-8768-252fbf3d901e.png>
Yes, I also realized that I misread it yesterday. I will just use your new lib then. Thank you!
Please open a new issue.
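2. Does the get_local_qp_attr function, using the qid it obtains, return the qp_attr of the sender or of the receiver?
3. Is the connection between devices established in init2rtr, or in connect before change_qp_states?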