Commit 445fd4f

Bob Pearson authored and jgunthorpe committed
RDMA/rxe: Fix rnr retry behavior
Currently, in the completer tasklet, the same flag (qp->req.need_retry) is set whether the retransmit timer or the rnr timer is involved, so that when either timer fires a retry flow is attempted on the send queue. As a result, an RNR NAK may be answered at the first retransmit timer event, which may be sooner than the requested rnr timeout.

This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set, prevents a retry flow until the rnr nak timer fires.

This patch fixes rnr retry errors which can be observed by running the pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this patch applied they do not occur.

Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/
Fixes: 8700e3e ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
1 parent 930119a commit 445fd4f

4 files changed, +22 -3 lines changed


drivers/infiniband/sw/rxe/rxe_comp.c

Lines changed: 7 additions & 1 deletion
@@ -114,6 +114,8 @@ void retransmit_timer(struct timer_list *t)
 {
 	struct rxe_qp *qp = from_timer(qp, t, retrans_timer);
 
+	pr_debug("%s: fired for qp#%d\n", __func__, qp->elem.index);
+
 	if (qp->valid) {
 		qp->comp.timeout = 1;
 		rxe_run_task(&qp->comp.task, 1);
@@ -730,11 +732,15 @@ int rxe_completer(void *arg)
 			break;
 
 		case COMPST_RNR_RETRY:
+			/* we come here if we received an RNR NAK */
 			if (qp->comp.rnr_retry > 0) {
 				if (qp->comp.rnr_retry != 7)
 					qp->comp.rnr_retry--;
 
-				qp->req.need_retry = 1;
+				/* don't start a retry flow until the
+				 * rnr timer has fired
+				 */
+				qp->req.wait_for_rnr_timer = 1;
 				pr_debug("qp#%d set rnr nak timer\n",
 					 qp_num(qp));
 				mod_timer(&qp->rnr_nak_timer,
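
The counter check in the COMPST_RNR_RETRY case can be read in isolation. The following is a minimal user-space sketch, not code from the driver: `model_consume_rnr_retry` and `RNR_RETRY_INFINITE` are hypothetical names, and treating the value 7 as "retry indefinitely" is an assumption based on the usual InfiniBand convention for the RNR retry count, which the hunk itself only shows as the `!= 7` test.

```c
/* Assumed meaning of the literal 7 in the hunk above: never decrement. */
#define RNR_RETRY_INFINITE 7

/*
 * Hypothetical helper mirroring the counter logic in COMPST_RNR_RETRY.
 * Returns 1 if another RNR retry is allowed (decrementing the counter
 * unless it holds the "infinite" value), or 0 if retries are exhausted.
 */
static int model_consume_rnr_retry(int *rnr_retry)
{
	if (*rnr_retry > 0) {
		if (*rnr_retry != RNR_RETRY_INFINITE)
			(*rnr_retry)--;
		return 1;
	}
	return 0;
}
```

In this sketch a counter starting at 2 allows two retries and then reports exhaustion, while a counter of 7 is never decremented.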

drivers/infiniband/sw/rxe/rxe_qp.c

Lines changed: 1 addition & 0 deletions
@@ -505,6 +505,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 	atomic_set(&qp->ssn, 0);
 	qp->req.opcode = -1;
 	qp->req.need_retry = 0;
+	qp->req.wait_for_rnr_timer = 0;
 	qp->req.noack_pkts = 0;
 	qp->resp.msn = 0;
 	qp->resp.opcode = -1;

drivers/infiniband/sw/rxe/rxe_req.c

Lines changed: 13 additions & 2 deletions
@@ -100,7 +100,11 @@ void rnr_nak_timer(struct timer_list *t)
 {
 	struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
 
-	pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
+	pr_debug("%s: fired for qp#%d\n", __func__, qp_num(qp));
+
+	/* request a send queue retry */
+	qp->req.need_retry = 1;
+	qp->req.wait_for_rnr_timer = 0;
 	rxe_run_task(&qp->req.task, 1);
 }
 
@@ -641,10 +645,17 @@ int rxe_requester(void *arg)
 		qp->req.need_rd_atomic = 0;
 		qp->req.wait_psn = 0;
 		qp->req.need_retry = 0;
+		qp->req.wait_for_rnr_timer = 0;
 		goto exit;
 	}
 
-	if (unlikely(qp->req.need_retry)) {
+	/* we come here if the retransmot timer has fired
+	 * or if the rnr timer has fired. If the retransmit
+	 * timer fires while we are processing an RNR NAK wait
+	 * until the rnr timer has fired before starting the
+	 * retry flow
+	 */
+	if (unlikely(qp->req.need_retry && !qp->req.wait_for_rnr_timer)) {
 		req_retry(qp);
 		qp->req.need_retry = 0;
 	}
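
To make the interaction of the two flags easier to follow, here is a small user-space model of the gating described in the commit message. It is only a sketch: `struct req_model` and the `model_*` functions are hypothetical stand-ins, not the rxe structures or tasklets, and the retransmit path is assumed to end in `need_retry` being set, as the commit message describes.

```c
#include <stdbool.h>

/* Hypothetical model of the two requester flags; not struct rxe_req_info. */
struct req_model {
	bool need_retry;         /* a retry flow has been requested */
	bool wait_for_rnr_timer; /* an RNR NAK is pending; hold off retries */
	int retry_flows;         /* number of retry flows actually started */
};

/* Completer handled an RNR NAK: block retries until the rnr timer fires. */
static void model_rnr_nak(struct req_model *r)
{
	r->wait_for_rnr_timer = true;
}

/* Retransmit timer path (assumed): a retry is requested. */
static void model_retransmit_timeout(struct req_model *r)
{
	r->need_retry = true;
}

/* rnr nak timer fired: request a retry and lift the block. */
static void model_rnr_timer_fired(struct req_model *r)
{
	r->need_retry = true;
	r->wait_for_rnr_timer = false;
}

/* One requester pass: start a retry flow only when it is not blocked. */
static void model_requester(struct req_model *r)
{
	if (r->need_retry && !r->wait_for_rnr_timer) {
		r->retry_flows++;
		r->need_retry = false;
	}
}
```

In this model, calling `model_rnr_nak`, then `model_retransmit_timeout`, then `model_requester` starts no retry flow; the retry only begins on the requester pass that follows `model_rnr_timer_fired`.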

drivers/infiniband/sw/rxe/rxe_verbs.h

Lines changed: 1 addition & 0 deletions
@@ -123,6 +123,7 @@ struct rxe_req_info {
 	int need_rd_atomic;
 	int wait_psn;
 	int need_retry;
+	int wait_for_rnr_timer;
 	int noack_pkts;
 	struct rxe_task task;
 };
