net/mlx5: fix hairpin queue unbind
[ upstream commit ab2439f80bdf94e2382efe941cf827da6710b5d7 ]

Consider an application with the following configuration:

- It uses 2 ports.
- Each port has 3 Rx queues and 3 Tx queues.
- On each port, the Rx queues have the following purposes:
  - Rx queue 0 - SW queue,
  - Rx queue 1 - hairpin queue, bound to a Tx queue on the same port,
  - Rx queue 2 - hairpin queue, bound to a Tx queue on the other port.
- On each port, the Tx queues have the following purposes:
  - Tx queue 0 - SW queue,
  - Tx queue 1 - hairpin queue, bound to an Rx queue on the same port,
  - Tx queue 2 - hairpin queue, bound to an Rx queue on the other port.
- The application configured all hairpin queues for manual binding.
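To make the configuration concrete, the Tx side of each port can be sketched as a small table. This is a minimal, hypothetical model: `struct txq_model`, `setup_port_txqs()`, `count_hairpin_peers()` and the `QUEUE_*` kinds are illustrative names, not DPDK or mlx5 types, and the sketch assumes that the cross-port hairpin queue (Tx queue 2) peers with Rx queue 2 on the other port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue kinds; the real PMD tracks this in its own Tx queue control structure. */
enum queue_kind { QUEUE_SW, QUEUE_HAIRPIN };

/* Hypothetical model of one Tx queue and its hairpin peer (not an mlx5 type). */
struct txq_model {
	enum queue_kind kind;
	uint16_t peer_port;  /* Rx port of the hairpin peer (unused for SW queues) */
	uint16_t peer_queue; /* Rx queue of the hairpin peer (unused for SW queues) */
};

#define NB_PORTS 2
#define NB_TXQ 3

/* Fill the Tx queue table of one port following the layout in the list above. */
static void setup_port_txqs(uint16_t port, struct txq_model txq[NB_TXQ])
{
	assert(port < NB_PORTS);
	/* With exactly two ports, "the other port" is simply the remaining one. */
	uint16_t other = (uint16_t)(NB_PORTS - 1 - port);

	txq[0] = (struct txq_model){ QUEUE_SW, 0, 0 };          /* SW queue */
	txq[1] = (struct txq_model){ QUEUE_HAIRPIN, port, 1 };  /* same-port hairpin */
	txq[2] = (struct txq_model){ QUEUE_HAIRPIN, other, 2 }; /* cross-port hairpin, peer index assumed */
}

/* Count the Tx queues of one port whose hairpin peer lives on rx_port. */
static unsigned count_hairpin_peers(const struct txq_model txq[NB_TXQ], uint16_t rx_port)
{
	unsigned n = 0;

	for (unsigned i = 0; i < NB_TXQ; i++)
		if (txq[i].kind == QUEUE_HAIRPIN && txq[i].peer_port == rx_port)
			n++;
	return n;
}
```

In this model, port 0's Tx queue 1 peers with port 0 and its Tx queue 2 with port 1, so each Tx port holds hairpin queues towards two different Rx ports.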

After the ports are configured and the queues are set up,
if the application makes the following sequence of API calls:

1. rte_eth_dev_start(port_id=0)
2. rte_eth_hairpin_bind(tx_port=0, rx_port=0)
3. rte_eth_hairpin_bind(tx_port=0, rx_port=1)

the mlx5 PMD fails to modify the SQ and logs this error:

  mlx5_common: mlx5_devx_cmds.c:2079: mlx5_devx_cmd_modify_sq():
    Failed to modify SQ using DevX

This error is caused by an incorrect unbind operation performed during
error handling inside call (3).

Call (3) fails because port 1 (the Rx side of the hairpin) was not started.
As a result of this failure, the PMD enters its error handling path, where
all previously bound hairpin queues are unbound.
This is incorrect, because this error handling procedure in the mlx5
implementation of rte_eth_hairpin_bind() assumes that all hairpin queues
are bound to the same rx_port, which is not the case here.
Consequently, the following sequence of function calls is made:

- rte_eth_hairpin_queue_peer_unbind(rx_port=**1**, rx_queue=1, 0),
- mlx5_hairpin_queue_peer_unbind(dev=**port 0**, tx_queue=1, 1).

This violates the hairpin queue destroy flow by unbinding Tx queue 1
on port 0 before unbinding Rx queue 1 on port 1.
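The faulty rollback can be reproduced with a small model. This sketch is hypothetical: the types and `rollback_unfiltered()` are illustrative names, the function records the calls instead of issuing `rte_eth_hairpin_queue_peer_unbind()` and `mlx5_hairpin_queue_peer_unbind()`, and it assumes that the rollback walks back from the Tx queue that failed down to Tx queue 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; not the mlx5 PMD structures. */
enum queue_kind { QUEUE_SW, QUEUE_HAIRPIN };

struct txq_model {
	enum queue_kind kind;
	uint16_t peer_port;  /* Rx port of the hairpin peer */
	uint16_t peer_queue; /* Rx queue of the hairpin peer */
};

/* One recorded rollback step: the Rx-side and Tx-side unbind it would trigger. */
struct unbind_call {
	uint16_t rx_port;  /* port passed to the Rx-side unbind */
	uint16_t rx_queue; /* queue passed to the Rx-side unbind */
	uint16_t tx_queue; /* local Tx queue passed to the Tx-side unbind */
};

/*
 * Pre-fix behavior: every Tx queue from failed_idx down to 0 (assumed order)
 * is rolled back against rx_port, whatever its type and actual peer port.
 * Returns the number of recorded steps written to out.
 */
static size_t rollback_unfiltered(const struct txq_model *txq, int failed_idx,
				  uint16_t rx_port, struct unbind_call *out)
{
	size_t n = 0;

	assert(failed_idx >= 0);
	for (int i = failed_idx; i >= 0; i--)
		out[n++] = (struct unbind_call){ rx_port, txq[i].peer_queue, (uint16_t)i };
	return n;
}
```

With port 0's table and a failure on Tx queue 2 while binding to rx_port=1, the model also records the step (rx_port=1, rx_queue=1, tx_queue=1), matching the call pair listed above, even though Tx queue 1 is peered with port 0. The SW Tx queue 0 is touched as well.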

This patch fixes that behavior by filtering the Tx queues on which error
handling is performed, so that it affects only:

- hairpin queues (this also reduces unnecessary debug log messages),
- hairpin queues connected to the rx_port currently being processed.
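The filter added by this patch can be sketched on a hypothetical model of the Tx queues. The types and `rollback_filtered()` are illustrative names, the mlx5 unbind calls are recorded rather than issued, and the walk from the failing index down to 0 is an assumption; the skip condition itself mirrors the one in the diff (not a hairpin queue, or peer port different from rx_port).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; not the mlx5 PMD structures. */
enum queue_kind { QUEUE_SW, QUEUE_HAIRPIN };

struct txq_model {
	enum queue_kind kind;
	uint16_t peer_port;  /* Rx port of the hairpin peer */
	uint16_t peer_queue; /* Rx queue of the hairpin peer */
};

/* One recorded rollback step: the Rx-side and Tx-side unbind it would trigger. */
struct unbind_call {
	uint16_t rx_port;
	uint16_t rx_queue;
	uint16_t tx_queue;
};

/*
 * Post-fix behavior: skip any Tx queue that is not a hairpin queue or whose
 * hairpin peer is not on rx_port, and roll back only the remaining ones.
 * The walk from failed_idx down to 0 is an assumption of this model.
 * Returns the number of recorded steps written to out.
 */
static size_t rollback_filtered(const struct txq_model *txq, int failed_idx,
				uint16_t rx_port, struct unbind_call *out)
{
	size_t n = 0;

	assert(failed_idx >= 0);
	for (int i = failed_idx; i >= 0; i--) {
		if (txq[i].kind != QUEUE_HAIRPIN || txq[i].peer_port != rx_port)
			continue; /* the real code also releases the queue reference here */
		out[n++] = (struct unbind_call){ rx_port, txq[i].peer_queue, (uint16_t)i };
	}
	return n;
}
```

On port 0's table with rx_port=1, only Tx queue 2 is rolled back; Tx queue 1, still bound to port 0 by call (2), and the SW queue 0 are left untouched.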

Fixes: 37cd450 ("net/mlx5: support two ports hairpin mode")

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
sodar authored and bluca committed Nov 15, 2023
1 parent 0325a1c commit 7243435
1 changed file with 5 additions and 0 deletions: drivers/net/mlx5/mlx5_trigger.c
@@ -820,6 +820,11 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 		txq_ctrl = mlx5_txq_get(dev, i);
 		if (txq_ctrl == NULL)
 			continue;
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN ||
+		    txq_ctrl->hairpin_conf.peers[0].port != rx_port) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
 		rx_queue = txq_ctrl->hairpin_conf.peers[0].queue;
 		rte_eth_hairpin_queue_peer_unbind(rx_port, rx_queue, 0);
 		mlx5_hairpin_queue_peer_unbind(dev, i, 1);