Merge branch 'Simplify-IPv4-route-offload-API'
Ido Schimmel says:

====================
Simplify IPv4 route offload API

Motivation
==========

The aim of this patch set is to simplify the IPv4 route offload API by
making the stack a bit smarter about the notifications it is generating.
This allows driver authors to focus on programming the underlying device
instead of having to duplicate the IPv4 route insertion logic in their
driver, which is error-prone.

This is the first of four patch sets. Subsequent patch sets will
simplify the IPv6 API, add offload/trap indication to routes and add
tests for all the code paths (including error paths). The full series
is available here [1].

Details
=======

Today, whenever an IPv4 route is added or deleted, a notification is
sent in the FIB notification chain and it is up to offload drivers to
decide whether the route should be programmed to the hardware. This is
not an easy task because hardware keys routes by {prefix, prefix length,
table ID}, whereas the kernel can store multiple such routes that only
differ in metric / TOS / nexthop info.

This series makes sure that only routes that are actually used in the
data path are notified to offload drivers. This greatly simplifies the
work these drivers need to do, as they are now only concerned with
programming the hardware and do not need to replicate the IPv4 route
insertion logic and store multiple identical routes.
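
To illustrate what "only concerned with programming the hardware" can look like, here is a minimal sketch of a driver-side handler that keeps one slot per {prefix, prefix length, table ID}. It is not taken from any driver in this series: the table layout, the names `hw_table`, `hw_key`, `hw_handle_event` and the two-value event enum are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HW_TABLE_SIZE 8	/* hypothetical, tiny on purpose */

/* Hypothetical stand-ins for the replace / delete notifications. */
enum fib_event { ENTRY_REPLACE, ENTRY_DEL };

/* The only key the (hypothetical) hardware understands. */
struct hw_key {
	uint32_t prefix;
	uint8_t prefix_len;
	uint32_t table_id;
};

struct hw_slot {
	bool used;
	struct hw_key key;
	uint32_t nexthop;
};

struct hw_table {
	struct hw_slot slots[HW_TABLE_SIZE];
};

static bool key_equal(const struct hw_key *a, const struct hw_key *b)
{
	return a->prefix == b->prefix && a->prefix_len == b->prefix_len &&
	       a->table_id == b->table_id;
}

static struct hw_slot *hw_find(struct hw_table *t, const struct hw_key *key)
{
	for (int i = 0; i < HW_TABLE_SIZE; i++)
		if (t->slots[i].used && key_equal(&t->slots[i].key, key))
			return &t->slots[i];
	return NULL;
}

/* Returns 0 on success, -1 if the table is full or the key is unknown. */
int hw_handle_event(struct hw_table *t, enum fib_event ev,
		    const struct hw_key *key, uint32_t nexthop)
{
	struct hw_slot *slot;

	assert(t && key);
	slot = hw_find(t, key);

	switch (ev) {
	case ENTRY_REPLACE:
		if (!slot) {
			/* First route for this key: claim a free slot. */
			for (int i = 0; i < HW_TABLE_SIZE && !slot; i++)
				if (!t->slots[i].used)
					slot = &t->slots[i];
			if (!slot)
				return -1;
			slot->used = true;
			slot->key = *key;
		}
		/* Same path for a new key and an existing one: overwrite. */
		slot->nexthop = nexthop;
		return 0;
	case ENTRY_DEL:
		if (!slot)
			return -1;
		slot->used = false;
		return 0;
	}
	return -1;
}
```

Because the stack only notifies the route that is actually used, the sketch never has to order entries by metric or TOS; a replace simply overwrites whatever is stored under the key.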

The route that is notified is the first FIB alias in the FIB node with
the given {prefix, prefix length, table ID}. In case the route is
deleted and there is another route with the same key, a replace
notification is emitted. Otherwise, a delete notification is emitted.
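
The delete rule above can be sketched as follows. This is a simplified model, not the kernel code: `struct alias`, the event names and `event_on_delete` are hypothetical, the list is assumed to hold only aliases that share one {prefix, prefix length, table ID}, and the "no notification when a non-first alias is deleted" branch is an inference from the statement that only the first alias is ever notified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the FIB events seen by offload drivers. */
enum fib_event { EVENT_NONE, EVENT_ENTRY_REPLACE, EVENT_ENTRY_DEL };

/* Hypothetical alias: one kernel route among several with the same key. */
struct alias {
	unsigned int metric;
	struct alias *next;	/* next alias with the same key, or NULL */
};

/*
 * Decide what to tell offload drivers when 'victim' is removed from the
 * alias list that starts at 'first'. On return, *notified points to the
 * route carried by the notification, or is NULL if nothing is sent.
 */
enum fib_event event_on_delete(const struct alias *first,
			       const struct alias *victim,
			       const struct alias **notified)
{
	assert(first && victim && notified);

	*notified = NULL;

	/* Assumption: a route that was never notified needs no event. */
	if (victim != first)
		return EVENT_NONE;

	/* Another route with the same key takes over: replace. */
	if (victim->next) {
		*notified = victim->next;
		return EVENT_ENTRY_REPLACE;
	}

	/* Last route with this key: delete. */
	*notified = victim;
	return EVENT_ENTRY_DEL;
}
```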

The above means that in the case of multiple routes with the same key,
but different TOS, only the route with the highest TOS is notified.
While the kernel can route a packet based on its TOS, this is not
supported by any hardware devices I am familiar with. Moreover, this is
not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
drivers should therefore use the presence of a non-zero TOS as an
indication to trap packets matching the route and let the kernel route
them instead. mlxsw has been doing it for the past two years.
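
A driver following this advice might pick the hardware action along these lines. The sketch is illustrative only: `struct notified_route`, `enum hw_action` and `route_hw_action` are hypothetical names and do not correspond to the mlxsw implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-route hardware actions. */
enum hw_action {
	HW_ACTION_FORWARD,	/* forward matching packets in hardware */
	HW_ACTION_TRAP,		/* send matching packets to the kernel */
};

/* Hypothetical summary of the single route notified for a key. */
struct notified_route {
	uint32_t prefix;
	uint8_t prefix_len;
	uint8_t tos;
};

/* A non-zero TOS means the hardware cannot honour the route as is. */
enum hw_action route_hw_action(const struct notified_route *r)
{
	assert(r);
	return r->tos ? HW_ACTION_TRAP : HW_ACTION_FORWARD;
}
```

Trapping keeps forwarding correct at the cost of sending those packets through the kernel, which then applies its own TOS-aware lookup.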

Testing
=======

To ensure there is no degradation in route insertion rates, I averaged
the insertion rate of 512k routes (/24 and /32) over 50 runs. Did not
observe any degradation.

Functional tests are available here [1]. They rely on route trap
indication, which is only added in the last patch set.

In addition, I have been running syzkaller for the past week with all
four patch sets and debug options enabled. Did not observe any problems.

Patch set overview
==================

Patches #1-#8 gradually introduce the new FIB notifications
Patch #9 converts mlxsw to use the new notifications
Patch #10 converts the remaining listeners and removes the old
notifications

v2:
* Extend fib_find_alias() with another argument instead of introducing a
  new function (David Ahern)

RFC: https://patchwork.ozlabs.org/cover/1170530/

[1] https://github.com/idosch/linux/tree/fib-notifier
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
davem330 committed Dec 17, 2019
2 parents 366c7bb + 446f739 commit 03d51c4
Showing 5 changed files with 104 additions and 165 deletions.
4 changes: 0 additions & 4 deletions drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
@@ -200,8 +200,6 @@ static void mlx5_lag_fib_update(struct work_struct *work)
rtnl_lock();
switch (fib_work->event) {
case FIB_EVENT_ENTRY_REPLACE: /* fall through */
case FIB_EVENT_ENTRY_APPEND: /* fall through */
case FIB_EVENT_ENTRY_ADD: /* fall through */
case FIB_EVENT_ENTRY_DEL:
mlx5_lag_fib_route_event(ldev, fib_work->event,
fib_work->fen_info.fi);
@@ -259,8 +257,6 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,

switch (event) {
case FIB_EVENT_ENTRY_REPLACE: /* fall through */
case FIB_EVENT_ENTRY_APPEND: /* fall through */
case FIB_EVENT_ENTRY_ADD: /* fall through */
case FIB_EVENT_ENTRY_DEL:
fen_info = container_of(info, struct fib_entry_notifier_info,
info);
136 changes: 17 additions & 119 deletions drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -3845,7 +3845,7 @@ static void mlxsw_sp_nexthop4_event(struct mlxsw_sp *mlxsw_sp,

key.fib_nh = fib_nh;
nh = mlxsw_sp_nexthop_lookup(mlxsw_sp, key);
if (WARN_ON_ONCE(!nh))
if (!nh)
return;

switch (event) {
@@ -4780,95 +4780,6 @@ static void mlxsw_sp_fib_node_put(struct mlxsw_sp *mlxsw_sp,
mlxsw_sp_vr_put(mlxsw_sp, vr);
}

static struct mlxsw_sp_fib4_entry *
mlxsw_sp_fib4_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
const struct mlxsw_sp_fib4_entry *new4_entry)
{
struct mlxsw_sp_fib4_entry *fib4_entry;

list_for_each_entry(fib4_entry, &fib_node->entry_list, common.list) {
if (fib4_entry->tb_id > new4_entry->tb_id)
continue;
if (fib4_entry->tb_id != new4_entry->tb_id)
break;
if (fib4_entry->tos > new4_entry->tos)
continue;
if (fib4_entry->prio >= new4_entry->prio ||
fib4_entry->tos < new4_entry->tos)
return fib4_entry;
}

return NULL;
}

static int
mlxsw_sp_fib4_node_list_append(struct mlxsw_sp_fib4_entry *fib4_entry,
struct mlxsw_sp_fib4_entry *new4_entry)
{
struct mlxsw_sp_fib_node *fib_node;

if (WARN_ON(!fib4_entry))
return -EINVAL;

fib_node = fib4_entry->common.fib_node;
list_for_each_entry_from(fib4_entry, &fib_node->entry_list,
common.list) {
if (fib4_entry->tb_id != new4_entry->tb_id ||
fib4_entry->tos != new4_entry->tos ||
fib4_entry->prio != new4_entry->prio)
break;
}

list_add_tail(&new4_entry->common.list, &fib4_entry->common.list);
return 0;
}

static int
mlxsw_sp_fib4_node_list_insert(struct mlxsw_sp_fib4_entry *new4_entry,
bool replace, bool append)
{
struct mlxsw_sp_fib_node *fib_node = new4_entry->common.fib_node;
struct mlxsw_sp_fib4_entry *fib4_entry;

fib4_entry = mlxsw_sp_fib4_node_entry_find(fib_node, new4_entry);

if (append)
return mlxsw_sp_fib4_node_list_append(fib4_entry, new4_entry);
if (replace && WARN_ON(!fib4_entry))
return -EINVAL;

/* Insert new entry before replaced one, so that we can later
* remove the second.
*/
if (fib4_entry) {
list_add_tail(&new4_entry->common.list,
&fib4_entry->common.list);
} else {
struct mlxsw_sp_fib4_entry *last;

list_for_each_entry(last, &fib_node->entry_list, common.list) {
if (new4_entry->tb_id > last->tb_id)
break;
fib4_entry = last;
}

if (fib4_entry)
list_add(&new4_entry->common.list,
&fib4_entry->common.list);
else
list_add(&new4_entry->common.list,
&fib_node->entry_list);
}

return 0;
}

static void
mlxsw_sp_fib4_node_list_remove(struct mlxsw_sp_fib4_entry *fib4_entry)
{
list_del(&fib4_entry->common.list);
}

static int mlxsw_sp_fib_node_entry_add(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fib_entry *fib_entry)
{
@@ -4912,14 +4823,12 @@ static void mlxsw_sp_fib_node_entry_del(struct mlxsw_sp *mlxsw_sp,
}

static int mlxsw_sp_fib4_node_entry_link(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fib4_entry *fib4_entry,
bool replace, bool append)
struct mlxsw_sp_fib4_entry *fib4_entry)
{
struct mlxsw_sp_fib_node *fib_node = fib4_entry->common.fib_node;
int err;

err = mlxsw_sp_fib4_node_list_insert(fib4_entry, replace, append);
if (err)
return err;
list_add(&fib4_entry->common.list, &fib_node->entry_list);

err = mlxsw_sp_fib_node_entry_add(mlxsw_sp, &fib4_entry->common);
if (err)
@@ -4928,7 +4837,7 @@ static int mlxsw_sp_fib4_node_entry_link(struct mlxsw_sp *mlxsw_sp,
return 0;

err_fib_node_entry_add:
mlxsw_sp_fib4_node_list_remove(fib4_entry);
list_del(&fib4_entry->common.list);
return err;
}

@@ -4937,20 +4846,19 @@ mlxsw_sp_fib4_node_entry_unlink(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fib4_entry *fib4_entry)
{
mlxsw_sp_fib_node_entry_del(mlxsw_sp, &fib4_entry->common);
mlxsw_sp_fib4_node_list_remove(fib4_entry);
list_del(&fib4_entry->common.list);

if (fib4_entry->common.type == MLXSW_SP_FIB_ENTRY_TYPE_IPIP_DECAP)
mlxsw_sp_fib_entry_decap_fini(mlxsw_sp, &fib4_entry->common);
}

static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fib4_entry *fib4_entry,
bool replace)
struct mlxsw_sp_fib4_entry *fib4_entry)
{
struct mlxsw_sp_fib_node *fib_node = fib4_entry->common.fib_node;
struct mlxsw_sp_fib4_entry *replaced;

if (!replace)
if (list_is_singular(&fib_node->entry_list))
return;

/* We inserted the new entry before replaced one */
@@ -4962,9 +4870,8 @@ static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
}

static int
mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
const struct fib_entry_notifier_info *fen_info,
bool replace, bool append)
mlxsw_sp_router_fib4_replace(struct mlxsw_sp *mlxsw_sp,
const struct fib_entry_notifier_info *fen_info)
{
struct mlxsw_sp_fib4_entry *fib4_entry;
struct mlxsw_sp_fib_node *fib_node;
@@ -4989,14 +4896,13 @@ mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
goto err_fib4_entry_create;
}

err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib4_entry, replace,
append);
err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib4_entry);
if (err) {
dev_warn(mlxsw_sp->bus_info->dev, "Failed to link FIB entry to node\n");
goto err_fib4_node_entry_link;
}

mlxsw_sp_fib4_entry_replace(mlxsw_sp, fib4_entry, replace);
mlxsw_sp_fib4_entry_replace(mlxsw_sp, fib4_entry);

return 0;

@@ -6094,21 +6000,16 @@ static void mlxsw_sp_router_fib4_event_work(struct work_struct *work)
struct mlxsw_sp_fib_event_work *fib_work =
container_of(work, struct mlxsw_sp_fib_event_work, work);
struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
bool replace, append;
int err;

/* Protect internal structures from changes */
rtnl_lock();
mlxsw_sp_span_respin(mlxsw_sp);

switch (fib_work->event) {
case FIB_EVENT_ENTRY_REPLACE: /* fall through */
case FIB_EVENT_ENTRY_APPEND: /* fall through */
case FIB_EVENT_ENTRY_ADD:
replace = fib_work->event == FIB_EVENT_ENTRY_REPLACE;
append = fib_work->event == FIB_EVENT_ENTRY_APPEND;
err = mlxsw_sp_router_fib4_add(mlxsw_sp, &fib_work->fen_info,
replace, append);
case FIB_EVENT_ENTRY_REPLACE:
err = mlxsw_sp_router_fib4_replace(mlxsw_sp,
&fib_work->fen_info);
if (err)
mlxsw_sp_router_fib_abort(mlxsw_sp);
fib_info_put(fib_work->fen_info.fi);
@@ -6211,8 +6112,6 @@ static void mlxsw_sp_router_fib4_event(struct mlxsw_sp_fib_event_work *fib_work,

switch (fib_work->event) {
case FIB_EVENT_ENTRY_REPLACE: /* fall through */
case FIB_EVENT_ENTRY_APPEND: /* fall through */
case FIB_EVENT_ENTRY_ADD: /* fall through */
case FIB_EVENT_ENTRY_DEL:
fen_info = container_of(info, struct fib_entry_notifier_info,
info);
@@ -6343,9 +6242,8 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
err = mlxsw_sp_router_fib_rule_event(event, info,
router->mlxsw_sp);
return notifier_from_errno(err);
case FIB_EVENT_ENTRY_ADD:
case FIB_EVENT_ENTRY_REPLACE: /* fall through */
case FIB_EVENT_ENTRY_APPEND: /* fall through */
case FIB_EVENT_ENTRY_ADD: /* fall through */
case FIB_EVENT_ENTRY_REPLACE:
if (router->aborted) {
NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
return notifier_from_errno(-EINVAL);
4 changes: 2 additions & 2 deletions drivers/net/ethernet/rocker/rocker_main.c
@@ -2159,7 +2159,7 @@ static void rocker_router_fib_event_work(struct work_struct *work)
/* Protect internal structures from changes */
rtnl_lock();
switch (fib_work->event) {
case FIB_EVENT_ENTRY_ADD:
case FIB_EVENT_ENTRY_REPLACE:
err = rocker_world_fib4_add(rocker, &fib_work->fen_info);
if (err)
rocker_world_fib4_abort(rocker);
@@ -2201,7 +2201,7 @@ static int rocker_router_fib_event(struct notifier_block *nb,
fib_work->event = event;

switch (event) {
case FIB_EVENT_ENTRY_ADD: /* fall through */
case FIB_EVENT_ENTRY_REPLACE: /* fall through */
case FIB_EVENT_ENTRY_DEL:
if (info->family == AF_INET) {
struct fib_entry_notifier_info *fen_info = ptr;
4 changes: 2 additions & 2 deletions drivers/net/netdevsim/fib.c
@@ -177,10 +177,10 @@ static int nsim_fib_event_nb(struct notifier_block *nb, unsigned long event,
event == FIB_EVENT_RULE_ADD);
break;

case FIB_EVENT_ENTRY_REPLACE: /* fall through */
case FIB_EVENT_ENTRY_ADD: /* fall through */
case FIB_EVENT_ENTRY_DEL:
err = nsim_fib_event(data, info,
event == FIB_EVENT_ENTRY_ADD);
err = nsim_fib_event(data, info, event != FIB_EVENT_ENTRY_DEL);
break;
}
