This repository has been archived by the owner. It is now read-only.

kernel: 4.2.x infinite loop with bond interfaces and bridge fdb command #980

kayrus opened this issue Nov 13, 2015 · 5 comments




@kayrus kayrus commented Nov 13, 2015

/cc @dtatulea

I've discovered that this issue first appeared in Linux kernel 4.2.x. It caused the flannel OOM issue reported in coreos/flannel#367, which has already been fixed in flannel but not in the kernel.

It is possible to reproduce the issue by running this script: (ssh ubuntu@ubuntu1 with password: passw0rd)

Just run bridge fdb and it will run forever.

The problem is probably somewhere here


@dtatulea dtatulea commented Nov 14, 2015

There's a very easy way to reproduce this:


set -x

modprobe bonding
modprobe dummy numdummies=2

echo "+bond0" >  /sys/class/net/bonding_masters 
echo "+dummy0" > /sys/class/net/bond0/bonding/slaves
echo "+dummy1" > /sys/class/net/bond0/bonding/slaves

bridge fdb

@dtatulea dtatulea commented Nov 14, 2015

An error gets misinterpreted as an index from the switchdev ops (used by the bonding driver) to the rtnetlink fdb dump.

Now the details:

It looks like in 4.2 the bonding driver started using the fdb ops from switchdev, which return -EOPNOTSUPP.

This error value gets propagated to the main fdb dump function as the idx value which is not expected to be negative and is forwarded to netlink.

On a 4.1 kernel idx is always > 0.

This code also changed from 4.2 to upstream tip. Looking into how this could be fixed. Not sure if the callbacks should be made to never return an error, or have a check in the rtnl_fdb_dump for negative values before assigning to idx.
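To make the failure mode concrete, here is a minimal user-space sketch of it. This is not the kernel code: `dump_pass`, `count_passes`, `NUM_ENTRIES` and `MAX_PASSES` are hypothetical names invented for illustration, and the resume logic is a simplified model that assumes the dump is re-invoked until a pass emits nothing. It only shows how a negative errno stored as the resume index makes every pass start over.

```c
#include <errno.h>

#define NUM_ENTRIES 3   /* hypothetical number of fdb entries to dump */
#define MAX_PASSES  10  /* safety cap so the buggy case terminates here */

/* Models a driver callback that returns an error instead of an index. */
static int stub_fdb_dump_buggy(int idx)
{
    (void)idx;
    return -EOPNOTSUPP;
}

/* Models a callback that passes the running index through unchanged. */
static int stub_fdb_dump_fixed(int idx)
{
    return idx;
}

/*
 * One dump pass: emit every entry whose index is at or past the resume
 * point, then let the driver callback produce the next resume index.
 * Returns the number of entries emitted in this pass.
 */
static int dump_pass(int (*driver_dump)(int), int *resume)
{
    int idx = 0;
    int emitted = 0;

    for (int i = 0; i < NUM_ENTRIES; i++) {
        if (idx >= *resume)
            emitted++;
        idx++;
    }

    /* The callback's return value is trusted as an index. */
    *resume = driver_dump(idx);
    return emitted;
}

/* Repeat passes until one emits nothing, or until the safety cap is hit. */
static int count_passes(int (*driver_dump)(int))
{
    int resume = 0;
    int passes = 0;

    while (passes < MAX_PASSES) {
        passes++;
        if (dump_pass(driver_dump, &resume) == 0)
            break;
    }
    return passes;
}
```

In this model, `count_passes(stub_fdb_dump_fixed)` stops after the second pass because the resume index has moved past all entries, while `count_passes(stub_fdb_dump_buggy)` only stops at the `MAX_PASSES` cap: the negative resume index makes every entry look new on each pass.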


@dtatulea dtatulea commented Nov 15, 2015

It looks like the main issue has been fixed in 4.3.

However, that works only if you have CONFIG_NET_SWITCHDEV turned on. Otherwise, you'll still get an error from the unimplemented switchdev_port_obj_dump, which returns an error instead of returning an index. Let's see what netdev has to say about the patch below:

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index bc865e2..bc5765a 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -323,7 +323,7 @@ static inline int switchdev_port_fdb_dump(struct sk_buff *skb,
                                          struct net_device *filter_dev,
                                          int idx)
 {
-       return -EOPNOTSUPP;
+       return idx;
 }

 static inline void switchdev_port_fwd_mark_set(struct net_device *dev,
@crawford crawford modified the milestone: CoreOS 871.0.0 Nov 16, 2015

@dtatulea dtatulea commented Nov 17, 2015

Fix was applied to the net-next kernel.


@vcaputo vcaputo commented Nov 17, 2015

@dtatulea thank you, sir! I've pulled both the patches in coreos/coreos-overlay#1640, just in case we enable CONFIG_NET_SWITCHDEV.
