BUG: Bad page state in process devlink #11

Open
moni-levy opened this issue Sep 22, 2020 · 9 comments

@moni-levy

During regression testing on a DENT system, we observe the issue below.
There are no specific reproduction steps; it happens sporadically.

The system is a DNI TX4810.
The dmesg output is attached inline:

INFO - INFO - Dmesg file has 291 lines
INFO - INFO - ----------------------------- Dmesg file --------------------------
INFO - INFO - [ 6156.708982] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6156.718551] 8021q: adding VLAN 0 to HW filter on device swp2
INFO - INFO - [ 6160.176012] mlxsw_spectrum 0000:01:00.0 swp1: link up
INFO - INFO - [ 6160.181142] mlxsw_spectrum 0000:01:00.0 swp2: link up
INFO - INFO - [ 6160.186271] IPv6: ADDRCONF(NETDEV_CHANGE): swp1: link becomes ready
INFO - INFO - [ 6160.197821] IPv6: ADDRCONF(NETDEV_CHANGE): swp2: link becomes ready
INFO - INFO - [ 6166.009184] mlxsw_spectrum 0000:01:00.0 swp1: link down
INFO - INFO - [ 6166.014475] mlxsw_spectrum 0000:01:00.0 swp2: link down
INFO - INFO - [ 6166.022305] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6166.047256] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6166.154248] br1: port 1(swp2) entered blocking state
INFO - INFO - [ 6166.159338] br1: port 1(swp2) entered disabled state
INFO - INFO - [ 6166.169385] device swp2 entered promiscuous mode
INFO - INFO - [ 6166.200506] br1: port 2(d) entered blocking state
INFO - INFO - [ 6166.205280] br1: port 2(d) entered disabled state
INFO - INFO - [ 6166.210703] device d entered promiscuous mode
INFO - INFO - [ 6166.220566] br1: port 2(d) entered blocking state
INFO - INFO - [ 6166.225408] br1: port 2(d) entered disabled state
INFO - INFO - [ 6166.271845] device swp2 left promiscuous mode
INFO - INFO - [ 6166.276638] br1: port 1(swp2) entered disabled state
INFO - INFO - [ 6166.435362] br1: port 1(swp2) entered blocking state
INFO - INFO - [ 6166.440553] br1: port 1(swp2) entered disabled state
INFO - INFO - [ 6166.450529] device swp2 entered promiscuous mode
INFO - INFO - [ 6166.480853] br1: port 2(d) entered blocking state
INFO - INFO - [ 6166.485732] br1: port 2(d) entered disabled state
INFO - INFO - [ 6166.491185] device d entered promiscuous mode
INFO - INFO - [ 6166.496285] br1: port 2(d) entered blocking state
INFO - INFO - [ 6166.501031] br1: port 2(d) entered forwarding state
INFO - INFO - [ 6166.521232] br1: port 2(d) entered disabled state
INFO - INFO - [ 6166.526310] device d left promiscuous mode
INFO - INFO - [ 6166.530468] br1: port 2(d) entered disabled state
INFO - INFO - [ 6166.595806] device swp2 left promiscuous mode
INFO - INFO - [ 6166.600639] br1: port 1(swp2) entered disabled state
INFO - INFO - [ 6166.967154] br0: port 1(swp1) entered blocking state
INFO - INFO - [ 6166.972243] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6166.977787] device swp1 entered promiscuous mode
INFO - INFO - [ 6167.120158] device swp1 left promiscuous mode
INFO - INFO - [ 6167.124624] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6167.200338] br0: port 1(swp1) entered blocking state
INFO - INFO - [ 6167.205485] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6167.211048] device swp1 entered promiscuous mode
INFO - INFO - [ 6167.265771] device swp1 left promiscuous mode
INFO - INFO - [ 6167.270222] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6167.424126] br0: port 1(swp1) entered blocking state
INFO - INFO - [ 6167.429205] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6167.434956] device swp1 entered promiscuous mode
INFO - INFO - [ 6167.496602] device swp1 left promiscuous mode
INFO - INFO - [ 6167.501033] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6172.653554] BUG: Bad page state in process devlink pfn:22f73d
INFO - INFO - [ 6172.659435] page:fffffe00089dcf40 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
INFO - INFO - [ 6172.667826] flags: 0x2ffff00000000000()
INFO - INFO - [ 6172.671683] raw: 2ffff00000000000 0000000000000000 ffffffff089d0201 0000000000000000
INFO - INFO - [ 6172.679462] raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
INFO - INFO - [ 6172.687240] page dumped because: nonzero _refcount
INFO - INFO - [ 6172.692052] Modules linked in:
INFO - INFO - [ 6172.695125] CPU: 1 PID: 16346 Comm: devlink Tainted: G B 5.8.0-rc6-custom-273020-gac6b365b1bf5 #44
INFO - INFO - [ 6172.705433] Hardware name: Marvell Armada 7040 TX4810M (DT)
INFO - INFO - [ 6172.711031] Call trace:
INFO - INFO - [ 6172.713495] dump_backtrace+0x0/0x1d0
INFO - INFO - [ 6172.717174] show_stack+0x1c/0x28
INFO - INFO - [ 6172.720507] dump_stack+0xbc/0x118
INFO - INFO - [ 6172.723928] bad_page+0xcc/0xf8
INFO - INFO - [ 6172.727084] check_free_page_bad+0x80/0x88
INFO - INFO - [ 6172.731199] __free_pages_ok+0x3f8/0x418
INFO - INFO - [ 6172.735140] __free_pages+0x38/0x60
INFO - INFO - [ 6172.738647] kmem_freepages+0x200/0x2a8
INFO - INFO - [ 6172.742500] slab_destroy+0x28/0x68
INFO - INFO - [ 6172.746003] slabs_destroy+0x60/0x90
INFO - INFO - [ 6172.749595] ___cache_free+0x1b4/0x358
INFO - INFO - [ 6172.753361] kfree+0xc0/0x1d0
INFO - INFO - [ 6172.756345] skb_free_head+0x2c/0x38
INFO - INFO - [ 6172.759936] skb_release_data+0x110/0x1a0
INFO - INFO - [ 6172.763964] skb_release_all+0x2c/0x38
INFO - INFO - [ 6172.767730] consume_skb+0x38/0x130
INFO - INFO - [ 6172.771236] __dev_kfree_skb_any+0x44/0x50
INFO - INFO - [ 6172.775354] mlxsw_pci_rdq_fini+0x8c/0xb0
INFO - INFO - [ 6172.779381] mlxsw_pci_queue_fini.isra.0+0x28/0x58
INFO - INFO - [ 6172.784193] mlxsw_pci_queue_group_fini+0x58/0x88
INFO - INFO - [ 6172.788918] mlxsw_pci_aqs_fini+0x2c/0x60
INFO - INFO - [ 6172.792945] mlxsw_pci_fini+0x34/0x50
INFO - INFO - [ 6172.796628] mlxsw_core_bus_device_unregister+0x104/0x1d0
INFO - INFO - [ 6172.802053] mlxsw_devlink_core_bus_device_reload_down+0x2c/0x48
INFO - INFO - [ 6172.808089] devlink_reload+0x44/0x158
INFO - INFO - [ 6172.811855] devlink_nl_cmd_reload+0x270/0x290
INFO - INFO - [ 6172.816321] genl_rcv_msg+0x188/0x2f0
INFO - INFO - [ 6172.819999] netlink_rcv_skb+0x5c/0x118
INFO - INFO - [ 6172.823852] genl_rcv+0x3c/0x50
INFO - INFO - [ 6172.827007] netlink_unicast+0x1bc/0x278
INFO - INFO - [ 6172.830947] netlink_sendmsg+0x194/0x390
INFO - INFO - [ 6172.834888] __sys_sendto+0xe0/0x158
INFO - INFO - [ 6172.838479] __arm64_sys_sendto+0x2c/0x38
INFO - INFO - [ 6172.842508] el0_svc_common.constprop.0+0x70/0x168
INFO - INFO - [ 6172.847320] do_el0_svc+0x28/0x88
INFO - INFO - [ 6172.850651] el0_sync_handler+0x88/0x190
INFO - INFO - [ 6172.854590] el0_sync+0x140/0x180
INFO - INFO - [ 6175.665776] mlxsw_spectrum 0000:01:00.0 swp25: renamed from eth0
INFO - INFO - [ 6175.770197] mlxsw_spectrum 0000:01:00.0 swp26: renamed from eth0
INFO - INFO - [ 6175.872749] mlxsw_spectrum 0000:01:00.0 swp27: renamed from eth0
INFO - INFO - [ 6175.975019] mlxsw_spectrum 0000:01:00.0 swp28: renamed from eth0
INFO - INFO - [ 6176.077475] mlxsw_spectrum 0000:01:00.0 swp29: renamed from eth0
INFO - INFO - [ 6176.180444] mlxsw_spectrum 0000:01:00.0 swp30: renamed from eth0
INFO - INFO - [ 6176.283180] mlxsw_spectrum 0000:01:00.0 swp31: renamed from eth0
INFO - INFO - [ 6176.386828] mlxsw_spectrum 0000:01:00.0 swp32: renamed from eth0
INFO - INFO - [ 6176.489967] mlxsw_spectrum 0000:01:00.0 swp33: renamed from eth0
INFO - INFO - [ 6176.593878] mlxsw_spectrum 0000:01:00.0 swp35: renamed from eth0
INFO - INFO - [ 6176.698094] mlxsw_spectrum 0000:01:00.0 swp34: renamed from eth0
INFO - INFO - [ 6176.799857] mlxsw_spectrum 0000:01:00.0 swp36: renamed from eth0
INFO - INFO - [ 6176.902758] mlxsw_spectrum 0000:01:00.0 swp37: renamed from eth0
INFO - INFO - [ 6177.006829] mlxsw_spectrum 0000:01:00.0 swp38: renamed from eth0
INFO - INFO - [ 6177.109630] mlxsw_spectrum 0000:01:00.0 swp39: renamed from eth0
INFO - INFO - [ 6177.234867] mlxsw_spectrum 0000:01:00.0 swp40: renamed from eth0
INFO - INFO - [ 6177.316322] mlxsw_spectrum 0000:01:00.0 swp41: renamed from eth0
INFO - INFO - [ 6177.439142] mlxsw_spectrum 0000:01:00.0 swp42: renamed from eth0
INFO - INFO - [ 6177.521544] mlxsw_spectrum 0000:01:00.0 swp43: renamed from eth0
INFO - INFO - [ 6177.625832] mlxsw_spectrum 0000:01:00.0 swp44: renamed from eth0
INFO - INFO - [ 6177.727156] mlxsw_spectrum 0000:01:00.0 swp45: renamed from eth0
INFO - INFO - [ 6177.829574] mlxsw_spectrum 0000:01:00.0 swp46: renamed from eth0
INFO - INFO - [ 6177.933455] mlxsw_spectrum 0000:01:00.0 swp47: renamed from eth0
INFO - INFO - [ 6178.035413] mlxsw_spectrum 0000:01:00.0 swp48: renamed from eth0
INFO - INFO - [ 6178.144632] mlxsw_spectrum 0000:01:00.0 swp21: renamed from eth0
INFO - INFO - [ 6178.249669] mlxsw_spectrum 0000:01:00.0 swp22: renamed from eth0
INFO - INFO - [ 6178.353460] mlxsw_spectrum 0000:01:00.0 swp23: renamed from eth0
INFO - INFO - [ 6178.457781] mlxsw_spectrum 0000:01:00.0 swp24: renamed from eth0
INFO - INFO - [ 6178.562154] mlxsw_spectrum 0000:01:00.0 swp17: renamed from eth0
INFO - INFO - [ 6178.665663] mlxsw_spectrum 0000:01:00.0 swp18: renamed from eth0
INFO - INFO - [ 6178.772139] mlxsw_spectrum 0000:01:00.0 swp19: renamed from eth0
INFO - INFO - [ 6178.879614] mlxsw_spectrum 0000:01:00.0 swp20: renamed from eth0
INFO - INFO - [ 6178.982828] mlxsw_spectrum 0000:01:00.0 swp13: renamed from eth0
INFO - INFO - [ 6179.085690] mlxsw_spectrum 0000:01:00.0 swp15: renamed from eth0
INFO - INFO - [ 6179.189485] mlxsw_spectrum 0000:01:00.0 swp14: renamed from eth0
INFO - INFO - [ 6179.292114] mlxsw_spectrum 0000:01:00.0 swp16: renamed from eth0
INFO - INFO - [ 6179.395264] mlxsw_spectrum 0000:01:00.0 swp9: renamed from eth0
INFO - INFO - [ 6179.498159] mlxsw_spectrum 0000:01:00.0 swp10: renamed from eth0
INFO - INFO - [ 6179.604794] mlxsw_spectrum 0000:01:00.0 swp11: renamed from eth0
INFO - INFO - [ 6179.711006] mlxsw_spectrum 0000:01:00.0 swp12: renamed from eth0
INFO - INFO - [ 6179.813543] mlxsw_spectrum 0000:01:00.0 swp5: renamed from eth0
INFO - INFO - [ 6179.920513] mlxsw_spectrum 0000:01:00.0 swp6: renamed from eth0
INFO - INFO - [ 6180.023838] mlxsw_spectrum 0000:01:00.0 swp7: renamed from eth0
INFO - INFO - [ 6180.127134] mlxsw_spectrum 0000:01:00.0 swp8: renamed from eth0
INFO - INFO - [ 6180.251486] mlxsw_spectrum 0000:01:00.0 swp1: renamed from eth0
INFO - INFO - [ 6180.359932] mlxsw_spectrum 0000:01:00.0 swp2: renamed from eth0
INFO - INFO - [ 6180.440319] mlxsw_spectrum 0000:01:00.0 swp3: renamed from eth0
INFO - INFO - [ 6180.543408] mlxsw_spectrum 0000:01:00.0 swp4: renamed from eth0
INFO - INFO - [ 6200.634647] br0: port 1(bond1) entered blocking state
INFO - INFO - [ 6200.639922] br0: port 1(bond1) entered disabled state
INFO - INFO - [ 6200.662393] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6200.733040] br0: port 1(bond1) entered disabled state
INFO - INFO - [ 6200.841197] bond1 (unregistering): Released all slaves
INFO - INFO - [ 6200.888074] br0: port 1(swp1) entered blocking state
INFO - INFO - [ 6200.893202] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6201.008680] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6201.139904] br0: port 1(swp1) entered blocking state
INFO - INFO - [ 6201.145014] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6201.173772] device swp1 entered promiscuous mode
INFO - INFO - [ 6201.178905] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6201.278574] device swp1 left promiscuous mode
INFO - INFO - [ 6201.283023] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6201.479992] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6201.494508] bond1: (slave swp1): Enslaving as a backup interface with a down link
INFO - INFO - [ 6201.509371] 8021q: adding VLAN 0 to HW filter on device swp2
INFO - INFO - [ 6201.516021] bond1: (slave swp2): Enslaving as a backup interface with a down link
INFO - INFO - [ 6201.529866] 8021q: adding VLAN 0 to HW filter on device bond1
INFO - INFO - [ 6201.619726] bond1: (slave swp1): Releasing backup interface
INFO - INFO - [ 6201.625406] bond1: (slave swp1): the permanent HWaddr of slave - 00:00:00:00:00:35 - is still in use by bond - set the HWaddr of slave to a different address to avoid conflicts
INFO - INFO - [ 6201.715098] bond1: (slave swp2): Releasing backup interface
INFO - INFO - [ 6201.809443] bond1 (unregistering): Released all slaves
INFO - INFO - [ 6201.904064] br10: port 1(swp1.10) entered blocking state
INFO - INFO - [ 6201.909562] br10: port 1(swp1.10) entered disabled state
INFO - INFO - [ 6201.921589] device swp1.10 entered promiscuous mode
INFO - INFO - [ 6201.936444] br20: port 1(swp1.20) entered blocking state
INFO - INFO - [ 6201.941896] br20: port 1(swp1.20) entered disabled state
INFO - INFO - [ 6201.952591] device swp1.20 entered promiscuous mode
INFO - INFO - [ 6201.967313] br30: port 1(swp1.30) entered blocking state
INFO - INFO - [ 6201.972716] br30: port 1(swp1.30) entered disabled state
INFO - INFO - [ 6201.983284] device swp1.30 entered promiscuous mode
INFO - INFO - [ 6201.998103] device swp1.30 left promiscuous mode
INFO - INFO - [ 6202.003082] br30: port 1(swp1.30) entered disabled state
INFO - INFO - [ 6202.055911] device swp1.20 left promiscuous mode
INFO - INFO - [ 6202.060623] br20: port 1(swp1.20) entered disabled state
INFO - INFO - [ 6202.336736] device swp1.10 left promiscuous mode
INFO - INFO - [ 6202.341750] br10: port 1(swp1.10) entered disabled state
INFO - INFO - [ 6202.468587] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6202.501150] 8021q: adding VLAN 0 to HW filter on device swp2
INFO - INFO - [ 6202.508985] bond1: (slave swp2): Enslaving as a backup interface with a down link
INFO - INFO - [ 6202.553788] br0: port 1(bond1) entered blocking state
INFO - INFO - [ 6202.559882] br0: port 1(bond1) entered disabled state
INFO - INFO - [ 6202.594047] br10: port 1(bond1.10) entered blocking state
INFO - INFO - [ 6202.599594] br10: port 1(bond1.10) entered disabled state
INFO - INFO - [ 6202.610592] device bond1.10 entered promiscuous mode
INFO - INFO - [ 6202.631473] br20: port 1(bond1.20) entered blocking state
INFO - INFO - [ 6202.637224] br20: port 1(bond1.20) entered disabled state
INFO - INFO - [ 6202.647824] device bond1.20 entered promiscuous mode
INFO - INFO - [ 6202.693852] bond1: (slave swp2): Releasing backup interface
INFO - INFO - [ 6202.858185] device bond1.20 left promiscuous mode
INFO - INFO - [ 6202.863385] br20: port 1(bond1.20) entered disabled state
INFO - INFO - [ 6202.956917] device bond1.10 left promiscuous mode
INFO - INFO - [ 6202.961782] br10: port 1(bond1.10) entered disabled state
INFO - INFO - [ 6203.092576] br0: port 1(bond1) entered disabled state
INFO - INFO - [ 6203.199478] bond1 (unregistering): Released all slaves
INFO - INFO - [ 6203.258622] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6203.265483] bond1: (slave swp1): Enslaving as a backup interface with a down link
INFO - INFO - [ 6203.281181] 8021q: adding VLAN 0 to HW filter on device swp2
INFO - INFO - [ 6203.287769] bond1: (slave swp2): Enslaving as a backup interface with a down link
INFO - INFO - [ 6203.338410] br0: port 1(bond1) entered blocking state
INFO - INFO - [ 6203.343829] br0: port 1(bond1) entered disabled state
INFO - INFO - [ 6203.397418] br10: port 1(bond1.10) entered blocking state
INFO - INFO - [ 6203.402983] br10: port 1(bond1.10) entered disabled state
INFO - INFO - [ 6203.419897] device bond1.10 entered promiscuous mode
INFO - INFO - [ 6203.448569] br20: port 1(bond1.20) entered blocking state
INFO - INFO - [ 6203.454096] br20: port 1(bond1.20) entered disabled state
INFO - INFO - [ 6203.469421] device bond1.20 entered promiscuous mode
INFO - INFO - [ 6203.494735] br0: port 1(bond1) entered disabled state
INFO - INFO - [ 6203.574125] device bond1.10 left promiscuous mode
INFO - INFO - [ 6203.578945] br10: port 1(bond1.10) entered disabled state
INFO - INFO - [ 6203.605174] device bond1.20 left promiscuous mode
INFO - INFO - [ 6203.609978] br20: port 1(bond1.20) entered disabled state
INFO - INFO - [ 6203.642254] bond1 (unregistering): (slave swp1): Releasing backup interface
INFO - INFO - [ 6203.675478] bond1 (unregistering): (slave swp2): Releasing backup interface
INFO - INFO - [ 6203.697494] bond1 (unregistering): Released all slaves
INFO - INFO - [ 6204.020926] br0: port 1(swp1) entered blocking state
INFO - INFO - [ 6204.026057] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6204.244931] br-test: port 1(br0.10) entered blocking state
INFO - INFO - [ 6204.250567] br-test: port 1(br0.10) entered disabled state
INFO - INFO - [ 6204.332965] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6204.472668] br0: port 1(swp1) entered blocking state
INFO - INFO - [ 6204.477811] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6204.488821] device swp1 entered promiscuous mode
INFO - INFO - [ 6204.527144] device swp1 left promiscuous mode
INFO - INFO - [ 6204.531814] br0: port 1(swp1) entered disabled state
INFO - INFO - [ 6204.805762] 8021q: adding VLAN 0 to HW filter on device swp1
INFO - INFO - [ 6204.861500] 8021q: adding VLAN 0 to HW filter on device swp2
INFO - INFO - [ 6208.593096] mlxsw_spectrum 0000:01:00.0 swp1: link up
INFO - INFO - [ 6208.598222] mlxsw_spectrum 0000:01:00.0 swp2: link up
INFO - INFO - [ 6208.603392] IPv6: ADDRCONF(NETDEV_CHANGE): swp1: link becomes ready
INFO - INFO - [ 6208.610058] IPv6: ADDRCONF(NETDEV_CHANGE): swp2: link becomes ready
INFO - INFO - [ 6213.953909] mlxsw_spectrum 0000:01:00.0 swp2: link down
INFO - INFO - [ 6213.959196] mlxsw_spectrum 0000:01:00.0 swp1: link down
INFO - INFO - [ 6215.059212] 8021q: adding VLAN 0 to HW filter on device swp2
INFO - INFO - [ 6218.812883] mlxsw_spectrum 0000:01:00.0 swp2: link up
INFO - INFO - [ 6218.817993] mlxsw_spectrum 0000:01:00.0 swp1: link up
INFO - INFO - [ 6218.823277] IPv6: ADDRCONF(NETDEV_CHANGE): swp2: link becomes ready
INFO - INFO - [ 6224.148715] mlxsw_spectrum 0000:01:00.0 swp2: link down
INFO - INFO - [ 6224.154005] mlxsw_spectrum 0000:01:00.0 swp1: link down
INFO - INFO - [ 6232.090286] mlxsw_spectrum 0000:01:00.0 swp25: renamed from eth0
INFO - INFO - [ 6232.195669] mlxsw_spectrum 0000:01:00.0 swp26: renamed from eth0
INFO - INFO - [ 6232.300464] mlxsw_spectrum 0000:01:00.0 swp27: renamed from eth0
INFO - INFO - [ 6232.408084] mlxsw_spectrum 0000:01:00.0 swp28: renamed from eth0
INFO - INFO - [ 6232.512363] mlxsw_spectrum 0000:01:00.0 swp29: renamed from eth0
INFO - INFO - [ 6232.616596] mlxsw_spectrum 0000:01:00.0 swp30: renamed from eth0
INFO - INFO - [ 6232.720662] mlxsw_spectrum 0000:01:00.0 swp31: renamed from eth0
INFO - INFO - [ 6232.825167] mlxsw_spectrum 0000:01:00.0 swp32: renamed from eth0
INFO - INFO - [ 6232.927826] mlxsw_spectrum 0000:01:00.0 swp33: renamed from eth0
INFO - INFO - [ 6233.034553] mlxsw_spectrum 0000:01:00.0 swp35: renamed from eth0
INFO - INFO - [ 6233.141420] mlxsw_spectrum 0000:01:00.0 swp34: renamed from eth0
INFO - INFO - [ 6233.243569] mlxsw_spectrum 0000:01:00.0 swp36: renamed from eth0
INFO - INFO - [ 6233.345739] mlxsw_spectrum 0000:01:00.0 swp37: renamed from eth0
INFO - INFO - [ 6233.449272] mlxsw_spectrum 0000:01:00.0 swp38: renamed from eth0
INFO - INFO - [ 6233.551780] mlxsw_spectrum 0000:01:00.0 swp39: renamed from eth0
INFO - INFO - [ 6233.656039] mlxsw_spectrum 0000:01:00.0 swp40: renamed from eth0
INFO - INFO - [ 6233.761081] mlxsw_spectrum 0000:01:00.0 swp41: renamed from eth0
INFO - INFO - [ 6233.864532] mlxsw_spectrum 0000:01:00.0 swp42: renamed from eth0
INFO - INFO - [ 6233.966518] mlxsw_spectrum 0000:01:00.0 swp43: renamed from eth0
INFO - INFO - [ 6234.071924] mlxsw_spectrum 0000:01:00.0 swp44: renamed from eth0
INFO - INFO - [ 6234.174967] mlxsw_spectrum 0000:01:00.0 swp45: renamed from eth0
INFO - INFO - [ 6234.298341] mlxsw_spectrum 0000:01:00.0 swp46: renamed from eth0
INFO - INFO - [ 6234.385566] mlxsw_spectrum 0000:01:00.0 swp47: renamed from eth0
INFO - INFO - [ 6234.486052] mlxsw_spectrum 0000:01:00.0 swp48: renamed from eth0
INFO - INFO - [ 6234.594207] mlxsw_spectrum 0000:01:00.0 swp21: renamed from eth0
INFO - INFO - [ 6234.699732] mlxsw_spectrum 0000:01:00.0 swp22: renamed from eth0
INFO - INFO - [ 6234.804463] mlxsw_spectrum 0000:01:00.0 swp23: renamed from eth0
INFO - INFO - [ 6234.913160] mlxsw_spectrum 0000:01:00.0 swp24: renamed from eth0
INFO - INFO - [ 6235.017244] mlxsw_spectrum 0000:01:00.0 swp17: renamed from eth0
INFO - INFO - [ 6235.121074] mlxsw_spectrum 0000:01:00.0 swp18: renamed from eth0
INFO - INFO - [ 6235.226564] mlxsw_spectrum 0000:01:00.0 swp19: renamed from eth0
INFO - INFO - [ 6235.329627] mlxsw_spectrum 0000:01:00.0 swp20: renamed from eth0
INFO - INFO - [ 6235.432571] mlxsw_spectrum 0000:01:00.0 swp13: renamed from eth0
INFO - INFO - [ 6235.540050] mlxsw_spectrum 0000:01:00.0 swp15: renamed from eth0
INFO - INFO - [ 6235.643194] mlxsw_spectrum 0000:01:00.0 swp14: renamed from eth0
INFO - INFO - [ 6235.746801] mlxsw_spectrum 0000:01:00.0 swp16: renamed from eth0
INFO - INFO - [ 6235.850802] mlxsw_spectrum 0000:01:00.0 swp9: renamed from eth0
INFO - INFO - [ 6235.953727] mlxsw_spectrum 0000:01:00.0 swp10: renamed from eth0
INFO - INFO - [ 6236.057093] mlxsw_spectrum 0000:01:00.0 swp11: renamed from eth0
INFO - INFO - [ 6236.159523] mlxsw_spectrum 0000:01:00.0 swp12: renamed from eth0
INFO - INFO - [ 6236.263504] mlxsw_spectrum 0000:01:00.0 swp5: renamed from eth0
INFO - INFO - [ 6236.365916] mlxsw_spectrum 0000:01:00.0 swp6: renamed from eth0
INFO - INFO - [ 6236.468033] mlxsw_spectrum 0000:01:00.0 swp7: renamed from eth0
INFO - INFO - [ 6236.571928] mlxsw_spectrum 0000:01:00.0 swp8: renamed from eth0
INFO - INFO - [ 6236.676152] mlxsw_spectrum 0000:01:00.0 swp1: renamed from eth0
INFO - INFO - [ 6236.777925] mlxsw_spectrum 0000:01:00.0 swp2: renamed from eth0
INFO - INFO - [ 6236.879845] mlxsw_spectrum 0000:01:00.0 swp3: renamed from eth0
INFO - INFO - [ 6236.983362] mlxsw_spectrum 0000:01:00.0 swp4: renamed from eth0
INFO - INFO - ----------------------------- End Dmesg --------------------------
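
For readers of the page dump above, here is a minimal sketch of how the reported counters can be read back from the two `raw:` lines. It rests on assumptions that are not stated in the thread: a little-endian 64-bit layout in which the seventh raw word holds `_mapcount` in its low 32 bits and `_refcount` in its high 32 bits, and a printed mapcount equal to the stored value plus one. The helper names are hypothetical.

```python
# Sketch: recover refcount/mapcount from the "raw:" words of a page dump.
# ASSUMPTION (not from the thread): the 7th 64-bit word packs _mapcount in the
# low 32 bits and _refcount in the high 32 bits; printed mapcount = stored + 1.

def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decode_counters(raw_word_hex: str) -> dict:
    """Split one 64-bit raw word into the two 32-bit counters."""
    word = int(raw_word_hex, 16)
    stored_mapcount = to_signed32(word)        # low half
    refcount = to_signed32(word >> 32)         # high half
    return {"refcount": refcount, "mapcount": stored_mapcount + 1}


# The two "raw:" lines from the Bad page state report at 6172.67.
raw_lines = [
    "2ffff00000000000 0000000000000000 ffffffff089d0201 0000000000000000",
    "0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000",
]
words = " ".join(raw_lines).split()
counters = decode_counters(words[6])
print(counters)
```

Under those assumptions the all-ones word decodes to a refcount of -1 and a mapcount of 0, which agrees with the `refcount:-1 mapcount:0` line and with the stated reason `nonzero _refcount`.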

@sonoble
Contributor

sonoble commented Oct 7, 2020

Hello,

Please provide any information on how to reproduce this issue, and confirm that it still occurs on the upstream 5.8 kernel.

@daniellerts

Hi,

The logs that Moni provided show the symptoms; the root cause can be seen below:

[ 24.743678] ==================================================================
[ 24.750967] BUG: KASAN: slab-out-of-bounds in __build_skb_around+0xc4/0x100
[ 24.757998] Write of size 32 at addr ffff00013f305780 by task swapper/0/0
[ 24.764829]
[ 24.766337] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7-custom-18677-g9e6ac00656eb #335
[ 24.775177] Hardware name: Marvell Armada 7040 TX4810 (DT)
[ 24.780695] Call trace:
[ 24.783158] dump_backtrace+0x0/0x2e0
[ 24.786854] show_stack+0x1c/0x28
[ 24.790195] dump_stack+0xf8/0x160
[ 24.793636] print_address_description.constprop.0+0x68/0x2e4
[ 24.799425] kasan_report+0x1d8/0x240
[ 24.803120] kasan_check_range+0xfc/0x1b0
[ 24.807163] memset+0x44/0x80
[ 24.810158] __build_skb_around+0xc4/0x100
[ 24.814289] __build_skb+0x54/0x68
[ 24.817717] build_skb+0x24/0xc0
[ 24.820970] mvpp2_rx+0x9b0/0xf10
[ 24.824319] mvpp2_poll+0x1a0/0x308
[ 24.827835] __napi_poll+0x60/0x2e0
[ 24.831356] net_rx_action+0x1f0/0x430
[ 24.835133] __do_softirq+0x1a4/0x62c
[ 24.838824] irq_exit+0x144/0x168
[ 24.842173] __handle_domain_irq+0x88/0xe8
[ 24.846305] gic_handle_irq+0xdc/0x118
[ 24.850081] el1_irq+0xb0/0x180
[ 24.853248] arch_local_irq_enable+0x8/0x10
[ 24.857466] default_idle_call+0x50/0x270
[ 24.861513] do_idle+0x2c0/0x310
[ 24.864768] cpu_startup_entry+0x2c/0x70
[ 24.868718] rest_init+0x100/0x114
[ 24.872147] arch_call_rest_init+0x14/0x1c
[ 24.876285] start_kernel+0x318/0x348
[ 24.879978] 0x0
[ 24.881837]
[ 24.883364] Allocated by task 40:
[ 24.883378] kasan_save_stack+0x24/0x50
[ 24.883399] __kasan_kmalloc+0x84/0xa8
[ 24.883412] __kmalloc_node_track_caller+0x60/0x98
[ 24.883425] kmalloc_reserve+0x80/0x150
[ 24.883442] __alloc_skb+0xe0/0x238
[ 24.883452] __netdev_alloc_skb+0x50/0x200
[ 24.883463] mlxsw_pci_rdq_skb_alloc.isra.0+0x40/0xd8
[ 24.883479] mlxsw_pci_rdq_init+0x2fc/0x478
[ 24.883491] mlxsw_pci_queue_group_init+0x2ac/0x448
[ 24.883504] mlxsw_pci_init+0x121c/0x15a8
[ 24.883514] __mlxsw_core_bus_device_register+0x1ec/0xb70
[ 24.883530] mlxsw_core_bus_device_register+0x60/0x88
[ 24.883543] mlxsw_pci_probe+0x2c8/0x3e0
[ 24.883554] local_pci_probe+0x7c/0xf8
[ 24.883572] pci_device_probe+0x1f4/0x2c8
[ 24.883585] really_probe+0x150/0x6f0
[ 24.883598] driver_probe_device+0x14c/0x1d8
[ 24.883610] __device_attach_driver+0xd8/0x148
[ 24.883621] bus_for_each_drv+0xf4/0x158
[ 24.883637] __device_attach+0x180/0x218
[ 24.883647] device_attach+0x18/0x20
[ 24.883657] pci_bus_add_device+0x5c/0xc8
[ 24.883672] pci_bus_add_devices+0x50/0xc0
[ 24.883684] pci_bus_add_devices+0x9c/0xc0
[ 24.883697] pci_host_probe+0x54/0xf8
[ 24.883709] dw_pcie_host_init+0x2e0/0x6a0
[ 24.883721] armada8k_pcie_probe+0x35c/0x488
[ 24.883734] platform_probe+0x90/0x108
[ 24.883748] really_probe+0x150/0x6f0
[ 24.883757] driver_probe_device+0x14c/0x1d8
[ 24.883768] __device_attach_driver+0xd8/0x148
[ 24.883778] bus_for_each_drv+0xf4/0x158
[ 24.883792] __device_attach+0x180/0x218
[ 24.883802] device_initial_probe+0x18/0x20
[ 24.883813] bus_probe_device+0xe8/0xf8
[ 24.883826] deferred_probe_work_func+0x108/0x160
[ 24.883842] process_one_work+0x3e4/0x878
[ 24.883856] worker_thread+0x3d8/0x670
[ 24.883867] kthread+0x214/0x220
[ 24.883881] ret_from_fork+0x10/0x18
[ 24.883892]
[ 24.883897] The buggy address belongs to the object at ffff00013f300000
[ 24.883897] which belongs to the cache kmalloc-16k of size 16384
[ 24.883908] The buggy address is located 6016 bytes to the right of
[ 24.883908] 16384-byte region [ffff00013f300000, ffff00013f304000)
[ 24.883921] The buggy address belongs to the page:
[ 24.883929] page:0000000017d86bae refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f300
[ 24.883944] head:0000000017d86bae order:3 compound_mapcount:0 compound_pincount:0
[ 24.883954] flags: 0x2ffff00000010200(slab|head)
[ 24.883975] raw: 2ffff00000010200 fffffc0004fcbe08 fffffc0004fcc208 ffff000100000700
[ 24.883988] raw: 0000000000000000 ffff00013f300000 0000000100000001 0000000000000000
[ 24.883995] page dumped because: kasan: bad access detected
[ 24.884001]
[ 24.884005] Memory state around the buggy address:
[ 24.884013] ffff00013f305680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 24.884023] ffff00013f305700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 24.884032] >ffff00013f305780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 24.884038] ^
[ 24.884046] ffff00013f305800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 24.884055] ffff00013f305880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 24.884062] ==================================================================
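
The distance KASAN reports can be checked by hand from the addresses in the report above. The sketch below only redoes that arithmetic; the function name is hypothetical and every number is copied from the log.

```python
# Sketch: verify "located 6016 bytes to the right of 16384-byte region".
# All constants are taken from the KASAN report; nothing is measured here.

def overrun_offset(buggy_addr: int, region_start: int, region_size: int) -> int:
    """Return how far past the end of the object the buggy address lies."""
    region_end = region_start + region_size   # exclusive end of the object
    return buggy_addr - region_end


BUGGY_ADDR = 0xFFFF00013F305780    # "Write of size 32 at addr ..."
REGION_START = 0xFFFF00013F300000  # "belongs to the object at ..."
REGION_SIZE = 16384                # "cache kmalloc-16k of size 16384"

offset = overrun_offset(BUGGY_ADDR, REGION_START, REGION_SIZE)
print(hex(REGION_START + REGION_SIZE), offset)
```

The result is 6016, matching the report. Note that, according to the two stacks in the log, the write comes from the `mvpp2_rx` path while the nearest object was allocated by `mlxsw_pci_rdq_skb_alloc`.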

The problem reproduces when we boot our updated kernel (5.12, for example) on top of the 5.10 DENTOS using kexec.
The last time we hit this problem, it persisted for a few months and then went away without any action on our side.

Is this a problem in the code, or could some patches be missing from the kernel we install on top of 5.10?

Thanks.

@sonoble
Contributor

sonoble commented May 26, 2021

Can we get direct instructions on how to reproduce, i.e. steps 1 through x?

@moni-levy
Author

moni-levy commented May 30, 2021 via email

@daniellerts

daniellerts commented Jun 1, 2021

  1. On a builder, compile our kernel:
  • $ git clone git://github.com/jpirko/linux_mlxsw.git -b combined_queue
  • $ cd ~/linux_mlxsw && make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig && make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- mod2yesconfig && make -j$(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
  2. On the switch, enter the directory where you compiled the kernel and run the following with a suitable kernel command line:
  • $ cd ~/linux_mlxsw && kexec -l arch/arm64/boot/Image --append="onl_platform=arm64-delta-tx4810-r0 root=/dev/sda4 rw pci=pcie_bus_safe =no" && kexec -e &
  3. Connect to the switch again and run:
  • $ dmesg | grep BUG

Output example:
[ 17.993676] BUG: Bad page state in process (agetty) pfn:101d20

Note: It does not reproduce every time, but it does most of the time.
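
Since the failing process differs between runs, it can help to pull more than the raw `grep BUG` line out of a saved log. The following is a minimal sketch, assuming the dmesg text has been captured into a string; the function and pattern names are hypothetical, and the sample lines are copied from this thread.

```python
import re

# Matches "[ 17.993676] BUG: ..." with or without a harness prefix in front.
BUG_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: (?P<msg>.+)$")
# Extracts the process name and pfn from a "Bad page state" message.
BAD_PAGE_RE = re.compile(
    r"Bad page state in process (?P<proc>.+?)\s+pfn:(?P<pfn>[0-9a-f]+)$"
)


def find_bugs(dmesg_text: str) -> list:
    """Return one dict per BUG line: timestamp, message, and page details if any."""
    bugs = []
    for line in dmesg_text.splitlines():
        match = BUG_RE.search(line)
        if not match:
            continue
        entry = {"time": float(match["ts"]), "message": match["msg"].strip()}
        bad_page = BAD_PAGE_RE.search(entry["message"])
        if bad_page:
            entry["process"] = bad_page["proc"]
            entry["pfn"] = int(bad_page["pfn"], 16)
        bugs.append(entry)
    return bugs


SAMPLE = """\
[ 17.993676] BUG: Bad page state in process (agetty) pfn:101d20
INFO - INFO - [ 6172.653554] BUG: Bad page state in process devlink pfn:22f73d
[ 24.750967] BUG: KASAN: slab-out-of-bounds in __build_skb_around+0xc4/0x100
[ 6175.665776] mlxsw_spectrum 0000:01:00.0 swp25: renamed from eth0
"""

bugs = find_bugs(SAMPLE)
for bug in bugs:
    print(bug)
```

Collecting the process names and pfns across several reboots makes it easier to see whether the corruption always lands in the same memory range or moves around.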

@moni-levy
Author

@sonoble can we raise this again and assign an owner that can help debug? We continue seeing it in our regression runs.

@daniellerts

Hi,

We found a fix for errors such as the one above that point to the mlxsw driver.
However, we still get many other BUGs in dmesg, such as the one below:

[ 26.301368] BUG: Bad page state in process swapper/0 pfn:104300
[ 26.307505] page:00000000ce81ba6f refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x104300
[ 26.317045] head:00000000ce81ba6f order:8 compound_mapcount:0 compound_pincount:0
[ 26.324622] flags: 0x2ffff00000010200(slab|head|node=0|zone=2|lastcpupid=0xffff)
[ 26.332119] raw: 2ffff00000010200 fffffc0004108008 fffffc0004110008 ffff000100000c00
[ 26.339953] raw: 0000000000000000 ffff000104300000 0000000000000001 0000000000000000
[ 26.347777] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 26.354288] Modules linked in:
[ 26.357402] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-75757-g09fce596f9cb #468
[ 26.365298] Hardware name: Marvell Armada 7040 TX4810 (DT)
[ 26.370825] Call trace:
[ 26.373292] dump_backtrace+0x0/0x2ec
[ 26.376998] show_stack+0x24/0x30
[ 26.380346] dump_stack_lvl+0x68/0x84
[ 26.384047] dump_stack+0x1c/0x38
[ 26.387394] bad_page+0x12c/0x170
[ 26.390746] __free_pages_ok+0x63c/0x7b0
[ 26.394705] page_frag_free+0xcc/0xe0
[ 26.398402] skb_free_head+0x44/0xa0
[ 26.402017] skb_release_data+0x1d0/0x274
[ 26.406065] kfree_skb.part.0+0x6c/0x100
[ 26.410026] kfree_skb+0x54/0xc0
[ 26.413288] icmpv6_rcv+0x140/0x984
[ 26.416817] ip6_protocol_deliver_rcu+0x198/0x894
[ 26.421566] ip6_input+0x140/0x150
[ 26.425000] ip6_mc_input+0x228/0x550
[ 26.428696] ip6_sublist_rcv_finish+0x9c/0xd0
[ 26.433091] ip6_sublist_rcv+0x344/0x440
[ 26.437049] ipv6_list_rcv+0x1c0/0x220
[ 26.440833] __netif_receive_skb_list_core+0x2b0/0x3b0
[ 26.446020] netif_receive_skb_list_internal+0x29c/0x474
[ 26.451378] napi_complete_done+0xc4/0x2c0
[ 26.455513] mvpp2_poll+0x20c/0x310
[ 26.459040] __napi_poll+0x64/0x280
[ 26.462561] net_rx_action+0x4c4/0x550
[ 26.466345] __do_softirq+0x1a0/0x544
[ 26.470043] __irq_exit_rcu+0x164/0x184
[ 26.473920] irq_exit_rcu+0x1c/0x30
[ 26.477445] el1_interrupt+0x38/0x54
[ 26.481057] el1h_64_irq_handler+0x18/0x24
[ 26.485191] el1h_64_irq+0x78/0x7c
[ 26.488625] arch_local_irq_enable+0xc/0x20
[ 26.492851] default_idle_call+0x58/0x1c0
[ 26.496897] do_idle+0x2f8/0x380
[ 26.500161] cpu_startup_entry+0x34/0x8c
[ 26.504121] rest_init+0xf8/0x110
[ 26.507470] arch_call_rest_init+0x1c/0x28
[ 26.511613] start_kernel+0x3a4/0x3dc
[ 26.515313] __primary_switched+0xc4/0xcc

Each time a different process fails.
Can you help with that please?
Thanks.

@paulmenzel
Contributor

Please mark up the comments with Markdown to make them more readable.

We found a fix for errors such as the one above that point to the mlxsw driver.

Please share the solution/fix.

However, we still get many other BUGs in dmesg, such as the one below:

[ 26.301368] BUG: Bad page state in process swapper/0 pfn:104300
[ 26.307505] page:00000000ce81ba6f refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x104300
[ 26.317045] head:00000000ce81ba6f order:8 compound_mapcount:0 compound_pincount:0
[…]

Although the issues have similar titles, please create a separate issue for this problem, giving all the details again along with how to reproduce it.

@daniellerts

daniellerts commented Dec 6, 2021

The change that fixed the mlxsw issue is:

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index a15c95a10bae..cd3331a077bb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1973,6 +1973,7 @@ int mlxsw_pci_driver_register(struct pci_driver *pci_driver)
 {
        pci_driver->probe = mlxsw_pci_probe;
        pci_driver->remove = mlxsw_pci_remove;
+       pci_driver->shutdown = mlxsw_pci_remove;
        return pci_register_driver(pci_driver);
 }
 EXPORT_SYMBOL(mlxsw_pci_driver_register);
