Ralf-Baechle/a…

Commits on Oct 12, 2021

  1. ax25: Fix deadlock hang during concurrent read and write on socket.

    Before this patch, this hangs, because the read(2) blocks the
    write(2).
    
    Before:
    strace -f -eread,write ./examples/client_lockcheck M0THC-9 M0THC-0 M0THC-2
    strace: Process 3888 attached
    [pid  3888] read(3,  <unfinished ...>
    [pid  3887] write(3, "hello world", 11
    [hang]
    
    After:
    strace -f -eread,write ./examples/client_lockcheck M0THC-9 M0THC-0 M0THC-2
    strace: Process 2433 attached
    [pid  2433] read(3,  <unfinished ...>
    [pid  2432] write(3, "hello world", 11) = 11
    [pid  2433] <... read resumed> "yo", 1000) = 2
    [pid  2433] write(1, "yo\n", 3yo
    )         = 3
    [successful exit]
    
    Signed-off-by: Thomas Habets <thomas@habets.se>
    Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    ThomasHabets authored and intel-lab-lkp committed Oct 12, 2021
  2. ax25: Fix use of copy_from_sockptr() in ax25_setsockopt()

    The destination pointer passed to copy_from_sockptr() is an unsigned long *
    but the source in userspace is an unsigned int.
    
    This happens to work on 32-bit but breaks on 64-bit, where bytes 4..7 will
    not be initialized.  By luck it may work on little endian, but on big endian,
    where the userspace data is copied to the upper 32 bits of the destination,
    it's most likely going to break.
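
The size mismatch can be reproduced in plain userspace C. The sketch below is not the kernel code: `read_opt_buggy` and `read_opt_fixed` are hypothetical names, `memcpy()` stands in for copy_from_sockptr(), and the `memset()` simulates stale stack contents in the destination.

```c
#include <string.h>

/* Buggy pattern: the destination is an unsigned long, but only
 * sizeof(unsigned int) bytes arrive from the caller. memset() stands in
 * for whatever garbage the stack slot held before the copy. */
unsigned long read_opt_buggy(const unsigned int *user_val)
{
    unsigned long opt;

    memset(&opt, 0xff, sizeof(opt));
    /* Copies only 4 of the 8 destination bytes on an LP64 system. */
    memcpy(&opt, user_val, sizeof(*user_val));
    return opt;
}

/* Fixed pattern: copy into a variable of the same type as the source,
 * then widen explicitly on return. */
unsigned long read_opt_fixed(const unsigned int *user_val)
{
    unsigned int opt;

    memcpy(&opt, user_val, sizeof(opt));
    return opt;
}
```

Where unsigned long is wider than unsigned int, the buggy variant returns a value polluted by the untouched bytes, while the fixed variant returns the value that was passed in.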
    
    Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Fixes: a7b75c5 ("net: pass a sockptr_t into ->setsockopt")
    ralfbaechle authored and intel-lab-lkp committed Oct 12, 2021
  3. ice: fix locking for Tx timestamp tracking flush

    Commit 4dd0d5c ("ice: add lock around Tx timestamp tracker flush")
    added a lock around the Tx timestamp tracker flow which is used to
    cleanup any left over SKBs and prepare for device removal.
    
    This lock is problematic because it is being held around a call to
    ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
    write command to firmware. This could lead to a deadlock if the mutex
    actually sleeps, and causes the following warning on a kernel with
    preemption debugging enabled:
    
    [  715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573
    [  715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod
    [  715.435652] INFO: lockdep is turned off.
    [  715.439591] Preemption disabled at:
    [  715.439594] [<0000000000000000>] 0x0
    [  715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G        W  OE     5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c
    [  715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
    [  715.468483] Call Trace:
    [  715.470940]  dump_stack_lvl+0x6a/0x9a
    [  715.474613]  ___might_sleep.cold+0x224/0x26a
    [  715.478895]  __mutex_lock+0xb3/0x1440
    [  715.482569]  ? stack_depot_save+0x378/0x500
    [  715.486763]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.494979]  ? kfree+0xc1/0x520
    [  715.498128]  ? mutex_lock_io_nested+0x12a0/0x12a0
    [  715.502837]  ? kasan_set_free_info+0x20/0x30
    [  715.507110]  ? __kasan_slab_free+0x10b/0x140
    [  715.511385]  ? slab_free_freelist_hook+0xc7/0x220
    [  715.516092]  ? kfree+0xc1/0x520
    [  715.519235]  ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.527359]  ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.535133]  ? pci_device_remove+0xab/0x1d0
    [  715.539318]  ? __device_release_driver+0x35b/0x690
    [  715.544110]  ? driver_detach+0x214/0x2f0
    [  715.548035]  ? bus_remove_driver+0x11d/0x2f0
    [  715.552309]  ? pci_unregister_driver+0x26/0x250
    [  715.556840]  ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.564799]  ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0
    [  715.570554]  ? do_syscall_64+0x3b/0x90
    [  715.574303]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
    [  715.579529]  ? start_flush_work+0x542/0x8f0
    [  715.583719]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.591923]  ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.599960]  ? wait_for_completion_io+0x250/0x250
    [  715.604662]  ? lock_acquire+0x196/0x200
    [  715.608504]  ? do_raw_spin_trylock+0xa5/0x160
    [  715.612864]  ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.620813]  ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.628497]  ? __debug_check_no_obj_freed+0x1e8/0x3c0
    [  715.633550]  ? trace_hardirqs_on+0x1c/0x130
    [  715.637748]  ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.646220]  ? do_raw_spin_trylock+0xa5/0x160
    [  715.650581]  ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.658797]  ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.667013]  ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.675403]  ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.683440]  ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.691037]  ? _raw_spin_unlock_irqrestore+0x46/0x73
    [  715.696005]  pci_device_remove+0xab/0x1d0
    [  715.700018]  __device_release_driver+0x35b/0x690
    [  715.704637]  driver_detach+0x214/0x2f0
    [  715.708389]  bus_remove_driver+0x11d/0x2f0
    [  715.712489]  pci_unregister_driver+0x26/0x250
    [  715.716857]  ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
    [  715.724637]  __do_sys_delete_module.constprop.0+0x2d8/0x4e0
    [  715.730210]  ? free_module+0x6d0/0x6d0
    [  715.733963]  ? task_work_run+0xe1/0x170
    [  715.737803]  ? exit_to_user_mode_loop+0x17f/0x1d0
    [  715.742509]  ? rcu_read_lock_sched_held+0x12/0x80
    [  715.747215]  ? trace_hardirqs_on+0x1c/0x130
    [  715.751401]  do_syscall_64+0x3b/0x90
    [  715.754981]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [  715.760033] RIP: 0033:0x7f4dfe59000b
    [  715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48
    [  715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
    [  715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b
    [  715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918
    [  715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    [  715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940
    [  715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0
    
    Notice that this is the only case where we use the lock in this way. In
    the cleanup kthread and work kthread the lock is only taken around the
    bit accesses. This was done intentionally to avoid this kind of issue.
    The way the lock is used, we only protect ordering of bit sets vs bit
    clears. The Tx writers in the hot path don't need to be protected
    against the entire kthread loop. The Tx queues threads only need to
    ensure that they do not re-use an index that is currently in use. The
    cleanup loop does not need to block all new set bits, since it will
    re-queue itself if new timestamps are present.
    
    Fix the tracker flow so that it uses the same flow as the standard
    cleanup thread. In addition, ensure the in_use bitmap actually gets
    cleared properly.
    
    This fixes the warning and also avoids the potential deadlock that might
    have occurred otherwise.
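
The locking rule described above (hold the lock only around the bit accesses, never around a call that may sleep) can be modelled without any kernel API. This is a minimal sketch under stated assumptions: `struct tracker`, `tracker_flush` and `clear_phy_tstamp_model` are hypothetical, and a boolean flag models the lock so that a violation can be counted instead of deadlocking.

```c
#include <stdbool.h>
#include <stddef.h>

#define TRACKER_LEN 8   /* hypothetical tracker size */

struct tracker {
    bool lock_held;          /* models the lock state */
    unsigned int in_use;     /* bitmap of indexes with a pending timestamp */
    int slept_under_lock;    /* sleeping calls made while the lock was held */
    int cleared;             /* number of PHY timestamp clears issued */
};

static void tracker_lock(struct tracker *tx)   { tx->lock_held = true; }
static void tracker_unlock(struct tracker *tx) { tx->lock_held = false; }

/* Stand-in for a call that may sleep, such as ice_clear_phy_tstamp(). */
static void clear_phy_tstamp_model(struct tracker *tx, size_t idx)
{
    (void)idx;
    if (tx->lock_held)
        tx->slept_under_lock++;
    tx->cleared++;
}

/* Flush every index: the lock covers only the bit clear, never the PHY call. */
void tracker_flush(struct tracker *tx)
{
    for (size_t idx = 0; idx < TRACKER_LEN; idx++) {
        bool was_set;

        tracker_lock(tx);
        was_set = (tx->in_use & (1u << idx)) != 0;
        tx->in_use &= ~(1u << idx);
        tracker_unlock(tx);

        if (was_set)
            clear_phy_tstamp_model(tx, idx);
    }
}
```

After a flush the bitmap is empty and the violation counter stays at zero, which mirrors the two goals stated in the commit message.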
    
    Fixes: 4dd0d5c ("ice: add lock around Tx timestamp tracker flush")
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    jacob-keller authored and davem330 committed Oct 12, 2021
  4. Merge branch 'ioam-fixes'

    Justin Iurman says:
    
    ====================
    Correct the IOAM behavior for undefined trace type bits
    
    (@jakub @david: there will be a conflict for #2 when merging net->net-next, due
    to commit [1]. The conflict is only 5-10 lines for #2 (#1 should be fine) inside
    the file tools/testing/selftests/net/ioam6.sh, so quite short though possibly
    ugly. Sorry for that, I didn't expect to post this one... Had I known, I'd have
    made the opposite.)
    
    Modify both the input and output behaviors regarding the trace type when one of
    the undefined bits is set. The goal is to keep the interoperability when new
    fields (aka new bits inside the range 12-21) will be defined.
    
    The draft [2] says the following:
    ---------------------------------------------------------------
    "Bit 12-21  Undefined.  These values are available for future
           assignment in the IOAM Trace-Type Registry (Section 8.2).
           Every future node data field corresponding to one of
           these bits MUST be 4-octets long.  An IOAM encapsulating
           node MUST set the value of each undefined bit to 0.  If
           an IOAM transit node receives a packet with one or more
           of these bits set to 1, it MUST either:
    
           1.  Add corresponding node data filled with the reserved
               value 0xFFFFFFFF, after the node data fields for the
               IOAM-Trace-Type bits defined above, such that the
               total node data added by this node in units of
               4-octets is equal to NodeLen, or
    
           2.  Not add any node data fields to the packet, even for
               the IOAM-Trace-Type bits defined above."
    ---------------------------------------------------------------
    
    The output behavior has been modified to respect the fact that "an IOAM encap
    node MUST set the value of each undefined bit to 0" (i.e., undefined bits can't
    be set anymore).
    
    As for the input behavior, current implementation is based on the second choice
    (i.e., "not add any data fields to the packet [...]"). With this solution, any
    interoperability is lost (i.e., if a new bit is defined, then an "old" kernel
    implementation wouldn't fill IOAM data when such new bit is set inside the trace
    type).
    
    The input behavior is therefore relaxed and these undefined bits are now allowed
    to be set. It is only possible thanks to the sentence "every future node data
    field corresponding to one of these bits MUST be 4-octets long". Indeed, the
    default empty value (the one for 4-octet fields) is inserted whenever an
    undefined bit is set.
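
A rough sketch of the two behaviors, not the kernel implementation. It assumes the trace type is a 24-bit field whose bit 0 is the most significant bit, and it uses the reserved value 0xFFFFFFFF quoted from the draft; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 24-bit trace type, bit 0 is the most significant bit,
 * so bit n maps to 1 << (23 - n). */
#define TRACE_BIT(n)    (1u << (23 - (n)))
#define UNDEFINED_MASK  0x00000FFCu   /* bits 12..21 under that assumption */
#define RESERVED_U32    0xFFFFFFFFu   /* reserved value quoted from the draft */

/* Output (encap) side: refuse any trace type with an undefined bit set. */
bool encap_trace_type_ok(uint32_t trace_type)
{
    return (trace_type & UNDEFINED_MASK) == 0;
}

/* Input (transit) side: append one 4-octet reserved value per undefined
 * bit that is set. Returns the number of 32-bit words written. */
size_t fill_undefined_fields(uint32_t trace_type, uint32_t *out, size_t cap)
{
    size_t n = 0;

    for (int bit = 12; bit <= 21; bit++) {
        if ((trace_type & TRACE_BIT(bit)) && n < cap)
            out[n++] = RESERVED_U32;
    }
    return n;
}
```

Because every future field is required to be 4 octets long, the transit side can size its contribution without knowing what the new bits mean.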
    
      [1] cfbe9b0
      [2] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.1
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Oct 12, 2021
  5. selftests: net: modify IOAM tests for undef bits

    The output behavior for undefined bits is now directly tested inside the bash
    script. Trying to set an undefined bit should be refused.
    
    The input behavior for undefined bits has been removed due to the fact that we
    would need another sender allowed to set undefined bits.
    
    Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    IurmanJ authored and davem330 committed Oct 12, 2021
  6. ipv6: ioam: move the check for undefined bits

    The check for undefined bits in the trace type is moved from the input side to
    the output side, while the input side is relaxed and now inserts default empty
    values when an undefined bit is set.
    
    Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    IurmanJ authored and davem330 committed Oct 12, 2021
  7. net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
    
    When the ksz module is installed and then removed with rmmod, the kernel
    crashes with a NULL pointer dereference. During rmmod, ksz_switch_remove()
    cancels the MIB read work with cancel_delayed_work_sync() and then
    unregisters the switch from DSA.
    
    dsa_unregister_switch() calls ksz_mac_link_down(), which reschedules the
    work because mib_interval is non-zero. The work then runs after
    mib_interval has elapsed and tries to access dp->slave, but the slave has
    already been unregistered by ksz_switch_remove(). Hence the kernel crashes.
    
    To avoid this crash, reset mib_interval to 0 before canceling the work.
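
The ordering can be illustrated with a small state model. This is only a sketch: `struct ksz_model` and both functions are hypothetical, and a boolean flag replaces the real delayed work.

```c
#include <stdbool.h>

/* Minimal model of the switch state; mib_interval mirrors the commit
 * message, everything else is hypothetical. */
struct ksz_model {
    unsigned long mib_interval;  /* 0 means "do not reschedule" */
    bool work_pending;           /* models the delayed work being queued */
};

/* Models the link-down callback: it only re-arms the MIB work when an
 * interval is configured. */
static void mac_link_down_model(struct ksz_model *dev)
{
    if (dev->mib_interval)
        dev->work_pending = true;
}

/* Models the remove path: clear the interval first, then cancel, then
 * unregister (which triggers the link-down callback). */
void switch_remove_model(struct ksz_model *dev)
{
    dev->mib_interval = 0;      /* stop any future rescheduling */
    dev->work_pending = false;  /* stands in for cancel_delayed_work_sync() */
    mac_link_down_model(dev);   /* called during unregistration */
}
```

With the interval cleared first, the callback that runs during unregistration has nothing to re-arm, so no work is left pending once removal completes.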
    
    v1 -> v2:
    -Removed the if condition in ksz_mib_read_work
    
    Fixes: 469b390 ("net: dsa: microchip: use delayed_work instead of timer + work")
    Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Arun Ramadoss authored and davem330 committed Oct 12, 2021
  8. r8152: select CRC32 and CRYPTO/CRYPTO_HASH/CRYPTO_SHA256

    Fix the following build/link errors by adding a dependency on
    CRYPTO, CRYPTO_HASH, CRYPTO_SHA256 and CRC32:
    
      ld: drivers/net/usb/r8152.o: in function `rtl8152_fw_verify_checksum':
      r8152.c:(.text+0x2b2a): undefined reference to `crypto_alloc_shash'
      ld: r8152.c:(.text+0x2bed): undefined reference to `crypto_shash_digest'
      ld: r8152.c:(.text+0x2c50): undefined reference to `crypto_destroy_tfm'
      ld: drivers/net/usb/r8152.o: in function `_rtl8152_set_rx_mode':
      r8152.c:(.text+0xdcb0): undefined reference to `crc32_le'
    
    Fixes: 9370f2d ("r8152: support request_firmware for RTL8153")
    Fixes: ac718b6 ("net/usb: new driver for RTL8152")
    Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vegard authored and davem330 committed Oct 12, 2021
  9. net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHYs

    mv88e6xxx_port_ppu_updates() interprets data in the PORT_STS
    register incorrectly for internal ports (i.e. no PPU). In these
    cases, the PHY_DETECT bit indicates link status. This results
    in forcing the MAC state whenever the PHY link goes down, which
    is not intended. As a side effect, LEDs configured to show
    link status stay lit even though the physical link is down.
    
    Add a check in mac_link_down and mac_link_up to see whether the
    port is external, and only then look at the PPU status.
    
    Fixes: 5d5b231 (net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down)
    Reported-by: Maarten Zanders <m.zanders@televic.com>
    Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
    Signed-off-by: Maarten Zanders <maarten.zanders@mind.be>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    mzanders authored and davem330 committed Oct 12, 2021
  10. net: mscc: ocelot: Fix duplicated argument in ocelot

    Fix the following coccicheck warning:
    drivers/net/ethernet/mscc/ocelot.c:474:duplicated argument to & or |
    drivers/net/ethernet/mscc/ocelot.c:476:duplicated argument to & or |
    drivers/net/ethernet/mscc/ocelot_net.c:1627:duplicated argument
    to & or |
    
    DEV_CLOCK_CFG_MAC_TX_RST appears twice in each of these expressions.
    One of the two should be DEV_CLOCK_CFG_MAC_RX_RST.
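
The effect of the duplicated operand is easy to see in isolation. In the sketch below the bit positions are hypothetical values chosen for illustration, and the two helper functions do not exist in the driver.

```c
/* Hypothetical bit positions, chosen only for illustration. */
#define DEV_CLOCK_CFG_MAC_TX_RST (1u << 7)
#define DEV_CLOCK_CFG_MAC_RX_RST (1u << 6)

/* Flagged form: the same flag is OR-ed with itself, so the RX reset bit
 * is never part of the mask. */
unsigned int reset_mask_duplicated(void)
{
    return DEV_CLOCK_CFG_MAC_TX_RST | DEV_CLOCK_CFG_MAC_TX_RST;
}

/* Intended form: include both the TX and the RX reset bits. */
unsigned int reset_mask_fixed(void)
{
    return DEV_CLOCK_CFG_MAC_TX_RST | DEV_CLOCK_CFG_MAC_RX_RST;
}
```

The duplicated form compiles cleanly and looks plausible, which is why a static checker such as coccicheck is useful for catching it.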
    
    Fixes: e6e12df ("net: mscc: ocelot: convert to phylink")
    Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Wan Jiabing authored and davem330 committed Oct 12, 2021
  11. af_unix: Rename UNIX-DGRAM to UNIX to maintain backwards compatibility

    The name of this protocol changed in commit 94531cf ("af_unix: Add
    unix_stream_proto for sockmap") because that commit added stream support
    to the af_unix protocol. Renaming the existing protocol makes a ChromeOS
    protocol test[1] fail now that the name has changed in
    /proc/net/protocols from "UNIX" to "UNIX-DGRAM".
    
    Let's put the name back to how it was while keeping the stream protocol
    as "UNIX-STREAM" so that the procfs interface doesn't change. This fixes
    the test and maintains backwards compatibility in proc.
    
    Cc: Jiang Wang <jiang.wang@bytedance.com>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Cc: Cong Wang <cong.wang@bytedance.com>
    Cc: Jakub Sitnicki <jakub@cloudflare.com>
    Cc: John Fastabend <john.fastabend@gmail.com>
    Cc: Dmitry Osipenko <digetx@gmail.com>
    Link: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/supported_protocols.go;l=50;drc=e8b1c3f94cb40a054f4aa1ef1aff61e75dc38f18 [1]
    Fixes: 94531cf ("af_unix: Add unix_stream_proto for sockmap")
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    bebarino authored and davem330 committed Oct 12, 2021

Commits on Oct 9, 2021

  1. virtio-net: fix for skb_over_panic inside big mode

    commit 1262856 ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
    accidentally reverted the effect of
    commit 1a80242 ("virtio-net: fix for skb_over_panic inside big mode")
    on drivers/net/virtio_net.c
    
    As a result, users of crosvm (which is using large packet mode)
    are experiencing crashes with 5.14-rc1 and above that do not
    occur with 5.13.
    
    Crash trace:
    
    [   61.346677] skbuff: skb_over_panic: text:ffffffff881ae2c7 len:3762 put:3762 head:ffff8a5ec8c22000 data:ffff8a5ec8c22010 tail:0xec2 end:0xec0 dev:<NULL>
    [   61.369192] kernel BUG at net/core/skbuff.c:111!
    [   61.372840] invalid opcode: 0000 [#1] SMP PTI
    [   61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1
    [   61.376450] Hardware name: ChromiumOS crosvm, BIOS 0
    
    ..
    
    [   61.393635] Call Trace:
    [   61.394127]  <IRQ>
    [   61.394488]  skb_put.cold+0x10/0x10
    [   61.395095]  page_to_skb+0xf7/0x410
    [   61.395689]  receive_buf+0x81/0x1660
    [   61.396228]  ? netif_receive_skb_list_internal+0x1ad/0x2b0
    [   61.397180]  ? napi_gro_flush+0x97/0xe0
    [   61.397896]  ? detach_buf_split+0x67/0x120
    [   61.398573]  virtnet_poll+0x2cf/0x420
    [   61.399197]  __napi_poll+0x25/0x150
    [   61.399764]  net_rx_action+0x22f/0x280
    [   61.400394]  __do_softirq+0xba/0x257
    [   61.401012]  irq_exit_rcu+0x8e/0xb0
    [   61.401618]  common_interrupt+0x7b/0xa0
    [   61.402270]  </IRQ>
    
    See
    https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com
    for the report.
    
    Apply the original 1a80242 ("virtio-net: fix for skb_over_panic inside big mode")
    again, the original logic still holds:
    
    In virtio-net's large packet mode, there is a hole in the space behind
    buf.
    
        hdr_padded_len - hdr_len
    
    We must take this into account when calculating tailroom.
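
A simplified arithmetic model of the tailroom calculation, assuming `len` is the received length including the header. It is not the driver code: the function names and the `truesize`, `headroom` and `len` parameters are assumptions made for this sketch.

```c
/* Naive form: treats the payload as if it directly followed hdr_len
 * bytes of header. */
long tailroom_naive(long truesize, long headroom, long len)
{
    return truesize - headroom - len;
}

/* Accounts for the hole: the payload starts hdr_padded_len bytes into
 * the buffer, so (hdr_padded_len - hdr_len) extra bytes are consumed. */
long tailroom_with_hole(long truesize, long headroom, long len,
                        long hdr_len, long hdr_padded_len)
{
    return truesize - headroom - hdr_padded_len - (len - hdr_len);
}
```

The two results differ by exactly hdr_padded_len - hdr_len, which is the amount by which the naive form overestimates the free space behind the data.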
    
    Cc: Greg KH <gregkh@linuxfoundation.org>
    Fixes: fb32856 ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
    Fixes: 1262856 ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Reported-by: Corentin Noël <corentin.noel@collabora.com>
    Tested-by: Corentin Noël <corentin.noel@collabora.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    fengidri authored and davem330 committed Oct 9, 2021
  2. net: phy: Do not shutdown PHYs in READY state

    In case a PHY device was probed thus in the PHY_READY state, but not
    configured and with no network device attached yet, we should not be
    trying to shut it down because it has been brought back into reset by
    phy_device_reset() towards the end of phy_probe() and anyway we have not
    configured the PHY yet.
    
    Fixes: e2f016c ("net: phy: add a shutdown procedure")
    Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    ffainelli authored and davem330 committed Oct 9, 2021
  3. qed: Fix missing error code in qed_slowpath_start()

    The error code is missing in this code path. Add the error code
    '-EINVAL' to the return value 'rc'.
    
    Eliminate the following smatch warning:
    
    drivers/net/ethernet/qlogic/qed/qed_main.c:1298 qed_slowpath_start()
    warn: missing error code 'rc'.
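
The commit message does not show the affected code, so the following is a generic sketch of the pattern this kind of warning points at. Both functions and the `resource` parameter are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Flagged shape: 'rc' still holds 0 when the failure branch jumps to
 * the error label, so the caller sees success. */
int start_buggy(const void *resource)
{
    int rc = 0;

    if (!resource)
        goto err;
    return 0;
err:
    return rc;
}

/* Fixed shape: set an explicit error code before leaving. */
int start_fixed(const void *resource)
{
    int rc = 0;

    if (!resource) {
        rc = -EINVAL;
        goto err;
    }
    return 0;
err:
    return rc;
}
```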
    
    Reported-by: Abaci Robot <abaci@linux.alibaba.com>
    Fixes: d51e4af ("qed: aRFS infrastructure support")
    Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    chongjiapeng authored and davem330 committed Oct 9, 2021
  4. net: dsa: hold rtnl_lock in dsa_switch_setup_tag_protocol

    It was a documented fact that ds->ops->change_tag_protocol() offered
    rtnetlink mutex protection to the switch driver, since there was an
    ASSERT_RTNL right before the call in dsa_switch_change_tag_proto()
    (initiated from sysfs).
    
    The blamed commit introduced another call path for
    ds->ops->change_tag_protocol() which does not hold the rtnl_mutex.
    This is:
    
    dsa_tree_setup
    -> dsa_tree_setup_switches
       -> dsa_switch_setup
          -> dsa_switch_setup_tag_protocol
             -> ds->ops->change_tag_protocol()
       -> dsa_port_setup
          -> dsa_slave_create
             -> register_netdevice(slave_dev)
    -> dsa_tree_setup_master
       -> dsa_master_setup
          -> dev->dsa_ptr = cpu_dp
    
    The reason why the rtnl_mutex is held in the sysfs call path is to
    ensure that, once the master and all the DSA interfaces are down (which
    is required so that no packets flow), they remain down during the
    tagging protocol change.
    
    The above calling order illustrates the fact that it should not be risky
    to change the initial tagging protocol to the one specified in the
    device tree at the given time:
    
    - packets cannot enter the dsa_switch_rcv() packet type handler since
      netdev_uses_dsa() for the master will not yet return true, since
      dev->dsa_ptr has not yet been populated
    
    - packets cannot enter the dsa_slave_xmit() function because no DSA
      interface has yet been registered
    
    So from the DSA core's perspective, holding the rtnl_mutex is indeed not
    necessary.
    
    Yet, drivers may need to do things which need rtnl_mutex protection. For
    example:
    
    felix_set_tag_protocol
    -> felix_setup_tag_8021q
       -> dsa_tag_8021q_register
          -> dsa_tag_8021q_setup
             -> dsa_tag_8021q_port_setup
                -> vlan_vid_add
                   -> ASSERT_RTNL
    
    These drivers do not really have a choice to take the rtnl_mutex
    themselves, since in the sysfs case, the rtnl_mutex is already held.
    
    Fixes: deff710 ("net: dsa: Allow default tag protocol to be overridden from DT")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vladimiroltean authored and davem330 committed Oct 9, 2021
  5. isdn: mISDN: Fix sleeping function called from invalid context

    The driver can call card->isac.release() function from an atomic
    context.
    
    Fix this by calling this function after releasing the lock.
    
    The following log reveals it:
    
    [   44.168226 ] BUG: sleeping function called from invalid context at kernel/workqueue.c:3018
    [   44.168941 ] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 5475, name: modprobe
    [   44.169574 ] INFO: lockdep is turned off.
    [   44.169899 ] irq event stamp: 0
    [   44.170160 ] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
    [   44.170627 ] hardirqs last disabled at (0): [<ffffffff814209ed>] copy_process+0x132d/0x3e00
    [   44.171240 ] softirqs last  enabled at (0): [<ffffffff81420a1a>] copy_process+0x135a/0x3e00
    [   44.171852 ] softirqs last disabled at (0): [<0000000000000000>] 0x0
    [   44.172318 ] Preemption disabled at:
    [   44.172320 ] [<ffffffffa009b0a9>] nj_release+0x69/0x500 [netjet]
    [   44.174441 ] Call Trace:
    [   44.174630 ]  dump_stack_lvl+0xa8/0xd1
    [   44.174912 ]  dump_stack+0x15/0x17
    [   44.175166 ]  ___might_sleep+0x3a2/0x510
    [   44.175459 ]  ? nj_release+0x69/0x500 [netjet]
    [   44.175791 ]  __might_sleep+0x82/0xe0
    [   44.176063 ]  ? start_flush_work+0x20/0x7b0
    [   44.176375 ]  start_flush_work+0x33/0x7b0
    [   44.176672 ]  ? trace_irq_enable_rcuidle+0x85/0x170
    [   44.177034 ]  ? kasan_quarantine_put+0xaa/0x1f0
    [   44.177372 ]  ? kasan_quarantine_put+0xaa/0x1f0
    [   44.177711 ]  __flush_work+0x11a/0x1a0
    [   44.177991 ]  ? flush_work+0x20/0x20
    [   44.178257 ]  ? lock_release+0x13c/0x8f0
    [   44.178550 ]  ? __kasan_check_write+0x14/0x20
    [   44.178872 ]  ? do_raw_spin_lock+0x148/0x360
    [   44.179187 ]  ? read_lock_is_recursive+0x20/0x20
    [   44.179530 ]  ? __kasan_check_read+0x11/0x20
    [   44.179846 ]  ? do_raw_spin_unlock+0x55/0x900
    [   44.180168 ]  ? ____kasan_slab_free+0x116/0x140
    [   44.180505 ]  ? _raw_spin_unlock_irqrestore+0x41/0x60
    [   44.180878 ]  ? skb_queue_purge+0x1a3/0x1c0
    [   44.181189 ]  ? kfree+0x13e/0x290
    [   44.181438 ]  flush_work+0x17/0x20
    [   44.181695 ]  mISDN_freedchannel+0xe8/0x100
    [   44.182006 ]  isac_release+0x210/0x260 [mISDNipac]
    [   44.182366 ]  nj_release+0xf6/0x500 [netjet]
    [   44.182685 ]  nj_remove+0x48/0x70 [netjet]
    [   44.182989 ]  pci_device_remove+0xa9/0x250
    
    Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    ZheyuMa authored and davem330 committed Oct 9, 2021
  6. ionic: don't remove netdev->dev_addr when syncing uc list

    Bridging, and possibly other upper stack gizmos, adds the
    lower device's netdev->dev_addr to its own uc list, and
    then requests it be deleted when the upper bridge device is
    removed.  This delete request also happens when the bridge's
    vlan_filtering is enabled and then disabled.
    
    Bonding has a similar behavior with the uc list, but since it
    also uses set_mac to manage netdev->dev_addr, it doesn't have
    the same failure case.
    
    Because we store our netdev->dev_addr in our uc list, we need
    to ignore the delete request from dev_uc_sync so as to not
    lose the address and all hope of communicating.  Note that
    ndo_set_mac_address is expressly changing netdev->dev_addr,
    so no limitation is set there.
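
The idea of ignoring a delete request for the device's own address can be sketched as follows. This is an illustration only: `struct netdev_model` and `addr_del_model` are hypothetical, and `memcmp()` stands in for the kernel's address comparison helper.

```c
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

struct netdev_model {
    unsigned char dev_addr[ETH_ALEN];  /* the device's own MAC address */
};

/* Models the delete callback invoked during uc list sync. Returns true
 * when a filter would actually be removed. */
bool addr_del_model(const struct netdev_model *nd, const unsigned char *addr)
{
    /* Never drop the filter for our own address. */
    if (memcmp(addr, nd->dev_addr, ETH_ALEN) == 0)
        return false;

    /* A real driver would remove the filter for 'addr' here. */
    return true;
}
```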
    
    Fixes: 2a65454 ("ionic: Add Rx filter and rx_mode ndo support")
    Signed-off-by: Shannon Nelson <snelson@pensando.io>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    emusln authored and davem330 committed Oct 9, 2021
  7. net: mana: Fix error handling in mana_create_rxq()

    Fix error handling in mana_create_rxq() when
    cq->gdma_id >= gc->max_num_cqs.
    
    Fixes: ca9c54d ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
    Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
    Link: https://lore.kernel.org/r/1633698691-31721-1-git-send-email-haiyangz@microsoft.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    haiyangz authored and Jakub Kicinski committed Oct 9, 2021

Commits on Oct 8, 2021

  1. isdn: capi: check ctr->cnr to avoid array index out of bounds

    The cmtp_add_connection() would add a cmtp session to a controller
    and run a kernel thread to process cmtp.
    
    	__module_get(THIS_MODULE);
    	session->task = kthread_run(cmtp_session, session, "kcmtpd_ctr_%d",
    								session->num);
    
    During this process, the kernel thread would call detach_capi_ctr()
    to detach a registered controller. If the controller
    was not attached yet, detach_capi_ctr() would
    trigger an array-index-out-of-bounds bug.
    
    [   46.866069][ T6479] UBSAN: array-index-out-of-bounds in drivers/isdn/capi/kcapi.c:483:21
    [   46.867196][ T6479] index -1 is out of range for type 'capi_ctr *[32]'
    [   46.867982][ T6479] CPU: 1 PID: 6479 Comm: kcmtpd_ctr_0 Not tainted 5.15.0-rc2+ #8
    [   46.869002][ T6479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    [   46.870107][ T6479] Call Trace:
    [   46.870473][ T6479]  dump_stack_lvl+0x57/0x7d
    [   46.870974][ T6479]  ubsan_epilogue+0x5/0x40
    [   46.871458][ T6479]  __ubsan_handle_out_of_bounds.cold+0x43/0x48
    [   46.872135][ T6479]  detach_capi_ctr+0x64/0xc0
    [   46.872639][ T6479]  cmtp_session+0x5c8/0x5d0
    [   46.873131][ T6479]  ? __init_waitqueue_head+0x60/0x60
    [   46.873712][ T6479]  ? cmtp_add_msgpart+0x120/0x120
    [   46.874256][ T6479]  kthread+0x147/0x170
    [   46.874709][ T6479]  ? set_kthread_struct+0x40/0x40
    [   46.875248][ T6479]  ret_from_fork+0x1f/0x30
    [   46.875773][ T6479]
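
The "index -1" in the report is what a 1-based controller number of 0 turns into once 1 is subtracted from it. A minimal sketch of a range check before the array access, assuming a 32-entry table as in the report; `struct ctr_model`, `controllers` and `detach_ctr_model` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_CONTR 32   /* matches the 'capi_ctr *[32]' in the report */

struct ctr_model {
    int cnr;   /* 1-based controller number; 0 means "never attached" */
};

static struct ctr_model *controllers[MAX_CONTR];

/* Detach with a range check before cnr - 1 is used as an index. */
int detach_ctr_model(struct ctr_model *ctr)
{
    if (ctr->cnr < 1 || ctr->cnr - 1 >= MAX_CONTR)
        return -EINVAL;   /* would otherwise index out of bounds */

    if (controllers[ctr->cnr - 1] != ctr)
        return -EINVAL;   /* slot does not belong to this controller */

    controllers[ctr->cnr - 1] = NULL;
    return 0;
}
```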
    
    Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Link: https://lore.kernel.org/r/20211008065830.305057-1-butterflyhuangxx@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Xiaolong Huang authored and Jakub Kicinski committed Oct 8, 2021
  2. mqprio: Correct stats in mqprio_dump_class_stats().

    Introduction of lockless subqueues broke the class statistics.
    Before the change stats were accumulated in `bstats' and `qstats'
    on the stack which was then copied to struct gnet_dump.
    
    After the change the `bstats' and `qstats' are initialized to 0
    and never updated, yet still fed to gnet_dump. The code updates
    the global qdisc->cpu_bstats and qdisc->cpu_qstats instead,
    clobbering them. Most likely a copy-paste error from the code in
    mqprio_dump().
    
    __gnet_stats_copy_basic() and __gnet_stats_copy_queue() accumulate
    the values for per-CPU case but for global stats they overwrite
    the value, so only stats from the last loop iteration / tc end up
    in sch->[bq]stats.
    
    Use the on-stack [bq]stats variables again and add the stats manually
    in the global case.
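
The difference between overwriting and accumulating per-queue statistics can be shown with a toy model. `struct stats_model` and both functions are hypothetical and unrelated to the real gnet_stats helpers.

```c
#include <stddef.h>

struct stats_model {
    unsigned long bytes;
    unsigned long packets;
};

/* Overwriting form: each iteration replaces the total, so only the last
 * queue's numbers survive. */
struct stats_model sum_overwrite(const struct stats_model *q, size_t n)
{
    struct stats_model total = {0, 0};

    for (size_t i = 0; i < n; i++)
        total = q[i];
    return total;
}

/* Accumulating form: add each queue's numbers to the on-stack total. */
struct stats_model sum_accumulate(const struct stats_model *q, size_t n)
{
    struct stats_model total = {0, 0};

    for (size_t i = 0; i < n; i++) {
        total.bytes += q[i].bytes;
        total.packets += q[i].packets;
    }
    return total;
}
```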
    
    Fixes: ce679e8 ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
    Cc: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    https://lore.kernel.org/all/20211007175000.2334713-2-bigeasy@linutronix.de/
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Sebastian Andrzej Siewior authored and Jakub Kicinski committed Oct 8, 2021
  3. Merge branch 'dsa-bridge-tx-forwarding-offload-fixes-part-1'

    Vladimir Oltean says:
    
    ====================
    DSA bridge TX forwarding offload fixes - part 1
    
    This is part 1 of a series of fixes to the bridge TX forwarding offload
    feature introduced for v5.15. Sadly, the other fixes are so intrusive
    that they cannot reasonably be sent to the "net" tree, as they also
    include API changes. So they are left as part 2 for net-next.
    ====================
    
    Link: https://lore.kernel.org/r/20211007164711.2897238-1-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 8, 2021
  4. net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports
    
    Similar to commit 6087175 ("net: dsa: mt7530: use independent VLAN
    learning on VLAN-unaware bridges"), software forwarding between an
    unoffloaded LAG port (a bonding interface with an unsupported policy)
    and a mv88e6xxx user port directly under a bridge is broken.
    
    We adopt the same strategy, which is to make the standalone ports not
    find any ATU entry learned on a bridge port.
    
    Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are
    as many FIDs as VIDs (4096). The FID is derived from the VID when
    possible (the VTU maps a VID to a FID), with a fallback to the port
    based default FID value when not (802.1Q Mode is disabled on the port,
    or the classified VID isn't present in the VTU).
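
    A minimal sketch of the FID selection rule described in the theory
    paragraph, assuming a simplified VTU model. All names (`toy_vtu`,
    `toy_port`, `toy_classify_fid`) are hypothetical and do not come from
    the mv88e6xxx driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_VIDS 4096

/* Hypothetical VTU model: one optional VID -> FID mapping per VID. */
struct toy_vtu {
	bool valid[TOY_NUM_VIDS];
	uint16_t fid[TOY_NUM_VIDS];
};

/* Hypothetical per-port state. */
struct toy_port {
	bool dot1q_enabled;   /* 802.1Q Mode enabled on the port */
	uint16_t default_fid; /* port-based default FID */
};

/*
 * Pick the FID used for the ATU lookup: the VTU entry wins when 802.1Q
 * mode is on and the VID is present, otherwise the port default is used.
 */
static uint16_t toy_classify_fid(const struct toy_vtu *vtu,
				 const struct toy_port *port, uint16_t vid)
{
	if (port->dot1q_enabled && vid < TOY_NUM_VIDS && vtu->valid[vid])
		return vtu->fid[vid];
	return port->default_fid;
}
```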
    
    The mv88e6xxx driver makes the following use of FIDs and VIDs:
    
    - the port's DefaultVID (to which untagged & pvid-tagged packets get
      classified) is 0 and is absent from the VTU, so packets of this kind
      are processed in FID 0, the default FID assigned by mv88e6xxx_setup_port.
    
    - every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() ->
      mv88e6xxx_atu_new() associates a FID with that VID which increases
      linearly starting from 1. Like this:
    
      bridge vlan add dev lan0 vid 100 # FID 1
      bridge vlan add dev lan1 vid 100 # still FID 1
      bridge vlan add dev lan2 vid 1024 # FID 2
    
    The FID allocation made by the driver is sub-optimal for the following
    reasons:
    
    (a) A standalone port has a DefaultVID of 0 and a default FID of 0 too.
        A VLAN-unaware bridged port has a DefaultVID of 0 and a default FID
        of 0 too. The difference is that the bridged ports may learn ATU
        entries, while the standalone port has the requirement that it must
        not, and must not find them either. Standalone ports must not use
        the same FID as ports belonging to a bridge. All standalone ports
        can use the same FID, since the ATU will never have an entry in
        that FID.
    
    (b) Multiple VLAN-unaware bridges will all use a DefaultVID of 0 and a
        default FID of 0 on all their ports. The FDBs will not be isolated
        between these bridges. Every VLAN-unaware bridge must use the same
        FID on all its ports, different from the FID of other bridge ports.
    
    (c) Each bridge VLAN uses a unique FID which is useful for Independent
        VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges
        will result in the same FID being used by mv88e6xxx_atu_new().
        The correct behavior is for VLAN 1 in br0 to have a different FID
        compared to VLAN 1 in br1.
    
    This patch cannot fix all the above. Traditionally the DSA framework did
    not care about this, and the reality is that DSA core involvement is
    needed for the aforementioned issues to be solved. The only thing we can
    solve here is an issue which does not require API changes, and that is
    issue (a), aka use a different FID for standalone ports vs ports under
    VLAN-unaware bridges.
    
    The first step is deciding what VID and FID to use for standalone ports,
    and what VID and FID for bridged ports. The 0/0 pair is what standalone
    ports have used until now, so let's keep using that. For bridged
    ports, there are 2 cases:
    
    - VLAN-aware ports will never end up using the port default FID, because
      packets will always be classified to a VID in the VTU or dropped
      otherwise. The FID is the one associated with the VID in the VTU.
    
    - On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at
      zero (just as in the case of standalone ports), and just change the
      port's default FID from 0 to a different number (say 1).
    
    However, Tobias points out that there is one more requirement to cater to:
    cross-chip bridging. The Marvell DSA header does not carry the FID in
    it, only the VID. So once a packet crosses a DSA link, if it has a VID
    of zero it will get classified to the default FID of that cascade port.
    Relying on a port default FID for upstream cascade ports results in
    contradictions: a default FID of 0 breaks ATU isolation of bridged ports
    on the downstream switch, a default FID of 1 breaks standalone ports on
    the downstream switch.
    
    So not only must standalone ports have different FIDs compared to
    bridged ports, they must also have different DefaultVID values.
    IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply
    choose 4095 as the DefaultVID of ports belonging to VLAN-unaware
    bridges, and VID 4095 maps to FID 1.
    
    For the xmit operation to look up the same ATU database, we need to put
    VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges
    too. All shared ports are configured to map this VID to the bridging
    FID, because they are members of that VLAN in the VTU. Shared ports
    don't need to have 802.1QMode enabled in any way, they always parse the
    VID from the DSA header, they don't need to look at the 802.1Q header.
    
    We install VID 4095 to the VTU in mv88e6xxx_setup_port(). Note that
    mv88e6xxx_vtu_setup(), which was located right below that call, was
    flushing the VTU, so those entries wouldn't be preserved. So we need
    to relocate the VTU flushing prior to the port initialization during
    ->setup(). This is also why it is safe to assume that VID 4095 will
    get associated with FID 1: the user ports haven't been created yet,
    so there is no avenue for the user to create a bridge VLAN that could
    race with this and use up the non-reserved FID value of 1 first.
    
    [ Currently mv88e6xxx_port_vlan_join() doesn't have the option of
      specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ]
    
    mv88e6xxx_port_db_load_purge() is the function to access the ATU for
    FDB/MDB entries, and it used to determine the FID to use for
    VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid().
    But the driver only called mv88e6xxx_port_set_fid() once, during probe,
    so unsurprisingly the port FID was always 0 and the call to get_fid()
    was redundant. As much as I would have wanted to not touch that code, the
    logic is broken when we add a new FID which is not the port-based
    default. Now the port-based default FID only corresponds to standalone
    ports, and FDB/MDB entries belong to the bridging service. So while in
    the future, when the DSA API will support FDB isolation, we will have to
    figure out the FID based on the bridge number, for now there's a single
    bridging FID, so hardcode that.
    
    Lastly, the tagger needs to check, when it is transmitting a VLAN
    untagged skb, whether it is sending it towards a bridged or a standalone
    port. When we see it is bridged we assume the bridge is VLAN-unaware.
    Not because it cannot be VLAN-aware but:
    
    - if we are transmitting from a VLAN-aware bridge we are likely doing so
      using TX forwarding offload. That code path guarantees that skbs have
      a vlan hwaccel tag in them, so we would not enter the "else" branch
      of the "if (skb->protocol == htons(ETH_P_8021Q))" condition.
    
    - if we are transmitting on behalf of a VLAN-aware bridge but with no TX
      forwarding offload (no PVT support, out of space in the PVT, whatever),
      we would indeed be transmitting with VLAN 4095 instead of the bridge
      device's pvid. However we would be injecting a "From CPU" frame, and
      the switch won't learn from that - it only learns from "Forward" frames.
      So it is inconsequential for address learning. And VLAN 4095 is
      absolutely enough for the frame to exit the switch, since we never
      remove that VLAN from any port.
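
    The tagger decision for VLAN-untagged skbs can be sketched like this.
    It is a simplified illustration with hypothetical names, not the code
    in tag_dsa.c; it only encodes the VID choice stated above (0 for
    standalone ports, 4095 for ports under a bridge).

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_VID_STANDALONE 0    /* reserved VID, handled by the port default FID 0 */
#define TOY_VID_BRIDGED    4095 /* reserved VID, mapped to FID 1 in the VTU */

/* VID to place in the DSA tag for an skb that carries no VLAN tag. */
static uint16_t toy_untagged_xmit_vid(bool port_is_bridged)
{
	/* A bridged port is assumed to be under a VLAN-unaware bridge here. */
	return port_is_bridged ? TOY_VID_BRIDGED : TOY_VID_STANDALONE;
}
```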
    
    Fixes: 57e661a ("net: dsa: mv88e6xxx: Link aggregation support")
    Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    vladimiroltean authored and Jakub Kicinski committed Oct 8, 2021
  5. net: dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware

    The VLAN support in mv88e6xxx has a loaded history. Commit 2ea7a67
    ("net: dsa: Don't add vlans when vlan filtering is disabled") noticed
    some issues with VLAN and decided the best way to deal with them was to
    make the DSA core ignore VLANs added by the bridge while VLAN awareness
    is turned off. Those issues were never explained, just presented as
    "at least one corner case".
    
    That approach had problems of its own, presented by
    commit 54a0ed0 ("net: dsa: provide an option for drivers to always
    receive bridge VLANs") for the DSA core, followed by
    commit 1fb7419 ("net: dsa: mv88e6xxx: fix vlan setup") which
    applied ds->configure_vlan_while_not_filtering = true for mv88e6xxx in
    particular.
    
    We still don't know what corner case Andrew saw when he wrote
    commit 2ea7a67 ("net: dsa: Don't add vlans when vlan filtering is
    disabled"), but Tobias now reports that when we use TX forwarding
    offload, pinging an external station from the bridge device is broken if
    the front-facing DSA user port has flooding turned off. The full
    description is in the link below, but for short, when a mv88e6xxx port
    is under a VLAN-unaware bridge, it inherits that bridge's pvid.
    So packets ingressing a user port will be classified to e.g. VID 1
    (assuming that value for the bridge_default_pvid), whereas when
    tag_dsa.c xmits towards a user port, it always sends packets using a VID
    of 0 if that port is standalone or under a VLAN-unaware bridge - or at
    least it did so prior to commit d82f8ab ("net: dsa: tag_dsa:
    offload the bridge forwarding process").
    
    In any case, when there is a conversation between the CPU and a station
    connected to a user port, the station's MAC address is learned in VID 1
    but the CPU tries to transmit through VID 0. The packets reach the
    intended station, but via flooding and not by virtue of matching the
    existing ATU entry.
    
    DSA has established (and enforced in other drivers: sja1105, felix,
    mt7530) that a VLAN-unaware port should use a private pvid, and not
    inherit the one from the bridge. The bridge's pvid should only be
    inherited when that bridge is VLAN-aware, so all state transitions need
    to be handled. On the other hand, all bridge VLANs should sit in the VTU
    starting with the moment when the bridge offloads them via switchdev,
    they are just not used.
    
    This solves the problem that Tobias sees because packets ingressing on
    VLAN-unaware user ports now get classified to VID 0, which is also the
    VID used by tag_dsa.c on xmit.
    
    Fixes: d82f8ab ("net: dsa: tag_dsa: offload the bridge forwarding process")
    Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003222312.284175-2-vladimir.oltean@nxp.com/#24491503
    Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    vladimiroltean authored and Jakub Kicinski committed Oct 8, 2021
  6. net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0
    
    The present code is structured this way due to an incomplete thought
    process. In Documentation/networking/switchdev.rst we document that if a
    bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge
    port (or on the bridge itself, for that matter) should not affect the
    ability to receive and transmit tagged or untagged packets.
    
    If the bridge on behalf of which we are sending this packet is
    VLAN-aware, then the TX forwarding offload API ensures that the skb will
    be VLAN-tagged (if the packet was sent by user space as untagged, it
    will get transmitted down to the driver as tagged with the bridge
    device's pvid). But if the bridge is VLAN-unaware, it may or may not be
    VLAN-tagged. In fact the logic to insert the bridge's PVID came from the
    idea that we should emulate what is being done in the VLAN-aware case.
    But we shouldn't.
    
    It appears that injecting packets using a VLAN ID of 0 serves the
    purpose of forwarding the packets to the egress port with no VLAN tag
    added or stripped by the hardware, and no filtering being performed.
    So we can simply remove the superfluous logic.
    
    One reason why this logic is broken is that when CONFIG_BRIDGE_VLAN_FILTERING=n,
    we call br_vlan_get_pvid_rcu() but that returns an error and we do error
    out, dropping all packets on xmit. Not really smart. This is also an
    issue when the user deletes the bridge pvid:
    
    $ bridge vlan del dev br0 vid 1 self
    
    As mentioned, in both cases, packets should still flow freely, and they
    do just that on any net device where the bridge is not offloaded, but on
    mv88e6xxx they don't.
    
    Fixes: d82f8ab ("net: dsa: tag_dsa: offload the bridge forwarding process")
    Reported-by: Andrew Lunn <andrew@lunn.ch>
    Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003155141.2241314-1-andrew@lunn.ch/
    Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210928233708.1246774-1-vladimir.oltean@nxp.com/
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    vladimiroltean authored and Jakub Kicinski committed Oct 8, 2021
  7. net: dsa: fix bridge_num not getting cleared after ports leaving the bridge
    
    The dp->bridge_num is zero-based, with -1 being the encoding for an
    invalid value. But dsa_bridge_num_put used to check for an invalid value
    by comparing bridge_num with 0, which is of course incorrect.
    
    The result is that the bridge_num will never get cleared by
    dsa_bridge_num_put, and further port joins to other bridges will get a
    bridge_num larger than the previous one, and once all the available
    bridges with TX forwarding offload supported by the hardware get
    exhausted, the TX forwarding offload feature is simply disabled.
    
    In the case of sja1105, 7 iterations of the loop below are enough to
    exhaust the TX forwarding offload bits, and further bridge joins operate
    without that feature.
    
    ip link add br0 type bridge vlan_filtering 1
    
    while :; do
            ip link set sw0p2 master br0 && sleep 1
            ip link set sw0p2 nomaster && sleep 1
    done
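
    The leak can be modelled in a few lines of C. This is a sketch under
    assumptions: the names are hypothetical, the limit of 7 is taken from
    the sja1105 example above, and `found` stands for the result of a
    lookup that yields -1 once no port uses the bridge any more.

```c
#include <stdbool.h>

#define TOY_MAX_BRIDGES 7 /* assumed hardware limit for this sketch */
#define TOY_BRIDGE_NUM_INVALID (-1)

/* Which zero-based bridge numbers are currently allocated. */
static bool toy_in_use[TOY_MAX_BRIDGES];

/* Allocate the lowest free bridge number, or -1 when all are taken. */
static int toy_bridge_num_get(void)
{
	for (int i = 0; i < TOY_MAX_BRIDGES; i++) {
		if (!toy_in_use[i]) {
			toy_in_use[i] = true;
			return i;
		}
	}
	return TOY_BRIDGE_NUM_INVALID;
}

/*
 * Release bridge_num once no port references the bridge any more.
 * The buggy variant compares the lookup result with 0, so a result of
 * -1 ("not found") never frees anything. The fixed variant tests for a
 * negative value instead.
 */
static void toy_bridge_num_put(int bridge_num, int found, bool fixed)
{
	bool unused = fixed ? (found < 0) : (found == 0);

	if (unused)
		toy_in_use[bridge_num] = false;
}
```

    With the buggy comparison, seven get/put cycles leave every slot
    marked as in use and the next toy_bridge_num_get() returns -1. With
    the fixed comparison, every cycle reuses slot 0.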
    
    This issue is enough of an indication that having the dp->bridge_num
    invalid encoding be a negative number is prone to bugs, so this will be
    changed to a one-based value, with the dp->bridge_num of zero being the
    indication of no bridge. However, that is material for net-next.
    
    Fixes: f5e165e ("net: dsa: track unique bridge numbers across all DSA switch trees")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    vladimiroltean authored and Jakub Kicinski committed Oct 8, 2021
  8. nfc: nci: fix the UAF of rf_conn_info object

    The nci_core_conn_close_rsp_packet() function releases the conn_info
    with the given conn_id. However, it also needs to set rf_conn_info to
    NULL to prevent other routines such as
    nci_rf_intf_activated_ntf_packet() from triggering the UAF.
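
    The pattern is the usual "clear the cached pointer when freeing the
    object". A userspace sketch with hypothetical types (these are not
    the NCI structures) could look like this:

```c
#include <stdlib.h>

/* Hypothetical connection object. */
struct toy_conn_info {
	int conn_id;
};

/* Hypothetical device state that caches a pointer to the RF connection. */
struct toy_dev {
	struct toy_conn_info *rf_conn_info;
};

/* Free a connection and drop the cached pointer if it refers to it. */
static void toy_conn_close(struct toy_dev *dev, struct toy_conn_info *ci)
{
	if (dev->rf_conn_info == ci)
		dev->rf_conn_info = NULL; /* later users see NULL, not freed memory */
	free(ci);
}
```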
    
    Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
    Signed-off-by: Lin Ma <linma@zju.edu.cn>
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    f0rm2l1n authored and davem330 committed Oct 8, 2021
  9. net/smc: improved fix wait on already cleared link

    Commit 8f3d65c ("net/smc: fix wait on already cleared link")
    introduced link refcounting to avoid waits on already cleared links.
    This patch extends and improves the refcounting to cover all
    remaining possible cases for this kind of error situation.
    
    Fixes: 15e1b99 ("net/smc: no WR buffer wait for terminating link group")
    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    karstengr authored and davem330 committed Oct 8, 2021
  10. Merge branch 'stmmac-regression-fix'

    Herve Codina says:
    
    ====================
    net: stmmac: fix regression on SPEAr3xx SOC
    
    The ethernet driver used on the old SPEAr3xx SoC was supported by older
    kernels. Regressions introduced over successive updates left the
    driver broken for this SoC.
    
    This series fixes these regressions and brings back ethernet on SPEAr3xx.
    Tested on a SPEAr320 board.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Oct 8, 2021
  11. ARM: dts: spear3xx: Fix gmac node

    On SPEAr3xx, the ethernet driver is not compatible with the SPEAr600
    one. SPEAr3xx uses an earlier version of this IP (v3.40) and
    needs some driver tuning compared to SPEAr600.
    
    Now that v3.40 IP support has been added to the stmmac driver, this
    patch fixes the issue by using the correct compatible string for
    SPEAr3xx.
    
    Signed-off-by: Herve Codina <herve.codina@bootlin.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    hcodina authored and davem330 committed Oct 8, 2021
  12. net: stmmac: add support for dwmac 3.40a

    dwmac 3.40a is an old IP version that can be found on the SPEAr3xx SoC.
    
    Signed-off-by: Herve Codina <herve.codina@bootlin.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    hcodina authored and davem330 committed Oct 8, 2021
  13. dt-bindings: net: snps,dwmac: add dwmac 3.40a IP version

    dwmac 3.40a is an old IP version that can be found on the SPEAr3xx SoC.
    
    Signed-off-by: Herve Codina <herve.codina@bootlin.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    hcodina authored and davem330 committed Oct 8, 2021
  14. net: stmmac: fix get_hw_feature() on old hardware

    Some old IPs do not provide the hardware feature register.
    On these IPs, this register reads as 0x00000000.
    
    Older driver versions handled this case, but a regression came
    with commit f10a6a3 ("stmmac: rework get_hw_feature function"),
    which removed the return value of dma->get_hw_feature().
    This return value indicated whether the retrieved information was
    valid, and stmmac_hw_init() used it to override priv->plat data
    only if the hardware feature was valid.
    
    This patch restores the return code in ->get_hw_feature() in order
    to indicate the hardware feature validity and override priv->plat
    data only if this hardware feature is valid.
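
    A minimal sketch of the restored behaviour, using hypothetical names
    and a made-up bit layout (this is not the stmmac code): the getter
    reports whether the register content is usable, and the caller only
    overrides the platform data when it is.

```c
#include <stdint.h>

/* Hypothetical error code for "feature register not implemented". */
enum { TOY_ERR_NO_FEATURE = -1 };

struct toy_dma_features {
	unsigned int tx_coe;
};

struct toy_plat {
	unsigned int tx_coe; /* value supplied by platform data */
};

/* Returns 0 when the register is usable, an error when it reads as zero. */
static int toy_get_hw_feature(uint32_t hw_cap, struct toy_dma_features *cap)
{
	if (!hw_cap)
		return TOY_ERR_NO_FEATURE;
	cap->tx_coe = (hw_cap >> 16) & 1; /* bit position is an assumption */
	return 0;
}

/* Override platform data only if the hardware reported valid features. */
static void toy_hw_init(uint32_t hw_cap, struct toy_plat *plat)
{
	struct toy_dma_features cap = { 0 };

	if (toy_get_hw_feature(hw_cap, &cap) == 0)
		plat->tx_coe = cap.tx_coe;
	/* otherwise the platform-provided value is kept as-is */
}
```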
    
    Fixes: f10a6a3 ("stmmac: rework get_hw_feature function")
    Signed-off-by: Herve Codina <herve.codina@bootlin.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    hcodina authored and davem330 committed Oct 8, 2021
  15. mptcp: fix possible stall on recvmsg()

    recvmsg() can enter an infinite loop if the caller provides the
    MSG_WAITALL flag, the data present in the receive queue is not
    sufficient to fulfill the request, and no more data is received
    from the peer.
    
    When the above happens, mptcp_wait_data() will always return without
    waiting, as the MPTCP_DATA_READY flag checked by that function is
    set and never cleared in that code path.
    
    Leveraging the above, syzbot was able to trigger an RCU stall:
    
    rcu: INFO: rcu_preempt self-detected stall on CPU
    rcu:    0-...!: (10499 ticks this GP) idle=0af/1/0x4000000000000000 softirq=10678/10678 fqs=1
            (t=10500 jiffies g=13089 q=109)
    rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
    rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
    rcu: RCU grace-period kthread stack dump:
    task:rcu_preempt     state:R  running task     stack:28696 pid:   14 ppid:     2 flags:0x00004000
    Call Trace:
     context_switch kernel/sched/core.c:4955 [inline]
     __schedule+0x940/0x26f0 kernel/sched/core.c:6236
     schedule+0xd3/0x270 kernel/sched/core.c:6315
     schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
     rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955
     rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128
     kthread+0x405/0x4f0 kernel/kthread.c:327
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
    rcu: Stack dump where RCU GP kthread last ran:
    Sending NMI from CPU 0 to CPUs 1:
    NMI backtrace for cpu 1
    CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline]
    RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline]
    RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline]
    RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline]
    RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
    RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189
    Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 <48> 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00
    RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283
    RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a
    RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870
    RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877
    R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000
    R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000
    FS:  0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
     test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [inline]
     mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016
     release_sock+0xb4/0x1b0 net/core/sock.c:3204
     mptcp_wait_data net/mptcp/protocol.c:1770 [inline]
     mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080
     inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
     sock_recvmsg_nosec net/socket.c:944 [inline]
     ____sys_recvmsg+0x527/0x600 net/socket.c:2626
     ___sys_recvmsg+0x127/0x200 net/socket.c:2670
     do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
     __sys_recvmmsg net/socket.c:2843 [inline]
     __do_sys_recvmmsg net/socket.c:2866 [inline]
     __se_sys_recvmmsg net/socket.c:2859 [inline]
     __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7fc200d2dc39
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
    RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
    R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
    R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003
    
    Fix the issue by replacing the MPTCP_DATA_READY bit with direct
    inspection of the msk receive queue.
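
    Why a sticky flag spins while a queue check does not can be shown with
    a toy model. The names below are hypothetical and the model ignores
    locking and subflows; it only contrasts the two wait conditions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket model. */
struct toy_msk {
	bool data_ready_flag; /* sticky flag, set when data arrived */
	size_t queued;        /* bytes currently in the receive queue */
};

/* Flag-based check: stays true after the queue has been drained. */
static bool toy_can_skip_wait_flag(const struct toy_msk *msk)
{
	return msk->data_ready_flag;
}

/* Queue-based check: true only while unread data is actually queued. */
static bool toy_can_skip_wait_queue(const struct toy_msk *msk)
{
	return msk->queued > 0;
}

/*
 * Take up to `want` bytes out of the queue. The sticky flag is left
 * untouched, mirroring the code path in which it was never cleared.
 */
static size_t toy_drain(struct toy_msk *msk, size_t want)
{
	size_t n = msk->queued < want ? msk->queued : want;

	msk->queued -= n;
	return n;
}
```

    After a partial read that empties the queue, the flag-based check
    still reports that no wait is needed, so a MSG_WAITALL loop would
    retry forever, while the queue-based check lets the caller sleep.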
    
    Reported-and-tested-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com
    Fixes: 7a6a6cb ("mptcp: recvmsg() can drain data from multiple subflow")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Paolo Abeni authored and davem330 committed Oct 8, 2021

Commits on Oct 7, 2021

  1. Merge tag 'nfsd-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
    
    Pull nfsd fixes from Chuck Lever:
     "Bug fixes for NFSD error handling paths"
    
    * tag 'nfsd-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
      NFSD: Keep existing listeners on portlist error
      SUNRPC: fix sign error causing rpcsec_gss drops
      nfsd: Fix a warning for nfsd_file_close_inode
      nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zero
      nfsd: fix error handling of register_pernet_subsys() in init_nfsd()
    torvalds committed Oct 7, 2021
  2. Merge tag 'armsoc-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
    
    Pull ARM SoC fixes from Arnd Bergmann:
     "This is a larger than normal update for Arm SoC specific code, most of
      it in device trees, but also drivers and the omap and at91/sama7
      platforms:
    
       - There are four new entries to the MAINTAINERS file: Sven Peter and
         Alyssa Rosenzweig for Apple M1, Romain Perier for Mstar/sigmastar,
         and Vignesh Raghavendra for TI K3
    
       - Build fixes to address randconfig warnings in sharpsl, dove, omap1,
         and qcom platforms as well as the scmi and op-tee subsystems
    
       - Regression fixes for missing CONFIG_FB and other options for
         several defconfigs
    
       - Several bug fixes for the newly added Microchip SAMA7 platform,
         mostly regarding power management
    
       - Missing SMP barriers to protect accesses to SCMI virtio device
    
       - Regression fixes for TI OMAP, including a boot-time hang on am335x.
    
       - Lots of bug fixes for NXP i.MX, mostly addressing incorrect
         settings in devicetree files, and one revert for broken suspend.
    
       - Fixes for ARM Juno/Vexpress devicetree files, addressing a couple
         of schema warnings.
    
       - Regression fixes for qualcomm SoC specific drivers and devicetree
         files, reverting an mdt_loader change and at least partially
         reverting some of the 5.15 DTS changes, plus some minor bugfixes"
    
    * tag 'armsoc-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (64 commits)
      MAINTAINERS: Add Sven Peter as ARM/APPLE MACHINE maintainer
      MAINTAINERS: Add Alyssa Rosenzweig as M1 reviewer
      firmware: arm_scmi: Add proper barriers to scmi virtio device
      firmware: arm_scmi: Simplify spinlocks in virtio transport
      ARM: dts: omap3430-sdp: Fix NAND device node
      bus: ti-sysc: Use CLKDM_NOAUTO for dra7 dcan1 for errata i893
      ARM: sharpsl_param: work around -Wstringop-overread warning
      ARM: defconfig: gemini: Restore framebuffer
      ARM: dove: mark 'putc' as inline
      ARM: omap1: move omap15xx local bus handling to usb.c
      MAINTAINERS: Add Vignesh to TI K3 platform maintainership
      arm64: dts: imx8m*-venice-gw7902: fix M2_RST# gpio
      ARM: imx6: disable the GIC CPU interface before calling stby-poweroff sequence
      arm64: dts: ls1028a: fix eSDHC2 node
      arm64: dts: imx8mm-kontron-n801x-som: do not allow to switch off buck2
      ARM: dts: at91: sama7g5ek: to not touch slew-rate for SDMMC pins
      ARM: dts: at91: sama7g5ek: use proper slew-rate settings for GMACs
      ARM: at91: pm: preload base address of controllers in tlb
      ARM: at91: pm: group constants and addresses loading
      ARM: dts: at91: sama7g5ek: add suspend voltage for ddr3l rail
      ...
    torvalds committed Oct 7, 2021