Skip to content

[pull] master from torvalds:master#45

Merged
pull[bot] merged 205 commits intoDVRKHz:masterfrom
torvalds:master
Aug 12, 2022
Merged

[pull] master from torvalds:master#45
pull[bot] merged 205 commits intoDVRKHz:masterfrom
torvalds:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Aug 12, 2022

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

Christoph Hellwig and others added 30 commits July 22, 2022 10:59
Use filemap_fdatawrite_wbc instead of generic_writepages in
gfs2_ail1_start_one so that the functin can also cope with address_space
operations that only implement ->writepages and to properly account
for cgroup writeback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
->writepage is only used for single page writeback from memory reclaim,
and not called at all for cgroup writeback.  Follow the lead of XFS
and remove ->writepage and rely entirely on ->writepages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
->writepage is only used for single page writeback from memory reclaim,
and not called at all for cgroup writeback.  Follow the lead of XFS
and remove ->writepage and rely entirely on ->writepages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The value acpi_add_nondev_subnodes() returns is bool so change the return
type of the function to match that.

Fixes: 445b0eb ("ACPI / property: Add support for data-only subnodes")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA allows associating additional information (i.e. pointers with
specific tag) to acpi_handles. The acpi_device's are associated to
acpi_handle's in acpi_tie_acpi_dev() in scan.c, do the same here for the
_DSD data nodes.

This allows direct data node references in properties, implemented later on
in the series.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
…sing

The type of union acpi_object field type is acpi_object_type. Use that
instead of int.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Split out property reference argument parsing out of the
__acpi_node_get_property_reference() function into a new one,
acpi_get_ref_args(). The new function will be needed also for parsing
string references soon.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
__acpi_node_get_property_reference() uses a series of if () statements for
testing the same variable. There's soon going to be one more value to be
tested.

Switch to use switch() instead.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Unify functions reading ACPI property integer values into a single macro
using C99 _Generic().

Also use size_t for the counter instead of int.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add support for newly added buffer property UUID, as defined in the DSD
guide section 3.3 [1]

Link: https://github.com/UEFI/DSD-Guide/blob/main/src/dsd-guide.adoc#buffer-data-extension-uuid # [1]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Instead of adding a new property type, read buffer properties as integers.
Even though the internal representation in ACPI is different, the data
type is the same (byte) than on 8-bit integers.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
inode_insert5 currently looks at I_CREATING to decide whether to insert
the inode into the sb list. This test is a bit ambiguous, as I_CREATING
state is not directly related to that list.

This test is also problematic for some upcoming ceph changes to add
fscrypt support. We need to be able to allocate an inode using new_inode
and insert it into the hash later iff we end up using it, and doing that
now means that we double add it and corrupt the list.

What we really want to know in this test is whether the inode is already
in its superblock list, and then add it if it isn't. Have it test for
list_empty instead and ensure that we always initialize the list by
doing it in inode_init_once. It's only ever removed from the list with
list_del_init, so that should be sufficient.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
For ceph, we want to use our own scheme for handling filenames that are
are longer than NAME_MAX after encryption and Base64 encoding. This
allows us to have a consistent view of the encrypted filenames for
clients that don't support fscrypt and clients that do but that don't
have the key.

Currently, fs/crypto only supports encrypting filenames using
fscrypt_setup_filename, but that also handles encoding nokey names. Ceph
can't use that because it handles nokey names in a different way.

Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
__fscrypt_fname_encrypted_size and add a new wrapper called
fscrypt_fname_encrypted_size that takes an inode argument rather than a
pointer to a fscrypt_policy union.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Most filesystems just call fscrypt_set_context on new inodes, which
usually causes a setxattr. That's a bit late for ceph, which can send
along a full set of attributes with the create request.

Doing so allows it to avoid race windows that where the new inode could
be seen by other clients without the crypto context attached. It also
avoids the separate round trip to the server.

Refactor the fscrypt code a bit to allow us to create a new crypto
context, attach it to the inode, and write it to the buffer, but without
calling set_context on it. ceph can later use this to marshal the
context into the attributes we send along with the create request.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Feature bits have to be encoded into the correct locations.  This hasn't
been an issue so far because the only hole in the feature bits was in bit
10 (CEPHFS_FEATURE_RECLAIM_CLIENT), which is located in the 2nd byte.  When
adding more bits that go beyond the this 2nd byte, the bug will show up.

[xiubli: remove incorrect comment for CEPHFS_FEATURES_CLIENT_SUPPORTED]

Fixes: 9ba1e22 ("ceph: allocate the correct amount of extra bytes for the session features")
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This macro was added but never be used. And check the ceph code
there has another CEPHFS_FEATURES_MDS_REQUIRED but always be empty.

We should clean up all this related code, which make no sense but
introducing confusion.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Compare dentry name with case-exact name, return true if names
are same, or false.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In async unlink case the kclient won't wait for the first reply
from MDS and just drop all the links and unhash the dentry and then
succeeds immediately.

For any new create/link/rename,etc requests followed by using the
same file names we must wait for the first reply of the inflight
unlink request, or the MDS possibly will fail these following
requests with -EEXIST if the inflight async unlink request was
delayed for some reasons.

And the worst case is that for the none async openc request it will
successfully open the file if the CDentry hasn't been unlinked yet,
but later the previous delayed async unlink request will remove the
CDenty. That means the just created file is possiblly deleted later
by accident.

We need to wait for the inflight async unlink requests to finish
when creating new files/directories by using the same file names.

Link: https://tracker.ceph.com/issues/55332
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If the connection was accidently closed due to the socket issue or
something else the clients will try to open the opened sessions, the
MDSes will send the session open reply one more time if the clients
support the notify feature.

When the clients retry to open the sessions the s_seq will be 0 as
default, we need to update it anyway.

Link: https://tracker.ceph.com/issues/53911
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
And for the 'Xs' caps for getxattr we will also choose the auth MDS,
because the MDS side code is buggy due to setxattr won't notify the
replica MDSes when the values changed and the replica MDS will return
the old values. Though we will fix it in MDS code, but this still
makes sense for old ceph.

Link: https://tracker.ceph.com/issues/55331
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The MDS tries to enforce a limit on the total key/values in extended
attributes.  However, this limit is enforced only if doing a synchronous
operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS
doesn't have a chance to enforce these limits.

This patch adds support for decoding the xattrs maximum size setting that is
distributed in the mdsmap.  Then, when setting an xattr, the kernel client
will revert to do a synchronous operation if that maximum size is exceeded.

While there, fix a dout() that would trigger a printk warning:

[   98.718078] ------------[ cut here ]------------
[   98.719012] precision 65536 too large
[   98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600
...

Link: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When handle_cap_grant is called on an IMPORT op, then the snap_rwsem is
held and the function is expected to release it before returning. It
currently fails to do that in all cases which could lead to a deadlock.

Fixes: 6f05b30 ("ceph: reset i_requested_max_size if file write is not wanted")
Link: https://tracker.ceph.com/issues/55857
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The incorrect comment is misleading. Acutally the last members
in ceph_mds_caps strcut is a union for none export and export
bodies.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
There's no reason we need to lock the inode for write in order to handle
an llseek. I suspect this should have been dropped in 2013 when we
stopped doing vmtruncate in llseek.

With that gone, ceph_llseek is functionally equivalent to
generic_file_llseek, so just call that after getting the size.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
"was_async" is a bit misleadingly named. It's supposed to indicate
whether it's safe to call blocking operations from the context you're
calling it from, but it sounds like it's asking whether this was done
via async operation. For ceph, this it's always called from kernel
thread context so it should be safe to set this to false.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Willy requested that we change this back to warning on folio->private
being non-NULl. He's trying to kill off the PG_private flag, and so we'd
like to catch where it's non-NULL.

Add a VM_WARN_ON_FOLIO (since it doesn't exist yet) and change over to
using that instead of VM_BUG_ON_FOLIO along with testing the ->private
pointer.

[ xiubli: define VM_WARN_ON_FOLIO macro in case DEBUG_VM is disabled
  reported by kernel test robot <lkp@intel.com> ]

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The 'old_size' is a __le64 type since birth, not sure why the
kclient incorrectly switched it to __le32. This change is okay
won't break anything because union will always allocate more memory
than the 'open' member needed.

Rename 'file_replication' to 'pool' as ceph did. Though this 'open'
struct may never be used in kclient in future, it's confusing when
going through the ceph code.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
For async create we will always try to choose the auth MDS of frag
the dentry belonged to of the parent directory to send the request
and ususally this works fine, but if the MDS migrated the directory
to another MDS before it could be handled the request will be
forwarded. And then the auth cap will be changed.

We need to update the auth cap in this case before the request is
forwarded.

Link: https://tracker.ceph.com/issues/55857
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
bagasme and others added 17 commits August 11, 2022 02:35
…lity table

There is unexpected warning on KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability
table, which cause the table to be rendered as paragraph text instead.

The warning is due to missing colon at capability name and returns keyword,
as well as improper alignment on multi-line returns field.

Fix the warning by adding missing colons and aligning the field.

Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/
Fixes: 084cc29 ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-next@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20220627095151.19339-3-bagasdotme@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Shut up this warning:
kernel/bpf/syscall.c:5089:5: warning: no previous prototype for function 'kern_sys_bpf' [-Wmissing-prototypes]
int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)

Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull in bpf to silence a false positive warning.

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Shut up kern_sys_bpf warning.

Link: https://lore.kernel.org/r/CAADnVQK589CZN1Q9w8huJqkEyEed+ZMTWqcpA1Rm2CjN3a4XoQ@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are 2 models for MV32 serials. MV32-W-A is designed
based on Qualcomm SDX62 chip, and MV32-W-B is designed based
on Qualcomm SDX65 chip. So we use 2 different PID to separate it.

Test evidence as below:
T:  Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#=  3 Spd=480 MxCh= 0
D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1e2d ProdID=00f3 Rev=05.04
S:  Manufacturer=Cinterion
S:  Product=Cinterion PID 0x00F3 USB Mobile Broadband
S:  SerialNumber=d7b4be8d
C:  #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
I:  If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
I:  If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I:  If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option

T:  Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#= 10 Spd=480 MxCh= 0
D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1e2d ProdID=00f4 Rev=05.04
S:  Manufacturer=Cinterion
S:  Product=Cinterion PID 0x00F4 USB Mobile Broadband
S:  SerialNumber=d095087d
C:  #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
I:  If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
I:  If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I:  If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option

Signed-off-by: Slark Xiao <slark_xiao@163.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20220810014521.9383-1-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d5410ac ("net:bonding:support balance-alb interface
with vlan to bridge") introduced a reference count leak by not releasing
the reference acquired by ip_dev_find().  Remedy this by insuring the
reference is released.

Fixes: d5410ac ("net:bonding:support balance-alb interface with vlan to bridge")
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/26758.1660194413@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drop devl_lock() / devl_unlock() from ports creation and removal flows
since the devlink instance lock is now taken by mlxsw_core.

Fixes: 72a4c8c ("mlxsw: convert driver to use unlocked devlink API during init/fini")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/f4afce5ab0318617f3866b85274be52542d59b32.1660211614.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change my address for mt7601u to the main one.

Link: https://lore.kernel.org/r/20220809233843.408004-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
area_cache_get() is used to distribute cache->area and set cache->id,
 and if cache->id is not 0 and cache->area->kref refcount is 0, it will
 release the cache->area by nfp_cpp_area_release(). area_cache_get()
 set cache->id before cpp->op->area_init() and nfp_cpp_area_acquire().

But if area_init() or nfp_cpp_area_acquire() fails, the cache->id is
 is already set but the refcount is not increased as expected. At this
 time, calling the nfp_cpp_area_release() will cause use-after-free.

To avoid the use-after-free, set cache->id after area_init() and
 nfp_cpp_area_acquire() complete successfully.

Note: This vulnerability is triggerable by providing emulated device
 equipped with specified configuration.

 BUG: KASAN: use-after-free in nfp6000_area_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
  Write of size 4 at addr ffff888005b7f4a0 by task swapper/0/1

 Call Trace:
  <TASK>
 nfp6000_area_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
 area_cache_get.constprop.8 (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:884)

 Allocated by task 1:
 nfp_cpp_area_alloc_with_name (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:303)
 nfp_cpp_area_cache_add (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:802)
 nfp6000_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:1230)
 nfp_cpp_from_operations (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:1215)
 nfp_pci_probe (drivers/net/ethernet/netronome/nfp/nfp_main.c:744)

 Freed by task 1:
 kfree (mm/slub.c:4562)
 area_cache_get.constprop.8 (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:873)
 nfp_cpp_read (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:924 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:973)
 nfp_cpp_readl (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpplib.c:48)

Signed-off-by: Jialiang Wang <wangjialiang0806@163.com>
Reviewed-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220810073057.4032-1-wangjialiang0806@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Multicast group flags were added in commit 4d54cc3 ("mptcp: avoid
lock_fast usage in accept path"), but it missed adding the kdoc.

Mention which flags go into that field, and do the same for
op structs.

Link: https://lore.kernel.org/r/20220809232012.403730-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge changes adding support for device properties with buffer values
to the ACPI device properties handling code.

* acpi-properties:
  ACPI: property: Fix error handling in acpi_init_properties()
  ACPI: property: Read buffer properties as integers
  ACPI: property: Add support for parsing buffer property UUID
  ACPI: property: Unify integer value reading functions
  ACPI: property: Switch node property referencing from ifs to a switch
  ACPI: property: Move property ref argument parsing into a new function
  ACPI: property: Use acpi_object_type consistently in property ref parsing
  ACPI: property: Tie data nodes to acpi handles
  ACPI: property: Return type of acpi_add_nondev_subnodes() should be bool
We should trace the allocated address instead of page struct.

Fixes: 27c8748 ("dpaa2-eth: Use a single page per Rx buffer")
Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20220811151651.3327-1-chen45464546@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri reports that linux-atm does not build without this header.
Bring it back. It's completely dead code but we can't break
the build for user space :(

Reported-by: Jiri Slaby <jirislaby@kernel.org>
Fixes: 052e1f0 ("net: atm: remove support for ZeitNet ZN122x ATM devices")
Link: https://lore.kernel.org/all/8576aef3-37e4-8bae-bab5-08f82a78efd3@kernel.org/
Link: https://lore.kernel.org/r/20220810164547.484378-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull more kvm updates from Paolo Bonzini:

 - Xen timer fixes

 - Documentation formatting fixes

 - Make rseq selftest compatible with glibc-2.35

 - Fix handling of illegal LEA reg, reg

 - Cleanup creation of debugfs entries

 - Fix steal time cache handling bug

 - Fixes for MMIO caching

 - Optimize computation of number of LBRs

 - Fix uninitialized field in guest_maxphyaddr < host_maxphyaddr path

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
  KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table
  Documentation: KVM: extend KVM_CAP_VM_DISABLE_NX_HUGE_PAGES heading underline
  KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh
  KVM: VMX: Use proper type-safe functions for vCPU => LBRs helpers
  KVM: x86: Refresh PMU after writes to MSR_IA32_PERF_CAPABILITIES
  KVM: selftests: Test all possible "invalid" PERF_CAPABILITIES.LBR_FMT vals
  KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test
  KVM: selftests: Make rseq compatible with glibc-2.35
  KVM: Actually create debugfs in kvm_create_vm()
  KVM: Pass the name of the VM fd to kvm_create_vm_debugfs()
  KVM: Get an fd before creating the VM
  KVM: Shove vcpu stats_id init into kvm_vcpu_init()
  KVM: Shove vm stats_id init into kvm_create_vm()
  KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen
  KVM: x86/mmu: rename trace function name for asynchronous page fault
  KVM: x86/xen: Stop Xen timer before changing IRQ
  KVM: x86/xen: Initialize Xen timer only once
  KVM: SVM: Disable SEV-ES support if MMIO caching is disable
  KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change
  KVM: x86: Tag kvm_mmu_x86_module_init() with __init
  ...
Pull ceph updates from Ilya Dryomov:
 "We have a good pile of various fixes and cleanups from Xiubo, Jeff,
  Luis and others, almost exclusively in the filesystem.

  Several patches touch files outside of our normal purview to set the
  stage for bringing in Jeff's long awaited ceph+fscrypt series in the
  near future. All of them have appropriate acks and sat in linux-next
  for a while"

* tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client: (27 commits)
  libceph: clean up ceph_osdc_start_request prototype
  libceph: fix ceph_pagelist_reserve() comment typo
  ceph: remove useless check for the folio
  ceph: don't truncate file in atomic_open
  ceph: make f_bsize always equal to f_frsize
  ceph: flush the dirty caps immediatelly when quota is approaching
  libceph: print fsid and epoch with osd id
  libceph: check pointer before assigned to "c->rules[]"
  ceph: don't get the inline data for new creating files
  ceph: update the auth cap when the async create req is forwarded
  ceph: make change_auth_cap_ses a global symbol
  ceph: fix incorrect old_size length in ceph_mds_request_args
  ceph: switch back to testing for NULL folio->private in ceph_dirty_folio
  ceph: call netfs_subreq_terminated with was_async == false
  ceph: convert to generic_file_llseek
  ceph: fix the incorrect comment for the ceph_mds_caps struct
  ceph: don't leak snap_rwsem in handle_cap_grant
  ceph: prevent a client from exceeding the MDS maximum xattr size
  ceph: choose auth MDS for getxattr with the Xs caps
  ceph: add session already open notify support
  ...
…xfs-linux

Pull more iomap updates from Darrick Wong:
 "In the past 10 days or so I've not heard any ZOMG STOP style
  complaints about removing ->writepage support from gfs2 or zonefs, so
  here's the pull request removing them (and the underlying fs iomap
  support) from the kernel:

   - Remove iomap_writepage and all callers, since the mm apparently
     never called the zonefs or gfs2 writepage functions"

* tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: remove iomap_writepage
  zonefs: remove ->writepage
  gfs2: remove ->writepage
  gfs2: stop using generic_writepages in gfs2_ail1_start_one
…nel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
 "These fix up direct references to the fwnode field in struct device
  and extend ACPI device properties support.

  Specifics:

   - Replace direct references to the fwnode field in struct device with
     dev_fwnode() and device_match_fwnode() (Andy Shevchenko)

   - Make the ACPI code handling device properties support properties
     with buffer values (Sakari Ailus)"

* tag 'acpi-5.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: property: Fix error handling in acpi_init_properties()
  ACPI: VIOT: Do not dereference fwnode in struct device
  ACPI: property: Read buffer properties as integers
  ACPI: property: Add support for parsing buffer property UUID
  ACPI: property: Unify integer value reading functions
  ACPI: property: Switch node property referencing from ifs to a switch
  ACPI: property: Move property ref argument parsing into a new function
  ACPI: property: Use acpi_object_type consistently in property ref parsing
  ACPI: property: Tie data nodes to acpi handles
  ACPI: property: Return type of acpi_add_nondev_subnodes() should be bool
…git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, bpf, can and netfilter.

  A little larger than usual but it's all fixes, no late features. It's
  large partially because of timing, and partially because of follow ups
  to stuff that got merged a week or so before the merge window and
  wasn't as widely tested. Maybe the Bluetooth fixes are a little
  alarming so we'll address that, but the rest seems okay and not scary.

  Notably we're including a fix for the netfilter Kconfig [1], your WiFi
  warning [2] and a bluetooth fix which should unblock syzbot [3].

  Current release - regressions:

   - Bluetooth:
      - don't try to cancel uninitialized works [3]
      - L2CAP: fix use-after-free caused by l2cap_chan_put

   - tls: rx: fix device offload after recent rework

   - devlink: fix UAF on failed reload and leftover locks in mlxsw

  Current release - new code bugs:

   - netfilter:
      - flowtable: fix incorrect Kconfig dependencies [1]
      - nf_tables: fix crash when nf_trace is enabled

   - bpf:
      - use proper target btf when exporting attach_btf_obj_id
      - arm64: fixes for bpf trampoline support

   - Bluetooth:
      - ISO: unlock on error path in iso_sock_setsockopt()
      - ISO: fix info leak in iso_sock_getsockopt()
      - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP
      - ISO: fix memory corruption on iso_pinfo.base
      - ISO: fix not using the correct QoS
      - hci_conn: fix updating ISO QoS PHY

   - phy: dp83867: fix get nvmem cell fail

  Previous releases - regressions:

   - wifi: cfg80211: fix validating BSS pointers in
     __cfg80211_connect_result [2]

   - atm: bring back zatm uAPI after ATM had been removed

   - properly fix old bug making bonding ARP monitor mode not being able
     to work with software devices with lockless Tx

   - tap: fix null-deref on skb->dev in dev_parse_header_protocol

   - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some
     devices and breaks others

   - netfilter:
      - nf_tables: many fixes rejecting cross-object linking which may
        lead to UAFs
      - nf_tables: fix null deref due to zeroed list head
      - nf_tables: validate variable length element extension

   - bgmac: fix a BUG triggered by wrong bytes_compl

   - bcmgenet: indicate MAC is in charge of PHY PM

  Previous releases - always broken:

   - bpf:
      - fix bad pointer deref in bpf_sys_bpf() injected via test infra
      - disallow non-builtin bpf programs calling the prog_run command
      - don't reinit map value in prealloc_lru_pop
      - fix UAFs during the read of map iterator fd
      - fix invalidity check for values in sk local storage map
      - reject sleepable program for non-resched map iterator

   - mptcp:
      - move subflow cleanup in mptcp_destroy_common()
      - do not queue data on closed subflows

   - virtio_net: fix memory leak inside XDP_TX with mergeable

   - vsock: fix memory leak when multiple threads try to connect()

   - rework sk_user_data sharing to prevent psock leaks

   - geneve: fix TOS inheriting for ipv4

   - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel

   - phy: c45 baset1: do not skip aneg configuration if clock role is
     not specified

   - rose: avoid overflow when /proc displays timer information

   - x25: fix call timeouts in blocking connects

   - can: mcp251x: fix race condition on receive interrupt

   - can: j1939:
      - replace user-reachable WARN_ON_ONCE() with netdev_warn_once()
      - fix memory leak of skbs in j1939_session_destroy()

  Misc:

   - docs: bpf: clarify that many things are not uAPI

   - seg6: initialize induction variable to first valid array index (to
     silence clang vs objtool warning)

   - can: ems_usb: fix clang 14's -Wunaligned-access warning"

* tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits)
  net: atm: bring back zatm uAPI
  dpaa2-eth: trace the allocated address instead of page struct
  net: add missing kdoc for struct genl_multicast_group::flags
  nfp: fix use-after-free in area_cache_get()
  MAINTAINERS: use my korg address for mt7601u
  mlxsw: minimal: Fix deadlock in ports creation
  bonding: fix reference count leak in balance-alb mode
  net: usb: qmi_wwan: Add support for Cinterion MV32
  bpf: Shut up kern_sys_bpf warning.
  net/tls: Use RCU API to access tls_ctx->netdev
  tls: rx: device: don't try to copy too much on detach
  tls: rx: device: bound the frag walk
  net_sched: cls_route: remove from list when handle is 0
  selftests: forwarding: Fix failing tests with old libnet
  net: refactor bpf_sk_reuseport_detach()
  net: fix refcount bug in sk_psock_get (2)
  selftests/bpf: Ensure sleepable program is rejected by hash map iter
  selftests/bpf: Add write tests for sk local storage map iterator
  selftests/bpf: Add tests for reading a dangling map iter fd
  bpf: Only allow sleepable program for resched-able iterator
  ...
@pull pull bot added the ⤵️ pull label Aug 12, 2022
@pull pull bot merged commit 7ebfc85 into DVRKHz:master Aug 12, 2022
pull bot pushed a commit that referenced this pull request Aug 13, 2022
When use 'echo c > /proc/sysrq-trigger' to trigger kdump, riscv_crash_save_regs()
will be called to save regs for vmcore, we found "epc" value 00ffffffa5537400
is not a valid kernel virtual address, but is a user virtual address. Other
regs(eg, ra, sp, gp...) are correct kernel virtual address.
Actually 0x00ffffffb0dd9400 is the user mode PC of 'PID: 113 Comm: sh', which
is saved in the task's stack.

[   21.201701] CPU: 0 PID: 113 Comm: sh Kdump: loaded Not tainted 5.18.9 #45
[   21.201979] Hardware name: riscv-virtio,qemu (DT)
[   21.202160] epc : 00ffffffa5537400 ra : ffffffff80088640 sp : ff20000010333b90
[   21.202435]  gp : ffffffff810dde38 tp : ff6000000226c200 t0 : ffffffff8032be7c
[   21.202707]  t1 : 0720072007200720 t2 : 30203a7375746174 s0 : ff20000010333cf0
[   21.202973]  s1 : 0000000000000000 a0 : ff20000010333b98 a1 : 0000000000000001
[   21.203243]  a2 : 0000000000000010 a3 : 0000000000000000 a4 : 28c8f0aeffea4e00
[   21.203519]  a5 : 28c8f0aeffea4e00 a6 : 0000000000000009 a7 : ffffffff8035c9b8
[   21.203794]  s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff20000010333b98
[   21.204062]  s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468
[   21.204331]  s8 : 00ffffffef451410 s9 : 0000000000000007 s10: 00aaaaaac0510700
[   21.204606]  s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00
[   21.204876]  t5 : ff60000001218000 t6 : ff200000103338b8
[   21.205079] status: 0000000200000020 badaddr: 0000000000000000 cause: 0000000000000008

With the incorrect PC, the backtrace showed by crash tool as below, the first
stack frame is abnormal,

crash> bt
PID: 113      TASK: ff60000002269600  CPU: 0    COMMAND: "sh"
 #0 [ff2000001039bb90] __efistub_.Ldebug_info0 at 00ffffffa5537400 <-- Abnormal
 #1 [ff2000001039bcf0] panic at ffffffff806578ba
 #2 [ff2000001039bd50] sysrq_reset_seq_param_set at ffffffff8038c030
 #3 [ff2000001039bda0] __handle_sysrq at ffffffff8038c5f8
 #4 [ff2000001039be00] write_sysrq_trigger at ffffffff8038cad8
 #5 [ff2000001039be20] proc_reg_write at ffffffff801b7edc
 #6 [ff2000001039be40] vfs_write at ffffffff80152ba6
 #7 [ff2000001039be80] ksys_write at ffffffff80152ece
 #8 [ff2000001039bed0] sys_write at ffffffff80152f46

With the patch, we can get current kernel mode PC, the output as below,

[   17.607658] CPU: 0 PID: 113 Comm: sh Kdump: loaded Not tainted 5.18.9 #42
[   17.607937] Hardware name: riscv-virtio,qemu (DT)
[   17.608150] epc : ffffffff800078f8 ra : ffffffff8008862c sp : ff20000010333b90
[   17.608441]  gp : ffffffff810dde38 tp : ff6000000226c200 t0 : ffffffff8032be68
[   17.608741]  t1 : 0720072007200720 t2 : 666666666666663c s0 : ff20000010333cf0
[   17.609025]  s1 : 0000000000000000 a0 : ff20000010333b98 a1 : 0000000000000001
[   17.609320]  a2 : 0000000000000010 a3 : 0000000000000000 a4 : 0000000000000000
[   17.609601]  a5 : ff60000001c78000 a6 : 000000000000003c a7 : ffffffff8035c9a4
[   17.609894]  s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff20000010333b98
[   17.610186]  s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468
[   17.610469]  s8 : 00ffffffca281410 s9 : 0000000000000007 s10: 00aaaaaab5bb6700
[   17.610755]  s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00
[   17.611041]  t5 : ff60000001218000 t6 : ff20000010333988
[   17.611255] status: 0000000200000020 badaddr: 0000000000000000 cause: 0000000000000008

With the correct PC, the backtrace showed by crash tool as below,

crash> bt
PID: 113      TASK: ff6000000226c200  CPU: 0    COMMAND: "sh"
 #0 [ff20000010333b90] riscv_crash_save_regs at ffffffff800078f8 <--- Normal
 #1 [ff20000010333cf0] panic at ffffffff806578c6
 #2 [ff20000010333d50] sysrq_reset_seq_param_set at ffffffff8038c03c
 #3 [ff20000010333da0] __handle_sysrq at ffffffff8038c604
 #4 [ff20000010333e00] write_sysrq_trigger at ffffffff8038cae4
 #5 [ff20000010333e20] proc_reg_write at ffffffff801b7ee8
 #6 [ff20000010333e40] vfs_write at ffffffff80152bb2
 #7 [ff20000010333e80] ksys_write at ffffffff80152eda
 #8 [ff20000010333ed0] sys_write at ffffffff80152f52

Fixes: e53d281 ("RISC-V: Add kdump support")
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-3-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
pull bot pushed a commit that referenced this pull request Sep 8, 2023
team interface has used a dynamic lockdep key to avoid false-positive
lockdep deadlock detection. Virtual interfaces such as team usually
have their own lock for protecting private data.
These interfaces can be nested.
team0
  |
team1

Each interface's lock is actually different(team0->lock and team1->lock).
So,
mutex_lock(&team0->lock);
mutex_lock(&team1->lock);
mutex_unlock(&team1->lock);
mutex_unlock(&team0->lock);
The above case is absolutely safe. But lockdep warns about deadlock.
Because the lockdep understands these two locks are same. This is a
false-positive lockdep warning.

So, in order to avoid this problem, the team interfaces started to use
dynamic lockdep key. The false-positive problem was fixed, but it
introduced a new problem.

When the new team virtual interface is created, it registers a dynamic
lockdep key(creates dynamic lockdep key) and uses it. But there is the
limitation of the number of lockdep keys.
So, If so many team interfaces are created, it consumes all lockdep keys.
Then, the lockdep stops to work and warns about it.

In order to fix this problem, team interfaces use the subclass instead
of the dynamic key. So, when a new team interface is created, it doesn't
register(create) a new lockdep, but uses existed subclass key instead.
It is already used by the bonding interface for a similar case.

As the bonding interface does, the subclass variable is the same as
the 'dev->nested_level'. This variable indicates the depth in the stacked
interface graph.

The 'dev->nested_level' is protected by RTNL and RCU.
So, 'mutex_lock_nested()' for 'team->lock' requires RTNL or RCU.
In the current code, 'team->lock' is usually acquired under RTNL, there is
no problem with using 'dev->nested_level'.

The 'team_nl_team_get()' and The 'lb_stats_refresh()' functions acquire
'team->lock' without RTNL.
But these don't iterate their own ports nested so they don't need nested
lock.

Reproducer:
   for i in {0..1000}
   do
           ip link add team$i type team
           ip link add dummy$i master team$i type dummy
           ip link set dummy$i up
           ip link set team$i up
   done

Splat looks like:
   BUG: MAX_LOCKDEP_ENTRIES too low!
   turning off the locking correctness validator.
   Please attach the output of /proc/lock_stat to the bug report
   CPU: 0 PID: 4104 Comm: ip Not tainted 6.5.0-rc7+ #45
   Call Trace:
    <TASK>
   dump_stack_lvl+0x64/0xb0
   add_lock_to_list+0x30d/0x5e0
   check_prev_add+0x73a/0x23a0
   ...
   sock_def_readable+0xfe/0x4f0
   netlink_broadcast+0x76b/0xac0
   nlmsg_notify+0x69/0x1d0
   dev_open+0xed/0x130
   ...

Reported-by: syzbot+9bbbacfbf1e04d5221f7@syzkaller.appspotmail.com
Fixes: 369f61b ("team: fix nested locking lockdep warning")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request Sep 23, 2023
There is a UAF when xfstests on cifs:

  BUG: KASAN: use-after-free in smb2_is_network_name_deleted+0x27/0x160
  Read of size 4 at addr ffff88810103fc08 by task cifsd/923

  CPU: 1 PID: 923 Comm: cifsd Not tainted 6.1.0-rc4+ #45
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   print_report+0x171/0x472
   kasan_report+0xad/0x130
   kasan_check_range+0x145/0x1a0
   smb2_is_network_name_deleted+0x27/0x160
   cifs_demultiplex_thread.cold+0x172/0x5a4
   kthread+0x165/0x1a0
   ret_from_fork+0x1f/0x30
   </TASK>

  Allocated by task 923:
   kasan_save_stack+0x1e/0x40
   kasan_set_track+0x21/0x30
   __kasan_slab_alloc+0x54/0x60
   kmem_cache_alloc+0x147/0x320
   mempool_alloc+0xe1/0x260
   cifs_small_buf_get+0x24/0x60
   allocate_buffers+0xa1/0x1c0
   cifs_demultiplex_thread+0x199/0x10d0
   kthread+0x165/0x1a0
   ret_from_fork+0x1f/0x30

  Freed by task 921:
   kasan_save_stack+0x1e/0x40
   kasan_set_track+0x21/0x30
   kasan_save_free_info+0x2a/0x40
   ____kasan_slab_free+0x143/0x1b0
   kmem_cache_free+0xe3/0x4d0
   cifs_small_buf_release+0x29/0x90
   SMB2_negotiate+0x8b7/0x1c60
   smb2_negotiate+0x51/0x70
   cifs_negotiate_protocol+0xf0/0x160
   cifs_get_smb_ses+0x5fa/0x13c0
   mount_get_conns+0x7a/0x750
   cifs_mount+0x103/0xd00
   cifs_smb3_do_mount+0x1dd/0xcb0
   smb3_get_tree+0x1d5/0x300
   vfs_get_tree+0x41/0xf0
   path_mount+0x9b3/0xdd0
   __x64_sys_mount+0x190/0x1d0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

The UAF is because:

 mount(pid: 921)               | cifsd(pid: 923)
-------------------------------|-------------------------------
                               | cifs_demultiplex_thread
SMB2_negotiate                 |
 cifs_send_recv                |
  compound_send_recv           |
   smb_send_rqst               |
    wait_for_response          |
     wait_event_state      [1] |
                               |  standard_receive3
                               |   cifs_handle_standard
                               |    handle_mid
                               |     mid->resp_buf = buf;  [2]
                               |     dequeue_mid           [3]
     KILL the process      [4] |
    resp_iov[i].iov_base = buf |
 free_rsp_buf              [5] |
                               |   is_network_name_deleted [6]
                               |   callback

1. After send request to server, wait the response until
    mid->mid_state != SUBMITTED;
2. Receive response from server, and set it to mid;
3. Set the mid state to RECEIVED;
4. Kill the process, the mid state already RECEIVED, get 0;
5. Handle and release the negotiate response;
6. UAF.

It can be easily reproduce with add some delay in [3] - [6].

Only sync call has the problem since async call's callback is
executed in cifsd process.

Add an extra state to mark the mid state to READY before wakeup the
waitter, then it can get the resp safely.

Fixes: ec637e3 ("[CIFS] Avoid extra large buffer allocation (and memcpy) in cifs_readpages")
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
pull bot pushed a commit that referenced this pull request Oct 20, 2023
rtnl_offload_xstats_get_size_hw_s_info_one() conditionalizes the
size-computation for IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED based on whether
or not the device has offload_xstats enabled.

However, rtnl_offload_xstats_fill_hw_s_info_one() is adding the u8 for
that field uncondtionally.

syzkaller triggered a WARNING in rtnl_stats_get due to this:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 754 at net/core/rtnetlink.c:5982 rtnl_stats_get+0x2f4/0x300
Modules linked in:
CPU: 0 PID: 754 Comm: syz-executor148 Not tainted 6.6.0-rc2-g331b78eb12af #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:rtnl_stats_get+0x2f4/0x300 net/core/rtnetlink.c:5982
Code: ff ff 89 ee e8 7d 72 50 ff 83 fd a6 74 17 e8 33 6e 50 ff 4c 89 ef be 02 00 00 00 e8 86 00 fa ff e9 7b fe ff ff e8 1c 6e 50 ff <0f> 0b eb e5 e8 73 79 7b 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc900006837c0 EFLAGS: 00010293
RAX: ffffffff81cf7f24 RBX: ffff8881015d9000 RCX: ffff888101815a00
RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
RBP: 00000000ffffffa6 R08: ffffffff81cf7f03 R09: 0000000000000001
R10: ffff888101ba47b9 R11: ffff888101815a00 R12: ffff8881017dae00
R13: ffff8881017dad00 R14: ffffc90000683ab8 R15: ffffffff83c1f740
FS:  00007fbc22dbc740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000046 CR3: 000000010264e003 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 rtnetlink_rcv_msg+0x677/0x710 net/core/rtnetlink.c:6480
 netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2545
 netlink_unicast+0x430/0x500 net/netlink/af_netlink.c:1342
 netlink_sendmsg+0x4fc/0x620 net/netlink/af_netlink.c:1910
 sock_sendmsg+0xa8/0xd0 net/socket.c:730
 ____sys_sendmsg+0x22a/0x320 net/socket.c:2541
 ___sys_sendmsg+0x143/0x190 net/socket.c:2595
 __x64_sys_sendmsg+0xd8/0x150 net/socket.c:2624
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fbc22e8d6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc4320e778 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004007d0 RCX: 00007fbc22e8d6a9
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000004007d0
R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffc4320e898
R13: 00007ffc4320e8a8 R14: 00000000004004a0 R15: 00007fbc22fa5a80
 </TASK>
---[ end trace 0000000000000000 ]---

Which didn't happen prior to commit bf9f1ba ("net: add dedicated
kmem_cache for typical/small skb->head") as the skb always was large
enough.

Fixes: 0e7788f ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20231013041448.8229-1-cpaasch@apple.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pull bot pushed a commit that referenced this pull request Nov 3, 2023
…le nodes

The decreasing of hugetlb pages number failed with the following message
given:

 sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
 CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
 Hardware name: linux,dummy-virt (DT)
 Call trace:
  dump_backtrace.part.6+0x84/0xe4
  show_stack+0x18/0x24
  dump_stack_lvl+0x48/0x60
  dump_stack+0x18/0x24
  warn_alloc+0x100/0x1bc
  __alloc_pages_slowpath.constprop.107+0xa40/0xad8
  __alloc_pages+0x244/0x2d0
  hugetlb_vmemmap_restore+0x104/0x1e4
  __update_and_free_hugetlb_folio+0x44/0x1f4
  update_and_free_hugetlb_folio+0x20/0x68
  update_and_free_pages_bulk+0x4c/0xac
  set_max_huge_pages+0x198/0x334
  nr_hugepages_store_common+0x118/0x178
  nr_hugepages_store+0x18/0x24
  kobj_attr_store+0x18/0x2c
  sysfs_kf_write+0x40/0x54
  kernfs_fop_write_iter+0x164/0x1dc
  vfs_write+0x3a8/0x460
  ksys_write+0x6c/0x100
  __arm64_sys_write+0x1c/0x28
  invoke_syscall+0x44/0x100
  el0_svc_common.constprop.1+0x6c/0xe4
  do_el0_svc+0x38/0x94
  el0_svc+0x28/0x74
  el0t_64_sync_handler+0xa0/0xc4
  el0t_64_sync+0x174/0x178
 Mem-Info:
  ...

The reason is that the hugetlb pages being released are allocated from
movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
need to be allocated from the same node during the hugetlb pages
releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
node is always failed. Fix this problem by removing __GFP_THISNODE.

Link: https://lkml.kernel.org/r/20230905124503.24899-1-yuancan@huawei.com
Fixes: ad2fa37 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pull bot pushed a commit that referenced this pull request Nov 16, 2023
After the blamed commit below, the TCP sockets (and the MPTCP subflows)
can build egress packets larger than 64K. That exceeds the maximum DSS
data size, the length being misrepresent on the wire and the stream being
corrupted, as later observed on the receiver:

  WARNING: CPU: 0 PID: 9696 at net/mptcp/protocol.c:705 __mptcp_move_skbs_from_subflow+0x2604/0x26e0
  CPU: 0 PID: 9696 Comm: syz-executor.7 Not tainted 6.6.0-rc5-gcd8bdf563d46 #45
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
  netlink: 8 bytes leftover after parsing attributes in process `syz-executor.4'.
  RIP: 0010:__mptcp_move_skbs_from_subflow+0x2604/0x26e0 net/mptcp/protocol.c:705
  RSP: 0018:ffffc90000006e80 EFLAGS: 00010246
  RAX: ffffffff83e9f674 RBX: ffff88802f45d870 RCX: ffff888102ad0000
  netlink: 8 bytes leftover after parsing attributes in process `syz-executor.4'.
  RDX: 0000000080000303 RSI: 0000000000013908 RDI: 0000000000003908
  RBP: ffffc90000007110 R08: ffffffff83e9e078 R09: 1ffff1100e548c8a
  R10: dffffc0000000000 R11: ffffed100e548c8b R12: 0000000000013908
  R13: dffffc0000000000 R14: 0000000000003908 R15: 000000000031cf29
  FS:  00007f239c47e700(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f239c45cd78 CR3: 000000006a66c006 CR4: 0000000000770ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
  PKRU: 55555554
  Call Trace:
   <IRQ>
   mptcp_data_ready+0x263/0xac0 net/mptcp/protocol.c:819
   subflow_data_ready+0x268/0x6d0 net/mptcp/subflow.c:1409
   tcp_data_queue+0x21a1/0x7a60 net/ipv4/tcp_input.c:5151
   tcp_rcv_established+0x950/0x1d90 net/ipv4/tcp_input.c:6098
   tcp_v6_do_rcv+0x554/0x12f0 net/ipv6/tcp_ipv6.c:1483
   tcp_v6_rcv+0x2e26/0x3810 net/ipv6/tcp_ipv6.c:1749
   ip6_protocol_deliver_rcu+0xd6b/0x1ae0 net/ipv6/ip6_input.c:438
   ip6_input+0x1c5/0x470 net/ipv6/ip6_input.c:483
   ipv6_rcv+0xef/0x2c0 include/linux/netfilter.h:304
   __netif_receive_skb+0x1ea/0x6a0 net/core/dev.c:5532
   process_backlog+0x353/0x660 net/core/dev.c:5974
   __napi_poll+0xc6/0x5a0 net/core/dev.c:6536
   net_rx_action+0x6a0/0xfd0 net/core/dev.c:6603
   __do_softirq+0x184/0x524 kernel/softirq.c:553
   do_softirq+0xdd/0x130 kernel/softirq.c:454

Address the issue explicitly bounding the maximum GSO size to what MPTCP
actually allows.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: multipath-tcp/mptcp_net-next#450
Fixes: 7c4e983 ("net: allow gso_max_size to exceed 65536")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-1-7b9cd6a7b7f4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pull bot pushed a commit that referenced this pull request Oct 5, 2024
There is another found exception that the "timerlat/1" thread was
scheduled on CPU0, and lead to timer corruption finally:

```
ODEBUG: init active (active state 0) object: ffff888237c2e108 object type: hrtimer hint: timerlat_irq+0x0/0x220
WARNING: CPU: 0 PID: 426 at lib/debugobjects.c:518 debug_print_object+0x7d/0xb0
Modules linked in:
CPU: 0 UID: 0 PID: 426 Comm: timerlat/1 Not tainted 6.11.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:debug_print_object+0x7d/0xb0
...
Call Trace:
 <TASK>
 ? __warn+0x7c/0x110
 ? debug_print_object+0x7d/0xb0
 ? report_bug+0xf1/0x1d0
 ? prb_read_valid+0x17/0x20
 ? handle_bug+0x3f/0x70
 ? exc_invalid_op+0x13/0x60
 ? asm_exc_invalid_op+0x16/0x20
 ? debug_print_object+0x7d/0xb0
 ? debug_print_object+0x7d/0xb0
 ? __pfx_timerlat_irq+0x10/0x10
 __debug_object_init+0x110/0x150
 hrtimer_init+0x1d/0x60
 timerlat_main+0xab/0x2d0
 ? __pfx_timerlat_main+0x10/0x10
 kthread+0xb7/0xe0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2d/0x40
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
```

After tracing the scheduling event, it was discovered that the migration
of the "timerlat/1" thread was performed during thread creation. Further
analysis confirmed that it is because the CPU online processing for
osnoise is implemented through workers, which is asynchronous with the
offline processing. When the worker was scheduled to create a thread, the
CPU may has already been removed from the cpu_online_mask during the offline
process, resulting in the inability to select the right CPU:

T1                       | T2
[CPUHP_ONLINE]           | cpu_device_down()
osnoise_hotplug_workfn() |
                         |     cpus_write_lock()
                         |     takedown_cpu(1)
                         |     cpus_write_unlock()
[CPUHP_OFFLINE]          |
    cpus_read_lock()     |
    start_kthread(1)     |
    cpus_read_unlock()   |

To fix this, skip online processing if the CPU is already offline.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240924094515.3561410-4-liwei391@huawei.com
Fixes: c8895e2 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
pull bot pushed a commit that referenced this pull request May 1, 2025
When retrieving the FW coredump using ethtool, it can sometimes cause
memory corruption:

BUG: KFENCE: memory corruption in __bnxt_get_coredump+0x3ef/0x670 [bnxt_en]
Corrupted memory at 0x000000008f0f30e8 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#45):
__bnxt_get_coredump+0x3ef/0x670 [bnxt_en]
ethtool_get_dump_data+0xdc/0x1a0
__dev_ethtool+0xa1e/0x1af0
dev_ethtool+0xa8/0x170
dev_ioctl+0x1b5/0x580
sock_do_ioctl+0xab/0xf0
sock_ioctl+0x1ce/0x2e0
__x64_sys_ioctl+0x87/0xc0
do_syscall_64+0x5c/0xf0
entry_SYSCALL_64_after_hwframe+0x78/0x80

...

This happens when copying the coredump segment list in
bnxt_hwrm_dbg_dma_data() with the HWRM_DBG_COREDUMP_LIST FW command.
The info->dest_buf buffer is allocated based on the number of coredump
segments returned by the FW.  The segment list is then DMA'ed by
the FW and the length of the DMA is returned by FW.  The driver then
copies this DMA'ed segment list to info->dest_buf.

In some cases, this DMA length may exceed the info->dest_buf length
and cause the above BUG condition.  Fix it by capping the copy
length to not exceed the length of info->dest_buf.  The extra
DMA data contains no useful information.

This code path is shared for the HWRM_DBG_COREDUMP_LIST and the
HWRM_DBG_COREDUMP_RETRIEVE FW commands.  The buffering is different
for these 2 FW commands.  To simplify the logic, we need to move
the line to adjust the buffer length for HWRM_DBG_COREDUMP_RETRIEVE
up, so that the new check to cap the copy length will work for both
commands.

Fixes: c74751f ("bnxt_en: Return error if FW returns more data than dump length")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request Sep 30, 2025
…show()

For the reader it is not obvious that the counter values are displayed
in hex as there's no leading '0x' before '%x'.

So changed them to %u instead and added '0x' for non-counter values
and also a string for the status.

Note this generates some checkpatch warnings like this:

 WARNING: quoted string split across lines
 #45: FILE: fs/smb/client/cifs_debug.c:460:
 +               seq_printf(m, "\nSMBDirect protocol version: 0x%x "
 +                       "transport status: %s (%u)",

But I'll leave this is the current style in the
related code...

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
pull bot pushed a commit that referenced this pull request Mar 5, 2026
Offloading ETS requires computing each class' WRR weight: this is done by
averaging over the sums of quanta as 'q_sum' and 'q_psum'. Using unsigned
int, the same integer size as the individual DRR quanta, can overflow and
even cause division by zero, like it happened in the following splat:

 Oops: divide error: 0000 [#1] SMP PTI
 CPU: 13 UID: 0 PID: 487 Comm: tc Tainted: G            E       6.19.0-virtme #45 PREEMPT(full)
 Tainted: [E]=UNSIGNED_MODULE
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 RIP: 0010:ets_offload_change+0x11f/0x290 [sch_ets]
 Code: e4 45 31 ff eb 03 41 89 c7 41 89 cb 89 ce 83 f9 0f 0f 87 b7 00 00 00 45 8b 08 31 c0 45 01 cc 45 85 c9 74 09 41 6b c4 64 31 d2 <41> f7 f2 89 c2 44 29 fa 45 89 df 41 83 fb 0f 0f 87 c7 00 00 00 44
 RSP: 0018:ffffd0a180d77588 EFLAGS: 00010246
 RAX: 00000000ffffff38 RBX: ffff8d3d482ca000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd0a180d77660
 RBP: ffffd0a180d77690 R08: ffff8d3d482ca2d8 R09: 00000000fffffffe
 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffe
 R13: ffff8d3d472f2000 R14: 0000000000000003 R15: 0000000000000000
 FS:  00007f440b6c2740(0000) GS:ffff8d3dc9803000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000003cdd2000 CR3: 0000000007b58002 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  ets_qdisc_change+0x870/0xf40 [sch_ets]
  qdisc_create+0x12b/0x540
  tc_modify_qdisc+0x6d7/0xbd0
  rtnetlink_rcv_msg+0x168/0x6b0
  netlink_rcv_skb+0x5c/0x110
  netlink_unicast+0x1d6/0x2b0
  netlink_sendmsg+0x22e/0x470
  ____sys_sendmsg+0x38a/0x3c0
  ___sys_sendmsg+0x99/0xe0
  __sys_sendmsg+0x8a/0xf0
  do_syscall_64+0x111/0xf80
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7f440b81c77e
 Code: 4d 89 d8 e8 d4 bc 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
 RSP: 002b:00007fff951e4c10 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000000000481820 RCX: 00007f440b81c77e
 RDX: 0000000000000000 RSI: 00007fff951e4cd0 RDI: 0000000000000003
 RBP: 00007fff951e4c20 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff951f4fa8
 R13: 00000000699ddede R14: 00007f440bb01000 R15: 0000000000486980
  </TASK>
 Modules linked in: sch_ets(E) netdevsim(E)
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:ets_offload_change+0x11f/0x290 [sch_ets]
 Code: e4 45 31 ff eb 03 41 89 c7 41 89 cb 89 ce 83 f9 0f 0f 87 b7 00 00 00 45 8b 08 31 c0 45 01 cc 45 85 c9 74 09 41 6b c4 64 31 d2 <41> f7 f2 89 c2 44 29 fa 45 89 df 41 83 fb 0f 0f 87 c7 00 00 00 44
 RSP: 0018:ffffd0a180d77588 EFLAGS: 00010246
 RAX: 00000000ffffff38 RBX: ffff8d3d482ca000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd0a180d77660
 RBP: ffffd0a180d77690 R08: ffff8d3d482ca2d8 R09: 00000000fffffffe
 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffe
 R13: ffff8d3d472f2000 R14: 0000000000000003 R15: 0000000000000000
 FS:  00007f440b6c2740(0000) GS:ffff8d3dc9803000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000003cdd2000 CR3: 0000000007b58002 CR4: 0000000000172ef0
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 ---[ end Kernel panic - not syncing: Fatal exception ]---

Fix this using 64-bit integers for 'q_sum' and 'q_psum'.

Cc: stable@vger.kernel.org
Fixes: d35eb52 ("net: sch_ets: Make the ETS qdisc offloadable")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/28504887df314588c7255e9911769c36f751edee.1771964872.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pull bot pushed a commit that referenced this pull request Mar 20, 2026
INTC10EE UART can end up into an interrupt storm where it reports
IIR_NO_INT (0x1). If the storm happens during active UART operation, it
is promptly stopped by IIR value change due to Rx or Tx events.
However, when there is no activity, either due to idle serial line or
due to specific circumstances such as during shutdown that writes
IER=0, there is nothing to stop the storm.

During shutdown the storm is particularly problematic because
serial8250_do_shutdown() calls synchronize_irq() that will hang in
waiting for the storm to finish which never happens.

This problem can also result in triggering a warning:

  irq 45: nobody cared (try booting with the "irqpoll" option)
  [...snip...]
  handlers:
    serial8250_interrupt
  Disabling IRQ #45

Normal means to reset interrupt status by reading LSR, MSR, USR, or RX
register do not result in the UART deasserting the IRQ.

Add a quirk to INTC10EE UARTs to enable Tx interrupts if UART's Tx is
currently empty and inactive. Rework IIR_NO_INT to keep track of the
number of consecutive IIR_NO_INT, and on fourth one perform the quirk.
Enabling Tx interrupts should change IIR value from IIR_NO_INT to
IIR_THRI which has been observed to stop the storm.

Fixes: e92fad0 ("serial: 8250_dw: Add ACPI ID for Granite Rapids-D UART")
Cc: stable <stable@kernel.org>
Reported-by: Bandal, Shankar <shankar.bandal@intel.com>
Tested-by: Bandal, Shankar <shankar.bandal@intel.com>
Tested-by: Murthy, Shanth <shanth.murthy@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20260203171049.4353-6-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.