Skip to content

Conversation

@martinradev
Copy link

@martinradev martinradev commented Jun 4, 2020

Hello people,

I recently disclosed a small issue of making the KASLR offsets the same and known between reboots of the VM. This happens in two steps:

  1. disabling RDRAND for the VM
  2. modifying the TSC for the guest via interception or setting the tsc ratio msr to 0.

This patch fixes the issue by making RDRAND a required feature for starting a SEV-ES and -SNP VM. I tested locally that booting the VM fails if RDRAND is not supported.

Let me know of your thoughts on the fix and whether you have any other concerns.

This patch offers a hardening to prevent the HV from 'disabling'
KASLR for the VM. With this patch any intercepted cpuid would fail
if RDRAND is not offered and already cause a fault in the Linux boot
stage.

Without RDRAND, KASLR and the initial sequence of kernel stack
canaries rely on the TSC for randomness. The HV is able to
specify a fixed TSC value by either intercepting RDTSC or specifying

a zero multiplier with the TSC_RATIO MSR.
Adding such a requirement should not cause any harm for when
SEV-ES or SEV-SNP are enabled because in such cases RDRAND should
also be supported.

Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
@martinradev martinradev changed the base branch from master to sev-es-5.6-v3 June 4, 2020 13:35
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Here is the steps to reproduce the problem:
ip netns add foo
ip netns add bar
ip -n foo link add xfrmi0 type xfrm dev lo if_id 42
ip -n foo link set xfrmi0 netns bar
ip netns del foo
ip netns del bar

Which results to:
[  186.686395] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bd3: 0000 [#1] SMP PTI
[  186.687665] CPU: 7 PID: 232 Comm: kworker/u16:2 Not tainted 5.6.0+ #1
[  186.688430] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[  186.689420] Workqueue: netns cleanup_net
[  186.689903] RIP: 0010:xfrmi_dev_uninit+0x1b/0x4b [xfrm_interface]
[  186.690657] Code: 44 f6 ff ff 31 c0 5b 5d 41 5c 41 5d 41 5e c3 48 8d 8f c0 08 00 00 8b 05 ce 14 00 00 48 8b 97 d0 08 00 00 48 8b 92 c0 0e 00 00 <48> 8b 14 c2 48 8b 02 48 85 c0 74 19 48 39 c1 75 0c 48 8b 87 c0 08
[  186.692838] RSP: 0018:ffffc900003b7d68 EFLAGS: 00010286
[  186.693435] RAX: 000000000000000d RBX: ffff8881b0f31000 RCX: ffff8881b0f318c0
[  186.694334] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000246 RDI: ffff8881b0f31000
[  186.695190] RBP: ffffc900003b7df0 R08: ffff888236c07740 R09: 0000000000000040
[  186.696024] R10: ffffffff81fce1b8 R11: 0000000000000002 R12: ffffc900003b7d80
[  186.696859] R13: ffff8881edcc6a40 R14: ffff8881a1b6e780 R15: ffffffff81ed47c8
[  186.697738] FS:  0000000000000000(0000) GS:ffff888237dc0000(0000) knlGS:0000000000000000
[  186.698705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  186.699408] CR2: 00007f2129e93148 CR3: 0000000001e0a000 CR4: 00000000000006e0
[  186.700221] Call Trace:
[  186.700508]  rollback_registered_many+0x32b/0x3fd
[  186.701058]  ? __rtnl_unlock+0x20/0x3d
[  186.701494]  ? arch_local_irq_save+0x11/0x17
[  186.702012]  unregister_netdevice_many+0x12/0x55
[  186.702594]  default_device_exit_batch+0x12b/0x150
[  186.703160]  ? prepare_to_wait_exclusive+0x60/0x60
[  186.703719]  cleanup_net+0x17d/0x234
[  186.704138]  process_one_work+0x196/0x2e8
[  186.704652]  worker_thread+0x1a4/0x249
[  186.705087]  ? cancel_delayed_work+0x92/0x92
[  186.705620]  kthread+0x105/0x10f
[  186.706000]  ? __kthread_bind_mask+0x57/0x57
[  186.706501]  ret_from_fork+0x35/0x40
[  186.706978] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace fscache sunrpc button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic 8139too ide_cd_mod cdrom ide_gd_mod ata_generic ata_piix libata scsi_mod piix psmouse i2c_piix4 ide_core 8139cp i2c_core mii floppy
[  186.710423] ---[ end trace 463bba18105537e5 ]---

The problem is that x-netns xfrm interface are not removed when the link
netns is removed. This causes later this oops when thoses interfaces are
removed.

Let's add a handler to remove all interfaces related to a netns when this
netns is removed.

Fixes: f203b76 ("xfrm: Add virtual xfrm interfaces")
Reported-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Commit ed83038 ("scsi: ibmvfc: Avoid loss of all paths during SVC node
reboot") introduced a regression where when the client resets or re-enables
its CRQ with the hypervisor there is a chance that if the server side
doesn't issue its INIT handshake quick enough the client can issue an
Implicit Logout prior to doing an NPIV Login. The server treats this
scenario as a protocol violation and closes the CRQ on its end forcing the
client through a reset that gets the client host state and next host action
out of agreement leading to a BUG assert.

ibmvfc 30000003: Partner initialization complete
ibmvfc 30000002: Partner initialization complete
ibmvfc 30000002: Host partner adapter deregistered or failed (rc=2)
ibmvfc 30000002: Partner initialized
------------[ cut here ]------------
kernel BUG at ../drivers/scsi/ibmvscsi/ibmvfc.c:4489!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Supported: No, Unreleased kernel
CPU: 16 PID: 1290 Comm: ibmvfc_0 Tainted: G           OE  X   5.3.18-12-default
NIP:  c00800000d84a2b4 LR: c00800000d84a040 CTR: c00800000d84a2a0
REGS: c00000000cb57a00 TRAP: 0700   Tainted: G           OE  X    (5.3.18-12-default)
MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24000848  XER: 00000001
CFAR: c00800000d84a070 IRQMASK: 1
GPR00: c00800000d84a040 c00000000cb57c90 c00800000d858e00 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000000a0
GPR08: c00800000d84a074 0000000000000001 0000000000000014 c00800000d84d7d0
GPR12: 0000000000000000 c00000001ea28200 c00000000016cd98 0000000000000000
GPR16: c00800000d84b7b8 0000000000000000 0000000000000000 c00000542c706d68
GPR20: 0000000000000005 c00000542c706d88 5deadbeef0000100 5deadbeef0000122
GPR24: 000000000000000c 000000000000000b c00800000d852180 0000000000000001
GPR28: 0000000000000000 c00000542c706da0 c00000542c706860 c00000542c706828
NIP [c00800000d84a2b4] ibmvfc_work+0x3ac/0xc90 [ibmvfc]
LR [c00800000d84a040] ibmvfc_work+0x138/0xc90 [ibmvfc]

This scenario can be prevented by rejecting any attempt to send an Implicit
Logout if the client adapter is not logged in yet.

Link: https://lore.kernel.org/r/20200427214824.6890-1-tyreld@linux.ibm.com
Fixes: ed83038 ("scsi: ibmvfc: Avoid loss of all paths during SVC node reboot")
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Commit 64e90a8 ("Introduce STATIC_USERMODEHELPER to mediate
call_usermodehelper()") added the optiont to disable all
call_usermodehelper() calls by setting STATIC_USERMODEHELPER_PATH to
an empty string. When this is done, and crashdump is triggered, it
will crash on null pointer dereference, since we make assumptions
over what call_usermodehelper_exec() did.

This has been reported by Sergey when one triggers a a coredump
with the following configuration:

```
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH=""
kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
```

The way disabling the umh was designed was that call_usermodehelper_exec()
would just return early, without an error. But coredump assumes
certain variables are set up for us when this happens, and calls
ile_start_write(cprm.file) with a NULL file.

[    2.819676] BUG: kernel NULL pointer dereference, address: 0000000000000020
[    2.819859] #PF: supervisor read access in kernel mode
[    2.820035] #PF: error_code(0x0000) - not-present page
[    2.820188] PGD 0 P4D 0
[    2.820305] Oops: 0000 [#1] SMP PTI
[    2.820436] CPU: 2 PID: 89 Comm: a Not tainted 5.7.0-rc1+ torvalds#7
[    2.820680] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
[    2.821150] RIP: 0010:do_coredump+0xd80/0x1060
[    2.821385] Code: e8 95 11 ed ff 48 c7 c6 cc a7 b4 81 48 8d bd 28 ff
ff ff 89 c2 e8 70 f1 ff ff 41 89 c2 85 c0 0f 84 72 f7 ff ff e9 b4 fe ff
ff <48> 8b 57 20 0f b7 02 66 25 00 f0 66 3d 00 8
0 0f 84 9c 01 00 00 44
[    2.822014] RSP: 0000:ffffc9000029bcb8 EFLAGS: 00010246
[    2.822339] RAX: 0000000000000000 RBX: ffff88803f860000 RCX: 000000000000000a
[    2.822746] RDX: 0000000000000009 RSI: 0000000000000282 RDI: 0000000000000000
[    2.823141] RBP: ffffc9000029bde8 R08: 0000000000000000 R09: ffffc9000029bc00
[    2.823508] R10: 0000000000000001 R11: ffff88803dec90be R12: ffffffff81c39da0
[    2.823902] R13: ffff88803de84400 R14: 0000000000000000 R15: 0000000000000000
[    2.824285] FS:  00007fee08183540(0000) GS:ffff88803e480000(0000) knlGS:0000000000000000
[    2.824767] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.825111] CR2: 0000000000000020 CR3: 000000003f856005 CR4: 0000000000060ea0
[    2.825479] Call Trace:
[    2.825790]  get_signal+0x11e/0x720
[    2.826087]  do_signal+0x1d/0x670
[    2.826361]  ? force_sig_info_to_task+0xc1/0xf0
[    2.826691]  ? force_sig_fault+0x3c/0x40
[    2.826996]  ? do_trap+0xc9/0x100
[    2.827179]  exit_to_usermode_loop+0x49/0x90
[    2.827359]  prepare_exit_to_usermode+0x77/0xb0
[    2.827559]  ? invalid_op+0xa/0x30
[    2.827747]  ret_from_intr+0x20/0x20
[    2.827921] RIP: 0033:0x55e2c76d2129
[    2.828107] Code: 2d ff ff ff e8 68 ff ff ff 5d c6 05 18 2f 00 00 01
c3 0f 1f 80 00 00 00 00 c3 0f 1f 80 00 00 00 00 e9 7b ff ff ff 55 48 89
e5 <0f> 0b b8 00 00 00 00 5d c3 66 2e 0f 1f 84 0
0 00 00 00 00 0f 1f 40
[    2.828603] RSP: 002b:00007fffeba5e080 EFLAGS: 00010246
[    2.828801] RAX: 000055e2c76d2125 RBX: 0000000000000000 RCX: 00007fee0817c718
[    2.829034] RDX: 00007fffeba5e188 RSI: 00007fffeba5e178 RDI: 0000000000000001
[    2.829257] RBP: 00007fffeba5e080 R08: 0000000000000000 R09: 00007fee08193c00
[    2.829482] R10: 0000000000000009 R11: 0000000000000000 R12: 000055e2c76d2040
[    2.829727] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    2.829964] CR2: 0000000000000020
[    2.830149] ---[ end trace ceed83d8c68a1bf1 ]---
```

Cc: <stable@vger.kernel.org> # v4.11+
Fixes: 64e90a8 ("Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199795
Reported-by: Tony Vroon <chainsaw@gentoo.org>
Reported-by: Sergey Kvachonok <ravenexp@gmail.com>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20200416162859.26518-1-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
netvsc_start_xmit is used as a callback function for the ndo_start_xmit
function pointer. ndo_start_xmit's return type is netdev_tx_t but
netvsc_start_xmit's return type is int.

This causes a failure with Control Flow Integrity (CFI), which requires
function pointer prototypes and callback function definitions to match
exactly. When CFI is in enforcing, the kernel panics. When booting a
CFI kernel with WSL 2, the VM is immediately terminated because of this.

The splat when CONFIG_CFI_PERMISSIVE is used:

[    5.916765] CFI failure (target: netvsc_start_xmit+0x0/0x10):
[    5.916771] WARNING: CPU: 8 PID: 0 at kernel/cfi.c:29 __cfi_check_fail+0x2e/0x40
[    5.916772] Modules linked in:
[    5.916774] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 5.7.0-rc3-next-20200424-microsoft-cbl-00001-ged4eb37d2c69-dirty #1
[    5.916776] RIP: 0010:__cfi_check_fail+0x2e/0x40
[    5.916777] Code: 48 c7 c7 70 98 63 a9 48 c7 c6 11 db 47 a9 e8 69 55 59 00 85 c0 75 02 5b c3 48 c7 c7 73 c6 43 a9 48 89 de 31 c0 e8 12 2d f0 ff <0f> 0b 5b c3 00 00 cc cc 00 00 cc cc 00 00 cc cc 00 00 85 f6 74 25
[    5.916778] RSP: 0018:ffffa803c0260b78 EFLAGS: 00010246
[    5.916779] RAX: 712a1af25779e900 RBX: ffffffffa8cf7950 RCX: ffffffffa962cf08
[    5.916779] RDX: ffffffffa9c36b60 RSI: 0000000000000082 RDI: ffffffffa9c36b5c
[    5.916780] RBP: ffff8ffc4779c2c0 R08: 0000000000000001 R09: ffffffffa9c3c300
[    5.916781] R10: 0000000000000151 R11: ffffffffa9c36b60 R12: ffff8ffe39084000
[    5.916782] R13: ffffffffa8cf7950 R14: ffffffffa8d12cb0 R15: ffff8ffe39320140
[    5.916784] FS:  0000000000000000(0000) GS:ffff8ffe3bc00000(0000) knlGS:0000000000000000
[    5.916785] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.916786] CR2: 00007ffef5749408 CR3: 00000002f4f5e000 CR4: 0000000000340ea0
[    5.916787] Call Trace:
[    5.916788]  <IRQ>
[    5.916790]  __cfi_check+0x3ab58/0x450e0
[    5.916793]  ? dev_hard_start_xmit+0x11f/0x160
[    5.916795]  ? sch_direct_xmit+0xf2/0x230
[    5.916796]  ? __dev_queue_xmit.llvm.11471227737707190958+0x69d/0x8e0
[    5.916797]  ? neigh_resolve_output+0xdf/0x220
[    5.916799]  ? neigh_connected_output.cfi_jt+0x8/0x8
[    5.916801]  ? ip6_finish_output2+0x398/0x4c0
[    5.916803]  ? nf_nat_ipv6_out+0x10/0xa0
[    5.916804]  ? nf_hook_slow+0x84/0x100
[    5.916807]  ? ip6_input_finish+0x8/0x8
[    5.916807]  ? ip6_output+0x6f/0x110
[    5.916808]  ? __ip6_local_out.cfi_jt+0x8/0x8
[    5.916810]  ? mld_sendpack+0x28e/0x330
[    5.916811]  ? ip_rt_bug+0x8/0x8
[    5.916813]  ? mld_ifc_timer_expire+0x2db/0x400
[    5.916814]  ? neigh_proxy_process+0x8/0x8
[    5.916816]  ? call_timer_fn+0x3d/0xd0
[    5.916817]  ? __run_timers+0x2a9/0x300
[    5.916819]  ? rcu_core_si+0x8/0x8
[    5.916820]  ? run_timer_softirq+0x14/0x30
[    5.916821]  ? __do_softirq+0x154/0x262
[    5.916822]  ? native_x2apic_icr_write+0x8/0x8
[    5.916824]  ? irq_exit+0xba/0xc0
[    5.916825]  ? hv_stimer0_vector_handler+0x99/0xe0
[    5.916826]  ? hv_stimer0_callback_vector+0xf/0x20
[    5.916826]  </IRQ>
[    5.916828]  ? hv_stimer_global_cleanup.cfi_jt+0x8/0x8
[    5.916829]  ? raw_setsockopt+0x8/0x8
[    5.916830]  ? default_idle+0xe/0x10
[    5.916832]  ? do_idle.llvm.10446269078108580492+0xb7/0x130
[    5.916833]  ? raw_setsockopt+0x8/0x8
[    5.916833]  ? cpu_startup_entry+0x15/0x20
[    5.916835]  ? cpu_hotplug_enable.cfi_jt+0x8/0x8
[    5.916836]  ? start_secondary+0x188/0x190
[    5.916837]  ? secondary_startup_64+0xa5/0xb0
[    5.916838] ---[ end trace f2683fa869597ba5 ]---

Avoid this by using the right return type for netvsc_start_xmit.

Fixes: fceaf24 ("Staging: hv: add the Hyper-V virtual network driver")
Link: ClangBuiltLinux#1009
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
…kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm fixes for Linux 5.7, take #1

- Prevent the userspace API from interacting directly with the HW
  stage of the virtual GIC
- Fix a couple of vGIC memory leaks
- Tighten the rules around the use of the 32bit PSCI functions
  for 64bit guest, as well as the opposite situation (matches the
  specification)
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Commit f458d03 ("kvm: ioapic: Lazy update IOAPIC EOI") introduces
the following infinite loop:

BUG: stack guard page was hit at 000000008f595917 \
(stack is 00000000bdefe5a4..00000000ae2b06f5)
kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
Call Trace:
 irqfd_resampler_ack+0x32/0x90 [kvm]
 kvm_notify_acked_irq+0x62/0xd0 [kvm]
 kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
 ioapic_set_irq+0x20e/0x240 [kvm]
 kvm_ioapic_set_irq+0x5c/0x80 [kvm]
 kvm_set_irq+0xbb/0x160 [kvm]
 ? kvm_hv_set_sint+0x20/0x20 [kvm]
 irqfd_resampler_ack+0x32/0x90 [kvm]
 kvm_notify_acked_irq+0x62/0xd0 [kvm]
 kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
 ioapic_set_irq+0x20e/0x240 [kvm]
 kvm_ioapic_set_irq+0x5c/0x80 [kvm]
 kvm_set_irq+0xbb/0x160 [kvm]
 ? kvm_hv_set_sint+0x20/0x20 [kvm]
....

The re-entrancy happens because the irq state is the OR of
the interrupt state and the resamplefd state.  That is, we don't
want to show the state as 0 until we've had a chance to set the
resamplefd.  But if the interrupt has _not_ gone low then
ioapic_set_irq is invoked again, causing an infinite loop.

This can only happen for a level-triggered interrupt, otherwise
irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high
and then low.  Fortunately, in the case of level-triggered interrupts the VMEXIT already happens because
TMR is set.  Thus, fix the bug by restricting the lazy invocation
of the ack notifier to edge-triggered interrupts, the only ones that
need it.

Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reported-by: borisvk@bstnet.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://www.spinics.net/lists/kvm/msg213512.html
Fixes: f458d03 ("kvm: ioapic: Lazy update IOAPIC EOI")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Once the intel_context is closed, the GEM context may be freed and so
the link from intel_context.gem_context is invalid.

<3>[  219.782944] BUG: KASAN: use-after-free in intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
<3>[  219.782996] Read of size 8 at addr ffff8881d7dff0b8 by task kworker/0:1/12

<4>[  219.783052] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G     U            5.7.0-rc2-g1f3ffd7683d54-kasan_118+ #1
<4>[  219.783055] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 3402 04/26/2017
<4>[  219.783105] Workqueue: events heartbeat [i915]
<4>[  219.783109] Call Trace:
<4>[  219.783113]  <IRQ>
<4>[  219.783119]  dump_stack+0x96/0xdb
<4>[  219.783177]  ? intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
<4>[  219.783182]  print_address_description.constprop.6+0x16/0x310
<4>[  219.783239]  ? intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
<4>[  219.783295]  ? intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
<4>[  219.783300]  __kasan_report+0x137/0x190
<4>[  219.783359]  ? intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
<4>[  219.783366]  kasan_report+0x32/0x50
<4>[  219.783426]  intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
<4>[  219.783481]  execlists_reset+0x39c/0x13d0 [i915]
<4>[  219.783494]  ? mark_held_locks+0x9e/0xe0
<4>[  219.783546]  ? execlists_hold+0xfc0/0xfc0 [i915]
<4>[  219.783551]  ? lockdep_hardirqs_on+0x348/0x5f0
<4>[  219.783557]  ? _raw_spin_unlock_irqrestore+0x34/0x60
<4>[  219.783606]  ? execlists_submission_tasklet+0x118/0x3a0 [i915]
<4>[  219.783615]  tasklet_action_common.isra.14+0x13b/0x410
<4>[  219.783623]  ? __do_softirq+0x1e4/0x9a7
<4>[  219.783630]  __do_softirq+0x226/0x9a7
<4>[  219.783643]  do_softirq_own_stack+0x2a/0x40
<4>[  219.783647]  </IRQ>
<4>[  219.783692]  ? heartbeat+0x3e2/0x10f0 [i915]
<4>[  219.783696]  do_softirq.part.13+0x49/0x50
<4>[  219.783700]  __local_bh_enable_ip+0x1a2/0x1e0
<4>[  219.783748]  heartbeat+0x409/0x10f0 [i915]
<4>[  219.783801]  ? __live_idle_pulse+0x9f0/0x9f0 [i915]
<4>[  219.783806]  ? lock_acquire+0x1ac/0x8a0
<4>[  219.783811]  ? process_one_work+0x811/0x1870
<4>[  219.783827]  ? rcu_read_lock_sched_held+0x9c/0xd0
<4>[  219.783832]  ? rcu_read_lock_bh_held+0xb0/0xb0
<4>[  219.783836]  ? _raw_spin_unlock_irq+0x1f/0x40
<4>[  219.783845]  process_one_work+0x8ca/0x1870
<4>[  219.783848]  ? lock_acquire+0x1ac/0x8a0
<4>[  219.783852]  ? worker_thread+0x1d0/0xb80
<4>[  219.783864]  ? pwq_dec_nr_in_flight+0x2c0/0x2c0
<4>[  219.783870]  ? do_raw_spin_lock+0x129/0x290
<4>[  219.783886]  worker_thread+0x82/0xb80
<4>[  219.783895]  ? __kthread_parkme+0xaf/0x1b0
<4>[  219.783902]  ? process_one_work+0x1870/0x1870
<4>[  219.783906]  kthread+0x34e/0x420
<4>[  219.783911]  ? kthread_create_on_node+0xc0/0xc0
<4>[  219.783918]  ret_from_fork+0x3a/0x50

<3>[  219.783950] Allocated by task 1264:
<4>[  219.783975]  save_stack+0x19/0x40
<4>[  219.783978]  __kasan_kmalloc.constprop.3+0xa0/0xd0
<4>[  219.784029]  i915_gem_create_context+0xa2/0xab8 [i915]
<4>[  219.784081]  i915_gem_context_create_ioctl+0x1fa/0x450 [i915]
<4>[  219.784085]  drm_ioctl_kernel+0x1d8/0x270
<4>[  219.784088]  drm_ioctl+0x676/0x930
<4>[  219.784092]  ksys_ioctl+0xb7/0xe0
<4>[  219.784096]  __x64_sys_ioctl+0x6a/0xb0
<4>[  219.784100]  do_syscall_64+0x94/0x530
<4>[  219.784103]  entry_SYSCALL_64_after_hwframe+0x49/0xb3

<3>[  219.784120] Freed by task 12:
<4>[  219.784141]  save_stack+0x19/0x40
<4>[  219.784145]  __kasan_slab_free+0x130/0x180
<4>[  219.784148]  kmem_cache_free_bulk+0x1bd/0x500
<4>[  219.784152]  kfree_rcu_work+0x1d8/0x890
<4>[  219.784155]  process_one_work+0x8ca/0x1870
<4>[  219.784158]  worker_thread+0x82/0xb80
<4>[  219.784162]  kthread+0x34e/0x420
<4>[  219.784165]  ret_from_fork+0x3a/0x50

Fixes: 2e46a2a ("drm/i915: Use explicit flag to mark unreachable intel_context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428090255.10035-1-chris@chris-wilson.co.uk
(cherry picked from commit 24aac33)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
…ay_plane

Since moving the obj->vma.list to a spin_lock, and the vm->bound_list to
its vm->mutex, along with tracking shrinkable status under its own
spinlock, we no long require the object to be locked by the caller.

This is fortunate as it appears we can be called with the lock along an
error path in flipping:

<4> [139.942851] WARN_ON(debug_locks && !lock_is_held(&(&((obj)->base.resv)->lock.base)->dep_map))
<4> [139.943242] WARNING: CPU: 0 PID: 1203 at drivers/gpu/drm/i915/gem/i915_gem_domain.c:405 i915_gem_object_unpin_from_display_plane+0x70/0x130 [i915]
<4> [139.943263] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core r8169 lpc_ich snd_pcm realtek prime_numbers [last unloaded: i915]
<4> [139.943347] CPU: 0 PID: 1203 Comm: kms_flip Tainted: G     U            5.6.0-gd0fda5c2cf3f1-drmtip_474+ #1
<4> [139.943363] Hardware name:  /D510MO, BIOS MOPNV10J.86A.0311.2010.0802.2346 08/02/2010
<4> [139.943589] RIP: 0010:i915_gem_object_unpin_from_display_plane+0x70/0x130 [i915]
<4> [139.943589] Code: 85 28 01 00 00 be ff ff ff ff 48 8d 78 60 e8 d7 9b f0 e2 85 c0 75 b9 48 c7 c6 50 b9 38 c0 48 c7 c7 e9 48 3c c0 e8 20 d4 e9 e2 <0f> 0b eb a2 48 c7 c1 08 bb 38 c0 ba 0a 01 00 00 48 c7 c6 88 a3 35
<4> [139.943589] RSP: 0018:ffffb774c0603b48 EFLAGS: 00010282
<4> [139.943589] RAX: 0000000000000000 RBX: ffff9a142fa36e80 RCX: 0000000000000006
<4> [139.943589] RDX: 000000000000160d RSI: ffff9a142c1a88f8 RDI: ffffffffa434a64d
<4> [139.943589] RBP: ffff9a1410a513c0 R08: ffff9a142c1a88f8 R09: 0000000000000000
<4> [139.943589] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a1436ee94b8
<4> [139.943589] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff9a1410960000
<4> [139.943589] FS:  00007fc73a744e40(0000) GS:ffff9a143da00000(0000) knlGS:0000000000000000
<4> [139.943589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [139.943589] CR2: 00007fc73997e098 CR3: 000000002f5fe000 CR4: 00000000000006f0
<4> [139.943589] Call Trace:
<4> [139.943589]  intel_pin_and_fence_fb_obj+0x1c9/0x1f0 [i915]
<4> [139.943589]  intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [139.943589]  intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [139.943589]  drm_atomic_helper_prepare_planes+0x85/0x110
<4> [139.943589]  intel_atomic_commit+0xda/0x390 [i915]
<4> [139.943589]  drm_atomic_helper_page_flip+0x9c/0xd0
<4> [139.943589]  ? drm_event_reserve_init+0x46/0x60
<4> [139.943589]  drm_mode_page_flip_ioctl+0x587/0x5d0

This completes the symmetry lost in commit 8b1c78e ("drm/i915: Avoid
calling i915_gem_object_unbind holding object lock").

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1743
Fixes: 8b1c78e ("drm/i915: Avoid calling i915_gem_object_unbind holding object lock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200420125356.26614-1-chris@chris-wilson.co.uk
(cherry picked from commit a95f3ac)
(cherry picked from commit 2208b85fa1766ee4821a9435d548578b67090531)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
This patch fixes the following warning and few other instances of
traversal of evm_config_xattrnames list:

[   32.848432] =============================
[   32.848707] WARNING: suspicious RCU usage
[   32.848966] 5.7.0-rc1-00006-ga8d5875ce5f0b #1 Not tainted
[   32.849308] -----------------------------
[   32.849567] security/integrity/evm/evm_main.c:231 RCU-list traversed in non-reader section!!

Since entries are only added to the list and never deleted, use
list_for_each_entry_lockless() instead of list_for_each_entry_rcu for
traversing the list.  Also, add a relevant comment in evm_secfs.c to
indicate this fact.

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org> (RCU viewpoint)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Since 5.7-rc1, on btrfs we have a percpu counter initialization for
which we always pass a GFP_KERNEL gfp_t argument (this happens since
commit 2992df7 ("btrfs: Implement DREW lock")).

That is safe in some contextes but not on others where allowing fs
reclaim could lead to a deadlock because we are either holding some
btrfs lock needed for a transaction commit or holding a btrfs
transaction handle open.  Because of that we surround the call to the
function that initializes the percpu counter with a NOFS context using
memalloc_nofs_save() (this is done at btrfs_init_fs_root()).

However it turns out that this is not enough to prevent a possible
deadlock because percpu_alloc() determines if it is in an atomic context
by looking exclusively at the gfp flags passed to it (GFP_KERNEL in this
case) and it is not aware that a NOFS context is set.

Because percpu_alloc() thinks it is in a non atomic context it locks the
pcpu_alloc_mutex.  This can result in a btrfs deadlock when
pcpu_balance_workfn() is running, has acquired that mutex and is waiting
for reclaim, while the btrfs task that called percpu_counter_init() (and
therefore percpu_alloc()) is holding either the btrfs commit_root
semaphore or a transaction handle (done fs/btrfs/backref.c:
iterate_extent_inodes()), which prevents reclaim from finishing as an
attempt to commit the current btrfs transaction will deadlock.

Lockdep reports this issue with the following trace:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.6.0-rc7-btrfs-next-77 #1 Not tainted
  ------------------------------------------------------
  kswapd0/91 is trying to acquire lock:
  ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]

  but task is already holding lock:
  ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #4 (fs_reclaim){+.+.}:
         fs_reclaim_acquire.part.0+0x25/0x30
         __kmalloc+0x5f/0x3a0
         pcpu_create_chunk+0x19/0x230
         pcpu_balance_workfn+0x56a/0x680
         process_one_work+0x235/0x5f0
         worker_thread+0x50/0x3b0
         kthread+0x120/0x140
         ret_from_fork+0x3a/0x50

  -> #3 (pcpu_alloc_mutex){+.+.}:
         __mutex_lock+0xa9/0xaf0
         pcpu_alloc+0x480/0x7c0
         __percpu_counter_init+0x50/0xd0
         btrfs_drew_lock_init+0x22/0x70 [btrfs]
         btrfs_get_fs_root+0x29c/0x5c0 [btrfs]
         resolve_indirect_refs+0x120/0xa30 [btrfs]
         find_parent_nodes+0x50b/0xf30 [btrfs]
         btrfs_find_all_leafs+0x60/0xb0 [btrfs]
         iterate_extent_inodes+0x139/0x2f0 [btrfs]
         iterate_inodes_from_logical+0xa1/0xe0 [btrfs]
         btrfs_ioctl_logical_to_ino+0xb4/0x190 [btrfs]
         btrfs_ioctl+0x165a/0x3130 [btrfs]
         ksys_ioctl+0x87/0xc0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x5c/0x260
         entry_SYSCALL_64_after_hwframe+0x49/0xbe

  -> #2 (&fs_info->commit_root_sem){++++}:
         down_write+0x38/0x70
         btrfs_cache_block_group+0x2ec/0x500 [btrfs]
         find_free_extent+0xc6a/0x1600 [btrfs]
         btrfs_reserve_extent+0x9b/0x180 [btrfs]
         btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
         alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
         __btrfs_cow_block+0x122/0x5a0 [btrfs]
         btrfs_cow_block+0x106/0x240 [btrfs]
         commit_cowonly_roots+0x55/0x310 [btrfs]
         btrfs_commit_transaction+0x509/0xb20 [btrfs]
         sync_filesystem+0x74/0x90
         generic_shutdown_super+0x22/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x93/0xc0
         exit_to_usermode_loop+0xf9/0x100
         do_syscall_64+0x20d/0x260
         entry_SYSCALL_64_after_hwframe+0x49/0xbe

  -> #1 (&space_info->groups_sem){++++}:
         down_read+0x3c/0x140
         find_free_extent+0xef6/0x1600 [btrfs]
         btrfs_reserve_extent+0x9b/0x180 [btrfs]
         btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
         alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
         __btrfs_cow_block+0x122/0x5a0 [btrfs]
         btrfs_cow_block+0x106/0x240 [btrfs]
         btrfs_search_slot+0x50c/0xd60 [btrfs]
         btrfs_lookup_inode+0x3a/0xc0 [btrfs]
         __btrfs_update_delayed_inode+0x90/0x280 [btrfs]
         __btrfs_commit_inode_delayed_items+0x81f/0x870 [btrfs]
         __btrfs_run_delayed_items+0x8e/0x180 [btrfs]
         btrfs_commit_transaction+0x31b/0xb20 [btrfs]
         iterate_supers+0x87/0xf0
         ksys_sync+0x60/0xb0
         __ia32_sys_sync+0xa/0x10
         do_syscall_64+0x5c/0x260
         entry_SYSCALL_64_after_hwframe+0x49/0xbe

  -> #0 (&delayed_node->mutex){+.+.}:
         __lock_acquire+0xef0/0x1c80
         lock_acquire+0xa2/0x1d0
         __mutex_lock+0xa9/0xaf0
         __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
         btrfs_evict_inode+0x40d/0x560 [btrfs]
         evict+0xd9/0x1c0
         dispose_list+0x48/0x70
         prune_icache_sb+0x54/0x80
         super_cache_scan+0x124/0x1a0
         do_shrink_slab+0x176/0x440
         shrink_slab+0x23a/0x2c0
         shrink_node+0x188/0x6e0
         balance_pgdat+0x31d/0x7f0
         kswapd+0x238/0x550
         kthread+0x120/0x140
         ret_from_fork+0x3a/0x50

  other info that might help us debug this:

  Chain exists of:
    &delayed_node->mutex --> pcpu_alloc_mutex --> fs_reclaim

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(fs_reclaim);
                                 lock(pcpu_alloc_mutex);
                                 lock(fs_reclaim);
    lock(&delayed_node->mutex);

   *** DEADLOCK ***

  3 locks held by kswapd0/91:
   #0: (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
   #1: (shrinker_rwsem){++++}, at: shrink_slab+0x12f/0x2c0
   #2: (&type->s_umount_key#43){++++}, at: trylock_super+0x16/0x50

  stack backtrace:
  CPU: 1 PID: 91 Comm: kswapd0 Not tainted 5.6.0-rc7-btrfs-next-77 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8f/0xd0
   check_noncircular+0x170/0x190
   __lock_acquire+0xef0/0x1c80
   lock_acquire+0xa2/0x1d0
   __mutex_lock+0xa9/0xaf0
   __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
   btrfs_evict_inode+0x40d/0x560 [btrfs]
   evict+0xd9/0x1c0
   dispose_list+0x48/0x70
   prune_icache_sb+0x54/0x80
   super_cache_scan+0x124/0x1a0
   do_shrink_slab+0x176/0x440
   shrink_slab+0x23a/0x2c0
   shrink_node+0x188/0x6e0
   balance_pgdat+0x31d/0x7f0
   kswapd+0x238/0x550
   kthread+0x120/0x140
   ret_from_fork+0x3a/0x50

This could be fixed by making btrfs pass GFP_NOFS instead of GFP_KERNEL
to percpu_counter_init() in contextes where it is not reclaim safe,
however that type of approach is discouraged since
memalloc_[nofs|noio]_save() were introduced.  Therefore this change
makes pcpu_alloc() look up into an existing nofs/noio context before
deciding whether it is in an atomic context or not.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Link: http://lkml.kernel.org/r/20200430164356.15543-1-fdmanana@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
And there are no multiple TRBs on EP0 and WA1 workaround,
so it doesn't need to change TRB for EP0. It fixes below oops.

configfs-gadget gadget: high-speed config #1: b
android_work: sent uevent USB_STATE=CONFIGURED
Unable to handle kernel read from unreadable memory at virtual address 0000000000000008
Mem abort info:
android_work: sent uevent USB_STATE=DISCONNECTED
  ESR = 0x96000004
  EC = 0x25: DABT (current EL), IL = 32 bits

  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000004
  CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000008b5bb7000
[0000000000000008] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 430 Comm: HwBinder:401_1 Not tainted 5.4.24-06071-g6fa8921409c1-dirty torvalds#77
Hardware name: Freescale i.MX8QXP MEK (DT)
pstate: 60400085 (nZCv daIf +PAN -UAO)
pc : cdns3_gadget_ep_dequeue+0x1d4/0x270
lr : cdns3_gadget_ep_dequeue+0x48/0x270
sp : ffff800012763ba0
x29: ffff800012763ba0 x28: ffff00082c653c00
x27: 0000000000000000 x26: ffff000068fa7b00
x25: ffff0000699b2000 x24: ffff00082c6ac000
x23: ffff000834f0a480 x22: ffff000834e87b9c
x21: 0000000000000000 x20: ffff000834e87800
x19: ffff000069eddc00 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000001
x11: ffff80001180fbe8 x10: 0000000000000001
x9 : ffff800012101558 x8 : 0000000000000001
x7 : 0000000000000006 x6 : ffff000835d9c668
x5 : ffff000834f0a4c8 x4 : 0000000096000000
x3 : 0000000000001810 x2 : 0000000000000000
x1 : ffff800024bd001c x0 : 0000000000000001
Call trace:
 cdns3_gadget_ep_dequeue+0x1d4/0x270
 usb_ep_dequeue+0x34/0xf8
 composite_dev_cleanup+0x154/0x170
 configfs_composite_unbind+0x6c/0xa8
 usb_gadget_remove_driver+0x44/0x70
 usb_gadget_unregister_driver+0x74/0xe0
 unregister_gadget+0x28/0x58
 gadget_dev_desc_UDC_store+0x80/0x110
 configfs_write_file+0x1e0/0x2a0
 __vfs_write+0x48/0x90
 vfs_write+0xe4/0x1c8
 ksys_write+0x78/0x100
 __arm64_sys_write+0x24/0x30
 el0_svc_common.constprop.0+0x74/0x168
 el0_svc_handler+0x34/0xa0
 el0_svc+0x8/0xc
Code: 52830203 b9407660 f94042e4 11000400 (b9400841)
---[ end trace 1574516e4c1772ca ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x0002,20002008
Memory Limit: none
Rebooting in 5 seconds..

Fixes: f616c3b ("usb: cdns3: Fix dequeue implementation")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
…em boot up

Booting one of my machines, it triggered the following crash:

 Kernel/User page tables isolation: enabled
 ftrace: allocating 36577 entries in 143 pages
 Starting tracer 'function'
 BUG: unable to handle page fault for address: ffffffffa000005c
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0003) - permissions violation
 PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b25206
 Oops: 0003 [#1] PREEMPT SMP PTI
 CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ torvalds#24
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
 RIP: 0010:text_poke_early+0x4a/0x58
 Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff
0 41 57 49
 RSP: 0000:ffffffff82003d38 EFLAGS: 00010046
 RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005
 RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c
 RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0
 R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0
 R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0
 FS:  0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0
 Call Trace:
  text_poke_bp+0x27/0x64
  ? mutex_lock+0x36/0x5d
  arch_ftrace_update_trampoline+0x287/0x2d5
  ? ftrace_replace_code+0x14b/0x160
  ? ftrace_update_ftrace_func+0x65/0x6c
  __register_ftrace_function+0x6d/0x81
  ftrace_startup+0x23/0xc1
  register_ftrace_function+0x20/0x37
  func_set_flag+0x59/0x77
  __set_tracer_option.isra.19+0x20/0x3e
  trace_set_options+0xd6/0x13e
  apply_trace_boot_options+0x44/0x6d
  register_tracer+0x19e/0x1ac
  early_trace_init+0x21b/0x2c9
  start_kernel+0x241/0x518
  ? load_ucode_intel_bsp+0x21/0x52
  secondary_startup_64+0xa4/0xb0

I was able to trigger it on other machines, when I added to the kernel
command line of both "ftrace=function" and "trace_options=func_stack_trace".

The cause is the "ftrace=function" would register the function tracer
and create a trampoline, and it will set it as executable and
read-only. Then the "trace_options=func_stack_trace" would then update
the same trampoline to include the stack tracer version of the function
tracer. But since the trampoline already exists, it updates it with
text_poke_bp(). The problem is that text_poke_bp() called while
system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not
the page mapping, as it would think that the text is still read-write.
But in this case it is not, and we take a fault and crash.

Instead, lets keep the ftrace trampolines read-write during boot up,
and then when the kernel executable text is set to read-only, the
ftrace trampolines get set to read-only as well.

Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Fixes: 768ae44 ("x86/ftrace: Use text_poke()")
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
…sg list

On platforms with IOMMU enabled, multiple SGs can be coalesced into one
by the IOMMU driver. In that case the SG list processing as part of the
completion of a urb on a bulk endpoint can result into a NULL pointer
dereference with the below stack dump.

<6> Unable to handle kernel NULL pointer dereference at virtual address 0000000c
<6> pgd = c0004000
<6> [0000000c] *pgd=00000000
<6> Internal error: Oops: 5 [#1] PREEMPT SMP ARM
<2> PC is at xhci_queue_bulk_tx+0x454/0x80c
<2> LR is at xhci_queue_bulk_tx+0x44c/0x80c
<2> pc : [<c08907c4>]    lr : [<c08907bc>]    psr: 000000d3
<2> sp : ca337c80  ip : 00000000  fp : ffffffff
<2> r10: 00000000  r9 : 50037000  r8 : 00004000
<2> r7 : 00000000  r6 : 00004000  r5 : 00000000  r4 : 00000000
<2> r3 : 00000000  r2 : 00000082  r1 : c2c1a200  r0 : 00000000
<2> Flags: nzcv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
<2> Control: 10c0383d  Table: b412c06a  DAC: 00000051
<6> Process usb-storage (pid: 5961, stack limit = 0xca336210)
<snip>
<2> [<c08907c4>] (xhci_queue_bulk_tx)
<2> [<c0881b3c>] (xhci_urb_enqueue)
<2> [<c0831068>] (usb_hcd_submit_urb)
<2> [<c08350b4>] (usb_sg_wait)
<2> [<c089f384>] (usb_stor_bulk_transfer_sglist)
<2> [<c089f2c0>] (usb_stor_bulk_srb)
<2> [<c089fe38>] (usb_stor_Bulk_transport)
<2> [<c089f468>] (usb_stor_invoke_transport)
<2> [<c08a11b4>] (usb_stor_control_thread)
<2> [<c014a534>] (kthread)

The above NULL pointer dereference is the result of block_len and the
sent_len set to zero after the first SG of the list when IOMMU driver
is enabled. Because of this the loop of processing the SGs has run
more than num_sgs which resulted in a sg_next on the last SG of the
list which has SG_END set.

Fix this by check for the sg before any attributes of the sg are
accessed.

[modified reason for null pointer dereference in commit message subject -Mathias]
Fixes: f9c589e ("xhci: TD-fragment, align the unsplittable case with a bounce buffer")
Cc: stable@vger.kernel.org
Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20200514110432.25564-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
While removing the host (e.g. for USB role switch from host to device),
if runtime pm is enabled by user, below oops occurs on dwc3 and cdns3
platforms.
Keeping the xhci-plat device active during host removal, and disabling
runtime pm before calling pm_runtime_set_suspended() fixes them.

oops1:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000240
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.4.3-00107-g64d454a-dirty
Hardware name: FSL i.MX8MP EVK (DT)
Workqueue: pm pm_runtime_work
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : xhci_suspend+0x34/0x698
lr : xhci_plat_runtime_suspend+0x2c/0x38
sp : ffff800011ddbbc0
Call trace:
 xhci_suspend+0x34/0x698
 xhci_plat_runtime_suspend+0x2c/0x38
 pm_generic_runtime_suspend+0x28/0x40
 __rpm_callback+0xd8/0x138
 rpm_callback+0x24/0x98
 rpm_suspend+0xe0/0x448
 rpm_idle+0x124/0x140
 pm_runtime_work+0xa0/0xf8
 process_one_work+0x1dc/0x370
 worker_thread+0x48/0x468
 kthread+0xf0/0x120
 ret_from_fork+0x10/0x1c

oops2:
usb 2-1: USB disconnect, device number 2
xhci-hcd xhci-hcd.1.auto: remove, state 4
usb usb2: USB disconnect, device number 1
xhci-hcd xhci-hcd.1.auto: USB bus 2 deregistered
xhci-hcd xhci-hcd.1.auto: remove, state 4
usb usb1: USB disconnect, device number 1
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000138
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.6.0-rc4-next-20200304-03578
Hardware name: Freescale i.MX8QXP MEK (DT)
Workqueue: 1-0050 tcpm_state_machine_work
pstate: 20000005 (nzCv daif -PAN -UAO)
pc : xhci_free_dev+0x214/0x270
lr : xhci_plat_runtime_resume+0x78/0x88
sp : ffff80001006b5b0
Call trace:
 xhci_free_dev+0x214/0x270
 xhci_plat_runtime_resume+0x78/0x88
 pm_generic_runtime_resume+0x30/0x48
 __rpm_callback+0x90/0x148
 rpm_callback+0x28/0x88
 rpm_resume+0x568/0x758
 rpm_resume+0x260/0x758
 rpm_resume+0x260/0x758
 __pm_runtime_resume+0x40/0x88
 device_release_driver_internal+0xa0/0x1c8
 device_release_driver+0x1c/0x28
 bus_remove_device+0xd4/0x158
 device_del+0x15c/0x3a0
 usb_disable_device+0xb0/0x268
 usb_disconnect+0xcc/0x300
 usb_remove_hcd+0xf4/0x1dc
 xhci_plat_remove+0x78/0xe0
 platform_drv_remove+0x30/0x50
 device_release_driver_internal+0xfc/0x1c8
 device_release_driver+0x1c/0x28
 bus_remove_device+0xd4/0x158
 device_del+0x15c/0x3a0
 platform_device_del.part.0+0x20/0x90
 platform_device_unregister+0x28/0x40
 cdns3_host_exit+0x20/0x40
 cdns3_role_stop+0x60/0x90
 cdns3_role_set+0x64/0xd8
 usb_role_switch_set_role.part.0+0x3c/0x68
 usb_role_switch_set_role+0x20/0x30
 tcpm_mux_set+0x60/0xf8
 tcpm_reset_port+0xa4/0xf0
 tcpm_detach.part.0+0x28/0x50
 tcpm_state_machine_work+0x12ac/0x2360
 process_one_work+0x1c8/0x470
 worker_thread+0x50/0x428
 kthread+0xfc/0x128
 ret_from_fork+0x10/0x18
Code: c8037c02 35ffffa3 17ffe7c3 f9800011 (c85f7c01)
---[ end trace 45b1a173d2679e44 ]---

[minor commit message cleanup  -Mathias]
Cc: Baolin Wang <baolin.wang@linaro.org>
Cc: <stable@vger.kernel.org>
Fixes: b0c69b4 ("usb: host: plat: Enable xHCI plat runtime PM")
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Tested-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Li Jun <jun.li@nxp.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20200514110432.25564-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
security_secid_to_secctx is called by the bpf_lsm hook and a successful
return value (i.e 0) implies that the parameter will be consumed by the
LSM framework. The current behaviour return success when the pointer
isn't initialized when CONFIG_BPF_LSM is enabled, with the default
return from kernel/bpf/bpf_lsm.c.

This is the internal error:

[ 1229.341488][ T2659] usercopy: Kernel memory exposure attempt detected from null address (offset 0, size 280)!
[ 1229.374977][ T2659] ------------[ cut here ]------------
[ 1229.376813][ T2659] kernel BUG at mm/usercopy.c:99!
[ 1229.378398][ T2659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1229.380348][ T2659] Modules linked in:
[ 1229.381654][ T2659] CPU: 0 PID: 2659 Comm: systemd-journal Tainted: G    B   W         5.7.0-rc5-next-20200511-00019-g864e0c6319b8-dirty torvalds#13
[ 1229.385429][ T2659] Hardware name: linux,dummy-virt (DT)
[ 1229.387143][ T2659] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
[ 1229.389165][ T2659] pc : usercopy_abort+0xc8/0xcc
[ 1229.390705][ T2659] lr : usercopy_abort+0xc8/0xcc
[ 1229.392225][ T2659] sp : ffff000064247450
[ 1229.393533][ T2659] x29: ffff000064247460 x28: 0000000000000000
[ 1229.395449][ T2659] x27: 0000000000000118 x26: 0000000000000000
[ 1229.397384][ T2659] x25: ffffa000127049e0 x24: ffffa000127049e0
[ 1229.399306][ T2659] x23: ffffa000127048e0 x22: ffffa000127048a0
[ 1229.401241][ T2659] x21: ffffa00012704b80 x20: ffffa000127049e0
[ 1229.403163][ T2659] x19: ffffa00012704820 x18: 0000000000000000
[ 1229.405094][ T2659] x17: 0000000000000000 x16: 0000000000000000
[ 1229.407008][ T2659] x15: 0000000000000000 x14: 003d090000000000
[ 1229.408942][ T2659] x13: ffff80000d5b25b2 x12: 1fffe0000d5b25b1
[ 1229.410859][ T2659] x11: 1fffe0000d5b25b1 x10: ffff80000d5b25b1
[ 1229.412791][ T2659] x9 : ffffa0001034bee0 x8 : ffff00006ad92d8f
[ 1229.414707][ T2659] x7 : 0000000000000000 x6 : ffffa00015eacb20
[ 1229.416642][ T2659] x5 : ffff0000693c8040 x4 : 0000000000000000
[ 1229.418558][ T2659] x3 : ffffa0001034befc x2 : d57a7483a01c6300
[ 1229.420610][ T2659] x1 : 0000000000000000 x0 : 0000000000000059
[ 1229.422526][ T2659] Call trace:
[ 1229.423631][ T2659]  usercopy_abort+0xc8/0xcc
[ 1229.425091][ T2659]  __check_object_size+0xdc/0x7d4
[ 1229.426729][ T2659]  put_cmsg+0xa30/0xa90
[ 1229.428132][ T2659]  unix_dgram_recvmsg+0x80c/0x930
[ 1229.429731][ T2659]  sock_recvmsg+0x9c/0xc0
[ 1229.431123][ T2659]  ____sys_recvmsg+0x1cc/0x5f8
[ 1229.432663][ T2659]  ___sys_recvmsg+0x100/0x160
[ 1229.434151][ T2659]  __sys_recvmsg+0x110/0x1a8
[ 1229.435623][ T2659]  __arm64_sys_recvmsg+0x58/0x70
[ 1229.437218][ T2659]  el0_svc_common.constprop.1+0x29c/0x340
[ 1229.438994][ T2659]  do_el0_svc+0xe8/0x108
[ 1229.440587][ T2659]  el0_svc+0x74/0x88
[ 1229.441917][ T2659]  el0_sync_handler+0xe4/0x8b4
[ 1229.443464][ T2659]  el0_sync+0x17c/0x180
[ 1229.444920][ T2659] Code: aa1703e2 aa1603e1 910a8260 97ecc860 (d4210000)
[ 1229.447070][ T2659] ---[ end trace 400497d91baeaf51 ]---
[ 1229.448791][ T2659] Kernel panic - not syncing: Fatal exception
[ 1229.450692][ T2659] Kernel Offset: disabled
[ 1229.452061][ T2659] CPU features: 0x240002,20002004
[ 1229.453647][ T2659] Memory Limit: none
[ 1229.455015][ T2659] ---[ end Kernel panic - not syncing: Fatal exception ]---

Rework the so the default return value is -EOPNOTSUPP.

There are likely other callbacks such as security_inode_getsecctx() that
may have the same problem, and that someone that understand the code
better needs to audit them.

Thank you Arnd for helping me figure out what went wrong.

Fixes: 98e828a ("security: Refactor declaration of LSM hooks")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/bpf/20200512174607.9630-1-anders.roxell@linaro.org
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
FuzzUSB (a variant of syzkaller) found an illegal array access
using an incorrect index while binding a gadget with UDC.

Reference: https://www.spinics.net/lists/linux-usb/msg194331.html

This bug occurs when a size variable used for a buffer
is misused to access its strcpy-ed buffer.
Given a buffer along with its size variable (taken from user input),
from which, a new buffer is created using kstrdup().
Due to the original buffer containing 0 value in the middle,
the size of the kstrdup-ed buffer becomes smaller than that of the original.
So accessing the kstrdup-ed buffer with the same size variable
triggers memory access violation.

The fix makes sure no zero value in the buffer,
by comparing the strlen() of the orignal buffer with the size variable,
so that the access to the kstrdup-ed buffer is safe.

BUG: KASAN: slab-out-of-bounds in gadget_dev_desc_UDC_store+0x1ba/0x200
drivers/usb/gadget/configfs.c:266
Read of size 1 at addr ffff88806a55dd7e by task syz-executor.0/17208

CPU: 2 PID: 17208 Comm: syz-executor.0 Not tainted 5.6.8 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xce/0x128 lib/dump_stack.c:118
 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374
 __kasan_report+0x131/0x1b0 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:641
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 gadget_dev_desc_UDC_store+0x1ba/0x200 drivers/usb/gadget/configfs.c:266
 flush_write_buffer fs/configfs/file.c:251 [inline]
 configfs_write_file+0x2f1/0x4c0 fs/configfs/file.c:283
 __vfs_write+0x85/0x110 fs/read_write.c:494
 vfs_write+0x1cd/0x510 fs/read_write.c:558
 ksys_write+0x18a/0x220 fs/read_write.c:611
 __do_sys_write fs/read_write.c:623 [inline]
 __se_sys_write fs/read_write.c:620 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:620
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Kyungtae Kim <kt0755@gmail.com>
Reported-and-tested-by: Kyungtae Kim <kt0755@gmail.com>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200510054326.GA19198@pizza01
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
call_undef_hook() in traps.c applies the same instr_mask for both 16-bit
and 32-bit thumb instructions. If instr_mask then is only 16 bits wide
(0xffff as opposed to 0xffffffff), the first half-word of 32-bit thumb
instructions will be masked out. This makes the function match 32-bit
thumb instructions where the second half-word is equal to instr_val,
regardless of the first half-word.

The result in this case is that all undefined 32-bit thumb instructions
with the second half-word equal to 0xde01 (udf #1) work as breakpoints
and will raise a SIGTRAP instead of a SIGILL, instead of just the one
intended 16-bit instruction. An example of such an instruction is
0xeaa0de01, which is unallocated according to Arm ARM and should raise a
SIGILL, but instead raises a SIGTRAP.

This patch fixes the issue by setting all the bits in instr_mask, which
will still match the intended 16-bit thumb instruction (where the
upper half is always 0), but not any 32-bit thumb instructions.

Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
This BUG halt was reported a while back, but the patch somehow got
missed:

PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
 #0 [f418dd28] crash_kexec at c04a7d8c
 #1 [f418dd7c] oops_end at c0863e02
 #2 [f418dd90] do_invalid_op at c040aaca
 #3 [f418de28] error_code (via invalid_op) at c08631a5
    EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
    DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
    CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
 #4 [f418de5c] add_timer at c046fa5e
 #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
 #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
 torvalds#7 [f418df48] inet_shutdown at c080baf9
 torvalds#8 [f418df5c] sys_shutdown at c079eedf
 torvalds#9 [f418df7] sys_socketcall at c079fe88
    EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
    DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
    SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
    CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282

It appears that the side effect that starts the shutdown timer was processed
multiple times, which can happen as multiple paths can trigger it.  This of
course leads to the BUG halt in add_timer getting called.

Fix seems pretty straightforward, just check before the timer is added if its
already been started.  If it has mod the timer instead to min(current
expiration, new expiration)

Its been tested but not confirmed to fix the problem, as the issue has only
occured in production environments where test kernels are enjoined from being
installed.  It appears to be a sane fix to me though.  Also, recentely,
Jere found a reproducer posted on list to confirm that this resolves the
issues

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jere.leppanen@nokia.com
CC: marcelo.leitner@gmail.com
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
In null_init_zone_dev() check if the zone size is larger than device
capacity, return error if needed.

This also fixes the following oops :-

null_blk: changed the number of conventional zones to 4294967295
BUG: kernel NULL pointer dereference, address: 0000000000000010
PGD 7d76c5067 P4D 7d76c5067 PUD 7d240c067 PMD 0
Oops: 0002 [#1] SMP NOPTI
CPU: 4 PID: 5508 Comm: nullbtests.sh Tainted: G OE 5.7.0-rc4lblk-fnext0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e4
RIP: 0010:null_init_zoned_dev+0x17a/0x27f [null_blk]
RSP: 0018:ffffc90007007e00 EFLAGS: 00010246
RAX: 0000000000000020 RBX: ffff8887fb3f3c00 RCX: 0000000000000007
RDX: 0000000000000000 RSI: ffff8887ca09d688 RDI: ffff888810fea510
RBP: 0000000000000010 R08: ffff8887ca09d688 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8887c26e8000
R13: ffffffffa05e9390 R14: 0000000000000000 R15: 0000000000000001
FS:  00007fcb5256f740(0000) GS:ffff888810e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000081e8fe000 CR4: 00000000003406e0
Call Trace:
 null_add_dev+0x534/0x71b [null_blk]
 nullb_device_power_store.cold.41+0x8/0x2e [null_blk]
 configfs_write_file+0xe6/0x150
 vfs_write+0xba/0x1e0
 ksys_write+0x5f/0xe0
 do_syscall_64+0x60/0x250
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7fcb51c71840

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
q_deferred_work isn't initialized when creating an explicit ODP memory
region. This can lead to a NULL pointer dereference when user performs
asynchronous prefetch MR. Fix it by initializing q_deferred_work for
explicit ODP.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 4 PID: 6074 Comm: kworker/u16:6 Not tainted 5.7.0-rc1-for-upstream-perf-2020-04-17_07-03-39-64 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
  RIP: 0010:__wake_up_common+0x49/0x120
  Code: 04 89 54 24 0c 89 4c 24 08 74 0a 41 f6 01 04 0f 85 8e 00 00 00 48 8b 47 08 48 83 e8 18 4c 8d 67 08 48 8d 50 18 49 39 d4 74 66 <48> 8b 70 18 31 db 4c 8d 7e e8 eb 17 49 8b 47 18 48 8d 50 e8 49 8d
  RSP: 0000:ffffc9000097bd88 EFLAGS: 00010082
  RAX: ffffffffffffffe8 RBX: ffff888454cd9f90 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888454cd9f90
  RBP: ffffc9000097bdd0 R08: 0000000000000000 R09: ffffc9000097bdd0
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff888454cd9f98
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
  FS:  0000000000000000(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000044c19e002 CR4: 0000000000760ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   __wake_up_common_lock+0x7a/0xc0
   destroy_prefetch_work+0x5a/0x60 [mlx5_ib]
   mlx5_ib_prefetch_mr_work+0x64/0x80 [mlx5_ib]
   process_one_work+0x15b/0x360
   worker_thread+0x49/0x3d0
   kthread+0xf5/0x130
   ? rescuer_thread+0x310/0x310
   ? kthread_bind+0x10/0x10
   ret_from_fork+0x1f/0x30

Fixes: de5ed00 ("IB/mlx5: Fix implicit ODP race")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200521072504.567406-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Ido Schimmel says:

====================
netdevsim: Two small fixes

Fix two bugs observed while analyzing regression failures.

Patch #1 fixes a bug where sometimes the drop counter of a packet trap
policer would not increase.

Patch #2 adds a missing initialization of a variable in a related
selftest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Ido Schimmel says:

====================
mlxsw: Various fixes

Patch #1 from Jiri fixes a use-after-free discovered while fuzzing mlxsw
/ devlink with syzkaller.

Patch #2 from Amit works around a limitation in new versions of arping,
which is used in several selftests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
On a non-forwarding 802.11s link between two fairly busy
neighboring nodes (iperf with -P 16 at ~850MBit/s TCP;
1733.3 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 4), so with
frequent PREQ retries, usually after around 30-40 seconds the
following crash would occur:

[ 1110.822428] Unable to handle kernel read from unreadable memory at virtual address 00000000
[ 1110.830786] Mem abort info:
[ 1110.833573]   Exception class = IABT (current EL), IL = 32 bits
[ 1110.839494]   SET = 0, FnV = 0
[ 1110.842546]   EA = 0, S1PTW = 0
[ 1110.845678] user pgtable: 4k pages, 48-bit VAs, pgd = ffff800076386000
[ 1110.852204] [0000000000000000] *pgd=00000000f6322003, *pud=00000000f62de003, *pmd=0000000000000000
[ 1110.861167] Internal error: Oops: 86000004 [#1] PREEMPT SMP
[ 1110.866730] Modules linked in: pppoe ppp_async batman_adv ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 usb_storage xhci_plat_hcd xhci_pci xhci_hcd dwc3 usbcore usb_common
[ 1110.932190] Process swapper/3 (pid: 0, stack limit = 0xffff0000090c8000)
[ 1110.938884] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.162 #0
[ 1110.944965] Hardware name: LS1043A RGW Board (DT)
[ 1110.949658] task: ffff8000787a81c0 task.stack: ffff0000090c8000
[ 1110.955568] PC is at 0x0
[ 1110.958097] LR is at call_timer_fn.isra.27+0x24/0x78
[ 1110.963055] pc : [<0000000000000000>] lr : [<ffff0000080ff29c>] pstate: 00400145
[ 1110.970440] sp : ffff00000801be10
[ 1110.973744] x29: ffff00000801be10 x28: ffff000008bf7018
[ 1110.979047] x27: ffff000008bf87c8 x26: ffff000008c160c0
[ 1110.984352] x25: 0000000000000000 x24: 0000000000000000
[ 1110.989657] x23: dead000000000200 x22: 0000000000000000
[ 1110.994959] x21: 0000000000000000 x20: 0000000000000101
[ 1111.000262] x19: ffff8000787a81c0 x18: 0000000000000000
[ 1111.005565] x17: ffff0000089167b0 x16: 0000000000000058
[ 1111.010868] x15: ffff0000089167b0 x14: 0000000000000000
[ 1111.016172] x13: ffff000008916788 x12: 0000000000000040
[ 1111.021475] x11: ffff80007fda9af0 x10: 0000000000000001
[ 1111.026777] x9 : ffff00000801bea0 x8 : 0000000000000004
[ 1111.032080] x7 : 0000000000000000 x6 : ffff80007fda9aa8
[ 1111.037383] x5 : ffff00000801bea0 x4 : 0000000000000010
[ 1111.042685] x3 : ffff00000801be98 x2 : 0000000000000614
[ 1111.047988] x1 : 0000000000000000 x0 : 0000000000000000
[ 1111.053290] Call trace:
[ 1111.055728] Exception stack(0xffff00000801bcd0 to 0xffff00000801be10)
[ 1111.062158] bcc0:                                   0000000000000000 0000000000000000
[ 1111.069978] bce0: 0000000000000614 ffff00000801be98 0000000000000010 ffff00000801bea0
[ 1111.077798] bd00: ffff80007fda9aa8 0000000000000000 0000000000000004 ffff00000801bea0
[ 1111.085618] bd20: 0000000000000001 ffff80007fda9af0 0000000000000040 ffff000008916788
[ 1111.093437] bd40: 0000000000000000 ffff0000089167b0 0000000000000058 ffff0000089167b0
[ 1111.101256] bd60: 0000000000000000 ffff8000787a81c0 0000000000000101 0000000000000000
[ 1111.109075] bd80: 0000000000000000 dead000000000200 0000000000000000 0000000000000000
[ 1111.116895] bda0: ffff000008c160c0 ffff000008bf87c8 ffff000008bf7018 ffff00000801be10
[ 1111.124715] bdc0: ffff0000080ff29c ffff00000801be10 0000000000000000 0000000000400145
[ 1111.132534] bde0: ffff8000787a81c0 ffff00000801bde8 0000ffffffffffff 000001029eb19be8
[ 1111.140353] be00: ffff00000801be10 0000000000000000
[ 1111.145220] [<          (null)>]           (null)
[ 1111.149917] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398
[ 1111.155741] [<ffff000008081938>] __do_softirq+0x100/0x1fc
[ 1111.161130] [<ffff0000080a2e28>] irq_exit+0x80/0xd8
[ 1111.166002] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0
[ 1111.171825] [<ffff000008081678>] gic_handle_irq+0x68/0xb0
[ 1111.177213] Exception stack(0xffff0000090cbe30 to 0xffff0000090cbf70)
[ 1111.183642] be20:                                   0000000000000020 0000000000000000
[ 1111.191461] be40: 0000000000000001 0000000000000000 00008000771af000 0000000000000000
[ 1111.199281] be60: ffff000008c95180 0000000000000000 ffff000008c19360 ffff0000090cbef0
[ 1111.207101] be80: 0000000000000810 0000000000000400 0000000000000098 ffff000000000000
[ 1111.214920] bea0: 0000000000000001 ffff0000089167b0 0000000000000000 ffff0000089167b0
[ 1111.222740] bec0: 0000000000000000 ffff000008c198e8 ffff000008bf7018 ffff000008c19000
[ 1111.230559] bee0: 0000000000000000 0000000000000000 ffff8000787a81c0 ffff000008018000
[ 1111.238380] bf00: ffff00000801c000 ffff00000913ba34 ffff8000787a81c0 ffff0000090cbf70
[ 1111.246199] bf20: ffff0000080857cc ffff0000090cbf70 ffff0000080857d0 0000000000400145
[ 1111.254020] bf40: ffff000008018000 ffff00000801c000 ffffffffffffffff ffff0000080fa574
[ 1111.261838] bf60: ffff0000090cbf70 ffff0000080857d0
[ 1111.266706] [<ffff0000080832e8>] el1_irq+0xe8/0x18c
[ 1111.271576] [<ffff0000080857d0>] arch_cpu_idle+0x10/0x18
[ 1111.276880] [<ffff0000080d7de4>] do_idle+0xec/0x1b8
[ 1111.281748] [<ffff0000080d8020>] cpu_startup_entry+0x20/0x28
[ 1111.287399] [<ffff00000808f81c>] secondary_start_kernel+0x104/0x110
[ 1111.293662] Code: bad PC value
[ 1111.296710] ---[ end trace 555b6ca4363c3edd ]---
[ 1111.301318] Kernel panic - not syncing: Fatal exception in interrupt
[ 1111.307661] SMP: stopping secondary CPUs
[ 1111.311574] Kernel Offset: disabled
[ 1111.315053] CPU features: 0x0002000
[ 1111.318530] Memory Limit: none
[ 1111.321575] Rebooting in 3 seconds..

With some added debug output / delays we were able to push the crash from
the timer callback runner into the callback function and by that shedding
some light on which object holding the timer gets corrupted:

[  401.720899] Unable to handle kernel read from unreadable memory at virtual address 00000868
[...]
[  402.335836] [<ffff0000088fafa4>] _raw_spin_lock_bh+0x14/0x48
[  402.341548] [<ffff000000dbe684>] mesh_path_timer+0x10c/0x248 [mac80211]
[  402.348154] [<ffff0000080ff29c>] call_timer_fn.isra.27+0x24/0x78
[  402.354150] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398
[  402.359974] [<ffff000008081938>] __do_softirq+0x100/0x1fc
[  402.365362] [<ffff0000080a2e28>] irq_exit+0x80/0xd8
[  402.370231] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0
[  402.376053] [<ffff000008081678>] gic_handle_irq+0x68/0xb0

The issue happens due to the following sequence of events:

1) mesh_path_start_discovery():
-> spin_unlock_bh(&mpath->state_lock) before mesh_path_sel_frame_tx()

2) mesh_path_free_rcu()
-> del_timer_sync(&mpath->timer)
   [...]
-> kfree_rcu(mpath)

3) mesh_path_start_discovery():
-> mod_timer(&mpath->timer, ...)
   [...]
-> rcu_read_unlock()

4) mesh_path_free_rcu()'s kfree_rcu():
-> kfree(mpath)

5) mesh_path_timer() starts after timeout, using freed mpath object

So a use-after-free issue due to a timer re-arming bug caused by an
early spin-unlocking.

This patch fixes this issue by re-checking if mpath is about to be
free'd and if so bails out of re-arming the timer.

Cc: stable@vger.kernel.org
Fixes: 050ac52 ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol")
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Linus Lüssing <ll@simonwunderlich.de>
Link: https://lore.kernel.org/r/20200522170413.14973-1-linus.luessing@c0d3.blue
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
In the MPTCP receive path we must cope with TCP fallback
on blocking recvmsg(). Currently in such code path we detect
the fallback condition, but we don't fetch the struct socket
required for fallback.

The above allowed syzkaller to trigger a NULL pointer
dereference:

general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 1 PID: 7226 Comm: syz-executor523 Not tainted 5.7.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:sock_recvmsg_nosec net/socket.c:886 [inline]
RIP: 0010:sock_recvmsg+0x92/0x110 net/socket.c:904
Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 6c 24 04 e8 53 18 1d fb 4d 8d 6f 20 4c 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 ef e8 20 12 5b fb bd a0 00 00 00 49 03 6d
RSP: 0018:ffffc90001077b98 EFLAGS: 00010202
RAX: 0000000000000004 RBX: ffffc90001077dc0 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff86565e59 R09: ffffed10115afeaa
R10: ffffed10115afeaa R11: 0000000000000000 R12: 1ffff9200020efbc
R13: 0000000000000020 R14: ffffc90001077de0 R15: 0000000000000000
FS:  00007fc6a3abe700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004d0050 CR3: 00000000969f0000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 mptcp_recvmsg+0x18d5/0x19b0 net/mptcp/protocol.c:891
 inet_recvmsg+0xf6/0x1d0 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:886 [inline]
 sock_recvmsg net/socket.c:904 [inline]
 __sys_recvfrom+0x2f3/0x470 net/socket.c:2057
 __do_sys_recvfrom net/socket.c:2075 [inline]
 __se_sys_recvfrom net/socket.c:2071 [inline]
 __x64_sys_recvfrom+0xda/0xf0 net/socket.c:2071
 do_syscall_64+0xf3/0x1b0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3

Address the issue initializing the struct socket reference
before entering the fallback code.

Reported-and-tested-by: syzbot+c6bfc3db991edc918432@syzkaller.appspotmail.com
Suggested-by: Ondrej Mosnacek <omosnace@redhat.com>
Fixes: 8ab183d ("mptcp: cope with later TCP fallback")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Be there a platform with the following layout:

      Regular NIC
       |
       +----> DSA master for switch port
               |
               +----> DSA master for another switch port

After changing DSA back to static lockdep class keys in commit
1a33e10 ("net: partially revert dynamic lockdep key changes"), this
kernel splat can be seen:

[   13.361198] ============================================
[   13.366524] WARNING: possible recursive locking detected
[   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty torvalds#988 Not tainted
[   13.377874] --------------------------------------------
[   13.383201] swapper/0/0 is trying to acquire lock:
[   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.397879]
[   13.397879] but task is already holding lock:
[   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.413593]
[   13.413593] other info that might help us debug this:
[   13.420140]  Possible unsafe locking scenario:
[   13.420140]
[   13.426075]        CPU0
[   13.428523]        ----
[   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.440924]
[   13.440924]  *** DEADLOCK ***
[   13.440924]
[   13.446860]  May be due to missing lock nesting notation
[   13.446860]
[   13.453668] 6 locks held by swapper/0/0:
[   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
[   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
[   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[   13.512000]
[   13.512000] stack backtrace:
[   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty torvalds#988
[   13.530421] Call trace:
[   13.532871]  dump_backtrace+0x0/0x1d8
[   13.536539]  show_stack+0x24/0x30
[   13.539862]  dump_stack+0xe8/0x150
[   13.543271]  __lock_acquire+0x1030/0x1678
[   13.547290]  lock_acquire+0xf8/0x458
[   13.550873]  _raw_spin_lock+0x44/0x58
[   13.554543]  __dev_queue_xmit+0x84c/0xbe0
[   13.558562]  dev_queue_xmit+0x24/0x30
[   13.562232]  dsa_slave_xmit+0xe0/0x128
[   13.565988]  dev_hard_start_xmit+0xf4/0x448
[   13.570182]  __dev_queue_xmit+0x808/0xbe0
[   13.574200]  dev_queue_xmit+0x24/0x30
[   13.577869]  neigh_resolve_output+0x15c/0x220
[   13.582237]  ip6_finish_output2+0x244/0xb10
[   13.586430]  __ip6_finish_output+0x1dc/0x298
[   13.590709]  ip6_output+0x84/0x358
[   13.594116]  mld_sendpack+0x2bc/0x560
[   13.597786]  mld_ifc_timer_expire+0x210/0x390
[   13.602153]  call_timer_fn+0xcc/0x400
[   13.605822]  run_timer_softirq+0x588/0x6e0
[   13.609927]  __do_softirq+0x118/0x590
[   13.613597]  irq_exit+0x13c/0x148
[   13.616918]  __handle_domain_irq+0x6c/0xc0
[   13.621023]  gic_handle_irq+0x6c/0x160
[   13.624779]  el1_irq+0xbc/0x180
[   13.627927]  cpuidle_enter_state+0xb4/0x4d0
[   13.632120]  cpuidle_enter+0x3c/0x50
[   13.635703]  call_cpuidle+0x44/0x78
[   13.639199]  do_idle+0x228/0x2c8
[   13.642433]  cpu_startup_entry+0x2c/0x48
[   13.646363]  rest_init+0x1ac/0x280
[   13.649773]  arch_call_rest_init+0x14/0x1c
[   13.653878]  start_kernel+0x490/0x4bc

Lockdep keys themselves were added in commit ab92d68 ("net: core:
add generic lockdep keys"), and it's very likely that this splat existed
since then, but I have no real way to check, since this stacked platform
wasn't supported by mainline back then.

>From Taehee's own words:

  This patch was considered that all stackable devices have LLTX flag.
  But the dsa doesn't have LLTX, so this splat happened.
  After this patch, dsa shares the same lockdep class key.
  On the nested dsa interface architecture, which you illustrated,
  the same lockdep class key will be used in __dev_queue_xmit() because
  dsa doesn't have LLTX.
  So that lockdep detects deadlock because the same lockdep class key is
  used recursively although actually the different locks are used.
  There are some ways to fix this problem.

  1. using NETIF_F_LLTX flag.
  If possible, using the LLTX flag is a very clear way for it.
  But I'm so sorry I don't know whether the dsa could have LLTX or not.

  2. using dynamic lockdep again.
  It means that each interface uses a separate lockdep class key.
  So, lockdep will not detect recursive locking.
  But this way has a problem that it could consume lockdep class key
  too many.
  Currently, lockdep can have 8192 lockdep class keys.
   - you can see this number with the following command.
     cat /proc/lockdep_stats
     lock-classes:                         1251 [max: 8192]
     ...
     The [max: 8192] means that the maximum number of lockdep class keys.
  If too many lockdep class keys are registered, lockdep stops to work.
  So, using a dynamic(separated) lockdep class key should be considered
  carefully.
  In addition, updating lockdep class key routine might have to be existing.
  (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())

  3. Using lockdep subclass.
  A lockdep class key could have 8 subclasses.
  The different subclass is considered different locks by lockdep
  infrastructure.
  But "lock-classes" is not counted by subclasses.
  So, it could avoid stopping lockdep infrastructure by an overflow of
  lockdep class keys.
  This approach should also have an updating lockdep class key routine.
  (lockdep_set_subclass())

  4. Using nonvalidate lockdep class key.
  The lockdep infrastructure supports nonvalidate lockdep class key type.
  It means this lockdep is not validated by lockdep infrastructure.
  So, the splat will not happen but lockdep couldn't detect real deadlock
  case because lockdep really doesn't validate it.
  I think this should be used for really special cases.
  (lockdep_set_novalidate_class())

Further discussion here:
https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/

There appears to be no negative side-effect to declaring lockless TX for
the DSA virtual interfaces, which means they handle their own locking.
So that's what we do to make the splat go away.

Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.

Fixes: ab92d68 ("net: core: add generic lockdep keys")
Suggested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is
4 294 967 296 or 0x100000000 which is no problem on 64 bit systems.
The patch does not change the later overall result of 0x100000 for
MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new
calculation yields the same result, but does not require 64 bit
arithmetic.

On 32 bit systems the old calculation suffers from an arithmetic
overflow in that intermediate term in braces: 4UL aka unsigned long int
is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does
not fit in 4 bytes), the in braces result is truncated to zero, the
following right shift does not alter that, so MAX_DMA32_PFN evaluates to
0 on 32 bit systems.

That wrong value is a problem in a comparision against MAX_DMA32_PFN in
the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if
swiotlb should be active.  That comparison yields the opposite result,
when compiling on 32 bit systems.

This was not possible before

  1b7e03e ("x86, NUMA: Enable emulation on 32bit too")

when that MAX_DMA32_PFN was first made visible to x86_32 (and which
landed in v3.0).

In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on
x86-32.

However if one has set CONFIG_IOMMU_INTEL, since

  c5a5dc4 ("iommu/vt-d: Don't switch off swiotlb if bounce page is used")

there's a dependency on CONFIG_SWIOTLB, which was not necessarily
active before. That landed in v5.4, where we noticed it in the fli4l
Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64
bit kernel configs there (I could not find out why, so let's just say
historical reasons).

The effect is at boot time 64 MiB (default size) were allocated for
bounce buffers now, which is a noticeable amount of memory on small
systems like pcengines ALIX 2D3 with 256 MiB memory, which are still
frequently used as home routers.

We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4
(LTS) in fli4l and got that kernel messages for example:

  Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018
  …
  Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem)
  …
  PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB)

The initial analysis and the suggested fix was done by user 'sourcejedi'
at stackoverflow and explicitly marked as GPLv2 for inclusion in the
Linux kernel:

  https://unix.stackexchange.com/a/520525/50007

The new calculation, which does not suffer from that overflow, is the
same as for arch/mips now as suggested by Robin Murphy.

The fix was tested by fli4l users on round about two dozen different
systems, including both 32 and 64 bit archs, bare metal and virtualized
machines.

 [ bp: Massage commit message. ]

Fixes: 1b7e03e ("x86, NUMA: Enable emulation on 32bit too")
Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alexander Dahl <post@lespocky.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://unix.stackexchange.com/q/520065/50007
Link: https://web.nettworks.org/bugs/browse/FFL-2560
Link: https://lkml.kernel.org/r/20200526175749.20742-1-post@lespocky.de
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
Replace superfluous VM_BUG_ON() with comment about correct usage.

Technically reverts commit 1d148e2 ("mm: add VM_BUG_ON_PAGE() to
page_mapcount()"), but context lines have changed.

Function isolate_migratepages_block() runs some checks out of lru_lock
when choose pages for migration.  After checking PageLRU() it checks
extra page references by comparing page_count() and page_mapcount().
Between these two checks page could be removed from lru, freed and taken
by slab.

As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount().
Race window is tiny.  For certain workload this happens around once a
year.

    page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0
    flags: 0x500000000008100(slab|head)
    raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180
    raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
    page dumped because: VM_BUG_ON_PAGE(PageSlab(page))
    ------------[ cut here ]------------
    kernel BUG at ./include/linux/mm.h:628!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G        W         4.19.109-27 #1
    Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019
    RIP: 0010:isolate_migratepages_block+0x986/0x9b0

The code in isolate_migratepages_block() was added in commit
119d6d5 ("mm, compaction: avoid isolating pinned pages") before
adding VM_BUG_ON into page_mapcount().

This race has been predicted in 2015 by Vlastimil Babka (see link
below).

[akpm@linux-foundation.org: comment tweaks, per Hugh]
Fixes: 1d148e2 ("mm: add VM_BUG_ON_PAGE() to page_mapcount()")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/159032779896.957378.7852761411265662220.stgit@buzz
Link: https://lore.kernel.org/lkml/557710E1.6060103@suse.cz/
Link: https://lore.kernel.org/linux-mm/158937872515.474360.5066096871639561424.stgit@buzz/T/ (v1)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tlendacky pushed a commit that referenced this pull request Jun 24, 2020
This patch is to fix a crash:

  [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
  [ ] general protection fault: 0000 [#1] SMP KASAN PTI
  [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
  [ ] Call Trace:
  [ ]  xfrm6_local_error+0x1eb/0x300
  [ ]  xfrm_local_error+0x95/0x130
  [ ]  __xfrm6_output+0x65f/0xb50
  [ ]  xfrm6_output+0x106/0x46f
  [ ]  udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
  [ ]  vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
  [ ]  vxlan_xmit+0x6a0/0x4276 [vxlan]
  [ ]  dev_hard_start_xmit+0x165/0x820
  [ ]  __dev_queue_xmit+0x1ff0/0x2b90
  [ ]  ip_finish_output2+0xd3e/0x1480
  [ ]  ip_do_fragment+0x182d/0x2210
  [ ]  ip_output+0x1d0/0x510
  [ ]  ip_send_skb+0x37/0xa0
  [ ]  raw_sendmsg+0x1b4c/0x2b80
  [ ]  sock_sendmsg+0xc0/0x110

This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries
to get ipv6 info from a ipv4 sk.

This issue was actually fixed by Commit 628e341 ("xfrm: make local
error reporting more robust"), but brought back by Commit 844d487
("xfrm: choose protocol family by skb protocol").

So to fix it, we should call xfrm6_local_error() only when skb->protocol
is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.

Fixes: 844d487 ("xfrm: choose protocol family by skb protocol")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
mdroth pushed a commit that referenced this pull request Mar 19, 2025
Fix a lockdep warning [1] observed during the write combining test.

The warning indicates a potential nested lock scenario that could lead
to a deadlock.

However, this is a false positive alarm because the SF lock and its
parent lock are distinct ones.

The lockdep confusion arises because the locks belong to the same object
class (i.e., struct mlx5_core_dev).

To resolve this, the code has been refactored to avoid taking both
locks. Instead, only the parent lock is acquired.

[1]
raw_ethernet_bw/2118 is trying to acquire lock:
[  213.619032] ffff88811dd75e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
               mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.620270]
[  213.620270] but task is already holding lock:
[  213.620943] ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
               mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[  213.622045]
[  213.622045] other info that might help us debug this:
[  213.622778]  Possible unsafe locking scenario:
[  213.622778]
[  213.623465]        CPU0
[  213.623815]        ----
[  213.624148]   lock(&dev->wc_state_lock);
[  213.624615]   lock(&dev->wc_state_lock);
[  213.625071]
[  213.625071]  *** DEADLOCK ***
[  213.625071]
[  213.625805]  May be due to missing lock nesting notation
[  213.625805]
[  213.626522] 4 locks held by raw_ethernet_bw/2118:
[  213.627019]  #0: ffff88813f80d578 (&uverbs_dev->disassociate_srcu){.+.+}-{0:0},
                at: ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[  213.628088]  #1: ffff88810fb23930 (&file->hw_destroy_rwsem){.+.+}-{3:3},
                at: ib_init_ucontext+0x2d/0xf0 [ib_uverbs]
[  213.629094]  #2: ffff88810fb23878 (&file->ucontext_lock){+.+.}-{3:3},
                at: ib_init_ucontext+0x49/0xf0 [ib_uverbs]
[  213.630106]  #3: ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3},
                at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[  213.631185]
[  213.631185] stack backtrace:
[  213.631718] CPU: 1 UID: 0 PID: 2118 Comm: raw_ethernet_bw Not tainted
               6.12.0-rc7_internal_net_next_mlx5_89a0ad0 #1
[  213.632722] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
               rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  213.633785] Call Trace:
[  213.634099]
[  213.634393]  dump_stack_lvl+0x7e/0xc0
[  213.634806]  print_deadlock_bug+0x278/0x3c0
[  213.635265]  __lock_acquire+0x15f4/0x2c40
[  213.635712]  lock_acquire+0xcd/0x2d0
[  213.636120]  ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.636722]  ? mlx5_ib_enable_lb+0x24/0xa0 [mlx5_ib]
[  213.637277]  __mutex_lock+0x81/0xda0
[  213.637697]  ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.638305]  ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.638902]  ? rcu_read_lock_sched_held+0x3f/0x70
[  213.639400]  ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.640016]  mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.640615]  set_ucontext_resp+0x68/0x2b0 [mlx5_ib]
[  213.641144]  ? debug_mutex_init+0x33/0x40
[  213.641586]  mlx5_ib_alloc_ucontext+0x18e/0x7b0 [mlx5_ib]
[  213.642145]  ib_init_ucontext+0xa0/0xf0 [ib_uverbs]
[  213.642679]  ib_uverbs_handler_UVERBS_METHOD_GET_CONTEXT+0x95/0xc0
                [ib_uverbs]
[  213.643426]  ? _copy_from_user+0x46/0x80
[  213.643878]  ib_uverbs_cmd_verbs+0xa6b/0xc80 [ib_uverbs]
[  213.644426]  ? ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x130/0x130
               [ib_uverbs]
[  213.645213]  ? __lock_acquire+0xa99/0x2c40
[  213.645675]  ? lock_acquire+0xcd/0x2d0
[  213.646101]  ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[  213.646625]  ? reacquire_held_locks+0xcf/0x1f0
[  213.647102]  ? do_user_addr_fault+0x45d/0x770
[  213.647586]  ib_uverbs_ioctl+0xe0/0x170 [ib_uverbs]
[  213.648102]  ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[  213.648632]  __x64_sys_ioctl+0x4d3/0xaa0
[  213.649060]  ? do_user_addr_fault+0x4a8/0x770
[  213.649528]  do_syscall_64+0x6d/0x140
[  213.649947]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  213.650478] RIP: 0033:0x7fa179b0737b
[  213.650893] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c
               89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8
               10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d
               7d 2a 0f 00 f7 d8 64 89 01 48
[  213.652619] RSP: 002b:00007ffd2e6d46e8 EFLAGS: 00000246 ORIG_RAX:
               0000000000000010
[  213.653390] RAX: ffffffffffffffda RBX: 00007ffd2e6d47f8 RCX:
               00007fa179b0737b
[  213.654084] RDX: 00007ffd2e6d47e0 RSI: 00000000c0181b01 RDI:
               0000000000000003
[  213.654767] RBP: 00007ffd2e6d47c0 R08: 00007fa1799be010 R09:
               0000000000000002
[  213.655453] R10: 00007ffd2e6d4960 R11: 0000000000000246 R12:
               00007ffd2e6d487c
[  213.656170] R13: 0000000000000027 R14: 0000000000000001 R15:
               00007ffd2e6d4f70

Fixes: d98995b ("net/mlx5: Reimplement write combining test")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
mdroth pushed a commit that referenced this pull request Mar 19, 2025
Clear the port select structure on error so no stale values left after
definers are destroyed. That's because the mlx5_lag_destroy_definers()
always try to destroy all lag definers in the tt_map, so in the flow
below lag definers get double-destroyed and cause kernel crash:

  mlx5_lag_port_sel_create()
    mlx5_lag_create_definers()
      mlx5_lag_create_definer()     <- Failed on tt 1
        mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed
  mlx5_lag_port_sel_create()
    mlx5_lag_create_definers()
      mlx5_lag_create_definer()     <- Failed on tt 0
        mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
 Mem abort info:
   ESR = 0x0000000096000005
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x05: level 1 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 user pgtable: 64k pages, 48-bit VAs, pgdp=0000000112ce2e00
 [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
 Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)]
  CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G           OE      6.11.0+ #2
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
  lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
  sp : ffff800085fafb00
  x29: ffff800085fafb00 x28: ffff0000da0c8000 x27: 0000000000000000
  x26: ffff0000da0c8000 x25: ffff0000da0c8000 x24: ffff0000da0c8000
  x23: ffff0000c31f81a0 x22: 0400000000000000 x21: ffff0000da0c8000
  x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8b0c9350
  x14: 0000000000000000 x13: ffff800081390d18 x12: ffff800081dc3cc0
  x11: 0000000000000001 x10: 0000000000000b10 x9 : ffff80007ab7304c
  x8 : ffff0000d00711f0 x7 : 0000000000000004 x6 : 0000000000000190
  x5 : ffff00027edb3010 x4 : 0000000000000000 x3 : 0000000000000000
  x2 : ffff0000d39b8000 x1 : ffff0000d39b8000 x0 : 0400000000000000
  Call trace:
   mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
   mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
   mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core]
   mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core]
   mlx5_activate_lag+0x60c/0x6f8 [mlx5_core]
   mlx5_do_bond_work+0x284/0x5c8 [mlx5_core]
   process_one_work+0x170/0x3e0
   worker_thread+0x2d8/0x3e0
   kthread+0x11c/0x128
   ret_from_fork+0x10/0x20
  Code: a9025bf5 aa0003f6 a90363f7 f90023f9 (f9400400)
  ---[ end trace 0000000000000000 ]---

Fixes: dc48516 ("net/mlx5: Lag, add support to create definers for LAG")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
mdroth pushed a commit that referenced this pull request Mar 19, 2025
Attempt to enable IPsec packet offload in tunnel mode in debug kernel
generates the following kernel panic, which is happening due to two
issues:
1. In SA add section, the should be _bh() variant when marking SA mode.
2. There is not needed flush_workqueue in SA delete routine. It is not
needed as at this stage as it is removed from SADB and the running work
will be canceled later in SA free.

 =====================================================
 WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
 6.12.0+ #4 Not tainted
 -----------------------------------------------------
 charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
 ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]

 and this task is already holding:
 ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
 which would create a new lock dependency:
  (&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}

 but this new dependency connects a SOFTIRQ-irq-safe lock:
  (&x->lock){+.-.}-{3:3}

 ... which became SOFTIRQ-irq-safe at:
   lock_acquire+0x1be/0x520
   _raw_spin_lock_bh+0x34/0x40
   xfrm_timer_handler+0x91/0xd70
   __hrtimer_run_queues+0x1dd/0xa60
   hrtimer_run_softirq+0x146/0x2e0
   handle_softirqs+0x266/0x860
   irq_exit_rcu+0x115/0x1a0
   sysvec_apic_timer_interrupt+0x6e/0x90
   asm_sysvec_apic_timer_interrupt+0x16/0x20
   default_idle+0x13/0x20
   default_idle_call+0x67/0xa0
   do_idle+0x2da/0x320
   cpu_startup_entry+0x50/0x60
   start_secondary+0x213/0x2a0
   common_startup_64+0x129/0x138

 to a SOFTIRQ-irq-unsafe lock:
  (&xa->xa_lock#24){+.+.}-{3:3}

 ... which became SOFTIRQ-irq-unsafe at:
 ...
   lock_acquire+0x1be/0x520
   _raw_spin_lock+0x2c/0x40
   xa_set_mark+0x70/0x110
   mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
   xfrm_dev_state_add+0x3bb/0xd70
   xfrm_add_sa+0x2451/0x4a90
   xfrm_user_rcv_msg+0x493/0x880
   netlink_rcv_skb+0x12e/0x380
   xfrm_netlink_rcv+0x6d/0x90
   netlink_unicast+0x42f/0x740
   netlink_sendmsg+0x745/0xbe0
   __sock_sendmsg+0xc5/0x190
   __sys_sendto+0x1fe/0x2c0
   __x64_sys_sendto+0xdc/0x1b0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

 other info that might help us debug this:

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xa->xa_lock#24);
                                local_irq_disable();
                                lock(&x->lock);
                                lock(&xa->xa_lock#24);
   <Interrupt>
     lock(&x->lock);

  *** DEADLOCK ***

 2 locks held by charon/1337:
  #0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90
  #1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30

 the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
 -> (&x->lock){+.-.}-{3:3} ops: 29 {
    HARDIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     xfrm_alloc_spi+0xc0/0xe60
                     xfrm_alloc_userspi+0x5f6/0xbc0
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    IN-SOFTIRQ-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     xfrm_timer_handler+0x91/0xd70
                     __hrtimer_run_queues+0x1dd/0xa60
                     hrtimer_run_softirq+0x146/0x2e0
                     handle_softirqs+0x266/0x860
                     irq_exit_rcu+0x115/0x1a0
                     sysvec_apic_timer_interrupt+0x6e/0x90
                     asm_sysvec_apic_timer_interrupt+0x16/0x20
                     default_idle+0x13/0x20
                     default_idle_call+0x67/0xa0
                     do_idle+0x2da/0x320
                     cpu_startup_entry+0x50/0x60
                     start_secondary+0x213/0x2a0
                     common_startup_64+0x129/0x138
    INITIAL USE at:
                    lock_acquire+0x1be/0x520
                    _raw_spin_lock_bh+0x34/0x40
                    xfrm_alloc_spi+0xc0/0xe60
                    xfrm_alloc_userspi+0x5f6/0xbc0
                    xfrm_user_rcv_msg+0x493/0x880
                    netlink_rcv_skb+0x12e/0x380
                    xfrm_netlink_rcv+0x6d/0x90
                    netlink_unicast+0x42f/0x740
                    netlink_sendmsg+0x745/0xbe0
                    __sock_sendmsg+0xc5/0x190
                    __sys_sendto+0x1fe/0x2c0
                    __x64_sys_sendto+0xdc/0x1b0
                    do_syscall_64+0x6d/0x140
                    entry_SYSCALL_64_after_hwframe+0x4b/0x53
  }
  ... key      at: [<ffffffff87f9cd20>] __key.18+0x0/0x40

 the dependencies between the lock to be acquired
  and SOFTIRQ-irq-unsafe lock:
 -> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 {
    HARDIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
                     xfrm_dev_state_add+0x3bb/0xd70
                     xfrm_add_sa+0x2451/0x4a90
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    SOFTIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock+0x2c/0x40
                     xa_set_mark+0x70/0x110
                     mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
                     xfrm_dev_state_add+0x3bb/0xd70
                     xfrm_add_sa+0x2451/0x4a90
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    INITIAL USE at:
                    lock_acquire+0x1be/0x520
                    _raw_spin_lock_bh+0x34/0x40
                    mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
                    xfrm_dev_state_add+0x3bb/0xd70
                    xfrm_add_sa+0x2451/0x4a90
                    xfrm_user_rcv_msg+0x493/0x880
                    netlink_rcv_skb+0x12e/0x380
                    xfrm_netlink_rcv+0x6d/0x90
                    netlink_unicast+0x42f/0x740
                    netlink_sendmsg+0x745/0xbe0
                    __sock_sendmsg+0xc5/0x190
                    __sys_sendto+0x1fe/0x2c0
                    __x64_sys_sendto+0xdc/0x1b0
                    do_syscall_64+0x6d/0x140
                    entry_SYSCALL_64_after_hwframe+0x4b/0x53
  }
  ... key      at: [<ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core]
  ... acquired at:
    __lock_acquire+0x30a0/0x5040
    lock_acquire+0x1be/0x520
    _raw_spin_lock_bh+0x34/0x40
    mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
    xfrm_dev_state_delete+0x90/0x160
    __xfrm_state_delete+0x662/0xae0
    xfrm_state_delete+0x1e/0x30
    xfrm_del_sa+0x1c2/0x340
    xfrm_user_rcv_msg+0x493/0x880
    netlink_rcv_skb+0x12e/0x380
    xfrm_netlink_rcv+0x6d/0x90
    netlink_unicast+0x42f/0x740
    netlink_sendmsg+0x745/0xbe0
    __sock_sendmsg+0xc5/0x190
    __sys_sendto+0x1fe/0x2c0
    __x64_sys_sendto+0xdc/0x1b0
    do_syscall_64+0x6d/0x140
    entry_SYSCALL_64_after_hwframe+0x4b/0x53

 stack backtrace:
 CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x74/0xd0
  check_irq_usage+0x12e8/0x1d90
  ? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0
  ? check_chain_key+0x1bb/0x4c0
  ? __lockdep_reset_lock+0x180/0x180
  ? check_path.constprop.0+0x24/0x50
  ? mark_lock+0x108/0x2fb0
  ? print_circular_bug+0x9b0/0x9b0
  ? mark_lock+0x108/0x2fb0
  ? print_usage_bug.part.0+0x670/0x670
  ? check_prev_add+0x1c4/0x2310
  check_prev_add+0x1c4/0x2310
  __lock_acquire+0x30a0/0x5040
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  lock_acquire+0x1be/0x520
  ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? __xfrm_state_delete+0x5f0/0xae0
  ? lock_downgrade+0x6b0/0x6b0
  _raw_spin_lock_bh+0x34/0x40
  ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  xfrm_dev_state_delete+0x90/0x160
  __xfrm_state_delete+0x662/0xae0
  xfrm_state_delete+0x1e/0x30
  xfrm_del_sa+0x1c2/0x340
  ? xfrm_get_sa+0x250/0x250
  ? check_chain_key+0x1bb/0x4c0
  xfrm_user_rcv_msg+0x493/0x880
  ? copy_sec_ctx+0x270/0x270
  ? check_chain_key+0x1bb/0x4c0
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  netlink_rcv_skb+0x12e/0x380
  ? copy_sec_ctx+0x270/0x270
  ? netlink_ack+0xd90/0xd90
  ? netlink_deliver_tap+0xcd/0xb60
  xfrm_netlink_rcv+0x6d/0x90
  netlink_unicast+0x42f/0x740
  ? netlink_attachskb+0x730/0x730
  ? lock_acquire+0x1be/0x520
  netlink_sendmsg+0x745/0xbe0
  ? netlink_unicast+0x740/0x740
  ? __might_fault+0xbb/0x170
  ? netlink_unicast+0x740/0x740
  __sock_sendmsg+0xc5/0x190
  ? fdget+0x163/0x1d0
  __sys_sendto+0x1fe/0x2c0
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_user_addr_fault+0x856/0xe30
  ? lock_acquire+0x1be/0x520
  ? __task_pid_nr_ns+0x117/0x410
  ? lock_downgrade+0x6b0/0x6b0
  __x64_sys_sendto+0xdc/0x1b0
  ? lockdep_hardirqs_on_prepare+0x284/0x400
  do_syscall_64+0x6d/0x140
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 RIP: 0033:0x7f7d31291ba4
 Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45
 RSP: 002b:00007f7d2ccd94f0 EFLAGS: 00000297 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7d31291ba4
 RDX: 0000000000000028 RSI: 00007f7d2ccd96a0 RDI: 000000000000000a
 RBP: 00007f7d2ccd9530 R08: 00007f7d2ccd9598 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000028
 R13: 00007f7d2ccd9598 R14: 00007f7d2ccd96a0 R15: 00000000000000e1
  </TASK>

Fixes: 4c24272 ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
headheadhead-csie pushed a commit to cvm-networking/linux-sev-snp that referenced this pull request Sep 1, 2025
Some of the core functions can only be called if the transport
has been assigned.

As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:

    BUG: kernel NULL pointer dereference, address: 00000000000000a0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
    Oops: Oops: 0000 [AMDESE#1] PREEMPT SMP NOPTI
    CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
    RIP: 0010:vsock_connectible_has_data+0x1f/0x40
    Call Trace:
     vsock_bpf_recvmsg+0xca/0x5e0
     sock_recvmsg+0xb9/0xc0
     __sys_recvfrom+0xb3/0x130
     __x64_sys_recvfrom+0x20/0x30
     do_syscall_64+0x93/0x180
     entry_SYSCALL_64_after_hwframe+0x76/0x7e

So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().

Fixes: 634f1a7 ("vsock: support sockmap")
Cc: stable@vger.kernel.org
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Tested-by: Michal Luczaj <mhal@rbox.co>
Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
The 'icc_bw_lock' mutex is introduced in commit af42269
("interconnect: Fix locking for runpm vs reclaim") in order to decouple
serialization of bw aggregation from codepaths that require memory
allocation.

However commit d30f83d ("interconnect: core: Add dynamic id
allocation support") added a devm_kasprintf() call into a path protected
by the 'icc_bw_lock' which causes the following lockdep warning on
machines like the Lenovo ThinkPad X13s:

    ======================================================
    WARNING: possible circular locking dependency detected
    6.16.0-rc3 torvalds#15 Not tainted
    ------------------------------------------------------
    (udev-worker)/342 is trying to acquire lock:
    ffffb973f7ec4638 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc_node_track_caller_noprof+0xa0/0x3e0

    but task is already holding lock:
    ffffb973f7f7f0e8 (icc_bw_lock){+.+.}-{4:4}, at: icc_node_add+0x44/0x154

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (icc_bw_lock){+.+.}-{4:4}:
           icc_init+0x48/0x108
           do_one_initcall+0x64/0x30c
           kernel_init_freeable+0x27c/0x500
           kernel_init+0x20/0x1d8
           ret_from_fork+0x10/0x20

    -> #0 (fs_reclaim){+.+.}-{0:0}:
           __lock_acquire+0x136c/0x2114
           lock_acquire+0x1c8/0x354
           fs_reclaim_acquire+0x74/0xa8
           __kmalloc_node_track_caller_noprof+0xa0/0x3e0
           devm_kmalloc+0x54/0x124
           devm_kvasprintf+0x74/0xd4
           devm_kasprintf+0x58/0x80
           icc_node_add+0xb4/0x154
           qcom_osm_l3_probe+0x20c/0x314 [icc_osm_l3]
           platform_probe+0x68/0xd8
           really_probe+0xc0/0x38c
           __driver_probe_device+0x7c/0x160
           driver_probe_device+0x40/0x110
           __driver_attach+0xfc/0x208
           bus_for_each_dev+0x74/0xd0
           driver_attach+0x24/0x30
           bus_add_driver+0x110/0x234
           driver_register+0x60/0x128
           __platform_driver_register+0x24/0x30
           osm_l3_driver_init+0x20/0x1000 [icc_osm_l3]
           do_one_initcall+0x64/0x30c
           do_init_module+0x58/0x23c
           load_module+0x1df8/0x1f70
           init_module_from_file+0x88/0xc4
           idempotent_init_module+0x188/0x280
           __arm64_sys_finit_module+0x6c/0xd8
           invoke_syscall+0x48/0x110
           el0_svc_common.constprop.0+0xc0/0xe0
           do_el0_svc+0x1c/0x28
           el0_svc+0x4c/0x158
           el0t_64_sync_handler+0xc8/0xcc
           el0t_64_sync+0x198/0x19c

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(icc_bw_lock);
                                   lock(fs_reclaim);
                                   lock(icc_bw_lock);
      lock(fs_reclaim);

     *** DEADLOCK ***

The icc_node_add() functions is not designed to fail, and as such it
should not do any memory allocation. In order to avoid this, add a new
helper function for the name generation to be called by drivers which
are using the new dynamic id feature.

Fixes: d30f83d ("interconnect: core: Add dynamic id allocation support")
Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
Link: https://lore.kernel.org/r/20250625-icc-bw-lockdep-v3-1-2b8f8b8987c4@gmail.com
Co-developed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20250627075854.26943-1-johan+linaro@kernel.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
On MicroChip MPFS Icicle:

  microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
  microchip-pcie 2000000000.pcie: Parsing ranges property...
  microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000368
  Current swapper/0 pgtable: 4K pagesize, 39-bit VAs, pgdp=0x00000000814f1000
  [0000000000000368] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
  Oops [#1]
  Modules linked in:
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.15.0-rc1-icicle-00003-gafc0a570bb61 torvalds#232 NONE
  Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
  [...]
  [<ffffffff803fb8a4>] plda_pcie_setup_iomems+0xe/0x78
  [<ffffffff803fc246>] mc_platform_init+0x80/0x1d2
  [<ffffffff803f9c88>] pci_ecam_create+0x104/0x1e2
  [<ffffffff8000adbe>] pci_host_common_init+0x120/0x228
  [<ffffffff8000af42>] pci_host_common_probe+0x7c/0x8a

The initialization of driver_data was moved after the call to
gen_pci_init(), while the pci_ecam_ops.init() callback
mc_platform_init() expects it has already been initialized.

Fix this by moving the initialization of driver_data up.

Fixes: afc0a57 ("PCI: host-generic: Extract an ECAM bridge creation helper from pci_host_common_probe()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/774290708a6f0f683711914fda110742c18a7fb2.1750787223.git.geert+renesas@glider.be
Link: https://patch.msgid.link/20250625111806.4153773-2-maz@kernel.org
mdroth pushed a commit that referenced this pull request Oct 10, 2025
When I run the NVME over TCP test in virtme-ng, I get the following
"suspicious RCU usage" warning in nvme_mpath_add_sysfs_link():

'''
[    5.024557][   T44] nvmet: Created nvm controller 1 for subsystem nqn.2025-06.org.nvmexpress.mptcp for NQN nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77.
[    5.027401][  T183] nvme nvme0: creating 2 I/O queues.
[    5.029017][  T183] nvme nvme0: mapped 2/0/0 default/read/poll queues.
[    5.032587][  T183] nvme nvme0: new ctrl: NQN "nqn.2025-06.org.nvmexpress.mptcp", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77
[    5.042214][   T25]
[    5.042440][   T25] =============================
[    5.042579][   T25] WARNING: suspicious RCU usage
[    5.042705][   T25] 6.16.0-rc3+ torvalds#23 Not tainted
[    5.042812][   T25] -----------------------------
[    5.042934][   T25] drivers/nvme/host/multipath.c:1203 RCU-list traversed in non-reader section!!
[    5.043111][   T25]
[    5.043111][   T25] other info that might help us debug this:
[    5.043111][   T25]
[    5.043341][   T25]
[    5.043341][   T25] rcu_scheduler_active = 2, debug_locks = 1
[    5.043502][   T25] 3 locks held by kworker/u9:0/25:
[    5.043615][   T25]  #0: ffff888008730948 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x7ed/0x1350
[    5.043830][   T25]  #1: ffffc900001afd40 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0xcf3/0x1350
[    5.044084][   T25]  #2: ffff888013ee0020 (&head->srcu){.+.+}-{0:0}, at: nvme_mpath_add_sysfs_link.part.0+0xb4/0x3a0
[    5.044300][   T25]
[    5.044300][   T25] stack backtrace:
[    5.044439][   T25] CPU: 0 UID: 0 PID: 25 Comm: kworker/u9:0 Not tainted 6.16.0-rc3+ torvalds#23 PREEMPT(full)
[    5.044441][   T25] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    5.044442][   T25] Workqueue: async async_run_entry_fn
[    5.044445][   T25] Call Trace:
[    5.044446][   T25]  <TASK>
[    5.044449][   T25]  dump_stack_lvl+0x6f/0xb0
[    5.044453][   T25]  lockdep_rcu_suspicious.cold+0x4f/0xb1
[    5.044457][   T25]  nvme_mpath_add_sysfs_link.part.0+0x2fb/0x3a0
[    5.044459][   T25]  ? queue_work_on+0x90/0xf0
[    5.044461][   T25]  ? lockdep_hardirqs_on+0x78/0x110
[    5.044466][   T25]  nvme_mpath_set_live+0x1e9/0x4f0
[    5.044470][   T25]  nvme_mpath_add_disk+0x240/0x2f0
[    5.044472][   T25]  ? __pfx_nvme_mpath_add_disk+0x10/0x10
[    5.044475][   T25]  ? add_disk_fwnode+0x361/0x580
[    5.044480][   T25]  nvme_alloc_ns+0x81c/0x17c0
[    5.044483][   T25]  ? kasan_quarantine_put+0x104/0x240
[    5.044487][   T25]  ? __pfx_nvme_alloc_ns+0x10/0x10
[    5.044495][   T25]  ? __pfx_nvme_find_get_ns+0x10/0x10
[    5.044496][   T25]  ? rcu_read_lock_any_held+0x45/0xa0
[    5.044498][   T25]  ? validate_chain+0x232/0x4f0
[    5.044503][   T25]  nvme_scan_ns+0x4c8/0x810
[    5.044506][   T25]  ? __pfx_nvme_scan_ns+0x10/0x10
[    5.044508][   T25]  ? find_held_lock+0x2b/0x80
[    5.044512][   T25]  ? ktime_get+0x16d/0x220
[    5.044517][   T25]  ? kvm_clock_get_cycles+0x18/0x30
[    5.044520][   T25]  ? __pfx_nvme_scan_ns_async+0x10/0x10
[    5.044522][   T25]  async_run_entry_fn+0x97/0x560
[    5.044523][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044526][   T25]  process_one_work+0xd3c/0x1350
[    5.044532][   T25]  ? __pfx_process_one_work+0x10/0x10
[    5.044536][   T25]  ? assign_work+0x16c/0x240
[    5.044539][   T25]  worker_thread+0x4da/0xd50
[    5.044545][   T25]  ? __pfx_worker_thread+0x10/0x10
[    5.044546][   T25]  kthread+0x356/0x5c0
[    5.044548][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044549][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044552][   T25]  ? __lock_release.isra.0+0x5d/0x180
[    5.044553][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044555][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044557][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044559][   T25]  ret_from_fork+0x218/0x2e0
[    5.044561][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044562][   T25]  ret_from_fork_asm+0x1a/0x30
[    5.044570][   T25]  </TASK>
'''

This patch uses sleepable RCU version of helper list_for_each_entry_srcu()
instead of list_for_each_entry_rcu() to fix it.

Fixes: 4dbd2b2 ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated
on module load:

[  324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
[  324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager
[  324.701689] preempt_count: 201, expected: 0
[  324.701693] RCU nest depth: 0, expected: 0
[  324.701697] 2 locks held by NetworkManager/1582:
[  324.701702]  #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0
[  324.701730]  #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870
[  324.701749] Preemption disabled at:
[  324.701752] [<ffffffff9cd23b9d>] __dev_open+0x3dd/0x870
[  324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary)
[  324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022
[  324.701774] Call Trace:
[  324.701777]  <TASK>
[  324.701779]  dump_stack_lvl+0x5d/0x80
[  324.701788]  ? __dev_open+0x3dd/0x870
[  324.701793]  __might_resched.cold+0x1ef/0x23d
<..>
[  324.701818]  __mutex_lock+0x113/0x1b80
<..>
[  324.701917]  idpf_ctlq_clean_sq+0xad/0x4b0 [idpf]
[  324.701935]  ? kasan_save_track+0x14/0x30
[  324.701941]  idpf_mb_clean+0x143/0x380 [idpf]
<..>
[  324.701991]  idpf_send_mb_msg+0x111/0x720 [idpf]
[  324.702009]  idpf_vc_xn_exec+0x4cc/0x990 [idpf]
[  324.702021]  ? rcu_is_watching+0x12/0xc0
[  324.702035]  idpf_add_del_mac_filters+0x3ed/0xb50 [idpf]
<..>
[  324.702122]  __hw_addr_sync_dev+0x1cf/0x300
[  324.702126]  ? find_held_lock+0x32/0x90
[  324.702134]  idpf_set_rx_mode+0x317/0x390 [idpf]
[  324.702152]  __dev_open+0x3f8/0x870
[  324.702159]  ? __pfx___dev_open+0x10/0x10
[  324.702174]  __dev_change_flags+0x443/0x650
<..>
[  324.702208]  netif_change_flags+0x80/0x160
[  324.702218]  do_setlink.isra.0+0x16a0/0x3960
<..>
[  324.702349]  rtnl_newlink+0x12fd/0x21e0

The sequence is as follows:
	rtnl_newlink()->
	__dev_change_flags()->
	__dev_open()->
	dev_set_rx_mode() - >  # disables BH and grabs "dev->addr_list_lock"
	idpf_set_rx_mode() ->  # proceed only if VIRTCHNL2_CAP_MACFILTER is ON
	__dev_uc_sync() ->
	idpf_add_mac_filter ->
	idpf_add_del_mac_filters ->
	idpf_send_mb_msg() ->
	idpf_mb_clean() ->
	idpf_ctlq_clean_sq()   # mutex_lock(cq_lock)

Fix by converting cq_lock to a spinlock. All operations under the new
lock are safe except freeing the DMA memory, which may use vunmap(). Fix
by requesting a contiguous physical memory for the DMA mapping.

Fixes: a251eee ("idpf: add SRIOV support and other ndo_ops")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
Mitigate e.g. the following:

    # echo 1e789080.lpc-snoop > /sys/bus/platform/drivers/aspeed-lpc-snoop/unbind
    ...
    [  120.363594] Unable to handle kernel NULL pointer dereference at virtual address 00000004 when write
    [  120.373866] [00000004] *pgd=00000000
    [  120.377910] Internal error: Oops: 805 [#1] SMP ARM
    [  120.383306] CPU: 1 UID: 0 PID: 315 Comm: sh Not tainted 6.15.0-rc1-00009-g926217bc7d7d-dirty torvalds#20 NONE
    ...
    [  120.679543] Call trace:
    [  120.679559]  misc_deregister from aspeed_lpc_snoop_remove+0x84/0xac
    [  120.692462]  aspeed_lpc_snoop_remove from platform_remove+0x28/0x38
    [  120.700996]  platform_remove from device_release_driver_internal+0x188/0x200
    ...

Fixes: 9f4f9ae ("drivers/misc: add Aspeed LPC snoop driver")
Cc: stable@vger.kernel.org
Cc: Jean Delvare <jdelvare@suse.de>
Acked-by: Jean Delvare <jdelvare@suse.de>
Link: https://patch.msgid.link/20250616-aspeed-lpc-snoop-fixes-v2-2-3cdd59c934d3@codeconstruct.com.au
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
Currently, an interrupt can be triggered during a GPU reset, which can
lead to GPU hangs and NULL pointer dereference in an interrupt context
as shown in the following trace:

 [  314.035040] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0
 [  314.043822] Mem abort info:
 [  314.046606]   ESR = 0x0000000096000005
 [  314.050347]   EC = 0x25: DABT (current EL), IL = 32 bits
 [  314.055651]   SET = 0, FnV = 0
 [  314.058695]   EA = 0, S1PTW = 0
 [  314.061826]   FSC = 0x05: level 1 translation fault
 [  314.066694] Data abort info:
 [  314.069564]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
 [  314.075039]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
 [  314.080080]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 [  314.085382] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000102728000
 [  314.091814] [00000000000000c0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
 [  314.100511] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
 [  314.106770] Modules linked in: v3d i2c_brcmstb vc4 snd_soc_hdmi_codec gpu_sched drm_shmem_helper drm_display_helper cec drm_dma_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd backlight
 [  314.129654] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1  Debian 1:6.12.25-1+rpt1
 [  314.139388] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
 [  314.145211] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 [  314.152165] pc : v3d_irq+0xec/0x2e0 [v3d]
 [  314.156187] lr : v3d_irq+0xe0/0x2e0 [v3d]
 [  314.160198] sp : ffffffc080003ea0
 [  314.163502] x29: ffffffc080003ea0 x28: ffffffec1f184980 x27: 021202b000000000
 [  314.170633] x26: ffffffec1f17f630 x25: ffffff8101372000 x24: ffffffec1f17d9f0
 [  314.177764] x23: 000000000000002a x22: 000000000000002a x21: ffffff8103252000
 [  314.184895] x20: 0000000000000001 x19: 00000000deadbeef x18: 0000000000000000
 [  314.192026] x17: ffffff94e51d2000 x16: ffffffec1dac3cb0 x15: c306000000000000
 [  314.199156] x14: 0000000000000000 x13: b2fc982e03cc5168 x12: 0000000000000001
 [  314.206286] x11: ffffff8103f8bcc0 x10: ffffffec1f196868 x9 : ffffffec1dac3874
 [  314.213416] x8 : 0000000000000000 x7 : 0000000000042a3a x6 : ffffff810017a180
 [  314.220547] x5 : ffffffec1ebad400 x4 : ffffffec1ebad320 x3 : 00000000000bebeb
 [  314.227677] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
 [  314.234807] Call trace:
 [  314.237243]  v3d_irq+0xec/0x2e0 [v3d]
 [  314.240906]  __handle_irq_event_percpu+0x58/0x218
 [  314.245609]  handle_irq_event+0x54/0xb8
 [  314.249439]  handle_fasteoi_irq+0xac/0x240
 [  314.253527]  handle_irq_desc+0x48/0x68
 [  314.257269]  generic_handle_domain_irq+0x24/0x38
 [  314.261879]  gic_handle_irq+0x48/0xd8
 [  314.265533]  call_on_irq_stack+0x24/0x58
 [  314.269448]  do_interrupt_handler+0x88/0x98
 [  314.273624]  el1_interrupt+0x34/0x68
 [  314.277193]  el1h_64_irq_handler+0x18/0x28
 [  314.281281]  el1h_64_irq+0x64/0x68
 [  314.284673]  default_idle_call+0x3c/0x168
 [  314.288675]  do_idle+0x1fc/0x230
 [  314.291895]  cpu_startup_entry+0x3c/0x50
 [  314.295810]  rest_init+0xe4/0xf0
 [  314.299030]  start_kernel+0x5e8/0x790
 [  314.302684]  __primary_switched+0x80/0x90
 [  314.306691] Code: 940029eb 360ffc13 f9442ea0 52800001 (f9406017)
 [  314.312775] ---[ end trace 0000000000000000 ]---
 [  314.317384] Kernel panic - not syncing: Oops: Fatal exception in interrupt
 [  314.324249] SMP: stopping secondary CPUs
 [  314.328167] Kernel Offset: 0x2b9da00000 from 0xffffffc080000000
 [  314.334076] PHYS_OFFSET: 0x0
 [  314.336946] CPU features: 0x08,00002013,c0200000,0200421b
 [  314.342337] Memory Limit: none
 [  314.345382] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Before resetting the GPU, it's necessary to disable all interrupts and
deal with any interrupt handler still in-flight. Otherwise, the GPU might
reset with jobs still running, or yet, an interrupt could be handled
during the reset.

Cc: stable@vger.kernel.org
Fixes: 57692c9 ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://lore.kernel.org/r/20250628224243.47599-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
…terface

collect_md property on xfrm interfaces can only be set on device creation,
thus xfrmi_changelink() should fail when called on such interfaces.

The check to enforce this was done only in the case where the xi was
returned from xfrmi_locate() which doesn't look for the collect_md
interface, and thus the validation was never reached.

Calling changelink would thus errornously place the special interface xi
in the xfrmi_net->xfrmi hash, but since it also exists in the
xfrmi_net->collect_md_xfrmi pointer it would lead to a double free when
the net namespace was taken down [1].

Change the check to use the xi from netdev_priv which is available earlier
in the function to prevent changes in xfrm collect_md interfaces.

[1] resulting oops:
[    8.516540] kernel BUG at net/core/dev.c:12029!
[    8.516552] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[    8.516559] CPU: 0 UID: 0 PID: 12 Comm: kworker/u80:0 Not tainted 6.15.0-virtme #5 PREEMPT(voluntary)
[    8.516565] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    8.516569] Workqueue: netns cleanup_net
[    8.516579] RIP: 0010:unregister_netdevice_many_notify+0x101/0xab0
[    8.516590] Code: 90 0f 0b 90 48 8b b0 78 01 00 00 48 8b 90 80 01 00 00 48 89 56 08 48 89 32 4c 89 80 78 01 00 00 48 89 b8 80 01 00 00 eb ac 90 <0f> 0b 48 8b 45 00 4c 8d a0 88 fe ff ff 48 39 c5 74 5c 41 80 bc 24
[    8.516593] RSP: 0018:ffffa93b8006bd30 EFLAGS: 00010206
[    8.516598] RAX: ffff98fe4226e000 RBX: ffffa93b8006bd58 RCX: ffffa93b8006bc60
[    8.516601] RDX: 0000000000000004 RSI: 0000000000000000 RDI: dead000000000122
[    8.516603] RBP: ffffa93b8006bdd8 R08: dead000000000100 R09: ffff98fe4133c100
[    8.516605] R10: 0000000000000000 R11: 00000000000003d2 R12: ffffa93b8006be00
[    8.516608] R13: ffffffff96c1a510 R14: ffffffff96c1a510 R15: ffffa93b8006be00
[    8.516615] FS:  0000000000000000(0000) GS:ffff98fee73b7000(0000) knlGS:0000000000000000
[    8.516619] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.516622] CR2: 00007fcd2abd0700 CR3: 000000003aa40000 CR4: 0000000000752ef0
[    8.516625] PKRU: 55555554
[    8.516627] Call Trace:
[    8.516632]  <TASK>
[    8.516635]  ? rtnl_is_locked+0x15/0x20
[    8.516641]  ? unregister_netdevice_queue+0x29/0xf0
[    8.516650]  ops_undo_list+0x1f2/0x220
[    8.516659]  cleanup_net+0x1ad/0x2e0
[    8.516664]  process_one_work+0x160/0x380
[    8.516673]  worker_thread+0x2aa/0x3c0
[    8.516679]  ? __pfx_worker_thread+0x10/0x10
[    8.516686]  kthread+0xfb/0x200
[    8.516690]  ? __pfx_kthread+0x10/0x10
[    8.516693]  ? __pfx_kthread+0x10/0x10
[    8.516697]  ret_from_fork+0x82/0xf0
[    8.516705]  ? __pfx_kthread+0x10/0x10
[    8.516709]  ret_from_fork_asm+0x1a/0x30
[    8.516718]  </TASK>

Fixes: abc340b ("xfrm: interface: support collect metadata mode")
Reported-by: Lonial Con <kongln9170@gmail.com>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
Since we're accessing btree_trans objects owned by another thread, we
need to guard against using pointers to freed key cache entries: we need
our own srcu read lock, and we should skip a btree_trans if it didn't
hold the srcu lock (and thus it might have pointers to freed key cache
entries).

00693 Mem abort info:
00693   ESR = 0x0000000096000005
00693   EC = 0x25: DABT (current EL), IL = 32 bits
00693   SET = 0, FnV = 0
00693   EA = 0, S1PTW = 0
00693   FSC = 0x05: level 1 translation fault
00693 Data abort info:
00693   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
00693   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
00693   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00693 user pgtable: 4k pages, 39-bit VAs, pgdp=000000012e650000
00693 [000000008fb96218] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00693 Internal error: Oops: 0000000096000005 [#1]  SMP
00693 Modules linked in:
00693 CPU: 0 UID: 0 PID: 4307 Comm: cat Not tainted 6.16.0-rc2-ktest-g9e15af94fd86 #27578 NONE
00693 Hardware name: linux,dummy-virt (DT)
00693 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00693 pc : six_lock_counts+0x20/0xe8
00693 lr : bch2_btree_bkey_cached_common_to_text+0x38/0x130
00693 sp : ffffff80ca98bb60
00693 x29: ffffff80ca98bb60 x28: 000000008fb96200 x27: 0000000000000007
00693 x26: ffffff80eafd06b8 x25: 0000000000000000 x24: ffffffc080d75a60
00693 x23: ffffff80eafd0000 x22: ffffffc080bdfcc0 x21: ffffff80eafd0210
00693 x20: ffffff80c192ff08 x19: 000000008fb96200 x18: 00000000ffffffff
00693 x17: 0000000000000000 x16: 0000000000000000 x15: 00000000ffffffff
00693 x14: 0000000000000000 x13: ffffff80ceb5a29a x12: 20796220646c6568
00693 x11: 72205d3e303c5b20 x10: 0000000000000020 x9 : ffffffc0805fb6b0
00693 x8 : 0000000000000020 x7 : 0000000000000000 x6 : 0000000000000020
00693 x5 : ffffff80ceb5a29c x4 : 0000000000000001 x3 : 000000000000029c
00693 x2 : 0000000000000000 x1 : ffffff80ef66c000 x0 : 000000008fb96200
00693 Call trace:
00693  six_lock_counts+0x20/0xe8 (P)
00693  bch2_btree_bkey_cached_common_to_text+0x38/0x130
00693  bch2_btree_trans_to_text+0x260/0x2a8
00693  bch2_btree_transactions_read+0xac/0x1e8
00693  full_proxy_read+0x74/0xd8
00693  vfs_read+0x90/0x300
00693  ksys_read+0x6c/0x108
00693  __arm64_sys_read+0x20/0x30
00693  invoke_syscall.constprop.0+0x54/0xe8
00693  do_el0_svc+0x44/0xc8
00693  el0_svc+0x18/0x58
00693  el0t_64_sync_handler+0x104/0x130
00693  el0t_64_sync+0x154/0x158
00693 Code: 910003fd f9423c22 f90017e2 d2800002 (f9400c01)
00693 ---[ end trace 0000000000000000 ]---

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
Initialize u64 stats as it uses seq counter on 32bit machines
as suggested by lockdep below.

[    1.830953][    T1] INFO: trying to register non-static key.
[    1.830993][    T1] The code is fine but needs lockdep annotation, or maybe
[    1.831027][    T1] you didn't initialize this object before use?
[    1.831057][    T1] turning off the locking correctness validator.
[    1.831090][    T1] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W           6.16.0-rc2-v7l+ #1 PREEMPT
[    1.831097][    T1] Tainted: [W]=WARN
[    1.831099][    T1] Hardware name: BCM2711
[    1.831101][    T1] Call trace:
[    1.831104][    T1]  unwind_backtrace from show_stack+0x18/0x1c
[    1.831120][    T1]  show_stack from dump_stack_lvl+0x8c/0xcc
[    1.831129][    T1]  dump_stack_lvl from register_lock_class+0x9e8/0x9fc
[    1.831141][    T1]  register_lock_class from __lock_acquire+0x420/0x22c0
[    1.831154][    T1]  __lock_acquire from lock_acquire+0x130/0x3f8
[    1.831166][    T1]  lock_acquire from bcmgenet_get_stats64+0x4a4/0x4c8
[    1.831176][    T1]  bcmgenet_get_stats64 from dev_get_stats+0x4c/0x408
[    1.831184][    T1]  dev_get_stats from rtnl_fill_stats+0x38/0x120
[    1.831193][    T1]  rtnl_fill_stats from rtnl_fill_ifinfo+0x7f8/0x1890
[    1.831203][    T1]  rtnl_fill_ifinfo from rtmsg_ifinfo_build_skb+0xd0/0x138
[    1.831214][    T1]  rtmsg_ifinfo_build_skb from rtmsg_ifinfo+0x48/0x8c
[    1.831225][    T1]  rtmsg_ifinfo from register_netdevice+0x8c0/0x95c
[    1.831237][    T1]  register_netdevice from register_netdev+0x28/0x40
[    1.831247][    T1]  register_netdev from bcmgenet_probe+0x690/0x6bc
[    1.831255][    T1]  bcmgenet_probe from platform_probe+0x64/0xbc
[    1.831263][    T1]  platform_probe from really_probe+0xd0/0x2d4
[    1.831269][    T1]  really_probe from __driver_probe_device+0x90/0x1a4
[    1.831273][    T1]  __driver_probe_device from driver_probe_device+0x38/0x11c
[    1.831278][    T1]  driver_probe_device from __driver_attach+0x9c/0x18c
[    1.831282][    T1]  __driver_attach from bus_for_each_dev+0x84/0xd4
[    1.831291][    T1]  bus_for_each_dev from bus_add_driver+0xd4/0x1f4
[    1.831303][    T1]  bus_add_driver from driver_register+0x88/0x120
[    1.831312][    T1]  driver_register from do_one_initcall+0x78/0x360
[    1.831320][    T1]  do_one_initcall from kernel_init_freeable+0x2bc/0x314
[    1.831331][    T1]  kernel_init_freeable from kernel_init+0x1c/0x144
[    1.831339][    T1]  kernel_init from ret_from_fork+0x14/0x20
[    1.831344][    T1] Exception stack(0xf082dfb0 to 0xf082dff8)
[    1.831349][    T1] dfa0:                                     00000000 00000000 00000000 00000000
[    1.831353][    T1] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.831356][    T1] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000

Fixes: 59aa6e3 ("net: bcmgenet: switch to use 64bit statistics")
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250702092417.46486-1-ryotkkr98@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
Transport assignment may race with module unload. Protect new_transport
from becoming a stale pointer.

This also takes care of an insecure call in vsock_use_local_transport();
add a lockdep assert.

BUG: unable to handle page fault for address: fffffbfff8056000
Oops: Oops: 0000 [#1] SMP KASAN
RIP: 0010:vsock_assign_transport+0x366/0x600
Call Trace:
 vsock_connect+0x59c/0xc40
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: c0cfa2d ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-2-98f0eb530747@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
syzbot reported the splat below. [0]

This happens if we call ioctl(ATMARP_MKIP) more than once.

During the first call, clip_mkip() sets clip_push() to vcc->push(),
and the second call copies it to clip_vcc->old_push().

Later, when the socket is close()d, vcc_destroy_socket() passes
NULL skb to clip_push(), which calls clip_vcc->old_push(),
triggering the infinite recursion.

Let's prevent the second ioctl(ATMARP_MKIP) by checking
vcc->user_back, which is allocated by the first call as clip_vcc.

Note also that we use lock_sock() to prevent racy calls.

[0]:
BUG: TASK stack guard page was hit at ffffc9000d66fff8 (stack is ffffc9000d670000..ffffc9000d678000)
Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5322 Comm: syz.0.0 Not tainted 6.16.0-rc4-syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:clip_push+0x5/0x720 net/atm/clip.c:191
Code: e0 8f aa 8c e8 1c ad 5b fa eb ae 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 <41> 57 41 56 41 55 41 54 53 48 83 ec 20 48 89 f3 49 89 fd 48 bd 00
RSP: 0018:ffffc9000d670000 EFLAGS: 00010246
RAX: 1ffff1100235a4a5 RBX: ffff888011ad2508 RCX: ffff8880003c0000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888037f01000
RBP: dffffc0000000000 R08: ffffffff8fa104f7 R09: 1ffffffff1f4209e
R10: dffffc0000000000 R11: ffffffff8a99b300 R12: ffffffff8a99b300
R13: ffff888037f01000 R14: ffff888011ad2500 R15: ffff888037f01578
FS:  000055557ab6d500(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000d66fff8 CR3: 0000000043172000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 clip_push+0x6dc/0x720 net/atm/clip.c:200
...
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 vcc_destroy_socket net/atm/common.c:183 [inline]
 vcc_release+0x157/0x460 net/atm/common.c:205
 __sock_release net/socket.c:647 [inline]
 sock_close+0xc0/0x240 net/socket.c:1391
 __fput+0x449/0xa70 fs/file_table.c:465
 task_work_run+0x1d1/0x260 kernel/task_work.c:227
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop+0xec/0x110 kernel/entry/common.c:114
 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline]
 do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff31c98e929
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffb5aa1f78 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 0000000000012747 RCX: 00007ff31c98e929
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007ff31cbb7ba0 R08: 0000000000000001 R09: 0000000db5aa226f
R10: 00007ff31c7ff030 R11: 0000000000000246 R12: 00007ff31cbb608c
R13: 00007ff31cbb6080 R14: ffffffffffffffff R15: 00007fffb5aa2090
 </TASK>
Modules linked in:

Fixes: 1da177e ("Linux-2.6.12-rc2")
Reported-by: syzbot+0c77cccd6b7cd917b35a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2371d94d248d126c1eb1
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250704062416.1613927-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
atmarpd_dev_ops does not implement the send method, which may cause crash
as bellow.

BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: Oops: 0010 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5324 Comm: syz.0.0 Not tainted 6.15.0-rc6-syzkaller-00346-g5723cc3450bc #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffc9000d3cf778 EFLAGS: 00010246
RAX: 1ffffffff1910dd1 RBX: 00000000000000c0 RCX: dffffc0000000000
RDX: ffffc9000dc82000 RSI: ffff88803e4c4640 RDI: ffff888052cd0000
RBP: ffffc9000d3cf8d0 R08: ffff888052c9143f R09: 1ffff1100a592287
R10: dffffc0000000000 R11: 0000000000000000 R12: 1ffff92001a79f00
R13: ffff888052cd0000 R14: ffff88803e4c4640 R15: ffffffff8c886e88
FS:  00007fbc762566c0(0000) GS:ffff88808d6c2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000041f1b000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 vcc_sendmsg+0xa10/0xc50 net/atm/common.c:644
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg+0x219/0x270 net/socket.c:727
 ____sys_sendmsg+0x52d/0x830 net/socket.c:2566
 ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620
 __sys_sendmmsg+0x227/0x430 net/socket.c:2709
 __do_sys_sendmmsg net/socket.c:2736 [inline]
 __se_sys_sendmmsg net/socket.c:2733 [inline]
 __x64_sys_sendmmsg+0xa0/0xc0 net/socket.c:2733
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 1da177e ("Linux-2.6.12-rc2")
Reported-by: syzbot+e34e5e6b5eddb0014def@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/682f82d5.a70a0220.1765ec.0143.GAE@google.com/T
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250705085228.329202-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock even
when the alloc_tag_cttype is not allocated because:

  1) alloc tagging is disabled because mem profiling is disabled
     (!alloc_tag_cttype)
  2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
  3) alloc tagging is enabled, but failed initialization
     (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))

In all cases, alloc_tag_cttype is not allocated, and therefore
alloc_tag_top_users() should not attempt to acquire the semaphore.

This leads to a crash on memory allocation failure by attempting to
acquire a non-existent semaphore:

  Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
  CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G      D             6.16.0-rc2 #1 VOLUNTARY
  Tainted: [D]=DIE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
  RIP: 0010:down_read_trylock+0xaa/0x3b0
  Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
  RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
  RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
  RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
  RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
  R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
  R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
  FS:  00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   codetag_trylock_module_list+0xd/0x20
   alloc_tag_top_users+0x369/0x4b0
   __show_mem+0x1cd/0x6e0
   warn_alloc+0x2b1/0x390
   __alloc_frozen_pages_noprof+0x12b9/0x21a0
   alloc_pages_mpol+0x135/0x3e0
   alloc_slab_page+0x82/0xe0
   new_slab+0x212/0x240
   ___slab_alloc+0x82a/0xe00
   </TASK>

As David Wang points out, this issue became easier to trigger after commit
780138b ("alloc_tag: check mem_profiling_support in alloc_tag_init").

Before the commit, the issue occurred only when it failed to allocate and
initialize alloc_tag_cttype or if a memory allocation fails before
alloc_tag_init() is called.  After the commit, it can be easily triggered
when memory profiling is compiled but disabled at boot.

To properly determine whether alloc_tag_init() has been called and its
data structures initialized, verify that alloc_tag_cttype is a valid
pointer before acquiring the semaphore.  If the variable is NULL or an
error value, it has not been properly initialized.  In such a case, just
skip and do not attempt to acquire the semaphore.

[harry.yoo@oracle.com: v3]
  Link: https://lkml.kernel.org/r/20250624072513.84219-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250620195305.1115151-1-harry.yoo@oracle.com
Fixes: 780138b ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Fixes: 1438d34 ("lib: add memory allocations report in show_mem()")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Acked-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: David Wang <00107082@163.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Yuanyuan Zhong <yzhong@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
There are cases when we try to pin a folio but discover that it has not
been faulted-in.  So, we try to allocate it in memfd_alloc_folio() but
there is a chance that we might encounter a fatal crash/failure
(VM_BUG_ON(!h->resv_huge_pages) in alloc_hugetlb_folio_reserve()) if there
are no active reservations at that instant.  This issue was reported by
syzbot:

kernel BUG at mm/hugetlb.c:2403!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5315 Comm: syz.0.0 Not tainted
6.13.0-rc5-syzkaller-00161-g63676eefb7a0 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:alloc_hugetlb_folio_reserve+0xbc/0xc0 mm/hugetlb.c:2403
Code: 1f eb 05 e8 56 18 a0 ff 48 c7 c7 40 56 61 8e e8 ba 21 cc 09 4c 89
f0 5b 41 5c 41 5e 41 5f 5d c3 cc cc cc cc e8 35 18 a0 ff 90 <0f> 0b 66
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
RSP: 0018:ffffc9000d3d77f8 EFLAGS: 00010087
RAX: ffffffff81ff6beb RBX: 0000000000000000 RCX: 0000000000100000
RDX: ffffc9000e51a000 RSI: 00000000000003ec RDI: 00000000000003ed
RBP: 1ffffffff34810d9 R08: ffffffff81ff6ba3 R09: 1ffffd4000093005
R10: dffffc0000000000 R11: fffff94000093006 R12: dffffc0000000000
R13: dffffc0000000000 R14: ffffea0000498000 R15: ffffffff9a4086c8
FS:  00007f77ac12e6c0(0000) GS:ffff88801fc00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f77ab54b170 CR3: 0000000040b70000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 memfd_alloc_folio+0x1bd/0x370 mm/memfd.c:88
 memfd_pin_folios+0xf10/0x1570 mm/gup.c:3750
 udmabuf_pin_folios drivers/dma-buf/udmabuf.c:346 [inline]
 udmabuf_create+0x70e/0x10c0 drivers/dma-buf/udmabuf.c:443
 udmabuf_ioctl_create drivers/dma-buf/udmabuf.c:495 [inline]
 udmabuf_ioctl+0x301/0x4e0 drivers/dma-buf/udmabuf.c:526
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:906 [inline]
 __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Therefore, prevent the above crash by removing the VM_BUG_ON() as there is
no need to crash the system in this situation and instead we could just
fail the allocation request.

Furthermore, as described above, the specific situation where this happens
is when we try to pin memfd folios before they are faulted-in.  Although,
this is a valid thing to do, it is not the regular or the common use-case.
Let us consider the following scenarios:

1) hugetlbfs_file_mmap()
    memfd_alloc_folio()
    hugetlb_fault()

2) memfd_alloc_folio()
    hugetlbfs_file_mmap()
    hugetlb_fault()

3) hugetlbfs_file_mmap()
    hugetlb_fault()
        alloc_hugetlb_folio()

3) is the most common use-case where first a memfd is allocated followed
by mmap(), user writes/updates and then the relevant folios are pinned
(memfd_pin_folios()).  The BUG this patch is fixing occurs in 2) because
we try to pin the folios before hugetlbfs_file_mmap() is called.  So, in
this situation we try to allocate the folios before pinning them but since
we did not make any reservations, resv_huge_pages would be 0, leading to
this issue.

Link: https://lkml.kernel.org/r/20250626191116.1377761-1-vivek.kasireddy@intel.com
Fixes: 26a8ea8 ("mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak")
Reported-by: syzbot+a504cb5bae4fe117ba94@syzkaller.appspotmail.com
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Closes: https://syzkaller.appspot.com/bug?extid=a504cb5bae4fe117ba94
Closes: https://lore.kernel.org/all/677928b5.050a0220.3b53b0.004d.GAE@google.com/T/
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
Patch series "mm/damon: fix divide by zero and its samples", v3.

This series includes fixes against damon and its samples to make it safer
when damon sample starting fails.

It includes the following changes.
- fix unexpected divide by zero crash for zero size regions
- fix bugs for damon samples in case of start failures


This patch (of 4):

The damon_sample_prcl_start() can fail so we must reset the "enable"
parameter to "false" again for proper rollback.

In such cases, setting Y to "enable" then N triggers the following crash
because damon sample start failed but the "enable" stays as Y.

  [ 2441.419649] damon_sample_prcl: start
  [ 2454.146817] damon_sample_prcl: stop
  [ 2454.146862] ------------[ cut here ]------------
  [ 2454.146865] kernel BUG at mm/slub.c:546!
  [ 2454.148183] Oops: invalid opcode: 0000 [#1] SMP NOPTI
  	...
  [ 2454.167555] Call Trace:
  [ 2454.167822]  <TASK>
  [ 2454.168061]  damon_destroy_ctx+0x78/0x140
  [ 2454.168454]  damon_sample_prcl_enable_store+0x8d/0xd0
  [ 2454.168932]  param_attr_store+0xa1/0x120
  [ 2454.169315]  module_attr_store+0x20/0x50
  [ 2454.169695]  sysfs_kf_write+0x72/0x90
  [ 2454.170065]  kernfs_fop_write_iter+0x150/0x1e0
  [ 2454.170491]  vfs_write+0x315/0x440
  [ 2454.170833]  ksys_write+0x69/0xf0
  [ 2454.171162]  __x64_sys_write+0x19/0x30
  [ 2454.171525]  x64_sys_call+0x18b2/0x2700
  [ 2454.171900]  do_syscall_64+0x7f/0x680
  [ 2454.172258]  ? exit_to_user_mode_loop+0xf6/0x180
  [ 2454.172694]  ? clear_bhb_loop+0x30/0x80
  [ 2454.173067]  ? clear_bhb_loop+0x30/0x80
  [ 2454.173439]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Link: https://lkml.kernel.org/r/20250702000205.1921-1-honggyu.kim@sk.com
Link: https://lkml.kernel.org/r/20250702000205.1921-2-honggyu.kim@sk.com
Fixes: 2aca254 ("samples/damon: introduce a skeleton of a smaple DAMON module for proactive reclamation")
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
The current implementation allows having zero size regions with no special
reasons, but damon_get_intervals_score() gets crashed by divide by zero
when the region size is zero.

  [   29.403950] Oops: divide error: 0000 [#1] SMP NOPTI

This patch fixes the bug, but does not disallow zero size regions to keep
the backward compatibility since disallowing zero size regions might be a
breaking change for some users.

In addition, the same crash can happen when intervals_goal.access_bp is
zero so this should be fixed in stable trees as well.

Link: https://lkml.kernel.org/r/20250702000205.1921-5-honggyu.kim@sk.com
Fixes: f04b0fe ("mm/damon/core: implement intervals auto-tuning")
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
If "try_verify_in_tasklet" is set for dm-verity, DM_BUFIO_CLIENT_NO_SLEEP
is enabled for dm-bufio. However, when bufio tries to evict buffers, there
is a chance to trigger scheduling in spin_lock_bh, the following warning
is hit:

BUG: sleeping function called from invalid context at drivers/md/dm-bufio.c:2745
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 123, name: kworker/2:2
preempt_count: 201, expected: 0
RCU nest depth: 0, expected: 0
4 locks held by kworker/2:2/123:
 #0: ffff88800a2d1548 ((wq_completion)dm_bufio_cache){....}-{0:0}, at: process_one_work+0xe46/0x1970
 #1: ffffc90000d97d20 ((work_completion)(&dm_bufio_replacement_work)){....}-{0:0}, at: process_one_work+0x763/0x1970
 #2: ffffffff8555b528 (dm_bufio_clients_lock){....}-{3:3}, at: do_global_cleanup+0x1ce/0x710
 #3: ffff88801d5820b8 (&c->spinlock){....}-{2:2}, at: do_global_cleanup+0x2a5/0x710
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 2 UID: 0 PID: 123 Comm: kworker/2:2 Not tainted 6.16.0-rc3-g90548c634bd0 torvalds#305 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: dm_bufio_cache do_global_cleanup
Call Trace:
 <TASK>
 dump_stack_lvl+0x53/0x70
 __might_resched+0x360/0x4e0
 do_global_cleanup+0x2f5/0x710
 process_one_work+0x7db/0x1970
 worker_thread+0x518/0xea0
 kthread+0x359/0x690
 ret_from_fork+0xf3/0x1b0
 ret_from_fork_asm+0x1a/0x30
 </TASK>

That can be reproduced by:

  veritysetup format --data-block-size=4096 --hash-block-size=4096 /dev/vda /dev/vdb
  SIZE=$(blockdev --getsz /dev/vda)
  dmsetup create myverity -r --table "0 $SIZE verity 1 /dev/vda /dev/vdb 4096 4096 <data_blocks> 1 sha256 <root_hash> <salt> 1 try_verify_in_tasklet"
  mount /dev/dm-0 /mnt -o ro
  echo 102400 > /sys/module/dm_bufio/parameters/max_cache_size_bytes
  [read files in /mnt]

Cc: stable@vger.kernel.org	# v6.4+
Fixes: 450e8de ("dm bufio: improve concurrent IO performance")
Signed-off-by: Wang Shuai <wangshuai12@xiaomi.com>
Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
This marks 0x29 as accelerometer address on Dell Precision 3551.

I followed previous works of Paul Menzel and Hans de Goede to verify it:

$ cd /sys/bus/pci/drivers/i801_smbus/0000\:00\:1f.4

$ ls -d i2c-?
i2c-0

$ sudo modprobe i2c-dev

$ sudo i2cdetect 0
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-0.
I will probe address range 0x08-0x77.
Continue? [Y/n] Y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- 29 -- -- -- -- -- --
30: 30 -- -- -- -- 35 UU UU -- -- -- -- -- -- -- --
40: -- -- -- -- 44 -- -- -- -- -- -- -- -- -- -- --
50: UU -- 52 -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

$ echo lis3lv02d 0x29 > sudo tee /sys/bus/i2c/devices/i2c-0/new_device
lis3lv02d 0x29

$ sudo dmesg
[    0.000000] Linux version 6.12.28 (nixbld@localhost) (gcc (GCC) 14.2.1 20250322, GNU ld (GNU Binutils) 2.44) #1-NixOS SMP PREEMPT_DYNAMIC Fri May  9 07:50:53 UTC 2025
[...]
[    0.000000] DMI: Dell Inc. Precision 3551/07YHW8, BIOS 1.18.0 10/03/2022
[...]
[ 3749.077624] lis3lv02d_i2c 0-0029: supply Vdd not found, using dummy regulator
[ 3749.077732] lis3lv02d_i2c 0-0029: supply Vdd_IO not found, using dummy regulator
[ 3749.098674] lis3lv02d: 8 bits 3DC sensor found
[ 3749.182480] input: ST LIS3LV02DL Accelerometer as /devices/platform/lis3lv02d/input/input28
[ 3749.182899] i2c i2c-0: new_device: Instantiated device lis3lv02d at 0x29

Signed-off-by: Jan-Niklas Burfeind <kernel@aiyionpri.me>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/20250710190919.37842-1-kernel@aiyionpri.me
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
syzbot reported weird splats [0][1] in cipso_v4_sock_setattr() while
freeing inet_sk(sk)->inet_opt.

The address was freed multiple times even though it was read-only memory.

cipso_v4_sock_setattr() did nothing wrong, and the root cause was type
confusion.

The cited commit made it possible to create smc_sock as an INET socket.

The issue is that struct smc_sock does not have struct inet_sock as the
first member but hijacks AF_INET and AF_INET6 sk_family, which confuses
various places.

In this case, inet_sock.inet_opt was actually smc_sock.clcsk_data_ready(),
which is an address of a function in the text segment.

  $ pahole -C inet_sock vmlinux
  struct inet_sock {
  ...
          struct ip_options_rcu *    inet_opt;             /*   784     8 */

  $ pahole -C smc_sock vmlinux
  struct smc_sock {
  ...
          void                       (*clcsk_data_ready)(struct sock *); /*   784     8 */

The same issue for another field was reported before. [2][3]

At that time, an ugly hack was suggested [4], but it makes both INET
and SMC code error-prone and hard to change.

Also, yet another variant was fixed by a hacky commit 98d4435
("net/smc: prevent NULL pointer dereference in txopt_get").

Instead of papering over the root cause by such hacks, we should not
allow non-INET socket to reuse the INET infra.

Let's add inet_sock as the first member of smc_sock.

[0]:
kvfree_call_rcu(): Double-freed call. rcu_head 000000006921da73
WARNING: CPU: 0 PID: 6718 at mm/slab_common.c:1956 kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955
Modules linked in:
CPU: 0 UID: 0 PID: 6718 Comm: syz.0.17 Tainted: G        W           6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955
lr : kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955
sp : ffff8000a03a7730
x29: ffff8000a03a7730 x28: 00000000fffffff5 x27: 1fffe000184823d3
x26: dfff800000000000 x25: ffff0000c2411e9e x24: ffff0000dd88da00
x23: ffff8000891ac9a0 x22: 00000000ffffffea x21: ffff8000891ac9a0
x20: ffff8000891ac9a0 x19: ffff80008afc2480 x18: 00000000ffffffff
x17: 0000000000000000 x16: ffff80008ae642c8 x15: ffff700011ede14c
x14: 1ffff00011ede14c x13: 0000000000000004 x12: ffffffffffffffff
x11: ffff700011ede14c x10: 0000000000ff0100 x9 : 5fa3c1ffaf0ff000
x8 : 5fa3c1ffaf0ff000 x7 : 0000000000000001 x6 : 0000000000000001
x5 : ffff8000a03a7078 x4 : ffff80008f766c20 x3 : ffff80008054d360
x2 : 0000000000000000 x1 : 0000000000000201 x0 : 0000000000000000
Call trace:
 kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 (P)
 cipso_v4_sock_setattr+0x2f0/0x3f4 net/ipv4/cipso_ipv4.c:1914
 netlbl_sock_setattr+0x240/0x334 net/netlabel/netlabel_kapi.c:1000
 smack_netlbl_add+0xa8/0x158 security/smack/smack_lsm.c:2581
 smack_inode_setsecurity+0x378/0x430 security/smack/smack_lsm.c:2912
 security_inode_setsecurity+0x118/0x3c0 security/security.c:2706
 __vfs_setxattr_noperm+0x174/0x5c4 fs/xattr.c:251
 __vfs_setxattr_locked+0x1ec/0x218 fs/xattr.c:295
 vfs_setxattr+0x158/0x2ac fs/xattr.c:321
 do_setxattr fs/xattr.c:636 [inline]
 file_setxattr+0x1b8/0x294 fs/xattr.c:646
 path_setxattrat+0x2ac/0x320 fs/xattr.c:711
 __do_sys_fsetxattr fs/xattr.c:761 [inline]
 __se_sys_fsetxattr fs/xattr.c:758 [inline]
 __arm64_sys_fsetxattr+0xc0/0xdc fs/xattr.c:758
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x58/0x180 arch/arm64/kernel/entry-common.c:879
 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600

[1]:
Unable to handle kernel write to read-only memory at virtual address ffff8000891ac9a8
KASAN: probably user-memory-access in range [0x0000000448d64d40-0x0000000448d64d47]
Mem abort info:
  ESR = 0x000000009600004e
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x0e: level 2 permission fault
Data abort info:
  ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
  CM = 0, WnR = 1, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000207144000
[ffff8000891ac9a8] pgd=0000000000000000, p4d=100000020f950003, pud=100000020f951003, pmd=0040000201000781
Internal error: Oops: 000000009600004e [#1]  SMP
Modules linked in:
CPU: 0 UID: 0 PID: 6946 Comm: syz.0.69 Not tainted 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kvfree_call_rcu+0x31c/0x3f0 mm/slab_common.c:1971
lr : add_ptr_to_bulk_krc_lock mm/slab_common.c:1838 [inline]
lr : kvfree_call_rcu+0xfc/0x3f0 mm/slab_common.c:1963
sp : ffff8000a28a7730
x29: ffff8000a28a7730 x28: 00000000fffffff5 x27: 1fffe00018b09bb3
x26: 0000000000000001 x25: ffff80008f66e000 x24: ffff00019beaf498
x23: ffff00019beaf4c0 x22: 0000000000000000 x21: ffff8000891ac9a0
x20: ffff8000891ac9a0 x19: 0000000000000000 x18: 00000000ffffffff
x17: ffff800093363000 x16: ffff80008052c6e4 x15: ffff700014514ecc
x14: 1ffff00014514ecc x13: 0000000000000004 x12: ffffffffffffffff
x11: ffff700014514ecc x10: 0000000000000001 x9 : 0000000000000001
x8 : ffff00019beaf7b4 x7 : ffff800080a94154 x6 : 0000000000000000
x5 : ffff8000935efa60 x4 : 0000000000000008 x3 : ffff80008052c7fc
x2 : 0000000000000001 x1 : ffff8000891ac9a0 x0 : 0000000000000001
Call trace:
 kvfree_call_rcu+0x31c/0x3f0 mm/slab_common.c:1967 (P)
 cipso_v4_sock_setattr+0x2f0/0x3f4 net/ipv4/cipso_ipv4.c:1914
 netlbl_sock_setattr+0x240/0x334 net/netlabel/netlabel_kapi.c:1000
 smack_netlbl_add+0xa8/0x158 security/smack/smack_lsm.c:2581
 smack_inode_setsecurity+0x378/0x430 security/smack/smack_lsm.c:2912
 security_inode_setsecurity+0x118/0x3c0 security/security.c:2706
 __vfs_setxattr_noperm+0x174/0x5c4 fs/xattr.c:251
 __vfs_setxattr_locked+0x1ec/0x218 fs/xattr.c:295
 vfs_setxattr+0x158/0x2ac fs/xattr.c:321
 do_setxattr fs/xattr.c:636 [inline]
 file_setxattr+0x1b8/0x294 fs/xattr.c:646
 path_setxattrat+0x2ac/0x320 fs/xattr.c:711
 __do_sys_fsetxattr fs/xattr.c:761 [inline]
 __se_sys_fsetxattr fs/xattr.c:758 [inline]
 __arm64_sys_fsetxattr+0xc0/0xdc fs/xattr.c:758
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x58/0x180 arch/arm64/kernel/entry-common.c:879
 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Code: aa1f03e2 52800023 97ee1e8d b4000195 (f90006b4)

Fixes: d25a92c ("net/smc: Introduce IPPROTO_SMC")
Reported-by: syzbot+40bf00346c3fe40f90f2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/686d9b50.050a0220.1ffab7.0020.GAE@google.com/
Tested-by: syzbot+40bf00346c3fe40f90f2@syzkaller.appspotmail.com
Reported-by: syzbot+f22031fad6cbe52c70e7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/686da0f3.050a0220.1ffab7.0022.GAE@google.com/
Reported-by: syzbot+271fed3ed6f24600c364@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=271fed3ed6f24600c364 # [2]
Link: https://lore.kernel.org/netdev/99f284be-bf1d-4bc4-a629-77b268522fff@huawei.com/ # [3]
Link: https://lore.kernel.org/netdev/20250331081003.1503211-1-wangliang74@huawei.com/ # [4]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20250711060808.2977529-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
The conversion from compiler assisted indexing to manual
indexing wasn't done correctly. The array is still made
up of __le16 elements so multiplying the outer index by
the element size is not what we want. Fix it up.

This causes the kernel to oops when trying to transfer any
significant amount of data over wifi:

BUG: unable to handle page fault for address: ffffc900009f5282
PGD 100000067 P4D 100000067 PUD 1000fb067 PMD 102e82067 PTE 0
Oops: Oops: 0002 [#1] SMP
CPU: 1 UID: 0 PID: 99 Comm: kworker/u8:3 Not tainted 6.15.0-rc2-cl-bisect3-00604-g6204d5130a64-dirty torvalds#78 PREEMPT
Hardware name: Dell Inc. Latitude E5400                  /0D695C, BIOS A19 06/13/2013
Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
RIP: 0010:iwl_trans_pcie_tx+0x4dd/0xe60 [iwlwifi]
Code: 00 00 66 81 fa ff 0f 0f 87 42 09 00 00 3d ff 00 00 00 0f 8f 37 09 00 00 41 c1 e0 0c 41 09 d0 48 8d 14 b6 48 c1 e2 07 48 01 ca <66> 44 89 04 57 48 8d 0c 12 83 f8 3f 0f 8e 84 01 00 00 41 8b 85 80
RSP: 0018:ffffc900001c3b50 EFLAGS: 00010206
RAX: 00000000000000c1 RBX: ffff88810b180028 RCX: 00000000000000c1
RDX: 0000000000002141 RSI: 000000000000000d RDI: ffffc900009f1000
RBP: 0000000000000002 R08: 0000000000000025 R09: ffffffffa050fa60
R10: 00000000fbdbf4bc R11: 0000000000000082 R12: ffff88810e5ade40
R13: ffff88810af81588 R14: 000000000000001a R15: ffff888100dfe0c8
FS:  0000000000000000(0000) GS:ffff8881998c3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900009f5282 CR3: 0000000001e39000 CR4: 00000000000426f0
Call Trace:
 <TASK>
 ? rcu_is_watching+0xd/0x40
 ? __iwl_dbg+0xb1/0xe0 [iwlwifi]
 iwlagn_tx_skb+0x8e2/0xcb0 [iwldvm]
 iwlagn_mac_tx+0x18/0x30 [iwldvm]
 ieee80211_handle_wake_tx_queue+0x6c/0xc0 [mac80211]
 ieee80211_agg_start_txq+0x140/0x2e0 [mac80211]
 ieee80211_agg_tx_operational+0x126/0x210 [mac80211]
 ieee80211_process_addba_resp+0x27b/0x2a0 [mac80211]
 ieee80211_iface_work+0x4bd/0x4d0 [mac80211]
 ? _raw_spin_unlock_irq+0x1f/0x40
 cfg80211_wiphy_work+0x117/0x1f0 [cfg80211]
 process_one_work+0x1ee/0x570
 worker_thread+0x1c5/0x3b0
 ? bh_worker+0x240/0x240
 kthread+0x110/0x220
 ? kthread_queue_delayed_work+0x90/0x90
 ret_from_fork+0x28/0x40
 ? kthread_queue_delayed_work+0x90/0x90
 ret_from_fork_asm+0x11/0x20
 </TASK>
Modules linked in: ctr aes_generic ccm sch_fq_codel bnep xt_tcpudp xt_multiport xt_state iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables btusb btrtl btintel btbcm bluetooth ecdh_generic ecc libaes hid_generic usbhid hid binfmt_misc joydev mousedev snd_hda_codec_hdmi iwldvm snd_hda_codec_idt snd_hda_codec_generic mac80211 coretemp iTCO_wdt watchdog kvm_intel i2c_dev snd_hda_intel libarc4 kvm snd_intel_dspcfg sdhci_pci sdhci_uhs2 snd_hda_codec iwlwifi sdhci irqbypass cqhci snd_hwdep snd_hda_core cfg80211 firewire_ohci mmc_core psmouse snd_pcm i2c_i801 firewire_core pcspkr led_class uhci_hcd i2c_smbus tg3 crc_itu_t iosf_mbi snd_timer rfkill libphy ehci_pci snd ehci_hcd lpc_ich mfd_core usbcore video intel_agp usb_common soundcore intel_gtt evdev agpgart parport_pc wmi parport backlight
CR2: ffffc900009f5282
---[ end trace 0000000000000000 ]---
RIP: 0010:iwl_trans_pcie_tx+0x4dd/0xe60 [iwlwifi]
Code: 00 00 66 81 fa ff 0f 0f 87 42 09 00 00 3d ff 00 00 00 0f 8f 37 09 00 00 41 c1 e0 0c 41 09 d0 48 8d 14 b6 48 c1 e2 07 48 01 ca <66> 44 89 04 57 48 8d 0c 12 83 f8 3f 0f 8e 84 01 00 00 41 8b 85 80
RSP: 0018:ffffc900001c3b50 EFLAGS: 00010206
RAX: 00000000000000c1 RBX: ffff88810b180028 RCX: 00000000000000c1
RDX: 0000000000002141 RSI: 000000000000000d RDI: ffffc900009f1000
RBP: 0000000000000002 R08: 0000000000000025 R09: ffffffffa050fa60
R10: 00000000fbdbf4bc R11: 0000000000000082 R12: ffff88810e5ade40
R13: ffff88810af81588 R14: 000000000000001a R15: ffff888100dfe0c8
FS:  0000000000000000(0000) GS:ffff8881998c3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900009f5282 CR3: 0000000001e39000 CR4: 00000000000426f0
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Cc: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Fixes: 6204d51 ("wifi: iwlwifi: use bc entries instead of bc table also for pre-ax210")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20250711205744.28723-1-ville.syrjala@linux.intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
This reverts commit 7796c97.

This patch broke Dragonboard 845c (sdm845). I see:

    Unexpected kernel BRK exception at EL1
    Internal error: BRK handler: 00000000f20003e8 [#1]  SMP
    pc : qcom_swrm_set_channel_map+0x7c/0x80 [soundwire_qcom]
    lr : snd_soc_dai_set_channel_map+0x34/0x78
    Call trace:
     qcom_swrm_set_channel_map+0x7c/0x80 [soundwire_qcom] (P)
     sdm845_dai_init+0x18c/0x2e0 [snd_soc_sdm845]
     snd_soc_link_init+0x28/0x6c
     snd_soc_bind_card+0x5f4/0xb0c
     snd_soc_register_card+0x148/0x1a4
     devm_snd_soc_register_card+0x50/0xb0
     sdm845_snd_platform_probe+0x124/0x148 [snd_soc_sdm845]
     platform_probe+0x6c/0xd0
     really_probe+0xc0/0x2a4
     __driver_probe_device+0x7c/0x130
     driver_probe_device+0x40/0x118
     __device_attach_driver+0xc4/0x108
     bus_for_each_drv+0x8c/0xf0
     __device_attach+0xa4/0x198
     device_initial_probe+0x18/0x28
     bus_probe_device+0xb8/0xbc
     deferred_probe_work_func+0xac/0xfc
     process_one_work+0x244/0x658
     worker_thread+0x1b4/0x360
     kthread+0x148/0x228
     ret_from_fork+0x10/0x20
    Kernel panic - not syncing: BRK handler: Fatal exception

Dan has also reported following issues with the original patch
https://lore.kernel.org/all/33fe8fe7-719a-405a-9ed2-d9f816ce1d57@sabinyo.mountain/

Bug #1:
The zeroeth element of ctrl->pconfig[] is supposed to be unused.  We
start counting at 1.  However this code sets ctrl->pconfig[0].ch_mask = 128.

Bug #2:
There are SLIM_MAX_TX_PORTS (16) elements in tx_ch[] array but only
QCOM_SDW_MAX_PORTS + 1 (15) in the ctrl->pconfig[] array so it corrupts
memory like Yongqin Liu pointed out.

Bug 3:
Like Jie Gan pointed out, it erases all the tx information with the rx
information.

Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Acked-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://lore.kernel.org/r/20250709174949.8541-1-amit.pundir@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
When device reset is triggered by feature changes such as toggling Rx
VLAN offload, wx->do_reset() is called to reinitialize Rx rings. The
hardware descriptor ring may retain stale values from previous sessions.
And only set the length to 0 in rx_desc[0] would result in building
malformed SKBs. Fix it to ensure a clean slate after device reset.

[  549.186435] [     C16] ------------[ cut here ]------------
[  549.186457] [     C16] kernel BUG at net/core/skbuff.c:2814!
[  549.186468] [     C16] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[  549.186472] [     C16] CPU: 16 UID: 0 PID: 0 Comm: swapper/16 Kdump: loaded Not tainted 6.16.0-rc4+ torvalds#23 PREEMPT(voluntary)
[  549.186476] [     C16] Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING PLUS WIFI (MS-7E16), BIOS 1.90 12/31/2024
[  549.186478] [     C16] RIP: 0010:__pskb_pull_tail+0x3ff/0x510
[  549.186484] [     C16] Code: 06 f0 ff 4f 34 74 7b 4d 8b 8c 24 c8 00 00 00 45 8b 84 24 c0 00 00 00 e9 c8 fd ff ff 48 c7 44 24 08 00 00 00 00 e9 5e fe ff ff <0f> 0b 31 c0 e9 23 90 5b ff 41 f7 c6 ff 0f 00 00 75 bf 49 8b 06 a8
[  549.186487] [     C16] RSP: 0018:ffffb391c0640d70 EFLAGS: 00010282
[  549.186490] [     C16] RAX: 00000000fffffff2 RBX: ffff8fe7e4d40200 RCX: 00000000fffffff2
[  549.186492] [     C16] RDX: ffff8fe7c3a4bf8e RSI: 0000000000000180 RDI: ffff8fe7c3a4bf40
[  549.186494] [     C16] RBP: ffffb391c0640da8 R08: ffff8fe7c3a4c0c0 R09: 000000000000000e
[  549.186496] [     C16] R10: ffffb391c0640d88 R11: 000000000000000e R12: ffff8fe7e4d40200
[  549.186497] [     C16] R13: 00000000fffffff2 R14: ffff8fe7fa01a000 R15: 00000000fffffff2
[  549.186499] [     C16] FS:  0000000000000000(0000) GS:ffff8fef5ae40000(0000) knlGS:0000000000000000
[  549.186502] [     C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  549.186503] [     C16] CR2: 00007f77d81d6000 CR3: 000000051a032000 CR4: 0000000000750ef0
[  549.186505] [     C16] PKRU: 55555554
[  549.186507] [     C16] Call Trace:
[  549.186510] [     C16]  <IRQ>
[  549.186513] [     C16]  ? srso_alias_return_thunk+0x5/0xfbef5
[  549.186517] [     C16]  __skb_pad+0xc7/0xf0
[  549.186523] [     C16]  wx_clean_rx_irq+0x355/0x3b0 [libwx]
[  549.186533] [     C16]  wx_poll+0x92/0x120 [libwx]
[  549.186540] [     C16]  __napi_poll+0x28/0x190
[  549.186544] [     C16]  net_rx_action+0x301/0x3f0
[  549.186548] [     C16]  ? srso_alias_return_thunk+0x5/0xfbef5
[  549.186551] [     C16]  ? __raw_spin_lock_irqsave+0x1e/0x50
[  549.186554] [     C16]  ? srso_alias_return_thunk+0x5/0xfbef5
[  549.186557] [     C16]  ? wake_up_nohz_cpu+0x35/0x160
[  549.186559] [     C16]  ? srso_alias_return_thunk+0x5/0xfbef5
[  549.186563] [     C16]  handle_softirqs+0xf9/0x2c0
[  549.186568] [     C16]  __irq_exit_rcu+0xc7/0x130
[  549.186572] [     C16]  common_interrupt+0xb8/0xd0
[  549.186576] [     C16]  </IRQ>
[  549.186577] [     C16]  <TASK>
[  549.186579] [     C16]  asm_common_interrupt+0x22/0x40
[  549.186582] [     C16] RIP: 0010:cpuidle_enter_state+0xc2/0x420
[  549.186585] [     C16] Code: 00 00 e8 11 0e 5e ff e8 ac f0 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 0d ed 5c ff 45 84 ff 0f 85 40 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 84 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[  549.186587] [     C16] RSP: 0018:ffffb391c0277e78 EFLAGS: 00000246
[  549.186590] [     C16] RAX: ffff8fef5ae40000 RBX: 0000000000000003 RCX: 0000000000000000
[  549.186591] [     C16] RDX: 0000007fde0faac5 RSI: ffffffff826e53f6 RDI: ffffffff826fa9b3
[  549.186593] [     C16] RBP: ffff8fe7c3a20800 R08: 0000000000000002 R09: 0000000000000000
[  549.186595] [     C16] R10: 0000000000000000 R11: 000000000000ffff R12: ffffffff82ed7a40
[  549.186596] [     C16] R13: 0000007fde0faac5 R14: 0000000000000003 R15: 0000000000000000
[  549.186601] [     C16]  ? cpuidle_enter_state+0xb3/0x420
[  549.186605] [     C16]  cpuidle_enter+0x29/0x40
[  549.186609] [     C16]  cpuidle_idle_call+0xfd/0x170
[  549.186613] [     C16]  do_idle+0x7a/0xc0
[  549.186616] [     C16]  cpu_startup_entry+0x25/0x30
[  549.186618] [     C16]  start_secondary+0x117/0x140
[  549.186623] [     C16]  common_startup_64+0x13e/0x148
[  549.186628] [     C16]  </TASK>

Fixes: 3c47e8a ("net: libwx: Support to receive packets in NAPI")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250714024755.17512-4-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
… runtime

Assuming the "rx-vlan-filter" feature is enabled on a net device, the
8021q module will automatically add or remove VLAN 0 when the net device
is put administratively up or down, respectively. There are a couple of
problems with the above scheme.

The first problem is a memory leak that can happen if the "rx-vlan-filter"
feature is disabled while the device is running:

 # ip link add bond1 up type bond mode 0
 # ethtool -K bond1 rx-vlan-filter off
 # ip link del dev bond1

When the device is put administratively down the "rx-vlan-filter"
feature is disabled, so the 8021q module will not remove VLAN 0 and the
memory will be leaked [1].

Another problem that can happen is that the kernel can automatically
delete VLAN 0 when the device is put administratively down despite not
adding it when the device was put administratively up since during that
time the "rx-vlan-filter" feature was disabled. null-ptr-unref or
bug_on[2] will be triggered by unregister_vlan_dev() for refcount
imbalance if toggling filtering during runtime:

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ip link del vlan0

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
        ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //vlan_info is null

Fix both problems by noting in the VLAN info whether VLAN 0 was
automatically added upon NETDEV_UP and based on that decide whether it
should be deleted upon NETDEV_DOWN, regardless of the state of the
"rx-vlan-filter" feature.

[1]
unreferenced object 0xffff8880068e3100 (size 256):
  comm "ip", pid 384, jiffies 4296130254
  hex dump (first 32 bytes):
    00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00  . 0.............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 81ce31fa):
    __kmalloc_cache_noprof+0x2b5/0x340
    vlan_vid_add+0x434/0x940
    vlan_device_event.cold+0x75/0xa8
    notifier_call_chain+0xca/0x150
    __dev_notify_flags+0xe3/0x250
    rtnl_configure_link+0x193/0x260
    rtnl_newlink_create+0x383/0x8e0
    __rtnl_newlink+0x22c/0xa40
    rtnl_newlink+0x627/0xb00
    rtnetlink_rcv_msg+0x6fb/0xb70
    netlink_rcv_skb+0x11f/0x350
    netlink_unicast+0x426/0x710
    netlink_sendmsg+0x75a/0xc20
    __sock_sendmsg+0xc1/0x150
    ____sys_sendmsg+0x5aa/0x7b0
    ___sys_sendmsg+0xfc/0x180

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
We have observed kernel panics when using timerlat with stack saving,
with the following dmesg output:

memcpy: detected buffer overflow: 88 byte write of buffer size 0
WARNING: CPU: 2 PID: 8153 at lib/string_helpers.c:1032 __fortify_report+0x55/0xa0
CPU: 2 UID: 0 PID: 8153 Comm: timerlatu/2 Kdump: loaded Not tainted 6.15.3-200.fc42.x86_64 #1 PREEMPT(lazy)
Call Trace:
 <TASK>
 ? trace_buffer_lock_reserve+0x2a/0x60
 __fortify_panic+0xd/0xf
 __timerlat_dump_stack.cold+0xd/0xd
 timerlat_dump_stack.part.0+0x47/0x80
 timerlat_fd_read+0x36d/0x390
 vfs_read+0xe2/0x390
 ? syscall_exit_to_user_mode+0x1d5/0x210
 ksys_read+0x73/0xe0
 do_syscall_64+0x7b/0x160
 ? exc_page_fault+0x7e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

__timerlat_dump_stack() constructs the ftrace stack entry like this:

struct stack_entry *entry;
...
memcpy(&entry->caller, fstack->calls, size);
entry->size = fstack->nr_entries;

Since commit e7186af ("tracing: Add back FORTIFY_SOURCE logic to
kernel_stack event structure"), struct stack_entry marks its caller
field with __counted_by(size). At the time of the memcpy, entry->size
contains garbage from the ringbuffer, which under some circumstances is
zero, triggering a kernel panic by buffer overflow.

Populate the size field before the memcpy so that the out-of-bounds
check knows the correct size. This is analogous to
__ftrace_trace_stack().

Cc: stable@vger.kernel.org
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Attila Fazekas <afazekas@redhat.com>
Link: https://lore.kernel.org/20250716143601.7313-1-tglozar@redhat.com
Fixes: e7186af ("tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
Sometimes the netdev triggers causes tasks to get blocked for more then
120 seconds, which in turn makes the (WAN) network port on the NanoPi
R5S fail to come up.
This results in the following (partial) trace:

  INFO: task kworker/0:1:11 blocked for more than 120 seconds.
        Not tainted 6.16-rc6+unreleased-arm64-cknow #1 Debian 6.16~rc6-1~exp1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kworker/0:1     state:D stack:0     pid:11    tgid:11    ppid:2      task_flags:0x4208060 flags:0x00000010
  Workqueue: events_power_efficient reg_check_chans_work [cfg80211]
  Call trace:
   __switch_to+0xf8/0x168 (T)
   __schedule+0x3f8/0xda8
   schedule+0x3c/0x120
   schedule_preempt_disabled+0x2c/0x58
   __mutex_lock.constprop.0+0x4d0/0xab8
   __mutex_lock_slowpath+0x1c/0x30
   mutex_lock+0x50/0x68
   rtnl_lock+0x20/0x38
   reg_check_chans_work+0x40/0x478 [cfg80211]
   process_one_work+0x178/0x3e0
   worker_thread+0x260/0x390
   kthread+0x150/0x250
   ret_from_fork+0x10/0x20
  INFO: task kworker/0:1:11 is blocked on a mutex likely owned by task dhcpcd:615.
  task:dhcpcd          state:D stack:0     pid:615   tgid:615   ppid:614    task_flags:0x400140 flags:0x00000018
  Call trace:
   __switch_to+0xf8/0x168 (T)
   __schedule+0x3f8/0xda8
   schedule+0x3c/0x120
   schedule_preempt_disabled+0x2c/0x58
   rwsem_down_write_slowpath+0x1e4/0x750
   down_write+0x98/0xb0
   led_trigger_register+0x134/0x1c0
   phy_led_triggers_register+0xf4/0x258 [libphy]
   phy_attach_direct+0x30c/0x390 [libphy]
   phylink_fwnode_phy_connect+0xb0/0x138 [phylink]
   __stmmac_open+0xec/0x520 [stmmac]
   stmmac_open+0x4c/0xe8 [stmmac]
   __dev_open+0x130/0x2e0
   __dev_change_flags+0x1c4/0x248
   netif_change_flags+0x2c/0x80
   dev_change_flags+0x88/0xc8
   devinet_ioctl+0x35c/0x610
   inet_ioctl+0x204/0x260
   sock_do_ioctl+0x6c/0x140
   sock_ioctl+0x2e4/0x388
   __arm64_sys_ioctl+0xb4/0x120
   invoke_syscall+0x6c/0x100
   el0_svc_common.constprop.0+0x48/0xf0
   do_el0_svc+0x24/0x38
   el0_svc+0x3c/0x188
   el0t_64_sync_handler+0x10c/0x140
   el0t_64_sync+0x198/0x1a0

In order to not introduce a regression with kernel 6.16, drop the netdev
triggers for now while the problem is being investigated further.

Fixes: 1631cbd ("arm64: dts: rockchip: Improve LED config for NanoPi R5S")
Helped-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
Link: https://lore.kernel.org/r/20250722123628.25660-1-didi.debian@cknow.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
mdroth pushed a commit that referenced this pull request Oct 10, 2025
syzbot reported a bug in in afs_put_vlserverlist.

  kAFS: bad VL server IP address
  BUG: unable to handle page fault for address: fffffffffffffffa
  ...
  Oops: Oops: 0002 [#1] SMP KASAN PTI
  ...
  RIP: 0010:refcount_dec_and_test include/linux/refcount.h:450 [inline]
  RIP: 0010:afs_put_vlserverlist+0x3a/0x220 fs/afs/vl_list.c:67
  ...
  Call Trace:
   <TASK>
   afs_alloc_cell fs/afs/cell.c:218 [inline]
   afs_lookup_cell+0x12a5/0x1680 fs/afs/cell.c:264
   afs_cell_init+0x17a/0x380 fs/afs/cell.c:386
   afs_proc_rootcell_write+0x21f/0x290 fs/afs/proc.c:247
   proc_simple_write+0x114/0x1b0 fs/proc/generic.c:825
   pde_write fs/proc/inode.c:330 [inline]
   proc_reg_write+0x23d/0x330 fs/proc/inode.c:342
   vfs_write+0x25c/0x1180 fs/read_write.c:682
   ksys_write+0x12a/0x240 fs/read_write.c:736
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

Because afs_parse_text_addrs() parses incorrectly, its return value -EINVAL
is assigned to vllist, which results in -EINVAL being used as the vllist
address when afs_put_vlserverlist() is executed.

Set the vllist value to NULL when a parsing error occurs to avoid this
issue.

Fixes: e2c2cb8 ("afs: Simplify cell record handling")
Reported-by: syzbot+5c042fbab0b292c98fc6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5c042fbab0b292c98fc6
Tested-by: syzbot+5c042fbab0b292c98fc6@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/4119365.1753108011@warthog.procyon.org.uk
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant