@roxanan1996

DESCRIPTION

Commit

RDMA/iwcm: Fix use-after-free of work objects after cm_id destruction

is the CVE fix, but it needed two prerequisite commits in order to apply cleanly.
Those two turned out to be CVE fixes in their own right:

RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency

for CVE-2024-47696

and

RDMA/iwcm: Fix a use-after-free related to destroying CM IDs

for CVE-2024-42285

COMMITS

RDMA/iwcm: Fix a use-after-free related to destroying CM IDs

jira VULN-38770
cve CVE-2024-42285
commit-author Bart Van Assche <bvanassche@acm.org>
commit aee2424246f9f1dadc33faa78990c1e2eb7826e4

RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency

jira VULN-45118
cve CVE-2024-47696
commit-author Bart Van Assche <bvanassche@acm.org>
commit 86dfdd8288907f03c18b7fb462e0e232c4f98d89

RDMA/iwcm: Fix use-after-free of work objects after cm_id destruction

jira VULN-72086
cve CVE-2025-38211
commit-author Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
commit 6883b680e703c6b2efddb4e7a8d891ce1803d06b

TESTING

BUILD

> grep -E -B 5 -A 5 '\[TIMER\]|^Starting Build' /home/rnicolescu/ciq/kernels/lts-9.4/kernel-build-after.log
  CLEAN   scripts/mod
  CLEAN   scripts/selinux/genheaders
  CLEAN   scripts/selinux/mdp
  CLEAN   scripts
  CLEAN   include/config include/generated arch/x86/include/generated .config .config.old certs/signing_key.pem certs/signing_key.x509 certs/x509.genkey
[TIMER]{MRPROPER}: 4s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rnicolescu_ciqlts9_4-4448675e90bfa"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
--
  BTF [M] sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  virt/lib/irqbypass.ko
  BTF [M] sound/virtio/virtio_snd.ko
  BTF [M] sound/xen/snd_xen_front.ko
  BTF [M] virt/lib/irqbypass.ko
[TIMER]{BUILD}: 2759s
Making Modules
  INSTALL /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  INSTALL /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+/kernel/arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+/kernel/arch/x86/crypto/camellia-x86_64.ko
--
  SIGN    /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+/kernel/virt/lib/irqbypass.ko
  SIGN    /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+/kernel/sound/usb/usx2y/snd-usb-usx2y.ko
  STRIP   /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
  SIGN    /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
  DEPMOD  /lib/modules/5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+
[TIMER]{MODULES}: 11s
Making Install
sh ./arch/x86/boot/install.sh 5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+ \
	arch/x86/boot/bzImage System.map "/boot"
sed: can't read /boot/.vmlinuz-5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+.hmac: No such file or directory
Can't create '/boot/.vmlinuz-0-rescue-2dc3542e4cb84694be2eb7e84d00575c.hmac' from '/boot/.vmlinuz-5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+.hmac'!
[TIMER]{INSTALL}: 62s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+ and Index to 0
The default is /boot/loader/entries/2dc3542e4cb84694be2eb7e84d00575c-5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+.conf with index 0 and kernel /boot/vmlinuz-5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+
The default is /boot/loader/entries/2dc3542e4cb84694be2eb7e84d00575c-5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+.conf with index 0 and kernel /boot/vmlinuz-5.14.0-rnicolescu_ciqlts9_4-4448675e90bfa+
Generating grub configuration file ...
Adding boot menu entry for UEFI Firmware Settings ...
done
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 4s
[TIMER]{BUILD}: 2759s
[TIMER]{MODULES}: 11s
[TIMER]{INSTALL}: 62s
[TIMER]{TOTAL} 2844s
Rebooting in 10 seconds

kernel-build-before.log
kernel-build-after.log

Kselftests

> /home/rnicolescu/ciq/kernel-tools/kselftest-diff.sh /home/rnicolescu/ciq/kernels/lts-9.4
/home/rnicolescu/ciq/kernels/lts-9.4/kselftest-before.log
368
/home/rnicolescu/ciq/kernels/lts-9.4/kselftest-after.log
367
Before: /home/rnicolescu/ciq/kernels/lts-9.4/kselftest-before.log
After: /home/rnicolescu/ciq/kernels/lts-9.4/kselftest-after.log
Diff:
-ok 6 selftests: net: tls

kselftest-before.log
kselftest-after.log
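
The comparison above can be approximated with a short script. This is only a sketch: the real `kselftest-diff.sh` is not shown in this PR, so the function names (`passing_tests`, `lost_passes`) and the assumption that it compares top-level `ok N selftests: ...` result lines are mine, not the tool's.

```python
import re

# Matches top-level TAP result lines such as "ok 6 selftests: net: tls".
# Nested subtest lines start with "#", so they do not match.
PASS_RE = re.compile(r"^ok \d+ (selftests: .+?)\s*$")


def passing_tests(log_text):
    """Return the set of test names reported as 'ok' in a kselftest log."""
    names = set()
    for line in log_text.splitlines():
        match = PASS_RE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def lost_passes(before_text, after_text):
    """Tests reported as passing before the change but not after it."""
    return sorted(passing_tests(before_text) - passing_tests(after_text))
```

Feeding it the contents of `kselftest-before.log` and `kselftest-after.log` would list the tests that stopped being reported as `ok`. It does not say why: the test may have failed, been skipped, or not run at all.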

Check_kernel_commits

> python3 /home/rnicolescu/ciq/kernel-src-tree-tools/check_kernel_commits.py --repo /home/rnicolescu/ciq/kernels/lts-9.4/kernel-src-tree --pr_branch {rnicolescu}_ciqlts9_4 --base_branch origin/ciqlts9_4 --check-cves
All referenced commits exist upstream and have no Fixes: tags.

Run interdiff

> python3 /home/rnicolescu/ciq/kernel-src-tree-tools/run_interdiff.py --repo /home/rnicolescu/ciq/kernels/lts-9.4/kernel-src-tree --pr_branch {rnicolescu}_ciqlts9_4 --base_branch origin/ciqlts9_4
All backported commits match their upstream counterparts.

jira VULN-38770
cve CVE-2024-42285
commit-author Bart Van Assche <bvanassche@acm.org>
commit aee2424

iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with
an existing struct iw_cm_id (cm_id) as follows:

        conn_id->cm_id.iw = cm_id;
        cm_id->context = conn_id;
        cm_id->cm_handler = cma_iw_handler;

rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
sure that cm_work_handler() does not trigger a use-after-free by only
freeing of the struct rdma_id_private after all pending work has finished.

	Cc: stable@vger.kernel.org
Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref")
	Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
	Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
	Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240605145117.397751-6-bvanassche@acm.org
	Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit aee2424)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>

jira VULN-45118
cve CVE-2024-47696
commit-author Bart Van Assche <bvanassche@acm.org>
commit 86dfdd8

In the commit aee2424 ("RDMA/iwcm: Fix a use-after-free related to
destroying CM IDs"), the function flush_workqueue is invoked to flush the
work queue iwcm_wq.

But at that time, the work queue iwcm_wq was created via the function
alloc_ordered_workqueue without the flag WQ_MEM_RECLAIM.

Because the current process is trying to flush the whole iwcm_wq, if
iwcm_wq doesn't have the flag WQ_MEM_RECLAIM, verify that the current
process is not reclaiming memory or running on a workqueue which doesn't
have the flag WQ_MEM_RECLAIM as that can break forward-progress guarantee
leading to a deadlock.

The call trace is as below:

[  125.350876][ T1430] Call Trace:
[  125.356281][ T1430]  <TASK>
[ 125.361285][ T1430] ? __warn (kernel/panic.c:693)
[ 125.367640][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.375689][ T1430] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 125.382505][ T1430] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 125.388987][ T1430] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
[ 125.395831][ T1430] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 125.403125][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.410984][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.418764][ T1430] __flush_workqueue (kernel/workqueue.c:3970)
[ 125.426021][ T1430] ? __pfx___might_resched (kernel/sched/core.c:10151)
[ 125.433431][ T1430] ? destroy_cm_id (drivers/infiniband/core/iwcm.c:375) iw_cm
[ 125.441209][ T1430] ? __pfx___flush_workqueue (kernel/workqueue.c:3910)
[ 125.473900][ T1430] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 125.473909][ T1430] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161)
[ 125.482537][ T1430] _destroy_id (drivers/infiniband/core/cma.c:2044) rdma_cm
[ 125.495072][ T1430] nvme_rdma_free_queue (drivers/nvme/host/rdma.c:656 drivers/nvme/host/rdma.c:650) nvme_rdma
[ 125.505827][ T1430] nvme_rdma_reset_ctrl_work (drivers/nvme/host/rdma.c:2180) nvme_rdma
[ 125.505831][ T1430] process_one_work (kernel/workqueue.c:3231)
[ 125.515122][ T1430] worker_thread (kernel/workqueue.c:3306 kernel/workqueue.c:3393)
[ 125.515127][ T1430] ? __pfx_worker_thread (kernel/workqueue.c:3339)
[ 125.531837][ T1430] kthread (kernel/kthread.c:389)
[ 125.539864][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.550628][ T1430] ret_from_fork (arch/x86/kernel/process.c:147)
[ 125.558840][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.558844][ T1430] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[  125.566487][ T1430]  </TASK>
[  125.566488][ T1430] ---[ end trace 0000000000000000 ]---
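
The rule behind this warning can be modelled in a few lines of userspace Python. This is a simplified sketch, not the kernel code: `Workqueue` and `flush_would_warn` are hypothetical names, and the model leaves out the separate check for a task that is itself reclaiming memory. As I understand the kernel check, the warning fires when the worker doing the flush belongs to a workqueue with WQ_MEM_RECLAIM while the flushed workqueue lacks it. The caller's workqueue in the example is assumed to have WQ_MEM_RECLAIM set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workqueue:
    name: str
    mem_reclaim: bool  # models the WQ_MEM_RECLAIM flag


def flush_would_warn(target: Workqueue, current: Optional[Workqueue]) -> bool:
    """True if flushing `target` from a worker of `current` breaks the rule.

    `current` is None when the caller is not a workqueue worker.
    """
    if target.mem_reclaim:
        # The target can always make forward progress, so waiting is safe.
        return False
    # A reclaim-safe worker must not wait on a workqueue without that guarantee.
    return current is not None and current.mem_reclaim


# Hypothetical caller workqueue, assumed to be created with WQ_MEM_RECLAIM.
caller = Workqueue("reclaim-safe-caller", mem_reclaim=True)
iwcm_without_flag = Workqueue("iw_cm_wq", mem_reclaim=False)
iwcm_with_flag = Workqueue("iw_cm_wq", mem_reclaim=True)

print(flush_would_warn(iwcm_without_flag, caller))
print(flush_would_warn(iwcm_with_flag, caller))
```

In this model, setting WQ_MEM_RECLAIM on the flushed workqueue is enough to make the warning go away, which matches the direction of the fix described in this commit.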

Fixes: aee2424 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs")
Link: https://patch.msgid.link/r/20240820113336.19860-1-yanjun.zhu@linux.dev
	Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com
	Tested-by: kernel test robot <oliver.sang@intel.com>
	Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
	Reviewed-by: Bart Van Assche <bvanassche@acm.org>
	Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(cherry picked from commit 86dfdd8)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>

jira VULN-72086
cve CVE-2025-38211
commit-author Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
commit 6883b68

The commit 59c68ac ("iw_cm: free cm_id resources on the last
deref") simplified cm_id resource management by freeing cm_id once all
references to the cm_id were removed. The references are removed either
upon completion of iw_cm event handlers or when the application destroys
the cm_id. This commit introduced the use-after-free condition where
cm_id_private object could still be in use by event handler works during
the destruction of cm_id. The commit aee2424 ("RDMA/iwcm: Fix a
use-after-free related to destroying CM IDs") addressed this use-after-
free by flushing all pending works at the cm_id destruction.

However, still another use-after-free possibility remained. It happens
with the work objects allocated for each cm_id_priv within
alloc_work_entries() during cm_id creation, and subsequently freed in
dealloc_work_entries() once all references to the cm_id are removed.
If the cm_id's last reference is decremented in the event handler work,
the work object for the work itself gets removed, and causes the use-
after-free BUG below:

  BUG: KASAN: slab-use-after-free in __pwq_activate_work+0x1ff/0x250
  Read of size 8 at addr ffff88811f9cf800 by task kworker/u16:1/147091

  CPU: 2 UID: 0 PID: 147091 Comm: kworker/u16:1 Not tainted 6.15.0-rc2+ #27 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
  Workqueue:  0x0 (iw_cm_wq)
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6a/0x90
   print_report+0x174/0x554
   ? __virt_addr_valid+0x208/0x430
   ? __pwq_activate_work+0x1ff/0x250
   kasan_report+0xae/0x170
   ? __pwq_activate_work+0x1ff/0x250
   __pwq_activate_work+0x1ff/0x250
   pwq_dec_nr_in_flight+0x8c5/0xfb0
   process_one_work+0xc11/0x1460
   ? __pfx_process_one_work+0x10/0x10
   ? assign_work+0x16c/0x240
   worker_thread+0x5ef/0xfd0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x3b0/0x770
   ? __pfx_kthread+0x10/0x10
   ? rcu_is_watching+0x11/0xb0
   ? _raw_spin_unlock_irq+0x24/0x50
   ? rcu_is_watching+0x11/0xb0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x70
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

  Allocated by task 147416:
   kasan_save_stack+0x2c/0x50
   kasan_save_track+0x10/0x30
   __kasan_kmalloc+0xa6/0xb0
   alloc_work_entries+0xa9/0x260 [iw_cm]
   iw_cm_connect+0x23/0x4a0 [iw_cm]
   rdma_connect_locked+0xbfd/0x1920 [rdma_cm]
   nvme_rdma_cm_handler+0x8e5/0x1b60 [nvme_rdma]
   cma_cm_event_handler+0xae/0x320 [rdma_cm]
   cma_work_handler+0x106/0x1b0 [rdma_cm]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

  Freed by task 147091:
   kasan_save_stack+0x2c/0x50
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x60
   __kasan_slab_free+0x4b/0x70
   kfree+0x13a/0x4b0
   dealloc_work_entries+0x125/0x1f0 [iw_cm]
   iwcm_deref_id+0x6f/0xa0 [iw_cm]
   cm_work_handler+0x136/0x1ba0 [iw_cm]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

  Last potentially related work creation:
   kasan_save_stack+0x2c/0x50
   kasan_record_aux_stack+0xa3/0xb0
   __queue_work+0x2ff/0x1390
   queue_work_on+0x67/0xc0
   cm_event_handler+0x46a/0x820 [iw_cm]
   siw_cm_upcall+0x330/0x650 [siw]
   siw_cm_work_handler+0x6b9/0x2b20 [siw]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

This BUG is reproducible by repeating the blktests test case nvme/061
for the rdma transport and the siw driver.

To avoid the use-after-free of cm_id_private work objects, ensure that
the last reference to the cm_id is decremented not in the event handler
works, but in the cm_id destruction context. For that purpose, move
iwcm_deref_id() call from destroy_cm_id() to the callers of
destroy_cm_id(). In iw_destroy_cm_id(), call iwcm_deref_id() after
flushing the pending works.
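
The ordering argument above can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the driver code: `CmIdModel` and both `destroy_*` functions are hypothetical, each queued event is assumed to hold one reference that its handler drops, and the "deref first" path is a simplified reading of the old ordering.

```python
class CmIdModel:
    """Toy model of a cm_id whose work objects are freed on the last deref."""

    def __init__(self):
        self.refcount = 1      # reference owned by the creator of the cm_id
        self.pending = 0       # queued event handler works
        self.freed_in = None   # context that dropped the last reference

    def queue_event(self):
        # Assumption: queuing an event work takes a reference on the cm_id.
        self.refcount += 1
        self.pending += 1

    def deref(self, context):
        self.refcount -= 1
        if self.refcount == 0:
            # Models dealloc_work_entries() running in this context.
            self.freed_in = context

    def flush(self):
        # Run every pending work; each handler drops its own reference.
        while self.pending:
            self.pending -= 1
            self.deref("event handler work")


def destroy_deref_then_flush(cm_id):
    """Old ordering (simplified): the owner's reference is dropped first."""
    cm_id.deref("destruction context")
    cm_id.flush()
    return cm_id.freed_in


def destroy_flush_then_deref(cm_id):
    """Fixed ordering: flush pending works, then drop the last reference."""
    cm_id.flush()
    cm_id.deref("destruction context")
    return cm_id.freed_in
```

With one event pending, the first function reports that the work objects were freed from inside the event handler work, which is the situation where a running work frees its own work object. The second reports the destruction context, so no work is running when the objects are freed.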

During the fix work, I noticed that iw_destroy_cm_id() is called from
cm_work_handler() and process_event() context. However, the comment of
iw_destroy_cm_id() notes that the function "cannot be called by the
event thread". Drop the false comment.

Closes: https://lore.kernel.org/linux-rdma/r5676e754sv35aq7cdsqrlnvyhiq5zktteaurl7vmfih35efko@z6lay7uypy3c/
Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref")
	Cc: stable@vger.kernel.org
	Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://patch.msgid.link/20250510101036.1756439-1-shinichiro.kawasaki@wdc.com
	Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
	Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit 6883b68)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>

@roxanan1996 self-assigned this Nov 17, 2025
@roxanan1996 requested a review from a team November 17, 2025 13:47

@bmastbergen (Collaborator) left a comment

🥌

Don't forget to log some time:

brett@iconium ~/ciq/kernel-src-tree-tools
 % python3 ./jira_pr_check.py --kernel-src-tree ~/ciq/kernel-src-tree --merge-target ciqlts9_4 --pr-branch {rnicolescu}_ciqlts9_4 --jira-url https://ciqinc.atlassian.net --jira-user bmastbergen@ciq.com --jira-key 

## JIRA PR Check Results

**3 commit(s) with issues found:**

### Commit `4448675e90bf`
**Summary:** RDMA/iwcm: Fix use-after-free of work objects after cm_id destruction

**❌ Errors:**
- **VULN-72086**: Status is 'To Do', expected 'In Progress'

**⚠ Warnings:**
- **VULN-72086**: No time logged - please log time manually

### Commit `261d05cb062c`
**Summary:** RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency

**❌ Errors:**
- **VULN-45118**: Status is 'To Do', expected 'In Progress'

**⚠ Warnings:**
- **VULN-45118**: No time logged - please log time manually

### Commit `30dd38891e5b`
**Summary:** RDMA/iwcm: Fix a use-after-free related to destroying CM IDs

**❌ Errors:**
- **VULN-38770**: Status is 'To Do', expected 'In Progress'

**⚠ Warnings:**
- **VULN-38770**: No time logged - please log time manually


---
**Summary:** Checked 3 commit(s) total.

@roxanan1996

Yeah, that's for the extra CVEs. I'll add this check to my pull request script.

@roxanan1996

❯ python3 ~/ciq/kernel-src-tree-tools/jira_pr_check.py --kernel-src-tree .  --merge-target ciqlts9_4 --pr-branch {rnicolescu}_ciqlts9_4

## JIRA PR Check Results

✅ **No issues found!**


---
**Summary:** Checked 3 commit(s) total.

@PlaidCat (Collaborator) left a comment

:shipit:

@roxanan1996 roxanan1996 merged commit f4170e4 into ciqlts9_4 Nov 18, 2025
5 checks passed