Skip to content

crypto: hisilicon/zip: fix memory leak in hisi_zip_create_req_q()#2

Closed
IAMHCHCH wants to merge 9 commits into
masterfrom
fix-req-q-memory-leak-2026
Closed

crypto: hisilicon/zip: fix memory leak in hisi_zip_create_req_q()#2
IAMHCHCH wants to merge 9 commits into
masterfrom
fix-req-q-memory-leak-2026

Conversation

@IAMHCHCH
Copy link
Copy Markdown
Owner

@IAMHCHCH IAMHCHCH commented Apr 4, 2026

Problem

In hisi_zip_create_req_q(), when the DECOMP (i=1) request array allocation via kzalloc_objs() fails after the COMP (i=0) allocations already succeeded, the error path jumps to err_free_comp_q.

This label frees the COMP q array (kfree) but does NOT free the COMP req_bitmap, causing a memory leak of one bitmap per failed ctx init.

Root Cause Analysis

i=0 (COMP): bitmap_zalloc OK, kzalloc_objs OK
i=1 (DECOMP): bitmap_zalloc OK, kzalloc_objs FAIL

goto err_free_decomp_bitmap  → frees DECOMP bitmap
goto err_free_comp_q         → frees COMP q array
                                  COMP bitmap is LEAKED ⚠️
goto err_free_comp_bitmap    → never reached

Fix

Add bitmap_free() for COMP req_bitmap in err_free_comp_q label, paired with the existing kfree():

err_free_comp_q:
    kfree(ctx->qp_ctx[HZIP_QPC_COMP].req_q.q);
    bitmap_free(ctx->qp_ctx[HZIP_QPC_COMP].req_q.req_bitmap);  // ← added
err_free_comp_bitmap:
    bitmap_free(ctx->qp_ctx[HZIP_QPC_COMP].req_q.req_bitmap);

Scope

  • Single file modified: drivers/crypto/hisilicon/zip/zip_crypto.c
  • Single line added: +1
  • No functional change to the normal (non-error) code path
  • Triggers only on allocation failure during ctx init

Testing

Please verify:

  1. Confirm the error path analysis is correct
  2. If possible, test with injected allocation failure for DECOMP

Reviewed-by: (pending)

Eric Biggers and others added 9 commits March 26, 2026 18:10
The tegra crypto driver failed to set the CRYPTO_ALG_ASYNC on its
asynchronous algorithms, causing the crypto API to select them for users
that request only synchronous algorithms.  This causes crashes (at
least).  Fix this by adding the flag like what the other drivers do.
Also remove the unnecessary CRYPTO_ALG_TYPE_* flags, since those just
get ignored and overridden by the registration function anyway.

Reported-by: Zorro Lang <zlang@redhat.com>
Closes: https://lore.kernel.org/r/20260314080937.pghb4aa7d4je3mhh@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com
Fixes: 0880bb3 ("crypto: tegra - Add Tegra Security Engine driver")
Cc: stable@vger.kernel.org
Cc: Akhil R <akhilrajeev@nvidia.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When a key longer than block size is supplied, it is copied and then
hashed into the real key.  The memory allocated for the copy needs to
be rounded to DMA cache alignment, as otherwise the hashed key may
corrupt neighbouring memory.

The rounding was performed, but never actually used for the allocation.
Fix this by replacing kmemdup with kmalloc for a larger buffer,
followed by memcpy.

Fixes: 199354d ("crypto: caam - Remove GFP_DMA and add DMA alignment padding")
Reported-by: Paul Bunyan <pbunyan@redhat.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When a key longer than block size is supplied, it is copied and then
hashed into the real key.  The memory allocated for the copy needs to
be rounded to DMA cache alignment, as otherwise the hashed key may
corrupt neighbouring memory.

The copying is performed using kmemdup, however this leads to an overflow:
reading more bytes (aligned_len - keylen) from the keylen source buffer.
Fix this by replacing kmemdup with kmalloc, followed by memcpy.

Fixes: 199354d ("crypto: caam - Remove GFP_DMA and add DMA alignment padding")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The code in deflate_decompress_one may erroneously return -ENOSPC even if
it didn't run out of output space. The error happens under this
condition:

- Suppose that there are two input pages, the compressed data fits into
  the first page and the zlib checksum is placed in the second page.

- The code iterates over the first page, decompresses the data and fully
  fills the destination buffer, zlib_inflate returns Z_OK becuse zlib
  hasn't seen the checksum yet.

- The outer do-while loop is iterated again, acomp_walk_next_src sets the
  input parameters to the second page containing the checksum.

- We go into the inner do-while loop, execute "dcur =
  acomp_walk_next_dst(&walk);". "dcur" is zero, so we break out of the
  loop and return -ENOSPC, despite the fact that the decompressed data
  fit into the destination buffer.

In order to fix this bug, this commit changes the logic when to report
the -ENOSPC error. We report the error if the destination buffer is empty
*and* if zlib_inflate didn't make any progress consuming the input
buffer. If zlib_inflate consumes the trailing checksum, we see that it
made progress and we will not return -ENOSPC.

Fixes: 08cabc7 ("crypto: deflate - Convert to acomp")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AF_ALG interface fails to unmark the end of a Scatter/Gather List (SGL)
when chaining a new af_alg_tsgl structure. If a sendmsg() fills an SGL
exactly to MAX_SGL_ENTS, the last entry is marked as the end. A subsequent
sendmsg() allocates a new SGL and chains it, but fails to clear the end
marker on the previous SGL's last data entry.

This causes the crypto scatterwalk to hit a premature end, returning NULL
on sg_next() and leading to a kernel panic during dereference.

Fix this by explicitly unmarking the end of the previous SGL when
performing sg_chain() in af_alg_alloc_tsgl().

Fixes: 8ff5909 ("crypto: algif_skcipher - User-space interface for skcipher operations")
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This mostly reverts commit 72548b0 except for the copying of
the associated data.

There is no benefit in operating in-place in algif_aead since the
source and destination come from different mappings.  Get rid of
all the complexity added for in-place operation and just copy the
AD directly.

Fixes: 72548b0 ("crypto: algif_aead - copy AAD from src to dst")
Reported-by: Taeyang Lee <0wn@theori.io>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
…e decryption

When decrypting data that is not in-place (src != dst), there is
no need to save the high-order sequence bits in dst as it could
simply be re-copied from the source.

However, the data to be hashed need to be rearranged accordingly.

Reported-by: Taeyang Lee <0wn@theori.io>
Fixes: 104880a ("crypto: authencesn - Convert to new AEAD interface")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Make af_alg_get_rsgl() limit each RX scatterlist extraction to the
remaining receive buffer budget.

af_alg_get_rsgl() currently uses af_alg_readable() only as a gate
before extracting data into the RX scatterlist. Limit each extraction
to the remaining af_alg_rcvbuf(sk) budget so that receive-side
accounting matches the amount of data attached to the request.

If skcipher cannot obtain enough RX space for at least one chunk while
more data remains to be processed, reject the recvmsg call instead of
rounding the request length down to zero.

Fixes: e870456 ("crypto: algif_skcipher - overhaul memory management")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Douya Le <ldy3087146292@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When DECOMP request queue allocation fails (i=1), the error path
err_free_comp_q only frees the COMP q array but not the COMP req_bitmap,
causing a memory leak of one bitmap per failed ctx init.

Fix by adding bitmap_free() for COMP req_q in err_free_comp_q label,
matching the paired kfree() already present there.

Fixes: a memory leak when kzalloc_objs() fails for DECOMP (i=1)
after COMP (i=0) allocations succeeded.

Signed-off-by: Huang Chenghai <510725557@qq.com>
@IAMHCHCH IAMHCHCH closed this Apr 6, 2026
IAMHCHCH pushed a commit that referenced this pull request Apr 7, 2026
The devm_free_irq() and devm_request_irq() functions should not be
executed in an atomic context.

During device suspend, all userspace processes and most kernel threads
are frozen. Additionally, we flush all tx/rx status, disable all macb
interrupts, and halt rx operations. Therefore, it is safe to split the
region protected by bp->lock into two independent sections, allowing
devm_free_irq() and devm_request_irq() to run in a non-atomic context.
This modification resolves the following lockdep warning:
  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 501, name: rtcwake
  preempt_count: 1, expected: 0
  RCU nest depth: 1, expected: 0
  7 locks held by rtcwake/501:
   #0: ffff0008038c3408 (sb_writers#5){.+.+}-{0:0}, at: vfs_write+0xf8/0x368
   #1: ffff0008049a5e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xbc/0x1c8
   #2: ffff00080098d588 (kn->active#70){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xcc/0x1c8
   #3: ffff800081c84888 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0x1ec/0x290
   #4: ffff0008009ba0f8 (&dev->mutex){....}-{4:4}, at: device_suspend+0x118/0x4f0
   #5: ffff800081d00458 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
   #6: ffff0008031fb9e0 (&bp->lock){-.-.}-{3:3}, at: macb_suspend+0x144/0x558
  irq event stamp: 8682
  hardirqs last  enabled at (8681): [<ffff8000813c7d7c>] _raw_spin_unlock_irqrestore+0x44/0x88
  hardirqs last disabled at (8682): [<ffff8000813c7b58>] _raw_spin_lock_irqsave+0x38/0x98
  softirqs last  enabled at (7322): [<ffff8000800f1b4c>] handle_softirqs+0x52c/0x588
  softirqs last disabled at (7317): [<ffff800080010310>] __do_softirq+0x20/0x2c
  CPU: 1 UID: 0 PID: 501 Comm: rtcwake Not tainted 7.0.0-rc3-next-20260310-yocto-standard+ #125 PREEMPT
  Hardware name: ZynqMP ZCU102 Rev1.1 (DT)
  Call trace:
   show_stack+0x24/0x38 (C)
   __dump_stack+0x28/0x38
   dump_stack_lvl+0x64/0x88
   dump_stack+0x18/0x24
   __might_resched+0x200/0x218
   __might_sleep+0x38/0x98
   __mutex_lock_common+0x7c/0x1378
   mutex_lock_nested+0x38/0x50
   free_irq+0x68/0x2b0
   devm_irq_release+0x24/0x38
   devres_release+0x40/0x80
   devm_free_irq+0x48/0x88
   macb_suspend+0x298/0x558
   device_suspend+0x218/0x4f0
   dpm_suspend+0x244/0x3a0
   dpm_suspend_start+0x50/0x78
   suspend_devices_and_enter+0xec/0x560
   pm_suspend+0x194/0x290
   state_store+0x110/0x158
   kobj_attr_store+0x1c/0x30
   sysfs_kf_write+0xa8/0xd0
   kernfs_fop_write_iter+0x11c/0x1c8
   vfs_write+0x248/0x368
   ksys_write+0x7c/0xf8
   __arm64_sys_write+0x28/0x40
   invoke_syscall+0x4c/0xe8
   el0_svc_common+0x98/0xf0
   do_el0_svc+0x28/0x40
   el0_svc+0x54/0x1e0
   el0t_64_sync_handler+0x84/0x130
   el0t_64_sync+0x198/0x1a0

Fixes: 558e35c ("net: macb: WoL support for GEM type of Ethernet controller")
Cc: stable@vger.kernel.org
Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://patch.msgid.link/20260318-macb-irq-v2-1-f1179768ab24@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IAMHCHCH pushed a commit that referenced this pull request Apr 7, 2026
Access to net_device::ip_ptr and its associated members must be
protected by an RCU lock. Since we are modifying this piece of code,
let's also move it to execute only when WAKE_ARP is enabled.

To minimize the duration of the RCU lock, a local variable is used to
temporarily store the IP address. This change resolves the following
RCU check warning:
  WARNING: suspicious RCU usage
  7.0.0-rc3-next-20260310-yocto-standard+ #122 Not tainted
  -----------------------------
  drivers/net/ethernet/cadence/macb_main.c:5944 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  5 locks held by rtcwake/518:
   #0: ffff000803ab1408 (sb_writers#5){.+.+}-{0:0}, at: vfs_write+0xf8/0x368
   #1: ffff0008090bf088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xbc/0x1c8
   #2: ffff00080098d588 (kn->active#70){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xcc/0x1c8
   #3: ffff800081c84888 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0x1ec/0x290
   #4: ffff0008009ba0f8 (&dev->mutex){....}-{4:4}, at: device_suspend+0x118/0x4f0

  stack backtrace:
  CPU: 3 UID: 0 PID: 518 Comm: rtcwake Not tainted 7.0.0-rc3-next-20260310-yocto-standard+ #122 PREEMPT
  Hardware name: ZynqMP ZCU102 Rev1.1 (DT)
  Call trace:
   show_stack+0x24/0x38 (C)
   __dump_stack+0x28/0x38
   dump_stack_lvl+0x64/0x88
   dump_stack+0x18/0x24
   lockdep_rcu_suspicious+0x134/0x1d8
   macb_suspend+0xd8/0x4c0
   device_suspend+0x218/0x4f0
   dpm_suspend+0x244/0x3a0
   dpm_suspend_start+0x50/0x78
   suspend_devices_and_enter+0xec/0x560
   pm_suspend+0x194/0x290
   state_store+0x110/0x158
   kobj_attr_store+0x1c/0x30
   sysfs_kf_write+0xa8/0xd0
   kernfs_fop_write_iter+0x11c/0x1c8
   vfs_write+0x248/0x368
   ksys_write+0x7c/0xf8
   __arm64_sys_write+0x28/0x40
   invoke_syscall+0x4c/0xe8
   el0_svc_common+0x98/0xf0
   do_el0_svc+0x28/0x40
   el0_svc+0x54/0x1e0
   el0t_64_sync_handler+0x84/0x130
   el0t_64_sync+0x198/0x1a0

Fixes: 0cb8de3 ("net: macb: Add ARP support to WOL")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20260318-macb-irq-v2-2-f1179768ab24@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
@IAMHCHCH IAMHCHCH deleted the fix-req-q-memory-leak-2026 branch April 7, 2026 07:29
IAMHCHCH pushed a commit that referenced this pull request Apr 14, 2026
There is a use-after-free error in cfg80211_shutdown_all_interfaces found
by syzkaller:

BUG: KASAN: use-after-free in cfg80211_shutdown_all_interfaces+0x213/0x220
Read of size 8 at addr ffff888112a78d98 by task kworker/0:5/5326
CPU: 0 UID: 0 PID: 5326 Comm: kworker/0:5 Not tainted 6.19.0-rc2 #2 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events cfg80211_rfkill_block_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x116/0x1f0
 print_report+0xcd/0x630
 kasan_report+0xe0/0x110
 cfg80211_shutdown_all_interfaces+0x213/0x220
 cfg80211_rfkill_block_work+0x1e/0x30
 process_one_work+0x9cf/0x1b70
 worker_thread+0x6c8/0xf10
 kthread+0x3c5/0x780
 ret_from_fork+0x56d/0x700
 ret_from_fork_asm+0x1a/0x30
 </TASK>

The problem arises due to the rfkill_block work is not cancelled when wiphy
is being unregistered. In order to fix the issue cancel the corresponding
work in wiphy_unregister().

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 1f87f7d ("cfg80211: add rfkill support")
Cc: stable@vger.kernel.org
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Link: https://patch.msgid.link/20260211082024.1967588-1-d.dulov@aladdin.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
IAMHCHCH pushed a commit that referenced this pull request Apr 14, 2026
Fix a circular locking dependency between dbg_mutex and the domain
rx/tx mutexes that could lead to a deadlock.

The dump path in dr_dump_domain_all() was acquiring locks in the order:
  dbg_mutex -> rx.mutex -> tx.mutex

While the table/matcher creation paths acquire locks in the order:
  rx.mutex -> tx.mutex -> dbg_mutex

This inverted lock ordering creates a circular dependency. Fix this by
changing dr_dump_domain_all() to acquire the domain lock before
dbg_mutex, matching the order used in mlx5dr_table_create() and
mlx5dr_matcher_create().

Lockdep splat:
 ======================================================
 WARNING: possible circular locking dependency detected
 6.19.0-rc6net_next_e817c4e #1 Not tainted
 ------------------------------------------------------
 sos/30721 is trying to acquire lock:
 ffff888102df5900 (&dmn->info.rx.mutex){+.+.}-{4:4}, at:
dr_dump_start+0x131/0x450 [mlx5_core]

 but task is already holding lock:
 ffff888102df5bc0 (&dmn->dump_info.dbg_mutex){+.+.}-{4:4}, at:
dr_dump_start+0x10b/0x450 [mlx5_core]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (&dmn->dump_info.dbg_mutex){+.+.}-{4:4}:
        __mutex_lock+0x91/0x1060
        mlx5dr_matcher_create+0x377/0x5e0 [mlx5_core]
        mlx5_cmd_dr_create_flow_group+0x62/0xd0 [mlx5_core]
        mlx5_create_flow_group+0x113/0x1c0 [mlx5_core]
        mlx5_chains_create_prio+0x453/0x2290 [mlx5_core]
        mlx5_chains_get_table+0x2e2/0x980 [mlx5_core]
        esw_chains_create+0x1e6/0x3b0 [mlx5_core]
        esw_create_offloads_fdb_tables.cold+0x62/0x63f [mlx5_core]
        esw_offloads_enable+0x76f/0xd20 [mlx5_core]
        mlx5_eswitch_enable_locked+0x35a/0x500 [mlx5_core]
        mlx5_devlink_eswitch_mode_set+0x561/0x950 [mlx5_core]
        devlink_nl_eswitch_set_doit+0x67/0xe0
        genl_family_rcv_msg_doit+0xe0/0x130
        genl_rcv_msg+0x188/0x290
        netlink_rcv_skb+0x4b/0xf0
        genl_rcv+0x24/0x40
        netlink_unicast+0x1ed/0x2c0
        netlink_sendmsg+0x210/0x450
        __sock_sendmsg+0x38/0x60
        __sys_sendto+0x119/0x180
        __x64_sys_sendto+0x20/0x30
        do_syscall_64+0x70/0xd00
        entry_SYSCALL_64_after_hwframe+0x4b/0x53

 -> #1 (&dmn->info.tx.mutex){+.+.}-{4:4}:
        __mutex_lock+0x91/0x1060
        mlx5dr_table_create+0x11d/0x530 [mlx5_core]
        mlx5_cmd_dr_create_flow_table+0x62/0x140 [mlx5_core]
        __mlx5_create_flow_table+0x46f/0x960 [mlx5_core]
        mlx5_create_flow_table+0x16/0x20 [mlx5_core]
        esw_create_offloads_fdb_tables+0x136/0x240 [mlx5_core]
        esw_offloads_enable+0x76f/0xd20 [mlx5_core]
        mlx5_eswitch_enable_locked+0x35a/0x500 [mlx5_core]
        mlx5_devlink_eswitch_mode_set+0x561/0x950 [mlx5_core]
        devlink_nl_eswitch_set_doit+0x67/0xe0
        genl_family_rcv_msg_doit+0xe0/0x130
        genl_rcv_msg+0x188/0x290
        netlink_rcv_skb+0x4b/0xf0
        genl_rcv+0x24/0x40
        netlink_unicast+0x1ed/0x2c0
        netlink_sendmsg+0x210/0x450
        __sock_sendmsg+0x38/0x60
        __sys_sendto+0x119/0x180
        __x64_sys_sendto+0x20/0x30
        do_syscall_64+0x70/0xd00
        entry_SYSCALL_64_after_hwframe+0x4b/0x53

 -> #0 (&dmn->info.rx.mutex){+.+.}-{4:4}:
        __lock_acquire+0x18b6/0x2eb0
        lock_acquire+0xd3/0x2c0
        __mutex_lock+0x91/0x1060
        dr_dump_start+0x131/0x450 [mlx5_core]
        seq_read_iter+0xe3/0x410
        seq_read+0xfb/0x130
        full_proxy_read+0x53/0x80
        vfs_read+0xba/0x330
        ksys_read+0x65/0xe0
        do_syscall_64+0x70/0xd00
        entry_SYSCALL_64_after_hwframe+0x4b/0x53

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&dmn->dump_info.dbg_mutex);
                                lock(&dmn->info.tx.mutex);
                                lock(&dmn->dump_info.dbg_mutex);
   lock(&dmn->info.rx.mutex);

                   *** DEADLOCK ***

Fixes: 9222f0b ("net/mlx5: DR, Add support for dumping steering info")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IAMHCHCH pushed a commit that referenced this pull request Apr 14, 2026
test_progs run with ASAN reported [1]:

  ==126==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
      #0 0x7f1ff3cfa340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
      #1 0x5610c15bb520 in bpf_program_attach_fd /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13164
      #2 0x5610c15bb740 in bpf_program__attach_xdp /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13204
      #3 0x5610c14f91d3 in test_xdp_flowtable /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c:138
      #4 0x5610c1533566 in run_one_test /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:1406
      #5 0x5610c1537fb0 in main /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:2097
      #6 0x7f1ff25df1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb)
      #7 0x7f1ff25df28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb)
      #8 0x5610c0bd3180 in _start (/tmp/work/vmtest/vmtest/selftests/bpf/test_progs+0x593180) (BuildId: cdf9f103f42307dc0a2cd6cfc8afcbc1366cf8bd)

Fix by properly destroying bpf_link on exit in xdp_flowtable test.

[1] https://github.com/kernel-patches/vmtest/actions/runs/22361085418/job/64716490680

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20260225003351.465104-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
IAMHCHCH pushed a commit that referenced this pull request Apr 14, 2026
Lockdep found a bug in the event scheduling when a pinned event was
failed and wakes up the threads in the ring buffer like below.

It seems it should not grab a wait-queue lock under perf-context lock.
Let's do it with irq_work.

  [   39.913691] =============================
  [   39.914157] [ BUG: Invalid wait context ]
  [   39.914623] 6.15.0-next-20250530-next-2025053 #1 Not tainted
  [   39.915271] -----------------------------
  [   39.915731] repro/837 is trying to lock:
  [   39.916191] ffff88801acfabd8 (&event->waitq){....}-{3:3}, at: __wake_up+0x26/0x60
  [   39.917182] other info that might help us debug this:
  [   39.917761] context-{5:5}
  [   39.918079] 4 locks held by repro/837:
  [   39.918530]  #0: ffffffff8725cd00 (rcu_read_lock){....}-{1:3}, at: __perf_event_task_sched_in+0xd1/0xbc0
  [   39.919612]  #1: ffff88806ca3c6f8 (&cpuctx_lock){....}-{2:2}, at: __perf_event_task_sched_in+0x1a7/0xbc0
  [   39.920748]  #2: ffff88800d91fc18 (&ctx->lock){....}-{2:2}, at: __perf_event_task_sched_in+0x1f9/0xbc0
  [   39.921819]  #3: ffffffff8725cd00 (rcu_read_lock){....}-{1:3}, at: perf_event_wakeup+0x6c/0x470

Fixes: f4b07fd ("perf/core: Use POLLHUP for a pinned event in error")
Closes: https://lore.kernel.org/lkml/aD2w50VDvGIH95Pf@ly-workstation
Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Link: https://patch.msgid.link/20250603045105.1731451-1-namhyung@kernel.org
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants