Describe the bug
Our application runs on EKS 1.33 with latest AL2023 x86_64 nodes. It runs a workload that attaches several EBS volumes on the fly, mounts them, runs a certain logic, and then umounts and detaches the volume. Recently, EKS team has upgraded their node images to use the kernel 6.12 branch of AL2023. We started noticing quite a lot more containers simply hung for a few hours. Running dmesg on the node itself would usually show something similar as in the attached dmesg log. In the attached log we can see that there are 3 volumes, all ext4, and 4 hung tasks, all waiting on the fadvise mutex - fluentbit (flb-pipeline), a rust-based security agent (tokio-runtime-w), and our application (entrypoint). This happens rather fast, as can be seen, the first dmesg entry starts at the 240 seconds marker (~4 minutes after the node is started).
Our application and the security agent both call fadvise with POSIX_FADV_DONTNEED for the same reason, they both open files and read them exactly once. The goal is not to fill the page cache with unnecessary data.
I'd like to mention that this behavior did NOT occur with the kernel 6.1 version of EKS nodes, and very quickly started to show afterwards, that's why that's my initial suspicion.
Thanks in advance.
Expected behavior
No hung tasks due to fadvise calls.
dmesg logs
[ 90.984422] nvme nvme2: pci function 0000:00:1f.0
[ 90.985027] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[ 90.987825] ACPI: \_SB_.LNKC: Enabled at IRQ 11
[ 90.996176] nvme nvme2: 2/0/0 default/read/poll queues
[ 91.131043] nvme2n1: p1 p14 p15
[ 94.727028] EXT4-fs (nvme1n1p1): 61 orphan inodes deleted
[ 94.727722] EXT4-fs (nvme1n1p1): recovery complete
[ 94.738069] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[ 94.928915] EXT4-fs (nvme2n1p1): 56 orphan inodes deleted
[ 94.928984] EXT4-fs (nvme1n1p1): unmounting filesystem a7e321c6-fe24-4b08-b922-b296032b6eda.
[ 94.929609] EXT4-fs (nvme2n1p1): recovery complete
[ 94.942794] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 94.944075] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[ 95.098641] EXT4-fs (nvme2n1p1): unmounting filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace.
[ 95.112313] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.140284] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[ 95.152469] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.191207] EXT4-fs (nvme1n1p1): unmounting filesystem a7e321c6-fe24-4b08-b922-b296032b6eda.
[ 95.201936] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.230268] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[ 95.240912] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.306483] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[ 95.317214] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.352621] EXT4-fs (nvme2n1p1): unmounting filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace.
[ 95.363757] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.390572] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[ 95.454134] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 148.603898] sh (3589): drop_caches: 2
[ 208.768362] sh (3589): drop_caches: 2
[ 243.582122] INFO: task flb-pipeline:3818 blocked for more than 122 seconds.
[ 243.583001] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 243.583905] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.584854] task:flb-pipeline state:D stack:0 pid:3818 tgid:3787 ppid:3157 flags:0x00000002
[ 243.586823] Call Trace:
[ 243.587170] <TASK>
[ 243.587477] __schedule+0x267/0x530
[ 243.587947] schedule+0x23/0xa0
[ 243.588358] schedule_preempt_disabled+0x11/0x20
[ 243.588940] __mutex_lock.constprop.0+0x31d/0x650
[ 243.589529] __lru_add_drain_all+0x32/0x1f0
[ 243.590061] generic_fadvise+0x267/0x2b0
[ 243.590562] ksys_fadvise64_64+0x4c/0xa0
[ 243.591057] __x64_sys_fadvise64+0x18/0x20
[ 243.591571] do_syscall_64+0x5b/0x170
[ 243.592079] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 243.592709] RIP: 0033:0x7fe995cf869e
[ 243.593177] RSP: 002b:00007fe9951fa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 243.594104] RAX: ffffffffffffffda RBX: 00007fe9942f2540 RCX: 00007fe995cf869e
[ 243.594972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000024
[ 243.595849] RBP: 0000000000007fff R08: 0000000000000000 R09: 0000000000000000
[ 243.596717] R10: 0000000000000004 R11: 0000000000000246 R12: 00007fe994242900
[ 243.597585] R13: 00007fe995698000 R14: 0000000000000001 R15: 00007fe99562a000
[ 243.598458] </TASK>
[ 243.598770] INFO: task tokio-runtime-w:4534 blocked for more than 122 seconds.
[ 243.599647] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 243.600541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.601475] task:tokio-runtime-w state:D stack:0 pid:4534 tgid:4495 ppid:3677 flags:0x00000002
[ 243.602613] Call Trace:
[ 243.602940] <TASK>
[ 243.603230] __schedule+0x267/0x530
[ 243.603684] schedule+0x23/0xa0
[ 243.604097] schedule_preempt_disabled+0x11/0x20
[ 243.604672] __mutex_lock.constprop.0+0x31d/0x650
[ 243.605263] __lru_add_drain_all+0x32/0x1f0
[ 243.605791] generic_fadvise+0x267/0x2b0
[ 243.606323] ksys_fadvise64_64+0x4c/0xa0
[ 243.606817] __x64_sys_fadvise64+0x18/0x20
[ 243.607331] do_syscall_64+0x5b/0x170
[ 243.607798] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 243.608425] RIP: 0033:0x7fc0e1f4c9da
[ 243.608889] RSP: 002b:00007fc0dbdfd998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 243.609803] RAX: ffffffffffffffda RBX: 00007fc0dbdfdb10 RCX: 00007fc0e1f4c9da
[ 243.610676] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202
[ 243.611544] RBP: 00007fc0dbdfd9a0 R08: 00000000000000c8 R09: 0000000000000004
[ 243.612494] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 243.613640] R13: 0000000000000028 R14: 0000000000000001 R15: 0000000000000040
[ 243.617328] </TASK>
[ 243.617732] INFO: task tokio-runtime-w:4558 blocked for more than 122 seconds.
[ 243.618838] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 243.619889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.622887] task:tokio-runtime-w state:D stack:0 pid:4558 tgid:4495 ppid:3677 flags:0x00000002
[ 243.624316] Call Trace:
[ 243.624735] <TASK>
[ 243.625108] __schedule+0x267/0x530
[ 243.625661] schedule+0x23/0xa0
[ 243.627058] schedule_preempt_disabled+0x11/0x20
[ 243.627791] __mutex_lock.constprop.0+0x31d/0x650
[ 243.628530] __lru_add_drain_all+0x32/0x1f0
[ 243.630824] generic_fadvise+0x267/0x2b0
[ 243.631462] ? ovl_path_realdata+0x76/0x90 [overlay]
[ 243.632447] ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[ 243.633277] ovl_fadvise+0x97/0xc0 [overlay]
[ 243.633988] ksys_fadvise64_64+0x4c/0xa0
[ 243.634658] __x64_sys_fadvise64+0x18/0x20
[ 243.635312] do_syscall_64+0x5b/0x170
[ 243.635906] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 243.636690] RIP: 0033:0x7fc0e1f4c9da
[ 243.637276] RSP: 002b:00007fc0d24ee998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 243.638429] RAX: ffffffffffffffda RBX: 00007fc0d24eeb10 RCX: 00007fc0e1f4c9da
[ 243.639491] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000214
[ 243.640557] RBP: 00007fc0d24ee9a0 R08: 00000000000000c8 R09: 0000000000000004
[ 243.641644] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 243.642787] R13: 0000000000000028 R14: 0000000000000004 R15: 0000000000000040
[ 243.643846] </TASK>
[ 243.644223] INFO: task tokio-runtime-w:4560 blocked for more than 122 seconds.
[ 243.645324] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 243.646432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.647594] task:tokio-runtime-w state:D stack:0 pid:4560 tgid:4495 ppid:3677 flags:0x00000002
[ 243.648987] Call Trace:
[ 243.649395] <TASK>
[ 243.649759] __schedule+0x267/0x530
[ 243.650341] schedule+0x23/0xa0
[ 243.650848] schedule_preempt_disabled+0x11/0x20
[ 243.651567] __mutex_lock.constprop.0+0x31d/0x650
[ 243.652356] __lru_add_drain_all+0x32/0x1f0
[ 243.653029] generic_fadvise+0x267/0x2b0
[ 243.653623] ? ovl_path_realdata+0x76/0x90 [overlay]
[ 243.654411] ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[ 243.655222] ovl_fadvise+0x97/0xc0 [overlay]
[ 243.655904] ksys_fadvise64_64+0x4c/0xa0
[ 243.656529] __x64_sys_fadvise64+0x18/0x20
[ 243.657165] do_syscall_64+0x5b/0x170
[ 243.657742] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 243.658514] RIP: 0033:0x7fc0e1f4c9da
[ 243.659097] RSP: 002b:00007fc0d20ec998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 243.660211] RAX: ffffffffffffffda RBX: 00007fc0d20ecb10 RCX: 00007fc0e1f4c9da
[ 243.661284] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000207
[ 243.662407] RBP: 00007fc0d20ec9a0 R08: 00000000000000c8 R09: 0000000000000006
[ 243.663463] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 243.664533] R13: 0000000000000028 R14: 0000000000000003 R15: 0000000000000040
[ 243.665623] </TASK>
[ 268.840461] sh (3589): drop_caches: 2
[ 328.982835] sh (3589): drop_caches: 2
[ 366.462099] INFO: task flb-pipeline:3818 blocked for more than 245 seconds.
[ 366.462968] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.463856] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.464786] task:flb-pipeline state:D stack:0 pid:3818 tgid:3787 ppid:3157 flags:0x00000002
[ 366.465886] Call Trace:
[ 366.466217] <TASK>
[ 366.466505] __schedule+0x267/0x530
[ 366.466949] schedule+0x23/0xa0
[ 366.467490] schedule_preempt_disabled+0x11/0x20
[ 366.468054] __mutex_lock.constprop.0+0x31d/0x650
[ 366.468633] __lru_add_drain_all+0x32/0x1f0
[ 366.469145] generic_fadvise+0x267/0x2b0
[ 366.469631] ksys_fadvise64_64+0x4c/0xa0
[ 366.470131] __x64_sys_fadvise64+0x18/0x20
[ 366.470633] do_syscall_64+0x5b/0x170
[ 366.471111] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.471733] RIP: 0033:0x7fe995cf869e
[ 366.472246] RSP: 002b:00007fe9951fa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 366.473139] RAX: ffffffffffffffda RBX: 00007fe9942f2540 RCX: 00007fe995cf869e
[ 366.473989] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000024
[ 366.474846] RBP: 0000000000007fff R08: 0000000000000000 R09: 0000000000000000
[ 366.475701] R10: 0000000000000004 R11: 0000000000000246 R12: 00007fe994242900
[ 366.476552] R13: 00007fe995698000 R14: 0000000000000001 R15: 00007fe99562a000
[ 366.477405] </TASK>
[ 366.477711] INFO: task tokio-runtime-w:4534 blocked for more than 245 seconds.
[ 366.478576] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.479451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.480368] task:tokio-runtime-w state:D stack:0 pid:4534 tgid:4495 ppid:3677 flags:0x00000002
[ 366.481462] Call Trace:
[ 366.481782] <TASK>
[ 366.482113] __schedule+0x267/0x530
[ 366.482553] schedule+0x23/0xa0
[ 366.482955] schedule_preempt_disabled+0x11/0x20
[ 366.483518] __mutex_lock.constprop.0+0x31d/0x650
[ 366.484094] __lru_add_drain_all+0x32/0x1f0
[ 366.484605] generic_fadvise+0x267/0x2b0
[ 366.485096] ksys_fadvise64_64+0x4c/0xa0
[ 366.485584] __x64_sys_fadvise64+0x18/0x20
[ 366.486095] do_syscall_64+0x5b/0x170
[ 366.486554] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.487171] RIP: 0033:0x7fc0e1f4c9da
[ 366.487624] RSP: 002b:00007fc0dbdfd998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 366.488530] RAX: ffffffffffffffda RBX: 00007fc0dbdfdb10 RCX: 00007fc0e1f4c9da
[ 366.489382] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202
[ 366.490236] RBP: 00007fc0dbdfd9a0 R08: 00000000000000c8 R09: 0000000000000004
[ 366.491088] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 366.491968] R13: 0000000000000028 R14: 0000000000000001 R15: 0000000000000040
[ 366.492821] </TASK>
[ 366.493117] INFO: task tokio-runtime-w:4558 blocked for more than 245 seconds.
[ 366.493979] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.494862] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.495789] task:tokio-runtime-w state:D stack:0 pid:4558 tgid:4495 ppid:3677 flags:0x00000002
[ 366.496878] Call Trace:
[ 366.497198] <TASK>
[ 366.497480] __schedule+0x267/0x530
[ 366.497921] schedule+0x23/0xa0
[ 366.498328] schedule_preempt_disabled+0x11/0x20
[ 366.498901] __mutex_lock.constprop.0+0x31d/0x650
[ 366.499481] __lru_add_drain_all+0x32/0x1f0
[ 366.499995] generic_fadvise+0x267/0x2b0
[ 366.500482] ? ovl_path_realdata+0x76/0x90 [overlay]
[ 366.501097] ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[ 366.501737] ovl_fadvise+0x97/0xc0 [overlay]
[ 366.502313] ksys_fadvise64_64+0x4c/0xa0
[ 366.502799] __x64_sys_fadvise64+0x18/0x20
[ 366.503304] do_syscall_64+0x5b/0x170
[ 366.503775] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.504390] RIP: 0033:0x7fc0e1f4c9da
[ 366.504842] RSP: 002b:00007fc0d24ee998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 366.505734] RAX: ffffffffffffffda RBX: 00007fc0d24eeb10 RCX: 00007fc0e1f4c9da
[ 366.506602] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000214
[ 366.507455] RBP: 00007fc0d24ee9a0 R08: 00000000000000c8 R09: 0000000000000004
[ 366.508308] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 366.509163] R13: 0000000000000028 R14: 0000000000000004 R15: 0000000000000040
[ 366.510020] </TASK>
[ 366.510314] INFO: task tokio-runtime-w:4560 blocked for more than 245 seconds.
[ 366.511175] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.512087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.513015] task:tokio-runtime-w state:D stack:0 pid:4560 tgid:4495 ppid:3677 flags:0x00000002
[ 366.514107] Call Trace:
[ 366.514427] <TASK>
[ 366.514708] __schedule+0x267/0x530
[ 366.515151] schedule+0x23/0xa0
[ 366.515555] schedule_preempt_disabled+0x11/0x20
[ 366.516124] __mutex_lock.constprop.0+0x31d/0x650
[ 366.516705] __lru_add_drain_all+0x32/0x1f0
[ 366.517221] generic_fadvise+0x267/0x2b0
[ 366.517710] ? ovl_path_realdata+0x76/0x90 [overlay]
[ 366.518324] ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[ 366.518958] ovl_fadvise+0x97/0xc0 [overlay]
[ 366.519497] ksys_fadvise64_64+0x4c/0xa0
[ 366.519984] __x64_sys_fadvise64+0x18/0x20
[ 366.520491] do_syscall_64+0x5b/0x170
[ 366.520954] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.521569] RIP: 0033:0x7fc0e1f4c9da
[ 366.522098] RSP: 002b:00007fc0d20ec998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 366.522997] RAX: ffffffffffffffda RBX: 00007fc0d20ecb10 RCX: 00007fc0e1f4c9da
[ 366.523844] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000207
[ 366.524693] RBP: 00007fc0d20ec9a0 R08: 00000000000000c8 R09: 0000000000000006
[ 366.525547] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 366.526403] R13: 0000000000000028 R14: 0000000000000003 R15: 0000000000000040
[ 366.527257] </TASK>
[ 366.527555] INFO: task entrypoint:5601 blocked for more than 122 seconds.
[ 366.528361] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.529240] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.530168] task:entrypoint state:D stack:0 pid:5601 tgid:5565 ppid:4954 flags:0x00000002
[ 366.531258] Call Trace:
[ 366.531578] <TASK>
[ 366.531862] __schedule+0x267/0x530
[ 366.532335] schedule+0x23/0xa0
[ 366.532737] schedule_preempt_disabled+0x11/0x20
[ 366.533306] __mutex_lock.constprop.0+0x31d/0x650
[ 366.533886] ? __pfx_lru_add+0x10/0x10
[ 366.534358] __lru_add_drain_all+0x32/0x1f0
[ 366.534872] generic_fadvise+0x267/0x2b0
[ 366.535359] ksys_fadvise64_64+0x88/0xa0
[ 366.535852] __x64_sys_fadvise64+0x18/0x20
[ 366.536358] do_syscall_64+0x5b/0x170
[ 366.536817] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.537429] RIP: 0033:0x4ca53ae
[ 366.537832] RSP: 002b:000000c0138255a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000dd
[ 366.538731] RAX: ffffffffffffffda RBX: 0000000000000059 RCX: 0000000004ca53ae
[ 366.539583] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000059
[ 366.540435] RBP: 000000c0138255e8 R08: 0000000000000000 R09: 0000000000000000
[ 366.541286] R10: 0000000000000004 R11: 0000000000000202 R12: 000000c0138257f0
[ 366.542168] R13: 000000c00f9b0840 R14: 000000c001ab9c00 R15: 0000000000000000
[ 366.543019] </TASK>
[ 366.543315] INFO: task entrypoint:5621 blocked for more than 122 seconds.
[ 366.544126] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.545007] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.545934] task:entrypoint state:D stack:0 pid:5621 tgid:5565 ppid:4954 flags:0x00000002
[ 366.547049] Call Trace:
[ 366.547371] <TASK>
[ 366.547654] __schedule+0x267/0x530
[ 366.548096] schedule+0x23/0xa0
[ 366.548501] schedule_preempt_disabled+0x11/0x20
[ 366.549071] __mutex_lock.constprop.0+0x31d/0x650
[ 366.549649] __lru_add_drain_all+0x32/0x1f0
[ 366.550171] generic_fadvise+0x267/0x2b0
[ 366.550659] ? do_futex+0xbe/0x1d0
[ 366.551092] ksys_fadvise64_64+0x88/0xa0
[ 366.551575] __x64_sys_fadvise64+0x18/0x20
[ 366.552100] do_syscall_64+0x5b/0x170
[ 366.552560] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.553175] RIP: 0033:0x4ca53ae
[ 366.553577] RSP: 002b:000000c013c5d5a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000dd
[ 366.554489] RAX: ffffffffffffffda RBX: 0000000000000234 RCX: 0000000004ca53ae
[ 366.555341] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000234
[ 366.556195] RBP: 000000c013c5d5e8 R08: 0000000000000000 R09: 0000000000000000
[ 366.557049] R10: 0000000000000004 R11: 0000000000000202 R12: 000000c013c5d7f0
[ 366.557901] R13: ffffffffffffffff R14: 000000c001ab8e00 R15: 0000000000000000
[ 366.558760] </TASK>
[ 366.559053] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
Describe the bug
Our application runs on EKS 1.33 with latest AL2023 x86_64 nodes. It runs a workload that attaches several EBS volumes on the fly, mounts them, runs a certain logic, and then umounts and detaches the volume. Recently, EKS team has upgraded their node images to use the kernel 6.12 branch of AL2023. We started noticing quite a lot more containers simply hung for a few hours. Running
dmesgon the node itself would usually show something similar as in the attached dmesg log. In the attached log we can see that there are 3 volumes, all ext4, and 4 hung tasks, all waiting on the fadvise mutex - fluentbit (flb-pipeline), a rust-based security agent (tokio-runtime-w), and our application (entrypoint). This happens rather fast, as can be seen, the first dmesg entry starts at the 240 seconds marker (~4 minutes after the node is started).Our application and the security agent both call fadvise with POSIX_FADV_DONTNEED for the same reason, they both open files and read them exactly once. The goal is not to fill the page cache with unnecessary data.
I'd like to mention that this behavior did NOT occur with the kernel 6.1 version of EKS nodes, and very quickly started to show afterwards, that's why that's my initial suspicion.
Thanks in advance.
Expected behavior
No hung tasks due to fadvise calls.
dmesg logs