v6.4.9 and v6.4.10 causes coredump with lightdm-webkit2-greeter #85

CodingCellist · 2023-08-22T16:39:09Z

Description

When using linux-hardened version 6.4.9 or 6.4.10 with lightdm and its webkit2-based greeter, the greeter will continually coredump and attempt to restart, rendering login impossible (tty-switching is not possible as each restart returns the user to the greeter's tty). The culprit seems to be a removed/missing part of libatomic: host-config.h (see attached logs and coredump).

As far as I can tell, downgrading linux-hardened to version 6.4.7 is the only fix; downgrading the other packages involved did not resolve the issue.

Logs + coredump

(note: I believe I cut the logs to the relevant parts, please let me know if not.)

coredump-lightdm-webkit2-greeter.txt
lhardened-6.4.7-works-lightdm-webkit2-greeter.log
lhardened-6.4.9-breaks-lightdm-webkit2-greeter.log
lhardened-6.4.10-breaks-lightdm-webkit2-greeter.log

I'm happy to provide more info if need be : )

System

Distro: Arch Linux
lightdm: 1:1.32.0-4
lightdm-webkit2-greeter: 2.2.5-7
webkit2gtk: 2.40.5-1
systemd: 254.1-1


CodingCellist · 2023-08-22T17:07:53Z

(cross-posted to Arch's bugtracker: https://bugs.archlinux.org/task/79444)

ghost · 2023-08-26T16:00:44Z

I think I'm hitting a similar issue. When I try to launch a Firefox-based browser, it fails with invalid instruction errors. I tried debugging the browser and found that SIMD instructions are failing somehow. I can provide more info on request.

ShellCode33 · 2023-10-23T23:12:20Z

It seems that I'm also affected by this, but with npm:

Core was generated by `npm search test              '.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x0000605eb83d7152 in base64_stream_encode_avx2 () from /usr/bin/../lib/libnode.so.115
[Current thread is 1 (Thread 0x605eb6d01f40 (LWP 5765))]
(gdb) bt
#0  0x0000605eb83d7152 in base64_stream_encode_avx2 () from /usr/bin/../lib/libnode.so.115
#1  0x0000605eb81d3e26 in base64_encode () from /usr/bin/../lib/libnode.so.115
#2  0x0000605eb80eb234 in node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, node::encoding, v8::Local<v8::Value>*) () from /usr/bin/../lib/libnode.so.115
#3  0x0000605eb8186aac in node::crypto::Hash::HashDigest(v8::FunctionCallbackInfo<v8::Value> const&) () from /usr/bin/../lib/libnode.so.115

As the name of the function implies, it seems to be AVX related as well.

dmesg outputs the following:

[ 1157.618115] traps: npm search test[5765] trap invalid opcode ip:605eb83d7152 sp:73a0aeeec658 error:0 in libnode.so.115[605eb7c1c000+2111000]
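
For anyone who wants to look up the faulting instruction in a disassembler, the trap line above can be reduced to an offset inside the mapped library. This is a minimal sketch that assumes the `traps:` line format quoted in this comment; `parse_trap_line` is a made-up helper name, not an existing tool. Note that the bracketed address is the start of the mapping the kernel reported, which is not necessarily the start of the file on disk.

```python
import re

# Matches the tail of a kernel "traps:" line as quoted above:
#   ip:<hex> sp:<hex> error:<n> in <module>[<base>+<size>]
TRAP_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>\d+)"
    r" in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap_line(line):
    """Return (module, offset_into_mapping) or None if the line does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    # Sanity check: the instruction pointer must lie inside the mapping.
    if not (base <= ip < base + size):
        return None
    return m.group("module"), ip - base


line = (
    "[ 1157.618115] traps: npm search test[5765] trap invalid opcode "
    "ip:605eb83d7152 sp:73a0aeeec658 error:0 "
    "in libnode.so.115[605eb7c1c000+2111000]"
)
result = parse_trap_line(line)
if result is not None:
    module, offset = result
    print(module, hex(offset))
```
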

$ uname -r
6.5.8-hardened1-1-hardened

hardfalcon · 2023-11-03T07:08:43Z

Potentially related: https://github.com/google/security-research/tree/master/pocs/cpus/xgetbv

hardfalcon · 2023-11-05T09:51:35Z

Found the culprit: aklomp/base64#121
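
For context, here is a sketch of the AVX detection procedure documented in the Intel SDM, written as pure Python over raw register values. It is not the code from the linked pull request. The function names and the idea of passing register values in as integers are assumptions for illustration, since Python cannot execute CPUID or XGETBV itself. The point is that the CPUID feature bits alone are not enough: the OS must also have enabled YMM state in XCR0. As far as I understand it, that is the part that stops holding when the kernel disables AVX as a GDS mitigation.

```python
# Bit positions from the Intel SDM's AVX detection procedure.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_AVX = 1 << 28      # CPU implements AVX
CPUID7_EBX_AVX2 = 1 << 5      # CPU implements AVX2 (leaf 7, subleaf 0)
XCR0_SSE_AND_AVX = 0b110      # OS saves/restores XMM (bit 1) and YMM (bit 2)


def avx_usable(cpuid1_ecx, xcr0):
    """True only if the CPU has AVX *and* the OS has enabled YMM state.

    `xcr0` is the value XGETBV(0) would return; it is only meaningful
    when the OSXSAVE bit is set, which is why that bit is checked first.
    """
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False
    return (xcr0 & XCR0_SSE_AND_AVX) == XCR0_SSE_AND_AVX


def avx2_usable(cpuid1_ecx, cpuid7_ebx, xcr0):
    """AVX2 additionally needs its own CPUID bit on top of usable AVX."""
    return avx_usable(cpuid1_ecx, xcr0) and bool(cpuid7_ebx & CPUID7_EBX_AVX2)


# Hypothetical register values: the CPU advertises AVX and AVX2, but the
# OS has only enabled x87 and SSE state in XCR0 (bits 0 and 1).
ecx = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX
ebx = CPUID7_EBX_AVX2
print(avx2_usable(ecx, ebx, xcr0=0b011))  # AVX state not enabled by the OS
print(avx2_usable(ecx, ebx, xcr0=0b111))  # AVX state enabled by the OS
```

A detection routine that looks only at the CPUID bits would pick the AVX2 code path in the first case and then hit an illegal instruction.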

CodingCellist · 2023-11-06T08:23:44Z

@hardfalcon, in case it's still relevant, here's my CPU info:

Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    CPU family:          6
    Model:               158
[...]
Vulnerabilities:
  Gather data sampling:  Vulnerable: No microcode

hardfalcon · 2023-11-06T11:33:56Z

@CodingCellist:

  Model name:            Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

I have a laptop with the exact same CPU model, and the gather data sampling vulnerability can be mitigated on that CPU by a microcode update, whilst still leaving AVX/AVX2 support enabled. My advice would be the following two things:

check for a BIOS update from your laptop vendor and install it if available,
install the intel-ucode package, reconfigure your boot loader to additionally load the intel-ucode.img initrd before any other initrd images, and reboot (do this even if your laptop vendor still provides BIOS updates, because hardware vendors have a tendency to be sluggish with integrating microcode updates and other security fixes into their BIOS and releasing BIOS updates)

With a current microcode update version 0xf4, your CPU can do AVX and AVX2 whilst still mitigating the GDS vulnerability.
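
To check whether the microcode update took effect after rebooting, something like the following sketch could be used. It assumes an x86 Linux system where `/proc/cpuinfo` has `microcode` and `flags` fields, and a kernel new enough to expose `/sys/devices/system/cpu/vulnerabilities/gather_data_sampling`. The helper names are made up for illustration, and missing files are reported as `None`.

```python
from pathlib import Path

GDS_PATH = "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling"


def summarize_cpuinfo(cpuinfo_text):
    """Return microcode revision and AVX flags of the first CPU listed."""
    info = {"microcode": None, "avx": False, "avx2": False}
    for line in cpuinfo_text.strip().splitlines():
        if not line.strip():
            break  # a blank line ends the first processor's block
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "microcode":
            try:
                info["microcode"] = int(value, 16)  # e.g. "0xf4"
            except ValueError:
                pass  # leave as None if the field is not hexadecimal
        elif key == "flags":
            flags = value.split()
            info["avx"] = "avx" in flags
            info["avx2"] = "avx2" in flags
    return info


def read_text(path):
    """Read a small text file, returning None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


cpuinfo = read_text("/proc/cpuinfo")
if cpuinfo is not None:
    print(summarize_cpuinfo(cpuinfo))
print("GDS status:", read_text(GDS_PATH))
```

On the i7-8750H discussed here, I would expect the microcode revision to be at least 0xf4 and both `avx` and `avx2` to be listed once the update is loaded, but I have only reasoned about this from the thread, not verified it on other machines.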

AFAIK, basically the only CPUs that are affected by the GDS vulnerability but did not receive microcode updates from Intel were some (but not all!) Skylake CPUs. For example, I have an affected machine with a Xeon E3-1270 v5 for which Intel can't be bothered to release microcode updates anymore. A list of CPUs affected by the GDS vulnerability can be found here, and a list of CPUs for which Intel doesn't provide any support anymore can be found here.

CodingCellist · 2023-11-10T16:39:33Z

@hardfalcon I considered a microcode update, but didn't know if it was worth keeping the laptop in its current state to be able to test patches? IIRC the BIOS+µcode update for my laptop (somewhat understandably) locks the version to that one and newer once applied. Then again, who knows how long it'll take before those land...

hardfalcon · 2023-11-10T17:14:15Z

The microcode updates from the intel-ucode and amd-ucode packages are not permanent, but are only loaded into the CPU at each boot of the system, so you could easily revert them at any time by simply disabling/uninstalling them and then rebooting the system.

Although I haven't tested this, I'd assume that you could also downgrade to an older microcode version than the one from your BIOS if you manually built an initrd image containing the older microcode version that you want (but I'd of course advise against doing this because it needlessly leaves security vulnerabilities unpatched).

Also, if you are worried about performance losses incurred by microcode updates, you could probably still avoid those by booting with mitigations=off set as a kernel parameter in the bootloader (albeit at the expense of security). I'd guess that booting a current kernel with an outdated microcode version would cost you more performance, because the kernel would then fall back to less efficient mitigations than the ones available with up-to-date microcode.

ShellCode33 · 2023-11-17T00:14:01Z

It seems the fix is now available! Just upgrade your system and everything should work fine. Thanks a lot @hardfalcon for taking care of it.
I guess this GitHub issue can be closed, since it is both fixed and not directly related to linux-hardened.

CodingCellist · 2023-11-28T13:04:34Z

> It seems the fix is now available ! Just upgrade your system and everything should work fine. Thanks a lot hardfalcon for taking care of it. I guess this GitHub issue can be closed since it is both fixed and not directly related to linux-hardened

Seems to still repro on my system (kernel 6.5.12-hardened1-1-hardened with lightdm-webkit2-greeter 2.2.5-7). But that will (afaiu) be the case until the relevant WebKit bug (WebKit#262100) is fixed.

In any case, closing this since it's not related to linux-hardened is fine by me : )


anthraxx · 2024-01-30T17:02:45Z

Thank you all for the contributions and the investigation into downstream issues related to this hardening. I'm closing this as deferred; users searching for such problems should still be able to find this issue for further information. Cheers.


hardfalcon mentioned this issue Nov 5, 2023

Fix AVX detection aklomp/base64#121

Merged

hardfalcon mentioned this issue Nov 5, 2023

"Illegal instruction" crash when doing base64 on x86_64 machines with AVX(2) support but "gather data sampling" mitigations enabled nodejs/node#50561

Closed

anthraxx closed this as completed Jan 30, 2024
