New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System "reboot" issue by using "sysrq" #2
Comments
Hi Tido, I do appreciate your help. |
Hi sghazagh, I also thought I have seen that name before - same to you, thank you. |
Fix on 15a5310. |
commit 50a0cb565246f20d59cdb161778531e4b19d35ac upstream. Starting with this commit 35eb8b81edd4 ("x86/efi: Build our own page table structures") efi regions have a separate page directory called "efi_pgd". In order to access any efi region we have to first shift %cr3 to this page table. In the bgrt code we are trying to copy bgrt_header and image, but these regions fall under "EFI_BOOT_SERVICES_DATA" and to access these regions we have to shift %cr3 to efi_pgd and not doing so will cause page fault as shown below. [ 0.251599] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4 [ 0.259126] Freeing SMP alternatives memory: 32K (ffffffff8230e000 - ffffffff82316000) [ 0.271803] BUG: unable to handle kernel paging request at fffffffefce35002 [ 0.279740] IP: [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd [ 0.286383] PGD 300f067 PUD 0 [ 0.289879] Oops: 0000 [#1] SMP [ 0.293566] Modules linked in: [ 0.297039] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc1-eywa-eywa-built-in-47041+ #2 [ 0.306619] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.R00.B104.B01.1511110114 11/11/2015 [ 0.320925] task: ffffffff820134c0 ti: ffffffff82000000 task.ti: ffffffff82000000 [ 0.329420] RIP: 0010:[<ffffffff821bca49>] [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd [ 0.338821] RSP: 0000:ffffffff82003f18 EFLAGS: 00010246 [ 0.344852] RAX: fffffffefce35000 RBX: fffffffefce35000 RCX: fffffffefce2b000 [ 0.352952] RDX: 000000008a82b000 RSI: ffffffff8235bb80 RDI: 000000008a835000 [ 0.361050] RBP: ffffffff82003f30 R08: 000000008a865000 R09: ffffffffff202850 [ 0.369149] R10: ffffffff811ad62f R11: 0000000000000000 R12: 0000000000000000 [ 0.377248] R13: ffff88016dbaea40 R14: ffffffff822622c0 R15: ffffffff82003fb0 [ 0.385348] FS: 0000000000000000(0000) GS:ffff88016d800000(0000) knlGS:0000000000000000 [ 0.394533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.401054] CR2: fffffffefce35002 CR3: 000000000300c000 CR4: 00000000003406f0 [ 0.409153] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 0.417252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 0.425350] Stack: [ 0.427638] ffffffffffffffff ffffffff82256900 ffff88016dbaea40 ffffffff82003f40 [ 0.436086] ffffffff821bbce0 ffffffff82003f88 ffffffff8219c0c2 0000000000000000 [ 0.444533] ffffffff8219ba4a ffffffff822622c0 0000000000083000 00000000ffffffff [ 0.452978] Call Trace: [ 0.455763] [<ffffffff821bbce0>] efi_late_init+0x9/0xb [ 0.461697] [<ffffffff8219c0c2>] start_kernel+0x463/0x47f [ 0.467928] [<ffffffff8219ba4a>] ? set_init_arg+0x55/0x55 [ 0.474159] [<ffffffff8219b120>] ? early_idt_handler_array+0x120/0x120 [ 0.481669] [<ffffffff8219b5ee>] x86_64_start_reservations+0x2a/0x2c [ 0.488982] [<ffffffff8219b72d>] x86_64_start_kernel+0x13d/0x14c [ 0.495897] Code: 00 41 b4 01 48 8b 78 28 e8 09 36 01 00 48 85 c0 48 89 c3 75 13 48 c7 c7 f8 ac d3 81 31 c0 e8 d7 3b fb fe e9 b5 00 00 00 45 84 e4 <44> 8b 6b 02 74 0d be 06 00 00 00 48 89 df e8 ae 34 0$ [ 0.518151] RIP [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd [ 0.524888] RSP <ffffffff82003f18> [ 0.528851] CR2: fffffffefce35002 [ 0.532615] ---[ end trace 7b06521e6ebf2aea ]--- [ 0.537852] Kernel panic - not syncing: Attempted to kill the idle task! As said above one way to fix this bug is to shift %cr3 to efi_pgd but we are not doing that way because it leaks inner details of how we switch to EFI page tables into a new call site and it also adds duplicate code. Instead, we remove the call to efi_lookup_mapped_addr() and always perform early_mem*() instead of early_io*() because we want to remap RAM regions and not I/O regions. We also delete efi_lookup_mapped_addr() because we are no longer using it. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Reported-by: Wendy Wang <wendy.wang@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: "Ghannam, Yazen" <Yazen.Ghannam@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2c90dd7e11e3025b46719a79fb4bb1e7a5cef9f upstream. Môshe reported the following warning triggered on his machine since commit 50a0cb565246 ("x86/efi-bgrt: Fix kernel panic when mapping BGRT data"), [ 0.026936] ------------[ cut here ]------------ [ 0.026941] WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:137 __early_ioremap+0x102/0x1bb() [ 0.026941] Modules linked in: [ 0.026944] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc1 #2 [ 0.026945] Hardware name: Dell Inc. XPS 13 9343/09K8G1, BIOS A05 07/14/2015 [ 0.026946] 0000000000000000 900f03d5a116524d ffffffff81c03e60 ffffffff813a3fff [ 0.026948] 0000000000000000 ffffffff81c03e98 ffffffff810a0852 00000000d7b76000 [ 0.026949] 0000000000000000 0000000000000001 0000000000000001 000000000000017c [ 0.026951] Call Trace: [ 0.026955] [<ffffffff813a3fff>] dump_stack+0x44/0x55 [ 0.026958] [<ffffffff810a0852>] warn_slowpath_common+0x82/0xc0 [ 0.026959] [<ffffffff810a099a>] warn_slowpath_null+0x1a/0x20 [ 0.026961] [<ffffffff81d8c395>] __early_ioremap+0x102/0x1bb [ 0.026962] [<ffffffff81d8c602>] early_memremap+0x13/0x15 [ 0.026964] [<ffffffff81d78361>] efi_bgrt_init+0x162/0x1ad [ 0.026966] [<ffffffff81d778ec>] efi_late_init+0x9/0xb [ 0.026968] [<ffffffff81d58ff5>] start_kernel+0x46f/0x49f [ 0.026970] [<ffffffff81d58120>] ? early_idt_handler_array+0x120/0x120 [ 0.026972] [<ffffffff81d58339>] x86_64_start_reservations+0x2a/0x2c [ 0.026974] [<ffffffff81d58485>] x86_64_start_kernel+0x14a/0x16d [ 0.026977] ---[ end trace f9b3812eb8e24c58 ]--- [ 0.026978] efi_bgrt: Ignoring BGRT: failed to map image memory early_memremap() has an upper limit on the size of mapping it can handle which is ~200KB. Clearly the BGRT image on Môshe's machine is much larger than that. There's actually no reason to restrict ourselves to using the early_* version of memremap() - the ACPI BGRT driver is invoked late enough in boot that we can use the standard version, with the benefit that the late version allows mappings of arbitrary size. Reported-by: Môshe van der Sterre <me@moshe.nl> Tested-by: Môshe van der Sterre <me@moshe.nl> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1450707172-12561-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Ghannam, Yazen" <Yazen.Ghannam@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f3069c6d33f6ae63a1668737bc78aaaa51bff7ca ] This is a fix for syzkaller719569, where memory registration was attempted without any underlying transport being loaded. Analysis of the case reveals that it is the setsockopt() RDS_GET_MR (2) and RDS_GET_MR_FOR_DEST (7) that are vulnerable. Here is an example stack trace when the bug is hit: BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0 IP: __rds_rdma_map+0x36/0x440 [rds] PGD 2f93d03067 P4D 2f93d03067 PUD 2f93d02067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: bridge stp llc tun rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rds binfmt_misc sb_edac intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul c rc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt mei_me sg iTCO_vendor_support ipmi_si mei ipmi_devintf nfsd shpchp pcspkr i2c_i801 ioatd ma ipmi_msghandler wmi lpc_ich mfd_core auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper ixgbe syscopyarea ahci sysfillrect sysimgblt libahci mdio fb_sys_fops ttm ptp libata sd_mod mlx4_core drm crc32c_intel pps_core megaraid_sas i2c_core dca dm_mirror dm_region_hash dm_log dm_mod CPU: 48 PID: 45787 Comm: repro_set2 Not tainted 4.14.2-3.el7uek.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017 task: ffff882f9190db00 task.stack: ffffc9002b994000 RIP: 0010:__rds_rdma_map+0x36/0x440 [rds] RSP: 0018:ffffc9002b997df0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff882fa2182580 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffc9002b997e40 RDI: ffff882fa2182580 RBP: ffffc9002b997e30 R08: 0000000000000000 R09: 0000000000000002 R10: ffff885fb29e3838 R11: 0000000000000000 R12: ffff882fa2182580 R13: ffff882fa2182580 R14: 0000000000000002 R15: 0000000020000ffc FS: 00007fbffa20b700(0000) GS:ffff882fbfb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c0 CR3: 0000002f98a66006 CR4: 00000000001606e0 Call Trace: rds_get_mr+0x56/0x80 [rds] rds_setsockopt+0x172/0x340 [rds] ? __fget_light+0x25/0x60 ? __fdget+0x13/0x20 SyS_setsockopt+0x80/0xe0 do_syscall_64+0x67/0x1b0 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fbff9b117f9 RSP: 002b:00007fbffa20aed8 EFLAGS: 00000293 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00000000000c84a4 RCX: 00007fbff9b117f9 RDX: 0000000000000002 RSI: 0000400000000114 RDI: 000000000000109b RBP: 00007fbffa20af10 R08: 0000000000000020 R09: 00007fbff9dd7860 R10: 0000000020000ffc R11: 0000000000000293 R12: 0000000000000000 R13: 00007fbffa20b9c0 R14: 00007fbffa20b700 R15: 0000000000000021 Code: 41 56 41 55 49 89 fd 41 54 53 48 83 ec 18 8b 87 f0 02 00 00 48 89 55 d0 48 89 4d c8 85 c0 0f 84 2d 03 00 00 48 8b 87 00 03 00 00 <48> 83 b8 c0 00 00 00 00 0f 84 25 03 00 0 0 48 8b 06 48 8b 56 08 The fix is to check the existence of an underlying transport in __rds_rdma_map(). Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e1699d2d7bf6e6cce3e1baff19f9dd4595a58664 ] This is a story about 4 distinct (and very old) btrfs bugs. Commit c8b9781 ("Btrfs: Add zlib compression support") added three data corruption bugs for inline extents (bugs #1-3). Commit 93c82d5 ("Btrfs: zero page past end of inline file items") fixed bug #1: uncompressed inline extents followed by a hole and more extents could get non-zero data in the hole as they were read. The fix was to add a memset in btrfs_get_extent to zero out the hole. Commit 166ae5a ("btrfs: fix inline compressed read err corruption") fixed bug #2: compressed inline extents which contained non-zero bytes might be replaced with zero bytes in some cases. This patch removed an unhelpful memset from uncompress_inline, but the case where memset is required was missed. There is also a memset in the decompression code, but this only covers decompressed data that is shorter than the ram_bytes from the extent ref record. This memset doesn't cover the region between the end of the decompressed data and the end of the page. It has also moved around a few times over the years, so there's no single patch to refer to. This patch fixes bug #3: compressed inline extents followed by a hole and more extents could get non-zero data in the hole as they were read (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/). The fix is the same: zero out the hole in the compressed case too, by putting a memset back in uncompress_inline, but this time with correct parameters. The last and oldest bug, bug #0, is the cause of the offending inline extent/hole/extent pattern. Bug #0 is a subtle and mostly-harmless quirk of behavior somewhere in the btrfs write code. In a few special cases, an inline extent and hole are allowed to persist where they normally would be combined with later extents in the file. A fast reproducer for bug #0 is presented below. A few offending extents are also created in the wild during large rsync transfers with the -S flag. A Linux kernel build (git checkout; make allyesconfig; make -j8) will produce a handful of offending files as well. Once an offending file is created, it can present different content to userspace each time it is read. Bug #0 is at least 4 and possibly 8 years old. I verified every vX.Y kernel back to v3.5 has this behavior. There are fossil records of this bug's effects in commits all the way back to v2.6.32. I have no reason to believe bug #0 wasn't present at the beginning of btrfs compression support in v2.6.29, but I can't easily test kernels that old to be sure. It is not clear whether bug #0 is worth fixing. A fix would likely require injecting extra reads into currently write-only paths, and most of the exceptional cases caused by bug #0 are already handled now. Whether we like them or not, bug #0's inline extents followed by holes are part of the btrfs de-facto disk format now, and we need to be able to read them without data corruption or an infoleak. So enough about bug #0, let's get back to bug #3 (this patch). An example of on-disk structure leading to data corruption found in the wild: item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160 inode generation 50 transid 50 size 47424 nbytes 49141 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 flags 0x0(none) item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20 inode ref index 3 namelen 10 name: DB_File.so item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362 inline extent data size 1341 ram 4085 compress(zlib) item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53 extent data disk byte 5367308288 nr 20480 extent data offset 0 nr 45056 ram 45056 extent compression(zlib) Different data appears in userspace during each read of the 11 bytes between 4085 and 4096. The extent in item 63 is not long enough to fill the first page of the file, so a memset is required to fill the space between item 63 (ending at 4085) and item 64 (beginning at 4096) with zero. Here is a reproducer from Liu Bo, which demonstrates another method of creating the same inline extent and hole pattern: Using 'page_poison=on' kernel command line (or enable CONFIG_PAGE_POISONING) run the following: # touch foo # chattr +c foo # xfs_io -f -c "pwrite -W 0 1000" foo # xfs_io -f -c "falloc 4 8188" foo # od -x foo # echo 3 >/proc/sys/vm/drop_caches # od -x foo This produce the following on my box: Correct output: file contains 1000 data bytes followed by zeros: 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd * 0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000 0001760 0000 0000 0000 0000 0000 0000 0000 0000 * 0020000 Actual output: the data after the first 1000 bytes will be different each run: 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd * 0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d 0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400 0002000 435f 0056 5f74 6164 7400 645f 0062 5f74 (...) Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ec4fbd64751de18729eaa816ec69e4b504b5a7a2 ] Dmitry reported a lockdep splat [1] (false positive) that we can fix by releasing the spinlock before calling icmp_send() from ip_expire() This is a false positive because sending an ICMP message can not possibly re-enter the IP frag engine. [1] [ INFO: possible circular locking dependency detected ] 4.10.0+ #29 Not tainted ------------------------------------------------------- modprobe/12392 is trying to acquire lock: (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock include/linux/spinlock.h:299 [inline] (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock include/linux/netdevice.h:3486 [inline] (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 but task is already holding lock: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline] (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&q->lock)->rlock){+.-...}: validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669 ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713 packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459 deliver_skb net/core/dev.c:1834 [inline] dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890 xmit_one net/core/dev.c:2903 [inline] dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923 sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308 neigh_output include/net/neighbour.h:478 [inline] ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228 ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672 ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545 ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985 __sys_sendmmsg+0x25c/0x750 net/socket.c:2075 SYSC_sendmmsg net/socket.c:2106 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2101 do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281 return_from_SYSCALL_64+0x0/0x7a -> #0 (_xmit_ETHER#2){+.-...}: check_prev_add kernel/locking/lockdep.c:1830 [inline] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940 validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] __netif_tx_lock include/linux/netdevice.h:3486 [inline] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_hh_output include/net/neighbour.h:468 [inline] neigh_output include/net/neighbour.h:476 [inline] ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 [inline] __run_timers+0x960/0xcf0 kernel/time/timer.c:1601 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:657 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 __read_once_size include/linux/compiler.h:254 [inline] atomic_read arch/x86/include/asm/atomic.h:26 [inline] rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline] __rcu_is_watching kernel/rcu/tree.c:1133 [inline] rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147 rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline] filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335 do_fault_around mm/memory.c:3231 [inline] do_read_fault mm/memory.c:3265 [inline] do_fault+0xbd5/0x2080 mm/memory.c:3370 handle_pte_fault mm/memory.c:3600 [inline] __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&q->lock)->rlock); lock(_xmit_ETHER#2); lock(&(&q->lock)->rlock); lock(_xmit_ETHER#2); *** DEADLOCK *** 10 locks held by modprobe/12392: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>] __do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336 #1: (rcu_read_lock){......}, at: [<ffffffff8188cab6>] filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324 #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] spin_lock include/linux/spinlock.h:299 [inline] #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] pte_alloc_one_map mm/memory.c:2944 [inline] #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072 #3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>] lockdep_copy_map include/linux/lockdep.h:175 [inline] #3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>] call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258 #4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline] #4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201 #5: (rcu_read_lock){......}, at: [<ffffffff8389a633>] ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216 #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock include/linux/spinlock.h:309 [inline] #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock net/ipv4/icmp.c:219 [inline] #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681 #7: (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>] ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198 #8: (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>] __dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324 #9: (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: [<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 stack backtrace: CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:52 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204 check_prev_add kernel/locking/lockdep.c:1830 [inline] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940 validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] __netif_tx_lock include/linux/netdevice.h:3486 [inline] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_hh_output include/net/neighbour.h:468 [inline] neigh_output include/net/neighbour.h:476 [inline] ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 [inline] __run_timers+0x960/0xcf0 kernel/time/timer.c:1601 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:657 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline] RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline] RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline] RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline] RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147 RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10 RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000 R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25 R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000 </IRQ> rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline] filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335 do_fault_around mm/memory.c:3231 [inline] do_read_fault mm/memory.c:3265 [inline] do_fault+0xbd5/0x2080 mm/memory.c:3370 handle_pte_fault mm/memory.c:3600 [inline] __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011 RIP: 0033:0x7f83172f2786 RSP: 002b:00007fffe859ae80 EFLAGS: 00010293 RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970 RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000 R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040 R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b2d69114fefa474fca542e51119036dceb4aa6f upstream. ACPICA commit a23325b2e583556eae88ed3f764e457786bf4df6 I found some ACPI operand cache leaks in ACPI early abort cases. Boot log of ACPI operand cache leak is as follows: >[ 0.174332] ACPI: Added _OSI(Module Device) >[ 0.175504] ACPI: Added _OSI(Processor Device) >[ 0.176010] ACPI: Added _OSI(3.0 _SCP Extensions) >[ 0.177032] ACPI: Added _OSI(Processor Aggregator Device) >[ 0.178284] ACPI: SCI (IRQ16705) allocation failed >[ 0.179352] ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20160930/evevent-131) >[ 0.180008] ACPI: Unable to start the ACPI Interpreter >[ 0.181125] ACPI Error: Could not remove SCI handler (20160930/evmisc-281) >[ 0.184068] kmem_cache_destroy Acpi-Operand: Slab cache still has objects >[ 0.185358] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc3 #2 >[ 0.186820] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS virtual_box 12/01/2006 >[ 0.188000] Call Trace: >[ 0.188000] ? dump_stack+0x5c/0x7d >[ 0.188000] ? kmem_cache_destroy+0x224/0x230 >[ 0.188000] ? acpi_sleep_proc_init+0x22/0x22 >[ 0.188000] ? acpi_os_delete_cache+0xa/0xd >[ 0.188000] ? acpi_ut_delete_caches+0x3f/0x7b >[ 0.188000] ? acpi_terminate+0x5/0xf >[ 0.188000] ? acpi_init+0x288/0x32e >[ 0.188000] ? __class_create+0x4c/0x80 >[ 0.188000] ? video_setup+0x7a/0x7a >[ 0.188000] ? do_one_initcall+0x4e/0x1b0 >[ 0.188000] ? kernel_init_freeable+0x194/0x21a >[ 0.188000] ? rest_init+0x80/0x80 >[ 0.188000] ? kernel_init+0xa/0x100 >[ 0.188000] ? ret_from_fork+0x25/0x30 When early abort is occurred due to invalid ACPI information, Linux kernel terminates ACPI by calling acpi_terminate() function. The function calls acpi_ns_terminate() function to delete namespace data and ACPI operand cache (acpi_gbl_module_code_list). But the deletion code in acpi_ns_terminate() function is wrapped in ACPI_EXEC_APP definition, therefore the code is only executed when the definition exists. If the define doesn't exist, ACPI operand cache (acpi_gbl_module_code_list) is leaked, and stack dump is shown in kernel log. This causes a security threat because the old kernel (<= 4.9) shows memory locations of kernel functions in stack dump, therefore kernel ASLR can be neutralized. To fix ACPI operand leak for enhancing security, I made a patch which removes the ACPI_EXEC_APP define in acpi_ns_terminate() function for executing the deletion code unconditionally. Link: acpica/acpica@a23325b2 Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 749439bfac6e1a2932c582e2699f91d329658196 ] The logic in __ip6_append_data() assumes that the MTU is at least large enough for the headers. A device's MTU may be adjusted after being added while sendmsg() is processing data, resulting in __ip6_append_data() seeing any MTU. For an mtu smaller than the size of the fragmentation header, the math results in a negative 'maxfraglen', which causes problems when refragmenting any previous skb in the skb_write_queue, leaving it possibly malformed. Instead sendmsg returns EINVAL when the mtu is calculated to be less than IPV6_MIN_MTU. Found by syzkaller: kernel BUG at ./include/linux/skbuff.h:2064! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d0b68580 task.stack: ffff8801ac6b8000 RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline] RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216 RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000 RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0 RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000 R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8 R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000 FS: 00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_finish_skb include/net/ipv6.h:911 [inline] udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x352/0x5a0 net/socket.c:1750 SyS_sendto+0x40/0x50 net/socket.c:1718 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512e9 RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9 RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005 RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69 R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000 Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570 RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Mike Maloney <maloney@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ 5.368211] rockchip-sy-mipi-dphy ff770000.syscon:mipi-phy-rx0: invalid resource [ 5.376523] ERROR: Bad of_node_put() on /syscon@ff770000/mipi-phy-rx0/ports/port@0 [ 5.384943] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.4.114 #2 [ 5.391605] Hardware name: Rockchip (Device Tree) [ 5.396857] [<c0110124>] (unwind_backtrace) from [<c010bdd4>] (show_stack+0x20/0x24) [ 5.405475] [<c010bdd4>] (show_stack) from [<c050b8ec>] (dump_stack+0x84/0xa0) [ 5.413516] [<c050b8ec>] (dump_stack) from [<c08c8964>] (of_node_release+0x34/0xa4) [ 5.422035] [<c08c8964>] (of_node_release) from [<c050d8a4>] (kobject_put+0xbc/0xdc) [ 5.430649] [<c050d8a4>] (kobject_put) from [<c08c800c>] (of_node_put+0x24/0x28) [ 5.438877] [<c08c800c>] (of_node_put) from [<c08c7420>] (of_fwnode_put+0x3c/0x40) [ 5.447299] [<c08c7420>] (of_fwnode_put) from [<c0640ad4>] (fwnode_handle_put+0x38/0x3c) [ 5.456296] [<c0640ad4>] (fwnode_handle_put) from [<c0640c04>] (fwnode_graph_get_port_parent+0x4c/0x54) [ 5.466740] [<c0640c04>] (fwnode_graph_get_port_parent) from [<c078f3bc>] (__v4l2_async_notifier_parse_fwnode_endpoints+0x118/0x2bc) [ 5.479971] [<c078f3bc>] (__v4l2_async_notifier_parse_fwnode_endpoints) from [<c078f5bc>] (v4l2_async_notifier_parse_fwnode_endpoints_by_port+0x2c/0x34) [ 5.495128] [<c078f5bc>] (v4l2_async_notifier_parse_fwnode_endpoints_by_port) from [<c07c65fc>] (rockchip_mipidphy_probe+0x1b4/0x264) [ 5.508457] [<c07c65fc>] (rockchip_mipidphy_probe) from [<c063dc1c>] (platform_drv_probe+0x60/0xb0) [ 5.518513] [<c063dc1c>] (platform_drv_probe) from [<c063bda8>] (driver_probe_device+0x120/0x2a4) [ 5.528375] [<c063bda8>] (driver_probe_device) from [<c063bfa4>] (__driver_attach+0x78/0x9c) [ 5.537753] [<c063bfa4>] (__driver_attach) from [<c063a324>] (bus_for_each_dev+0x7c/0xa0) [ 5.546843] [<c063a324>] (bus_for_each_dev) from [<c063b8b8>] (driver_attach+0x28/0x30) [ 5.555741] [<c063b8b8>] (driver_attach) from [<c063b448>] (bus_add_driver+0xe0/0x1f4) [ 5.564542] [<c063b448>] (bus_add_driver) from [<c063cb28>] (driver_register+0xac/0xf0) [ 5.573443] [<c063cb28>] (driver_register) from [<c063db5c>] (__platform_driver_register+0x40/0x54) [ 5.583499] [<c063db5c>] (__platform_driver_register) from [<c123b30c>] (rockchip_isp_mipidphy_driver_init+0x18/0x20) [ 5.595287] [<c123b30c>] (rockchip_isp_mipidphy_driver_init) from [<c0101afc>] (do_one_initcall+0x114/0x1c8) [ 5.606206] [<c0101afc>] (do_one_initcall) from [<c1200edc>] (kernel_init_freeable+0x198/0x264) [ 5.615877] [<c1200edc>] (kernel_init_freeable) from [<c0c12d90>] (kernel_init+0x18/0x118) [ 5.625068] [<c0c12d90>] (kernel_init) from [<c0107690>] (ret_from_fork+0x14/0x24) Change-Id: I6c0d625d2bfd5ffc0af6556e85d6795e9a594e98 Reviewed-on: https://tp-biosrd-v02/gerrit/82306 Reviewed-by: Jamess Huang(黃以民) <Jamess_Huang@asus.com> Tested-by: Jamess Huang(黃以民) <Jamess_Huang@asus.com>
[ Upstream commit 9b70de6d0266888b3743f03802502e43131043c8 ] The bnx2x driver is not providing proper alignment on the receive buffers it passes to build_skb(), causing skb_shared_info to be misaligned. skb_shared_info contains an atomic, and while PPC normally supports unaligned accesses, it does not support unaligned atomics. Aligning the size of rx buffers will ensure that page_frag_alloc() returns aligned addresses. This can be reproduced on PPC by setting the network MTU to 1450 (or other non-multiple-of-4) and then generating sufficient inbound network traffic (one or two large "wget"s usually does it), producing the following oops: Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656 Faulting instruction address: 0xc00000000080ef8c Oops: Kernel access of bad area, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common CPU: 104 PID: 0 Comm: swapper/104 Not tainted 4.11.0-rc8-00088-g4c761da #2 task: c00000ffd4892400 task.stack: c00000ffd4920000 NIP: c00000000080ef8c LR: c00000000080eee8 CTR: c0000000001f8320 REGS: c00000ffffc33710 TRAP: 0600 Not tainted (4.11.0-rc8-00088-g4c761da) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24082042 XER: 00000000 CFAR: c00000000080eea0 DAR: c00000ffc43af656 DSISR: 00000000 SOFTE: 1 GPR00: c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100 GPR04: c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000 GPR12: c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f GPR16: 00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001 GPR20: c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000 GPR24: c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a GPR28: c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100 NIP [c00000000080ef8c] __skb_clone+0xdc/0x140 LR [c00000000080eee8] __skb_clone+0x38/0x140 Call Trace: [c00000ffffc33990] [c00000000080fb74] skb_clone+0x74/0x110 (unreliable) [c00000ffffc339c0] [c000000000907f64] packet_rcv+0x144/0x510 [c00000ffffc33a40] [c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80 [c00000ffffc33b00] [c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0 [c00000ffffc33b40] [c00000000082c49c] napi_gro_receive+0x11c/0x260 [c00000ffffc33b80] [d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x] [c00000ffffc33d00] [c00000000082babc] net_rx_action+0x31c/0x480 [c00000ffffc33e10] [c0000000000d5a44] __do_softirq+0x164/0x3d0 [c00000ffffc33f00] [c0000000000d60a8] irq_exit+0x108/0x120 [c00000ffffc33f20] [c000000000015b98] __do_irq+0x98/0x200 [c00000ffffc33f90] [c000000000027f14] call_do_irq+0x14/0x24 [c00000ffd4923a90] [c000000000015d94] do_IRQ+0x94/0x110 [c00000ffd4923ae0] [c000000000008d90] hardware_interrupt_common+0x150/0x160 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 633e8799ddc09431be2744c4a1efdbda13af2b0b ] This changed is needed to avoid locking problem during boot as shown: <5>[ 8.824096] Registering SWP/SWPB emulation handler <6>[ 8.977294] clock: disabling unused clocks to save power <3>[ 9.108154] BUG: sleeping function called from invalid context at kernel_albert/kernel/mutex.c:269 <3>[ 9.122894] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0 <4>[ 9.130249] 3 locks held by swapper/0/1: <4>[ 9.134613] #0: (&__lockdep_no_validate__){......}, at: [<c0342430>] __driver_attach+0x58/0xa8 <4>[ 9.144500] #1: (&__lockdep_no_validate__){......}, at: [<c0342440>] __driver_attach+0x68/0xa8 <4>[ 9.154357] #2: (&polling_timer){......}, at: [<c0053770>] run_timer_softirq+0x108/0x3ec <4>[ 9.163726] Backtrace: <4>[ 9.166473] [<c001269c>] (dump_backtrace+0x0/0x114) from [<c067e5f0>] (dump_stack+0x20/0x24) <4>[ 9.175811] r6:00203230 r5:0000010d r4:d782e000 r3:60000113 <4>[ 9.182250] [<c067e5d0>] (dump_stack+0x0/0x24) from [<c007441c>] (__might_sleep+0x10c/0x128) <4>[ 9.191650] [<c0074310>] (__might_sleep+0x0/0x128) from [<c0688f60>] (mutex_lock_nested+0x34/0x36c) <4>[ 9.201660] r5:c02d5350 r4:d79a0c64 <4>[ 9.205688] [<c0688f2c>] (mutex_lock_nested+0x0/0x36c) from [<c02d5350>] (regulator_set_current_limit+0x30/0x118) <4>[ 9.217071] [<c02d5320>] (regulator_set_current_limit+0x0/0x118) from [<c0435ce0>] (update_charger+0x84/0xc4) <4>[ 9.228027] r7:d782fb20 r6:00000101 r5:c1767e94 r4:00000000 <4>[ 9.234436] [<c0435c5c>] (update_charger+0x0/0xc4) from [<c0435d40>] (psy_changed+0x20/0x48) <4>[ 9.243804] r5:d782e000 r4:c1767e94 <4>[ 9.247802] [<c0435d20>] (psy_changed+0x0/0x48) from [<c0435dec>] (polling_timer_func+0x84/0xb8) <4>[ 9.257537] r4:c1767e94 r3:00000002 <4>[ 9.261566] [<c0435d68>] (polling_timer_func+0x0/0xb8) from [<c00537e4>] (run_timer_softirq+0x17c/0x3ec) <4>[ 9.272033] r4:c1767eb0 r3:00000000 <4>[ 9.276062] [<c0053668>] (run_timer_softirq+0x0/0x3ec) from [<c004b000>] (__do_softirq+0xf0/0x298) <4>[ 9.286010] [<c004af10>] (__do_softirq+0x0/0x298) from [<c004b650>] (irq_exit+0x98/0xa0) <4>[ 9.295013] [<c004b5b8>] (irq_exit+0x0/0xa0) from [<c000edbc>] (handle_IRQ+0x60/0xc0) <4>[ 9.303680] r4:c1194e98 r3:c00bc778 <4>[ 9.307708] [<c000ed5c>] (handle_IRQ+0x0/0xc0) from [<c0008504>] (gic_handle_irq+0x34/0x68) <4>[ 9.316955] r8:000ac383 r7:d782fc3c r6:d782fc08 r5:c11936c4 r4:e0802100 <4>[ 9.324310] r3:c026ba48 <4>[ 9.327301] [<c00084d0>] (gic_handle_irq+0x0/0x68) from [<c068c2c0>] (__irq_svc+0x40/0x74) <4>[ 9.336456] Exception stack(0xd782fc08 to 0xd782fc50) <4>[ 9.342041] fc00: d6e30e6c ac383627 00000000 ac383417 ea19c000 ea200000 <4>[ 9.351104] fc20: beffffff 00000667 000ac383 d6e30670 d6e3066c d782fc94 d782fbe8 d782fc50 <4>[ 9.360168] fc40: c026ba48 c001d1f0 00000113 ffffffff Fixes: b299804 ("[BATTERY] pda_power platform driver") Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com> Signed-off-by: Anthony Brandon <anthony@amarulasolutions.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
=============================== [ INFO: suspicious RCU usage. ] 4.4.120 #194 Not tainted ------------------------------- drivers/base/power/opp/core.c:453 Missing rcu_read_lock() or opp_table_lock protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 3 locks held by swapper/0/1: #0: (&dev->mutex){......}, at: [<ffffff80082bf1e0>] __device_attach+0x28/0x104 #1: (cpu_hotplug.lock){++++++}, at: [<ffffff800809aac0>] get_online_cpus+0x38/0x9c #2: (subsys mutex#6){+.+.+.}, at: [<ffffff80082be46c>] subsys_interface_register+0x38/0xc4 stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.120 #194 Hardware name: Rockchip RK3308 evb digital-pdm mic board (DT) Call trace: [<ffffff8008087f08>] dump_backtrace+0x0/0x1b8 [<ffffff80080880d4>] show_stack+0x14/0x1c [<ffffff800820ac14>] dump_stack+0xa8/0xe8 [<ffffff80080d82d8>] lockdep_rcu_suspicious+0xe4/0x10c [<ffffff80082ce72c>] dev_pm_opp_find_freq_ceil+0x6c/0x108 [<ffffff800824d84c>] rockchip_adjust_opp_by_irdrop+0xf4/0x1e8 [<ffffff80083a66fc>] cpufreq_init+0x134/0x414 [<ffffff80083a0eec>] cpufreq_online+0x1b4/0x68c [<ffffff80083a1464>] cpufreq_add_dev+0x3c/0x94 [<ffffff80082be4dc>] subsys_interface_register+0xa8/0xc4 [<ffffff80083a1700>] cpufreq_register_driver+0x10c/0x1a8 [<ffffff80083a6aa8>] dt_cpufreq_probe+0xcc/0xe8 [<ffffff80082c0db4>] platform_drv_probe+0x54/0xa8 [<ffffff80082bf478>] driver_probe_device+0x188/0x26c [<ffffff80082bf694>] __device_attach_driver+0x60/0x9c [<ffffff80082bda4c>] bus_for_each_drv+0x84/0x94 [<ffffff80082bf264>] __device_attach+0xac/0x104 [<ffffff80082bf80c>] device_initial_probe+0x10/0x18 [<ffffff80082be8e4>] bus_probe_device+0x2c/0x90 [<ffffff80082bcd48>] device_add+0x434/0x4dc [<ffffff80082c0ad0>] platform_device_add+0xa0/0x1e4 [<ffffff80082c1520>] platform_device_register_full+0xa0/0xe0 [<ffffff800882512c>] rockchip_cpufreq_driver_init+0xb4/0x360 [<ffffff800808312c>] do_one_initcall+0xe8/0x194 [<ffffff8008810cec>] kernel_init_freeable+0x1d0/0x1d4 [<ffffff80085b8fa4>] kernel_init+0x14/0x148 [<ffffff8008082ee0>] ret_from_fork+0x10/0x30 Note: we should use dev_pm_opp_put on new kernel. Change-Id: I39e85892f6e7994bbe4816d09e4446de5cd25cad Signed-off-by: Tao Huang <huangtao@rock-chips.com>
According to the dwc2 programmer's guide v3.10a, in '2.1.3.2 Dedicated FIFO Mode with No Thresholding', it suggested that: Device RxFIFO = - Scatter/Gather DMA mode: (4 * number of control endpoints + 6) + ((largest USB packet used / 4) + 1 for status information) + (2 * number of OUT endpoints) + 1 for Global NAK on rockchip platforms: (4 * 1 + 6) + ((1024 / 4) + 1) + (2 * 6) + 1 = 280 - Slave or Buffer DMA mode: (5 * number of control endpoints + 8) + ((largest USB packet used / 4) + 1 for status information) + (2 * number of OUT endpoints) + 1 for Global NAK on rockchip platforms: (5 * 1 + 8) + ((1024 / 4) + 1) + (2 * 6) + 1 = 283 Device IN Endpoint TxFIFO = The TxFIFO must equal at least one MaxPacketSize (MPS). In addition to RxFIFO and TxFIFOs, refer to dwc2 databook v3.10a, 'Figure 2-13 Device Mode FIFO Address Mapping and AHB FIFO Access Mapping (Dedicated FIFO)', it required that when the device is operating in non Scatter Gather Internal DMA mode, the last locations of the SPRAM are used to store the DMAADDR values for each Endpoint (1 location per endpoint). When the device is operating in Scatter Gather mode, then the last locations of the SPRAM store the Base Descriptor address, Current Descriptor address, Current Buffer address, and status quadlet information for each endpoint direction (4 locations per Endpoint). If an Endpoint is bidirectional , then 4 locations will be used for IN, and another 4 for OUT). Considering that the total FIFO size of dwc2 otg is 0x3cc (972), and we must reserve (4 * 13) = 52 locations for all Endpoints. So reconfig dwc2 device fifo size as follows: Device RxFIFO = 280 Device IN Endpoint TxFIFO - FIFO #0 = (64 / 4) = 16 (Assuming this is used for EP0) - FIFO #1 = (1024/4) = 256 (Assuming this is used for Isochronous) - FIFO #2 = (512/4) = 128 - FIFO #3 = (512/4) = 128 - FIFO #4 = (256/4) = 64 - FIFO #5 = (128/4) = 32 - FIFO #6 = (64/4) = 16 After reconfig the dwc2 device fifo size, test mtp write on rockchip platform (PC -> rockchip platform) on rk312x/rk3326/px30/rk3288 evb, when mask the 'vfs_write' in f_mtp.c, the writing data rate can be increased from 16MBps ~ 20MBps to 30MBps ~ 36MBps on different kinds of rockchip evbs. Change-Id: Icdf8a5dd95f96d174233e4ffc765c9a982b9f0b6 Signed-off-by: William Wu <william.wu@rock-chips.com>
According to the dwc2 programmer's guide v3.10a, in '2.1.3.2 Dedicated FIFO Mode with No Thresholding', it suggested that: Device RxFIFO = - Scatter/Gather DMA mode: (4 * number of control endpoints + 6) + ((largest USB packet used / 4) + 1 for status information) + (2 * number of OUT endpoints) + 1 for Global NAK on rockchip platforms: (4 * 1 + 6) + ((1024 / 4) + 1) + (2 * 6) + 1 = 280 - Slave or Buffer DMA mode: (5 * number of control endpoints + 8) + ((largest USB packet used / 4) + 1 for status information) + (2 * number of OUT endpoints) + 1 for Global NAK on rockchip platforms: (5 * 1 + 8) + ((1024 / 4) + 1) + (2 * 6) + 1 = 283 Device IN Endpoint TxFIFO = The TxFIFO must equal at least one MaxPacketSize (MPS). In addition to RxFIFO and TxFIFOs, refer to dwc2 databook v3.10a, 'Figure 2-13 Device Mode FIFO Address Mapping and AHB FIFO Access Mapping (Dedicated FIFO)', it required that when the device is operating in non Scatter Gather Internal DMA mode, the last locations of the SPRAM are used to store the DMAADDR values for each Endpoint (1 location per endpoint). When the device is operating in Scatter Gather mode, then the last locations of the SPRAM store the Base Descriptor address, Current Descriptor address, Current Buffer address, and status quadlet information for each endpoint direction (4 locations per Endpoint). If an Endpoint is bidirectional , then 4 locations will be used for IN, and another 4 for OUT). Considering that the total FIFO size of dwc2 otg is 0x3cc (972), and we must reserve (4 * 13) = 52 locations for all Endpoints. So reconfig dwc2 device fifo size as follows: Device RxFIFO = 280 Device IN Endpoint TxFIFO - FIFO #0 = (64 / 4) = 16 (Assuming this is used for EP0) - FIFO #1 = (1024/4) = 256 (Assuming this is used for Isochronous) - FIFO #2 = (512/4) = 128 - FIFO #3 = (512/4) = 128 - FIFO #4 = (256/4) = 64 - FIFO #5 = (128/4) = 32 - FIFO #6 = (64/4) = 16 After reconfig the dwc2 device fifo size, test mtp write on rockchip platform (PC -> rockchip platform) on rk312x/rk3326/px30/rk3288 evb, when mask the 'vfs_write' in f_mtp.c, the writing data rate can be increased from 16MBps ~ 20MBps to 30MBps ~ 36MBps on different kinds of rockchip evbs. Change-Id: I52c64a279523c811f706e69e427b0a6e8c45683b Signed-off-by: William Wu <william.wu@rock-chips.com>
commit ab31fd0ce65ec93828b617123792c1bb7c6dcc42 upstream. v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") extended accessing parent pointer fields of struct zfcp_erp_action for tracing. If an erp_action has never been enqueued before, these parent pointer fields are uninitialized and NULL. Examples are zfcp objects freshly added to the parent object's children list, before enqueueing their first recovery subsequently. In zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action fields can cause a NULL pointer dereference. Since the kernel can read from lowcore on s390, it does not immediately cause a kernel page fault. Instead it can cause hangs on trying to acquire the wrong erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl() ^bogus^ while holding already other locks with IRQs disabled. Real life example from attaching lots of LUNs in parallel on many CPUs: crash> bt 17723 PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800" LOWCORE INFO: -psw : 0x0404300180000000 0x000000000038e424 -function : _raw_spin_lock_wait_flags at 38e424 ... #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp] #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp] #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp] #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp] #4 [fdde8fe60] kthread at 173550 #5 [fdde8feb8] kernel_thread_starter at 10add2 zfcp_adapter zfcp_port zfcp_unit <address>, 0x404040d600000000 scsi_device NULL, returning early! zfcp_scsi_dev.status = 0x40000000 0x40000000 ZFCP_STATUS_COMMON_RUNNING crash> zfcp_unit <address> struct zfcp_unit { erp_action = { adapter = 0x0, port = 0x0, unit = 0x0, }, } zfcp_erp_action is always fully embedded into its container object. Such container object is never moved in its object tree (only add or delete). Hence, erp_action parent pointers can never change. To fix the issue, initialize the erp_action parent pointers before adding the erp_action container to any list and thus before it becomes accessible from outside of its initializing function. In order to also close the time window between zfcp_erp_setup_act() memsetting the entire erp_action to zero and setting the parent pointers again, drop the memset and instead explicitly initialize individually all erp_action fields except for parent pointers. To be extra careful not to introduce any other unintended side effect, even keep zeroing the erp_action fields for list and timer. Also double-check with WARN_ON_ONCE that erp_action parent pointers never change, so we get to know when we would deviate from previous behavior. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2eb9eabf1e868fda15808954fb29b0f105ed65f1 upstream. syzkaller with KASAN reported an out-of-bounds read in asn1_ber_decoder(). It can be reproduced by the following command, assuming CONFIG_X509_CERTIFICATE_PARSER=y and CONFIG_KASAN=y: keyctl add asymmetric desc $'\x30\x30' @s The bug is that the length of an ASN.1 data value isn't validated in the case where it is encoded using the short form, causing the decoder to read past the end of the input buffer. Fix it by validating the length. The bug report was: BUG: KASAN: slab-out-of-bounds in asn1_ber_decoder+0x10cb/0x1730 lib/asn1_decoder.c:233 Read of size 1 at addr ffff88003cccfa02 by task syz-executor0/6818 CPU: 1 PID: 6818 Comm: syz-executor0 Not tainted 4.14.0-rc7-00008-g5f479447d983 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0xb3/0x10b lib/dump_stack.c:52 print_address_description+0x79/0x2a0 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x236/0x340 mm/kasan/report.c:409 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 asn1_ber_decoder+0x10cb/0x1730 lib/asn1_decoder.c:233 x509_cert_parse+0x1db/0x650 crypto/asymmetric_keys/x509_cert_parser.c:89 x509_key_preparse+0x64/0x7a0 crypto/asymmetric_keys/x509_public_key.c:174 asymmetric_key_preparse+0xcb/0x1a0 crypto/asymmetric_keys/asymmetric_type.c:388 key_create_or_update+0x347/0xb20 security/keys/key.c:855 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0x1cd/0x340 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x447c89 RSP: 002b:00007fca7a5d3bd8 EFLAGS: 00000246 ORIG_RAX: 00000000000000f8 RAX: ffffffffffffffda RBX: 00007fca7a5d46cc RCX: 0000000000447c89 RDX: 0000000020006f4a RSI: 0000000020006000 RDI: 0000000020001ff5 RBP: 0000000000000046 R08: fffffffffffffffd R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007fca7a5d49c0 R15: 00007fca7a5d4700 Fixes: 42d5ec2 ("X.509: Add an ASN.1 decoder") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 624f5ab8720b3371367327a822c267699c1823b8 upstream. syzkaller reported a NULL pointer dereference in asn1_ber_decoder(). It can be reproduced by the following command, assuming CONFIG_PKCS7_TEST_KEY=y: keyctl add pkcs7_test desc '' @s The bug is that if the data buffer is empty, an integer underflow occurs in the following check: if (unlikely(dp >= datalen - 1)) goto data_overrun_error; This results in the NULL data pointer being dereferenced. Fix it by checking for 'datalen - dp < 2' instead. Also fix the similar check for 'dp >= datalen - n' later in the same function. That one possibly could result in a buffer overread. The NULL pointer dereference was reproducible using the "pkcs7_test" key type but not the "asymmetric" key type because the "asymmetric" key type checks for a 0-length payload before calling into the ASN.1 decoder but the "pkcs7_test" key type does not. The bug report was: BUG: unable to handle kernel NULL pointer dereference at (null) IP: asn1_ber_decoder+0x17f/0xe60 lib/asn1_decoder.c:233 PGD 7b708067 P4D 7b708067 PUD 7b6ee067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 522 Comm: syz-executor1 Not tainted 4.14.0-rc8 #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.3-20171021_125229-anatol 04/01/2014 task: ffff9b6b3798c040 task.stack: ffff9b6b37970000 RIP: 0010:asn1_ber_decoder+0x17f/0xe60 lib/asn1_decoder.c:233 RSP: 0018:ffff9b6b37973c78 EFLAGS: 00010216 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000021c RDX: ffffffff814a04ed RSI: ffffb1524066e000 RDI: ffffffff910759e0 RBP: ffff9b6b37973d60 R08: 0000000000000001 R09: ffff9b6b3caa4180 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f10ed1f2700(0000) GS:ffff9b6b3ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007b6f3000 CR4: 00000000000006f0 Call Trace: pkcs7_parse_message+0xee/0x240 crypto/asymmetric_keys/pkcs7_parser.c:139 verify_pkcs7_signature+0x33/0x180 certs/system_keyring.c:216 pkcs7_preparse+0x41/0x70 crypto/asymmetric_keys/pkcs7_key_type.c:63 key_create_or_update+0x180/0x530 security/keys/key.c:855 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0xbf/0x250 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4585c9 RSP: 002b:00007f10ed1f1bd8 EFLAGS: 00000216 ORIG_RAX: 00000000000000f8 RAX: ffffffffffffffda RBX: 00007f10ed1f2700 RCX: 00000000004585c9 RDX: 0000000020000000 RSI: 0000000020008ffb RDI: 0000000020008000 RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000216 R12: 00007fff1b2260ae R13: 00007fff1b2260af R14: 00007f10ed1f2700 R15: 0000000000000000 Code: dd ca ff 48 8b 45 88 48 83 e8 01 4c 39 f0 0f 86 a8 07 00 00 e8 53 dd ca ff 49 8d 46 01 48 89 85 58 ff ff ff 48 8b 85 60 ff ff ff <42> 0f b6 0c 30 89 c8 88 8d 75 ff ff ff 83 e0 1f 89 8d 28 ff ff RIP: asn1_ber_decoder+0x17f/0xe60 lib/asn1_decoder.c:233 RSP: ffff9b6b37973c78 CR2: 0000000000000000 Fixes: 42d5ec2 ("X.509: Add an ASN.1 decoder") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d754941225a7dbc61f6dd2173fa9498049f9a7ee ] If, for any reason, userland shuts down iscsi transport interfaces before proper logouts - like when logging in to LUNs manually, without logging out on server shutdown, or when automated scripts can't umount/logout from logged LUNs - kernel will hang forever on its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all still existent paths. PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c This happens because iscsi_eh_cmd_timed_out(), the transport layer timeout helper, would tell the queue timeout function (scsi_times_out) to reset the request timer over and over, until the session state is back to logged in state. Unfortunately, during server shutdown, this might never happen again. Other option would be "not to handle" the issue in the transport layer. That would trigger the error handler logic, which would also need the session state to be logged in again. Best option, for such case, is to tell upper layers that the command was handled during the transport layer error handler helper, marking it as DID_NO_CONNECT, which will allow completion and inform about the problem. After the session was marked as ISCSI_STATE_FAILED, due to the first timeout during the server shutdown phase, all subsequent cmds will fail to be queued, allowing upper logic to fail faster. Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c755e251357a0cee0679081f08c3f4ba797a8009 upstream. The xattr_sem deadlock problems fixed in commit 2e81a4eeedca: "ext4: avoid deadlock when expanding inode size" didn't include the use of xattr_sem in fs/ext4/inline.c. With the addition of project quota which added a new extra inode field, this exposed deadlocks in the inline_data code similar to the ones fixed by 2e81a4eeedca. The deadlock can be reproduced via: dmesg -n 7 mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768 mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc mkdir /vdc/a umount /vdc mount -t ext4 /dev/vdc /vdc echo foo > /vdc/a/foo and looks like this: [ 11.158815] [ 11.160276] ============================================= [ 11.161960] [ INFO: possible recursive locking detected ] [ 11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G W [ 11.161960] --------------------------------------------- [ 11.161960] bash/2519 is trying to acquire lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] [ 11.161960] but task is already holding lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] other info that might help us debug this: [ 11.161960] Possible unsafe locking scenario: [ 11.161960] [ 11.161960] CPU0 [ 11.161960] ---- [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] [ 11.161960] *** DEADLOCK *** [ 11.161960] [ 11.161960] May be due to missing lock nesting notation [ 11.161960] [ 11.161960] 4 locks held by bash/2519: [ 11.161960] #0: (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e [ 11.161960] #1: (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a [ 11.161960] #2: (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622 [ 11.161960] #3: (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] stack backtrace: [ 11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G W 4.10.0-rc3-00015-g011b30a8a3cf #160 [ 11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014 [ 11.161960] Call Trace: [ 11.161960] dump_stack+0x72/0xa3 [ 11.161960] __lock_acquire+0xb7c/0xcb9 [ 11.161960] ? kvm_clock_read+0x1f/0x29 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] lock_acquire+0x106/0x18a [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] down_write+0x39/0x72 [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ? _raw_read_unlock+0x22/0x2c [ 11.161960] ? jbd2_journal_extend+0x1e2/0x262 [ 11.161960] ? __ext4_journal_get_write_access+0x3d/0x60 [ 11.161960] ext4_mark_inode_dirty+0x17d/0x26d [ 11.161960] ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_try_add_inline_entry+0x69/0x152 [ 11.161960] ext4_add_entry+0xa3/0x848 [ 11.161960] ? __brelse+0x14/0x2f [ 11.161960] ? _raw_spin_unlock_irqrestore+0x44/0x4f [ 11.161960] ext4_add_nondir+0x17/0x5b [ 11.161960] ext4_create+0xcf/0x133 [ 11.161960] ? ext4_mknod+0x12f/0x12f [ 11.161960] lookup_open+0x39e/0x3fb [ 11.161960] ? __wake_up+0x1a/0x40 [ 11.161960] ? lock_acquire+0x11e/0x18a [ 11.161960] path_openat+0x35c/0x67a [ 11.161960] ? sched_clock_cpu+0xd7/0xf2 [ 11.161960] do_filp_open+0x36/0x7c [ 11.161960] ? _raw_spin_unlock+0x22/0x2c [ 11.161960] ? __alloc_fd+0x169/0x173 [ 11.161960] do_sys_open+0x59/0xcc [ 11.161960] SyS_open+0x1d/0x1f [ 11.161960] do_int80_syscall_32+0x4f/0x61 [ 11.161960] entry_INT80_32+0x2f/0x2f [ 11.161960] EIP: 0xb76ad469 [ 11.161960] EFLAGS: 00000286 CPU: 0 [ 11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6 [ 11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0 [ 11.161960] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4eeedca as a prereq) Reported-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c438d7a3a52dcc2b9ed095cb87d3a5e83cf7e60 ] Adding a dns_resolver key whose payload contains a very long option name resulted in that string being printed in full. This hit the WARN_ONCE() in set_precision() during the printk(), because printk() only supports a precision of up to 32767 bytes: precision 1000000 too large WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0 Fix it by limiting option strings (combined name + value) to a much more reasonable 128 bytes. The exact limit is arbitrary, but currently the only recognized option is formatted as "dnserror=%lu" which fits well within this limit. Also ratelimit the printks. Reproducer: perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s This bug was found using syzkaller. Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: 4a2d789 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c64576a77894a50be80be0024bed27171b55989 upstream. syzkaller reports for wrong rtnl_lock usage in sync code [1] and [2] We have 2 problems in start_sync_thread if error path is taken, eg. on memory allocation error or failure to configure sockets for mcast group or addr/port binding: 1. recursive locking: holding rtnl_lock while calling sock_release which in turn calls again rtnl_lock in ip_mc_drop_socket to leave the mcast group, as noticed by Florian Westphal. Additionally, sock_release can not be called while holding sync_mutex (ABBA deadlock). 2. task hung: holding rtnl_lock while calling kthread_stop to stop the running kthreads. As the kthreads do the same to leave the mcast group (sock_release -> ip_mc_drop_socket -> rtnl_lock) they hang. Fix the problems by calling rtnl_unlock early in the error path, now sock_release is called after unlocking both mutexes. Problem 3 (task hung reported by syzkaller [2]) is variant of problem 2: use _trylock to prevent one user to call rtnl_lock and then while waiting for sync_mutex to block kthreads that execute sock_release when they are stopped by stop_sync_thread. [1] IPVS: stopping backup sync thread 4500 ... WARNING: possible recursive locking detected 4.16.0-rc7+ #3 Not tainted -------------------------------------------- syzkaller688027/4497 is trying to acquire lock: (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 but task is already holding lock: IPVS: stopping backup sync thread 4495 ... (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(rtnl_mutex); lock(rtnl_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by syzkaller688027/4497: #0: (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (ipvs->sync_mutex){+.+.}, at: [<00000000703f78e3>] do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388 stack backtrace: CPU: 1 PID: 4497 Comm: syzkaller688027 Not tainted 4.16.0-rc7+ #3 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_deadlock_bug kernel/locking/lockdep.c:1761 [inline] check_deadlock kernel/locking/lockdep.c:1805 [inline] validate_chain kernel/locking/lockdep.c:2401 [inline] __lock_acquire+0xe8f/0x3e00 kernel/locking/lockdep.c:3431 lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 ip_mc_drop_socket+0x88/0x230 net/ipv4/igmp.c:2643 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:413 sock_release+0x8d/0x1e0 net/socket.c:595 start_sync_thread+0x2213/0x2b70 net/netfilter/ipvs/ip_vs_sync.c:1924 do_ip_vs_set_ctl+0x1139/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2389 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1261 udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2406 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x446a69 RSP: 002b:00007fa1c3a64da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000446a69 RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00000000006e29fc R08: 0000000000000018 R09: 0000000000000000 R10: 00000000200000c0 R11: 0000000000000246 R12: 00000000006e29f8 R13: 00676e697279656b R14: 00007fa1c3a659c0 R15: 00000000006e2b60 [2] IPVS: sync thread started: state = BACKUP, mcast_ifn = syz_tun, syncid = 4, id = 0 IPVS: stopping backup sync thread 25415 ... INFO: task syz-executor7:25421 blocked for more than 120 seconds. Not tainted 4.16.0-rc6+ #284 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor7 D23688 25421 4408 0x00000004 Call Trace: context_switch kernel/sched/core.c:2862 [inline] __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 schedule+0xf5/0x430 kernel/sched/core.c:3499 schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 do_wait_for_common kernel/sched/completion.c:86 [inline] __wait_for_common kernel/sched/completion.c:107 [inline] wait_for_common kernel/sched/completion.c:118 [inline] wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 kthread_stop+0x14a/0x7a0 kernel/kthread.c:530 stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996 do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1253 sctp_setsockopt+0x2ca/0x63e0 net/sctp/socket.c:4154 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:3039 SYSC_setsockopt net/socket.c:1850 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1829 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x454889 RSP: 002b:00007fc927626c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007fc9276276d4 RCX: 0000000000454889 RDX: 000000000000048c RSI: 0000000000000000 RDI: 0000000000000017 RBP: 000000000072bf58 R08: 0000000000000018 R09: 0000000000000000 R10: 0000000020000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000051c R14: 00000000006f9b40 R15: 0000000000000001 Showing all locks held in the system: 2 locks held by khungtaskd/868: #0: (rcu_read_lock){....}, at: [<00000000a1a8f002>] check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline] #0: (rcu_read_lock){....}, at: [<00000000a1a8f002>] watchdog+0x1c5/0xd60 kernel/hung_task.c:249 #1: (tasklist_lock){.+.+}, at: [<0000000037c2f8f9>] debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470 1 lock held by rsyslogd/4247: #0: (&f->f_pos_lock){+.+.}, at: [<000000000d8d6983>] __fdget_pos+0x12b/0x190 fs/file.c:765 2 locks held by getty/4338: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4339: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4340: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4341: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4342: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4343: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4344: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 3 locks held by kworker/0:5/6494: #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] work_static include/linux/workqueue.h:198 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] set_work_data kernel/workqueue.c:619 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084 #1: ((addr_chk_work).work){+.+.}, at: [<00000000278427d5>] process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088 #2: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 1 lock held by syz-executor7/25421: #0: (ipvs->sync_mutex){+.+.}, at: [<00000000d414a689>] do_ip_vs_set_ctl+0x277/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2393 2 locks held by syz-executor7/25427: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (ipvs->sync_mutex){+.+.}, at: [<00000000e6d48489>] do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388 1 lock held by syz-executor7/25435: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 1 lock held by ipvs-b:2:0/25415: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 Reported-and-tested-by: syzbot+a46d6abf9d56b1365a72@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+5fe074c01b2032ce9618@syzkaller.appspotmail.com Fixes: e0b26cc ("ipvs: call rtnl_lock early") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Zubin Mithra <zsm@chromium.org> Cc: Guenter Roeck <groeck@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fba9eb7946251d6e420df3bdf7bc45195be7be9a ] Add a header with macros usable in assembler files to emit alternative code sequences. It works analog to the alternatives for inline assmeblies in C files, with the same restrictions and capabilities. The syntax is ALTERNATIVE "<default instructions sequence>", \ "<alternative instructions sequence>", \ "<features-bit>" and ALTERNATIVE_2 "<default instructions sequence>", \ "<alternative instructions sqeuence TinkerBoard#1>", \ "<feature-bit TinkerBoard#1>", "<alternative instructions sqeuence TinkerBoard#2>", \ "<feature-bit TinkerBoard#2>" Reviewed-by: Vasily Gorbik <gor@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c0aa08631b86a4678dbc93b9caa5248014b4458 ] Scenario: 1. Port down and do fail over 2. Ap do rds_bind syscall PID: 47039 TASK: ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6" #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9 TinkerBoard#1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3 TinkerBoard#2 [ffff898e35f15b30] oops_end at ffffffff8150f518 TinkerBoard#3 [ffff898e35f15b60] no_context at ffffffff8104854c TinkerBoard#4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675 TinkerBoard#5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3 TinkerBoard#6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8 TinkerBoard#7 [ffff898e35f15d10] page_fault at ffffffff8150ea95 [exception RIP: unknown or invalid address] RIP: 0000000000000000 RSP: ffff898e35f15dc8 RFLAGS: 00010282 RAX: 00000000fffffffe RBX: ffff889b77f6fc00 RCX:ffffffff81c99d88 RDX: 0000000000000000 RSI: ffff896019ee08e8 RDI:ffff889b77f6fc00 RBP: ffff898e35f15df0 R8: ffff896019ee08c8 R9:0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12:ffff896019ee08c0 R13: ffff889b77f6fe68 R14: ffffffff81c99d80 R15: ffffffffa022a1e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 TinkerBoard#8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm] TinkerBoard#9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6 TinkerBoard#10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0 TinkerBoard#11 [ffff898e35f15ee8] kthread at ffffffff81090fe6 PID: 45659 TASK: ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap" #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4 TinkerBoard#1 [ffff881024ccfd40] schedule at ffffffff8150c2cf TinkerBoard#2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7 TinkerBoard#3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb TinkerBoard#4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm] TinkerBoard#5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma] TinkerBoard#6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds] TinkerBoard#7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds] TinkerBoard#8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670 PID: 45659 PID: 47039 rds_ib_laddr_check /* create id_priv with a null event_handler */ rdma_create_id rdma_bind_addr cma_acquire_dev /* add id_priv to cma_dev->id_list */ cma_attach_to_dev cma_ndev_work_handler /* event_hanlder is null */ id_priv->id.event_handler Signed-off-by: Guanglei Li <guanglei.li@oracle.com> Signed-off-by: Honglei Wang <honglei.wang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b6dd4d83dc2f78cebc9a7e6e7e4bc2be4d29b94d ] The pr_debug() in gic-v3 gic_send_sgi() can trigger a circular locking warning: GICv3: CPU10: ICC_SGI1R_EL1 5000400 ====================================================== WARNING: possible circular locking dependency detected 4.15.0+ TinkerBoard#1 Tainted: G W ------------------------------------------------------ dynamic_debug01/1873 is trying to acquire lock: ((console_sem).lock){-...}, at: [<0000000099c891ec>] down_trylock+0x20/0x4c but task is already holding lock: (&rq->lock){-.-.}, at: [<00000000842e1587>] __task_rq_lock+0x54/0xdc which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> TinkerBoard#2 (&rq->lock){-.-.}: __lock_acquire+0x3b4/0x6e0 lock_acquire+0xf4/0x2a8 _raw_spin_lock+0x4c/0x60 task_fork_fair+0x3c/0x148 sched_fork+0x10c/0x214 copy_process.isra.32.part.33+0x4e8/0x14f0 _do_fork+0xe8/0x78c kernel_thread+0x48/0x54 rest_init+0x34/0x2a4 start_kernel+0x45c/0x488 -> TinkerBoard#1 (&p->pi_lock){-.-.}: __lock_acquire+0x3b4/0x6e0 lock_acquire+0xf4/0x2a8 _raw_spin_lock_irqsave+0x58/0x70 try_to_wake_up+0x48/0x600 wake_up_process+0x28/0x34 __up.isra.0+0x60/0x6c up+0x60/0x68 __up_console_sem+0x4c/0x7c console_unlock+0x328/0x634 vprintk_emit+0x25c/0x390 dev_vprintk_emit+0xc4/0x1fc dev_printk_emit+0x88/0xa8 __dev_printk+0x58/0x9c _dev_info+0x84/0xa8 usb_new_device+0x100/0x474 hub_port_connect+0x280/0x92c hub_event+0x740/0xa84 process_one_work+0x240/0x70c worker_thread+0x60/0x400 kthread+0x110/0x13c ret_from_fork+0x10/0x18 -> #0 ((console_sem).lock){-...}: validate_chain.isra.34+0x6e4/0xa20 __lock_acquire+0x3b4/0x6e0 lock_acquire+0xf4/0x2a8 _raw_spin_lock_irqsave+0x58/0x70 down_trylock+0x20/0x4c __down_trylock_console_sem+0x3c/0x9c console_trylock+0x20/0xb0 vprintk_emit+0x254/0x390 vprintk_default+0x58/0x90 vprintk_func+0xbc/0x164 printk+0x80/0xa0 __dynamic_pr_debug+0x84/0xac gic_raise_softirq+0x184/0x18c smp_cross_call+0xac/0x218 smp_send_reschedule+0x3c/0x48 resched_curr+0x60/0x9c check_preempt_curr+0x70/0xdc wake_up_new_task+0x310/0x470 _do_fork+0x188/0x78c SyS_clone+0x44/0x50 __sys_trace_return+0x0/0x4 other info that might help us debug this: Chain exists of: (console_sem).lock --> &p->pi_lock --> &rq->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&rq->lock); lock(&p->pi_lock); lock(&rq->lock); lock((console_sem).lock); *** DEADLOCK *** 2 locks held by dynamic_debug01/1873: #0: (&p->pi_lock){-.-.}, at: [<000000001366df53>] wake_up_new_task+0x40/0x470 TinkerBoard#1: (&rq->lock){-.-.}, at: [<00000000842e1587>] __task_rq_lock+0x54/0xdc stack backtrace: CPU: 10 PID: 1873 Comm: dynamic_debug01 Tainted: G W 4.15.0+ TinkerBoard#1 Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS T48 10/02/2017 Call trace: dump_backtrace+0x0/0x188 show_stack+0x24/0x2c dump_stack+0xa4/0xe0 print_circular_bug.isra.31+0x29c/0x2b8 check_prev_add.constprop.39+0x6c8/0x6dc validate_chain.isra.34+0x6e4/0xa20 __lock_acquire+0x3b4/0x6e0 lock_acquire+0xf4/0x2a8 _raw_spin_lock_irqsave+0x58/0x70 down_trylock+0x20/0x4c __down_trylock_console_sem+0x3c/0x9c console_trylock+0x20/0xb0 vprintk_emit+0x254/0x390 vprintk_default+0x58/0x90 vprintk_func+0xbc/0x164 printk+0x80/0xa0 __dynamic_pr_debug+0x84/0xac gic_raise_softirq+0x184/0x18c smp_cross_call+0xac/0x218 smp_send_reschedule+0x3c/0x48 resched_curr+0x60/0x9c check_preempt_curr+0x70/0xdc wake_up_new_task+0x310/0x470 _do_fork+0x188/0x78c SyS_clone+0x44/0x50 __sys_trace_return+0x0/0x4 GICv3: CPU0: ICC_SGI1R_EL1 12000 This could be fixed with printk_deferred() but that might lessen its usefulness for debugging. So change it to pr_devel to keep it out of production kernels. Developers working on gic-v3 can enable it as needed in their kernels. Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2bbea6e117357d17842114c65e9a9cf2d13ae8a3 ] when mounting an ISO filesystem sometimes (very rarely) the system hangs because of a race condition between two tasks. PID: 6766 TASK: ffff88007b2a6dd0 CPU: 0 COMMAND: "mount" #0 [ffff880078447ae0] __schedule at ffffffff8168d605 TinkerBoard#1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49 TinkerBoard#2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995 TinkerBoard#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef TinkerBoard#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod] TinkerBoard#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50 TinkerBoard#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3 TinkerBoard#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs] TinkerBoard#8 [ffff880078447da8] mount_bdev at ffffffff81202570 TinkerBoard#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs] TinkerBoard#10 [ffff880078447e28] mount_fs at ffffffff81202d09 TinkerBoard#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f TinkerBoard#12 [ffff880078447ea8] do_mount at ffffffff81220fee TinkerBoard#13 [ffff880078447f28] sys_mount at ffffffff812218d6 TinkerBoard#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49 RIP: 00007fd9ea914e9a RSP: 00007ffd5d9bf648 RFLAGS: 00010246 RAX: 00000000000000a5 RBX: ffffffff81698c49 RCX: 0000000000000010 RDX: 00007fd9ec2bc210 RSI: 00007fd9ec2bc290 RDI: 00007fd9ec2bcf30 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000010 R10: 00000000c0ed0001 R11: 0000000000000206 R12: 00007fd9ec2bc040 R13: 00007fd9eb6b2380 R14: 00007fd9ec2bc210 R15: 00007fd9ec2bcf30 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b This task was trying to mount the cdrom. It allocated and configured a super_block struct and owned the write-lock for the super_block->s_umount rwsem. While exclusively owning the s_umount lock, it called sr_block_ioctl and waited to acquire the global sr_mutex lock. PID: 6785 TASK: ffff880078720fb0 CPU: 0 COMMAND: "systemd-udevd" #0 [ffff880078417898] __schedule at ffffffff8168d605 TinkerBoard#1 [ffff880078417900] schedule at ffffffff8168dc59 TinkerBoard#2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605 TinkerBoard#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838 TinkerBoard#4 [ffff8800784179d0] down_read at ffffffff8168cde0 TinkerBoard#5 [ffff8800784179e8] get_super at ffffffff81201cc7 TinkerBoard#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de TinkerBoard#7 [ffff880078417a40] flush_disk at ffffffff8123a94b TinkerBoard#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50 TinkerBoard#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom] TinkerBoard#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod] TinkerBoard#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86 TinkerBoard#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65 TinkerBoard#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b TinkerBoard#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7 TinkerBoard#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf TinkerBoard#16 [ffff880078417d00] do_last at ffffffff8120d53d TinkerBoard#17 [ffff880078417db0] path_openat at ffffffff8120e6b2 TinkerBoard#18 [ffff880078417e48] do_filp_open at ffffffff8121082b TinkerBoard#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33 TinkerBoard#20 [ffff880078417f70] sys_open at ffffffff811fde4e TinkerBoard#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49 RIP: 00007f29438b0c20 RSP: 00007ffc76624b78 RFLAGS: 00010246 RAX: 0000000000000002 RBX: ffffffff81698c49 RCX: 0000000000000000 RDX: 00007f2944a5fa70 RSI: 00000000000a0800 RDI: 00007f2944a5fa70 RBP: 00007f2944a5f540 R8: 0000000000000000 R9: 0000000000000020 R10: 00007f2943614c40 R11: 0000000000000246 R12: ffffffff811fde4e R13: ffff880078417f78 R14: 000000000000000c R15: 00007f2944a4b010 ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b This task tried to open the cdrom device, the sr_block_open function acquired the global sr_mutex lock. The call to check_disk_change() then saw an event flag indicating a possible media change and tried to flush any cached data for the device. As part of the flush, it tried to acquire the super_block->s_umount lock associated with the cdrom device. This was the same super_block as created and locked by the previous task. The first task acquires the s_umount lock and then the sr_mutex_lock; the second task acquires the sr_mutex_lock and then the s_umount lock. This patch fixes the issue by moving check_disk_change() out of cdrom_open() and let the caller take care of it. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a3ca831249ca8c4c226e4ceafee04e280152e59d ] When booting up with "threadirqs" in command line, all irq handlers of the DMA controller pl330 will be threaded forcedly. These threads will race for the same list, pl330->req_done. Before the callback, the spinlock was released. And after it, the spinlock was taken. This opened an race window where another threaded irq handler could steal the spinlock and be permitted to delete entries of the list, pl330->req_done. If the later deleted an entry that was still referred to by the former, there would be a kernel panic when the former was scheduled and tried to get the next sibling of the deleted entry. The scenario could be depicted as below: Thread: T1 pl330->req_done Thread: T2 | | | | -A-B-C-D- | Locked | | | | Waiting Del A | | | -B-C-D- | Unlocked | | | | Locked Waiting | | | | Del B | | | | -C-D- Unlocked Waiting | | | Locked | get C via B \ - Kernel panic The kernel panic looked like as below: Unable to handle kernel paging request at virtual address dead000000000108 pgd = ffffff8008c9e000 [dead000000000108] *pgd=000000027fffe003, *pud=000000027fffe003, *pmd=0000000000000000 Internal error: Oops: 96000044 [TinkerBoard#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 85 Comm: irq/59-66330000 Not tainted 4.8.24-WR9.0.0.12_standard TinkerBoard#2 Hardware name: Broadcom NS2 SVK (DT) task: ffffffc1f5cc3c00 task.stack: ffffffc1f5ce0000 PC is at pl330_irq_handler+0x27c/0x390 LR is at pl330_irq_handler+0x2a8/0x390 pc : [<ffffff80084cb694>] lr : [<ffffff80084cb6c0>] pstate: 800001c5 sp : ffffffc1f5ce3d00 x29: ffffffc1f5ce3d00 x28: 0000000000000140 x27: ffffffc1f5c530b0 x26: dead000000000100 x25: dead000000000200 x24: 0000000000418958 x23: 0000000000000001 x22: ffffffc1f5ccd668 x21: ffffffc1f5ccd590 x20: ffffffc1f5ccd418 x19: dead000000000060 x18: 0000000000000001 x17: 0000000000000007 x16: 0000000000000001 x15: ffffffffffffffff x14: ffffffffffffffff x13: ffffffffffffffff x12: 0000000000000000 x11: 0000000000000001 x10: 0000000000000840 x9 : ffffffc1f5ce0000 x8 : ffffffc1f5cc3338 x7 : ffffff8008ce2020 x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000001 x3 : dead000000000200 x2 : dead000000000100 x1 : 0000000000000140 x0 : ffffffc1f5ccd590 Process irq/59-66330000 (pid: 85, stack limit = 0xffffffc1f5ce0020) Stack: (0xffffffc1f5ce3d00 to 0xffffffc1f5ce4000) 3d00: ffffffc1f5ce3d80 ffffff80080f09d0 ffffffc1f5ca0c00 ffffffc1f6f7c600 3d20: ffffffc1f5ce0000 ffffffc1f6f7c600 ffffffc1f5ca0c00 ffffff80080f0998 3d40: ffffffc1f5ce0000 ffffff80080f0000 0000000000000000 0000000000000000 3d60: ffffff8008ce202c ffffff8008ce2020 ffffffc1f5ccd668 ffffffc1f5c530b0 3d80: ffffffc1f5ce3db0 ffffff80080f0d70 ffffffc1f5ca0c40 0000000000000001 3da0: ffffffc1f5ce0000 ffffff80080f0cfc ffffffc1f5ce3e20 ffffff80080bf4f8 3dc0: ffffffc1f5ca0c80 ffffff8008bf3798 ffffff8008955528 ffffffc1f5ca0c00 3de0: ffffff80080f0c30 0000000000000000 0000000000000000 0000000000000000 3e00: 0000000000000000 0000000000000000 0000000000000000 ffffff80080f0b68 3e20: 0000000000000000 ffffff8008083690 ffffff80080bf420 ffffffc1f5ca0c80 3e40: 0000000000000000 0000000000000000 0000000000000000 ffffff80080cb648 3e60: ffffff8008b1c780 0000000000000000 0000000000000000 ffffffc1f5ca0c00 3e80: ffffffc100000000 ffffff8000000000 ffffffc1f5ce3e90 ffffffc1f5ce3e90 3ea0: 0000000000000000 ffffff8000000000 ffffffc1f5ce3eb0 ffffffc1f5ce3eb0 3ec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 3ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 3f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 3f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 3f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 3f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 3f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 3fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 3fc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000 3fe0: 0000000000000000 0000000000000000 0000000275ce3ff0 0000000275ce3ff8 Call trace: Exception stack(0xffffffc1f5ce3b30 to 0xffffffc1f5ce3c60) 3b20: dead000000000060 0000008000000000 3b40: ffffffc1f5ce3d00 ffffff80084cb694 0000000000000008 0000000000000e88 3b60: ffffffc1f5ce3bb0 ffffff80080dac68 ffffffc1f5ce3b90 ffffff8008826fe4 3b80: 00000000000001c0 00000000000001c0 ffffffc1f5ce3bb0 ffffff800848dfcc 3ba0: 0000000000020000 ffffff8008b15ae4 ffffffc1f5ce3c00 ffffff800808f000 3bc0: 0000000000000010 ffffff80088377f0 ffffffc1f5ccd590 0000000000000140 3be0: dead000000000100 dead000000000200 0000000000000001 0000000000000000 3c00: 0000000000000000 ffffff8008ce2020 ffffffc1f5cc3338 ffffffc1f5ce0000 3c20: 0000000000000840 0000000000000001 0000000000000000 ffffffffffffffff 3c40: ffffffffffffffff ffffffffffffffff 0000000000000001 0000000000000007 [<ffffff80084cb694>] pl330_irq_handler+0x27c/0x390 [<ffffff80080f09d0>] irq_forced_thread_fn+0x38/0x88 [<ffffff80080f0d70>] irq_thread+0x140/0x200 [<ffffff80080bf4f8>] kthread+0xd8/0xf0 [<ffffff8008083690>] ret_from_fork+0x10/0x40 Code: f2a00838 f9405763 aa1c03e1 aa1503e0 (f9000443) ---[ end trace f50005726d31199c ]--- Kernel panic - not syncing: Fatal exception in interrupt SMP: stopping secondary CPUs SMP: failed to stop secondary CPUs 0-1 Kernel Offset: disabled Memory Limit: none ---[ end Kernel panic - not syncing: Fatal exception in interrupt To fix this, re-start with the list-head after dropping the lock then re-takeing it. Reviewed-by: Frank Mori Hess <fmh6jj@gmail.com> Tested-by: Frank Mori Hess <fmh6jj@gmail.com> Signed-off-by: Qi Hou <qi.hou@windriver.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While testing the deadline scheduler + cgroup setup I hit this warning. [ 132.612935] ------------[ cut here ]------------ [ 132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80 [ 132.612952] Modules linked in: (a ton of modules...) [ 132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 TinkerBoard#2 [ 132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 [ 132.612982] 0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e [ 132.612984] 0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b [ 132.612985] 00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00 [ 132.612986] Call Trace: [ 132.612987] <IRQ> [<ffffffff813d229e>] dump_stack+0x63/0x85 [ 132.612994] [<ffffffff810a652b>] __warn+0xcb/0xf0 [ 132.612997] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.612999] [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20 [ 132.613000] [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80 [ 132.613008] [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20 [ 132.613010] [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10 [ 132.613015] [<ffffffff811388ac>] put_css_set+0x5c/0x60 [ 132.613016] [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0 [ 132.613017] [<ffffffff810a3912>] __put_task_struct+0x42/0x140 [ 132.613018] [<ffffffff810e776a>] dl_task_timer+0xca/0x250 [ 132.613027] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.613030] [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270 [ 132.613031] [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190 [ 132.613034] [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60 [ 132.613035] [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50 [ 132.613037] [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0 [ 132.613038] <EOI> [<ffffffff81063466>] ? native_safe_halt+0x6/0x10 [ 132.613043] [<ffffffff81037a4e>] default_idle+0x1e/0xd0 [ 132.613044] [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20 [ 132.613046] [<ffffffff810e8fda>] default_idle_call+0x2a/0x40 [ 132.613047] [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340 [ 132.613048] [<ffffffff81050235>] start_secondary+0x155/0x190 [ 132.613049] ---[ end trace f91934d162ce9977 ]--- The warn is the spin_(lock|unlock)_bh(&css_set_lock) in the interrupt context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid this problem - and other problems of sharing a spinlock with an interrupt. Change-Id: I5b5d5c79c3f380ac35f58596fc2cebaf6348eb67 Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: cgroups@vger.kernel.org Cc: stable@vger.kernel.org # 4.5+ Cc: linux-kernel@vger.kernel.org Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
commit df30781699f53e4fd4c494c6f7dd16e3d5c21d30 upstream. For problem determination we need to see whether and why we were successful or not. This allows deduction of scsi_eh escalation. Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : schrh_r SCSI host reset handler result Request ID : 0x0000000000000000 none (invalid) SCSI ID : 0xffffffff none (invalid) SCSI LUN : 0xffffffff none (invalid) SCSI LUN high : 0xffffffff none (invalid) SCSI result : 0x00002002 field re-used for midlayer value: SUCCESS or in other cases: 0x2009 == FAST_IO_FAIL SCSI retries : 0xff none (invalid) SCSI allowed : 0xff none (invalid) SCSI scribble : 0xffffffffffffffff none (invalid) SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid) FCP rsp inf cod: 0xff none (invalid) FCP rsp IU : 00000000 00000000 00000000 00000000 none (invalid) 00000000 00000000 v2.6.35 commit a1dbfdd ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") introduced the first return with something other than the previously hardcoded single SUCCESS return path. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: a1dbfdd ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") Cc: <stable@vger.kernel.org> TinkerBoard#2.6.38+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81979ae63e872ef650a7197f6ce6590059d37172 upstream. We already have a SCSI trace for the end of abort and scsi_eh TMF. Due to zfcp_erp_wait() and fc_block_scsi_eh() time can pass between the start of our eh callback and an actual send/recv of an abort / TMF request. In order to see the temporal sequence including any abort / TMF send retries, add a trace before the above two blocking functions. This supports problem determination with scsi_eh and parallel zfcp ERP. No need to explicitly trace the beginning of our eh callback, since we typically can send an abort / TMF and see its HBA response (in the worst case, it's a pseudo response on dismiss all of adapter recovery, e.g. due to an FSF request timeout [fsrth_1] of the abort / TMF). If we cannot send, we now get a trace record for the first "abrt_wt" or "[lt]r_wait" which denotes almost the beginning of the callback. No need to explicitly trace the wakeup after the above two blocking functions because the next retry loop causes another trace in any case and that is sufficient. Example trace records formatted with zfcpdbf from s390-tools: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : abrt_wt abort, before zfcp_erp_wait() Request ID : 0x0000000000000000 none (invalid) SCSI ID : 0x<scsi_id> SCSI LUN : 0x<scsi_lun> SCSI LUN high : 0x<scsi_lun_high> SCSI result : 0x<scsi_result_of_cmd_to_be_aborted> SCSI retries : 0x<retries_of_cmd_to_be_aborted> SCSI allowed : 0x<allowed_retries_of_cmd_to_be_aborted> SCSI scribble : 0x<req_id_of_cmd_to_be_aborted> SCSI opcode : <CDB_of_cmd_to_be_aborted> FCP rsp inf cod: 0x.. none (invalid) FCP rsp IU : ... none (invalid) Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_wait LUN reset, before zfcp_erp_wait() Request ID : 0x0000000000000000 none (invalid) SCSI ID : 0x<scsi_id> SCSI LUN : 0x<scsi_lun> SCSI LUN high : 0x<scsi_lun_high> SCSI result : 0x... unrelated SCSI retries : 0x.. unrelated SCSI allowed : 0x.. unrelated SCSI scribble : 0x... unrelated SCSI opcode : ... unrelated FCP rsp inf cod: 0x.. none (invalid) FCP rsp IU : ... none (invalid) Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 63caf36 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp") Fixes: af4de36 ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED") Cc: <stable@vger.kernel.org> TinkerBoard#2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…ailed commit 512857a795cbbda5980efa4cdb3c0b6602330408 upstream. If a SCSI device is deleted during scsi_eh host reset, we cannot get a reference to the SCSI device anymore since scsi_device_get returns !=0 by design. Assuming the recovery of adapter and port(s) was successful, zfcp_erp_strategy_followup_success() attempts to trigger a LUN reset for the half-gone SCSI device. Unfortunately, it causes the following confusing trace record which states that zfcp will do a LUN recovery as "ERP need" is ZFCP_ERP_ACTION_REOPEN_LUN == 1 and equals "ERP want". Old example trace record formatted with zfcpdbf from s390-tools: Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded LUN : 0x<FCP_LUN> WWPN : 0x<WWPN> D_ID : 0x<N_Port-ID> Adapter status : 0x5400050b Port status : 0x54000001 LUN status : 0x40000000 ZFCP_STATUS_COMMON_RUNNING but not ZFCP_STATUS_COMMON_UNBLOCKED as it was closed on close part of adapter reopen ERP want : 0x01 ERP need : 0x01 misleading However, zfcp_erp_setup_act() returns NULL as it cannot get the reference. Hence, zfcp_erp_action_enqueue() takes an early goto out and _NO_ recovery actually happens. We always do want the recovery trigger trace record even if no erp_action could be enqueued as in this case. For other cases where we did not enqueue an erp_action, 'need' has always been zero to indicate this. In order to indicate above goto out, introduce an eyecatcher "flag" to mark the "ERP need" as 'not needed' but still keep the information which erp_action type, that zfcp_erp_required_act() had decided upon, is needed. 0xc_ is chosen to be visibly different from 0x0_ in "ERP want". New example trace record formatted with zfcpdbf from s390-tools: Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded LUN : 0x<FCP_LUN> WWPN : 0x<WWPN> D_ID : 0x<N_Port-ID> Adapter status : 0x5400050b Port status : 0x54000001 LUN status : 0x40000000 ERP want : 0x01 ERP need : 0xc1 would need LUN ERP, but no action set up ^ Before v2.6.38 commit ae0904f ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") we could detect this case because the "erp_action" field in the trace was NULL. The rework removed erp_action as argument and field from the trace. This patch here is for tracing. A fix to allow LUN recovery in the case at hand is a topic for a separate patch. See also commit fdbd1c5 ("[SCSI] zfcp: Allow running unit/LUN shutdown without acquiring reference") for a similar case and background info. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: ae0904f ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") Cc: <stable@vger.kernel.org> TinkerBoard#2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
… return commit 96d9270499471545048ed8a6d7f425a49762283d upstream. get_device() and its internally used kobject_get() only return NULL if they get passed NULL as argument. zfcp_get_port_by_wwpn() loops over adapter->port_list so the iteration variable port is always non-NULL. Struct device is embedded in struct zfcp_port so &port->dev is always non-NULL. This is the argument to get_device(). However, if we get an fc_rport in terminate_rport_io() for which we cannot find a match within zfcp_get_port_by_wwpn(), the latter can return NULL. v2.6.30 commit 7093293 ("[SCSI] zfcp: Fix oops when port disappears") introduced an early return without adding a trace record for this case. Even if we don't need recovery in this case, for debugging we should still see that our callback was invoked originally by scsi_transport_fc. Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : sctrpin SCSI terminate rport I/O, no zfcp port LUN : 0xffffffffffffffff none (invalid) WWPN : 0x<wwpn> WWPN D_ID : 0x<n_port_id> N_Port-ID Adapter status : 0x... Port status : 0xffffffff unknown (-1) LUN status : 0x00000000 none (invalid) Ready count : 0x... Running count : 0x... ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED ERP need : 0xc0 ZFCP_ERP_ACTION_NONE Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 7093293 ("[SCSI] zfcp: Fix oops when port disappears") Cc: <stable@vger.kernel.org> TinkerBoard#2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…RP_FAILED commit d70aab55924b44f213fec2b900b095430b33eec6 upstream. For problem determination we always want to see when we were invoked on the terminate_rport_io callback whether we perform something or not. Temporal event sequence of interest with a long fast_io_fail_tmo of 27 sec: loose remote port t workqueue [s] zfcp_q_<dev> IRQ zfcperp<dev> === ================== =================== ============================ 0 recv RSCN q p.test_link_work block rport start fast_io_fail_tmo send ADISC ELS 4 recv ADISC fail block zfcp_port port forced reopen send open port 12 recv open port fail q p.gid_pn_work zfcp_erp_wakeup (zfcp_erp_wait would return) GID_PN fail Before this point, we got a SCSI trace with tag "sctrpi1" on fast_io_fail, e.g. with the typical 5 sec setting. port.status |= ERP_FAILED If fast_io_fail_tmo triggers after this point, we missed a SCSI trace. workqueue fc_dl_<host> ================== 27 fc_timeout_fail_rport_io fc_terminate_rport_io zfcp_scsi_terminate_rport_io zfcp_erp_port_forced_reopen _zfcp_erp_port_forced_reopen if (port.status & ERP_FAILED) return; Therefore, write a trace before above early return. Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 ZFCP_DBF_REC_TRIG Tag : sctrpi1 SCSI terminate rport I/O LUN : 0xffffffffffffffff none (invalid) WWPN : 0x<wwpn> D_ID : 0x<n_port_id> Adapter status : 0x... Port status : 0x... LUN status : 0x00000000 none (invalid) Ready count : 0x... Running count : 0x... ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED Signed-off-by: Steffen Maier <maier@linux.ibm.com> Cc: <stable@vger.kernel.org> TinkerBoard#2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c3d20aada70042a39c6a6625be037c1472ca610 upstream. That other commit introduced an inconsistency because it would trace on ERP_FAILED for all callers of port forced reopen triggers (not just terminate_rport_io), but it would not trace on ERP_FAILED for all callers of other ERP triggers such as adapter, port regular, LUN. Therefore, generalize that other commit. zfcp_erp_action_enqueue() already had two early outs which re-used the one zfcp_dbf_rec_trig() call. All ERP trigger functions finally run through zfcp_erp_action_enqueue(). So move the special handling for ZFCP_STATUS_COMMON_ERP_FAILED into zfcp_erp_action_enqueue() and add another early out with new trace marker for pseudo ERP need in this case. This removes all early returns from all ERP trigger functions so we always end up at zfcp_dbf_rec_trig(). Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 ZFCP_DBF_REC_TRIG Tag : ....... LUN : 0x... WWPN : 0x... D_ID : 0x... Adapter status : 0x... Port status : 0x... LUN status : 0x... Ready count : 0x... Running count : 0x... ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_... ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED Signed-off-by: Steffen Maier <maier@linux.ibm.com> Cc: <stable@vger.kernel.org> TinkerBoard#2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a76550841d412330bd86aed3238d1888ba70f0e upstream. Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 ZFCP_DBF_REC_TRIG Tag : ....... LUN : 0x... WWPN : 0x... D_ID : 0x... Adapter status : 0x... Port status : 0x... LUN status : 0x... Ready count : 0x... Running count : 0x... ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_... ERP need : 0xc0 ZFCP_ERP_ACTION_NONE Signed-off-by: Steffen Maier <maier@linux.ibm.com> Cc: <stable@vger.kernel.org> TinkerBoard#2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream. The current MIPS implementation of arch_trigger_cpumask_backtrace() is broken because it attempts to use synchronous IPIs despite the fact that it may be run with interrupts disabled. This means that when arch_trigger_cpumask_backtrace() is invoked, for example by the RCU CPU stall watchdog, we may: - Deadlock due to use of synchronous IPIs with interrupts disabled, causing the CPU that's attempting to generate the backtrace output to hang itself. - Not succeed in generating the desired output from remote CPUs. - Produce warnings about this from smp_call_function_many(), for example: [42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks: [42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0 [42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0 [42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33) [42760.568927] ------------[ cut here ]------------ [42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c [42760.587839] Modules linked in: [42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 TinkerBoard#2 [42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8 [42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000 [42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0 [42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0 [42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca [42760.669424] ... [42760.673919] Call Trace: [42760.678672] [<27fde568>] show_stack+0x70/0xf0 [42760.685417] [<84751641>] dump_stack+0xaa/0xd0 [42760.692188] [<699d671c>] __warn+0x80/0x92 [42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36 [42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c [42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a [42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98 [42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac [42760.737476] [<059b3b43>] update_process_times+0x18/0x34 [42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38 [42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50 [42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226 [42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a [42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a [42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168 [42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86 [42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14 [42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.820086] [<c7521934>] do_IRQ+0x16/0x20 [42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94 [42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78 [42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c [42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c [42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98 [42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44 [42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e [42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66 [42760.883907] [<b3fce717>] exit_mmap+0x76/0xea [42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a [42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c [42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e [42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e [42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c [42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c [42760.931765] ---[ end trace 02aa09da9dc52a60 ]--- [42760.938342] ------------[ cut here ]------------ [42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8 ... This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async IPIs & smp_call_function_single_async() in order to resolve this problem. We ensure use of the pre-allocated call_single_data_t structures is serialized by maintaining a cpumask indicating that they're busy, and refusing to attempt to send an IPI when a CPU's bit is set in this mask. This should only happen if a CPU hasn't responded to a previous backtrace IPI - ie. if it's hung - and we print a warning to the console in this case. I've marked this for stable branches as far back as v4.9, to which it applies cleanly. Strictly speaking the faulty MIPS implementation can be traced further back to commit 856839b ("MIPS: Add arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel versions v3.19 through v4.8 will require further work to backport due to the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods"). Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/19597/ Cc: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.9+ Fixes: 856839b ("MIPS: Add arch_trigger_all_cpu_backtrace() function") Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods") [ Huacai: backported to 4.4: Restruction since generic NMI solution is unavailable ] Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream. My recent fix for dns_resolver_preparse() printing very long strings was incomplete, as shown by syzbot which still managed to hit the WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key: precision 50001 too large WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0 The bug this time isn't just a printing bug, but also a logical error when multiple options ("#"-separated strings) are given in the key payload. Specifically, when separating an option string into name and value, if there is no value then the name is incorrectly considered to end at the end of the key payload, rather than the end of the current option. This bypasses validation of the option length, and also means that specifying multiple options is broken -- which presumably has gone unnoticed as there is currently only one valid option anyway. A similar problem also applied to option values, as the kstrtoul() when parsing the "dnserror" option will read past the end of the current option and into the next option. Fix these bugs by correctly computing the length of the option name and by copying the option value, null-terminated, into a temporary buffer. Reproducer for the WARN_ONCE() that syzbot hit: perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s Reproducer for "dnserror" option being parsed incorrectly (expected behavior is to fail when seeing the unknown option "foo", actual behavior was to read the dnserror value as "1#foo" and fail there): perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 4a2d789 ("DNS: If the DNS server returns an error, allow that to be cached [ver TinkerBoard#2]") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89da619bc18d79bca5304724c11d4ba3b67ce2c6 upstream. Kernel panic when with high memory pressure, calltrace looks like, PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java" #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb TinkerBoard#1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942 TinkerBoard#2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30 TinkerBoard#3 [ffff881ec7ed7778] oops_end at ffffffff816902c8 TinkerBoard#4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46 TinkerBoard#5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc TinkerBoard#6 [ffff881ec7ed7838] __node_set at ffffffff81680300 TinkerBoard#7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f TinkerBoard#8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5 TinkerBoard#9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8 [exception RIP: _raw_spin_lock_irqsave+47] RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8 RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008 RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098 R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 It happens in the pagefault and results in double pagefault during compacting pages when memory allocation fails. Analysed the vmcore, the page leads to second pagefault is corrupted with _mapcount=-256, but private=0. It's caused by the race between migration and ballooning, and lock missing in virtballoon_migratepage() of virtio_balloon driver. This patch fix the bug. Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages") Cc: stable@vger.kernel.org Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Huang Chong <huang.chong@zte.com.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e1a720d0312fd510699032c7694a362a010170f upstream. slub debug reported: [ 440.648642] ============================================================================= [ 440.648649] BUG kmalloc-1024 (Tainted: G BU O ): Poison overwritten [ 440.648651] ----------------------------------------------------------------------------- [ 440.648655] INFO: 0xe70f4bec-0xe70f4bec. First byte 0x6a instead of 0x6b [ 440.648665] INFO: Allocated in sk_prot_alloc+0x6b/0xc6 age=33155 cpu=1 pid=1047 [ 440.648671] ___slab_alloc.constprop.24+0x1fc/0x292 [ 440.648675] __slab_alloc.isra.18.constprop.23+0x1c/0x25 [ 440.648677] __kmalloc+0xb6/0x17f [ 440.648680] sk_prot_alloc+0x6b/0xc6 [ 440.648683] sk_alloc+0x1e/0xa1 [ 440.648700] sco_sock_alloc.constprop.6+0x26/0xaf [bluetooth] [ 440.648716] sco_connect_cfm+0x166/0x281 [bluetooth] [ 440.648731] hci_conn_request_evt.isra.53+0x258/0x281 [bluetooth] [ 440.648746] hci_event_packet+0x28b/0x2326 [bluetooth] [ 440.648759] hci_rx_work+0x161/0x291 [bluetooth] [ 440.648764] process_one_work+0x163/0x2b2 [ 440.648767] worker_thread+0x1a9/0x25c [ 440.648770] kthread+0xf8/0xfd [ 440.648774] ret_from_fork+0x2e/0x38 [ 440.648779] INFO: Freed in __sk_destruct+0xd3/0xdf age=3815 cpu=1 pid=1047 [ 440.648782] __slab_free+0x4b/0x27a [ 440.648784] kfree+0x12e/0x155 [ 440.648787] __sk_destruct+0xd3/0xdf [ 440.648790] sk_destruct+0x27/0x29 [ 440.648793] __sk_free+0x75/0x91 [ 440.648795] sk_free+0x1c/0x1e [ 440.648810] sco_sock_kill+0x5a/0x5f [bluetooth] [ 440.648825] sco_conn_del+0x8e/0xba [bluetooth] [ 440.648840] sco_disconn_cfm+0x3a/0x41 [bluetooth] [ 440.648855] hci_event_packet+0x45e/0x2326 [bluetooth] [ 440.648868] hci_rx_work+0x161/0x291 [bluetooth] [ 440.648872] process_one_work+0x163/0x2b2 [ 440.648875] worker_thread+0x1a9/0x25c [ 440.648877] kthread+0xf8/0xfd [ 440.648880] ret_from_fork+0x2e/0x38 [ 440.648884] INFO: Slab 0xf4718580 objects=27 used=27 fp=0x (null) flags=0x40008100 [ 440.648886] INFO: Object 0xe70f4b88 @offset=19336 fp=0xe70f54f8 When KASAN was enabled, it reported: [ 210.096613] ================================================================== [ 210.096634] BUG: KASAN: use-after-free in ex_handler_refcount+0x5b/0x127 [ 210.096641] Write of size 4 at addr ffff880107e17160 by task kworker/u9:1/2040 [ 210.096651] CPU: 1 PID: 2040 Comm: kworker/u9:1 Tainted: G U O 4.14.47-20180606+ TinkerBoard#2 [ 210.096654] Hardware name: , BIOS 2017.01-00087-g43e04de 08/30/2017 [ 210.096693] Workqueue: hci0 hci_rx_work [bluetooth] [ 210.096698] Call Trace: [ 210.096711] dump_stack+0x46/0x59 [ 210.096722] print_address_description+0x6b/0x23b [ 210.096729] ? ex_handler_refcount+0x5b/0x127 [ 210.096736] kasan_report+0x220/0x246 [ 210.096744] ex_handler_refcount+0x5b/0x127 [ 210.096751] ? ex_handler_clear_fs+0x85/0x85 [ 210.096757] fixup_exception+0x8c/0x96 [ 210.096766] do_trap+0x66/0x2c1 [ 210.096773] do_error_trap+0x152/0x180 [ 210.096781] ? fixup_bug+0x78/0x78 [ 210.096817] ? hci_debugfs_create_conn+0x244/0x26a [bluetooth] [ 210.096824] ? __schedule+0x113b/0x1453 [ 210.096830] ? sysctl_net_exit+0xe/0xe [ 210.096837] ? __wake_up_common+0x343/0x343 [ 210.096843] ? insert_work+0x107/0x163 [ 210.096850] invalid_op+0x1b/0x40 [ 210.096888] RIP: 0010:hci_debugfs_create_conn+0x244/0x26a [bluetooth] [ 210.096892] RSP: 0018:ffff880094a0f970 EFLAGS: 00010296 [ 210.096898] RAX: 0000000000000000 RBX: ffff880107e170e8 RCX: ffff880107e17160 [ 210.096902] RDX: 000000000000002f RSI: ffff88013b80ed40 RDI: ffffffffa058b940 [ 210.096906] RBP: ffff88011b2b0578 R08: 00000000852f0ec9 R09: ffffffff81cfcf9b [ 210.096909] R10: 00000000d21bdad7 R11: 0000000000000001 R12: ffff8800967b0488 [ 210.096913] R13: ffff880107e17168 R14: 0000000000000068 R15: ffff8800949c0008 [ 210.096920] ? __sk_destruct+0x2c6/0x2d4 [ 210.096959] hci_event_packet+0xff5/0x7de2 [bluetooth] [ 210.096969] ? __local_bh_enable_ip+0x43/0x5b [ 210.097004] ? l2cap_sock_recv_cb+0x158/0x166 [bluetooth] [ 210.097039] ? hci_le_meta_evt+0x2bb3/0x2bb3 [bluetooth] [ 210.097075] ? l2cap_ertm_init+0x94e/0x94e [bluetooth] [ 210.097093] ? xhci_urb_enqueue+0xbd8/0xcf5 [xhci_hcd] [ 210.097102] ? __accumulate_pelt_segments+0x24/0x33 [ 210.097109] ? __accumulate_pelt_segments+0x24/0x33 [ 210.097115] ? __update_load_avg_se.isra.2+0x217/0x3a4 [ 210.097122] ? set_next_entity+0x7c3/0x12cd [ 210.097128] ? pick_next_entity+0x25e/0x26c [ 210.097135] ? pick_next_task_fair+0x2ca/0xc1a [ 210.097141] ? switch_mm_irqs_off+0x346/0xb4f [ 210.097147] ? __switch_to+0x769/0xbc4 [ 210.097153] ? compat_start_thread+0x66/0x66 [ 210.097188] ? hci_conn_check_link_mode+0x1cd/0x1cd [bluetooth] [ 210.097195] ? finish_task_switch+0x392/0x431 [ 210.097228] ? hci_rx_work+0x154/0x487 [bluetooth] [ 210.097260] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097269] process_one_work+0x579/0x9e9 [ 210.097277] worker_thread+0x68f/0x804 [ 210.097285] kthread+0x31c/0x32b [ 210.097292] ? rescuer_thread+0x70c/0x70c [ 210.097299] ? kthread_create_on_node+0xa3/0xa3 [ 210.097306] ret_from_fork+0x35/0x40 [ 210.097314] Allocated by task 2040: [ 210.097323] kasan_kmalloc.part.1+0x51/0xc7 [ 210.097328] __kmalloc+0x17f/0x1b6 [ 210.097335] sk_prot_alloc+0xf2/0x1a3 [ 210.097340] sk_alloc+0x22/0x297 [ 210.097375] sco_sock_alloc.constprop.7+0x23/0x202 [bluetooth] [ 210.097410] sco_connect_cfm+0x2d0/0x566 [bluetooth] [ 210.097443] hci_conn_request_evt.isra.53+0x6d3/0x762 [bluetooth] [ 210.097476] hci_event_packet+0x85e/0x7de2 [bluetooth] [ 210.097507] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097512] process_one_work+0x579/0x9e9 [ 210.097517] worker_thread+0x68f/0x804 [ 210.097523] kthread+0x31c/0x32b [ 210.097529] ret_from_fork+0x35/0x40 [ 210.097533] Freed by task 2040: [ 210.097539] kasan_slab_free+0xb3/0x15e [ 210.097544] kfree+0x103/0x1a9 [ 210.097549] __sk_destruct+0x2c6/0x2d4 [ 210.097584] sco_conn_del.isra.1+0xba/0x10e [bluetooth] [ 210.097617] hci_event_packet+0xff5/0x7de2 [bluetooth] [ 210.097648] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097653] process_one_work+0x579/0x9e9 [ 210.097658] worker_thread+0x68f/0x804 [ 210.097663] kthread+0x31c/0x32b [ 210.097670] ret_from_fork+0x35/0x40 [ 210.097676] The buggy address belongs to the object at ffff880107e170e8 which belongs to the cache kmalloc-1024 of size 1024 [ 210.097681] The buggy address is located 120 bytes inside of 1024-byte region [ffff880107e170e8, ffff880107e174e8) [ 210.097683] The buggy address belongs to the page: [ 210.097689] page:ffffea00041f8400 count:1 mapcount:0 mapping: (null) index:0xffff880107e15b68 compound_mapcount: 0 [ 210.110194] flags: 0x8000000000008100(slab|head) [ 210.115441] raw: 8000000000008100 0000000000000000 ffff880107e15b68 0000000100170016 [ 210.115448] raw: ffffea0004a47620 ffffea0004b48e20 ffff88013b80ed40 0000000000000000 [ 210.115451] page dumped because: kasan: bad access detected [ 210.115454] Memory state around the buggy address: [ 210.115460] ffff880107e17000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 210.115465] ffff880107e17080: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb [ 210.115469] >ffff880107e17100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115472] ^ [ 210.115477] ffff880107e17180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115481] ffff880107e17200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115483] ================================================================== And finally when BT_DBG() and ftrace was enabled it showed: <...>-14979 [001] .... 186.104191: sco_sock_kill <-sco_sock_close <...>-14979 [001] .... 186.104191: sco_sock_kill <-sco_sock_release <...>-14979 [001] .... 186.104192: sco_sock_kill: sk ef0497a0 state 9 <...>-14979 [001] .... 186.104193: bt_sock_unlink <-sco_sock_kill kworker/u9:2-792 [001] .... 186.104246: sco_sock_kill <-sco_conn_del kworker/u9:2-792 [001] .... 186.104248: sco_sock_kill: sk ef0497a0 state 9 kworker/u9:2-792 [001] .... 186.104249: bt_sock_unlink <-sco_sock_kill kworker/u9:2-792 [001] .... 186.104250: sco_sock_destruct <-__sk_destruct kworker/u9:2-792 [001] .... 186.104250: sco_sock_destruct: sk ef0497a0 kworker/u9:2-792 [001] .... 186.104860: hci_conn_del <-hci_event_packet kworker/u9:2-792 [001] .... 186.104864: hci_conn_del: hci0 hcon ef0484c0 handle 266 Only in the failed case, sco_sock_kill() gets called with the same sock pointer two times. Add a check for SOCK_DEAD to avoid continue killing a socket which has already been killed. Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ccd5b3235180eef3cfec337df1c8554ab151b5cc upstream. The following commit: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") renamed init_new_context() to init_new_context_ldt() and added a new init_new_context() which calls init_new_context_ldt(). However, the error code of init_new_context_ldt() was ignored. Consequently, if a memory allocation in alloc_ldt_struct() failed during a fork(), the ->context.ldt of the new task remained the same as that of the old task (due to the memcpy() in dup_mm()). ldt_struct's are not intended to be shared, so a use-after-free occurred after one task exited. Fix the bug by making init_new_context() pass through the error code of init_new_context_ldt(). This bug was found by syzkaller, which encountered the following splat: BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710 CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 TinkerBoard#2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x24e/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] exec_mmap fs/exec.c:1061 [inline] flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291 load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855 search_binary_handler+0x142/0x6b0 fs/exec.c:1652 exec_binprm fs/exec.c:1694 [inline] do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816 do_execve+0x31/0x40 fs/exec.c:1860 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67 write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277 sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] __mmput kernel/fork.c:916 [inline] mmput+0x541/0x6e0 kernel/fork.c:927 copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931 copy_process kernel/fork.c:1546 [inline] _do_fork+0x1ef/0xfb0 kernel/fork.c:2025 SYSC_clone kernel/fork.c:2135 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2129 do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Here is a C reproducer: #include <asm/ldt.h> #include <pthread.h> #include <signal.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void *fork_thread(void *_arg) { fork(); } int main(void) { struct user_desc desc = { .entry_number = 8191 }; syscall(__NR_modify_ldt, 1, &desc, sizeof(desc)); for (;;) { if (fork() == 0) { pthread_t t; srand(getpid()); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } Note: the reproducer takes advantage of the fact that alloc_ldt_struct() may use vmalloc() to allocate a large ->entries array, and after commit: 5d17a73a2ebe ("vmalloc: back off when the current task is killed") it is possible for userspace to fail a task's vmalloc() by sending a fatal signal, e.g. via exit_group(). It would be more difficult to reproduce this bug on kernels without that commit. This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> [v4.6+] Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b45d71fb89ab8adfe727b9d0ee188ed58582a647 upstream. Directories and inodes don't necessarily need to be in the same lockdep class. For ex, hugetlbfs splits them out too to prevent false positives in lockdep. Annotate correctly after new inode creation. If its a directory inode, it will be put into a different class. This should fix a lockdep splat reported by syzbot: > ====================================================== > WARNING: possible circular locking dependency detected > 4.18.0-rc8-next-20180810+ TinkerBoard#36 Not tainted > ------------------------------------------------------ > syz-executor900/4483 is trying to acquire lock: > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock > include/linux/fs.h:765 [inline] > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: > shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602 > > but task is already holding lock: > 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630 > drivers/staging/android/ashmem.c:448 > > which lock already depends on the new lock. > > -> TinkerBoard#2 (ashmem_mutex){+.+.}: > __mutex_lock_common kernel/locking/mutex.c:925 [inline] > __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073 > mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 > ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361 > call_mmap include/linux/fs.h:1844 [inline] > mmap_region+0xf27/0x1c50 mm/mmap.c:1762 > do_mmap+0xa10/0x1220 mm/mmap.c:1535 > do_mmap_pgoff include/linux/mm.h:2298 [inline] > vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357 > ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585 > __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline] > __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline] > __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91 > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > -> TinkerBoard#1 (&mm->mmap_sem){++++}: > __might_fault+0x155/0x1e0 mm/memory.c:4568 > _copy_to_user+0x30/0x110 lib/usercopy.c:25 > copy_to_user include/linux/uaccess.h:155 [inline] > filldir+0x1ea/0x3a0 fs/readdir.c:196 > dir_emit_dot include/linux/fs.h:3464 [inline] > dir_emit_dots include/linux/fs.h:3475 [inline] > dcache_readdir+0x13a/0x620 fs/libfs.c:193 > iterate_dir+0x48b/0x5d0 fs/readdir.c:51 > __do_sys_getdents fs/readdir.c:231 [inline] > __se_sys_getdents fs/readdir.c:212 [inline] > __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212 > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > -> #0 (&sb->s_type->i_mutex_key#9){++++}: > lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 > down_write+0x8f/0x130 kernel/locking/rwsem.c:70 > inode_lock include/linux/fs.h:765 [inline] > shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602 > ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455 > ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797 > vfs_ioctl fs/ioctl.c:46 [inline] > file_ioctl fs/ioctl.c:501 [inline] > do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685 > ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702 > __do_sys_ioctl fs/ioctl.c:709 [inline] > __se_sys_ioctl fs/ioctl.c:707 [inline] > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707 > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > other info that might help us debug this: > > Chain exists of: > &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(ashmem_mutex); > lock(&mm->mmap_sem); > lock(ashmem_mutex); > lock(&sb->s_type->i_mutex_key#9); > > *** DEADLOCK *** > > 1 lock held by syz-executor900/4483: > #0: 0000000025208078 (ashmem_mutex){+.+.}, at: > ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448 Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: NeilBrown <neilb@suse.com> Suggested-by: NeilBrown <neilb@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1816494330a83f2a064499d8ed2797045641f92c upstream. This change has the following effects, in order of descreasing importance: 1) Prevent a stack buffer overflow 2) Do not append an unnecessary NULL to an anyway binary buffer, which is writing one byte past client_digest when caller is: chap_string_to_hex(client_digest, chap_r, strlen(chap_r)); The latter was found by KASAN (see below) when input value hes expected size (32 hex chars), and further analysis revealed a stack buffer overflow can happen when network-received value is longer, allowing an unauthenticated remote attacker to smash up to 17 bytes after destination buffer (16 bytes attacker-controlled and one null). As switching to hex2bin requires specifying destination buffer length, and does not internally append any null, it solves both issues. This addresses CVE-2018-14633. Beyond this: - Validate received value length and check hex2bin accepted the input, to log this rejection reason instead of just failing authentication. - Only log received CHAP_R and CHAP_C values once they passed sanity checks. ================================================================== BUG: KASAN: stack-out-of-bounds in chap_string_to_hex+0x32/0x60 [iscsi_target_mod] Write of size 1 at addr ffff8801090ef7c8 by task kworker/0:0/1021 CPU: 0 PID: 1021 Comm: kworker/0:0 Tainted: G O 4.17.8kasan.sess.connops+ TinkerBoard#2 Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 05/19/2014 Workqueue: events iscsi_target_do_login_rx [iscsi_target_mod] Call Trace: dump_stack+0x71/0xac print_address_description+0x65/0x22e ? chap_string_to_hex+0x32/0x60 [iscsi_target_mod] kasan_report.cold.6+0x241/0x2fd chap_string_to_hex+0x32/0x60 [iscsi_target_mod] chap_server_compute_md5.isra.2+0x2cb/0x860 [iscsi_target_mod] ? chap_binaryhex_to_asciihex.constprop.5+0x50/0x50 [iscsi_target_mod] ? ftrace_caller_op_ptr+0xe/0xe ? __orc_find+0x6f/0xc0 ? unwind_next_frame+0x231/0x850 ? kthread+0x1a0/0x1c0 ? ret_from_fork+0x35/0x40 ? ret_from_fork+0x35/0x40 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod] ? deref_stack_reg+0xd0/0xd0 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod] ? is_module_text_address+0xa/0x11 ? kernel_text_address+0x4c/0x110 ? __save_stack_trace+0x82/0x100 ? ret_from_fork+0x35/0x40 ? save_stack+0x8c/0xb0 ? 0xffffffffc1660000 ? iscsi_target_do_login+0x155/0x8d0 [iscsi_target_mod] ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod] ? process_one_work+0x35c/0x640 ? worker_thread+0x66/0x5d0 ? kthread+0x1a0/0x1c0 ? ret_from_fork+0x35/0x40 ? iscsi_update_param_value+0x80/0x80 [iscsi_target_mod] ? iscsit_release_cmd+0x170/0x170 [iscsi_target_mod] chap_main_loop+0x172/0x570 [iscsi_target_mod] ? chap_server_compute_md5.isra.2+0x860/0x860 [iscsi_target_mod] ? rx_data+0xd6/0x120 [iscsi_target_mod] ? iscsit_print_session_params+0xd0/0xd0 [iscsi_target_mod] ? cyc2ns_read_begin.part.2+0x90/0x90 ? _raw_spin_lock_irqsave+0x25/0x50 ? memcmp+0x45/0x70 iscsi_target_do_login+0x875/0x8d0 [iscsi_target_mod] ? iscsi_target_check_first_request.isra.5+0x1a0/0x1a0 [iscsi_target_mod] ? del_timer+0xe0/0xe0 ? memset+0x1f/0x40 ? flush_sigqueue+0x29/0xd0 iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod] ? iscsi_target_nego_release+0x80/0x80 [iscsi_target_mod] ? iscsi_target_restore_sock_callbacks+0x130/0x130 [iscsi_target_mod] process_one_work+0x35c/0x640 worker_thread+0x66/0x5d0 ? flush_rcu_work+0x40/0x40 kthread+0x1a0/0x1c0 ? kthread_bind+0x30/0x30 ret_from_fork+0x35/0x40 The buggy address belongs to the page: page:ffffea0004243bc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x17fffc000000000() raw: 017fffc000000000 0000000000000000 0000000000000000 00000000ffffffff raw: ffffea0004243c20 ffffea0004243ba0 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801090ef680: f2 f2 f2 f2 f2 f2 f2 01 f2 f2 f2 f2 f2 f2 f2 00 ffff8801090ef700: f2 f2 f2 f2 f2 f2 f2 00 02 f2 f2 f2 f2 f2 f2 00 >ffff8801090ef780: 00 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2 f2 00 ^ ffff8801090ef800: 00 f2 f2 f2 f2 f2 f2 00 00 00 00 02 f2 f2 f2 f2 ffff8801090ef880: f2 f2 f2 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 ================================================================== Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 234b69e3e089d850a98e7b3145bd00e9b52b1111 upstream. While reading block, it is possible that io error return due to underlying storage issue, in this case, BH_NeedsValidate was left in the buffer head. Then when reading the very block next time, if it was already linked into journal, that will trigger the following panic. [203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342! [203748.702533] invalid opcode: 0000 [TinkerBoard#1] SMP [203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 TinkerBoard#2 [203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015 [203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000 [203748.703088] RIP: 0010:[<ffffffffa05e9f09>] [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2] [203748.703130] RSP: 0018:ffff88006ff4b818 EFLAGS: 00010206 [203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000 [203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe [203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0 [203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000 [203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000 [203748.705871] FS: 00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000 [203748.706370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670 [203748.707124] Stack: [203748.707371] ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001 [203748.707885] 0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00 [203748.708399] 00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000 [203748.708915] Call Trace: [203748.709175] [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2] [203748.709680] [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2] [203748.710185] [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2] [203748.710691] [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2] [203748.711204] [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2] [203748.711716] [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2] [203748.712227] [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2] [203748.712737] [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2] [203748.713003] [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2] [203748.713263] [<ffffffff8121714b>] vfs_create+0xdb/0x150 [203748.713518] [<ffffffff8121b225>] do_last+0x815/0x1210 [203748.713772] [<ffffffff812192e9>] ? path_init+0xb9/0x450 [203748.714123] [<ffffffff8121bca0>] path_openat+0x80/0x600 [203748.714378] [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620 [203748.714634] [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0 [203748.714888] [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130 [203748.715143] [<ffffffff81209ffc>] do_sys_open+0x12c/0x220 [203748.715403] [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180 [203748.715668] [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190 [203748.715928] [<ffffffff8120a10e>] SyS_open+0x1e/0x20 [203748.716184] [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7 [203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10 [203748.717505] RIP [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2] [203748.717775] RSP <ffff88006ff4b818> Joesph ever reported a similar panic. Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream. The current MIPS implementation of arch_trigger_cpumask_backtrace() is broken because it attempts to use synchronous IPIs despite the fact that it may be run with interrupts disabled. This means that when arch_trigger_cpumask_backtrace() is invoked, for example by the RCU CPU stall watchdog, we may: - Deadlock due to use of synchronous IPIs with interrupts disabled, causing the CPU that's attempting to generate the backtrace output to hang itself. - Not succeed in generating the desired output from remote CPUs. - Produce warnings about this from smp_call_function_many(), for example: [42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks: [42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0 [42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0 [42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33) [42760.568927] ------------[ cut here ]------------ [42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c [42760.587839] Modules linked in: [42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 TinkerBoard#2 [42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8 [42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000 [42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0 [42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0 [42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca [42760.669424] ... [42760.673919] Call Trace: [42760.678672] [<27fde568>] show_stack+0x70/0xf0 [42760.685417] [<84751641>] dump_stack+0xaa/0xd0 [42760.692188] [<699d671c>] __warn+0x80/0x92 [42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36 [42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c [42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a [42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98 [42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac [42760.737476] [<059b3b43>] update_process_times+0x18/0x34 [42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38 [42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50 [42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226 [42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a [42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a [42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168 [42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86 [42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14 [42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.820086] [<c7521934>] do_IRQ+0x16/0x20 [42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94 [42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78 [42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c [42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c [42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98 [42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44 [42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e [42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66 [42760.883907] [<b3fce717>] exit_mmap+0x76/0xea [42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a [42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c [42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e [42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e [42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c [42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c [42760.931765] ---[ end trace 02aa09da9dc52a60 ]--- [42760.938342] ------------[ cut here ]------------ [42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8 ... This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async IPIs & smp_call_function_single_async() in order to resolve this problem. We ensure use of the pre-allocated call_single_data_t structures is serialized by maintaining a cpumask indicating that they're busy, and refusing to attempt to send an IPI when a CPU's bit is set in this mask. This should only happen if a CPU hasn't responded to a previous backtrace IPI - ie. if it's hung - and we print a warning to the console in this case. I've marked this for stable branches as far back as v4.9, to which it applies cleanly. Strictly speaking the faulty MIPS implementation can be traced further back to commit 856839b ("MIPS: Add arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel versions v3.19 through v4.8 will require further work to backport due to the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods"). Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/19597/ Cc: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.9+ Fixes: 856839b ("MIPS: Add arch_trigger_all_cpu_backtrace() function") Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods") [ Huacai: backported to 4.4: Restruction since generic NMI solution is unavailable ] Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream. My recent fix for dns_resolver_preparse() printing very long strings was incomplete, as shown by syzbot which still managed to hit the WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key: precision 50001 too large WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0 The bug this time isn't just a printing bug, but also a logical error when multiple options ("#"-separated strings) are given in the key payload. Specifically, when separating an option string into name and value, if there is no value then the name is incorrectly considered to end at the end of the key payload, rather than the end of the current option. This bypasses validation of the option length, and also means that specifying multiple options is broken -- which presumably has gone unnoticed as there is currently only one valid option anyway. A similar problem also applied to option values, as the kstrtoul() when parsing the "dnserror" option will read past the end of the current option and into the next option. Fix these bugs by correctly computing the length of the option name and by copying the option value, null-terminated, into a temporary buffer. Reproducer for the WARN_ONCE() that syzbot hit: perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s Reproducer for "dnserror" option being parsed incorrectly (expected behavior is to fail when seeing the unknown option "foo", actual behavior was to read the dnserror value as "1#foo" and fail there): perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 4a2d789 ("DNS: If the DNS server returns an error, allow that to be cached [ver TinkerBoard#2]") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89da619bc18d79bca5304724c11d4ba3b67ce2c6 upstream. Kernel panic when with high memory pressure, calltrace looks like, PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java" #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb TinkerBoard#1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942 TinkerBoard#2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30 TinkerBoard#3 [ffff881ec7ed7778] oops_end at ffffffff816902c8 TinkerBoard#4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46 TinkerBoard#5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc TinkerBoard#6 [ffff881ec7ed7838] __node_set at ffffffff81680300 TinkerBoard#7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f TinkerBoard#8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5 TinkerBoard#9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8 [exception RIP: _raw_spin_lock_irqsave+47] RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8 RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008 RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098 R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 It happens in the pagefault and results in double pagefault during compacting pages when memory allocation fails. Analysed the vmcore, the page leads to second pagefault is corrupted with _mapcount=-256, but private=0. It's caused by the race between migration and ballooning, and lock missing in virtballoon_migratepage() of virtio_balloon driver. This patch fix the bug. Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages") Cc: stable@vger.kernel.org Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Huang Chong <huang.chong@zte.com.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e1a720d0312fd510699032c7694a362a010170f upstream. slub debug reported: [ 440.648642] ============================================================================= [ 440.648649] BUG kmalloc-1024 (Tainted: G BU O ): Poison overwritten [ 440.648651] ----------------------------------------------------------------------------- [ 440.648655] INFO: 0xe70f4bec-0xe70f4bec. First byte 0x6a instead of 0x6b [ 440.648665] INFO: Allocated in sk_prot_alloc+0x6b/0xc6 age=33155 cpu=1 pid=1047 [ 440.648671] ___slab_alloc.constprop.24+0x1fc/0x292 [ 440.648675] __slab_alloc.isra.18.constprop.23+0x1c/0x25 [ 440.648677] __kmalloc+0xb6/0x17f [ 440.648680] sk_prot_alloc+0x6b/0xc6 [ 440.648683] sk_alloc+0x1e/0xa1 [ 440.648700] sco_sock_alloc.constprop.6+0x26/0xaf [bluetooth] [ 440.648716] sco_connect_cfm+0x166/0x281 [bluetooth] [ 440.648731] hci_conn_request_evt.isra.53+0x258/0x281 [bluetooth] [ 440.648746] hci_event_packet+0x28b/0x2326 [bluetooth] [ 440.648759] hci_rx_work+0x161/0x291 [bluetooth] [ 440.648764] process_one_work+0x163/0x2b2 [ 440.648767] worker_thread+0x1a9/0x25c [ 440.648770] kthread+0xf8/0xfd [ 440.648774] ret_from_fork+0x2e/0x38 [ 440.648779] INFO: Freed in __sk_destruct+0xd3/0xdf age=3815 cpu=1 pid=1047 [ 440.648782] __slab_free+0x4b/0x27a [ 440.648784] kfree+0x12e/0x155 [ 440.648787] __sk_destruct+0xd3/0xdf [ 440.648790] sk_destruct+0x27/0x29 [ 440.648793] __sk_free+0x75/0x91 [ 440.648795] sk_free+0x1c/0x1e [ 440.648810] sco_sock_kill+0x5a/0x5f [bluetooth] [ 440.648825] sco_conn_del+0x8e/0xba [bluetooth] [ 440.648840] sco_disconn_cfm+0x3a/0x41 [bluetooth] [ 440.648855] hci_event_packet+0x45e/0x2326 [bluetooth] [ 440.648868] hci_rx_work+0x161/0x291 [bluetooth] [ 440.648872] process_one_work+0x163/0x2b2 [ 440.648875] worker_thread+0x1a9/0x25c [ 440.648877] kthread+0xf8/0xfd [ 440.648880] ret_from_fork+0x2e/0x38 [ 440.648884] INFO: Slab 0xf4718580 objects=27 used=27 fp=0x (null) flags=0x40008100 [ 440.648886] INFO: Object 0xe70f4b88 @offset=19336 fp=0xe70f54f8 When KASAN was enabled, it reported: [ 210.096613] ================================================================== [ 210.096634] BUG: KASAN: use-after-free in ex_handler_refcount+0x5b/0x127 [ 210.096641] Write of size 4 at addr ffff880107e17160 by task kworker/u9:1/2040 [ 210.096651] CPU: 1 PID: 2040 Comm: kworker/u9:1 Tainted: G U O 4.14.47-20180606+ TinkerBoard#2 [ 210.096654] Hardware name: , BIOS 2017.01-00087-g43e04de 08/30/2017 [ 210.096693] Workqueue: hci0 hci_rx_work [bluetooth] [ 210.096698] Call Trace: [ 210.096711] dump_stack+0x46/0x59 [ 210.096722] print_address_description+0x6b/0x23b [ 210.096729] ? ex_handler_refcount+0x5b/0x127 [ 210.096736] kasan_report+0x220/0x246 [ 210.096744] ex_handler_refcount+0x5b/0x127 [ 210.096751] ? ex_handler_clear_fs+0x85/0x85 [ 210.096757] fixup_exception+0x8c/0x96 [ 210.096766] do_trap+0x66/0x2c1 [ 210.096773] do_error_trap+0x152/0x180 [ 210.096781] ? fixup_bug+0x78/0x78 [ 210.096817] ? hci_debugfs_create_conn+0x244/0x26a [bluetooth] [ 210.096824] ? __schedule+0x113b/0x1453 [ 210.096830] ? sysctl_net_exit+0xe/0xe [ 210.096837] ? __wake_up_common+0x343/0x343 [ 210.096843] ? insert_work+0x107/0x163 [ 210.096850] invalid_op+0x1b/0x40 [ 210.096888] RIP: 0010:hci_debugfs_create_conn+0x244/0x26a [bluetooth] [ 210.096892] RSP: 0018:ffff880094a0f970 EFLAGS: 00010296 [ 210.096898] RAX: 0000000000000000 RBX: ffff880107e170e8 RCX: ffff880107e17160 [ 210.096902] RDX: 000000000000002f RSI: ffff88013b80ed40 RDI: ffffffffa058b940 [ 210.096906] RBP: ffff88011b2b0578 R08: 00000000852f0ec9 R09: ffffffff81cfcf9b [ 210.096909] R10: 00000000d21bdad7 R11: 0000000000000001 R12: ffff8800967b0488 [ 210.096913] R13: ffff880107e17168 R14: 0000000000000068 R15: ffff8800949c0008 [ 210.096920] ? __sk_destruct+0x2c6/0x2d4 [ 210.096959] hci_event_packet+0xff5/0x7de2 [bluetooth] [ 210.096969] ? __local_bh_enable_ip+0x43/0x5b [ 210.097004] ? l2cap_sock_recv_cb+0x158/0x166 [bluetooth] [ 210.097039] ? hci_le_meta_evt+0x2bb3/0x2bb3 [bluetooth] [ 210.097075] ? l2cap_ertm_init+0x94e/0x94e [bluetooth] [ 210.097093] ? xhci_urb_enqueue+0xbd8/0xcf5 [xhci_hcd] [ 210.097102] ? __accumulate_pelt_segments+0x24/0x33 [ 210.097109] ? __accumulate_pelt_segments+0x24/0x33 [ 210.097115] ? __update_load_avg_se.isra.2+0x217/0x3a4 [ 210.097122] ? set_next_entity+0x7c3/0x12cd [ 210.097128] ? pick_next_entity+0x25e/0x26c [ 210.097135] ? pick_next_task_fair+0x2ca/0xc1a [ 210.097141] ? switch_mm_irqs_off+0x346/0xb4f [ 210.097147] ? __switch_to+0x769/0xbc4 [ 210.097153] ? compat_start_thread+0x66/0x66 [ 210.097188] ? hci_conn_check_link_mode+0x1cd/0x1cd [bluetooth] [ 210.097195] ? finish_task_switch+0x392/0x431 [ 210.097228] ? hci_rx_work+0x154/0x487 [bluetooth] [ 210.097260] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097269] process_one_work+0x579/0x9e9 [ 210.097277] worker_thread+0x68f/0x804 [ 210.097285] kthread+0x31c/0x32b [ 210.097292] ? rescuer_thread+0x70c/0x70c [ 210.097299] ? kthread_create_on_node+0xa3/0xa3 [ 210.097306] ret_from_fork+0x35/0x40 [ 210.097314] Allocated by task 2040: [ 210.097323] kasan_kmalloc.part.1+0x51/0xc7 [ 210.097328] __kmalloc+0x17f/0x1b6 [ 210.097335] sk_prot_alloc+0xf2/0x1a3 [ 210.097340] sk_alloc+0x22/0x297 [ 210.097375] sco_sock_alloc.constprop.7+0x23/0x202 [bluetooth] [ 210.097410] sco_connect_cfm+0x2d0/0x566 [bluetooth] [ 210.097443] hci_conn_request_evt.isra.53+0x6d3/0x762 [bluetooth] [ 210.097476] hci_event_packet+0x85e/0x7de2 [bluetooth] [ 210.097507] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097512] process_one_work+0x579/0x9e9 [ 210.097517] worker_thread+0x68f/0x804 [ 210.097523] kthread+0x31c/0x32b [ 210.097529] ret_from_fork+0x35/0x40 [ 210.097533] Freed by task 2040: [ 210.097539] kasan_slab_free+0xb3/0x15e [ 210.097544] kfree+0x103/0x1a9 [ 210.097549] __sk_destruct+0x2c6/0x2d4 [ 210.097584] sco_conn_del.isra.1+0xba/0x10e [bluetooth] [ 210.097617] hci_event_packet+0xff5/0x7de2 [bluetooth] [ 210.097648] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097653] process_one_work+0x579/0x9e9 [ 210.097658] worker_thread+0x68f/0x804 [ 210.097663] kthread+0x31c/0x32b [ 210.097670] ret_from_fork+0x35/0x40 [ 210.097676] The buggy address belongs to the object at ffff880107e170e8 which belongs to the cache kmalloc-1024 of size 1024 [ 210.097681] The buggy address is located 120 bytes inside of 1024-byte region [ffff880107e170e8, ffff880107e174e8) [ 210.097683] The buggy address belongs to the page: [ 210.097689] page:ffffea00041f8400 count:1 mapcount:0 mapping: (null) index:0xffff880107e15b68 compound_mapcount: 0 [ 210.110194] flags: 0x8000000000008100(slab|head) [ 210.115441] raw: 8000000000008100 0000000000000000 ffff880107e15b68 0000000100170016 [ 210.115448] raw: ffffea0004a47620 ffffea0004b48e20 ffff88013b80ed40 0000000000000000 [ 210.115451] page dumped because: kasan: bad access detected [ 210.115454] Memory state around the buggy address: [ 210.115460] ffff880107e17000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 210.115465] ffff880107e17080: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb [ 210.115469] >ffff880107e17100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115472] ^ [ 210.115477] ffff880107e17180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115481] ffff880107e17200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115483] ================================================================== And finally when BT_DBG() and ftrace was enabled it showed: <...>-14979 [001] .... 186.104191: sco_sock_kill <-sco_sock_close <...>-14979 [001] .... 186.104191: sco_sock_kill <-sco_sock_release <...>-14979 [001] .... 186.104192: sco_sock_kill: sk ef0497a0 state 9 <...>-14979 [001] .... 186.104193: bt_sock_unlink <-sco_sock_kill kworker/u9:2-792 [001] .... 186.104246: sco_sock_kill <-sco_conn_del kworker/u9:2-792 [001] .... 186.104248: sco_sock_kill: sk ef0497a0 state 9 kworker/u9:2-792 [001] .... 186.104249: bt_sock_unlink <-sco_sock_kill kworker/u9:2-792 [001] .... 186.104250: sco_sock_destruct <-__sk_destruct kworker/u9:2-792 [001] .... 186.104250: sco_sock_destruct: sk ef0497a0 kworker/u9:2-792 [001] .... 186.104860: hci_conn_del <-hci_event_packet kworker/u9:2-792 [001] .... 186.104864: hci_conn_del: hci0 hcon ef0484c0 handle 266 Only in the failed case, sco_sock_kill() gets called with the same sock pointer two times. Add a check for SOCK_DEAD to avoid continue killing a socket which has already been killed. Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ccd5b3235180eef3cfec337df1c8554ab151b5cc upstream. The following commit: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") renamed init_new_context() to init_new_context_ldt() and added a new init_new_context() which calls init_new_context_ldt(). However, the error code of init_new_context_ldt() was ignored. Consequently, if a memory allocation in alloc_ldt_struct() failed during a fork(), the ->context.ldt of the new task remained the same as that of the old task (due to the memcpy() in dup_mm()). ldt_struct's are not intended to be shared, so a use-after-free occurred after one task exited. Fix the bug by making init_new_context() pass through the error code of init_new_context_ldt(). This bug was found by syzkaller, which encountered the following splat: BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710 CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 TinkerBoard#2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x24e/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] exec_mmap fs/exec.c:1061 [inline] flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291 load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855 search_binary_handler+0x142/0x6b0 fs/exec.c:1652 exec_binprm fs/exec.c:1694 [inline] do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816 do_execve+0x31/0x40 fs/exec.c:1860 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67 write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277 sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] __mmput kernel/fork.c:916 [inline] mmput+0x541/0x6e0 kernel/fork.c:927 copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931 copy_process kernel/fork.c:1546 [inline] _do_fork+0x1ef/0xfb0 kernel/fork.c:2025 SYSC_clone kernel/fork.c:2135 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2129 do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Here is a C reproducer: #include <asm/ldt.h> #include <pthread.h> #include <signal.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void *fork_thread(void *_arg) { fork(); } int main(void) { struct user_desc desc = { .entry_number = 8191 }; syscall(__NR_modify_ldt, 1, &desc, sizeof(desc)); for (;;) { if (fork() == 0) { pthread_t t; srand(getpid()); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } Note: the reproducer takes advantage of the fact that alloc_ldt_struct() may use vmalloc() to allocate a large ->entries array, and after commit: 5d17a73a2ebe ("vmalloc: back off when the current task is killed") it is possible for userspace to fail a task's vmalloc() by sending a fatal signal, e.g. via exit_group(). It would be more difficult to reproduce this bug on kernels without that commit. This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> [v4.6+] Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I have an "init" file for resizing the rootfs partition in first boot.
I need to use magic SysRq key "
echo b > /proc/sysrq-trigger
" to reboot the system after partition has resized.The "sysrq" options in Kernel are enabled however, system cannot reboot.
It shut-downs the system immediately as expected but doesn't initiate the boot procedure.
Have to manually power-off and power-on the power source to make it boot again.
Can you please investigate and provide a solution?!
Thanks
The text was updated successfully, but these errors were encountered: