kvm: deadlock in kvm_vgic_map_resources
On 12/01/17 09:55, Andre Przywara wrote:
> Hi,
>
> On 12/01/17 09:32, Marc Zyngier wrote:
>> Hi Dmitry,
>>
>> On 11/01/17 19:01, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> While running syzkaller fuzzer I've got the following deadlock.
>>> On commit 9c76358.
>>>
>>>
>>> =============================================
>>> [ INFO: possible recursive locking detected ]
>>> 4.9.0-rc6-xc2-00056-g08372dd4b91d-dirty #50 Not tainted
>>> ---------------------------------------------
>>> syz-executor/20805 is trying to acquire lock:
>>> (&kvm->lock){+.+.+.}, at:
>>> [< inline >] kvm_vgic_dist_destroy
>>> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:271
>>> [<ffff2000080ea4bc>] kvm_vgic_destroy+0x34/0x250
>>> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:294
>>> but task is already holding lock:
>>> (&kvm->lock){+.+.+.}, at:
>>> [<ffff2000080ea7e4>] kvm_vgic_map_resources+0x2c/0x108
>>> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:343
>>> other info that might help us debug this:
>>> Possible unsafe locking scenario:
>>> CPU0
>>> ----
>>> lock(&kvm->lock);
>>> lock(&kvm->lock);
>>> *** DEADLOCK ***
>>> May be due to missing lock nesting notation
>>> 2 locks held by syz-executor/20805:
>>> #0:(&vcpu->mutex){+.+.+.}, at:
>>> [<ffff2000080bcc30>] vcpu_load+0x28/0x1d0
>>> arch/arm64/kvm/../../../virt/kvm/kvm_main.c:143
>>> #1:(&kvm->lock){+.+.+.}, at:
>>> [<ffff2000080ea7e4>] kvm_vgic_map_resources+0x2c/0x108
>>> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:343
>>> stack backtrace:
>>> CPU: 2 PID: 20805 Comm: syz-executor Not tainted
>>> 4.9.0-rc6-xc2-00056-g08372dd4b91d-dirty #50
>>> Hardware name: Hardkernel ODROID-C2 (DT)
>>> Call trace:
>>> [<ffff200008090560>] dump_backtrace+0x0/0x3c8 arch/arm64/kernel/traps.c:69
>>> [<ffff200008090948>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:219
>>> [< inline >] __dump_stack lib/dump_stack.c:15
>>> [<ffff200008895840>] dump_stack+0x100/0x150 lib/dump_stack.c:51
>>> [< inline >] print_deadlock_bug kernel/locking/lockdep.c:1728
>>> [< inline >] check_deadlock kernel/locking/lockdep.c:1772
>>> [< inline >] validate_chain kernel/locking/lockdep.c:2250
>>> [<ffff2000081c8718>] __lock_acquire+0x1938/0x3440 kernel/locking/lockdep.c:3335
>>> [<ffff2000081caa84>] lock_acquire+0xdc/0x1d8 kernel/locking/lockdep.c:3746
>>> [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
>>> [<ffff200009700004>] mutex_lock_nested+0xdc/0x7b8 kernel/locking/mutex.c:621
>>> [< inline >] kvm_vgic_dist_destroy
>>> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:271
>>> [<ffff2000080ea4bc>] kvm_vgic_destroy+0x34/0x250
>>> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:294
>>> [<ffff2000080ec290>] vgic_v2_map_resources+0x218/0x430
>>> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-v2.c:295
>>> [<ffff2000080ea884>] kvm_vgic_map_resources+0xcc/0x108
>>> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:348
>>> [< inline >] kvm_vcpu_first_run_init
>>> arch/arm64/kvm/../../../arch/arm/kvm/arm.c:505
>>> [<ffff2000080d2768>] kvm_arch_vcpu_ioctl_run+0xab8/0xce0
>>> arch/arm64/kvm/../../../arch/arm/kvm/arm.c:591
>>> [<ffff2000080c1fec>] kvm_vcpu_ioctl+0x434/0xc08
>>> arch/arm64/kvm/../../../virt/kvm/kvm_main.c:2557
>>> [< inline >] vfs_ioctl fs/ioctl.c:43
>>> [<ffff200008450c38>] do_vfs_ioctl+0x128/0xfc0 fs/ioctl.c:679
>>> [< inline >] SYSC_ioctl fs/ioctl.c:694
>>> [<ffff200008451b78>] SyS_ioctl+0xa8/0xb8 fs/ioctl.c:685
>>> [<ffff200008083ef0>] el0_svc_naked+0x24/0x28 arch/arm64/kernel/entry.S:755
>>
>> Nice catch, and many thanks for reporting this.
>>
>> The bug is fairly obvious. Christoffer, what do you think? I don't think
>> we need to hold the kvm->lock all the way, but I'd like another pair of
>> eyes (the coffee machine is out of order again, and tea doesn't cut it).
>>
>> Thanks,
>>
>> M.
>>
>> From 93f80b20fb9351a49ee8b74eed3fc59c84651371 Mon Sep 17 00:00:00 2001
>> From: Marc Zyngier <marc.zyngier@arm.com>
>> Date: Thu, 12 Jan 2017 09:21:56 +0000
>> Subject: [PATCH] KVM: arm/arm64: vgic: Fix deadlock on error handling
>>
>> Dmitry Vyukov reported that the syzkaller fuzzer triggered a
>> deadlock in the vgic setup code when an error was detected, as
>> the cleanup code tries to take a lock that is already held by
>> the setup code.
>>
>> The fix is pretty obvious: move the cleanup call after having
>> dropped the lock, since not much can happen at that point.
> ^^^^^^^^
> Is that really true? If for instance the calls to
> vgic_register_dist_iodev() or kvm_phys_addr_ioremap() in
> vgic_v2_map_resources() fail, we leave the function with a half
> initialized VGIC (because vgic_init() succeeded).

But we only set dist->ready to true when everything went OK. How is that
an issue?

> Dropping the lock at
> this point without having the GIC cleaned up before sounds a bit
> suspicious (I may be wrong on this, though).

Thinking of it, that may open a race with vgic init call, leading to
leaking distributor memory.

>
> Can't we just document that kvm_vgic_destroy() needs to be called with
> the kvm->lock held and take the lock around the only other caller
> (kvm_arch_destroy_vm() in arch/arm/kvm/arm.c)?
> We can then keep holding the lock in the map_resources calls.
> Though we might still move the calls to kvm_vgic_destroy() into the
> wrapper function as a cleanup (as shown below), just before dropping the
> lock.

I'd rather keep the changes limited to the vgic code, and save myself
having to document more locking (we already have our fair share here).
How about this (untested):

From 24dc3f5750da20d89e0ce9b7855d125d0100bee8 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier@arm.com>
Date: Thu, 12 Jan 2017 09:21:56 +0000
Subject: [PATCH] KVM: arm/arm64: vgic: Fix deadlock on error handling

Dmitry Vyukov reported that the syzkaller fuzzer triggered a
deadlock in the vgic setup code when an error was detected, as
the cleanup code tries to take a lock that is already held by
the setup code.

The fix is to avoid retaking the lock when cleaning up, by
telling the cleanup function that we already hold it.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
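
The diff itself is not part of this thread, so the following is only a rough userspace sketch of the idea in the commit message, not the actual kernel change. It models kvm->lock with a toy single-threaded lock that reports a second acquisition instead of hanging, and splits cleanup into a variant that takes the lock and a variant that expects the caller to hold it. Every name here (toy_lock, toy_vm, toy_destroy_locked, toy_map_resources_buggy, toy_map_resources_fixed, the TOY_* codes) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ERR           (-1)  /* ordinary setup failure */
#define TOY_SELF_DEADLOCK (-2)  /* cleanup tried to retake a held lock */

/* Toy, single-threaded stand-in for a non-recursive mutex such as
 * kvm->lock. It only records whether it is held, so a second
 * acquisition is reported instead of blocking forever. */
struct toy_lock {
    bool held;
};

/* Returns 0 on success, -1 if already held (a real mutex would hang). */
int toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return -1;
    l->held = true;
    return 0;
}

void toy_lock_release(struct toy_lock *l)
{
    l->held = false;
}

/* Hypothetical VM state; the fields are illustrative only. */
struct toy_vm {
    struct toy_lock lock;
    bool dist_allocated;
    bool ready;
};

/* Cleanup body. The caller must already hold vm->lock. */
void toy_destroy_locked(struct toy_vm *vm)
{
    assert(vm->lock.held);
    vm->dist_allocated = false;
    vm->ready = false;
}

/* Cleanup entry point for callers that do not hold the lock. */
int toy_destroy(struct toy_vm *vm)
{
    if (toy_lock_acquire(&vm->lock) != 0)
        return -1;
    toy_destroy_locked(vm);
    toy_lock_release(&vm->lock);
    return 0;
}

/* Reported pattern: the error path calls the locking cleanup while the
 * setup code still holds the lock. */
int toy_map_resources_buggy(struct toy_vm *vm, bool fail)
{
    int ret = 0;

    if (toy_lock_acquire(&vm->lock) != 0)
        return TOY_ERR;
    vm->dist_allocated = true;
    if (fail)
        ret = (toy_destroy(vm) != 0) ? TOY_SELF_DEADLOCK : TOY_ERR;
    else
        vm->ready = true;
    toy_lock_release(&vm->lock);
    return ret;
}

/* Sketch of the fix: the error path uses the variant that knows the
 * lock is already held, so cleanup completes before the lock is dropped. */
int toy_map_resources_fixed(struct toy_vm *vm, bool fail)
{
    int ret = 0;

    if (toy_lock_acquire(&vm->lock) != 0)
        return TOY_ERR;
    vm->dist_allocated = true;
    if (fail) {
        toy_destroy_locked(vm);
        ret = TOY_ERR;
    } else {
        vm->ready = true;
    }
    toy_lock_release(&vm->lock);
    return ret;
}
```

With a zero-initialised struct toy_vm, toy_map_resources_buggy(&vm, true) returns TOY_SELF_DEADLOCK and leaves dist_allocated set, while toy_map_resources_fixed(&vm, true) returns TOY_ERR with the state cleaned up and the lock released. Doing the cleanup before the unlock is also what avoids the window Andre was worried about, where a half-initialised VGIC would be visible after the lock is dropped.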