Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux 4.9: unable to handle kernel NULL pointer dereference #5

Closed
Wenzel opened this issue Mar 1, 2017 · 1 comment
Closed

Linux 4.9: unable to handle kernel NULL pointer dereference #5

Wenzel opened this issue Mar 1, 2017 · 1 comment

Comments

@Wenzel
Copy link
Member

Wenzel commented Mar 1, 2017

Currently, the branch linux-vmi is based on the latest stable linux kernel v4.9.

However running nitro on this kernel causes the kernel to crash because of a kernel NULL pointer dereference, triggered at the end of the execution.

The stack trace shown in dmesg is the following:

[  746.252527] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[  746.254193] IP: [<ffffffff885f7aeb>] down_write+0x1b/0x40
[  746.255588] PGD 0 
[  746.256013] Oops: 0002 [#1] SMP
[  746.256013] Modules linked in: kvm_intel(OE) kvm(OE) irqbypass tun rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cirrus aesni_intel ttm aes_x86_64 lrw gf128mul snd_pcm glue_helper ablk_helper cryptd drm_kms_helper snd_timer snd ppdev soundcore pcspkr evdev drm acpi_cpufreq serio_raw virtio_balloon parport_pc parport tpm_tis tpm_tis_core tpm button i2c_piix4 autofs4 ext4 crc16 jbd2 fscrypto mbcache dm_mod ata_generic virtio_net virtio_blk ata_piix libata uhci_hcd floppy virtio_pci
[  746.256013]  virtio_ring ehci_hcd crc32c_intel psmouse virtio usbcore scsi_mod [last unloaded: irqbypass]
[  746.256013] CPU: 0 PID: 1685 Comm: python3 Tainted: G           OE   4.9.0-nitro+ #1
[  746.256013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[  746.256013] task: ffff973ff64900c0 task.stack: ffffbcca8098c000
[  746.256013] RIP: 0010:[<ffffffff885f7aeb>]  [<ffffffff885f7aeb>] down_write+0x1b/0x40
[  746.256013] RSP: 0018:ffffbcca8098fdf8  EFLAGS: 00010286
[  746.256013] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: 0000000000000001
[  746.256013] RDX: ffffffff00000001 RSI: ffff973ff6426d00 RDI: 00000000000000a8
[  746.256013] RBP: ffff973ff88b7cc0 R08: 0000000000000000 R09: 0000000000000000
[  746.256013] R10: ffff973ff6426d10 R11: 0000000000000000 R12: ffff973ff88b7d60
[  746.256013] R13: ffff973ff88b7cc0 R14: ffff973f485f9900 R15: ffff973ffa0cd490
[  746.256013] FS:  00007fa658c20700(0000) GS:ffff973fffc00000(0000) knlGS:0000000000000000
[  746.256013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  746.256013] CR2: 00000000000000a8 CR3: 0000000126407000 CR4: 00000000001426f0
[  746.256013] Stack:
[  746.256013]  ffff973ff88b7d18 ffffffff8828b294 ffff973ff9c10040 0000000000000008
[  746.256013]  ffff973ffa0cd490 ffff973ffaae9c60 ffff973f485f9900 ffffffffc08f2279
[  746.256013]  ffff973ff6426d00 ffffffff8820535d ffff973ff6426d10 ffff973ff64900c0
[  746.256013] Call Trace:
[  746.256013]  [<ffffffff8828b294>] ? debugfs_remove_recursive+0x44/0x190
[  746.256013]  [<ffffffffc08f2279>] ? kvm_vcpu_release+0x19/0x30 [kvm]
[  746.256013]  [<ffffffff8820535d>] ? __fput+0xcd/0x1e0
[  746.256013]  [<ffffffff88095aa2>] ? task_work_run+0x72/0x90
[  746.256013]  [<ffffffff8807bd65>] ? do_exit+0x395/0xb50
[  746.256013]  [<ffffffff8807c599>] ? do_group_exit+0x39/0xb0
[  746.256013]  [<ffffffff8807c620>] ? SyS_exit_group+0x10/0x10
[  746.256013]  [<ffffffff885fa1bb>] ? entry_SYSCALL_64_fastpath+0x1e/0xad
[  746.256013] Code: 01 74 08 48 c7 43 20 01 00 00 00 5b c3 0f 1f 00 0f 1f 44 00 00 53 48 89 fb e8 62 df ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 47 07 d4 ff 65 48 8b 04 25 40 d4 
[  746.256013] RIP  [<ffffffff885f7aeb>] down_write+0x1b/0x40
[  746.256013]  RSP <ffffbcca8098fdf8>
[  746.256013] CR2: 00000000000000a8
[  746.256013] ---[ end trace a9571e43f9635c6a ]---
[  746.256013] Fixing recursive fault but reboot is needed!

The problem comes from the new KVM_DEBUG_FS feature, which was absent of the latest stable kernel for nitro (v4.5.x)

This is the debugfs_remove_recursive function:

647 void debugfs_remove_recursive(struct dentry *dentry)
648 {
649         struct dentry *child, *parent;
650 
651         if (IS_ERR_OR_NULL(dentry))
652                 return;
653 
654         parent = dentry;
655  down:
656         inode_lock(d_inode(parent));
657  loop:
658         /*
659          * The parent->d_subdirs is protected by the d_lock. Outside that
660          * lock, the child can be unlinked and set to be freed which can
661          * use the d_u.d_child as the rcu head and corrupt this list.
662          */
663         spin_lock(&parent->d_lock);
664         list_for_each_entry(child, &parent->d_subdirs, d_child) {
665                 if (!simple_positive(child))
666                         continue;
667 
668                 /* perhaps simple_empty(child) makes more sense */
669                 if (!list_empty(&child->d_subdirs)) {
670                         spin_unlock(&parent->d_lock);
671                         inode_unlock(d_inode(parent));
672                         parent = child;
673                         goto down;
674                 }
675 
676                 spin_unlock(&parent->d_lock);
677 
678                 if (!__debugfs_remove(child, parent))
679                         simple_release_fs(&debugfs_mount, &debugfs_mount_count);
680 
681                 /*
682                  * The parent->d_lock protects agaist child from unlinking
683                  * from d_subdirs. When releasing the parent->d_lock we can
684                  * no longer trust that the next pointer is valid.
685                  * Restart the loop. We'll skip this one with the
686                  * simple_positive() check.
687                  */
688                 goto loop;
689         }
690         spin_unlock(&parent->d_lock);
691 
692         inode_unlock(d_inode(parent));
693         child = parent;
694         parent = parent->d_parent;
695         inode_lock(d_inode(parent));
696 
697         if (child != dentry)
698                 /* go up */
699                 goto loop;
700 
701         if (!__debugfs_remove(child, parent))
702                 simple_release_fs(&debugfs_mount, &debugfs_mount_count);
703         inode_unlock(d_inode(parent));
704 
705         synchronize_srcu(&debugfs_srcu);
706 }
707 EXPORT_SYMBOL_GPL(debugfs_remove_recursive);

And apparently, during a call to inode_lock, it makes a call to down_write with a bad pointer

739 static inline void inode_lock(struct inode *inode)
740 {
741         down_write(&inode->i_rwsem);
742 }

Of course, this problem appears only at the end of a nitro run, be it when main.py or test_nitro.py calls their at_exit cleanup.

One thing that troubles me is that i check my kernel configuration, and the symbol KVM_DEBUG_FS was not enabled N...

Any ideas ?

@Wenzel
Copy link
Member Author

Wenzel commented Mar 1, 2017

So i found a solution by disabling every call to debugfs in kvm/virt/kvm_main.c

For an unknown reason, the config option CONFIG_KVM_DEBUG_FS is available in the kernel, but never used anywhere in the code.
the call to debugfs_* function in the kvm code are mandatory.

The solution was to make these calls optional.
See this commit which fixes the problem: really disable KVM debug fs if CONFIG_KVM_DEBUG_FS is unset

@Wenzel Wenzel closed this as completed Mar 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant