BUG ? host crashed while using hugetlbpage with uksmd was stopped #4
Comments
Since this is specific to your workload, you can try this fix and report back here on whether the patch works.
Hi Mr. Xia, thank you for your reply. So the main problem here is that we should not call madvise(MADV_UNMERGEABLE) on the vma of a hugetlb page. I think your solution works.
--
+#endif
Thanks :)
UKSM will BUG() when it tries to scan a hugetlb vma. If you really observe this BUG(), apply the second fix I mentioned above.
Great. So far we have not hit it again, thank you :)
Hi,
It seems that uksm doesn't handle hugetlbpage very well.
We encountered a host crash while using hugetlbpage through a device node.
(That is, we start a VM whose memory backing is /dev/hugepages/libvirt/qemu.)
The stack dump is:
[exception RIP: follow_page_mask+945]
RIP: ffffffff811984f1 RSP: ffff88115e9e3dc8 RFLAGS: 00010202
RAX: 0000000000000001 RBX: 00007f0eb8c00000 RCX: 00003ffffffff000
RDX: ffff880000000e30 RSI: 00007f0eb8c00000 RDI: 800000043aa000e7
RBP: ffff88115e9e3e10 R8: ffff88115e9e3ef0 R9: ffff8811c4288e30
R10: 0000000020000000 R11: ffff8811d146f080 R12: ffff881027548438
R13: ffff8811d146f080 R14: ffff88115e9e3e24 R15: 0000000000000004
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88115e9e3dc0] follow_page_mask at ffffffff8119830f
#9 [ffff88115e9e3e18] break_ksm at ffffffff811bd075
#10 [ffff88115e9e3e58] ksm_madvise at ffffffff811c750f
#11 [ffff88115e9e3e90] sys_madvise at ffffffff81194947
#12 [ffff88115e9e3f80] system_call_fastpath at ffffffff81652189
RIP: 00007f0fbe7150f7 RSP: 00007f0bb41feb30 RFLAGS: 00000206
RAX: 000000000000001c RBX: ffffffff81652189 RCX: ffffffffffffffff
RDX: 000000000000000d RSI: 0000000020000000 RDI: 00007f0eb8c00000
RBP: 00007f0bb41fe9c0 R8: 00007f0fbe66cd38 R9: 0000000400000000
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000000000
R13: 0000000020000000 R14: 0000000000000000 R15: 0000000020000000
ORIG_RAX: 000000000000001c CS: 0033 SS: 002b
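The user-mode registers at the bottom of the dump can be read as the arguments of the failing system call. The following is a minimal sketch of that decoding, assuming the x86_64 Linux convention (syscall number in ORIG_RAX, arguments in RDI, RSI, RDX), where syscall 28 is madvise and advice value 13 is MADV_UNMERGEABLE. The struct and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumed x86_64 Linux values: syscall 28 is madvise, advice 13 is MADV_UNMERGEABLE. */
#define X86_64_NR_MADVISE        28
#define ADVICE_MADV_UNMERGEABLE  13

/* Hypothetical container for the user-mode registers shown in the dump. */
struct user_regs {
    uint64_t orig_rax;  /* syscall number */
    uint64_t rdi;       /* 1st argument: start address */
    uint64_t rsi;       /* 2nd argument: length in bytes */
    uint64_t rdx;       /* 3rd argument: advice */
};

/* Returns 1 if the registers describe madvise(addr, len, MADV_UNMERGEABLE). */
static int is_unmergeable_madvise(const struct user_regs *r)
{
    return r->orig_rax == X86_64_NR_MADVISE &&
           r->rdx == ADVICE_MADV_UNMERGEABLE;
}

/* Length of the advised range in MiB. */
static uint64_t length_in_mib(const struct user_regs *r)
{
    return r->rsi >> 20;
}

/* Returns 1 if the start address is aligned to a 2 MiB boundary. */
static int is_2mib_aligned(const struct user_regs *r)
{
    return (r->rdi & ((1ULL << 21) - 1)) == 0;
}
```

Fed with the values from the dump (ORIG_RAX 0x1c, RDI 0x7f0eb8c00000, RSI 0x20000000, RDX 0xd), this reads as madvise(MADV_UNMERGEABLE) over a 512 MiB range starting at a 2 MiB aligned address, which matches the sys_madvise, ksm_madvise, break_ksm frames in the kernel stack.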
The host kernel version is 3.10.
The core dump also reports: PANIC: "kernel BUG at mm/memory.c:1576!"
The related code is:
struct page *follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
unsigned int *page_mask)
{
... ...
Although the code here has changed in newer kernel versions, I believe the problem
still exists there. I think it is caused by uksm not handling hugetlbpage vmas
properly, even though uksm is not supposed to support merging hugetlb pages at all.
I investigated this issue and seem to have found the answer. The code path is:
SyS_mmap
--> SyS_mmap_pgoff
--> vm_mmap_pgoff
--> do_mmap_pgoff
--> mmap_region
--> vma_merge
--> vma_adjust
--> uksm_vma_add_new (the vma is added to uksm here without the VM_HUGETLB flag!)
--> file->f_op->mmap(file, vma) (the callback here is hugetlbfs_file_mmap, which sets vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;)
--> uksm_vma_add_new (this time it is OK)
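The ordering problem in this code path can be illustrated with a small user-space model. This is only a sketch and not kernel code or the patch discussed in this thread: the toy_* names, the flag value, and the "tracked" field are all hypothetical stand-ins. It assumes the registration hook simply refuses vmas that already carry the hugetlb flag.

```c
#include <stdbool.h>

/* Toy flag bit; the real VM_HUGETLB value is defined in the kernel headers. */
#define TOY_VM_HUGETLB 0x1u

/* Hypothetical, heavily reduced stand-in for a vma. */
struct toy_vma {
    unsigned int flags;
    bool tracked;   /* true once the toy scanner has accepted this vma */
};

/* Hypothetical registration hook: refuses vmas already marked as hugetlb. */
static void toy_scanner_add(struct toy_vma *vma)
{
    if (vma->flags & TOY_VM_HUGETLB)
        return;
    vma->tracked = true;
}

/* Stand-in for the file's mmap callback, which sets the hugetlb flag. */
static void toy_file_mmap(struct toy_vma *vma)
{
    vma->flags |= TOY_VM_HUGETLB;
}

/* Order reported above: register, run the mmap callback, register again. */
static struct toy_vma toy_map_reported_order(void)
{
    struct toy_vma vma = { 0, false };
    toy_scanner_add(&vma);  /* flag not set yet, so the vma is accepted */
    toy_file_mmap(&vma);    /* flag is set only now */
    toy_scanner_add(&vma);  /* refuses, but cannot undo the first add */
    return vma;
}

/* Alternative order for comparison: callback first, then register once. */
static struct toy_vma toy_map_callback_first(void)
{
    struct toy_vma vma = { 0, false };
    toy_file_mmap(&vma);
    toy_scanner_add(&vma);
    return vma;
}
```

In this model, toy_map_reported_order() returns a vma that is both flagged as hugetlb and tracked by the scanner, which is the inconsistent state described above, while toy_map_callback_first() returns one that is never tracked. Whether reordering or an explicit flag check is the right fix in the real code is a separate question.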
Any ideas about this problem?