-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deleted bind-mount points fail restore #9
Comments
Can you show the dump log please? |
I'm working with stlalpha on this, here's a gist with a dump and restore log. |
It looks like the file being watched has vanished after the dump, which is only possible if underlied filesystem does something weird. I suspect it's AUFS? Could you please show the process tree and ls -l /proc/28382/fd ? |
The old process was vaporized after the checkpoint, so I couldn't get the info from 28382. For cleanliness I bounced the box, and restarted a new container with the command. |
So it's docker watching for something on own root. We might need some help from AUFS camp because this problem is definitely lays somewhere inside AUFS code: on the dump we've tested the watchee if it's openable by file handler but on restore open by file handle failed. Is there a change to run same configuration on native VFS or something? |
Cc: @SaiedKazemi |
@cyrillos @avagin |
in v1.8 the syntax isn’t the same, but when I pass the —volume-driver=vfs flag, the results are the same (or at least appear that way to me). Docker command: docker run -d --volume-driver=vfs ubuntu /sbin/init Mark
|
@fl0yd |
It’s getting pretty late here, but I’ll give the 1.5 a shot, though we have some dependencies on the 1.8 driver features of docker so while it will be interesting to see if it is vfs or aufs related, it will be just that. However, I would like to say that I am extremely thankful for the time that you’ve spent on this issue already and the great work the team has done on CRIU. As of right now, the docker script is as simple as: docker run -d ubuntu /sbin/init Aside from having started the container, I haven’t started anything within the container. So it should work as is with 1.5 or 1.7/8 Mark
|
My pleasure to help where I can. I just tried your container with both VFS and OverlayFS storage drivers on 3.19.8 (Ubuntu Vivid) kernel patched to fix OverlayFS bugs. Both restores failed with the same error (see below). I think the "deleted" string confuses things but I remember that CRIU has code to handle it. Unfortunately, don't have time to take a closer look at the moment but will definitely come back to it first chance I get. VFS OverlayFS |
This doesn't look like file handle problem anymore though :) Still it seems that we're covering only one case of "deleted" name postfix, I'll re-check. |
@SaiedKazemi Saied, I sent out a patch for handling "deleted" postfix from names, could you please give it a shot once time permit? (sent it into mailing list with you CC'ed) |
I applied the diff from the patch Cyrillos mentioned above and confirm the "deleted" fix works for me. I am still having issues with the restore using aufs and vfs. with root mounted at "/" I get: (00.113321) 1: fsnotify: Restore 0x1 wd for 0x00000000 with root mounted at /mnt/tmproot via -v /:/mnt/tmproot I get: (00.010633) Error (mount.c:597): FS mnt ./mnt/tmproot/boot dev 0x800001 root / unsupported id 209 Thoughts? |
@fl0yd Mark, I fear I've no clue how docker operates with AUFS (neither how it works with VFS). What I know for sure is that when dump happens we explicitly check that file handle for watchee is openable, which allow us to be certain that we will be able to open it back on the restore. In turn your error message shows that underlies filesystem has changed own contents and the watchee no longer available. |
@cyrillos @fl0yd
Cyrill, the code path doesn't make it to strip_deleted() for new->root. Didn't look into why but I am sure it's not hard to find out why. |
@SaiedKazemi Stripping deleted parts called in two cases:
as far as I know we don't call this helper over mount point paths. Saied could you please show the mount tree which contains such thing? It looks like we should handle it too. |
@cyrillos |
So as far as I understand it's bindmount for deleted entry. Is it a common situation for docker? |
When you strip //deleted part it restores fine, correct? |
@xemul Re how we end up with it, I don't know. I had seen the " (deleted)" suffix which Pavel can explain much better why/when it shows up. But this is the first time I am seeing "//deleted". It's added by dentry_path() in fs/dcache.c when it calls prepend() although it's actually appended! |
@SaiedKazemi We end up this way when remove bindmount 136 130 0:52 /m1//deleted /home/cyrill/m/m2 rw,relatime shared:69 - tmpfs none rw It is easy to simply call for stripping off this postfix I'm just need some time to think if there some caveats ramin... That said once ready I'll send out the patch. |
@xemul Pavel rename the issue please to "Can't restore counters with deleted bind mount points" |
Done, thanks :) |
Greetings - is there anything we can do to help move this forward? Thanks for the great software! |
The best would be to send us a patch with this problem fixed ;) Just kidding. Sorry, out of time at the moment. Once time permit I'll do the patch. |
@cyrillos , does your recent patch titled "[PATCH] files-reg: Rework strip_deleted" helps with this? Or something else should be done? |
The patch itself covers only a part of problem: we need to call it when parsing mount points paths, but still while this does the trick for docker as far as I know, the general solution should be restoring deleted mount points. Thus as I said we can merge the patch since it doesn't hurt but need to do more work on top. |
Hm... Just curious -- why does this mountpoint appear at all? |
It's bind mount to deleted mount point. You bindmount some dir, then remove the source and you get //deleted name on the target. |
Yes, this is understood. Why does this happen in our case? Who and what for deletes the target? |
I don't know, it comes from docker log. CC'ing @SaiedKazemi |
@cyrillos @xemul @fl0yd |
It appears to work with external CRIU checkpoint/restore, but using the internal 1.8 experimental checkpoint and restore, I still get the error about a stale file handle. (00.092761) 1: Restoring fd 5 (state -> create) Mark
|
Probably /sbin/init deletes /dev/null and creates it again. |
You're right, it must be /sbin/init. We can easily reproduce it manually: [Terminal A] $ docker run -it ubuntu:latest bash -i Notice we cannot remove hosts (or hostname): [Terminal A] # # rm /etc/hosts |
It can be dead-lokced: #0 0x00007fafbf49f6ac in __lll_lock_wait_private () from /lib64/libc.so.6 #1 0x00007fafbf44af1c in _L_lock_2460 () from /lib64/libc.so.6 #2 0x00007fafbf44ad57 in __tz_convert () from /lib64/libc.so.6 checkpoint-restore#3 0x00000000004022e2 in test_msg (format=0x404508 "Receive signal %d\n") at msg.c:51 checkpoint-restore#4 <signal handler called> checkpoint-restore#5 0x00007fafbf3f2483 in __GI__IO_vfscanf () from /lib64/libc.so.6 checkpoint-restore#6 0x00007fafbf408f27 in vsscanf () from /lib64/libc.so.6 checkpoint-restore#7 0x00007fafbf4032f7 in sscanf () from /lib64/libc.so.6 checkpoint-restore#8 0x00007fafbf449ba6 in __tzset_parse_tz () from /lib64/libc.so.6 checkpoint-restore#9 0x00007fafbf44c4cb in __tzfile_compute () from /lib64/libc.so.6 checkpoint-restore#10 0x00007fafbf44ae17 in __tz_convert () from /lib64/libc.so.6 checkpoint-restore#11 0x00000000004022e2 in test_msg (format=format@entry=0x40458c "PASS\n") at msg.c:51 checkpoint-restore#12 0x0000000000401ceb in main (argc=<optimized out>, argv=<optimized out>) at ptrace_sig.c:172 https://jira.sw.ru/browse/PSBM-47772 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
It can be dead-lokced: #0 0x00007fafbf49f6ac in __lll_lock_wait_private () from /lib64/libc.so.6 #1 0x00007fafbf44af1c in _L_lock_2460 () from /lib64/libc.so.6 #2 0x00007fafbf44ad57 in __tz_convert () from /lib64/libc.so.6 #3 0x00000000004022e2 in test_msg (format=0x404508 "Receive signal %d\n") at msg.c:51 #4 <signal handler called> #5 0x00007fafbf3f2483 in __GI__IO_vfscanf () from /lib64/libc.so.6 #6 0x00007fafbf408f27 in vsscanf () from /lib64/libc.so.6 #7 0x00007fafbf4032f7 in sscanf () from /lib64/libc.so.6 #8 0x00007fafbf449ba6 in __tzset_parse_tz () from /lib64/libc.so.6 #9 0x00007fafbf44c4cb in __tzfile_compute () from /lib64/libc.so.6 #10 0x00007fafbf44ae17 in __tz_convert () from /lib64/libc.so.6 #11 0x00000000004022e2 in test_msg (format=format@entry=0x40458c "PASS\n") at msg.c:51 #12 0x0000000000401ceb in main (argc=<optimized out>, argv=<optimized out>) at ptrace_sig.c:172 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It can be dead-lokced: #0 0x00007fafbf49f6ac in __lll_lock_wait_private () from /lib64/libc.so.6 #1 0x00007fafbf44af1c in _L_lock_2460 () from /lib64/libc.so.6 #2 0x00007fafbf44ad57 in __tz_convert () from /lib64/libc.so.6 #3 0x00000000004022e2 in test_msg (format=0x404508 "Receive signal %d\n") at msg.c:51 #4 <signal handler called> #5 0x00007fafbf3f2483 in __GI__IO_vfscanf () from /lib64/libc.so.6 #6 0x00007fafbf408f27 in vsscanf () from /lib64/libc.so.6 #7 0x00007fafbf4032f7 in sscanf () from /lib64/libc.so.6 #8 0x00007fafbf449ba6 in __tzset_parse_tz () from /lib64/libc.so.6 #9 0x00007fafbf44c4cb in __tzfile_compute () from /lib64/libc.so.6 #10 0x00007fafbf44ae17 in __tz_convert () from /lib64/libc.so.6 #11 0x00000000004022e2 in test_msg (format=format@entry=0x40458c "PASS\n") at msg.c:51 #12 0x0000000000401ceb in main (argc=<optimized out>, argv=<optimized out>) at ptrace_sig.c:172 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It can be dead-lokced: #0 0x00007fafbf49f6ac in __lll_lock_wait_private () from /lib64/libc.so.6 #1 0x00007fafbf44af1c in _L_lock_2460 () from /lib64/libc.so.6 #2 0x00007fafbf44ad57 in __tz_convert () from /lib64/libc.so.6 #3 0x00000000004022e2 in test_msg (format=0x404508 "Receive signal %d\n") at msg.c:51 #4 <signal handler called> #5 0x00007fafbf3f2483 in __GI__IO_vfscanf () from /lib64/libc.so.6 #6 0x00007fafbf408f27 in vsscanf () from /lib64/libc.so.6 #7 0x00007fafbf4032f7 in sscanf () from /lib64/libc.so.6 #8 0x00007fafbf449ba6 in __tzset_parse_tz () from /lib64/libc.so.6 #9 0x00007fafbf44c4cb in __tzfile_compute () from /lib64/libc.so.6 #10 0x00007fafbf44ae17 in __tz_convert () from /lib64/libc.so.6 #11 0x00000000004022e2 in test_msg (format=format@entry=0x40458c "PASS\n") at msg.c:51 #12 0x0000000000401ceb in main (argc=<optimized out>, argv=<optimized out>) at ptrace_sig.c:172 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
phys_stat_resolve() call mount_resolve_path() which requires that mntinfo_tree in the ns_id struct is initialized. This is a problem we observed with sockets on btrfs volumes: Program received signal SIGSEGV, Segmentation fault. 0x00005555555bb6dd in mount_resolve_path (mntinfo_tree=<optimized out>, path=0x555555875790 "/var/lib/lxd/unix.socket") at criu/mount.c:213 213 criu/mount.c: No such file or directory. (gdb) bt #0 0x00005555555bb6dd in mount_resolve_path (mntinfo_tree=<optimized out>, path=0x555555875790 "/var/lib/lxd/unix.socket") at criu/mount.c:213 #1 0x00005555555be240 in phys_stat_resolve_dev (ns=<optimized out>, st_dev=43, path=<optimized out>) at criu/mount.c:240 #2 0x00005555555be2bb in phys_stat_dev_match (st_dev=<optimized out>, phys_dev=41, ns=ns@entry=0x5555558753a0, path=path@entry=0x555555875790 "/var/lib/lxd/unix.socket") at criu/mount.c:256 checkpoint-restore#3 0x00005555555e75ed in unix_process_name (d=d@entry=0x5555558756e0, tb=tb@entry=0x7fffffffe0c0, m=<optimized out>) at criu/sk-unix.c:565 checkpoint-restore#4 0x00005555555e9378 in unix_collect_one (tb=0x7fffffffe0c0, m=0x555555869f18 <buf+312>) at criu/sk-unix.c:620 checkpoint-restore#5 unix_receive_one (h=0x555555869f08 <buf+296>, arg=<optimized out>) at criu/sk-unix.c:692 checkpoint-restore#6 0x00005555555b85aa in nlmsg_receive (buf=<optimized out>, arg=<optimized out>, err_cb=<optimized out>, cb=<optimized out>, len=<optimized out>) at criu/libnetlink.c:45 checkpoint-restore#7 do_rtnl_req (nl=nl@entry=5, req=req@entry=0x7fffffffe220, size=size@entry=72, receive_callback=0x5555555e9290 <unix_receive_one>, error_callback=0x5555555b83d0 <rtnl_return_err>, error_callback@entry=0x0, arg=arg@entry=0x0) at criu/libnetlink.c:119 checkpoint-restore#8 0x00005555555e9cf7 in do_collect_req (nl=nl@entry=5, req=req@entry=0x7fffffffe220, receive_callback=<optimized out>, arg=arg@entry=0x0, size=72) at criu/sockets.c:610 checkpoint-restore#9 0x00005555555eb1d0 in collect_sockets (ns=ns@entry=0x7fffffffe300) at criu/sockets.c:636 checkpoint-restore#10 0x000055555559ddfc in check_sock_diag () at criu/cr-check.c:118 checkpoint-restore#11 cr_check () at criu/cr-check.c:999 checkpoint-restore#12 0x00005555555872d0 in main (argc=<optimized out>, argv=0x7fffffffe678, envp=<optimized out>) at criu/crtools.c:719 Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
A root mount namespace list is used to resolve paths to unix sockets if they are placed on btrfs. This patch fixes a crash: #0 mount_resolve_path at criu/mount.c:213 #1 phys_stat_resolve_dev at criu/mount.c:240 #2 phys_stat_dev_match at criu/mount.c:256 checkpoint-restore#3 unix_process_name at criu/sk-unix.c:565 checkpoint-restore#4 unix_collect_one at criu/sk-unix.c:620 checkpoint-restore#5 unix_receive_one at criu/sk-unix.c:692 checkpoint-restore#6 nlmsg_receive at criu/libnetlink.c:45 checkpoint-restore#7 do_rtnl_req at criu/libnetlink.c:119 checkpoint-restore#8 do_collect_req at criu/sockets.c:610 checkpoint-restore#9 collect_sockets at criu/sockets.c:636 https://bugzilla.redhat.com/show_bug.cgi?id=1381351 Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
A root mount namespace list is used to resolve paths to unix sockets if they are placed on btrfs. This patch fixes a crash: #0 mount_resolve_path at criu/mount.c:213 #1 phys_stat_resolve_dev at criu/mount.c:240 #2 phys_stat_dev_match at criu/mount.c:256 #3 unix_process_name at criu/sk-unix.c:565 #4 unix_collect_one at criu/sk-unix.c:620 #5 unix_receive_one at criu/sk-unix.c:692 #6 nlmsg_receive at criu/libnetlink.c:45 #7 do_rtnl_req at criu/libnetlink.c:119 #8 do_collect_req at criu/sockets.c:610 #9 collect_sockets at criu/sockets.c:636 travis-ci: success for cr-check: fill up a root task mount namespace https://bugzilla.redhat.com/show_bug.cgi?id=1381351 Signed-off-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
A root mount namespace list is used to resolve paths to unix sockets if they are placed on btrfs. This patch fixes a crash: #0 mount_resolve_path at criu/mount.c:213 #1 phys_stat_resolve_dev at criu/mount.c:240 #2 phys_stat_dev_match at criu/mount.c:256 #3 unix_process_name at criu/sk-unix.c:565 #4 unix_collect_one at criu/sk-unix.c:620 #5 unix_receive_one at criu/sk-unix.c:692 #6 nlmsg_receive at criu/libnetlink.c:45 #7 do_rtnl_req at criu/libnetlink.c:119 #8 do_collect_req at criu/sockets.c:610 #9 collect_sockets at criu/sockets.c:636 travis-ci: success for cr-check: fill up a root task mount namespace https://bugzilla.redhat.com/show_bug.cgi?id=1381351 Signed-off-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It can be dead-locked: #0 0x00007fafbf49f6ac in __lll_lock_wait_private () from /lib64/libc.so.6 checkpoint-restore#1 0x00007fafbf44af1c in _L_lock_2460 () from /lib64/libc.so.6 checkpoint-restore#2 0x00007fafbf44ad57 in __tz_convert () from /lib64/libc.so.6 checkpoint-restore#3 0x00000000004022e2 in test_msg (format=0x404508 "Receive signal %d\n") at msg.c:51 checkpoint-restore#4 <signal handler called> checkpoint-restore#5 0x00007fafbf3f2483 in __GI__IO_vfscanf () from /lib64/libc.so.6 checkpoint-restore#6 0x00007fafbf408f27 in vsscanf () from /lib64/libc.so.6 checkpoint-restore#7 0x00007fafbf4032f7 in sscanf () from /lib64/libc.so.6 checkpoint-restore#8 0x00007fafbf449ba6 in __tzset_parse_tz () from /lib64/libc.so.6 checkpoint-restore#9 0x00007fafbf44c4cb in __tzfile_compute () from /lib64/libc.so.6 checkpoint-restore#10 0x00007fafbf44ae17 in __tz_convert () from /lib64/libc.so.6 checkpoint-restore#11 0x00000000004022e2 in test_msg (format=format@entry=0x40458c "PASS\n") at msg.c:51 checkpoint-restore#12 0x0000000000401ceb in main (argc=<optimized out>, argv=<optimized out>) at ptrace_sig.c:172 https://jira.sw.ru/browse/PSBM-47772 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
'info' array is off-by-one, nla_parse_nested() requires destination array (i.e. 'info') to have maxtype+1 (i.e. IFLA_INFO_MAX+1) elements: ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffef823e3f8 WRITE of size 48 at 0x7ffef823e3f8 thread T0 #0 0x7f9ab7a3915b in __asan_memset (/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libasan.so.2+0x8d15b) #1 0x7f9ab6d4e553 in nla_parse (/usr/lib64/libnl-3.so.200+0xa553) #2 0x4acfb7 in dump_one_netdev criu/net.c:445 checkpoint-restore#3 0x4adb60 in dump_one_ethernet criu/net.c:594 checkpoint-restore#4 0x4adb60 in dump_one_link criu/net.c:665 checkpoint-restore#5 0x48af69 in nlmsg_receive criu/libnetlink.c:45 checkpoint-restore#6 0x48af69 in do_rtnl_req criu/libnetlink.c:119 checkpoint-restore#7 0x4b0e86 in dump_links criu/net.c:878 checkpoint-restore#8 0x4b0e86 in dump_net_ns criu/net.c:1651 checkpoint-restore#9 0x4a760d in do_dump_namespaces criu/namespaces.c:985 checkpoint-restore#10 0x4a760d in dump_namespaces criu/namespaces.c:1045 checkpoint-restore#11 0x451ef7 in cr_dump_tasks criu/cr-dump.c:1799 checkpoint-restore#12 0x424588 in main criu/crtools.c:736 checkpoint-restore#13 0x7f9ab67b171f in __libc_start_main (/lib64/libc.so.6+0x2071f) checkpoint-restore#14 0x4253d8 in _start (/criu/criu/criu+0x4253d8) Address 0x7ffef823e3f8 is located in stack of thread T0 at offset 264 in frame #0 0x4ac9ef in dump_one_netdev criu/net.c:364 This frame has 5 object(s): [32, 168) 'netdev' [224, 264) 'info' <== Memory access at offset 264 overflows this variable [320, 1040) 'req' [1088, 3368) 'path' [3424, 3625) 'stable_secret' Increase 'info' size to fix this. Fixes: b705dcc ("net: pass the struct nlattrs to dump() functions") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
'info' array is off-by-one, nla_parse_nested() requires destination array (i.e. 'info') to have maxtype+1 (i.e. IFLA_INFO_MAX+1) elements: ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffef823e3f8 WRITE of size 48 at 0x7ffef823e3f8 thread T0 #0 0x7f9ab7a3915b in __asan_memset (/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libasan.so.2+0x8d15b) #1 0x7f9ab6d4e553 in nla_parse (/usr/lib64/libnl-3.so.200+0xa553) #2 0x4acfb7 in dump_one_netdev criu/net.c:445 #3 0x4adb60 in dump_one_ethernet criu/net.c:594 #4 0x4adb60 in dump_one_link criu/net.c:665 #5 0x48af69 in nlmsg_receive criu/libnetlink.c:45 #6 0x48af69 in do_rtnl_req criu/libnetlink.c:119 #7 0x4b0e86 in dump_links criu/net.c:878 #8 0x4b0e86 in dump_net_ns criu/net.c:1651 #9 0x4a760d in do_dump_namespaces criu/namespaces.c:985 #10 0x4a760d in dump_namespaces criu/namespaces.c:1045 #11 0x451ef7 in cr_dump_tasks criu/cr-dump.c:1799 #12 0x424588 in main criu/crtools.c:736 #13 0x7f9ab67b171f in __libc_start_main (/lib64/libc.so.6+0x2071f) #14 0x4253d8 in _start (/criu/criu/criu+0x4253d8) Address 0x7ffef823e3f8 is located in stack of thread T0 at offset 264 in frame #0 0x4ac9ef in dump_one_netdev criu/net.c:364 This frame has 5 object(s): [32, 168) 'netdev' [224, 264) 'info' <== Memory access at offset 264 overflows this variable [320, 1040) 'req' [1088, 3368) 'path' [3424, 3625) 'stable_secret' Increase 'info' size to fix this. Fixes: b705dcc ("net: pass the struct nlattrs to dump() functions") travis-ci: success for net: fix stack out-of-bounds access in dump_one_netdev() Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
'info' array is off-by-one, nla_parse_nested() requires destination array (i.e. 'info') to have maxtype+1 (i.e. IFLA_INFO_MAX+1) elements: ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffef823e3f8 WRITE of size 48 at 0x7ffef823e3f8 thread T0 #0 0x7f9ab7a3915b in __asan_memset (/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libasan.so.2+0x8d15b) #1 0x7f9ab6d4e553 in nla_parse (/usr/lib64/libnl-3.so.200+0xa553) #2 0x4acfb7 in dump_one_netdev criu/net.c:445 #3 0x4adb60 in dump_one_ethernet criu/net.c:594 #4 0x4adb60 in dump_one_link criu/net.c:665 #5 0x48af69 in nlmsg_receive criu/libnetlink.c:45 #6 0x48af69 in do_rtnl_req criu/libnetlink.c:119 #7 0x4b0e86 in dump_links criu/net.c:878 #8 0x4b0e86 in dump_net_ns criu/net.c:1651 #9 0x4a760d in do_dump_namespaces criu/namespaces.c:985 #10 0x4a760d in dump_namespaces criu/namespaces.c:1045 #11 0x451ef7 in cr_dump_tasks criu/cr-dump.c:1799 #12 0x424588 in main criu/crtools.c:736 #13 0x7f9ab67b171f in __libc_start_main (/lib64/libc.so.6+0x2071f) #14 0x4253d8 in _start (/criu/criu/criu+0x4253d8) Address 0x7ffef823e3f8 is located in stack of thread T0 at offset 264 in frame #0 0x4ac9ef in dump_one_netdev criu/net.c:364 This frame has 5 object(s): [32, 168) 'netdev' [224, 264) 'info' <== Memory access at offset 264 overflows this variable [320, 1040) 'req' [1088, 3368) 'path' [3424, 3625) 'stable_secret' Increase 'info' size to fix this. Fixes: b705dcc ("net: pass the struct nlattrs to dump() functions") travis-ci: success for net: fix stack out-of-bounds access in dump_one_netdev() Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
'info' array is off-by-one, nla_parse_nested() requires destination array (i.e. 'info') to have maxtype+1 (i.e. IFLA_INFO_MAX+1) elements: ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffef823e3f8 WRITE of size 48 at 0x7ffef823e3f8 thread T0 #0 0x7f9ab7a3915b in __asan_memset (/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libasan.so.2+0x8d15b) #1 0x7f9ab6d4e553 in nla_parse (/usr/lib64/libnl-3.so.200+0xa553) #2 0x4acfb7 in dump_one_netdev criu/net.c:445 #3 0x4adb60 in dump_one_ethernet criu/net.c:594 #4 0x4adb60 in dump_one_link criu/net.c:665 #5 0x48af69 in nlmsg_receive criu/libnetlink.c:45 #6 0x48af69 in do_rtnl_req criu/libnetlink.c:119 #7 0x4b0e86 in dump_links criu/net.c:878 #8 0x4b0e86 in dump_net_ns criu/net.c:1651 #9 0x4a760d in do_dump_namespaces criu/namespaces.c:985 #10 0x4a760d in dump_namespaces criu/namespaces.c:1045 #11 0x451ef7 in cr_dump_tasks criu/cr-dump.c:1799 #12 0x424588 in main criu/crtools.c:736 #13 0x7f9ab67b171f in __libc_start_main (/lib64/libc.so.6+0x2071f) #14 0x4253d8 in _start (/criu/criu/criu+0x4253d8) Address 0x7ffef823e3f8 is located in stack of thread T0 at offset 264 in frame #0 0x4ac9ef in dump_one_netdev criu/net.c:364 This frame has 5 object(s): [32, 168) 'netdev' [224, 264) 'info' <== Memory access at offset 264 overflows this variable [320, 1040) 'req' [1088, 3368) 'path' [3424, 3625) 'stable_secret' Increase 'info' size to fix this. Fixes: b705dcc ("net: pass the struct nlattrs to dump() functions") travis-ci: success for net: fix stack out-of-bounds access in dump_one_netdev() Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Checkpoints are now completing successfully after enabling evasive devices for this container - however I am getting the following error on restores (checkpoint and restoring on same host) - any ideas? Thanks!:
(00.106760) 1: Found fd 1 (id pipe:[546719]) in inherit fd list (caller inherit_fd_resolve_clash)
(00.106765) 1: Inherit fd 1 moved to 5 to resolve clash
(00.106767) 1: Going to dup 0 into 1
(00.106770) 1: Found fd 2 (id pipe:[546720]) in inherit fd list (caller inherit_fd_resolve_clash)
(00.106773) 1: Inherit fd 2 moved to 6 to resolve clash
(00.106775) 1: Going to dup 0 into 2
(00.106782) 1: Restoring fd 1 (state -> create)
(00.106784) 1: Restoring fd 2 (state -> create)
(00.106787) 1: Restoring fd 3 (state -> create)
(00.106789) 1: Creating pipe pipe_id=0x857e7 id=0x2
(00.106796) 1: Restoring size 0x10000 for 0x857e7
(00.106803) 1: Wait fdinfo pid=1 fd=4
(00.106806) 1: Send fd 7 to /crtools-fd-1-4
(00.106823) 1: Create fd for 3
(00.106827) 1: Restoring fd 4 (state -> create)
(00.106829) 1: Creating pipe pipe_id=0x857e7 id=0x3
(00.106832) 1: Waiting fd for 4
(00.106842) 1: Create fd for 4
(00.106846) 1: Restoring fd 5 (state -> create)
(00.106853) 1: fsnotify: Restore 1 wd for 0x00000000
(00.106856) 1: fsnotify: Opening fhandle 2e:41...
(00.106871) 1: Path
/' resolved to
./' mountpoint(00.106971) 1: Error (fsnotify.c:131): fsnotify: Can't open file handle for 0x0000002e:0x000000000000000e: Stale file handle
(00.114893) Error (cr-restore.c:1234): 28246 exited, status=1
(00.115123) Error (cr-restore.c:1927): Restoring FAILED.
The text was updated successfully, but these errors were encountered: