C/R 32-bit tasks on x86_64 #43

Closed
xemul opened this issue Sep 23, 2015 · 44 comments

@xemul
Member

xemul commented Sep 23, 2015

On x86 we only dump and restore 64-bit tasks. 32-bit tasks should be supported too, but keep in mind that handling pure 64-bit trees and pure 32-bit trees is not enough. Mixed 64- and 32-bit process trees exist out there, and CRIU should support those as well.

@xemul
Member Author

xemul commented Sep 23, 2015

@cyrillos in Cc :)

@cyrillos
Member

Yes, compat mode (i.e. 32-bit tasks running on an x86-64 kernel) is on my todo list. It will require kernel patching. Thanks for the reminder ;)

@xemul
Member Author

xemul commented Apr 21, 2016

@0x7f454c46 is working on it

@xemul
Member Author

xemul commented Jan 31, 2017

@rockymtnman

Hi guys, just curious whether you have an estimate of when this feature might be added. I'm guessing this is a low-priority fix?

@xemul
Member Author

xemul commented Feb 8, 2017

Cc @0x7f454c46 and @cyrillos

@cyrillos
Member

cyrillos commented Feb 8, 2017

We're working on it.

@rockymtnman

Ok, thanks for the update. I'm a developer, if there are any small tasks that you think someone could easily pick up I'd be happy to help

@cyrillos
Member

cyrillos commented Feb 8, 2017

Sure, thanks. Once I manage to split this task into pieces I'll ping you.

@0x7f454c46
Member

Hi @rockymtnman,
there are easy things to do, but to test them you'll need to patch the kernel.
The Linux kernel has supported 32-bit C/R on the x86_64 platform since v4.9. However, the ZDTM test suite exposed a kernel bug. There are patches that fix it, but they haven't made it into mainline yet. So, to test 32-bit C/R correctly, you'll need to get the sources of a kernel newer than v4.9 and patch them. You'll also need to boot that kernel with the vsyscall=none boot parameter. That's the hardest part of the preparation.
Once you have a patched kernel running, you can run the compat ZDTM tests with:

make COMPAT_TEST=y -j9 zdtm
./test/zdtm.py run -a --keep-going
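
Before running the commands above, the two kernel prerequisites mentioned in this comment (a kernel from v4.9 onwards, booted with vsyscall=none) can be sanity-checked with something like the following. This is only a minimal sketch: the helper names kernel_at_least and has_vsyscall_none are made up for illustration and are not part of CRIU or ZDTM, and the check cannot tell whether the not-yet-mainline bug fix has been applied.

```python
import re


def kernel_at_least(release, minimum=(4, 9)):
    """Return True if a `uname -r` style string is at least `minimum`.

    Only the leading "major.minor" part is compared, so "4.10.0-rc7+"
    is treated as version (4, 10).
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return (int(match.group(1)), int(match.group(2))) >= minimum


def has_vsyscall_none(cmdline):
    """Return True if the kernel command line contains vsyscall=none."""
    return "vsyscall=none" in cmdline.split()


def check_running_kernel():
    """Check the running kernel (Linux only; reads /proc/cmdline)."""
    import platform

    with open("/proc/cmdline") as f:
        cmdline = f.read()
    return kernel_at_least(platform.release()) and has_vsyscall_none(cmdline)
```

On the test machine, calling check_running_kernel() should return True once the new kernel is booted with the right parameter.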

A couple of tests are failing at the moment; you can find a list of them here. The easiest things to fix are:

  1. futex-rl test: the robust list has to be dumped and restored with the 32-bit syscall. The kernel keeps two robust futex lists per task, a compat one and a native one, so you'll need to change get_task_futex_robust_list() and restore_thread_common() to call the 32-bit syscall for compat tasks.
  2. Even easier, if you know Python: patch zdtm.py to add an option, something like '--compat', that calls make with the COMPAT_TEST=y parameter and runs the compat tests (but only if CONFIG_COMPAT is defined).
  3. You can also fix any ZDTM test that fails, but I can't promise it'll be as easy as the previous two.
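
For item 2, the idea might look roughly like the sketch below. It is not taken from zdtm.py: parse_args, build_make_command and the compat_available flag are hypothetical names, and the real script's option handling may be organised quite differently. The non-compat command line is also an assumption; only the COMPAT_TEST=y form appears in this thread.

```python
import argparse


def parse_args(argv):
    """Parse a hypothetical --compat flag (illustration only)."""
    parser = argparse.ArgumentParser(description="sketch of a --compat option")
    parser.add_argument(
        "--compat",
        action="store_true",
        help="build and run the 32-bit (compat) test suite",
    )
    return parser.parse_args(argv)


def build_make_command(compat, compat_available=True, jobs=9):
    """Build the make command line, adding COMPAT_TEST=y when requested.

    `compat_available` stands in for a CONFIG_COMPAT check that the
    caller is assumed to have done already.
    """
    if compat and not compat_available:
        raise RuntimeError("--compat requested but CONFIG_COMPAT is not enabled")
    cmd = ["make"]
    if compat:
        cmd.append("COMPAT_TEST=y")
    cmd += ["-j%d" % jobs, "zdtm"]
    return cmd
```

The resulting list can be handed to subprocess.check_call() as-is, which avoids any shell quoting issues.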

@rockymtnman

Ok, it'll take me a little bit to get up to speed, but I'll give it a try

@rockymtnman

Hi @0x7f454c46 where can I find the patchfile or source for your patch? I see the affected files listed, but not sure how to get access to your source. I have v4.9.9 kernel building now (first time)

@cyrillos
Member

Dima's patches can be fetched here https://patchwork.kernel.org/project/LKML/list/?page=37

@0x7f454c46
Member

Forgot to mention: before compiling the kernel, don't forget to enable the CRIU-specific config options such as CONFIG_CHECKPOINT_RESTORE. You can find the list of required options here: https://criu.org/Installation#Configuring_the_kernel
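
A quick way to see which required options a kernel config lacks is to scan the config text, as in the sketch below. The function name missing_options is made up for illustration, and the list of options to pass in has to come from the wiki page linked above; only CONFIG_CHECKPOINT_RESTORE and CONFIG_COMPAT are named in this thread.

```python
def missing_options(config_text, required):
    """Return the options from `required` that are not set to y or m.

    `config_text` is the content of a kernel .config file. Lines such as
    "# CONFIG_FOO is not set" are comments and count as disabled.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]
```

The text would typically be read from the .config in the kernel build directory. Reading it from /boot/config-<release> or /proc/config.gz is an assumption that depends on the distribution and on how the kernel was built.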

@xemul xemul changed the title from "C/R 32-bit tasks on x86" to "C/R 32-bit tasks on x86_64" on Feb 10, 2017
@xemul xemul mentioned this issue Feb 10, 2017
@rockymtnman

Thanks guys, I'll try to make some progress this weekend

@rockymtnman

Some of the patches didn't apply cleanly to the source of the latest stable kernel I pulled, so I'm merging the changes by hand. I'm guessing the source files have changed since you originally wrote the patches? Am I missing something?

@cyrillos
Member

The patches are against the latest git master. Just run

git clone --depth 1 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

to fetch the latest sources (or drop the --depth 1 argument if you prefer the complete git history).

@rockymtnman

Awesome, that should save a lot of time, thanks

@rockymtnman

So I've got ZDTM running now, but not sure if the results are correct:
################### 10 TEST(S) FAILED (TOTAL 290/SKIPPED 19) ###################

  • zdtm/static/socket-tcp-fin-wait1(unknown)
  • zdtm/static/socket-tcp6-closing(unknown)
  • zdtm/static/socket-tcp-fin-wait2(unknown)
  • zdtm/static/socket-tcp-syn-sent(unknown)
  • zdtm/static/socket-tcp-closing(unknown)
  • zdtm/static/socket-tcp-local(unknown)
  • zdtm/static/socket-tcp-closed(unknown)
  • zdtm/static/socket-tcp-last-ack(unknown)
  • zdtm/static/socket-tcp-close-wait(unknown)
  • zdtm/static/socket-tcp-closed-last-ack(unknown)
##################################### FAIL #####################################

@0x7f454c46
Member

@rockymtnman Check that you are running the 32-bit tests: file test/zdtm/static/busyloop00 should report a 32-bit ELF. If it doesn't, run git clean -fdx test/ and compile the 32-bit test suite with: make COMPAT_TEST=y -j9 zdtm.
Also, socket-tcp-* may fail because some kernel option isn't set (not sure which one).
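
The same check can be scripted by looking at the ELF header directly: byte 4 of the file (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones. The sketch below is just an illustration of that idea; elf_class and is_32bit_elf are made-up helper names, not part of ZDTM.

```python
ELF_MAGIC = b"\x7fELF"


def elf_class(header):
    """Return 32 or 64 for the first bytes of an ELF file.

    Raises ValueError if the data is not ELF or the class byte is unknown.
    """
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]  # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError("unknown ELF class: %d" % ei_class)


def is_32bit_elf(path):
    """Read the start of the file at `path` and report whether it is 32-bit."""
    with open(path, "rb") as f:
        return elf_class(f.read(5)) == 32
```

For example, is_32bit_elf("test/zdtm/static/busyloop00") should be true after a compat build.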

@rockymtnman

Thanks 0x7f454c46, I wasn't on the dev branch and was still missing some i386 deps. file test/zdtm/static/busyloop00 reports a 32-bit ELF now. However, I'm getting 220 failures, all either zdtm/transition/xxx(unknown) or zdtm/static/xxx(unknown). I'm guessing this means something is wrong with my kernel config? Here's my latest kernel build info:
Linux ubuntu 4.10.0-rc7+ #1 SMP Sat Feb 11 17:15:26 PST 2017 x86_64 x86_64 x86_64 GNU/Linux

@0x7f454c46
Member

0x7f454c46 commented Feb 14, 2017 via email

@rockymtnman

Success! I needed to rebuild the kernel with the x86 supporting libs. The majority of tests pass now, but the run hangs forever on test 221 (static/autofs). Does this test just take a really long time? I think I'm almost at the point where I can actually do something useful...

@0x7f454c46
Member

0x7f454c46 commented Feb 14, 2017

@rockymtnman Yes, the autofs test is currently broken in the 32-bit version.
You can check the list of failing tests here: https://criu.org/32bit_tasks_C/R#List_of_failed_tests
Note that the FPU tests (fpu00, mmx00, sse00, sse20) should be fixed by @cyrillos's patch set "ia32: Add support for FPU c/r in compat environment". I'll update the list today (you can too, if you register on CRIU's wiki).

@0x7f454c46
Member

0x7f454c46 commented Feb 14, 2017

@rockymtnman If you want, you can start with the autofs test, but I'm not sure how simple it is to fix
(I haven't yet checked what fails there).

@0x7f454c46
Member

It may be simple, since the test hangs even without C/R (which shouldn't happen), so it may just be an error in the test itself.

@rockymtnman

Sorry got crazy busy at work this week. Looks like quite a lot got fixed! Should have some time on Sunday to look into the bug list now that I have the dev environment all set up

@rockymtnman

rockymtnman commented Feb 21, 2017

@0x7f454c46 autofs appears to be more than a script issue. The packet size is different (301 vs 300 bytes), and I'm also having issues with mounting and unmounting (it hangs instead of timing out). I need more time to understand this better.

@0x7f454c46
Member

@rockymtnman OK, no problem :)
If you find yourself stuck, you can check the other TODOs.
I'm also short on time at the moment.

@rockymtnman

@0x7f454c46 - I'll drop some notes in the wiki if I get stuck and move on to another one

@rockymtnman

Looks like a bounty was posted for this fix

@xemul
Member Author

xemul commented Mar 29, 2017

The next release will be 3.0, with 32-bit support in it.

@xemul
Member Author

xemul commented Apr 24, 2017

@xemul xemul closed this as completed Apr 24, 2017
@rockymtnman

@xemul @0x7f454c46 Are these the only necessary kernel patches to enable C/R for 32-bit tasks? Or are there plans to make additional submissions? I'd like to test this change with the appropriate kernel mods
https://patchwork.kernel.org/patch/9545023/
https://patchwork.kernel.org/patch/9545021/
https://patchwork.kernel.org/patch/9545027/
https://patchwork.kernel.org/patch/9545025/
https://patchwork.kernel.org/patch/9545153/

@0x7f454c46
Member

@rockymtnman But if your kernel is older than v4.9, you'll need some more patches. You can find them, listed with an x86 subject, here: https://criu.org/Upstream_kernel_commits

@rockymtnman

@0x7f454c46 The linux-next/criu 3.0 build worked great, and I'm able to C/R 32-bit tasks using criu dump/restore. I'm also able to checkpoint a simple looper shell script in a 32-bit Debian container. However, when I attempt to checkpoint one particular 32-bit task (https://www.qemu.org) I get the same error:
"Cannot checkpoint container XXXX: rpc error: code = 2 desc = exit status 1: "criu failed: type NOTIFY errno 0\nlog file: "
32-bit qemu C/R works fine natively on the host with the --shell-job option. Is it possible that this option is not being passed to the container's 32-bit process via the docker criu interface? Or could the shell script container process be treated differently from the qemu container process?

@rockymtnman

Note: this looks like it has to do with the tty session established by the emulator/container (I think this is what required the --shell-job option for the native task). When I use the -it option to run the container with the shell script, I get the same error.

@avagin
Member

avagin commented Apr 27, 2017

Here is my pull-request to support C/R of containers with external terminals
opencontainers/runc#1355

@rockymtnman

Great work guys, here's to you!

@0x7f454c46
Member

@rockymtnman You're welcome!
If you have any other issues with C/R, let us know :)
And big thanks for your support and the bounty award, it's much appreciated.
By the way, just for information: what do you do with CRIU and qemu in docker? (If you don't mind me asking.)

@rockymtnman

C/R of embedded devices (thus 32-bit processes). It's an obscure use case, but it comes in handy :-) Might want to add it to your list.

@0x7f454c46
Member

@rockymtnman Sounds cool!
There are some guys using C/R on arm32 in embedded systems (e.g. #306, #206, etc.), but I'm not sure I follow what the purpose of docker is there. Probably to migrate a couple of applications simultaneously?
Thanks again.


5 participants