Complete aarch64 failure #415
Comments
Could you give it a shot with this patch by any chance? |
Still crashes:
|
Yeah, I only later realized that you running in |
@adrianreber could you run python test/zdtm.py -t zdtm/static/env00 -f h --sat and show all log and strace files |
is CONFIG_PROC_PAGE_MONITOR enabled for your kernel? ls -l /proc/self/pagemap |
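The same check as `ls -l /proc/self/pagemap` can be done from C. This is only a minimal sketch: `path_readable` and `report_pagemap` are hypothetical helper names for illustration, not CRIU functions.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if `path` exists and is readable by the caller, 0 otherwise. */
int path_readable(const char *path)
{
	assert(path != NULL);
	return access(path, R_OK) == 0;
}

/* Hypothetical helper: prints whether the pagemap interface is present. */
void report_pagemap(void)
{
	const char *pagemap = "/proc/self/pagemap";

	printf("%s: %s\n", pagemap,
	       path_readable(pagemap) ? "available"
				      : "missing (is CONFIG_PROC_PAGE_MONITOR set?)");
}
```

Calling `report_pagemap()` on the affected machine would show whether the file is there at all before digging into the logs.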
I tried to reproduce this issue in a qemu vm on my laptop:
|
CONFIG_PROC_PAGE_MONITOR is enabled
|
@avagin |
|
@0x7f454c46 , @avagin Not totally sure what the situation here is now? Is my kernel missing a patch or my CRIU checkout? |
@adrianreber just for easier reproduction, you use vanilla v4.14 kernel with master 3.5 criu? |
@adrianreber I think your kernel is missing a patch. I checked the vanilla 4.14 kernel, and tests passed. |
@avagin any idea which patch that might be? I will also start a test with the vanilla kernel. |
I have no idea. You can try to check this list https://criu.org/Upstream_kernel_commits I know that my problem was due to this patch: |
I tried it today with 4.15.0-rc2 from kernel.org and I get the same error. commit 2391f0b4808e3d5af348324d69f5f45c56a26836 Am I missing a config option? Or is my user-space broken? I will try 'make defconfig' to see if this might fix it. |
@adrianreber Do you use vanilla criu? |
@0x7f454c46 That is my CRIU version: Version: 3.6 With a make defconfig kernel (plus a few additional configs turned on for CRIU) I now get
Besides the 'Unable to kill 24: [Errno 3] No such process' it seems to work. Not sure how fatal that is. The log files look correct. |
I think I found it. As soon as I set CONFIG_ARM64_64K_PAGES=y the error is back. I guess it is a valid configuration option. Not sure how CRIU should handle it. |
@adrianreber Cool, thanks! |
Just a random idea of how it could affect the code: we have these PAGE_SIZE macros and functions, and they might not work as expected. |
@0x7f454c46 arm64 kernel may also have CONFIG_ARM64_16K_PAGES=y ;-) |
Adding my post to the mailing list also here: https://lists.openvz.org/pipermail/criu/2018-February/040524.html |
I finally found some time to create an arm64 VM and try to debug this. I'm going to try 4.15.. with 16K and 64K in the next few days. @adrianreber are 4.13 and 4.14 still relevant? |
@rppt Thanks for trying. The interesting thing for me would be if it works with 64K pages. From my tests (https://lists.openvz.org/pipermail/criu/2018-February/040524.html) I would say it will not work as CRIU is hardcoded to 4K. With the changes in that mail it works, but after the restore the process crashes with: [ 5281.926998] busyloop00[2000]: unhandled level 3 translation fault (11) at 0xffffa95a06c0, esr 0x82000007 Kernel version would be something like 4.14 but if it works for you on a newer kernel I can do the necessary backports for my kernel (probably). |
@adrianreber I've tested with 4.15.7 kernel built with
I've tested on a fedora27 VM (qemu tcg) |
@rppt: Thanks, I was able to run the test-suite with only four errors:
Any ideas how we could handle the page size automatically? Should we try to detect the page size with CRIU's feature-check script? |
I wonder if page_size() already works.
It does sysconf(_SC_PAGESIZE), which IMHO should work?
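For illustration, a run-time query along those lines might look like the sketch below. It assumes nothing about CRIU's actual `page_size()` beyond the `sysconf(_SC_PAGESIZE)` call mentioned above; `runtime_page_size` and `round_up_to_page` are made-up names.

```c
#include <assert.h>
#include <unistd.h>

/* Queries the kernel's base page size at run time; returns 0 on failure. */
unsigned long runtime_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	return sz > 0 ? (unsigned long)sz : 0;
}

/* Rounds `len` up to a multiple of `page`, which must be a power of two. */
unsigned long round_up_to_page(unsigned long len, unsigned long page)
{
	assert(page != 0 && (page & (page - 1)) == 0);
	return (len + page - 1) & ~(page - 1);
}
```

With a 4K page a 5000-byte region rounds up to 8192 bytes, while with a 64K page it rounds up to 65536, which is why sizes computed under a hardcoded 4K assumption stop matching what the kernel does.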
|
BTW, PAGE_IMAGE_SIZE looks to be used only for BUG_ON()..
Do we need it for some reason?
It looks like it was used for
:struct page_entry {
: u64 va;
: u8 data[PAGE_IMAGE_SIZE];
:} __packed;
But after fd3f33f it's not used for anything except BUG_ON().
|
So what if we define PAGE_SIZE as sysconf(_SC_PAGESIZE) and then use it for everything like PAGE_SHIFT and whatnot? It looks even simpler than I've expected:
we have only one build-time BUG_ON which may be deleted as I've just suggested and we shouldn't care about run-time BUG_ON:
```
[dima criu]$ git grep 'BUG_ON.*PAGE_'
criu/cr-restore.c: BUG_ON(task_args->bootstrap_len & (PAGE_SIZE - 1));
criu/crtools.c: BUILD_BUG_ON(PAGE_SIZE != PAGE_IMAGE_SIZE);
[dima criu]$
```
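The run-time check in criu/cr-restore.c masks the length with `PAGE_SIZE - 1`. A sketch of the same test with the page size passed in explicitly (the function name `is_page_aligned` is an assumption for this example, not taken from CRIU) shows why the result depends on the kernel configuration:

```c
#include <assert.h>
#include <stdbool.h>

/* True if `len` is a multiple of `page_size` (a power of two). */
bool is_page_aligned(unsigned long len, unsigned long page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (len & (page_size - 1)) == 0;
}
```

A length of 8192 passes this check for 4K pages but fails it for 64K pages, so such a check only makes sense if the mask is derived from the page size the running kernel actually uses.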
|
So, @rppt, @adrianreber, if you've time to test, here is a version that builds: |
On aarch64 I see many ERROR OVER with the following output:
The whole test suite takes an unusually long time. It is still running. The first few testcases on ppc64le look correct. |
@adrianreber thanks for testing - will see what's going on there this weekend. |
Sorry, wrong branch on ppc64le, also lots of errors with your PAGE_SIZE branch:
|
Looks like there is a common issue with that on restore.. |
@adrianreber could you give it another shot - I've added a fixup on the top of the branch. |
The first twenty tests are now passing on aarch64 and ppc64le without errors. I keep zdtm running and will post the results once it finishes. I will also run it on s390x (soon). |
I see the following failures on ppc64le:
Not sure if this is related to your changes.
All those errors seem unrelated to your changes on ppc64le. |
Thank you, @adrianreber! |
On ppc64/aarch64 Linux can be set to use Large pages, so the PAGE_SIZE isn't build-time constant anymore. Define it through _SC_PAGESIZE.

There are different sizes for a page on ppc64:
: #if defined(CONFIG_PPC_256K_PAGES)
: #define PAGE_SHIFT 18
: #elif defined(CONFIG_PPC_64K_PAGES)
: #define PAGE_SHIFT 16
: #elif defined(CONFIG_PPC_16K_PAGES)
: #define PAGE_SHIFT 14
: #else
: #define PAGE_SHIFT 12
: #endif

And on aarch64 there are default sizes and possibly someone can set his own PAGE_SHIFT:
: config ARM64_PAGE_SHIFT
: int
: default 16 if ARM64_64K_PAGES
: default 14 if ARM64_16K_PAGES
: default 12

On the downside - each time we need PAGE_SIZE, we're doing libc function call on aarch64/ppc64.

Fixes: checkpoint-restore#415
Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
On ppc64/aarch64 Linux can be set to use Large pages, so the PAGE_SIZE isn't build-time constant anymore. Define it through _SC_PAGESIZE.

There are different sizes for a page on ppc64:
: #if defined(CONFIG_PPC_256K_PAGES)
: #define PAGE_SHIFT 18
: #elif defined(CONFIG_PPC_64K_PAGES)
: #define PAGE_SHIFT 16
: #elif defined(CONFIG_PPC_16K_PAGES)
: #define PAGE_SHIFT 14
: #else
: #define PAGE_SHIFT 12
: #endif

And on aarch64 there are default sizes and possibly someone can set his own PAGE_SHIFT:
: config ARM64_PAGE_SHIFT
: int
: default 16 if ARM64_64K_PAGES
: default 14 if ARM64_16K_PAGES
: default 12

On the downside - each time we need PAGE_SIZE, we're doing libc function call on aarch64/ppc64.

Fixes: checkpoint-restore#415

Save a couple of cycles by having __page_size && __page_shift cached as suggested-by Mike.

Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
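The caching mentioned in this commit message could look roughly like the sketch below. It is not the code from the patch: the `cached_*` variables and `example_*` accessors are invented names, and the sketch is not thread-safe. It only illustrates doing the `sysconf()` lookup once and deriving the shift from the size.

```c
#include <assert.h>
#include <unistd.h>

static unsigned long cached_page_size;	/* 0 until first use */
static unsigned int cached_page_shift;

/* Fills the cache: one sysconf() lookup, then log2 by repeated shifting. */
static void init_page_cache(void)
{
	long sz = sysconf(_SC_PAGESIZE);
	unsigned long v;

	assert(sz > 0 && (sz & (sz - 1)) == 0);
	cached_page_size = (unsigned long)sz;
	cached_page_shift = 0;
	for (v = cached_page_size; v > 1; v >>= 1)
		cached_page_shift++;
}

/* Hypothetical accessor: page size in bytes, looked up only once. */
unsigned long example_page_size(void)
{
	if (!cached_page_size)
		init_page_cache();
	return cached_page_size;
}

/* Hypothetical accessor: log2 of the page size. */
unsigned int example_page_shift(void)
{
	if (!cached_page_size)
		init_page_cache();
	return cached_page_shift;
}
```

After the first call, every later use costs a load and a branch instead of a libc call, which is the "couple of cycles" the message refers to.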
I'll fix aarch64 issue and resend patches this week (possibly on weekend, but I hope sooner). Ugh, sorry for the delay - had some other work to do :( |
Ok, I had a look what happens there. |
I think I will submit v1-like patches, not as beautiful as v2, if fixing the aarch64 issues takes much time. |
On ppc64/aarch64 Linux can be set to use Large pages, so the PAGE_SIZE isn't build-time constant anymore. Define it through _SC_PAGESIZE.

There are different sizes for a page on ppc64:
: #if defined(CONFIG_PPC_256K_PAGES)
: #define PAGE_SHIFT 18
: #elif defined(CONFIG_PPC_64K_PAGES)
: #define PAGE_SHIFT 16
: #elif defined(CONFIG_PPC_16K_PAGES)
: #define PAGE_SHIFT 14
: #else
: #define PAGE_SHIFT 12
: #endif

And on aarch64 there are default sizes and possibly someone can set his own PAGE_SHIFT:
: config ARM64_PAGE_SHIFT
: int
: default 16 if ARM64_64K_PAGES
: default 14 if ARM64_16K_PAGES
: default 12

On the downside - each time we need PAGE_SIZE, we're doing libc function call on aarch64/ppc64.

Fixes: #415
Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Closing as it is fixed in criu-dev |
I am trying to run criu on aarch64 with 4.14 and I think it already worked before. Right now all test cases are failing:
But:
https://lisas.de/~adrian/aarch64-dump.log
Using today's git checkout.
Looking at the latest Travis logs for aarch64, CRIU is only built but zdtm does not actually run.
Am I just missing a kernel option or why is CRIU on aarch64 so unhappy for me?