Complete aarch64 failure #415

Closed
adrianreber opened this issue Nov 21, 2017 · 67 comments

@adrianreber
Member

I am trying to run criu on aarch64 with 4.14 and I think it already worked before. Right now all test cases are failing:

[criu]# criu/criu --version
Version: 3.5
GitID: v3.5-509-gf18a63b
[criu]# criu/criu check --all
Warn  (criu/cr-check.c:680): Dirty tracking is OFF. Memory snapshot will not work.
Warn  (criu/cr-check.c:1061): Do not have API to map vDSO - will use mremap() to restore vDSO
Warn  (criu/cr-check.c:1029): CRIU built without CONFIG_COMPAT - can't C/R compatible tasks
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

But:

========================== Run zdtm/static/poll in h ===========================
Start test
./poll --pidfile=poll.pid --outfile=poll.out
Run criu dump
=[log]=> dump/zdtm/static/poll/24/1/dump.log
------------------------ grep Error ------------------------
(00.014483) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.014487) pagemap-cache: filling VMA 1ccb0000-1cce0000 (192K) [l:1cc00000 h:1ce00000]
(00.014509) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.014514) pagemap-cache: filling VMA ffffab8e0000-ffffaba50000 (1472K) [l:ffffab800000 h:ffffaba00000]
(00.014531) Error (criu/pagemap-cache.c:159): pagemap-cache: Can't read 24's pagemap file: No such file or directory
(00.014537) Error (criu/pagemap-cache.c:175): pagemap-cache: Failed to fill cache for 24 (ffffab8e0000-ffffaba50000)
(00.019010) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.019014) pagemap-cache: filling VMA 1ccb0000-1cce0000 (192K) [l:1cc00000 h:1ce00000]
(00.019036) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.019040) pagemap-cache: filling VMA ffffab8e0000-ffffaba50000 (1472K) [l:ffffab800000 h:ffffaba00000]
(00.019050) Error (criu/pagemap-cache.c:159): pagemap-cache: Can't read 25's pagemap file: No such file or directory
(00.019056) Error (criu/pagemap-cache.c:175): pagemap-cache: Failed to fill cache for 25 (ffffab8e0000-ffffaba50000)
------------------------ ERROR OVER ------------------------
Unable to kill 24: [Errno 3] No such process
Run criu restore
=[log]=> dump/zdtm/static/poll/24/1/restore.log
------------------------ grep Error ------------------------
(00.103484)     24: Search for 0x00ffffaba70000 shmem 0x659e9 0xffffa0e80ae8/24
(00.103487)     25: Search for 0x00ffffaba70000 shmem 0x659e9 0xffffa0e80ae8/24
(00.103494)     25: Waiting for the 659e9 shmem to appear
(00.103567)     24: No pagemap-shmem-416233.img image
(00.103579)     24: Error (criu/shmem.c:559): Can't restore shmem content
(00.103612)     24: Error (criu/mem.c:1208): `- Can't open vma
(00.103675) Error (criu/cr-restore.c:2449): Restoring FAILED.
------------------------ ERROR OVER ------------------------
################## Test zdtm/static/poll FAIL at CRIU restore ##################
##################################### FAIL #####################################

https://lisas.de/~adrian/aarch64-dump.log

Using today's git checkout.

Looking at the latest Travis logs for aarch64, CRIU is only built; zdtm is not actually run.

Am I just missing a kernel option, or is there another reason CRIU on aarch64 is so unhappy for me?

@0x7f454c46
Member

0x7f454c46 commented Nov 21, 2017

Could you give it a shot with this patch by any chance?

@adrianreber
Member Author

Still crashes:

------------------------ grep Error ------------------------
(00.014520) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.014524) pagemap-cache: filling VMA 25350000-25380000 (192K) [l:25200000 h:25400000]
(00.014541) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.014545) pagemap-cache: filling VMA ffff7ead0000-ffff7ec40000 (1472K) [l:ffff7ea00000 h:ffff7ec00000]
(00.014563) Error (criu/pagemap-cache.c:159): pagemap-cache: Can't read 24's pagemap file: No such file or directory
(00.014571) Error (criu/pagemap-cache.c:175): pagemap-cache: Failed to fill cache for 24 (ffff7ead0000-ffff7ec40000)
(00.019047) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.019051) pagemap-cache: filling VMA 25350000-25380000 (192K) [l:25200000 h:25400000]
(00.019068) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.019072) pagemap-cache: filling VMA ffff7ead0000-ffff7ec40000 (1472K) [l:ffff7ea00000 h:ffff7ec00000]
(00.019082) Error (criu/pagemap-cache.c:159): pagemap-cache: Can't read 25's pagemap file: No such file or directory
(00.019089) Error (criu/pagemap-cache.c:175): pagemap-cache: Failed to fill cache for 25 (ffff7ead0000-ffff7ec40000)
------------------------ ERROR OVER ------------------------

@0x7f454c46
Member

Yeah, I only realized later that you are running in h, not testing in a namespace.

@avagin
Member

avagin commented Nov 21, 2017

@adrianreber could you run python test/zdtm.py -t zdtm/static/env00 -f h --sat and show all log and strace files

@avagin
Member

avagin commented Nov 21, 2017

Is CONFIG_PROC_PAGE_MONITOR enabled in your kernel?

ls -l /proc/self/pagemap
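
As background for this check: /proc/<pid>/pagemap holds one 64-bit entry per virtual page, so the file offset for a given address depends on the page size in use. A minimal Python sketch of that arithmetic follows. CRIU itself is C; the helper names `pagemap_offset` and `read_pagemap_entry` are hypothetical and are not taken from CRIU.

```python
import os
import sys

PAGEMAP_ENTRY_BYTES = 8  # each pagemap entry is a 64-bit value


def pagemap_offset(vaddr, page_size):
    """Byte offset of the pagemap entry that covers virtual address vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_BYTES


def read_pagemap_entry(vaddr, pid="self"):
    """Read the raw 64-bit entry for vaddr (needs a readable pagemap file)."""
    page_size = os.sysconf("SC_PAGESIZE")  # ask the running kernel, do not assume 4K
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        return int.from_bytes(f.read(PAGEMAP_ENTRY_BYTES), sys.byteorder)
```

The same address maps to different offsets under different page sizes, so a reader that assumes 4K on a 64K kernel would look in the wrong place.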

@avagin
Member

avagin commented Nov 22, 2017

I tried to reproduce this issue in a qemu vm on my laptop:

[root@localhost criu]# python test/zdtm.py run -t zdtm/static/env00
=== Run 1/1 ================ zdtm/static/env00

========================== Run zdtm/static/env00 in h ==========================
Start test
./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
Run criu dump
Run criu restore
Send the 15 signal to  24
Wait for zdtm/static/env00(24) to die for 0.100000
Wait for zdtm/static/env00(24) to die for 0.200000
Wait for zdtm/static/env00(24) to die for 0.400000
Wait for zdtm/static/env00(24) to die for 0.800000
tail: cannot open 'zdtm/static/env00.out' for reading: No such file or directory
======================= zdtm/static/env00.out.inprogress =======================

======================= zdtm/static/env00.out.inprogress =======================
################# Test zdtm/static/env00 FAIL at result check ##################
##################################### FAIL #####################################
[root@localhost criu]# dmesg -c
[ 2895.975628] env00[8023]: unhandled level 2 translation fault (11) at 0xffffbd9996c0, esr 0x82000006
[ 2895.978721] CPU: 1 PID: 8023 Comm: env00 Not tainted 4.13.9-300.fc27.aarch64 #1
[ 2895.979135] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 2895.979569] task: ffffb7c9f7229e00 task.stack: ffffb7c9f7768000
[ 2895.980195] PC is at 0xffffbd9996c0
[ 2895.980403] LR is at 0xffffbd9996c0
[ 2895.980631] pc : [<0000ffffbd9996c0>] lr : [<0000ffffbd9996c0>] pstate: 00000000
[ 2895.981720] sp : 0000ffffc1ce0740
[ 2895.981929] x29: 0000ffffc1ce1990 x28: 0000000000000000 
[ 2895.982228] x27: 0000000000000000 x26: 0000000000000000 
[ 2895.982490] x25: 0000000000000000 x24: 0000000000000000 
[ 2895.982775] x23: 0000000000000000 x22: 0000000000000000 
[ 2895.983057] x21: 0000000000401b30 x20: 0000000000420000 
[ 2895.983310] x19: 00000000004203b8 x18: 0000000000000000 
[ 2895.983687] x17: 0000ffffaf71fcb0 x16: 0000000000420238 
[ 2895.983987] x15: 0000ffffaf635a60 x14: 0000000000000000 
[ 2895.984246] x13: 0000ffffaf7d6000 x12: 0000ffffaf637760 
[ 2895.984846] x11: 0000ffffaf644dd8 x10: 0000ffffaf637d28 
[ 2895.985114] x9 : 0000000000000000 x8 : 0000000000000062 
[ 2895.985351] x7 : 0000ffffaf814000 x6 : 0000ffffaf814000 
[ 2895.985630] x5 : 0000000000000000 x4 : 0000000000000000 
[ 2895.985823] x3 : 0000000000000000 x2 : 000000007fffffff 
[ 2895.986008] x1 : 0000000000000001 x0 : 0000000000000000 

@adrianreber
Member Author

CONFIG_PROC_PAGE_MONITOR is enabled

# ls -l /proc/self/pagemap 
-r--------. 1 root root 0 Nov 22 05:26 /proc/self/pagemap
# ./zdtm.py run -t zdtm/static/env00  -f h --sat
=== Run 1/1 ================ zdtm/static/env00

========================== Run zdtm/static/env00 in h ==========================
Start test
./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
Run criu dump
=[strace]=> dump/zdtm/static/env00/24/1/dump.strace
=[log]=> dump/zdtm/static/env00/24/1/dump.log
------------------------ grep Error ------------------------
(00.041236) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.041273) pagemap-cache: filling VMA 3b900000-3b930000 (192K) [l:3b800000 h:3ba00000]
(00.041379) Pagemap generated: 0 pages (0 lazy) 0 holes
(00.041425) pagemap-cache: filling VMA ffff8bcb0000-ffff8be20000 (1472K) [l:ffff8bc00000 h:ffff8be00000]
(00.041506) Error (criu/pagemap-cache.c:159): pagemap-cache: Can't read 24's pagemap file: No such file or directory
(00.041551) Error (criu/pagemap-cache.c:175): pagemap-cache: Failed to fill cache for 24 (ffff8bcb0000-ffff8be20000)
------------------------ ERROR OVER ------------------------
Unable to kill 24: [Errno 3] No such process
Run criu restore
=[strace]=> dump/zdtm/static/env00/24/1/restore.strace
=[log]=> dump/zdtm/static/env00/24/1/restore.log
------------------------ grep Error ------------------------
(00.117129)     24:  `- type 1 ID 0x6
(00.117170)     24:    `- FD 3 pid 24
(00.117210)     24:  `- type 1 ID 0x7
(00.117604)     24: Opened local page read 1 (parent 0)
(00.118106) Error (criu/namespaces.c:1604): Spurious pid ns helper: pid=24
(00.118212) Error (criu/cr-restore.c:1547): 24 killed by signal 11: Segmentation fault
(00.118416) Error (criu/cr-restore.c:2449): Restoring FAILED.
------------------------ ERROR OVER ------------------------
========================= Test zdtm/static/env00 PASS ==========================

https://lisas.de/~adrian/aarch64/

@0x7f454c46
Member

@avagin You are on 4.13.9-300.fc27.aarch64; I guess this commit is not in that kernel (it landed in 4.14-rc1):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=739586951b

@avagin
Member

avagin commented Nov 22, 2017

[root@localhost criu]# python test/zdtm.py run -t zdtm/static/poll -f h
=== Run 1/1 ================ zdtm/static/poll

========================== Run zdtm/static/poll in h ===========================
Start test
./poll --pidfile=poll.pid --outfile=poll.out
Run criu dump
Unable to kill 24: [Errno 3] No such process
Run criu restore
Send the 15 signal to  24
Wait for zdtm/static/poll(24) to die for 0.100000
Unable to kill 24: [Errno 3] No such process
Removing dump/zdtm/static/poll/24
========================== Test zdtm/static/poll PASS ==========================
[root@localhost criu]# uname -a
Linux localhost.localdomain 4.14.0-1.fc28.aarch64 #1 SMP Mon Nov 13 14:05:55 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux

@adrianreber
Member Author

@0x7f454c46, @avagin I'm not totally sure what the situation is now. Is a patch missing from my kernel or from my CRIU checkout?

@0x7f454c46
Member

@adrianreber just for easier reproduction: are you using a vanilla v4.14 kernel with CRIU master (3.5)?
Could you try with vanilla criu and/or vanilla kernel to see if it reproduces?
And if so - could you upload the kernel .config?
If it doesn't reproduce - then I think it might be simple to bisect this.

@avagin
Member

avagin commented Dec 4, 2017

@adrianreber I think your kernel is missing a patch. I checked the vanilla 4.14 kernel, and tests passed.

@adrianreber
Member Author

@avagin any idea which patch that might be? I will also start a test with the vanilla kernel.

@avagin
Member

avagin commented Dec 5, 2017

I have no idea. You can check this list: https://criu.org/Upstream_kernel_commits

I know that my problem was due to this patch:
torvalds: 739586951b arm64/vdso: Support mremap() for vDSO 4.14-rc1

@adrianreber
Member Author

I tried it today with 4.15.0-rc2 from kernel.org and I get the same error.

commit 2391f0b4808e3d5af348324d69f5f45c56a26836
Merge: 236fa07 d9e427f
Author: Linus Torvalds <...>
Date: Mon Dec 4 11:32:02 2017 -0800

Am I missing a config option? Or is my user-space broken? I will try 'make defconfig' to see if this might fix it.

@0x7f454c46
Member

@adrianreber Do you use vanilla criu?
Because, as I pointed out, I don't see that git hash in my criu repo: v3.5-509-gf18a63b
So I would suggest trying defconfig + vanilla criu to see if that works for you, and then looking at what causes the breakage.

@adrianreber
Member Author

@0x7f454c46 That is my CRIU version:

Version: 3.6
GitID: v3.6-428-g0c1b1d0

With a make defconfig kernel (plus a few additional configs turned on for CRIU) I now get:

======================= Run zdtm/static/busyloop00 in h ========================
Start test
./busyloop00 --pidfile=busyloop00.pid --outfile=busyloop00.out
Run criu dump
Unable to kill 24: [Errno 3] No such process
Run criu restore
Send the 15 signal to  24
Wait for zdtm/static/busyloop00(24) to die for 0.100000
Unable to kill 24: [Errno 3] No such process
======================= Test zdtm/static/busyloop00 PASS =======================

Besides the 'Unable to kill 24: [Errno 3] No such process' message it seems to work. I'm not sure how fatal that is. The log files look correct.

@adrianreber
Member Author

I think I found it. As soon as I set CONFIG_ARM64_64K_PAGES=y the error is back. I guess it is a valid configuration option. Not sure how CRIU should handle it.
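
To illustrate why the page size option matters, here is a small Python sketch (not CRIU code; `pages_in_range` is a hypothetical helper) that counts the pages in the VMA from the dump log above under a 4K assumption and under 64K pages:

```python
def pages_in_range(start, end, page_size):
    """Number of pages in [start, end); both bounds must be page-aligned."""
    if start % page_size or end % page_size:
        raise ValueError("range is not aligned to the page size")
    return (end - start) // page_size


# VMA ffffab8e0000-ffffaba50000 (1472K), taken from the dump log above
VMA_START, VMA_END = 0xFFFFAB8E0000, 0xFFFFABA50000

print(pages_in_range(VMA_START, VMA_END, 4096))   # 368 pages if 4K is assumed
print(pages_in_range(VMA_START, VMA_END, 65536))  # 23 pages with 64K pages
```

Any code that sizes buffers or walks per-page data with a hardcoded 4K will therefore disagree with a 64K kernel by a factor of 16.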

@0x7f454c46
Member

@adrianreber Cool, thanks!
Will try to find time to look at it if no one beats me to it :)

@0x7f454c46
Member

Just a random idea about how it could affect the code: we have these PAGE_SIZE macros and functions, and they might not work as expected.

@avagin avagin added the kernel label Dec 30, 2017
@rppt
Member

rppt commented Feb 20, 2018

@0x7f454c46 arm64 kernel may also have CONFIG_ARM64_16K_PAGES=y ;-)

@adrianreber
Member Author

Adding my mailing list post here as well:

https://lists.openvz.org/pipermail/criu/2018-February/040524.html

@rppt
Member

rppt commented Mar 14, 2018

I finally found some time to create an arm64 VM and try to debug this.
I could reproduce the crash @avagin reported earlier (#415 (comment)) with 4.13.9-300.fc27.aarch64. However, with 4.15.8-300.fc27.aarch64 the tests pass, at least with 4K page size.

I'm going to try 4.15.. with 16K and 64K in the next few days.

@adrianreber are 4.13 and 4.14 still relevant?
As far as I can tell, 4.13 is EOL, at least for the mainline.

@adrianreber
Member Author

@rppt Thanks for trying. The interesting thing for me would be if it works with 64K pages. From my tests (https://lists.openvz.org/pipermail/criu/2018-February/040524.html) I would say it will not work as CRIU is hardcoded to 4K. With the changes in that mail it works, but after the restore the process crashes with:

[ 5281.926998] busyloop00[2000]: unhandled level 3 translation fault (11) at 0xffffa95a06c0, esr 0x82000007

Kernel version would be something like 4.14 but if it works for you on a newer kernel I can do the necessary backports for my kernel (probably).

@rppt
Member

rppt commented Mar 18, 2018

@adrianreber I've tested with a 4.15.7 kernel built with CONFIG_ARM64_64K_PAGES=y, and env00 passed after hardcoding PAGE_SHIFT to 16 and PAGE_IMAGE_SIZE to 64k:

diff --git a/criu/include/image.h b/criu/include/image.h
index d60cbadd..63e21c1f 100644
--- a/criu/include/image.h
+++ b/criu/include/image.h
@@ -15,7 +15,7 @@
 #ifdef _ARCH_PPC64
 #define PAGE_IMAGE_SIZE        65536
 #else
-#define PAGE_IMAGE_SIZE        4096
+#define PAGE_IMAGE_SIZE        65536
 #endif /* _ARCH_PPC64 */
 #define PAGE_RSS       1
 #define PAGE_ANON      2
diff --git a/include/common/arch/aarch64/asm/page.h b/include/common/arch/aarch64/asm/page.h
index de1fe542..8fb1f616 100644
--- a/include/common/arch/aarch64/asm/page.h
+++ b/include/common/arch/aarch64/asm/page.h
@@ -4,7 +4,7 @@
 #include <unistd.h>
 
 #ifndef PAGE_SHIFT
-# define PAGE_SHIFT    12
+# define PAGE_SHIFT    16
 #endif
 
 #ifndef PAGE_SIZE

I've tested on a Fedora 27 VM (QEMU TCG).

@adrianreber
Member Author

@rppt: Thanks, I was able to run the test-suite with only four errors:

################### 4 TEST(S) FAILED (TOTAL 350/SKIPPED 113) ###################
 * zdtm/static/fd(unknown)
 * zdtm/static/maps06(unknown)
 * zdtm/static/cow01(unknown)
 * zdtm/transition/maps007(unknown)
##################################### FAIL #####################################

Any ideas how we could handle the page size automatically? Should we try to detect the page size with CRIU's feature-check script?
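
One possible direction, sketched in Python under the assumption that the page size is queried at run time rather than fixed at build time. This is not the CRIU implementation (which is C), and `page_shift` is a hypothetical helper:

```python
import os


def page_shift(page_size):
    """Return log2(page_size), rejecting sizes that are not a power of two."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError(f"unexpected page size: {page_size}")
    return page_size.bit_length() - 1


# Ask the running kernel instead of assuming 4096.
page_size = os.sysconf("SC_PAGESIZE")
print(page_size, page_shift(page_size))
```

With this approach the same binary would see shift 12, 14 or 16 depending on whether the kernel uses 4K, 16K or 64K pages.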

@0x7f454c46
Member

0x7f454c46 commented Mar 20, 2018 via email

@0x7f454c46
Member

0x7f454c46 commented Mar 23, 2018

So, @rppt, @adrianreber, if you've time to test, here is a version that builds:
https://github.com/0x7f454c46/criu/commits/wip/PAGE_SIZE
I warn you that I'll rebase it, but only to write commit messages, no functional changes.
It builds on Travis CI for aarch64/ppc64, so hopefully it may even work.

@adrianreber
Member Author

On aarch64 I see many ERROR OVER blocks with the following output:

############## Test zdtm/static/uffd-events FAIL at CRIU restore ###############
=== Run 198/350 =========------- zdtm/static/thread_different_uid_gid

================ Run zdtm/static/thread_different_uid_gid in h =================
Start test
Test is SUID
./thread_different_uid_gid --pidfile=thread_different_uid_gid.pid --outfile=thread_different_uid_gid.out
Run criu dump
Run criu restore
=[log]=> dump/zdtm/static/thread_different_uid_gid/24/1/restore.log
------------------------ grep Error ------------------------
(00.055712)     24: Opened local page read 1 (parent 0)
(00.055761)     24: Enqueue page-read
(00.055768)     24: Enqueue page-read
(00.055773)     24: Enqueue page-read
(00.055777)     24: Error (criu/mem.c:1103): Page entry address 430000 outside of VMA 3ad70000-3ada0000
(00.055800) Error (criu/cr-restore.c:2346): Failed to wait inprogress tasks
(00.055843) Error (criu/cr-restore.c:2523): Restoring FAILED.
------------------------ ERROR OVER ------------------------
######## Test zdtm/static/thread_different_uid_gid FAIL at CRIU restore ########

The whole test suite is taking an unusually long time. It is still running. The first few test cases on ppc64le look correct.

@0x7f454c46
Member

@adrianreber thanks for testing - will see what's going on there this weekend.

@adrianreber
Member Author

Sorry, I was on the wrong branch on ppc64le. There are also lots of errors with your PAGE_SIZE branch:

======================== Run zdtm/static/futex-rl in h =========================
 LINK      futex-rl
Start test
./futex-rl --pidfile=futex-rl.pid --outfile=futex-rl.out
Run criu dump
Run criu restore
=[log]=> dump/zdtm/static/futex-rl/41/1/restore.log
------------------------ grep Error ------------------------
(00.032953)     41: Opened local page read 1 (parent 0)
(00.032962)     41: Enqueue page-read
(00.032965)     41: Enqueue page-read
(00.032968)     41: Enqueue page-read
(00.032971)     41: Error (criu/mem.c:1103): Page entry address 10030000 outside of VMA 4c400000-4c430000
(00.033326) Error (criu/namespaces.c:1604): Spurious pid ns helper: pid=41
(00.033335) Error (criu/cr-restore.c:1583): 41 exited, status=1
(00.033341) Error (criu/cr-restore.c:2346): Failed to wait inprogress tasks
(00.033350) Error (criu/cr-restore.c:2523): Restoring FAILED.
------------------------ ERROR OVER ------------------------
################ Test zdtm/static/futex-rl FAIL at CRIU restore ################

@0x7f454c46
Member

0x7f454c46 commented Mar 23, 2018

Looks like there is a common issue with that on restore..
I'll check, thanks again!

@0x7f454c46
Member

@adrianreber could you give it another shot? I've added a fixup on top of the branch.
It looks like I made a stupid off-by-one, hehe

@adrianreber
Member Author

The first twenty tests are now passing on aarch64 and ppc64le without errors. I'll keep zdtm running and will post the results once it finishes. I will also run it on s390x (soon).

@adrianreber
Member Author

I see the following failures on ppc64le:

################### 10 TEST(S) FAILED (TOTAL 339/SKIPPED 33) ###################
 * zdtm/static/unhashed_proc(unknown)
 * zdtm/static/maps06(unknown)
 * zdtm/static/deleted_unix_sock(unknown)
 * zdtm/static/unlink_fstat03(unknown)
 * zdtm/static/inotify00(unknown)
 * zdtm/static/mntns_link_remap(unknown)
 * zdtm/static/mntns_ghost(unknown)
 * zdtm/static/mntns_ghost01(unknown)
 * zdtm/static/unlink_regular00(unknown)
 * zdtm/static/del_standalone_un(unknown)
##################################### FAIL #####################################

Not sure if this is related to your changes.

=== Run 310/339 ==============-- zdtm/static/del_standalone_un

==================== Run zdtm/static/del_standalone_un in h ====================
Start test
./del_standalone_un --pidfile=del_standalone_un.pid --outfile=del_standalone_un.out --dirname=del_standalone_un.test
make: *** [del_standalone_un.pid] Error 1
 Test zdtm/static/del_standalone_un FAIL at start: [Errno 2] No such file or directory: 'zdtm/static/del_standalone_un.pid' 
Test output: ================================
10:32:07.240:    36: FAIL: del_standalone_un.c:43: bind /root/criu/test/zdtm/static/del_standalone_un.test/sock (errno = 13 (Permission denied))
10:32:07.240:    35: ERR: test.c:252: Test exited unexpectedly with code 1

 <<< ================================

==================== Run zdtm/static/unlink_regular00 in ns ====================
Start test
Test is SUID
./unlink_regular00 --pidfile=unlink_regular00.pid --outfile=unlink_regular00.out --dirname=unlink_regular00.test
Run criu dump
Run criu restore
zdtm/static/unlink_regular00: link-remap files left: ['zdtm/static/link_remap.8']
########## Test zdtm/static/unlink_regular00 FAIL at link remaps left ##########
Send the 9 signal to  62
Wait for zdtm/static/unlink_regular00(62) to die for 0.100000

=== Run 290/339 =============--- zdtm/static/mntns_ghost

====================== Run zdtm/static/mntns_ghost in ns =======================
Start test
Test is SUID
./mntns_ghost --pidfile=mntns_ghost.pid --outfile=mntns_ghost.out --dirname=mntns_ghost.test
Run criu dump
Run criu restore
zdtm/static/mntns_ghost: link-remap files left: ['zdtm/static/link_remap.8']
############ Test zdtm/static/mntns_ghost FAIL at link remaps left #############
Send the 9 signal to  61
Wait for zdtm/static/mntns_ghost(61) to die for 0.100000
=== Run 291/339 =============--- zdtm/static/mntns_ghost01

===================== Run zdtm/static/mntns_ghost01 in ns ======================
Start test
Test is SUID
./mntns_ghost01 --pidfile=mntns_ghost01.pid --outfile=mntns_ghost01.out --dirname=mntns_ghost01.test
Run criu dump
Run criu restore
zdtm/static/mntns_ghost01: link-remap files left: ['zdtm/static/link_remap.8']
########### Test zdtm/static/mntns_ghost01 FAIL at link remaps left ############
Send the 9 signal to  61
Wait for zdtm/static/mntns_ghost01(61) to die for 0.100000
Wait for zdtm/static/mntns_ghost01(61) to die for 0.200000

=== Run 289/339 =============--- zdtm/static/mntns_link_remap

=================== Run zdtm/static/mntns_link_remap in uns ====================
Start test
Test is SUID
./mntns_link_remap --pidfile=mntns_link_remap.pid --outfile=mntns_link_remap.out --dirname=mntns_link_remap.test
Run criu dump
Run criu restore
zdtm/static/mntns_link_remap: link-remap files left: ['zdtm/static/link_remap.8']
########## Test zdtm/static/mntns_link_remap FAIL at link remaps left ##########
Send the 9 signal to  68
Wait for zdtm/static/mntns_link_remap(68) to die for 0.100000

=== Run 275/339 ============---- zdtm/static/inotify00

======================== Run zdtm/static/inotify00 in h ========================
Start test
./inotify00 --pidfile=inotify00.pid --outfile=inotify00.out --dirname=inotify00.test
Run criu dump
Run criu restore
zdtm/static/inotify00: link-remap files left: ['zdtm/static/link_remap.8']
############# Test zdtm/static/inotify00 FAIL at link remaps left ##############
Send the 9 signal to  36

=== Run 216/339 ==========------ zdtm/static/unlink_fstat03

===================== Run zdtm/static/unlink_fstat03 in h ======================
Start test
./unlink_fstat03 --pidfile=unlink_fstat03.pid --outfile=unlink_fstat03.out --filename=unlink_fstat03.test
Run criu dump
=[log]=> dump/zdtm/static/unlink_fstat03/36/1/dump.log
------------------------ grep Error ------------------------
(00.015451) 36 fdinfo 3: pos:                0 flags:           200000/0
(00.015463) Dumping path for 3 fd via self 12 [/root/criu/test/zdtm/static]
(00.015487) 36 fdinfo 4: pos:            0x3e8 flags:           200001/0
(00.015499) Dumping path for 4 fd via self 13 [/root/criu/test/zdtm/static/unlink_fstat03.test (deleted)]
(00.015511) Warn  (criu/files-reg.c:1408): Can't link  -> ./root/criu/test/zdtm/static/link_remap.8
(00.015517) Error (criu/files-reg.c:961): Can't link remap to /root/criu/test/zdtm/static/unlink_fstat03.test (deleted): File exists
(00.015522) ----------------------------------------
(00.015540) Error (criu/cr-dump.c:1447): Dump files (pid: 36) failed with -1
(00.016688) 36 was stopped
(00.016783) Unlock network
(00.016821) Unfreezing tasks into 1
(00.016825) 	Unseizing 36 into 1
(00.016834) Error (criu/cr-dump.c:1840): Dumping FAILED.
------------------------ ERROR OVER ------------------------
############## Test zdtm/static/unlink_fstat03 FAIL at CRIU dump ###############
Send the 9 signal to  36
Wait for zdtm/static/unlink_fstat03(36) to die for 0.100000

=== Run 209/339 =========------- zdtm/static/deleted_unix_sock

==================== Run zdtm/static/deleted_unix_sock in h ====================
Start test
./deleted_unix_sock --pidfile=deleted_unix_sock.pid --outfile=deleted_unix_sock.out --filename=deleted_unix_sock.test
make: *** [deleted_unix_sock.pid] Error 1
 Test zdtm/static/deleted_unix_sock FAIL at start: [Errno 2] No such file or directory: 'zdtm/static/deleted_unix_sock.pid' 
Test output: ================================
10:21:48.759:    36: ERR: deleted_unix_sock.c:50: can't bind to socket "deleted_unix_sock.test" (errno = 13 (Permission denied))
10:21:48.759:    35: ERR: test.c:252: Test exited unexpectedly with code 1

 <<< ================================

=== Run 201/339 =========------- zdtm/static/maps06

========================= Run zdtm/static/maps06 in h ==========================
Start test
./maps06 --pidfile=maps06.pid --outfile=maps06.out --filename=maps06.test
make: *** [maps06.pid] Error 1
 Test zdtm/static/maps06 FAIL at start: [Errno 2] No such file or directory: 'zdtm/static/maps06.pid' 
Test output: ================================
10:20:59.148:    35: ERR: test.c:252: Test exited unexpectedly with code 1

 <<< ================================

=== Run 142/339 ======---------- zdtm/static/unhashed_proc

====================== Run zdtm/static/unhashed_proc in h ======================
Start test
./unhashed_proc --pidfile=unhashed_proc.pid --outfile=unhashed_proc.out
Run criu dump
Run criu restore
zdtm/static/unhashed_proc: link-remap files left: ['zdtm/static/link_remap.8']
########### Test zdtm/static/unhashed_proc FAIL at link remaps left ############
Send the 9 signal to  36
Wait for zdtm/static/unhashed_proc(36) to die for 0.100000

All those errors seem unrelated to your changes on ppc64le.

@0x7f454c46
Member

0x7f454c46 commented Mar 23, 2018

Thank you, @adrianreber!
So, I'll write some commit messages and post patches with your Tested-by (will Cc you and @rppt).

criupatchwork pushed a commit to criupatchwork/criu that referenced this issue Mar 26, 2018
On ppc64/aarch64 Linux can be set to use Large pages, so the PAGE_SIZE
isn't build-time constant anymore. Define it through _SC_PAGESIZE.

There are different sizes for a page on ppc64:
: #if defined(CONFIG_PPC_256K_PAGES)
: #define PAGE_SHIFT              18
: #elif defined(CONFIG_PPC_64K_PAGES)
: #define PAGE_SHIFT              16
: #elif defined(CONFIG_PPC_16K_PAGES)
: #define PAGE_SHIFT              14
: #else
: #define PAGE_SHIFT              12
: #endif

And on aarch64 there are default sizes and possibly someone can set his
own PAGE_SHIFT:
: config ARM64_PAGE_SHIFT
:         int
:         default 16 if ARM64_64K_PAGES
:         default 14 if ARM64_16K_PAGES
:         default 12

On the downside - each time we need PAGE_SIZE, we're doing libc
function call on aarch64/ppc64.

Fixes: checkpoint-restore#415

Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
0x7f454c46 added a commit to 0x7f454c46/criu that referenced this issue Mar 28, 2018
On ppc64/aarch64 Linux can be set to use Large pages, so the PAGE_SIZE
isn't build-time constant anymore. Define it through _SC_PAGESIZE.

There are different sizes for a page on ppc64:
: #if defined(CONFIG_PPC_256K_PAGES)
: #define PAGE_SHIFT              18
: #elif defined(CONFIG_PPC_64K_PAGES)
: #define PAGE_SHIFT              16
: #elif defined(CONFIG_PPC_16K_PAGES)
: #define PAGE_SHIFT              14
: #else
: #define PAGE_SHIFT              12
: #endif

And on aarch64 there are default sizes and possibly someone can set his
own PAGE_SHIFT:
: config ARM64_PAGE_SHIFT
:         int
:         default 16 if ARM64_64K_PAGES
:         default 14 if ARM64_16K_PAGES
:         default 12

On the downside - each time we need PAGE_SIZE, we're doing libc
function call on aarch64/ppc64.

Fixes: checkpoint-restore#415

Save a couple of cycles by having __page_size && __page_shift cached
as suggested-by Mike.

Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
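
The caching idea mentioned in this commit message (look the page size up once, then reuse it) can be sketched as follows. This is a Python illustration only; the actual patch is C, and the `PageInfo` class and its injectable `query` hook are hypothetical names introduced here:

```python
import os


class PageInfo:
    """Looks the page size up once and serves later requests from a cache."""

    def __init__(self, query=lambda: os.sysconf("SC_PAGESIZE")):
        self._query = query  # hypothetical hook, replaceable in tests
        self._size = None
        self._shift = None

    def _load(self):
        # Single lookup; both cached values are derived from it.
        self._size = self._query()
        self._shift = self._size.bit_length() - 1

    @property
    def size(self):
        if self._size is None:
            self._load()
        return self._size

    @property
    def shift(self):
        if self._shift is None:
            self._load()
        return self._shift
```

After the first access, `size` and `shift` are plain attribute reads, which mirrors the stated goal of avoiding a libc call on every use of PAGE_SIZE.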
criupatchwork pushed a commit to criupatchwork/criu that referenced this issue Mar 28, 2018
rppt pushed a commit to rppt/criu that referenced this issue Mar 29, 2018
@0x7f454c46
Member

I'll fix the aarch64 issue and resend the patches this week (possibly on the weekend, but I hope sooner). Ugh, sorry for the delay, I had some other work to do :(

@0x7f454c46
Member

OK, I had a look at what happens there.
Since I added PAGE_SIZE as a global in PIE, gcc was smart enough to put it in the GOT.
And we don't do relocations on arm/aarch64.
What's more annoying, we don't warn that relocations exist when we don't expect them. So instead of a build failure, it becomes a run-time issue.
Anyway, just to let you know: I'll work on fixing that, hopefully soon. Sorry about the delay, @adrianreber.

@0x7f454c46
Member

I think I will submit v1-like patches, not as beautiful as v2, if fixing the aarch64 issues takes much time.
And I will do the relocs on top, so as not to block you, @adrianreber.
But let's see, maybe I'll fix v2 properly tomorrow.

0x7f454c46 added a commit to 0x7f454c46/criu that referenced this issue Apr 6, 2018
On ppc64/aarch64 Linux can be set to use Large pages, so the PAGE_SIZE
isn't build-time constant anymore. Define it through _SC_PAGESIZE.

There are different sizes for a page on ppc64:
: #if defined(CONFIG_PPC_256K_PAGES)
: #define PAGE_SHIFT              18
: #elif defined(CONFIG_PPC_64K_PAGES)
: #define PAGE_SHIFT              16
: #elif defined(CONFIG_PPC_16K_PAGES)
: #define PAGE_SHIFT              14
: #else
: #define PAGE_SHIFT              12
: #endif

And on aarch64 there are default sizes and possibly someone can set his
own PAGE_SHIFT:
: config ARM64_PAGE_SHIFT
:         int
:         default 16 if ARM64_64K_PAGES
:         default 14 if ARM64_16K_PAGES
:         default 12

On the downside - each time we need PAGE_SIZE, we're doing libc
function call on aarch64/ppc64.

Fixes: checkpoint-restore#415

Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
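
As a small worked example of the PAGE_SHIFT values quoted in the commit message above, the sketch below converts a shift into a page size and rounds a length up to a page boundary with the page size passed in at run time. Both helpers, `page_size_from_shift()` and `round_up_to_page()`, are hypothetical names used only for illustration and are not taken from CRIU.

```c
#include <assert.h>

/* Hypothetical helper: page size in bytes for a given PAGE_SHIFT. */
static unsigned long page_size_from_shift(unsigned int shift)
{
	/* Guard against shifting by the full width of the type or more. */
	assert(shift < sizeof(unsigned long) * 8);
	return 1UL << shift;
}

/*
 * Hypothetical helper: round len up to a multiple of page_size.
 * page_size is a run-time parameter here and must be a power of two.
 */
static unsigned long round_up_to_page(unsigned long len, unsigned long page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (len + page_size - 1) & ~(page_size - 1);
}
```

So the shifts 12, 14, 16 and 18 correspond to 4K, 16K, 64K and 256K pages, and the same one-byte length rounds up to 4096 bytes with 4K pages but to 65536 bytes with 64K pages, which is why code that assumes a 4K PAGE_SIZE at build time can misbehave on such kernels.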
criupatchwork pushed a commit to criupatchwork/criu that referenced this issue Apr 7, 2018
0x7f454c46 added a commit to 0x7f454c46/criu that referenced this issue May 10, 2018
criupatchwork pushed a commit to criupatchwork/criu that referenced this issue May 10, 2018
avagin pushed a commit that referenced this issue May 15, 2018
On ppc64/aarch64 Linux can be set to use Large pages, so the PAGE_SIZE
isn't build-time constant anymore. Define it through _SC_PAGESIZE.

There are different sizes for a page on ppc64:
: #if defined(CONFIG_PPC_256K_PAGES)
: #define PAGE_SHIFT              18
: #elif defined(CONFIG_PPC_64K_PAGES)
: #define PAGE_SHIFT              16
: #elif defined(CONFIG_PPC_16K_PAGES)
: #define PAGE_SHIFT              14
: #else
: #define PAGE_SHIFT              12
: #endif

And on aarch64 there are default sizes and possibly someone can set his
own PAGE_SHIFT:
: config ARM64_PAGE_SHIFT
:         int
:         default 16 if ARM64_64K_PAGES
:         default 14 if ARM64_16K_PAGES
:         default 12

On the downside - each time we need PAGE_SIZE, we're doing libc
function call on aarch64/ppc64.

Fixes: #415

Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
avagin pushed a commit to avagin/criu that referenced this issue May 24, 2018
@adrianreber
Member Author

Closing as it is fixed in criu-dev

avagin pushed a commit to avagin/criu that referenced this issue Jul 4, 2018
avagin pushed a commit to avagin/criu that referenced this issue Jul 6, 2018
avagin pushed a commit to avagin/criu that referenced this issue Jul 6, 2018
avagin pushed a commit to avagin/criu that referenced this issue Jul 6, 2018