-
Notifications
You must be signed in to change notification settings - Fork 583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relative timestamps #668
Relative timestamps #668
Conversation
So, here's the enhanced version of the first try. Changes are: 1. The wrapper name is criu-ns instead of crns.py 2. The CLI is absolutely the same as for criu, since the script re-execl-s criu binary. E.g. scripts/criu-ns dump -t 1234 ... just works 3. Caller doesn't need to care about substituting CLI options, instead, the scripts analyzes the command line and a) replaces -t|--tree argument with virtual pid __if__ the target task lives in another pidns b) keeps the current cwd (and root) __if__ switches to another mntns. A limitation applies here -- cwd path should be the same in target ns, no "smart path mapping" is performed. So this script is for now only useful for mntns clones (which is our main goal at the moment). Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com> Looks-good-to: Andrey Vagin <avagin@openvz.org>
This is the case when the in/out files are image cache/proxy sockets. Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This patch introduces the --remote option and the necessary code changes to support it. This leaves user the option to decide if the checkpoint data is to be stored on disk or sent through the network (through the image-proxy). The latter forwards the data to the destination node where image-cache receives it. The overall communication is performed as follows: src_node CRIU dump -> (sends images through UNIX sockets) -> image-proxy | V dst_node: CRIU restore <- (receives images through UNIX sockets)<- image-cache Communication between image-proxy and image-cache is done through a single TCP connection. Running criu with --remote option is like this: dst_node# criu image-cache -d --port <port> -o /tmp/image-cache.log dst_node# criu restore --remote -o /tmp/image-cache.log src_node# criu image-proxy -d --port <port> --address <dst_node> -o /tmp/image-proxy.log src_node# criu dump -t <pid> --remote -o /tmp/dump.log [ xemul: here's the list of what should be done with the cache/proxy in order to have them merged into master. 0. Document the whole thing :) Please, add articles for newly introduced actions and options to https://criu.org/CLI page. Also, it would be good to have an article describing the protocols involved. 1. Make the unix sockets reside in work-dir. The good thing is that we've get rid of the socket name option :) But looking at do_open_remote_image() I see that it fchdir-s to image dir before connecting to proxy/cache. Better solution is to put the socket into workdir. 1a. After this the option -D|--images-dir should become optional. Provided the --remote is given CRIU should work purely on the work-dir and not generate anything in the images-dir. 2. Tune up the image_cache and image_proxy commands to accept the --status-fd and --pidfile options. Presumably the very cr_daemon() call should be equipped with everything that should be done for daemonizing and proxy/cache tasks should just call it :) 3. Fix local connections not to generate per-image threads. There can be many images and it's not nice to stress the system with such amount of threads. Please, look at how criu/uffd.c manages multiple descriptors with page-faults using the epoll stuff. 3a. The accept_remote_image_connections() seem not to work well with opts.ps_socket scenario as the former just calls accept() on whatever socket is passed there, while the opts.ps_socket is already an established socket for data transfer. 4. No strings in protocol. Now the hard-coded "RESTORE_FINISH" string (and DUMP_FINISHED one) is used to terminate the communication. Need to tune up the protobuf objects to send boolean (or integer) EOF sign rather that the string. 5. Check how proxy/cache works with incremental dumps. Looking at the skip_remote_bytes() I think that image-cache and -proxy still do not work well with stacked pages images. Probably for those we'll need the page-server or lazy-pages -like protocol that would request the needed regions and receive it back rather than read bytes from sockets simply to skip those. 6. Add support for cache/proxy into go-phaul code. I haven't yet finished with the prototype, but plan to do it soon, so once the above steps are done we'll be able to proceed with this one. ] Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The current patch brings the implementation of the image proxy and image cache. These components are necessary to perform in-memory live migration of processes using CRIU. The image proxy receives images from CRIU Dump/Pre-Dump (through UNIX sockets) and forwards them to the image cache (through a TCP socket). The image cache caches image in memory and sends them to CRIU Restore (through UNIX sockets) when requested. Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
To suppress protobuf's warning: > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] > No syntax specified for the proto file: remote-image.proto. > Please use 'syntax = "proto2";' or 'syntax = "proto3";' > to specify a syntax version. (Defaulted to proto2 syntax.) Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: rodrigo-bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: rodrigo-bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I see no need to do dynamic init here. Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Kir Kolyshkin <kir@openvz.org> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
OK, so we have pr_perror() for cases where errno is set (and it makes sense to show it), and pr_err() for other errors. A correct function is to be used, depending on the context. 1. pthread_mutex_*() functions don't set errno, therefore pr_perror() should not be used. 2. accept() sets errno => makes sense to use pr_perror(). 3. read_header() arguably sets errno => use pr_err(). 4. open_proc_rw() already prints an error message, there is no need for yet another one. Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Kir Kolyshkin <kir@openvz.org> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
1. Use xmalloc() where possible. 2. There is no need to print an error message, as xmalloc() has already printed it for you. Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Kir Kolyshkin <kir@openvz.org> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In those error paths where we don't have errno set, don't use pr_perror(), use pr_err() instead. Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Kir Kolyshkin <kir@openvz.org> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When --remote option is specified, read_local_page tries to pread from a socket, and fails with "Illegal seek" error. Restore single pread call for regular image files case and introduce maybe_read_page_img_cache version of maybe_read_page method. Generally-approved-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There is no real need to have both. Signed-off-by: Omri Kramer <omri.kramer@gmail.com> Singed-off-by: Lior Fisch <fischlior@gmail.com> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It's simply impossible (yet), so emit a warning. Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The opts.remote is always false in this code. Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We have two places to check for parent via page server -- as a part of _OPEN req and explicit req. Make the latter code be in-sync with the opening one. Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Those may not support sendfiles, so use read/write-s instead Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Drop the constants for default cache host/port and page size because they are not used anywhere. Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
1) fix sfle memory leak on get_fle_for_scm error 2) fix gfd open descriptor leak on get_fle_for_scm error 3-6) fix buf memory leak on read and pwrite errors Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
I made only one commit over the latest master. Don't know why three commits are showing up. |
compel/plugins/std/log.c
Outdated
struct timeval now; | ||
|
||
sys_gettimeofday(&now, NULL); | ||
timediff(&start, &now); | ||
struct timeval *pivot = print_ts_diffs ? &last_t : &start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to follow the coding style:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html
git fetch |
What does this "flag TODO" mean? You need to write more details in the commit message. |
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
class ctypes.c_char_p Represents the C char * datatype when it points to a zero- terminated string. For a general character pointer that may also point to binary data, POINTER(c_char) must be used. The constructor accepts an integer address, or a bytes object. https://docs.python.org/3/library/ctypes.html#ctypes.c_char_p Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
We want to commit --check-mounts feature to vz-criu. But to maintain image level compatibility between ms-criu and vz-criu one shouldn't use the same field id for different data. So add a comment that these id is reserved.
imp.load_source() has been deprecated [1]. The recommended alternative API for loading a module is exec_module() [2]. [1] https://docs.python.org/2.7/library/imp.html#imp.load_module [2] https://docs.python.org/3.4/library/importlib.html#importlib.abc.Loader.exec_module Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
in Android NDK, <elf.h> doesn't has define for: NT_X86_XSTATE NT_PRSTATUS so add these defines to pass compile. NOTE: add <linux/elf.h> will have more build errors Cc: Chen Hu <hu1.chen@intel.com> Signed-off-by: Zhang Ning <ning.a.zhang@intel.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com>
with Android P's Clang versoin: 6.0.2, and Android NDK's Clang version 8.0.2 Clang will report below error: criu/compel/include/uapi/compel/sigframe-common.h:55:34: error: expected member name or ';' after declaration specifiers int __unused[32 - (sizeof (k_rtsigset_t) / sizeof (int))]; ~~~ ^ it takes __unused as an attribute, not a varible, chang to _unused, pass compile. Cc: Chen Hu <hu1.chen@intel.com> Signed-off-by: Zhang Ning <ning.a.zhang@intel.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com>
it reports: criu/pie/util-vdso-elf32.c:255:8: error: implicit declaration of function 'ELF32_ST_TYPE' is invalid in C99 [-Werror,-Wimplicit-function-declaration] if (ELF_ST_TYPE(sym->st_info) != STT_FUNC && ^ criu/include/util-vdso.h:72:21: note: expanded from macro 'ELF_ST_TYPE' ^ /opt/android-ndk/toolchains/llvm/prebuilt/linux-x86_64//sysroot/usr/include/linux/elf.h:114:26: note: expanded from macro 'ELF32_ST_TYPE' ^ criu/include/util-vdso.h:72:21: note: expanded from macro 'ELF_ST_TYPE' add #ifndef to check whether these macro is already defined. Cc: Chen Hu <hu1.chen@intel.com> Signed-off-by: Zhang Ning <ning.a.zhang@intel.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com>
criu/log.c:356:16: error: called object type 'int' is not a function or function pointer int __errno = errno; ^~~~~ /root/android-ndk/toolchains/llvm/prebuilt/linux-x86_64//sysroot/usr/include/errno.h:43:24: note: expanded from macro 'errno' ~~~~~~~^ criu/log.c:391:2: error: called object type 'int' is not a function or function pointer errno = __errno; ^~~~~ /root/android-ndk/toolchains/llvm/prebuilt/linux-x86_64//sysroot/usr/include/errno.h:43:24: note: expanded from macro 'errno' in Android NDK's errno.h: 42: int* __errno(void) __attribute_const__; 43: #define errno (*__errno()) so rename __errno to _errno to pass build Cc: Chen Hu <hu1.chen@intel.com> Signed-off-by: Zhang Ning <ning.a.zhang@intel.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com>
due to Android NDK's clang is x86_64-linux-android28-clang --sysroot ${SYSROOT_PATH} and it's ld is x86_64-linux-android-ld, it's not able to use a single pattern to discript clang and ld. and there is a error for x86_64-linux-android-ld. x86_64-linux-android-ld -L/home/ning/source/criu/protobuf-c/../target/lib -lprotobuf-c -r -z noexecstack -T ./compel/arch/x86/scripts/compel-pack.lds.S -o criu/pie/parasite.built-in.o criu/pie/parasite.o criu/pie/pie.lib.a ./compel/plugins/std.lib.a ./compel/compel-host hgen -f criu/pie/parasite.built-in.o -o criu/pie/parasite-blob.h Error (compel/src/lib/handle-elf-host.c:335): Unexpected undefined symbol: `'. External symbol in PIE? criu/pie/Makefile:49: recipe for target 'criu/pie/parasite-blob.h' failed rebuild with host ld, can pass build. so support override CC/LD from command line can pass build. Cc: Chen Hu <hu1.chen@intel.com> Signed-off-by: Zhang Ning <ning.a.zhang@intel.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com>
@avagin I've updated the pull request. |
@@ -13,6 +13,8 @@ struct simple_buf { | |||
void (*flush)(struct simple_buf *b); | |||
}; | |||
|
|||
int print_ts_diffs = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls write a comment before this variable. It is used only in this file, so you need to mark it as static.
compel/plugins/std/log.c
Outdated
struct timeval now; | ||
|
||
sys_gettimeofday(&now, NULL); | ||
timediff(&start, &now); | ||
struct timeval *pivot = print_ts_diffs ? &last_t : &start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all variables have to be declared at the beginning of a block where they are used.
compel/plugins/std/log.c
Outdated
now.tv_sec = 0; | ||
now.tv_usec = 0; | ||
} | ||
if (print_ts_diffs) last_t = curr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (print_ts_diff)
last_t = curr;
compel/plugins/std/log.c
Outdated
@@ -41,6 +43,7 @@ static inline void pad_num(char **s, int *n, int nr) | |||
|
|||
static void sbuf_log_init(struct simple_buf *b) | |||
{ | |||
static struct timeval last_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need another variable? Cat we use start instead of last_t?
@@ -139,6 +139,7 @@ struct cr_options { | |||
pid_t tree_id; | |||
int log_level; | |||
char *imgs_dir; | |||
int relative_timestamps; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to use tabs instead of spaces for alignments
A friendly reminder that this PR had no activity for 30 days. |
Adds relative timestamps as requested in issue #341. The decision to print relative timestamps is handed over to flags as of now. For
criu
command line tool, the flag could be tied to an argument. How shouldcompel
set that flag?I'll make the changes once these questions are resolved.