-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Asynchronous image decompression on restore #1
Conversation
size_t insize = mapped_size - (compbuf - mapped_addr); | ||
if (!insize) { | ||
/* reached end of file or stream, no data left to decompress */ | ||
decompressionResult = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't return value of these functions now pointless? Only decompressionResult
matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I'd like to make functions consistent returning a value, rather than checking a global variable every time. This may help for further refactoring. Currently the implementation checks the global variable once, in the place it really needs.
lz4err = LZ4F_freeDecompressionContext(dctx); | ||
if (LZ4F_isError(lz4err)) { | ||
pr_err("LZ4 free context error: %s\n", LZ4F_getErrorName(lz4err)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | |
decompressionResult = -EINVAL; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about it. Failing to free decompression context, as well as mapped memory below, doesn't mean decompression fails. A decompressed image can still be open and read when restoring.
out_offset += bytes_written; | ||
if (outsize < (size_t)bytes_written) { | ||
pr_err("written more bytes in total than needed, expected=%lu, written=%lu\n", | ||
(unsigned long)outsize, (unsigned long)bytes_written); | ||
break; | ||
} | ||
outsize -= bytes_written; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit: I'd prefer these two variables updates to be closer
out_offset += bytes_written;
outsize -= bytes_written;
I cannot image write
to write more bytes than was requested, the outsize < (size_t)bytes_written
test can be assert
ed.
There is a problem when you store an uncompressed image into a image dir which already contains compressed image, the restore will prefer the old compressed pages image (which will be out of sync with other files, thus the complete image dir is invalid). Not sure if to handle in CRaC JDK or in CRIU, e.g. by removing pages-1.img{,.comp} from the image directroy on every checkpoint. In the rest, looks good! With the new compression, we'll be able to remove external compression from the example-lambda (WIP: AntonKozlov/example-lambda@5f17101), and even have better restore results!
|
Hearing no objections, I'm integrating this. |
coverity CID 389202: 54int ext_mount_add(char *key, char *val) 55{ 56 char *e_str; 57 1. alloc_fn: Storage is returned from allocation function malloc. 2. var_assign: Assigning: ___p = storage returned from malloc(strlen(key) + strlen(val) + 8UL). 3. Condition !___p, taking false branch. 4. leaked_storage: Variable ___p going out of scope leaks the storage it points to. 5. var_assign: Assigning: e_str = ({...; ___p;}). 58 e_str = xmalloc(strlen(key) + strlen(val) + 8); 6. Condition !e_str, taking false branch. 59 if (!e_str) 60 return -1; ... 7. noescape: Resource e_str is not freed or pointed-to in sprintf. 73 sprintf(e_str, "mnt[%s]:%s", key, val); 8. noescape: Resource e_str is not freed or pointed-to in add_external. [show details] CID 389202 (CRaC#1 of 1): Resource leak (RESOURCE_LEAK)9. leaked_storage: Variable e_str going out of scope leaks the storage it points to. 74 return add_external(e_str); 75} We need to free e_str after add_external used it. v2: use cleanup_free attribute (@adrianreber) Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389194: 1238static int dump_one_task(struct pstree_item *item, InventoryEntry *parent_ie) 1239{ ... 1245 struct cr_imgset *cr_imgset = NULL; ... 11. alloc_fn: Storage is returned from allocation function cr_task_imgset_open. [show details] 12. var_assign: Assigning: cr_imgset = storage returned from cr_task_imgset_open(vpid(item), 577). 1355 cr_imgset = cr_task_imgset_open(vpid(item), O_DUMP); 13. Condition !cr_imgset, taking false branch. 1356 if (!cr_imgset) 1357 goto err_cure; 1358 ... 25. Condition opts.lazy_pages, taking false branch. 1427 if (opts.lazy_pages) 1428 ret = compel_cure_remote(parasite_ctl); 1429 else 1430 ret = compel_cure(parasite_ctl); 26. Condition ret, taking true branch. 1431 if (ret) { 1432 pr_err("Can't cure (pid: %d) from parasite\n", pid); 27. Jumping to label err. 1433 goto err; 1434 } ... 1448 close_cr_imgset(&cr_imgset); 1449 exit_code = 0; 1450err: 1451 close_pid_proc(); 1452 free_mappings(&vmas); 1453 xfree(dfds); CID 389194 (CRaC#1 of 1): Resource leak (RESOURCE_LEAK)28. leaked_storage: Variable cr_imgset going out of scope leaks the storage it points to. 1454 return exit_code; 1455 1456err_cure: 1457 close_cr_imgset(&cr_imgset); 1458err_cure_imgset: 1459 ret = compel_cure(parasite_ctl); 1460 if (ret) 1461 pr_err("Can't cure (pid: %d) from parasite\n", pid); 1462 goto err; 1463} On compel_cure() error path we do not do close_cr_imgset() thich leads to leaked cr_imgset, let's move corresponding close_cr_imgset below err label. Also now we can merge remove close_cr_imgset() in err_cure label as it goes to err label later anyway. Separate err_cure_imgset label is not needed as close_cr_imgset() is ready for cr_imgset == NULL. v2: remove excess close_cr_imgset() in label err_cure (@adrianreber) Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389205: 452int dump_tun_link(NetDeviceEntry *nde, struct cr_imgset *fds, struct nlattr **info) 453{ ... 458 struct tun_link *tl; ... 2. alloc_fn: Storage is returned from allocation function get_tun_link_fd. [show details] 3. var_assign: Assigning: tl = storage returned from get_tun_link_fd(nde->name, nde->peer_nsid, tle.flags). 475 tl = get_tun_link_fd(nde->name, nde->peer_nsid, tle.flags); 4. Condition !tl, taking false branch. 476 if (!tl) 477 return ret; 478 479 tle.vnethdr = tl->dmp.vnethdr; 480 tle.sndbuf = tl->dmp.sndbuf; 481 482 nde->tun = &tle; CID 389205 (CRaC#1 of 1): Resource leak (RESOURCE_LEAK)5. leaked_storage: Variable tl going out of scope leaks the storage it points to. 483 return write_netdev_img(nde, fds, info); 484} Function get_tun_link_fd() can both return tun_link entry from tun_links list and a newly allocated one. So we should not free entry if it is from list and should free it when it is a new one to fix leak. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389193: CID 389193 (CRaC#1 of 1): Printf format string issue (PW.BAD_PRINTF_FORMAT_STRING) 1. bad_printf_format_string: invalid format string conversion 598 pr_warn("Can't stat socket %#x(%s), skipping: %m (err %d)\n", id, rpath, errno); Specifier "%#x" is wrong for id as it is of type uint32_t, let's change it to "%#" PRIx32 "" to fix the problem. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389192: 550static int parse_join_ns(const char *ptr) 551{ ... 553 char *ns; 554 1. alloc_fn: Storage is returned from allocation function strdup. 2. var_assign: Assigning: ___p = storage returned from strdup(ptr). 3. Condition !___p, taking false branch. 4. leaked_storage: Variable ___p going out of scope leaks the storage it points to. 5. var_assign: Assigning: ns = ({...; ___p;}). 555 ns = xstrdup(ptr); 6. Condition ns == NULL, taking false branch. 556 if (ns == NULL) 557 return -1; 558 7. noescape: Resource ns is not freed or pointed-to in strchr. 559 aux = strchr(ns, ':'); 8. Condition aux == NULL, taking true branch. 560 if (aux == NULL) CID 389192 (CRaC#1 of 1): Resource leak (RESOURCE_LEAK)9. leaked_storage: Variable ns going out of scope leaks the storage it points to. 561 return -1; We should free ns string after we finish it's use in parse_join_ns, easiest way to do it is to use cleanup_free attribute. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389187: 3193int veth_pair_add(char *in, char *out) 3194{ 3195 char *e_str; 3196 1. alloc_fn: Storage is returned from allocation function malloc. 2. var_assign: Assigning: ___p = storage returned from malloc(200UL). 3. Condition !___p, taking false branch. 4. leaked_storage: Variable ___p going out of scope leaks the storage it points to. 5. var_assign: Assigning: e_str = ({...; ___p;}). 3197 e_str = xmalloc(200); /* For 3 IFNAMSIZ + 8 service characters */ 6. Condition !e_str, taking false branch. 3198 if (!e_str) 3199 return -1; 7. noescape: Resource e_str is not freed or pointed-to in snprintf. 3200 snprintf(e_str, 200, "veth[%s]:%s", in, out); 8. noescape: Resource e_str is not freed or pointed-to in add_external. [show details] CID 389187 (CRaC#1 of 1): Resource leak (RESOURCE_LEAK)9. leaked_storage: Variable e_str going out of scope leaks the storage it points to. 3201 return add_external(e_str); 3202} We should free e_str string after we finish it's use in veth_pair_add, easiest way to do it is to use cleanup_free attribute. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389190: 1538int inherit_fd_add(int fd, char *key) 1539{ 1540 struct inherit_fd *inh; ... 2. alloc_fn: Storage is returned from allocation function malloc. 3. var_assign: Assigning: ___p = storage returned from malloc(32UL). 4. Condition !___p, taking false branch. 5. leaked_storage: Variable ___p going out of scope leaks the storage it points to. 6. var_assign: Assigning: inh = ({...; ___p;}). 1548 inh = xmalloc(sizeof *inh); 7. Condition inh == NULL, taking false branch. 1549 if (inh == NULL) 1550 return -1; 1551 ... 9. Condition !___p, taking true branch. 1555 inh->inh_id = xstrdup(key); 10. Condition inh->inh_id == NULL, taking true branch. 1556 if (inh->inh_id == NULL) CID 389190 (CRaC#1 of 1): Resource leak (RESOURCE_LEAK)11. leaked_storage: Variable inh going out of scope leaks the storage it points to. 1557 return -1; We should free inh on inh_id allocation error path in inherit_fd_add. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389191: int unix_sk_id_add(unsigned int ino) 2327{ 2328 char *e_str; 2329 1. alloc_fn: Storage is returned from allocation function malloc. 2. var_assign: Assigning: ___p = storage returned from malloc(20UL). 3. Condition !___p, taking false branch. 4. leaked_storage: Variable ___p going out of scope leaks the storage it points to. 5. var_assign: Assigning: e_str = ({...; ___p;}). 2330 e_str = xmalloc(20); 6. Condition !e_str, taking false branch. 2331 if (!e_str) 2332 return -1; 7. noescape: Resource e_str is not freed or pointed-to in snprintf. 2333 snprintf(e_str, 20, "unix[%u]", ino); 8. noescape: Resource e_str is not freed or pointed-to in add_external. [show details] CID 389191 (CRaC#1 of 1): Resource leak (RESOURCE_LEAK)9. leaked_storage: Variable e_str going out of scope leaks the storage it points to. 2334 return add_external(e_str); 2335} We should free e_str string after we finish it's use in unix_sk_id_add, easiest way to do it is to use cleanup_free attribute. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389197: CID 389197 (CRaC#1 of 1): Invalid printf format string (PRINTF_ARGS) format_error: Length modifier L not applicable to conversion specifier in %Lu. [show details] 284 pr_err("Incompatible uffd API: expected %Lu, got %Lu\n", UFFD_API, uffdio_api.api); Looking on C11 standard it seems that "%Lu" is undefined, we better not use this, see: "L Specifies that a following a, A, e, E, f, F, g, or G conversion specifier applies to a long double argument." http://port70.net/~nsz/c/c11/n1570.html#7.21.6.1p7 Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
CID 302713 (#1 of 1): Missing varargs init or cleanup (VARARGS) va_end was not called for argptr. Signed-off-by: Adrian Reber <areber@redhat.com>
These changes implement asynchronous decompression of an image (pages-1.img) on restore stage.
CRIU compression can be enabled by specifying '--compress' command line option. Decompression is enabled automatically if a compressed image exists.
LZ4 library is used as compression algorithm implementation.
On checkpoint:
After a process dump is completed, CRIU synchronously compresses 'pages-1.img' into 'pages-1.comp.img' at the same folder as the original file located, using LZ4 level 8. Then CRIU deletes the original img file.
On restore:
On start, CRIU creates the decompression thread, and decompresses 'pages-1.comp.img' asynchronously. The current implementation decompresses an image into '/tmp' folder.
When the 'restore' procedure tries to obtain a file descriptor of the decompressed file, it waits for decompression is finished if needed.
Then it restores the process as usual.