
lrunzip multi-thread failures #122

Closed
GitBillW opened this issue Jul 25, 2019 · 9 comments

Comments

@GitBillW

The debian-10 build of lrunzip-0.631 (built 2018/05/29) often fails when running in multi-threaded mode. This looks like a regression introduced some time after debian-9 (lrzip built 2016/11/14). Most of these problems don't happen with my own lrunzip (built from a clone of https://github.com/ckolivas/lrzip.git), so I guess I need to report this as a debian-10 problem.

However, I have one simple and repeatable test case that is still broken in the latest lrzip code (as of today, 2019/07/25). With an input file consisting of 4096 binary zeroes, the output of lrzip looks like this:

00000000: 4c52 5a49 0006 0010 0000 0000 0000 0000 LRZI............
00000010: 5d00 0000 0101 0000 0201 0010 0300 0000 ]...............
00000020: 002f 0003 0000 0000 0e00 061a 0000 1040 ./.............@
00000030: 0000 006f fdff ffa3 b7ff 473e 4815 7239 ...o......G>H.r9
00000040: 6151 b892 28e6 a384 9e29 4603 0a00 0a00 aQ..(....)F.....
00000050: 0000 0000 1000 0000 0000 0000 0300 0000 ................
00000060: 0000 0062 0f0b 67a9 1f7f 7415 1bc5 be74 ...b..g...t....t
00000070: 5b71 10 [q.

If I try to lrunzip that data with multiple threads I get no output, and this -vv trace:

Decompressing...
Reading chunk_bytes at 24
Expected size: 4096
Chunk byte width: 2
Reading eof flag at 25
EOF: 1
Reading expected chunksize at 26
Chunk size: 4096
Reading stream 0 header at 29
Reading stream 1 header at 36
Reading ucomp header at 76
Fill_buffer stream 0 c_len 10 u_len 10 last_head 0
Starting thread 0 to decompress 10 bytes from stream 0
Thread 0 decompressed 10 bytes from stream 0
Taking decompressed data from thread 0
Reading ucomp header at 43
Fill_buffer stream 1 c_len 26 u_len 4096 last_head 64
Starting thread 1 to decompress 26 bytes from stream 1
Reading ucomp header at 93
---> Invalid data compressed len 0 uncompressed 0 last_head 0 <--- (fatal_return in stream.c)

Adding a -p1 option makes lrunzip work correctly. Strangely, if I comment out the test in stream.c (around line 1636) that is responsible for the fatal_return, then lrunzip works correctly even in multi-thread mode ... but obviously that isn't a proper fix.

I get similar failures with any 4096-byte file, BTW - I've just shown a zero-filled file because the lrzip output is obviously very small. It still fails in the same way if I populate the input file from /dev/urandom.

@pete4abw
Contributor

pete4abw commented Oct 9, 2019

While this (or any) error should not occur, I seem to recall something about a minimum dictionary size of 4096 bytes in lzma. Anything with a decompressed size under that may cause problems. Perhaps a proper fix would be to skip threaded decompression based on data size - or, if the compressed size is smaller than X or the uncompressed size is smaller than Y, force single-thread mode. Lots of testing for threshold amounts will be needed, although quite honestly, small files have no business being compressed with lzma or lrzip.

@GitBillW
Author

Needless to say I don't make a practice of running lrzip on 4KB files (let alone ones with zero content)! This was just the simplest repeatable test case I discovered when investigating lrunzip failures on the debian-10 build of the lrzip package.

What I was really bothered about were failures seen when trying to lrunzip some of my backups - pretty scary, until I found that it worked OK if forced into single-thread mode. More recent lrzip builds (later than the debian-10 version) also work OK, even in multi-threaded mode.

I'm not familiar with the lrzip code, so I don't know if my 4KB file failure is a unique pathological case, or an instance of a more general problem. That's why I thought it worth posting the bug report here.

@pete4abw
Contributor

pete4abw commented Oct 11, 2019

This is a post-decompression failure, when the decompressed data is expanded into an unrzipped stream. There is a case where read_len is > 0 but all the uncompressed variables show 0, which makes it look like an error when it is not. I included a test for read_len in the line you identified (thank you!). I tested small files and big ones using different compressions, but it needs more testing. Here is the patch, and a patch file. Keep us posted.

--- /home/peter/projects/git/lrzip/stream.c     2019-08-26 15:39:08.344885614 -0500
+++ ./stream.c  2019-10-11 07:16:53.753965748 -0500
@@ -1565,6 +1565,7 @@
 static int fill_buffer(rzip_control *control, struct stream_info *sinfo, struct stream *s, int streamno)
 {
        i64 u_len, c_len, last_head, padded_len, header_length, max_len;
+       int read_len;
        uchar enc_head[25 + SALT_LEN], blocksalt[SALT_LEN];
        stream_thread_struct *st;
        uchar c_type, *s_buf;
@@ -1604,7 +1605,6 @@
                last_head = last_head32;
                header_length = 13;
        } else {
-               int read_len;

                print_maxverbose("Reading ucomp header at %lld\n", get_readseek(control, sinfo->fd));
                if ((control->major_version == 0 && control->minor_version < 6) || ENCRYPT)
@@ -1633,7 +1633,8 @@
        last_head = le64toh(last_head);
        /* Check for invalid data and that the last_head is actually moving
         * forward correctly. */
-       if (unlikely(c_len < 1 || u_len < 1 || last_head < 0 || (last_head && last_head <= s->last_head))) {
+       if (unlikely(read_len < 1 && 
+               (c_len < 1 || u_len < 1 || last_head < 0 || (last_head && last_head <= s->last_head)))) {
                fatal_return(("Invalid data compressed len %lld uncompressed %lld last_head %lld\n",
                             c_len, u_len, last_head), -1);
        }

smallfile.patch.gz

@GitBillW
Author

I tried your patch and it appeared to be working OK ... however, I have found that if I run lrunzip multiple times, with exactly the same input, it sometimes seg faults. Unfortunately I can't be more specific than "sometimes", as I can't see any pattern to the failures. My test data is as described in my first comment above, i.e. the result of running lrzip on a 4KB file populated from /dev/zero.

Valgrind tells me the lrunzip seg fault is at stream.c:1519, i.e. this line:

if (uci->c_type != CTYPE_NONE) {

I inserted a debug line to see what's going on:

uci = &ucthread[i];
fprintf(stderr, "i=%d uci=%lu\n", i, (unsigned long)uci); fflush(stderr);

When it works, the debug output looks like this:

i=0 uci=94774476441008
i=1 uci=94774476441056
i=2 uci=94774476441104

and when it seg faults, it's like this:

i=0 uci=94864767026608
i=1 uci=94864767026656
i=2 uci=96

The big numbers differ between runs, but when it seg faults the last uci value is always 96. For what it's worth, even when it seg faults, the uncompressed output is complete and correct, although in the seg faulting case the output file timestamp doesn't get set to match the timestamp of the original input file.

@pete4abw
Contributor

pete4abw commented Oct 11, 2019 via email

@jlittlenz

I suspect I have hit this crash on Ubuntu 19.10; the comment "This is a post decompression failure when the decompressed data is expanded into an unrzipped stream" seems to apply from this -vv output:

....
Taking decompressed data from thread 4
Reading ucomp header at 86550598                        4:0%
Fill_buffer stream 1 c_len 2579624 u_len 95372668 last_head 103922820
Starting thread 4 to decompress 2579624 bytes from stream 1
Taking decompressed data from thread 5
Reading ucomp header at 103922827                               5:0%
Fill_buffer stream 1 c_len 1068205 u_len 36379022 last_head 0
Starting thread 5 to decompress 1068205 bytes from stream 1
Taking decompressed data from thread 6
Taking decompressed data from thread 7                                  6:0%
Taking decompressed data from thread 8
Taking decompressed data from thread 9
Taking decompressed data from thread 1
Taking decompressed data from thread 2
zsh: segmentation fault (core dumped) 

This happens every time with lrzip -d; lrzip -d -p1 is ok, but very slow.

I built from git here and there's no problem.

The file is about 2 GiB of source code which lrzip compresses to 100 MiB; I'd have thought that was "normal" use of lrzip.

I am confused by the versioning, or lack of such. APT says the version is 0.631+git180528-1 but "180528" isn't in the git log. Maybe the debian packagers have cherry-picked fixes.

I'm really commenting here to let someone know it seems to be fixed, and on the off chance it might help poor sods who can't read their backups.

@pete4abw
Contributor

I am confused by the versioning, or lack of such. APT says the version is 0.631+git180528-1 but "180528" isn't in the git log. Maybe the debian packagers have cherry-picked fixes.

John, this debian package was made on 2018-05-28. Really old, and this bug was addressed as you discovered. CF: Debian Changelog.

@jlittlenz

John, this debian package was made on 2018-05-28. Really old, and this bug was addressed as you discovered. CF: Debian Changelog.

Thank you. I'd looked at the Debian changelog, but just found it confusing. I have raised a bug on the Ubuntu package suggesting getting the fix from upstream; I don't know if that's the way things are done.

@pete4abw
Contributor

@jlittlenz , I emailed the dev directly, encouraging him to pull a new git. @ckolivas , maybe it's time for a little version bump? Cheers. :-)

@ckolivas closed this as completed Feb 9, 2021