Chained encoder does not produce the same result as the lz4 cli when chaining is enabled #73

Open
rmja opened this issue Aug 31, 2022 · 4 comments
Labels
notbug Not a bug


rmja commented Aug 31, 2022

Description
While testing the LZ4 encoder, I have seen differences between its encoded output and the output of the lz4 CLI when block chaining is enabled.

To reproduce
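Consider the attached binary input file testfile.zip. The file is zip-compressed and must be extracted first.
Processing the file with the lz4 CLI as follows (-B4 selects 64 KB blocks, -BI independent blocks and -BD dependent, i.e. chained, blocks) produces files with these checksums: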

```shell
lz4 -v -B4 -BI -1 --no-frame-crc testfile.bin testfile.lz4-1-independent-expected
lz4 -v -B4 -BD -1 --no-frame-crc testfile.bin testfile.lz4-1-chained-expected
lz4 -v -B4 -BI -3 --no-frame-crc testfile.bin testfile.lz4-3-independent-expected
lz4 -v -B4 -BD -3 --no-frame-crc testfile.bin testfile.lz4-3-chained-expected
```

```
sha1sum testfile.*
a1938254adb8d00835b5bd7a63d51499ddb9c3af  testfile.bin
58e6b6fe0f76de620d78d8afd8e19539b4fa0289  testfile.lz4-1-chained-expected
6bf34dc13fd8f7102b506c810c6f975751f1e236  testfile.lz4-1-independent-expected
d12c77f9ce1a44996e5679ef2f873928369dee7e  testfile.lz4-3-chained-expected
4522410e07a080d555c90680c3e8d00a39b1e002  testfile.lz4-3-independent-expected
```

Now consider this example program, which reproduces the issue:

```csharp
using K4os.Compression.LZ4;
using K4os.Compression.LZ4.Internal;
using K4os.Compression.LZ4.Streams;

Console.WriteLine("Hello, World!");

// Each encoder configuration below mirrors the lz4 CLI command in the comment above it:
//   -B4 -> BlockSize = Mem.K64 (64 KB blocks)
//   -BI -> ChainBlocks = false (independent), -BD -> ChainBlocks = true (chained)
//   -1  -> LZ4Level.L00_FAST, -3 -> LZ4Level.L03_HC
using (var source = File.OpenRead("testfile.bin"))
{
    // lz4 -v -B4 -BI -1 --no-frame-crc testfile.bin testfile.lz4-1-independent-expected
    using (var actual = LZ4Stream.Encode(File.Create("testfile.lz4-1-independent-actual"), new LZ4EncoderSettings()
    {
        ChainBlocks = false,
        BlockSize = Mem.K64,
        CompressionLevel = LZ4Level.L00_FAST,
    }))
    {
        source.Position = 0;
        source.CopyTo(actual);
    }
    PrintComparison("testfile.lz4-1-independent-expected", "testfile.lz4-1-independent-actual");

    // lz4 -v -B4 -BD -1 --no-frame-crc testfile.bin testfile.lz4-1-chained-expected
    using (var actual = LZ4Stream.Encode(File.Create("testfile.lz4-1-chained-actual"), new LZ4EncoderSettings()
    {
        ChainBlocks = true,
        BlockSize = Mem.K64,
        CompressionLevel = LZ4Level.L00_FAST,
    }))
    {
        source.Position = 0;
        source.CopyTo(actual);
    }
    PrintComparison("testfile.lz4-1-chained-expected", "testfile.lz4-1-chained-actual");

    // lz4 -v -B4 -BI -3 --no-frame-crc testfile.bin testfile.lz4-3-independent-expected
    using (var actual = LZ4Stream.Encode(File.Create("testfile.lz4-3-independent-actual"), new LZ4EncoderSettings()
    {
        ChainBlocks = false,
        BlockSize = Mem.K64,
        CompressionLevel = LZ4Level.L03_HC,
    }))
    {
        source.Position = 0;
        source.CopyTo(actual);
    }
    PrintComparison("testfile.lz4-3-independent-expected", "testfile.lz4-3-independent-actual");

    // lz4 -v -B4 -BD -3 --no-frame-crc testfile.bin testfile.lz4-3-chained-expected
    using (var actual = LZ4Stream.Encode(File.Create("testfile.lz4-3-chained-actual"), new LZ4EncoderSettings()
    {
        ChainBlocks = true,
        BlockSize = Mem.K64,
        CompressionLevel = LZ4Level.L03_HC,
    }))
    {
        source.Position = 0;
        source.CopyTo(actual);
    }
    PrintComparison("testfile.lz4-3-chained-expected", "testfile.lz4-3-chained-actual");
}


// Byte-for-byte comparison of the CLI output (expected) and the library output (actual).
static void PrintComparison(string expectedFile, string actualFile)
{
    var expected = File.ReadAllBytes(expectedFile);
    var actual = File.ReadAllBytes(actualFile);

    if (expected.SequenceEqual(actual))
    {
        Console.WriteLine($"The files {expectedFile} and {actualFile} are the same.");
    }
    else
    {
        Console.Error.WriteLine($"The files {expectedFile} and {actualFile} are NOT the same!");
    }
}
```

It will produce this output:

```
Hello, World!
The files testfile.lz4-1-independent-expected and testfile.lz4-1-independent-actual are the same.
The files testfile.lz4-1-chained-expected and testfile.lz4-1-chained-actual are NOT the same!
The files testfile.lz4-3-independent-expected and testfile.lz4-3-independent-actual are the same.
The files testfile.lz4-3-chained-expected and testfile.lz4-3-chained-actual are NOT the same!
```

This can be verified from the file checksums:

```
sha1sum bin/Debug/net6.0/testfile.*
a1938254adb8d00835b5bd7a63d51499ddb9c3af  bin/Debug/net6.0/testfile.bin
af7d8eee3d20a43a6553f4cb7bf960cf9920791b  bin/Debug/net6.0/testfile.lz4-1-chained-actual
58e6b6fe0f76de620d78d8afd8e19539b4fa0289  bin/Debug/net6.0/testfile.lz4-1-chained-expected
6bf34dc13fd8f7102b506c810c6f975751f1e236  bin/Debug/net6.0/testfile.lz4-1-independent-actual
6bf34dc13fd8f7102b506c810c6f975751f1e236  bin/Debug/net6.0/testfile.lz4-1-independent-expected
85f5c92294dfbc6df1f0247410c3888aaa55caf9  bin/Debug/net6.0/testfile.lz4-3-chained-actual
d12c77f9ce1a44996e5679ef2f873928369dee7e  bin/Debug/net6.0/testfile.lz4-3-chained-expected
4522410e07a080d555c90680c3e8d00a39b1e002  bin/Debug/net6.0/testfile.lz4-3-independent-actual
4522410e07a080d555c90680c3e8d00a39b1e002  bin/Debug/net6.0/testfile.lz4-3-independent-expected
```

Expected behavior
The encoder should produce output byte-identical to the lz4 CLI when chaining is enabled.

Actual behavior
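With chaining enabled, the encoder output differs from the lz4 CLI output at both level 1 and level 3. With independent blocks, the outputs are identical.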

Environment

  • CPU: AMD Ryzen 7
  • OS: Windows and Ubuntu (tested on both)
  • .NET: net6
  • LZ4: 1.2.16

Additional context
I have tried the lz4 CLI on both Windows and Linux; it produces the same result on both platforms.

MiloszKrajewski (Owner) commented Sep 1, 2022

Is the output compatible?
Can it be compressed with one and decompressed with the other?
If yes, then this might be interesting but is low priority.

I wrote the chaining code myself (just from the spec), so it might behave a little differently.
Also, it can use either the x86 or the x64 encoder (which do produce different results), so that is another thing to keep in mind.

So my question is: regardless of being different, is it compatible?

rmja (Author) commented Sep 2, 2022

Yes, they seem compatible, at least for the set of files that I have tested. The lz4 CLI seems to be able to decompress the files encoded by this library.

I found this when I was implementing #14. The implementation is here:
https://gist.github.com/rmja/98dc7e0576c933faa0a75629b46af71c

For this I created a bunch of different random files for testing and found the issue that way.

MiloszKrajewski (Owner) commented

What about sizes?

rmja (Author) commented Sep 2, 2022

They are not the same size. In the tests I have made, this library seems to produce a file that is 1 byte smaller. This is a diff of the last block:

The highlighted bytes are the block header. This library writes a block length of 0x00001F65 (8037) bytes, while the CLI writes 0x00001F66 (8038) bytes.
[Image: hex diff of the last block, with the block header highlighted]
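The block header compared above follows the LZ4 Frame Format: each data block is preceded by a 4-byte little-endian size field whose highest bit marks a block stored uncompressed, and a size of 0 (the EndMark) terminates the frame. Listing every block's size in both outputs can show which block differs without opening a hex editor. The sketch below assumes a single LZ4 frame with no skippable frames and does not verify the header checksum; `ListLz4Blocks` is a hypothetical helper, not part of K4os.Compression.LZ4.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of K4os.Compression.LZ4): walks one LZ4 frame and
// returns the offset, size and "stored uncompressed" flag of every data block.
// Assumes a single frame without skippable frames; the header checksum is not verified.
static List<(int Offset, int Size, bool Uncompressed)> ListLz4Blocks(byte[] frame)
{
    const uint Magic = 0x184D2204; // LZ4 frame magic number, stored little-endian
    if (frame.Length < 7 || BinaryPrimitives.ReadUInt32LittleEndian(frame) != Magic)
        throw new InvalidDataException("Not an LZ4 frame.");

    byte flg = frame[4];
    bool hasBlockChecksum = (flg & 0x10) != 0; // FLG bit 4: 4-byte checksum after each block
    bool hasContentSize = (flg & 0x08) != 0;   // FLG bit 3: 8-byte content size in header
    bool hasDictId = (flg & 0x01) != 0;        // FLG bit 0: 4-byte dictionary ID in header

    // Magic (4) + FLG (1) + BD (1) + [content size (8)] + [dict ID (4)] + header checksum (1)
    int pos = 6 + (hasContentSize ? 8 : 0) + (hasDictId ? 4 : 0) + 1;

    var blocks = new List<(int Offset, int Size, bool Uncompressed)>();
    while (true)
    {
        // Each block starts with a 4-byte little-endian size; 0 is the EndMark.
        uint raw = BinaryPrimitives.ReadUInt32LittleEndian(frame.AsSpan(pos));
        if (raw == 0)
            break;
        bool uncompressed = (raw & 0x80000000u) != 0; // high bit: block data stored as-is
        int size = (int)(raw & 0x7FFFFFFFu);
        blocks.Add((pos, size, uncompressed));
        pos += 4 + size + (hasBlockChecksum ? 4 : 0);
    }
    return blocks;
}

// Example (file names from the repro above):
// var expected = ListLz4Blocks(File.ReadAllBytes("testfile.lz4-1-chained-expected"));
// var actual = ListLz4Blocks(File.ReadAllBytes("testfile.lz4-1-chained-actual"));
// for (int i = 0; i < Math.Min(expected.Count, actual.Count); i++)
//     if (expected[i].Size != actual[i].Size)
//         Console.WriteLine($"block {i}: expected 0x{expected[i].Size:X8}, actual 0x{actual[i].Size:X8}");
```

Because the frames here are written with `--no-frame-crc` (and the library equivalent), the content checksum after the EndMark is absent, but the loop stops at the EndMark either way.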

The following is the diff within the block:
[Image: diff of the data within the last block]

The difference is somewhere in the middle of the block.
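To turn "somewhere in the middle of the block" into an exact offset, the first differing byte of the two output files can be computed directly. Since the block sizes differ, a scan from offset 0 would stop at the size field of the block header; passing a start offset just past that 4-byte header (assuming the earlier blocks are identical, as the diff of only the last block suggests) locates the divergence in the block data instead. `FirstDifference` is a hypothetical helper written for illustration, not part of the library.

```csharp
using System;

// Hypothetical helper: returns the first byte offset (at or after `start`) where the
// two buffers differ, or -1 when they are identical from `start` on. If one buffer is
// a prefix of the other, the offset where the shorter one ends is returned.
static long FirstDifference(byte[] expected, byte[] actual, int start = 0)
{
    int common = Math.Min(expected.Length, actual.Length);
    for (int i = start; i < common; i++)
    {
        if (expected[i] != actual[i])
            return i;
    }
    return expected.Length == actual.Length ? -1 : common;
}

// Example (file names from the repro above; headerOffset is the offset of the
// last block's 4-byte size field, e.g. taken from a hex editor):
// long offset = FirstDifference(File.ReadAllBytes("testfile.lz4-1-chained-expected"),
//                               File.ReadAllBytes("testfile.lz4-1-chained-actual"),
//                               headerOffset + 4);
// Console.WriteLine(offset < 0 ? "identical" : $"first difference at offset 0x{offset:X}");
```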

MiloszKrajewski added the notbug (Not a bug) label Nov 2, 2022