Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core dump in HuffmanEncoder::GenerateCodeLengths in case of only one leaf in tree. #242

Closed
wants to merge 2 commits into from

Conversation

kvirund
Copy link
Contributor

@kvirund kvirund commented Aug 10, 2016

No description provided.

@kvirund kvirund changed the title Fixed core dump in HuffmanEncoder::GenerateCodeLengths in case of only one leaf in tree. Core dump in HuffmanEncoder::GenerateCodeLengths in case of only one leaf in tree. Aug 10, 2016
@noloader
Copy link
Collaborator

noloader commented Aug 10, 2016

Thanks @kvirund. Do you have small test case for the crash? I'd like to add it to our test suite.

@kvirund
Copy link
Contributor Author

kvirund commented Aug 10, 2016

Unfortunately I cannot share that data, but I will try to build another test case.

@kvirund
Copy link
Contributor Author

kvirund commented Aug 10, 2016

Actually, I can share values of arguments passed into HuffmanEncoder::GenerateCodeLengths(unsigned int *codeBits, unsigned int maxCodeBits, const unsigned int *codeCounts, size_t nCodes) when core dump occurred:

nCodes = 0x1e
codeCounts = {1, 0, 0 x 27 times, 0}
maxCodeBits = 15
codeBits = <uninitialized>

@kvirund
Copy link
Contributor Author

kvirund commented Aug 10, 2016

Essentially the code to reproduce the error is:

#include "gzip.h"

void main()
{
    const size_t nCodes = 30;
    const unsigned int codeCounts[nCodes] = {
        1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    const unsigned int maxCodeBits = nCodes >> 1;
    unsigned int codeBits[nCodes] = {
        ~0u, ~0u, ~0u, ~0u, ~0u,
        ~0u, ~0u, ~0u, ~0u, ~0u,
        ~0u, ~0u, ~0u, ~0u, ~0u,
    };

    CryptoPP::HuffmanEncoder::GenerateCodeLengths(codeBits, maxCodeBits, codeCounts, nCodes);
}

/* vim: set ts=4 sw=4 tw=0 noet :*/

At line const size_t depth = STDMIN(maxCodeBits, tree[n].depth + 1); n had wrong value. Because it had value tree[i].parent where i is treeBegin where treeBegin points to the single symbol. But because this symbol is single it does not have parent (it was not merged with any other symbols in loop for (i=nCodes; i<tree.size(); i++), because tree.size() is equal to nCodes: tree.resize(nCodes + nCodes - treeBegin - 1); because 1 == nCodes - treeBegin => nCodes == nCodes + nCodes - treeBegin - 1).

Update: Actually now I cannot reproduce it from scratch even on Windows because at line SecBlockWithHint<HuffmanNode, 2*286> tree(nCodes); variable tree initializes its array m_ptr by zeroes. Because of that line const size_t n = tree[i].parent; assigns 0 to n. So, it does not core dump at the next line: const size_t depth = STDMIN(maxCodeBits, tree[n].depth + 1); just because tree[n] exists. In my case it tied to get access to tree[0xcccccccc00000000].

@noloader
Copy link
Collaborator

noloader commented Aug 20, 2016

@kvirund,

... now I cannot reproduce it from scratch even on Windows...

Does that mean something else changed (like you started working from Master) and the issue is no longer present? Or does that mean the issue still exists, but the test case does not capture it? Or maybe something else?

I guess the second question is, do we still need to consider the pull request?

Sorry to have to ask. This looks fishy: tree[0xcccccccc00000000], and I'm not sure what's going on such that the low part of the word is initialized, but the high part of the word is uninitialized.

@kvirund
Copy link
Contributor Author

kvirund commented Aug 21, 2016

No, issue still exists. Just when I tried to reproduce it I built latest version of CryptoPP and in this version variables (tree specifically) were initialized by zeros. So I was not able to reproduce the issue just because values became lucky. But error is still there. Because code is the same as where I got core dump.

I got value tree[0xcccccccc00000000] because index had gotten from const size_t n = tree[i].parent;. But parent is the union: union {size_t parent; unsigned depth, freq;};. So when last element was initialized on line zdeflate.cpp:143: tree[tree.size()-1].depth = 0;, it initialized only low part of parent because size of unsigned is half shorter than size of size_t.

Code that I mentioned is the reproducible case. Just I don't know how to compile project so that tree is initialized by 0xcccccccccccccccc. But to check my theory you can do it manually (just add before line zdeflate.cpp:119 the following code tree[i].parent = 0xccccccccccccccccull;. If algorithm would have worked correctly it should not mean anything).

@noloader
Copy link
Collaborator

noloader commented Aug 21, 2016

I got value tree[0xcccccccc00000000] because index had gotten from const size_t n = tree[i].parent;. But parent is the union: union {size_t parent; unsigned depth, freq;} ...

Ah, OK, thank you very much. I did not realize a union was in play. Related, in C++, its undefined behavior to read from the inactive union member. In practice nearly every (every?) C++ compiler does the right thing because its legal to do in C.

I've seen that before under MSVC++. I encountered it when trying to avoid an unitialized warning for LARGE_INTEGER in hrtimer.cpp. In that case, can you test this:

$ git diff
diff --git a/zdeflate.cpp b/zdeflate.cpp
index a016b30..afc513f 100644
--- a/zdeflate.cpp
+++ b/zdeflate.cpp
@@ -91,9 +91,9 @@ struct HuffmanNode
 {
        // Coverity finding on uninitalized 'symbol' member
        HuffmanNode()
-               : symbol(0), parent(0) {}
+               : symbol(0) { depth=0, freq=0; }
        HuffmanNode(const HuffmanNode& rhs)
-               : symbol(rhs.symbol), parent(rhs.parent) {}
+               : symbol(rhs.symbol) { depth=rhs.depth; freq=rhs.freq; }

        size_t symbol;
        union {size_t parent; unsigned depth, freq;};

@kvirund
Copy link
Contributor Author

kvirund commented Aug 21, 2016

Actually the problem is not in the initializing. Trash in high part of parent value just uncovers the problem.

If I understand right, algorithm should build tree that reflects the Huffman codes. So, in the loop at zdeflate.cpp:132 it builds that tree by creating parent node over two nodes with least frequency values. The case I stepped on is when tree initially has only one node. So no any parent nodes will be built. And this single node won't have a parent.

After that, algorithm recursively counts depth of each node at zdeflate.cpp:150. Here "recursively" means that to count depth of some node, algorithm takes depth value of its parent node. So body of this loop expects that each node being processed in the loop has some parent node. But it is not true when only the single node presented in the tree. That is the problem. And this is the case I have handled in my pull request.

@noloader
Copy link
Collaborator

noloader commented Aug 21, 2016

@kvirund,

I added the test case at Commit 1e7c837442231a55. Thank you very much.

The test case does not trigger on OS X or Linux, so I don't know if the reworked HuffmanEncoder ctors actually solve the problem on Windows.

There could be two different problems here. If that's the case, then I want to work the initialization problem first. Once I know the HuffmanEncoder is properly initialized, then we can move onto the bad assignment that follows.

@kvirund
Copy link
Contributor Author

kvirund commented Aug 21, 2016

Do you need something else from me?

@noloader
Copy link
Collaborator

noloader commented Aug 21, 2016

Do you need something else from me?

Well, I'm not sure.

I tested Commit 1e7c837442231a55 on Windows 7 with both Visual Studio 2010 and 2012 under both X86 and X64. I can't seem to reproduce the issue.

What version of the library are you experiencing the issue? 5.6.3 ZIP? Master? Something else?

What platform and compiler are you experiencing the issue?

@kvirund
Copy link
Contributor Author

kvirund commented Aug 21, 2016

It was Microsoft Visual Studio 2010, Windows 7. But I did not use CMake script that comes with library. I just compiled all source files with my own flags.

Actually, I think core dump will not be reproduced anymore. But don't you agree with my reasonings? I mean my notice that the loop at zdeflate.cpp:150 works in assumption that each node in range [treeBegin, nCodes) has a parent node. But for my test case we have only one node that does not have parent node. So that is the mistake in the logic: it does not handle properly corner case.

@noloader
Copy link
Collaborator

Actually, I think core dump will not be reproduced anymore.

In that case, I would prefer not take action. I want to avoid inadvertently breaking something with an unintended consequence.

After AES and SHA, the compressors and decompressors are some of the hairiest code in the library. In Issue 83: zdeflate error I knocked something loose while chasing an uninitialized bug that surfaced under Valgrind. It took some time for the break to surface and subsequent fix, and I don't want to repeat it.

I'm not married to the "no action" position. If I can get hold of a test case that reproduces the issue, then I'm happy to move against it.

@noloader noloader closed this Aug 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants