Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output Dict/Struct of Huffman Symbol Codes #28

Closed
dnhmphrs opened this issue Jul 14, 2023 · 2 comments
Closed

Output Dict/Struct of Huffman Symbol Codes #28

dnhmphrs opened this issue Jul 14, 2023 · 2 comments

Comments

@dnhmphrs
Copy link

It would be very good for me (neuroinformatics research) if we could output the binary symbol codes from the Huffman tree.

I've been working in the python library but can write rust code. If this feature exists in the rust code I'm happy to write the python binding. If the feature isn't in rust code either, I'd be happy to write it and add the python binding also.

Let me know if this would be good, and maybe me a little direction about how you'd like it if so!

@robamler
Copy link
Collaborator

Thank you for reporting! Could you expand on your use case? The point of any source coding encoder is to convert a message (i.e., a sequence of symbols) to a bit string. The message can be of arbitrary length. In particular, when using a symbol code like Huffman coding, there's nothing wrong with encoding a message that consists of only a single symbol:

import constriction
import numpy as np

def single_code_word(symbol, code_book):
    encoder = constriction.symbol.QueueEncoder()
    encoder.encode_symbol(symbol, code_book)
    code_word, length = encoder.get_compressed()
    return f'{code_word[0]:0{length}b}'[::-1] # works for code words of length <=32

probabilities = np.array([0.1, 0.2, 0.3, 0.4])
code_book = constriction.symbol.huffman.EncoderHuffmanTree(probabilities)
print({symbol: single_code_word(symbol, code_book) for symbol in range(4)})

Are you looking for a more convenient way to do this? If so, I'd be curious which of the following best describes your use case:

  1. being able to call dict(tree) where tree is an EncoderHuffmanTree, and getting a dictionary that maps symbols to some representations of bit strings (like in the above example);
  2. having a method on EncoderHuffmanTree that takes a symbol as argument and outputs some representation of this symbol's code word;
  3. having a simple way to summarize an EncoderHuffmanTree in a human-readable string for debugging; or
  4. something else?

@robamler
Copy link
Collaborator

Closing this for now because it's not actionable without further information. As mentioned in my last comment, the specific requested feature already exists, and while it is very well possible that the existing solution may not be suitable for all use cases, further information is needed to clarify what exactly is missing. Without clarification of the intended use case, any attempt from my side to guess and address the underlying issue would likely run into the XY problem.

Please feel free to reopen if you can clarify the broader picture of what you're trying to achieve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants