LibCompress: When limiting huffman tree depth, sacrifice bottom of tree
Deflate and WebP can store at most 15 bits per symbol, meaning their
huffman trees can be at most 15 levels deep.
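
To illustrate how an unrestricted tree can exceed that limit, here is a minimal sketch. It is not LibCompress code: `max_huffman_depth` and `fibonacci_frequencies` are hypothetical helpers written against the C++ standard library rather than AK. Fibonacci-like frequencies produce a fully degenerate tree, so 17 such symbols already need a 16-bit code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: longest code length of an unrestricted Huffman tree
// built over the given (non-zero) frequencies.
static unsigned max_huffman_depth(std::vector<uint32_t> const& frequencies)
{
    if (frequencies.empty())
        return 0;

    using Entry = std::pair<uint64_t, size_t>; // (weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<size_t> parent(frequencies.size(), 0);
    for (size_t i = 0; i < frequencies.size(); ++i)
        heap.push({ frequencies[i], i });

    // Classic Huffman construction: repeatedly merge the two lightest nodes.
    while (heap.size() > 1) {
        auto a = heap.top();
        heap.pop();
        auto b = heap.top();
        heap.pop();
        size_t merged = parent.size();
        parent.push_back(0);
        parent[a.second] = merged;
        parent[b.second] = merged;
        heap.push({ a.first + b.first, merged });
    }

    // The last node created is the root; walk up from every leaf to find its depth.
    size_t root = parent.size() - 1;
    unsigned deepest = 0;
    for (size_t leaf = 0; leaf < frequencies.size(); ++leaf) {
        unsigned depth = 0;
        for (size_t node = leaf; node != root; node = parent[node])
            ++depth;
        deepest = std::max(deepest, depth);
    }
    return deepest;
}

// Hypothetical helper: the first `count` Fibonacci numbers (1, 1, 2, 3, 5, ...).
static std::vector<uint32_t> fibonacci_frequencies(size_t count)
{
    std::vector<uint32_t> result;
    uint32_t a = 1, b = 1;
    for (size_t i = 0; i < count; ++i) {
        result.push_back(a);
        uint32_t next = a + b;
        a = b;
        b = next;
    }
    return result;
}
```

With these helpers, 16 Fibonacci frequencies still fit in 15 bits, while 17 of them (the largest being 1597, well within u16) do not.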

During construction, when we exceeded this depth, we used to try again
with an ever lower frequency cap per symbol. This had the effect
of giving the symbols with the highest frequency lower frequencies
first, causing the most-frequent symbols to be merged. For example,
maybe the most-frequent symbol had 1 bit, and the 2nd-most-frequent
2 bits (and everything else at least 3). With the cap, the two
most-frequent symbols might both have 2 bits, freeing up bits
for the lower levels of the tree.

This makes the codes of the most-frequent symbols longer
first, which isn't great for file size.

Instead of using a frequency cap, ignore ever more of the low
bits of the frequency. This sacrifices resolution among the
low-frequency symbols first. Those sit in the lower levels of
the tree, and they are stored less frequently.
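
The difference between the two frequency transforms can be sketched in isolation. This is an illustrative standalone version, not the LibCompress code: `apply_frequency_cap` and `apply_shift` are hypothetical names, and the sample frequencies are made up. Zero frequencies are left alone, mirroring the `frequency == 0` skip in generate_huffman_lengths().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Old approach (sketch): clamp every non-zero frequency to a cap.
// The largest frequencies are flattened first.
static std::vector<uint16_t> apply_frequency_cap(std::vector<uint16_t> frequencies, uint16_t cap)
{
    for (auto& frequency : frequencies) {
        if (frequency != 0)
            frequency = std::min(frequency, cap);
    }
    return frequencies;
}

// New approach (sketch): drop the low `shift` bits of every non-zero frequency,
// but never let a used symbol fall to zero. The smallest frequencies collapse first.
static std::vector<uint16_t> apply_shift(std::vector<uint16_t> frequencies, unsigned shift)
{
    for (auto& frequency : frequencies) {
        if (frequency != 0)
            frequency = static_cast<uint16_t>(std::max(1, frequency >> shift));
    }
    return frequencies;
}
```

For the made-up input {1000, 400, 3, 2, 1}, a cap of 256 yields {256, 256, 3, 2, 1}, so the two most-frequent symbols become indistinguishable. A shift of 2 yields {250, 100, 1, 1, 1}: the frequent symbols keep their ratio and only the rare ones are flattened.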

For deflate, the 64 kiB block size means this doesn't have a big
effect, but for WebP it can have a big effect:

sunset-retro.png (876K): 2.02M -> 1.73M -- now (very slightly) smaller
than twice the input size! Maybe we'll be competitive one day.

(For wow.webp and 7z7c.webp, it has no effect: those images have
relatively few colors, so we don't hit the "tree too deep" case
there.)

No behavior change other than smaller file size. (No performance
cost either, and it's less code too.)
nico authored and awesomekling committed May 26, 2024
1 parent 2023e8d commit 1a9d8e8
Showing 2 changed files with 4 additions and 12 deletions.
10 changes: 4 additions & 6 deletions Userland/Libraries/LibCompress/Huffman.h
@@ -13,7 +13,7 @@
namespace Compress {

template<size_t Size>
-void generate_huffman_lengths(Array<u8, Size>& lengths, Array<u16, Size> const& frequencies, size_t max_bit_length, u16 frequency_cap = UINT16_MAX)
+void generate_huffman_lengths(Array<u8, Size>& lengths, Array<u16, Size> const& frequencies, size_t max_bit_length, u16 shift = 0)
{
VERIFY((1u << max_bit_length) >= Size);
u16 heap_keys[Size]; // Used for O(n) heap construction
@@ -26,9 +26,7 @@ void generate_huffman_lengths(Array<u8, Size>& lengths, Array<u16, Size> const&
if (frequency == 0)
continue;

-if (frequency > frequency_cap) {
-frequency = frequency_cap;
-}
+frequency = max(1, frequency >> shift);

heap_keys[non_zero_freqs] = frequency; // sort symbols by frequency
heap_values[non_zero_freqs] = Size + non_zero_freqs; // huffman_links "links"
@@ -78,8 +76,8 @@ void generate_huffman_lengths(Array<u8, Size>& lengths, Array<u16, Size> const&
}

if (bit_length > max_bit_length) {
-VERIFY(frequency_cap != 1);
-return generate_huffman_lengths(lengths, frequencies, max_bit_length, frequency_cap / 2);
+VERIFY(shift < 15);
+return generate_huffman_lengths(lengths, frequencies, max_bit_length, shift + 1);
}

lengths[i] = bit_length;
6 changes: 0 additions & 6 deletions Userland/Libraries/LibGfx/ImageFormats/WebPWriterLossless.cpp
@@ -257,12 +257,6 @@ static ErrorOr<void> write_VP8L_image_data(Stream& stream, Bitmap const& bitmap)
// We do use huffman coding by writing a single prefix-code-group for the entire image.
// FIXME: Consider using a meta-prefix image and using one prefix-code-group per tile.

-// FIXME: generate_huffman_lengths() currently halves a frequency cap if the maximum bit length is reached.
-// This has the effect of giving very frequent symbols a higher bit length than they would have otherwise.
-// Instead, try dividing frequencies by 2 if the maximum bit length is reached.
-// Then, low-frequency symbols will get a higher bit length than they would have otherwise, which might help
-// compressed size. (For deflate, it doesn't matter much since their blocks are 64kiB large, but for WebP
-// we currently use a single huffman tree per channel for the entire image.)
Array<Array<u16, 256>, 4> symbol_frequencies {};
for (ARGB32 pixel : bitmap) {
static constexpr auto saturating_increment = [](u16& value) {