From 87507833d1ebabb46cf26a3e45b8d02e28133dfa Mon Sep 17 00:00:00 2001
From: Ryan Blue
Date: Thu, 21 May 2026 14:26:29 -0700
Subject: [PATCH] Mumbling: Add draft Mumbling Bitmap spec.

---
 format/mumbling-spec.md | 316 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 316 insertions(+)
 create mode 100644 format/mumbling-spec.md

diff --git a/format/mumbling-spec.md b/format/mumbling-spec.md
new file mode 100644
index 000000000000..172b457afd43
--- /dev/null
+++ b/format/mumbling-spec.md
@@ -0,0 +1,316 @@
+---
+title: "Mumbling Bitmap Spec"
+---
+
+
+# Mumbling Bitmap Spec
+
+This is a specification for the Mumbling compressed bitmap format designed for
+use cases with bounded total size, like a deletion vector. This is based on
+[Roaring Bitmap][roaring].
+
+This spec is for version 1.
+
+[roaring]: https://roaringbitmap.org/
+
+
+## Overview
+
+Mumbling bitmaps are based on the same idea as Roaring bitmaps: a bitmap is
+divided into fixed-size (256 bit) regions, called _containers_. Each container
+is either _sparse_ to store offsets (0-255) or _dense_ to store a bit set. The
+main difference from Roaring is that Mumbling bitmaps use a smaller scale where
+each container is at most 32 bytes, and are limited to at most 2,097,152
+values.
+
+Containers are tracked by an array of descriptor bytes, one per container. The
+position of the descriptor byte in the array encodes the most significant bits
+of the positions stored in its corresponding container (the _key_ in Roaring
+bitmap). The descriptor encodes the container size for sparse containers (0-31
+values), or that a container is dense (32). Because the format uses a
+descriptor array instead of keys and offsets, descriptor bytes are stored
+as a PFOR-encoded array.
+
+
+## Design choices
+
+Unlike Roaring, this format uses a descriptor array instead of storing a key,
+cardinality, and offset for each container.
+
+The Iceberg use case is for small, embedded deletion vectors.
+These vectors will be small, likely fewer than 100,000 entries. However,
+limiting the per-container overhead by using a single byte for the key (256
+containers) would be too restrictive (only 65,536 total entries). As a result,
+storing keys and offsets would require 2-byte keys and 2-byte offsets (8,192
+containers * 32 bytes / container).
+
+The descriptor array avoids 4 bytes of overhead for each 32-byte container. The
+key is implicit and is the offset into the descriptor array. Descriptors encode
+the length of a container instead of its offset, requiring only one byte.
+
+The first trade-off of this approach is that each descriptor must be present,
+even for 0-length (empty) containers. And because an empty state is needed,
+descriptors cannot encode the container cardinality (up to 256). PFOR encoding
+is used to reduce this overhead.
+
+The second trade-off is that offsets are not directly stored. However, offsets
+can be computed from the relatively small descriptor array and finding the
+descriptor for a key uses direct array indexing.
+
+An alternative to the descriptor array is to store the offsets, and use the
+difference between offsets to find container length. This approach was not
+chosen because array encoding would be worse (values are increasing), and the
+remaining descriptor bits cannot be used.
+
+
+## Format
+
+A Mumbling bitmap consists of 3 sections:
+
+* Header
+* Descriptor array
+* Containers
+
+Throughout the format, integers are unsigned and stored as little-endian.
+
+
+### Header
+
+The Mumbling header is made up of the following fields:
+
+| Field           | Size    | Description |
+|-----------------|---------|-------------|
+| Format version  | 1 byte  | `0x01` for version 1 |
+| Cardinality     | 3 bytes | The number of set bits |
+| Container count | 2 bytes | The number of containers in the bitmap |
+
+Because the container count is limited to 8,192, cardinality is limited to
+2,097,152 (8,192 containers of 256 bits).
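
The header layout above can be read with a few lines of code. The following is a minimal sketch, not a reference implementation: it assumes the three fields are stored back to back in the order listed in the table, and the `Header` struct and `parse_header` function are hypothetical names chosen for illustration. Rejecting a cardinality larger than `container_count * 256` is also an assumption derived from the stated limits, not a rule the spec spells out.

```rust
/// Hypothetical in-memory form of the 6-byte header (names are illustrative).
#[derive(Debug, PartialEq)]
struct Header {
    version: u8,
    cardinality: u32,     // stored in 3 bytes, little-endian
    container_count: u16, // stored in 2 bytes, little-endian
}

/// Parse the header, returning None if the input is too short, the version is
/// not 1, or the values exceed the limits stated in the header section.
fn parse_header(buf: &[u8]) -> Option<Header> {
    if buf.len() < 6 || buf[0] != 0x01 {
        return None;
    }
    // Widen the 3-byte cardinality to 4 bytes before decoding.
    let cardinality = u32::from_le_bytes([buf[1], buf[2], buf[3], 0]);
    let container_count = u16::from_le_bytes([buf[4], buf[5]]);
    // Assumption: a reader may reject values beyond the documented limits.
    if container_count > 8_192 || cardinality > u32::from(container_count) * 256 {
        return None;
    }
    Some(Header { version: buf[0], cardinality, container_count })
}
```

For instance, the bytes `01 05 00 00 02 00` would decode to version 1, cardinality 5, and 2 containers under these assumptions.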
+
+
+### Descriptor array
+
+The descriptor array contains one descriptor byte per container. Its length is
+the container count.
+
+The 3 most significant bits of the descriptor byte determine the container type
+and how to interpret the remaining least significant bits of the descriptor.
+
+| MSB pattern | Container type | Description                      | Remaining bits |
+|-------------|----------------|----------------------------------|----------------|
+| `000`       | Sparse         | A sparse container of 0-31 bytes | Container length / number of bits set |
+| `001`       | Dense          | A dense container of 32 bytes    | Must be 0 |
+
+In v1, the descriptor byte encodes the size of its corresponding container.
+The two most significant bits are reserved for future use. Writers must set the
+least significant bits of a dense descriptor to 0, and implementations must
+ignore those bits if they are set.
+
+Example descriptors:
+
+| Hex  | Binary      | Interpretation |
+|------|-------------|----------------|
+| `00` | `0000 0000` | Sparse container of 0 values |
+| `05` | `0000 0101` | Sparse container of 5 values |
+| `1F` | `0001 1111` | Sparse container of 31 values |
+| `20` | `0010 0000` | Dense container stored in 32 bytes |
+
+The descriptor array is encoded using patched frame of reference (PFOR)
+documented in [Appendix A][pfor]. PFOR was chosen because it can efficiently
+store mostly uniform container sizes along with occasional larger values. The
+binary representation for descriptors also allows saving at least 2 bits per
+value.
+
+[pfor]: #appendix-a-pfor-encoding-for-unsigned-bytes
+
+
+### Containers
+
+The containers section consists of concatenated containers. Each container
+stores a 256-bit region of the bitmap. The container count is stored as an
+unsigned 16-bit integer and is limited to 8,192, as described for the header.
+
+Positions in the bitmap are split into the most significant bits, which identify
+the container and fit in an unsigned 16-bit integer, and the least significant
+8 bits, which identify the corresponding position within the container.
+
+Containers may be sparse or dense.
+This type is encoded by the container's corresponding descriptor byte.
+Containers with fewer than 32 bits set must be sparse and containers with 32 or
+more bits set must be dense.
+
+
+#### Sparse containers
+
+A sparse container encodes up to 31 set positions. All other positions are
+unset. Each set position is stored as an unsigned byte (0-255), relative to the
+start of the container.
+
+The list of positions must be sorted in ascending order. This allows a check
+for a specific position to stop as soon as a higher position is reached or
+the last position is reached.
+
+A sparse container's length is not stored in the container. It is the value of
+the corresponding descriptor byte.
+
+Examples:
+
+| Descriptor | Container bytes   | Set positions |
+|------------|-------------------|---------------|
+| 0          | (zero bytes)      | None |
+| 3          | `00 22 FF`        | 0, 34, 255 |
+| 31         | `00 01 02 ... 1E` | 0, 1, 2, ..., 30 |
+
+
+#### Dense containers
+
+A dense container encodes each bit of the container as 0 (unset) or 1 (set) in
+a 32-byte array.
+
+The first byte in the byte array contains bits 0-7, the next byte contains
+8-15, etc. Bit positions are ordered from most significant to least. As a
+result, the 0th position in the container is the most significant bit of the
+first byte, and the 255th position in the container is the least-significant
+bit of the last byte.
+
+Examples:
+
+| Descriptor | Container bytes         | Set positions |
+|------------|-------------------------|---------------|
+| 32         | `FF FF FF FF 00 ... 00` | 0-31 |
+| 32         | `FF FF FF FF 80 ... 00` | 0-32 |
+| 32         | `FF FF 00 ... 00 FF FF` | 0-15, 240-255 |
+| 32         | `AA AA ... AA AA`       | Even positions: 0, 2, 4, ... |
+
+
+## Working with bitmaps
+
+Mumbling bitmaps use a descriptor array to track containers. For quick lookups,
+implementations should decode the descriptor array and use it to produce an
+offset array.
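
One way to produce the offset array and use it for a membership check is sketched below. This is an illustration under stated assumptions rather than the spec's method: the function names (`container_len`, `build_offsets`, `contains`) are hypothetical, positions are passed as `u32`, the descriptor array is assumed to be already PFOR-decoded, and truncated input is treated as "not set" where a real reader might report an error. Offsets are held as `u32` because 8,192 containers of 32 bytes exceed the range of a 16-bit integer.

```rust
/// Byte length of the container described by `descriptor`.
/// Assumes a valid v1 descriptor: 0-31 for sparse, the 0x20 bit set for dense.
fn container_len(descriptor: u8) -> u32 {
    if descriptor & 0x20 != 0 {
        32 // dense: fixed 32-byte bit set
    } else {
        u32::from(descriptor & 0x1F) // sparse: one byte per set position
    }
}

/// Prefix sums of container lengths: offsets[i] is the byte offset of
/// container i within the containers section.
fn build_offsets(descriptors: &[u8]) -> Vec<u32> {
    let mut offsets = Vec::with_capacity(descriptors.len());
    let mut next = 0u32;
    for &d in descriptors {
        offsets.push(next);
        next += container_len(d);
    }
    offsets
}

/// Check whether `pos` is set, given decoded descriptors, their offsets, and
/// the raw containers section.
fn contains(descriptors: &[u8], offsets: &[u32], containers: &[u8], pos: u32) -> bool {
    let index = (pos >> 8) as usize;
    let low = (pos & 0xFF) as u8;
    let (descriptor, start) = match (descriptors.get(index), offsets.get(index)) {
        (Some(&d), Some(&o)) => (d, o as usize),
        _ => return false, // position is beyond the last container
    };
    let end = start + container_len(descriptor) as usize;
    let bytes = match containers.get(start..end) {
        Some(b) => b,
        None => return false, // truncated input (assumption: treat as unset)
    };
    if descriptor & 0x20 != 0 {
        // Dense: position 0 is the most significant bit of the first byte.
        bytes[(low >> 3) as usize] & (0x80u8 >> (low & 7)) != 0
    } else {
        // Sparse: positions are sorted, so a binary search is valid.
        bytes.binary_search(&low).is_ok()
    }
}
```

A linear scan that stops at the first higher position would work equally well for sparse containers; binary search is used here only because the list is guaranteed to be sorted.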
+
+The container for a position is accessed by finding its type and offset using
+the descriptor array. The index into the descriptor array is the position
+divided by 256:
+
+```
+let container_index: u16 = (pos >> 8) as u16;
+```
+
+The offset of the container is the sum of the lengths of previous containers,
+as determined by container descriptors.
+
+```
+let descriptor: u8 = descriptors[container_index as usize];
+// Offsets can reach 8,192 * 32 bytes, which does not fit in a u16.
+let offset: u32 = offsets[container_index as usize];
+```
+
+The corresponding position within a container is the least significant 8 bits
+of the bitmap position:
+
+```
+let pos_in_container: u8 = (pos & 0xFF) as u8;
+```
+
+
+# Appendix A: PFOR encoding for unsigned bytes
+
+The unsigned byte PFOR encoding splits the value array into 256-value chunks.
+The length of each chunk is 256 values until the last chunk, which is the
+remainder. Length is not encoded in each chunk.
+
+Chunks are encoded separately using a PFOR scheme for single-byte values.
+First, the chunk's min value is subtracted from every value in the chunk to
+normalize values. Second, a bit width, `b1`, is chosen so that most values in
+the chunk can be stored in `b1` bits. Next, the least-significant `b1` bits of
+each value are packed into the _primary_ array. Finally, the positions of
+_exception_ values that do not fit in `b1` bits are tracked in an offset array,
+and the remaining bits of the exceptions are packed into an exception array.
+
+
+## PFOR encoding
+
+Each chunk is stored using the following sections:
+
+* Header (3 bytes)
+* Primary value array
+* Exception offsets
+* Exception value array
+
+
+### Header
+
+The header stores information about the chunk encoding:
+
+* `b1`: The bit width of values in the primary value array
+* `b2`: The bit width of values in the exception array; at most `8 - b1`
+* `e`: The number of exception values that do not fit in `b1` bits
+* `m`: A constant that has been subtracted from each value, usually the min
+
+The header layout packs `b1` and `b2` in one byte, followed by `e` and `m`.
+Bits are numbered from the least significant bit, so `b1` is the low nibble of
+byte 0 and `b2` is the high nibble.
+
+| Byte | Bits | Field |
+|------|------|-------|
+| 0    | 0-3  | `b1`  |
+| 0    | 4-7  | `b2`  |
+| 1    | 0-7  | `e`   |
+| 2    | 0-7  | `m`   |
+
+
+### Encoding
+
+To encode a chunk of values:
+
+1. Find the minimum value, `m`
+2. Subtract `m` from each value in the chunk
+3. Choose `b1` and `b2` (see below)
+4. Pack the least-significant `b1` bits of each value (`32 * b1` bytes for a
+   full 256-value chunk)
+5. Collect exception values that do not fit into `b1` bits, and their offsets
+6. For each exception, pack the remaining `b2` bits into `ceil(e*b2/8)` bytes
+
+The recommended way to choose bit widths `b1` and `b2` is:
+
+1. For each value, find the number of bits required to store it
+2. Count the values requiring each bit width, 0-8, and find the largest width
+3. For each bit width, `b`, calculate the total size for that width:
+   * Let `e` be the number of exceptions, the sum of counts for larger widths
+   * Let `b2` be the exception bit width, the largest width minus `b`
+   * The total size is `32*b + e + ceil(e*b2/8)`
+4. Choose a bit width `b1` that minimizes the total size
+
+Values are packed using the most significant bits for the first value. If a
+bit-packed array does not end on a byte boundary, it is padded with 0s to the
+next byte.
+
+For example, for bit width `b = 2`, the array [ 3, 2, 1, 2, 3 ] is stored as
+binary `1110 0110 1100 0000` or hex `E6 C0`.
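
The packing rule in the example above can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper named `pack_bits` that is not part of the spec; it keeps only the low `width` bits of each value and writes them starting at the most significant bit of the first output byte.

```rust
/// Pack the low `width` bits of each value, most significant bits first,
/// padding the final byte with zeros. `width` must be in 0..=8.
fn pack_bits(values: &[u8], width: u32) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() * width as usize + 7) / 8];
    let mut bit = 0usize; // next free bit, counted from the start of `out`
    for &v in values {
        // Emit the kept bits of this value from highest to lowest.
        for i in (0..width).rev() {
            if (v >> i) & 1 == 1 {
                out[bit / 8] |= 0x80 >> (bit % 8);
            }
            bit += 1;
        }
    }
    out
}
```

With `width = 2`, the input `[3, 2, 1, 2, 3]` produces the two bytes `E6 C0` from the example, and a width of 0 produces an empty array.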
+
+When `b1` is 8, values are each stored in a byte and there are no exceptions.
+In this case, `e` must be 0, `b2` must be 0, and it is recommended that
+implementations store the original values (`m` is 0).
+
+
+### Examples
+
+| Length | Encoded byte array hex | Decoded values             | Description |
+|--------|------------------------|----------------------------|-------------|
+| 256    | `00 00 00`             | 256 values, all = 0        | 0 bits per value, `m` = 0, no exceptions |
+| 51     | `00 00 05`             | 51 values, all = 5         | 0 bits per value, `m` = 5, no exceptions |
+| 8      | `80 02 00 04 07 FF FE` | [0, 0, 0, 0, FF, 0, 0, FE] | 0 bits per value, `m` = 0, 2 exceptions, 8 bits per exception |
+| 3      | `02 00 06 18`          | [6, 7, 8]                  | 2 bits per value, `m` = 6, no exceptions |
+| 4      | `32 01 06 09 01 E0`    | [6, 34, 8, 7]              | 2 bits per value, `m` = 6, 1 exception, 3 bits per exception |
+
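
The example rows above can be checked with a small chunk decoder. The sketch below is hedged: `unpack_bits` and `decode_chunk` are hypothetical names, the chunk length is supplied by the caller because it is not stored in the chunk, and the nibble assignment (`b1` low, `b2` high) is inferred from the example bytes. Returning `None` for malformed input and using a wrapping add for `m` are choices made for this illustration only.

```rust
/// Unpack `count` values of `width` bits each, most significant bits first.
fn unpack_bits(bytes: &[u8], count: usize, width: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(count);
    let mut bit = 0usize;
    for _ in 0..count {
        let mut v = 0u8;
        for _ in 0..width {
            let set = bytes[bit / 8] & (0x80 >> (bit % 8)) != 0;
            v = (v << 1) | set as u8;
            bit += 1;
        }
        out.push(v);
    }
    out
}

/// Decode one chunk holding `len` values; returns None on malformed input.
fn decode_chunk(chunk: &[u8], len: usize) -> Option<Vec<u8>> {
    if chunk.len() < 3 {
        return None;
    }
    let b1 = u32::from(chunk[0] & 0x0F); // low nibble of byte 0
    let b2 = u32::from(chunk[0] >> 4); // high nibble of byte 0
    let e = chunk[1] as usize;
    let m = chunk[2];
    if b1 + b2 > 8 {
        return None; // b2 is at most 8 - b1
    }
    let primary_len = (len * b1 as usize + 7) / 8;
    let exception_len = (e * b2 as usize + 7) / 8;
    let body = chunk.get(3..3 + primary_len + e + exception_len)?;
    let (primary, rest) = body.split_at(primary_len);
    let (offsets, exceptions) = rest.split_at(e);

    let mut values = unpack_bits(primary, len, b1);
    let high = unpack_bits(exceptions, e, b2);
    for (&off, &h) in offsets.iter().zip(&high) {
        // An exception's remaining bits sit above the b1 bits already unpacked.
        let slot = values.get_mut(off as usize)?;
        *slot |= h.checked_shl(b1)?;
    }
    for v in &mut values {
        // Restore the subtracted constant; wrapping avoids a panic on bad data.
        *v = v.wrapping_add(m);
    }
    Some(values)
}
```

Decoding `32 01 06 09 01 E0` with a length of 4 yields `[6, 34, 8, 7]`: the primary byte `09` unpacks to `[0, 0, 2, 1]`, the exception at offset 1 contributes `7 << 2 = 28`, and adding `m = 6` restores the original values.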