XUBC7 File Format
Copyright (C) 2025-2026 Binomial LLC. All rights reserved except as granted under the Apache 2.0 LICENSE. Also see our NOTICE file. If you modify the Basis Universal source code, specifications, or wiki documents and redistribute the files, you must cause any modified files to carry prominent notices stating that you changed the files (see Apache 2.0 §4(b)).
XUBC7 is a supercompressed standard BC7 texture codec, first shipped in Basis Universal v2.50. The decoder reconstructs standard 128-bit BC7 blocks — mode, partition, endpoints, p-bits, and per-texel weight indices — which are then either used directly (BC7-capable GPUs), transcoded near-losslessly to ASTC LDR 4x4, or quickly recompressed to the other LDR GPU texture formats. Like XUASTC LDR, it compresses the structures that generate pixels (endpoints, partitions, weight grids) rather than the pixels themselves, using prediction, DPCM, and an absolute/residual DCT over the block weight planes.
Key properties:
- Lossless or lossy. Endpoints, configs, and partitions are always coded exactly. At quality 100 the block weights are also coded exactly (residual DPCM), making the entire file bit-exact relative to the input BC7 blocks. At lower qualities the weight planes go through a quantized DCT, JPEG-style.
- Self-contained slices. Each image (mip level/layer/face) is an independent compressed slice — no cross-slice codebooks or global state.
- Per-blob Zstd. A slice is a small container of tagged byte streams ("blobs"); each blob is individually Zstd-compressed, or stored raw when Zstd wouldn't shrink it. There is no whole-slice or whole-mip-level entropy pass.
- Parallel encode and decode. Images are split into 1–16 horizontal stripes of block rows; a per-stripe seek table lets the decoder run stripes concurrently.
- Optional alpha. RGB or RGBA, flagged per slice.
- Deterministic decode. All math is fixed-point integer (including the DCT), so decodes are bit-identical across platforms.
This document describes the XUBC7 slice/file structure at a high level. It intentionally does not specify the bit-exact decode procedure; the normative reference is the decoder source, transcoder/basisu_xbc7_decoder.h and transcoder/basisu_xbc7_decoder.inl, which are heavily commented.
Each mip level image (slice) decodes independently. The high-level flow:
- Locate the slice in the container (.KTX2: level index + seek table; .basis: slice descriptor table) and hand its bytes to the decoder.
- Dispatch on the first byte:
  - 0xB8/0xB9 = a tiny mip: the payload is just raw packed BC7 blocks, so copy them out and finish.
  - 0xB7 = the regular blob-container form; continue below.
- Parse the blob directory. Each blob is an independent tagged byte stream, individually Zstd-compressed or stored raw. All compressed blobs are inflated up front (into a single arena allocation in the reference decoder).
- Read the header blob (ID 0): image dimensions, DCT quality, alpha flag, stripe count. Rebuild the stripe geometry, and when num_stripes > 1 read the stripe seek table (ID 26) so each stripe's starting position in every stream is known.
- Decode each stripe independently, optionally in parallel. Within a stripe, walk its blocks in raster order. For every block, read one command byte from the commands stream (ID 1). The command says how the block is built (repeated from a neighbor, a solid color, or a full block) and therefore which of the other streams are consumed for it: config, partition, endpoint, and weight data.
- Reconstruct the logical BC7 block (mode, partition, exact endpoints and p-bits, weight indices) and pack it into a standard 16-byte BC7 block.
The result is a standard BC7 texture, ready to upload or transcode.
See also:
- KTX2 File Format Support Technical Details, section 8 (XUBC7): how XUBC7 slices are carried in .KTX2 files (header, DFD, seek table, m_profile, alpha signaling).
- XUASTC LDR: the sibling ASTC-domain codec; XUBC7 reuses much of its machinery.
- Release Notes: v2.50 announcement.
XUBC7 data is carried in either of Basis Universal's containers:
- .KTX2: vkFormat = UNDEFINED, DFD color model 170 (0xAA), supercompression scheme KTX2_SS_XUBC7 (6). The supercompressionGlobalData block holds a per-image seek table (12-byte descriptors: mip-relative offset, length, and a profile word). All of this is documented in detail in the KTX2 doc, section 8.
- .basis: basis_tex_format::cXUBC7 (33), one independent compressed slice per image, located by the standard slice descriptor table. No codebook or Huffman table sections are present (those are ETC1S-only).
In both containers, each addressed slice is one of the self-contained payloads described below.
The first byte of every slice payload is a format-dispatch marker:
| First byte | Form | Description |
|---|---|---|
| 0xB7 | Blob container | The normal form: a tagged-blob stream (next section). Alpha presence is in the header blob's flags. |
| 0xB8 | Tiny mip, no alpha | Raw BC7 blocks, described below. |
| 0xB9 | Tiny mip, has alpha | Same, with the alpha bit carried in the marker itself. |
Any other leading byte is not XUBC7 and is rejected.
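The dispatch rule in the table above can be sketched in a few lines of C. This is an illustrative sketch only: the marker values come from the table, but the enum and function names are hypothetical and are not taken from the reference decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the marker values 0xB7/0xB8/0xB9 come from the text. */
typedef enum {
    XUBC7_FORM_INVALID = 0,     /* any other leading byte: not XUBC7 */
    XUBC7_FORM_BLOB_CONTAINER,  /* 0xB7: regular tagged-blob slice */
    XUBC7_FORM_TINY_MIP         /* 0xB8 or 0xB9: raw packed BC7 blocks */
} xubc7_form;

/* Classify a slice payload by its first byte. */
xubc7_form xubc7_classify_first_byte(uint8_t first_byte)
{
    switch (first_byte) {
    case 0xB7: return XUBC7_FORM_BLOB_CONTAINER;
    case 0xB8:
    case 0xB9: return XUBC7_FORM_TINY_MIP;
    default:   return XUBC7_FORM_INVALID;
    }
}

/* For tiny mips the alpha bit lives in the marker itself (0xB9 = has alpha).
   For 0xB7 payloads alpha is signaled in the header blob instead. */
bool xubc7_tiny_mip_has_alpha(uint8_t marker)
{
    assert(marker == 0xB8 || marker == 0xB9);
    return marker == 0xB9;
}
```

A caller would classify the first byte once and then hand the payload to either the tiny-mip path or the blob-container path.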
Tiny-mip slices are used for the smallest mip levels, where the blob container's overhead isn't worthwhile. The layout is:
[uint8 marker] 0xB8 = no alpha, 0xB9 = has alpha
[uint8 num_blocks_x] (> 0)
[uint8 num_blocks_y] (> 0)
[16 bytes per block] standard packed BC7 blocks, raster order
The total payload length must be exactly 3 + num_blocks_x*num_blocks_y*16 bytes. Tiny mips store no exact texel dimensions and use no prediction, DCT, or compression — the decoder's logical size is block-aligned (num_blocks_x*4 by num_blocks_y*4).
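As a rough illustration of the tiny-mip rules above (nonzero block counts, exact payload length), a validating parser might look like the following. The function name and signature are assumptions made for this sketch, not the reference decoder's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: validates a tiny-mip payload and returns a pointer to
   the first packed 16-byte BC7 block, or NULL if the payload is malformed. */
const uint8_t *xubc7_parse_tiny_mip(const uint8_t *payload, size_t size,
                                    unsigned *num_blocks_x,
                                    unsigned *num_blocks_y,
                                    int *has_alpha)
{
    assert(payload && num_blocks_x && num_blocks_y && has_alpha);

    if (size < 3)
        return NULL;                       /* marker + two block counts */
    if (payload[0] != 0xB8 && payload[0] != 0xB9)
        return NULL;                       /* not a tiny-mip marker */

    const unsigned nx = payload[1];
    const unsigned ny = payload[2];
    if (nx == 0 || ny == 0)
        return NULL;                       /* both counts must be > 0 */

    /* The length must match exactly: 3 + nx*ny*16 bytes. */
    if (size != 3 + (size_t)nx * ny * 16)
        return NULL;

    *num_blocks_x = nx;
    *num_blocks_y = ny;
    *has_alpha = (payload[0] == 0xB9);
    return payload + 3;                    /* blocks follow in raster order */
}
```

Because no exact texel size is stored, a caller would treat the image as nx*4 by ny*4 texels.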
A regular slice is a simple ("KISS") tagged-blob container: a directory of up to 128 independent byte streams, identified by a 7-bit blob ID. Serialized layout (all sizes are LEB128 varints — 7 bits per byte, high bit = continue, max 5 bytes for a uint32):
[uint8 0xB7] begin marker
[uint8 num_blobs] only non-empty blobs are stored
repeated num_blobs times:
[uint8 id_and_flag] low 7 bits = blob id (< 128); high bit: 1 = Zstd-compressed, 0 = stored raw
if raw: [varint size] // > 0; 'size' bytes follow
if compressed: [varint uncompressed_size][varint stored_size] // both > 0, stored_size < uncompressed_size
[blob data]
[uint8 0x6A] end marker; must land exactly on the final byte
Notes:
- Raw vs. compressed is decided per blob by the encoder: a blob is Zstd-compressed only if that actually shrinks it. Incompressible streams (e.g. sign bits) are automatically stored raw with no special-case flags.
- Per-blob overhead is typically 3 bytes raw / 5 bytes compressed; a ~20-blob slice costs roughly 80 bytes of container overhead.
- The reader rejects structural problems: bad begin/end markers, truncation or trailing garbage (the end marker must be the final byte), duplicate blob IDs, zero sizes, and compressed blobs whose stored size isn't strictly smaller than their uncompressed size.
- The reader does not reject unknown blob IDs — well-formed blobs with IDs a decoder doesn't recognize are simply never queried. This is the format's forward-compatibility mechanism (see Versioning below).
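The container layout and the structural checks listed in the notes can be sketched as a small directory walker. This is a minimal sketch under stated assumptions: all type and function names are hypothetical, it only indexes the blobs (a real reader would then inflate compressed ones, for example with zstd's ZSTD_decompress), and rejecting a fifth varint byte that would overflow 32 bits is an assumption about overflow handling, not documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {                 /* hypothetical directory entry */
    const uint8_t *data;         /* stored bytes (still compressed if 'compressed') */
    uint32_t stored_size;
    uint32_t uncompressed_size;
    bool present;
    bool compressed;
} xubc7_blob_entry;

/* LEB128: 7 bits per byte, high bit = continue, at most 5 bytes for a uint32. */
bool xubc7_read_varint(const uint8_t **pp, const uint8_t *end, uint32_t *out)
{
    uint32_t value = 0;
    for (int i = 0; i < 5; i++) {
        if (*pp >= end) return false;               /* truncated */
        const uint8_t b = *(*pp)++;
        if (i == 4 && (b & 0xF0)) return false;     /* assumption: reject overflow */
        value |= (uint32_t)(b & 0x7F) << (7 * i);
        if (!(b & 0x80)) { *out = value; return true; }
    }
    return false;
}

/* Returns the number of blobs indexed, or -1 on any structural problem. */
int xubc7_parse_blob_directory(const uint8_t *buf, size_t size,
                               xubc7_blob_entry entries[128])
{
    assert(buf && entries);
    memset(entries, 0, 128 * sizeof(xubc7_blob_entry));
    if (size < 3 || buf[0] != 0xB7) return -1;      /* bad begin marker */

    const uint8_t *p = buf + 1, *end = buf + size;
    const unsigned num_blobs = *p++;
    for (unsigned i = 0; i < num_blobs; i++) {
        if (p >= end) return -1;
        const uint8_t id_and_flag = *p++;
        xubc7_blob_entry *e = &entries[id_and_flag & 0x7F];
        if (e->present) return -1;                  /* duplicate blob ID */
        e->compressed = (id_and_flag & 0x80) != 0;
        if (e->compressed) {
            if (!xubc7_read_varint(&p, end, &e->uncompressed_size)) return -1;
            if (!xubc7_read_varint(&p, end, &e->stored_size)) return -1;
            if (e->stored_size == 0 || e->stored_size >= e->uncompressed_size)
                return -1;                          /* must strictly shrink */
        } else {
            if (!xubc7_read_varint(&p, end, &e->stored_size)) return -1;
            if (e->stored_size == 0) return -1;
            e->uncompressed_size = e->stored_size;
        }
        if ((size_t)(end - p) < e->stored_size) return -1;  /* truncated data */
        e->data = p;
        p += e->stored_size;
        e->present = true;
    }
    /* The end marker must be exactly the final byte. */
    if ((size_t)(end - p) != 1 || *p != 0x6A) return -1;
    return (int)num_blobs;
}
```

Note that the walker never inspects which IDs it found. Unknown IDs are indexed like any other blob and simply go unqueried, which matches the forward-compatibility behavior described above.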
Blob 0 is a fixed 7-byte header (#pragma pack(1), little-endian):
struct xbc7_header // 7 bytes
{
    uint16_t m_width_in_texels;  // exact image width, 1..16384
    uint16_t m_height_in_texels; // exact image height, 1..16384
    uint8_t m_dct_q;             // global DCT quality, 1..100 (100 = lossless weights)
    uint8_t m_flags;             // bit 0 = XBC7_FLAG_HAS_ALPHA; all other bits must be 0 (rejected otherwise)
    uint8_t m_num_stripes;       // encoder stripe count: >= 1, <= num_blocks_y, <= 16
};
Block dimensions are derived: num_blocks_x = (width+3)/4, num_blocks_y = (height+3)/4. The stripe count must be in the header because the solid-block predictor is implicit (see Stripes below).
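To make the field ranges and the derived block counts concrete, here is a hedged sketch of header parsing that reads the seven bytes explicitly as little-endian rather than casting to the packed struct. The `xbc7_header_info` type and the function name are hypothetical, and requiring the blob to be exactly 7 bytes is an assumption based on the "fixed 7-byte" wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {                  /* hypothetical unpacked view of blob 0 */
    uint32_t width, height;       /* exact texel dimensions */
    uint32_t num_blocks_x, num_blocks_y;
    uint8_t dct_q;
    uint8_t num_stripes;
    bool has_alpha;
} xbc7_header_info;

bool xbc7_parse_header(const uint8_t *blob, size_t size, xbc7_header_info *out)
{
    assert(blob && out);
    if (size != 7) return false;  /* assumption: exactly 7 bytes */

    /* Little-endian 16-bit fields, read byte by byte for portability. */
    const uint32_t width  = (uint32_t)blob[0] | ((uint32_t)blob[1] << 8);
    const uint32_t height = (uint32_t)blob[2] | ((uint32_t)blob[3] << 8);
    const uint8_t dct_q = blob[4], flags = blob[5], num_stripes = blob[6];

    if (width < 1 || width > 16384 || height < 1 || height > 16384) return false;
    if (dct_q < 1 || dct_q > 100) return false;
    if (flags & ~0x01u) return false;            /* only bit 0 (alpha) is defined */

    const uint32_t num_blocks_x = (width + 3) / 4;
    const uint32_t num_blocks_y = (height + 3) / 4;
    if (num_stripes < 1 || num_stripes > 16 || num_stripes > num_blocks_y)
        return false;

    out->width = width;
    out->height = height;
    out->num_blocks_x = num_blocks_x;
    out->num_blocks_y = num_blocks_y;
    out->dct_q = dct_q;
    out->num_stripes = num_stripes;
    out->has_alpha = (flags & 0x01) != 0;
    return true;
}
```

For example, a 256x129 image yields 64x33 blocks, since the partial bottom row of texels still occupies a full block row.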
The complete blob ID vocabulary as of v2.50. "Granularity" describes how entries map to blocks; a blob is only present when at least one block consumes it.
| ID | Name | Contents | Granularity |
|---|---|---|---|
| 0 | Header | File-level metadata (above) | fixed 7 bytes |
| 1 | Commands | One command byte per block, raster order — drives all other streams | 1 byte/block, exactly total_blocks bytes |
| 2 | BC7BlockConfig | New-config bytes: BC7 mode (bits 0-2), mode 4/5 rotation (bits 3-4), mode 4 index selector (bit 5); bits 6-7 reserved-zero (rejected if set) | 1 byte per new-config command |
| 3 | Partition2 | 2-subset partition indices (modes 1, 3, 7; 64 patterns) | 1 byte per 2-subset block |
| 4 | Partition3 | 3-subset partition indices (modes 0, 2; mode 0 index must be < 16) | 1 byte per 3-subset block |
| 5 | WeightPredictors | Joint (predictor candidate, amplitude code) byte for DCT-weight blocks; values ≥ 200 rejected | 1 byte per DCT block |
| 6 | DCCoeffsSmall | DC coefficients, 2/3-bit weight modes (magnitude lattice-coded; sign conditional). Note: the v2.50 encoder routes all DC values through this stream. | 1 per coded plane |
| 7 | DCCoeffsLarge | DC coefficients, 4-bit weight modes (see note on ID 6) | 1 per coded plane |
| 8 | ACCoeffs | AC DCT coefficients, run-length coded in zig-zag order | variable, per DCT block |
| 9 | CoeffSigns | Raw sign bits for AC (and conditional DC) coefficients | bit-packed |
| 10 | PBits | Raw endpoint p-bits (per the mode's p-bit shape) | bit-packed |
| 11-14 | EPDeltaFine R/G/B/A | Endpoint DPCM residuals, fine (≥ 6-bit) precision, one stream per channel | 1 byte per delta |
| 15-18 | EPDeltaCoarse R/G/B/A | Endpoint DPCM residuals, coarse (< 6-bit) precision | 1 byte per delta |
| 19 | EPRaw | Raw endpoints (the endpoint "raw escape" path) | bit-packed |
| 20 | EPBlockIndex | Endpoint indexed-DPCM block references: 5-bit index into the causal offset table (top 3 bits reserved-zero) | 1 byte per reference |
| 21 | RawWeightBits | Quantized weight indices for DPCM-weight blocks using the absolute predictor, byte-packed by bit width | byte-packed, per plane |
| 22 | SolidRGBADeltas | Solid-block color residuals vs. the implicit neighbor-edge prediction: 4 interleaved wrapped bytes (G, R−G, B−G, A) in 8-bit pixel space | 4 bytes per solid block |
| 23-25 | DPCMWeightResid 2/3/4 | Wrapped n-bit weight-index residuals for DPCM-weight blocks with a real predictor, split by bit width | byte-packed, per plane |
| 26 | StripeSeekTable | Per-stripe stream offsets (see Stripes); present only when num_stripes > 1 | num_stripes × 25 × 4 bytes |
| 27-127 | reserved | Future streams (e.g. P-frame motion, temporal references, optional tables) | — |
Three blobs (9, 10, 19) are bit streams rather than byte streams — their stripe seek offsets are bit offsets. IDs ≥ 128 are structurally impossible (bit 7 of the directory entry byte is the compression flag).
This section sketches the coding model conceptually; the decoder source is the normative reference.
Blob 1 holds exactly one command byte per 4x4 block, in raster order; its size must equal the image's total block count. Each byte packs three fields:
Command byte:
bits 0-2 : command (0-7, table below)
bits 3-5 : endpoint mode (0-7; full-block commands only)
bit 6 : weight mode (0 = DPCM/raw, 1 = DCT; full-block commands only)
bit 7 : reserved for a future P-frame flag; must be 0 (rejected otherwise)
The eight commands:
| Value | Command | Meaning |
|---|---|---|
| 0 | RepeatLast | Copy the left neighbor's entire block. Consumes nothing else. |
| 1 | RepeatUpper | Copy the upper neighbor's entire block. Consumes nothing else. |
| 2 | SolidDPCM | A solid-color block (below). Consumes blob 22 only. |
| 3 | NewConfig | Read a fresh BC7 config byte (blob 2), then partitions/endpoints/weights. |
| 4-7 | ReuseConfigLeft / Upper / LeftDiagonal / RightDiagonal | Inherit the BC7 config from the named causal neighbor, then read this block's own partition/endpoints/weights. |
For the three simple commands (0-2) the whole byte is the command: their endpoint/weight-mode bits must be zero, and the decoder rejects non-canonical encodings. The full-block commands (3-7) additionally interpret the endpoint-mode and weight-mode fields:
Endpoint modes (bits 3-5):
| Value | Mode | Meaning |
|---|---|---|
| 0 | Raw | Endpoint values and p-bits read verbatim from the raw endpoint streams (blobs 19/10). |
| 1 | DPCM Left | Predict the endpoints from the left neighbor; code only the residuals (blobs 11-18). |
| 2 | DPCM Up | Predict from the upper neighbor. |
| 3 | DPCM Left-diagonal | Predict from the upper-left neighbor. |
| 4 | DPCM Right-diagonal | Predict from the upper-right neighbor. |
| 5 | DPCM Block-index | Predict from an arbitrary causal block selected by a 5-bit index into the shared offset table (blob 20). |
| 6 | DPCM Left, subset 1 | Predict from the left neighbor's second subset — useful when a partitioned neighbor's other half matches better. Rejected if the referenced block has fewer than 2 subsets. |
| 7 | DPCM Up, subset 1 | Same, from the upper neighbor's second subset. |
Weight mode (bit 6): 0 = the block's weight plane is coded losslessly (raw or DPCM residuals — blobs 21, 23-25); 1 = it is DCT-coded (blobs 5-9).
The command byte therefore determines which of the other blobs the block consumes — repeat and solid blocks short-circuit nearly everything.
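The bit layout of the command byte, together with the canonical-encoding rule for the simple commands, can be sketched as follows. The struct and function names are hypothetical; only the field positions and rejection rules are taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {              /* hypothetical unpacked command */
    uint8_t command;          /* bits 0-2: 0..7 */
    uint8_t endpoint_mode;    /* bits 3-5: meaningful for commands 3..7 only */
    bool dct_weights;         /* bit 6: false = DPCM/raw, true = DCT */
} xbc7_command;

/* Unpack one command byte; returns false for encodings the text says are rejected. */
bool xbc7_decode_command_byte(uint8_t b, xbc7_command *out)
{
    assert(out);
    if (b & 0x80)
        return false;                       /* reserved bit 7 must be 0 */

    const uint8_t command = b & 0x07;
    /* Simple commands (RepeatLast, RepeatUpper, SolidDPCM) must have all
       endpoint-mode and weight-mode bits clear. */
    if (command <= 2 && (b >> 3) != 0)
        return false;

    out->command = command;
    out->endpoint_mode = (uint8_t)((b >> 3) & 0x07);
    out->dct_weights = ((b >> 6) & 0x01) != 0;
    return true;
}
```

As a worked example, 0x4B (binary 0100 1011) unpacks to command 3 (NewConfig), endpoint mode 1 (DPCM Left), and DCT-coded weights.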
A config is a block's structural identity: BC7 mode (all 8 modes are supported), the component rotation for dual-plane modes 4/5, and mode 4's index selector. Configs are either sent fresh (blob 2) or inherited from a causal neighbor via the reuse commands, so runs of structurally-similar blocks don't repeat the config byte. Partition indices travel in their own streams (blobs 3/4), split by subset count because the two pattern vocabularies are disjoint.
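A config byte from blob 2 can be unpacked as sketched below, using the bit positions from the blob table. The names are hypothetical. The subset-count mapping follows the blob table (modes 0 and 2 use 3 subsets; modes 1, 3, and 7 use 2) and standard BC7 (modes 4 to 6 are single-subset). Whether the reference decoder also requires the rotation and index-selector bits to be zero for modes that don't use them is not stated here, so this sketch does not check that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {              /* hypothetical unpacked config */
    uint8_t mode;             /* bits 0-2: BC7 mode 0..7 */
    uint8_t rotation;         /* bits 3-4: component rotation (modes 4/5) */
    uint8_t index_selector;   /* bit 5: mode 4 index selector */
} xbc7_config;

bool xbc7_decode_config_byte(uint8_t b, xbc7_config *out)
{
    assert(out);
    if (b & 0xC0)
        return false;         /* bits 6-7 are reserved-zero */
    out->mode = b & 0x07;
    out->rotation = (uint8_t)((b >> 3) & 0x03);
    out->index_selector = (uint8_t)((b >> 5) & 0x01);
    return true;
}

/* Number of subsets for a BC7 mode. */
unsigned xbc7_num_subsets(uint8_t mode)
{
    assert(mode <= 7);
    switch (mode) {
    case 0: case 2:         return 3;
    case 1: case 3: case 7: return 2;
    default:                return 1;   /* modes 4, 5, 6 */
    }
}

/* Which partition blob a block of this mode reads from: 3 (Partition2),
   4 (Partition3), or -1 when the mode has no partition index. */
int xbc7_partition_blob_id(uint8_t mode)
{
    const unsigned subsets = xbc7_num_subsets(mode);
    return subsets == 2 ? 3 : (subsets == 3 ? 4 : -1);
}
```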
Endpoints are never transformed or quantized by the codec; they are reconstructed exactly for every block. The command's endpoint mode (see the table above) selects either the raw escape (endpoint values and p-bits read verbatim from blobs 19/10) or DPCM prediction from a causal neighbor, where only the per-channel residuals are coded (blobs 11-18, split into fine/coarse streams by precision).
The BC7 weight (selector index) plane is coded by first predicting it from already-decoded neighbors, then coding the residual. The predictor is chosen per block from a large candidate set: edge replications and blends, reflections, gradients, diagonal propagation, adaptive predictors (JPEG-LS-style MED, a CALIC-spirit gradient-adaptive blend, a least-squares plane fit), plus 32 generic causal block references at fixed (dx,dy) offsets — or Absolute, meaning no prediction at all.
The residual is then coded one of two ways, per the command's weight mode:
- DPCM (lossless): wrapped weight-index residuals, byte-packed (blobs 23-25, or blob 21 for the absolute predictor).
- DCT (lossy): the 4x4 residual plane is transformed with a deterministic fixed-point DCT-II, quantized with a JPEG-style table (scaled adaptively per block from the global quality m_dct_q and the block's endpoint span, with a dead-zone), and coded as DC (blobs 6/7) + zig-zag run-length AC (blob 8) + raw signs (blob 9). The predictor choice for DCT blocks travels in blob 5.
This is the "absolute and residual DCT" in the release notes: a DCT block either transforms the raw weight plane (Absolute predictor — DC is unsigned by construction) or a prediction residual (signed DC).
Losslessness: at m_dct_q = 100 the encoder codes every block's weights with lossless DPCM, so the whole file reproduces the input BC7 blocks bit-exactly. Below 100, DCT-coded blocks are lossy (individual blocks still fall back to lossless DPCM whenever that codes smaller).
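The "wrapped" residuals used by the lossless DPCM path rely on modular arithmetic: an n-bit weight index minus its prediction, taken modulo 2^n, always fits in n bits and is exactly invertible. The sketch below shows that general technique only. The function names are hypothetical, and how the reference codec orders, packs, or predicts these values is defined by the decoder source, not by this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encoder side: residual = (actual - predicted) mod 2^bits. */
uint8_t xbc7_wrap_residual(uint8_t actual, uint8_t predicted, unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    const unsigned mask = (1u << bits) - 1u;
    return (uint8_t)(((unsigned)actual - (unsigned)predicted) & mask);
}

/* Decoder side: actual = (predicted + residual) mod 2^bits. */
uint8_t xbc7_apply_residual(uint8_t predicted, uint8_t residual, unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    const unsigned mask = (1u << bits) - 1u;
    return (uint8_t)(((unsigned)predicted + (unsigned)residual) & mask);
}

/* Exhaustive self-check: every (actual, predicted) pair round-trips. */
bool xbc7_residual_roundtrips(unsigned bits)
{
    const unsigned count = 1u << bits;
    for (unsigned a = 0; a < count; a++)
        for (unsigned p = 0; p < count; p++) {
            const uint8_t r = xbc7_wrap_residual((uint8_t)a, (uint8_t)p, bits);
            if (xbc7_apply_residual((uint8_t)p, r, bits) != a)
                return false;
        }
    return true;
}
```

For a 3-bit weight with prediction 6 and actual value 1, the wrapped residual is (1 - 6) mod 8 = 3, and the decoder recovers (6 + 3) mod 8 = 1.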
A solid block's color is coded as a DPCM residual (blob 22) against an implicit prediction: the average of the left neighbor's right-edge column and the upper neighbor's bottom-edge row, in decoded 8-bit pixel space. Because the prediction is derived from decoded neighbors rather than signaled, encoder and decoder must agree on exactly which neighbors are visible — which is why the stripe count lives in the header (next section).
An image's block rows are split as evenly as possible into m_num_stripes (1–16) contiguous horizontal bands. Each stripe's portion of every per-stripe stream (blob IDs 1-25) is independently seekable via the stripe seek table (blob 26, present only when num_stripes > 1): for each stripe it stores the starting offset within each stream — byte offsets for byte streams, bit offsets for the three bit streams — encoded as little-endian 32-bit deltas from the previous stripe (stripe 0's delta is always 0), byte-plane transposed for better compression. The decoder rebuilds absolute offsets with a prefix sum and can then decode all stripes concurrently (the threaded decoder spawns one job per stripe).
Prediction is confined to a stripe: all explicit references (endpoint DPCM, weight predictors, repeats, block references) are causal within the stripe, and the implicit solid-block prediction clamps its upper-neighbor access at stripe seams — mirroring the encoder, which couldn't see across the seam either. More stripes = more parallelism at a small compression cost.
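The prefix-sum step that turns the seek table's per-stripe deltas into absolute stream offsets can be sketched as below. This assumes the byte-plane transposition has already been undone and the deltas are available as plain 32-bit values indexed by [stripe][stream]; the exact transposed layout is not described here, so it is left out. The function name and array layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XBC7_NUM_STRIPE_STREAMS 25   /* per-stripe streams: blob IDs 1..25 */

/* deltas[s * 25 + k]  = offset delta of stream k for stripe s (stripe 0: 0).
   offsets[s * 25 + k] = absolute start of stripe s within stream k:
   a byte offset for byte streams, a bit offset for the three bit streams. */
void xbc7_rebuild_stripe_offsets(const uint32_t *deltas, uint32_t *offsets,
                                 unsigned num_stripes)
{
    assert(deltas && offsets && num_stripes >= 1 && num_stripes <= 16);
    for (unsigned s = 0; s < num_stripes; s++) {
        for (unsigned k = 0; k < XBC7_NUM_STRIPE_STREAMS; k++) {
            const size_t i = (size_t)s * XBC7_NUM_STRIPE_STREAMS + k;
            const uint32_t prev =
                (s == 0) ? 0u : offsets[i - XBC7_NUM_STRIPE_STREAMS];
            offsets[i] = prev + deltas[i];   /* running sum per stream */
        }
    }
}
```

Once the absolute offsets are known, each stripe can start reading every stream at its own position, which is what makes concurrent stripe decoding possible.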
- The first payload byte (0xB7/0xB8/0xB9) is the payload-form/version marker. In .KTX2 files this same byte is duplicated into the seek table's m_profile word (low 8 bits), with a codec variant index (currently 0x01) in bits 8-15; see the KTX2 doc. The v2.50 transcoder derives everything from the payload itself.
- Blob IDs 27-127 are reserved for future streams. Because the container reader ignores unqueried IDs, files that add new optional streams remain readable by older decoders that don't know about them.
- Reserved bits are enforced: the decoder rejects nonzero reserved bits in the header flags, config bytes, endpoint block-index bytes, and command bytes, keeping those bits available for future signaled extensions.
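As a small illustration of the m_profile layout described in the first bullet (marker byte in the low 8 bits, codec variant index in bits 8-15), the word could be composed and checked as follows. The helper names are hypothetical, and the meaning of bits 16 and above is not covered here; the KTX2 doc is the authority on that field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compose the low 16 bits of an m_profile word (upper bits left as 0). */
uint32_t xubc7_make_profile_word(uint8_t first_payload_byte, uint8_t variant)
{
    return (uint32_t)first_payload_byte | ((uint32_t)variant << 8);
}

uint8_t xubc7_profile_marker(uint32_t profile)  { return (uint8_t)(profile & 0xFF); }
uint8_t xubc7_profile_variant(uint32_t profile) { return (uint8_t)((profile >> 8) & 0xFF); }

/* Consistency check: the duplicated marker should match the payload's first byte. */
bool xubc7_profile_matches_payload(uint32_t profile, const uint8_t *payload)
{
    assert(payload);
    return xubc7_profile_marker(profile) == payload[0];
}
```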
The reference decoder is callback-streaming and owns no image storage. After validating the container and header it fires an init callback (block counts, texel dimensions, DCT quality, alpha flag), then a per-block callback delivering each reconstructed logical BC7 block, which the caller packs into a physical 16-byte BC7 block (or hands to a transcoder). In the threaded path, callbacks may arrive concurrently from different stripes, but always exactly once per block.
XUBC7's native target is BC7 (a trivial pack of the decoded logical block — lossless relative to the stored data). ASTC LDR 4x4 is the other fast path: usually latent-to-latent, with a low-probability real-time fallback for the few BC7 2/3-subset partition patterns that don't map losslessly to ASTC. All other targets decode the BC7 block to texels and re-encode in real time.
Supported transcode targets (gated by BASISD_SUPPORT_XUASTC): ASTC LDR 4x4, BC1, BC3, BC4, BC5, BC7, ETC1, ETC2 RGBA, EAC R11, EAC RG11, PVRTC1 RGB/RGBA (power-of-2 dimensions only), and uncompressed RGBA32 / RGB565 / BGR565 / RGBA4444. Not supported: ATC, FXT1, PVRTC2.
From the basisu command line tool (-xubc7 selects the codec):
| Option | Range / default | Meaning |
|---|---|---|
| -quality | [1,100], default lossless (100) | Sets the global DCT quality m_dct_q. 100 = lossless BC7 supercompression; lower = smaller and lossier weight coding. |
| -effort | [0,10] | Encoder speed vs. compression: controls how exhaustively predictors and coding choices are searched. |
| -xubc7_rdo_level | [0,100], default 0 (off) | Rate-distortion optimization strength: enables and scales the block-reuse / alternate-pack / endpoint-DPCM / AC-truncation RDO passes, trading PSNR for smaller files. |
| -xubc7_num_stripes | [1,16], default 8 | Number of encode/decode stripes (parallelism vs. a slight size cost). |
| -xubc7_bc7f / -xubc7_bc7e_scalar | default -xubc7_bc7f | Selects the BC7 base encoder that produces the blocks XUBC7 supercompresses: bc7f (fast, analytical) or bc7e_scalar (slower, higher quality). |
| -xubc7_bc7e_scalar_level | [0,6], default 2 | Quality level for the bc7e_scalar base encoder. |
Note these options apply when compressing source images. XUBC7 can also, of course, supercompress BC7 data produced by any encoder.