XUBC7 File Format

Richard Geldreich edited this page Jul 2, 2026 · 8 revisions

Copyright (C) 2025-2026 Binomial LLC. All rights reserved except as granted under the Apache 2.0 LICENSE. Also see our NOTICE file. If you modify the Basis Universal source code, specifications, or wiki documents and redistribute the files, you must cause any modified files to carry prominent notices stating that you changed the files (see Apache 2.0 §4(b)).

Intro

XUBC7 is a supercompressed standard BC7 texture codec, first shipped in Basis Universal v2.50. The decoder reconstructs standard 128-bit BC7 blocks — mode, partition, endpoints, p-bits, and per-texel weight indices — which are then either used directly (BC7-capable GPUs), transcoded near-losslessly to ASTC LDR 4x4, or quickly recompressed to the other LDR GPU texture formats. Like XUASTC LDR, it compresses the latent structures that generate pixels (endpoints, partitions, weight grids) rather than the pixels themselves, using prediction, DPCM, and an absolute/residual DCT over the block weight planes.

Key properties:

  • Lossless or lossy. Endpoints, configs, and partitions are always coded exactly. At quality 100 the block weights are also coded exactly (residual DPCM), making the entire file bit-exact relative to the input BC7 blocks. At lower qualities the weight planes go through a quantized DCT, JPEG-style.
  • Self-contained slices. Each image (mip level/layer/face) is an independent compressed slice — no cross-slice codebooks or global state.
  • Per-blob Zstd. A slice is a small container of tagged byte streams ("blobs"); each blob is individually Zstd-compressed, or stored raw when Zstd wouldn't shrink it. There is no whole-slice or whole-mip-level entropy pass.
  • Parallel encode and decode. Images are split into 1–16 horizontal stripes of block rows; a per-stripe seek table lets the decoder run stripes concurrently.
  • Optional alpha. RGB or RGBA, flagged per slice.
  • Deterministic decode. All math is fixed-point integer (including the DCT), so decodes are bit-identical across platforms.

This document describes the XUBC7 slice/file structure at a high level. It intentionally does not specify the bit-exact decode procedure; the normative reference is the decoder source, transcoder/basisu_xbc7_decoder.h and transcoder/basisu_xbc7_decoder.inl, which are heavily commented.

Decoding a Mip Level at a Glance

Each mip level image (slice) decodes independently. The high-level flow:

  1. Locate the slice in the container (.KTX2: level index + seek table; .basis: slice descriptor table) and hand its bytes to the decoder.
  2. Dispatch on the first byte. 0xB8/0xB9 = a tiny mip: after a 3-byte header, the payload is just raw packed BC7 blocks, so copy them out and stop. 0xB7 = the regular blob-container form; continue below.
  3. Parse the blob directory. Each blob is an independent tagged byte stream, individually Zstd-compressed or stored raw. All compressed blobs are inflated up front (into a single arena allocation in the reference decoder).
  4. Read the header blob (ID 0): image dimensions, DCT quality, alpha flag, stripe count. Rebuild the stripe geometry, and when num_stripes > 1 read the stripe seek table (ID 26) so each stripe's starting position in every stream is known.
  5. Decode each stripe — independently, optionally in parallel. Within a stripe, walk its blocks in raster order. For every block, read one command byte from the commands stream (ID 1). The command says how the block is built — repeated from a neighbor, a solid color, or a full block — and therefore which of the other streams are consumed for it: config, partition, endpoint, and weight data.
  6. Reconstruct the logical BC7 block (mode, partition, exact endpoints and p-bits, weight indices) and pack it into a standard 16-byte BC7 block.

The result is a standard BC7 texture, ready to upload or transcode.

Referenced Documents

Containers

XUBC7 data is carried in either of Basis Universal's containers:

  • .KTX2: vkFormat = UNDEFINED, DFD color model 170 (0xAA), supercompression scheme KTX2_SS_XUBC7 (6). The supercompressionGlobalData block holds a per-image seek table (12-byte descriptors: mip-relative offset, length, and a profile word). All of this is documented in detail in the KTX2 doc, section 8.
  • .basis: basis_tex_format::cXUBC7 (33), one independent compressed slice per image, located by the standard slice descriptor table. No codebook or Huffman table sections are present (those are ETC1S-only).

In both containers, each addressed slice is one of the self-contained payloads described below.

Slice Payload Forms

The first byte of every slice payload is a format-dispatch marker:

| First byte | Form | Description |
|---|---|---|
| 0xB7 | Blob container | The normal form: a tagged-blob stream (next section). Alpha presence is in the header blob's flags. |
| 0xB8 | Tiny mip, no alpha | Raw BC7 blocks, described below. |
| 0xB9 | Tiny mip, has alpha | Same, with the alpha bit carried in the marker itself. |

Any other leading byte is not XUBC7 and is rejected.

Tiny-mip slices are used for the smallest mip levels, where the blob container's overhead isn't worthwhile. The layout is:

[uint8 marker]          0xB8 = no alpha, 0xB9 = has alpha
[uint8 num_blocks_x]    (> 0)
[uint8 num_blocks_y]    (> 0)
[16 bytes per block]    standard packed BC7 blocks, raster order

The total payload length must be exactly 3 + num_blocks_x*num_blocks_y*16 bytes. Tiny mips store no exact texel dimensions and use no prediction, DCT, or compression — the decoder's logical size is block-aligned (num_blocks_x*4 by num_blocks_y*4).
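The tiny-mip checks above can be sketched as a small validator. This is a minimal illustration, not the reference decoder: `TinyMipInfo` and `parse_tiny_mip` are hypothetical names, and only the rules stated in this section are enforced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type; the names are illustrative, not taken from the decoder source.
struct TinyMipInfo {
    bool has_alpha;
    uint32_t num_blocks_x;
    uint32_t num_blocks_y;
    const uint8_t* blocks;  // num_blocks_x * num_blocks_y packed 16-byte BC7 blocks, raster order
};

// Returns true and fills 'out' if 'data' is a well-formed tiny-mip slice.
inline bool parse_tiny_mip(const uint8_t* data, size_t size, TinyMipInfo& out) {
    if (size < 3) return false;
    const uint8_t marker = data[0];
    if (marker != 0xB8 && marker != 0xB9) return false;
    const uint32_t nx = data[1];
    const uint32_t ny = data[2];
    if (nx == 0 || ny == 0) return false;
    // The payload length must match exactly: 3 header bytes plus 16 bytes per block.
    if (size != 3 + static_cast<size_t>(nx) * ny * 16) return false;
    out.has_alpha = (marker == 0xB9);
    out.num_blocks_x = nx;
    out.num_blocks_y = ny;
    out.blocks = data + 3;
    return true;
}
```

A caller would copy `num_blocks_x * num_blocks_y * 16` bytes from `blocks` and treat the image as `num_blocks_x*4` by `num_blocks_y*4` texels.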

The Blob-Stream Container (0xB7)

A regular slice is a simple ("KISS") tagged-blob container: a directory of up to 128 independent byte streams, identified by a 7-bit blob ID. Serialized layout (all sizes are LEB128 varints — 7 bits per byte, high bit = continue, max 5 bytes for a uint32):

[uint8  0xB7]              begin marker
[uint8  num_blobs]         only non-empty blobs are stored
repeated num_blobs times:
    [uint8 id_and_flag]    low 7 bits = blob id (< 128); high bit: 1 = Zstd-compressed, 0 = stored raw
    if raw:        [varint size]                                  // > 0; 'size' bytes follow
    if compressed: [varint uncompressed_size][varint stored_size] // both > 0, stored_size < uncompressed_size
    [blob data]
[uint8  0x6A]              end marker; must land exactly on the final byte

Notes:

  • Raw vs. compressed is decided per blob by the encoder: a blob is Zstd-compressed only if that actually shrinks it. Incompressible streams (e.g. sign bits) are automatically stored raw with no special-case flags.
  • Per-blob overhead is typically 3 bytes raw (1 ID byte + a 2-byte size varint) or 5 bytes compressed (1 ID byte + two 2-byte varints); a ~20-blob slice costs roughly 80 bytes of container overhead.
  • The reader rejects structural problems: bad begin/end markers, truncation or trailing garbage (the end marker must be the final byte), duplicate blob IDs, zero sizes, and compressed blobs whose stored size isn't strictly smaller than their uncompressed size.
  • The reader does not reject unknown blob IDs — well-formed blobs with IDs a decoder doesn't recognize are simply never queried. This is the format's forward-compatibility mechanism (see Versioning below).
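The container layout and the structural checks listed above can be sketched as follows. This is a hedged illustration rather than the reference reader: `BlobEntry`, `read_varint`, and `parse_blob_directory` are hypothetical names, the sketch assumes each blob's data immediately follows its own directory entry (as the layout listing suggests), and it only indexes the blobs; Zstd inflation of compressed blobs is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical directory entry; names are illustrative.
struct BlobEntry {
    uint8_t id;                  // 0..127
    bool compressed;             // true = Zstd-compressed payload
    uint32_t uncompressed_size;  // equals stored_size for raw blobs
    uint32_t stored_size;        // bytes actually present in the slice
    size_t data_offset;          // offset of the blob data within the slice
};

// Reads one LEB128 varint (7 bits per byte, high bit = continue, at most 5 bytes).
inline bool read_varint(const uint8_t* p, size_t size, size_t& pos, uint32_t& value) {
    uint64_t v = 0;
    for (int i = 0; i < 5; ++i) {
        if (pos >= size) return false;            // truncated
        const uint8_t b = p[pos++];
        v |= static_cast<uint64_t>(b & 0x7F) << (7 * i);
        if ((b & 0x80) == 0) {
            if (v > 0xFFFFFFFFull) return false;  // does not fit in a uint32
            value = static_cast<uint32_t>(v);
            return true;
        }
    }
    return false;                                 // continuation bit still set on the 5th byte
}

inline bool parse_blob_directory(const uint8_t* p, size_t size, std::vector<BlobEntry>& out) {
    out.clear();
    if (size < 3 || p[0] != 0xB7) return false;   // begin marker
    size_t pos = 1;
    const uint32_t num_blobs = p[pos++];
    bool seen[128] = {};
    for (uint32_t i = 0; i < num_blobs; ++i) {
        if (pos >= size) return false;
        const uint8_t id_and_flag = p[pos++];
        BlobEntry e{};
        e.id = id_and_flag & 0x7F;
        e.compressed = (id_and_flag & 0x80) != 0;
        if (seen[e.id]) return false;             // duplicate blob ID
        seen[e.id] = true;
        if (e.compressed) {
            if (!read_varint(p, size, pos, e.uncompressed_size)) return false;
            if (!read_varint(p, size, pos, e.stored_size)) return false;
            if (e.stored_size == 0 || e.stored_size >= e.uncompressed_size) return false;
        } else {
            if (!read_varint(p, size, pos, e.stored_size)) return false;
            if (e.stored_size == 0) return false;
            e.uncompressed_size = e.stored_size;
        }
        if (e.stored_size > size - pos) return false;  // truncated blob data
        e.data_offset = pos;
        pos += e.stored_size;
        out.push_back(e);
    }
    // The end marker must be the final byte of the slice.
    return pos + 1 == size && p[pos] == 0x6A;
}
```

Note that unknown blob IDs pass through this parser untouched, which matches the forward-compatibility rule in the last bullet.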

The Header Blob (ID 0)

Blob 0 is a fixed 7-byte header (#pragma pack(1), little-endian):

struct xbc7_header               // 7 bytes
{
    uint16_t m_width_in_texels;  // exact image width,  1..16384
    uint16_t m_height_in_texels; // exact image height, 1..16384
    uint8_t  m_dct_q;            // global DCT quality, 1..100 (100 = lossless weights)
    uint8_t  m_flags;            // bit 0 = XBC7_FLAG_HAS_ALPHA; all other bits must be 0 (rejected otherwise)
    uint8_t  m_num_stripes;      // encoder stripe count: >= 1, <= num_blocks_y, <= 16
};

Block dimensions are derived: num_blocks_x = (width+3)/4, num_blocks_y = (height+3)/4. The stripe count must be in the header because the solid-block predictor is implicit (see Stripes below).
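As an illustration of the header rules above, the following sketch reads the 7 bytes one at a time (so it does not depend on host endianness or struct packing) and applies the stated range checks. `ParsedHeader` and `parse_header_blob` are hypothetical names; the reference decoder may structure this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical parsed form of the 7-byte header blob.
struct ParsedHeader {
    uint32_t width;         // texels
    uint32_t height;        // texels
    uint32_t dct_q;         // 1..100
    bool has_alpha;
    uint32_t num_stripes;
    uint32_t num_blocks_x;  // derived
    uint32_t num_blocks_y;  // derived
};

inline bool parse_header_blob(const uint8_t* p, size_t size, ParsedHeader& h) {
    if (size != 7) return false;
    h.width = p[0] | (static_cast<uint32_t>(p[1]) << 8);   // little-endian uint16
    h.height = p[2] | (static_cast<uint32_t>(p[3]) << 8);
    h.dct_q = p[4];
    const uint8_t flags = p[5];
    h.num_stripes = p[6];

    if (h.width < 1 || h.width > 16384) return false;
    if (h.height < 1 || h.height > 16384) return false;
    if (h.dct_q < 1 || h.dct_q > 100) return false;
    if ((flags & 0xFE) != 0) return false;                 // only bit 0 (has alpha) may be set
    h.has_alpha = (flags & 0x01) != 0;

    h.num_blocks_x = (h.width + 3) / 4;
    h.num_blocks_y = (h.height + 3) / 4;
    if (h.num_stripes < 1 || h.num_stripes > 16) return false;
    if (h.num_stripes > h.num_blocks_y) return false;
    return true;
}
```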

Blob Catalog

The complete blob ID vocabulary as of v2.50. "Granularity" describes how entries map to blocks; a blob is only present when at least one block consumes it.

| ID | Name | Contents | Granularity |
|---|---|---|---|
| 0 | Header | File-level metadata (above) | fixed 7 bytes |
| 1 | Commands | One command byte per block, raster order; drives all other streams | 1 byte/block, exactly total_blocks bytes |
| 2 | BC7BlockConfig | New-config bytes: BC7 mode (bits 0-2), mode 4/5 rotation (bits 3-4), mode 4 index selector (bit 5); bits 6-7 reserved-zero (rejected if set) | 1 byte per new-config command |
| 3 | Partition2 | 2-subset partition indices (modes 1, 3, 7; 64 patterns) | 1 byte per 2-subset block |
| 4 | Partition3 | 3-subset partition indices (modes 0, 2; mode 0 index must be < 16) | 1 byte per 3-subset block |
| 5 | WeightPredictors | Joint (predictor candidate, amplitude code) byte for DCT-weight blocks; values ≥ 200 rejected | 1 byte per DCT block |
| 6 | DCCoeffsSmall | DC coefficients, 2/3-bit weight modes (magnitude lattice-coded; sign conditional). Note: the v2.50 encoder routes all DC values through this stream. | 1 per coded plane |
| 7 | DCCoeffsLarge | DC coefficients, 4-bit weight modes (see note on ID 6) | 1 per coded plane |
| 8 | ACCoeffs | AC DCT coefficients, run-length coded in zig-zag order | variable, per DCT block |
| 9 | CoeffSigns | Raw sign bits for AC (and conditional DC) coefficients | bit-packed |
| 10 | PBits | Raw endpoint p-bits (per the mode's p-bit shape) | bit-packed |
| 11-14 | EPDeltaFine R/G/B/A | Endpoint DPCM residuals, fine (≥ 6-bit) precision, one stream per channel | 1 byte per delta |
| 15-18 | EPDeltaCoarse R/G/B/A | Endpoint DPCM residuals, coarse (< 6-bit) precision | 1 byte per delta |
| 19 | EPRaw | Raw endpoints (the endpoint "raw escape" path) | bit-packed |
| 20 | EPBlockIndex | Endpoint indexed-DPCM block references: 5-bit index into the causal offset table (top 3 bits reserved-zero) | 1 byte per reference |
| 21 | RawWeightBits | Quantized weight indices for DPCM-weight blocks using the absolute predictor, byte-packed by bit width | byte-packed, per plane |
| 22 | SolidRGBADeltas | Solid-block color residuals vs. the implicit neighbor-edge prediction: 4 interleaved wrapped bytes (G, R−G, B−G, A) in 8-bit pixel space | 4 bytes per solid block |
| 23-25 | DPCMWeightResid 2/3/4 | Wrapped n-bit weight-index residuals for DPCM-weight blocks with a real predictor, split by bit width | byte-packed, per plane |
| 26 | StripeSeekTable | Per-stripe stream offsets (see Stripes); present only when num_stripes > 1 | num_stripes × 25 × 4 bytes |
| 27-127 | reserved | Future streams (e.g. P-frame motion, temporal references, optional tables) | |

Three blobs (9, 10, 19) are bit streams rather than byte streams — their stripe seek offsets are bit offsets. IDs ≥ 128 are structurally impossible (bit 7 of the directory entry byte is the compression flag).
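For readers who prefer code to tables, the catalog can be mirrored as an enumeration with a few classification helpers. The enumerator and function names below are illustrative and derived from the catalog's Name column; the identifiers in the decoder source may differ. The R/G/B/A and 2/3/4 orderings assume the ID ranges map in the order written.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative names only; not copied from the decoder source.
enum BlobId : uint8_t {
    kHeader = 0,
    kCommands = 1,
    kBC7BlockConfig = 2,
    kPartition2 = 3,
    kPartition3 = 4,
    kWeightPredictors = 5,
    kDCCoeffsSmall = 6,
    kDCCoeffsLarge = 7,
    kACCoeffs = 8,
    kCoeffSigns = 9,
    kPBits = 10,
    kEPDeltaFineR = 11, kEPDeltaFineG = 12, kEPDeltaFineB = 13, kEPDeltaFineA = 14,
    kEPDeltaCoarseR = 15, kEPDeltaCoarseG = 16, kEPDeltaCoarseB = 17, kEPDeltaCoarseA = 18,
    kEPRaw = 19,
    kEPBlockIndex = 20,
    kRawWeightBits = 21,
    kSolidRGBADeltas = 22,
    kDPCMWeightResid2 = 23, kDPCMWeightResid3 = 24, kDPCMWeightResid4 = 25,
    kStripeSeekTable = 26,
    kFirstReserved = 27  // 27..127 are reserved for future streams
};

// The three bit streams, whose stripe seek offsets are bit offsets.
constexpr bool is_bit_stream(uint8_t id) {
    return id == kCoeffSigns || id == kPBits || id == kEPRaw;
}

// Streams that are split per stripe and indexed by the stripe seek table.
constexpr bool is_per_stripe_stream(uint8_t id) {
    return id >= kCommands && id <= kDPCMWeightResid4;
}

// IDs a v2.50-era decoder would simply never query.
constexpr bool is_reserved_id(uint8_t id) {
    return id >= kFirstReserved && id <= 127;
}
```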

How the Streams Fit Together

This section sketches the coding model conceptually; the decoder source is the normative reference.

The command stream (blob 1)

Blob 1 holds exactly one command byte per 4x4 block, in raster order; its size must equal the image's total block count. Each byte packs three fields plus one reserved bit:

Command byte:
  bits 0-2 : command       (0-7, table below)
  bits 3-5 : endpoint mode (0-7; full-block commands only)
  bit  6   : weight mode   (0 = DPCM/raw, 1 = DCT; full-block commands only)
  bit  7   : reserved      (future P-frame flag; must be 0, rejected otherwise)

The eight commands:

| Value | Command | Meaning |
|---|---|---|
| 0 | RepeatLast | Copy the left neighbor's entire block. Consumes nothing else. |
| 1 | RepeatUpper | Copy the upper neighbor's entire block. Consumes nothing else. |
| 2 | SolidDPCM | A solid-color block (below). Consumes blob 22 only. |
| 3 | NewConfig | Read a fresh BC7 config byte (blob 2), then partitions/endpoints/weights. |
| 4-7 | ReuseConfigLeft / Upper / LeftDiagonal / RightDiagonal | Inherit the BC7 config from the named causal neighbor, then read this block's own partition/endpoints/weights. |

For the three simple commands (0-2) the whole byte is the command: their endpoint/weight-mode bits must be zero, and the decoder rejects non-canonical encodings. The full-block commands (3-7) additionally interpret the endpoint-mode and weight-mode fields:

Endpoint modes (bits 3-5):

| Value | Mode | Meaning |
|---|---|---|
| 0 | Raw | Endpoint values and p-bits read verbatim from the raw endpoint streams (blobs 19/10). |
| 1 | DPCM Left | Predict the endpoints from the left neighbor; code only the residuals (blobs 11-18). |
| 2 | DPCM Up | Predict from the upper neighbor. |
| 3 | DPCM Left-diagonal | Predict from the upper-left neighbor. |
| 4 | DPCM Right-diagonal | Predict from the upper-right neighbor. |
| 5 | DPCM Block-index | Predict from an arbitrary causal block selected by a 5-bit index into the shared offset table (blob 20). |
| 6 | DPCM Left, subset 1 | Predict from the left neighbor's second subset, which is useful when a partitioned neighbor's other half matches better. Rejected if the referenced block has fewer than 2 subsets. |
| 7 | DPCM Up, subset 1 | Same, from the upper neighbor's second subset. |

Weight mode (bit 6): 0 = the block's weight plane is coded losslessly (raw or DPCM residuals — blobs 21, 23-25); 1 = it is DCT-coded (blobs 5-9).

The command byte therefore determines which of the other blobs the block consumes — repeat and solid blocks short-circuit nearly everything.
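The bit layout and the canonical-encoding rule for the simple commands can be sketched as below. `Command` and `decode_command_byte` are hypothetical names for illustration; only the rules stated in this section are checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of one command byte.
struct Command {
    uint8_t command;        // 0..7
    uint8_t endpoint_mode;  // 0..7; meaningful for full-block commands (3..7) only
    bool dct_weights;       // weight mode bit; meaningful for full-block commands only
};

// Returns false for encodings this section says the decoder rejects.
inline bool decode_command_byte(uint8_t b, Command& out) {
    if ((b & 0x80) != 0) return false;             // reserved bit 7 must be 0
    out.command = b & 0x07;                        // bits 0-2
    out.endpoint_mode = (b >> 3) & 0x07;           // bits 3-5
    out.dct_weights = ((b >> 6) & 0x01) != 0;      // bit 6
    // RepeatLast, RepeatUpper and SolidDPCM must leave bits 3-6 clear.
    if (out.command <= 2 && (b & 0x78) != 0) return false;
    return true;
}
```

For example, the byte 0x4B splits into command 3 (NewConfig), endpoint mode 1 (DPCM Left), and weight mode 1 (DCT).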

Configs and partitions

A config is a block's structural identity: BC7 mode (all 8 modes are supported), the component rotation for dual-plane modes 4/5, and mode 4's index selector. Configs are either sent fresh (blob 2) or inherited from a causal neighbor via the reuse commands, so runs of structurally-similar blocks don't repeat the config byte. Partition indices travel in their own streams (blobs 3/4), split by subset count because the two pattern vocabularies are disjoint.
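A config byte can be unpacked as in the sketch below, which also shows how the subset count selects the partition stream. The names are hypothetical. The per-mode subset counts and the 64-pattern limit for mode 2 are standard BC7 properties rather than something this page defines, and the sketch does not guess whether the decoder rejects nonzero rotation or index-selector bits for modes that do not use them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of one blob-2 config byte.
struct BlockConfig {
    uint8_t mode;            // BC7 mode, 0..7
    uint8_t rotation;        // bits 3-4; used by dual-plane modes 4/5
    uint8_t index_selector;  // bit 5; used by mode 4
};

inline bool decode_config_byte(uint8_t b, BlockConfig& c) {
    if ((b & 0xC0) != 0) return false;  // bits 6-7 are reserved-zero
    c.mode = b & 0x07;
    c.rotation = (b >> 3) & 0x03;
    c.index_selector = (b >> 5) & 0x01;
    return true;
}

// Standard BC7 subset counts: modes 0,2 -> 3; modes 1,3,7 -> 2; modes 4,5,6 -> 1.
inline int num_subsets(uint8_t mode) {
    static const int kSubsets[8] = {3, 2, 3, 2, 1, 1, 1, 2};
    return kSubsets[mode & 7];
}

// Blob ID holding this mode's partition index: 3 (Partition2), 4 (Partition3), or -1 for none.
inline int partition_blob_id(uint8_t mode) {
    const int n = num_subsets(mode);
    return n == 2 ? 3 : (n == 3 ? 4 : -1);
}

// Mode 0 has 16 partition patterns; the other partitioned modes have 64.
inline bool is_valid_partition_index(uint8_t mode, uint8_t index) {
    if (num_subsets(mode) == 1) return false;  // no partition index is read at all
    return index < ((mode & 7) == 0 ? 16 : 64);
}
```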

Endpoints — always exact

Endpoints are never transformed or quantized by the codec; they are reconstructed exactly for every block. The command's endpoint mode (see the table above) selects either the raw escape (endpoint values and p-bits read verbatim from blobs 19/10) or DPCM prediction from a causal neighbor, where only the per-channel residuals are coded (blobs 11-18, split into fine/coarse streams by precision).
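To make the endpoint-mode table concrete, the sketch below maps each mode to the neighbor it names, expressed as a block offset (dx, dy) in raster coordinates with y growing downward. `EndpointSource` and `endpoint_source` are hypothetical names, and the sketch only classifies the mode: the residual arithmetic and the contents of the mode-5 offset table are not described on this page and are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of where a block's endpoint prediction comes from.
struct EndpointSource {
    bool predicted;      // false = raw escape (mode 0), nothing is predicted
    bool table_lookup;   // true = mode 5: the block is chosen by a 5-bit index from blob 20
    int dx;              // neighbor offset in blocks (valid when predicted && !table_lookup)
    int dy;
    bool second_subset;  // true = modes 6/7: predict from the neighbor's second subset
};

inline EndpointSource endpoint_source(uint8_t endpoint_mode) {
    switch (endpoint_mode & 7) {
        case 0:  return {false, false, 0, 0, false};   // Raw
        case 1:  return {true, false, -1, 0, false};   // DPCM Left
        case 2:  return {true, false, 0, -1, false};   // DPCM Up
        case 3:  return {true, false, -1, -1, false};  // DPCM Left-diagonal (upper-left)
        case 4:  return {true, false, 1, -1, false};   // DPCM Right-diagonal (upper-right)
        case 5:  return {true, true, 0, 0, false};     // DPCM Block-index
        case 6:  return {true, false, -1, 0, true};    // DPCM Left, subset 1
        default: return {true, false, 0, -1, true};    // DPCM Up, subset 1
    }
}
```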

Weights — predicted, then DPCM (lossless) or DCT (lossy)

The BC7 weight (selector index) plane is coded by first predicting it from already-decoded neighbors, then coding the residual. The predictor is chosen per block from a large candidate set: edge replications and blends, reflections, gradients, diagonal propagation, adaptive predictors (JPEG-LS-style MED, a CALIC-spirit gradient-adaptive blend, a least-squares plane fit), plus 32 generic causal block references at fixed (dx,dy) offsets — or Absolute, meaning no prediction at all.

The residual is then coded one of two ways, per the command's weight mode:

  • DPCM (lossless): wrapped weight-index residuals, byte-packed (blobs 23-25, or blob 21 for the absolute predictor).
  • DCT (lossy): the 4x4 residual plane is transformed with a deterministic fixed-point DCT-II, quantized with a JPEG-style table (scaled adaptively per block from the global quality m_dct_q and the block's endpoint span, with a dead-zone), and coded as DC (blobs 6/7) + zig-zag run-length AC (blob 8) + raw signs (blob 9). The predictor choice for DCT blocks travels in blob 5.

This is the "absolute and residual DCT" in the release notes: a DCT block either transforms the raw weight plane (Absolute predictor — DC is unsigned by construction) or a prediction residual (signed DC).

Losslessness: at m_dct_q = 100 the encoder codes every block's weights with lossless DPCM, so the whole file reproduces the input BC7 blocks bit-exactly. Below 100, DCT-coded blocks are lossy (individual blocks still fall back to lossless DPCM whenever that codes smaller).
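The "wrapped" residuals used by the lossless DPCM path can be illustrated with plain modular arithmetic. This sketch assumes that wrapping means reduction modulo 2^n for an n-bit weight index, which is the usual meaning of the term but is not spelled out on this page; `wrap_residual` and `unwrap_residual` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a wrapped n-bit residual is (actual - predicted) mod 2^n.
// 'bits' is the weight index width (2, 3, or 4 for BC7 weights).
inline uint8_t wrap_residual(uint8_t actual, uint8_t predicted, int bits) {
    const uint32_t mask = (1u << bits) - 1u;
    // Unsigned subtraction wraps modulo 2^32; masking reduces it modulo 2^bits.
    return static_cast<uint8_t>((static_cast<uint32_t>(actual) - predicted) & mask);
}

// Inverse: recover the exact index from the prediction and the wrapped residual.
inline uint8_t unwrap_residual(uint8_t residual, uint8_t predicted, int bits) {
    const uint32_t mask = (1u << bits) - 1u;
    return static_cast<uint8_t>((static_cast<uint32_t>(predicted) + residual) & mask);
}
```

Because the residual never needs more than n bits, a prediction that is off in either direction still round-trips exactly, which is what makes the DPCM path lossless.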

Solid blocks

A solid block's color is coded as a DPCM residual (blob 22) against an implicit prediction: the average of the left neighbor's right-edge column and the upper neighbor's bottom-edge row, in decoded 8-bit pixel space. Because the prediction is derived from decoded neighbors rather than signaled, encoder and decoder must agree on exactly which neighbors are visible — which is why the stripe count lives in the header (next section).

Stripes and Parallel Decode

An image's block rows are split as evenly as possible into m_num_stripes (1–16) contiguous horizontal bands. Each stripe's portion of every per-stripe stream (blob IDs 1-25) is independently seekable via the stripe seek table (blob 26, present only when num_stripes > 1). For each stripe, the table stores the starting offset within each stream: byte offsets for byte streams, bit offsets for the three bit streams (blobs 9, 10, 19). Offsets are encoded as little-endian 32-bit deltas from the previous stripe (stripe 0's delta is always 0) and byte-plane transposed for better compression. The decoder rebuilds absolute offsets with a prefix sum and can then decode all stripes concurrently (the threaded decoder spawns one job per stripe).
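The prefix-sum step can be sketched as follows. The sketch assumes the deltas have already been de-transposed into one 32-bit value per (stripe, stream) pair; the exact byte-plane layout of blob 26 is not described on this page and is not reproduced here. `kNumStreams` and `rebuild_stripe_offsets` are hypothetical names, and bounds checks against the actual stream sizes are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kNumStreams = 25;  // blob IDs 1..25; index k corresponds to blob ID k + 1

using StripeOffsets = std::array<uint32_t, kNumStreams>;

// deltas[s][k] is the offset delta of stripe s for stream k, relative to stripe s - 1.
// Returns absolute starting offsets (bytes, or bits for the three bit streams).
inline std::vector<StripeOffsets> rebuild_stripe_offsets(const std::vector<StripeOffsets>& deltas) {
    std::vector<StripeOffsets> offsets(deltas.size());
    StripeOffsets running{};  // zero-initialized running sum, one entry per stream
    for (size_t s = 0; s < deltas.size(); ++s) {
        for (int k = 0; k < kNumStreams; ++k) {
            running[k] += deltas[s][k];
            offsets[s][k] = running[k];
        }
    }
    return offsets;
}
```

With all of stripe 0's deltas equal to 0, every stream starts at offset 0 for the first stripe, and each later stripe starts where the accumulated deltas place it.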

Prediction is confined to a stripe: all explicit references (endpoint DPCM, weight predictors, repeats, block references) are causal within the stripe, and the implicit solid-block prediction clamps its upper-neighbor access at stripe seams — mirroring the encoder, which couldn't see across the seam either. More stripes = more parallelism at a small compression cost.

Versioning and Forward Compatibility

  • The first payload byte (0xB7/0xB8/0xB9) is the payload-form/version marker. In .KTX2 files this same byte is duplicated into the seek table's m_profile word (low 8 bits), with a codec variant index (currently 0x01) in bits 8-15 — see the KTX2 doc. The v2.50 transcoder derives everything from the payload itself.
  • Blob IDs 27-127 are reserved for future streams. Because the container reader ignores unqueried IDs, files that add new optional streams remain readable by older decoders that don't know about them.
  • Reserved bits are enforced: the decoder rejects nonzero reserved bits in the header flags, config bytes, endpoint block-index bytes, and command bytes, keeping those bits available for future signaled extensions.
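The m_profile bit layout from the first bullet can be sketched as a pair of pack/unpack helpers. The function names are hypothetical, and the sketch assumes the profile word is 32 bits wide with the bits above bit 15 left at zero; the KTX2 doc is the authority on the actual field width.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: m_profile is a 32-bit word; bits 16-31 are left at zero here.
inline uint32_t make_profile_word(uint8_t payload_marker, uint8_t variant_index) {
    return static_cast<uint32_t>(payload_marker) | (static_cast<uint32_t>(variant_index) << 8);
}

// Low 8 bits: copy of the slice's first payload byte (0xB7, 0xB8, or 0xB9).
inline uint8_t profile_payload_marker(uint32_t profile) {
    return static_cast<uint8_t>(profile & 0xFF);
}

// Bits 8-15: codec variant index (currently 0x01).
inline uint8_t profile_variant_index(uint32_t profile) {
    return static_cast<uint8_t>((profile >> 8) & 0xFF);
}
```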

Decoder API

The reference decoder is callback-streaming and owns no image storage. After validating the container and header it fires an init callback (block counts, texel dimensions, DCT quality, alpha flag), then a per-block callback delivering each reconstructed logical BC7 block, which the caller packs into a physical 16-byte BC7 block (or hands to a transcoder). In the threaded path, callbacks may arrive concurrently from different stripes, but always exactly once per block.

Transcoding

XUBC7's native target is BC7 (a trivial pack of the decoded logical block — lossless relative to the stored data). ASTC LDR 4x4 is the other fast path: usually latent-to-latent, with a low-probability real-time fallback for the few BC7 2/3-subset partition patterns that don't map losslessly to ASTC. All other targets decode the BC7 block to texels and re-encode in real time.

Supported transcode targets (gated by BASISD_SUPPORT_XUASTC): ASTC LDR 4x4, BC1, BC3, BC4, BC5, BC7, ETC1, ETC2 RGBA, EAC R11, EAC RG11, PVRTC1 RGB/RGBA (power-of-2 dimensions only), and uncompressed RGBA32 / RGB565 / BGR565 / RGBA4444. Not supported: ATC, FXT1, PVRTC2.

Encoding Options

From the basisu command line tool (-xubc7 selects the codec):

| Option | Range / default | Meaning |
|---|---|---|
| -quality | [1,100], default lossless (100) | Sets the global DCT quality m_dct_q. 100 = lossless BC7 supercompression; lower = smaller and lossier weight coding. |
| -effort | [0,10] | Encoder speed vs. compression: controls how exhaustively predictors and coding choices are searched. |
| -xubc7_rdo_level | [0,100], default 0 (off) | Rate-distortion optimization strength: enables and scales the block-reuse / alternate-pack / endpoint-DPCM / AC-truncation RDO passes, trading PSNR for smaller files. |
| -xubc7_num_stripes | [1,16], default 8 | Number of encode/decode stripes (parallelism vs. a slight size cost). |
| -xubc7_bc7f / -xubc7_bc7e_scalar | default -xubc7_bc7f | Selects the BC7 base encoder that produces the blocks XUBC7 supercompresses: bc7f (fast, analytical) or bc7e_scalar (slower, higher quality). |
| -xubc7_bc7e_scalar_level | [0,6], default 2 | Quality level for the bc7e_scalar base encoder. |

Note that these options apply when compressing source images. XUBC7 can also supercompress BC7 data produced by any other encoder.
