
Shader and In-Loop Deblocking

Richard Geldreich edited this page Jul 3, 2026 · 19 revisions


Copyright (C) 2025-2026 Binomial LLC. All rights reserved except as granted under the Apache 2.0 LICENSE. Also see our NOTICE file. If you modify the Basis Universal source code, specifications, or wiki documents and redistribute the files, you must cause any modified files to carry prominent notices stating that you changed the files (see Apache 2.0 §4(b)).

Intro

Block boundaries are predictable: block-compression seams always fall on the fixed grid set by the block size.

Large ASTC/XUASTC block sizes (8x8 through 12x12) reach very low bitrates — down to ~0.89 bpp — but block-boundary seams become increasingly visible beyond roughly 6x6. Basis Universal addresses this with a simple, standardized deblocking reconstruction operator that smooths block-boundary seams, making the largest block sizes (and therefore the lowest bitrates and lowest VRAM consumption) practical.
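Every ASTC block occupies 128 bits regardless of its footprint, so the GPU bitrate follows directly from the block dimensions. A quick Python check of the figures above (this covers the ASTC block bitrate only and ignores any file-level supercompression XUASTC adds):

```python
# Bits per pixel for an ASTC block footprint: every ASTC block is 128 bits.
ASTC_BLOCK_BITS = 128

def astc_bpp(block_w: int, block_h: int) -> float:
    """GPU-side bitrate of an ASTC block size (no supercompression considered)."""
    return ASTC_BLOCK_BITS / (block_w * block_h)

for w, h in [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12)]:
    print(f"{w}x{h}: {astc_bpp(w, h):.2f} bpp")
```

The 12x12 row works out to 128 / 144 ≈ 0.89 bpp, the low end quoted above.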

This is an in-loop filter, not a post-process. When deblocking is enabled at compression time, the encoder factors this exact reconstruction operator into its rate/quality decisions (using Stochastic Coordinate Descent to globally optimize the output blocks with the deblocking filter in the loop). The filter is therefore the decoder half of the codec: content compressed with deblocking awareness should be sampled with the deblocking filter applied, and the KTX2 file records this expectation (see DeblockFilterID below).
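Why the filter has to be in the loop can be shown with a toy 1D example. This is a hypothetical illustration only: the real encoder uses SCD over 2D blocks, and the seam filter below is a simplified stand-in, not the Basis Universal operator. Two candidate reconstructions of a signal whose genuine edge sits on a block seam are scored before and after filtering; scoring before filtering (post-process thinking) picks the exact copy, while scoring after filtering picks the candidate that anticipates the smoothing:

```python
# Toy 1D illustration of in-loop vs post-process candidate selection.
# Simplified stand-in filter: a [1, 2, 1] / 4 low-pass on the two texels
# touching each interior block seam; everything else passes through.
BLOCK = 8

def seam_filter(s):
    out = list(s)
    for x in range(len(s)):
        off = x % BLOCK
        at_interior_seam = (off == 0 and x > 0) or (off == BLOCK - 1 and x < len(s) - 1)
        if at_interior_seam:
            out[x] = (s[x - 1] + 2 * s[x] + s[x + 1]) / 4  # reads the unfiltered input
    return out

def sse(a, b):
    """Sum of squared errors between two equal-length signals."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

original = [0.25] * 8 + [0.75] * 8                   # a real edge exactly on the seam
cand_exact = list(original)                          # reproduces the source exactly
cand_aware = [0.25] * 7 + [0.0, 1.0] + [0.75] * 7    # pre-sharpens the two seam texels

cands = {"exact": cand_exact, "aware": cand_aware}
post_process_pick = min(cands, key=lambda k: sse(original, cands[k]))
in_loop_pick = min(cands, key=lambda k: sse(original, seam_filter(cands[k])))
print(post_process_pick, in_loop_pick)  # exact aware
```

After filtering, the "exact" candidate has an error of 0.03125 while the "aware" one has 0.0078125, which is the kind of trade-off an in-loop encoder can exploit and a post-process cannot.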

The Reconstruction Operator

Per sample, the operator:

  1. Computes the sample's offset within its block. Interior texels get weight 0 and are left completely untouched.
  2. Computes an edge-proximity weight that ramps up near each block boundary, separately for horizontal and vertical seams. The falloff radius is a tolerance, not a codec constant: 1.0 texel suits flat 2D viewers, 1.5 is more stable under minification and oblique viewing angles in 3D.
  3. Samples the 4 axis-neighbors (5 taps total including the center), runs a 3-tap low-pass across each axis, and blends from the original sample toward the filtered one by the edge weight, with corner normalization where the horizontal and vertical filters overlap.
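The three steps can be sketched as a CPU reference in Python. This is a minimal sketch, not the shader: the linear weight ramp, the [1, 2, 1] / 4 kernel, and dividing by max(1, wx + wy) at corners are assumptions chosen to match the description above; the reference GLSL/HLSL remains authoritative.

```python
# Hypothetical CPU sketch of the per-sample deblocking operator.
# Assumptions (not taken from the reference shader): linear edge-weight ramp,
# [1, 2, 1] / 4 low-pass, and max(1, wx + wy) corner normalization.

def edge_weight(coord: int, block: int, radius: float) -> float:
    """Steps 1 and 2: 1.0 on a texel touching a block edge, 0.0 in the interior."""
    off = coord % block                          # offset within the block
    dist = min(off + 0.5, block - off - 0.5)     # texel center to nearest block edge
    return max(0.0, min(1.0, 1.0 - (dist - 0.5) / radius))

def deblock_sample(img, x, y, block_w, block_h, radius=1.0):
    """Step 3: 5-tap filter (center + 4 axis neighbors) blended by edge weight.

    img is a list of rows of floats. Texture borders are treated as block
    edges here, with clamp-to-edge neighbor fetches.
    """
    h, w = len(img), len(img[0])

    def tap(xx, yy):
        return img[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    c = tap(x, y)
    wx = edge_weight(x, block_w, radius)   # near a vertical seam: smooth horizontally
    wy = edge_weight(y, block_h, radius)   # near a horizontal seam: smooth vertically
    if wx == 0.0 and wy == 0.0:
        return c                           # interior: single tap, value unchanged
    fh = (tap(x - 1, y) + 2 * c + tap(x + 1, y)) / 4
    fv = (tap(x, y - 1) + 2 * c + tap(x, y + 1)) / 4
    norm = max(1.0, wx + wy)               # corner normalization where both axes overlap
    return c + (wx * (fh - c) + wy * (fv - c)) / norm

# 16x16 image, 8x8 blocks, hard step on the vertical seam between x = 7 and x = 8
img = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
print([round(deblock_sample(img, x, 4, 8, 8), 3) for x in range(5, 11)])
# [0.0, 0.0, 0.25, 0.75, 1.0, 1.0]
```

Only the two texels touching the seam move at radius 1.0; with radius 1.5 the next texel inward also receives a partial weight.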

Far from a seam = the original pixel, bit-exact; on a seam = maximum smoothing; in between = a smooth lerp. The operator is temporally stable and compatible with mipmapping, bilinear/trilinear filtering, and anisotropic filtering.

Mipmap awareness (effective mip space)

The block lattice is evaluated in effective mip space, not base-texture space: the shader recovers the mip level the hardware is actually sampling from the screen-space UV derivatives (dFdx/dFdy / ddx/ddy), snaps to the dominant mip, and evaluates the block grid and tap spacing in that mip's texel space. This is what makes the filter track the real, on-screen block seams at any zoom level or viewing angle, instead of only working at mip 0.
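The effective-mip step can be sketched on the CPU as follows. This assumes the common isotropic LOD approximation (log2 of the larger screen-space footprint, measured in base-level texels) and round-to-nearest as the "dominant mip"; the function names are hypothetical, and the reference shaders may differ in detail (for example under anisotropic filtering).

```python
import math

def effective_mip(dudx, dvdx, dudy, dvdy, base_w, base_h, max_lod):
    """Return (mip, mip_w, mip_h) for the mip the hardware is most likely sampling.

    Derivatives are in UV units per screen pixel, as dFdx/dFdy (ddx/ddy) return.
    """
    # Footprint of one screen pixel, in base-level texels, along each screen axis.
    fx = math.hypot(dudx * base_w, dvdx * base_h)
    fy = math.hypot(dudy * base_w, dvdy * base_h)
    rho = max(fx, fy, 1e-8)
    lod = min(max(math.log2(rho), 0.0), float(max_lod))
    mip = int(math.floor(lod + 0.5))            # snap to the dominant mip
    return mip, max(1, base_w >> mip), max(1, base_h >> mip)

def block_coords(u, v, mip_w, mip_h, block_w, block_h):
    """Position in the block lattice of the selected mip; integer part = block index."""
    return u * mip_w / block_w, v * mip_h / block_h

# 1024x1024 texture (max_lod = 10) viewed so that each pixel spans ~4 texels
mip, mw, mh = effective_mip(4 / 1024, 0.0, 0.0, 4 / 1024, 1024, 1024, 10)
print(mip, mw, mh)  # 2 256 256
```

Because each mip level is block-compressed on its own grid, the seams in mip 2 sit every 8 texels of the 256x256 level, not every 8 texels of the base level.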

Two Ways to Deblock

1. On the CPU, during transcoding (adaptive deblocking)

When transcoding ASTC/XUASTC LDR to other LDR formats (BC7, ETC1, raw pixels, etc.), the transcoder applies the deblocking filter on the CPU automatically for large block sizes. Each mipmap level is decompressed to memory, deblocked, then packed to the output format with our real-time encoders (bc7f, etc1f, etc.).

  • Default: enabled when the source block area is 80 texels or larger — i.e. 10x8, 10x10, 12x10, and 12x12 (BASISU_DEBLOCKING_BLOCK_SIZE_THRESHOLD in basisu_transcoder.h); smaller block sizes are not deblocked unless forced.
  • Memory note: adaptive deblocking temporarily decompresses a whole mip level; without it, the ASTC/XUASTC pipeline only ever decodes the minimum block rows needed. Disable deblocking if this temporary memory matters.
  • CPU deblocking is used when transcoding to a format other than ASTC (an ASTC→ASTC transcode passes the blocks through).
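The 80-texel threshold can be checked against the 2D ASTC block footprints, and the memory note put in rough numbers. This sketch assumes a 4-byte-per-texel (RGBA8) scratch buffer and that the non-deblocking path holds one row of blocks; the transcoder's actual intermediate format and buffer sizes may differ.

```python
# ASTC 2D block footprints (width x height).
ASTC_2D_BLOCKS = [(4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6), (8, 8),
                  (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12)]
THRESHOLD_AREA = 80  # documented default of BASISU_DEBLOCKING_BLOCK_SIZE_THRESHOLD

deblocked_by_default = [f"{w}x{h}" for w, h in ASTC_2D_BLOCKS if w * h >= THRESHOLD_AREA]
print(deblocked_by_default)  # ['10x8', '10x10', '12x10', '12x12']

def temp_bytes(level_w, level_h, block_h, deblock, bytes_per_texel=4):
    """Rough scratch size: a whole mip level when deblocking, else one block row."""
    rows = level_h if deblock else min(block_h, level_h)
    return level_w * rows * bytes_per_texel

print(temp_bytes(4096, 4096, 12, True), temp_bytes(4096, 4096, 12, False))
# 67108864 196608
```

Under these assumptions a 4096x4096 level needs about 64 MiB of scratch with deblocking versus about 192 KiB without, which is the trade-off the memory note describes.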

2. On the GPU, at sample time (shader deblocking)

A ~95-line pixel shader applies the same operator while sampling: 1 tap normally, 1+4 taps near block edges. Any graphics programmer comfortable with shaders can integrate it into an engine. This is the preferred deployment for interactive rendering: there is no transcode-time memory or time cost, and the filter runs in whatever space the GPU is actually sampling.
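How often the 5-tap path runs depends on the block size and falloff radius. The count below assumes a linear weight ramp (weight 1 on a texel touching a block edge, reaching 0 one `radius` further in); the shader's exact cutoff may differ, so treat the numbers as an estimate of fetch cost, not a measurement.

```python
def taps_per_sample(block_w, block_h, radius):
    """Return (fraction of texels near an edge, average fetches per sample)."""
    def near_edge(off, block):
        dist = min(off + 0.5, block - off - 0.5)   # texel center to nearest block edge
        return 1.0 - (dist - 0.5) / radius > 0.0   # assumed ramp gives nonzero weight

    edge = sum(1 for y in range(block_h) for x in range(block_w)
               if near_edge(x, block_w) or near_edge(y, block_h))
    frac = edge / (block_w * block_h)
    return frac, 1 + 4 * frac                       # 1 tap inside, 1+4 taps near edges

for r in (1.0, 1.5):
    for b in (6, 8, 12):
        frac, taps = taps_per_sample(b, b, r)
        print(f"{b}x{b} r={r}: {frac:.0%} of texels near an edge, {taps:.2f} fetches avg")
```

Larger blocks spend proportionally fewer samples on the 5-tap path, while a larger falloff radius widens the band around each seam.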

Reference implementations, all using the same operator:

| Sample | API | Notes |
|---|---|---|
| shader_deblocking (Python) | Python + OpenGL | The original testbed; also supports raw PNG input for experimentation |
| shader_deblocking_glfw | C++ / OpenGL 3.3 | Native port; bin/shader.glsl is the reference GLSL |
| webgl/shader_deblocking | WebGL 2 | Live demo here |
| shader_deblocking_d3d11 | C++ / Direct3D 11 | bin/deblock.hlsl has the operator as a reusable DeblockSample() HLSL function |

To integrate: lift the fragment shader from shader.glsl (GLSL) or the DeblockSample() function from deblock.hlsl (HLSL) verbatim. The shader needs four inputs: the texture's base-mip dimensions, the source block size in texels, maxLod (= mip levels − 1), and the falloff tolerance.
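On the application side, the four inputs can be gathered in one place before they are uploaded as shader constants. This is a minimal sketch. The class and field names are hypothetical and are not the uniform names used in shader.glsl or deblock.hlsl. maxLod is derived from the mip count as described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeblockParams:
    """Hypothetical container for the four values the deblocking shader needs."""
    base_width: int       # base-mip (level 0) width in texels
    base_height: int      # base-mip (level 0) height in texels
    block_width: int      # SOURCE block width (e.g. from the KTX2 header)
    block_height: int     # SOURCE block height
    max_lod: int          # mip levels - 1
    falloff: float = 1.0  # 1.0 for flat 2D viewers, 1.5 for 3D / minified use

    @staticmethod
    def from_texture(w, h, block_w, block_h, mip_levels, for_3d=False):
        if mip_levels < 1:
            raise ValueError("a texture has at least one mip level")
        return DeblockParams(w, h, block_w, block_h, mip_levels - 1,
                             1.5 if for_3d else 1.0)

p = DeblockParams.from_texture(2048, 1024, 8, 8, mip_levels=12, for_3d=True)
print(p.max_lod, p.falloff)  # 11 1.5
```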

The Two Rules Integrators Must Follow

1. The block size is a property of the encoded SOURCE, not the GPU format. Transcoding changes the container, not the artifact lattice: an 8x8 XUASTC source transcoded to BC7 (4x4 storage) — or even to uncompressed RGBA — still carries seams on the 8x8 source grid, and the filter must run at 8x8. Always pass the block size from the KTX2 header (ktx2_transcoder::get_block_width()/get_block_height()), never the transcode target's block size. Getting this wrong means the filtering won't align with the actual block artifacts.
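What goes wrong is easy to see by comparing seam positions along one row. With an 8x8 source transcoded to BC7 (4x4 storage), a filter driven by the target block size smooths every real seam plus as many false ones. With a 10x10 source, which is not a multiple of 4, it also misses real seams entirely. A minimal sketch:

```python
def seam_columns(width, block_w):
    """Interior x positions where a block boundary falls (between x - 1 and x)."""
    return set(range(block_w, width, block_w))

width = 64
source_seams = seam_columns(width, 8)       # 8x8 XUASTC source: the real artifact lattice
target_seams = seam_columns(width, 4)       # BC7 storage blocks: not where artifacts are
false_seams = target_seams - source_seams   # smoothed needlessly by a 4x4 filter
print(len(source_seams), len(target_seams), len(false_seams))  # 7 15 8

missed = seam_columns(60, 10) - seam_columns(60, 4)  # 10x10 source seams a 4x4 filter skips
print(sorted(missed))  # [10, 30, 50]
```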

2. Never deblock twice. Exactly one of the two deployment modes should run. If the GPU shader performs the deblocking, pass cDecodeFlagsNoDeblockFiltering to the transcoder so the CPU doesn't also filter — running both double-filters the image.
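The rule is easiest to enforce at a single decision point in the application. This is a minimal sketch. The enum and function are hypothetical, and the decode flag is referred to by name because its numeric value is defined in basisu_transcoder.h.

```python
from enum import Enum

NO_DEBLOCK = "cDecodeFlagsNoDeblockFiltering"

class DeblockMode(Enum):
    GPU_SHADER = "gpu"      # the shader deblocks at sample time
    CPU_TRANSCODE = "cpu"   # the transcoder's adaptive deblocking does it
    OFF = "off"             # neither, e.g. to inspect raw blocks

def configure(mode: DeblockMode):
    """Return (shader_enabled, decode_flag_names) so that at most one stage filters."""
    if mode is DeblockMode.GPU_SHADER:
        return True, {NO_DEBLOCK}    # keep the CPU from filtering a second time
    if mode is DeblockMode.CPU_TRANSCODE:
        return False, set()          # default adaptive behavior (large blocks only)
    return False, {NO_DEBLOCK}

for mode in DeblockMode:
    shader_on, flags = configure(mode)
    cpu_may_filter = NO_DEBLOCK not in flags
    assert not (shader_on and cpu_may_filter), "double deblocking"
    print(mode.value, shader_on, sorted(flags))
```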

The DeblockFilterID KTX2 Metadata

Basis Universal writes a deblocking filter ID key/value into the .KTX2 file, readable via ktx2_transcoder::get_deblocking_filter_index(). When it is 1, the texture was compressed with deblocking-aware (in-loop) encoding and viewers should enable the deblocking filter by default (the user can still toggle it). All four samples honor this contract: they auto-enable the GPU shader when the ID is 1, and pass the no-deblock flag at transcode time.
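The viewer side of this contract can be modeled as a small piece of state: the default comes from the file, the user can override it, and the transcode flag stays fixed because the shader is the only deblocker. This sketch assumes the no-deblock flag is passed regardless of the toggle, which is how the sentence above reads. The class is hypothetical, and `filter_id` stands in for the value returned by ktx2_transcoder::get_deblocking_filter_index().

```python
class ViewerDeblockState:
    """Hypothetical viewer state honoring the DeblockFilterID contract."""

    NO_DEBLOCK_FLAG = "cDecodeFlagsNoDeblockFiltering"

    def __init__(self, filter_id: int):
        self.filter_id = filter_id
        self.shader_enabled = (filter_id == 1)  # in-loop encoded: deblock by default

    def toggle(self):
        self.shader_enabled = not self.shader_enabled  # user override

    def transcode_flags(self):
        # The GPU shader is the only deblocker, so CPU deblocking is always off.
        return {self.NO_DEBLOCK_FLAG}

v = ViewerDeblockState(filter_id=1)
print(v.shader_enabled)  # True
v.toggle()
print(v.shader_enabled)  # False
```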

Controlling Deblocking

Transcoder decode flags

  • cDecodeFlagsNoDeblockFiltering — disable all adaptive deblocking (faster, less temporary memory; required when a GPU shader deblocks instead).
  • cDecodeFlagsForceDeblockFiltering — always deblock, even on small/medium block sizes (by default only 10x8 and larger are deblocked).
  • cDecodeFlagsStrongerDeblockFiltering — stronger deblocking filter coefficients (only used when deblocking is enabled).
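The way these flags adjust the transcoder's default can be summarized as a small decision function. This models the behavior described on this page and is not the transcoder's code. Flags are passed by name. Treating No + Force together as an error is this sketch's own choice, because the documentation does not say which one wins.

```python
DEFAULT_MIN_AREA = 80  # by default only 10x8 and larger are deblocked

def cpu_deblock_plan(block_w, block_h, target_is_astc, flags=frozenset()):
    """Return (deblock, stronger) for a CPU transcode under the given decode flags."""
    no = "cDecodeFlagsNoDeblockFiltering" in flags
    force = "cDecodeFlagsForceDeblockFiltering" in flags
    if no and force:
        raise ValueError("conflicting deblocking flags")  # sketch's choice
    if target_is_astc or no:
        return False, False  # ASTC blocks pass through, or deblocking is disabled
    deblock = force or block_w * block_h >= DEFAULT_MIN_AREA
    # The stronger coefficients only matter when deblocking actually runs.
    stronger = deblock and "cDecodeFlagsStrongerDeblockFiltering" in flags
    return deblock, stronger

print(cpu_deblock_plan(6, 6, False))
print(cpu_deblock_plan(6, 6, False, {"cDecodeFlagsForceDeblockFiltering",
                                     "cDecodeFlagsStrongerDeblockFiltering"}))
```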

Command line (basisu tool)

Transcode-time (unpack/validate): -transcode_no_deblocking (alias -transcode_disable_deblocking) and -transcode_force_deblocking.

Encode-time, XUASTC LDR only — controls deblocking awareness (SCD) and/or the filter applied during compression (default -xuastc_deblocking_largest; each flag also has a -xuastc_ldr_* alias):

  • -xuastc_no_deblocking — disable both SCD and filtering.
  • -xuastc_deblocking_largest — SCD + filtering only on the largest block sizes (default).
  • -xuastc_deblocking_all — SCD + filtering on all block sizes.
  • -xuastc_deblocking_scd_no_filtering — SCD only, no filtering.
  • -xuastc_deblocking_no_scd_filtering_largest / _all — filtering without SCD.
  • -xuastc_deblocking_num_passes X — number of deblocking filter passes.

A useful recipe

For XUASTC LDR targeting a non-ASTC format such as BC7, very low Weight Grid DCT quality factors (1-15) become usable if you force adaptive deblocking on all block sizes (cDecodeFlagsForceDeblockFiltering) and enable stronger deblocking (cDecodeFlagsStrongerDeblockFiltering). Deblocking is what makes the very lowest bitrates (and the largest block sizes) practical.
