SPIR-V module binary size / compression #382
We've heard reports that SPIR-V modules are larger than those compiled to other representations.
First, the SPIR-V binary encoding is extremely regular and designed to be very simple to handle (the SPIRV-Tools binary parser, for example, is simple and nearly stateless), but that regularity comes with a lot of redundancy.
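To make that regularity concrete, here is a minimal sketch (mine, not code from SPIRV-Tools) of walking the format: a module starts with a fixed 5-word header, and every instruction's first word packs the word count in the high 16 bits and the opcode in the low 16 bits.

```python
def iter_instructions(words):
    """Walk a SPIR-V module: a 5-word header, then instructions whose
    first word packs (word_count << 16) | opcode."""
    assert words[0] == 0x07230203  # SPIR-V magic number
    i = 5  # skip header: magic, version, generator, ID bound, schema
    while i < len(words):
        word_count = words[i] >> 16
        opcode = words[i] & 0xFFFF
        yield opcode, words[i:i + word_count]
        i += word_count

# A tiny hand-built module: header, then OpCapability Shader (opcode 17)
# and OpMemoryModel Logical GLSL450 (opcode 14).
module = [0x07230203, 0x00010000, 0, 100, 0,
          (2 << 16) | 17, 1,
          (3 << 16) | 14, 0, 1]
print([op for op, _ in iter_instructions(module)])  # [17, 14]
```

The stateless walk above is essentially all a parser needs, which is exactly why the encoding carries redundancy a smarter tool could squeeze out.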
To make smaller binaries, we need to make tools smarter: emit less redundant info in the first place, build tools that eliminate redundancy (while still producing valid SPIR-V binaries), and provide semantically lossless compression and decompression.
This issue is a brain dump of a few ideas along these lines. (Keep in mind that SPIRV-Tools must remain unencumbered, including under a possible relicensing to the Apache 2 license.)
Random ideas include those that leave the result as valid SPIR-V binary:
Generic compression ideas:
Low level encoding ideas (stateless):
Anyway, this is just a start of what we could do.
Good stuff! In my mind, there are several goals and several approaches here:
Given a SPIR-V program:
I'm playing around with various approaches for the last item -- the "varint" encoding you mentioned; I'm also going to try delta-encoding the IDs (in almost all the programs I have here, the majority of IDs are within 1-2 of IDs used in the previous instruction).
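A rough sketch of what combining those two ideas could look like (the zigzag step, which keeps small negative deltas small, is my assumption for illustration, not necessarily what any real encoder does):

```python
def encode_varint(value, out):
    # LEB128-style varint: 7 payload bits per byte, high bit = "more follows"
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return

def encode_ids(ids):
    """Delta-encode each ID against the previous one, zigzag the delta,
    then varint it -- nearby IDs become single bytes."""
    out, prev = bytearray(), 0
    for i in ids:
        delta = i - prev
        encode_varint((delta << 1) ^ (delta >> 63), out)  # zigzag
        prev = i
    return bytes(out)

def decode_ids(data):
    ids, prev, pos = [], 0, 0
    while pos < len(data):
        value = shift = 0
        while True:
            b = data[pos]; pos += 1
            value |= (b & 0x7F) << shift
            shift += 7
            if not b & 0x80:
                break
        delta = (value >> 1) ^ -(value & 1)  # undo zigzag
        prev += delta
        ids.append(prev)
    return ids

ids = [10, 11, 13, 12, 14]     # mostly-adjacent IDs, like real modules
packed = encode_ids(ids)
print(len(packed), decode_ids(packed) == ids)  # 5 True  (1 byte per ID vs 4)
```

Since most deltas are in the 1-2 range, most IDs collapse from a 32-bit word to one byte.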
I have also seen some fairly unexpected stats from the shaders I have, e.g. OpVectorShuffle takes up a lot of total space - it's a very verbose encoding (very often 9 words), in most cases representing just a swizzle or a scalar splat. I have to dig in more to find whether there are similar patterns that could be either encoded more compactly, or encoded in a way that's more compressible.
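For illustration, a 9-word OpVectorShuffle spends four full 32-bit words on component selectors that each only range over 0..7 (indexing across the two vec4 operands). A hypothetical compact encoding (not any tool's actual format) could pack all four into 12 bits:

```python
def pack_swizzle(components):
    """Pack up to 4 shuffle selectors, each 0..7, into 2 bytes
    instead of four full 32-bit words."""
    packed = 0
    for i, c in enumerate(components):
        assert 0 <= c < 8
        packed |= c << (3 * i)
    return packed.to_bytes(2, "little")

def unpack_swizzle(data, count):
    packed = int.from_bytes(data, "little")
    return [(packed >> (3 * i)) & 7 for i in range(count)]

swz = [3, 2, 1, 0]  # a .wzyx swizzle
assert unpack_swizzle(pack_swizzle(swz), 4) == swz
```

That turns 16 bytes of selectors into 2, before any general-purpose compressor even runs.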
With shader variants, you'd probably find that the same basic blocks appear over and over, with maybe just shifted IDs. It could be interesting to have an archive format where SPIR-V files would do something like OpLabelLink $hash $id-shift, and basic blocks could be inlined at runtime before passing to the driver.
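To sketch that idea: canonicalize each block's IDs relative to the block's smallest ID, so two copies that differ only by a uniform ID shift hash identically and can be stored once. This is purely illustrative; in a real tool the `id_positions` marking which words are IDs would come from the parser, and `OpLabelLink` is the hypothetical instruction proposed above.

```python
import hashlib

def block_key(words, id_positions):
    """Hash a basic block with its IDs rebased to the block's smallest ID,
    so blocks differing only by a uniform ID shift get the same hash."""
    base = min(words[p] for p in id_positions)
    canon = list(words)
    for p in id_positions:
        canon[p] -= base
    digest = hashlib.sha1(repr(canon).encode()).hexdigest()
    return digest, base  # store the block once under digest; emit (digest, base)

# Two fake "blocks", identical except every ID is shifted by 10:
a = [100, 7, 101, 100]
b = [110, 7, 111, 110]
ka, kb = block_key(a, [0, 2, 3]), block_key(b, [0, 2, 3])
print(ka[0] == kb[0], kb[1] - ka[1])  # True 10
```

The archive would then keep one copy of the block body per digest, and each variant just records its ID shift.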
Experimenting with varint encoding + some delta encoding. Results look quite promising. Messy code on github (https://github.com/aras-p/smol-v -- Win/Mac builds)
But, testing on 113 shaders I have right now (caveat emptor: all produced by HLSL -> d3dcompiler -> DX11 bytecode -> HLSLcc -> GLSL -> glslang, so they might have some patterns that aren't "common" elsewhere):
There are a lot more instructions I could be encoding (so far I've just looked at the ones taking up the most space), and perhaps other tricks could be done. The shaders I have do carry debug names; I am not stripping them out.
(edit: updated with August 28 results)
@aras-p Very promising results! Thanks for sharing.
I also agree that we have to be mindful of using a reasonable tuning set of shaders. Here's a good project for someone: make a public repository of example shaders and define meaningful tuning sets over them.
Other encoding ideas:
(This reminds me of writing assembly for the 6502.)
Good stuff, thanks.
I also want to put in a plug for another dimension to generate less SPIR-V to begin with.
There are two big ones SPIR-V is designed for:
For the first, if multiple GLSL shaders are being generated with different values of constants (e.g., fixed numbers of elements to process, or bools turning features on/off), it is possible instead to make one GLSL shader with specialization constants and wait until app run time to provide the actual constant values needed:
Single shader with specialization:
This turns multiple shaders into a single shader, long before compression even comes into play.
For the second point, @dneto0 already touched on it with:
It's possible that enough ID remapping and cross-file compression would recognize the commonality (it would be good to find out how much that is happening), but if not, two other approaches would help:
As a place to look for inspiration, the WebAssembly group may have some relevant ideas. They've iterated heavily on efficient instruction encodings that are easy to parse and compress well. They ended up with LEB128 varint encoding for most things, as well as some ways of reducing long instruction encodings (like the 9-word swizzle mentioned above). Some docs are here, but if anyone's interested they could ping the group and chat - I'm sure they'd be willing to share what they learned along the way :)
This is an important design constraint to be aware of. The question is whether anything not "off the shelf" is needed on the target (end user) system. Applies to both decompression and denormalization.
Also key is whether multiple files are seen just at compression time, or earlier at normalization/remapping time.
The fuller taxonomy is more like:
All combinations make sense. The remapper was indeed intentionally targeting the combination of
These are all constraints, and certainly lifting any of them would enable a tool to perform better.
So, I'm curious to what extent gains were seen by lifting the constraints and to what extent by finding more ways of doing better normalization.
I wrote up what I did so far here: http://aras-p.info/blog/2016/09/01/SPIR-V-Compression/
And indeed, the combination I chose is somewhat different from the remapper. I did this:
Now, my "normalization/denormalization" step also makes it smaller, so you could view it as some sort of compression too. But it's not a dictionary/entropy compression, so you can still compress it afterwards with regular off-the shelf compressors.