Literal string endianness is handled incorrectly #149
Comments
Sorry, I'm having difficulty parsing this. Can you describe more specifically this incorrect assumption regarding writing words from memory to disk? And why doesn't it apply to all words, not just strings?
Ok, I'll try. Endianness is hard!

Let's say that a big-endian machine is used to create a SPIR-V module. Furthermore, let's say that this module contains the literal string "foo" as an operand somewhere. When creating the stream of 32-bit words that comprise the SPIR-V, it follows the spec text that I quoted, putting "foo" in a word in little-endian order such that 'f' is in the lowest-order 8 bits of the word, i.e. the word has the value 0x006F6F66.

Now, let's suppose that a program on a little-endian machine wants to parse this SPIR-V module. It loads the module from disk, reinterprets the array of bytes as an array of 32-bit integers, and since the endianness is swapped compared to the machine that produced the module, it notices that the magic word is swapped compared to what it expects. So it does the natural and expected thing, swapping each of the words until it has an array of 32-bit words that's the same as the original array that the big-endian machine had -- never mind the fact that the bytes are stored in a different order.

Now, when it wants to convert our stream of words to a normal string, it knows that the string is stored in little-endian format, matching the endianness of the machine, so it can simply reinterpret the array of words as an array of bytes and get a C-style string. Note that although the bytes of the string happened to be swapped on-disk, we swapped them when loading the module, so everything is a-ok. If we were on a big-endian machine, then at this point we would've had to swap the bytes to get the string (perhaps a second time, yes, but what can you do...).

Note that if this is a game passing the SPIR-V to a Vulkan driver, then the first part is done by the game loading the module from disk, and the second part by the driver consuming the module handed to it. SPIR-V Tools doesn't work like this, though.
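To make the second stage described above concrete, here is a minimal sketch of extracting a literal string from words that are already in host order. It uses shifts instead of reinterpreting memory, so it behaves the same on little-endian and big-endian hosts and avoids the second swap mentioned above. `DecodeLiteralString` is a hypothetical name for illustration, not a SPIRV-Tools function.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of SPIRV-Tools): decodes a NUL-terminated
// literal string from 32-bit words that are already in host order.
// Octet i of each word occupies bits [8*i, 8*i + 7], so the first octet
// is in the lowest-order 8 bits, as the spec text requires.
std::string DecodeLiteralString(const std::vector<uint32_t>& words) {
  std::string result;
  for (uint32_t word : words) {
    for (int i = 0; i < 4; ++i) {
      const char c = static_cast<char>((word >> (8 * i)) & 0xFFu);
      if (c == '\0') return result;  // terminator reached
      result.push_back(c);
    }
  }
  return result;  // no terminator found; return what was read
}
```

With this sketch, the single word 0x006F6F66 decodes to "foo" regardless of the byte order of the machine running the code.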
Let's see what happens when we take this module that we have and try to disassemble it using

Now, note that in the first two parts, where I described how our hypothetical producer and consumer handled the literal string, when I got to the "words -> disk" and "disk -> words" parts I said that how they handled it is "natural and expected" rather than something like "required by the spec." The SPIR-V specification intentionally doesn't specify how a SPIR-V module should be converted to/from byte-oriented mediums, such as on-disk storage, but it does strongly suggest that the way SPIRV-Tools is doing it is wrong in the case of literal strings. SPIRV-Tools pretty much ignores the word-based nature of SPIR-V when dealing with strings, converting the bytes on disk directly to bytes in a C-style string and vice-versa. You could, entirely post-hoc, derive a way to convert the byte stream on disk to/from a word stream which treats string literals differently from other words so that the way SPIRV-Tools is doing things would be "correct," but it would be ugly and it would make loading modules produced by SPIRV-Tools from disk and feeding them to a driver more complicated. That's because the two parts of parsing a module on-disk, converting the bytes on-disk to 32-bit words and then extracting the desired information from those words, are done separately, the former by the app and the latter by the driver, whereas

For example, let's say that a game on a big-endian machine wants to load a module produced by

So, what should be done? There are a few changes that need to be made to SPIRV-Tools:
Does any of this make sense now? This will break the parsing of modules created by
Thanks, @cwabbott0! This prompted some internal discussion on the team, and our conclusion is that the tools do, indeed, seem to violate the current letter of the spec. I think the issue can be stated more succinctly by leaving out the memory-to-disk process and simply focusing on byte addresses in memory: if the word containing "foo" begins at, say, byte address 0x1000, then the spec requires that it be stored like this:
(BE = big endian, LE = little endian)

This is not how the SPIR-V tools (nor glslang, apparently!) currently pack strings, so that's a spec violation. However, we do find this spec requirement odd, and we filed this spec bug to express our concerns. If it turns out that this is really the spec's intent, we'll fix the SPIR-V tools code and close this issue.
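The in-memory layout discussed in the comment above follows directly from the spec wording: the word holding "foo" has the value 0x006F6F66, and the byte found at each address depends only on the host's byte order. The sketch below computes that layout; `ByteOrder` and `BytesInMemory` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstdint>

enum class ByteOrder { kLittle, kBig };

// Hypothetical helper: returns the four bytes of `word` in increasing
// address order, as a host with the given byte order would store them.
//
// For the word 0x006F6F66 ("foo") stored at address 0x1000:
//   address   LE host   BE host
//   0x1000    'f'       '\0'
//   0x1001    'o'       'o'
//   0x1002    'o'       'o'
//   0x1003    '\0'      'f'
std::array<uint8_t, 4> BytesInMemory(uint32_t word, ByteOrder order) {
  std::array<uint8_t, 4> bytes{};
  for (int i = 0; i < 4; ++i) {
    // A little-endian host puts the lowest-order byte at the lowest address;
    // a big-endian host puts the highest-order byte there.
    const int shift = (order == ByteOrder::kLittle) ? 8 * i : 8 * (3 - i);
    bytes[i] = static_cast<uint8_t>((word >> shift) & 0xFFu);
  }
  return bytes;
}
```

Only on a little-endian host do the bytes in memory read "foo" in order, which is why packing strings byte-by-byte without regard to words happens to give the right result there and the wrong one on big-endian hosts.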
@dekimir Ok, thanks. I'll wait for johnk to comment on the spec bug, but I believe that the requirement was intentional. I also made KhronosGroup/glslang#202 to track the glslang issue.
The SPIR working group confirmed that the spec will stay the same, and this indeed is a bug in SPIRV-Tools.
I stumbled upon this as well when testing SPIRV-Tools on big-endian output from my SPIR-V encoder/decoder. Also, when passing it to Vulkan, it seems that the endianness of the input has to match the system's endianness. I suppose it's rare to have a big-endian platform with Vulkan?
To get correct and consistent encoding and decoding of string literals on big-endian platforms, use spvtools::utils::MakeString and MakeVector (or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)

... and apply those wrappers throughout the code.

Fixes KhronosGroup#149
* Fix endianness of string literals

  To get correct and consistent encoding and decoding of string literals on big-endian platforms, use spvtools::utils::MakeString and MakeVector (or wrapper functions) consistently for handling string literals.

  - add variant of MakeVector that encodes a string literal into an existing vector of words
  - add variants of MakeString
  - add a wrapper spvDecodeLiteralStringOperand in source/
  - fix wrapper Operand::AsString to use MakeString (source/opt)
  - remove Operand::AsCString as broken and unused
  - add a variant of GetOperandAs for string literals (source/val)

  ... and apply those wrappers throughout the code.

  Fixes #149

* Extend round trip test for StringLiterals to flip word order

  In the encoding/decoding roundtrip tests for string literals, include a case that flips byte order in words after encoding and then checks for successful decoding. That is, on a little-endian host flip to big-endian byte order and then decode, and vice versa.

* BinaryParseTest.InstructionWithStringOperand: also flip byte order

  Test binary parsing of string operands both with the host's and with the reversed byte order.
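As an illustration of what "encodes a string literal into an existing vector of words" can look like, here is a minimal sketch. It is not the SPIRV-Tools implementation: `AppendLiteralString` is a hypothetical name, and the only behavior assumed is the packing rule quoted from the spec (four octets per word, first octet in the lowest-order 8 bits, NUL-terminated).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in (not the SPIRV-Tools API): packs `text` plus its
// NUL terminator into 32-bit words, four octets per word with the first
// octet in the lowest-order 8 bits, and appends them to `words`.
void AppendLiteralString(const std::string& text,
                         std::vector<uint32_t>* words) {
  const std::size_t total = text.size() + 1;  // include the terminator
  uint32_t word = 0;
  for (std::size_t i = 0; i < total; ++i) {
    const uint32_t octet =
        (i < text.size()) ? static_cast<uint8_t>(text[i]) : 0u;
    word |= octet << (8 * (i % 4));
    if (i % 4 == 3) {  // word is full: emit it and start a new one
      words->push_back(word);
      word = 0;
    }
  }
  if (total % 4 != 0) words->push_back(word);  // zero-padded last word
}
```

Because the words are built with shifts, the resulting values are the same on every host. A four-character string such as "abcd" needs a second, all-zero word to hold the terminator.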
Add SPV_KHR_ray_{tracing,query} to headers
The code currently assumes that strings can be directly read in and written to disk without being swapped, but this isn't true. The SPIR-V spec, in the definition of "literal string" (section 2.2.1), says that "the UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word)." That is, literal strings are defined in terms of words, just like everything else in SPIR-V. While the way that bytes on-disk get transformed to words isn't defined by the spec, it wouldn't make any sense to define it "just so" so that the way SPIR-V Tools is currently parsing the SPIR-V would be correct, since it would make the definition in section 2.2.1 confusing and meaningless, and I can say that that wasn't the intent when the language in section 2.2.1 was added. That is, it's expected that when reading from byte-oriented formats, if the magic word is flipped, then the entire module is flipped. Practically, this also makes loading SPIR-V from disk in games much easier, since they don't need to know about what's a string vs. what isn't -- just check the magic word and flip the bytes if necessary.
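The "check the magic word and flip the bytes if necessary" step can be sketched as follows. This is an illustration under stated assumptions rather than how any particular loader works: `BytesToHostWords` and `SwapBytes` are hypothetical names, and the only SPIR-V fact relied on is the magic number 0x07230203.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t kSpirvMagic = 0x07230203u;  // SPIR-V magic number

// Reverses the order of the four bytes in a word.
uint32_t SwapBytes(uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
         ((w << 8) & 0x00FF0000u) | (w << 24);
}

// Hypothetical loader: converts raw file bytes into host-order words.
// If the magic word appears byte-swapped, every word in the module is
// swapped, string operands included. Returns false if the data does not
// start with a recognizable magic word.
bool BytesToHostWords(const std::vector<uint8_t>& bytes,
                      std::vector<uint32_t>* words) {
  if (bytes.size() < 4 || bytes.size() % 4 != 0) return false;
  words->resize(bytes.size() / 4);
  std::memcpy(words->data(), bytes.data(), bytes.size());
  if ((*words)[0] == kSpirvMagic) return true;  // already host order
  if (SwapBytes((*words)[0]) != kSpirvMagic) return false;
  for (uint32_t& w : *words) w = SwapBytes(w);  // flip the whole module
  return true;
}
```

The loader never needs to know which words belong to a string: a file written by a little-endian producer and one written by a big-endian producer both end up as the same word stream.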