Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Literal string endianness is handled incorrectly #149

Closed
cwabbott0 opened this issue Mar 13, 2016 · 6 comments · Fixed by #4622
Closed

Literal string endianness is handled incorrectly #149

cwabbott0 opened this issue Mar 13, 2016 · 6 comments · Fixed by #4622

Comments

@cwabbott0
Copy link

The code currently assumes that strings can be directly read in and written to disk without being swapped, but this isn't true. The SPIR-V spec, in the definition of "literal string" (section 2.2.1), says that "the UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word)." That is, literal strings are defined in terms of words, just like everything else in SPIR-V. While the way that bytes on-disk get transformed to words isn't defined by the spec, it wouldn't make any sense to define it "just so" so that the way SPIR-V Tools is currently parsing the SPIR-V would be correct, since it would make the definition in section 2.2.1 confusing and meaningless, and I can say that that wasn't the intent when the language in section 2.2.1 was added. That is, it's expected that when reading from byte-oriented formats, if the magic word is flipped, then the entire module is flipped. Practically, this also makes loading SPIR-V from disk in games much easier, since they don't need to know about what's a string vs. what isn't -- just check the magic word and flip the bytes if necessary.

@dekimir
Copy link

dekimir commented Mar 14, 2016

While the way that bytes on-disk get transformed to words isn't defined by the spec, it wouldn't make any sense to define it "just so" so that the way SPIR-V Tools is currently parsing the SPIR-V would be correct

Sorry, I'm having difficulty parsing this. Can you describe more specifically this incorrect assumption regarding writing words from memory to disk? And why doesn't it apply to all words, not just strings?

@cwabbott0
Copy link
Author

Sorry, I'm having difficulty parsing this. Can you describe more specifically this incorrect assumption regarding writing words from memory to disk? And why doesn't it apply to all words, not just strings?

Ok, I'll try. Endianness is hard!

Let's say that a big-endian machine is used to create a SPIR-V module. Furthermore, let's say that this module contains the literal string "foo" as an operand somewhere. When creating the stream of 32-bit words that comprise the SPIR-V, it follows the spec. text that I quoted, putting "foo" in a word in little-endian order such that word & 0xFF == 'f', (word >> 8) & 0xFF == 'o', (word >> 16) & 0xFF == 'o', and (word >> 24) & 0xFF == '\0' and then appends the word to the array. When the time comes to write the module to disk, which is a byte-oriented medium, it does the natural and expected thing and reinterprets this array of words as an array of bytes. Since it's a big-endian machine, the highest byte of each word appears first in the stream. That is, if we look at our literal string, it will start with '\0', followed by 'o', etc. -- the string looks like "\0oof" on-disk. That's because the two kinds of endianness in play here -- the endianness which says that to represent a string as a stream of words, you put the first byte in bits 0-7 of the word, and the endianness which says that to represent the stream of words as a stream of bytes, you consider bits 24-31 of the first word to be the first byte -- are different.

Now, let's suppose that a program on a little-endian machine wants to parse this SPIR-V module. It loads the module from disk, reinterprets the array of bytes as an array of 32-bit integers, and since the endianness is swapped compared to the machine that produced the module, it notices that the magic word is swapped compared to what it expects. So it does the natural and expected thing, swapping each of the words until it has an array of 32-bit words that's the same as the original array that the big-endian machine had -- nevermind the fact that the bytes are stored in a different order. Now, when it wants to convert our stream of words to a normal string, it knows that the string is stored in little-endian format, matching the endianness of the machine, so it can simply reinterpret the array of words as an array of bytes and get a C-style string. Note that although the bytes of the string happened to be swapped on-disk, we swapped them when loading the module, so everything is a-ok. If we were on a big-endian machine, then at this point we would've had to swap the bytes to get the string (perhaps a second time, yes, but what can you do...). Note that if this is a game passing the SPIR-V to a Vulkan driver, then the first part is done by the game loading the module from disk, and the second part by the driver consuming the module handed to it.

SPIR-V Tools doesn't work like this, though. Let's see what happens when we take this module that we have and try to disassemble it using spirv-dis on the same little-endian machine. The first bit starts out the same -- it reads the bytes from disk, reinterprets them as an array of words, and notices that the first word is flipped. Here is where things start to diverge, though. Rather than byteswapping all the words immediately after loading them, it swaps words only on-demand. This would be ok, except that it fails to swap literal strings, and only literal strings, at all -- either the first step, of turning the bytes into words, or the second step of turning the words into a string. It just takes the untouched words, reinterpreted from bytes when loading the module, and reinterprets them again as a C-style string. But, if you remember, the bytes happened to be swapped on-disk, and so the resulting string will be scrambled -- spirv-dis will see that it starts with '\0' and think it's an empty string. Oops. I think, but haven't checked, that spirv-as is also getting things backwards, such that a module produced by spirv-as will be read correctly by spirv-dis, but not anything else.

Now, note that in the first two parts, where I described how our hypothetical producer and consumer handled the literal string, when I got to the "words -> disk" and "disk -> words" parts I said that how they handled it is "natural and expected" rather than something like "required by the spec." The SPIR-V specification intentionally doesn't specify how a SPIR-V module should be converted to/from byte-oriented mediums, such as on-disk storage, but it does strongly suggest that the way SPIRV-Tools is doing it is wrong in the case of literal strings. SPIRV-Tools pretty much ignores the word-based nature of SPIR-V when dealing with strings, directly converting the bytes on-disk directly to bytes in a C-style string and vice-versa. You could, entirely post-hoc, derive a way to convert the byte stream on disk to/from a word stream which treats string literals differently from other words so that the way SPIRV-Tools is doing things would be "correct," but it would be ugly and it would make loading modules produced by SPIRV-Tools from disk and feeding them to a driver more complicated. That's because the two parts of parsing a module on-disk, converting the bytes on-disk to 32-bit words and then extracting the desired information from those words, are done separately, the former by the app and the latter by the driver, whereas spirv-dis handles both at once (or, as the case may be, not at all!).

For example, let's say that a game on a big-endian machine wants to load a module produced by spirv-as also on a big-endian machine and hand it to a Vulkan driver. Normally, there wouldn't be any endian conversion required when going from bytes to words, since the on-disk endianness and host endianness match. The driver knows, though, that it has to swap bytes to reconstruct any literal strings from the stream of words handed to it. But since spirv-as just copied literal strings directly to disk with no swapping, the app now now has to byteswap literal strings, and only literal strings, to undo the swapping that the driver is going to do. If the app is on a little-endian machine instead, then the opposite happens -- the app has to swap everything except the strings. This happens because there's an impedance mismatch between SPIR-V Tools, which thinks of literal strings, and only literal strings, in terms of bytes, and the driver, which thinks of everything in terms of words. While the most pedantic person could point out that this is technically allowed by the spec, based on internal conversations a while ago, it's neither desired nor intended.

So, what should be done? There are a few changes that need to be made to SPIRV-Tools:

  • All the stuff in binary.cpp about not byteswapping literal strings needs to be ripped out.
  • When parsing literal strings, we need to do something like:
string[0] = word[0] & 0xFF;
string[1] = (word[0] >> 8) & 0xFF;
string[2] = (word[0] >> 16) & 0xFF;
// ...
  • When outputting literal strings, we need to do the opposite.

Does any of this make sense now? This will break the parsing of modules created by spirv-as on big-endian machines, but hopefully there won't be too many of those out there :) and it's best to fix this now before someone actually gets bitten by this.

@dekimir
Copy link

dekimir commented Mar 14, 2016

Thanks, @cwabbott0! This prompted some internal discussion on the team, and our conclusion is that the tools do, indeed, seem to violate the current letter of the spec. I think the issue can be stated more succinctly by leaving out the memory-to-disk process and simply focusing on byte addresses in memory: if the word containing "foo" begins at, say, byte address 0x1000, then the spec requires that it be stored like this:

       BE   LE
0x1000 '\0' 'f'
0x1001 'o'  'o'
0x1002 'o'  'o'
0x1003 'f'  '\0'

(BE = big endian, LE = little endian)

This is not how the SPIR-V tools (nor glslang, apparently!) currently pack strings, so that's a spec violation.

However, we do find this spec requirement odd, and we filed this spec bug to express our concerns. If it turns out that this is really the spec's intent, we'll fix the SPIR-V tools code and close this issue.

@cwabbott0
Copy link
Author

@dekimir Ok, thanks. I'll wait for johnk to comment on the spec bug, but I believe that the requirement was intentional. I also made KhronosGroup/glslang#202 to track the glslang issue.

@dneto0
Copy link
Collaborator

dneto0 commented Mar 15, 2016

The SPIR working group confirmed that the spec will stay the same, and this indeed is a bug in SPIRV-Tools.
See Khronos public bug https://www.khronos.org/bugzilla/show_bug.cgi?id=1474

@antiagainst antiagainst self-assigned this Mar 16, 2016
@antiagainst antiagainst assigned dneto0 and unassigned antiagainst Mar 29, 2016
@cheery
Copy link

cheery commented Oct 3, 2017

I stumbled upon this as well when testing SPIRV-tools on big-endian output from my SPIRV encoder/decoder.

Also, when passing it to the Vulkan, it seems that the endian in the input has to match with the system's endian.

I suppose it's a rare feature that you have a big-endian platform with Vulkan?

mhillenbrand added a commit to mhillenbrand/SPIRV-Tools that referenced this issue Nov 8, 2021
To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  KhronosGroup#149
mhillenbrand added a commit to mhillenbrand/SPIRV-Tools that referenced this issue Nov 8, 2021
To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  KhronosGroup#149
mhillenbrand added a commit to mhillenbrand/SPIRV-Tools that referenced this issue Nov 10, 2021
To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  KhronosGroup#149
mhillenbrand added a commit to mhillenbrand/SPIRV-Tools that referenced this issue Nov 10, 2021
To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  KhronosGroup#149
mhillenbrand added a commit to mhillenbrand/SPIRV-Tools that referenced this issue Nov 15, 2021
To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  KhronosGroup#149
mhillenbrand added a commit to mhillenbrand/SPIRV-Tools that referenced this issue Dec 7, 2021
To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  KhronosGroup#149
mhillenbrand added a commit to mhillenbrand/SPIRV-Tools that referenced this issue Dec 7, 2021
To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  KhronosGroup#149
mhillenbrand added a commit to mhillenbrand/SPIRV-Tools that referenced this issue Dec 7, 2021
To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  KhronosGroup#149
s-perron pushed a commit that referenced this issue Dec 8, 2021
* Fix endianness of string literals

To get correct and consistent encoding and decoding of string literals
on big-endian platforms, use spvtools::utils::MakeString and MakeVector
(or wrapper functions) consistently for handling string literals.

- add variant of MakeVector that encodes a string literal into an
  existing vector of words
- add variants of MakeString
- add a wrapper spvDecodeLiteralStringOperand in source/
- fix wrapper Operand::AsString to use MakeString (source/opt)
- remove Operand::AsCString as broken and unused
- add a variant of GetOperandAs for string literals (source/val)
... and apply those wrappers throughout the code.

Fixes  #149

* Extend round trip test for StringLiterals to flip word order

In the encoding/decoding roundtrip tests for string literals, include
a case that flips byte order in words after encoding and then checks for
successful decoding. That is, on a little-endian host flip to big-endian
byte order and then decode, and vice versa.

* BinaryParseTest.InstructionWithStringOperand: also flip byte order

Test binary parsing of string operands both with the host's and with the
reversed byte order.
rjodinchr pushed a commit to rjodinchr/SPIRV-Tools that referenced this issue Jun 22, 2022
Add SPV_KHR_ray_{tracing,query} to headers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants