Rename compiler-generated constants non-sequentially #1570
The patch LGTM, and fixes our test scene's PTX differences.
In the course of our runtime optimization of shader networks, we generate lots of new constant symbols (for example, as we optimize an "a = b + c" into "a = newconst" if b and c can be determined to be constants). The new constants are named with a numerical suffix doled out in a sequential manner.

I've always thought that maybe it would be helpful for me when debugging optimizer output if the name of the symbol also revealed the value of the constant it held (when possible). Never did anything about it.

Now it turns out that this has an application for JIT to PTX, where the PTX cache can save compilation-to-PTX time if it encounters LLVM IR that is exactly the same as IR it has seen before and cached. If the constants are generated in a different order, the result is non-functional, purely textual differences in the code we generate, and this means the PTX cache doesn't function as efficiently as possible.

So this change alters the compiler-generated constant naming for two cases: int constants and string constants. The int constants append the value of the integer (using '_' instead of '-' for negatives), and the string constants append the hexadecimal value of the ustring hash for that string. In both cases it does this without using or incrementing the `m_next_newconst` counter, which is still used to sequentially name the other types that can't easily encode the value to generate a unique name.

It's really to help the PTX cache, but as a side benefit, it will be slightly easier for me to make sense of the optimized code when I need to read it with my own eyes while debugging.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
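To make the naming scheme described above concrete, here is a minimal C++ sketch. It is not the code from the patch: the `ConstNamer` struct, the `$newconst...` prefixes, and the helper names are hypothetical, and an FNV-1a hash stands in for the real ustring hash (which uses a different hash function). Only the `m_next_newconst` counter name comes from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the optimizer's naming state. Only the counter
// name m_next_newconst is taken from the PR description.
struct ConstNamer {
    int m_next_newconst = 0;

    // ASSUMPTION: FNV-1a (64-bit) stands in for the ustring hash here.
    static std::uint64_t standin_hash(const std::string& s) {
        std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;  // FNV prime
        }
        return h;
    }

    // Int constants: the name encodes the value, with '_' replacing '-'.
    std::string name_int(int value) const {
        std::string s = std::to_string(value);
        if (s[0] == '-')
            s[0] = '_';
        return "$newconst_i" + s;
    }

    // String constants: the name encodes the hash of the string, in hex.
    std::string name_string(const std::string& value) const {
        char buf[17];  // up to 16 hex digits plus the terminator
        std::snprintf(buf, sizeof(buf), "%llx",
                      static_cast<unsigned long long>(standin_hash(value)));
        return std::string("$newconst_s") + buf;
    }

    // Every other type still draws from the sequential counter.
    std::string name_other() {
        return "$newconst" + std::to_string(m_next_newconst++);
    }
};

// Small usage check: value-based names never touch the counter.
inline void naming_demo() {
    ConstNamer namer;
    assert(namer.name_int(-7) == "$newconst_i_7");
    assert(namer.m_next_newconst == 0);
}
```

In this sketch, naming an int or string constant leaves the counter alone, so such constants cannot shift the names handed out to the remaining types.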
Naive question: Does it work if the same constant value appears twice in the same function? For deterministic PTX output: Do we know why the sequence number sometimes changed?
Yes, the compiler-generated constants are unique for a given value.
Aha, yes, the nondeterminism was an interesting comedy act that took days to fully unravel. I'll try to explain. It requires some background info on a bunch of seemingly unrelated issues, then they all fall together.
Part of the joy of debugging this and the reason it wasn't caught by any unit tests is that it required all of:
So with this patch, runtime optimizer-generated constants for strings and ints (only) will have 100% deterministic names that incorporate their actual value (or hash, for the strings) instead of using the incrementing unique name counter. The error messages are strings, so whether they are generated or not (as long as they are eventually optimized away) will no longer change the order-dependent naming of any other symbols that are generated during optimization.

Incidentally, all these things happened on the CPU as well. But since the error message eventually gets elided, in neither case was the execution of the shader nondeterministic, nor were any error messages issued incorrectly or skipped. The only difference was the TIMING of how long it took to get the PTX to the card, because in some cases but not others, we'd randomly get a PTX cache miss. Fun!
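The order dependence can be illustrated with a small C++ sketch. It models constants as plain ints and uses hypothetical function names and `$newconst...` prefixes; it is not taken from the patch. One naming function hands out names from a shared counter, the other derives the name from the value alone.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Sequential naming: each new value gets the next counter number, so the
// name depends on which constants were created earlier in the run.
std::map<int, std::string> name_sequentially(const std::vector<int>& order) {
    std::map<int, std::string> names;
    int next = 0;
    for (int v : order)
        if (!names.count(v))
            names[v] = "$newconst" + std::to_string(next++);
    return names;
}

// Value-based naming: the name is a pure function of the value, so the
// creation order (and any transient constants) cannot affect it.
std::map<int, std::string> name_by_value(const std::vector<int>& order) {
    std::map<int, std::string> names;
    for (int v : order) {
        std::string s = std::to_string(v);
        if (s[0] == '-')
            s[0] = '_';
        names[v] = "$newconst_i" + s;
    }
    return names;
}

// Small usage check: a transient constant (99) shifts the sequential name
// of a later constant (8) but leaves the value-based name alone.
inline void ordering_demo() {
    assert(name_sequentially({3, 99, 8})[8] != name_sequentially({3, 8})[8]);
    assert(name_by_value({3, 99, 8})[8] == name_by_value({3, 8})[8]);
}
```

The transient value 99 plays the role of the error-message constant described above: it exists in one run and not the other, and under sequential naming that alone renames everything created after it. A repeated value also maps to the same name in both schemes, which matches the answer to the earlier question about the same constant appearing twice.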
Programming is hard 🙃 Nice debugging work! It seems like the trickiest aspect of this was the 100 max error messages deep inside OIIO. Maybe it would be better for the library to always report errors and for the host application to decide how to throttle error messages? (or fail fast when they matter) Having a function call that can randomly be an error or not given identical input depending on what else was going on in the system seems like it could lead to other confusing behavior.
Maybe this is a "grass is always greener" attitude, but I strongly suspect that potato farmers never have to worry about nonsense like this.
By the way, it was @chellmuth who did most of the work narrowing down the problem and understanding that the PTX cache misses were related to particular gettextureinfo calls in the shader optimizing differently from run to run when multithreaded.
In my experience with productions, you want to deal with these error messages (or rather, not deal) as early as possible. Maybe I'm a bit cynical, but it seems texture error messages have a way of showing up on things people can't spend as much time tweaking but are used everywhere, like the leaf that gets pulled into all the background trees in a shot. That can lead to spewage that

The grass is only greener once someone's spread out the manure -- so programming or potato farming, somebody's gotta shovel it. At any rate, nice sleuthing all around, and this demonstrates the value of something like ASWF DPEL to provide cases for testing "at scale."
To date, we have tracked down three (I think) independent sources of the PTX cache having misses for what we thought ought to have been the same shader code. We're pretty sure this is the last known one.
Indeed. This kind of bizarre corner case is in some sense totally not a surprise for a real set of production shaders on a real scene. It's probably not even in the top 10 most unusual things the renderer will encounter this week. I make jokes about the potatoes, but it is intensely satisfying to build software that is robust enough to handle everything that a production will do in anger. The scale of the shaders this thing is fed constantly amazes me. The software is really solid. For each legit OSL bug, we probably investigate 10 cases that turn out to simply be shading networks that are so complicated that even their authors can't reason about their behavior correctly.
I think there is a legit point that error message suppression, which is crucial to do as far upstream as possible for a shader call that's executed a billion times, maybe should not happen in the special case of constant folding in the optimizer. One way to solve that is for the ImageCache::get_image_info call to have a parameter that allows the option of issuing error messages unconditionally. But that's an API change, and given how rare these circumstances are (and now effectively worked around for the only case we've found in practice), I'm thinking about just tabling this until the next time we're in a position to want to introduce API changes.
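As a rough illustration of the idea (not OIIO's API and not a proposed signature), the following C++ sketch shows a capped error reporter with an opt-in bypass. The class name, the `always_report` flag, and the cap handling are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reporter, NOT part of OIIO: it records at most max_messages
// throttled messages, but a caller may ask for unconditional reporting.
class ThrottledErrors {
public:
    explicit ThrottledErrors(int max_messages) : m_remaining(max_messages) {}

    // Returns true if the message was recorded. With always_report = true
    // the cap is bypassed and the remaining budget is left untouched.
    bool report(const std::string& msg, bool always_report = false) {
        if (!always_report) {
            if (m_remaining <= 0)
                return false;  // suppressed: the budget is used up
            --m_remaining;
        }
        m_messages.push_back(msg);
        return true;
    }

    const std::vector<std::string>& messages() const { return m_messages; }

private:
    int m_remaining;                      // throttled messages still allowed
    std::vector<std::string> m_messages;  // everything actually recorded
};

// Small usage check: once the cap is hit, only unconditional reports land.
inline void throttle_demo() {
    ThrottledErrors errs(1);
    assert(errs.report("first"));
    assert(!errs.report("second"));
    assert(errs.report("from the optimizer", /*always_report=*/true));
}
```

With a flag like this, a caller such as the constant folder would get the same result for the same input regardless of how many messages other threads had already consumed, while the hot per-sample path keeps its throttling.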