Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename compiler-generated constants non-sequentially #1570

Merged
merged 1 commit into from
Sep 2, 2022

Conversation

lgritz
Copy link
Collaborator

@lgritz lgritz commented Sep 1, 2022

In the course of our runtime optimization of shader networks, we
generate lots of new constant symbols (for example, as we optimize an
"a = b + c" into "a = newconst" if b and c can be determined to be
constants). The new constants are named with a numerical suffix doled
out in a sequental manner.

I've always thought that maybe it would be helpful for me when
debugging optimizer output if the name of the symbol also revealed the
value of the constant it held (when possible). Never did anything
about it.

Now it turns out that this has an application for JIT to PTX, where
the PTX cache can save compilation-to-PTX time if it encounters LLVM
IR that is exactly the same as it has seen before and is cache. If
the constants are generated in a different order, it generates
non-functional, purely text differences in the code we generate, and
this means the PTX cache doesn't function as efficiently as possible.

So this change alters the compiler-generated constant naming for two
cases: int constants, and string constants. The int constants append
the value of the integer (using '_' instead of '-' for negatives), and
the string constants append the hexidecimal value of the ustring hash
for that string. In both cases it does this without using or
incrementing the m_next_newconst counter that it uses to
sequentially name the other types that can't easily encode the value
to generate a unique name.

It's really to help the PTX cache, but as a side benefit, it will
be slightly easier for me to make sense of the optimized code when
I need to read it with my own eyes while debugging.

Signed-off-by: Larry Gritz lg@larrygritz.com

@chellmuth
Copy link
Collaborator

The patch LGTM, and fixes our test scene's ptx differences.

In the course of our runtime optimization of shader networks, we
generate lots of new constant symbols (for example, as we optimize an
"a = b + c" into "a = newconst" if b and c can be determined to be
constants). The new constants are named with a numerical suffix doled
out in a sequental manner.

I've always thought that maybe it would be helpful for me when
debugging optimizer output if the name of the symbol also revealed the
value of the constant it held (when possible). Never did anything
about it.

Now it turns out that this has an application for JIT to PTX, where
the PTX cache can save compilation-to-PTX time if it encounters LLVM
IR that is exactly the same as it has seen before and is cache.  If
the constants are generated in a different order, it generates
non-functional, purely text differences in the code we generate, and
this means the PTX cache doesn't function as efficiently as possible.

So this change alters the compiler-generated constant naming for two
cases: int constants, and string constants. The int constants append
the value of the integer (using '_' instead of '-' for negatives), and
the string constants append the hexidecimal value of the ustring hash
for that string. In both cases it does this without using or
incrementing the `m_next_newconst` counter that it uses to
sequentially name the other types that can't easily encode the value
to generate a unique name.

It's really to help the PTX cache, but as a side benefit, it will
be slightly easier for me to make sense of the optimized code when
I need to read it with my own eyes while debugging.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
@lgritz lgritz merged commit cd2fa16 into AcademySoftwareFoundation:main Sep 2, 2022
@ee33
Copy link

ee33 commented Sep 2, 2022

Naive question: Does it work if the same constant value appears twice in the same function?
out1 = 2 + 3;
out2 = 1 + 4;

For deterministic PTX output: Do we know why the sequence number sometimes changed?

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 2, 2022

Yes, the compiler-generated constants are unique for a given value.

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 2, 2022

Aha, yes, the nondeterminism was an interesting comedy act that took days to fully unravel.

I'll try to explain. It requires some background info on a bunch of seemingly unrelated issues, then they all fall together.

  1. What happens if a shader references a texture that's missing or broken? You don't want all billion executions of the shader to issue identical (but separate) error messages. So deep in the underlying OIIO ImageCache, there's a limit to how many error messages will be generated for each texture name. That happens to default to 100.

  2. When doing runtime optimization of a function call like gettextureinfo(), if the shader name is known, it will try to constant-fold the call by replacing the gettextureinfo call with just storing the value where it's supposed to go. But what happens if the optimizer's calling gettextureinfo() finds a broken or missing file? You don't want to issue an error THEN (i.e. during optimization), because maybe that shader will never be called (no ray hits it), or maybe the shader is called, but the branch of a conditional containing the lookup to the broken texture is never taken (unreachable code, but you can't tell), or maybe that instruction will be elided by a later optimization (say, because the result is never used).

    Put another way, the fact that at this early stage, a shader appears to want data from a nonexistent texture is not itself an error that should be bubbled up to the renderer. So if, at optimization time, the texture is broken, we replace the gettextureinfo call with a call to error(), so that IF and WHEN the shader executes, it will parrot back the error message that the constant_fold_gettextureinfo got when trying to do the lookup.

  3. How do (1) and (2) above interact if there are a whole lot of shaders in the scene that reference the same bogus texture? Well -- this was a surprise to us -- the first 100 to optimize will see the original error message, then after that, any further broken texture calls will get the empty error message.

  4. When we have a lot of shaders to JIT, they're put in the thread queue, so the order in which they're jitted can vary from run to run.

  5. So for one particular shader, on some runs, it's in the first 100 times gettextureinfo() fails for a particular texture name, but on other runs, it's after the 100th so gets a different error message. That means that on some runs, the constant string that's printed as the error message changes (if that line is ever executed).

  6. As a further humorous element, the clause of the "if" that has the bogus texture is, later in optimization, discovered to never execute, and is elided entirely. So the ordering in which the shaders are JIT does not, in the end, change the code, or the data variables, of the eventual PTX. BUT... but...

  7. That constant string which was at first allocated, then later not needed and skipped by the time we hit PTX, had a NAME that was uniquified by appending a running counter ("$newconst42"). The constant string, as we said, is eliminated. It's not in the PTX. But whatever entirely different optimizer-generated constant in one case became "$newconst43" and is STILL in the PTX, in another case was the 42nd constant needed and so has name "$newconst42". Ha ha! So the PTX is different after all.

Part of the joy of debugging this and the reason it wasn't caught by any unit tests is that it required all of:

  • Use of an idiom in the shader where a bogus texture was named, but in a conditional branch that would either never be executed, or would be optimized away AFTER the gettextureinfo was constant folded.
  • Enough shader nodes optimized that it would end up doing over 100 failed lookups to the same bogus name.
  • Enough materials, and enough threads, that run to run you might end up with the shader groups JITing in a different order, so that it was inconsistent about which shader groups would fall on either side of the 100-error-message boundary.

So with this patch, runtime optimizer-generated constants for strings and ints (only) will have 100% deterministic names that incorporate their actual value (or hash, for the strings) instead of using the incrementing unique name counter. The error messages are strings, so whether they are generated or not (as long as they are eventually optimized away) will no longer change the order-dependent naming of any other symbols that are generated during optimization.

Incidentally, all these things happened on the CPU as well. But since the error message eventually gets elided, in neither case was the execution of the shader nondeterministic, nor were any error messages issued incorrectly or skipped. The only difference was the TIMING of how long it took to get the PTX to the card, because in some cases but not others, we'd randomly get a PTX cache miss.

Fun!

@fpsunflower
Copy link
Contributor

Programming is hard 🙃 Nice debugging work!

It seems like the trickiest aspect of this was the 100 max error messages deep inside OIIO. Maybe it would be better for the library to always report errors and for the host application to decide how to throttle error messages? (or fail fast when they matter)

Having a function call that can randomly be an error or not given identical input depending on what else was going on in the system seems like it could lead to other confusing behavior.

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 2, 2022

Maybe this is a "grass is always greener" attitude, but I strongly suspect that potato farmers never have to worry about nonsense like this.

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 2, 2022

By the way, it was @chellmuth who did most of the work narrowing down the problem and understanding that the PTX cache misses were related to particular gettextureinfo calls in the shader optimizing differently from run to run when multithreaded.

@sfriedmapixar
Copy link
Contributor

In my experience with productions, you want to deal with these error messages (or rather, not deal) as early as possible. Maybe I'm a bit cynical , but it seems texture error messages have a way of showing up on things people can't spend as much time tweaking but are used everywhere, like the leaf that gets pulled into all the background trees in a shot. That can lead to spewage that
A) is critical to be clear as to what needs fixing so it can be fixed quickly -- otherwise it won't be -- so the output is valuable
B) needs to be throttled as early/cheaply as possible because the amount of spewage has significant render time implications (even the ones that don't make it to printing) in the cases where it hasn't or can't be cleaned up
so while the 100 max error limitation could lead to some non-determinism, don't discount the usefulness of it happening ASAP.

The grass is only greener once someone's spread out the manure -- so programming or potato farming, somebody's gotta shovel it.

At any rate, nice sluething all around, and this demonstrates the value of something like ASWF DPEL to provide cases for testing "at scale."

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 2, 2022

To date, we have tracked down three (I think) independent sources of the PTX cache having misses for what we thought ought to have been the same shader code. We're pretty sure this is the last known one.

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 2, 2022

At any rate, nice sluething all around, and this demonstrates the value of something like ASWF DPEL to provide cases for testing "at scale."

Indeed. This kind of bizarre corner case is in some sense totally not a surprise for a real set of production shaders on a real scene. It's probably not even in the top 10 most unusual things the renderer will encounter this week. I make jokes about the potatoes, but it is intensely satisfying to build software that is robust enough to handle everything that a production will do in anger.

The scale of the shaders this thing is fed constantly amazes me. The software is really solid. For each legit OSL bug, we probably investigate 10 cases that turn out to simply be shading networks that are so complicated that even their authors can't reason about their behavior correctly.

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 2, 2022

I think there is a legit point that the error message suppression that is crucial to happen as far upstream as possible for a shader call when it's executed a billion times, maybe should not happen in the special case of doing the constant folding in the optimizer. One way to solve that is for the ImageCache::get_image_info call to have a parameter that allows the option of issuing error messages unconditionally. But that's an API change and given how rare these circumstances are (and now effectively worked around for the only case we've found in practice), I'm thinking about just tabling this until next time we're in a position to want to introduce API changes.

@lgritz lgritz deleted the lg-constnames branch September 3, 2022 05:22
lgritz added a commit to lgritz/OpenShadingLanguage that referenced this pull request Oct 3, 2022
…Foundation#1570)

In the course of our runtime optimization of shader networks, we
generate lots of new constant symbols (for example, as we optimize an
"a = b + c" into "a = newconst" if b and c can be determined to be
constants). The new constants are named with a numerical suffix doled
out in a sequental manner.

I've always thought that maybe it would be helpful for me when
debugging optimizer output if the name of the symbol also revealed the
value of the constant it held (when possible). Never did anything
about it.

Now it turns out that this has an application for JIT to PTX, where
the PTX cache can save compilation-to-PTX time if it encounters LLVM
IR that is exactly the same as it has seen before and is cache.  If
the constants are generated in a different order, it generates
non-functional, purely text differences in the code we generate, and
this means the PTX cache doesn't function as efficiently as possible.

So this change alters the compiler-generated constant naming for two
cases: int constants, and string constants. The int constants append
the value of the integer (using '_' instead of '-' for negatives), and
the string constants append the hexidecimal value of the ustring hash
for that string. In both cases it does this without using or
incrementing the `m_next_newconst` counter that it uses to
sequentially name the other types that can't easily encode the value
to generate a unique name.

It's really to help the PTX cache, but as a side benefit, it will
be slightly easier for me to make sense of the optimized code when
I need to read it with my own eyes while debugging.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants