Can't interleave multiple edits/computes within an UndoContext #1971
Comments
That sounds about right.
I don't think this is the problem. The hash cache is (currently) stored per-thread, so we're not deferring for another thread to clear the one used on this thread, we're just telling all threads "yo, clear yours before you use it next". Because you cannot edit graphs while concurrently running them, we know that no other thread is in the process of a compute while we're saying this.
I think this is the clue to what's really happening. The ScriptEditor wraps the execution of the script within an UndoContext, so that a single undo event is generated for the whole script. The UndoContext in turn consolidates all the dirty propagation into a final step at the end (so observers of the graph don't recompute until all changes have been made). And it's only during that final dirty propagation that we invalidate the hash cache, hence why we're seeing stale cache values.

As an aside, the UndoContext has nothing to do with Context, and should be renamed to UndoScope or some such thing to avoid confusion.

I guess this has gone unnoticed until now because most code either mutates the graph (most likely within an UndoContext) or runs the graph (most likely not within an UndoContext). It seems that this is the first instance of someone trying to do both at the same time.
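The failure mode described above can be sketched in plain Python. This is a toy model, not Gaffer's implementation: `DirtyScope`, `Plug` and all their members are hypothetical stand-ins for UndoContext's deferred dirty propagation and the hash cache.

```python
# Toy model of the problem : a scope that defers cache invalidation
# until it closes (as UndoContext consolidates dirty propagation into
# a final step). All names here are hypothetical, not Gaffer API.

class DirtyScope:
    def __init__(self):
        self._pending = []
    def defer(self, plug):
        self._pending.append(plug)
    def close(self):
        # The final dirty propagation - only now is the cache invalidated.
        for p in self._pending:
            p._cached = None
        self._pending.clear()

class Plug:
    def __init__(self, value):
        self._value = value
        self._cached = None  # stands in for the hash cache
    def set_value(self, value, scope=None):
        self._value = value
        if scope is not None:
            scope.defer(self)   # invalidation deferred to scope close
        else:
            self._cached = None # immediate invalidation
    def get_value(self):
        if self._cached is None:
            self._cached = self._value  # "compute" and cache
        return self._cached

plug = Plug(0)
scope = DirtyScope()

plug.set_value(-1, scope)
first = plug.get_value()   # computes and caches -1
plug.set_value(-2, scope)
second = plug.get_value()  # stale! the cache hasn't been invalidated yet
scope.close()
after = plug.get_value()   # correct once the scope has closed

print(first, second, after)  # -1 -1 -2 : the "-1 -1" symptom from the repro
```

Interleaving edits and computes inside the scope reads the stale cache entry, which is exactly the "-1 -1" output reported below.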
Although it's a rare case as I've explained above, I think we should just fix this to work no matter what - as you say, it's confusing otherwise. I'm hoping this can be done by moving the …
Out of interest, and totally tangentially, what was this for? It feels like this sort of loop might be quite slow in Python, so I'm wondering if the ImageSampler should have an additional array in/out pair to do multiple queries at once, or if we can provide some other API feature to improve things? Although actually, maybe they would have been better to use a Sampler directly instead of using a node?
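For reference, using a Sampler directly would look something like the sketch below. It assumes an existing GafferImage session with a node whose `out` ImagePlug we want to query; the node name and sample coordinates are illustrative.

```python
import imath
import GafferImage

# Sketch only : `image` is assumed to be an existing node with an
# "out" ImagePlug; window and coordinates are illustrative.
window = imath.Box2i( imath.V2i( 0 ), imath.V2i( 512 ) )
sampler = GafferImage.Sampler( image["out"], "R", window )

# Many queries without touching any input plugs, so there is no
# edit/compute interleaving at all.
values = [ sampler.sample( x, 100 ) for x in range( 512 ) ]
```

Because no plugs are edited between queries, this pattern sidesteps the stale-cache problem entirely as well as avoiding per-query node overhead.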
I've had a chance to take a look at this today, and although the basic problem is as I described above, the easy solution I was hoping for doesn't work. Here's what's going on :
As far as alternative approaches go, I've only got the following at the moment :
…
For reference, here's the branch where I've been working on this. It contains a binding for DirtyPropagationScope (we didn't have one before), a test case for this issue, and actually does have a hack that fixes that test, though I didn't feel happy enough with it to present it as an option 3).
If we go with 1), is there an easy way to escape the UndoContext in the Gaffer script editor if you want to do this? I think I would be OK with this as long as it's documented and there's something quick you could put in the script editor to get into a mode where you could do this when you want to. ( As for a current use case, we don't have anything productiony. An artist had a bit of down time and was teaching themselves more about Gaffer by trying some silly things. )
There's nothing at the moment. It would be pretty easy to have some sort of "undoable-or-not" toggle button on the ScriptEditor, but I don't think I could explain that it's really an "i-want-to-interleave-edits-and-computes-in-one-execution" button with a straight face. I don't really want to resort to hacks of any sort for this, but if we do, I'd rather it was a behind-the-scenes one that didn't push the responsibility onto the user.
I wasn't thinking of a toggle, more like a script thing:

…

or

…
It's not the cleanest, but it's also not the most common case. As long as there's something there for technical users on special occasions, I don't see a problem.
Honestly, I'd be embarrassed if we had to push this responsibility on to the user. The branch I referenced above does have a workaround (albeit a bit rubbish), so unless I have a better idea soon, or we refactor the internals to make the fix more natural (and I suspect at the expense of flexibility in the future), I'd be more inclined to go with that.
This came up again for us when we were trying to update dynamic boxes that themselves contain dynamic boxes, given that which box to load is defined by a computation, which may change during the whole process. That means that option 1 would not work for us, unfortunately. @johnhaddon, how do you feel about option 2 now? It's been a long time since this issue was discussed.
Option 2) isn't great. The recent refactor of the hash cache has actually put us in a situation where we could do partial invalidation of the cache (just for dirty plugs rather than clearing everything). I have a test branch with that on, and it's a decent win in terms of performance for interactive work. So I'm reluctant to lose that possibility.
I'm still liking the idea of something explicit. Like if there was a ScriptNode.forceDirtyPropagation()? You said you didn't want to make this the user's responsibility John - but I'm not sure it's really that bad? It doesn't seem to come up very often, and it doesn't seem that bad to explain "If you need to alternate between changing things and computing them within an UndoScope, then you need to do a forceDirtyPropagation before you compute things". That seems better than hurting performance in any of the common cases. Otherwise, I guess we need to start looking at whatever your option 3) was that you didn't want to talk about?
I still feel like something explicit is a failure on my part. I'd need to look into it some more, but I think the only real overhead in an automatic forcing would be the TLS lookup in …
This came up again in conversation today. I think perhaps one way of fixing things would be to have a way to request that all pending calls to …
Would be nice to see if we can also fix GafferHQ#1971 though.
…ionScopes

This means calling `ValuePlug::dirty()` before the scope closes, so that the hash cache is invalidated before the compute is performed. This was first requested way back in 2017 in issue GafferHQ#1971, where the use case was edit/compute loops in the PythonEditor. It has become a high priority more recently though, as it turned out we were actually interleaving edits and computes in the TransformTools, and noticed incorrect results while working on the LightTool in GafferHQ#5218. The problem was as follows, and is demonstrated in `testMultipleSelectionWithEditScope` :

1. We push a dirty propagation scope using UndoScope.
2. We create a transform edit for `/cube1`, which dirties some plugs, incrementing their dirty count.
3. We create a transform edit for `/cube`, but as part of doing that we need to compute the _current_ transform. This "bakes" that transform into the (hash) cache, keyed by the current dirty count.
4. We propagate dirtiness for this new transform edit, but don't increment the dirty count again, because we've already visited those plugs in step 2.
5. After closing the scope, we query the transform for `/cube` and end up with the cached value from step 3. This is incorrect, because it isn't accounting for the edits made subsequently.

The problem is demonstrated even more simply by `ComputeNodeTest.testInterleavedEditsAndComputes`. There is an overhead to calling `flushDirtyPropagationScope()` for each `getValue()` call, and it is measurable in `ValuePlugTest.testCacheOverhead()`. But taking into account the performance improvements made in GafferHQ#5362, we still come out with an 18% reduction in runtime even with the overhead.

Fixes GafferHQ#1971
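The idea in the commit above can be modelled in plain Python. Again this is a toy sketch with hypothetical names, not Gaffer's implementation: every value query first flushes any pending dirty propagation, so a compute made inside the scope can no longer see a stale cache entry.

```python
# Toy model of the fix : get_value() flushes pending invalidations
# before consulting the cache, analogous to calling
# flushDirtyPropagationScope() per getValue(). Names are hypothetical.

class DirtyScope:
    current = None
    def __init__(self):
        self._pending = []
    def __enter__(self):
        DirtyScope.current = self
        return self
    def __exit__(self, *args):
        self.flush()
        DirtyScope.current = None
    def defer(self, plug):
        self._pending.append(plug)
    def flush(self):
        for p in self._pending:
            p._cached = None
        self._pending.clear()

class Plug:
    def __init__(self, value):
        self._value = value
        self._cached = None
    def set_value(self, value):
        self._value = value
        if DirtyScope.current is not None:
            DirtyScope.current.defer(self)
        else:
            self._cached = None
    def get_value(self):
        # The fix : flush pending dirty propagation before computing.
        if DirtyScope.current is not None:
            DirtyScope.current.flush()
        if self._cached is None:
            self._cached = self._value
        return self._cached

plug = Plug(0)
with DirtyScope():
    plug.set_value(-1)
    first = plug.get_value()   # -1
    plug.set_value(-2)
    second = plug.get_value()  # -2, no longer stale

print(first, second)  # -1 -2
```

The cost of the fix is one flush check per query, which matches the commit's note that the overhead is measurable but outweighed by the other performance improvements.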
This came up while someone was trying to use the ImageSampler node from a Python script by repeatedly changing the sample position and querying it. They found that the sample value appeared to be getting cached, and they would just get the same value over and over.
The simplest repro I've found is to create a Sphere, and run this:
The expected output is "-1 -2", but it will produce either "-1 -1" or "-2 -2" if run all at once in the Script Editor. If you run it one line at a time, it works.
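The original snippet wasn't captured here, but the shape of it was presumably along these lines. This is an illustrative guess only - the node and plug names are assumptions, not the actual script:

```python
# Illustrative sketch - the original repro snippet wasn't captured.
# The pattern is : edit a plug, read a dependent output, edit again,
# read again, all within one Script Editor execution (one UndoContext).

script["Sphere"]["transform"]["translate"]["x"].setValue( -1 )
print( script["Sphere"]["out"].transform( "/sphere" ).translation().x )

script["Sphere"]["transform"]["translate"]["x"].setValue( -2 )
print( script["Sphere"]["out"].transform( "/sphere" ).translation().x )
```

Run as a single block, both queries can land on the same stale cache entry, giving "-1 -1" or "-2 -2" rather than "-1 -2".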
It looks to me like this is due to the hash cache. When you change the input plug, it should invalidate the hash cache, but according to `src/Gaffer/ValuePlug.cpp` (line 166 at 4301c52), invalidating the hash cache is potentially deferred into another thread. This appears to mean that immediately querying the output could incorrectly pull the old cached hash.
One workaround is, instead of changing the input plug directly from the script, to drive the input plug from a context variable and change the context variable in the script. This seems to work fine, but it's a bit complicated to explain why the obvious way doesn't work.
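A sketch of that workaround, assuming an Expression node drives the plug from a context variable named "tx" (all node, plug, and variable names here are illustrative, not from the original report):

```python
# Assumes an Expression along the lines of :
#   parent["Sphere"]["transform"]["translate"]["x"] = context["tx"]

import Gaffer

with Gaffer.Context( script.context() ) as context :
    context["tx"] = -1.0
    print( script["Sphere"]["out"].transform( "/sphere" ).translation().x )
    context["tx"] = -2.0
    print( script["Sphere"]["out"].transform( "/sphere" ).translation().x )
```

This sidesteps the problem because no plugs are edited between the queries: the context variable participates in the hash, so each query lands on a distinct cache entry rather than the stale one.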