-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Double Hierarchy View Crash #5364
Comments
johnhaddon
added a commit
to johnhaddon/gaffer
that referenced
this issue
Jun 28, 2023
We were updating `m_allocMap` (which maintains ownership) _after_ assigning to `m_map` (which deals in raw pointers) and emitting `changedSignal()`. This meant we were in an inconsistent internal state at the point we triggered arbitrary observer code via slots connected to the signal. If a slot set the _same_ variable again, we'd end up with the raw pointer from the second call to `set()`, but with the owning pointer for the first call. Hence we were referencing a dangling pointer and subsequent attempts at accessing it would crash. The solution is to store to `m_allocMap` _before_ calling `internalSet()`, so everything is in sync when we emit the signal. This fixes crashes triggered by the interaction between multiple HierarchyViews synchronising via `ContextAlgo::set/getVisibleSet()`, as described in (GafferHQ#5364). I'm still not 100% sure how that was leading to a re-entrant call, but it seems worth fixing this at the lowest level possible, and "no signalling without consistent internal state" is a rule to live by anyway.
johnhaddon
added a commit
to johnhaddon/gaffer
that referenced
this issue
Jun 29, 2023
We were updating `m_allocMap` (which maintains ownership) _after_ assigning to `m_map` (which deals in raw pointers) and emitting `changedSignal()`. This meant we were in an inconsistent internal state at the point we triggered arbitrary observer code via slots connected to the signal. If a slot set the _same_ variable again, we'd end up with the raw pointer from the second call to `set()`, but with the owning pointer for the first call. Hence we were referencing a dangling pointer and subsequent attempts at accessing it would crash. The solution is to store to `m_allocMap` _before_ calling `internalSet()`, so everything is in sync when we emit the signal. This fixes crashes triggered by the interaction between multiple HierarchyViews synchronising via `ContextAlgo::set/getVisibleSet()`, as described in (GafferHQ#5364). I'm still not 100% sure how that was leading to a re-entrant call, but it seems worth fixing this at the lowest level possible, and "no signalling without consistent internal state" is a rule to live by anyway.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This is a bizarrely specific bug showing up in production at IE, but with help from Linas, we do seem to have a reproducible case. I'll try to describe the steps of the incantation quite exactly, because any slight deviation seems to make it not occur the same way.
<H>
<H>
, shift click on the root group to fully expand the scene<H>
, collapse location A<H>
, click in the right column to set A as excluded<H>
, collapse location B<H>
, click in the left column to set B as expanded<H>
, collapse the root "/group"This crashes on both Gaffer 1.2 and 1.3, and I think is consistent - I've been wondering if there's an element of randomness to it, but I think any of the times it didn't fail, I deviated slightly in the incantation.
In Linas's testing, the crash seems to occur in GafferSceneUI.ContextAlgo.getVisibleSet. It looks like the only way this could crash is if the context passed in is null or invalid. We probably ought to guard against null, but when Linas instrumented it, he sees it always getting called on the same address, including the time it crashes ... suggesting that somehow the context it's using has been deallocated? ( Though this is a bit complicated by there being two Hierarchy Views going at once ).
Linas also found that he was unable to reproduce the crash if he moved the getVisibleSet inside __transferExpansionFromContext to inside of the BlockedConnection - I don't really understand this code, and he was mostly just going based on intuition based on the getVisibleSet in __expansionChanged being inside of a different BlockedConnection. Maybe this is the solution, or maybe it just shifts the problem around ... either way, we probably need someone who understands this code better than I do to take a look.
The text was updated successfully, but these errors were encountered: