Double Hierarchy View Crash #5364

Closed
danieldresser-ie opened this issue Jun 27, 2023 · 0 comments
danieldresser-ie commented Jun 27, 2023

This is a bizarrely specific bug showing up in production at IE, but with help from Linas, we do seem to have a reproducible case. I'll try to describe the steps of the incantation quite exactly, because any slight deviation seems to make it not occur the same way.

  1. paste in the scene listed at the bottom of this issue
  2. in your top right UI pane, create a second hierarchy view, and set it to also follow the focus node, so you have two identical hierarchy views. I will refer to this new hierarchy view as <H>
  3. focus the bottom group of the scene
  4. using <H>, shift click on the root group to fully expand the scene
  5. using <H>, collapse location A
  6. using <H>, click in the right column to set A as excluded
  7. using <H>, collapse location B
  8. using <H>, click in the left column to set B as expanded
  9. using <H>, collapse the root "/group"
  10. using the Viewer, click on the background to make sure nothing is selected
  11. using the Viewer, click on the visible cube. This should expand the hierarchy to show the location clicked, but if you've done this all right, you should see a crash.

This crashes on both Gaffer 1.2 and 1.3, and I think it is consistent - I've been wondering if there's an element of randomness to it, but I think that any time it didn't fail, I had deviated slightly from the incantation.

In Linas's testing, the crash seems to occur in GafferSceneUI.ContextAlgo.getVisibleSet. It looks like the only way this could crash is if the context passed in is null or invalid. We probably ought to guard against null, but when Linas instrumented it, he sees it always getting called on the same address, including the time it crashes ... suggesting that somehow the context it's using has been deallocated? ( Though this is a bit complicated by there being two Hierarchy Views going at once ).

Linas also found that he was unable to reproduce the crash if he moved the getVisibleSet call in __transferExpansionFromContext inside the BlockedConnection. I don't really understand this code, and he was mostly going on intuition, since the getVisibleSet call in __expansionChanged is inside a different BlockedConnection. Maybe this is the solution, or maybe it just shifts the problem around ... either way, we probably need someone who understands this code better than I do to take a look.
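For anyone not familiar with the idea, here is a minimal toy sketch of what a blocked connection does. It is plain Python and not Gaffer's implementation: `Signal`, `Connection`, `blocked` and `demo` are all hypothetical names made up for this illustration. It only shows that a slot is skipped while its connection is blocked; it does not explain why moving the getVisibleSet call made the crash go away.

```python
from contextlib import contextmanager


class Connection:
    """Toy stand-in for a signal connection that can be blocked."""

    def __init__(self, slot):
        self.slot = slot
        self.blocked = False


class Signal:
    """Toy signal: calls every connected slot that is not blocked."""

    def __init__(self):
        self._connections = []

    def connect(self, slot):
        connection = Connection(slot)
        self._connections.append(connection)
        return connection

    def __call__(self, *args):
        # Iterate over a copy so slots may connect more slots safely.
        for connection in list(self._connections):
            if not connection.blocked:
                connection.slot(*args)


@contextmanager
def blocked(connection):
    """Block `connection` for the duration of the `with` body."""
    previous = connection.blocked
    connection.blocked = True
    try:
        yield
    finally:
        connection.blocked = previous


def demo(block):
    """Return how many times our own slot ran while we emitted a change."""
    changed = Signal()
    calls = []
    connection = changed.connect(lambda name: calls.append(name))
    if block:
        with blocked(connection):
            changed("visibleSet")
    else:
        changed("visibleSet")
    return len(calls)
```

In this sketch `demo( False )` lets the slot run once, while `demo( True )` suppresses it, which is the usual reason for wrapping "write to shared state" code in a blocked connection: the writer does not want to react to its own change.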

import Gaffer
import GafferScene
import imath

Gaffer.Metadata.registerValue( parent, "serialiser:milestoneVersion", 1, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:majorVersion", 2, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:minorVersion", 8, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:patchVersion", 0, persistent=False )

__children = {}

__children["Cube"] = GafferScene.Cube( "Cube" )
parent.addChild( __children["Cube"] )
__children["Cube"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"] = GafferScene.Group( "Group" )
parent.addChild( __children["Group"] )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in1", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in2", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group1"] = GafferScene.Group( "Group1" )
parent.addChild( __children["Group1"] )
__children["Group1"]["in"].addChild( GafferScene.ScenePlug( "in1", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group1"]["in"].addChild( GafferScene.ScenePlug( "in2", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group1"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group2"] = GafferScene.Group( "Group2" )
parent.addChild( __children["Group2"] )
__children["Group2"]["in"].addChild( GafferScene.ScenePlug( "in1", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group2"]["in"].addChild( GafferScene.ScenePlug( "in2", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group2"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Cube"]["__uiPosition"].setValue( imath.V2f( 58.457634, 4.66671658 ) )
__children["Group"]["in"][0].setInput( __children["Cube"]["out"] )
__children["Group"]["in"][1].setInput( __children["Cube"]["out"] )
__children["Group"]["name"].setValue( 'A' )
__children["Group"]["__uiPosition"].setValue( imath.V2f( 59.9567871, -3.49734592 ) )
__children["Group1"]["in"][0].setInput( __children["Cube"]["out"] )
__children["Group1"]["in"][1].setInput( __children["Cube"]["out"] )
__children["Group1"]["name"].setValue( 'B' )
__children["Group1"]["transform"]["translate"].setValue( imath.V3f( 2, 0, 0 ) )
__children["Group1"]["__uiPosition"].setValue( imath.V2f( 77.3576279, -3.49734569 ) )
__children["Group2"]["in"][0].setInput( __children["Group"]["out"] )
__children["Group2"]["in"][1].setInput( __children["Group1"]["out"] )
__children["Group2"]["__uiPosition"].setValue( imath.V2f( 69.757637, -14.233284 ) )


del __children
johnhaddon added a commit to johnhaddon/gaffer that referenced this issue Jun 28, 2023
We were updating `m_allocMap` (which maintains ownership) _after_ assigning to `m_map` (which deals in raw pointers) and emitting `changedSignal()`. This meant we were in an inconsistent internal state at the point we triggered arbitrary observer code via slots connected to the signal. If a slot set the _same_ variable again, we'd end up with the raw pointer from the second call to `set()`, but with the owning pointer for the first call. Hence we were referencing a dangling pointer and subsequent attempts at accessing it would crash.

The solution is to store to `m_allocMap` _before_ calling `internalSet()`, so everything is in sync when we emit the signal.

This fixes crashes triggered by the interaction between multiple HierarchyViews synchronising via `ContextAlgo::set/getVisibleSet()`, as described in (GafferHQ#5364). I'm still not 100% sure how that was leading to a re-entrant call, but it seems worth fixing this at the lowest level possible, and "no signalling without consistent internal state" is a rule to live by anyway.
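The ordering problem described in this commit message can be sketched with a small toy model. This is plain Python under stated assumptions, not the actual C++ code: `ToyContext`, `raw`, `owned` and `run` are hypothetical names, with `raw` standing in for `m_map` and `owned` for `m_allocMap`. Python cannot produce a dangling pointer, so the sketch only checks whether the two maps end up referring to the same object after a re-entrant `set()`.

```python
class ToyContext:
    """Toy model of a container with a 'raw' map and an 'owning' map."""

    def __init__(self, store_owner_first):
        self.store_owner_first = store_owner_first
        self.raw = {}    # stands in for m_map (raw pointers)
        self.owned = {}  # stands in for m_allocMap (ownership)
        self.slots = []  # observers called when a variable changes

    def _internal_set(self, name, value):
        # Assign the raw entry, then run arbitrary observer code.
        self.raw[name] = value
        for slot in list(self.slots):
            slot(self, name)

    def set(self, name, value):
        if self.store_owner_first:
            # Fixed ordering: ownership is recorded before signalling.
            self.owned[name] = value
            self._internal_set(name, value)
        else:
            # Old ordering: observers run before ownership is recorded.
            self._internal_set(name, value)
            self.owned[name] = value

    def consistent(self, name):
        # True if both maps refer to the very same object.
        return self.raw[name] is self.owned[name]


def run(store_owner_first):
    """Set a variable while a slot re-enters set() for the same name."""
    context = ToyContext(store_owner_first)
    first, second = ["first"], ["second"]

    def slot(ctx, name):
        # Re-entrant call: set the same variable again, but only once.
        if ctx.raw[name] is first:
            ctx.set(name, second)

    context.slots.append(slot)
    context.set("visibleSet", first)
    return context.consistent("visibleSet")
```

With the old ordering, `run( False )` ends with `raw` holding the value from the second call and `owned` holding the value from the first, which is the mismatch the commit message describes. With the fixed ordering, `run( True )` leaves both maps in sync.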
johnhaddon added a commit to johnhaddon/gaffer that referenced this issue Jun 29, 2023, with the same commit message as above.