Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sporadic Hang With TaskCollaboration Plugs With Many Threads #5403

Closed
danieldresser-ie opened this issue Jul 20, 2023 · 2 comments
Closed

Sporadic Hang With TaskCollaboration Plugs With Many Threads #5403

danieldresser-ie opened this issue Jul 20, 2023 · 2 comments

Comments

@danieldresser-ie
Copy link
Contributor

Summarizing this investigation which turns out to not be very high priority because it only occurs if the "-threads" argument is used to run Gaffer with more threads than cores.

The theory is that TaskMutex uses mechanisms like executionState->arena.execute( [&executionState]{ executionState->taskGroup.wait(); } ); for threads to join an arena - we expect this to join the arena, and immediately execute the taskGroup.wait. However, if the arena is already considered "full", then this instead enqueues the "taskGroup.wait", which means that threads already in the arena could pick up this wait, causing a deadlock.

I will include below a simple repro.

A guard that I tried out which did not prevent the deadlock, but did strongly indicate that something weird is happening with high thread counts, is putting this in TaskMutex::acquireOr:

                    tbb::enumerable_thread_specific<int> badThreadFlag;

                    badThreadFlag.local() = 37;
                    executionState->arena.execute(
                        [&executionState, &badThreadFlag]{
                            executionState->taskGroup.wait();
                            if( badThreadFlag.local() != 37 )
                            {
                                throw IECore::Exception( "Run from wrong thread?" );
                            }
                            badThreadFlag.local() = 0;
                        }
                    );

This is triggered when running Gaffer with -threads set higher than the core count, indicating that there is the possibility of this wait running at a time we don't expect ( though this guard by itself does not prevent hangs, indicating that something else weird may be going on at high thread counts ).

John spotted this weird work-around indicates TBB folks being aware of the issue internally:
https://github.com/oneapi-src/oneTBB/blob/v2020.3/src/tbb/arena.cpp#L1031-L1048
( If they added the try_execute method mentioned in that comment, that would probably solve the issue for us ).

We haven't figured out why that workaround appears to no longer be needed in bleeding edge TBB, nor can we find the forum post that John mentions in TaskMutex.h as being the original inspiration for this approach.

As far as we can tell, this issue currently only occurs when using -threads to run with more threads than cores ( if we don't over-subscribe at all, then we never run out of max_concurrency in the arena, and never trigger this behaviour from tbb ), so this may not be too high priority.

Maybe tbb will eventually add try_execute, or maybe on new tbb we'll be able to just replace TaskMutex with collaborative_call_once?

@danieldresser-ie
Copy link
Contributor Author

Here is the contents of my test file, deleteFacesHang3.gfr:

import Gaffer
import GafferImage
import GafferOSL
import GafferScene
import IECore
import imath

Gaffer.Metadata.registerValue( parent, "serialiser:milestoneVersion", 1, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:majorVersion", 2, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:minorVersion", 10, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:patchVersion", 0, persistent=False )

__children = {}

parent["variables"].addChild( Gaffer.NameValuePlug( "project:name", Gaffer.StringPlug( "value", defaultValue = 'default', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "projectName", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "project:rootDirectory", Gaffer.StringPlug( "value", defaultValue = '$HOME/gaffer/projects/${project:name}', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "projectRootDirectory", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "image:viewName", Gaffer.StringPlug( "value", defaultValue = 'default', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "defaultViewName", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "workingPath", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "workingPath", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "job", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "job", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "sequence", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "sequence", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "shot", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "shot", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "scene", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "scene", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "defaultResolution", Gaffer.V2iPlug( "value", defaultValue = imath.V2i( 1920, 1080 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "defaultResolution", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "defaultPixelAspectRatio", Gaffer.FloatPlug( "value", defaultValue = 1.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "defaultPixelAspectRatio", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "defaultOverscan", Gaffer.V2fPlug( "value", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "defaultOverscan", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "image:catalogue:port", Gaffer.IntPlug( "value", defaultValue = 0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "imageCataloguePort", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "renderPassVariation", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "renderPassVariation", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "renderPassName", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "renderPassName", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["defaultFormat"] = GafferImage.FormatPlug( "defaultFormat", defaultValue = GafferImage.Format( 1920, 1080, 1.000 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, )
parent.addChild( __children["defaultFormat"] )
__children["DeleteFaces1"] = GafferScene.DeleteFaces( "DeleteFaces1" )
parent.addChild( __children["DeleteFaces1"] )
__children["DeleteFaces1"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces2"] = GafferScene.DeleteFaces( "DeleteFaces2" )
parent.addChild( __children["DeleteFaces2"] )
__children["DeleteFaces2"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces5"] = GafferScene.DeleteFaces( "DeleteFaces5" )
parent.addChild( __children["DeleteFaces5"] )
__children["DeleteFaces5"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces6"] = GafferScene.DeleteFaces( "DeleteFaces6" )
parent.addChild( __children["DeleteFaces6"] )
__children["DeleteFaces6"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Filter"] = GafferScene.PathFilter( "Filter" )
parent.addChild( __children["Filter"] )
__children["Filter"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces7"] = GafferScene.DeleteFaces( "DeleteFaces7" )
parent.addChild( __children["DeleteFaces7"] )
__children["DeleteFaces7"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces8"] = GafferScene.DeleteFaces( "DeleteFaces8" )
parent.addChild( __children["DeleteFaces8"] )
__children["DeleteFaces8"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces9"] = GafferScene.DeleteFaces( "DeleteFaces9" )
parent.addChild( __children["DeleteFaces9"] )
__children["DeleteFaces9"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces10"] = GafferScene.DeleteFaces( "DeleteFaces10" )
parent.addChild( __children["DeleteFaces10"] )
__children["DeleteFaces10"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLObject"] = GafferOSL.OSLObject( "OSLObject" )
parent.addChild( __children["OSLObject"] )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable5", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, ), True, "primitiveVariable6", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, ), True, "primitiveVariable7", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, ), True, "primitiveVariable8", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable9", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat2", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable10", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat3", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable11", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat4", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable12", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"] = GafferOSL.OSLCode( "OSLCode" )
parent.addChild( __children["OSLCode"] )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output1", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output2", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output3", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output4", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output5", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output6", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output7", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output8", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Sphere"] = GafferScene.Sphere( "Sphere" )
parent.addChild( __children["Sphere"] )
__children["Sphere"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["CollectScenes"] = GafferScene.CollectScenes( "CollectScenes" )
parent.addChild( __children["CollectScenes"] )
__children["CollectScenes"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Expression"] = Gaffer.Expression( "Expression" )
parent.addChild( __children["Expression"] )
__children["Expression"]["__out"].addChild( Gaffer.StringVectorDataPlug( "p0", direction = Gaffer.Plug.Direction.Out, defaultValue = IECore.StringVectorData( [  ] ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Expression"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Expression1"] = Gaffer.Expression( "Expression1" )
parent.addChild( __children["Expression1"] )
__children["Expression1"]["__out"].addChild( Gaffer.V2iPlug( "p0", direction = Gaffer.Plug.Direction.Out, defaultValue = imath.V2i( 20, 40 ), minValue = imath.V2i( 3, 6 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Expression1"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"] = GafferScene.Group( "Group" )
parent.addChild( __children["Group"] )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in1", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in2", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in3", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in4", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in5", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in6", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in7", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in8", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
parent["frameRange"]["start"].setValue( 1014 )
parent["frameRange"]["end"].setValue( 1081 )
parent["frame"].setValue( 1081.0 )
parent["variables"]["shot"]["value"].setValue( 'generic' )
parent["variables"]["scene"]["value"].setValue( '${script:name}' )
parent["variables"]["defaultResolution"]["value"].setValue( imath.V2i( 4096, 2534 ) )
parent["variables"]["defaultOverscan"]["value"].setValue( imath.V2f( 0, 0.000789265963 ) )
Gaffer.Metadata.registerValue( parent["variables"]["projectName"]["name"], 'readOnly', True )
Gaffer.Metadata.registerValue( parent["variables"]["projectRootDirectory"]["name"], 'readOnly', True )
Gaffer.Metadata.registerValue( parent["variables"]["defaultViewName"], 'layout:visibilityActivator', False )
Gaffer.Metadata.registerValue( parent["variables"]["defaultViewName"]["name"], 'readOnly', True )
Gaffer.Metadata.registerValue( parent["variables"]["defaultViewName"]["value"], 'readOnly', True )
Gaffer.Metadata.registerValue( parent["variables"]["imageCataloguePort"], 'readOnly', True )
__children["defaultFormat"].setValue( GafferImage.Format( 4096, 2534, 1.000 ) )
__children["DeleteFaces1"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces1"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces1"]["adjustBounds"].setValue( False )
__children["DeleteFaces1"]["faces"].setValue( 'varA' )
__children["DeleteFaces1"]["invert"].setValue( True )
__children["DeleteFaces1"]["__uiPosition"].setValue( imath.V2f( 131.852158, -25.2778969 ) )
__children["DeleteFaces2"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces2"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces2"]["adjustBounds"].setValue( False )
__children["DeleteFaces2"]["faces"].setValue( 'varB' )
__children["DeleteFaces2"]["invert"].setValue( True )
__children["DeleteFaces2"]["__uiPosition"].setValue( imath.V2f( 144.727859, -25.2778969 ) )
__children["DeleteFaces5"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces5"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces5"]["adjustBounds"].setValue( False )
__children["DeleteFaces5"]["faces"].setValue( 'varC' )
__children["DeleteFaces5"]["invert"].setValue( True )
__children["DeleteFaces5"]["__uiPosition"].setValue( imath.V2f( 155.879608, -25.2779045 ) )
__children["DeleteFaces6"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces6"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces6"]["adjustBounds"].setValue( False )
__children["DeleteFaces6"]["faces"].setValue( 'varD' )
__children["DeleteFaces6"]["invert"].setValue( True )
__children["DeleteFaces6"]["__uiPosition"].setValue( imath.V2f( 166.630936, -25.2779045 ) )
__children["Filter"]["paths"].setValue( IECore.StringVectorData( [ '/*' ] ) )
__children["Filter"]["__uiPosition"].setValue( imath.V2f( 197.610046, 5.44718122 ) )
__children["DeleteFaces7"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces7"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces7"]["adjustBounds"].setValue( False )
__children["DeleteFaces7"]["faces"].setValue( 'varE' )
__children["DeleteFaces7"]["invert"].setValue( True )
__children["DeleteFaces7"]["__uiPosition"].setValue( imath.V2f( 176.607651, -25.2779045 ) )
__children["DeleteFaces8"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces8"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces8"]["adjustBounds"].setValue( False )
__children["DeleteFaces8"]["faces"].setValue( 'varF' )
__children["DeleteFaces8"]["invert"].setValue( True )
__children["DeleteFaces8"]["__uiPosition"].setValue( imath.V2f( 187.350082, -25.2779045 ) )
__children["DeleteFaces9"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces9"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces9"]["adjustBounds"].setValue( False )
__children["DeleteFaces9"]["faces"].setValue( 'varG' )
__children["DeleteFaces9"]["invert"].setValue( True )
__children["DeleteFaces9"]["__uiPosition"].setValue( imath.V2f( 197.515213, -25.2779045 ) )
__children["DeleteFaces10"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces10"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces10"]["adjustBounds"].setValue( False )
__children["DeleteFaces10"]["faces"].setValue( 'varH' )
__children["DeleteFaces10"]["invert"].setValue( True )
__children["DeleteFaces10"]["__uiPosition"].setValue( imath.V2f( 210.566727, -25.2779121 ) )
__children["OSLObject"]["in"].setInput( __children["CollectScenes"]["out"] )
__children["OSLObject"]["filter"].setInput( __children["Filter"]["out"] )
__children["OSLObject"]["interpolation"].setValue( 2 )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable5"]["name"].setValue( 'varA' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable5"]["value"].setInput( __children["OSLCode"]["out"]["output1"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable6"]["name"].setValue( 'varB' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable6"]["value"].setInput( __children["OSLCode"]["out"]["output2"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable7"]["name"].setValue( 'varC' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable7"]["value"].setInput( __children["OSLCode"]["out"]["output3"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable8"]["name"].setValue( 'varD' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable8"]["value"].setInput( __children["OSLCode"]["out"]["output4"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable9"]["name"].setValue( 'varE' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable9"]["value"].setInput( __children["OSLCode"]["out"]["output5"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable10"]["name"].setValue( 'varF' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable10"]["value"].setInput( __children["OSLCode"]["out"]["output6"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable11"]["name"].setValue( 'varG' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable11"]["value"].setInput( __children["OSLCode"]["out"]["output7"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable12"]["name"].setValue( 'varH' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable12"]["value"].setInput( __children["OSLCode"]["out"]["output8"] )
__children["OSLObject"]["__uiPosition"].setValue( imath.V2f( 155.879608, -2.04510856 ) )
__children["OSLCode"]["code"].setValue( 'int q = int( u * 10000 ) % 512;\n\noutput1 = 0;\noutput2 = 0;\noutput3 = 0;\noutput4 = 0;\noutput5 = 0;\noutput6 = 0;\noutput7 = 0;\noutput8 = 0;\n\nif( q < 1 ) output1 = 1;\nelse if( q < 2 ) output2 = 1;\nelse if( q < 4 ) output3 = 1;\nelse if( q < 8 ) output4 = 1;\nelse if( q < 16 ) output5 = 1;\nelse if( q < 32 ) output6 = 1;\nelse if( q < 64 ) output7 = 1;\nelse if( q < 128 ) output8 = 1;\n' )
__children["OSLCode"]["__uiPosition"].setValue( imath.V2f( 139.589676, -1.44510877 ) )
__children["Sphere"]["divisions"].setInput( __children["Expression1"]["__out"]["p0"] )
__children["Sphere"]["__uiPosition"].setValue( imath.V2f( 155.879761, 18.2009869 ) )
__children["CollectScenes"]["in"].setInput( __children["Sphere"]["out"] )
__children["CollectScenes"]["rootNames"].setInput( __children["Expression"]["__out"]["p0"] )
__children["CollectScenes"]["rootNameVariable"].setValue( 'collect:test' )
__children["CollectScenes"]["sourceRoot"].setValue( '/sphere' )
__children["CollectScenes"]["__uiPosition"].setValue( imath.V2f( 155.879608, 10.0369234 ) )
__children["Expression"]["__uiPosition"].setValue( imath.V2f( 145.378922, 10.0377245 ) )
__children["Expression1"]["__uiPosition"].setValue( imath.V2f( 145.379059, 18.2018032 ) )
__children["Group"]["in"][0].setInput( __children["DeleteFaces1"]["out"] )
__children["Group"]["in"][1].setInput( __children["DeleteFaces2"]["out"] )
__children["Group"]["in"][2].setInput( __children["DeleteFaces5"]["out"] )
__children["Group"]["in"][3].setInput( __children["DeleteFaces6"]["out"] )
__children["Group"]["in"][4].setInput( __children["DeleteFaces7"]["out"] )
__children["Group"]["in"][5].setInput( __children["DeleteFaces8"]["out"] )
__children["Group"]["in"][6].setInput( __children["DeleteFaces9"]["out"] )
__children["Group"]["in"][7].setInput( __children["DeleteFaces10"]["out"] )
__children["Group"]["__uiPosition"].setValue( imath.V2f( 166.63092, -42.2246513 ) )
__children["Expression"]["__engine"].setValue( 'python' )
__children["Expression"]["__expression"].setValue( 'parent["__out"]["p0"] = IECore.StringVectorData( [ "test%i_C_REN" % i for i in range( 100 ) ] )' )
__children["Expression1"]["__engine"].setValue( 'python' )
__children["Expression1"]["__expression"].setValue( 'import imath\nn = context.get( "collect:test" )\n\ni = int( n[4:-6] )\n\n\nparent["__out"]["p0"] = imath.V2i( 3, 6 ) * i' )


del __children

I've been running it like this:

repeat 40 timeout -s KILL 30s gaffer stats deleteFacesHang3.gfr -nodeSummary 0 -threads 50 -scene Group.out

This should produce a usual runtime of about 1 second scene generation ( plus several seconds loading ), but in the run of 40, you should see some take over 30 seconds and get killed by the timeout command.

The stack trace shows that one thread gets into the parallel for within ShadingEngine::shade, which should only be running OSL stuff, but then somehow that hits a mutex and deadlocks ... our understanding is that this is because another thread has tried to help with the parallel_for, but is over max_concurrency, so has instead enqueued a task which gets picked up while running the parallel_for.

@johnhaddon
Copy link
Member

Good hunting Daniel! I think there a couple of relatively easy ways to resolve this :

  1. Put a limit on the -threads argument so you can't oversubscribe.
  2. Construct the task arena with an explicit argument of max_concurrency = global_control::active_value( max_allowed_parallelism ) to match the true thread count.

I'm fairly inclined to go with option 1 for its simplicity, and because I can't think of a good reason to oversubscribe.

johnhaddon added a commit to johnhaddon/gaffer that referenced this issue Aug 9, 2023
This fixes deadlocks caused by worker threads not being able to join a `tbb::task_arena`, because by default they are constructed to only allow the same number of workers as cores. Fixes GafferHQ#5403.

Also added support for specifying negative thread counts, to reserve some cores for other applications.
johnhaddon added a commit to johnhaddon/gaffer that referenced this issue Aug 11, 2023
This fixes deadlocks caused by worker threads not being able to join a `tbb::task_arena`, because by default they are constructed to only allow the same number of workers as cores. Fixes GafferHQ#5403.

Also added support for specifying negative thread counts, to reserve some cores for other applications.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants