-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ShadingEngine : Add missing this_task_arena::isolate #5402
ShadingEngine : Add missing this_task_arena::isolate #5402
Conversation
I guess I'll just paste in my repro file here. I've been running it like this:
This should produce a usual runtime of about 1 second scene generation ( plus several seconds loading ), but in the run of 40, you should see some take over 30 seconds and get killed by the timeout command. With the fix, I was able to run a couple of hundred times with no hangs. Contents of deleteFacesHang3.gfr:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was just a missing this_task_arena::isolate which affected OSLObject.
Something smells fishy here. Although there's a case to be made that ShadingEngine should do an isolation itself, because it could be called from anywhere, it shouldn't be necessary for the call from OSLObject. OSLObject::processedObjectComputeCachePolicy
is TaskCollaboration
, which means the whole thing should be running isolated in its own arena anyway. Perhaps that's not working? Or perhaps this PR doesn't really fix the root problem, but masks it by changing thread timings somehow?
auto f = [¶ms, &renderState, &shaderGroup, &shadingSystem, &results]( const tbb::blocked_range<size_t> &r ) | ||
{ | ||
ThreadInfo &threadInfo = params.threadInfoCache.local(); | ||
tbb::this_task_arena::isolate( [ ¶ms, &renderState, &shaderGroup, &shadingSystem, &results ]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be put around just the tbb::parallel_for
call, so it has less overall impact on the code layout, and the meat of the code doesn't end up with an extra level of indentation?
Closing, since this doesn't fix the original production problem, and as discussed above isn't necessary because of the |
The hang that's been affecting IE turned out to be completely different from what we were originally thinking - it was just a missing this_task_arena::isolate which affected OSLObject.
I haven't thought that deeply about where this should live - it could go outside ShadingEngine, but it probably needs to affect both OSLImage and OSLObject, so this is probably a reasonable place for it?
I also haven't got a unit test. I do have a fairly clean repro ( Murray has confirmed that it hangs on his computer too ), which I'll try to include here, but it is quite sensitive, and requires being run a large number of times with a large number of threads in order to catch anything.
What would probably be more useful anyway if we were going to improve test coverage would be some way of detecting any tbb parallelism inside TaskCollaborations that isn't isolated ( even if it isn't currently causing a hang ). I started wondering whether we could create a honeypot arena that should never be executed, and could then report an error if anything picks up a job from it? Didn't quite figure out what that would look like though.