Add global lock for torch.onnx.export() #4659
I think this works, but I would prefer the lock be part of the |
I was thinking of a static lock on the class. Since it is a static variable, it should be created only once and be shared across all threads, no?
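The snippet from this comment did not survive, so the following is only a minimal sketch of the idea under stated assumptions: a lock stored as a class attribute is created once, when the class body is executed, and every instance and thread sees the same object. `ModelSerializer`, `_export_lock` and `export` are hypothetical names, not the identifiers used in the PR.

```python
import threading


class ModelSerializer:
    # Hypothetical class. The lock is a class attribute, so it is created
    # once when the class is defined and shared by all instances.
    _export_lock = threading.Lock()

    def __init__(self, name):
        self.name = name

    def export(self, log):
        # Only one thread at a time can run this block, whichever instance
        # it goes through, so each start/end pair stays together in the log.
        with ModelSerializer._export_lock:
            log.append(("start", self.name))
            log.append(("end", self.name))


serializers = [ModelSerializer(f"behavior_{i}") for i in range(4)]
log = []
threads = [threading.Thread(target=s.export, args=(log,)) for s in serializers]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every instance refers to the very same lock object.
same_lock = all(s._export_lock is ModelSerializer._export_lock for s in serializers)
```

Because the lock belongs to the class rather than to an instance, creating one serializer per behavior does not create one lock per behavior.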
I was thinking about the same thing and I just tried it now. It seems to work.
Tested with match3 with checkpoint_interval=100. Without the fix, it usually fails within 1000 steps.
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
* Cherry-pick fix from #4659
Proposed change(s)
torch.onnx.export() can only be used by one thread at a time. When training multiple behaviors with threading=True, multiple threads could try to export a model at the same time, which resulted in an exception.
Adding a global lock for model serialization solves the problem.