Skip to content

sqzhang_flight4_plus

@shuqiangzhang shuqiangzhang tagged this 11 Apr 16:19
Summary:
We seperated the FR dump logic from the desync debug logic,
so we no longer set collectiveDebugInfoMode_ to true when we just need FR
dump. That's why monitor thread did not sleep and try to kill the
process without waiting for the dump.

The fix is simple, we should sleep whenever shouldDump_ is true
Test Plan:
Existing unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123788
Approved by: https://github.com/wconstab
Assets 2
Loading