This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Deadlock in LibMxNet.dll during unload due to synchronization call in destructor #11163

Open
eftiquar opened this issue Jun 6, 2018 · 6 comments

@eftiquar

eftiquar commented Jun 6, 2018

Description

The destructor of the local static "mxnet::Engine" instance calls into the STL's condition_variable::notify_all, which in turn calls NtReleaseKeyedEvent, an undocumented Windows API. That call deadlocks because it runs with the loader lock held, in the context of the DLL unload machinery.
Independent of this hang, having a non-trivial destructor for a static local variable is a recipe for unforeseen problems. These destructors are called in an unspecified order, and may call into other DLLs (or shared objects) that have already been unloaded.
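
For readers unfamiliar with the pattern, here is a minimal sketch of the shape being described: a function-local static whose destructor signals a worker thread and waits for it. The names ToyEngine and GetToyEngine are hypothetical and this is not MXNet's actual code; it only illustrates where the notify_all call sits relative to static destruction.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the pattern described above; not MXNet's code.
class ToyEngine {
 public:
  ToyEngine() : worker_([this] { WaitForStop(); }) {}

  // Non-trivial destructor: signals the worker and waits for it to finish.
  // For a function-local static this runs during process exit or, inside a
  // DLL, during unload.
  ~ToyEngine() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    wake_.notify_all();  // the kind of synchronization call the report points at
    worker_.join();
  }

  bool IsRunning() {
    std::lock_guard<std::mutex> lock(mutex_);
    return !stop_;
  }

 private:
  // Worker body: blocks until the destructor sets stop_.
  void WaitForStop() {
    std::unique_lock<std::mutex> lock(mutex_);
    wake_.wait(lock, [this] { return stop_; });
  }

  std::mutex mutex_;
  std::condition_variable wake_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the members above are initialized first
};

// Function-local static: constructed on first call, destroyed automatically
// at exit or unload, with no explicit call from the consumer.
ToyEngine& GetToyEngine() {
  static ToyEngine engine;
  return engine;
}
```

In a plain executable this teardown is usually harmless. The concern raised here is the case where the same destructor runs inside a DLL while the loader lock is held.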

(Brief description of the problem in no more than 2 sentences.)

Environment info (Required)

Windows 7, any MxNet build

Compiler (gcc/clang/mingw/visual studio):

Error Message:

No error message, the Python process hangs upon exit

Minimum reproducible example

(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)

Steps to reproduce

  1. Set up a standard Windows 7 system
  2. Install Python from https://sourceforge.net/projects/winpython/files/WinPython_3.6/3.6.5.1/beta... ...
  3. Run the following in the Windows Python command shell:
    a. pip install mxnet
    b. python
  4. Run the following in the Python shell:
    a. import mxnet
    b. exit()

What have you tried to solve it?

Proposed solution

a. Provide an explicit API to destroy the "mxnet::Engine" instance, as the complement to Engine::Create.
b. Have LibMxNet consumers call the Destroy API explicitly before exiting.
c. Modify the destructor to not perform cleanup, as Destroy will have already done that.
d. If no one calls cleanup, it should not matter, because the OS reclaims all resources once the process is destroyed.
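
A rough sketch of what steps a to d could look like follows. ExplicitEngine, Get, Destroy and IsStopped are hypothetical names, not an existing MXNet API. One deliberate variation from step c: instead of an empty destructor, the sketch never destroys the instance at all, because a std::thread member that is still joinable when destroyed calls std::terminate.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical illustration of the proposal; not an existing MXNet API.
class ExplicitEngine {
 public:
  // Created on first use and intentionally never deleted, so no destructor
  // (and no synchronization call) runs during static teardown or DLL unload.
  static ExplicitEngine& Get() {
    static ExplicitEngine* instance = new ExplicitEngine();
    return *instance;
  }

  // Explicit cleanup, meant to be called by the consumer before exiting.
  // Calling it again after the first call is a no-op.
  void Destroy() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (stopped_) return;
      stopped_ = true;
    }
    wake_.notify_all();
    worker_.join();
  }

  bool IsStopped() {
    std::lock_guard<std::mutex> lock(mutex_);
    return stopped_;
  }

 private:
  ExplicitEngine() : worker_([this] { WaitForStop(); }) {}

  // Worker body: blocks until Destroy() sets stopped_.
  void WaitForStop() {
    std::unique_lock<std::mutex> lock(mutex_);
    wake_.wait(lock, [this] { return stopped_; });
  }

  std::mutex mutex_;
  std::condition_variable wake_;
  bool stopped_ = false;
  std::thread worker_;  // declared last so the members above are initialized first
};
```

A consumer would call ExplicitEngine::Get().Destroy() at a point where no loader lock is held, for example just before returning from main. If it never does, the instance is simply left for the OS to reclaim at process exit, as in step d.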

@anirudh2290
Member

Related to #8921 and #9271

@m-ky

m-ky commented Jun 19, 2018

This should be the same as #8754. It only affects Windows 7; Windows 10 works fine.
Could someone please take a look at this?

@apeforest
Contributor

apeforest commented Aug 27, 2018

@nswamy @sandeep-krishnamurthy Please add the label [Call for Contribution]. This has been the root cause of several issues.

@larroy
Contributor

larroy commented Aug 8, 2019

Let's close this. We can't invest energy in Windows 7. As I understand it, this doesn't happen on Windows 10.

@gmx1992

gmx1992 commented Dec 4, 2020

@romintomasetti

romintomasetti commented May 12, 2021

This is still happening as of MxNet 1.7.0 (on Windows 7), using the C++ or Python API.
