gpu memory allocate will be error when using multiprocessing.Process #4659
Comments
v0.7.0 and v0.8.0 are OK; master produces this error. |
I may have hit a similar problem: [11:09:02] src/nnvm/legacy_json_util.cc:153: Loading symbol saved by previous version v0.8.0. Attempting to upgrade... Stack trace returned 6 entries: terminate called after throwing an instance of 'dmlc::Error' Stack trace returned 6 entries: |
The only way to reliably use CUDA with multiprocessing is to import mxnet after creating the subprocesses. |
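A minimal sketch of that pattern, for illustration only. The names `worker` and `run_workers` and the `module_name` parameter are hypothetical and not from this thread; the module is passed by name so the same code can be tried with a stdlib module such as `json` on a machine without mxnet. The point is that the import happens inside the child, so the parent never initializes CUDA before forking.

```python
import importlib
import multiprocessing as mp


def worker(module_name, queue):
    # The import happens here, inside the child process, so the parent
    # never loads the library (or creates a CUDA context) before forking.
    lib = importlib.import_module(module_name)
    queue.put(lib.__name__)


def run_workers(module_name="mxnet", n_workers=2, timeout=60):
    # "fork" is requested explicitly; it is only available on POSIX systems.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(module_name, queue))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    # The timeout keeps the parent from blocking forever if a child dies
    # before reporting (for example, because the import failed).
    results = [queue.get(timeout=timeout) for _ in procs]
    for p in procs:
        p.join()
    return results
```

Calling `run_workers("mxnet")` assumes mxnet is installed; the parent process itself must not import mxnet anywhere before the children are started.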
I don't have any problems executing the above code with a current version of mxnet. In general, I believe that using Python multiprocessing and specifying the forkserver start method before importing mxnet should be a workaround for any CUDA-related multiprocessing issues. In particular, it should still allow creating new processes after mxnet has been imported, because the processes are forked from the forkserver, which has no CUDA context. This also seems to be what PyTorch is doing. |
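A rough sketch of the forkserver workaround, under stated assumptions: mxnet is installed with GPU support and GPU 0 exists. The helper names `forkserver_context`, `gpu_worker` and `launch` are made up for illustration. It uses `multiprocessing.get_context("forkserver")` rather than the global `set_start_method`, which selects the same start method without changing process-wide state.

```python
import multiprocessing as mp


def forkserver_context():
    # Children are forked from a separate server process, not from the
    # parent, so they do not inherit any CUDA state the parent may hold.
    return mp.get_context("forkserver")


def gpu_worker(queue):
    # Assumption: mxnet is installed and GPU 0 is usable.
    import mxnet as mx
    x = mx.nd.ones((2, 2), ctx=mx.gpu(0))
    queue.put(float(x.sum().asscalar()))


def launch(target, n_workers=2, timeout=120):
    ctx = forkserver_context()
    queue = ctx.Queue()
    procs = [ctx.Process(target=target, args=(queue,))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    # The timeout avoids waiting forever if a child crashes before reporting.
    results = [queue.get(timeout=timeout) for _ in procs]
    for p in procs:
        p.join()
    return results
```

In a script, `launch(gpu_worker)` should be called under `if __name__ == "__main__":`, because the forkserver start method re-imports the main module in each child.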
This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks! |
So, is there no solution to this problem? |
It is magical! |
@mxnet-label-bot add [Python, Bug] |
@szha Has this issue been resolved? I have not been able to reproduce the exact issue. It starts to fail only when the GPU runs out of memory; I have been able to spawn more than 10 workers with the example script. I see a related PR has been merged in the dmlc/gluon-nlp repo. |
@leezu might still have some issue with it so let's wait for his comment too. |
Here is an updated test case
Here CUDA is initialized in the parent process before the child processes are started. You may argue that GPU operations in the child processes should not be supported, but then the situation must be handled gracefully, i.e. by throwing an error on the Python side and not the C++ side. But let's accept the current C++ exception. Even then, if we only want to do CPU work in the child process, the above example will crash as the |
@leezu
Hope it helps. |
Still facing this issue. I am unable to work and now have to change the entire architecture of the application because of it. |
@mxnet-label-bot add [Backend] |
Related: #14979 Forking the library is not supported as of now. |
I also can't reproduce this with the latest master
|
@PascalIversen provided a new reproducer: #19291 |
Reproduction code:
OS: Linux, CentOS 7 + CUDA 7.5 + cuDNN 5.1
Log: