This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

gpu memory allocate will be error when using multiprocessing.Process #4659

Open
tornadomeet opened this issue Jan 13, 2017 · 18 comments
Labels: Backend (Issues related to the backend of MXNet), Bug, Python

Comments

@tornadomeet
Contributor

tornadomeet commented Jan 13, 2017

Reproduce code:

import numpy as np
import mxnet as mx
from multiprocessing import Process, current_process

def test():
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    runs = [Process(target=test) for i in range(1)]  # 1 or 2 or N process is the same error
    for p in runs:
      p.start()
    for p in runs:
      p.join()
    print("done!")

OS: Linux, CentOS 7 + CUDA 7.5 + cuDNN 5.1

Log:

[14:32:58] /home/work/wuwei/project/dmlc/mxnet/dmlc-core/include/dmlc/./logging.h:300: [14:32:58] src/storage/storage.cc:38: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: initialization error

Stack trace returned 40 entries:
[bt] (0) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f32b9501039]
[bt] (1) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl14ActivateDeviceENS_7ContextE+0x2a6) [0x7f32b9fb4de6]
[bt] (2) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl5AllocEmNS_7ContextE+0x4a) [0x7f32b9fb263a]
[bt] (3) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(MXNDArrayCreateEx+0x595) [0x7f32b9fe6685]
[bt] (4) /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7f32c24f9dac]
[bt] (5) /lib64/libffi.so.6(ffi_call+0x1f5) [0x7f32c24f96d5]
[bt] (6) /usr/lib64/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x30b) [0x7f32c270cc8b]
[bt] (7) /usr/lib64/python2.7/lib-dynload/_ctypes.so(+0xaa85) [0x7f32c2706a85]
[bt] (8) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (9) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1d4c) [0x7f32cda1025c]
[bt] (10) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (11) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (12) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (13) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (14) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (15) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (16) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (17) /lib64/libpython2.7.so.1.0(+0x6f05d) [0x7f32cd9a105d]
[bt] (18) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (19) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0xde7) [0x7f32cda0f2f7]
[bt] (20) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (21) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (22) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (23) /lib64/libpython2.7.so.1.0(+0x6ef68) [0x7f32cd9a0f68]
[bt] (24) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (25) /lib64/libpython2.7.so.1.0(+0x590a5) [0x7f32cd98b0a5]
[bt] (26) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (27) /lib64/libpython2.7.so.1.0(+0xa1057) [0x7f32cd9d3057]
[bt] (28) /lib64/libpython2.7.so.1.0(+0x9fd6f) [0x7f32cd9d1d6f]
[bt] (29) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (30) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1d4c) [0x7f32cda1025c]
[bt] (31) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (32) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (33) /lib64/libpython2.7.so.1.0(PyEval_EvalCode+0x32) [0x7f32cda141c2]
[bt] (34) /lib64/libpython2.7.so.1.0(+0xfb5ff) [0x7f32cda2d5ff]
[bt] (35) /lib64/libpython2.7.so.1.0(PyRun_FileExFlags+0x7e) [0x7f32cda2e7be]
[bt] (36) /lib64/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xe9) [0x7f32cda2fa49]
[bt] (37) /lib64/libpython2.7.so.1.0(Py_Main+0xc9f) [0x7f32cda40b9f]
[bt] (38) /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f32ccc6cb15]
[bt] (39) python() [0x400721]

[14:32:58] /home/work/wuwei/project/dmlc/mxnet/dmlc-core/include/dmlc/./logging.h:300: [14:32:58] src/storage/storage.cc:38: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: initialization error
......
@tornadomeet changed the title from "gpu memory allocate will be when using multiprocessing.Process" to "gpu memory allocate will be error when using multiprocessing.Process" on Jan 13, 2017
@tornadomeet
Contributor Author

v0.7.0 and v0.8.0 are OK; master brings this error.

@xlvector
Contributor

I may be hitting a similar problem:

[11:09:02] src/nnvm/legacy_json_util.cc:153: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/dmlc-core/include/dmlc/./logging.h:300: [11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: initialization error

Stack trace returned 6 entries:
[bt] (0) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f241b51d2b9]
[bt] (1) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7f241bfeb078]
[bt] (2) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7f241bfee840]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb6970) [0x7f24afe93970]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7f24b400e0a4]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f24b342062d]

terminate called after throwing an instance of 'dmlc::Error'
what(): [11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: initialization error

Stack trace returned 6 entries:
[bt] (0) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f241b51d2b9]
[bt] (1) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7f241bfeb078]
[bt] (2) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7f241bfee840]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb6970) [0x7f24afe93970]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7f24b400e0a4]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f24b342062d]

@piiswrong
Contributor

The only way to reliably use CUDA with multiprocessing is to import mxnet after creating the subprocesses.
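
A minimal sketch of that deferred-import pattern, adapted from the reproduce script above (the restructuring is an illustration of the comment, not code from the thread):

import numpy as np
from multiprocessing import Process, current_process

def test():
    # mxnet is imported only inside the child process, so the parent never
    # creates a CUDA context that a forked child could inherit.
    import mxnet as mx
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    runs = [Process(target=test) for i in range(2)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")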

@mcxkarakoc

.

@leezu
Contributor

leezu commented Jul 6, 2017

I don't have any problems executing the above code with a current version of mxnet. @piiswrong do you have any insight into why it is working now compared to earlier this year? @tornadomeet do you still experience this issue? Perhaps it is related to a different CUDA version/system configuration. #4695 seems to contain the fix.

In general I believe using Python multiprocessing and specifying the forkserver start method before importing mxnet should be a workaround for any CUDA-related multiprocessing issues. In particular, it should still allow creating new processes after mxnet has been imported, as the processes are forked from the forkserver, which has no CUDA context. This also seems to be what PyTorch is doing.
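
A minimal sketch of this forkserver workaround, adapted from the reproduce script (the exact placement of set_start_method and the small array shapes are assumptions for illustration, not an official recipe):

import multiprocessing as mp
import numpy as np
import mxnet as mx

def test():
    a = mx.nd.array(np.zeros((10, 10)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    # Select forkserver before any process is started (and before any GPU work),
    # so children are forked from a server process that holds no CUDA context.
    mp.set_start_method('forkserver')
    b = mx.nd.array(np.zeros((10, 10)), mx.gpu(0))  # the parent may still use the GPU
    b.asnumpy()
    runs = [mp.Process(target=test) for _ in range(2)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")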

@szha
Member

szha commented Oct 8, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!

@szha closed this as completed Oct 8, 2017
@anxingle
Contributor

So, is there no solution to this problem?

@anxingle
Contributor

import numpy as np
import mxnet as mx
from multiprocessing import Process, current_process

def test():
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    # worker_count = multiprocessing.cpu_count() - 2
    worker_count = 8
    runs = [Process(target=test) for i in range(worker_count)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")

It is magical!
I found it is OK when I set worker_count to less than 8, while it doesn't work when worker_count is more than 8!

@zachgk
Contributor

zachgk commented Nov 13, 2018

@mxnet-label-bot add [Python, Bug]

leezu added a commit to leezu/gluon-nlp that referenced this issue Nov 20, 2018
leezu added a commit to leezu/gluon-nlp that referenced this issue Nov 20, 2018
szha pushed a commit to dmlc/gluon-nlp that referenced this issue Nov 20, 2018
@vrakesh
Contributor

vrakesh commented Nov 26, 2018

@szha Has this issue been resolved? I have not been able to reproduce the exact issue; it only starts to fail when the GPU runs out of memory, and I have been able to spawn more than 10 workers with the example script. I see a related PR has been merged in the dmlc/gluon-nlp repo.

@szha
Member

szha commented Nov 27, 2018

@leezu might still have some issues with it, so let's wait for his comment too.

@leezu
Contributor

leezu commented Nov 27, 2018

Here is an updated test case

import numpy as np
import mxnet as mx
from multiprocessing import Process, current_process

def test():
    mx.random.seed(1)  # the seed call alone touches CUDA-related code internally

if __name__ == '__main__':
    a = mx.nd.random_normal(shape=(10,10), ctx=mx.gpu(0))
    runs = [Process(target=test) for i in range(1)]
    for p in runs:
      p.start()
    for p in runs:
      p.join()

Here CUDA is initialized in the parent process before the child processes are started. You may argue that GPU operations in the child processes should not be supported, but then the situation must be handled gracefully, i.e. throw an error on the Python side and not the C++ side. But let's accept the current C++ exception. Even then, if we only want to do CPU work in the child process, the above example will crash, as random.seed calls some CUDA-related code internally. So there is currently no way to get deterministic execution of code in the child processes, and code may crash at unexpected times (such as when calling random.seed).

@nkhdiscovery

nkhdiscovery commented Feb 23, 2019

@leezu
Here is something even more complex that works; I thought anybody else might come here and need the solution. It does not work if you do not force mp.set_start_method('forkserver', force=True):

import random
import numpy as np
import mxnet as mx
import multiprocessing as mp

def test():
    mx.random.seed(random.randint(10,200))
    a = mx.nd.random_normal(shape=(2,2), ctx=mx.gpu(0))
    print('child no. ', mp.current_process().name, ':' , a)

if __name__ == '__main__':
    mp.set_start_method('forkserver', force=True)
    ab = mx.nd.random_normal(shape=(2,2), ctx=mx.gpu(0))
    print('main proc.: ', ab)
    runs = [mp.Process(target=test) for i in range(3)]
    for p in runs:
      p.start()
    for p in runs:
      p.join()

    print('done')

Hope it helps.

@mdv3101

mdv3101 commented Apr 4, 2019

Still facing this issue and unable to get it to work; I now have to change the entire architecture of the application because of this.

paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this issue Jun 9, 2019
@larroy
Contributor

larroy commented Jul 17, 2019

@mxnet-label-bot add [Backend]

@marcoabreu added the Backend (Issues related to the backend of MXNet) label on Jul 17, 2019
@larroy
Contributor

larroy commented Aug 26, 2019

Related: #14979

Forking the library is not supported as of now.
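
A minimal sketch of how to avoid forking after the library is loaded (an illustration using the standard multiprocessing API, not a recommendation from this comment): create the child processes through a non-fork start method such as an explicit 'spawn' context.

import multiprocessing as mp
import numpy as np
import mxnet as mx

def test():
    a = mx.nd.array(np.zeros((10, 10)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    # 'spawn' (like 'forkserver') starts children as fresh interpreters instead
    # of forking the current process, so no CUDA state is inherited.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=test)
    p.start()
    p.join()
    print("done!")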

@larroy
Contributor

larroy commented Aug 27, 2019

I also can't reproduce this with the latest master:

In [2]: import numpy as np
   ...: import mxnet as mx
   ...: from multiprocessing import Process, current_process
   ...: 
   ...: def test():
   ...:     print("process id is {:s}".format(current_process().name))
   ...:     a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
   ...:     a.asnumpy()
   ...: 
   ...: if __name__ == '__main__':
   ...:     runs = [Process(target=test) for i in range(2)]  # 1 or 2 or N process is the same error
   ...:     for p in runs:
   ...:       p.start()
   ...:     for p in runs:
   ...:       p.join()
   ...:     print("done!")
   ...: 
process id is Process-2
process id is Process-3
done!

In [1]: import numpy as np
   ...: import mxnet as mx
   ...: from multiprocessing import Process, current_process
   ...: 
   ...: def test():
   ...:     print("process id is {:s}".format(current_process().name))
   ...:     a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
   ...:     a.asnumpy()
   ...: 
   ...: if __name__ == '__main__':
   ...:     runs = [Process(target=test) for i in range(1)]  # 1 or 2 or N process is the same error
   ...:     for p in runs:
   ...:       p.start()
   ...:     for p in runs:
   ...:       p.join()
   ...:     print("done!")
   ...: 

process id is Process-1


done!

@leezu
Contributor

leezu commented Oct 5, 2020

@PascalIversen provided a new reproducer: #19291
