This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

gpu memory allocate will be error when using multiprocessing.Process #4659

Open
tornadomeet opened this issue Jan 13, 2017 · 18 comments
Labels: Backend (Issues related to the backend of MXNet), Bug, Python

Comments

@tornadomeet
Contributor

tornadomeet commented Jan 13, 2017

Reproduce code:

import numpy as np
import mxnet as mx
from multiprocessing import Process, current_process

def test():
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    runs = [Process(target=test) for i in range(1)]  # 1 or 2 or N process is the same error
    for p in runs:
      p.start()
    for p in runs:
      p.join()
    print("done!")

OS: Linux, CentOS 7 + CUDA 7.5 + cuDNN 5.1

Log:

[14:32:58] /home/work/wuwei/project/dmlc/mxnet/dmlc-core/include/dmlc/./logging.h:300: [14:32:58] src/storage/storage.cc:38: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: initialization error

Stack trace returned 40 entries:
[bt] (0) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f32b9501039]
[bt] (1) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl14ActivateDeviceENS_7ContextE+0x2a6) [0x7f32b9fb4de6]
[bt] (2) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl5AllocEmNS_7ContextE+0x4a) [0x7f32b9fb263a]
[bt] (3) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(MXNDArrayCreateEx+0x595) [0x7f32b9fe6685]
[bt] (4) /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7f32c24f9dac]
[bt] (5) /lib64/libffi.so.6(ffi_call+0x1f5) [0x7f32c24f96d5]
[bt] (6) /usr/lib64/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x30b) [0x7f32c270cc8b]
[bt] (7) /usr/lib64/python2.7/lib-dynload/_ctypes.so(+0xaa85) [0x7f32c2706a85]
[bt] (8) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (9) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1d4c) [0x7f32cda1025c]
[bt] (10) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (11) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (12) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (13) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (14) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (15) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (16) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (17) /lib64/libpython2.7.so.1.0(+0x6f05d) [0x7f32cd9a105d]
[bt] (18) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (19) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0xde7) [0x7f32cda0f2f7]
[bt] (20) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (21) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (22) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (23) /lib64/libpython2.7.so.1.0(+0x6ef68) [0x7f32cd9a0f68]
[bt] (24) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (25) /lib64/libpython2.7.so.1.0(+0x590a5) [0x7f32cd98b0a5]
[bt] (26) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (27) /lib64/libpython2.7.so.1.0(+0xa1057) [0x7f32cd9d3057]
[bt] (28) /lib64/libpython2.7.so.1.0(+0x9fd6f) [0x7f32cd9d1d6f]
[bt] (29) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (30) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1d4c) [0x7f32cda1025c]
[bt] (31) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (32) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (33) /lib64/libpython2.7.so.1.0(PyEval_EvalCode+0x32) [0x7f32cda141c2]
[bt] (34) /lib64/libpython2.7.so.1.0(+0xfb5ff) [0x7f32cda2d5ff]
[bt] (35) /lib64/libpython2.7.so.1.0(PyRun_FileExFlags+0x7e) [0x7f32cda2e7be]
[bt] (36) /lib64/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xe9) [0x7f32cda2fa49]
[bt] (37) /lib64/libpython2.7.so.1.0(Py_Main+0xc9f) [0x7f32cda40b9f]
[bt] (38) /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f32ccc6cb15]
[bt] (39) python() [0x400721]

[14:32:58] /home/work/wuwei/project/dmlc/mxnet/dmlc-core/include/dmlc/./logging.h:300: [14:32:58] src/storage/storage.cc:38: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: initialization error
......
@tornadomeet changed the title from "gpu memory allocate will be when using multiprocessing.Process" to "gpu memory allocate will be error when using multiprocessing.Process" on Jan 13, 2017
@tornadomeet
Contributor Author

v0.7.0 and v0.8.0 are OK; master brings this error.

@xlvector
Contributor

I may be hitting a similar problem:

[11:09:02] src/nnvm/legacy_json_util.cc:153: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/dmlc-core/include/dmlc/./logging.h:300: [11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: initialization error

Stack trace returned 6 entries:
[bt] (0) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f241b51d2b9]
[bt] (1) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7f241bfeb078]
[bt] (2) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7f241bfee840]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb6970) [0x7f24afe93970]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7f24b400e0a4]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f24b342062d]

terminate called after throwing an instance of 'dmlc::Error'
what(): [11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: initialization error

Stack trace returned 6 entries:
[bt] (0) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f241b51d2b9]
[bt] (1) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7f241bfeb078]
[bt] (2) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7f241bfee840]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb6970) [0x7f24afe93970]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7f24b400e0a4]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f24b342062d]

@piiswrong
Contributor

The only way to reliably use CUDA with multiprocessing is to import mxnet after creating the subprocesses.
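
A minimal sketch of that deferred-import pattern, adapted from the reproduce script above (the restructuring is an illustration of the comment, not code from the thread):

import numpy as np
from multiprocessing import Process, current_process

def test():
    # mxnet is imported only inside the child process, so the parent never
    # creates a CUDA context that a forked child could inherit.
    import mxnet as mx
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    runs = [Process(target=test) for i in range(2)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")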

@mcxkarakoc

.

@leezu
Contributor

leezu commented Jul 6, 2017

I don't have any problems executing the above code with a current version of mxnet. @piiswrong do you have any insight into why it is working now compared to earlier this year? @tornadomeet do you still experience this issue? Perhaps it is related to a different CUDA version/system configuration. #4695 seems to contain the fix.

In general I believe using Python multiprocessing and specifying the forkserver start method before importing mxnet should be a workaround for any CUDA-related multiprocessing issues. In particular, it should still allow creating new processes after mxnet has been imported, as the processes are forked from the forkserver, which has no CUDA context. This also seems to be what PyTorch is doing.
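
A minimal sketch of this forkserver workaround, adapted from the reproduce script (the exact placement of set_start_method and the small array shapes are assumptions for illustration, not an official recipe):

import multiprocessing as mp
import numpy as np
import mxnet as mx

def test():
    a = mx.nd.array(np.zeros((10, 10)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    # Select forkserver before any process is started (and before any GPU work),
    # so children are forked from a server process that holds no CUDA context.
    mp.set_start_method('forkserver')
    b = mx.nd.array(np.zeros((10, 10)), mx.gpu(0))  # the parent may still use the GPU
    b.asnumpy()
    runs = [mp.Process(target=test) for _ in range(2)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")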

@szha
Member

szha commented Oct 8, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!

@szha closed this as completed Oct 8, 2017
@anxingle
Contributor

So, is there no solution to this problem?

@anxingle
Contributor

import numpy as np
import mxnet as mx
from multiprocessing import Process, current_process

def test():
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    # worker_count = multiprocessing.cpu_count() - 2
    worker_count = 8
    runs = [Process(target=test) for i in range(worker_count)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")

It is magical!
I found it is OK when I set worker_count to less than 8, while it doesn't work when worker_count is more than 8!

@zachgk
Contributor

zachgk commented Nov 13, 2018

@mxnet-label-bot add [Python, Bug]

leezu added a commit to leezu/gluon-nlp that referenced this issue Nov 20, 2018
leezu added a commit to leezu/gluon-nlp that referenced this issue Nov 20, 2018
szha pushed a commit to dmlc/gluon-nlp that referenced this issue Nov 20, 2018
@vrakesh
Contributor

vrakesh commented Nov 26, 2018

@szha Has this issue been resolved? I have not been able to reproduce the exact issue; it only starts to fail when the GPU runs out of memory, and I have been able to spawn more than 10 workers with the example script. I see a related PR has been merged in the dmlc/gluon-nlp repo.

@szha
Member

szha commented Nov 27, 2018

@leezu might still have some issues with it, so let's wait for his comment too.

@leezu
Contributor

leezu commented Nov 27, 2018

Here is an updated test case

import numpy as np
import mxnet as mx
from multiprocessing import Process, current_process

def test():
    mx.random.seed(1)  # the seed call alone touches CUDA-related code internally

if __name__ == '__main__':
    a = mx.nd.random_normal(shape=(10,10), ctx=mx.gpu(0))
    runs = [Process(target=test) for i in range(1)]
    for p in runs:
      p.start()
    for p in runs:
      p.join()

Here CUDA is initialized in the parent process before the child processes are started. You may argue that GPU operations in the child processes should not be supported, but then the situation must be handled gracefully, i.e. throw an error on the Python side and not the C++ side. But let's accept the current C++ exception. Even then, if we only want to do CPU work in the child process, the above example will crash, as random.seed calls some CUDA-related code internally. So there is currently no way to get deterministic execution of code in the child processes, and code may crash at unexpected times (such as when calling random.seed).

@nkhdiscovery

nkhdiscovery commented Feb 23, 2019

@leezu
Here is something even more complex that works; I thought anybody else might come here and need the solution. It does not work if you do not force mp.set_start_method('forkserver', force=True):

import random
import numpy as np
import mxnet as mx
import multiprocessing as mp

def test():
    mx.random.seed(random.randint(10,200))
    a = mx.nd.random_normal(shape=(2,2), ctx=mx.gpu(0))
    print('child no. ', mp.current_process().name, ':' , a)

if __name__ == '__main__':
    mp.set_start_method('forkserver', force=True)
    ab = mx.nd.random_normal(shape=(2,2), ctx=mx.gpu(0))
    print('main proc.: ', ab)
    runs = [mp.Process(target=test) for i in range(3)]
    for p in runs:
      p.start()
    for p in runs:
      p.join()

    print('done')

Hope it helps.

@mdv3101

mdv3101 commented Apr 4, 2019

Still facing this issue and unable to get it to work; I now have to change the entire architecture of the application because of this.

paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this issue Jun 9, 2019
@larroy
Contributor

larroy commented Jul 17, 2019

@mxnet-label-bot add [Backend]

@marcoabreu added the Backend (Issues related to the backend of MXNet) label on Jul 17, 2019
@larroy
Contributor

larroy commented Aug 26, 2019

Related: #14979

Forking the library is not supported as of now.
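
A minimal sketch of how to avoid forking after the library is loaded (an illustration using the standard multiprocessing API, not a recommendation from this comment): create the child processes through a non-fork start method such as an explicit 'spawn' context.

import multiprocessing as mp
import numpy as np
import mxnet as mx

def test():
    a = mx.nd.array(np.zeros((10, 10)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    # 'spawn' (like 'forkserver') starts children as fresh interpreters instead
    # of forking the current process, so no CUDA state is inherited.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=test)
    p.start()
    p.join()
    print("done!")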

@larroy
Contributor

larroy commented Aug 27, 2019

I also can't reproduce this with the latest master:

In [2]: import numpy as np
   ...: import mxnet as mx
   ...: from multiprocessing import Process, current_process
   ...: 
   ...: def test():
   ...:     print("process id is {:s}".format(current_process().name))
   ...:     a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
   ...:     a.asnumpy()
   ...: 
   ...: if __name__ == '__main__':
   ...:     runs = [Process(target=test) for i in range(2)]  # 1 or 2 or N process is the same error
   ...:     for p in runs:
   ...:       p.start()
   ...:     for p in runs:
   ...:       p.join()
   ...:     print("done!")
   ...: 
process id is Process-2
process id is Process-3
done!

In [1]: import numpy as np
   ...: import mxnet as mx
   ...: from multiprocessing import Process, current_process
   ...: 
   ...: def test():
   ...:     print("process id is {:s}".format(current_process().name))
   ...:     a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
   ...:     a.asnumpy()
   ...: 
   ...: if __name__ == '__main__':
   ...:     runs = [Process(target=test) for i in range(1)]  # 1 or 2 or N process is the same error
   ...:     for p in runs:
   ...:       p.start()
   ...:     for p in runs:
   ...:       p.join()
   ...:     print("done!")
   ...: 

process id is Process-1


done!

@leezu
Contributor

leezu commented Oct 5, 2020

@PascalIversen provided a new reproducer: #19291
