
PyTorch Version: torch_shm_manager error when running with multiprocessing #402

Closed
alex-razor opened this issue Aug 14, 2019 · 25 comments

@alex-razor

alex-razor commented Aug 14, 2019

Running the code doesn't work. I get the following error:

(venv) juggernaut@xmen9:/hdd/AlphaPose$ python demo.py --indir examples/demo/
Loading YOLO model..
torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 314, in reduce_storage
    metadata = storage._share_filename_()
RuntimeError: error executing torch_shm_manager at "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/bin/torch_shm_manager" at /pytorch/torch/lib/libshm/core.cpp:99
torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 314, in reduce_storage
    metadata = storage._share_filename_()
RuntimeError: error executing torch_shm_manager at "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/bin/torch_shm_manager" at /pytorch/torch/lib/libshm/core.cpp:99
torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory
torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "demo.py", line 50, in <module>
    det_loader = DetectionLoader(data_loader, batchSize=args.detbatch).start()
  File "/hdd/AlphaPose/dataloader.py", line 309, in start
    p.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 314, in reduce_storage
    metadata = storage._share_filename_()
RuntimeError: error executing torch_shm_manager at "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/bin/torch_shm_manager" at /pytorch/torch/lib/libshm/core.cpp:99

However, when I add the --sp flag, it works fine.

Python 3.6
CUDA 9.0
CUDNN 7
torch 1.2.0    
torchfile 0.1.0    
torchvision 0.4.0 
@Fang-Haoshu
Member

Fang-Haoshu commented Aug 15, 2019

Hi, can you try modifying line 26 of 'demo.py' as below?
torch.multiprocessing.set_start_method('spawn', force=True)
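
For reference, a minimal sketch of how that change might sit near the top of demo.py (the surrounding code and the guard are assumptions, not the actual file):

import torch.multiprocessing

if __name__ == '__main__':
    # Force the 'spawn' start method before any worker processes are created;
    # force=True overrides a start method that was set earlier.
    torch.multiprocessing.set_start_method('spawn', force=True)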

@alex-razor
Author

Hi, can you try modifying line 26 of 'demo.py' as below?
torch.multiprocessing.set_start_method('spawn', force=True)

Thank you for your reply. However, it didn't help; I get the same error.

@Fang-Haoshu
Member

Oh, that's weird. We have only tested with PyTorch 1.1 so far. Can you check whether PyTorch 1.1 works for you?

@alex-razor
Author

That did work for me. Thanks!

@David-on-Code

RuntimeError: error executing torch_shm_manager at "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/bin/torch_shm_manager" at /pytorch/torch/lib/libshm/core.cpp:99

How can I solve it?

@schmmd

schmmd commented Oct 18, 2019

I'm also hitting this, but on torch==1.3.0.

@maochen

maochen commented Oct 24, 2019

Same on torch==1.3.0.
OS: macOS 10.14.6

@waiting-gy

RuntimeError: error executing torch_shm_manager at "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/bin/torch_shm_manager" at /pytorch/torch/lib/libshm/core.cpp:99

How can I solve it?

Do you know how to solve it? Thank you!

@Abhipray

Abhipray commented Dec 3, 2019

I was seeing this error with 1.3.0. Upgrading to 1.3.1 fixed it for me.

@asheeshcric

@Abhipray I have torch==1.3.1 installed, but it isn't working for me. I get the same error. Has anyone found a solution to this problem?

@Ehsan-Yaghoubi

Ehsan-Yaghoubi commented Dec 9, 2019

I had the same problem. When I used the following versions, AlphaPose worked and generated a JSON file for the images.

  • I created a virtual environment with Python 3.6. If you don't know how to do it, have a look at https://gist.github.com/frfahim/73c0fad6350332cef7a653bcd762f08d

  • I installed the latest version of PyTorch from https://pytorch.org/ and selected CUDA 9.2 (CUDA 10.0 did not work).
    I used: pip3 install torch==1.3.1+cu92 torchvision==0.4.2+cu92 -f https://download.pytorch.org/whl/torch_stable.html

  • I installed Cuda 9.2 from https://developer.nvidia.com/cuda-92-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

Then follow the AlphaPose instructions, which say to download the models and:

  • git clone -b pytorch https://github.com/MVIG-SJTU/AlphaPose.git
    - pip3 install -r requirements.txt (remove torch, torchvision, and ntpath from this file, then run the command)
  • python3 demo.py --indir examples/demo --outdir examples/res

SUMMARY:

Ubuntu 16.04
Python 3.6
CUDA 9.2
CUDNN 7
torch==1.3.1+cu92
torchvision==0.4.2+cu92
GPU: NVIDIA RTX 2080 Ti
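
As a quick sanity check after installation, one can verify that the installed wheel matches the summary above (the expected values in the comments are assumptions based on that summary):

import torch

print(torch.__version__)          # expected: 1.3.1+cu92
print(torch.version.cuda)         # expected: 9.2
print(torch.cuda.is_available())  # expected: True once the CUDA 9.2 runtime is found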

@phamdat09

Hello!
@Ehsan-Yaghoubi, how many FPS did you get? Thanks

@Ehsan-Yaghoubi

Hello!
@Ehsan-Yaghoubi, how many FPS did you get? Thanks

Hi, I only used it to produce the pose information for my own dataset. I didn't check the metrics as I didn't need them.

@phamdat09

Hi @Ehsan-Yaghoubi, thanks for your reply!

@cslxiao

cslxiao commented Feb 9, 2020

It still happens with PyTorch 1.4

@cdyangbo

Set num_workers=0
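
For context, a minimal sketch of that workaround with a generic DataLoader (the dataset and batch size are placeholders):

from torch.utils.data import DataLoader

# num_workers=0 keeps data loading in the main process, so no tensors are
# shared between worker processes and torch_shm_manager is never launched.
loader = DataLoader(my_dataset, batch_size=8, num_workers=0)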

@cdyangbo

torch.multiprocessing.set_start_method('spawn', force=True) works well with num_workers > 0 on macOS.

@nlml

nlml commented Apr 20, 2020

I was just able to fix this by commenting out a line I had added to fix an issue on a different system:

Old: torch.multiprocessing.set_sharing_strategy('file_system')

New: # torch.multiprocessing.set_sharing_strategy('file_system')

I think the problem in my case might be caused by my system having CUDA 10.2 while PyTorch is installed as the CUDA 10.1 build. But commenting out the line above at the start of my script fixed the problem, at least in my case.
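
For context, a sketch of how to inspect and set the sharing strategy (the 'file_descriptor' default applies to Linux; availability differs by platform):

import torch.multiprocessing as mp

# Lists the strategies this platform supports; on Linux this is typically
# {'file_descriptor', 'file_system'}.
print(mp.get_all_sharing_strategies())

# 'file_system' hands shared tensors to the torch_shm_manager helper binary;
# staying on the Linux default 'file_descriptor' avoids launching it.
mp.set_sharing_strategy('file_descriptor')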

@Amir22010

@nlml That works for me, thanks! I have PyTorch 1.4 with CUDA 10.2.

@qhdqhd

qhdqhd commented Apr 26, 2021

Adding --sp works fine for me.

@Zrrr1997

Zrrr1997 commented Jun 4, 2021

Hitting the same error:

(alphapose) zrrr@zrrr-GL552VW:~/Projects/AlphaPose$ python scripts/demo_inference.py --cfg configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/fast_res50_256x192.pth --indir examples/demo/

Traceback (most recent call last):
  File "scripts/demo_inference.py", line 175, in <module>
    det_loader = DetectionLoader(input_source, get_detector(args), cfg, args, batchSize=args.detbatch, mode=mode, queueSize=args.qsize)
  File "/home/zrrr/Projects/AlphaPose/detector/apis.py", line 12, in get_detector
    from detector.yolo_api import YOLODetector
  File "/home/zrrr/Projects/AlphaPose/detector/yolo_api.py", line 27, in <module>
    from detector.nms import nms_wrapper
  File "/home/zrrr/Projects/AlphaPose/detector/nms/__init__.py", line 1, in <module>
    from .nms_wrapper import nms, soft_nms
  File "/home/zrrr/Projects/AlphaPose/detector/nms/nms_wrapper.py", line 4, in <module>
    from . import nms_cpu, nms_cuda
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory
Python 3.6.13
Cuda Toolkit 9.0
cudnn 7.6.5
torch 1.1.0
torchvision 0.3.0

How can I fix this?

@maochen

maochen commented Jun 4, 2021

Hitting the same error:

ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory
torch 1.1.0

How can I fix this?

Could you try any version of torch >= 1.3.1 and see if the issue is still there?

@qhdqhd

qhdqhd commented Jun 5, 2021

Adding --sp works for me.

@angerhang

I was just able to fix this by commenting out a line I had added to fix an issue on a different system:

Old: torch.multiprocessing.set_sharing_strategy('file_system')

New: # torch.multiprocessing.set_sharing_strategy('file_system')

I think the problem in my case might be caused by my system having CUDA 10.2 while PyTorch is installed as the CUDA 10.1 build. But commenting out the line above at the start of my script fixed the problem, at least in my case.

I had to do the same to get the code working on Linux. Any idea why this happens?

@tianhangpan

Hi, can you try modifying line 26 of 'demo.py' as below? torch.multiprocessing.set_start_method('spawn', force=True)

Thanks, that worked for me on Linux!
