[Bug] Execution Freeze while using CudaDevice.count() in global scope #69

Closed
3 tasks done
aradhyamathur opened this issue Apr 10, 2023 · 5 comments · Fixed by #70
Labels
api Something related to the core APIs bug Something isn't working enhancement New feature or request

Comments

@aradhyamathur

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

Questions

When I call print(CudaDevice.count()), I get the following error, and execution then hangs until I interrupt it manually. Could you please advise?
nvitop version 1.1.1
NVIDIA-SMI 470.182.03 Driver Version: 470.182.03 CUDA Version: 11.4


  File "schedule_clients.py", line 27, in <module>
    print(CudaDevice.count())
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 2132, in count
    return len(super().parse_cuda_visible_devices())
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 488, in parse_cuda_visible_devices
    return parse_cuda_visible_devices(cuda_visible_devices)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 2357, in parse_cuda_visible_devices
    return _parse_cuda_visible_devices(cuda_visible_devices, format='index')
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 2491, in _parse_cuda_visible_devices
    raw_uuids = _parse_cuda_visible_devices_to_uuids(cuda_visible_devices, verbose=False)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 2616, in _parse_cuda_visible_devices_to_uuids
    parser.start()
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
^CTraceback (most recent call last):
  File "/home/aradhya/stable-dreamfusionBkp/schedule_clients.py", line 27, in <module>
    print(CudaDevice.count())
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 2132, in count
    return len(super().parse_cuda_visible_devices())
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 488, in parse_cuda_visible_devices
    return parse_cuda_visible_devices(cuda_visible_devices)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 2357, in parse_cuda_visible_devices
    return _parse_cuda_visible_devices(cuda_visible_devices, format='index')
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 2491, in _parse_cuda_visible_devices
    raw_uuids = _parse_cuda_visible_devices_to_uuids(cuda_visible_devices, verbose=False)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/site-packages/nvitop/api/device.py", line 2623, in _parse_cuda_visible_devices_to_uuids
    result = queue.get()
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/queues.py", line 365, in get
    res = self._reader.recv_bytes()
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/aradhya/anaconda3/envs/dreamfuse/lib/python3.9/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
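The second traceback shows why the script hangs instead of exiting. The parent process is still blocked in queue.get() (device.py, line 2623). The first traceback was printed by the spawned child while it re-imported the script, and it shows that the child died before it could send a result. The sketch below is a hypothetical helper, not nvitop's code. It waits on a queue with a timeout and gives up once a caller-supplied liveness check (for example Process.is_alive) reports that the producer has exited. It assumes a queue whose get() accepts a timeout, such as queue.Queue or multiprocessing.Queue; multiprocessing.SimpleQueue.get() does not accept one.

```python
import queue
from typing import Any, Callable


def wait_for_result(result_queue: Any,
                    producer_alive: Callable[[], bool],
                    poll_interval: float = 0.1) -> Any:
    """Hypothetical helper: return one item from result_queue, or raise if the producer died."""
    while True:
        try:
            # queue.Queue and multiprocessing.Queue both raise queue.Empty on timeout.
            return result_queue.get(timeout=poll_interval)
        except queue.Empty:
            if producer_alive():
                continue  # still working: keep polling instead of blocking forever
            try:
                # One last attempt, in case the result arrived just before the producer exited.
                return result_queue.get(timeout=poll_interval)
            except queue.Empty:
                raise RuntimeError('producer exited without sending a result') from None


# Usage with an in-process queue; with multiprocessing one would pass proc.is_alive.
results = queue.Queue()
results.put(2)
print(wait_for_result(results, producer_alive=lambda: True))  # prints 2
```

Polling with a short timeout costs a little latency but guarantees the caller sees an error instead of hanging at queue.get() indefinitely.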
@aradhyamathur aradhyamathur added the question Further information is requested label Apr 10, 2023
@aradhyamathur
Author

It seems this is related to __name__: I was running the code directly without setting up the __name__ check, which led to the error above in the script, whereas in the IPython shell it is already set.
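The behaviour described here can be reproduced without nvitop or a GPU. The sketch below is an illustration, not code taken from nvitop. It writes two small scripts to a temporary directory and runs each with the current interpreter. Both start a child with the 'spawn' start method, which re-imports the main script in the child. The unguarded script starts the child at module level, so the child's re-import tries to start another process during bootstrapping and fails with the RuntimeError shown above. The guarded script keeps the start under if __name__ == '__main__':. Each script prints the exit code of its child.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Starts a 'spawn' child at module level, like print(CudaDevice.count()) in the issue.
UNGUARDED = """
import multiprocessing as mp

def work():
    pass

ctx = mp.get_context('spawn')
proc = ctx.Process(target=work)
proc.start()
proc.join()
print(proc.exitcode)
"""

# Same program, with the process start moved under the __main__ guard.
GUARDED = """
import multiprocessing as mp

def work():
    pass

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    proc = ctx.Process(target=work)
    proc.start()
    proc.join()
    print(proc.exitcode)
"""


def run_script(source: str) -> str:
    """Write source to a temporary file, run it with this interpreter, return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / 'script.py'
        path.write_text(source)
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, timeout=60,
        )
    return result.stdout.strip()


print('unguarded child exit code:', run_script(UNGUARDED))
print('guarded child exit code:', run_script(GUARDED))
```

The child of the unguarded script should exit with code 1 (its RuntimeError traceback goes to the captured stderr), and the child of the guarded script with code 0.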

@XuehaiPan XuehaiPan changed the title [Question] Execution Freeze while using CudaDevice.count() [Bug] Execution Freeze while using CudaDevice.count() in global scope Apr 10, 2023
@XuehaiPan
Owner

XuehaiPan commented Apr 10, 2023

Hi @aradhyamathur, thanks for raising this. Having CudaDevice.count() in global scope is a common use case, so I think this should be considered a bug. Executing the CUDA_VISIBLE_DEVICES parser in the global scope always reports this annoying message:

# test.py

from nvitop import CudaDevice

print(CudaDevice.count())
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I opened PR #70 to resolve this. Could you try the patch with the following command?

pip3 install git+https://github.com/XuehaiPan/nvitop@spawn-subprocess
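The message appears because, as the traceback shows, the parser runs in a multiprocessing child, and with the 'spawn' start method that child re-imports the main script. Launching a fresh interpreter with subprocess does not re-import the caller's script, so it is safe at module level. The sketch below only illustrates that property. It is an assumption about one possible approach, not a description of what PR #70 changes. count_visible_entries and its child program are hypothetical: the child merely counts non-empty comma-separated entries, whereas the real parser resolves them through the CUDA driver.

```python
import os
import subprocess
import sys

# Hypothetical child program (a toy stand-in for a real parser): it counts the
# non-empty comma-separated entries of CUDA_VISIBLE_DEVICES and prints the count.
CHILD_CODE = (
    "import os\n"
    "value = os.environ.get('CUDA_VISIBLE_DEVICES', '')\n"
    "print(len([item for item in value.split(',') if item.strip()]))\n"
)


def count_visible_entries(cuda_visible_devices: str) -> int:
    """Run CHILD_CODE in a fresh interpreter; this never re-imports the caller's script."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=cuda_visible_devices)
    result = subprocess.run(
        [sys.executable, '-c', CHILD_CODE],
        env=env, capture_output=True, text=True, check=True, timeout=30,
    )
    return int(result.stdout)


# Safe to call at module level, with no __main__ guard.
print(count_visible_entries('0,1'))  # prints 2
```

The trade-off is the cost of starting a new interpreter per call, in exchange for not depending on how the caller's main module is written.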

@XuehaiPan XuehaiPan reopened this Apr 10, 2023
@XuehaiPan XuehaiPan added bug Something isn't working enhancement New feature or request api Something related to the core APIs and removed question Further information is requested labels Apr 10, 2023
@XuehaiPan
Owner

This fix is included in release 1.1.2.

pip3 install --upgrade nvitop
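To confirm that an environment already has the fix, the installed version can be compared with 1.1.2. The sketch below uses importlib.metadata.version from the standard library and compares only the leading numeric X.Y.Z components. The helper names are illustrative. It ignores pre-release and local suffixes, so packaging.version.Version is the more robust choice where the packaging library is available.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 1, 2)  # first release with the fix, per the maintainer's comment


def release_tuple(text: str) -> tuple:
    """Leading numeric components of a version string, e.g. '1.1.2' -> (1, 1, 2)."""
    parts = []
    for piece in text.split('.'):
        if not piece.isdigit():
            break  # stop at the first non-numeric component (e.g. '2rc1', 'dev3')
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed: str) -> bool:
    # Tuple comparison is numeric per component, so (1, 10, 0) > (1, 1, 2).
    return release_tuple(installed) >= FIXED_IN


try:
    installed = version('nvitop')
    print('nvitop', installed, 'includes the fix:', has_fix(installed))
except PackageNotFoundError:
    print('nvitop is not installed')
```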

@aradhyamathur
Author

Thanks, I'll check.

@aradhyamathur
Author

Yeah @XuehaiPan, it works, thanks.
