CUDA out of memory when using POCA #5725

Closed
chenzhutian opened this issue Apr 5, 2022 · 3 comments
Labels
bug Issue describes a potential bug in ml-agents.

Comments

@chenzhutian

Describe the bug
I am trying to use POCA in a custom environment with 10 agents. However, training keeps failing with CUDA out of memory.
Any ideas for fixing this?
Thanks!

The full log is attached below:

Traceback (most recent call last):
  File "C:\Users\ztchen\.conda\envs\kdma\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ztchen\.conda\envs\kdma\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\ztchen\.conda\envs\kdma\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\learn.py", line 132, in run_training
    tc.start_learning(env_manager)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\trainer_controller.py", line 176, in start_learning
    n_steps = self.advance(env_manager)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\trainer_controller.py", line 251, in advance
    trainer.advance()
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 315, in advance
    if self._update_policy():
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\poca\trainer.py", line 230, in _update_policy
    update_stats = self.optimizer.update(
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\poca\optimizer_torch.py", line 302, in update
    baselines, _ = self.critic.baseline(
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\poca\optimizer_torch.py", line 92, in baseline
    encoding, memories = self.network_body(
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\torch\networks.py", line 401, in forward
    encoded = self.observation_encoder(inputs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\torch\networks.py", line 132, in forward
    attention_embedding = self.rsa(qkv, masks)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\torch\attention.py", line 287, in forward
    output = self.fc_out(output) + inp
RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 12.00 GiB total capacity; 10.13 GiB already allocated; 0 bytes free; 10.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Environment (please complete the following information):

  • Unity Version: Unity 2021.2.5f1
  • OS + version: Windows 10
  • ML-Agents version: 0.28.0
  • Torch version: 1.7.1
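
The RuntimeError above points at allocator fragmentation and suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how that variable could be set before PyTorch initializes CUDA is shown below; the value of 128 MiB is an illustrative assumption, and the option is only honored by newer PyTorch releases (it is ignored by 1.7.1):

    import os

    # PYTORCH_CUDA_ALLOC_CONF is read when PyTorch initializes its CUDA allocator,
    # so it must be set before torch is imported (or before the first CUDA call).
    # The split size of 128 MiB is an assumption for illustration only.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported only after the environment variable is in place

When training is launched through the mlagents-learn command line rather than a Python script, the same variable can instead be exported in the shell before the command runs. Reducing batch_size in the POCA trainer configuration is another common way to lower peak GPU memory.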
chenzhutian added the bug label on Apr 5, 2022
@chenzhutian
Author

Fixed after upgrading PyTorch to the latest version.
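
For reference, a quick way to confirm which build is actually in use after an upgrade (a hypothetical check, not part of ml-agents):

    import torch

    print(torch.__version__)   # installed PyTorch release
    print(torch.version.cuda)  # CUDA toolkit the wheel was built against

    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))
        # Allocator statistics, handy when diagnosing CUDA out-of-memory errors.
        print(torch.cuda.memory_summary(0))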

@GamerLordMat12345

Fixed after upgrading PyTorch to the latest version.

Hello, which version did you upgrade to?

@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked as resolved and limited conversation to collaborators on May 23, 2022