CUDA out of memory when using POCA #5725

Closed
chenzhutian opened this issue Apr 5, 2022 · 3 comments
Labels
bug Issue describes a potential bug in ml-agents.

Comments

@chenzhutian

Describe the bug
I am trying to use POCA in a custom environment with 10 agents. However, training keeps failing with CUDA out of memory.
Any ideas for fixing this?
Thanks!

The full log is attached below:

Traceback (most recent call last):
  File "C:\Users\ztchen\.conda\envs\kdma\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ztchen\.conda\envs\kdma\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\ztchen\.conda\envs\kdma\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\learn.py", line 132, in run_training
    tc.start_learning(env_manager)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\trainer_controller.py", line 176, in start_learning
    n_steps = self.advance(env_manager)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\trainer_controller.py", line 251, in advance
    trainer.advance()
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 315, in advance
    if self._update_policy():
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\poca\trainer.py", line 230, in _update_policy
    update_stats = self.optimizer.update(
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\poca\optimizer_torch.py", line 302, in update
    baselines, _ = self.critic.baseline(
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\poca\optimizer_torch.py", line 92, in baseline
    encoding, memories = self.network_body(
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\torch\networks.py", line 401, in forward
    encoded = self.observation_encoder(inputs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\torch\networks.py", line 132, in forward
    attention_embedding = self.rsa(qkv, masks)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ztchen\.conda\envs\kdma\lib\site-packages\mlagents\trainers\torch\attention.py", line 287, in forward
    output = self.fc_out(output) + inp
RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 12.00 GiB total capacity; 10.13 GiB already allocated; 0 bytes free; 10.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Environment (please complete the following information):

  • Unity Version: Unity 2021.2.5f1
  • OS + version: Windows 10
  • ML-Agents version: 0.28.0
  • Torch version: 1.7.1
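
The RuntimeError above points at allocator fragmentation and suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how that variable could be set before PyTorch initializes CUDA is shown below; the value of 128 MiB is an illustrative assumption, and the option is only honored by newer PyTorch releases (it is ignored by 1.7.1):

    import os

    # PYTORCH_CUDA_ALLOC_CONF is read when PyTorch initializes its CUDA allocator,
    # so it must be set before torch is imported (or before the first CUDA call).
    # The split size of 128 MiB is an assumption for illustration only.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported only after the environment variable is in place

When training is launched through the mlagents-learn command line rather than a Python script, the same variable can instead be exported in the shell before the command runs. Reducing batch_size in the POCA trainer configuration is another common way to lower peak GPU memory.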
chenzhutian added the bug label on Apr 5, 2022
@chenzhutian
Author

Fixed after upgrading PyTorch to the latest version.
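
For reference, a quick way to confirm which build is actually in use after an upgrade (a hypothetical check, not part of ml-agents):

    import torch

    print(torch.__version__)   # installed PyTorch release
    print(torch.version.cuda)  # CUDA toolkit the wheel was built against

    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))
        # Allocator statistics, handy when diagnosing CUDA out-of-memory errors.
        print(torch.cuda.memory_summary(0))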

@GamerLordMat12345

Fixed after upgrading PyTorch to the latest version.

Hello, which version did you upgrade to?

@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked as resolved and limited conversation to collaborators on May 23, 2022