Fail to run tutorial_Isaac_Gym.py #169

Closed
planetbalileua opened this issue Jun 17, 2022 · 5 comments
Labels
bug Something isn't working

@planetbalileua

Hello! Thank you for creating this brilliant library! It is very helpful for a personal project I am working on.
I hit an error when trying to run tutorial_Isaac_Gym.py in the examples folder:

Traceback (most recent call last):
  File "/home/meow/anaconda3/envs/igym/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/meow/anaconda3/envs/igym/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/meow/ElegantRL/elegantrl/train/run.py", line 162, in run
    env = build_env(args.env, args.env_func, args.env_args)
  File "/home/meow/ElegantRL/elegantrl/train/config.py", line 249, in build_env
    env = env_func(**kwargs_filter(env_func.__init__, env_args.copy()))
  File "/home/meow/ElegantRL/elegantrl/envs/IsaacGym.py", line 45, in __init__
    env: VecTask = isaac_task(
  File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 69, in __init__
    super().__init__(
  File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/base/vec_task.py", line 213, in __init__
    self.create_sim()
  File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 156, in create_sim
    self._create_envs(
  File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 199, in _create_envs
    self.joint_gears = to_torch(motor_efforts, device=self.device)
  File "/home/meow/Downloads/IsaacGym_Preview_3_Package/isaacgym/python/isaacgym/torch_utils.py", line 16, in to_torch
    return torch.tensor(x, dtype=dtype, device=device, requires_grad=requires_grad)
  File "/home/meow/anaconda3/envs/igym/lib/python3.8/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA error: out of memory

I'm running this on an NVIDIA RTX 3070 Ti with 8 GB of VRAM, and my CUDA version is:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

The same Ant example (with 2048 envs) worked when I tested it with the original Isaac Gym train.py. I'm pretty sure I had free VRAM (~7.2 GB) when running this, but the CUDA out-of-memory error still appeared. My torch version is 1.11.0.

I have also tried to reduce the number of envs, batch size, network size and other parameters, but the error remains.

Once again, thank you so much for any help with this issue.
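
The report above states that roughly 7.2 GB of VRAM was free when the error appeared. One way to record that figure right before launching the script is to query nvidia-smi. The sketch below is a minimal helper under stated assumptions: parse_gpu_memory and query_gpu_memory are hypothetical names (not part of ElegantRL or Isaac Gym), and nvidia-smi is assumed to be on the PATH.

```python
import shutil
import subprocess

# nvidia-smi query: one CSV row per GPU, values in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output into a list of dicts (values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        total, used, free = (int(field.strip()) for field in line.split(","))
        rows.append({"total_mib": total, "used_mib": used, "free_mib": free})
    return rows


def query_gpu_memory():
    """Return per-GPU memory, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    try:
        return parse_gpu_memory(result.stdout)
    except ValueError:
        # Some devices report non-numeric fields; treat that as "unknown".
        return []


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpu_memory()):
        print(f"GPU {index}: {gpu['free_mib']} MiB free of {gpu['total_mib']} MiB")
```

A reading taken before launch only shows what is free at that moment. It does not account for memory that other processes, including other worker processes of the same training run, allocate afterwards.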

@YangletLiu added the bug label (Something isn't working) on Jun 19, 2022
@supersglzc
Collaborator

Hi planetbalileua and thanks for reaching out!

We realize that some of the code is inconsistent because of fast iteration, and we are refactoring it.
For Isaac Gym users, I have published a single-process version with a demo on Ant and Humanoid. Could you please try that and see if the error remains?

@planetbalileua
Author

planetbalileua commented Jun 21, 2022

Hi supersglzc!
The single-process version works fine after two small modifications.
The changes I made:
Add

    args.if_use_per
and comment out line 60 in elegantrl/train/evaluator.py (which uses wandb).
Thank you so much for your help!
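
The two workarounds described above could also be expressed without editing library files in place. The following is only an illustrative sketch: SimpleNamespace stands in for the ElegantRL arguments object, env_name is a placeholder field, the default value False for if_use_per is an assumption (the thread does not say which value was set), and ensure_default and log_metrics are hypothetical helpers rather than ElegantRL functions.

```python
from types import SimpleNamespace

try:
    import wandb  # optional dependency, only needed for experiment logging
except ImportError:
    wandb = None


def ensure_default(args, name, default):
    """Set args.<name> to `default` only when the attribute is missing."""
    if not hasattr(args, name):
        setattr(args, name, default)
    return args


def log_metrics(metrics):
    """Send metrics to wandb when a run is active; otherwise do nothing.

    Returns True if the metrics were forwarded to wandb, False if skipped.
    """
    if wandb is not None and getattr(wandb, "run", None) is not None:
        wandb.log(metrics)
        return True
    return False


# Stand-in for the real arguments object (assumption for illustration).
args = SimpleNamespace(env_name="Ant")
ensure_default(args, "if_use_per", False)  # value chosen here is an assumption
print(args.if_use_per)
print(log_metrics({"episode_return": 0.0}))
```

Guarding the logging call this way keeps the evaluator usable on machines where wandb is not installed or no run has been started, which is the situation that commenting out the line works around.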

@YangletLiu
Contributor

Could you test the updated file at https://github.com/AI4Finance-Foundation/ElegantRL/blob/master/examples/tutorial_Isaac_Gym.py?

@planetbalileua
Author

Hi!
I have tested the updated file, and with the latest release it fails to find train_and_evaluate_mp in run.py.
Some other errors I ran into:

ImportError: cannot import name 'ReplayBufferList' from 'elegantrl.train.replay_buffer' (/home/meow/ElegantRL/elegantrl/train/replay_buffer.py)

So I added ReplayBufferList to replay_buffer.py.

  File "/home/meow/ElegantRL/elegantrl/agents/AgentPPO.py", line 657, in AgentPPOHterm
    def __init__(self, net_dim: int, state_dim: int, action_dim: int, gpu_id: int = 0, args: Arguments = None):
NameError: name 'Arguments' is not defined

I fixed this by adding from elegantrl.train.config import Arguments to AgentPPO.py.

Thank you again for updating!
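
The NameError reported above comes from the annotation args: Arguments. On Python 3.8, the version shown in the traceback, parameter annotations are evaluated when the def statement runs, so the name must already be bound at that point. The self-contained sketch below reproduces this with a placeholder class (not the real AgentPPOHterm) and also shows that postponed evaluation via from __future__ import annotations avoids the lookup. The fix reported in the thread was the explicit import. Newer interpreters (Python 3.14 and later) defer annotation evaluation by default, so the first result depends on the interpreter version.

```python
import sys

# Placeholder class whose annotation refers to a name that is never defined.
EAGER = '''
class Agent:
    def __init__(self, args: Arguments = None):
        self.args = args
'''

# Same source, but with postponed evaluation of annotations enabled.
POSTPONED = "from __future__ import annotations\n" + EAGER


def defines_cleanly(source):
    """Return True if the source executes without raising NameError."""
    try:
        exec(compile(source, "<sketch>", "exec"), {})
        return True
    except NameError:
        return False


eager_ok = defines_cleanly(EAGER)
postponed_ok = defines_cleanly(POSTPONED)
print(sys.version_info[:2], eager_ok, postponed_ok)
```

Importing the missing name, as done in the thread, is the more direct fix because it also keeps the annotation resolvable for tools that inspect type hints at runtime.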

@supersglzc
Collaborator

Fixed the errors. The issue is closed.

3 participants