
Looking for information on expected speed / RAM usage #6

Closed
joshua-a-harris opened this issue Apr 7, 2022 · 5 comments

@joshua-a-harris

Hi,

Apologies if I have missed something obvious.

When I run the rgb_stacking environment for 300 steps with STATE_ONLY observations (and a fixed action), it usually takes well over 20x longer than running 300 steps (with a fixed action) in the Meta-World benchmark environment, which also uses a Sawyer arm in the MuJoCo simulator to do pick-and-place (and other) tasks.

rgb_stacking is obviously a more complicated environment/simulation, but this seemed like a lot, so I wanted to check whether this is roughly the slowdown you would expect compared to other MuJoCo Sawyer arm simulations (given the greater complexity), or whether it looks like an issue with my implementation.

On my machine the rgb_stacking environment also uses much more RAM, around 0.75 GB per instance - is this also what you would expect?

A couple of other questions:

  1. Does c. 0.2-0.3 seconds per rgb_stacking env simulation step sound about right, or is that very slow?
  2. Are there any settings that might be worth trying to speed up the simulation?

Any help would be really appreciated.

Example execution times for 300 steps (code below):

rgb_stacking:
real 1m13.886s
user 1m42.632s

meta-world:
real 0m1.571s
user 0m4.835s

rgb_stacking test code:


from absl import app
import numpy as np
from rgb_stacking import environment


def test_run(argv):
    del argv  # Unused.
    env = environment.rgb_stacking(
        object_triplet='rgb_test_random',
        observation_set=environment.ObservationSet.STATE_ONLY)
    step, reward, discount, obs = env.reset()
    for _ in range(300):
        # Apply the same fixed action on every step.
        step, reward, discount, obs = env.step(
            np.array([-0.01, 0.01, 0.03, 0.5, 100]))
    env.close()


if __name__ == '__main__':
    app.run(test_run)

meta-world test code:

import metaworld
import numpy as np


if __name__ == '__main__':
	task_name = 'push-v2'
	meta_world = metaworld.ML1(task_name, seed=0)
	env = meta_world.train_classes[task_name]() 
	task = meta_world.train_tasks[0] 
	env.set_task(task)
	obs = env.reset()
	for x in range(300):
		obs, env_reward, done, info = env.step(np.array([0.1, -0.1, 0.2, 0.01]))
	env.close()
@joshua-a-harris
Author

joshua-a-harris commented Apr 8, 2022

Having dug into this some more, it looks like for me over 50% of the execution time is spent running the enabled.observation_callable() function in this line on the 3 camera sensors/observables, which, as I understand it, generates the RGB inputs. If I've understood this correctly, a few questions:

  1. When using STATE_ONLY observations, is there any reason I wouldn't remove the 3 cameras from the environment to get a big improvement in execution time? And to do this, do I just replace these lines in task.py with an empty dictionary and comment out these lines in environment.py? Or does this create issues?

  2. When using ALL observations - from the paper it doesn't look like the 'basket_back_left/pixels' observation is ever used. Would it be worth always omitting this camera, given the significant speed-up from the approach above?

  3. Is this share of the total execution time spent generating the RGB inputs what I should expect or is my version very slow?

Making the edits in (1) above to remove the 3 cameras, I get the following execution times for rgb_stacking using the same code as in the original comment:
real 0m19.544s
user 0m20.410s
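
For reference, a minimal profiling sketch (not from the repo) showing one way to confirm where the per-step time goes, using only the standard-library profiler and the same constructor arguments as in the test code above:

import cProfile
import pstats

import numpy as np
from rgb_stacking import environment

env = environment.rgb_stacking(
    object_triplet='rgb_test_random',
    observation_set=environment.ObservationSet.STATE_ONLY)
env.reset()
action = np.array([-0.01, 0.01, 0.03, 0.5, 100])

profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):  # Fewer steps than the timing run; enough to find hot spots.
    env.step(action)
profiler.disable()

# Sort by cumulative time; if rendering dominates, the camera observables'
# observation callables should appear near the top of the listing.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)
env.close()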

@alaurens
Collaborator

alaurens commented Apr 8, 2022

Hello! Thank you very much for the detailed message; here are some answers I hope will be useful:

  • It is expected that the environment is slow, because we are rendering the camera images. This will be particularly slow if you are not using the GPU for rendering at all, so 0.2-0.3 seconds per step is expected with CPU-only rendering. To look into how to use the GPU to render MuJoCo scenes, I would suggest looking here regarding rendering with MuJoCo (see also the sketch after this list).
  • Removing the lines you mentioned, so that all camera observations and rendering are removed, will work fine and will not cause any issues for the overall environment.
  • What exactly do you mean by "it doesn't look like the 'basket_back_left/pixels' is ever used"? If you are indeed not using that specific observation for an agent or anything else, then yes, removing it makes sense.
  • We cannot run as fast as Meta-World because we need to keep the simulation timestep very low due to instability in the MuJoCo contacts at higher physics timesteps.
  • I am not sure about the RAM usage, but I suspect most of it is due to the images. Have you checked whether anything changes when you disable the cameras?
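
On the GPU point, a minimal sketch, assuming the usual dm_control convention applies to this stack: the MuJoCo rendering backend is chosen from the MUJOCO_GL environment variable, which has to be set before the rendering code is imported ('egl' gives headless GPU rendering and needs an EGL-capable driver):

import os

# Assumption: this stack follows dm_control's convention of reading MUJOCO_GL
# at import time. 'egl' -> headless GPU rendering, 'osmesa' -> software
# rendering, 'glfw' -> windowed rendering.
os.environ['MUJOCO_GL'] = 'egl'

# Import only after MUJOCO_GL is set, otherwise the default backend is picked.
from rgb_stacking import environment

# An observation set that includes the camera observables is needed for the
# GPU to matter; the ALL name is taken from the discussion above.
env = environment.rgb_stacking(
    object_triplet='rgb_test_random',
    observation_set=environment.ObservationSet.ALL)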

@joshua-a-harris
Author

joshua-a-harris commented Apr 8, 2022

Hi! Thanks so much for your response, that's really helpful. I have added some replies to your bullets in order below:

  • OK, perfect. I'll look into shifting the rendering onto a GPU when I get onto vision tasks.
  • Thanks that's really useful to know
  • Ah, sorry, I just meant that the original paper doesn't seem to use the back camera as agent input, and I wanted to check there wasn't some other use for it in the paper's setup that I've missed.
  • OK, that makes sense. I also think it might relate to the other variable that seems to make a big difference to execution time: _PHYSICS_TIMESTEP in environment.py. It looks like one could in theory increase this to around 0.002 (from 0.0005 currently) before errors start appearing. Is this highly inadvisable even if I'm only going to use the simulated version (as far as I can tell the demo policy still works)? Or was the small _PHYSICS_TIMESTEP mainly chosen because the end goal of the paper was sim-to-real transfer, which needed a very accurate simulation?
  • RAM usage is maybe 10-20% lower without the cameras but still quite large. I think possibly it is just a larger environment though.

Thanks again, really appreciate all your help!

@alaurens
Collaborator

No worries, glad to help! Here are further answers:

Ah, sorry, I just meant that the original paper doesn't seem to use the back camera as agent input, and I wanted to check there wasn't some other use for it in the paper's setup that I've missed.

The back camera was indeed not used as input for the agent; its main use was with the blob detector, to improve the localisation of the objects.

OK, that makes sense. I also think it might relate to the other variable that seems to make a big difference to execution time: _PHYSICS_TIMESTEP in environment.py. It looks like one could in theory increase this to around 0.002 (from 0.0005 currently) before errors start appearing. Is this highly inadvisable even if I'm only going to use the simulated version (as far as I can tell the demo policy still works)? Or was the small _PHYSICS_TIMESTEP mainly chosen because the end goal of the paper was sim-to-real transfer, which needed a very accurate simulation?

You can indeed increase the speed of the simulation by increasing the timestep. Please note, however, that this will make the contacts between the objects and the plane fairly unstable, and they will start shaking and moving around significantly (they might even slide out of the gripper). But apart from that, there is nothing inherently wrong with increasing the physics timestep.
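
For anyone who wants to experiment with this, a rough sketch of one way to try a larger timestep without editing the source, assuming _PHYSICS_TIMESTEP is a module-level constant in rgb_stacking/environment.py that is read when the environment is built (if it is instead consumed at import time, editing the file directly is the way to go):

import numpy as np
from rgb_stacking import environment

# Assumption: the constant is read at construction time, so overriding the
# module attribute before building the environment takes effect. 0.002 s is
# the rough upper bound discussed above; expect noticeably less stable
# contacts (and possibly objects slipping out of the gripper) at this value.
environment._PHYSICS_TIMESTEP = 0.002

env = environment.rgb_stacking(
    object_triplet='rgb_test_random',
    observation_set=environment.ObservationSet.STATE_ONLY)
env.reset()
for _ in range(300):
    env.step(np.array([-0.01, 0.01, 0.03, 0.5, 100]))
env.close()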

RAM usage is maybe 10-20% lower without the cameras but still quite large. I think possibly it is just a larger environment though.

The environment is indeed large, so this doesn't surprise me too much.

@joshua-a-harris
Author

Great, thanks, that's really helpful!

I think that is all my questions for now, so I'm happy to close this issue.

Just finally on RAM (in case it is useful for anyone else who reads this): I dug into it some more, and it looks like the main thing driving the higher RAM usage is the setting of the MuJoCo parameters nconmax and njmax to 5000 in these lines. If I revert to the MuJoCo default (which I think is 1000), it cuts the env size from around 0.7 GB to more like 0.15 GB (because an n * n array is allocated for each of these parameters). I imagine this also potentially leads to less accurate contact dynamics, though.
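
For anyone who wants to reproduce the measurement, a rough sketch using psutil (assumed to be installed; the standard-library resource module would work too) to compare the process's resident memory before and after building one environment:

import os

import psutil  # Third-party; assumed available for this measurement.
from rgb_stacking import environment


def rss_gb() -> float:
    """Resident set size of the current process, in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9


before = rss_gb()
env = environment.rgb_stacking(
    object_triplet='rgb_test_random',
    observation_set=environment.ObservationSet.STATE_ONLY)
env.reset()
print(f'Approximate RAM attributable to one environment: {rss_gb() - before:.2f} GB')
env.close()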
