
KeyError: 'EnvName' #9

Open
Charlie0257 opened this issue Dec 6, 2022 · 15 comments

@Charlie0257

Hello, when I run the command python launch_experiment.py, the following error occurs:

"""
Traceback (most recent call last):
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "launch_experiment.py", line 36, in experiment
env = NormalizedBoxEnv(ENVS[variant['env_name']](**variant['env_params']), obs_absmax=obs_absmax)
KeyError: 'walker-rand-params'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "launch_experiment.py", line 211, in
main()
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/site-packages/click/core.py", line 829, in call
return self.main(*args, **kwargs)
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "launch_experiment.py", line 202, in main
p.starmap(experiment, product([variant], variant['seed_list']))
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/multiprocessing/pool.py", line 274, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
KeyError: 'walker-rand-params'

And when the tested env is 'cheetah_dir', the same error occurs :(

Thanks for any help :)

@LanqingLi1993
Owner

Hi,

Under the project home directory, can you try running "python launch_experiment.py ./configs/walker_rand_params.json" for the Walker env and "python launch_experiment.py ./configs/cheetah-dir.json" for the Cheetah-Dir env? Basically, you need to pass a different config file for each MuJoCo environment.

For new environments not included in "./configs", you would need to set up your own config file.

Let me know if you have further questions. @Charlie0257
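If the KeyError persists, one quick sanity check is to load the config yourself and confirm it names the environment you expect. A minimal sketch (env_name is the key used by launch_experiment.py according to the traceback above; everything else here is an assumption):

import json

# Load the same config you pass on the command line and print the env name
# it requests. If this prints None or an unexpected name, the wrong config
# (or no config) is being picked up.
with open("./configs/walker_rand_params.json") as f:
    variant = json.load(f)

print(variant.get("env_name"))  # expect something like 'walker-rand-params'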

@Charlie0257
Author

When I run the command python launch_experiment.py ./configs/walker_rand_params.json, the same error occurs :(

@LanqingLi1993
Copy link
Owner

LanqingLi1993 commented Dec 6, 2022

Hi,

Thank you for your feedback.

I just checked the code. The current errors were most likely caused by MuJoCo being absent or the wrong version of it being installed.

Each env in FOCAL is registered by the code here: https://github.com/LanqingLi1993/FOCAL-ICLR/blob/master/rlkit/envs/__init__.py (scripts for declaring the envs can be found here), which requires a proper installation of the MuJoCo module. To run the Walker env, you need a specific version of MuJoCo (MuJoCo131 at the time FOCAL was published), for which you can follow the installation instructions here: https://github.com/LanqingLi1993/FOCAL-ICLR/blob/master/rlkit/envs/__init__.py. For other envs like Cheetah-Dir, installing MuJoCo150 or later should work.

Please give it a try and let me know if you have further questions. Good luck! @Charlie0257
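For context, here is a generic sketch of the registration pattern being described (illustrative names, not the repository's exact code). Each env module registers itself under a string name, and launch_experiment.py later looks that name up, so if the env module fails to import (for example because MuJoCo is missing or the wrong version), the name never enters the registry and the lookup raises exactly the KeyError shown above:

ENVS = {}

def register_env(name):
    # Decorator used by each env module to add itself to the registry.
    def _register(cls):
        ENVS[name] = cls
        return cls
    return _register

# Illustrative env module: if importing the real module fails (e.g. MuJoCo
# is not installed), this decorator never runs and the name is never registered.
@register_env('walker-rand-params')
class WalkerRandParamsWrappedEnv:
    pass  # the real class wraps the MuJoCo-backed environment

# Later, launch_experiment.py-style code looks the env up by name:
env_cls = ENVS['walker-rand-params']  # KeyError here if registration never happened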

@Charlie0257
Author

Charlie0257 commented Dec 7, 2022

Thanks for your reply! I have solved the above problem :)

However, when I run the command python launch_experiment.py ./configs/walker_rand_params.json, another error occurs:

"""
Traceback (most recent call last):
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "launch_experiment.py", line 36, in experiment
env = NormalizedBoxEnv(ENVS[variant['env_name']](**variant['env_params']), obs_absmax=obs_absmax)
File "/home/charlie/PycharmProjects/FOCAL-ICLR/rlkit/envs/walker_rand_params_wrapper.py", line 41, in __init__
self.tasks = self.sample_tasks(n_tasks)
File "/home/charlie/PycharmProjects/FOCAL-ICLR/rlkit/envs/walker_rand_params_wrapper.py", line 86, in sample_tasks
task_params = read_log_params("./data/walker_rand_params/goal_idx{}/log.txt".format(i))
File "/home/charlie/PycharmProjects/FOCAL-ICLR/rlkit/envs/walker_rand_params_wrapper.py", line 8, in read_log_params
with open(log_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/walker_rand_params/goal_idx1/log.txt'
"""

I think this error comes from line 86 in rlkit/envs/walker_rand_params_wrapper.py. Following it, I found there is no log.txt in data/walker_rand_params/goal_idx1/; it only exists in ./data/walker_rand_params/goal_idx0/. But I downloaded the dataset from https://drive.google.com/file/d/1zdaUX-LC8c6AaS9We85bUvoA_9iHZcyg/view?usp=sharing.

Could you give me some suggestions?
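A quick way to list which goal_idx folders are missing log.txt after unzipping (a sketch; the paths come from the traceback above, and the task count of 50 is only an assumption, adjust it to the dataset):

import os

# Prints every task folder whose log.txt is missing.
for i in range(50):
    path = "./data/walker_rand_params/goal_idx{}/log.txt".format(i)
    if not os.path.exists(path):
        print("missing:", path)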

@LanqingLi1993
Owner

Hi,

I just downloaded the dataset and checked; I do see a log.txt in all the goal_idx folders. Did you unzip it correctly? Can you double-check? @Charlie0257

@Charlie0257
Author

Thanks for your reply! I have solved the above problem :)

However, another error occurs :(

"""
Traceback (most recent call last):
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "launch_experiment.py", line 172, in experiment
algorithm.train()
File "/home/charlie/PycharmProjects/FOCAL-ICLR/rlkit/core/rl_algorithm.py", line 441, in train
z_means, z_vars = self._do_training(indices, zloss=True)
File "/home/charlie/PycharmProjects/FOCAL-ICLR/rlkit/torch/sac/sac.py", line 182, in _do_training
z_means, z_vars = self._take_step(indices, context, zloss=zloss)
File "/home/charlie/PycharmProjects/FOCAL-ICLR/rlkit/torch/sac/sac.py", line 371, in _take_step
policy_loss.backward(retain_graph=True)
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/charlie/anaconda3/envs/CORRO/lib/python3.6/site-packages/torch/autograd/init.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1]], which is output 0 of TBackward, is at version 4; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

My GPU is an RTX 3070, and the PyTorch version is 1.7.1 with CUDA 11.0. I don't know if this error is caused by the torch version :(

Could you give me some suggestions?
Thanks for any help :)
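Following the hint at the end of the error message, I can turn on autograd anomaly detection before training to find the forward operation whose output gets modified in place. A one-line sketch (where exactly to put it is just a guess on my part):

import torch

# Makes the failing backward report the forward-pass op that produced the
# tensor later modified in place (slows training; use only while debugging).
torch.autograd.set_detect_anomaly(True)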

@LanqingLi1993
Owner

LanqingLi1993 commented Dec 15, 2022

Hi, did you solve the above problem? Can you try deleting all retain_graph=True arguments in the _take_step() function? @Charlie0257

@YTao-123

Hi, first of all, thank you very much for your work. At present, I have encountered the same problem. I tried removing all retain_graph=True arguments, but it seems to have no effect and the problem still occurs. @LanqingLi1993

@Lagrant

Lagrant commented Dec 15, 2022

Hi Charlie,

This problem comes from the task z vectors not being detached during SAC training. What you need to do is add detach() in the following places:
Add task_z.detach() in rlkit/torch/sac/sac.py at line 278 and line 281 respectively:

div_estimate = self._divergence.dual_estimate(
    obs, new_actions, actions, task_z.detach())

c_loss = self._divergence.dual_critic_loss(obs, new_actions, actions, task_z.detach())

Add task_z.detach() in rlkit/torch/sac/agent.py at line 204:

in_ = torch.cat([obs, task_z.detach()], dim=1)

Stop backpropagation in rlkit/torch/brac/divergences.py at line 67:

with torch.no_grad():
    logits_p = self.c(s.size()[0], s.size()[1], s, a_p, task_z)
    logits_b = self.c(s.size()[0], s.size()[1], s, a_b, task_z)

Then it should work. @Charlie0257 @YTao-123
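To make the error message above easier to interpret, here is a small standalone toy (not FOCAL code; every name below is made up) that reproduces the same error class: two losses share one forward pass through a critic, the critic's optimizer steps in between, and the in-place weight update invalidates saved tensors the second backward still needs. In this toy the simplest cure is to run both backwards before the step; the detach()/no_grad() edits above play the analogous role in FOCAL by making the policy-side losses independent of the critic weights that get updated in place.

import torch
import torch.nn as nn

torch.manual_seed(0)
actor_feat = nn.Linear(4, 2)   # stand-in for the networks producing task_z / features
critic = nn.Linear(2, 1)       # stand-in for a Q-network / dual critic
opt_critic = torch.optim.SGD(critic.parameters(), lr=0.1)
opt_actor = torch.optim.SGD(actor_feat.parameters(), lr=0.1)

x = torch.randn(8, 4)
z = actor_feat(x)                      # shared "task_z"-like tensor
critic_loss = critic(z).pow(2).mean()
policy_loss = -critic(z).mean()        # shares the critic's forward graph

opt_critic.zero_grad()
critic_loss.backward(retain_graph=True)
opt_critic.step()                      # in-place weight update bumps tensor versions

try:
    policy_loss.backward()             # still needs the critic's pre-step weights
except RuntimeError as e:
    print("reproduced:", e)            # "... modified by an inplace operation ..."

# One generic resolution in this toy: run the second backward before the
# in-place optimizer step, so no saved tensor is stale when it is used.
z = actor_feat(x)
critic_loss = critic(z).pow(2).mean()
policy_loss = -critic(z).mean()
opt_critic.zero_grad()
opt_actor.zero_grad()
critic_loss.backward(retain_graph=True)
policy_loss.backward()                 # before opt_critic.step(): no error
opt_critic.step()
opt_actor.step()
print("reordered version ran without error")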

@Lagrant

Lagrant commented Dec 26, 2022

Hi Charlie and Tao, do these fixes work for you, and can you reproduce the FOCAL results successfully? @Charlie0257 @YTao-123

@Charlie0257
Author

Thanks for your help. The project is now working properly :) @Lagrant @LanqingLi1993

@Lagrant

Lagrant commented Dec 28, 2022

That's great. By the way, can you get the same results as the paper reported? @Charlie0257

@ZRZ-Unknow

When I run the command python launch_experiment.py ./configs/walker_rand_params.json, the same error occurs :(

Hi, I had the same problem. Can you tell me how you solved it? Thanks for any reply.

@LanqingLi1993
Owner

Hi, please check my answers to @Charlie0257 above. If the problem remains, you might want to ask @Charlie0257 since he/she claimed to have resolved the issue. Let me know if this works @ZRZ-Unknow.

@ZRZ-Unknow

Yes, I have solved the problem. Thanks for your reply!
