NoRML: No-Reward Meta Learning

This repository contains code released for the paper NoRML: No-Reward Meta Learning.

First, install all dependencies by running:

pip install -r norml/requirements.txt

The HalfCheetah environment requires MuJoCo, so make sure you have also followed the proper instructions to install MuJoCo and mujoco-py.
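
As an optional sanity check (not part of the original instructions), you can confirm that mujoco-py imports cleanly before starting HalfCheetah training:

# Optional sanity check: confirm that mujoco-py is importable.
# If this raises an error, revisit the MuJoCo / mujoco-py installation steps.
import mujoco_py  # noqa: F401
print("mujoco-py imported successfully")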

Example checkpoints are stored in Google Cloud Storage and can be downloaded by running:

gsutil cp -r gs://gresearch/norml/example_checkpoints/ .

You can start training from scratch by running:

python -m norml.train_maml --config MOVE_POINT_ROTATE_MAML --logs maml_checkpoints

Here, --config should be one of the configs defined in config_maml.py. The config name has the form {ENV_NAME}_{ALG_NAME}, where ENV_NAME is one of MOVE_POINT_ROTATE, MOVE_POINT_ROTATE_SPARSE, CARTPOLE_SENSOR, or HALFCHEETAH_MOTOR, and ALG_NAME is one of DR, MAML, MAML_OFFSET, MAML_LAF, or NORML, as described in the paper.
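
For reference, here is a minimal sketch of how a {ENV_NAME}_{ALG_NAME} string could be resolved against config_maml.py. The attribute-style lookup is an assumption about how the configs are organized, not necessarily the exact code path used by train_maml:

# Hedged sketch: resolve a {ENV_NAME}_{ALG_NAME} name to a config object.
# Assumes configs are exposed as module-level attributes of norml.config_maml,
# which may differ from how train_maml actually looks them up.
from norml import config_maml

config_name = "MOVE_POINT_ROTATE_NORML"  # {ENV_NAME}_{ALG_NAME}
config = getattr(config_maml, config_name)
print(type(config), config)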

The MOVE_POINT_ROTATE tasks are fast to train and can converge within minutes. Training MOVE_POINT_ROTATE_SPARSE and CARTPOLE_SENSOR can take up to a day. HalfCheetah training was done with parallelized workers on a cloud server and can take a long time on a single machine.

We also provide a convenience script to evaluate training performance:

python -m norml.eval_maml \
--model_dir norml/example_checkpoints/move_point_rotate_sparse/norml/all_weights.ckpt-991 \
--output_dir maml_eval_results \
--render=True \
--num_finetune_steps 1 \
--test_task_index 0 \
--eval_finetune=True

You should see state/action logs and, optionally, a rendered video in the maml_eval_results folder.
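
The exact file names and formats written by eval_maml are not documented here, so the following is just a generic sketch for listing what ended up in the output directory:

# Hedged sketch: list whatever eval_maml wrote to the output directory.
import os

output_dir = "maml_eval_results"  # matches --output_dir above
for name in sorted(os.listdir(output_dir)):
    path = os.path.join(output_dir, name)
    print(f"{name}\t{os.path.getsize(path)} bytes")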

Citing

If you use this code in your research, please cite the following paper:

Yang, Y., Caluwaerts, K., Iscen, A., Tan, J., & Finn, C. (2019). NoRML: No-Reward Meta Learning. arXiv preprint arXiv:1903.01063.

@article{yang2019norml,
  title={NoRML: No-Reward Meta Learning},
  author={Yang, Yuxiang and Caluwaerts, Ken and Iscen, Atil and Tan, Jie and Finn, Chelsea},
  journal={arXiv preprint arXiv:1903.01063},
  year={2019}
}

Disclaimer: This is not an official Google product.
