Minimal implementation of Proximal Policy Optimization (PPO) in PyTorch
- Supports both discrete and continuous action spaces
- In continuous action spaces, actions are sampled with a constant (non-learned) std
- Utilities for plotting learning curves in TensorBoard
- 2023-09-09
- Added "Generative Adversarial Imitation Learning (GAIL)"
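The constant-std sampling mentioned above can be sketched roughly as below. This is an illustrative example, not the repo's actual code: the class and function names (`GaussianPolicy`, `ppo_clip_loss`), network sizes, and the default std value are assumptions.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianPolicy(nn.Module):
    """Gaussian policy head with a constant (non-learned) std (illustrative sketch)."""

    def __init__(self, obs_dim, act_dim, std=0.5):
        super().__init__()
        self.mu_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )
        # Constant std: stored as a buffer, so it is not updated by the optimizer.
        self.register_buffer("std", torch.full((act_dim,), std))

    def forward(self, obs):
        # Mean comes from the network; std is the fixed constant.
        return Normal(self.mu_net(obs), self.std)

def ppo_clip_loss(dist, actions, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (returned as a loss to minimize)."""
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

Because the std is constant, exploration does not shrink over training; this trades some final precision for simplicity and stable sampling.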
Find or create a config file, then run the following command to train:

    python main.py --config=configs/Ant-v4.yaml \
        --exp_name=test \
        --train

To build an expert dataset from a trained policy:

    python make_expert_dataset.py --experiment_path=checkpoints/Ant/test \
        --load_postfix=last \
        --minimum_score=5000 \
        --n_episode=30

To evaluate a trained policy and record videos:

    python main.py --experiment_path=checkpoints/Ant/test \
        --eval \
        --eval_n_episode=50 \
        --load_postfix=last \
        --video_path=videos/Ant
- load_postfix: postfix of the pretrained checkpoint to load (e.g. an episode number, 'best', or 'last')
Demo videos:
- ant.mp4
- reacher.mp4
- cheetah.mp4
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO