cenkcorapci/ppo-agent

PPO-Agent

License: MIT

A Proximal Policy Optimization implementation with PyTorch.
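
For orientation, the two quantities that the `--eps_clip` and `--gamma` options control can be sketched in a few lines of plain Python. This is a framework-free illustration of the standard PPO clipped surrogate and of discounted returns, not code taken from this repository; the function names `clipped_surrogate` and `discounted_returns` are hypothetical.

```python
import math


def clipped_surrogate(new_logprob, old_logprob, advantage, eps_clip=0.2):
    """PPO clipped surrogate objective for one sample (a value to maximize)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    ratio = math.exp(new_logprob - old_logprob)
    # Clamp the ratio to the interval [1 - eps_clip, 1 + eps_clip].
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    # Keep the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each step can reuse the next step's return.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

With a positive advantage, a ratio above `1 + eps_clip` earns no extra objective value, which is what limits how far a single update can move the policy.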

Usage

To train on BipedalWalker-v2 with the default parameters, run:

python experiment.py

The following optional parameters are available:

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| `--env_name` | Gym environment to use | str | `BipedalWalker-v2` |
| `--render` | Render the gym environment | bool | `False` |
| `--solved_reward` | Stop training if avg_reward > solved_reward | int | `300` |
| `--log_interval` | Interval at which the average reward is printed | int | `20` |
| `--max_episodes` | Maximum number of training episodes | int | `10000` |
| `--max_timesteps` | Maximum number of timesteps in one episode | int | `1500` |
| `--update_timestep` | Update the policy every n timesteps | int | `4000` |
| `--action_std` | Constant standard deviation for the action distribution (Multivariate Normal) | float | `0.5` |
| `--K_epochs` | Number of epochs per policy update | int | `80` |
| `--eps_clip` | Clip parameter for PPO | float | `0.2` |
| `--gamma` | Discount factor | float | `0.99` |
| `--lr` | Learning rate | float | `0.0003` |
| `--log_path` | TensorBoard log path | str | `tb_logs` |
| `--model_path` | Path for model persistence | str | `models` |
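
The options above could be declared with the standard-library `argparse` module along the following lines. This is a minimal sketch, not the actual contents of `experiment.py`: the helper names `str2bool` and `build_parser` are hypothetical, the learning-rate default is assumed to be 0.0003, and `LunarLanderContinuous-v2` is only an example override.

```python
import argparse


def str2bool(value):
    """Parse textual booleans; plain type=bool treats any non-empty string as True."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser whose flags and defaults mirror the table above."""
    parser = argparse.ArgumentParser(description="PPO training options (illustrative)")
    parser.add_argument("--env_name", type=str, default="BipedalWalker-v2")
    parser.add_argument("--render", type=str2bool, default=False)
    parser.add_argument("--solved_reward", type=int, default=300)
    parser.add_argument("--log_interval", type=int, default=20)
    parser.add_argument("--max_episodes", type=int, default=10000)
    parser.add_argument("--max_timesteps", type=int, default=1500)
    parser.add_argument("--update_timestep", type=int, default=4000)
    parser.add_argument("--action_std", type=float, default=0.5)
    parser.add_argument("--K_epochs", type=int, default=80)
    parser.add_argument("--eps_clip", type=float, default=0.2)
    parser.add_argument("--gamma", type=float, default=0.99)
    parser.add_argument("--lr", type=float, default=0.0003)  # assumed default
    parser.add_argument("--log_path", type=str, default="tb_logs")
    parser.add_argument("--model_path", type=str, default="models")
    return parser


# Override two options and leave the rest at their defaults.
args = build_parser().parse_args(
    ["--env_name", "LunarLanderContinuous-v2", "--lr", "0.001"]
)
print(args.env_name, args.lr, args.K_epochs)
```

Passing an explicit list to `parse_args` keeps the example independent of the real command line; calling `parse_args()` with no arguments would read `sys.argv` instead.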
