PPO-Agent

A Proximal Policy Optimization (PPO) implementation in PyTorch.

Usage

To train on BipedalWalker-v2 with the default parameters, run:

python experiment.py

The optional parameters are:

parameter name	description	type	default
--env_name	gym environment to be used	str	BipedalWalker-v2
--render	render gym environment	bool	False
--solved_reward	stop training if avg_reward > solved_reward	int	300
--log_interval	print avg reward in the interval	int	20
--max_episodes	max training episodes	int	10000
--max_timesteps	max timesteps in one episode	int	1500
--update_timestep	timesteps between policy updates	int	4000
--action_std	constant std for action distribution (Multivariate Normal)	float	0.5
--K_epochs	update policy for K epochs	int	80
--eps_clip	clip parameter for PPO	float	0.2
--gamma	discount factor	float	0.99
--lr	learning rate	float	0.0003
--log_path	Tensorboard log path	str	tb_logs
--model_path	Path for model persistence	str	models
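
As an illustration of how flags like these are typically declared, here is a minimal argparse sketch covering a subset of the options above. It is an assumption about the general approach, not the code in experiment.py or config.py; the names build_parser and str2bool are hypothetical.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse a command-line boolean such as 'true', 'False', '1' or 'no'."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the parameter table."""
    parser = argparse.ArgumentParser(description="PPO training options (illustrative)")
    parser.add_argument("--env_name", type=str, default="BipedalWalker-v2",
                        help="gym environment to be used")
    # A plain type=bool would treat any non-empty string as True,
    # so a small converter is used instead.
    parser.add_argument("--render", type=str2bool, default=False,
                        help="render gym environment")
    parser.add_argument("--max_episodes", type=int, default=10000,
                        help="max training episodes")
    parser.add_argument("--K_epochs", type=int, default=80,
                        help="update policy for K epochs")
    parser.add_argument("--eps_clip", type=float, default=0.2,
                        help="clip parameter for PPO")
    parser.add_argument("--gamma", type=float, default=0.99,
                        help="discount factor")
    return parser


if __name__ == "__main__":
    # An explicit argument list stands in for the real command line here.
    args = build_parser().parse_args(["--gamma", "0.95", "--render", "true"])
    print(args.env_name, args.gamma, args.render)
```

With a parser like this, any flag left off the command line falls back to the default listed in the table.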
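
To make the roles of --gamma and --eps_clip concrete, the sketch below computes discounted returns and the standard PPO clipped surrogate for a single sample. It uses plain Python floats rather than PyTorch tensors to stay dependency-free, and the function names discounted_returns and clipped_surrogate are hypothetical; this repository's own code may be organized differently.

```python
import math
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float],
                       dones: Sequence[bool],
                       gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, restarting at episode ends.

    dones[t] is True when step t was the last step of its episode.
    """
    returns: List[float] = []
    running = 0.0
    # Walk backwards so each return can reuse the one that follows it.
    for reward, done in zip(reversed(rewards), reversed(dones)):
        if done:
            running = 0.0  # nothing carries over from the next episode
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clipped_surrogate(new_logprob: float,
                      old_logprob: float,
                      advantage: float,
                      eps_clip: float = 0.2) -> float:
    """PPO clipped objective for one sample (to be maximized)."""
    # Probability ratio between the updated policy and the old one.
    ratio = math.exp(new_logprob - old_logprob)
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    # Taking the minimum removes any incentive to push the ratio
    # outside [1 - eps_clip, 1 + eps_clip].
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], [False, False, True], gamma=0.5))
    print(clipped_surrogate(math.log(1.5), 0.0, advantage=2.0))
```

A smaller gamma weights near-term rewards more heavily, and a smaller eps_clip keeps each policy update closer to the policy that collected the data.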

cenkcorapci/ppo-agent
