# ProjectProcgen

This project makes it quick and easy to run experiments on OpenAI's Procgen Benchmark.

Unlike OpenAI's baselines, this project is implemented in PyTorch.

## Demo on the Fruitbot Environment

(A good agent eats only the fruit and avoids the other foods.)

[demo image]
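If you just want to poke at the Fruitbot environment itself, independent of this project's training code, the procgen package exposes a standard Gym interface with the documented `procgen:procgen-<env>-v0` id pattern. A minimal random-agent loop:

```python
# Minimal interaction with Fruitbot via procgen's Gym interface.
# This is a standalone sketch, not part of this repository's code.
import gym

env = gym.make("procgen:procgen-fruitbot-v0", num_levels=50, start_level=500)
obs = env.reset()  # 64x64x3 uint8 observation
done = False
while not done:
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, done, info = env.step(action)
env.close()
```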

Currently supported methods:

- Proximal Policy Optimization (PPO)
- Deep Q-Learning [probably won't work]

## Getting started

### Install dependencies

```sh
$ conda env create -f environment.yml
$ conda activate procgen
```

- Essential packages: pytorch, procgen, tensorboard, tqdm.

### Run an experiment

```sh
$ cd train
$ python run.py
```

### Run our final model

```sh
$ cd train
$ python run.py --mixreg --num_levels 50 --l2 0
```

### Logs/Plots

- All plots in the report are in `train/results/logs/PPO/plots`.
- All testing curves are in `train/results/logs/PPO/eval_csv`.
- To launch TensorBoard:

```sh
$ cd train
$ tensorboard --logdir results/logs
```

### Visualize performance

```sh
$ cd train
$ python run.py --eval_model <path-to-your-model>
```

## Optional Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--eval_model` | `None` | Path to a trained model (for visualizing performance) |
| `--stack` | `1` | Number of recent frames to stack together as input |
| `--flare` | `False` | Boolean flag: whether to use FLARE |
| `--mixreg` | `False` | Boolean flag: whether to use mixreg |
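For context, mixreg (Wang et al., 2020) regularizes training by taking convex combinations of pairs of observations and their supervision signals, with Beta-distributed mixing weights. A hedged sketch of the idea; `alpha` and the exact signals being mixed are illustrative assumptions, not taken from this repo:

```python
# Sketch of mixture regularization (mixreg): mix random pairs within a batch.
import torch

def mixreg(obs, returns, advantages, alpha=0.2):
    # Sample a scalar mixing weight lambda ~ Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample()
    idx = torch.randperm(obs.size(0))  # random pairing within the batch
    mixed_obs = lam * obs + (1 - lam) * obs[idx]
    mixed_ret = lam * returns + (1 - lam) * returns[idx]
    mixed_adv = lam * advantages + (1 - lam) * advantages[idx]
    return mixed_obs, mixed_ret, mixed_adv
```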

## Environment Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--env_name` | `'fruitbot'` | Name of the environment |
| `--num_envs` | `64` | Number of parallel copies of the environment |
| `--num_levels` | `50` | Number of levels for the agent to train on |
| `--start_level` | `500` | Starting level for the agent to train on |
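These flags presumably map onto procgen's vectorized environment constructor. A minimal sketch using `ProcgenEnv` from the procgen package; the exact wiring inside `run.py` is an assumption:

```python
# Constructing a vectorized Procgen environment with the defaults above.
from procgen import ProcgenEnv

venv = ProcgenEnv(
    num_envs=64,          # --num_envs
    env_name="fruitbot",  # --env_name
    num_levels=50,        # --num_levels
    start_level=500,      # --start_level
)
obs = venv.reset()  # dict observation; obs["rgb"] has shape (num_envs, 64, 64, 3)
```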

## PPO Agent Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--train_step` | `5e6` | Total number of frames to train for |
| `--train_resume` | `0` | Checkpoint from which to resume training |
| `--update_freq` | `256` | Number of frames each environment gathers per update |
| `--eval_freq` | `10` | How often (in training loops) to evaluate performance |
| `--saving_freq` | `10` | How often (in training loops) to save the model |
| `--num_batches` | `8` | Number of batches per epoch (not the batch size) |
| `--num_epochs` | `3` | Number of epochs per train step |
| `--clip_range` | `0.2` | Clipping range for policy and value-estimate deviations |
| `--gamma` | `0.999` | Discount factor |
| `--lam` | `0.95` | The λ hyperparameter in GAE |
| `--ent` | `0.01` | Coefficient for the entropy term |
| `--cl` | `0.5` | Coefficient for the value-estimation loss |
| `--lr_start` | `5e-4` | Learning rate for Adam |
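For reference, here is a minimal sketch of how `--gamma`, `--lam`, and `--clip_range` typically enter PPO training: generalized advantage estimation (GAE), followed by the clipped surrogate objective. Tensor shapes and function names are illustrative, not taken from `run.py`:

```python
# GAE and the PPO clipped surrogate loss, using the defaults above.
import torch

def gae(rewards, values, dones, gamma=0.999, lam=0.95):
    # rewards/dones: (T, N); values: (T + 1, N), with a bootstrap value at the end.
    T = rewards.size(0)
    adv = torch.zeros_like(rewards)
    last = torch.zeros(rewards.size(1))
    for t in reversed(range(T)):
        mask = 1.0 - dones[t]  # zero out bootstrapping across episode boundaries
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        last = delta + gamma * lam * mask * last
        adv[t] = last
    return adv

def ppo_policy_loss(logp, logp_old, adv, clip_range=0.2):
    ratio = torch.exp(logp - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_range, 1 + clip_range) * adv
    # The full objective would add --cl * value_loss and subtract --ent * entropy.
    return -torch.min(unclipped, clipped).mean()
```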

## PPO Model Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--conv_dims` | `[16, 32, 32]` | Number of filters in each Impala block |
| `--fc_dims` | `[256]` | Number of hidden units in the fully connected layer |

- Note: although these are passed in as lists, the project does not (yet) support customizing the number of layers, so each list must have the same length as its default.
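For orientation, here is one plausible reconstruction of the encoder these defaults describe, following the IMPALA architecture (Espeholt et al., 2018), where each Impala block is a convolution, a max-pool, and two residual blocks. This is a hedged sketch, not this repository's actual model code:

```python
# IMPALA-style encoder sketch matching --conv_dims [16, 32, 32] and --fc_dims [256].
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        out = self.conv1(torch.relu(x))
        out = self.conv2(torch.relu(out))
        return x + out  # skip connection

def impala_encoder(conv_dims=(16, 32, 32), fc_dims=(256,), in_ch=3):
    layers, ch = [], in_ch
    for out_ch in conv_dims:
        layers += [
            nn.Conv2d(ch, out_ch, 3, padding=1),
            nn.MaxPool2d(3, stride=2, padding=1),  # halves the spatial size
            ResidualBlock(out_ch),
            ResidualBlock(out_ch),
        ]
        ch = out_ch
    layers += [nn.ReLU(), nn.Flatten()]
    in_feat = ch * 8 * 8  # a 64x64 input is halved three times -> 8x8
    for out_feat in fc_dims:
        layers += [nn.Linear(in_feat, out_feat), nn.ReLU()]
        in_feat = out_feat
    return nn.Sequential(*layers)
```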
