RL starter files to immediately train, visualize and evaluate an agent without writing a single line of code.
These files are suited for gym-minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms.
- Script to train the agent, with the possibility to:
  - Log in txt, CSV and Tensorboard
  - Save the model
  - Stop and restart training
  - Use A2C or PPO algorithms
- Script to visualize the agent's behavior
- Script to evaluate the agent's performance
- Clone this repository.
- Install gym-minigrid environments and torch-ac RL algorithms:
pip3 install -r requirements.pip
Note: If you want to modify the torch-ac algorithms, you will need to install a cloned version instead, i.e.:
git clone https://github.com/lcswillems/torch-ac.git
cd torch-ac
pip3 install -e .
Train, visualize and evaluate an agent on the MiniGrid-DoorKey-5x5-v0 environment:
- Train the agent on the MiniGrid-DoorKey-5x5-v0 environment with the PPO algorithm:
python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000
- Visualize the agent's behavior:
python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey
- Evaluate the agent's performance:
python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey
Note: More details on the commands are given below.
This package contains:
- scripts to:
  - train an agent, in scripts/train.py
  - visualize the agent's behavior, in scripts/visualize.py
  - evaluate the agent's performance, in scripts/evaluate.py
- a default agent model, in model.py
- utility classes and functions used by the scripts, in utils
These files are suited for gym-minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms by modifying:
- model.py
- utils/format.py
An example of use of the training script:
python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000
The script loads the model in storage/DoorKey or creates it if it doesn't exist, then trains it with the PPO algorithm on the MiniGrid DoorKey environment, and saves it every 10 updates in storage/DoorKey. It stops after 80 000 frames.
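The save-and-stop behavior of the training script can be sketched as a simple loop. This is only an illustration under stated assumptions, not the script's actual code: run_training, frames_per_update and the save callback are hypothetical names, and the number of frames collected per update (2048 in the usage line) is an assumed value.

```python
def run_training(total_frames, save_interval, frames_per_update, save,
                 start_frames=0, start_update=0):
    """Run updates until total_frames is reached, saving every save_interval updates.

    start_frames and start_update (hypothetical) let a restarted run
    continue from the counters stored with a previously saved model.
    """
    num_frames, update = start_frames, start_update
    while num_frames < total_frames:
        # One update: collect experience and optimize the model (omitted here).
        num_frames += frames_per_update
        update += 1
        if save_interval > 0 and update % save_interval == 0:
            save(update, num_frames)
    return update, num_frames


# Usage mirroring --save-interval 10 --frames 80000, with an assumed 2048 frames per update.
saved_updates = []
last_update, last_frames = run_training(
    80000, 10, 2048, lambda update, frames: saved_updates.append(update)
)
```

Because the frame budget is only checked between updates, the run can overshoot the requested number of frames by up to one update.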
Note: You can define a different storage location in the environment variable PROJECT_STORAGE.
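One way such an override could be resolved is shown below. This is a sketch, assuming the variable simply replaces the default storage directory; get_storage_dir and get_model_dir are hypothetical helper names used for illustration.

```python
import os


def get_storage_dir(default="storage"):
    # PROJECT_STORAGE, when set, overrides the default storage location.
    return os.environ.get("PROJECT_STORAGE", default)


def get_model_dir(model_name):
    # Each model lives in its own sub-directory, e.g. storage/DoorKey.
    return os.path.join(get_storage_dir(), model_name)
```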
More generally, the script has 2 required arguments:
- --algo ALGO: name of the RL algorithm used to train
- --env ENV: name of the environment to train on
and several optional arguments, among which:
- --recurrence N: the gradient is backpropagated over N timesteps. By default, N = 1. If N > 1, an LSTM is added to the model to give it memory.
- --text: a GRU is added to the model to handle text input.
- ... (see more using --help)
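The argument layout described above can be sketched with argparse. This mirrors only the flags named in this document; it is not the script's actual parser, and the None defaults for --model, --save-interval and --frames are placeholders, since their real defaults are not stated here.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the training CLI")
    # Required arguments.
    parser.add_argument("--algo", required=True,
                        help="name of the RL algorithm used to train")
    parser.add_argument("--env", required=True,
                        help="name of the environment to train on")
    # Optional arguments (None defaults are placeholders, not the real ones).
    parser.add_argument("--model", default=None, help="name of the model")
    parser.add_argument("--save-interval", type=int, default=None,
                        help="number of updates between two saves")
    parser.add_argument("--frames", type=int, default=None,
                        help="number of frames of training")
    parser.add_argument("--recurrence", type=int, default=1,
                        help="number of timesteps the gradient is backpropagated over")
    parser.add_argument("--text", action="store_true",
                        help="add a GRU to the model to handle text input")
    return parser


args = build_parser().parse_args(
    ["--algo", "ppo", "--env", "MiniGrid-DoorKey-5x5-v0", "--model", "DoorKey"]
)
# Memory (the LSTM) is only needed when recurrence is greater than 1.
use_memory = args.recurrence > 1
```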
During training, logs are printed in your terminal (and saved in text and CSV format):
Note: U gives the update number, F the total number of frames, FPS the number of frames per second, D the total duration, rR:μσmM the mean, std, min and max reshaped return per episode, F:μσmM the mean, std, min and max number of frames per episode, H the entropy, V the value, pL the policy loss, vL the value loss and ∇ the gradient norm.
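The μσmM fields are four summary statistics computed over the episodes finished since the last log. A minimal sketch of how such a field could be produced follows; the exact output format and the use of the population standard deviation are assumptions, and summarize and format_stats are hypothetical names.

```python
import statistics


def summarize(values):
    """Return the mean, population std, min and max of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def format_stats(label, values):
    """Format one log field, e.g. the per-episode returns under the label rR."""
    s = summarize(values)
    return "{}:μσmM {:.2f} {:.2f} {:.2f} {:.2f}".format(
        label, s["mean"], s["std"], s["min"], s["max"]
    )


print(format_stats("rR", [0.0, 0.5, 1.0]))  # prints "rR:μσmM 0.50 0.41 0.00 1.00"
```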
During training, logs can also be plotted in Tensorboard if --tb is added.
Note: the tensorboardX package is required and can be installed with pip3 install tensorboardX.
An example of use of the visualization script:
python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey
In this use case, the script displays how the model in storage/DoorKey behaves on the MiniGrid DoorKey environment.
More generally, the script has 2 required arguments:
- --env ENV: name of the environment to act on.
- --model MODEL: name of the trained model.
and several optional arguments, among which:
- --argmax: select the action with the highest probability
- ... (see more using --help)
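The difference that --argmax makes can be illustrated on a plain list of action probabilities. This is a sketch, assuming that without the flag the action is sampled from the policy's distribution; select_action is a hypothetical helper, not a function of the scripts.

```python
import random


def select_action(probs, argmax=False, rng=random):
    """Pick an action index from a list of action probabilities."""
    actions = range(len(probs))
    if argmax:
        # Deterministic: always take the most probable action.
        return max(actions, key=lambda a: probs[a])
    # Stochastic: sample an action according to its probability.
    return rng.choices(actions, weights=probs, k=1)[0]


greedy = select_action([0.1, 0.7, 0.2], argmax=True)  # index of the largest probability
sampled = select_action([0.1, 0.7, 0.2])              # varies from call to call
```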
An example of use of the evaluation script:
python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey
In this use case, the script prints in the terminal the performance over 100 episodes of the model in storage/DoorKey.
More generally, the script has 2 required arguments:
- --env ENV: name of the environment to act on.
- --model MODEL: name of the trained model.
and several optional arguments, among which:
- --episodes N: number of evaluation episodes. By default, N = 100.
- ... (see more using --help)
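Evaluating over N episodes amounts to running the agent until each episode ends and averaging the results. The sketch below shows the idea on a stand-in environment. FixedLengthEnv is a hypothetical toy class with a simplified reset/step interface (not the gym API), and evaluate is an illustrative name rather than the script's own function.

```python
import statistics


class FixedLengthEnv:
    """Hypothetical toy environment: episodes last `length` steps, reward 1 at the end."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        reward = 1.0 if done else 0.0
        return self.t, reward, done


def evaluate(env, policy, episodes=100):
    """Run `episodes` full episodes and return the mean return and mean length."""
    returns, lengths = [], []
    for _ in range(episodes):
        obs, done = env.reset(), False
        episode_return, episode_length = 0.0, 0
        while not done:
            obs, reward, done = env.step(policy(obs))
            episode_return += reward
            episode_length += 1
        returns.append(episode_return)
        lengths.append(episode_length)
    return statistics.mean(returns), statistics.mean(lengths)


mean_return, mean_length = evaluate(FixedLengthEnv(3), lambda obs: 0, episodes=100)
```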
The default model is described by the following schema:
By default, the memory part (in red) and the language part (in blue) are disabled. They can be enabled by setting the use_memory and use_text parameters of the model constructor to True.
This model can be easily adapted to your needs.