RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

1️⃣ First work to incorporate an end-to-end vehicle routing model into a modern RL platform (CleanRL)

⚡ Speeds up training of the Attention Model by 8 times (25 hours $\to$ 3 hours)

🔎 A flexible framework for developing models, algorithms, environments, and search methods for operations research

News

13/04/2023: We released a web demo on Hugging Face 🤗!
24/03/2023: We released our paper on arXiv!
20/03/2023: We released a JupyterLab demo and pretrained checkpoints!
10/03/2023: We released our codebase!

Demo

We provide inference demos as Colab notebooks:

Environment	Search	Demo
TSP	Greedy
CVRP	Multi-Greedy

Installation

Conda

conda env create -n <env name> -f environment.yml
# The environment.yml was generated from
# conda env export --no-builds > environment.yml

It can take a few minutes.

Optional dependency

wandb

Refer to their quick start guide for installation.

File structures

All major implementations are in the rlor folder.

./rlor
├── envs
│   ├── tsp_data.py # load pre-generated data for evaluation
│   ├── tsp_vector_env.py # define the (vectorized) gym environment
│   ├── cvrp_data.py 
│   └── cvrp_vector_env.py 
├── models
│   ├── attention_model_wrapper.py # wrap the refactored attention model for CleanRL
│   └── nets # contains the refactored attention model
└── ppo_or.py # implementation of PPO with the attention model for operations research problems

ppo_or.py is modified from cleanrl/ppo.py. To see what changed, use diff:

# apt install diffutils
diff --color ppo.py ppo_or.py
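
The tree above lists tsp_vector_env.py as the (vectorized) gym environment. As a rough illustration of what "vectorized" means here, the sketch below steps a batch of TSP instances at once with NumPy. ToyTSPVectorEnv and all of its fields are hypothetical and are not the repository's TSPVectorEnv; the reward (negative travelled distance, with the closing edge added on the last step) is an assumption made for this example.

```python
import numpy as np


class ToyTSPVectorEnv:
    """Hypothetical batched TSP environment (not the repository's TSPVectorEnv)."""

    def __init__(self, num_envs=4, num_nodes=5, seed=0):
        self.num_envs = num_envs
        self.num_nodes = num_nodes
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # One random instance per environment, coordinates in the unit square.
        self.coords = self.rng.uniform(size=(self.num_envs, self.num_nodes, 2))
        self.visited = np.zeros((self.num_envs, self.num_nodes), dtype=bool)
        self.first = None  # first node of each tour
        self.last = None   # most recently visited node of each tour
        return self._obs()

    def _obs(self):
        # The action mask is True for nodes that may still be visited.
        return {"coords": self.coords, "action_mask": ~self.visited}

    def step(self, actions):
        actions = np.asarray(actions)
        idx = np.arange(self.num_envs)
        if self.visited[idx, actions].any():
            raise ValueError("an action selects a node that was already visited")
        here = self.coords[idx, actions]
        if self.last is None:
            # First move: nothing travelled yet.
            self.first = actions.copy()
            reward = np.zeros(self.num_envs)
        else:
            reward = -np.linalg.norm(here - self.coords[idx, self.last], axis=1)
        self.visited[idx, actions] = True
        self.last = actions.copy()
        done = self.visited.all(axis=1)
        # On the final step, also charge the edge back to the first node.
        back = np.linalg.norm(here - self.coords[idx, self.first], axis=1)
        reward = np.where(done, reward - back, reward)
        return self._obs(), reward, done, {}


# Visit nodes in index order in three environments at once.
env = ToyTSPVectorEnv(num_envs=3, num_nodes=4)
obs = env.reset()
returns = np.zeros(3)
for node in range(4):
    obs, reward, done, _ = env.step(np.full(3, node))
    returns += reward
print(returns)  # negative tour length of each instance
```

Every array carries a leading batch dimension, so one call to step advances all instances together, which is the property a batched policy network needs.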

Training OR model with PPO

TSP

python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv --problem tsp

CVRP

python ppo_or.py --num-steps 60 --env-id cvrp-v0 --env-entry-point envs.cvrp_vector_env:CVRPVectorEnv --problem cvrp
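
The --env-entry-point values in the commands above use the module:Class form that Gym's environment registration accepts as an entry point. A minimal sketch of how such a string can be resolved with the standard library is shown below. load_entry_point is a hypothetical helper written for this example; it is not a function from the repository or from Gym.

```python
import importlib


def load_entry_point(entry_point: str):
    """Resolve a "package.module:Attribute" string to the object it names."""
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:Attribute', got {entry_point!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# A standard-library target keeps the example runnable anywhere.
# A string such as "envs.tsp_vector_env:TSPVectorEnv" would resolve the same
# way, provided the envs package is importable from the working directory.
ordered_dict_cls = load_entry_point("collections:OrderedDict")
print(ordered_dict_cls.__name__)
```

This also explains a common failure mode: if the command is run from a directory where the envs package cannot be imported, the entry point cannot be resolved.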

Enable WandB

python ppo_or.py ... --track

Add --track argument to enable tracking with WandB.

Where is the tsp data?

The TSP evaluation data can be generated with the official repo of the attention-learn-to-route paper. Then update the data path in ./envs/tsp_data.py accordingly.
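
For a quick local test set, uniformly random instances can also be generated directly. The sketch below assumes a dataset is a pickled list of instances, each a list of (x, y) coordinates in the unit square. That layout, the function names, and the file name tsp50_example.pkl are assumptions for illustration; check tsp_data.py for the format it actually loads.

```python
import os
import pickle
import tempfile

import numpy as np


def generate_tsp_data(num_instances: int, num_nodes: int, seed: int = 1234):
    """Sample node coordinates uniformly from the unit square."""
    rng = np.random.default_rng(seed)
    return rng.uniform(size=(num_instances, num_nodes, 2)).tolist()


def save_dataset(dataset, path):
    with open(path, "wb") as f:
        pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)


def load_dataset(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical output location; point tsp_data.py at wherever the file is saved.
path = os.path.join(tempfile.gettempdir(), "tsp50_example.pkl")
save_dataset(generate_tsp_data(num_instances=10, num_nodes=50), path)
data = np.array(load_dataset(path))
print(data.shape)
```

A fixed seed keeps the evaluation set identical across runs, so results from different checkpoints stay comparable.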

Acknowledgements

The neural network model is refactored and developed from Attention, Learn to Solve Routing Problems!.

The idea of multiple-trajectory training/inference comes from POMO: Policy Optimization with Multiple Optima for Reinforcement Learning.

The RL environments are defined with OpenAI Gym.

The PPO algorithm implementation is based on CleanRL.
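
The multiple-trajectory idea credited to POMO above can be illustrated without a neural network. In this sketch a nearest-neighbour rule stands in for the trained policy (an assumption for illustration only; the repository uses the attention model), and tour_length, greedy_tour, and multi_start_greedy are hypothetical names. One greedy rollout is run from every start node and the shortest tour is kept, so the result is never worse than a single greedy rollout.

```python
import numpy as np


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given order."""
    ordered = coords[tour]
    return float(np.linalg.norm(ordered - np.roll(ordered, -1, axis=0), axis=1).sum())


def greedy_tour(coords, start):
    """Greedy rollout: always move to the nearest unvisited node."""
    n = len(coords)
    visited = np.zeros(n, dtype=bool)
    tour = [start]
    visited[start] = True
    for _ in range(n - 1):
        dist = np.linalg.norm(coords - coords[tour[-1]], axis=1)
        dist[visited] = np.inf  # mask out nodes that were already visited
        nxt = int(np.argmin(dist))
        tour.append(nxt)
        visited[nxt] = True
    return tour


def multi_start_greedy(coords):
    """Run one greedy rollout per start node and keep the shortest tour."""
    tours = [greedy_tour(coords, s) for s in range(len(coords))]
    return min(tours, key=lambda t: tour_length(coords, t))


rng = np.random.default_rng(0)
coords = rng.uniform(size=(20, 2))
single = tour_length(coords, greedy_tour(coords, 0))
best = tour_length(coords, multi_start_greedy(coords))
print(f"single greedy: {single:.3f}, multi-start greedy: {best:.3f}")
```

The extra rollouts cost one pass per start node, which a batched model can evaluate in parallel.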
