Online 3D Bin Packing with Constrained Deep Reinforcement Learning



Video link of our project: YouTube, bilibili

This repository contains the implementation of the paper Online 3D Bin Packing with Constrained Deep Reinforcement Learning.


To set up the project:
* Install the Python packages listed in '' (run 'pip install -r requirements.txt').
* Use Python 3.7; the code is known to work on this version.


We provide a unified interface in ''. The examples below show how to run the project.

For training:

Example: train a new model on randomly generated sequences.
Run 'python --mode train --load-model False --use-cuda --item-seq rs'.
Training takes about one day to reach a model with satisfactory performance.

Run 'python --help' for information on the common parameters.
Many other parameters are defined in '', all with default values. You can change them as needed.

For testing:

To test a model trained on sequences generated by the CUT-2 algorithm (see our paper for details),
run 'python --mode test --load-model True --use-cuda --data-name --load-name'.

To see how the model performs in a lookahead setting,
run 'python --mode test --load-model True --use-cuda --data-name --load-name --preview x', where x is the number of lookahead items.
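The flags used in the commands above can be modelled with a small 'argparse' parser. The following is only an illustrative sketch, not the parser shipped in this repository: the defaults, the allowed values, and the helper names 'str2bool' and 'build_parser' are assumptions. It shows one point worth knowing when passing '--load-model True' or '--load-model False': a plain 'type=bool' would treat the string 'False' as true, so the string has to be parsed explicitly.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style strings into real booleans.

    argparse's type=bool would turn any non-empty string, including
    'False', into True, so the text is parsed explicitly instead.
    """
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got {!r}".format(value))


def build_parser():
    # Hypothetical parser: flag names follow the README commands,
    # defaults and choices are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="Sketch of the command-line flags")
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    parser.add_argument("--load-model", type=str2bool, default=False)
    parser.add_argument("--use-cuda", action="store_true")
    parser.add_argument("--item-seq", default="rs")
    parser.add_argument("--data-name", default=None)
    parser.add_argument("--load-name", default=None)
    parser.add_argument("--preview", type=int, default=1)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--mode", "test", "--load-model", "True", "--use-cuda", "--preview", "5"]
)
print(args.mode, args.load_model, args.use_cuda, args.preview)
```

Run 'python --help' on the real entry point to see the flags and defaults the project actually defines.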

Code for the user-study applications, the multi-bin algorithm, and the MCTS baseline used for comparison is also provided.
See 'user_study/', 'multi_bin/', and 'MCTS/' for details.


* Different input state sizes require different CNN encoders. You can adjust the network architecture in ./acktr/ to suit your needs.

* The predicted mask mainly serves to reduce the computational cost of MCTS. If you only need the BPP-1 model, you can replace the predicted mask with a ground-truth mask during training, which makes training easier.

* Relaxing the stability constraints may yield better results, but the resulting packings may be unsafe in practice.

* The computational overhead of our implementation is sensitive to the size of the network layers, so avoid very large layers in your network architecture.

* The difficulty of a bin packing problem depends on its item set, which also affects the performance of the trained model.
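To make the notes on the ground-truth mask and on relaxing stability more concrete, here is a minimal sketch of how a feasibility mask could be computed from a height map. It is not the code used in this repository. The function name 'feasibility_mask', the plain-list height map, and the single 'min_support' threshold are assumptions chosen for illustration; the stability rules described in the paper are more detailed than this one-parameter check.

```python
def feasibility_mask(height_map, item, bin_height, min_support=1.0):
    """Return a 0/1 mask over placement positions for one item.

    height_map[x][y] is the current stack height of a grid cell.
    item is (length, width, height) in grid cells.
    min_support is a hypothetical threshold: the fraction of the item's
    footprint that must sit at the resting height to count as stable.
    """
    bin_l, bin_w = len(height_map), len(height_map[0])
    length, width, height = item
    mask = [[0] * bin_w for _ in range(bin_l)]
    # (x, y) is the corner cell of the footprint; the ranges enforce containment.
    for x in range(bin_l - length + 1):
        for y in range(bin_w - width + 1):
            cells = [
                height_map[i][j]
                for i in range(x, x + length)
                for j in range(y, y + width)
            ]
            base = max(cells)  # the item rests on the tallest cell beneath it
            if base + height > bin_height:
                continue  # the item would stick out of the top of the bin
            support = sum(1 for c in cells if c == base) / len(cells)
            if support >= min_support:
                mask[x][y] = 1
    return mask


heights = [
    [0, 0, 0],
    [0, 0, 0],
    [2, 2, 0],
]
strict = feasibility_mask(heights, item=(2, 2, 1), bin_height=3)
relaxed = feasibility_mask(heights, item=(2, 2, 1), bin_height=3, min_support=0.5)
print(strict)
print(relaxed)
```

Lowering 'min_support' marks more positions as feasible, which mirrors the trade-off noted above: the policy gains freedom, but half-supported placements can topple in a physical setting.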


Hang Zhao and Qijin She are co-authors of this repository.

Some code is modified from the open-source project 'pytorch-a2c-ppo-acktr-gail' (


Note that this source code is released only for academic use. Please do not use it for commercial purposes without authorization of the authors. The method is being patent protected. For commercial use, please contact Kai Xu (


