# Bin Packing Problem (BPP)

Online bin packing problem where each bin holds a one-dimensional (1D) resource.

The state/action/reward formulation is inspired by: https://epub.jku.at/obvulihs/download/pdf/6996324?originalFilename=true

Short description (pp. 22-23):
* **BPP specification:** 50 items of size 2 and 50 items of size 3. Bins have a capacity of 9.
* **Observable state:** [B_0, B_1, ..., B_{n-1}, I]. B_i is the number of bins with filling level i. I is the size of the incoming item.
* **Action space:** Select a bin according to its filling level. Action 0 means "open a new bin".
* **Reward at a given timestep:** 0 if an existing bin is used; if a new bin is opened, the negative incremental waste (reward = -(BIN_SIZE - I)). A penalty of -1000 is given in two situations: a bin overflow, or an action that selects a nonexistent bin.
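
The reward rule above can be sketched as a small standalone function. This is only an illustration, not the code of `EnvBinPacking`: the names `step_reward`, `BIN_SIZE` and `PENALTY` are hypothetical, and the sketch assumes that full bins are dropped from the state (which is consistent with the 9 filling levels plus item size seen in the printed `last_state`).

```python
BIN_SIZE = 9      # bin capacity from the specification above
PENALTY = -1000   # penalty for overflow or selecting a nonexistent bin


def step_reward(bins_per_level, item, action):
    """Return (reward, new_bins_per_level) for one placement.

    bins_per_level[i] is the number of open bins with filling level i,
    for i in 0..BIN_SIZE-1 (assumption: full bins are no longer tracked).
    action 0 opens a new bin; action i > 0 selects a bin of level i.
    """
    bins = list(bins_per_level)

    if action == 0:
        # New bin: the reward is the negative incremental waste.
        if item < BIN_SIZE:
            bins[item] += 1
        return -(BIN_SIZE - item), bins

    # Invalid action: no bin at that level, or the item would overflow it.
    no_such_bin = not (0 < action < len(bins)) or bins[action] == 0
    if no_such_bin or action + item > BIN_SIZE:
        return PENALTY, bins

    # Valid placement in an existing bin: move it to its new filling level.
    bins[action] -= 1
    if action + item < BIN_SIZE:
        bins[action + item] += 1
    return 0, bins


empty = [0] * BIN_SIZE
reward, bins = step_reward(empty, item=2, action=0)
print(reward, bins)   # -7 [0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Opening a new bin for an item of size 2 gives -7, which matches the `last_reward` of -7.0 reported with `last_action` 0 at the end of this notebook.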



In [1]:
from rl_factory import rl_agent_factory # Factory to build RL agents
from rl_factory import default_hyperparam_factory # Factory that makes hyperparameter setting easier
from EnvBinPacking import EnvBinPacking # Definition of the BPP. Its global variables contain the bin size, the item distribution, the number of items...
from AgentRL import AgentRLLIB
from AgentHeuristic import AgentBestFit

# Environment configuration and class
env_config = {"action_type": "discrete"} # PPO accepts "continuous" or "discrete"; DQN: "discrete"; DDPG: "continuous".
env_class=EnvBinPacking # reference to the environment class (not an instance)

## Reinforcement Learning

In [2]:
# build the RL agent
rl_name = "PPO" 
hyperparameters = default_hyperparam_factory(rl_name)

# Set values similar to those used in the publication
hyperparameters["lr"]=1e-3
hyperparameters["deep"]=2
hyperparameters["wide"]=16
hyperparameters["train_batch_size"]=64
hyperparameters["sgd_minibatch_size"]=64
hyperparameters["lambda"]=0.99
hyperparameters["grad_clip"]=0.3

# Build the Trainer (contains RL object and Environment simulator object)
rllib_trainer = rl_agent_factory(rl_name, hyperparameters, env_class, env_config=env_config)

2023-02-01 16:48:14,569	INFO worker.py:1538 -- Started a local Ray instance.
2023-02-01 16:48:23,993	INFO trainable.py:172 -- Trainable.setup took 11.652 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.


In [3]:
agent=AgentRLLIB(rllib_trainer, env_class, env_config) 
for ep in range(10):
    for it in range(10):
        agent.train()

    score=agent.evaluate()
    print("Info:", score['cumulated_rewards'])

Info: -1278.0
Info: -1048.0
Info: -1012.0
Info: -1024.0
Info: -1398.0
Info: -611.0
Info: -637.0
Info: -516.0
Info: -630.0
Info: -606.0


## Heuristics (Best Fit)

In [41]:
agent = AgentBestFit(None, env_class, env_config)
score = agent.evaluate()
print("Info:", score['cumulated_rewards'])

Info: -193.0
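
For reference, the Best Fit rule can be expressed directly in the action space described at the top of this notebook: pick the fullest existing bin that still has room for the item, and open a new bin only when none fits. The sketch below is an assumption about how such a policy might look; the implementation of `AgentBestFit` is not shown here, and `best_fit_action` is a hypothetical name.

```python
BIN_SIZE = 9  # bin capacity from the specification


def best_fit_action(bins_per_level, item):
    """Return the Best Fit action for the incoming item.

    bins_per_level[i] is the number of open bins with filling level i.
    The returned action is a filling level, or 0 to open a new bin.
    """
    # Scan from the fullest level down, so the first fit leaves the least free space.
    for level in range(len(bins_per_level) - 1, 0, -1):
        if bins_per_level[level] > 0 and level + item <= BIN_SIZE:
            return level
    return 0  # no existing bin can take the item


# One bin at level 2, one at level 6, thirteen at level 8.
print(best_fit_action([0, 0, 1, 0, 0, 0, 1, 0, 13], item=2))  # 6
print(best_fit_action([0, 0, 0, 0, 0, 0, 0, 0, 13], item=2))  # 0
```

Because this policy only selects existing bins that can hold the item, it never triggers the -1000 penalty; its cumulated reward consists only of the waste incurred when new bins are opened.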


In [42]:
print(score)

{'cumulated_rewards': -193.0, 'last_reward': -7.0, 'last_state': array([ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0., 13.,  0.], dtype=float32), 'last_action': 0}
