# Stable Baselines Example

Below is a minimal setup for using the Graph Matrix Job Shop Environment with Stable Baselines3.
Since the environment has both valid and invalid actions, it is highly recommended to use an agent with action masking.
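
As a rough illustration of what action masking does, the sketch below renormalises a policy's action probabilities so that invalid actions receive probability zero. It shows only the general idea and is not the code used by `sb3_contrib`; the function name `masked_softmax` and the example values are hypothetical.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over `logits` that assigns zero probability where `mask` is False."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: four actions, of which only actions 0 and 2 are valid.
logits = [1.0, 5.0, 1.0, -2.0]
mask = [True, False, True, False]
probs = masked_softmax(logits, mask)
print(probs)  # [0.5, 0.0, 0.5, 0.0]
```

Even though action 1 has the highest logit, it can never be sampled because it is masked out. Without a mask, the agent would have to learn from experience alone that such actions are invalid.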

The package `jsp_instance_utils` includes many benchmark instances of the job shop problem from the literature. 
It can be installed via pip.

In [1]:
!pip install jsp_instance_utils

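
For readers new to the job shop problem, the following sketch shows what an instance conceptually contains: a set of jobs, each an ordered list of operations with a machine and a duration. The toy instance and the helper `makespan_lower_bound` are hypothetical, and the list-of-tuples layout is an assumption made for illustration; it is not necessarily the data format used by `jsp_instance_utils`.

```python
# Hypothetical toy instance with 2 jobs and 2 machines (not a benchmark instance).
# Each job is an ordered list of (machine, duration) operations.
toy_instance = [
    [(0, 3), (1, 2)],  # job 0: machine 0 for 3 time units, then machine 1 for 2
    [(1, 4), (0, 1)],  # job 1: machine 1 for 4 time units, then machine 0 for 1
]


def makespan_lower_bound(instance):
    """Return a simple lower bound on the total completion time (makespan)."""
    # A job cannot finish before the sum of its operation durations.
    longest_job = max(sum(d for _, d in job) for job in instance)
    # A machine cannot finish before it has processed all of its operations.
    machine_load = {}
    for job in instance:
        for machine, duration in job:
            machine_load[machine] = machine_load.get(machine, 0) + duration
    return max(longest_job, max(machine_load.values()))


print(makespan_lower_bound(toy_instance))  # 6
```

Benchmark instances such as `ft06` follow the same idea at a larger scale.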

To train a PPO agent on the environment with Stable Baselines3, first install the required dependencies:

In [2]:
!pip install stable_baselines3
!pip install sb3_contrib



In [3]:
import gymnasium as gym
import numpy as np
from jsp_instance_utils.instances import ft06
from sb3_contrib import MaskablePPO
from sb3_contrib.common.maskable.policies import MaskableActorCriticPolicy
from sb3_contrib.common.wrappers import ActionMasker
from stable_baselines3.common.monitor import Monitor

from graph_matrix_jsp_env.disjunctive_jsp_env import DisjunctiveGraphJspEnv


def mask_fn(env: gym.Env) -> np.ndarray:
    # Return the mask of currently valid actions from the base environment
    return env.unwrapped.valid_action_mask()


if __name__ == '__main__':

    # You can also use other instances such as ft20, ta21, or ta80;
    # just make sure to import them from jsp_instance_utils.instances
    env = DisjunctiveGraphJspEnv(jsp_instance=ft06)
    # Monitor records episode rewards and lengths for the training log
    env = Monitor(env)
    # ActionMasker exposes the mask to MaskablePPO
    env = ActionMasker(env, mask_fn)

    model = MaskablePPO(
        MaskableActorCriticPolicy,
        env,
        verbose=1,
        device="cpu",  # Note: You can also use "cuda" if you have a GPU with CUDA
    )

    # Train the agent
    model.learn(total_timesteps=10_000)

    Graph Matrix Job Shop Problem Environment

    Version:    0.1.0