
# Job Shop Scheduling Problem 



## System Description (DFJSP)

The **Dynamic Flexible Job Shop Scheduling Problem (DFJSP)** schedules dynamically arriving jobs on multiple machines to minimize total tardiness.

- There are $n$ successively arriving jobs $J=\{J_1,\dots,J_n\}$ to be processed with $m$ machines $M=\{M_1,\dots,M_m\}$.

- Each job $J_i$ has $n_i$ ordered operations $ O_{i,1},\; O_{i,2},\; \dots,\; O_{i,n_i}$.
- Each operation $O_{i,j}$ can be processed on any compatible machine from its set $M_{i,j}\subseteq M$, with processing time $t_{i,j,k}$ on machine $M_k$.

- Each job has:
    - Arrival time $A_i$
    - Due date $D_i$
    - Operation completion time $C_{i,j}$

- The system is subject to the following constraints and simplifying assumptions:
    - **Capacity:** each machine processes at most one operation at a time.  
    - **Precedence:** operations of the same job follow the fixed order $O_{i,1}\!\to\! \cdots \!\to\! O_{i,n_i}$.  
    - **Non-preemption:** once started, an operation runs to completion.  
    - **No setups/transport:** setup and transfer times are neglected.
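
With the notation above, the total-tardiness objective can be written as follows. This is the standard formulation; the symbol $T_i$ for the tardiness of job $J_i$ is introduced here only for illustration, and the completion time of a job is taken to be the completion time of its last operation, $C_{i,n_i}$.

$$
T_i = \max\left(C_{i,n_i} - D_i,\; 0\right), \qquad \min \sum_{i=1}^{n} T_i
$$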

## State Representation:

- In previous RL-based scheduling methods, state features were defined as direct indicators of the production status, e.g. the number of machines/jobs/operations on the shop floor, the remaining processing time of uncompleted jobs, or the current workload/queue length of each machine. The problem with this approach is that in real-world settings the numbers of jobs, machines, and operations are large and can vary over a wide range, so using these indicators as the state reduces the generalizability of the RL agent: it can only perform well when the state size stays the same.
- To decouple the state representation from the direct indicators mentioned above, we use seven carefully designed state features, each taking a value in the range [0, 1].

At each rescheduling point *t*, the environment state is represented by the following features:

1. **Average machine utilization:**  $U_{ave}(t) = \frac{\sum_{k=1}^{m} U_k(t)}{m}$

2. **Standard deviation of machine utilization:**  $U_{std}(t) = \sqrt{\frac{\sum_{k=1}^{m} (U_k(t) - U_{ave}(t))^2}{m}}$

3. **Average operation completion rate:**  $CRO_{ave}(t) = \frac{\sum_{i=1}^{n} OP_i(t)}{\sum_{i=1}^{n} n_i}$

4. **Average job completion rate:**  $CRJ_{ave}(t) = \frac{\sum_{i=1}^{n} CRJ_i(t)}{n}$

5. **Standard deviation of job completion rate:**  $CRJ_{std}(t) = \sqrt{\frac{\sum_{i=1}^{n} (CRJ_i(t) - CRJ_{ave}(t))^2}{n}}$

6. **Estimated tardiness rate:**  $Tard_{e}(t) = \frac{N_{tard}}{N_{left}}$

7. **Actual tardiness rate:**  $Tard_{a}(t) = \frac{\text{Number of actual tardy operations}}{\text{Number of uncompleted operations}}$
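
As a rough illustration, the first five features can be computed directly from per-machine and per-job counters. The sketch below is not the project code: the function name and argument names are hypothetical, and it assumes that $U_k(t)$ is already available as a value in [0, 1], that $OP_i(t)$ is the number of completed operations of job $J_i$, and that $CRJ_i(t) = OP_i(t) / n_i$. The two tardiness rates are left out because they need the tardiness estimates, which are not defined in this section.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def utilization_and_completion_features(
    utilizations: List[float],  # U_k(t) for each machine, each in [0, 1]
    completed_ops: List[int],   # OP_i(t): completed operations of each job
    total_ops: List[int],       # n_i: total operations of each job
) -> Tuple[float, float, float, float, float]:
    """Return (U_ave, U_std, CRO_ave, CRJ_ave, CRJ_std) at one rescheduling point."""
    u_ave = fmean(utilizations)
    # pstdev divides by the population size, matching the 1/m in the formula
    u_std = pstdev(utilizations)

    cro_ave = sum(completed_ops) / sum(total_ops)

    # Assumption: per-job completion rate is completed operations over total operations
    crj = [done / n for done, n in zip(completed_ops, total_ops)]
    crj_ave = fmean(crj)
    crj_std = pstdev(crj)
    return u_ave, u_std, cro_ave, crj_ave, crj_std
```

Because every input is a ratio or is normalized by a total, each returned value stays in [0, 1] regardless of how many machines or jobs are in the shop.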


## Action Space:

Before describing the action space of the RL agent, we first explain which decisions must be made for successful and efficient job shop scheduling.
There are two main categories of decisions: **sequencing** and **machine assignment (routing)**.

- Sequencing: Determines the order in which jobs are processed on each machine.
- Machine assignment: Determines which machine will execute a specific operation when multiple machines are capable of performing it.


Traditionally, dispatching rules have been used for the job shop scheduling problem, but no single rule has been found to perform well across all shop configurations. Here, the goal of the reinforcement learning agent is to select among six composite rules, each combining a sequencing rule and a machine assignment rule. In this way, the agent can learn which rule to dispatch based on the status of the system.

List of the rules:

| **Rule** | **Description** | **Formula / Logic** |
|-----------|-----------------|----------------------|
| **LWKRSPT** | Least Work Remaining, tie-break by Shortest Processing Time | Lexicographic rule: minimize **LWKR**, then **PT** |
| **LWKRMOD** | Least Work Remaining, tie-break by Modified Operation Due Date | Lexicographic rule: minimize **LWKR**, then **MOD** |
| **PTWINQ** | Processing Time plus Work In Next Queue | **Priority = PT + WINQ** |
| **PTWINQS** | Processing Time plus Work In Next Queue plus Slack | **Priority = PT + WINQ + Slack** |
| **DPTLWKRS** | Double Processing Time plus Least Work Remaining plus Slack | **Priority = 2×PT + LWKR + Slack** |
| **DPTWINQNPT** | Double Processing Time plus Work In Next Queue plus Next Processing Time | **Priority = 2×PT + WINQ + NPT** |
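
The table can be read as a mapping from a rule name to a sort key. The sketch below illustrates only that selection logic and is not the project implementation: the `Candidate` class, its field names, and `select_next` are hypothetical, all attribute values are assumed to be precomputed elsewhere, and it assumes that the candidate with the lowest priority value is dispatched first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    """One waiting operation with precomputed attributes (hypothetical inputs)."""
    job_id: int
    pt: float     # PT: processing time of the operation
    lwkr: float   # LWKR: work remaining for the job
    winq: float   # WINQ: work in next queue
    npt: float    # NPT: next processing time
    slack: float  # Slack
    mod: float    # MOD: modified operation due date


# Each rule maps a candidate to a tuple key; tuples compare lexicographically,
# so (lwkr, pt) means "minimize LWKR, then break ties by PT".
RULES: Dict[str, Callable[[Candidate], Tuple[float, ...]]] = {
    "LWKRSPT": lambda c: (c.lwkr, c.pt),
    "LWKRMOD": lambda c: (c.lwkr, c.mod),
    "PTWINQ": lambda c: (c.pt + c.winq,),
    "PTWINQS": lambda c: (c.pt + c.winq + c.slack,),
    "DPTLWKRS": lambda c: (2 * c.pt + c.lwkr + c.slack,),
    "DPTWINQNPT": lambda c: (2 * c.pt + c.winq + c.npt,),
}


def select_next(rule: str, queue: List[Candidate]) -> Candidate:
    """Pick the candidate with the smallest key under the given rule."""
    return min(queue, key=RULES[rule])


queue = [
    Candidate(job_id=1, pt=4.0, lwkr=10.0, winq=3.0, npt=2.0, slack=5.0, mod=12.0),
    Candidate(job_id=2, pt=2.0, lwkr=10.0, winq=8.0, npt=2.0, slack=1.0, mod=15.0),
]
print(select_next("LWKRSPT", queue).job_id)  # tie on LWKR, job 2 has the shorter PT
print(select_next("PTWINQ", queue).job_id)   # 4 + 3 = 7 for job 1 vs 2 + 8 = 10 for job 2
```

With this layout, the RL agent's action is simply an index into the six rule names, and the chosen rule is applied to the current queue.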

### Rule 1: **LWKRSPT**

In [None]:
# This is a placeholder for the project code.

### Rule 2: **LWKRMOD**

In [None]:
# This is a placeholder for the project code.

### Rule 3: **PTWINQ**

In [None]:
# This is a placeholder for the project code.

### Rule 4: **PTWINQS**

In [None]:
# This is a placeholder for the project code.

### Rule 5: **DPTLWKRS**

In [None]:
# This is a placeholder for the project code.

### Rule 6: **DPTWINQNPT**

In [None]:
# This is a placeholder for the project code.

## Reward definition:

The reward function pursues the following goals, in priority order:
1. Lower **actual tardiness**
2. Lower **estimated tardiness**
3. Higher **machine utilization**


### **How It Works**

#### **1. Primary goal — Minimize actual tardiness**
- If the **actual tardiness rate ($Tard_a$)** decreases → **reward = +1**  
- If it increases → **reward = –1**

#### **2. Secondary goal — Reduce estimated tardiness**
- If the actual tardiness rate remains unchanged, compare the **estimated tardiness rate ($Tard_e$):**  
  - If $Tard_e$ decreases → **reward = +1**  
  - If $Tard_e$ increases → **reward = –1**

#### **3. Tertiary goal — Maintain high machine utilization**
- If both tardiness rates remain unchanged, check the **average machine utilization ($U_{ave}$):**  
  - If $U_{ave}$ increases → **reward = +1**  
  - If $U_{ave}$ does not increase but stays within 95% of its previous value → **reward = 0**  
  - Otherwise → **reward = –1**
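
The three-level comparison above can be sketched as a small function. This is a minimal illustration rather than the final implementation: the `Metrics` container and the name `compute_reward` are hypothetical, "unchanged" is taken to mean exact equality of the two rates, and "within 95%" is read as the new utilization being at least 0.95 times the previous one.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    """Snapshot of the three quantities used by the reward (hypothetical container)."""
    tard_a: float  # actual tardiness rate
    tard_e: float  # estimated tardiness rate
    u_ave: float   # average machine utilization


def compute_reward(prev: Metrics, curr: Metrics) -> int:
    """Compare the state before and after an action, in priority order."""
    # 1. Primary goal: actual tardiness rate
    if curr.tard_a < prev.tard_a:
        return 1
    if curr.tard_a > prev.tard_a:
        return -1
    # 2. Secondary goal: estimated tardiness rate (only if actual is unchanged)
    if curr.tard_e < prev.tard_e:
        return 1
    if curr.tard_e > prev.tard_e:
        return -1
    # 3. Tertiary goal: average utilization (only if both rates are unchanged)
    if curr.u_ave > prev.u_ave:
        return 1
    if curr.u_ave >= 0.95 * prev.u_ave:  # assumption: inclusive 95% threshold
        return 0
    return -1
```

For example, with `prev = Metrics(0.2, 0.3, 0.8)`, a transition to `Metrics(0.2, 0.3, 0.78)` leaves both tardiness rates unchanged and keeps utilization above 0.76, so the reward is 0.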




In [1]:
# Reward function implementation goes here.

In [2]:
# des_jobshop_core.py
import simpy
from typing import List, Dict, Optional

### Operation Class


In [7]:
# ======================= PHYSICAL CLASSES ======================= #
class Operation:
    def __init__(self, op_id: int,
                 compatible_machines: List[int],
                 processing_times: Dict[int, float]):
        self.id = op_id
        self.compatible_machines = compatible_machines
        self.processing_times = processing_times

    def get_proc_time(self, machine_id: int) -> float:
        return self.processing_times[machine_id]


class Job:
    def __init__(self, job_id: int, arrival_time: float,
                 due_date: float, operations: List[Operation]):
        self.id = job_id
        self.arrival_time = arrival_time
        self.due_date = due_date
        self.operations = operations
        self.current_op_index = 0
        self.completion_time: Optional[float] = None

    # ---- helpers ----
    def has_next_operation(self) -> bool:
        return self.current_op_index < len(self.operations)

    def next_operation(self) -> Optional[Operation]:
        if self.has_next_operation():
            return self.operations[self.current_op_index]
        return None

    def advance_operation(self, finish_time: float):
        """Call when the current operation is done."""
        self.current_op_index += 1
        if not self.has_next_operation():
            self.completion_time = finish_time


class Machine:
    def __init__(self, env: simpy.Environment, machine_id: int):
        self.env = env
        self.id = machine_id

        # logical state
        self.state = "IDLE"
        self.current_job: Optional[Job] = None
        self.waitingToBeProcessed: List[Job] = []

        # metrics
        self.expectedTimeToIdle: float = 0.0
        self.total_busy_time: float = 0.0

        # simpy resource
        self.resource = simpy.Resource(env, capacity=1)

    @property
    def utilization(self):
        return 0 if self.env.now == 0 else self.total_busy_time / self.env.now

    # ---- machine-level event ----
    def ready_to_start(self) -> bool:
        return self.state == "IDLE" and len(self.waitingToBeProcessed) > 0


class JobShop:
    def __init__(self, env: simpy.Environment, machines: List[Machine]):
        self.env = env
        self.machines = machines
        self.numberOfMachines = len(machines)

        self.jobsInTheShop: List[Job] = []
        self.eligibleJobs: List[Job] = []
        self.totalJobs: int = 0

        # to avoid scheduling multiple make-decisions at the same sim time
        self._decision_scheduled = False

    # ------------- arrivals ------------- #
    def add_job(self, job: Job):
        """Called when a job enters the system (ArriveJobEvent)."""
        print(f"[{self.env.now:5.1f}] Job {job.id} ARRIVES")
        self.jobsInTheShop.append(job)
        self.eligibleJobs.append(job)
        self.totalJobs += 1

        # every arrival → we want a decision
        self.schedule_make_decision()

    # ------------- decisions ------------- #
    def schedule_make_decision(self):
        """Schedule exactly one decision at current time."""
        if not self._decision_scheduled:
            self._decision_scheduled = True
            self.env.process(self._make_decision_event())

    def _make_decision_event(self):
        """This is equivalent to MakeDecisionEvent in the paper."""
        # we yield 0 so other same-time events (arrivals) can run first
        yield self.env.timeout(0)
        self._decision_scheduled = False
        print(f"[{self.env.now:5.1f}] MAKE DECISION")
        self.execute_dispatching_rule()

    # ------------- dispatching ------------- #
    def execute_dispatching_rule(self):
        """
        Very simple rule:
          - while we have eligible jobs
          - pick the first one
          - pick the first compatible machine that can take it (idle or with shortest queue)
        """
        # we may assign multiple jobs in one decision, like the paper
        assigned_any = False
        while self.eligibleJobs:
            job = self.eligibleJobs.pop(0)
            op = job.next_operation()
            if op is None:
                # job already completed (shouldn't happen, but be safe)
                continue

            # choose machine: first idle compatible, else compatible with smallest expected time
            chosen_machine = self._select_machine_for_operation(op)
            if chosen_machine is None:
                # no machine can take it now → put back? for now we stop
                self.eligibleJobs.insert(0, job)
                break

            # assign job to the machine's waiting list
            chosen_machine.waitingToBeProcessed.append(job)
            # remember which machine the job was routed to
            job.currMachine = chosen_machine.id
            print(f"[{self.env.now:5.1f}]   assign Job {job.id}→M{chosen_machine.id}")

            # if machine is idle, start it right away
            if chosen_machine.ready_to_start():
                self.env.process(start_processing(self.env, self, chosen_machine))

            assigned_any = True

        if not assigned_any:
            # nothing assigned → fine, just wait for next event
            pass

    def _select_machine_for_operation(self, op: Operation) -> Optional[Machine]:
        # get compatible machines
        candidates = [m for m in self.machines if m.id in op.compatible_machines]
        if not candidates:
            return None
        # try idle first
        idle = [m for m in candidates if m.state == "IDLE"]
        if idle:
            return idle[0]
        # else pick with smallest expectedTimeToIdle
        return sorted(candidates, key=lambda m: m.expectedTimeToIdle)[0]


# ======================= EVENT PROCESSES ======================= #
def start_processing(env: simpy.Environment, shop: JobShop, machine: Machine):
    # take next job from machine queue
    job = machine.waitingToBeProcessed.pop(0)
    op = job.next_operation()
    ptime = op.get_proc_time(machine.id)

    with machine.resource.request() as req:
        yield req
        machine.state = "PROCESSING"
        machine.current_job = job
        start_time = env.now
        finish_time = env.now + ptime
        machine.expectedTimeToIdle = finish_time
        print(f"[{env.now:5.1f}] M{machine.id} START Job {job.id} (op {op.id}) for {ptime}")

        # process
        yield env.timeout(ptime)

        # finish_processing is a plain function (not a generator), so call it directly
        finish_processing(env, shop, machine, job, start_time, finish_time)


def finish_processing(env: simpy.Environment, shop: JobShop,
                      machine: Machine, job: Job,
                      start_time: float, finish_time: float):
    """
    FinishProcessingEvent in the paper.
    """
    # update job
    job.advance_operation(finish_time)
    print(f"[{env.now:5.1f}] M{machine.id} FINISH Job {job.id}")

    # update machine
    machine.total_busy_time += (finish_time - start_time)
    machine.current_job = None

    # if machine still has queue → start next
    if machine.waitingToBeProcessed:
        # expectedTimeToIdle will be updated by next start
        env.process(start_processing(env, shop, machine))
    else:
        machine.state = "IDLE"
        machine.expectedTimeToIdle = env.now

    # if job not finished → becomes eligible again
    if job.has_next_operation():
        shop.eligibleJobs.append(job)
    else:
        # job completed
        print(f"[{env.now:5.1f}] Job {job.id} COMPLETED at {env.now}")

    # every finish → make decision
    shop.schedule_make_decision()


# ======================= ARRIVAL GENERATOR ======================= #
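def job_arrival_process(env: simpy.Environment, shop: JobShop, jobs_to_arrive: List[Job]):
    """
    Simple arrival: each job has its own arrival_time, we just wait and drop it in.
    Jobs are released in order of arrival_time, so the input list need not be sorted.
    """
    for job in sorted(jobs_to_arrive, key=lambda j: j.arrival_time):
        # wait until its arrival time (an unsorted list would produce a negative delay)
        yield env.timeout(job.arrival_time - env.now)
        shop.add_job(job)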




### Define The Problem Setup

In [8]:
# ======================= MAIN TEST ======================= #
if __name__ == "__main__":
    env = simpy.Environment()

    # 1) make machines
    machines = [Machine(env, 0), Machine(env, 1)]
    shop = JobShop(env, machines)

    # 2) make some jobs
    # Job 1: arrives at 0, two ops
    op1_j1 = Operation(0, [0, 1], {0: 5, 1: 4})
    op2_j1 = Operation(1, [1], {1: 3})
    j1 = Job(1, arrival_time=0, due_date=30, operations=[op1_j1, op2_j1])

    # Job 2: arrives at 2, one op
    op1_j2 = Operation(0, [0], {0: 6})
    j2 = Job(2, arrival_time=2, due_date=25, operations=[op1_j2])

    # Job 3: arrives at 4, two ops, both can go to M0
    op1_j3 = Operation(0, [0], {0: 2})
    op2_j3 = Operation(1, [0], {0: 4})
    j3 = Job(3, arrival_time=4, due_date=40, operations=[op1_j3, op2_j3])

    # 3) start arrival process
    env.process(job_arrival_process(env, shop, [j1, j2, j3]))

    # 4) run
    env.run(until=50)


[  0.0] Job 1 ARRIVES
[  0.0] MAKE DECISION
[  0.0]   assign Job 1→M0
[  0.0] M0 START Job 1 (op 0) for 5
[  2.0] Job 2 ARRIVES
[  2.0] MAKE DECISION
[  2.0]   assign Job 2→M0
[  4.0] Job 3 ARRIVES
[  4.0] MAKE DECISION
[  4.0]   assign Job 3→M0
[  5.0] M0 FINISH Job 1
[  5.0] M0 START Job 2 (op 0) for 6
[  5.0] MAKE DECISION
[  5.0]   assign Job 1→M1
[  5.0] M1 START Job 1 (op 1) for 3
[  8.0] M1 FINISH Job 1
[  8.0] Job 1 COMPLETED at 8
[  8.0] MAKE DECISION
[ 11.0] M0 FINISH Job 2
[ 11.0] Job 2 COMPLETED at 11
[ 11.0] M0 START Job 3 (op 0) for 2
[ 11.0] MAKE DECISION
[ 13.0] M0 FINISH Job 3
[ 13.0] MAKE DECISION
[ 13.0]   assign Job 3→M0
[ 13.0] M0 START Job 3 (op 1) for 4
[ 17.0] M0 FINISH Job 3
[ 17.0] Job 3 COMPLETED at 17
[ 17.0] MAKE DECISION


# Production Configuration:

- Following Reference 1, the production configuration is defined by the parameter values shown in the table below:

    ![Parameter settings of different production configurations.](./figures/image1.png)
- The production configuration used for training the RL agent is defined as:

    ![Production configuration for the training.](./figures/image2.png)


In [None]:

NOM = 10                # number of machines
NIJ = 20                # initial number of jobs
TNIJ = 50               # number of newly arriving jobs
simulation_time = 1000  # total simulation time
ddt = 0.5               # due date tightness
NAMFEO = (0, NOM)       # number of available machines for each operation




### Run The Simulation

# Learning Materials


- https://www.youtube.com/watch?v=8SLk_uRRcgc
- https://www.youtube.com/watch?v=NypbxgytScM

# References:
For the development of this code, three main references were used.
- Reference 1: Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Shu Luo, 2020.
- Reference 2: A discrete event simulator to implement deep reinforcement learning for the dynamic flexible job shop scheduling problem.
- Reference 3: Deep reinforcement learning for dynamic scheduling of a flexible job shop.



Q: For training the RL agent, do we need to generate one shop configuration or several different shop configurations?

A: Reference 1 trains the RL agent with a single production configuration, as described in its Section 6.1. Because the state representation is non-dimensional, the method generalizes to other production configurations, as shown in its results section. In short, we do not need to change the production configuration during the 10 training episodes (~8000 training steps).

