
# Job Shop Scheduling Problem 



## System Description (DFJSP)

The **Dynamic Flexible Job Shop Scheduling Problem (DFJSP)** schedules dynamically arriving jobs on multiple machines to minimize total tardiness.

- There are $n$ successively arriving jobs $J=\{J_1,\dots,J_n\}$ to be processed with $m$ machines $M=\{M_1,\dots,M_m\}$.

- Each job $J_i$ has $n_i$ ordered operations $ O_{i,1},\; O_{i,2},\; \dots,\; O_{i,n_i}$.
- Each operation $O_{i,j}$ can be processed on any compatible machine from its set $M_{i,j}\subseteq M$, with processing time $t_{i,j,k}$ on machine $M_k$.

- Each job $J_i$ has:
    - Arrival time $A_i$
    - Due date $D_i$
    - Completion time $C_{i,j}$ for each of its operations $O_{i,j}$

- To simplify the problem, the system is subject to the following constraints:
    - **Capacity:** each machine processes at most one operation at a time.  
    - **Precedence:** operations of the same job follow the fixed order $O_{i,1}\!\to\! \cdots \!\to\! O_{i,n_i}$.  
    - **Non-preemption:** once started, an operation runs to completion.  
    - **No setups/transport:** setup and transfer times are neglected.
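
Here is a minimal sketch of the objective. It assumes the completion time of job $J_i$ is the completion time of its last operation, $C_{i,n_i}$, so total tardiness is $\sum_i \max(0, C_{i,n_i} - D_i)$. The helper `total_tardiness` is illustrative and not part of the project code. The first example reuses the completion times and due dates from the three-job test run later in this notebook. The second set of due dates is hypothetical.

```python
from typing import Dict


def total_tardiness(completion: Dict[int, float], due: Dict[int, float]) -> float:
    """Sum of max(0, C_i - D_i) over completed jobs (the DFJSP objective)."""
    return sum(max(0.0, completion[j] - due[j]) for j in completion)


# completion times and due dates of jobs 1-3 from the three-job test run
completion = {1: 8.0, 2: 11.0, 3: 17.0}
due = {1: 30.0, 2: 25.0, 3: 40.0}
print(total_tardiness(completion, due))                          # 0.0: every job is on time
print(total_tardiness(completion, {1: 5.0, 2: 10.0, 3: 20.0}))   # 3 + 1 + 0 = 4.0 (hypothetical due dates)
```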

## State Representation:

- Previous RL-based scheduling methods defined the state as a set of raw production indicators, e.g., the number of machines, jobs, and operations on the shop floor, the remaining processing time of uncompleted jobs, and the current workload or queue length of each machine. The problem with this approach is that in real-world shops the numbers of jobs, machines, and operations are large and vary over a wide range. Using these indicators directly as the state reduces the generalizability of the RL agent, because it can only perform well under the state size it was trained on.
- To decouple the state representation from these raw indicators, we use seven carefully designed state features, each taking values in the range [0, 1].

At each rescheduling point *t*, the environment state is represented by the following features:

1. **Average machine utilization:**  $U_{ave}(t) = \frac{\sum_{k=1}^{m} U_k(t)}{m}$

2. **Standard deviation of machine utilization:**  $U_{std}(t) = \sqrt{\frac{\sum_{k=1}^{m} (U_k(t) - U_{ave}(t))^2}{m}}$

3. **Average operation completion rate:**  $CRO_{ave}(t) = \frac{\sum_{i=1}^{n} OP_i(t)}{\sum_{i=1}^{n} n_i}$

4. **Average job completion rate:**  $CRJ_{ave}(t) = \frac{\sum_{i=1}^{n} CRJ_i(t)}{n}$

5. **Standard deviation of job completion rate:**  $CRJ_{std}(t) = \sqrt{\frac{\sum_{i=1}^{n} (CRJ_i(t) - CRJ_{ave}(t))^2}{n}}$

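6. **Estimated tardiness rate:**  $Tard_{e}(t) = \frac{N_{tard}(t)}{N_{left}(t)}$

7. **Actual tardiness rate:**  $Tard_{a}(t) = \frac{N_{tard}^{a}(t)}{N_{left}(t)}$

Here $U_k(t)$ is the utilization of machine $M_k$ up to time $t$, $OP_i(t)$ is the number of completed operations of job $J_i$, $CRJ_i(t) = OP_i(t)/n_i$ is the completion rate of job $J_i$, $N_{left}(t)$ is the number of uncompleted operations, and $N_{tard}(t)$ and $N_{tard}^{a}(t)$ are the numbers of estimated tardy and actual tardy operations, respectively.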
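
The seven features can be computed from a few counters. This is a sketch under stated assumptions. The helper `state_features` and its argument names are hypothetical. The per-machine utilizations $U_k(t)$ are taken as given, for example from `Machine.utilization`. Both tardiness rates are set to 0 once no uncompleted operations remain. `statistics.pstdev` is the population standard deviation, which matches the division by $m$ and $n$ in the formulas above.

```python
from statistics import mean, pstdev
from typing import Dict, List


def state_features(machine_util: List[float], ops_done: List[int], ops_total: List[int],
                   n_est_tardy: int, n_act_tardy: int) -> Dict[str, float]:
    """Seven state features at a rescheduling point (hypothetical helper)."""
    crj = [done / total for done, total in zip(ops_done, ops_total)]  # CRJ_i(t) = OP_i(t) / n_i
    n_left = sum(ops_total) - sum(ops_done)                           # N_left(t)
    return {
        "U_ave": mean(machine_util),
        "U_std": pstdev(machine_util),           # population std (divide by m)
        "CRO_ave": sum(ops_done) / sum(ops_total),
        "CRJ_ave": mean(crj),
        "CRJ_std": pstdev(crj),                  # population std (divide by n)
        # assumption: both rates are 0 once every operation is completed
        "Tard_e": n_est_tardy / n_left if n_left else 0.0,
        "Tard_a": n_act_tardy / n_left if n_left else 0.0,
    }


# two machines, two jobs: job 1 has 1 of 2 operations done, job 2 has 2 of 4
features = state_features([0.5, 1.0], ops_done=[1, 2], ops_total=[2, 4],
                          n_est_tardy=1, n_act_tardy=0)
print(features)
```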


## Action Space:

Before describing the action space of the RL agent, we first explain which decisions must be made for successful and efficient job shop scheduling.
There are two main categories of decisions: **sequencing** and **machine assignment (routing)**.

- Sequencing: Determines the order in which jobs are processed on each machine.
- Machine assignment: Determines which machine will execute a specific operation when multiple machines are capable of performing it.


Traditionally, dispatching rules have been used for the job shop scheduling problem, but no single rule has been found to perform well across all shop configurations. Here, the goal of the reinforcement learning agent is to select among six composite rules, each combining a sequencing rule with a machine assignment rule. In this way, the agent can learn which rule to apply based on the current status of the system.

List of the rules (for every rule, the job with the smallest value is dispatched first):

| **Rule** | **Description** | **Formula / Logic** |
|-----------|-----------------|----------------------|
| **LWKRSPT** | Least Work Remaining, tie-break by Shortest Processing Time | Lexicographic rule: minimize **LWKR**, then **PT** |
| **LWKRMOD** | Least Work Remaining, tie-break by Modified Operation Due Date | Lexicographic rule: minimize **LWKR**, then **MOD** |
| **PTWINQ** | Processing Time plus Work In Next Queue | **Priority = PT + WINQ** |
| **PTWINQS** | Processing Time plus Work In Next Queue plus Slack | **Priority = PT + WINQ + Slack** |
| **DPTLWKRS** | Double Processing Time plus Least Work Remaining plus Slack | **Priority = 2×PT + LWKR + Slack** |
| **DPTWINQNPT** | Double Processing Time plus Work In Next Queue plus Next Processing Time | **Priority = 2×PT + WINQ + NPT** |
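
The sequencing part of the six rules can be sketched as sort keys, where the job with the smallest key is dispatched first. The abbreviations are interpreted here with common dispatching-rule definitions, which are assumptions rather than definitions taken from the references:
- PT is the processing time of the current operation.
- LWKR is the remaining work of the job, current operation included.
- NPT is the processing time of the job's next operation.
- WINQ is the work queued at the machine of the next operation.
- Slack = due date - now - LWKR.
- MOD = max(operation due date, now + PT).

The `Candidate` class and `select` function are hypothetical, and the machine assignment half of each composite rule is not covered.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    """Hypothetical snapshot of one eligible job at decision time."""
    job_id: int
    pt: float    # PT: processing time of the current operation
    lwkr: float  # LWKR: remaining work of the job, current operation included
    npt: float   # NPT: processing time of the next operation (0 if none)
    winq: float  # WINQ: work queued at the machine of the next operation
    due: float   # job due date D_i
    odd: float   # operation due date, used by MOD


def slack(c: Candidate, now: float) -> float:
    return c.due - now - c.lwkr


def mod(c: Candidate, now: float) -> float:
    # modified operation due date: max(operation due date, now + PT)
    return max(c.odd, now + c.pt)


# Each rule returns a sort key; tuples give lexicographic tie-breaking.
RULES: Dict[str, Callable[[Candidate, float], Tuple[float, ...]]] = {
    "LWKRSPT": lambda c, now: (c.lwkr, c.pt),
    "LWKRMOD": lambda c, now: (c.lwkr, mod(c, now)),
    "PTWINQ": lambda c, now: (c.pt + c.winq,),
    "PTWINQS": lambda c, now: (c.pt + c.winq + slack(c, now),),
    "DPTLWKRS": lambda c, now: (2 * c.pt + c.lwkr + slack(c, now),),
    "DPTWINQNPT": lambda c, now: (2 * c.pt + c.winq + c.npt,),
}


def select(rule: str, candidates: List[Candidate], now: float) -> Candidate:
    """Return the candidate with the smallest key under the given rule."""
    return min(candidates, key=lambda c: RULES[rule](c, now))


a = Candidate(job_id=1, pt=2, lwkr=10, npt=3, winq=1, due=30, odd=12)
b = Candidate(job_id=2, pt=5, lwkr=10, npt=0, winq=0, due=20, odd=8)
for rule in RULES:
    # selected job ids in table order: 1, 2, 1, 2, 2, 1
    print(rule, "-> job", select(rule, [a, b], now=0.0).job_id)
```

In a flexible shop, PT, NPT and WINQ depend on which machine the operation is routed to. A full implementation would therefore need to decide, for example, whether to use the minimum or the average over compatible machines.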

### Rule 1: **LWKRSPT**

In [None]:
# This is a placeholder for the project code.

### Rule 2: **LWKRMOD**

In [None]:
# This is a placeholder for the project code.

### Rule 3: **PTWINQ**

In [None]:
# This is a placeholder for the project code.

### Rule 4: **PTWINQS**

In [None]:
# This is a placeholder for the project code.

### Rule 5: **DPTLWKRS**

In [None]:
# This is a placeholder for the project code.

### Rule 6: **DPTWINQNPT**

In [None]:
# This is a placeholder for the project code.

## Reward definition:

The reward function encourages the following objectives, in decreasing order of priority:
1. Lower **actual tardiness**
2. Lower **estimated tardiness**
3. Higher **machine utilization**


### **How It Works**

#### **1. Primary goal — Minimize actual tardiness**
- If the **actual tardiness rate** ($Tard_a$) decreases → **reward = +1**  
- If it increases → **reward = –1**

#### **2. Secondary goal — Reduce estimated tardiness**
- If the actual tardiness rate is unchanged, compare the **estimated tardiness rate** ($Tard_e$):  
  - If $Tard_e$ decreases → **reward = +1**  
  - If $Tard_e$ increases → **reward = –1**

#### **3. Tertiary goal — Maintain high machine utilization**
- If both tardiness rates are unchanged, check the **average machine utilization** ($U_{ave}$):  
  - If $U_{ave}$ increases → **reward = +1**  
  - If $U_{ave}$ does not drop below 95% of its previous value → **reward = 0**  
  - Otherwise → **reward = –1**
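
The lexicographic reward above can be written as a small function. This sketch assumes the state features are available as dictionaries with the hypothetical keys `Tard_a`, `Tard_e` and `U_ave`, computed before and after the action. "Unchanged" is tested with exact equality, which a real implementation might replace with a tolerance.

```python
from typing import Dict


def reward(prev: Dict[str, float], curr: Dict[str, float]) -> int:
    """Lexicographic reward: actual tardiness, then estimated tardiness, then utilization."""
    # 1) primary goal: actual tardiness rate
    if curr["Tard_a"] < prev["Tard_a"]:
        return 1
    if curr["Tard_a"] > prev["Tard_a"]:
        return -1
    # 2) secondary goal: estimated tardiness rate (actual rate unchanged)
    if curr["Tard_e"] < prev["Tard_e"]:
        return 1
    if curr["Tard_e"] > prev["Tard_e"]:
        return -1
    # 3) tertiary goal: average machine utilization (both tardiness rates unchanged)
    if curr["U_ave"] > prev["U_ave"]:
        return 1
    if curr["U_ave"] >= 0.95 * prev["U_ave"]:
        return 0
    return -1


prev = {"Tard_a": 0.2, "Tard_e": 0.3, "U_ave": 0.8}
print(reward(prev, {"Tard_a": 0.1, "Tard_e": 0.5, "U_ave": 0.5}))   # 1: actual tardiness fell
print(reward(prev, {"Tard_a": 0.2, "Tard_e": 0.3, "U_ave": 0.78}))  # 0: utilization within 95%
print(reward(prev, {"Tard_a": 0.2, "Tard_e": 0.3, "U_ave": 0.70}))  # -1: utilization fell too much
```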




# Discrete Event Simulation:
- Discrete Event Simulation (DES) models a dynamic system by tracking individual events as they occur at distinct points in time. It is a powerful technique for simulating and analyzing systems whose state changes instantaneously at those points, focusing on the key events that drive system behavior.

## Fundamental Concepts of DES
At its core, DES revolves around entities, events, the simulation clock, and queues.

- **Entities** represent the objects or elements within the system under observation.
- **Events**, triggered by entities, denote occurrences that affect the system’s state or condition.
- **The simulation clock** marks discrete instances when these events transpire, driving the progression of time within the simulated environment.
- **Queues** play a pivotal role in managing and processing entities, representing waiting lines or buffers where entities await processing or service.

If we go one abstraction layer up, common DES components can be classified into sources, servers (with or without queues), and sinks.

- **Sources** are responsible for producing entities and inserting them into the simulated system. Interarrival times and the number of entities per arrival are common source parameters.
- **Servers** are in charge of delaying entities in the system for a given time period defined usually as a processing time.
- **Sinks** remove entities from the simulated system and are useful for collecting statistics on the time entities spend in the system and on the performance of the overall system.
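
To make these concepts concrete, here is a minimal hand-written DES in plain Python (no SimPy). It assumes a single server with a FIFO queue and a fixed service time. A heap acts as the future event list, and the clock jumps from event to event. The source schedules the next arrival each time an entity arrives, and the sink records how long each entity spent in the system. The function `simulate` and its parameters are illustrative.

```python
import heapq
import random
from typing import Dict, List, Tuple


def simulate(n_entities: int = 5, mean_interarrival: float = 2.0,
             service_time: float = 1.5, seed: int = 0) -> Tuple[float, List[Tuple[int, float]]]:
    """Source -> single FIFO server -> sink, driven by a future event list."""
    rng = random.Random(seed)
    events: List[Tuple[float, int, str, int]] = []  # (time, seq, kind, entity)
    seq = 0                          # tie-breaker: equal-time events pop in insertion order
    queue: List[int] = []            # entities waiting for the server
    busy = False
    arrived_at: Dict[int, float] = {}
    sink: List[Tuple[int, float]] = []  # (entity, time spent in system)
    clock = 0.0

    def schedule(time: float, kind: str, entity: int) -> None:
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, entity))
        seq += 1

    schedule(rng.expovariate(1.0 / mean_interarrival), "arrive", 0)  # source: first arrival
    while events:
        clock, _, kind, e = heapq.heappop(events)  # jump the clock to the next event
        if kind == "arrive":
            arrived_at[e] = clock
            queue.append(e)
            if e + 1 < n_entities:  # source schedules the next arrival
                schedule(clock + rng.expovariate(1.0 / mean_interarrival), "arrive", e + 1)
        else:  # "depart": the server releases e, which leaves through the sink
            busy = False
            sink.append((e, clock - arrived_at[e]))
        if not busy and queue:  # an idle server takes the next waiting entity
            busy = True
            schedule(clock + service_time, "depart", queue.pop(0))
    return clock, sink


end_time, records = simulate()
print(f"simulation ended at t={end_time:.2f}")
for entity, time_in_system in records:
    print(f"entity {entity}: {time_in_system:.2f} time units in system")
```

SimPy, used in the rest of this notebook, hides the event list and the clock behind `env.process`, `env.timeout` and `env.now`, but it follows the same event-driven logic.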

# Implementing Entities:
Jobs are the entities that flow through the system. Each job consists of a sequence of operations that must be processed on machines.


In [1]:
class Operation:
    def __init__(self, op_id, compatible_machines, processing_times):
        self.id = op_id
        self.compatible_machines = compatible_machines    # e.g. [0, 3, 5]
        self.processing_times = processing_times          # e.g. {0: 5.0, 3: 4.2, 5: 6.1}

    def get_proc_time(self, machine_id):
        return self.processing_times[machine_id]


class Job:
    def __init__(self, job_id, arrival_time, due_date, operations):
        self.id = job_id
        self.arrival_time = arrival_time
        self.due_date = due_date
        self.operations = operations
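        self.current_op_index = 0
        self.completion_time = None  # set when the last operation finishes

    def next_operation(self):
        if self.current_op_index < len(self.operations):
            return self.operations[self.current_op_index]
        return None

    def advance_operation(self, finish_time=None):
        """Mark the current operation as done; record the job's completion time after the last one."""
        self.current_op_index += 1
        if self.current_op_index == len(self.operations):
            self.completion_time = finish_time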


# Implementing Servers:
The machines are our servers: each one has a queue and processes jobs.

In [3]:

import simpy
from typing import Optional, List


class Machine:
    """
    Machine (server) in the job shop.

    Responsibilities:
    - can process ONE job at a time (capacity = 1)
    - keeps a queue of jobs waiting for this machine
    - knows when it will be free (expected_time_to_idle)
    - tracks busy time → utilization
    """

    def __init__(self, env: simpy.Environment, machine_id: int):
        self.env = env
        self.id = machine_id

        # simpy resource = the actual "server"
        self.resource = simpy.Resource(env, capacity=1)

        # logical state
        self.state: str = "IDLE"          # or "PROCESSING"
        self.current_job = None           # job being processed now
        self.queue: List = []             # jobs assigned here but not started

        # metrics
        self.expected_time_to_idle: float = 0.0
        self.total_busy_time: float = 0.0  # accumulated processing time

    @property
    def utilization(self) -> float:
        """Busy time / time passed."""
        if self.env.now == 0:
            return 0.0
        return self.total_busy_time / self.env.now

    def ready_to_start(self) -> bool:
        """True if machine is idle and has something in its queue."""
        return self.state == "IDLE" and len(self.queue) > 0

# Implementing the Model
Here we implement the model of our job shop, which knows how many machines it has and the state of each machine.

In [4]:
import simpy

class JobShop:
    """
    JobShop
    -------
    - holds machines
    - holds jobs currently in the system
    - keeps the list of eligible jobs (ready for dispatch)
    - contains ALL event processes: decision, start, finish
    """
    def __init__(self, env: simpy.Environment, machines, agent):
        self.env = env
        self.machines = machines
        self.agent = agent

        self.numberOfMachines = len(machines)
        self.jobsInTheShop = []    # all not-yet-completed jobs
        self.eligibleJobs = []     # jobs whose current op is ready
        self.totalJobs = 0

        # to avoid multiple decisions at same time
        self._decision_scheduled = False

    # ------------------------------------------------------------------
    # 1) ARRIVAL
    # called by the Source / JobSource
    # ------------------------------------------------------------------
    def add_job(self, job):
        """JobArrivalEvent"""
        self.jobsInTheShop.append(job)
        self.eligibleJobs.append(job)
        self.totalJobs += 1
        # every arrival => we want a decision
        self.schedule_decision()

    # ------------------------------------------------------------------
    # 2) DECISION
    # ------------------------------------------------------------------
    def schedule_decision(self):
        if not self._decision_scheduled:
            self._decision_scheduled = True
            self.env.process(self._decision_event())

    def _decision_event(self):
        """MakeDecisionEvent (paper)"""
        # let other events at this same time run first
        yield self.env.timeout(0)
        self._decision_scheduled = False

        if not self.eligibleJobs:
            return

        # ask agent which rule to use
        rule_name = self.agent.select_rule()
        self._apply_rule(rule_name)

    def _apply_rule(self, rule_name: str):
        # pick a job according to the rule
        job = self._select_job_by_rule(rule_name)
        if job is None:
            return

        op = job.next_operation()
        # pick a machine for that operation
        machine = self._select_machine_for_operation(op)
        if machine is None:
            # no machine can take it → leave job eligible
            return

        # assign
        self.eligibleJobs.remove(job)
        machine.queue.append(job)

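        # if machine is idle, start immediately; mark it busy right away so that
        # another decision at the same simulation time cannot start it twice
        if machine.ready_to_start():
            machine.state = "PROCESSING"
            self.env.process(self._start_processing_proc(machine))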

    def _select_job_by_rule(self, rule_name: str):
        if not self.eligibleJobs:
            return None

        if rule_name == "FIFO":
            # earliest arrival time
            return sorted(self.eligibleJobs, key=lambda j: j.arrival_time)[0]

        if rule_name == "EDD":
            # earliest due date
            return sorted(self.eligibleJobs, key=lambda j: j.due_date)[0]

        if rule_name == "SPT":
            # shortest processing time for CURRENT op on ANY compatible machine
            def est_pt(job):
                op = job.next_operation()
                # find min PT among compatible machines that exist
                pts = [
                    op.processing_times[m.id]
                    for m in self.machines
                    if m.id in op.compatible_machines
                ]
                return min(pts)
            return sorted(self.eligibleJobs, key=est_pt)[0]

        # fallback
        return self.eligibleJobs[0]

    def _select_machine_for_operation(self, op):
        # list of machines that can do this op
        candidates = [m for m in self.machines if m.id in op.compatible_machines]
        if not candidates:
            return None
        # prefer idle
        idle = [m for m in candidates if m.state == "IDLE"]
        if idle:
            return idle[0]
        # otherwise the one that becomes free first
        return sorted(candidates, key=lambda m: m.expected_time_to_idle)[0]

    # ------------------------------------------------------------------
    # 3) START PROCESSING
    # ------------------------------------------------------------------
    def _start_processing_proc(self, machine):
        """StartProcessingEvent (paper)"""
        # pull next job from this machine’s queue
        job = machine.queue.pop(0)
        op = job.next_operation()
        ptime = op.get_proc_time(machine.id)

        # request the simpy resource
        with machine.resource.request() as req:
            yield req

            # start
            machine.state = "PROCESSING"
            machine.current_job = job
            start_time = self.env.now
            finish_time = self.env.now + ptime
            machine.expected_time_to_idle = finish_time
            print(f"[{self.env.now:5.1f}] M{machine.id} START J{job.id} op{op.id} for {ptime}")

            # processing delay
            yield self.env.timeout(ptime)

            # done → call finish
            self._finish_processing(machine, job, start_time, finish_time)

    # ------------------------------------------------------------------
    # 4) FINISH PROCESSING
    # ------------------------------------------------------------------
    def _finish_processing(self, machine, job, start_time, finish_time):
        """FinishProcessingEvent (paper)"""
        print(f"[{self.env.now:5.1f}] M{machine.id} FINISH J{job.id}")

        # update machine stats
        machine.total_busy_time += (finish_time - start_time)
        machine.current_job = None

        # update job
        job.advance_operation(finish_time)

        # if job still has work, make it eligible again
        if job.next_operation() is not None:
            self.eligibleJobs.append(job)
        else:
            # job is completed → remove from system
            if job in self.jobsInTheShop:
                self.jobsInTheShop.remove(job)
            print(f"[{self.env.now:5.1f}] J{job.id} COMPLETED")

        # machine: if more jobs waiting, start next; else go idle
        if machine.queue:
            # start next job on same machine
            self.env.process(self._start_processing_proc(machine))
        else:
            machine.state = "IDLE"
            machine.expected_time_to_idle = self.env.now

        # every finish → new decision
        self.schedule_decision()


## Implementing the Source
- In our project, a job generator creates the entities: it creates a set of initial jobs at time zero and then dynamically creates jobs that arrive later in the simulation with random (exponentially distributed) interarrival times.


In [None]:
import simpy
import random

class JobSource:
    """
    JobSource class
    ----------------
    SimPy-based component that generates jobs for the discrete event simulation.

    **Inputs:**
    - env: SimPy environment object controlling simulation time and events.
    - shop: JobShop instance where generated jobs are added.
    - machine_ids: list of available machine IDs in the shop.
    - n_initial: number of jobs to create at simulation start (time = 0).
    - n_dynamic: number of additional jobs to generate during simulation.
    - mean_interarrival: average time interval (exponential distribution) between dynamic job arrivals.
    - nops_range: tuple (min, max) defining possible number of operations per job.
    - compat_range: tuple (min, max) defining how many machines can perform each operation.
    - pt_range: tuple (min, max) defining processing time range for each operation.
    - ddt: due date tightness factor for job due date calculation.
    - start_job_id: ID number to assign to the first job.
    - seed: random seed for reproducibility.

    **Outputs:**
    - Creates Job instances (each containing its own operations, machine compatibility, and due date)
    and inserts them into the JobShop.
    - Generates two event streams:
        1. Initial job arrivals at time 0.
        2. Dynamic job arrivals following an exponential interarrival distribution.
    """

    def __init__(self,
                 env: simpy.Environment,
                 shop,
                 machine_ids,
                 n_initial=20,
                 n_dynamic=50,
                 mean_interarrival=100.0,
                 nops_range=(2, 5),
                 compat_range=(1, 3),
                 pt_range=(2.0, 10.0),
                 ddt=1.0,
                 start_job_id=1,
                 seed=42):
        
        self.env = env
        self.shop = shop
        self.machine_ids = list(machine_ids)
        self.n_initial = n_initial
        self.n_dynamic = n_dynamic
        self.mean_interarrival = mean_interarrival
        self.nops_range = nops_range
        self.compat_range = compat_range
        self.pt_range = pt_range
        self.ddt = ddt
        self.next_job_id = start_job_id
        self.rng = random.Random(seed)

        # create initial jobs now
        self._create_initial_jobs()

        # start dynamic arrivals
        self.env.process(self._dynamic_arrivals())

    # ---------- job builder (WHAT) ---------- #
    def _make_job(self, job_id: int, now: float):
        # how many operations?
        n_ops = self.rng.randint(self.nops_range[0], self.nops_range[1])

        operations = []
        total_avg_pt = 0.0
        for op_id in range(n_ops):
            # how many machines can do this op?
            k = self.rng.randint(self.compat_range[0],
                                 min(self.compat_range[1], len(self.machine_ids)))
            compat = self.rng.sample(self.machine_ids, k=k)

            # processing times per compatible machine
            pt_low, pt_high = self.pt_range
            pt_dict = {}
            for m in compat:
                pt = self.rng.uniform(pt_low, pt_high)
                pt_dict[m] = pt
            avg_pt = sum(pt_dict.values()) / len(pt_dict)
            total_avg_pt += avg_pt

            op = Operation(op_id, compat, pt_dict)
            operations.append(op)

            # NOTE: Operation and Job are the entity classes defined earlier in this notebook

        due_date = now + self.ddt * total_avg_pt
        return Job(job_id, arrival_time=now, due_date=due_date, operations=operations)

    # ---------- arrival logic (WHEN) ---------- #
    def _create_initial_jobs(self):
        for _ in range(self.n_initial):
            job = self._make_job(self.next_job_id, now=0.0)
            self.shop.add_job(job)
            self.next_job_id += 1

    def _dynamic_arrivals(self):
        for _ in range(self.n_dynamic):
            gap = self.rng.expovariate(1.0 / self.mean_interarrival)
            yield self.env.timeout(gap)
            job = self._make_job(self.next_job_id, now=self.env.now)
            self.shop.add_job(job)
            self.next_job_id += 1



In [2]:
# des_jobshop_core.py
import simpy
from typing import List, Dict, Optional

### Core Classes and Event Processes


In [7]:
# ======================= PHYSICAL CLASSES ======================= #
class Operation:
    def __init__(self, op_id: int,
                 compatible_machines: List[int],
                 processing_times: Dict[int, float]):
        self.id = op_id
        self.compatible_machines = compatible_machines
        self.processing_times = processing_times

    def get_proc_time(self, machine_id: int) -> float:
        return self.processing_times[machine_id]


class Job:
    def __init__(self, job_id: int, arrival_time: float,
                 due_date: float, operations: List[Operation]):
        self.id = job_id
        self.arrival_time = arrival_time
        self.due_date = due_date
        self.operations = operations
        self.current_op_index = 0
        self.completion_time: Optional[float] = None

    # ---- helpers ----
    def has_next_operation(self) -> bool:
        return self.current_op_index < len(self.operations)

    def next_operation(self) -> Optional[Operation]:
        if self.has_next_operation():
            return self.operations[self.current_op_index]
        return None

    def advance_operation(self, finish_time: float):
        """Call when the current operation is done."""
        self.current_op_index += 1
        if not self.has_next_operation():
            self.completion_time = finish_time


class Machine:
    def __init__(self, env: simpy.Environment, machine_id: int):
        self.env = env
        self.id = machine_id

        # logical state
        self.state = "IDLE"
        self.current_job: Optional[Job] = None
        self.waitingToBeProcessed: List[Job] = []

        # metrics
        self.expectedTimeToIdle: float = 0.0
        self.total_busy_time: float = 0.0

        # simpy resource
        self.resource = simpy.Resource(env, capacity=1)

    @property
    def utilization(self):
        return 0 if self.env.now == 0 else self.total_busy_time / self.env.now

    # ---- machine-level event ----
    def ready_to_start(self) -> bool:
        return self.state == "IDLE" and len(self.waitingToBeProcessed) > 0


class JobShop:
    def __init__(self, env: simpy.Environment, machines: List[Machine]):
        self.env = env
        self.machines = machines
        self.numberOfMachines = len(machines)

        self.jobsInTheShop: List[Job] = []
        self.eligibleJobs: List[Job] = []
        self.totalJobs: int = 0

        # to avoid scheduling multiple make-decisions at the same sim time
        self._decision_scheduled = False

    # ------------- arrivals ------------- #
    def add_job(self, job: Job):
        """Called when a job enters the system (ArriveJobEvent)."""
        print(f"[{self.env.now:5.1f}] Job {job.id} ARRIVES")
        self.jobsInTheShop.append(job)
        self.eligibleJobs.append(job)
        self.totalJobs += 1

        # every arrival → we want a decision
        self.schedule_make_decision()

    # ------------- decisions ------------- #
    def schedule_make_decision(self):
        """Schedule exactly one decision at current time."""
        if not self._decision_scheduled:
            self._decision_scheduled = True
            self.env.process(self._make_decision_event())

    def _make_decision_event(self):
        """This is equivalent to MakeDecisionEvent in the paper."""
        # we yield 0 so other same-time events (arrivals) can run first
        yield self.env.timeout(0)
        self._decision_scheduled = False
        print(f"[{self.env.now:5.1f}] MAKE DECISION")
        self.execute_dispatching_rule()

    # ------------- dispatching ------------- #
    def execute_dispatching_rule(self):
        """
        Very simple rule:
          - while we have eligible jobs
          - pick the first one
          - pick the first compatible machine that can take it (idle or with shortest queue)
        """
        # we may assign multiple jobs in one decision, like the paper
        assigned_any = False
        while self.eligibleJobs:
            job = self.eligibleJobs.pop(0)
            op = job.next_operation()
            if op is None:
                # job already completed (shouldn't happen, but be safe)
                continue

            # choose machine: first idle compatible, else compatible with smallest expected time
            chosen_machine = self._select_machine_for_operation(op)
            if chosen_machine is None:
                # no machine can take it now → put back? for now we stop
                self.eligibleJobs.insert(0, job)
                break

            # assign job to the machine's waiting list
            chosen_machine.waitingToBeProcessed.append(job)
            job.currMachine = chosen_machine.id  # remember where the current op is routed
            print(f"[{self.env.now:5.1f}]   assign Job {job.id}→M{chosen_machine.id}")

            # if machine is idle, start it right away; mark it busy immediately so the
            # next job in this same loop is not routed to it as an "idle" machine
            if chosen_machine.ready_to_start():
                chosen_machine.state = "PROCESSING"
                self.env.process(start_processing(self.env, self, chosen_machine))

            assigned_any = True

        if not assigned_any:
            # nothing assigned → fine, just wait for next event
            pass

    def _select_machine_for_operation(self, op: Operation) -> Optional[Machine]:
        # get compatible machines
        candidates = [m for m in self.machines if m.id in op.compatible_machines]
        if not candidates:
            return None
        # try idle first
        idle = [m for m in candidates if m.state == "IDLE"]
        if idle:
            return idle[0]
        # else pick with smallest expectedTimeToIdle
        return sorted(candidates, key=lambda m: m.expectedTimeToIdle)[0]


# ======================= EVENT PROCESSES ======================= #
def start_processing(env: simpy.Environment, shop: JobShop, machine: Machine):
    # take next job from machine queue
    job = machine.waitingToBeProcessed.pop(0)
    op = job.next_operation()
    ptime = op.get_proc_time(machine.id)

    with machine.resource.request() as req:
        yield req
        machine.state = "PROCESSING"
        machine.current_job = job
        start_time = env.now
        finish_time = env.now + ptime
        machine.expectedTimeToIdle = finish_time
        print(f"[{env.now:5.1f}] M{machine.id} START Job {job.id} (op {op.id}) for {ptime}")

        # process
        yield env.timeout(ptime)

        # finish_processing is a plain function, not a generator: call it directly (not via env.process)
        finish_processing(env, shop, machine, job, start_time, finish_time)


def finish_processing(env: simpy.Environment, shop: JobShop,
                      machine: Machine, job: Job,
                      start_time: float, finish_time: float):
    """
    FinishProcessingEvent in the paper.
    """
    # update job
    job.advance_operation(finish_time)
    print(f"[{env.now:5.1f}] M{machine.id} FINISH Job {job.id}")

    # update machine
    machine.total_busy_time += (finish_time - start_time)
    machine.current_job = None

    # if machine still has queue → start next
    if machine.waitingToBeProcessed:
        # expectedTimeToIdle will be updated by next start
        env.process(start_processing(env, shop, machine))
    else:
        machine.state = "IDLE"
        machine.expectedTimeToIdle = env.now

    # if job not finished → becomes eligible again
    if job.has_next_operation():
        shop.eligibleJobs.append(job)
    else:
        # job completed
        print(f"[{env.now:5.1f}] Job {job.id} COMPLETED at {env.now}")

    # every finish → make decision
    shop.schedule_make_decision()


# ======================= ARRIVAL GENERATOR ======================= #
def job_arrival_process(env: simpy.Environment, shop: JobShop, jobs_to_arrive: List[Job]):
    """
    Simple arrival: each job has its own arrivalTime, we just wait and drop it in.
    """
    for job in jobs_to_arrive:
        # wait until its arrival time
        yield env.timeout(job.arrival_time - env.now)
        shop.add_job(job)




### Define The Problem Setup

In [None]:
# ======================= MAIN TEST ======================= #
if __name__ == "__main__":
    env = simpy.Environment()

    # 1) make machines
    machines = [Machine(env, 0), Machine(env, 1)]
    shop = JobShop(env, machines)

    # 2) make some jobs
    # Job 1: arrives at 0, two ops
    op1_j1 = Operation(0, [0, 1], {0: 5, 1: 4})
    op2_j1 = Operation(1, [1], {1: 3})
    j1 = Job(1, arrival_time=0, due_date=30, operations=[op1_j1, op2_j1])

    # Job 2: arrives at 2, one op
    op1_j2 = Operation(0, [0], {0: 6})
    j2 = Job(2, arrival_time=2, due_date=25, operations=[op1_j2])

    # Job 3: arrives at 4, two ops, both can go to M0
    op1_j3 = Operation(0, [0], {0: 2})
    op2_j3 = Operation(1, [0], {0: 4})
    j3 = Job(3, arrival_time=4, due_date=40, operations=[op1_j3, op2_j3])

    # 3) start arrival process
    env.process(job_arrival_process(env, shop, [j1, j2, j3]))

    # 4) run
    env.run(until=50)


[  0.0] Job 1 ARRIVES
[  0.0] MAKE DECISION
[  0.0]   assign Job 1→M0
[  0.0] M0 START Job 1 (op 0) for 5
[  2.0] Job 2 ARRIVES
[  2.0] MAKE DECISION
[  2.0]   assign Job 2→M0
[  4.0] Job 3 ARRIVES
[  4.0] MAKE DECISION
[  4.0]   assign Job 3→M0
[  5.0] M0 FINISH Job 1
[  5.0] M0 START Job 2 (op 0) for 6
[  5.0] MAKE DECISION
[  5.0]   assign Job 1→M1
[  5.0] M1 START Job 1 (op 1) for 3
[  8.0] M1 FINISH Job 1
[  8.0] Job 1 COMPLETED at 8
[  8.0] MAKE DECISION
[ 11.0] M0 FINISH Job 2
[ 11.0] Job 2 COMPLETED at 11
[ 11.0] M0 START Job 3 (op 0) for 2
[ 11.0] MAKE DECISION
[ 13.0] M0 FINISH Job 3
[ 13.0] MAKE DECISION
[ 13.0]   assign Job 3→M0
[ 13.0] M0 START Job 3 (op 1) for 4
[ 17.0] M0 FINISH Job 3
[ 17.0] Job 3 COMPLETED at 17
[ 17.0] MAKE DECISION



# Production Configuration:

- Following Reference 1, the production configurations are defined by the parameter values shown in the table below:

    ![Parameter settings of different production configurations.](./figures/image1.png)
- The production configuration used for training the RL is defined as:

    ![Production configuration for the training.png](./figures/image2.png)
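
The job generator (`JobSource._make_job`) sets each job's due date from three quantities: its arrival time, the mean processing time of each operation over its compatible machines, and the due date tightness $DDT$. Written out with the notation of the system description, the rule in that code is:

$$
D_i = A_i + DDT \cdot \sum_{j=1}^{n_i} \bar{t}_{i,j},
\qquad
\bar{t}_{i,j} = \frac{1}{|M_{i,j}|} \sum_{M_k \in M_{i,j}} t_{i,j,k}
$$

Consider a hypothetical job that arrives at $A_i = 10$ with two operations. The first takes 4 or 6 time units on its two compatible machines (mean 5), and the second takes 3 on its only machine. With $DDT = 1.5$, the job gets $D_i = 10 + 1.5 \times (5 + 3) = 22$.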


In [None]:

NOM = 10          # {10, 20, 30, 40, 50} number of machines
NIJ = 20          # initial number of jobs
TNIJ = 50         # {50, 100, 200} total number of newly arriving jobs
SIM_t = 1000      # total simulation time
DDT = 0.5         # {0.5, 1.0, 1.5} due date tightness
NAMFEO = (1, NOM) # number of available machines for each operation (at least one)
NOBJ = (1, 50)    # number of operations per job (at least one)
PTOM = (0, 50)    # processing time range of each operation on each machine
EXP_BJ = 100      # {50, 100, 200} mean of the exponential interarrival time between successive jobs




### Run The Simulation

- Here we use simpy to run our simulation.

In [None]:

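import random
import simpy

# NOTE: this cell assumes the agent-based `Machine` (Implementing Servers) and
# `JobShop` (Implementing the Model) cells were run most recently; the
# des_jobshop_core cell redefines both classes without an agent.


class RandomRuleAgent:
    """Baseline agent that picks a dispatching rule uniformly at random."""

    def __init__(self, rules, seed=None):
        self.rules = list(rules)
        self.rng = random.Random(seed)

    def select_rule(self):
        return self.rng.choice(self.rules)


env = simpy.Environment()

# machines
machines = [Machine(env, i) for i in range(10)]

# random-rule agent (baseline until the RL agent is trained)
agent = RandomRuleAgent(["FIFO", "SPT", "EDD"], seed=42)

# job shop
shop = JobShop(env, machines, agent)

# job source: initial jobs at t = 0, then exponentially distributed arrivals
source = JobSource(
    env=env,
    shop=shop,
    machine_ids=[m.id for m in machines],
    n_initial=20,
    n_dynamic=50,
    mean_interarrival=100.0,
    nops_range=(1, 5),
    compat_range=(1, 5),
    pt_range=(1, 20),
    ddt=1.0,
    start_job_id=1,
    seed=42,
)

env.run(until=1000)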


# Learning Materials


https://www.youtube.com/watch?v=8SLk_uRRcgc
https://www.youtube.com/watch?v=NypbxgytScM


- A great blog post explaining discrete event simulation:
https://medium.com/@vitostamatti1995/introduction-to-discrete-event-simulation-with-python-3b0cce67f92e

# References:
For the development of this code, three main references were used:
- Reference 1: Shu Luo, "Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning," 2020.
- Reference 2: "A discrete event simulator to implement deep reinforcement learning for the dynamic flexible job shop scheduling problem."
- Reference 3: "Deep reinforcement learning for dynamic scheduling of a flexible job shop."



**Q:** When training the RL agent, do we need to generate one production configuration or several?

**A:** Reference 1 trains the RL agent on a single production configuration, as described in its Section 6.1. Because its state representation is dimensionless, the method generalizes to other production configurations, as shown in its results section. In short, we do not need to change the production configuration across the 10 training episodes (~8000 training steps).

