
# Unified Manufacturing Scheduling — MIP (PuLP) + QUBO (neal) on One Dataset

This notebook demonstrates **Job Shop Scheduling (JSSP)** on a **single synthetic dataset** using two approaches:

1. **Classical MIP (PuLP)**  
   - Variables: start times \(S_{j,k}\) for operation \(k\) of job \(j\), sequencing binaries on each machine, and the makespan \(C_{\max}\).  
   - Objective: **minimize makespan** subject to machine non-overlap and job precedence.

2. **QUBO (time-indexed)** with **simulated annealing** (`neal`)  
   - Variables: one-hot binaries \(x_{(j,k),t}\), equal to 1 iff operation \((j,k)\) starts at time \(t\).  
   - Penalties enforce: **each operation starts exactly once**, **no machine overlap per time slot**, and **precedence** between consecutive operations within a job.

**Dataset:** We generate one set of \(J\) jobs and \(M\) machines. Each job visits every machine exactly once, in a random order (its route), and each operation's processing time is a random integer between 2 and 8. The same dataset feeds both approaches.


In [None]:

# Install missing deps (works in Colab):
# - PuLP for MIP
# - dimod/neal for QUBO simulated annealing

def _silent_imports():
    flags = {"pulp": False, "dimod": False, "neal": False}
    try:
        import pulp  # noqa: F401
        flags["pulp"] = True
    except Exception:
        pass
    try:
        import dimod  # noqa: F401
        flags["dimod"] = True
    except Exception:
        pass
    try:
        import neal  # noqa: F401
        flags["neal"] = True
    except Exception:
        pass
    return flags

flags = _silent_imports()
if not flags["pulp"]:
    %pip -q install pulp
if not flags["dimod"] or not flags["neal"]:
    %pip -q install dimod neal

flags = _silent_imports()
print("PuLP:", flags["pulp"], "| dimod:", flags["dimod"], "| neal:", flags["neal"])


In [None]:

# ==== One Synthetic Dataset ====
import numpy as np, pandas as pd

rng = np.random.default_rng(123)

J = 5   # jobs
M = 3   # machines
jobs = [f"J{j+1}" for j in range(J)]
machines = [f"M{m+1}" for m in range(M)]

# Each job has M operations; route is a random permutation of machines
routes = {j: list(rng.permutation(machines)) for j in jobs}

# Processing times p[j][op_machine]
proc_times = {j: {m: int(rng.integers(2, 9)) for m in machines} for j in jobs}

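# Horizon for the time-indexed QUBO: sum of each job's longest operation, plus 5 slots of slack.
# NOTE: this is a heuristic, not a guaranteed upper bound on the makespan (the safe bound is the
# sum of all processing times). If the optimal makespan exceeds H, no QUBO assignment is feasible.
H = int(sum(max(proc_times[j][m] for m in machines) for j in jobs) + 5)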

df_routes = pd.DataFrame({j: routes[j] for j in jobs})
df_pt = pd.DataFrame(proc_times).T[machines]  # rows jobs, cols machines

print("Jobs:", jobs)
print("Machines:", machines)
print("Time horizon H:", H)
df_routes, df_pt
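Because the QUBO in Part 2 restricts every start time to \([0, H - p]\), it helps to compare \(H\) with simple bounds. A minimal sketch, assuming processing times in the same nested layout as `proc_times` (the function name `makespan_bounds` and the toy instance are illustrative, not part of the notebook): the optimal makespan is at least the largest machine load and the longest job, and at most the total processing time, since running every operation back to back is always feasible.

```python
def makespan_bounds(proc_times):
    """Return (lower_bound, serial_upper_bound) for a JSSP instance.

    proc_times: {job: {machine: processing_time}}, one operation per machine.
    """
    machines = {m for times in proc_times.values() for m in times}
    # Each machine processes its operations one at a time.
    max_machine_load = max(
        sum(times[m] for times in proc_times.values() if m in times) for m in machines
    )
    # Each job's operations are chained by precedence.
    max_job_length = max(sum(times.values()) for times in proc_times.values())
    lower = max(max_machine_load, max_job_length)
    # A fully serial schedule is always feasible.
    upper = sum(sum(times.values()) for times in proc_times.values())
    return lower, upper

# Hypothetical toy instance: 2 jobs, 2 machines.
toy = {"J1": {"M1": 3, "M2": 2}, "J2": {"M1": 4, "M2": 1}}
print(makespan_bounds(toy))  # (7, 10)
```

In the notebook, `makespan_bounds(proc_times)` can be compared against `H`: if `H` is below the lower bound, the time-indexed model cannot be satisfied; if it is at least the upper bound, a feasible assignment is guaranteed to exist.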



## Part 1 — MIP (Minimize Makespan) with PuLP (Heuristic Fallback)

**MIP model (disjunctive):**
- \(S_{j,k}\): start time of operation \(k\) of job \(j\) (continuous \(\ge 0\)).  
- \(C_{\max}\): makespan (continuous).  
- For each machine \(m\) and each pair of operations \(a,b\) processed on \(m\), introduce a binary \(y_{a,b}\) that equals 1 if \(a\) precedes \(b\) and 0 if \(b\) precedes \(a\).

**Constraints:**
- **Precedence** within a job: next operation starts after previous finishes.  
- **Machine non-overlap**: one operation at a time on each machine.  
- **Makespan**: every operation's completion time (start plus processing time) is at most \(C_{\max}\).
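Written out, the model built in the next cell reads as follows. Here \(p_{j,k}\) is the processing time of operation \(k\) of job \(j\), \(\mathcal{O}_m\) is the set of operations routed to machine \(m\), \(a,b\) denote operations, and \(L\) is the big-M constant (the code uses \(L = H + \sum_a p_a\)):

\[
\begin{aligned}
\min\;& C_{\max}\\
\text{s.t.}\;& S_{j,k+1} \ge S_{j,k} + p_{j,k} && \forall j,\ k = 0,\dots,M-2\\
& S_b \ge S_a + p_a - L\,(1 - y_{a,b}) && \forall m,\ \{a,b\} \subseteq \mathcal{O}_m\\
& S_a \ge S_b + p_b - L\,y_{a,b} && \forall m,\ \{a,b\} \subseteq \mathcal{O}_m\\
& C_{\max} \ge S_{j,k} + p_{j,k} && \forall j,k\\
& S_{j,k} \ge 0,\quad y_{a,b} \in \{0,1\}
\end{aligned}
\]

With \(y_{a,b} = 1\) the first disjunctive row forces \(a\) before \(b\) and the second is slack; \(y_{a,b} = 0\) swaps the roles. \(L\) only needs to exceed \(S_a + p_a - S_b\) in schedules of interest, and since some optimal schedule finishes within the total processing time \(\sum_a p_a\), the code's choice is large enough.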

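If PuLP is unavailable, we fall back to a **greedy list-scheduling heuristic** in the spirit of Giffler–Thompson's non-delay schedule generation: at each step, among every job's next unscheduled operation, schedule the one that can start earliest (ties broken by shortest processing time). It always produces a feasible schedule but gives no optimality guarantee.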


In [None]:

import math, pandas as pd, numpy as np

try:
    import pulp
    HAVE_PULP = True
except Exception:
    HAVE_PULP = False

# Build operation list: for each job j, we define ordered ops as (j, k, machine m_k, proc p)
ops = []  # list of dicts
op_index = {}
for j in jobs:
    route = routes[j]
    for k, m in enumerate(route):
        ops.append({"job": j, "op_idx": k, "machine": m, "p": int(proc_times[j][m])})
op_ids = [f"{o['job']}_{o['op_idx']}" for o in ops]
for idx, oid in enumerate(op_ids):
    op_index[oid] = idx

def solve_mip_jobshop(ops, jobs, machines, H):
    prob = pulp.LpProblem("JSSP_Makespan", pulp.LpMinimize)
    # Start time variables and Cmax
    S = {oid: pulp.LpVariable(f"S_{oid}", lowBound=0, cat="Continuous") for oid in op_ids}
    Cmax = pulp.LpVariable("Cmax", lowBound=0, cat="Continuous")

    # Objective
    prob += Cmax

    # Precedence constraints within each job
    for j in jobs:
        route = routes[j]
        for k in range(len(route)-1):
            oid_prev = f"{j}_{k}"
            oid_next = f"{j}_{k+1}"
            p_prev = next(o["p"] for o in ops if o["job"]==j and o["op_idx"]==k)
            prob += S[oid_next] >= S[oid_prev] + p_prev

    # Machine non-overlap via disjunctive binaries
    BIGM = H + sum(o["p"] for o in ops)
    for m in machines:
        # operations on machine m
        ops_m = [o for o in ops if o["machine"]==m]
        for i in range(len(ops_m)):
            for k in range(i+1, len(ops_m)):
                oi = ops_m[i]; ok = ops_m[k]
                oi_id = f"{oi['job']}_{oi['op_idx']}"
                ok_id = f"{ok['job']}_{ok['op_idx']}"
                y = pulp.LpVariable(f"y_{oi_id}_before_{ok_id}", lowBound=0, upBound=1, cat="Binary")
                # If y=1 then oi before ok; else ok before oi
                prob += S[ok_id] >= S[oi_id] + oi["p"] - BIGM*(1-y)
                prob += S[oi_id] >= S[ok_id] + ok["p"] - BIGM*(y)

    # Makespan constraints
    for o in ops:
        oid = f"{o['job']}_{o['op_idx']}"
        prob += Cmax >= S[oid] + o["p"]

    # Solve
    _ = prob.solve(pulp.PULP_CBC_CMD(msg=False))
    status = pulp.LpStatus[prob.status]
    if status not in ("Optimal", "Feasible"):
        return status, None, None

    S_sol = {oid: pulp.value(S[oid]) for oid in op_ids}
    Cmax_sol = pulp.value(Cmax)
    return status, S_sol, Cmax_sol

def heuristic_schedule(ops, jobs, machines):
    # Simple list scheduling: each job follows its route; repeatedly pick, among each job's next op,
    # the one with the earliest feasible start (max of job and machine availability).
    # Track availability times for machines and jobs.
    m_avail = {m: 0 for m in machines}
    j_avail = {j: 0 for j in jobs}
    # Next operation index to schedule per job
    next_k = {j: 0 for j in jobs}

    S_sol = {}
    finished_ops = 0
    total_ops = len(ops)

    # Pre-index processing times by (job, op_idx)
    p = {(o["job"], o["op_idx"]): o["p"] for o in ops}
    m_of = {(o["job"], o["op_idx"]): o["machine"] for o in ops}

    while finished_ops < total_ops:
        # Candidate ops ready to schedule: next per job if any remain
        candidates = []
        for j in jobs:
            k = next_k[j]
            if k < len(routes[j]):
                m = m_of[(j,k)]
                start_time = max(j_avail[j], m_avail[m])
                candidates.append((start_time, j, k, m))
        if not candidates:
            break
        # Choose the one with earliest possible start; tie-break by shortest proc time
        candidates.sort(key=lambda t: (t[0], p[(t[1],t[2])]))
        start, j, k, m = candidates[0]
        oid = f"{j}_{k}"
        S_sol[oid] = start
        finish = start + p[(j,k)]
        # Update availability
        j_avail[j] = finish
        m_avail[m] = finish
        next_k[j] += 1
        finished_ops += 1

    Cmax = max(S_sol[oid] + next(o["p"] for o in ops if f"{o['job']}_{o['op_idx']}"==oid) for oid in S_sol)
    return "Heuristic", S_sol, Cmax

if HAVE_PULP:
    mip_status, S_mip, Cmax_mip = solve_mip_jobshop(ops, jobs, machines, H)
else:
    mip_status, S_mip, Cmax_mip = heuristic_schedule(ops, jobs, machines)

print("MIP/Heuristic status:", mip_status)
print("Cmax (MIP/Heuristic):", Cmax_mip)
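Neither the greedy heuristic nor the penalty-based QUBO certifies its own output, so it can help to check a schedule directly. A minimal sketch, assuming the notebook's conventions (operation ids `f"{job}_{k}"`, `routes[job]` as the ordered machine list, `proc_times[job][machine]`); the function name `check_schedule` and the toy data are illustrative:

```python
def check_schedule(starts, routes, proc_times):
    """Return a list of violations; an empty list means the schedule is feasible.

    starts:     {"J1_0": start, ...}  (None marks an unscheduled operation)
    routes:     {job: [machine of op 0, machine of op 1, ...]}
    proc_times: {job: {machine: processing_time}}
    """
    violations = []
    by_machine = {}  # machine -> list of (start, end, op_id)
    for job, route in routes.items():
        for k, machine in enumerate(route):
            oid = f"{job}_{k}"
            s = starts.get(oid)
            if s is None:
                violations.append(f"{oid} is unscheduled")
                continue
            end = s + proc_times[job][machine]
            by_machine.setdefault(machine, []).append((s, end, oid))
            # Precedence: op k+1 of the same job may not start before op k ends.
            if k + 1 < len(route):
                s_next = starts.get(f"{job}_{k + 1}")
                if s_next is not None and s_next < end:
                    violations.append(f"{job}_{k + 1} starts before {oid} finishes")
    # Capacity: no two intervals on the same machine may overlap.
    for machine, intervals in by_machine.items():
        intervals.sort()
        for i, (_, e1, a) in enumerate(intervals):
            for s2, _, b in intervals[i + 1:]:
                if s2 >= e1:
                    break  # sorted by start, so later intervals start even later
                violations.append(f"{a} and {b} overlap on {machine}")
    return violations

# Hypothetical toy instance (names chosen so they do not clobber the notebook's globals).
toy_routes = {"J1": ["M1", "M2"], "J2": ["M2", "M1"]}
toy_proc = {"J1": {"M1": 3, "M2": 2}, "J2": {"M1": 4, "M2": 1}}
good = {"J1_0": 0, "J1_1": 3, "J2_0": 0, "J2_1": 3}
bad = {"J1_0": 0, "J1_1": 2, "J2_0": 0, "J2_1": 1}
print(check_schedule(good, toy_routes, toy_proc))  # []
print(check_schedule(bad, toy_routes, toy_proc))
# ['J1_1 starts before J1_0 finishes', 'J1_0 and J2_1 overlap on M1']
```

In the notebook, `check_schedule(S_mip, routes, proc_times)` is expected to return an empty list; running the same check on `S_qubo` shows whether a low-energy QUBO sample is actually a valid schedule.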


In [None]:

# Simple Gantt chart for the MIP/heuristic schedule
import matplotlib.pyplot as plt

fig = plt.figure()
y_ticks = []
y_labels = []
y = 0
for j in jobs:
    # draw each job's ops as horizontal bars at level y
    for k, m in enumerate(routes[j]):
        oid = f"{j}_{k}"
        s = S_mip[oid]
        p = proc_times[j][m]
        plt.barh(y, p, left=s)
        plt.text(s + p/2, y, f"{m}", ha="center", va="center")
    y_ticks.append(y)
    y_labels.append(j)
    y += 1

plt.yticks(y_ticks, y_labels)
plt.xlabel("Time")
plt.title("Gantt Chart — JSSP Schedule (MIP/Heuristic)")
plt.tight_layout()



## Part 2 — QUBO (Time-Indexed) on the Same Dataset

We define one-hot binary variables \(x_{(j,k),t}\) that equal 1 iff operation \((j,k)\) **starts** at time \(t\).  
Let \(p_{(j,k)}\) be its processing time and \(m_{(j,k)}\) its machine.
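Each operation \((j,k)\) gets one binary per admissible start time. With the start domain \(\{0,\dots,H-p_{(j,k)}\}\) used in the code below (so every operation finishes by \(H\)), that is \(H-p_{(j,k)}+1\) variables per operation, and the start-once penalty alone couples every pair of them. A small sketch, assuming the same nested `proc_times` layout as the notebook (the function name and toy data are illustrative), shows how quickly this grows with \(H\):

```python
def time_indexed_size(proc_times, H):
    """Count start-time binaries and start-once couplings for a time-indexed QUBO."""
    n_vars = 0
    n_pairs = 0  # quadratic terms from the start-once penalty alone
    for times in proc_times.values():
        for p in times.values():
            n = max(1, H - p + 1)  # starts 0..H-p, mirroring the notebook's start_domain
            n_vars += n
            n_pairs += n * (n - 1) // 2
    return n_vars, n_pairs

# Hypothetical toy instance: 2 jobs, 2 machines.
toy = {"J1": {"M1": 3, "M2": 2}, "J2": {"M1": 4, "M2": 1}}
for H in (10, 20):
    print(H, time_indexed_size(toy, H))
# 10 (34, 130)
# 20 (74, 650)
```

Doubling the horizon roughly doubles the variables but multiplies the start-once couplings by about four (five in this tiny example), which is why a tight \(H\) matters for the annealer.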

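**Penalties** (weights \(A\), \(B\), \(C\) in the code below):
1. **Start once:** \(\forall (j,k):\; A\big(1 - \sum_t x_{(j,k),t}\big)^2\)  
2. **Machine capacity:** \(\forall m, \forall \tau:\; B \sum_{a<b} x_a x_b\), where \(a,b\) range over the variables \(x_{(j,k),t}\) with \(m_{(j,k)}=m\) and \(\tau\in[t, t+p_{(j,k)}-1]\), i.e. the operations running on \(m\) at time \(\tau\). This is zero whenever at most one operation runs; the squared form \(\big(\sum x - 1\big)^2\) would instead enforce *exactly* one, penalizing idle slots.  
3. **Precedence:** for consecutive ops \((j,k)\to(j,k+1)\), add \(C \, x_{(j,k),t}\, x_{(j,k+1),t'}\) for every pair with \(t' < t + p_{(j,k)}\), which forbids \(t_{k+1} < t_k + p_{(j,k)}\).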
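Penalty expansions are easy to get wrong, and for a handful of binaries they can be checked exhaustively. The sketch below (plain Python, no solver; function names are illustrative) verifies the start-once expansion used in the code, \((1-\sum_t x_t)^2 = 1 - \sum_t x_t + 2\sum_{t<u} x_t x_u\) for binary \(x\), and contrasts two ways of expressing "at most one operation running": the squared form \((\sum x - 1)^2\), which is also positive for an idle slot, and the pairwise form \(\sum_{a<b} x_a x_b\), which is zero exactly when at most one variable is on.

```python
from itertools import combinations, product

def start_once_expanded(x):
    # Constant + linear + quadratic terms: 1 - sum(x) + 2 * sum_{t<u} x_t x_u
    return 1 - sum(x) + 2 * sum(a * b for a, b in combinations(x, 2))

def squared_capacity(x):
    return (sum(x) - 1) ** 2

def pairwise_capacity(x):
    return sum(a * b for a, b in combinations(x, 2))

for x in product((0, 1), repeat=4):
    # The expansion matches the squared start-once penalty on every binary vector.
    assert start_once_expanded(x) == (1 - sum(x)) ** 2
    # The pairwise penalty vanishes exactly when at most one variable is on.
    assert (pairwise_capacity(x) == 0) == (sum(x) <= 1)

idle = (0, 0, 0, 0)
print(squared_capacity(idle), pairwise_capacity(idle))  # 1 0
```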

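**Objective:** As a makespan proxy, encourage earlier starts with a small linear bias \(\alpha \sum_{(j,k),t} t \cdot x_{(j,k),t}\). This minimizes the sum of start times, which tends to shorten the schedule but is not identical to minimizing \(C_{\max}\).  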
(Alternatively, include an explicit time-indexed completion penalty relative to a target horizon.)
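The bias has to stay small relative to the penalty weights. Looking at a single operation in isolation, with only the start-once term (its constant \(A\) dropped, as in the code) and the bias, the three cases have energies:

\[
\begin{aligned}
E(\text{no start}) &= 0,\\
E(\text{one start at } t) &= -A + \alpha t,\\
E(\text{starts at } t \text{ and } u,\ t<u) &= -2A + 2A + \alpha(t+u) = \alpha(t+u).
\end{aligned}
\]

So one start beats none whenever \(\alpha t < A\), and a second start is never favored by these terms. As a rough rule of thumb rather than a guarantee (a delay also interacts with the capacity and precedence terms), keeping \(\alpha H < \min(A, B, C)\) ensures that postponing an operation across the whole horizon costs less than a single start-once, capacity, or precedence penalty; the notebook's \(\alpha = 0.05\) and \(A = B = C = 5\) satisfy this whenever \(H < 100\).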


In [None]:

from collections import defaultdict
import numpy as np, dimod, neal

# Build op metadata
op_list = []
for j in jobs:
    for k, m in enumerate(routes[j]):
        op_list.append((j, k, m, int(proc_times[j][m])))

# Restrict feasible start times to keep model compact: [0, H - p]
start_domain = {}
for (j,k,m,p) in op_list:
    start_domain[(j,k)] = list(range(0, max(1, H - p + 1)))

# Penalty weights
A = 5.0   # start-once
B = 5.0   # machine capacity
C = 5.0   # precedence
alpha = 0.05  # small bias to prefer earlier starts

def var_key(j,k,t):  # label helper
    return f"x|{j}|{k}|{t}"

Q = defaultdict(float)

# (1) Start-once penalties
for (j,k,m,p) in op_list:
    dom = start_domain[(j,k)]
    # (1 - sum_t x)^2 = 1 - 2 sum x + sum x + 2 sum_{t<u} x_t x_u
    for t in dom:
        Q[(var_key(j,k,t), var_key(j,k,t))] += (-A)  # -A after expansion
    # pairwise within the op (t<u): +2A
    for idx, t in enumerate(dom):
        for u in dom[idx+1:]:
            Q[(var_key(j,k,t), var_key(j,k,u))] += (2*A)

# (2) Machine capacity: at each time tau, the number of running ops on machine m must not exceed 1
# O_{m,tau} = sum_{(j,k): m_{jk}=m} sum_{t in dom: tau in [t, t+p-1]} x_{jkt}
# penalty: B * sum_{pairs a<b} x_a x_b, which is zero iff O_{m,tau} <= 1.
# (The squared form B*(O_{m,tau}-1)^2 would enforce O == 1 and add a -B linear reward per running
#  variable, which can make claiming extra start times cheaper than the start-once penalty.)
for m in machines:
    # gather op indices on m
    ops_m = [(j,k,p) for (j,k,mm,p) in op_list if mm==m]
    for tau in range(H):
        # collect all (j,k,t) that run at tau
        run_vars = []
        for (j,k,p) in ops_m:
            for t in start_domain[(j,k)]:
                if (t <= tau) and (tau < t + p):
                    run_vars.append((j,k,t))
        # pairwise penalty only: idle slots (O = 0) and single occupancy (O = 1) cost nothing
        for idx, a in enumerate(run_vars):
            for b in run_vars[idx+1:]:
                Q[(var_key(*a), var_key(*b))] += B

# (3) Precedence: forbid t_next < t + p_prev using pairwise penalties +C * x_prev(t) x_next(t')
for j in jobs:
    route = routes[j]
    for k in range(len(route)-1):
        p_prev = int(proc_times[j][route[k]])
        for t_prev in start_domain[(j,k)]:
            for t_next in start_domain[(j,k+1)]:
                if t_next < t_prev + p_prev:
                    Q[(var_key(j,k,t_prev), var_key(j,k+1,t_next))] += C

# (4) Early-start bias: alpha * sum t * x  (linear, on-diagonal)
for (j,k,m,p) in op_list:
    for t in start_domain[(j,k)]:
        Q[(var_key(j,k,t), var_key(j,k,t))] += (alpha * t)

# Build and solve
bqm = dimod.BinaryQuadraticModel.from_qubo(dict(Q))
sampler = neal.SimulatedAnnealingSampler()
sampleset = sampler.sample(bqm, num_reads=2000)
best = sampleset.first

x = best.sample  # var -> 0/1
# Decode to start times: pick t with x=1 per op; if multiple, choose earliest
S_qubo = {}
for (j,k,m,p) in op_list:
    dom = start_domain[(j,k)]
    chosen = [t for t in dom if x.get(var_key(j,k,t), 0)==1]
    if chosen:
        S_qubo[f"{j}_{k}"] = min(chosen)
    else:
        # fall back: assign None (unscheduled)
        S_qubo[f"{j}_{k}"] = None

# Compute Cmax if every op got a start time (this does NOT verify precedence or machine capacity)
complete = all(S_qubo[f"{j}_{k}"] is not None for (j,k,m,p) in op_list)
if complete:
    Cmax_qubo = max(S_qubo[f"{j}_{k}"] + p for (j,k,m,p) in op_list)
else:
    Cmax_qubo = None

print("QUBO energy:", best.energy)
print("Cmax (QUBO):", Cmax_qubo)


In [None]:

# Gantt chart for the QUBO schedule (if complete)
import matplotlib.pyplot as plt

if all(S_qubo.get(f"{j}_{k}") is not None for (j,k,m,p) in op_list):
    fig = plt.figure()
    y_ticks = []
    y_labels = []
    y = 0
    for j in jobs:
        for k, m in enumerate(routes[j]):
            oid = f"{j}_{k}"
            s = S_qubo[oid]
            p = proc_times[j][m]
            plt.barh(y, p, left=s)
            plt.text(s + p/2, y, f"{m}", ha="center", va="center")
        y_ticks.append(y)
        y_labels.append(j)
        y += 1
    plt.yticks(y_ticks, y_labels)
    plt.xlabel("Time")
    plt.title("Gantt Chart — QUBO Schedule")
    plt.tight_layout()
else:
    print("QUBO schedule incomplete; skipping Gantt.")



### Comparing MIP vs QUBO
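- **MIP** (PuLP + CBC) is exact for this formulation: status `Optimal` means the reported \(C_{\max}\) is proven minimal. Solve times can grow quickly on larger instances, and with a time limit CBC may stop before proving optimality. If PuLP was unavailable, the Part 1 numbers come from the greedy heuristic, which is feasible but not necessarily optimal.  
- **QUBO** uses a **time-indexed** encoding with penalties; it is flexible and maps directly to **quantum annealing / QAOA**, but its number of variables grows with the horizon \(H\).  
- The QUBO includes a small **early-start bias** as a makespan proxy; you can increase or decrease it, or add a target-horizon penalty for tighter control.
- Differences in \(C_{\max}\) are expected: simulated annealing gives no optimality certificate, and a low-energy sample can still violate a penalty term, so a QUBO schedule should be checked for precedence and machine overlaps before its makespan is compared with the MIP's.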
