# Note on the Notebook

The examples in this notebook are inspired by real-world scenarios I've observed or experienced across different projects. They touch on several of the points discussed during the PoC to Production talk.

Each example is designed to:
- Work within the limitations of the free Gurobi restricted license, and  
- Fit within the time constraints of our session.

While the datasets and models here are much smaller than those used in production, the lessons, best practices, and pitfalls to avoid remain exactly the same.

In every example that follows, something is wrong.  
Your mission, should you choose to accept it, is to identify the culprit(s) using what you've learned so far.

**Happy debugging!**

# Install Required Packages

In [None]:
%pip install gurobipy
%pip install pandas
# %pip install plotly  # uncomment if plotly is not already installed (it is imported below)

In [None]:
# Import packages
import math
import random
import pandas as pd
import gurobipy as gp
from gurobipy import GRB
import plotly.graph_objects as go

# Example 1: Assign Items to Resources
Replace *items* with orders, goods, products, groceries, food, or workers, and *resources* with trucks, buses, baskets, or boxes, and you'll start seeing this problem everywhere.

Any time you have a limited number of resources with finite capacity, and you need to place, match, assign, or pack your items into those resources, often with goals such as minimizing the number of resources used, you're dealing with a variation of the _bin packing problem_.
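
To make the bin packing idea concrete, here is a minimal sketch of the classic first-fit decreasing heuristic. It is not the model used in this example (which is an exact MIP with vehicle-specific capacities); it assumes identical bins, and the function name and toy data are hypothetical. A heuristic like this only gives an upper bound on the number of bins needed.

```python
def first_fit_decreasing(weights, capacity):
    """Pack weights into identical bins; returns a list of bins (lists of weights)."""
    if any(w > capacity for w in weights):
        raise ValueError("an item is heavier than the bin capacity")
    bins = []       # items placed in each bin
    remaining = []  # free capacity left in each bin
    for w in sorted(weights, reverse=True):  # heaviest items first
        for b, free in enumerate(remaining):
            if w <= free:  # first bin with enough room
                bins[b].append(w)
                remaining[b] -= w
                break
        else:  # no existing bin fits: open a new one
            bins.append([w])
            remaining.append(capacity - w)
    return bins


packing = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(len(packing), packing)  # 2 [[8, 2], [4, 4, 1, 1]]
```

Comparing a quick heuristic answer like this with the solver's result can be a useful sanity check on small instances.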

In [None]:
def ex1():
    # Load the data
    orders = pd.read_csv('ex1_orders.csv')
    vehicles = pd.read_csv('ex1_vehicles.csv')
    num_orders = len(orders)
    num_vehicles = len(vehicles)
    # Build the model
    model = gp.Model('vehicle_loading')
    # x[i,t]: 1 if order i assigned to vehicle t
    x = model.addVars(num_orders, num_vehicles, vtype=GRB.BINARY, name='x')
    # y[t]: 1 if vehicle t is used
    y = model.addVars(num_vehicles, vtype=GRB.BINARY, name='y')

    # Each order assigned to exactly one vehicle
    model.addConstrs((x.sum(i, '*') == 1 for i in range(num_orders)), name='assign')
    # vehicle capacity constraints
    model.addConstrs(
        (gp.quicksum(orders.loc[i, 'weight'] * x[i, t] for i in range(num_orders))
         <= vehicles.loc[t, 'capacity'] * y[t] for t in range(num_vehicles)), name='capacity')

    # Objective: minimize number of used vehicles
    model.setObjective(y.sum(), GRB.MINIMIZE)
    model.Params.OutputFlag = 0  # turn off the log
    model.optimize()
    if model.status == GRB.OPTIMAL:
        print(f'Optimal. Best Objective: {model.ObjVal}')
    elif model.status == GRB.INFEASIBLE:
        print('Infeasible!')
    else:
        print(f'Model Status: {model.status}')

In [None]:
ex1()

# Solution 1: 
THAT'S FOR YOU TO FILL...

# Example 2: Unexpected Cost
You're working on a problem to determine the optimal flow of products between distribution centers (DCs) and customers, given supply and demand, with the goal of minimizing transportation cost.

You've been told that the current solution is around \$6M, and the optimal solution of the model is expected not to exceed this value.  
However, the model's optimal solution is close to \$8M. What do you think is going wrong?
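
For reference, the model built in the cell below can be written as follows. The notation is introduced here for illustration and does not appear in the original code: $D$ is the set of DCs, $C$ the set of customers, $A$ the set of lanes, $s_d$ the supply of DC $d$, $r_c$ the demand of customer $c$, and $c_{dc}$ the unit cost on lane $(d, c)$.

$$
\begin{aligned}
\min\ & \sum_{(d,c) \in A} c_{dc}\, x_{dc} \\
\text{s.t.}\ & \sum_{c:\,(d,c) \in A} x_{dc} \le s_d && \forall d \in D \\
& \sum_{d:\,(d,c) \in A} x_{dc} \ge r_c && \forall c \in C \\
& x_{dc} \ge 0 && \forall (d,c) \in A
\end{aligned}
$$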

In [None]:
def ex2():
    dcs_df = pd.read_csv('ex2_dcs.csv')
    cust_df = pd.read_csv('ex2_customers.csv')
    lanes_df = pd.read_csv('ex2_lanes.csv')
    # Build some objects for convenience and speed
    dc = dcs_df['dc_id'].tolist()
    cust = cust_df['cust_id'].tolist()
    arcs = list(lanes_df[['dc', 'cust']].itertuples(index=False, name=None))
    supply = dcs_df.set_index('dc_id')['supply'].to_dict()
    demand = cust_df.set_index('cust_id')['demand'].to_dict()
    cost = lanes_df.set_index(['dc', 'cust'])['cost'].to_dict()

    # Model
    model = gp.Model('transport')
    # x[d,c]: flow from dc d to customer c
    x = model.addVars(arcs, lb=0.0, vtype=GRB.CONTINUOUS, name='x')

    # supply constraints
    model.addConstrs((x.sum(d, '*') <= supply[d] for d in dc), name='supply')
    # demand constraints
    model.addConstrs((x.sum('*', c) >= demand[c] for c in cust), name='demand')
    # Objective: minimize total transport cost
    model.setObjective(gp.quicksum(cost[d, c] * x[d, c] for (d, c) in arcs), GRB.MINIMIZE)
    model.Params.OutputFlag = 0  # turn off the log
    model.optimize()
    if model.Status == GRB.OPTIMAL:
        print(f'Objective: {model.ObjVal:,.0f}')

In [None]:
ex2()

# Solution 2:
THAT'S FOR YOU TO FILL...

# Example 3: Why Infeasible?
There are situations where a set of tasks share limited resources, so some tasks cannot run simultaneously because they depend on or compete for the same resource. We can model this as a compatibility graph, where each task is a node and an edge between two tasks indicates that they have no conflict and can run in parallel. If there is no edge between two tasks (i.e., no compatibility), they cannot both be in the solution. The goal is to find the largest subset of mutually compatible tasks, that is, tasks that can all be performed at the same time without resource overlap. This is known as the _maximum clique problem_.

This problem appears in social network analysis (e.g., finding the largest group of mutual friends), telecom, bioinformatics, manufacturing, and many other domains.
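
To illustrate the definition on a toy graph, here is a brute-force sketch that checks every subset of nodes, largest first. It is exponential in the number of nodes and is only meant for tiny graphs; the function name and the five-node graph are hypothetical and unrelated to the exercise data.

```python
from itertools import combinations


def max_clique_bruteforce(n, edges):
    """Return a largest set of nodes in which every pair is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):  # try the largest subsets first
        for nodes in combinations(range(n), size):
            if all(frozenset(pair) in edge_set for pair in combinations(nodes, 2)):
                return list(nodes)
    return []


# Tasks 0, 1, 2 are pairwise compatible; 3 and 4 only form a chain.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
clique = max_clique_bruteforce(5, toy_edges)
print(clique)  # [0, 1, 2]
```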

In the example below, find out why the function returns infeasible.

In [None]:
def ex3():
    # Fixed seed so that everyone works with the same data
    n, p, seed = 200, 0.95, 42
    random.seed(seed)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < p:
                edges.add((i, j))

    model = gp.Model('max_clique')
    model.Params.TimeLimit = 3
    model.Params.OutputFlag = 0

    x = model.addVars(n, vtype=GRB.BINARY, name='x')
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) not in edges:  # no edge, no compatibility
                model.addConstr(x[i] + x[j] <= 1, name=f'conflict_{i}_{j}')
    # Find the maximum set of compatible nodes
    model.setObjective(x.sum(), GRB.MAXIMIZE)
    model.optimize()
    if model.status == GRB.OPTIMAL:
        print(f'Optimal. Best Objective: {model.ObjVal}')
    else:
        print('Infeasible')

In [None]:
ex3()

# Solution 3:
THAT'S FOR YOU TO FILL...

# Example 4: Non-Integer Integers! 
This small model looks straightforward with two integer variables.
But when you look at the optimal values, something seems off. What's going on?

In [None]:
def ex4():
    model = gp.Model()
    x = model.addVar(vtype=GRB.INTEGER, ub=1000, name="x")
    y = model.addVar(vtype=GRB.INTEGER, ub=1000, name="y")
    # max πx + ey
    model.setObjective(math.pi * x + math.e * y, GRB.MAXIMIZE)
    model.addConstr(math.pi * x + 1 / 3 * y <= 3141.59265)
    model.addConstr(x + y <= 1000)
    model.Params.OutputFlag = 0
    model.optimize()
    if model.Status == GRB.OPTIMAL:
        print(f'Optimal Solution: {model.ObjVal}')
        for v in model.getVars():
            print(v.VarName, v.X)

In [None]:
ex4()

# Solution 4:
THAT'S FOR YOU TO FILL...

# Example 5: Unexpected Solution
We're revisiting Example 2, but with a new constraint.  
This time, we want to decide which DCs to open and determine the flow of products from the open DCs to customers, given supply and demand, to minimize total cost (transportation + fixed opening costs).

But the optimal solution is totally unexpected. See if you can find out why.

In [None]:
def ex5():
    dcs_df = pd.read_csv('ex2_dcs.csv')
    cust_df = pd.read_csv('ex2_customers.csv')
    lanes_df = pd.read_csv('ex2_lanes.csv')

    # Build some objects for convenience and speed
    dc = dcs_df['dc_id'].tolist()
    cust = cust_df['cust_id'].tolist()
    arcs = list(lanes_df[['dc', 'cust']].itertuples(index=False, name=None))
    supply = dcs_df.set_index('dc_id')['supply'].to_dict()
    dem_max = cust_df.set_index('cust_id')['demand'].to_dict()
    dem_min = cust_df.set_index('cust_id')['min_demand'].to_dict()
    cost = lanes_df.set_index(['dc', 'cust'])['cost'].to_dict()

    # Model
    model = gp.Model('facility_location')
    # x[d,c]: flow from dc d to customer c
    x = model.addVars(arcs, lb=0.0, vtype=GRB.CONTINUOUS, name='x')
    # y[d]: 1 if dc d is open
    y = model.addVars(dc, vtype=GRB.BINARY, name='y')

    # per customer: min_demand[c] <= sum_d x[d,c] <= demand[c]
    model.addConstrs((x.sum('*', c) >= dem_min[c] for c in cust), name='min_demand')
    model.addConstrs((x.sum('*', c) <= dem_max[c] for c in cust), name='max_demand')

    # dc capacity: sum_c x[d,c] <= supply[d] * y[d]
    model.addConstrs((x.sum(d, '*') <= supply[d] * y[d] for d in dc), name='capacity')

    # Objective: shipping + fixed cost of opening a dc
    shipping = gp.quicksum(cost[d, c] * x[d, c] for (d, c) in arcs)
    fixed = 10_000 * y.sum()
    model.setObjective(shipping + fixed, GRB.MINIMIZE)
    model.Params.OutputFlag = 0
    model.optimize()
    if model.Status == GRB.OPTIMAL:
        print(f'Objective value: {model.ObjVal:.0f}')
        print(f'Facilities opened: {sum(int(round(y[d].X)) for d in dc)}')
        print(f'Total shipped: {sum(x[arc].X for arc in arcs):.0f}')

In [None]:
ex5()

# Solution 5:
THAT'S FOR YOU TO FILL...

# Example 6: Another Unexpected Solution
Let's look at a simplified version of the facility location problem. Our goal is to decide how many facilities to open and how much to ship from each one to meet total demand at the lowest possible cost.

The model appears correct, and the solution is reported as optimal, but something isn't right. Can you figure out what's happening?

_Hint_: It's not related to `model.Params.Presolve = 0`. We turned presolve off intentionally to make sure the issue shows up. In a small model like this, Gurobi's presolve is smart enough that if it were on, the model would solve without showing the problem.

In [None]:
def ex6():
    num_facilities = 3
    demand = 100
    shipping_cost = 1
    facility_cost = 5000
    M = 1e8

    model = gp.Model()
    model.Params.Presolve = 0
    # Quantity to ship from each facility
    x = model.addVars(num_facilities, name="ship")
    # Whether or not we open a facility
    y = model.addVars(num_facilities, vtype=GRB.BINARY, name="open_facility")
    # Satisfy customer demand
    model.addConstr(x.sum() == demand)
    # Only ship if facility is open
    for i in range(num_facilities):
        model.addConstr(x[i] <= M * y[i])
    # Minimize shipping costs + facility-opening costs
    model.setObjective(shipping_cost * x.sum() + facility_cost * y.sum(), GRB.MINIMIZE)
    model.optimize()
    if model.Status == GRB.OPTIMAL:
        for v in model.getVars():
            print(v.VarName, v.X)

In [None]:
ex6()

# Solution 6:
THAT'S FOR YOU TO FILL...

# Other Cautionary Tales
Here, we share additional examples, cautionary tales, and best practices to keep in your toolkit as you move from simple notebooks toward production-ready code.

## Failure After Project Handoff
- If you can't write unit tests for your code, at least deliver it with a few small datasets (in version-controlled text formats such as CSV or JSON) that run successfully with your model. These act as your basic tests.
- If someone modifies the model and later reports an issue, you can rerun your small test dataset first to check whether it still passes. This helps you isolate whether the problem lies in the code or in the new data.
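
As a rough sketch of what such a basic test could look like, the snippet below reruns a few small recorded cases and reports any whose objective has changed. Everything here is hypothetical: `solve` is a toy stand-in for your model's entry point, and the cases are inlined as a JSON string, whereas in practice they would live in version-controlled files.

```python
import json


# Hypothetical stand-in for the model's entry point. In practice this would
# build and solve the optimization model and return its objective value.
def solve(instance):
    # Toy rule: ship all demand over the cheapest lane.
    return instance["demand"] * min(instance["lane_costs"])


# Small recorded cases with the objective each one produced at handoff.
CASES_JSON = """
[
  {"name": "single_lane", "instance": {"demand": 10, "lane_costs": [3.0]}, "expected": 30.0},
  {"name": "two_lanes", "instance": {"demand": 5, "lane_costs": [4.0, 2.5]}, "expected": 12.5}
]
"""


def run_regression(solve_fn, cases, tol=1e-6):
    """Return the names of cases whose objective differs from the recorded one."""
    failures = []
    for case in cases:
        got = solve_fn(case["instance"])
        if abs(got - case["expected"]) > tol * max(1.0, abs(case["expected"])):
            failures.append(case["name"])
    return failures


failures = run_regression(solve, json.loads(CASES_JSON))
print("all passed" if not failures else f"failed: {failures}")
```

If these cases still pass after a reported issue, the new data is the more likely culprit; if they fail, look at the code change first.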

## Data-Specific Rules
- It's rare to build a model that runs only once and on a single dataset. Different datasets may include their own rules or hidden constraints that aren't obvious at first.
- Your model might run perfectly on one dataset and fail on another because of these differences.
- **Test your model on multiple datasets that capture the diversity of the system you're modeling.**
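
One way to catch dataset-specific surprises early is to validate the inputs before building the model. The sketch below is only an illustration: the checks are assumed examples rather than a complete list, the function name is hypothetical, and the tables are plain lists of dictionaries that reuse the column names from the transportation examples.

```python
def validate_transport_data(dcs, customers, lanes):
    """Return a list of human-readable problems found in the input tables."""
    problems = []
    dc_ids = {row["dc_id"] for row in dcs}
    cust_ids = {row["cust_id"] for row in customers}

    for row in dcs:
        if row["supply"] < 0:
            problems.append(f"negative supply at {row['dc_id']}")
    for row in customers:
        if row["demand"] < 0:
            problems.append(f"negative demand at {row['cust_id']}")

    seen = set()
    for row in lanes:
        key = (row["dc"], row["cust"])
        if key in seen:
            problems.append(f"duplicate lane {key}")
        seen.add(key)
        if row["dc"] not in dc_ids or row["cust"] not in cust_ids:
            problems.append(f"lane {key} references an unknown id")

    # Customers that no lane can reach
    served = {cust for (_, cust) in seen}
    for cust in sorted(cust_ids - served):
        problems.append(f"customer {cust} has no inbound lane")

    if sum(r["supply"] for r in dcs) < sum(r["demand"] for r in customers):
        problems.append("total supply is less than total demand")
    return problems


# Toy data with two deliberate problems
dcs = [{"dc_id": "D1", "supply": 50}]
customers = [{"cust_id": "C1", "demand": 30}, {"cust_id": "C2", "demand": 40}]
lanes = [{"dc": "D1", "cust": "C1", "cost": 2.0}]
problems = validate_transport_data(dcs, customers, lanes)
print(problems)
```

Running the same checks on every new dataset turns hidden rules into explicit, reviewable messages instead of puzzling solver results.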