# Customer Assignment Problem

Customer assignment is a type of facility location problem that deals with finding the best locations for one or more facilities to serve a given set of customers. The objective is usually to minimize the total distance or cost of traveling from the facilities to the customers, while satisfying some constraints such as capacity, budget, or demand.

Customer assignment problems are relevant for many industries and sectors that need to plan their operations strategically. For example:

- Producers of goods need to design their supply chains, which involve choosing the locations and capacities of factories, distribution centers, warehouses, and retail stores.
- Healthcare providers need to optimize their population coverage, which involves deciding where to build hospitals, clinics, or other health facilities.

These are long-term decisions that require careful analysis and evaluation, as they involve high costs and have a significant impact on customer satisfaction and operational efficiency. One of the key factors to consider in these problems is the location of the customers, as it affects the distance or cost of traveling from the facilities to the customers.

In this tutorial, we will use Pyomo to solve a customer assignment problem. We will model the problem as a mixed-integer programming (MIP) problem, and use an example from the [Gurobi GitHub repository](https://github.com/Gurobi/modeling-examples/). We will also investigate the application of the *k-means algorithm* to pre-process the customer location data.

## Problem description

We are dealing with a customer assignment problem, which aims to find the best locations for one or more facilities from a set of possible sites, so that the total distance or cost of traveling from the facilities to the customers is minimized. If the facilities have no capacity limit, we can assume that each customer is served by the nearest facility.

However, if we have a large number of customers, it may be impractical or inefficient to consider each customer's location individually. In that case, we can group the customers into clusters based on their proximity, and use the cluster centers as representative locations for the customers. This simplifies the problem, but also introduces an assumption that all the customers in a cluster are served by the same facility. To form the clusters, we can use the *k-means algorithm*, a heuristic that partitions $n$ objects into $k$ non-overlapping clusters while trying to minimize the within-cluster variation.
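As a rough illustration of what k-means does, here is a minimal sketch of Lloyd's iterations in NumPy: assign every point to its nearest center, then move each center to the mean of its points. The function name `lloyd_kmeans`, its parameters, and the toy data are assumptions made for this example only; the tutorial itself relies on scikit-learn later on.

```python
import numpy as np


def nearest_center(points, centers):
    """Index of the closest center for every point (squared Euclidean distance)."""
    sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)


def lloyd_kmeans(points, k, num_iters=20, seed=0):
    """Hypothetical helper: plain Lloyd iterations with a random start."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(num_iters):
        labels = nearest_center(points, centers)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # keep the old center if a cluster goes empty
                centers[c] = members.mean(axis=0)
    # final assignment against the final centers
    return centers, nearest_center(points, centers)


# toy data: two groups of 100 points each (illustrative values only)
toy_rng = np.random.default_rng(1)
toy_points = np.vstack(
    [toy_rng.normal(-1.0, 0.1, size=(100, 2)), toy_rng.normal(1.0, 0.1, size=(100, 2))]
)
toy_centers, toy_labels = lloyd_kmeans(toy_points, k=2)
```

Like any k-means variant, this sketch can stop at a local optimum that depends on the starting centers, which is why the result should be read as a good grouping rather than a provably best one.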

## Mathematical formulation

### Sets and parameters

Let us define the following sets and parameters for our problem:

- $I$: Set of customer clusters.
- $J$: Set of potential facility locations.
- $w_i$: Number of customers in cluster $i \in I$.
- $d_{j,i}$: Distance from facility location $j \in J$ to customer cluster $i \in I$.
- $\tau$: Maximum distance for a cluster-facility pairing to be considered.
- $m$: Maximum number of facilities to be opened.
- $P$: Set of allowed pairings. A pairing is allowed if the distance between the facility location and the cluster is less than or equal to the given threshold. That is, $P=\{(j,i)\in J\times I \,|\, d_{j,i} \leq \tau\}$.

### Decision variables

We use the following variables to represent our decisions:

- $y_j$: Binary variable indicating whether facility location $j \in J$ is selected.
- $x_{j,i}$: Binary variable indicating whether cluster $i \in I$ is assigned to facility location $j \in J$. It is defined only for the allowed pairings in $P$.

### Objective function

Our goal is to minimize the total distance from the customer clusters to their assigned facilities. This distance is weighted by the number of customers in each cluster, and multiplied by the binary variable that indicates whether the cluster is assigned to the facility or not. The objective function is:

$$
\min \ Z = \sum_{(j,i) \in P} w_i \, d_{j,i} \, x_{j,i}
$$

### Constraints

We need to satisfy the following constraints for our problem:

- Facility limit. We cannot open more facilities than the maximum limit:

$$
\sum_{j \in J} y_j \leq m
$$

- Open to assign. We can assign a customer cluster to a facility location only if we have opened a facility there:
  
$$
x_{j,i} \leq y_j,\quad (j,i) \in P
$$

- Single assignment. Each customer cluster must be assigned to exactly one facility location. We cannot split a cluster among multiple facilities, or leave a cluster unassigned:

$$
\sum_{j \in J:\, (j,i) \in P} x_{j,i} = 1,\quad i \in I
$$
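To make the formulation concrete, the following sketch solves a tiny instance by brute force: it tries every set of at most $m$ open facilities and sends each cluster to its nearest open one. The helper name `brute_force_assignment` and the toy distances and weights are assumptions for illustration, and the sketch treats every pairing as allowed (no threshold $\tau$). It is only practical for very small instances, which is why a MIP solver is used in the rest of this tutorial.

```python
from itertools import combinations


def brute_force_assignment(dist, weights, max_open):
    """Hypothetical helper: enumerate all sets of at most max_open facilities.

    dist[j][i] is the distance from facility j to cluster i.
    Returns (best total weighted distance, tuple of open facilities).
    """
    num_facilities, num_clusters = len(dist), len(weights)
    best_cost, best_open = float("inf"), None
    for size in range(1, max_open + 1):
        for open_set in combinations(range(num_facilities), size):
            # without capacities, each cluster goes to its nearest open facility
            cost = sum(
                weights[i] * min(dist[j][i] for j in open_set) for i in range(num_clusters)
            )
            if cost < best_cost:
                best_cost, best_open = cost, open_set
    return best_cost, best_open


# toy instance: 3 candidate facilities, 3 clusters, at most 2 open facilities
toy_dist = [[1, 4, 5], [4, 1, 4], [5, 4, 1]]
toy_weights = [10, 1, 10]
best_cost, best_open = brute_force_assignment(toy_dist, toy_weights, max_open=2)
print(best_cost, best_open)  # → 24 (0, 2)
```

In this toy instance the two heavy clusters each get a nearby facility, and the light middle cluster travels farther, which is the trade-off the weighted objective encodes.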

## Problem instance generation

In this example, we generate some random data for the customer locations and the facility locations. We assume that the customers are clustered around a few population centers, which are randomly chosen in a two-dimensional space. We use Gaussian distributions to simulate the variation of the customer locations around each center. The facility locations are also randomly chosen in the same space, but with a uniform distribution.

In [None]:
!pip install numpy

In [None]:
import numpy as np

seed = 2020  # seed for random number generator
num_customers = 50_000  # total number of customers
num_candidates = 20  # total number of facility locations
max_facilities = 8  # maximum number of facilities to be opened
num_clusters = 50  # number of customer clusters to be used in the model
num_gaussians = 10  # number of population centers for the customers
threshold = 0.99  # maximum distance for a cluster-facility pairing to be considered

np.random.seed(seed)  # set seed for reproducibility

# generate the number of customers for each population center
customers_per_gaussian = np.random.multinomial(num_customers, [1 / num_gaussians] * num_gaussians)
print(customers_per_gaussian)

In [None]:
centers = np.random.uniform(-0.5, 0.5, size=(num_gaussians, 2))
customer_locs = np.zeros(shape=(num_customers, 2))

last_customer = 0
for n in range(num_gaussians):
    # create random locations around centers[n] for each customer
    customer_locs[last_customer : last_customer + customers_per_gaussian[n]] = np.random.normal(
        centers[n], 0.1, size=(customers_per_gaussian[n], 2)
    )

    last_customer += customers_per_gaussian[n]

In [None]:
# visualize the customer locations
import matplotlib.pyplot as plt

# split the coordinates into x and y arrays
customers_x = customer_locs[:, 0]
customers_y = customer_locs[:, 1]

plt.figure(figsize=(8, 8))
plt.scatter(customers_x, customers_y, c="g", alpha=0.4, s=0.5)
plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("Customer Locations")
plt.show()

In [None]:
# generate the facility locations using a uniform distribution in [-0.5, 0.5]
facility_locs = np.random.uniform(low=-0.5, high=0.5, size=(num_candidates, 2))

# visualize the facility locations
facilities_x = facility_locs[:, 0]
facilities_y = facility_locs[:, 1]

plt.figure(figsize=(8, 8))
plt.scatter(customers_x, customers_y, c="g", alpha=0.4, s=0.5, label="Customer")
plt.scatter(facilities_x, facilities_y, c="r", marker="s", alpha=0.8, s=20, label="Facility")
plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("All Locations")
plt.legend()
plt.show()

### Clustering

To reduce the complexity of the optimization model, we group the customers into clusters based on their locations, and assign each cluster to a facility. We use the *k-means algorithm*, which seeks clusters that minimize the within-cluster variation. Specifically, we use `MiniBatchKMeans` from the `scikit-learn` package, a mini-batch variant of k-means that scales well to large data sets.
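Replacing the customers of a cluster by their centroid is an approximation. The sketch below, on made-up data, compares the exact total distance from one cluster's customers to a facility with the aggregated value (cluster size times centroid distance) that the model uses. All names and numbers here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# one toy cluster of 1,000 customers and a single facility (illustrative values)
toy_cluster = rng.normal([0.2, 0.1], 0.05, size=(1000, 2))
toy_facility = np.array([0.5, 0.5])

# exact total: sum of every customer's own distance to the facility
exact_total = np.linalg.norm(toy_cluster - toy_facility, axis=1).sum()

# aggregated total: number of customers times the centroid's distance
centroid = toy_cluster.mean(axis=0)
approx_total = len(toy_cluster) * np.linalg.norm(centroid - toy_facility)

rel_gap = (exact_total - approx_total) / exact_total
print(f"exact: {exact_total:.2f}, aggregated: {approx_total:.2f}, gap: {rel_gap:.2%}")
```

By the triangle inequality the aggregated value never exceeds the exact one, and the gap shrinks as clusters become tighter relative to their distance from the facility. Using more clusters therefore trades model size for accuracy.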

In [None]:
!pip install scikit-learn

In [None]:
from sklearn.cluster import MiniBatchKMeans

kmeans = MiniBatchKMeans(n_clusters=num_clusters, random_state=seed)
# Fit the K-means object to the customer locations
kmeans.fit(customer_locs)

# Get the cluster labels for each customer
memberships = kmeans.labels_

# Get the cluster centers for each cluster
centroid_locs = kmeans.cluster_centers_

# Get the number of customers in each cluster
# (bincount counts integer labels directly and does not depend on bin edges)
weights = list(np.bincount(memberships, minlength=num_clusters))

# Print the first cluster center and the weights for the first 10 clusters
print("First cluster center:", centroid_locs[0])
print("Weights for the first 10 clusters:", weights[:10])

In [None]:
# Visualize the clusters
centers_x = centroid_locs[:, 0]
centers_y = centroid_locs[:, 1]

plt.figure(figsize=(8, 8))
plt.scatter(customers_x, customers_y, c=memberships, alpha=0.4, s=0.5)
plt.scatter(centers_x, centers_y, c="blue", marker="^", alpha=0.8, s=12)
plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("Customer Clusters")
plt.show()

In [None]:
# Visualize customer centers and facilities
plt.figure(figsize=(8, 8))
plt.scatter(centers_x, centers_y, c="blue", marker="^", alpha=0.6, s=12)
plt.scatter(facilities_x, facilities_y, c="r", marker="s", alpha=0.8, s=20)
plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("Cluster Centroids and Facilities")
plt.show()

### Viable customer-facility pairings

We do not need to consider all possible pairings between customer clusters and facility locations, as some of them may be too far apart to be feasible. We can use a heuristic to filter out the pairings that exceed a given distance threshold. This will reduce the size of our optimization model and make it easier to solve.

We define a function to compute the Euclidean distance between two locations, and then use a dictionary comprehension to create a dictionary of viable pairings. The keys of the dictionary are tuples of facility and cluster indices, and the values are the distances between them. We only include the pairings whose distance is at most the threshold $\tau$.

In [None]:
def dist(loc1, loc2):
    return np.linalg.norm(loc1 - loc2, ord=2)  # Euclidean distance


# keep only the pairings within the distance threshold; the walrus operator
# (Python 3.8+) avoids computing each distance twice
pairings = {
    (j, i): d
    for j in range(num_candidates)
    for i in range(num_clusters)
    if (d := dist(facility_locs[j], centroid_locs[i])) <= threshold
}

print(f"Number of all pairings: {num_candidates * num_clusters}")
print(f"Number of viable pairings: {len(pairings)}")
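For larger instances, the same filtering can be done without nested Python loops. The following sketch uses NumPy broadcasting to compute all facility-to-centroid distances at once on a small made-up example; the `toy_*` names and values are assumptions, not part of the tutorial's data.

```python
import numpy as np

# toy data: 2 facilities and 3 cluster centroids (illustrative values)
toy_facilities = np.array([[0.0, 0.0], [1.0, 0.0]])
toy_centroids = np.array([[0.0, 0.3], [1.0, 0.4], [3.0, 0.0]])
toy_threshold = 0.5

# pairwise differences have shape (num_facilities, num_centroids, 2)
diff = toy_facilities[:, None, :] - toy_centroids[None, :, :]
dist_matrix = np.linalg.norm(diff, axis=2)

# keep the (facility, cluster) pairs within the threshold
toy_pairings = {
    (int(j), int(i)): float(dist_matrix[j, i])
    for j, i in zip(*np.nonzero(dist_matrix <= toy_threshold))
}
print(sorted(toy_pairings))  # → [(0, 0), (1, 1)]
```

Note that the third centroid has no viable facility in this toy example. A cluster left without any pairing cannot be assigned, so the threshold has to be loose enough for every cluster to keep at least one candidate.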

## The concrete Pyomo model

In [None]:
!pip install gurobipy pyomo

import pyomo.environ as pyo
from pyomo.opt import SolverFactory

# Include your WLS (Web License Service) license information
solver_options = {}

In [None]:
# Create the model
mod = pyo.ConcreteModel(name="FLP")

# Decision variables
mod.y = pyo.Var(range(num_candidates), domain=pyo.Binary)
mod.x = pyo.Var(pairings.keys(), domain=pyo.Binary)

# Objective
expr = sum(weights[i] * pairings[j, i] * mod.x[j, i] for (j, i) in pairings.keys())
mod.obj = pyo.Objective(expr=expr, sense=pyo.minimize)

In [None]:
# Constraints

# Do not open more than m facilities
mod.max_open = pyo.Constraint(expr=sum(mod.y[j] for j in range(num_candidates)) <= max_facilities)

# Do not assign customers to closed facilities
mod.open_assign = pyo.ConstraintList()

for j, i in pairings.keys():
    mod.open_assign.add(mod.x[j, i] <= mod.y[j])

# Assign each cluster to exactly one facility
mod.cluster_assign = pyo.ConstraintList()

for i in range(num_clusters):
    expr = sum(mod.x[j, i] for j in range(num_candidates) if (j, i) in pairings.keys())
    mod.cluster_assign.add(expr == 1)

In [None]:
# Call the solver and solve the model
opt = SolverFactory("gurobi", solver_io="python", manage_env=True, solver_options=solver_options)
results = opt.solve(mod, tee=True)

## Solution analysis

To visualize our solution, we can plot a map of the customer and facility locations. The map shows the following features:

- The customer locations are shown as small green dots. These are the original data points that we clustered using the k-means algorithm.
- The customer cluster centroids are shown as blue triangles. These are the representative locations that we used in our optimization model.
- The facility location candidates are shown as red squares. These are the possible sites where we can open a facility.
- The selected facility locations are shown as black squares. These are the sites where we decided to open a facility, based on our optimization model.
- The cluster-facility assignments are shown as black lines. These are the connections between the customer clusters and the facilities that serve them. Notice that each cluster is assigned to exactly one facility, and each open facility serves one or more clusters.
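Before plotting, it can be useful to check that a solution respects the three constraints of the formulation. The sketch below does this on plain dictionaries of variable values; the helper name `check_solution` and the toy values are assumptions for illustration.

```python
def check_solution(y_vals, x_vals, max_open):
    """Hypothetical helper: return a list of violated-constraint messages.

    y_vals maps facility j to its value; x_vals maps (j, i) to its value.
    An empty list means all three constraints hold.
    """
    problems = []
    open_sites = {j for j, v in y_vals.items() if v > 0.5}
    if len(open_sites) > max_open:
        problems.append("too many open facilities")

    assigned = {}
    for (j, i), v in x_vals.items():
        if v > 0.5:
            if j not in open_sites:
                problems.append(f"cluster {i} assigned to closed facility {j}")
            assigned[i] = assigned.get(i, 0) + 1

    # every cluster that appears in a pairing needs exactly one assignment
    for i in sorted({i for (_, i) in x_vals}):
        if assigned.get(i, 0) != 1:
            problems.append(f"cluster {i} has {assigned.get(i, 0)} assignments")
    return problems


# toy values: facility 0 is open and serves both clusters
toy_y = {0: 1.0, 1: 0.0}
toy_x = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.0}
print(check_solution(toy_y, toy_x, max_open=1))  # → []
```

With the Pyomo model from this tutorial, the inputs could presumably be built as `{j: pyo.value(mod.y[j]) for j in mod.y}` and `{k: pyo.value(mod.x[k]) for k in mod.x}` after a successful solve.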

In [None]:
open_facilities = [j for j in range(num_candidates) if pyo.value(mod.y[j]) > 0.5]
assignments = [(j, i) for (j, i) in pairings if pyo.value(mod.x[j, i]) > 0.5]

open_facility_x = facility_locs[open_facilities][:, 0]
open_facility_y = facility_locs[open_facilities][:, 1]

plt.figure(figsize=(8, 8))
plt.scatter(customers_x, customers_y, c="g", alpha=0.25, s=0.5)
plt.scatter(centers_x, centers_y, c="b", marker="^", alpha=0.8, s=12)
plt.scatter(facilities_x, facilities_y, c="r", marker="s", alpha=0.6, s=20)
plt.scatter(open_facility_x, open_facility_y, c="black", marker="s", alpha=0.9, s=22)

for j, i in assignments:
    plt.plot([facilities_x[j], centers_x[i]], [facilities_y[j], centers_y[i]], c="black", alpha=0.5)

plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("The Solution")
plt.show()

### Exercise

- Show the actual customer-facility assignments.

In [None]:
# map each customer to the facility that serves its cluster
# (each cluster has exactly one assigned facility in `assignments`)
cluster_to_facility = {i: j for (j, i) in assignments}
customer_assigns = [cluster_to_facility[int(i)] for i in memberships]

plt.figure(figsize=(8, 8))
plt.scatter(customers_x, customers_y, c=customer_assigns, alpha=0.30, s=0.5)
plt.scatter(
    open_facility_x, open_facility_y, c=open_facilities, marker="s", edgecolors="black", s=30
)


plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("The Solution")
plt.show()

- Consider a scenario where opening a facility incurs a fixed cost. Let's categorize candidate facility locations into two types: Downtown and Other. Downtown facilities are those located near the center, specifically within a square with a side length of 0.5 centered at the point $(0, 0)$. Other facilities are located outside this square. To incorporate the fixed cost into the model, we assign a cost of \$10K to each Downtown facility and a cost of \$2K to each Other facility. Reformulate the model to account for these fixed costs and find the optimal facility locations.
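One possible way to write the extended objective is shown here, assuming $f_j$ denotes the fixed cost of opening facility location $j$ (a symbol introduced only for this exercise, equal to 10,000 for Downtown locations and 2,000 for Other locations). The second term is the weighted distance summed, as in the Pyomo code, over the viable pairings:

$$
\min \ Z = \sum_{j \in J} f_j \, y_j + \sum_{(j,i) \in P} w_i \, d_{j,i} \, x_{j,i}
$$

The constraints stay the same. Because every open facility now adds to the objective, the solver may open fewer than $m$ facilities.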

In [None]:
l = 0.5 / 2
fixed_cost_downtown = 10_000
fixed_cost_other = 2_000


# define a function to return the fixed cost
def get_fixed_cost(j):
    if -l <= facilities_x[j] <= l and -l <= facilities_y[j] <= l:
        return fixed_cost_downtown
    else:
        return fixed_cost_other


# delete the current objective function
mod.del_component(mod.obj)

# create a new objective function
expr = sum(get_fixed_cost(j) * mod.y[j] for j in range(num_candidates))
expr += sum(weights[i] * pairings[j, i] * mod.x[j, i] for (j, i) in pairings.keys())
mod.obj = pyo.Objective(expr=expr, sense=pyo.minimize)

In [None]:
# re-solve the model with the new objective
results = opt.solve(mod, tee=True)

In [None]:
# extract the solution
open_facilities = [j for j in range(num_candidates) if pyo.value(mod.y[j]) > 0.5]
assignments = [(j, i) for (j, i) in pairings if pyo.value(mod.x[j, i]) > 0.5]

# print the number of open facilities
print("There are", len(open_facilities), "open facilities.")

In [None]:
# visualize the solution

open_facility_x = facility_locs[open_facilities][:, 0]
open_facility_y = facility_locs[open_facilities][:, 1]

plt.figure(figsize=(8, 8))
plt.scatter(customers_x, customers_y, c="g", alpha=0.15, s=0.4)
plt.scatter(centers_x, centers_y, c="b", marker="^", alpha=0.8, s=12, label="Centroids")
plt.scatter(facilities_x, facilities_y, c="r", marker="s", alpha=0.6, s=20)
plt.scatter(
    open_facility_x, open_facility_y, c="black", marker="s", alpha=0.9, s=22, label="Open facility"
)

for j, i in assignments:
    plt.plot([facilities_x[j], centers_x[i]], [facilities_y[j], centers_y[i]], c="black", alpha=0.5)

# draw a square around the center with length 2l
square_x = [-l, l, l, -l, -l]
square_y = [-l, -l, l, l, -l]

plt.plot(square_x, square_y, c="orange", alpha=0.75, linestyle="--", label="Downtown area")

plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("The Solution")
plt.legend()
plt.show()