# Applying the optimization to any number n of features

The Iris data set is a well-known small sample of petal and sepal measurements that allow three different iris varieties to be distinguished. Because it contains few points and needs only a very small k, it meets this algorithm's requirement of a reasonably small instance.

In [1]:
import gurobipy as gp
from gurobipy import GRB
import matplotlib.pyplot as plt
from sklearn.metrics import DistanceMetric
import pandas as pd # feels slightly more flexible than np for parsing matrices

## Reading the data, defining the number of features and clusters

In [2]:
def read_data(filename):
    points = pd.read_csv(filename)
    points = points.iloc[:, :-1] # last column is species name
    return points[:-1]
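
Whether the first line of the file is a header matters for the number of points that end up in the model, because `pd.read_csv` treats the first line as column names by default. The sketch below uses a small in-memory sample instead of the real file to show the difference. It assumes a header-less layout of four measurements followed by the species name; whether `data_sets/iris.data` looks like this is an assumption, not something the notebook states.

```python
import io

import pandas as pd

# Illustrative sample in the assumed layout: four measurements, then the species name.
sample = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

# Default behaviour: the first line is consumed as the header.
with_default = pd.read_csv(io.StringIO(sample))

# header=None keeps every line as a data row.
with_no_header = pd.read_csv(io.StringIO(sample), header=None)

rows_default = len(with_default)
rows_no_header = len(with_no_header)

# Dropping the last column removes the species name, as in read_data.
features = with_no_header.iloc[:, :-1]

print(rows_default, rows_no_header, features.shape)
```

If the real file has no header, `read_data` as written would lose the first sample to the header and the last one to the `[:-1]` slice. That would be consistent with the 148 continuous variables reported in the solver log further down, one `R` variable per point.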

In [3]:
filename = "data_sets/iris.data"
points = read_data(filename)
k = 3
n = len(points.columns)
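
The formulation built in the next section creates one `R` and one `L` variable per point and one `Y` variable per (center, point) pair, so the model grows quadratically with the number of points. A minimal sketch of that count follows; the helper name `model_size` is hypothetical and only serves this illustration.

```python
def model_size(num_points):
    """Count variables and constraints of the formulation for num_points points."""
    variables = (
        num_points                  # R: one radius per candidate center
        + num_points                # L: one open/closed flag per candidate center
        + num_points * num_points   # Y: one assignment flag per (center, point) pair
    )
    constraints = (
        num_points                  # coverage: one per point
        + num_points * num_points   # within_radius: one per (center, point) pair
        + num_points * num_points   # open_center: one per (center, point) pair
        + 1                         # select exactly k centers
    )
    return variables, constraints


print(model_size(148))  # (22200, 43957)
```

For 148 points this gives 22200 columns and 43957 rows, which matches the model dimensions in the Gurobi log below. It also shows why the approach is limited to reasonably small data sets.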

## Computing the LP

In [None]:
# distance function
dist = DistanceMetric.get_metric('euclidean')

# we need all distances as possible radii
radii = dist.pairwise(points)

# model
m = gp.Model("kmsr")

# variables
R = m.addVars(len(radii), vtype=GRB.CONTINUOUS, lb=0, name="R")
L = m.addVars(len(radii), vtype=GRB.BINARY, name="L")
Y = m.addVars(len(radii), len(radii), vtype=GRB.BINARY, name="Y")

# objective function: minimize the sum of the radii R[i] over all candidate centers i
m.setObjective(gp.quicksum(R[i] for i in range(len(radii))), GRB.MINIMIZE)

# constraints:
# every point covered
for j in range(len(radii)):
    m.addConstr(gp.quicksum(Y[i, j] for i in range(len(radii))) >= 1, f"coverage_{j}")

# covered point has to be within radius (easier to check in two steps than in the first constraint)
for i in range(len(radii)):
    for j in range(len(radii)):
        m.addConstr(radii[i][j] * Y[i, j] <= R[i], f"within_radius_{i}_{j}")

# if a point is covered by a center, that center must be open
for i in range(len(radii)):
    for j in range(len(radii)):
        m.addConstr(Y[i, j] <= L[i], f"open_center_{i}_{j}")

# there can only be k centers open at a time
m.addConstr(gp.quicksum(L[i] for i in range(len(radii))) == k, "select_k_Centers")

# run gurobi optimizer
m.optimize()

# binary variables come back as floats, so test against 0.5 instead of == 1
open_centers = [i for i in range(len(radii)) if L[i].X > 0.5]
final_centers = [points.iloc[i] for i in open_centers]
final_radii = [R[i].X for i in open_centers]

print("Optimal centers:")
for center, radius in zip(final_centers, final_radii):
    print(f"Center at point {center} with radius {radius}")

Set parameter Username
Set parameter LicenseID to value 2629995
Academic license - for non-commercial use only - expires 2026-03-01
Gurobi Optimizer version 12.0.1 build v12.0.1rc0 (win64 - Windows 11.0 (26100.2))

CPU model: AMD Ryzen 7 5800X3D 8-Core Processor, instruction set [SSE2|AVX|AVX2]
Thread count: 8 physical cores, 16 logical processors, using up to 16 threads

Optimize a model with 43957 rows, 22200 columns and 109512 nonzeros
Model fingerprint: 0x9c558e9a
Variable types: 148 continuous, 22052 integer (22052 binary)
Coefficient statistics:
  Matrix range     [1e-01, 7e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 3e+00]
Presolve removed 1187 rows and 592 columns
Presolve time: 0.30s
Presolved: 42770 rows, 21608 columns, 106856 nonzeros
Variable types: 148 continuous, 21460 integer (21460 binary)
Found heuristic solution: objective 6.5901442
Found heuristic solution: objective 6.5314623
Found heuristic solution: objective

KeyError: 12
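
As an independent sanity check for a formulation like the one above, a tiny instance can be solved by exhaustive search and compared with the solver's result. The sketch below is not part of the notebook's method: the function `brute_force_kmsr` and the five sample points are made up for illustration. It tries every set of k centers and, for each center, every pairwise distance from that center as a candidate radius, which mirrors the "all distances as possible radii" idea.

```python
from itertools import combinations, product
from math import dist


def brute_force_kmsr(points, k):
    """Return (cost, centers, radii) minimizing the sum of radii by full enumeration."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    best = (float("inf"), None, None)
    for centers in combinations(range(n), k):
        # Candidate radii of center i are its distances to all points (including 0).
        candidates = [sorted(set(d[i])) for i in centers]
        for radii in product(*candidates):
            # Every point must lie within the radius of at least one chosen center.
            covered = all(
                any(d[i][j] <= r for i, r in zip(centers, radii))
                for j in range(n)
            )
            cost = sum(radii)
            if covered and cost < best[0]:
                best = (cost, centers, radii)
    return best


# Two well-separated groups; each can be covered with radius 1.
sample_points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
cost, centers, radii = brute_force_kmsr(sample_points, k=2)
print(cost, centers, radii)
```

The search is exponential in k and only practical for a handful of points, so it is meant for checking small test cases rather than for the full data set.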