# Robust Optimization for Genetic Selection

This notebook provides a didactic for using Python to solve the multi-objective optimization problem which arises in the context of robust genetic selection. It has been tested with Python 3.10 specifically and there are some standard packages which this depends on, imported below.

In [1]:
import numpy as np                  # defines matrix structures
from qpsolvers import solve_qp      # used for quadratic optimization
import gurobipy as gp               # Gurobi optimization interface (1)
from gurobipy import GRB            # Gurobi optimization interface (2)

Utility functions and output settings used in this notebook are defined in the two cells below.

In [2]:
# want to round rather than truncate when printing
np.set_printoptions(threshold=np.inf)

# only show numpy output to five decimal places
np.set_printoptions(formatter={'float_kind':"{:.5f}".format})

## Real (Simulated) Data

Now we've looked at how these problems can be approached with Gurobi, we can try an example with realistic simulated data. Here we have data for a cohort of 50 candidates split across three data files, each containing space-separated values.

1. In `A50.txt` the matrix $\Sigma$ is described, with columns for $i$, $j$, and $a_{ij}$, where only the upper triangle is described since the matrix is symmetric.
2. In `EBV50.txt` the vector $\bar{\mu}$ is described, with only one column containing the posterior mean over 1000 samples.
3. In `S50.txt` the matrix $\Omega$ is described, with columns for $i$, $j$, and $\sigma_{ij}$, where only the upper triangle is described since the matrix is symmetric.

Note that this particular problem does not contain a separate file for sex data; odd indexed candidates are sires and even indexed candidates are dams. For now, we also don't have a $l$ or $u$ bounding the possible weights.


The first step then is clearly going to be loading the matrices in particular from file. We create the following function to do so.


In [3]:
def load_problem(A_filename, E_filename, S_filename, dimension=False):
    """
    Used to load genetic selection problems into NumPy. It takes three
    string inputs for filenames where Sigma, Mu, and Omega are stored,
    as well as an optional integer input for problem dimension if this
    is known. If it's know know, it's worked out based on E_filename.

    As output, it returns (A, E, S, n), where A and S are n-by-n NumPy
    arrays, E is a length n NumPy array, and n is an integer.
    """

    def load_symmetric_matrix(filename, dimension):
        """
        Since NumPy doesn't have a stock way to load matrices
        stored in coordinate format format, this adds one.
        """

        matrix = np.zeros([dimension, dimension])

        with open(filename, 'r') as file:
            for line in file:
                i, j, entry = line.split(" ")
                # data files indexed from 1, not 0
                matrix[int(i)-1, int(j)-1] = entry
                matrix[int(j)-1, int(i)-1] = entry

        return matrix


    # if dimension wasn't supplied, need to find that
    if not dimension:
        # get dimension from EBV, since it's the smallest file
        with open(E_filename, 'r') as file:
            dimension = sum(1 for _ in file)

    # EBV isn't in coordinate format so can be loaded directly
    E = np.loadtxt(E_filename)  
    # A and S are stored by coordinates so need special loader
    A = load_symmetric_matrix(A_filename, dimension)
    S = load_symmetric_matrix(S_filename, dimension)

    return A, E, S, dimension

For the given example problem we can now solve it with Gurobi using the exact same methods as before, for both the standard genetic selection problem and the robust version of the problem. The following cell does both alongside each other to accentuate the differences.

In [4]:
sigma, mubar, omega, n = load_problem(
    "../Example/A50.txt",
    "../Example/EBV50.txt",
    "../Example/S50.txt",
    50)

lam = 0.5
kappa = 2

# define the M so that column i is [1;0] if i is a sire (so even) and [0;1] otherwise 
M = np.zeros((2, n))
M[0, range(0,50,2)] = 1
M[1, range(1,50,2)] = 1
# define the right hand side of the constraint Mx = m
m = np.array([[0.5], [0.5]])

# create models for standard and robust genetic selection
model_std = gp.Model("n50standard")
model_rbs = gp.Model("n50robust")

# initialise w for both models, z for robust model
w_std = model_std.addMVar(shape=n, vtype=GRB.CONTINUOUS, name="w") 
w_rbs = model_rbs.addMVar(shape=n, vtype=GRB.CONTINUOUS, name="w")
z_rbs = model_rbs.addVar(name="z")

# define the objective functions for both models
model_std.setObjective(
    0.5*w_std@(sigma@w_std) - lam*w_std.transpose()@mubar,
GRB.MINIMIZE)

model_rbs.setObjective(
    # Gurobi does offer a way to set one objective in terms of another, i.e.
    # we could use `model_std.getObjective() - lam*kappa*z_rbs` to define this
    # robust objective, but it results in a significant slowdown in code.
    0.5*w_rbs@(sigma@w_rbs) - lam*w_rbs.transpose()@mubar - lam*kappa*z_rbs,
GRB.MINIMIZE)

# add sum-to-half constraints to both models
model_std.addConstr(M @ w_std == m, name="sum-to-half")
model_rbs.addConstr(M @ w_std == m, name="sum-to-half")

# add quadratic uncertainty constraint to the robust model
model_rbs.addConstr(z_rbs**2 <= np.inner(w_rbs, omega@w_rbs), name="uncertainty")
model_rbs.addConstr(z_rbs >= 0, name="z positive")

# since working with non-trivial size, set a time limit
time_limit = 60*5  # 5 minutes
model_std.setParam(GRB.Param.TimeLimit, time_limit)
model_std.setParam(GRB.Param.TimeLimit, time_limit)

# for the same reason, also set a duality gap tolerance
duality_gap = 0.009
model_std.setParam('MIPGap', duality_gap)
model_rbs.setParam('MIPGap', duality_gap)

# solve both problems with Gurobi
model_std.optimize()
model_rbs.optimize()

# HACK code which prints the results for comparison in a nice format
print("\nSIRE WEIGHTS\t\t\t DAM WEIGHTS")
print("-"*20 + "\t\t " + "-"*20)
print(" i   w_std    w_rbs\t\t  i   w_std    w_rbs")
for candidate in range(25):
    print(f"{candidate*2:02d}  {w_std.X[candidate*2]:.5f}  {w_rbs.X[candidate*2]:.5f} \
            {candidate*2+1:02d}  {w_std.X[candidate*2+1]:.5f}  {w_rbs.X[candidate*2+1]:.5f}")

Set parameter Username
Academic license - for non-commercial use only - expires 2025-02-26
Set parameter TimeLimit to value 300
Set parameter MIPGap to value 0.009
Set parameter MIPGap to value 0.009
Gurobi Optimizer version 11.0.0 build v11.0.0rc2 (linux64 - "Ubuntu 22.04.4 LTS")

CPU model: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz, instruction set [SSE2|AVX|AVX2]
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 4 rows, 50 columns and 100 nonzeros
Model fingerprint: 0x534ff85e
Model has 1275 quadratic objective terms
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [7e-01, 1e+00]
  QObjective range [5e-02, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [5e-01, 5e-01]
Presolve removed 2 rows and 0 columns
Presolve time: 0.01s
Presolved: 2 rows, 50 columns, 50 nonzeros
Presolved model has 1275 quadratic objective terms
Ordering time: 0.00s

Barrier statistics:
 Free vars  : 49
 AA' NZ     : 1.274e