## Biostat/Biomath M257 Homework 5
### Due May 26 @ 11:59PM
#### Sisi Shao

System information (for reproducibility):

In [1]:
versioninfo()

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores


Load packages:

In [2]:
using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()

[32m[1m  Activating[22m[39m project at `~/Downloads/SSS/Courses/Bio257_23Spring/HW/hw5`


[32m[1mStatus[22m[39m `~/Downloads/SSS/Courses/Bio257_23Spring/HW/hw5/Project.toml`
 [90m [1e616198] [39mCOSMO v0.8.7
 [90m [61c947e1] [39mClarabel v0.5.0
 [90m [f65535da] [39mConvex v0.15.3
 [90m [a93c6f00] [39mDataFrames v1.5.0
 [90m [60bf3e95] [39mGLPK v1.1.2
 [90m [2e9cd046] [39mGurobi v1.0.1
 [90m [87dc4568] [39mHiGHS v1.5.1
 [90m [b99e6be6] [39mHypatia v0.7.3
 [90m [4076af6c] [39mJuMP v1.11.1
 [90m [67920dd8] [39mKNITRO v0.13.2
 [90m [b8f27783] [39mMathOptInterface v1.16.0
 [90m [1ec41992] [39mMosekTools v0.15.0
 [90m [2f354839] [39mPajarito v0.8.2
 [90m [08abe8d2] [39mPrettyTables v2.2.4
 [90m [c946c3f1] [39mSCS v1.1.4
 [90m [3eaba693] [39mStatsModels v0.7.2


In this exercise, we practice using disciplined convex programming (SDP in particular) to solve optimal design problems.

## Introduction to optimal design

Consider a linear model
\begin{eqnarray*}
	y_i = \mathbf{x}_i^T \boldsymbol{\beta} + \epsilon_i, \quad i = 1,\ldots, n,
\end{eqnarray*}
where $\epsilon_i$ are independent Gaussian noises with common variance $\sigma^2$. It is well known that the least squares estimate $\hat{\boldsymbol{\beta}}$ is unbiased and has covariance $\sigma^2 (\sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^T)^{-1}$. 

In **exact optimal design**, given a total of $n$ allowable experiments, we want to choose among a list of $m$ candidate design points $\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$ such that the covariance matrix is minimized in some sense. In mathematical terms, we want to find an integer vector $\mathbf{n} = (n_1, \ldots, n_m)$ such that $n_i \ge 0$, $\sum_{i=1}^m n_i = n$, and the matrix $\mathbf{V} = \left( \sum_{i=1}^m n_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}$ is "small".

In **approximate optimal design**,  we want to find a probability vector $\mathbf{p} = (p_1, \ldots, p_m)$ such that $p_i \ge 0$, $\sum_{i=1}^m p_i = 1$, and the matrix $\mathbf{V} = \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}$ is "small".

Commonly used optimal design criteria include:

- In **$D$-optimal design**, we minimize the determinant of $\mathbf{V}$
\begin{eqnarray*}
	&\text{minimize}& \det \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\
	&\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1.
\end{eqnarray*}

- In **$E$-optimal design**, we minimize the spectral norm, i.e., the maximum eigenvalue of $\mathbf{V}$
\begin{eqnarray*}
	&\text{minimize}& \lambda_{\text{max}} \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\
	&\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1.	
\end{eqnarray*}
Statistically we are minimizing the maximum variance $\text{var}\left(\sum_{j=1}^p a_j \hat \beta_j\right)$ over all vectors $\mathbf{a}$ with unit norm.

- In **$A$-optimal design**, we minimize the trace of $\mathbf{V}$
\begin{eqnarray*}
	&\text{minimize}& \text{tr} \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\
	&\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1.
\end{eqnarray*}
Statistically we are minimizing the total variance $\sum_{j=1}^p \text{var}(\hat \beta_j)$.
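As a minimal sketch of how the three criteria can be evaluated for a given weight vector, the helper below (a hypothetical function `design_criteria`, not part of the assignment) forms the information matrix and returns the determinant, largest eigenvalue, and trace of $\mathbf{V}$. The toy candidate set, simple linear regression at $x \in \{-1, 0, 1\}$, is an assumption chosen so the numbers can be checked by hand.

```julia
using LinearAlgebra

# Hypothetical helper (not from the assignment): evaluate the D-, E-, and
# A-criteria for weights p over the rows of a candidate matrix X.
function design_criteria(X::AbstractMatrix, p::AbstractVector)
    M = Symmetric(X' * Diagonal(p) * X)   # information matrix Σ pᵢ xᵢ xᵢᵀ
    V = inv(M)                            # covariance up to the factor σ²
    return (D = det(V), E = eigmax(V), A = tr(V))
end

# Toy candidate set: intercept plus one covariate at x = -1, 0, 1.
X_toy = [1.0 -1.0; 1.0 0.0; 1.0 1.0]

crit_ends    = design_criteria(X_toy, [0.5, 0.0, 0.5])   # weight on the endpoints only
crit_uniform = design_criteria(X_toy, fill(1 / 3, 3))    # equal weight on all points
```

Here the endpoint design gives $\mathbf{V} = \mathbf{I}$, so all three criteria (1, 1, 2) are smaller than under uniform weights (1.5, 1.5, 2.5).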

## Q1 (10 pts) 3x4 factorial design

A drug company asks you to help design a two-factor clinical trial, in which treatment A has three levels (A1, A2, and A3) and treatment B has four levels (B1, B2, B3, and B4). The drug company also tells you that the treatment combination A3:B4 has undesirable side effects, so we ignore this design point.

Using dummy coding with A1 and B1 as the baseline levels, find the matrix $C$ with each row a unique design point.

## Sol Q1

- Define factors A and B

In [3]:
using Convex, LinearAlgebra, COSMO

fac_A = ["A1", "A2", "A3"]
fac_B = ["B1", "B2", "B3", "B4"]
ref_A = "A1"; ref_B = "B1"
fac_design = [(a, b) for a in fac_A, b in fac_B if !(a == "A3" && b == "B4")]

row  = zeros(Int, length(fac_A) + length(fac_B) - 2)
nrow = length(fac_design)
C = zeros(Int, nrow, length(row));

- Compute 3x4 factorial design

In [4]:
for i in 1:nrow
    # reset row
    row .= zeros(Int, length(fac_A) + length(fac_B) - 2)
    
    # dummy coding for A
    if fac_design[i][1] == ref_A 
        row[1:length(fac_A) - 1] .= 0
    else
        row[findfirst(x -> x == fac_design[i][1], fac_A[2:3])] = 1
    end
    
    # dummy coding for B
    if fac_design[i][2] == ref_B
        row[length(fac_A):end] .= 0
    else
        row[findfirst(x -> x == fac_design[i][2], fac_B[2:4]) + length(fac_A) - 1] = 1
    end
    
    C[i, :] = row
end

# add intercepts
C = [ones(Int, nrow) C]

11×6 Matrix{Int64}:
 1  0  0  0  0  0
 1  1  0  0  0  0
 1  0  1  0  0  0
 1  0  0  1  0  0
 1  1  0  1  0  0
 1  0  1  1  0  0
 1  0  0  0  1  0
 1  1  0  0  1  0
 1  0  1  0  1  0
 1  0  0  0  0  1
 1  1  0  0  0  1
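As a sanity check on the matrix above, the sketch below rebuilds the same dummy coding in a more compact way and confirms that it has full column rank, which is what makes the information matrix invertible under positive weights. The integer level codes and the names `dummy_row` and `C_check` are assumptions introduced here for illustration.

```julia
using LinearAlgebra

# Levels coded as integers (assumption): A ∈ 1:3, B ∈ 1:4, with A3:B4 excluded.
# A varies fastest, matching the row order of the matrix above.
pts = [(a, b) for b in 1:4 for a in 1:3 if !(a == 3 && b == 4)]

# Intercept, indicators for A2 and A3, then indicators for B2, B3, and B4.
dummy_row(a, b) = [1, a == 2, a == 3, b == 2, b == 3, b == 4]

# Stack the rows into an 11×6 integer matrix.
C_check = permutedims(reduce(hcat, [dummy_row(a, b) for (a, b) in pts]))

full_rank = rank(C_check) == size(C_check, 2)
```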

## Q2 (30 pts) Find approximate optimal designs

Use semidefinite programming (SDP) software to find the approximate D-, E-, and A-optimal designs for this clinical trial.

Hint: This is what I got, which may or may not be correct.

```
Approximate Optimal Design
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ design_pt │   D_opt │   E_opt │   A_opt │ D_opt_n │ E_opt_n │ A_opt_n │
│    String │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│      A1B1 │   0.082 │   0.272 │   0.200 │       8 │      27 │      20 │
│      A2B1 │   0.082 │   0.152 │   0.101 │       8 │      15 │      10 │
│      A3B1 │   0.097 │   0.114 │   0.104 │      10 │      11 │      10 │
│      A1B2 │   0.082 │   0.057 │   0.086 │       8 │       6 │       9 │
│      A2B2 │   0.082 │   0.039 │   0.051 │       8 │       4 │       5 │
│      A3B2 │   0.097 │   0.057 │   0.068 │      10 │       6 │       7 │
│      A1B3 │   0.082 │   0.057 │   0.086 │       8 │       6 │       9 │
│      A2B3 │   0.082 │   0.039 │   0.051 │       8 │       4 │       5 │
│      A3B3 │   0.097 │   0.057 │   0.068 │      10 │       6 │       7 │
│      A1B4 │   0.109 │   0.081 │   0.106 │      11 │       8 │      11 │
│      A2B4 │   0.109 │   0.073 │   0.080 │      11 │       7 │       8 │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
```

## Sol Q2

### D-optimality

In [5]:
# define the optimization problem

# Design Measure
p̂ = Variable(length(fac_design))   

# Define D-optimal Design
D_optim = maximize(logdet(C' * diagm(p̂) * C)) 
D_optim.constraints += sum(p̂) == 1 
D_optim.constraints += p̂ >= 0;     
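The `logdet` objective is the D-criterion in an equivalent form. Writing $\mathbf{M}(\mathbf{p})$ for the information matrix (notation introduced here for convenience, not used in the assignment), a short derivation is:

$$
\det \mathbf{V} = \det \mathbf{M}(\mathbf{p})^{-1} = \frac{1}{\det \mathbf{M}(\mathbf{p})}, \qquad \mathbf{M}(\mathbf{p}) = \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T = C^T \text{diag}(\mathbf{p}) \, C.
$$

Minimizing $\det \mathbf{V}$ is therefore the same as maximizing $\log \det \mathbf{M}(\mathbf{p})$, which is concave in $\mathbf{p}$ because $\mathbf{M}(\mathbf{p})$ is linear in $\mathbf{p}$ and $\log \det$ is concave on positive definite matrices.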

In [6]:
solver = COSMO.Optimizer()
solve!(D_optim, solver)

------------------------------------------------------------------
          COSMO v0.8.7 - A Quadratic Objective Conic Solver
                         Michael Garstka
                University of Oxford, 2017 - 2022
------------------------------------------------------------------

Problem:  x ∈ R^{93},
          constraints: A ∈ R^{262x93} (283 nnz),
          matrix size to factor: 355x355,
          Floating-point precision: Float64
Sets:     ZeroSet of dim: 134
          DensePsdConeTriangle of dim: 78 (12x12)
          PsdConeTriangle of dim: 15 (5x5)
          Nonnegatives of dim: 11
          PsdConeTriangle of dim: 6 (3x3)
          ... and 6 more
Decomp:   Num of original PSD cones: 2
          Num decomposable PSD cones: 1
          Num PSD cones after decomposition: 3
          Merge Strategy: CliqueGraphMerge
Settings: ϵ_abs = 1.0e-05, ϵ_rel = 1.0e-05,
          ϵ_prim_inf = 1.0e-04, ϵ_dual_inf = 1.0e-04,
          ρ = 0.1, σ = 1e-06, α = 1.6,
          max_iter = 5000,


In [7]:
@show D_optim.status
p̂.value

D_optim.status = MathOptInterface.OPTIMAL


11×1 Matrix{Float64}:
 0.08198367711225783
 0.0819519977897672
 0.09679368618602704
 0.08195112396292145
 0.0819656232427522
 0.09682788824260888
 0.08193513776281283
 0.0819376761598751
 0.09680687898486838
 0.1089113098093264
 0.10893491129540801
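The `D_opt_n` column in the hint table appears to be the approximate weights scaled to $n = 100$ and rounded. A minimal sketch of that step, with the weights above copied to five decimals (the names `p_D` and `n_D` are introduced here for illustration):

```julia
# Approximate D-optimal weights from the solver output above, rounded to 5 decimals.
p_D = [0.08198, 0.08195, 0.09679, 0.08195, 0.08197, 0.09683,
       0.08194, 0.08194, 0.09681, 0.10891, 0.10893]

# Scale to n = 100 runs and round to the nearest integer.
n_D = round.(Int, 100 .* p_D)

total_runs = sum(n_D)
```

In this case the rounded counts happen to sum to 100. Naive rounding does not guarantee that in general, which is one reason the exact design in Q3 is posed as a mixed-integer problem.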

### E-optimality

In [8]:
# Define E-optimal design
E_optim = maximize(eigmin(C' * diagm(p̂) * C))   
E_optim.constraints += sum(p̂) == 1 
E_optim.constraints += p̂ >= 0;     
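Maximizing `eigmin` of the information matrix is equivalent to minimizing $\lambda_{\text{max}}$ of its inverse, because the eigenvalues of $\mathbf{V}$ are the reciprocals of those of the information matrix. A quick numerical sketch on a toy $2 \times 2$ matrix (an assumption for illustration, not the design in this problem):

```julia
using LinearAlgebra

# Toy positive definite "information matrix" (illustrative only).
M_toy = Symmetric([2.0 1.0; 1.0 2.0])   # eigenvalues 1 and 3
V_toy = inv(M_toy)                      # eigenvalues 1 and 1/3

# The E-criterion λ_max(V) equals 1 / λ_min(M).
lhs = eigmax(V_toy)
rhs = 1 / eigmin(M_toy)
```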

In [9]:
solver = COSMO.Optimizer();
solve!(E_optim, solver);

@show E_optim.status
p̂.value

------------------------------------------------------------------
          COSMO v0.8.7 - A Quadratic Objective Conic Solver
                         Michael Garstka
                University of Oxford, 2017 - 2022
------------------------------------------------------------------

Problem:  x ∈ R^{16},
          constraints: A ∈ R^{49x16} (82 nnz),
          matrix size to factor: 65x65,
          Floating-point precision: Float64
Sets:     ZeroSet of dim: 17
          PsdConeTriangle of dim: 15 (5x5)
          Nonnegatives of dim: 11
          PsdConeTriangle of dim: 6 (3x3)
Decomp:   Num of original PSD cones: 1
          Num decomposable PSD cones: 1
          Num PSD cones after decomposition: 2
          Merge Strategy: CliqueGraphMerge
Settings: ϵ_abs = 1.0e-05, ϵ_rel = 1.0e-05,
          ϵ_prim_inf = 1.0e-04, ϵ_dual_inf = 1.0e-04,
          ρ = 0.1, σ = 1e-06, α = 1.6,
          max_iter = 5000,
          scaling iter = 10 (on),
          check termination every 25 iter,
   

11×1 Matrix{Float64}:
 0.3104287228983093
 0.11303832307063377
 0.1149944925719544
 0.0824744597181178
 0.013874447817003251
 0.057497246293577824
 0.0824744597181007
 0.013874447817011186
 0.05749724629358936
 0.06855672612885096
 0.08528942769691288

### A-optimality

In [10]:
# Define A-optimal design
p̂ = Variable(length(fac_design))  
r = size(C, 2) # number of columns of C
Y = Variable(r, r) # Schur complement variable bounding the inverse information matrix

# Formulating the problem
A_optim = minimize(tr(Y))
A_optim.constraints += ([C' * diagm(p̂) * C I(r); I(r) Y] ⪰ 0)
A_optim.constraints += sum(p̂) == 1 
A_optim.constraints += p̂ >= 0 

# Solve the problem
solver = COSMO.Optimizer()
solve!(A_optim, solver);

# Show the A-optimal design
@show A_optim.status
p̂.value

------------------------------------------------------------------
          COSMO v0.8.7 - A Quadratic Objective Conic Solver
                         Michael Garstka
                University of Oxford, 2017 - 2022
------------------------------------------------------------------

Problem:  x ∈ R^{89},
          constraints: A ∈ R^{176x89} (208 nnz),
          matrix size to factor: 265x265,
          Floating-point precision: Float64
Sets:     ZeroSet of dim: 68
          PsdConeTriangle of dim: 36 (8x8)
          PsdConeTriangle of dim: 21 (6x6)
          PsdConeTriangle of dim: 15 (5x5)
          PsdConeTriangle of dim: 15 (5x5)
          ... and 2 more
Decomp:   Num of original PSD cones: 1
          Num decomposable PSD cones: 1
          Num PSD cones after decomposition: 5
          Merge Strategy: CliqueGraphMerge
Settings: ϵ_abs = 1.0e-05, ϵ_rel = 1.0e-05,
          ϵ_prim_inf = 1.0e-04, ϵ_dual_inf = 1.0e-04,
          ρ = 0.1, σ = 1e-06, α = 1.6,
          max_iter = 5000

11×1 Matrix{Float64}:
 0.19990413646111832
 0.10053100947958737
 0.10416514732446445
 0.08637869473209123
 0.05084998157162025
 0.06773931394443254
 0.08637596837979428
 0.05085019442857968
 0.06773908643899029
 0.10556850270663867
 0.07989796020701016
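The block constraint in the A-optimal formulation rests on the Schur complement. With $\mathbf{M} = C^T \text{diag}(\mathbf{p}) \, C$ (notation introduced here) and $\mathbf{M} \succ \mathbf{0}$, the argument can be sketched as:

$$
\begin{pmatrix}
\mathbf{M} & \mathbf{I} \\
\mathbf{I} & \mathbf{Y}
\end{pmatrix} \succeq \mathbf{0}
\iff
\mathbf{Y} - \mathbf{M}^{-1} \succeq \mathbf{0}
\implies
\text{tr}(\mathbf{Y}) \ge \text{tr}(\mathbf{M}^{-1}).
$$

Equality holds at $\mathbf{Y} = \mathbf{M}^{-1}$, so minimizing $\text{tr}(\mathbf{Y})$ subject to the block constraint minimizes $\text{tr}(\mathbf{M}^{-1}) = \text{tr}(\mathbf{V})$, and the constraint itself is linear in $(\mathbf{p}, \mathbf{Y})$.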

## Q3 (30 pts) Find exact optimal designs

Use mixed-integer semidefinite programming (SDP) software to find the exact D-, E-, and A-optimal designs for this clinical trial **with $n=100$**.

Hint: This is what I got. Apparently I haven't got the E-optimal design right yet.

```
Exact Optimal Design
┌───────────┬───────┬───────┬───────┐
│ design_pt │ D_opt │ E_opt │ A_opt │
│    String │ Int64 │ Int64 │ Int64 │
├───────────┼───────┼───────┼───────┤
│      A1B1 │     8 │    90 │    20 │
│      A2B1 │     8 │     1 │    10 │
│      A3B1 │    10 │     1 │    10 │
│      A1B2 │     8 │     1 │     9 │
│      A2B2 │     8 │     1 │     5 │
│      A3B2 │    10 │     1 │     7 │
│      A1B3 │     8 │     1 │     9 │
│      A2B3 │     8 │     1 │     5 │
│      A3B3 │    10 │     1 │     7 │
│      A1B4 │    11 │     1 │    10 │
│      A2B4 │    11 │     1 │     8 │
└───────────┴───────┴───────┴───────┘
```

## Sol Q3

### D-optimality

In [11]:
using Pajarito, MathOptInterface, JuMP, Gurobi, MosekTools

# Optimization problem
n̂ = Variable(length(fac_design), IntVar)

# D-optimality (exact)
D_exact = maximize(logdet(C' * diagm(n̂) * C)) 
D_exact.constraints += sum(n̂) == 100
D_exact.constraints += n̂ >= 0;      

solver = Pajarito.Optimizer()
MOI = MathOptInterface
MOI.set(solver, MOI.RawOptimizerAttribute("oa_solver"), 
        optimizer_with_attributes(Gurobi.Optimizer, MOI.Silent() => true))
MOI.set(solver, MOI.RawOptimizerAttribute("conic_solver"), 
        optimizer_with_attributes(Mosek.Optimizer, MOI.Silent() => true))
MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), false)

solve!(D_exact, solver)
n̂.value

Set parameter Username
Academic license - for non-commercial use only - expires 2024-05-26


[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:441[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:441[39m


11×1 Matrix{Float64}:
  8.0
  8.0
 10.0
  8.0
  8.0
 10.0
  8.0
  8.0
 10.0
 11.0
 11.0

### E-optimality

In [12]:
# E-optimality
E_exact = maximize(eigmin(C' * diagm(n̂) * C))
# problem constraints
E_exact.constraints += sum(n̂) == 100
E_exact.constraints += n̂ >= 0

# Solve for it!
solver = Pajarito.Optimizer()
MOI = MathOptInterface
MOI.set(solver, MOI.RawOptimizerAttribute("oa_solver"), 
        optimizer_with_attributes(Gurobi.Optimizer, MOI.Silent() => true))
MOI.set(solver, MOI.RawOptimizerAttribute("conic_solver"), 
        optimizer_with_attributes(Mosek.Optimizer, MOI.Silent() => true))
MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), false)

solve!(E_exact, solver)
n̂.value

Set parameter Username
Academic license - for non-commercial use only - expires 2024-05-26
new incumbent
new incumbent
new incumbent
new incumbent


11×1 Matrix{Float64}:
 34.0
 10.0
 10.0
  8.0
  4.0
  4.0
  9.0
 -0.0
  6.0
  9.0
  6.0

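- The answer seems to be correct: the counts are nonnegative, sum to 100, and are broadly in line with the approximate E-optimal weights from Q2 (unlike the hint, which puts 90 runs on A1B1).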

### A-optimality

In [13]:
# A-optimality

r = size(C, 2)
Y = Variable(r, r) 
A_exact = minimize(tr(Y)) 
# problem constraints
A_exact.constraints += ([C' * diagm(n̂) * C I(r); I(r) Y] ⪰ 0)
A_exact.constraints += sum(n̂) == 100 
A_exact.constraints += n̂ >= 0;

In [14]:
solver = Pajarito.Optimizer()
MOI = MathOptInterface
MOI.set(solver, MOI.RawOptimizerAttribute("oa_solver"), 
        optimizer_with_attributes(Gurobi.Optimizer, MOI.Silent() => true))
MOI.set(solver, MOI.RawOptimizerAttribute("conic_solver"), 
        optimizer_with_attributes(Mosek.Optimizer, MOI.Silent() => true))
MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), false)

solve!(A_exact, solver)
n̂.value

Set parameter Username
Academic license - for non-commercial use only - expires 2024-05-26


[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m


norm of dual is 1.4714835850808552e16


[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pa

norm of dual is 1.2138997659078474e16


[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m
[33m[1m└ [22m[39m[90m@ Pajarito ~/.julia/packages/Pajarito/gSNvz/src/algorithms.jl:396[39m


new incumbent


11×1 Matrix{Float64}:
 20.0
 10.0
 10.0
  8.0
  5.0
  7.0
  9.0
  5.0
  7.0
 11.0
  8.0

- The answer seems to be correct: the counts sum to 100 and match the hint except at A1B2 (8 vs. 9) and A1B4 (11 vs. 10).

## Q4 (30 bonus points) Optimal design with nuisance parameters

Suppose the regression coefficient vector $\boldsymbol{\beta}$ of the linear model is partitioned as $\boldsymbol{\beta} = (\boldsymbol{\beta}_0^T, \boldsymbol{\beta}_1^T)^T$, where $\boldsymbol{\beta}_0$ are nuisance parameters and $\boldsymbol{\beta}_1$ are parameters of primary interest. Given an approximate design $\mathbf{p} = (p_1, \ldots, p_m)$, let the information matrix be partitioned accordingly
$$
\mathbf{I}(\mathbf{p}) = \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T =  \begin{pmatrix}
\mathbf{I}_{00} & \mathbf{I}_{01} \\
\mathbf{I}_{10} & \mathbf{I}_{11}
\end{pmatrix}.
$$
Then the information matrix for $\boldsymbol{\beta}_1$ adjusted for nuisance parameter $\boldsymbol{\beta}_0$ is
$$
\mathbf{I}_{1 \mid 0}(\mathbf{p}) = \mathbf{I}_{11} - \mathbf{I}_{10} \mathbf{I}_{00}^{-1} \mathbf{I}_{01}.
$$

Revisiting the 3x4 factorial design problem in Q1, suppose the drug company only cares about the estimation of A treatment effects. Find the approximate D-, E-, and A-optimal designs.
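As a starting point, the adjusted information matrix $\mathbf{I}_{1 \mid 0}(\mathbf{p})$ can be computed numerically for any fixed design. The sketch below is not a solution to Q4: the helper `adjusted_information` and the toy regression example are assumptions introduced for illustration, and the optimization over $\mathbf{p}$ is left out.

```julia
using LinearAlgebra

# Hypothetical helper: I_{1|0} = I11 - I10 * inv(I00) * I01, where idx1 holds
# the columns of X belonging to the parameters of primary interest.
function adjusted_information(X::AbstractMatrix, p::AbstractVector,
                              idx1::AbstractVector{<:Integer})
    M = X' * Diagonal(p) * X                     # full information matrix
    idx0 = setdiff(1:size(X, 2), idx1)           # nuisance-parameter columns
    I00, I01 = M[idx0, idx0], M[idx0, idx1]
    I10, I11 = M[idx1, idx0], M[idx1, idx1]
    return Symmetric(I11 - I10 * (I00 \ I01))
end

# Toy example: intercept (nuisance) and slope (of interest) at x = -1, 0, 1.
X_toy = [1.0 -1.0; 1.0 0.0; 1.0 1.0]
p_toy = [0.5, 0.5, 0.0]

I_adj  = adjusted_information(X_toy, p_toy, [2])
M_full = X_toy' * Diagonal(p_toy) * X_toy
```

The inverse of `I_adj` equals the slope block of `inv(M_full)`, that is, the covariance of $\hat{\boldsymbol{\beta}}_1$ up to $\sigma^2$. For the design matrix built in Q1, the A treatment effects would correspond to `idx1 = 2:3`, assuming the column order intercept, A2, A3, B2, B3, B4.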