## A Numerical Solver for the Dual Variance Problem

### Review of the Problem

This Jupyter notebook accompanies Section 3.6 of my PhD thesis and provides a method to compute the dual supremum explicitly.
In particular, we use the formulas in Proposition 3.20 and Corollary 3.21 to reduce this computation to a finite-dimensional optimization problem over $(\lambda,\eta)\in\mathbb{R}^2\times (-\infty,0).$

First, import some libraries:

In [123]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
import warnings
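# ComplexWarning lives in np.exceptions from NumPy 1.25 on and is no longer exposed as np.ComplexWarning in NumPy 2.0
ComplexWarning = getattr(getattr(np, "exceptions", np), "ComplexWarning")
warnings.filterwarnings("ignore", category=ComplexWarning)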

Next, we implement Equation (3.15) in Proposition 3.20 of my thesis which, given $x\in [0,\infty)\times (0,\infty)$ and $(\lambda,\eta)\in \mathbb{R}^2\times (-\infty,0)$, allows us to compute $$Y(x;\lambda,\eta)\in \operatorname{argmin}_{y\in[0,\infty)^2} c(x,y)-\varphi(x)-\lambda\cdot y-\eta|y-m|^2.$$

In [124]:
# Start by defining a few auxiliary functions
def func_A(x2: float, eta : float) -> float:
    return - 4 *  x2 * eta

def func_B1(x1 : float, x2: float, lda1 : float, eta : float, m1: float) -> float:
    return 2 * x1 + 2 * x2 * lda1 - 4 * x2 * eta * m1

def func_B2(x1 : float, x2: float, lda2 : float, eta : float,  m2 : float) -> float:
    return x1**2 - 2 * x2 * lda2 + 4 * x2 * eta * m2

def func_C(x1 : float, lda1 : float, lda2 : float, eta : float, m1: float, m2 : float) -> float:
    return (2 * eta * (m1 * x1 + m2) - (lda1 * x1 +lda2))/(2* eta * (x1**2 +1))

# Next, define the cost function
def func_c(x1: float, x2: float, y1: float, y2: float) -> float:
    if (x2 > 0) and (y2 > 0):
        return (y2/(2*x2))*(x1-y1/y2)**2
    elif x1 * y2 - y1 == 0:
        return 0
    else: 
        return float('inf') 

# Define the largest solution of $g(u)=u^3+(2-B_2)u-B_1=0$
def func_u0(B1: float, B2: float) -> float:
    return np.max(np.roots([1, 0, 2-B2, -B1]))

# Compute $Y$ as in Proposition 3.20.
def func_Y(x1 : float, x2: float, lda1 : float, lda2 : float, eta : float, m1: float, m2 : float) -> np.ndarray:
    A = func_A(x2,eta)
    B1 = func_B1(x1, x2, lda1, eta, m1)
    B2 = func_B2(x1, x2, lda2, eta, m2)
    C = func_C(x1, lda1, lda2, eta, m1, m2)
    u0 = func_u0(B1,B2)
    if (x2 > 0) and (B1 > 0) and (4*B2 < B1**2):
        return ((u0**2 - B2)/A) * np.array([u0, 1])
    elif (x2 > 0) and (B1 <= 0) and (B2 < 0):
        return np.array([0,-B2/A])
    elif (x2 == 0) and (C > 0):
        return C*np.array([x1,1])
    else:
        return np.array([0,0])
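
`np.roots` returns a complex array whenever the cubic has a conjugate pair of roots, and `np.max` on a complex array compares real parts first. This is presumably why the printed results further down carry a `+0.j` suffix. The following is a minimal sketch of a variant that keeps only the real roots, assuming that "largest solution" means the largest real root. The helper name `largest_real_root` and the tolerance `imag_tol` are illustrative and do not come from the thesis.

```python
import numpy as np

def largest_real_root(B1: float, B2: float, imag_tol: float = 1e-9) -> float:
    """Largest real root of g(u) = u^3 + (2 - B2) u - B1 (hypothetical helper)."""
    roots = np.roots([1.0, 0.0, 2.0 - B2, -B1])
    # Keep the roots whose imaginary part is numerically zero.
    # A cubic with real coefficients always has at least one real root.
    is_real = np.abs(roots.imag) <= imag_tol * (1.0 + np.abs(roots))
    return float(np.max(roots.real[is_real]))

# Example: B1 = 1, B2 = 0.25 gives u^3 + 1.75 u - 1 = (u - 0.5)(u^2 + 0.5 u + 2),
# whose only real root is u = 0.5.
u0 = largest_real_root(1.0, 0.25)
print(u0)
```

The tolerance is a judgment call: near a repeated root the computed imaginary parts can be larger than `imag_tol`, so the threshold may need adjusting for such inputs.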
    

Now, we define a measure and compute key quantities. We encode discrete probability measures as `numpy` `ndarrays`, where each row has three entries -- the mass, the $x_1$ coordinate, and the $x_2$ coordinate. We also set a mean target and a variance target.

In [125]:
# Define one sample measure for which the dual supremum appears to be attained
mu_denorm_attain = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 2], [1, 2,2]], dtype = 'f')
# Define another sample measure for which the dual supremum appears not to be attained. 
mu_denorm_nonattain = np.array([[1, 2, 4]], dtype = 'f')
# Normalize copies of both measures, so that the unnormalized arrays are left untouched
mu_norm_attain = mu_denorm_attain.copy()
mu_norm_attain[:,0] = mu_norm_attain[:,0]/np.sum(mu_norm_attain[:,0])
mu_norm_nonattain = mu_denorm_nonattain.copy()
mu_norm_nonattain[:,0] = mu_norm_nonattain[:,0]/np.sum(mu_norm_nonattain[:,0])

# Set mean target
m = np.array([1,2])
# Set variance target
tau = 1.5
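
The normalization step can also be packaged as a small function that returns a new array and checks that the total mass is positive. This is only a sketch of the row encoding described above (mass, $x_1$, $x_2$); the name `normalize_measure` is an assumption made for illustration.

```python
import numpy as np

def normalize_measure(measure: np.ndarray) -> np.ndarray:
    """Return a copy whose first column (the masses) sums to 1."""
    normalized = np.array(measure, dtype=float)  # np.array copies its input by default
    total_mass = normalized[:, 0].sum()
    if total_mass <= 0:
        raise ValueError("total mass must be positive")
    normalized[:, 0] /= total_mass
    return normalized

atoms = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 2], [1, 2, 2]], dtype=float)
mu = normalize_measure(atoms)
print(mu[:, 0])     # masses of the normalized measure
print(atoms[:, 0])  # the input masses are left unchanged
```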

This choice of $Y,$ along with the presence of a prescribed mean, allows us to compute an optimal value of $\varphi$ as in Corollary 3.21:


In [126]:
def optimal_phi(x1: float, x2: float, lda1: float, lda2: float, eta: float) -> float:
    y1,y2 = func_Y(x1, x2, lda1, lda2, eta, m[0], m[1])
    return func_c(x1, x2, y1, y2) - lda1 * y1 - lda2 * y2 - eta * ((y1-m[0])**2 + (y2-m[1])**2) 


Next, we define a function to compute the dual objective function
$$\int \varphi\; d\mu+\lambda\cdot m+\eta\tau.$$

In [127]:
def objective_function(params, measure) -> float:
    lda1, lda2, theta = params
    eta = -np.exp(theta) # substitute eta = -exp(theta): theta is unconstrained while eta stays negative
    # return the negated dual objective, because scipy.optimize.minimize minimizes
    return (-sum([measure[i,0] * optimal_phi(measure[i, 1], measure[i, 2], lda1, lda2, eta) for i in range(len(measure))])
            - lda1 * m[0] - lda2 * m[1] - eta * tau )
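
The substitution $\eta=-e^{\theta}$ maps $\theta\in\mathbb{R}$ onto $(-\infty,0)$, so an unconstrained method such as Nelder-Mead can be applied directly. The sketch below shows the same pattern on a toy problem. The function `toy_dual` is made up for illustration and has nothing to do with the thesis objective; only the substitution and the sign flip mirror the cell above.

```python
import numpy as np
import scipy.optimize as opt

def toy_dual(eta: float) -> float:
    """Made-up concave function of eta with maximizer eta = -2."""
    return -(eta + 2.0) ** 2

def negated_toy_objective(params: np.ndarray) -> float:
    theta = params[0]
    eta = -np.exp(theta)   # strictly negative for every real theta
    return -toy_dual(eta)  # minimizing the negative maximizes toy_dual

result = opt.minimize(negated_toy_objective, [-1.0], method="Nelder-Mead")
eta_star = -np.exp(result.x[0])
print(eta_star)  # close to -2
```

One consequence of this substitution is that $\eta=0$ itself is never reached: the optimizer can only drive $\theta$ toward $-\infty$.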

We will also wish to compute the quantities $$\int Y(\cdot;\lambda,\eta)\; d\mu$$ and $$\int |Y(\cdot;\lambda,\eta)-m|^2\; d\mu$$ to verify if our result actually achieves the mean and variance constraints:

In [128]:
def mu_mean(params, measure) -> np.ndarray:
    lda1, lda2, theta = params
    eta = -np.exp(theta)
    return sum([measure[i,0] * func_Y(measure[i,1], measure[i,2], lda1, lda2, eta, m[0], m[1]) for i in range(len(measure))])

def mu_variance(params, measure) -> float:
    lda1, lda2, theta = params
    eta = -np.exp(theta)
    return sum([measure[i,0] * np.linalg.norm(func_Y(measure[i,1], measure[i,2], lda1, lda2, eta, m[0], m[1])-m)**2 for i in range(len(measure))])

Moreover, we compute $$\int c(x, Y(x;\lambda,\eta))d\mu(x)$$ in order to get a sense of the value of the primal infimum.

In [129]:
def primal_cost(params, measure) -> float:
    lda1, lda2, theta = params
    eta = -np.exp(theta)
    total = 0
    for i in range(len(measure)):
        # Evaluate Y once per atom and reuse both coordinates
        y1, y2 = func_Y(measure[i,1], measure[i,2], lda1, lda2, eta, m[0], m[1])
        total += measure[i,0] * func_c(measure[i,1], measure[i,2], y1, y2)
    return total

Finally, because we pass `scipy.optimize.minimize` an objective that takes only the parameter vector, we create a wrapper that fixes the measure.

In [130]:
def make_objective(measure):
    def wrapped_objective(params):
        return objective_function(params, measure)
    return wrapped_objective
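
As an alternative to the closure, `scipy.optimize.minimize` can forward extra positional arguments to the objective through its `args` parameter. The sketch below compares the two on a made-up quadratic; `toy_objective`, `make_toy_objective`, and `weights` are illustrative stand-ins for `objective_function`, `make_objective`, and the measure.

```python
import numpy as np
import scipy.optimize as opt

def toy_objective(params: np.ndarray, weights: np.ndarray) -> float:
    """Weighted squared distance to the point (1, 2)."""
    target = np.array([1.0, 2.0])
    return float(np.sum(weights * (np.asarray(params) - target) ** 2))

def make_toy_objective(weights: np.ndarray):
    # Closure variant: the extra argument is fixed when the wrapper is built
    def wrapped_objective(params):
        return toy_objective(params, weights)
    return wrapped_objective

weights = np.array([1.0, 3.0])
x0 = [0.5, 0.5]

res_closure = opt.minimize(make_toy_objective(weights), x0, method="Nelder-Mead")
# args variant: minimize calls toy_objective(params, *args)
res_args = opt.minimize(toy_objective, x0, args=(weights,), method="Nelder-Mead")

print(res_closure.x, res_args.x)
```

Both calls evaluate the same function at the same points, so they should return the same minimizer. Which form to use is mostly a matter of taste.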

We can now run the optimization:

In [131]:
optimization_result_attain = opt.minimize(make_objective(mu_norm_attain), [-1,1,-1], method = 'Nelder-Mead')

We recover the optimal choices of $(\lambda,\eta)$ by undoing the substitution $\eta=-e^{\theta}$:

In [132]:
optimal_params_attain = optimization_result_attain.x
print("lambda = " +  str(optimal_params_attain[0:2]))
print("eta = " +  str(-np.exp(optimal_params_attain[2])))

lambda = [-0.47282743  0.54774745]
eta = -0.09954221634326921


Next, we verify that the result achieves the target mean and variance, which is the case for `mu_norm_attain`. We also compute the value of the objective function, which in this case should be approximately equal to the dual supremum. 

In [133]:
print("mean: " +  str(mu_mean(optimal_params_attain, mu_norm_attain)))
print("variance: " + str(mu_variance(optimal_params_attain, mu_norm_attain)))
print("dual value: " + str(-objective_function(optimal_params_attain, mu_norm_attain)))
print("primal value: " + str(primal_cost(optimal_params_attain, mu_norm_attain)))

mean: [0.99999678+0.j 2.00002156+0.j]
variance: 1.5000480053980814
dual value: (0.32402810105060625+0j)
primal value: (0.3240366522395013+0j)
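
To quantify how closely the constraints are met, one can compute the residuals and the gap between the primal and dual values directly from the printed numbers. The snippet below copies the real parts of the values from the output above; the variable names are chosen here for illustration.

```python
import numpy as np

# Values copied from the printed output above (real parts only)
mean = np.array([0.99999678, 2.00002156])
variance = 1.5000480053980814
dual_value = 0.32402810105060625
primal_value = 0.3240366522395013

m = np.array([1.0, 2.0])  # mean target
tau = 1.5                 # variance target

mean_residual = np.max(np.abs(mean - m))   # largest coordinate-wise deviation from m
variance_residual = abs(variance - tau)    # deviation from the variance target
duality_gap = primal_value - dual_value    # nonnegative if weak duality holds

print(mean_residual, variance_residual, duality_gap)
```

All three quantities come out below $10^{-4}$.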


On the other hand, `mu_norm_nonattain` does not meet both targets: in the output below, the mean matches $m$ but the variance is essentially zero rather than $\tau=1.5$. This indicates non-attainment of the dual supremum (which we have already seen in Proposition 3.3):

In [134]:
optimization_result_nonattain = opt.minimize(make_objective(mu_norm_nonattain), [-1,1,-1], method = 'Nelder-Mead')
optimal_params_nonattain = optimization_result_nonattain.x
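# Caution: this overwrites the theta slot with eta in place. The helper functions called below
# apply eta = -exp(.) to that slot again, so they are evaluated at eta = -exp(eta), roughly -1.
optimal_params_nonattain[2] = -np.exp(optimal_params_nonattain[2])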
print("lambda = " +  str(optimal_params_nonattain[0:2]))
print("eta = " +  str(optimal_params_nonattain[2]))
print("mean: " +  str(mu_mean(optimal_params_nonattain, mu_norm_nonattain)))
print("variance: " + str(mu_variance(optimal_params_nonattain, mu_norm_nonattain)))
print("dual value: " + str(-objective_function(optimal_params_nonattain, mu_norm_nonattain)))
print("primal value: " + str(primal_cost(optimal_params_nonattain, mu_norm_nonattain)))


lambda = [-0.375    0.46875]
eta = -1.7360330996572665e-16
mean: [1.+0.j 2.+0.j]
variance: 3.836616882424559e-18
dual value: (-0.9374999999999996+0j)
primal value: (0.5624999989324116+0j)


Observe that the optimization algorithm yields $\eta \approx -1.7\times 10^{-16}\approx 0.$ We interpret this to mean that the optimization procedure approaches $\eta=0,$ but the discontinuity of the objective function at $\eta=0$ prevents a maximum from existing.
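
One detail of the last code cell is worth keeping in mind when reading its output. The cell stores $\eta$ in the slot of the parameter vector that `mu_mean`, `mu_variance`, `objective_function`, and `primal_cost` read as $\theta$, so those helpers apply $-\exp(\cdot)$ a second time. The sketch below illustrates the effect with a hypothetical $\theta$, chosen only so that $-e^{\theta}$ has the same order of magnitude as the reported $\eta$.

```python
import numpy as np

theta = -36.0  # hypothetical optimizer output for the third parameter
params = np.array([-0.375, 0.46875, theta])

eta = -np.exp(params[2])  # value at the optimizer, about -2.3e-16

overwritten = params.copy()
overwritten[2] = eta                     # mimic storing eta in the theta slot
eta_reapplied = -np.exp(overwritten[2])  # what the helpers then compute

print(eta, eta_reapplied)
```

Here `eta_reapplied` is essentially $-1$. This is consistent with the printed values: with a single atom and $Y=m$, the dual objective reduces to $c(x,Y)+\eta\tau$, and $0.5625-1.5=-0.9375$ is approximately the reported dual value. Keeping the optimizer's vector untouched and storing $\eta$ in a separate variable would evaluate the diagnostics at the optimizer itself.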