## Using linear programming to solve an optimization problem with stochastic dominance constraints

### Nikolai Chow

### Outline

1. Stochastic Dominance
2. An optimization problem with stochastic dominance constraints
3. Example from Dentcheva, D., & Ruszczynski, A. (2003). "Optimization with stochastic dominance constraints." SIAM Journal on Optimization, solved in Python

### What is stochastic dominance? 

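Consider $X, Y\in \mathcal{L}^1(\Omega,\mathcal{F},P)$ with distribution functions

$$F(\eta)=P[X\leq \eta], \qquad G(\eta)=P[Y\leq \eta]$$
and integrated distribution functions
$$F_2(\eta)=\int^\eta_{-\infty}F(t)dt, \qquad G_2(\eta)=\int^\eta_{-\infty}G(t)dt$$ for $\eta \in \mathbb{R}$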

- $X$ is said to dominate $Y$ in the first order ($X\succeq_1 Y$) if

$$F(\eta)\leq G(\eta) \;\text{for all}\; \eta \in \mathbb{R}$$


- $X$ is said to dominate $Y$ in the second order ($X\succeq_2Y$) if

$$F_2(\eta)\leq G_2(\eta) \;\text{for all}\; \eta \in \mathbb{R}$$
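For finitely many equally likely outcomes (an assumption made only for this illustration), both definitions can be checked on a finite grid: the empirical $F$ and $G$ only jump at the realizations, and $F_2$, $G_2$ are piecewise linear with kinks there, so comparing them at the pooled realizations is enough. The sketch below computes $F_2$ through the identity $F_2(\eta)=E[(\eta-X)_+]$ (Fubini's theorem); the function names are hypothetical.

```python
import numpy as np

def first_order_dominates(x, y):
    """True if X >=_1 Y, i.e. F(eta) <= G(eta) for all eta (equally likely outcomes)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    grid = np.union1d(x, y)                          # empirical CDFs only jump here
    F = (x[:, None] <= grid).mean(axis=0)            # F(eta) = P[X <= eta]
    G = (y[:, None] <= grid).mean(axis=0)            # G(eta) = P[Y <= eta]
    return bool(np.all(F <= G + 1e-12))

def second_order_dominates(x, y):
    """True if X >=_2 Y, i.e. F_2(eta) <= G_2(eta) for all eta (equally likely outcomes)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    grid = np.union1d(x, y)                          # F_2, G_2 are linear between these points
    F2 = np.maximum(grid - x[:, None], 0.0).mean(axis=0)   # F_2(eta) = E[(eta - X)_+]
    G2 = np.maximum(grid - y[:, None], 0.0).mean(axis=0)   # G_2(eta) = E[(eta - Y)_+]
    return bool(np.all(F2 <= G2 + 1e-12))

# Hypothetical outcomes: X pays 2 for sure, Y pays 1 or 3 with probability 1/2 each
x, y = [2.0, 2.0], [1.0, 3.0]
print(first_order_dominates(x, y), second_order_dominates(x, y))   # prints: False True
```

Here $X$ and $Y$ are not ranked by $\succeq_1$, but $X\succeq_2 Y$ because $Y$ has the same mean and more spread.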

### Why do we care?

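For any integrable $X,Y$:
1. $X\succeq_1 Y$ if and only if $E[u(X)]\geq E[u(Y)]$
for all nondecreasing $u$ ($u'\geq 0$) for which the expectations are finite, i.e. every decision maker who prefers more to less weakly prefers $X$


2. $X\succeq_2 Y$ if and only if $E[u(X)]\geq E[u(Y)]$
for all nondecreasing concave $u$ ($u'\geq 0$ and $u''\leq 0$) for which the expectations are finite, i.e. every risk-averse decision maker weakly prefers $X$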

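For example, for 1., suppose $X$ and $Y$ take values in $[a,b]$ and $u$ is differentiable. Integrating by parts (the boundary terms vanish because $F$ and $G$ are both 0 below $a$ and both equal to 1 at $b$) gives
\begin{aligned}
E[u(X)]- E[u(Y)]&=\int^b_au(x)dF(x)-\int^b_au(x)dG(x) \nonumber \\
&=\int^b_a[G(x)-F(x)]u'(x)dx \nonumber
\end{aligned}

which is nonnegative whenever $F\leq G$ and $u'\geq 0$.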
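A small numerical illustration of 2., as a sketch with hypothetical outcomes and a hypothetical helper `expected_utility_gap`: $X$ pays 2 for sure while $Y$ pays 1 or 3 with probability 1/2, so $Y$ is a mean-preserving spread of $X$ and $X\succeq_2 Y$. Every increasing concave $u$ tried should give a nonnegative gap, while $u(t)=t^2$, which is increasing on these positive outcomes but convex, gives a negative one, so $X\succeq_1 Y$ fails.

```python
import numpy as np

def expected_utility_gap(u, x, y):
    """E[u(X)] - E[u(Y)] for equally likely outcomes x of X and y of Y."""
    return float(np.mean(u(np.asarray(x, dtype=float))) - np.mean(u(np.asarray(y, dtype=float))))

# Hypothetical outcomes: same mean, Y more spread out
x, y = [2.0, 2.0], [1.0, 3.0]

utilities = {
    "linear": lambda t: t,            # u' > 0, u'' = 0
    "log": np.log,                    # u' > 0, u'' < 0
    "sqrt": np.sqrt,                  # u' > 0, u'' < 0
    "CARA": lambda t: -np.exp(-t),    # u' > 0, u'' < 0
    "square": np.square,              # increasing for t > 0, but convex
}
for name, u in utilities.items():
    print(f"{name:7s} E[u(X)] - E[u(Y)] = {expected_utility_gap(u, x, y):+.4f}")
```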

### Limitation

Both $\succeq_1$ and $\succeq_2$ are only partial orders
- Many pairs $X,Y$ are not comparable (neither $X\succeq_2 Y$ nor $Y\succeq_2 X$), so in general no real-valued function can represent the ranking of all choices under consideration

### Some applications

Atkinson, A. B. (1970). On the measurement of inequality. Journal of economic theory, 2(3), 244-263.

Cho, Y. H., Linton, O., & Whang, Y. J. (2007). Are there Monday effects in stock returns: A stochastic dominance approach. Journal of Empirical Finance, 14(5), 736-755.

Guerre, E., Perrigne, I., & Vuong, Q. (2009). Nonparametric identification of risk aversion in first‐price auctions under exclusion restrictions. Econometrica, 77(4), 1193-1227.

Maasoumi, E., Millimet, D. L., & Rangaprasad, V. (2005). Class size and educational policy: Who benefits from smaller classes? Econometric Reviews, 24(4), 333-368.

Fang, Y., & Post, T. (2022). Optimal portfolio choice for higher-order risk averters. Journal of Banking & Finance, 137, 106429.

Dentcheva, D., & Ruszczynski, A. (2003). Optimization with stochastic dominance constraints. SIAM Journal on Optimization, 14(2), 548-566.

#### A portfolio optimization problem

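$$\max_{z\in Z}E[\varphi(z,\omega)]$$
- $\omega$ is an elementary event (state of the world)
- $z$ is a decision vector (e.g. portfolio weights) chosen from the feasible set $Z$
- $\varphi(z,\omega)$ is the resulting outcome (e.g. portfolio return)

If a reference outcome $Y$ (e.g. the return of a benchmark portfolio) is available, we can instead maximize an objective $f$ over the attainable outcomes $X\in C$ while requiring $X$ to dominate $Y$: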


\begin{aligned}
\max \ & f(X)\\
\text{subject to} \ & X\succeq_2 Y\\
&X \in C
\end{aligned}


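#### Equivalent condition (e.g. Bawa et al. (1985))

$$F_2(\eta)\leq G_2(\eta) \;\text{for all}\; \eta \in \mathbb{R}$$

$$\Longleftrightarrow$$

\begin{aligned}
E[(\eta - X)_+]\leq E[(\eta -Y)_+] \ & \ \ \text{for all} \ \eta\in \mathbb{R}
\end{aligned}


where $(\eta - X)_+=\max\{\eta - X,0\}$. The equivalence holds because, by Fubini's theorem, $F_2(\eta)=\int_{-\infty}^{\eta}E[\mathbf{1}\{X\leq t\}]\,dt=E[(\eta - X)_+]$, and likewise $G_2(\eta)=E[(\eta - Y)_+]$.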
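A quick numerical check of the identity $F_2(\eta)=E[(\eta-X)_+]$, as a sketch assuming equally likely outcomes; `F2_numeric` and `lower_partial_moment` are hypothetical helper names. The first integrates the empirical CDF with a hand-written trapezoid rule, the second evaluates the expectation directly.

```python
import numpy as np

def F2_numeric(x, eta, n_grid=200_001):
    """Trapezoid-rule approximation of F_2(eta) = int_{-inf}^{eta} F(t) dt,
    where F is the empirical CDF of equally likely outcomes x."""
    x = np.sort(np.asarray(x, dtype=float))
    if eta <= x[0]:
        return 0.0                                     # F(t) = 0 for t < min(x)
    t = np.linspace(x[0], eta, n_grid)
    F = np.searchsorted(x, t, side="right") / len(x)   # F(t) = P[X <= t]
    return float(np.sum((F[1:] + F[:-1]) / 2 * np.diff(t)))

def lower_partial_moment(x, eta):
    """E[(eta - X)_+] for equally likely outcomes x."""
    return float(np.mean(np.maximum(eta - np.asarray(x, dtype=float), 0.0)))

x = [1.0, 2.0, 3.0]     # hypothetical outcomes
for eta in (0.5, 1.5, 2.5, 4.0):
    print(eta, round(F2_numeric(x, eta), 4), lower_partial_moment(x, eta))
```

The two columns agree up to the discretization error of the trapezoid rule at the jumps of $F$.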

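Restricting the condition to target levels $\eta$ in a bounded interval $[a,b]$ gives

\begin{aligned}
\max \ & f(X)\\
\text{subject to} \ & E[(\eta - X)_+]\leq E[(\eta -Y)_+] \ & \ \ \text{for all} \ \eta\in [a,b] \\
\ &X \in C
\end{aligned}

The nonsmooth function $(\cdot)_+$ can be removed by introducing shortfall variables $S(\eta,\omega)$. Since the smallest feasible choice is $S(\eta,\omega)=(\eta-X(\omega))_+$, the following problem has the same feasible outcomes $X$, and its dominance constraints are linear in $(X,S)$:

\begin{aligned}
\max \ & f(X)\\
\text{subject to} \ & X(\omega)+S(\eta,\omega)\geq \eta &\text{for} \;(\eta,\omega) \in [a,b] \times \Omega \nonumber\\
& E[S(\eta,\omega)]\leq E[(\eta-Y)_+] &\text{for} \;\eta \in [a,b] \nonumber\\
&S(\eta,\omega)\geq 0 &\text{for} \;(\eta,\omega) \in [a,b] \times \Omega \nonumber\\
&X \in C \nonumber
\end{aligned}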


### Example

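Assume
- $Y$ has a discrete distribution with realizations $y_i$, $i =1,\dots,m$
- There are finitely many states $\omega_k$ with $p_k=P[\{\omega_k\}]$, $k=1,\dots,m$
- $r_{nk}$ is the return of asset $n$ in state $\omega_k$ and $z_n$ is the weight of asset $n$, so $x_k=X(\omega_k)=\sum_{n=1}^{N}r_{nk}z_n$
- $s_{ik}=S(y_i,\omega_k)$ and $v_i=E[(y_i-Y)_+]$

Because $G_2$ is piecewise linear with kinks only at the $y_i$, while $F_2$ is convex with slope at most 1, it suffices to impose the dominance constraint at $\eta=y_i$, $i=1,\dots,m$. With $f(X)=E[X]$ the problem becomes the linear program

\begin{aligned}
\max &\; \sum_{k=1}^{m}\sum_{n=1}^{N} p_kr_{nk}z_n\nonumber\\
\text{subject to}&\; \sum_{n=1}^{N} r_{nk}z_n+s_{ik}\geq y_i &i=1,\dots,m,\ k=1,\dots, m\nonumber\\
&\sum_{k=1}^{m} p_k s_{ik}\leq v_i&i=1,\dots,m\nonumber\\
&s_{ik}\geq0&i=1,\dots,m,\ k=1,\dots, m\nonumber\\
&\sum^N_{n=1} z_n =1\nonumber\\
&z_n \geq 0 &n=1,\dots,N\nonumber
\end{aligned}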
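The same linear program can also be assembled in one function with vectorized NumPy calls. This is a compact sketch, not the paper's code: `solve_ssd_portfolio` is a hypothetical name, the benchmark is assumed to take the value $y_k$ in state $\omega_k$ with probability $p_k$ (as in the equally weighted benchmark used below), and SciPy's `linprog` with the HiGHS solver is assumed to be available. The variable $s_{ik}$ sits at position $N + i\cdot m + k$.

```python
import numpy as np
from scipy.optimize import linprog

def solve_ssd_portfolio(R, y, p=None):
    """Maximise E[X] subject to E[(y_i - X)_+] <= E[(y_i - Y)_+] for all i.

    R : (m, N) array, R[k, n] = r_{nk}, return of asset n in state k
    y : (m,) array, benchmark value y_k in state k (assumption: Y lives on the same states)
    p : (m,) state probabilities p_k (uniform if omitted)
    """
    R = np.asarray(R, dtype=float)
    y = np.asarray(y, dtype=float)
    m, N = R.shape
    p = np.full(m, 1.0 / m) if p is None else np.asarray(p, dtype=float)

    # v_i = E[(y_i - Y)_+] = sum_k p_k (y_i - y_k)_+
    v = (np.maximum(y[:, None] - y[None, :], 0.0) * p).sum(axis=1)

    # linprog minimises, so use -E[X] = -sum_k p_k sum_n r_nk z_n; s_ik has cost 0
    c = np.concatenate((-(p @ R), np.zeros(m * m)))

    # Block 1, row i*m + k: -sum_n r_nk z_n - s_ik <= -y_i
    A1 = -np.hstack((np.tile(R, (m, 1)), np.eye(m * m)))
    b1 = -np.repeat(y, m)
    # Block 2, row i: sum_k p_k s_ik <= v_i
    A2 = np.hstack((np.zeros((m, N)), np.kron(np.eye(m), p)))

    A_eq = np.concatenate((np.ones(N), np.zeros(m * m)))[None, :]   # sum_n z_n = 1
    return linprog(c,
                   A_ub=np.vstack((A1, A2)), b_ub=np.concatenate((b1, v)),
                   A_eq=A_eq, b_eq=[1.0],
                   bounds=[(0, None)] * (N + m * m),   # z_n >= 0, s_ik >= 0
                   method="highs")

# Hypothetical toy data: 2 equally likely states, 2 assets, equally weighted benchmark
R = np.array([[1.0, 3.0],
              [3.0, 1.0]])
res = solve_ssd_portfolio(R, R.mean(axis=1))
print(res.status, res.x[:2], -res.fun)
```

In the toy case the benchmark is constant ($y_1=y_2=2$), so the constraint at $\eta=2$ forces $X\geq 2$ in both states and the only feasible weights are $(0.5, 0.5)$. With the notebook's data, `solve_ssd_portfolio(df.values, y_k)` builds the same LP as the step-by-step cells, so it should reach the same optimal value.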

In [18]:
import pandas as pd
import numpy as np
import itertools
import matplotlib.pyplot as plt

from scipy.optimize import linprog
from matplotlib.patches import Polygon
from itertools import repeat
%matplotlib inline

In [19]:
## The data
# N = 8 weights z_n and m*m = 22*22 = 484 shortfall variables s_ik: 492 decision variables in total
# Initialise the returns: one list per asset, one entry per state (m = 22)

data = {'Asset 1':[  7.5,   8.4,  6.1,  5.2,  5.5,
                     7.7,  10.9, 12.7, 15.6, 11.7,
                     9.2,  10.3,    8,  6.3,  6.1,
                     7.1,   8.7,    8,  5.7,  3.6,  3.1,  4.5],
        'Asset 2':[ -5.8,     2,  5.6, 17.5,  0.2, 
                    -1.8,  -2.2, -5.3,  0.3, 46.5,
                    -1.5,  15.9, 36.6, 30.9, -7.5,
                     8.6,  21.2,  5.4, 19.3,  7.9, 21.7,-11.1],
        'Asset 3':[-14.8, -26.5, 37.1, 23.6, -7.4,
                     6.4,  18.4, 32.3, -5.1, 21.5,
                    22.4,   6.1, 31.6, 18.6,  5.2,
                    16.5,  31.6, -3.2, 30.4,  7.6,   10,  1.2],
        'Asset 4':[-18.5, -28.4, 38.5, 26.6, -2.6,
                     9.3,  25.6, 33.7, -3.7, 18.7,
                    23.5,     3, 32.6, 16.1,  2.3,
                    17.9,  29.2, -6.2, 34.2,    9, 11.3, -0.1],
        'Asset 5':[-30.2, -33.8, 31.8,   28,  9.3, 
                    14.6,  30.7, 36.7,   -1, 21.3,
                    21.7,  -9.7, 33.3,  8.6, -4.1,
                    16.5,  20.4,  -17, 59.4, 17.4, 16.2, -3.2],
        'Asset 6':[  2.3,   0.2, 12.3, 15.6,    3,  
                     1.2,   2.3,  3.1,  7.3, 31.1,
                       8,    15, 21.3, 15.6,  2.3,
                     7.6,  14.2,  8.3, 16.1,  7.6,   11, -3.5],
        'Asset 7':[-14.9, -23.2, 35.4,  2.5, 18.1, 
                    32.6,   4.8, 22.6, -2.3, -1.9,
                    23.7,   7.4, 56.2, 69.4, 24.6,
                    28.3,  10.5,-23.4, 12.1,-12.2, 32.6,  7.8],
        'Asset 8':[ 67.7,  72.2,  -24,   -4,   20, 
                    29.5,  21.2, 29.6,-31.2,  8.4,
                   -12.8, -17.5,  0.6, 21.6, 24.4,
                   -13.9,  -2.3, -7.8, -4.2, -7.4, 14.6,   -1]}

In [20]:
# Create DataFrame
df = pd.DataFrame(data)

# Number of assets
NumA = len(df.columns)
# Number of states
NumS = len(df)

# List for col index
cols = [f'Asset {i}' for i in range(1, NumA+1)]

#print(cols)


In [4]:
weight = []

weight.extend(repeat(1/NumA,NumA))

# Benchmark Y: return y_k of the equally weighted portfolio in state k
y_k = np.average(df[cols], weights=weight, axis=1)


In [5]:
## Objective function parameters
# Probability of each state k (equally likely states)
p_k = 1/NumS

# Probability of a state times the total return of each asset over all states,
# i.e. p_k*(r_11 + r_12 + ... + r_1m), p_k*(r_21 + ... + r_2m), ...: the expected asset returns
pk_S_rnk = p_k * df.sum(axis=0)

# Put it into a list
pk_S_rnk = pk_S_rnk[cols].values.tolist()

# The s_ik variables do not enter the objective
s_ik = [0]*NumS*NumS

# Objective coefficients for the linear program:
# the N asset coefficients followed by m*m zeros
c_ex1 = np.array(pk_S_rnk + s_ik)

# Check the objective
print(c_ex1)

[ 7.81363636  9.29090909 11.97727273 12.36363636 12.13181818  9.17727273
 14.12272727  8.35        0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0. 

In [21]:
## Work on inequality constraints
# First block: -(sum_n r_nk z_n) - s_ik <= -y_i for every pair (i, k).
# Stack the m x N return matrix m times (one copy per benchmark level y_i),
# so that row i*m + k holds the returns in state k
b1 = np.tile(df.values, (NumS, 1))
# Each row picks out its own shortfall variable s_ik
b2 = np.identity(NumS*NumS)

# Coefficients of the first block of constraints (sign flipped below)
A_ex1_1 = np.column_stack((b1, b2))

In [23]:
# Second block: sum_k p_k s_ik <= v_i for every i
# b3: the weights z_n do not appear in these constraints
b3 = np.zeros((NumS, NumA))
b4 = np.zeros((NumS,NumS*NumS))

# Row i sums s_i1, ..., s_im
for x in range(NumS):
    b4[x,NumS*x:NumS*(x+1)]=1

A_ex1_2 = p_k*np.column_stack((b3, b4))

# The inequality constraints for linear programming (A_ub @ x <= b_ub)
A_ex1 = np.vstack((-A_ex1_1,A_ex1_2))

#Check the Inequality constraints
print(A_ex1)

[[ -7.5          5.8         14.8        ...  -0.          -0.
   -0.        ]
 [ -8.4         -2.          26.5        ...  -0.          -0.
   -0.        ]
 [ -6.1         -5.6        -37.1        ...  -0.          -0.
   -0.        ]
 ...
 [  0.           0.           0.         ...   0.           0.
    0.        ]
 [  0.           0.           0.         ...   0.           0.
    0.        ]
 [  0.          -0.           0.         ...   0.04545455   0.04545455
    0.04545455]]


In [15]:
# Right-hand sides of the inequality constraints
# v_i = E[(y_i - Y)_+]: first-order lower partial moments of the benchmark
v_i = np.maximum(y_k[:, None] - y_k[None, :], 0).mean(axis=1)

# -y_i for each pair (i, k), in the same i-major order as the rows of A_ex1_1,
# followed by v_i for the second block
b_ex1 = np.concatenate((-np.repeat(y_k, NumS), v_i))
print(b_ex1)

[  0.8375       0.8375       0.8375       0.8375       0.8375
   0.8375       0.8375       0.8375       0.8375       0.8375
   0.8375       0.8375       0.8375       0.8375       0.8375
   0.8375       0.8375       0.8375       0.8375       0.8375
   0.8375       0.8375       3.6375       3.6375       3.6375
   3.6375       3.6375       3.6375       3.6375       3.6375
   3.6375       3.6375       3.6375       3.6375       3.6375
   3.6375       3.6375       3.6375       3.6375       3.6375
   3.6375       3.6375       3.6375       3.6375     -17.85
 -17.85       -17.85       -17.85       -17.85       -17.85
 -17.85       -17.85       -17.85       -17.85       -17.85
 -17.85       -17.85       -17.85       -17.85       -17.85
 -17.85       -17.85       -17.85       -17.85       -17.85
 -17.85       -14.375      -14.375      -14.375      -14.375
 -14.375      -14.375      -14.375      -14.375      -14.375
 -14.375      -14.375      -14.375      -14.375      -14.375
 -14.375      -14.375

In [16]:
# Equality constraint: sum_n z_n = 1
A_ex2 = np.array([np.append(np.ones(NumA), np.zeros(NumS*NumS))])

#print(A_ex2)
b_ex2 = np.array([1])

# Bounds on decision variables: z_n >= 0 and s_ik >= 0
bounds_ex1 = [(0, None)] * (NumA + NumS*NumS)

In [17]:
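# Solve the problem (linprog minimises, so pass -c_ex1)
# 'revised simplex', which produced the output below, was removed in SciPy 1.11;
# 'highs' is the current default solver and solves the same LP
res_ex1 = linprog(-c_ex1, A_ub=A_ex1, b_ub=b_ex1, A_eq=A_ex2, b_eq=b_ex2,
                  bounds=bounds_ex1, method='highs')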


res_ex1

     con: array([6.66133815e-16])
     fun: -11.008198995664424
 message: 'Optimization terminated successfully.'
     nit: 1241
   slack: array([ 2.04834000e+00, -3.55271368e-15,  2.06685399e+01,  1.36401654e+01,
        7.63229696e+00,  1.46081371e+01,  1.14900603e+01,  1.94040901e+01,
        9.76996262e-15,  1.85709122e+01,  1.38262341e+01,  7.26693607e+00,
        3.05035825e+01,  2.98892955e+01,  1.11731190e+01,  1.31438767e+01,
        1.61795531e+01,  1.77635684e-15,  1.79198637e+01,  2.30349284e+00,
        1.72518064e+01,  1.21007607e+00,  4.84834000e+00,  0.00000000e+00,
        2.34685399e+01,  1.64401654e+01,  1.04322970e+01,  1.74081371e+01,
        1.42900603e+01,  2.22040901e+01,  1.12500000e+00,  2.13709122e+01,
        1.66262341e+01,  1.00669361e+01,  3.33035825e+01,  3.26892955e+01,
        1.39731190e+01,  1.59438767e+01,  1.89795531e+01,  1.24344979e-14,
        2.07198637e+01,  5.10349284e+00,  2.00518064e+01,  4.01007607e+00,
        0.00000000e+00,  7.10542736e
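
To confirm that an optimal portfolio satisfies the dominance constraint, both sides of $E[(y_i-X)_+]\leq E[(y_i-Y)_+]$ can be recomputed from the weights alone. This is a sketch with a hypothetical helper `ssd_gaps`, assuming the benchmark takes the value $y_k$ in state $\omega_k$. Since $G_2$ is piecewise linear with kinks at the $y_i$ and $F_2$ is convex with slope at most 1, nonpositive gaps at every $y_i$ imply $X\succeq_2 Y$ for a discrete benchmark.

```python
import numpy as np

def ssd_gaps(R, z, y, p=None):
    """E[(y_i - X)_+] - E[(y_i - Y)_+] for each benchmark level y_i.

    R : (m, N) returns, R[k, n] = r_{nk};  z : (N,) weights;  y : (m,) benchmark values y_k
    p : (m,) state probabilities (uniform if omitted). Entries <= 0 mean the constraint holds.
    """
    R, z, y = (np.asarray(a, dtype=float) for a in (R, z, y))
    m = len(y)
    p = np.full(m, 1.0 / m) if p is None else np.asarray(p, dtype=float)
    x = R @ z                                                          # x_k = sum_n r_nk z_n
    lhs = (np.maximum(y[:, None] - x[None, :], 0.0) * p).sum(axis=1)   # E[(y_i - X)_+]
    rhs = (np.maximum(y[:, None] - y[None, :], 0.0) * p).sum(axis=1)   # v_i
    return lhs - rhs

# Hypothetical toy check: constant benchmark y = 2 in both states
R = np.array([[1.0, 3.0], [3.0, 1.0]])
print(ssd_gaps(R, [0.5, 0.5], [2.0, 2.0]))   # balanced weights: no excess shortfall
print(ssd_gaps(R, [1.0, 0.0], [2.0, 2.0]))   # all in asset 1: constraint violated
```

For the solution above, `ssd_gaps(df.values, res_ex1.x[:NumA], y_k)` should have every entry at most 0, up to solver tolerance.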

#### Reference

Bawa, V. S., Bodurtha Jr, J. N., Rao, M. R., & Suri, H. L. (1985). On determination of stochastic dominance optimal sets. The Journal of Finance, 40(2), 417-431.