In [2]:
# Library
import math

Check whether $\pi = 0.7$ is the optimal choice among the discrete actions in Example 3.2 of R. Korn, Optimal Portfolios (p. 41 ff.).

In [2]:
# See p. 43
def f(x):
    # Expected square-root utility per unit of wealth when the fraction x is invested in the stock
    return 4/9 * math.sqrt(x+1) + 5/9 * math.sqrt(1 - x/2)

In [3]:
actions = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

In [5]:
for x in actions:
    print((x, f(x)))

(0, 1.0)
(0.1, 1.0076258405650098)
(0.2, 1.0139107722548775)
(0.3, 1.018941547671444)
(0.4, 1.022777753497697)
(0.5, 1.0254562782765053)
(0.6, 1.0269938209933094)
(0.7, 1.0273878664790437)
(0.8, 1.0266162769118792)
(0.9, 1.0246354160426578)
(1, 1.021376461713902)
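
As a cross-check of the grid search, the unconstrained maximiser of `f` can be derived from the first-order condition $f'(x) = 0$, which reduces to $8\sqrt{1 - x/2} = 5\sqrt{1 + x}$ and hence $x^* = 13/19 \approx 0.684$. The sketch below is an illustration only; the names `grid`, `best_on_grid` and `x_star` are not taken from Korn's text. It checks that the best grid point is 0.7 and that the continuous optimum is slightly better than any grid point.

```python
import math

def f(x):
    # Same objective as in the cell above.
    return 4/9 * math.sqrt(1 + x) + 5/9 * math.sqrt(1 - x/2)

# Discrete action grid 0, 0.1, ..., 1.
grid = [i / 10 for i in range(11)]
best_on_grid = max(grid, key=f)

# Closed-form solution of f'(x) = 0: 64 * (1 - x/2) = 25 * (1 + x).
x_star = 13 / 19

print(best_on_grid, x_star, f(x_star))
```

Since `f` is concave on $[0, 1]$, the grid maximum has to be one of the two grid points adjacent to $x^*$, which is consistent with the table above.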


**Calculation of the true action-value function for all states $(t=1, \cdot)$ of a two-step binomial tree, with the parameters chosen as in Example 3.2 of R. Korn, Optimal Portfolios (p. 42 ff.), for different utility functions**

Let the parameters of the binomial tree be specified as

$T=2$  
$p_u = \frac{4}{9}, p_d = \frac{5}{9}$  
$r_u = 1, r_d = -\frac{1}{2}, r_f = 0$  
$\mathcal{A} = \{ 0, 0.1, \dots, 1 \}$.

**Power utility (square root utility)**

$U(V_T) = \frac{V_T^{\gamma}}{\gamma}$ for $\gamma \in (0,1)$

Choose $\gamma = \frac{1}{2}$:

For $s=(t=1, V_t=100)$ the optimal action-value function $Q^*(s, a)$ is given, for all $a \in \mathcal{A}$, by

$$
\begin{align*}
  &\mathbb{E}\left[ U(V_T) \mid s, a \right] \\
= &p_u \left[ U\left( V_t a (1+r_u) + V_t (1-a)(1+r_f) \right) \right] + p_d \left[ U\left( V_t a (1+r_d) + V_t (1-a)(1+r_f) \right) \right] \\
= & p_u \left[ 2 \sqrt{ V_t a (1+r_u) + V_t (1-a)(1+r_f) } \right] + p_d \left[ 2 \sqrt{ V_t a (1+r_d) + V_t (1-a)(1+r_f) } \right]
\end{align*}
$$


In [16]:
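def true_avf_sqrt(Vt, a, p_u, p_d, r_u, r_d, r_f):
    # Terminal wealth after an up / down move; the uninvested part earns r_f
    V_up = Vt*a*(1+r_u) + Vt*(1-a)*(1+r_f)
    V_down = Vt*a*(1+r_d) + Vt*(1-a)*(1+r_f)
    return p_u * 2 * math.sqrt(V_up) + p_d * 2 * math.sqrt(V_down)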

for a in actions:
    print((a, true_avf_sqrt(100, a, 4/9, 5/9, 1, -0.5, 0)))

(0, 20.0)
(0.1, 20.152516811300195)
(0.2, 20.27821544509755)
(0.3, 20.378830953428878)
(0.4, 20.455555069953938)
(0.5, 20.509125565530113)
(0.6, 20.53987641986619)
(0.7, 20.547757329580875)
(0.8, 20.532325538237583)
(0.9, 20.492708320853154)
(1, 20.42752923427804)


Hence, a precision of $\epsilon \leq 10^{-2}$ or finer is required to choose the right action: the Q-values of the two best actions, $a=0.7$ and $a=0.6$, differ by only about $0.008$.
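
The required precision can be read off programmatically as the gap between the best and the second-best action. The following is a minimal sketch under the parameters of this notebook; `q_sqrt`, `ranked` and `gap` are illustrative names. If every estimate lies within half of this gap of its true value, the ranking of the two best actions cannot flip.

```python
import math

def q_sqrt(v, a, p_u=4/9, p_d=5/9, r_u=1.0, r_d=-0.5, r_f=0.0):
    # Expected terminal utility U(V) = 2 * sqrt(V) for wealth v and stock fraction a.
    up = v * a * (1 + r_u) + v * (1 - a) * (1 + r_f)
    down = v * a * (1 + r_d) + v * (1 - a) * (1 + r_f)
    return p_u * 2 * math.sqrt(up) + p_d * 2 * math.sqrt(down)

actions = [i / 10 for i in range(11)]

# Sort actions by their true Q-value, best first.
ranked = sorted(actions, key=lambda a: q_sqrt(100, a), reverse=True)
best, runner_up = ranked[0], ranked[1]
gap = q_sqrt(100, best) - q_sqrt(100, runner_up)

print(best, runner_up, gap)
```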

For this particular state $s=(t=1, V_t=100)$ we can use Lemma 3.3 to estimate the number of updates needed to achieve the above precision with probability $1-\delta$.
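
Lemma 3.3 is not reproduced in this notebook. The code in the next cell is consistent with a Chebyshev-type bound for the sample mean $\bar{X}_n$ of $n$ independent draws of $X = U(V_T)$; that reading is an assumption of this sketch, as are the shorthands $V_u$ and $V_d$ for the terminal wealth after an up and a down move:

$$
\begin{align*}
\mathbb{P}\left( \left| \bar{X}_n - \mathbb{E}[X] \right| \geq \epsilon \right) &\leq \frac{\operatorname{Var}(X)}{n \epsilon^2}, \\
\frac{\operatorname{Var}(X)}{n \epsilon^2} \leq \delta &\iff n \geq \frac{\operatorname{Var}(X)}{\delta \epsilon^2}, \\
\operatorname{Var}(X) &= p_u p_d \left( U(V_u) - U(V_d) \right)^2 .
\end{align*}
$$

The last line is the variance of a two-point distribution with $p_d = 1 - p_u$, which is the numerator computed in the code.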

In [14]:
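def num_updates_sqrt(Vt, a, p_u, p_d, r_u, r_d, r_f, delta, eps):
    # Terminal wealth after an up / down move; the uninvested part earns r_f
    V_up = Vt*a*(1+r_u) + Vt*(1-a)*(1+r_f)
    V_down = Vt*a*(1+r_d) + Vt*(1-a)*(1+r_f)
    return ((2*math.sqrt(V_up) - 2*math.sqrt(V_down))**2 * p_u * p_d)/(delta * (eps **2))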

In [17]:
for a in actions:
    print((a, num_updates_sqrt(100, a, 4/9, 5/9, 1, -0.5, 0, 0.05, 0.0001)))

(0, 0.0)
(0.1, 1085465673.8593066)
(0.2, 4254623391.080938)
(0.3, 9406497943.138006)
(0.4, 16473866893.496794)
(0.5, 25418203741.305153)
(0.6, 36226953313.24989)
(0.7, 48912670782.6442)
(0.8, 63513882650.340195)
(0.9, 80097811352.87164)
(1, 98765432098.76544)


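With $\epsilon = 10^{-4}$, as passed in the call above, the state-action pair (s=(1, 100), a=0.7) has to be updated approx. $4.9 \times 10^{10}$ times in order to achieve the precision with a probability of 95%. For $\epsilon = 10^{-2}$ the same formula gives approx. $4.9$m updates.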
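
The update count is very sensitive to the tolerance because $\epsilon$ enters the formula squared in the denominator. A minimal sketch that tabulates this for the pair (s=(1, 100), a=0.7), where the terminal wealth is 170 after an up move and 65 after a down move; the helper `chebyshev_updates` is a hypothetical name that restates the formula of `num_updates_sqrt` for arbitrary utility values:

```python
import math

def chebyshev_updates(u_up, u_down, p_u, delta, eps):
    # Variance of a two-point distribution divided by delta * eps^2.
    p_d = 1 - p_u
    variance = p_u * p_d * (u_up - u_down) ** 2
    return variance / (delta * eps ** 2)

# Square-root utility U(V) = 2 * sqrt(V) at the two terminal wealth levels.
u_up = 2 * math.sqrt(170)
u_down = 2 * math.sqrt(65)

# Each tenfold tightening of eps multiplies the count by 100.
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(eps, chebyshev_updates(u_up, u_down, 4/9, 0.05, eps))
```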

**Log utility**

$U(V_T) = \log(V_T)$

In [10]:
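def true_avf_log(Vt, a, p_u, p_d, r_u, r_d, r_f):
    # Terminal wealth after an up / down move; the uninvested part earns r_f
    V_up = Vt*a*(1+r_u) + Vt*(1-a)*(1+r_f)
    V_down = Vt*a*(1+r_d) + Vt*(1-a)*(1+r_f)
    return p_u * math.log(V_up) + p_d * math.log(V_down)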

for a in actions:
    print((a, true_avf_log(100, a, 4/9, 5/9, 1, -0.5, 0)))

(0, 4.605170185988092)
(0.1, 4.619033991241375)
(0.2, 4.627668369197723)
(0.3, 4.63148823158599)
(0.4, 4.630744762645181)
(0.5, 4.625553527118508)
(0.6, 4.6159079412424555)
(0.7, 4.60168112196436)
(0.8, 4.582616690519038)
(0.9, 4.558306912756033)
(1, 4.528153832592542)


Hence, to distinguish the actions $a=0.3$ and $a=0.4$, whose Q-values differ by only about $7 \times 10^{-4}$, we need a precision of $\epsilon \leq 10^{-3}$ or finer.
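
For log utility with $r_f = 0$ the unconstrained optimum has a closed form: the first-order condition $\frac{p_u r_u}{1 + a r_u} + \frac{p_d r_d}{1 + a r_d} = 0$ gives $a^* = -\frac{p_u r_u + p_d r_d}{r_u r_d} = \frac{1}{3}$ for the parameters used here. The sketch below is illustrative (the names `q_log`, `a_star` and `gap` are not from the source, and $r_f = 0$ is assumed throughout); it checks that the best grid action is 0.3 and computes the gap to $a = 0.4$.

```python
import math

p_u, p_d, r_u, r_d = 4/9, 5/9, 1.0, -0.5

def q_log(v, a):
    # Expected log utility of terminal wealth, assuming r_f = 0.
    return p_u * math.log(v * (1 + a * r_u)) + p_d * math.log(v * (1 + a * r_d))

# Closed-form maximiser from the first-order condition.
a_star = -(p_u * r_u + p_d * r_d) / (r_u * r_d)

grid = [i / 10 for i in range(11)]
best_on_grid = max(grid, key=lambda a: q_log(100, a))

# Difference between the two grid actions that bracket a_star.
gap = q_log(100, 0.3) - q_log(100, 0.4)

print(a_star, best_on_grid, gap)
```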

In [22]:
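def num_updates_log(Vt, a, p_u, p_d, r_u, r_d, r_f, delta, eps):
    # Terminal wealth after an up / down move; the uninvested part earns r_f
    V_up = Vt*a*(1+r_u) + Vt*(1-a)*(1+r_f)
    V_down = Vt*a*(1+r_d) + Vt*(1-a)*(1+r_f)
    return ((math.log(V_up) - math.log(V_down))**2 * p_u * p_d)/(delta * (eps **2))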

In [23]:
for a in actions:
    print((a, num_updates_log(100, a, 4/9, 5/9, 1, -0.5, 0, 0.05, 0.001)))

(0, 0.0)
(0.1, 106136.19084013824)
(0.2, 408696.1719019835)
(0.3, 891485.079082101)
(0.4, 1546517.6795386863)
(0.5, 2372607.476139267)
(0.6, 3374802.2882089214)
(0.7, 4564500.90039318)
(0.8, 5960241.781790532)
(0.9, 7589301.210530215)
(1, 9490429.90455706)


With $\epsilon = 10^{-3}$, we need to update the state-action pair (s=(1, 100), a=0.3) approx. $0.9$m times, and its closest competitor (s=(1, 100), a=0.4) approx. $1.5$m times, in order to achieve the needed precision with a probability of 95%.