In [5]:
# Library
import math

Check whether $\pi = 0.7$ is the optimal choice among the discrete actions in Example 3.2 of R. Korn, Optimal Portfolios (p. 41 ff.).

In [2]:
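# See p. 43
def f(x):
    # Expected square-root utility of terminal wealth per unit of initial wealth
    # when a fraction x is invested in the stock: the stock doubles with
    # probability 4/9 (wealth 1 + x) and halves with probability 5/9 (wealth 1 - x/2).
    return 4/9 * math.sqrt(x + 1) + 5/9 * math.sqrt(1 - x/2)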

In [3]:
actions = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

In [5]:
for x in actions:
    print((x, f(x)))

(0, 1.0)
(0.1, 1.0076258405650098)
(0.2, 1.0139107722548775)
(0.3, 1.018941547671444)
(0.4, 1.022777753497697)
(0.5, 1.0254562782765053)
(0.6, 1.0269938209933094)
(0.7, 1.0273878664790437)
(0.8, 1.0266162769118792)
(0.9, 1.0246354160426578)
(1, 1.021376461713902)
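The table above can also be checked programmatically. The following is a minimal sketch, not part of the original notebook: it picks the best grid action with `max` and compares it with the continuous maximiser. The names `best_action` and `x_star` are introduced here for illustration. The value $x^* = 13/19$ follows from setting $f'(x) = 0$, which gives $8\sqrt{1 - x/2} = 5\sqrt{1 + x}$, i.e. $64 - 32x = 25 + 25x$.

```python
import math

def f(x):
    # Same expected-utility formula as in the cell above
    return 4/9 * math.sqrt(x + 1) + 5/9 * math.sqrt(1 - x/2)

# Same grid of discrete actions as above
actions = [i / 10 for i in range(11)]

# Best action on the grid (illustrative name)
best_action = max(actions, key=f)

# Continuous maximiser from f'(x) = 0 (illustrative name): x = 39/57 = 13/19
x_star = 13 / 19

print(best_action, f(best_action))
print(x_star, f(x_star))
```

Since $f$ is concave on $[0, 1]$, the grid optimum is one of the two grid points adjacent to $x^* \approx 0.684$, and the printed values above show that it is $0.7$.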


**Calculation of the true action-value function for a one-step binomial tree with the parameters chosen as in Example 3.2 in R. Korn, Optimal Portfolios (p. 42 ff.)**


As there is only one step, the agent receives an immediate reward after taking an action and the episode then terminates, so the Q-values of all terminal states are equal to zero. Assuming a square-root utility function and $V_0=100$, we get:
$$
\begin{align*}
Q(s, a) &= \mathbb{E}[R_1 + \max_{a'} Q(s', a')] \\
        &= \mathbb{E}[R_1] \\
        &= \frac{4}{9}\sqrt{2a \cdot 100+(1-a) \cdot 100} + \frac{5}{9}\sqrt{0.5a \cdot 100+(1-a) \cdot 100}, \quad \forall a \in \{0, 0.1, 0.2, \dots, 1\}
\end{align*}
$$

In [6]:
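def true_avf(x):
    # Expected square-root utility of terminal wealth for V_0 = 100 when a
    # fraction x is invested in the stock (up factor 2, down factor 0.5).
    return 4/9 * math.sqrt(2*x*100 + (1-x)*100) + 5/9 * math.sqrt(0.5*x*100 + (1-x)*100)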

for x in actions:
    print((x, true_avf(x)))

(0, 10.0)
(0.1, 10.076258405650098)
(0.2, 10.139107722548776)
(0.3, 10.189415476714439)
(0.4, 10.227777534976969)
(0.5, 10.254562782765056)
(0.6, 10.269938209933095)
(0.7, 10.273878664790438)
(0.8, 10.266162769118791)
(0.9, 10.246354160426577)
(1, 10.21376461713902)
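The Q-values above are exactly ten times the values of `f` from the first part, because $\sqrt{100 \cdot w} = 10\sqrt{w}$, so the initial wealth only rescales the utilities and does not change the ranking of the actions. A minimal sketch of this check, assuming the same definitions as in the cells above (the name `V0` is introduced here for illustration):

```python
import math

def f(x):
    # Expected utility per unit of initial wealth (first part of the notebook)
    return 4/9 * math.sqrt(x + 1) + 5/9 * math.sqrt(1 - x/2)

def true_avf(x):
    # True action-value function for V_0 = 100 (second part of the notebook)
    return 4/9 * math.sqrt(2*x*100 + (1-x)*100) + 5/9 * math.sqrt(0.5*x*100 + (1-x)*100)

V0 = 100  # initial wealth, as assumed in the text
actions = [i / 10 for i in range(11)]

# true_avf(a) should equal sqrt(V0) * f(a) up to floating-point rounding
scaling_holds = all(
    math.isclose(true_avf(a), math.sqrt(V0) * f(a)) for a in actions
)
print(scaling_holds)

# The maximising action is therefore the same for both functions
print(max(actions, key=true_avf) == max(actions, key=f))
```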
