## Q1

From Equation (6.2), substitute $U(c_t)=\log(c_t)$.

The HJB equation is:

$$\max_{\pi_t,c_t}\left(E_t(dV^*(t,W_t))+\log(c_t)\,dt\right)=\rho V^*(t,W_t)\,dt$$

Applying Ito's lemma to $dV^*$, dropping the $dz_t$ terms (which have zero expectation), and dividing by $dt$, we get:
$$\max_{\pi_t,c_t}\left(\frac{dV^*}{dt}+\frac{dV^*}{dW_t}\left((\pi_t(\mu-r)+r)W_t-c_t\right)+\frac{d^2V^*}{dW_t^2}\frac{\pi_t^2\sigma^2W_t^2}{2}+\log(c_t)\right)=\rho V^*(t,W_t)$$
subject to the terminal condition $V^*(T,W_T)=\epsilon'\log(W_T)$.

Let $\Phi$ denote the expression inside the max on the LHS. To find the $\pi_t^*,c_t^*$ that maximize $\Phi$, we take partial derivatives and set them to zero.


$$\frac{d\Phi}{d\pi_t}=W_t(\mu-r)\frac{dV^*}{dW_t}+\frac{d^2V^*}{dW_t^2}\pi_t\sigma^2W_t^2=0$$

Solving for $\pi_t$, we get:

$$\pi_t^*=\frac{-\frac{dV^*}{dW_t}(\mu-r)}{\frac{d^2V^*}{dW_t^2}\sigma^2W_t}$$

and

$$\frac{d\Phi}{dc_t}=-\frac{dV^*}{dW_t}+\frac{1}{c_t}=0$$

so we get:

$$c_t^*=1/(\frac{dV^*}{dW_t})$$
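
As a quick sanity check of the two first-order conditions, the sketch below evaluates $\Phi$ numerically and confirms that small perturbations around $(\pi_t^*, c_t^*)$ do not increase it. It assumes the wealth drift $(\pi_t(\mu-r)+r)W_t-c_t$ (the form implied by the derivatives above) and, purely for illustration, plugs in hypothetical values $\frac{dV^*}{dW_t}=f/W_t$ and $\frac{d^2V^*}{dW_t^2}=-f/W_t^2$ with $f=2$. The function names `phi` and `optimal_controls` are introduced here and are not from the textbook.

```python
import math


def phi(pi, c, W, mu, r, sigma, V_t, V_W, V_WW):
    """Expression inside the max of the HJB equation (log utility)."""
    drift = (pi * (mu - r) + r) * W - c  # assumed wealth drift
    return V_t + drift * V_W + 0.5 * V_WW * pi**2 * sigma**2 * W**2 + math.log(c)


def optimal_controls(W, mu, r, sigma, V_W, V_WW):
    """Closed-form maximizers from the two first-order conditions."""
    pi_star = -V_W * (mu - r) / (V_WW * sigma**2 * W)
    c_star = 1.0 / V_W
    return pi_star, c_star


# Hypothetical inputs: mu, r, sigma are borrowed from the code cell later on.
W, mu, r, sigma, f = 1.0, 0.13, 0.07, 0.2, 2.0
V_t, V_W, V_WW = 0.0, f / W, -f / W**2  # assumed derivatives, illustration only

pi_star, c_star = optimal_controls(W, mu, r, sigma, V_W, V_WW)
best = phi(pi_star, c_star, W, mu, r, sigma, V_t, V_W, V_WW)

# Phi at the candidate optimum should not be beaten by nearby points.
for d_pi in (-0.1, 0.1):
    for d_c in (-0.05, 0.05):
        nearby = phi(pi_star + d_pi, c_star + d_c, W, mu, r, sigma, V_t, V_W, V_WW)
        assert nearby <= best

print(f"pi* = {pi_star:.3f}, c* = {c_star:.3f}")
```

The check relies on $\frac{d^2V^*}{dW_t^2}<0$, which makes $\Phi$ concave in $\pi_t$; with a positive second derivative the first-order condition would pick out a minimum instead.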

Finally, we substitute these two expressions back into the HJB equation and solve the resulting PDE for $V^*$.
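
One way to carry out this substitution, sketched here under a guess that is an assumption and not stated above: try $V^*(t,W_t)=f(t)\log(W_t)+g(t)$, so that $\frac{dV^*}{dW_t}=\frac{f(t)}{W_t}$ and $\frac{d^2V^*}{dW_t^2}=-\frac{f(t)}{W_t^2}$. The optimal controls then become

$$\pi_t^*=\frac{\mu-r}{\sigma^2},\qquad c_t^*=\frac{W_t}{f(t)}$$

Matching the coefficients of $\log(W_t)$ on both sides of the PDE gives $f'(t)=\rho f(t)-1$, and the terminal condition requires $f(T)=\epsilon'$, so for $\rho>0$

$$f(t)=\frac{1}{\rho}+\left(\epsilon'-\frac{1}{\rho}\right)e^{-\rho(T-t)}$$

The remaining terms give a linear ODE for $g(t)$ with $g(T)=0$, which does not affect the optimal controls.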

## Q3
State: (employment status, skill level, money), where employment status is 1 (employed) or 0 (unemployed).

Action space: $(\alpha_t, c_t)$, where $\alpha_t$ is the amount of time spent on learning and $c_t$ is the consumption on that day.

Reward: $U(c_t)$, where $U$ is the utility function.

Transition map:

If employed: with action $(\alpha_t, c_t)$, money += $f(\text{skill level})-c_t$, skill level += $g(\alpha_t T)$, and $p(\text{employment}=0)=p$.

If unemployed: money += $-c_t$, skill level = $k(\text{skill level},\lambda)$, and $p(\text{employment}=1)=h(\text{skill level})$.
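
The transition map above can be sketched as a single sampling function. This is only an illustration: the functions `U`, `f`, `g`, `k`, `h` are left unspecified in the answer, so the concrete lambdas used in the example call are hypothetical placeholders, as are the names `State` and `step`.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class State:
    employed: bool  # employment status: True = 1 (employed), False = 0
    skill: float
    money: float


def step(
    state: State, alpha: float, c: float, *,
    U: Callable[[float], float], f: Callable[[float], float],
    g: Callable[[float], float], k: Callable[[float, float], float],
    h: Callable[[float], float], p: float, T: float, lam: float,
    rng: random.Random,
) -> Tuple[State, float]:
    """Sample (next state, reward) following the transition map above."""
    reward = U(c)
    if state.employed:
        nxt = State(
            employed=rng.random() >= p,  # job is lost with probability p
            skill=state.skill + g(alpha * T),
            money=state.money + f(state.skill) - c,
        )
    else:
        nxt = State(
            employed=rng.random() < h(state.skill),  # hired with prob. h(skill)
            skill=k(state.skill, lam),
            money=state.money - c,
        )
    return nxt, reward


# Example call with placeholder functional forms (assumptions, not from the answer).
placeholders = dict(
    U=math.log,
    f=lambda s: 10.0 * s,
    g=lambda x: 0.01 * x,
    k=lambda s, lam: s * math.exp(-lam),
    h=lambda s: min(1.0, 0.1 * s),
    T=8.0,
    lam=0.1,
)
s0 = State(employed=True, skill=1.0, money=100.0)
s1, reward = step(s0, alpha=0.25, c=5.0, p=0.05, rng=random.Random(0), **placeholders)
print(s1, reward)
```

Keeping the unspecified functions as arguments means the same `step` can be reused once concrete forms for wages, learning, skill decay, and hiring probability are chosen.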

## Trying out textbook code

This code takes too long to execute because of the neural network.

In [None]:
from operator import itemgetter  # used below to pick the best allocation
from pprint import pprint
from typing import Callable, Iterator, Sequence, Tuple  # typing names come from typing, not rl.distribution

import numpy as np
from rl.chapter7.asset_alloc_discrete import AssetAllocDiscrete
from rl.distribution import Gaussian
from rl.function_approx import DNNApprox, DNNSpec
if __name__ == '__main__':
    steps: int = 4
    μ: float = 0.13
    σ: float = 0.2
    r: float = 0.07
    a: float = 1.0
    init_wealth: float = 1.0
    init_wealth_var: float = 0.1

    excess: float = μ - r
    var: float = σ * σ
    base_alloc: float = excess / (a * var)

    risky_ret: Sequence[Gaussian] = [Gaussian(μ=μ, σ=σ) for _ in range(steps)]
    riskless_ret: Sequence[float] = [r for _ in range(steps)]
    utility_function: Callable[[float], float] = lambda x: - np.exp(-a * x) / a
    alloc_choices: Sequence[float] = np.linspace(
        2 / 3 * base_alloc,
        4 / 3 * base_alloc,
        11
    )
    feature_funcs: Sequence[Callable[[Tuple[float, float]], float]] = \
        [
            lambda _: 1.,
            lambda w_x: w_x[0],
            lambda w_x: w_x[1],
            lambda w_x: w_x[1] * w_x[1]
        ]
    dnn: DNNSpec = DNNSpec(
        neurons=[],
        bias=False,
        hidden_activation=lambda x: x,
        hidden_activation_deriv=lambda y: np.ones_like(y),
        output_activation=lambda x: - np.sign(a) * np.exp(-x),
        output_activation_deriv=lambda y: -y
    )
    init_wealth_distr: Gaussian = Gaussian(μ=init_wealth, σ=init_wealth_var)

    aad: AssetAllocDiscrete = AssetAllocDiscrete(
        risky_return_distributions=risky_ret,
        riskless_returns=riskless_ret,
        utility_func=utility_function,
        risky_alloc_choices=alloc_choices,
        feature_functions=feature_funcs,
        dnn_spec=dnn,
        initial_wealth_distribution=init_wealth_distr
    )

    # vf_ff: Sequence[Callable[[float], float]] = [lambda _: 1., lambda w: w]
    # it_vf: Iterator[Tuple[DNNApprox[float], Policy[float, float]]] = \
    #     aad.backward_induction_vf_and_pi(vf_ff)

    # print("Backward Induction: VF And Policy")
    # print("---------------------------------")
    # print()
    # for t, (v, p) in enumerate(it_vf):
    #     print(f"Time {t:d}")
    #     print()
    #     opt_alloc: float = p.act(init_wealth).value
    #     val: float = v.evaluate([init_wealth])[0]
    #     print(f"Opt Risky Allocation = {opt_alloc:.2f}, Opt Val = {val:.3f}")
    #     print("Weights")
    #     for w in v.weights:
    #         print(w.weights)
    #     print()

    it_qvf: Iterator[DNNApprox[Tuple[float, float]]] = \
        aad.backward_induction_qvf()

    print("Backward Induction on Q-Value Function")
    print("--------------------------------------")
    print()
    for t, q in enumerate(it_qvf):
        print(f"Time {t:d}")
        print()
        opt_alloc: float = max(
            ((q.evaluate([(init_wealth, ac)])[0], ac) for ac in alloc_choices),
            key=itemgetter(0)
        )[1]
        val: float = max(q.evaluate([(init_wealth, ac)])[0]
                         for ac in alloc_choices)
        print(f"Opt Risky Allocation = {opt_alloc:.3f}, Opt Val = {val:.3f}")
        print("Optimal Weights below:")
        for wts in q.weights:
            pprint(wts.weights)
        print()

    print("Analytical Solution")
    print("-------------------")
    print()

    for t in range(steps):
        print(f"Time {t:d}")
        print()
        left: int = steps - t
        growth: float = (1 + r) ** (left - 1)
        alloc: float = base_alloc / growth
        val: float = - np.exp(- excess * excess * left / (2 * var)
                              - a * growth * (1 + r) * init_wealth) / a
        bias_wt: float = excess * excess * (left - 1) / (2 * var) + \
            np.log(np.abs(a))
        w_t_wt: float = a * growth * (1 + r)
        x_t_wt: float = a * excess * growth
        x_t2_wt: float = - var * (a * growth) ** 2 / 2

        print(f"Opt Risky Allocation = {alloc:.3f}, Opt Val = {val:.3f}")
        print(f"Bias Weight = {bias_wt:.3f}")
        print(f"W_t Weight = {w_t_wt:.3f}")
        print(f"x_t Weight = {x_t_wt:.3f}")
        print(f"x_t^2 Weight = {x_t2_wt:.3f}")
        print()
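
Since the closed-form part of the cell above does not depend on `AssetAllocDiscrete`, it can be run on its own without waiting for the backward induction. A minimal sketch, reusing the same formulas and parameter values; the function name `analytical_solution` and the dictionary keys are introduced here for illustration, and only the standard library is used.

```python
import math
from typing import Dict, List


def analytical_solution(
    steps: int, mu: float, sigma: float, r: float, a: float, init_wealth: float
) -> List[Dict[str, float]]:
    """Closed-form allocation, value, and weights for each time step."""
    excess = mu - r
    var = sigma * sigma
    base_alloc = excess / (a * var)
    rows = []
    for t in range(steps):
        left = steps - t                 # number of steps remaining
        growth = (1 + r) ** (left - 1)   # riskless growth over the later steps
        rows.append({
            "alloc": base_alloc / growth,
            "val": -math.exp(-excess * excess * left / (2 * var)
                             - a * growth * (1 + r) * init_wealth) / a,
            "bias_wt": excess * excess * (left - 1) / (2 * var)
                       + math.log(abs(a)),
            "w_t_wt": a * growth * (1 + r),
            "x_t_wt": a * excess * growth,
            "x_t2_wt": -var * (a * growth) ** 2 / 2,
        })
    return rows


# Same parameter values as in the cell above.
rows = analytical_solution(steps=4, mu=0.13, sigma=0.2, r=0.07, a=1.0, init_wealth=1.0)
for t, row in enumerate(rows):
    print(f"Time {t:d}: Opt Risky Allocation = {row['alloc']:.3f}, "
          f"Opt Val = {row['val']:.3f}")
```

At the last time step the growth factor is 1, so the allocation reduces to `base_alloc`, which is a convenient value to compare against the backward-induction output.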