# Tutorial: Discrete Dynamic Programming (2)

## Computational Economics  (ECO309)

### Job-Search Model

- When unemployed at date $t$, a job-seeker
  - consumes unemployment benefit $c_t = \underline{c}$
  - receives in every date $t$ a job offer $w_t$
    - $w_t$ is i.i.d., 
    - takes values $w_1, w_2, w_3$ with probabilities $p_1, p_2, p_3$
  - if the job-seeker accepts, he becomes employed at rate $w_t$ in the next period
  - else he stays unemployed
  
- When employed at rate $w$
  - worker consumes salary $c_t = w$
  - with small probability $\lambda>0$ loses his job:
    - starts next period unemployed
  - otherwise stays employed at same rate
- Objective: $\max E_0 \left\{ \sum_{t \geq 0} \beta^t \log(c_t) \right\}$


__What are the states, the controls, and the reward of this problem? Write down the Bellman equation.__

- States: 
    - Unemployed with offer $w_1$
    - Unemployed with offer $w_2$
    - Unemployed with offer $w_3$
    - Employed at $w_1$
    - Employed at $w_2$
    - Employed at $w_3$
- Controls:
    - Unemployed with offer $w_1$ : accept/reject
    - Unemployed with offer $w_2$ : accept/reject
    - Unemployed with offer $w_3$ : accept/reject
    - Employed at $w_1$ : None
    - Employed at $w_2$ : None
    - Employed at $w_3$ : None
- Reward
    - Unemployed with offer $w$: $\log(\underline{c})$
    - Employed at wage $w$: $\log(w)$
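
One way to write the Bellman equation, following the timing described above (an accepted offer starts paying in the next period, and a worker who loses his job draws a fresh offer in the next period). The names $V^U$ and $V^E$ are notation introduced here for the value of being unemployed with offer $w_i$ and of being employed at $w_i$:

$$V^U(w_i) = \log(\underline{c}) + \beta \max\left\{ V^E(w_i),\; \sum_{j} p_j V^U(w_j) \right\}$$

$$V^E(w_i) = \log(w_i) + \beta \left[ (1-\lambda) V^E(w_i) + \lambda \sum_{j} p_j V^U(w_j) \right]$$

The first term inside the max is the value of accepting, the second the value of rejecting and waiting for a new draw.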

__Define a parameter structure for the model.__

In [3]:
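struct Parameters
    p::Vector{Float64}   # probabilities of the wage offers
    w::Vector{Float64}   # possible wage offers
    λ::Float64           # probability of losing the job
    β::Float64           # discount factor
    cbar::Float64        # unemployment benefit
end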

In [82]:
p = Parameters(
    [0.2, 0.6, 0.2],
    [0.9, 1.0, 1.1],
    0.05,
    0.9,
    0.95
)

Parameters([0.2, 0.6, 0.2], [0.9, 1.0, 1.1], 0.05, 0.9, 0.95)

__Define a function `value_update(V_U::Vector{Float64}, V_E::Vector{Float64}, x::Vector{Bool}, p::Parameters)::Tuple{Vector, Vector}`, which takes in tomorrow's value functions and a policy vector and returns updated values for today.__

In [17]:
x_0 = [false, false, false]
V_U_0 = [0.0, 0.0, 0.0]
V_E_0 = [0.0, 0.0, 0.0]

3-element Vector{Float64}:
 0.0
 0.0
 0.0

In [33]:
function value_update(V_U::Vector, V_E::Vector, x::Vector{Bool}, p::Parameters)
   
    n_V_U = zeros(3)
    n_V_E = zeros(3)
    
    # fill n_V_E (employed state)
    for i=1:3
        w_i = p.w[i]::Float64 # current wage
        r = log(w_i) # current felicity
        C_V_E = (1-p.λ)*p.β*V_E[i]
        C_V_U = p.λ*p.β*sum(  p.p[j]*V_U[j] for j=1:3 )
        n_V_E[i] = r + C_V_E + C_V_U
    end
    
    # fill n_V_U
    for i=1:3
        r = log(p.cbar) # current reward
        if x[i]==true
            C_V = p.β*V_E[i]
        else
            C_V = p.β*sum(p.p[j]*V_U[j] for j=1:3) 
        end
        n_V_U[i] = r + C_V
    end
    
    return (n_V_U, n_V_E)
    
    
end

value_update (generic function with 2 methods)

In [35]:
V_U, V_E = value_update(V_U_0, V_E_0, x_0, p)

([-1.6094379124341003, -1.6094379124341003, -1.6094379124341003], [-0.10536051565782628, 0.0, 0.09531017980432493])

__Define a function `policy_eval(x::Vector{Bool}, p::Parameters)::Tuple{Vector, Vector}` which takes in a policy vector and returns the value(s) of following this policy forever. You can add relevant arguments to the function.__

In [None]:
# function policy_eval(x::Vector{Bool}, p::Parameters, V_U_0::Vector, V_E_0::Vector)::Tuple{Vector, Vector}
#     ....
# end

# policy_eval(x::Vector{Bool}, p::Parameters) = policy_eval(x, p, zeros(3), zeros(3))



In [41]:
a = [0.3, 0.3]
b = [3.4, 3.2]
using LinearAlgebra: norm

In [57]:
distance(a,b) = norm(a-b,1)

distance (generic function with 1 method)

In [58]:
distance(a::Tuple{Vector, Vector}, b::Tuple{Vector, Vector}) = distance(a[1],b[1]) + distance(a[2],b[2])

distance (generic function with 2 methods)

In [64]:
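function policy_eval(x::Vector{Bool}, p::Parameters, V_U::Vector=zeros(3),
        V_E::Vector=zeros(3),
    tol_η=1e-6, maxit=1000)::Tuple{Vector, Vector}
    
    for n=1:maxit
        # apply the fixed policy x once and measure how much the values moved
        n_V_U, n_V_E = value_update(V_U, V_E, x, p)
        η = distance( (n_V_U, n_V_E), (V_U, V_E) )
        if η<tol_η
            return n_V_U, n_V_E
        end
        V_U, V_E = n_V_U, n_V_E
    end
    # otherwise the function falls through and returns `nothing`,
    # which cannot be converted to the declared return type
    error("policy_eval did not converge after $maxit iterations")

end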



policy_eval (generic function with 5 methods)

In [73]:
policy_eval(x_0, p)

([-16.09437767883241, -16.09437767883241, -16.09437767883241], [-5.721430115203782, -4.994805869311957, -4.337494284476451])

In [69]:
all_policies = [[i,j,k] for i=(false,true) for j=(false, true) for k=(false,true)]

8-element Vector{Vector{Bool}}:
 [0, 0, 0]
 [0, 0, 1]
 [0, 1, 0]
 [0, 1, 1]
 [1, 0, 0]
 [1, 0, 1]
 [1, 1, 0]
 [1, 1, 1]

In [83]:
policy_eval(all_policies[1],p)

([-0.5129314531275061, -0.5129314531275061, -0.5129314531275061], [-0.8858088390133, -0.1591845973552969, 0.4981269836502504])
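
Because the policy is fixed, the values solve a linear system, so the iteration in `policy_eval` can also be replaced by a direct solve. The following is a minimal sketch under assumptions: the named tuple `m` is a stand-in for `Parameters` with the same field names and values, and `policy_eval_exact` is a hypothetical helper, not part of the tutorial code.

```julia
using LinearAlgebra: I

# Stand-in for the `Parameters` struct (same field names, same baseline values).
m = (p=[0.2, 0.6, 0.2], w=[0.9, 1.0, 1.1], λ=0.05, β=0.9, cbar=0.95)

# Hypothetical helper: exact policy evaluation by solving (I - βP) V = r.
# State ordering: 1:n are "unemployed with offer w_i", n+1:2n are "employed at w_i".
function policy_eval_exact(x::Vector{Bool}, m)
    n = length(m.w)
    P = zeros(2n, 2n)   # transition matrix induced by the policy x
    r = zeros(2n)       # per-period reward in each state
    for i in 1:n
        # unemployed with offer w_i
        r[i] = log(m.cbar)
        if x[i]
            P[i, n+i] = 1.0          # accept: employed at w_i next period
        else
            P[i, 1:n] .= m.p         # reject: draw a new offer next period
        end
        # employed at w_i
        r[n+i] = log(m.w[i])
        P[n+i, n+i] = 1 - m.λ        # keeps the job
        P[n+i, 1:n] .= m.λ .* m.p    # loses the job and draws a new offer
    end
    V = (I - m.β * P) \ r
    return V[1:n], V[n+1:2n]
end

V_U_exact, V_E_exact = policy_eval_exact([false, false, false], m)
```

For the never-accept policy the unemployed value is simply $\log(\underline{c})/(1-\beta)$, which gives a quick sanity check against the iterative result above (the two agree up to the tolerance `tol_η`).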

In [84]:
all_values = [policy_eval(pol,p) for pol in all_policies]

8-element Vector{Tuple{Vector{Float64}, Vector{Float64}}}:
 ([-0.5129314531275061, -0.5129314531275061, -0.5129314531275061], [-0.8858088390133, -0.1591845973552969, 0.4981269836502504])
 ([0.2178275908946779, 0.2178275908946779, 0.6238064425029485], [-0.6338248515901406, 0.09279939166451519, 0.7501109741144105])
 ([-0.14901119864138318, -0.08161854908109774, -0.14901119864138318], [-0.7603189502678954, -0.03369488717156144, 0.6236165323053081])
 ([0.049820558422034505, -0.0199141331469381, 0.5716662766192118], [-0.691758332658257, 0.03486589581906111, 0.6921774649012284])
 ([-0.932160399255396, -0.7824357865069405, -0.7824357865069405], [-0.9787413898882489, -0.25211714563564314, 0.4051944377170079])
 ([-0.7563955710258234, -0.2160825390833082, 0.48914658717492965], [-0.7834471406989734, -0.05682293942229902, 0.6004886050538931])
 ([-0.790507718586588, -0.13654590707184588, -0.32599919107860614], [-0.8213495134814646, -0.09472527688463331, 0.5625862995425273])
 ([-0.7299680492629766, 

In [85]:
cat([reshape(cat(v[1], v[2]; dims=1),1,6) for v in all_values]..., dims=1)

8×6 Matrix{Float64}:
 -0.512931   -0.512931   -0.512931  -0.885809  -0.159185   0.498127
  0.217828    0.217828    0.623806  -0.633825   0.0927994  0.750111
 -0.149011   -0.0816185  -0.149011  -0.760319  -0.0336949  0.623617
  0.0498206  -0.0199141   0.571666  -0.691758   0.0348659  0.692177
 -0.93216    -0.782436   -0.782436  -0.978741  -0.252117   0.405194
 -0.756396   -0.216083    0.489147  -0.783447  -0.0568229  0.600489
 -0.790508   -0.136546   -0.325999  -0.82135   -0.0947253  0.562586
 -0.729968   -0.0760064   0.515574  -0.754083  -0.0274592  0.629852
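
Each row of this table is one policy and each column one state, so the optimal policy is the row that is at least as good as every other row in every column. A small sketch of that search, using the rounded numbers copied from the table above (the variable names are chosen here for illustration):

```julia
# Rows: the 8 policies in the order of `all_policies`.
# Columns: V_U(w_1), V_U(w_2), V_U(w_3), V_E(w_1), V_E(w_2), V_E(w_3).
vals = [
    -0.512931  -0.512931   -0.512931  -0.885809  -0.159185   0.498127
     0.217828   0.217828    0.623806  -0.633825   0.0927994  0.750111
    -0.149011  -0.0816185  -0.149011  -0.760319  -0.0336949  0.623617
     0.0498206 -0.0199141   0.571666  -0.691758   0.0348659  0.692177
    -0.93216   -0.782436   -0.782436  -0.978741  -0.252117   0.405194
    -0.756396  -0.216083    0.489147  -0.783447  -0.0568229  0.600489
    -0.790508  -0.136546   -0.325999  -0.82135   -0.0947253  0.562586
    -0.729968  -0.0760064   0.515574  -0.754083  -0.0274592  0.629852
]
policies = [[i, j, k] for i in (false, true) for j in (false, true) for k in (false, true)]

# Index of the first row that weakly dominates all rows, state by state.
best = findfirst(r -> all(vals[r:r, :] .>= vals), 1:size(vals, 1))
best_policy = policies[best]
```

With these numbers the dominating row is the second one, i.e. the policy that accepts only the highest offer.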

__Define a function `bellman_step(V_U::Vector, V_E::Vector, p::Parameters)::Tuple{Vector, Vector, Vector}` which returns updated values, together with improved policy rules.__

In [96]:
function bellman_step(V_U::Vector, V_E::Vector, p::Parameters)
   
    n_V_U = zeros(3)
    n_V_E = zeros(3)
    x = zeros(Bool, 3)
    
    # fill n_V_E (employed state)
    for i=1:3
        w_i = p.w[i]::Float64 # current wage
        r = log(w_i) # current felicity
        C_V_E = (1-p.λ)*p.β*V_E[i]
        C_V_U = p.λ*p.β*sum(  p.p[j]*V_U[j] for j=1:3 )
        n_V_E[i] = r + C_V_E + C_V_U
    end
    
    # fill n_V_U
    for i=1:3
        r = log(p.cbar) # current reward
        C_V_accept = p.β*V_E[i]
        C_V_reject = p.β*sum(p.p[j]*V_U[j] for j=1:3) 
        C_V = max(C_V_accept, C_V_reject)
        x[i] = C_V_accept>C_V_reject    
        n_V_U[i] = r + C_V
    end
    
    return (n_V_U, n_V_E, x)
    
    
end

bellman_step (generic function with 1 method)

In [32]:
bellman_step(V_U_0, V_E_0, p)

([-1.6094379124341003, -1.6094379124341003, -1.6094379124341003], [-0.10536051565782628, 0.0, 0.09531017980432493])

__Implement Value Function Iteration__

In [102]:
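function vfi(p; maxit=1000, tol_η=1e-6, verbose=true)
    V_U, V_E = zeros(3), zeros(3)
    η_0 = 1.0
    for it=1:maxit
        n_V_U, n_V_E, x = bellman_step(V_U, V_E, p)
        η = distance( (n_V_U, n_V_E), (V_U, V_E))
        λ = η/η_0 # ratio of successive errors (not the job-loss probability p.λ)
        η_0 = η
        if verbose
            println("$it : $η : $λ")
        end
        if η<tol_η
            return (n_V_U, n_V_E, x)
        end
        V_U, V_E = n_V_U, n_V_E
    end
    # make a failure to converge explicit instead of silently returning `nothing`
    error("vfi did not converge after $maxit iterations")
end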

vfi (generic function with 1 method)

In [104]:
@time V_U, V_E, x = vfi(p, verbose=false)

  0.000341 seconds (13.27 k allocations: 263.844 KiB)


([0.21782763098990254, 0.21782763098990254, 0.6238064825981732], [-0.6338248114949159, 0.09279943175973977, 0.750111014209635], Bool[0, 0, 1])

__Implement Policy Iteration and compare rates of convergence.__

In [119]:
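function vfi_improved(p; maxit=1000, tol_η=1e-6, verbose=true)
    # Policy iteration: alternate full policy evaluation and one greedy improvement step.
    x0 = zeros(Bool, 3) # initial policy: reject every offer
    print("0 :   |")
    print(x0)
    print("\n")
    for it=1:maxit
        
        # evaluate the current policy, then improve it with one Bellman step
        V_U, V_E = policy_eval(x0, p)
        n_V_U, n_V_E, x = bellman_step(V_U, V_E, p)
        
        # number of offers for which the decision changed
        η = distance( x, x0)
        x0 = x
        
        if verbose
            print("$it : $η | ")
            print(x)
            print("| ")
            print(V_U)
            print("\n")
        end
        if η<tol_η
            return (n_V_U, n_V_E, x)
        end
    end
    error("policy iteration did not converge after $maxit iterations")
end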

vfi_improved (generic function with 1 method)

In [120]:
vfi_improved(p)

0 :   |Bool[0, 0, 0]
1 : 2.0 | Bool[0, 1, 1]| [-0.5129314531275061, -0.5129314531275061, -0.5129314531275061]
2 : 1.0 | Bool[0, 0, 1]| [0.049820558422034505, -0.0199141331469381, 0.5716662766192118]
3 : 0.0 | Bool[0, 0, 1]| [0.2178275908946779, 0.2178275908946779, 0.6238064425029485]


([0.21782773070714823, 0.21782773070714823, 0.6238065823154189], [-0.6338247125126616, 0.09279953112789542, 0.7501111139268809], Bool[0, 0, 1])
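
To put a number on the comparison, one can count how many Bellman steps plain value function iteration needs and look at the ratio of successive errors, which should approach the discount factor $\beta$. The sketch below is self-contained under assumptions: `m` is a named-tuple stand-in for `Parameters`, `bellman` is a vectorized rewrite of the same recursion as `bellman_step`, and `vfi_errors` is a hypothetical helper.

```julia
# Stand-in for `Parameters` (same field names and baseline values).
m = (p=[0.2, 0.6, 0.2], w=[0.9, 1.0, 1.1], λ=0.05, β=0.9, cbar=0.95)

# Vectorized Bellman step, same recursion as `bellman_step` (values only).
function bellman(V_U, V_E, m)
    EV_U = sum(m.p .* V_U)   # expected value of drawing a fresh offer
    n_V_E = log.(m.w) .+ m.β .* ((1 - m.λ) .* V_E .+ m.λ * EV_U)
    n_V_U = log(m.cbar) .+ m.β .* max.(V_E, EV_U)
    return n_V_U, n_V_E
end

# Hypothetical helper: iterate to convergence and record the successive errors.
function vfi_errors(m; tol=1e-6, maxit=10_000)
    V_U, V_E = zeros(3), zeros(3)
    errors = Float64[]
    for it in 1:maxit
        n_V_U, n_V_E = bellman(V_U, V_E, m)
        η = sum(abs, n_V_U - V_U) + sum(abs, n_V_E - V_E)
        push!(errors, η)
        η < tol && return errors
        V_U, V_E = n_V_U, n_V_E
    end
    error("no convergence after $maxit iterations")
end

errors = vfi_errors(m)
n_vfi = length(errors)                 # number of Bellman steps needed
ratio = errors[end] / errors[end-1]    # successive error ratio near convergence
```

Value function iteration converges linearly at rate $\beta$, so the step count grows like $\log(\text{tol})/\log(\beta)$ and should be far larger than the three improvement steps printed above. Keep in mind that each policy iteration step hides an inner `policy_eval` loop, so the number of outer steps alone overstates the gain.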

__Discuss the Effects of the Parameters__

In [128]:
p = Parameters(
    [0.2, 0.6, 0.2],
    [0.9, 1.0, 15],
    0.05,
    0.9,
    0.8
)
vfi_improved(p, verbose=false)

0 :   |Bool[0, 0, 0]


([-0.4466193414900138, -0.2924978136313429, 16.516087381483914], [-0.8036845010939739, -0.07706033871006676, 18.599145797751635], Bool[0, 1, 1])
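
A systematic way to discuss the parameters is to vary one of them and record which offers are accepted. The sketch below varies the unemployment benefit around the baseline calibration. It is self-contained under assumptions: `base` is a named-tuple stand-in for `Parameters`, `accepted` is a hypothetical helper that runs value function iteration with the same recursion as `bellman_step`, and the benefit levels are arbitrary.

```julia
# Hypothetical helper: solve by value function iteration, return the acceptance rule.
function accepted(m; tol=1e-8, maxit=10_000)
    V_U, V_E = zeros(3), zeros(3)
    for it in 1:maxit
        EV_U = sum(m.p .* V_U)   # expected value of drawing a fresh offer
        n_V_E = log.(m.w) .+ m.β .* ((1 - m.λ) .* V_E .+ m.λ * EV_U)
        n_V_U = log(m.cbar) .+ m.β .* max.(V_E, EV_U)
        η = sum(abs, n_V_U - V_U) + sum(abs, n_V_E - V_E)
        V_U, V_E = n_V_U, n_V_E
        # accept offer i when working at w_i beats waiting for a new draw
        η < tol && return V_E .> sum(m.p .* V_U)
    end
    error("no convergence after $maxit iterations")
end

# Stand-in for `Parameters` with the baseline values used earlier.
base = (p=[0.2, 0.6, 0.2], w=[0.9, 1.0, 1.1], λ=0.05, β=0.9, cbar=0.95)

for cbar in (0.5, 0.8, 0.95)
    println("cbar = $cbar : accepted offers = ", base.w[accepted(merge(base, (cbar=cbar,)))])
end
```

A higher benefit makes waiting less costly, so the set of accepted offers can only shrink as `cbar` rises. The same loop can be run over `λ` or `β`.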

### Brock-Mirman Stochastic Growth model

This is a neoclassical growth model with unpredictable shocks to productivity.

The social planner solves:

$$\max E_t \left[ \sum_{n=0}^{\infty} \beta^n \log C_{t+n} \right]$$

s.t.

$$K_{t+1} = Y_t - C_t$$
$$Y_{t+1} = A_{t+1}K_{t+1}^\alpha$$

where $A_t$ is the level of productivity in period $t$. 
It can take values $A^h=1.05$ and $A^l=0.95$. The transitions between these two states are given by the matrix:
$$P = \begin{bmatrix}
0.9 & 0.1\\
0.1 & 0.9
\end{bmatrix}$$

__What are the states? What are the controls? Is it possible to bound them in a natural way? Propose a discretization scheme.__
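
One possible answer: the state is the pair $(K_t, A_t)$ and the control is next-period capital $K_{t+1}$ (equivalently $C_t$), with $0 < K_{t+1} < Y_t$. Since $K_{t+1} < A^h K_t^\alpha$, capital can never exceed the fixed point of $K = A^h K^\alpha$, which gives a natural upper bound. The sketch below builds a uniform grid on that interval. The values of `α`, `β`, the grid size and the lower bound are assumptions, since the text does not give them.

```julia
# Assumed values: the text gives A^h, A^l and P, but not α or β.
α, β = 0.3, 0.96
A = [1.05, 0.95]           # productivity states (high, low)
P = [0.9 0.1; 0.1 0.9]     # P[a, b] = Prob(A_{t+1} = A[b] | A_t = A[a])

# Upper bound: fixed point of K = A^h K^α.
K_max = A[1]^(1 / (1 - α))
# Lower bound: an arbitrary small positive level (log C requires C > 0).
K_min = 0.05 * K_max
N = 100
K_grid = collect(range(K_min, K_max; length=N))

# feasible[i, a, j] is true when choosing K' = K_grid[j] in state (K_grid[i], A[a])
# leaves strictly positive consumption.
feasible = [A[a] * K_grid[i]^α - K_grid[j] > 0 for i in 1:N, a in 1:2, j in 1:N]
```

With this discretization the problem has `2N` states and at most `N` choices per state, exactly the finite setting of the job-search example.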

__Implement the Value Function Algorithm__
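
A minimal sketch of value function iteration on a capital grid, without interpolation. The values of `α` and `β`, the grid and the function name `vfi_growth` are assumptions. As a check, the result is compared with the well-known closed-form policy of this log-utility, full-depreciation case, $K_{t+1} = \alpha\beta A_t K_t^\alpha$.

```julia
# Assumed calibration and grid (not given in the text).
α, β = 0.3, 0.96
A = [1.05, 0.95]
P = [0.9 0.1; 0.1 0.9]
N = 200
K_max = A[1]^(1 / (1 - α))
K = collect(range(0.05 * K_max, K_max; length=N))

function vfi_growth(K, A, P, α, β; tol=1e-6, maxit=2000)
    N, M = length(K), length(A)
    V = zeros(N, M)
    g = ones(Int, N, M)              # grid index of the chosen K'
    for it in 1:maxit
        EV = V * P'                  # EV[j, a] = Σ_b P[a, b] V[j, b]
        n_V = similar(V)
        for a in 1:M, i in 1:N
            Y = A[a] * K[i]^α
            best, best_j = -Inf, 1
            for j in 1:N
                C = Y - K[j]
                C <= 0 && break      # grid is increasing: larger K' is infeasible too
                v = log(C) + β * EV[j, a]
                if v > best
                    best, best_j = v, j
                end
            end
            n_V[i, a], g[i, a] = best, best_j
        end
        η = maximum(abs, n_V - V)
        V = n_V
        η < tol && return V, g
    end
    error("vfi_growth did not converge after $maxit iterations")
end

V, g = vfi_growth(K, A, P, α, β)
K_next = K[g]                                               # policy on the grid
K_closed = [α * β * A[a] * K[i]^α for i in 1:N, a in 1:2]   # closed-form policy
gap = maximum(abs, K_next - K_closed)
```

The gap should be of the order of the grid spacing, since the grid policy can only pick points of `K`.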

__Bonus: Propose some ideas to improve performances.__

__Policy iteration__
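
A sketch of policy iteration for the discretized growth model, mirroring what was done for the job-search model: evaluate the current policy exactly by solving a linear system, then improve it greedily. The calibration, the grid and the helper names `evaluate`, `improve` and `policy_iteration` are assumptions made for this illustration.

```julia
using LinearAlgebra: I

# Assumed calibration and grid (α, β, N and the lower bound are not given in the text).
m = (α=0.3, β=0.96, A=[1.05, 0.95], P=[0.9 0.1; 0.1 0.9])
N = 100
K_max = m.A[1]^(1 / (1 - m.α))
K = collect(range(0.05 * K_max, K_max; length=N))

# Exact evaluation of a policy g, where g[i, a] is the grid index of K' in state (K[i], A[a]).
# States are stacked as s = i + (a - 1) * N and we solve (I - βQ) V = r.
function evaluate(g, K, m)
    N, M = length(K), length(m.A)
    Q, r = zeros(N * M, N * M), zeros(N * M)
    for a in 1:M, i in 1:N
        s = i + (a - 1) * N
        r[s] = log(m.A[a] * K[i]^m.α - K[g[i, a]])
        for b in 1:M
            Q[s, g[i, a] + (b - 1) * N] = m.P[a, b]
        end
    end
    return reshape((I - m.β * Q) \ r, N, M)
end

# Greedy improvement: best feasible K' in every state, given V.
function improve(V, K, m)
    N, M = size(V)
    EV = V * m.P'    # EV[j, a] = Σ_b P[a, b] V[j, b]
    g = ones(Int, N, M)
    for a in 1:M, i in 1:N
        Y = m.A[a] * K[i]^m.α
        g[i, a] = argmax([K[j] < Y ? log(Y - K[j]) + m.β * EV[j, a] : -Inf for j in 1:N])
    end
    return g
end

function policy_iteration(K, m; tol=1e-8, maxit=100)
    g = ones(Int, length(K), length(m.A))   # start from "always choose the smallest K'"
    V = evaluate(g, K, m)
    for it in 1:maxit
        g = improve(V, K, m)
        n_V = evaluate(g, K, m)
        η = maximum(abs, n_V - V)
        V = n_V
        η < tol && return V, g, it
    end
    error("policy iteration did not converge after $maxit iterations")
end

V, g, n_iter = policy_iteration(K, m)
```

The stopping rule compares successive value functions rather than policies, which avoids cycling between two policies that are tied up to rounding error. The dense solve is fine for a few hundred states; a finer grid would call for a sparse matrix.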