### Simulating the problem
STATES : 1 new, 2 good shape, 3 old, 4 broken \
ACTIONS : 1 do nothing, 2 maintain, 3 repair, 4 replace
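
For readability, the integer codes above can be mapped to labels. This is only a minimal sketch: the constants `STATE_NAMES` and `ACTION_NAMES` and the helper `describe` are illustrative names, not part of the notebook.

```julia
# Illustrative names (assumptions): none of these are defined in the original notebook.
const STATE_NAMES  = ["new", "good shape", "old", "broken"]
const ACTION_NAMES = ["do nothing", "maintain", "repair", "replace"]

# Human-readable description of a (state, action) pair given as 1-based codes.
describe(s::Int, a::Int) = "state $(STATE_NAMES[s]), action $(ACTION_NAMES[a])"

println(describe(2, 1))
```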

In [1]:

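function simulation(s,a) # Take a (state, action) pair and return the next state and the reward
    # Note: `s` is 1-based (1 = new, ..., 4 = broken), while the returned next state appears to be 0-based (0 = new, ..., 3 = broken)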
    random_number = rand()
    if s == 1 #If the state is new
        if a == 1 #If the action is do nothing (only action available for new state)
            return 1, -30
        else
            throw(AssertionError("The action is not valid for the state"))
        end
    
    elseif s == 2 #If the state is in good shape

        if a == 1 #If the action is do nothing
            if random_number < 0.3
                return 1, -20
            else
                return 2, -20
            end

        elseif a == 2 #If the action is maintain
            if random_number < 0.8
                return 1, -10
            else
                return 2, -10
            end

        elseif a == 3 #If the action is repair
            return 1, -5

        elseif a == 4 #If the action is replace
            return 0, -50

        else
            throw(AssertionError("The action is not valid for the state"))
        end

    elseif s == 3 #If the state is old

        if a == 1 #If the action is do nothing
            if random_number < 0.5
                return 2, -10
            else
                return 3, -10
            end
        
        elseif a == 2 #If the action is maintain
            if random_number < 0.9
                return 2, 0
            else
                return 3, 0
            end

        elseif a == 3 #If the action is repair
            return 1, 20

        elseif a == 4 #If the action is replace
            return 0, 70

        else
            throw(AssertionError("The action is not valid for the state"))
        end

    elseif s == 4 #If the state is broken

        if a == 1 #If the action is do nothing
            return 3, 0

        elseif a == 3 #If the action is repair
            return 1, 50

        elseif a == 4 #If the action is replace
            return 0, 70

        else
            throw(AssertionError("The action is not valid for the state"))
        end
    
    else
        throw(AssertionError("The state is not valid"))
    end
end

println(simulation(2, 1))

(1, -20)


### Dictionary associated with the problem

In [2]:
using JSON
dict_state_action = JSON.parsefile("state_action.json")

Dict{String, Any} with 4 entries:
  "4" => Dict{String, Any}("4"=>Dict{String, Any}("1"=>Dict{String, Any}("c"=>7…
  "1" => Dict{String, Any}("1"=>Dict{String, Any}("2"=>Dict{String, Any}("c"=>-…
  "2" => Dict{String, Any}("4"=>Dict{String, Any}("1"=>Dict{String, Any}("c"=>5…
  "3" => Dict{String, Any}("4"=>Dict{String, Any}("1"=>Dict{String, Any}("c"=>6…

In [3]:
print(dict_state_action["1"]["1"])

Dict{String, Any}("2" => Dict{String, Any}("c" => -30, "p" => 1))
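
The nested layout is `dict[state][action][next_state]`, each leaf holding a cost `"c"` and a probability `"p"`. As a sketch of how one transition could be sampled from such an entry, the block below uses a hypothetical helper `sample_transition` and a toy entry `toy`; the numbers in `toy` are made up for illustration and are not those of `state_action.json`.

```julia
# Hypothetical helper: draws a next state from one dict[state][action] entry
# by inverse-transform sampling on the "p" values. `u` is a uniform draw in [0, 1).
function sample_transition(dict_action, u = rand())
    cumulative = 0.0
    last_state, last_cost = 0, 0
    # Sort the keys so the result does not depend on Dict iteration order.
    for next_state in sort(collect(keys(dict_action)))
        entry = dict_action[next_state]
        cumulative += entry["p"]
        last_state, last_cost = parse(Int, next_state), entry["c"]
        if u < cumulative
            return last_state, last_cost
        end
    end
    # Guard against floating-point rounding: fall back to the last state visited.
    return last_state, last_cost
end

# Toy entry (made-up numbers).
toy = Dict("1" => Dict("c" => -20, "p" => 0.3),
           "2" => Dict("c" => -20, "p" => 0.7))

println(sample_transition(toy, 0.1))   # u < 0.3, so the next state is 1
println(sample_transition(toy, 0.9))   # u >= 0.3, so the next state is 2
```

Passing `u` explicitly makes the draw reproducible; calling `sample_transition(toy)` uses a fresh `rand()`.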

We define V as follows:

$$
V_{t}(x)=E\left[\sum_{s=t}^{T} R(X_s,s) \,\middle|\, X_{t}=x\right]
$$

We therefore have (the sum is empty for $t = T+1$):
$$
\forall x \in \text{States}, \quad V_{T+1}(x)=0
$$
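
The values can then be computed backwards in time from this terminal condition. A sketch of the recursion, assuming (notation introduced here) that the `"p"` and `"c"` fields of the dictionary hold a transition probability $p(y \mid x, a)$ and a cost $c(x, a, y)$, and that the expected total cost is minimized over the actions $A(x)$ available in state $x$:

$$
V_{t}(x)=\min_{a \in A(x)} \sum_{y \in \text{States}} p(y \mid x,a)\,\bigl[c(x,a,y)+V_{t+1}(y)\bigr], \qquad t = T, T-1, \dots, 1
$$

An action attaining the minimum defines $\pi_t(x)$.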

In [4]:
s = "123"
x = parse(Int, s)
println(typeof(x))

Int64


We want to compute $V_{1}(1)$ and the associated optimal policy.

In [5]:
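function dynamic_programming(dict_state_action, T)
    # V[x, t]: minimal expected cost-to-go from state x at time t
    # π[x, t]: action attaining that minimum
    V = fill(Inf, (4, T+1))
    π = zeros(Int, (4, T))
    for x ∈ 1:4
        V[x, T+1] = 0 # Terminal condition
    end

    for t ∈ T:-1:1 # Backward induction
        for x ∈ 1:4
            for (action, dict_action) ∈ dict_state_action[string(x)]
                # Expected cost of taking `action` in state x at time t
                expected_cost = 0.0
                for (next_state, values) ∈ dict_action
                    val_next_state = parse(Int, next_state)
                    expected_cost += values["p"]*(values["c"] + V[val_next_state, t+1])
                end
                if expected_cost < V[x, t] # Keep the cheapest action found so far
                    V[x, t] = expected_cost
                    π[x, t] = parse(Int,action)
                end
            end

        end
    end
    return V, π
end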

dynamic_programming (generic function with 1 method)
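
To check the recursion by hand, the same backward induction can be run on a tiny made-up instance. The sketch below is self-contained: `toy_dp` is a hypothetical re-implementation of the loop above for an arbitrary number of states, and the two-state dictionary `toy` contains invented numbers, not those of `state_action.json`.

```julia
# Self-contained sketch: backward induction on a toy dictionary laid out as
# dict[state][action][next_state] => Dict("c" => cost, "p" => probability).
function toy_dp(dict_state_action, n_states, T)
    V = fill(Inf, (n_states, T + 1))
    policy = zeros(Int, (n_states, T))
    V[:, T + 1] .= 0.0                      # terminal condition
    for t in T:-1:1, x in 1:n_states
        for (action, dict_action) in dict_state_action[string(x)]
            expected_cost = 0.0
            for (next_state, entry) in dict_action
                y = parse(Int, next_state)
                expected_cost += entry["p"] * (entry["c"] + V[y, t + 1])
            end
            if expected_cost < V[x, t]      # keep the cheapest action
                V[x, t] = expected_cost
                policy[x, t] = parse(Int, action)
            end
        end
    end
    return V, policy
end

# Invented two-state instance: state 1 has two actions, state 2 has one.
toy = Dict(
    "1" => Dict("1" => Dict("1" => Dict("c" => 4.0, "p" => 0.5),
                            "2" => Dict("c" => 4.0, "p" => 0.5)),
                "2" => Dict("2" => Dict("c" => 3.0, "p" => 1.0))),
    "2" => Dict("1" => Dict("2" => Dict("c" => 1.0, "p" => 1.0))),
)

V_toy, policy_toy = toy_dp(toy, 2, 2)
println(V_toy[:, 1])       # [4.0, 2.0]
println(policy_toy[:, 1])  # [2, 1]
```

By hand: at $t = 2$ state 1 prefers action 2 (cost 3 against 4) and state 2 pays 1; at $t = 1$ action 2 in state 1 costs $3 + 1 = 4$ against $0.5(4+3) + 0.5(4+1) = 6$ for action 1.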

In [6]:
V, π = dynamic_programming(dict_state_action, 12)
for i ∈ 1:4
    println("Value in state ",i," : ",V[i, 1:13])
    println("Action in state ",i," : ",π[i, 1:12])
end
println(π)
print(V[1,1])

Value in state 1 : [-110.82000000000001, -105.82000000000001, -100.82000000000001, -95.82000000000001, -90.82000000000001, -85.82000000000001, -80.82, -75.82, -70.4, -63.0, -50.0, -30.0, 0.0]
Action in state 1 : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Value in state 2 : [-85.82000000000001, -80.82000000000001, -75.82000000000001, -70.82000000000001, -65.82000000000001, -60.82000000000001, -55.82000000000001, -50.82, -45.82, -40.4, -33.0, -20.0, 0.0]
Action in state 2 : [2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1]
Value in state 3 : [-60.82000000000001, -55.82000000000001, -50.82000000000001, -45.82000000000001, -40.82000000000001, -35.82000000000001, -30.82, -25.82, -20.4, -17.5, -15.0, -10.0, 0.0]
Action in state 3 : [3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1]
Value in state 4 : [-35.82000000000001, -30.820000000000007, -25.820000000000007, -20.820000000000007, -15.820000000000007, -10.819999999999993, -5.819999999999993, -0.4000000000000057, 0.0, 0.0, 0.0, 0.0, 0.0]
Action in state 4 : [4, 4, 4, 4

The value under the policy $\pi$ (given above) is $V_{1}(1) \approx -110.82$, i.e. <u>110.82</u> in absolute value.
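
As a sanity check on the last columns, the entry printed for `dict_state_action["1"]["1"]` says that action 1 in state 1 leads to state 2 with probability 1 and cost $-30$, and action 1 is the one selected in state 1 at every step. Reading the printed values with $T = 12$, the backward recursion then gives:

$$
V_{12}(1) = -30 + V_{13}(2) = -30, \qquad
V_{11}(1) = -30 + V_{12}(2) = -30 - 20 = -50, \qquad
V_{10}(1) = -30 + V_{11}(2) = -30 - 33 = -63
$$

These agree with the last entries of the row printed for state 1.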