# A two-state Markov decision process


Consider a two-state Markov decision process. At each decision epoch the system occupies either state $s_1$ or state $s_2$. In state $s_1$, the decision maker can choose either action $a_{1,1}$ or action $a_{1,2}$; in state $s_2$, only action $a_{2,1}$ is available. Choosing action $a_{1,1}$ in $s_1$ gives the decision maker an immediate reward of 5 units, and at the next decision epoch the system is in state $s_1$ with probability 0.5 and in state $s_2$ with probability 0.5. If instead he chooses action $a_{1,2}$ in state $s_1$, he receives an immediate reward of 10 units, and at the next decision epoch the system moves to state $s_2$ with probability 1. In state $s_2$, the decision maker must choose action $a_{2,1}$; he then incurs a cost of 1 unit (a reward of $-1$) and the system remains in state $s_2$ at the next decision epoch with certainty.

### <font color='blue'> <u> MDP formulation: </u> </font> 

- __Decision epochs:__ $T=\{1,2,\dots\}$ (infinite horizon)
- __States:__ $S=\{s_1,s_2\}$
- __Actions:__ $A_{s_1} = \{a_{1,1},a_{1,2}\}$, $A_{s_2} = \{a_{2,1}\}$
- __Rewards:__
    - $r_t(s_1,a_{1,1}) = 5$, $r_t(s_1,a_{1,2}) = 10$, $r_t(s_2,a_{2,1}) = -1$
- __Transition probabilities:__
    - $\mathbb{P}(s_1|s_1,a_{1,1}) = 0.5$, $\mathbb{P}(s_1|s_1,a_{1,2}) = 0$ and $\mathbb{P}(s_1|s_2,a_{2,1}) = 0$
    - $\mathbb{P}(s_2|s_1,a_{1,1}) = 0.5$, $\mathbb{P}(s_2|s_1,a_{1,2}) = 1$ and $\mathbb{P}(s_2|s_2,a_{2,1}) = 1$
- __Optimality criterion:__ maximize the expected total discounted reward with discount factor $\lambda \in [0,1)$
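To make the data above concrete, here is a minimal sketch that stores the transition probabilities and rewards of the two deterministic stationary policies as plain Julia arrays and evaluates each one by solving $(I-\lambda P_d)V = r_d$. It assumes $\lambda = 0.95$ (the value used in the code cells below); the names `P_d1`, `r_d1`, `evaluate`, etc. are illustrative and not part of the original notebook.

```julia
using LinearAlgebra

λ = 0.95  # assumed discount factor (same value as LAMBDA below)

# Rows = current state (s1, s2), columns = next state (s1, s2).
# Policy d1 plays a_{1,1} in s1; policy d2 plays a_{1,2} in s1; both play a_{2,1} in s2.
P_d1 = [0.5 0.5;
        0.0 1.0]
r_d1 = [5.0, -1.0]

P_d2 = [0.0 1.0;
        0.0 1.0]
r_d2 = [10.0, -1.0]

# Policy evaluation: V_d solves V = r_d + λ P_d V, i.e. (I - λ P_d) V = r_d
evaluate(P, r, λ) = (I - λ * P) \ r

V_d1 = evaluate(P_d1, r_d1, λ)
V_d2 = evaluate(P_d2, r_d2, λ)
println("V under d1 = ", V_d1)
println("V under d2 = ", V_d2)
```

In $s_1$ the policy playing $a_{1,1}$ is better ($\approx -8.571$ versus $-9$) despite its smaller immediate reward, because $a_{1,2}$ sends the system straight to the absorbing, costly state $s_2$.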

In [1]:
# Draw the two-state Markov decision process
#------------------------------------
# This notebook runs a Julia kernel, so IPython's Image is not available;
# pass the raw PNG bytes to display instead
display("image/png", read("two_state_mdp_min.png"))

### <font color='red'> <u> Linear programming formulation: </u> </font>



- __Primal model__ (weights $\alpha(s_1), \alpha(s_2) > 0$; variables $V(s_1), V(s_2)$ unrestricted in sign):
$$ \min \alpha(s_1)\cdot V(s_1) + \alpha(s_2)\cdot V(s_2)$$ 
subject to
\begin{align*}  
& (1-0.5\lambda)\cdot V(s_1) & -& 0.5 \lambda \cdot V(s_2) & \geq 5  \\
& V(s_1) & - &\lambda\cdot V(s_2) &\geq 10  \\
& (1-\lambda) \cdot V(s_2)& & &\geq -1  
\end{align*}
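Why minimizing picks out the optimal value function: every feasible $V$ is an upper bound on $V^*$, while $V^*$ itself is feasible with at least one tight constraint per state. A minimal sketch, assuming $\lambda = 0.95$, computes $V^*$ by value iteration (a different method from the LP, used here only as a cross-check) and evaluates the slack of each primal constraint at $V^*$; the helper names `bellman` and `value_iteration` are illustrative.

```julia
λ = 0.95  # assumed discount factor, same value as LAMBDA in the code below

# One Bellman update: best one-step lookahead value in each state
function bellman(V, λ)
    q11 = 5.0 + λ * (0.5 * V[1] + 0.5 * V[2])  # s1, action a_{1,1}
    q12 = 10.0 + λ * V[2]                      # s1, action a_{1,2}
    q21 = -1.0 + λ * V[2]                      # s2, action a_{2,1}
    return [max(q11, q12), q21]
end

# Value iteration from V = 0 until successive iterates differ by less than tol
function value_iteration(λ; tol = 1e-12, maxiter = 10_000)
    V = zeros(2)
    for _ in 1:maxiter
        Vnew = bellman(V, λ)
        maximum(abs.(Vnew .- V)) < tol && return Vnew
        V = Vnew
    end
    return V
end

Vstar = value_iteration(λ)

# Slack (left-hand side minus right-hand side) of each primal constraint at V*
slacks = [
    (1 - 0.5 * λ) * Vstar[1] - 0.5 * λ * Vstar[2] - 5,  # (s1, a_{1,1})
    Vstar[1] - λ * Vstar[2] - 10,                      # (s1, a_{1,2})
    (1 - λ) * Vstar[2] + 1                             # (s2, a_{2,1})
]
println("V* ≈ ", Vstar)
println("slacks = ", slacks)
```

The first and third slacks are (numerically) zero and the second equals $3/7$: the tight constraints single out $a_{1,1}$ in $s_1$, and $V^* = (-60/7, -20)$ matches the values the LP solver returns further down.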

- __Dual model:__
$$ \max\ 5\, x(s_1, a_{1,1}) + 10\, x(s_1, a_{1,2}) - x(s_2, a_{2,1})$$
subject to
\begin{align*}
& (1-0.5\lambda)\cdot x(s_1, a_{1,1}) & + x(s_1, a_{1,2}) & & = \alpha(s_1) \\
& -0.5\lambda \cdot x(s_1, a_{1,1}) & - \lambda\, x(s_1, a_{1,2}) & + (1-\lambda) \cdot x(s_2, a_{2,1}) & = \alpha(s_2) \\
& x(s_1, a_{1,1}),\ x(s_1, a_{1,2}),\ x(s_2, a_{2,1}) \geq 0
\end{align*}
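The dual variables have a standard interpretation: for a stationary policy $d$, $x(s,a)$ is the $\alpha$-weighted expected discounted number of times action $a$ is chosen in state $s$. A minimal sketch, assuming $\lambda = 0.95$ and $\alpha = (0.8, 0.2)$ as in the code below, builds $x$ for the policy playing $a_{1,1}$ in $s_1$ by solving $(I-\lambda P_d^\top)u = \alpha$, then checks the two equality constraints, nonnegativity, and the objective value. Variable names are illustrative.

```julia
using LinearAlgebra

λ = 0.95
α = [0.8, 0.2]  # assumed initial-state weights (ALPHA_1, ALPHA_2 below)

# Transition matrix of the policy playing a_{1,1} in s1 (rows: from, columns: to)
P_d = [0.5 0.5;
       0.0 1.0]

# Discounted state-visit frequencies: u' = α' (I - λ P_d)^{-1}, i.e. (I - λ P_d') u = α
u = (I - λ * permutedims(P_d)) \ α

# State-action frequencies; this policy never uses a_{1,2}
x11, x12, x21 = u[1], 0.0, u[2]

# Left-hand sides of the two dual equality constraints
lhs1 = (1 - 0.5 * λ) * x11 + x12
lhs2 = -0.5 * λ * x11 - λ * x12 + (1 - λ) * x21

dual_obj = 5 * x11 + 10 * x12 - x21
println("x = ", (x11, x12, x21), ", dual objective = ", dual_obj)
```

The objective equals $\alpha(s_1)V^*(s_1) + \alpha(s_2)V^*(s_2) = -76/7 \approx -10.857$, the primal optimum, as strong duality requires.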




In [2]:
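# Instantiate the problem parameters
#------------------------------------
LAMBDA = 0.95   # discount factor
ALPHA_1 = 0.8   # any weights with ALPHA_1, ALPHA_2 > 0 yield the same optimal V;
ALPHA_2 = 0.2   # taking ALPHA_1 + ALPHA_2 = 1 makes them an initial-state distribution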

0.2

In [3]:
# Add JuMP and GLPK packages
#------------------------------------
import Pkg
Pkg.add("JuMP")
Pkg.add("GLPK")  # open-source LP solver

using JuMP, GLPK

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.3/Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.3/Manifest.toml`
[90m [no changes][39m
[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.3/Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.3/Manifest.toml`
[90m [no changes][39m


In [4]:
# Specify the optimizer to be used
#------------------------------------
OPTIMIZER = GLPK.Optimizer 

GLPK.Optimizer

## Primal model

In [5]:
# Write the model to be solved
#------------------------------------
m = Model(with_optimizer(OPTIMIZER))

@variable(m, V_s1)
@variable(m, V_s2)

@objective(m, Min, ALPHA_1*V_s1 + ALPHA_2*V_s2)

@constraint(m, (1-0.5*LAMBDA)*V_s1 - 0.5*LAMBDA*V_s2 >= 5)
@constraint(m, V_s1 - LAMBDA*V_s2 >= 10 )
@constraint(m, (1-LAMBDA)*V_s2 >= -1 )

0.050000000000000044 V_s2 ≥ -1.0

In [6]:
# Print the model in a human-readable form
#------------------------------------
println("The optimization problem to be solved is:")
println()
print(m) 

The optimization problem to be solved is:

Min 0.8 V_s1 + 0.2 V_s2
Subject to
 0.525 V_s1 - 0.475 V_s2 ≥ 5.0
 V_s1 - 0.95 V_s2 ≥ 10.0
 0.050000000000000044 V_s2 ≥ -1.0


In [7]:
# Solve the model and display the output
#------------------------------------
optimize!(m)  # solves in place; the result is queried with termination_status and value

if termination_status(m) == MOI.OPTIMAL
    println("Objective value: ", objective_value(m))
    println("V_best_s1 = ", value(V_s1))
    println("V_best_s2 = ", value(V_s2))
else
    error("No result is found for this model.")
end

Objective value: -10.85714285714284
V_best_s1 = -8.571428571428553
V_best_s2 = -19.999999999999982
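The LP returns values, not a policy. A minimal sketch of the usual recovery step, assuming $\lambda = 0.95$ and the values printed above: compute the one-step lookahead value of each action in $s_1$ and keep the maximizer, which amounts to asking which primal constraint for $s_1$ is tight. The names `q_a11`, `q_a12` and `best_s1` are illustrative.

```julia
λ = 0.95
V = [-8.571428571428553, -19.999999999999982]  # (V_s1, V_s2) as printed by the primal solve

# One-step lookahead value of each action in s1 (s2 has a single action)
q_a11 = 5.0 + λ * (0.5 * V[1] + 0.5 * V[2])
q_a12 = 10.0 + λ * V[2]

best_s1 = q_a11 >= q_a12 ? "a_{1,1}" : "a_{1,2}"
println("Q(s1, a11) = ", q_a11, ", Q(s1, a12) = ", q_a12)
println("Optimal action in s1: ", best_s1, "; in s2: a_{2,1}")
```

$Q(s_1,a_{1,1})$ reproduces $V(s_1)$ while $Q(s_1,a_{1,2}) \approx -9$, so the optimal stationary policy plays $a_{1,1}$ in $s_1$ even though $a_{1,2}$ pays more immediately.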


## Dual model

In [8]:
# Write the model to be solved
#------------------------------------
m2 = Model(with_optimizer(OPTIMIZER))

# Dual variables are discounted state-action frequencies, hence nonnegative
@variable(m2, x_s1_a11 >= 0)
@variable(m2, x_s1_a12 >= 0)
@variable(m2, x_s2_a21 >= 0)

@objective(m2, Max, 5*x_s1_a11 + 10*x_s1_a12 - x_s2_a21)

# One balance constraint per state; JuMP writes equality constraints with ==
@constraint(m2, (1-0.5*LAMBDA)*x_s1_a11 + x_s1_a12 == ALPHA_1)
@constraint(m2, -0.5*LAMBDA*x_s1_a11 - LAMBDA*x_s1_a12 + (1-LAMBDA)*x_s2_a21 == ALPHA_2)

In [9]:
# Print the model in a human-readable form
#------------------------------------
println("The optimization problem to be solved is:")
println()
print(m2)

In [10]:
# Solve the model and display the output
#------------------------------------
optimize!(m2)

if termination_status(m2) == MOI.OPTIMAL
    println("Objective value: ", objective_value(m2))
    println("x_s1_a11_best = ", value(x_s1_a11))
    println("x_s1_a12_best = ", value(x_s1_a12))
    println("x_s2_a21_best = ", value(x_s2_a21))
else
    error("No result is found for this model.")
end
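Once the dual solves, the optimal policy can be read off from which $x(s,a)$ are positive, and complementary slackness ties it to the primal: $x(s,a)$ times the slack of the primal constraint for $(s,a)$ must vanish. A minimal sketch, assuming $\lambda = 0.95$, $\alpha = (0.8, 0.2)$ and the hand-computed optima $V^* = (-60/7, -20)$ and $x^* = (32/21,\ 0,\ 388/21)$ (obtained from the equality constraints with $x(s_1,a_{1,2}) = 0$, not copied from solver output); the dictionary keys are illustrative labels.

```julia
λ = 0.95

# Hand-computed optimal primal and dual solutions (assumes λ = 0.95, α = (0.8, 0.2))
V = [-60 / 7, -20.0]
x = Dict("s1,a11" => 32 / 21, "s1,a12" => 0.0, "s2,a21" => 388 / 21)

# Slack of each primal constraint, indexed like the dual variables
slack = Dict(
    "s1,a11" => (1 - 0.5 * λ) * V[1] - 0.5 * λ * V[2] - 5,
    "s1,a12" => V[1] - λ * V[2] - 10,
    "s2,a21" => (1 - λ) * V[2] + 1,
)

# Complementary slackness: every product x(s,a) * slack(s,a) should be ≈ 0
for k in keys(x)
    println(k, ": x = ", round(x[k]; digits = 4), ", x * slack = ", x[k] * slack[k])
end

# Decision rule: the state-action pairs carrying positive dual weight
policy = [k for k in keys(x) if x[k] > 1e-9]
println("Actions with x > 0: ", sort(policy))
```

Only $(s_1,a_{1,1})$ and $(s_2,a_{2,1})$ carry positive weight, which is the same policy obtained from the tight primal constraints.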