# Markov Property
## Definition
A state $S_t$ is *Markov* if and only if 
$$\mathbb{P}[S_{t+1}|S_t]=\mathbb{P}[S_{t+1}|S_1,\dots,S_t]$$
- The state captures all relevant information from the history
- Once the state is known, the history may be thrown away
- i.e. the state is a sufficient statistic of the future

# State Transition Matrix
For a Markov state $s$ and successor state $s'$, the *state transition probability* is defined by
$$\mathcal{P}_{ss'}=\mathbb{P}[S_{t+1}=s'|S_t=s]$$
The state transition matrix $\mathcal{P}$ defines the transition probabilities from all states $s$ to all successor states $s'$,
$$\mathcal{P}=\mathrm{from} 
\begin{matrix}
    \mathrm{to}\\
    \begin{bmatrix}
    \mathcal{P}_{11}&\cdots&\mathcal{P}_{1n}\\
    \vdots&\ddots&\vdots\\
    \mathcal{P}_{n1}&\cdots&\mathcal{P}_{nn}\\
    \end{bmatrix}
\end{matrix}$$
where each row of the matrix sums to 1.
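
The row-sum condition can be checked mechanically. The following is a minimal sketch, not part of the original notes: the helper name `is_row_stochastic` and the 3-state matrix are assumptions made up for illustration, with entry `P[i][j]` read as the probability of moving from state `i` to state `j`.

```python
import math

# Hypothetical 3-state transition matrix (values chosen only for illustration).
# Row i holds the probabilities of moving from state i to each successor state.
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

def is_row_stochastic(matrix, tol=1e-9):
    """Return True if matrix is square, non-negative, and every row sums to 1."""
    n = len(matrix)
    for row in matrix:
        if len(row) != n:
            return False  # not square
        if any(p < 0 for p in row):
            return False  # probabilities cannot be negative
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False  # row is not a probability distribution
    return True

print(is_row_stochastic(P))
```

A tolerance is used because sums of floating-point probabilities are rarely exactly 1.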

# Markov Process
A Markov process is a memoryless random process, i.e. a sequence of random states $S_1,S_2,\dots$ with the Markov property.
## Definition
A *Markov Process* (or *Markov Chain*) is a tuple $\langle\mathcal{S},\mathcal{P}\rangle$
- $\mathcal{S}$ is a (finite) set of states
- $\mathcal{P}$ is a state transition probability matrix,
$$\mathcal{P}_{ss'}=\mathbb{P}[S_{t+1}=s'|S_t=s]$$

# Example: Student Markov Chain
![ExampleStudentMarkovChain](./pics/ExampleStudentMarkovChain.jpg)
Sample episodes for the Student Markov Chain starting from $S_1=C1$:
$$S_1,S_2,\cdots,S_T$$
- C1,C2,C3,Pass,Sleep
- C1,FB,FB,C1,C2,Sleep
- C1,C2,C3,Pub,C2,C3,Pass,Sleep
- C1,FB,FB,C1,C2,C3,Pub,C1,FB,FB,FB,C1,C2,C3,Pub,C2,Sleep

The state transition matrix, with rows (from) and columns (to) both ordered C1, C2, C3, Pass, Pub, FB, Sleep, is
$$\mathcal{P}=\begin{bmatrix}
0 & 0.5 & 0 & 0 & 0 & 0.5 & 0\\
0 & 0 & 0.8 & 0 & 0 & 0 & 0.2\\
0 & 0 & 0 & 0.6 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1\\
0.2 & 0.4 & 0.4 & 0 & 0 & 0 & 0 \\
0.1 & 0 & 0 & 0 & 0 & 0.9 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1\\
\end{bmatrix}$$
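
Episodes like the ones listed above can be drawn directly from this matrix. The sketch below assumes the row and column order C1, C2, C3, Pass, Pub, FB, Sleep and treats Sleep as the terminal state. The function name `sample_episode` and the `max_steps` safety cap are illustrative choices, not part of the original notes.

```python
import random

# State order assumed to match the rows and columns of the matrix above.
STATES = ["C1", "C2", "C3", "Pass", "Pub", "FB", "Sleep"]

P = [
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],  # C1
    [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.2],  # C2
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],  # C3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Pass
    [0.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0],  # Pub
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],  # FB
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Sleep (absorbing)
]

def sample_episode(start="C1", terminal="Sleep", rng=None, max_steps=1000):
    """Sample S_1, S_2, ..., S_T until the terminal state is reached."""
    rng = rng or random.Random()
    state = STATES.index(start)
    episode = [start]
    # max_steps is only a guard against extremely long episodes.
    while STATES[state] != terminal and len(episode) < max_steps:
        # The next state depends only on the current state (Markov property).
        state = rng.choices(range(len(STATES)), weights=P[state])[0]
        episode.append(STATES[state])
    return episode

for _ in range(3):
    print(",".join(sample_episode()))
```

Each run prints different episodes unless a seeded `random.Random` instance is passed as `rng`.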

# Markov Reward Process
A Markov reward process is a Markov chain with values.
## Definition
A *Markov Reward Process* is a tuple $\langle\mathcal{S},\mathcal{P},{\color{red}\mathcal{R},\gamma}\rangle$
- $\mathcal{S}$ is a finite set of states
- $\mathcal{P}$ is a state transition probability matrix, $\mathcal{P}_{ss'}=\mathbb{P}[S_{t+1}=s'|S_t=s]$
- ${\color{red}\mathcal{R}}$ is a reward function, ${\color{red}\mathcal{R}_s=\mathbb{E}[R_{t+1}|S_{t}=s]}$
- ${\color{red}\gamma}$ is a discount factor, ${\color{red}\gamma\in[0,1]}$
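
The tuple can be written down as a small data structure. This is a sketch under stated assumptions: the class `MarkovRewardProcess`, the helper `expected_rewards`, and the two-state example are hypothetical. The helper further assumes that the reward is determined by the transition $(s,s')$, in which case $\mathcal{R}_s=\sum_{s'}\mathcal{P}_{ss'}\,r(s,s')$; the definition above does not require this.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MarkovRewardProcess:
    states: List[str]
    P: List[List[float]]  # P[s][s2] = P[S_{t+1} = s2 | S_t = s]
    R: List[float]        # R[s] = E[R_{t+1} | S_t = s]
    gamma: float          # discount factor

    def __post_init__(self):
        n = len(self.states)
        if len(self.P) != n or any(len(row) != n for row in self.P):
            raise ValueError("P must be an n x n matrix")
        if len(self.R) != n:
            raise ValueError("R needs one entry per state")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")

def expected_rewards(P, r):
    """R_s = sum over s' of P[s][s'] * r[s][s'] (assumes per-transition rewards)."""
    return [sum(p * x for p, x in zip(p_row, r_row)) for p_row, r_row in zip(P, r)]

# Hypothetical two-state example; the numbers are illustrative only.
states = ["A", "End"]
P = [[0.5, 0.5], [0.0, 1.0]]
r = [[-1.0, 10.0], [0.0, 0.0]]  # r[s][s'] = reward received on the move s -> s'
mrp = MarkovRewardProcess(states, P, expected_rewards(P, r), gamma=0.9)
print(mrp.R)
```

Validating $\gamma\in[0,1]$ at construction time keeps an invalid discount factor from propagating into later computations.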
## Example:
![ExampleStudentMarkovChain](./pics/ExampleStudentMarkovChain-MDP.jpg)