# State
> Allow an object to alter its behavior when its internal state changes. The object will appear to change its class.

## Problem
Consider a Markov decision process (MDP) with two states, $s_1$ and $s_2$. The agent can take one of two actions, $a_1$ or $a_2$, and the system is subject to the following constraints:

When in state $s_1$, if the agent takes action $a_1$, it will most likely go to state $s_2$, but there is a smaller chance that it will remain where it is.
$$
P(s_1 \vert s_1, a_1) = 0.3 \\
P(s_2 \vert s_1, a_1) = 0.7 \\
$$

But if it takes action $a_2$, it will most likely remain where it is, with a smaller chance that it will go to $s_2$.
$$
P(s_1 \vert s_1, a_2) = 0.7 \\
P(s_2 \vert s_1, a_2) = 0.3 \\
$$

When in state $s_2$, if the agent takes action $a_1$, it will most likely remain where it is, but there is a small chance it will go to state $s_1$.
$$
P(s_1 \vert s_2, a_1) = 0.1 \\
P(s_2 \vert s_2, a_1) = 0.9 \\
$$

And when it takes action $a_2$, there is an equal chance that it will remain where it is or go to state $s_1$.
$$
P(s_1 \vert s_2, a_2) = 0.5 \\
P(s_2 \vert s_2, a_2) = 0.5 \\
$$
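As a quick sanity check on the numbers above, the four distributions can be written down as a lookup table and verified to each sum to 1. This is only a sketch: the string labels and the names `P` and `is_valid` are illustrative and do not appear in the notebook's own code.

```python
import math

# Transition table: P[(state, action)] maps each next state to its probability.
# The string labels are illustrative, not names used elsewhere in this notebook.
P = {
    ("s1", "a1"): {"s1": 0.3, "s2": 0.7},
    ("s1", "a2"): {"s1": 0.7, "s2": 0.3},
    ("s2", "a1"): {"s1": 0.1, "s2": 0.9},
    ("s2", "a2"): {"s1": 0.5, "s2": 0.5},
}


def is_valid(table):
    """Return True if every (state, action) row is a probability distribution."""
    return all(
        all(p >= 0 for p in row.values())
        and math.isclose(sum(row.values()), 1.0)
        for row in table.values()
    )


print(is_valid(P))
```

`math.isclose` is used instead of `==` so that floating-point rounding in the sums does not cause a false failure.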

The code below is pretty messy. And if I decide to add a new state with its own transition probabilities, it gets even messier. How do I simplify the `MDP` class without having too many conditionals?

In [29]:
from enum import Enum, auto
import random

class State(Enum):
    S1 = auto()
    S2 = auto()

class Action(Enum):
    A1 = auto()
    A2 = auto()
    
class MDP:
    def __init__(self):
        self._state = random.choices([State.S1, State.S2], [0.5, 0.5])[0]
    
    def act(self, action):
        old_state = self._state
        if self._state == State.S1:
            if action == Action.A1:
                self._state = random.choices([State.S1, State.S2], [0.3, 0.7])[0]
            elif action == Action.A2:
                self._state = random.choices([State.S1, State.S2], [0.7, 0.3])[0]
        elif self._state == State.S2:
            if action == Action.A1:
                self._state = random.choices([State.S1, State.S2], [0.1, 0.9])[0]
            elif action == Action.A2:
                self._state = random.choices([State.S1, State.S2], [0.5, 0.5])[0]
        print(f"{action}: {old_state} -> {self._state}")

In [30]:
mdp = MDP()
mdp.act(Action.A1)
mdp.act(Action.A2)

Action.A1: State.S1 -> State.S1
Action.A2: State.S1 -> State.S1


## Solution
Create a separate `State` object for each of the two states, each implementing its own transition probabilities. It is also a good idea to define an abstract state interface so that any new state can implement it.

In [47]:
from abc import ABC, abstractmethod
from enum import Enum, auto
import random

class Action(Enum):
    A1 = auto()
    A2 = auto()
    
class State(ABC):
    @abstractmethod
    def transition(self, action: Action) -> "State":
        pass

class S1(State):
    def __repr__(self):
        return "S1"
    
    def transition(self, action: Action) -> State:
        # Order matches the weights below: [stay in S1, move to S2].
        possible_states = [self, S2()]
        if action == Action.A1:
            return random.choices(possible_states, [0.3, 0.7])[0]
        elif action == Action.A2:
            return random.choices(possible_states, [0.7, 0.3])[0]
        
class S2(State):
    def __repr__(self):
        return "S2"
    
    def transition(self, action: Action) -> State:
        # Order matches the weights below: [move to S1, stay in S2].
        possible_states = [S1(), self]
        if action == Action.A1:
            return random.choices(possible_states, [0.1, 0.9])[0]
        elif action == Action.A2:
            return random.choices(possible_states, [0.5, 0.5])[0]
            
class MDP:
    def __init__(self):
        self._state = random.choices([S1(), S2()], [0.5, 0.5])[0]
    
    def act(self, action):
        old_state = self._state
        self._state = self._state.transition(action)
        print(f"{action}: {old_state} -> {self._state}")

In [48]:
mdp = MDP()
mdp.act(Action.A1)
mdp.act(Action.A2)

Action.A1: S1 -> S2
Action.A2: S2 -> S2
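
To illustrate the payoff, here is a minimal sketch of what adding a third state might look like. Everything about `S3` is hypothetical: the class and its probabilities are made up for illustration and are not part of the problem above. Two further assumptions differ from the cell above: `MDP` takes an explicit start state and returns the new state instead of printing it, and `__repr__` is moved into the base class. `S1` and `S2` keep their original probabilities and never move into `S3`, so here `S3` can only be left.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto
import random


class Action(Enum):
    A1 = auto()
    A2 = auto()


class State(ABC):
    @abstractmethod
    def transition(self, action: Action) -> "State":
        ...

    def __repr__(self):
        # Assumption: use the class name instead of a per-class __repr__.
        return type(self).__name__


class S1(State):
    def transition(self, action: Action) -> State:
        # Weights are ordered as [stay in S1, move to S2].
        weights = {Action.A1: [0.3, 0.7], Action.A2: [0.7, 0.3]}[action]
        return random.choices([self, S2()], weights)[0]


class S2(State):
    def transition(self, action: Action) -> State:
        # Weights are ordered as [move to S1, stay in S2].
        weights = {Action.A1: [0.1, 0.9], Action.A2: [0.5, 0.5]}[action]
        return random.choices([S1(), self], weights)[0]


class S3(State):
    # Hypothetical third state; these probabilities are invented for illustration.
    def transition(self, action: Action) -> State:
        # Weights are ordered as [move to S1, move to S2, stay in S3].
        weights = {Action.A1: [0.2, 0.3, 0.5], Action.A2: [0.4, 0.4, 0.2]}[action]
        return random.choices([S1(), S2(), self], weights)[0]


class MDP:
    # Assumption: the start state is passed in, and act() returns the new state.
    def __init__(self, start: State):
        self._state = start

    def act(self, action: Action) -> State:
        self._state = self._state.transition(action)
        return self._state


mdp = MDP(S3())
for action in [Action.A1, Action.A2, Action.A1]:
    print(f"{action} -> {mdp.act(action)}")
```

The point of the sketch is that `MDP.act` contains no conditionals and does not change when `S3` is added; only the new class carries the new probabilities.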
