# Dice MDP
The Dice MDP is a small warm-up problem that introduces Markov Decision Processes (MDPs). The problem is as follows:

You are given a die. In each round of the game, you choose one of two actions:
<ul>
    <li>If you quit, you receive a one-off reward of 10 and the game ends.</li>
    <li>If you stay, you receive a reward of 4, and then the die is rolled:
        <ul>
            <li>If the die shows 1 or 2, the game ends.</li>
            <li>Otherwise, the game continues to the next round.</li>
        </ul>
    </li>
</ul>
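
The rules above are enough to compare the two simplest strategies. As a rough sketch, assuming no discounting, the value of always staying satisfies V = 4 + (2/3) V, which can be solved in closed form or by repeated substitution. The helper name `evaluate_stay_policy` and its parameters are illustrative and not part of the notebook:

```python
def evaluate_stay_policy(stay_reward=4.0, p_continue=2/3, tol=1e-12, max_iters=10_000):
    """Iterate V <- stay_reward + p_continue * V until the value stops changing."""
    value = 0.0
    for _ in range(max_iters):
        new_value = stay_reward + p_continue * value
        if abs(new_value - value) < tol:
            return new_value
        value = new_value
    return value


QUIT_REWARD = 10.0                    # one-off reward for quitting immediately
v_stay = evaluate_stay_policy()       # fixed-point estimate
closed_form = 4.0 / (1 - 2/3)         # V = 4 / (1 - 2/3)

print(f'always stay: {v_stay:.3f} (closed form {closed_form:.3f}), quit now: {QUIT_REWARD}')
```

Under these assumptions, always staying is worth about 12 in expectation, which is more than the 10 obtained by quitting immediately.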

In [6]:
class DiceMDP:
    """
        Models a Markov decision process for a dice game according to the provided rules.
    """
    def states(self):
        """
            Return the possible states of the system.
        """
        return ['in', 'end']
    
    def is_goal(self, state):
        """
            Check if the provided state is equal to the end state.
        """
        return state == 'end'
    
    def actions(self, state):
        """
            Return the actions that can be taken in the given state.
        """
        if state == 'in':
            return ['stay', 'quit']
        return []
    
    def start_state(self):
        return 'in'
    
    def transition_probability(self, s, a, s_new):
        """
            Return the probability of moving from s to s_new when taking action a.
            If the action is not stay (i.e. quit), the game ends with probability 1.
            If the action is stay, two cases apply:
                The game continues (s_new is 'in') with probability 2/3.
                The game ends (s_new is 'end') with probability 1/3.
            
            Args:
                s: the current state
                a: the action
                s_new: the new state
            
            Returns:
                float: The probability of reaching s_new from s under action a.
        """
        
        if a != 'stay':
            # Quitting always ends the game, so 'in' is unreachable.
            return 1 if s_new == 'end' else 0
        
        if s_new != 'in':
            return 1/3
        
        return 2/3
    
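    def transition(self, s, a):
        """
            Sample the next state reached from s by taking action a.
            
            Args:
                s: the state
                a: the action
            
            Returns:
                str: 'in' if the game continues, 'end' otherwise.
        """
        from random import random
        
        # Quitting, or acting in the end state, always leaves the game over.
        if s == 'end' or a != 'stay':
            return 'end'
        
        # Staying ends the game with probability 1/3 (the die shows 1 or 2).
        if random() < 1/3:
            return 'end'
        
        return 'in'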
    
    def reward(self, s, a, s_new):
        """
            Return the reward for taking action a in state s and landing in s_new.
            No reward is given once the game has ended.
            If the action is stay, the reward is 4, whether or not the game continues.
            Otherwise (quit), the reward is 10.
            
            Args:
                s: the current state
                a: the action
                s_new: the new state (unused, since the reward does not depend on it)
            
            Returns:
                int: The reward for taking the action.
        """
        
        rewards = {'quit': 10, 'stay' : 4}
        if s == 'end':
            return 0
        
        # Use the argument a, not a global variable named action.
        if a != 'stay':
            return rewards['quit']
        
        return rewards['stay']
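
The methods above (`states`, `actions`, `transition_probability`, `reward`) are the ingredients that value iteration needs. The following is a minimal sketch of undiscounted value iteration, not the notebook's own method. To keep it runnable on its own, it uses a hypothetical stand-in class `TinyDiceMDP` that mirrors the interface; `value_iteration` is likewise an illustrative name:

```python
class TinyDiceMDP:
    """Hypothetical stand-in mirroring the DiceMDP interface."""

    def states(self):
        return ['in', 'end']

    def actions(self, state):
        return ['stay', 'quit'] if state == 'in' else []

    def transition_probability(self, s, a, s_new):
        if a == 'quit':
            return 1.0 if s_new == 'end' else 0.0
        return 2/3 if s_new == 'in' else 1/3

    def reward(self, s, a, s_new):
        if s == 'end':
            return 0
        return 4 if a == 'stay' else 10


def value_iteration(mdp, tol=1e-10, max_iters=1000):
    """Undiscounted value iteration; terminal states keep value 0."""
    values = {s: 0.0 for s in mdp.states()}

    def q_value(s, a):
        # Expected reward plus expected value of the next state.
        return sum(
            mdp.transition_probability(s, a, s2) * (mdp.reward(s, a, s2) + values[s2])
            for s2 in mdp.states()
        )

    for _ in range(max_iters):
        new_values = {
            s: max((q_value(s, a) for a in mdp.actions(s)), default=0.0)
            for s in mdp.states()
        }
        delta = max(abs(new_values[s] - values[s]) for s in mdp.states())
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        s: max(mdp.actions(s), key=lambda a: q_value(s, a))
        for s in mdp.states() if mdp.actions(s)
    }
    return values, policy


values, policy = value_iteration(TinyDiceMDP())
print(values, policy)
```

With these rules the value of the `in` state converges towards 12 and the greedy action there is `stay`. No discount factor is needed here because every round ends the game with probability at least 1/3.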

# Driver Code

In [16]:
dice = DiceMDP()
s_state = dice.start_state()
total_reward = 0
while True:
    action = input('Do you want to quit/stay?').strip().lower()
    if action not in dice.actions(s_state):
        continue   # ask again on unrecognized input
    
    if action == 'quit':
        new_state = 'end'
        total_reward += dice.reward(s_state, action, new_state)
        break
    
    # action is 'stay': sample the next state, then collect the reward for that transition
    new_state = dice.transition(s_state, action)
    total_reward += dice.reward(s_state, action, new_state)
    s_state = new_state
    if dice.is_goal(s_state):
        break
    
print(f'The total reward is {total_reward}.')

Do you want to quit/stay?stay
Do you want to quit/stay?quit
The total reward is 14.
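
A single interactive run such as the one above says little about how good a strategy is on average. As a rough, non-interactive sketch, the "always stay" strategy can be simulated many times and the rewards averaged. The function names, the seed, and the episode count below are illustrative choices, not part of the notebook:

```python
import random


def simulate_always_stay(rng, stay_reward=4, p_end=1/3):
    """Play one game, always choosing 'stay', and return the total reward."""
    total = 0
    while True:
        total += stay_reward          # reward for staying this round
        if rng.random() < p_end:      # die shows 1 or 2: the game ends
            return total


def average_return(episodes=20_000, seed=0):
    """Average total reward over many simulated games."""
    rng = random.Random(seed)         # seeded for reproducibility
    return sum(simulate_always_stay(rng) for _ in range(episodes)) / episodes


estimate = average_return()
print(f'Estimated value of always staying: {estimate:.2f}')
```

The estimate should land close to the analytical value of 12, compared with a guaranteed 10 for quitting in the first round.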
