# Toy Environment

![toy-mdp](../pictures/toy-mdp.png)

In this exercise we implement a simple toy environment in Python. The environment is illustrated in the figure above. It has 3 states and 2 actions, and the initial state is state 1.
The goal is to implement a class Environment with a method step() that takes the agent's action as input and returns the pair (next state, reward). The environment can be implemented in pure Python. Also write a reset() method that returns the environment to its initial state.

In [1]:
from typing import Tuple

In [2]:
class Environment:
    def __init__(self):
        """
        Constructor of the Environment class.
        """
        self._initial_state = 1
        self.allowed_actions = [0, 1] # 0: A, 1: B
        self.states = [1, 2, 3]
        self.current_state = self._initial_state

    def step(self, action: int) -> Tuple[int, int]:
        """
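        Step function: apply the given action to the current state and
        compute the one-step dynamics.

        Args:
            action (int): the action taken by the agent (0: A, 1: B).

        Returns:
            The tuple (next_state, reward), where next_state is the state
            reached after the transition.

        Raises:
            ValueError: if the action is not in allowed_actions.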
        """
        
        # check if the action is allowed
        if action not in self.allowed_actions:
            raise ValueError("Action is not allowed")
        
        # each branch below encodes one (state, action) -> (next state, reward) transition
        reward = 0
        if action == 0 and self.current_state == 1:
            self.current_state = 2
            reward = 1
        elif action == 1 and self.current_state == 1:
            self.current_state = 3
            reward = 10
        elif action == 0 and self.current_state == 2:
            self.current_state = 1
            reward = 0
        elif action == 1 and self.current_state == 2:
            self.current_state = 3
            reward = 1
        elif action == 0 and self.current_state == 3:
            self.current_state = 2
            reward = 0
        elif action == 1 and self.current_state == 3:
            self.current_state = 3
            reward = 10
        
        return self.current_state, reward
    
    def reset(self) -> int:
        """
        Reset the environment starting from the initial state.
        
        Returns:
            The environment state after reset (initial state).
        """
        self.current_state = self._initial_state
        return self.current_state
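
To check the dynamics, it can help to roll the environment out on a fixed sequence of actions. The sketch below is one possible way to do that. So that it runs on its own, it uses a table-driven variant of the class: `TableEnvironment`, `TRANSITIONS`, and `rollout` are hypothetical names introduced for illustration, and the table entries are copied from the branches of step() above.

```python
from typing import Dict, List, Tuple

# (state, action) -> (next_state, reward), copied from the branches of step() above.
TRANSITIONS: Dict[Tuple[int, int], Tuple[int, int]] = {
    (1, 0): (2, 1),
    (1, 1): (3, 10),
    (2, 0): (1, 0),
    (2, 1): (3, 1),
    (3, 0): (2, 0),
    (3, 1): (3, 10),
}


class TableEnvironment:
    """Hypothetical table-driven variant of Environment with the same dynamics."""

    def __init__(self) -> None:
        self._initial_state = 1
        self.allowed_actions = [0, 1]  # 0: A, 1: B
        self.current_state = self._initial_state

    def step(self, action: int) -> Tuple[int, int]:
        # reject actions outside the allowed set, as in the original class
        if action not in self.allowed_actions:
            raise ValueError("Action is not allowed")
        # look up the transition instead of walking an if/elif chain
        self.current_state, reward = TRANSITIONS[(self.current_state, action)]
        return self.current_state, reward

    def reset(self) -> int:
        self.current_state = self._initial_state
        return self.current_state


def rollout(env: TableEnvironment, actions: List[int]) -> List[Tuple[int, int]]:
    """Reset env, apply the actions in order, and collect (next_state, reward) pairs."""
    env.reset()
    return [env.step(a) for a in actions]


env = TableEnvironment()
trajectory = rollout(env, [0, 1, 1, 0])  # actions A, B, B, A from state 1
total_reward = sum(reward for _, reward in trajectory)
print(trajectory)    # [(2, 1), (3, 1), (3, 10), (2, 0)]
print(total_reward)  # 12
```

The lookup table keeps all transitions in one place, which makes them easier to compare against the figure; the if/elif version above behaves the same for the valid states 1, 2, and 3.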