# Sample RL World

In this notebook, we create a sample RL World. A simple RL World consists of the following:

1. **Environment:** An environment is a model of the world that is external to the Agent and provides the Agent with Observations and Rewards. The reward can be given at every time step or only at the end of an episode.
2. **Agent:** An agent is somebody or something that interacts with the Environment by taking Actions in it.
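To make the difference between per-step and end-of-episode rewards concrete, here is a minimal sketch. It assumes a hypothetical 5-step episode whose per-step rewards are already known; `dense_rewards` and `sparse_rewards` are illustrative names, not part of this notebook. In this sketch both schemes add up to the same total and differ only in *when* the Agent receives feedback.

```python
from typing import List


def dense_rewards(step_rewards: List[float]) -> List[float]:
    # Dense scheme: each reward is handed to the Agent right after its step.
    return list(step_rewards)


def sparse_rewards(step_rewards: List[float]) -> List[float]:
    # Sparse scheme: every step yields 0.0 except the last,
    # which carries the whole episode's total.
    if not step_rewards:
        return []
    return [0.0] * (len(step_rewards) - 1) + [sum(step_rewards)]


# Hypothetical per-step rewards for a 5-step episode
rewards = [1.0, 0.0, 2.0, 0.5, 1.5]
print(dense_rewards(rewards))   # [1.0, 0.0, 2.0, 0.5, 1.5]
print(sparse_rewards(rewards))  # [0.0, 0.0, 0.0, 0.0, 5.0]
```

With sparse rewards the Agent cannot tell which of its earlier Actions were responsible for the final payoff, which is generally what makes such tasks harder to learn.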

The Agent interacts with the Environment using the following channels:
1. **Actions:** An action is something the Agent performs in the Environment. It could be a single action or a set of actions.
2. **Reward:** A reward is a scalar value that the Environment gives the Agent for taking an Action in the Environment.
3. **Observations:** An observation is what the Environment provides to the Agent to describe the current state of the Environment around the Agent.
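One way to picture these three channels is as a small interface. The sketch below is an assumption-laden illustration: `EnvironmentProtocol`, `TinyEnv`, and `interact_once` are hypothetical names, and the method names simply mirror the ones this notebook's Environment will use. `TinyEnv` is a toy that pays 1.0 for Action `1` and 0.0 for Action `0`.

```python
from typing import List, Protocol, Tuple


class EnvironmentProtocol(Protocol):
    """Hypothetical interface for the Agent-Environment channels."""

    def get_observations(self) -> List[float]: ...  # Observations: Environment -> Agent
    def get_actions(self) -> List[int]: ...         # the Actions the Agent may choose from
    def action(self, action: int) -> float: ...     # Action in, Reward out


class TinyEnv:
    """Hypothetical toy Environment: Action 1 earns 1.0, Action 0 earns 0.0."""

    def get_observations(self) -> List[float]:
        return [0.0]

    def get_actions(self) -> List[int]:
        return [0, 1]

    def action(self, action: int) -> float:
        return 1.0 if action == 1 else 0.0


def interact_once(env: EnvironmentProtocol, chosen: int) -> Tuple[List[float], float]:
    obs = env.get_observations()            # 1. Observation flows to the Agent
    if chosen not in env.get_actions():     # 2. the Action must come from the Action Space
        raise ValueError(f"invalid action: {chosen}")
    reward = env.action(chosen)             # 3. Action flows in, Reward flows back
    return obs, reward


print(interact_once(TinyEnv(), 1))  # ([0.0], 1.0)
```

Because `Protocol` uses structural typing, any class that provides these three methods satisfies the interface without inheriting from it.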

So, let's get started.

```python
# Import Dependencies
import numpy as np
from typing import List
```

Now let's code the Environment.

The `Environment` class contains the following:

1. **Constructor:** sets `num_episodes`, the number of steps (called "episodes" in this notebook) that the Agent may take before the interaction ends.
2. **get_observations:** returns the current Observation from the Environment; here it is always a vector of three zeros.
3. **get_actions:** defines the Action Space, i.e., the set of actions an Agent can take in the Environment (here `0` and `1`).
4. **is_done:** returns `True` once `num_episodes` has reached zero, i.e., the Agent has used up all its steps, and `False` otherwise.
5. **action:** applies the Agent's action to the Environment and returns the reward. In this implementation the action is ignored, the step counter is decremented, and the reward is a random value drawn uniformly from [0, 1).

```python
# RL World Dummy Environment
class Environment:
    # Initialize the number of steps the Agent may take
    def __init__(self):
        self.num_episodes = 10

    # Return the current Observation to the Agent
    # (always a vector of three zeros in this dummy Environment)
    def get_observations(self) -> np.ndarray:
        return np.asarray([0.0, 0.0, 0.0])

    # Return the Action Space
    # This allows the Agent to query the set of Actions it can execute
    def get_actions(self) -> np.ndarray:
        return np.asarray([0, 1])

    # Check whether the Agent has used up all its steps
    def is_done(self) -> bool:
        return self.num_episodes == 0

    # Perform an Action in the Environment
    # This function handles the Agent's Action and returns the Reward for it
    # Here, the Action is discarded and the Reward is random
    def action(self, action: int) -> float:
        # If the game is over, refuse further Actions
        if self.is_done():
            raise RuntimeError("Game Over !!")
        # Count down the remaining steps
        self.num_episodes -= 1
        # Return a random Reward in [0, 1), independent of the Action taken
        return np.random.random()
```
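Because the reward comes from the global NumPy random state, every run prints different numbers. A minimal sketch of a reproducible variant follows, assuming a hypothetical `SeededEnvironment` class with an added `seed` parameter and a `num_steps` default of 10. It uses NumPy's `np.random.default_rng(seed)`, whose `Generator.random()` also draws floats from [0, 1). The helper `rollout` is likewise hypothetical.

```python
import numpy as np


class SeededEnvironment:
    """Hypothetical variant of Environment with a reproducible reward stream."""

    def __init__(self, num_steps: int = 10, seed: int = 0):
        self.num_episodes = num_steps            # steps left, same counter name as Environment
        self.rng = np.random.default_rng(seed)   # private generator; global state untouched

    def get_observations(self) -> np.ndarray:
        return np.zeros(3)

    def get_actions(self) -> np.ndarray:
        return np.array([0, 1])

    def is_done(self) -> bool:
        return self.num_episodes == 0

    def action(self, action: int) -> float:
        # Refuse Actions once all steps are used up
        if self.is_done():
            raise RuntimeError("Game Over !!")
        self.num_episodes -= 1
        return float(self.rng.random())          # the Action is still ignored


def rollout(seed: int) -> list:
    # Run one full interaction with a fixed Action and collect the Rewards
    env = SeededEnvironment(seed=seed)
    rewards = []
    while not env.is_done():
        rewards.append(env.action(0))
    return rewards


print(rollout(42) == rollout(42))  # True
print(len(rollout(42)))            # 10
```

Keeping a per-instance generator, rather than calling `np.random.seed`, means two environments in the same process do not disturb each other's random streams.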

Now let's code the Agent.

The `Agent` class contains the following:

1. **Constructor:** initializes `total_reward`, which accumulates the rewards the Agent receives.
2. **step:** lets the Agent take one step in the Environment: it reads the Observation, queries the Action Space, picks an Action at random, submits it to the Environment, and adds the returned Reward to its running total.

```python
# RL Agent
class Agent:
    # Initialize the total reward accumulated by the Agent
    def __init__(self):
        self.total_reward = 0.0

    # Let the Agent take one step in the Environment
    # For every step, the Agent gets an Observation and a Reward
    def step(self, env: Environment):
        # Observe the Environment
        current_obs = env.get_observations()
        # Query the Action Space
        action_space = env.get_actions()
        # Pick an Action at random (the Observation is not used yet)
        action = np.random.choice(action_space)
        # Submit the Action to the Environment and get the Reward for this step
        reward = env.action(action)
        # Accumulate the Reward
        self.total_reward += reward

        print("\n------- Episode: {} --------\n".format(env.num_episodes))
        print("Current Observation: {}".format(current_obs))
        print("Action Space: {}".format(action_space))
        print("Action: {}".format(action))
        print("Reward: {}".format(reward))
        print("Total Reward: {}".format(self.total_reward))
```
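Since the Agent ignores the Observation and picks uniformly at random, it is effectively following a random policy. A minimal sketch of making that decision rule explicit is shown below, assuming a hypothetical `PolicyAgent` that receives a policy function `(observation, action_space) -> action`, plus a hypothetical `CountdownEnv` stand-in (3 steps, reward 1.0 for Action `1`, else 0.0) so the example runs on its own and gives a deterministic result.

```python
from typing import Callable, List

import numpy as np

# A policy maps (observation, action_space) to an action
Policy = Callable[[np.ndarray, np.ndarray], int]


def random_policy(obs: np.ndarray, action_space: np.ndarray) -> int:
    # Same rule as Agent.step: uniform choice, observation ignored
    return int(np.random.choice(action_space))


def last_action_policy(obs: np.ndarray, action_space: np.ndarray) -> int:
    # Deterministic alternative: always pick the last action in the space
    return int(action_space[-1])


class CountdownEnv:
    """Hypothetical stand-in Environment: reward 1.0 for action 1, else 0.0."""

    def __init__(self, steps: int = 3):
        self.steps_left = steps

    def get_observations(self) -> np.ndarray:
        return np.zeros(3)

    def get_actions(self) -> np.ndarray:
        return np.array([0, 1])

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        if self.is_done():
            raise RuntimeError("Game Over !!")
        self.steps_left -= 1
        return 1.0 if action == 1 else 0.0


class PolicyAgent:
    """Hypothetical Agent whose decision rule is supplied as a policy function."""

    def __init__(self, policy: Policy):
        self.policy = policy
        self.total_reward = 0.0
        self.history: List[int] = []  # actions taken, in order

    def step(self, env: CountdownEnv) -> None:
        obs = env.get_observations()
        action = self.policy(obs, env.get_actions())
        self.history.append(action)
        self.total_reward += env.action(action)


demo_env = CountdownEnv()
demo_agent = PolicyAgent(last_action_policy)
while not demo_env.is_done():
    demo_agent.step(demo_env)
print(demo_agent.history, demo_agent.total_reward)  # [1, 1, 1] 3.0
```

Swapping `last_action_policy` for `random_policy` reproduces the behaviour of the notebook's Agent; a learning Agent would replace the policy with one that actually uses the Observation.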

Let's instantiate the `Environment` and `Agent` classes, let the Agent step through the Environment until it is done, and finally print the total accumulated reward.

```python
# Instantiate the Environment Class
env = Environment()

# Instantiate the Agent Class
agent = Agent()

# Step through the Environment until no steps remain
while not env.is_done():
    agent.step(env)
print("\nTotal Reward at the End of 10 Episodes: {}".format(agent.total_reward))
```


------- Episode: 9 --------

Current Observation: [0. 0. 0.]
Action Space: [0 1]
Action: 1
Reward: 0.46218286797379293
Total Reward: 0.46218286797379293

------- Episode: 8 --------

Current Observation: [0. 0. 0.]
Action Space: [0 1]
Action: 0
Reward: 0.8650581904636087
Total Reward: 1.3272410584374015

------- Episode: 7 --------

Current Observation: [0. 0. 0.]
Action Space: [0 1]
Action: 1
Reward: 0.14263078804708107
Total Reward: 1.4698718464844824

------- Episode: 6 --------

Current Observation: [0. 0. 0.]
Action Space: [0 1]
Action: 1
Reward: 0.8441566495841479
Total Reward: 2.31402849606863

------- Episode: 5 --------

Current Observation: [0. 0. 0.]
Action Space: [0 1]
Action: 1
Reward: 0.7524967797778965
Total Reward: 3.0665252758465265

------- Episode: 4 --------

Current Observation: [0. 0. 0.]
Action Space: [0 1]
Action: 1
Reward: 0.811125712262521
Total Reward: 3.8776509881090475

------- Episode: 3 --------

Current Observation: [0. 0. 0.]
Action Space: [0 1]
Action