<a href="https://colab.research.google.com/github/ccaiafa/CursoRL/blob/master/OpenAIGym/Anatomy_of_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**OpenAI Gym**

This notebook includes:
- High-level description of the requirements to plug the agent into the RL framework
- A basic, pure-Python implementation of a random RL agent
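
The "requirements" in the first bullet amount to a small interface. As a sketch, they can be written down with `typing.Protocol`; the four method names match the toy `Environment` class defined below, while the protocol name `EnvironmentInterface`, the `ConstantEnv` class and the `run_episode` helper are hypothetical illustrations, not part of the original notebook or of OpenAI Gym.

```python
import random
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class EnvironmentInterface(Protocol):
    """Hypothetical name for the contract an environment must satisfy."""

    def get_observation(self) -> List[float]: ...  # current state as seen by the agent
    def get_actions(self) -> List[int]: ...        # actions allowed right now
    def is_done(self) -> bool: ...                 # True once the episode is over
    def action(self, action: int) -> float: ...    # apply an action, return its reward


class ConstantEnv:
    """Hypothetical minimal implementation: 3 steps, reward 1.0 per step."""

    def __init__(self) -> None:
        self.steps_left = 3

    def get_observation(self) -> List[float]:
        return [0.0]

    def get_actions(self) -> List[int]:
        return [0, 1]

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        self.steps_left -= 1
        return 1.0


def run_episode(env: EnvironmentInterface) -> float:
    """Random agent written only against the interface (hypothetical helper)."""
    total = 0.0
    while not env.is_done():
        env.get_observation()  # observe (ignored by a random agent)
        total += env.action(random.choice(env.get_actions()))
    return total


env = ConstantEnv()
print(isinstance(env, EnvironmentInterface))  # True: all four methods are present
print(run_episode(env))  # 3.0
```

Because the agent code only calls these four methods, any environment that provides them could be swapped in without changing the agent. Note that a runtime-checkable protocol only verifies that the methods exist, not their signatures.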

Toy example (admittedly not very useful): define an environment that gives the agent a random reward at each step for a fixed number of steps, regardless of the agent's actions.

Let's define the environment class first.


In [4]:
import random
from typing import List

In [5]:
class Environment:
    def __init__(self):
        self.steps_left = 10  # Episode length: number of steps remaining

    def get_observation(self) -> List[float]:
        return [0.0, 0.0, 0.0]  # Constant observation: the random agent ignores the state anyway

    def get_actions(self) -> List[int]:
        return [0, 1]  # Two possible actions (no action, action)

    def is_done(self) -> bool:
        return self.steps_left == 0  # The episode ends when no steps are left

    def action(self, action: int) -> float:
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()  # Random reward in [0.0, 1.0), independent of the action
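
For comparison, the classic OpenAI Gym interface (Gym releases before 0.26) bundles these calls into `reset()`, which returns the first observation, and `step(action)`, which returns an `(observation, reward, done, info)` tuple. The sketch below re-expresses the toy environment in that shape in pure Python, without importing `gym`; the class name `RandomRewardEnv`, the `seed` parameter and the action check are assumptions added for illustration.

```python
import random
from typing import Any, Dict, List, Optional, Tuple


class RandomRewardEnv:
    """Hypothetical Gym-style version of the toy Environment (classic, pre-0.26 API shape)."""

    def __init__(self, max_steps: int = 10, seed: Optional[int] = None) -> None:
        self.max_steps = max_steps
        self.rng = random.Random(seed)  # private RNG so runs can be reproduced
        self.steps_left = max_steps

    def reset(self) -> List[float]:
        # Start a new episode and return the initial observation.
        self.steps_left = self.max_steps
        return [0.0, 0.0, 0.0]

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict[str, Any]]:
        if self.steps_left == 0:
            raise RuntimeError("Episode is over; call reset()")
        if action not in (0, 1):
            raise ValueError(f"invalid action {action!r}")
        self.steps_left -= 1
        reward = self.rng.random()  # uniform in [0.0, 1.0), independent of the action
        done = self.steps_left == 0
        return [0.0, 0.0, 0.0], reward, done, {}


env = RandomRewardEnv(seed=0)
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(random.choice([0, 1]))
    total += reward
print("Total reward got: %.4f" % total)
```

Folding `is_done()` into the `done` flag returned by `step()` is the main structural difference; newer Gym and Gymnasium releases instead split that flag into `terminated` and `truncated`.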

Let's define the agent class next:

In [6]:
class Agent:
    def __init__(self):
        self.total_reward = 0.0  # Reward accumulated over the episode

    def step(self, env: Environment):  # Advance the interaction by one step
        current_obs = env.get_observation()  # Observe the environment (ignored by this random agent)
        actions = env.get_actions()  # Query the available actions
        reward = env.action(random.choice(actions))  # Pick one at random and submit it
        self.total_reward += reward  # Accumulate the reward for this step
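
Because the environment draws its rewards from the same global `random` module that the agent uses for `random.choice`, the claim that rewards ignore the actions is hard to check directly. A hypothetical variant that gives the environment and the agent separate `random.Random` instances makes it testable: with the same environment seed, a random agent and a fixed-policy agent collect exactly the same total reward. `SeededEnvironment`, `RandomAgent`, `AlwaysActAgent` and `play` are illustrative names, not part of the original notebook.

```python
import random
from typing import List, Union


class SeededEnvironment:
    """Same toy environment as above, but with its own RNG (hypothetical variant)."""

    def __init__(self, seed: int) -> None:
        self.steps_left = 10
        self.rng = random.Random(seed)  # reward stream independent of the agent's RNG

    def get_observation(self) -> List[float]:
        return [0.0, 0.0, 0.0]

    def get_actions(self) -> List[int]:
        return [0, 1]

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        if self.is_done():
            raise RuntimeError("Game is over")
        self.steps_left -= 1
        return self.rng.random()


class RandomAgent:
    """Picks uniformly among the available actions using its own RNG."""

    def __init__(self, seed: int) -> None:
        self.total_reward = 0.0
        self.rng = random.Random(seed)

    def step(self, env: SeededEnvironment) -> None:
        env.get_observation()
        self.total_reward += env.action(self.rng.choice(env.get_actions()))


class AlwaysActAgent:
    """Hypothetical fixed policy: always submits action 1."""

    def __init__(self) -> None:
        self.total_reward = 0.0

    def step(self, env: SeededEnvironment) -> None:
        env.get_observation()
        self.total_reward += env.action(1)


def play(agent: Union[RandomAgent, AlwaysActAgent], env_seed: int = 42) -> float:
    """Run one episode against a freshly seeded environment and return the total reward."""
    env = SeededEnvironment(env_seed)
    while not env.is_done():
        agent.step(env)
    return agent.total_reward


random_total = play(RandomAgent(seed=7))
fixed_total = play(AlwaysActAgent())
print(random_total == fixed_total)  # True: the reward ignores the action
```

Keeping one RNG per component is a common way to make stochastic simulations reproducible, since one component's random draws can no longer shift another's.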

Now, we can run one episode:

In [10]:
if __name__ == "__main__":
    env = Environment()
    agent = Agent()

    while not env.is_done():
        agent.step(env)

    print("Total reward got: %.4f" % agent.total_reward)

Total reward got: 7.5460
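
The printed value is a single sample. Each episode's return is the sum of 10 independent rewards drawn uniformly from [0, 1), so its expected value is 10 × 0.5 = 5 and its standard deviation is √(10/12) ≈ 0.91. A quick Monte Carlo check under that reading of the code follows; `episode_return` is a hypothetical helper that skips the agent, since the action has no effect on the reward, and the seed is an arbitrary assumption.

```python
import random
import statistics


def episode_return(rng: random.Random, steps: int = 10) -> float:
    """Return of one episode of the toy task: `steps` rewards drawn uniformly from [0, 1)."""
    # The agent's action is irrelevant here, so it is not simulated.
    return sum(rng.random() for _ in range(steps))


rng = random.Random(0)  # fixed seed (an assumption) for a reproducible estimate
returns = [episode_return(rng) for _ in range(10_000)]
mean_return = statistics.mean(returns)
std_return = statistics.stdev(returns)
print("mean %.3f, std %.3f" % (mean_return, std_return))  # close to 5.000 and 0.913
```

With 10,000 episodes the standard error of the mean is about 0.91 / 100 ≈ 0.009, so the estimate should land very close to 5.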
