<a href="https://colab.research.google.com/github/LondonNode/Pearl-tutorials/blob/main/4_Explorers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install pearll

# Introduction

This notebook is a tutorial for the `explorers` module within Pearl. This module serves two functions:

1. Use a uniform random policy for the first `start_steps` steps of training to encourage exploration.
2. Add noise to the actions output by a policy network (e.g. in DDPG).

The explorers follow the same design pattern as the updaters in the last tutorial, with an `__init__` method for initialization and a `__call__` method to run the explorer and return an action.
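
As a rough illustration of this pattern, the sketch below uses only the Python standard library. `WarmupExplorer`, `sample_action` and the toy `policy` are hypothetical names invented for this example and are not part of Pearl's API; the only behaviour borrowed from the tutorial is that a random action is returned while `step < start_steps` and the policy's action is returned afterwards.

```python
import random
from typing import Callable, Sequence


class WarmupExplorer:
    """Hypothetical stand-in for an explorer; not Pearl's actual class."""

    def __init__(self, sample_action: Callable[[], float], start_steps: int) -> None:
        # Configuration is stored once, at construction time.
        self.sample_action = sample_action
        self.start_steps = start_steps

    def __call__(
        self,
        policy: Callable[[Sequence[float]], float],
        observation: Sequence[float],
        step: int,
    ) -> float:
        # Before start_steps, ignore the policy and draw a random action.
        if step < self.start_steps:
            return self.sample_action()
        # From start_steps onwards, defer to the policy.
        return policy(observation)


rng = random.Random(0)  # seeded so the example is reproducible
explorer = WarmupExplorer(sample_action=lambda: rng.uniform(-1.0, 1.0), start_steps=10)


def policy(observation: Sequence[float]) -> float:
    # Toy deterministic policy, assumed purely for illustration.
    return 0.5 * sum(observation)


warmup_action = explorer(policy, [0.1, 0.2], step=1)   # random draw in [-1, 1]
policy_action = explorer(policy, [0.1, 0.2], step=20)  # comes from the policy
```

Keeping the configuration in `__init__` and the per-step logic in `__call__` means the training loop only has to call the explorer with the current step, whichever kind of explorer is plugged in.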

# Base Explorer

The `BaseExplorer` is the base class for all other explorers. It has no abstract methods, so it can also be used directly when you don't want to add any noise to the policy's actions.

In [2]:
from pearll.explorers import BaseExplorer
from pearll.models import Actor
from pearll.models.encoders import IdentityEncoder
from pearll.models.torsos import MLP
from pearll.models.heads import DeterministicHead

import gym

env = gym.make("CartPole-v1")
# model can either be an Actor or ActorCritic
model = Actor(
    encoder=IdentityEncoder(),
    torso=MLP(layer_sizes=[4, 5]),
    head=DeterministicHead(5, 1)
)

explorer = BaseExplorer(action_space=env.action_space, start_steps=10)
# since step < start_steps, action is chosen via action_space.sample()
random_action = explorer(model=model, observation=env.reset(), step=1)
# since step >= start_steps, action is taken from the model
policy_action = explorer(model=model, observation=env.reset(), step=20)

# Gaussian Explorer

The `GaussianExplorer` is the only other explorer class currently implemented. It behaves like the `BaseExplorer`, but also adds zero-mean, normally distributed noise to the actions returned by the model. The noisy actions are then clipped so that the noise cannot push them outside the bounds of the action space.
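
The noise-then-clip step can be pictured with a short standard-library sketch. The helper `add_clipped_gaussian_noise` and its `low`/`high` bounds are assumptions made for this illustration, not Pearl's implementation: each action component gets a sample from a normal distribution with mean 0 and standard deviation `scale`, and is then clamped to its bounds.

```python
import random
from typing import List, Sequence


def add_clipped_gaussian_noise(
    action: Sequence[float],
    scale: float,
    low: Sequence[float],
    high: Sequence[float],
    rng: random.Random,
) -> List[float]:
    """Hypothetical helper: perturb each component, then clamp it to [low, high]."""
    noisy = []
    for a, lo, hi in zip(action, low, high):
        # Zero-mean Gaussian noise with standard deviation `scale`.
        perturbed = a + rng.gauss(0.0, scale)
        # Clamp so the noise cannot push the action out of range.
        noisy.append(min(max(perturbed, lo), hi))
    return noisy


rng = random.Random(0)  # seeded so the example is reproducible
action = [0.9, -0.2]
noisy_action = add_clipped_gaussian_noise(
    action, scale=1.0, low=[-1.0, -1.0], high=[1.0, 1.0], rng=rng
)
```

With `scale=0.0` the helper returns the action unchanged as long as it is already within bounds. A larger `scale` gives more exploration, but more of the samples end up pinned at the bounds.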

In [3]:
from pearll.explorers import GaussianExplorer
from pearll.models import Actor
from pearll.models.encoders import IdentityEncoder
from pearll.models.torsos import MLP
from pearll.models.heads import DeterministicHead

import gym

env = gym.make("CartPole-v1")
# model can either be an Actor or ActorCritic
model = Actor(
    encoder=IdentityEncoder(),
    torso=MLP(layer_sizes=[4, 5]),
    head=DeterministicHead(5, 1)
)

# The scale parameter defines the std of the Gaussian noise.
explorer = GaussianExplorer(action_space=env.action_space, scale=1, start_steps=10)
# since step < start_steps, action is chosen via action_space.sample()
random_action = explorer(model=model, observation=env.reset(), step=1)
# since step >= start_steps, action is taken from the model, then Gaussian noise is added and the result is clipped
policy_action_with_noise = explorer(model=model, observation=env.reset(), step=20)