# OpenAI Gym

## Imports

In [None]:
import numpy as np
import gym
from gym import envs
from gym import spaces
from gym import Wrapper, ObservationWrapper, ActionWrapper, RewardWrapper

## Version

The ```gym.__version__``` attribute holds the version of the installed OpenAI Gym library.

In [None]:
print(f'Version of OpenAI Gym: {gym.__version__}')

## Registered Environments

The ```gym.envs``` module is useful to figure out what environments are available.

In [None]:
# print all registered environments
env_list = [env.id for env in envs.registry.all()]
print('Registered environments in OpenAI Gym\n')
print(env_list)

## Create an Environment

The ```gym.make('environment_name')``` function generates and returns an environment that the agent can interact with.

In [None]:
env = gym.make('FrozenLake-v1')

## Agent-Environment Interaction

The ```env.reset()``` method initializes the environment and returns the initial observation. It is the first method that must be called after creating an environment, and it must be called again each time the agent reaches a terminal state ```T```.

In the second step the agent chooses an action based on the current observation. Since we do not have a trained agent yet, we sample a random action with ```env.action_space.sample()``` and store it in the variable ```action```.

The ```env.step(action)``` method takes an ```action``` as input and returns the tuple ```(next_observation, reward, done, info)```:

1. ```next_observation```: the observation the environment transitions into
2. ```reward```: the reward the agent receives based on the action and the previous observation
3. ```done```: a boolean that is ```True``` if the environment transitioned into a terminal state and ```False``` otherwise
4. ```info```: additional information that is primarily intended for debugging purposes

```env.render()``` renders the current observation either directly to the terminal or using a graphical output, like a game engine.

```env.close()``` closes the environment completely. Call it once you no longer need the environment for training or testing.

In [None]:
# show interaction for the Frozen Lake environment
env = gym.make('FrozenLake-v1')
obs, done = env.reset(), False
# interact with the environment until the terminal state
while not done:
    env.render()
    action = env.action_space.sample()     
    next_obs, reward, done, info = env.step(action)
    obs = next_obs
env.render()
env.close()
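The same reset/step loop can also be used to measure how well a policy performs. The sketch below is a minimal illustration, not part of Gym: ```CorridorEnv``` and ```run_episode``` are hypothetical names, and the toy environment only imitates the ```reset()``` and ```step(action)``` interface described above so that the example runs without any installed environment.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment (not part of Gym): a corridor of cells 0..4.

    The agent starts in cell 0; reaching the last cell ends the episode
    with a reward of 1.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # every episode starts in the leftmost cell
        self.position = 0
        return self.position

    def step(self, action):
        # action 1 moves right, any other action moves left; walls clip the move
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return (total_reward, number_of_steps)."""
    obs, done = env.reset(), False
    total_reward, steps = 0.0, 0
    # max_steps is a safety cap in case the policy never reaches the goal
    while not done and steps < max_steps:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


random.seed(0)
env = CorridorEnv()
random_return, random_steps = run_episode(env, lambda obs: random.choice([0, 1]))
right_return, right_steps = run_episode(env, lambda obs: 1)
print(f'random policy: return={random_return}, steps={random_steps}')
print(f'always right:  return={right_return}, steps={right_steps}')
```

Because ```run_episode``` only relies on ```reset``` and ```step```, it should work with any environment that returns the four-element tuple shown above.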

## State and Action Space

OpenAI Gym provides several classes that define state and action spaces. All of them are located in the ```gym.spaces``` module. The ```Space``` class is the base class that the other spaces derive from. The ```Discrete``` class is used for discrete state and action spaces, while the ```Box``` class is used for continuous state and action spaces.

In [None]:
# state and action spaces
print(spaces.Space)
print(spaces.Discrete)
print(spaces.Box)

The ```Discrete``` class only needs a single argument ```n```, which determines the number of possible discrete states or actions.

In [None]:
# create a discrete space
discrete_space = spaces.Discrete(n=10)
print(discrete_space)
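To make the role of ```n``` concrete, the sketch below imitates a discrete space in plain Python. ```ToyDiscrete``` is a hypothetical, simplified stand-in and not Gym's implementation. It assumes that the valid values are the integers ```0``` to ```n-1``` and that a space can both draw a random element and check membership.

```python
import random


class ToyDiscrete:
    """Simplified stand-in for a discrete space with values 0, 1, ..., n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # draw one of the n values uniformly at random
        return random.randrange(self.n)

    def contains(self, x):
        # only integers inside the range belong to the space
        return isinstance(x, int) and 0 <= x < self.n


space = ToyDiscrete(n=10)
samples = [space.sample() for _ in range(5)]
print(samples)
print(space.contains(9), space.contains(10))  # True False
```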

The ```Box``` class takes up to 4 arguments.

1. ```low```: the lower bound of the space
2. ```high```: the upper bound of the space
3. ```shape```: the shape of the space 
4. ```dtype```: the data type

In [None]:
# create a continuous space
continuous_space = spaces.Box(low=-1.0, high=1.0, shape=(2, 2), dtype=np.float32)
print(continuous_space)
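The meaning of ```low```, ```high``` and ```shape``` can be illustrated without Gym. The helpers below, ```sample_box``` and ```in_box```, are hypothetical names for a rough pure-Python sketch. They assume scalar bounds, as in the cell above, and represent an element of the space as a nested list instead of a NumPy array.

```python
import random


def sample_box(low, high, shape):
    """Draw a nested list of the given shape with entries between low and high."""
    if len(shape) == 0:
        return random.uniform(low, high)
    return [sample_box(low, high, shape[1:]) for _ in range(shape[0])]


def in_box(x, low, high, shape):
    """Check that x has the given shape and every entry lies within the bounds."""
    if len(shape) == 0:
        return isinstance(x, (int, float)) and low <= x <= high
    return (
        isinstance(x, list)
        and len(x) == shape[0]
        and all(in_box(item, low, high, shape[1:]) for item in x)
    )


point = sample_box(-1.0, 1.0, (2, 2))
print(point)
print(in_box(point, -1.0, 1.0, (2, 2)))                      # True
print(in_box([[0.0, 2.0], [0.0, 0.0]], -1.0, 1.0, (2, 2)))   # False, 2.0 exceeds high
```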

In [None]:
# discrete environment example
print('FrozenLake-v1')
env_discrete = gym.make('FrozenLake-v1')
env_discrete.reset()
print(f'Observation Space: {env_discrete.observation_space}')
print(f'Action Space: {env_discrete.action_space}')

In [None]:
# continuous environment
print('MountainCarContinuous-v0')
env_continuous = gym.make('MountainCarContinuous-v0')
env_continuous.reset()
print(f'Observation Space: {env_continuous.observation_space}')
print(f'Action Space: {env_continuous.action_space}')

## Wrappers

The main purpose of wrappers is to change or extend the functionality of the original environment in some way and to train the agent on the *"wrapped"* environment. For example, we could preprocess the RGB (red, green, blue) images of Atari games to make training easier for neural networks.

The ```Wrapper``` class is the base class of all other wrappers. The base class is usually used to extend or change the ```reset``` and ```step``` methods.

Below we create a ```PrintingWrapper``` that extends the environment by printing the action taken, the resulting observation, the reward and the ```done``` flag.

In [None]:
class PrintingWrapper(Wrapper):
    
    def __init__(self, env):
        super(PrintingWrapper, self).__init__(env)
    
    def reset(self):
        obs = self.env.reset()
        print(f'Observation: {obs}')
        return obs
    
    def step(self, action):
        print(f'Action taken: {action}')
        next_obs, reward, done, info = self.env.step(action)
        print(f'Observation: {next_obs}, Reward: {reward}, done: {done}')
        return next_obs, reward, done, info

In [None]:
env = gym.make('FrozenLake-v1')
env = PrintingWrapper(env)

In [None]:
obs = env.reset()

In [None]:
next_obs, reward, done, info = env.step(0)

The ```ObservationWrapper``` adjusts the observation of the environment by using the ```observation``` method.

In [None]:
class AddHundredObsWrapper(ObservationWrapper):
    
    def __init__(self, env):
        super(AddHundredObsWrapper, self).__init__(env)
        
    def observation(self, obs):
        return obs+100
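It can help to see roughly how an observation wrapper hooks into ```reset``` and ```step```. The sketch below rebuilds the idea in plain Python. ```ToyEnv```, ```ToyWrapper``` and ```ToyObservationWrapper``` are hypothetical stand-ins, not Gym's classes, and the sketch assumes the four-element ```step``` tuple used throughout this notebook.

```python
class ToyEnv:
    """Hypothetical one-step environment: the observation echoes the action."""

    def reset(self):
        return 0

    def step(self, action):
        return action, 1.0, True, {}


class ToyWrapper:
    """Forwards every call unchanged to the wrapped environment."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class ToyObservationWrapper(ToyWrapper):
    """Passes each observation through self.observation before returning it."""

    def reset(self):
        return self.observation(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.observation(obs), reward, done, info

    def observation(self, obs):
        raise NotImplementedError


class AddHundred(ToyObservationWrapper):
    def observation(self, obs):
        return obs + 100


env = AddHundred(ToyEnv())
print(env.reset())   # 100
print(env.step(3))   # (103, 1.0, True, {})
```

The subclass only defines ```observation```. The calls to ```reset``` and ```step``` are inherited, which is why a wrapper like ```AddHundredObsWrapper``` can stay so short.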

We can stack several wrappers!

In [None]:
env = gym.make('FrozenLake-v1')
env = AddHundredObsWrapper(env)
env = PrintingWrapper(env)

In [None]:
obs = env.reset()

The ```RewardWrapper``` utilizes the ```reward``` method to adjust the rewards of the environment.

In [None]:
class AddTenRewardWrapper(RewardWrapper):
    
    def __init__(self, env):
        super(AddTenRewardWrapper, self).__init__(env)
        
    def reward(self, reward):
        return reward+10

In [None]:
env = gym.make('FrozenLake-v1')
env = AddHundredObsWrapper(env)
env = AddTenRewardWrapper(env)
env = PrintingWrapper(env)

In [None]:
obs = env.reset()

In [None]:
next_obs, reward, done, info = env.step(0)

The ```ActionWrapper``` modifies the action that the agent provides to the environment. Subclasses must override the abstract ```action``` method.

In [None]:
class AddActionOneWrapper(ActionWrapper):
    
    def __init__(self, env):
        super(AddActionOneWrapper, self).__init__(env)
        
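    def action(self, action):
        # shift the action by one; the largest valid action would then fall
        # outside the action space, so this wrapper is for demonstration only
        return action+1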

In [None]:
env = gym.make('FrozenLake-v1')
env = PrintingWrapper(env)
env = AddHundredObsWrapper(env)
env = AddTenRewardWrapper(env)
env = AddActionOneWrapper(env)

In [None]:
obs = env.reset()

In [None]:
next_obs, reward, done, info = env.step(0)
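The order in which wrappers are stacked matters. In the last stack the ```PrintingWrapper``` is the innermost wrapper, so it should report the values as the underlying environment sees them, before the outer wrappers modify them. The sketch below shows the effect with hypothetical plain-Python stand-ins (```ToyEnv```, ```Logging```, ```AddHundred```) that only mimic ```reset```. It is an illustration of the principle, not Gym code.

```python
class ToyEnv:
    """Hypothetical environment whose initial observation is always 0."""

    def reset(self):
        return 0


class Logging:
    """Records the observation it sees at its own layer of the stack."""

    def __init__(self, env, log):
        self.env = env
        self.log = log

    def reset(self):
        obs = self.env.reset()
        self.log.append(obs)
        return obs


class AddHundred:
    """Adds 100 to the observation coming from the wrapped environment."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset() + 100


inner_log, outer_log = [], []
logging_inside = AddHundred(Logging(ToyEnv(), inner_log))
logging_outside = Logging(AddHundred(ToyEnv()), outer_log)

obs_a = logging_inside.reset()
obs_b = logging_outside.reset()
print(inner_log, obs_a)   # [0] 100
print(outer_log, obs_b)   # [100] 100
```

Both stacks return the same observation to the agent, but the logging layer only sees the modified value when it is applied last.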