# Implementing an Environment

So far in the course you have implemented the agent part of each experiment. In this notebook you will learn to implement environments in the RL-Glue framework, which is an important part of doing reinforcement learning research. While there are many great existing environments that you can use, you will often want to create your own to test a specific part of an RL algorithm or to see how an algorithm handles a specific situation. As you will see in this notebook, creating environments in RL-Glue is clear and simple.

In this notebook, we will walk through each method of the Environment class, using as our example a Mountain Car environment similar to the one you saw in Course 3, Module 3.

## Mountain Car Example

Recall that the task in Mountain Car is for an underpowered car to reach the top of a hill:
![Mountain Car](mountaincar.png "Mountain Car")
Because the car is underpowered, the agent needs to learn to rock back and forth to build enough momentum to reach the goal. At each time step the agent receives from the environment its current position (a float between -1.2 and 0.5) and its current velocity (a float between -0.07 and 0.07). Because the state is continuous, there are infinitely many states the agent could be in.
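As a concrete picture of this continuous state, the sketch below stores a state the same way the code later in this notebook does, as a NumPy array ```[position, velocity]```, and checks it against the bounds quoted above. The constant names and the ```is_valid_state``` helper are hypothetical, introduced only for illustration.

```python
import numpy as np

# Hypothetical constants holding the bounds quoted in the text
POSITION_MIN, POSITION_MAX = -1.2, 0.5
VELOCITY_MIN, VELOCITY_MAX = -0.07, 0.07


def is_valid_state(state):
    """Return True if a [position, velocity] array lies within the bounds."""
    position, velocity = state
    return (POSITION_MIN <= position <= POSITION_MAX
            and VELOCITY_MIN <= velocity <= VELOCITY_MAX)


# Any real-valued pair inside this box is a legal state, so there is no
# finite list of states to enumerate.
print(is_valid_state(np.array([-0.5, 0.0])))   # True
print(is_valid_state(np.array([0.6, 0.0])))    # False: past the goal
```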


Let's walk through what each of the methods of an RL-Glue environment does. Then you will implement your own version of Lunar Lander.

## Environment Methods

### ```__init__```
The ```__init__``` method of your environment is where you set up the variables your environment will need. In the Python implementation of RL-Glue most of the initialization is done in the ```env_init``` method, so in ```__init__``` we typically just create our variables and set them to None. Here is what the Mountain Car environment's ```__init__``` looks like:

```python
def __init__(self):
    # Placeholders only; env_init and env_start fill in the real values
    self.current_state = None
    self.reward_obs_term = None
    self.max_velocity = None
    self.count = 0
```

### ```env_init```
The ```env_init``` method is where we initialize the environment. For instance, we may want to pass in a specific seed for our random number generator, set the length of an MDP we want to experiment with, or pass any other values we want to initialize the environment with.

The ```env_init``` method takes a dictionary of values we want to pass to the environment. For instance, below we could pass in "max_velocity" to change the maximum velocity the car can reach. We use the ```get``` method of Python dictionaries, which returns a default value if the key is not found. Below, the car's max_velocity will be 0.07 if no value is passed in for that key.

The ```env_init``` method does not return any value and is called once, when the experiment is first initialized, before any episode starts.

```python
def env_init(self, env_info):
    """Setup for the environment, called when the experiment first starts.

    Args:
        env_info (dict): parameters for the environment. Recognized key:
            "max_velocity" (float), which defaults to 0.07 if absent.
    """
    # dict.get returns the default (0.07) when "max_velocity" is missing
    self.max_velocity = env_info.get("max_velocity", 0.07)
```
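To see the default in action, here is a minimal sketch of a class containing only the ```__init__``` and ```env_init``` logic from above (the class name ```MaxVelocityDemo``` is hypothetical). It shows that ```max_velocity``` stays ```None``` until ```env_init``` runs, and then takes either the value passed in or the 0.07 default.

```python
class MaxVelocityDemo:
    """Hypothetical stand-in holding just the initialization logic."""

    def __init__(self):
        self.max_velocity = None  # placeholder until env_init is called

    def env_init(self, env_info):
        # Falls back to 0.07 when "max_velocity" is not in env_info
        self.max_velocity = env_info.get("max_velocity", 0.07)


env = MaxVelocityDemo()
print(env.max_velocity)          # None

env.env_init({})
print(env.max_velocity)          # 0.07

env.env_init({"max_velocity": 0.05})
print(env.max_velocity)          # 0.05
```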

### ```env_start```

The ```env_start``` method is called by RL-Glue at the start of each episode, after ```env_init``` and before the agent takes its first action. Here we set the state the agent starts in. This could be a specific start state or a random one; the Mountain Car example below starts the car at rest at a random position between -0.6 and -0.4.

The ```env_start``` method returns an observation, which RL-Glue then passes to the agent.

```python
import numpy as np

def env_start(self):
    """Called at the start of each episode, before the agent starts.

    Returns:
        The first state observation from the environment.
    """
    # Start at rest at a random position near the bottom of the valley
    position = np.random.uniform(-0.6, -0.4)
    velocity = 0.0
    self.current_state = np.array([position, velocity])  # [position, velocity]

    return self.current_state
```
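Because ```env_start``` draws from the global ```np.random``` state, two runs start from different positions unless that randomness is controlled. One option, mentioned under ```env_init```, is to pass in a seed. The sketch below assumes a hypothetical ```"seed"``` key in ```env_info``` and keeps a NumPy ```Generator``` (created with ```np.random.default_rng```) on the environment instead of calling the global ```np.random``` functions; this is a swapped-in design choice, not what the code above does.

```python
import numpy as np


class SeededStart:
    """Hypothetical environment fragment with reproducible start states."""

    def env_init(self, env_info):
        # Assumed key "seed"; None gives a fresh, unseeded generator
        self.rng = np.random.default_rng(env_info.get("seed", None))

    def env_start(self):
        # Same start distribution as above: at rest, position in [-0.6, -0.4)
        position = self.rng.uniform(-0.6, -0.4)
        velocity = 0.0
        self.current_state = np.array([position, velocity])
        return self.current_state


a, b = SeededStart(), SeededStart()
a.env_init({"seed": 42})
b.env_init({"seed": 42})
print(np.array_equal(a.env_start(), b.env_start()))  # True: same seed, same start
```

Keeping the generator on the environment also means that seeding one environment does not change the random numbers drawn elsewhere, for example by the agent.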

### ```env_step```
```env_step``` is called every time the environment moves forward one step. This is often the most complicated part of the environment. It takes the action the agent has chosen and returns the reward, the next state, and whether the episode has terminated.

Notice here that the reward is -1.0 on each time step, unless the agent reaches the goal in which case the reward is 0.0.
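Written as equations, the update in the code below is as follows, where $p_t$ is the position, $v_t$ the velocity, $A_t \in \{0, 1, 2\}$ the action, $\operatorname{bound}_v$ the clipping to $[-\text{max\_velocity}, \text{max\_velocity}]$ done by ```bound_velocity```, and $\operatorname{bound}_p$ the clipping to $[-1.2, 0.5]$ done by ```bound_position```:

$$
\begin{aligned}
v_{t+1} &= \operatorname{bound}_v\!\left(v_t + 0.001\,(A_t - 1) - 0.0025\cos(3 p_t)\right) \\
p_{t+1} &= \operatorname{bound}_p\!\left(p_t + v_{t+1}\right) \\
R_{t+1} &= \begin{cases} 0 & \text{if } p_{t+1} = 0.5 \text{ (goal reached, episode ends)} \\ -1 & \text{otherwise} \end{cases}
\end{aligned}
$$

In addition, $v_{t+1}$ is reset to $0$ whenever $p_{t+1} = -1.2$, so the car stops when it hits the left wall.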

```python
import numpy as np

def env_step(self, action):
    """A step taken by the environment.

    Args:
        action (int): the action taken by the agent
            (0 = push left, 1 = no push, 2 = push right)
    Returns:
        (float, state, Boolean): a tuple of the reward, state observation,
            and boolean indicating if it's terminal.
    """
    position, velocity = self.current_state

    terminal = False
    reward = -1.0  # -1 on every step until the goal is reached
    # (action - 1) maps actions {0, 1, 2} to a force of {-1, 0, +1};
    # the cos(3 * position) term is gravity pulling the car down the slope
    velocity = self.bound_velocity(velocity + 0.001 * (action - 1) - 0.0025 * np.cos(3 * position))
    position = self.bound_position(position + velocity)

    if position == -1.2:
        # Hitting the left wall stops the car
        velocity = 0.0
    elif position == 0.5:
        # Reaching the right edge is the goal
        terminal = True
        reward = 0.0

    self.current_state = np.array([position, velocity])
    self.reward_obs_term = (reward, self.current_state, terminal)

    return self.reward_obs_term
```
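To check that these dynamics can actually reach the goal, the sketch below rewrites the step as a hypothetical standalone function (```mountain_car_step```, using ```np.clip``` in place of the two bounding methods) and runs one episode with a simple hand-written policy: push in the direction the car is already moving. That policy only illustrates the rocking behavior described earlier; it is not a learned agent.

```python
import numpy as np


def mountain_car_step(state, action, max_velocity=0.07):
    """Hypothetical standalone version of env_step's update rule.

    Returns (reward, next_state, terminal).
    """
    position, velocity = state
    velocity = np.clip(velocity + 0.001 * (action - 1) - 0.0025 * np.cos(3 * position),
                       -max_velocity, max_velocity)
    position = np.clip(position + velocity, -1.2, 0.5)
    if position == -1.2:
        velocity = 0.0          # the left wall stops the car
    terminal = position == 0.5  # the goal is the right edge
    reward = 0.0 if terminal else -1.0
    return reward, np.array([position, velocity]), terminal


def energy_pumping_policy(state):
    """Push right (2) when moving right or at rest, otherwise push left (0)."""
    return 2 if state[1] >= 0 else 0


state = np.array([-0.5, 0.0])
total_reward, steps, terminal = 0.0, 0, False
while not terminal and steps < 1000:  # cap the episode length as a safeguard
    reward, state, terminal = mountain_car_step(state, energy_pumping_policy(state))
    total_reward += reward
    steps += 1

print(terminal, steps, total_reward)
```

The printed ```total_reward``` equals ```-(steps - 1)```, because the final, goal-reaching step earns 0 rather than -1.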

### ```env_cleanup``` and ```env_message```
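```env_cleanup``` is called when the experiment is cleaned up at the end, giving the environment a chance to release any resources it holds. ```env_message``` can be used to send a message to the environment and get a response back, for example to query its internal state.

Often these methods are not needed.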
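To make the order in which these methods are called concrete, here is a sketch of the loop a framework like RL-Glue runs on an environment's behalf. The ```run_episode``` driver and the tiny ```CountdownEnvironment``` are hypothetical; the real RL-Glue also calls the matching agent methods between these calls, which are left out here. As is common, ```env_cleanup``` does nothing.

```python
class CountdownEnvironment:
    """Hypothetical toy environment: each episode ends after `length` steps."""

    def __init__(self):
        self.length = None
        self.steps_left = None

    def env_init(self, env_info):
        self.length = env_info.get("length", 3)

    def env_start(self):
        self.steps_left = self.length
        return self.steps_left  # the observation is the number of steps left

    def env_step(self, action):
        self.steps_left -= 1
        terminal = self.steps_left == 0
        return (-1.0, self.steps_left, terminal)

    def env_cleanup(self):
        pass  # nothing to release


def run_episode(env):
    """Hypothetical driver: env_start once, then env_step until terminal."""
    observation = env.env_start()
    total_reward, terminal = 0.0, False
    while not terminal:
        action = 0  # a real agent would choose this from the observation
        reward, observation, terminal = env.env_step(action)
        total_reward += reward
    return total_reward


env = CountdownEnvironment()
env.env_init({"length": 5})                      # once, when the experiment starts
returns = [run_episode(env) for _ in range(2)]   # env_start at each episode
env.env_cleanup()                                # once, at the end
print(returns)  # [-5.0, -5.0]
```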

```python
def env_cleanup(self):
    """Cleanup done after the environment ends"""
    pass

def env_message(self, message):
    """A message asking the environment for information
    Args:
        message (string): the message passed to the environment
    Returns:
        string: the response (or answer) to the message
    """
    if message == "what is the current reward?":
        return "{}".format(self.reward_obs_term[0])

    # else
    return "I don't know how to respond to your message"
```
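A quick sketch of how ```env_message``` answers a query, using a hypothetical ```MessageDemo``` class that holds the same ```env_message``` logic and a hand-set ```reward_obs_term``` (in the real environment, ```env_step``` sets that tuple). Here the method is called directly for simplicity.

```python
class MessageDemo:
    """Hypothetical class holding only the env_message logic from above."""

    def __init__(self):
        # Normally set by env_step as (reward, state, terminal)
        self.reward_obs_term = (-1.0, None, False)

    def env_message(self, message):
        if message == "what is the current reward?":
            return "{}".format(self.reward_obs_term[0])
        return "I don't know how to respond to your message"


env = MessageDemo()
print(env.env_message("what is the current reward?"))  # -1.0
print(env.env_message("hello"))  # I don't know how to respond to your message
```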

### Additional Functions

At times environments will have additional methods in the environment class. For instance, in Mountain Car ```bound_velocity``` and ```bound_position``` are helper methods that keep the velocity and position within a fixed range. Note that ```bound_velocity``` uses ```self.max_velocity```, set in ```env_init```, rather than hard-coding the bounds to (-0.07, 0.07).

```python
def bound_velocity(self, velocity):
    """Bounds the velocity of the car between (-self.max_velocity, self.max_velocity)"""
    if velocity > self.max_velocity:
        return self.max_velocity
    if velocity < -self.max_velocity:
        return -self.max_velocity
    return velocity

def bound_position(self, position):
    """Bounds the position of the car between (-1.2, 0.5)"""
    if position > 0.5:
        return 0.5
    if position < -1.2:
        return -1.2
    return position
```
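The two bounding methods are hand-written clipping. As an alternative to the ```if``` checks above, NumPy's ```np.clip``` computes the same result. The sketch below checks this on a few values, using hypothetical standalone versions of the functions, with the velocity limit passed as an argument instead of read from ```self```.

```python
import numpy as np


def bound_velocity(velocity, max_velocity=0.07):
    """Standalone copy of the method above, with max_velocity as a parameter."""
    if velocity > max_velocity:
        return max_velocity
    if velocity < -max_velocity:
        return -max_velocity
    return velocity


def bound_position(position):
    """Standalone copy of the method above."""
    if position > 0.5:
        return 0.5
    if position < -1.2:
        return -1.2
    return position


# np.clip(x, low, high) should agree with the hand-written versions
for v in [-0.1, -0.07, 0.0, 0.03, 0.2]:
    assert bound_velocity(v) == np.clip(v, -0.07, 0.07)
for p in [-2.0, -1.2, -0.5, 0.5, 1.0]:
    assert bound_position(p) == np.clip(p, -1.2, 0.5)
print("np.clip matches the hand-written bounds")
```

Because ```np.clip``` returns the bound itself when it clips, the exact comparisons ```position == -1.2``` and ```position == 0.5``` in ```env_step``` would still behave the same with this swap.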

You can now implement environments of your own!