# Agents with Environments


In this notebook we will create our first agent that interacts with the environment based on its observations rather than acting randomly.
Specifically, we will try to create an agent for the Mountain Car environment https://gym.openai.com/envs/MountainCar-v0/, where the goal is to drive the car up to the flag on top of the right-hand hill (position 0.5), but the engine is not strong enough to get there in a single pass.
Thus we need a strategy for driving back and forth within the valley to build up enough momentum to finally reach the top.

We start by importing the necessary libraries.


In [2]:
import time  # to slow down the game a little bit
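import gym  # note: written for the classic gym API (before 0.26), which provides env.seed() and returns 4 values from env.step()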

We again create the environment in the same way as shown in the previous notebook.

In [3]:
env_name = "MountainCar-v0"  # Use the exact same name as stated on gym.openai
env = gym.make(env_name)  # use gym.make to create your environment

To build such an agent, we first need to understand what information we get from the environment and which actions are possible.
You can find this information in the first few lines of the corresponding source code: https://github.com/openai/gym/blob/master/gym/envs/classic_control/mountain_car.py

 1. observation: The observation is an array containing two entries:
     1. position (x-coordinate of the car, between -1.2 and 0.6)
     2. velocity (speed of the car, positive when moving forward/right and negative when moving backward/left; between -0.07 and 0.07)
 2. actions: The following actions are possible within this environment:
     1. Accelerate to the left (in other words, use the reverse gear) (0)
     2. Neutral, don't do anything (1)
     3. Accelerate to the right (drive forwards) (2)
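To see how observations and actions interact, here is a minimal pure-Python sketch of a single Mountain Car step, transcribed from our reading of the `step()` method in the gym source linked above. The constant names (`FORCE`, `GRAVITY`, the position/speed limits, the goal at 0.5) and the helper `mountain_car_step` are our own; treat this as an illustration under those assumptions, not a replacement for the real environment.

```python
import math

# Constants as we read them from gym's mountain_car.py (assumed, not imported from gym)
MIN_POSITION = -1.2
MAX_POSITION = 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5
GOAL_VELOCITY = 0.0
FORCE = 0.001     # velocity change contributed by the engine per step
GRAVITY = 0.0025  # strength of the slope's pull


def mountain_car_step(position, velocity, action):
    """Hypothetical helper: advance the car by one time step.

    action: 0 = accelerate left, 1 = no acceleration, 2 = accelerate right.
    Returns (position, velocity, reward, done).
    """
    if action not in (0, 1, 2):
        raise ValueError(f"invalid action: {action}")
    # (action - 1) maps 0/1/2 to -1/0/+1 units of engine force;
    # the hill has the shape sin(3x), so the slope pulls with -cos(3x) * GRAVITY.
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    # Hitting the left boundary stops the car completely.
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    done = position >= GOAL_POSITION and velocity >= GOAL_VELOCITY
    reward = -1.0  # every step costs one point until the goal is reached
    return position, velocity, reward, done


# Usage: one step of "do nothing" from position -0.5 at rest
print(mountain_car_step(-0.5, 0.0, 1))
```

Note that the observation only changes through `velocity`: the action nudges the velocity, and the velocity then moves the position.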
    
We can simply render the environment for a few iterations and take a look at the observations.
Note how the velocity turns negative once the engine can no longer push the car further up the slope and the car starts rolling backwards.

In [4]:
env.seed(42)  # to make sure that we all have the same initial state
observation = env.reset()  # reset all internal values
for _ in range(50):
    env.render()  # display the current state
    action = 2  # let's only accelerate forward (to the right)
    observation, reward, done, info = env.step(action)  # apply the chosen action to the current state of the environment
    print(f"Position:{observation[0]}, Velocity: {observation[1]}")  # take a look at the observations
    time.sleep(0.1)  # slow down the game a bit
env.close()  # don't forget to close the environment

Position:-0.5241595506668091, Velocity: 0.001011794083751738
Position:-0.522143542766571, Velocity: 0.0020159997511655092
Position:-0.5191384553909302, Velocity: 0.00300508551299572
Position:-0.5151668190956116, Velocity: 0.003971633967012167
Position:-0.5102584362030029, Velocity: 0.004908401053398848
Position:-0.5044500231742859, Velocity: 0.005808374844491482
Position:-0.49778521060943604, Velocity: 0.0066648381762206554
Position:-0.4903137683868408, Velocity: 0.007471430115401745
Position:-0.4820915460586548, Velocity: 0.008222207427024841
Position:-0.4731798470020294, Velocity: 0.008911706507205963
Position:-0.4636448621749878, Velocity: 0.009535005316138268
Position:-0.4535570740699768, Velocity: 0.010087771341204643
Position:-0.4429907500743866, Velocity: 0.010566315613687038
Position:-0.43202313780784607, Velocity: 0.01096763089299202
Position:-0.4207337200641632, Velocity: 0.011289420537650585
Position:-0.40920358896255493, Velocity: 0.011530118994414806
Position:-0.3975147008
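The experiment above can also be sketched offline. Assuming the dynamics from the gym source (re-declared below so the block runs on its own; `step` and `always_right` are hypothetical helper names, and the start position -0.5 is picked by hand), accelerating right at every step makes the car rock back and forth in the valley without ever getting near the flag at 0.5:

```python
import math

GRAVITY, FORCE, MAX_SPEED = 0.0025, 0.001, 0.07
MIN_POSITION, MAX_POSITION = -1.2, 0.6


def step(position, velocity, action):
    """Hypothetical one-step update mirroring gym's MountainCar dynamics."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    return position, velocity


def always_right(start_position=-0.5, n_steps=200):
    """Accelerate right (action 2) at every step; return the visited states."""
    position, velocity = start_position, 0.0
    trajectory = []
    for _ in range(n_steps):
        position, velocity = step(position, velocity, action=2)
        trajectory.append((position, velocity))
    return trajectory


trajectory = always_right()
highest = max(p for p, _ in trajectory)
first_backwards = next(i for i, (_, v) in enumerate(trajectory) if v < 0)
print(f"highest position reached: {highest:.3f}")
print(f"velocity first turns negative at step {first_backwards}")
```

The engine force (0.001) is weaker than the steepest pull of the slope (0.0025), which is why pushing in one direction only is not enough.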

Now the first task in this notebook is to fill in the following *chose_action* function, which receives the observation as an argument and returns a suitable action so that the car is able to reach the top of the mountain.

The if/elif construct below may act as a starting point, but it does not get the car to the top of the mountain.
See whether you can change or extend it so that the car reaches the top.

In [5]:
def chose_action(observation):
    position, velocity = observation
    
    if -0.1 < position < 0.1:  # if your current position falls in this interval, choose action 2 (accelerate right)
        action = 2

    elif velocity < 0 and position < -0.2:  # if your velocity is negative and your position is smaller than -0.2, choose action 0 (accelerate left)
        action = 0

    else:  # otherwise do nothing
        action = 1
    return action

In [6]:
env.seed(42)
observation = env.reset()
for _ in range(500):
    env.render()
    action = chose_action(observation)
    observation, reward, done, info = env.step(action) 
    time.sleep(0.001)
env.close()
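Rendering hundreds of steps is slow, so while experimenting it can help to score a policy offline. The sketch below is a stand-in for the real environment built on several assumptions: it re-declares the dynamics as we read them from the gym source, uses the 200-step time limit that gym registers for MountainCar-v0, starts from a hand-picked position instead of `env.reset()`, and `run_episode` is a hypothetical helper. It sums the per-step rewards, so it reports the episode total rather than a single step's reward.

```python
import math

GRAVITY, FORCE, MAX_SPEED = 0.0025, 0.001, 0.07
MIN_POSITION, MAX_POSITION, GOAL_POSITION = -1.2, 0.6, 0.5


def step(position, velocity, action):
    """Hypothetical one-step update mirroring gym's MountainCar dynamics."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    return position, velocity


def run_episode(policy, start_position, max_steps=200):
    """Run one episode; return (reached_goal, steps_taken, total_reward)."""
    position, velocity = start_position, 0.0
    total_reward = 0.0
    for t in range(1, max_steps + 1):
        action = policy((position, velocity))
        position, velocity = step(position, velocity, action)
        total_reward += -1.0  # MountainCar gives -1 for every step
        if position >= GOAL_POSITION and velocity >= 0:
            return True, t, total_reward
    return False, max_steps, total_reward


def chose_action(observation):
    """Condensed copy of the starter policy from this notebook."""
    position, velocity = observation
    if -0.1 < position < 0.1:
        return 2
    if velocity < 0 and position < -0.2:
        return 0
    return 1


print(run_episode(chose_action, start_position=-0.5))
```

Because the car starts at rest and every step costs one point, `total_reward` is simply minus the number of steps taken; a better policy reaches the flag in fewer steps.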

**Here is one possible solution**

This function acts as your first agent on your reinforcement learning journey.

In [7]:
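def chose_action_solution(observation):
    position, velocity = observation

    if -0.1 < position < 0.4:  # in this band, always accelerate right (this also brakes the car if it rolls back)
        action = 2

    elif velocity < 0 and position < -0.2:  # rolling back on the left side of the valley: accelerate left to gain momentum
        action = 0

    else:  # otherwise coast
        action = 1
    return action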

In [8]:
env.seed(42)
observation = env.reset()
for _ in range(500):
    env.render()
    action = chose_action_solution(observation)
    observation, reward, done, info = env.step(action)
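    if done:  # the episode ended: either the flag was reached or MountainCar-v0's 200-step time limit was hit
        print(f"You got {reward} points!")  # note: `reward` is only the last step's reward (-1 per step), not the episode total
        break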
    time.sleep(0.001)
env.close()

You got -1.0 points!


Now the next task is to figure out whether you can stop the car at the flag (and not overshoot).
Try to find such a *chose_action* function.

P.S.: Don't try too hard; it might be harder than you think.
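One reason this is hard: at the flag (position 0.5) the slope pulls the car back by about 0.0025 · cos(1.5) ≈ 0.00018 units of velocity per step, while the engine adds either -0.001, 0, or +0.001. No single action cancels gravity, so the car cannot stay at rest there and any controller has to keep switching actions. The sketch below computes this, assuming the constants from the gym source; `net_acceleration` is a hypothetical helper.

```python
import math

GRAVITY, FORCE = 0.0025, 0.001  # constants as read from gym's mountain_car.py (assumed)
FLAG_POSITION = 0.5


def net_acceleration(position, action):
    """Change in velocity per step at a given position (before speed clipping)."""
    return (action - 1) * FORCE - GRAVITY * math.cos(3 * position)


for action, name in [(0, "left"), (1, "neutral"), (2, "right")]:
    print(f"{name:>7}: {net_acceleration(FLAG_POSITION, action):+.6f}")

# Averaging to zero requires accelerating right in a fraction f of the steps
# (and coasting otherwise): f * FORCE = GRAVITY * cos(3 * FLAG_POSITION)
gravity_pull = GRAVITY * math.cos(3 * FLAG_POSITION)
print(f"fraction of 'right' steps needed to hover: {gravity_pull / FORCE:.3f}")
```

Roughly one step in six must be a "right" step, and the switching has to be timed precisely enough to keep the position from drifting.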

In [9]:
def chose_action2(observation):
    position, velocity = observation
    
    action = 0  # placeholder: always accelerate left; replace this with your own logic
    return action

In [10]:
env.seed(42)
observation = env.reset()
for _ in range(1000):
    env.render()
    action = chose_action2(observation)
    observation, reward, done, info = env.step(action) 
    time.sleep(0.001)
env.close()

**Here you can find our approach.
Do you see how odd the numbers look and how many elifs are needed to (more or less) fulfill the task?**

Now imagine that the observation and action spaces contained hundreds of entries instead of only two or three.
Writing such rules by hand would be a daunting task.
And the worst part is: if you change the value passed to env.seed(), your solution might not work at all.
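The seed only decides where the car starts: according to the gym source, `reset()` draws the start position uniformly from [-0.6, -0.4] with zero velocity. One way to check a hand-tuned policy against many starts is to sample them yourself. The sketch below re-declares the dynamics so it runs on its own; `reaches_goal`, `success_rate`, the 200-step limit, and the use of Python's `random.Random` in place of gym's seeding are assumptions, so its numbers will not match any particular `env.seed()` value.

```python
import math
import random

GRAVITY, FORCE, MAX_SPEED = 0.0025, 0.001, 0.07
MIN_POSITION, MAX_POSITION, GOAL_POSITION = -1.2, 0.6, 0.5


def step(position, velocity, action):
    """Hypothetical one-step update mirroring gym's MountainCar dynamics."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    return position, velocity


def reaches_goal(policy, start_position, max_steps=200):
    """True if the policy reaches the flag within max_steps (assumed time limit)."""
    position, velocity = start_position, 0.0
    for _ in range(max_steps):
        position, velocity = step(position, velocity, policy((position, velocity)))
        if position >= GOAL_POSITION and velocity >= 0:
            return True
    return False


def success_rate(policy, n_starts=100, seed=0):
    """Fraction of random start positions in [-0.6, -0.4] from which the policy succeeds."""
    rng = random.Random(seed)
    starts = [rng.uniform(-0.6, -0.4) for _ in range(n_starts)]
    return sum(reaches_goal(policy, s) for s in starts) / n_starts


def chose_action_solution(observation):
    """Condensed copy of the notebook's threshold-based solution."""
    position, velocity = observation
    if -0.1 < position < 0.4:
        return 2
    if velocity < 0 and position < -0.2:
        return 0
    return 1


print(f"success rate over 100 random starts: {success_rate(chose_action_solution):.2f}")
```

A rule set tuned for one start position can look perfect on that start and still fail on others; sampling many starts makes such brittleness visible.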

In [11]:
def chose_action2_solution(observation):
    position, velocity = observation

    if 0.0 < position < 0.4:
        action = 1
    elif (5.03341452e-01 <= position <= 5.10780594e-01
          and -2.51391396e-04 <= velocity <= 4.07475660e-04):
        action = 2
    elif 5.420594e-01 < position:
        action = 0
    elif 0.5 < position < 0.505:
        action = 2
    elif 0.4 <= position < 0.41:
        action = 2
    elif 0.49 < position < 0.496:
        action = 0
    elif position < 0.00938 and -0.0000001 < velocity <= 0.0472:
        action = 2
    elif position > -0.5 and velocity > 0.4:  # never true: the environment caps |velocity| at 0.07
        action = 1
    else:
        action = 1
    return action

In [12]:
env.seed(42)
observation = env.reset()
for _ in range(1000):
    env.render()
    action = chose_action2_solution(observation)
    observation, reward, done, info = env.step(action) 
    time.sleep(0.001)
env.close()