## <center>Agents with Environments</center>


In this notebook we create our first agent. At first it interacts with the environment at random, and later it chooses its actions based on observations.<br />
We use the Mountain Car environment (https://gym.openai.com/envs/MountainCar-v0/), where the goal is to reach the top of the mountain, but the engine is not strong enough to get there in a single pass.<br />
We therefore need a strategy for driving back and forth in the valley to build up enough momentum to finally reach the top.

In [1]:
import time  # to slow down the game a little bit
import gym

In [2]:
def recall():
    """
    Create a fresh MountainCar environment.
    
    Once an environment has been closed, it has to be created again before it can be reset or rendered.
    The environment name is set and the environment is created here.
     
    """
    env_name = "MountainCar-v0"  # Use the exact same name as stated on gym.openai
    env = gym.make(env_name)  # use gym.make to create your environment

    return env

env = recall()


In order to be able to create such an agent we first need to understand what information we get from the environment and what actions are possible. You can get this information by checking the first few lines of the corresponding source code:

https://github.com/openai/gym/blob/master/gym/envs/classic_control/mountain_car.py

 1. Observation: The observation is an array with two entries:
     1. position (x-coordinate of the car)
     2. velocity (speed of the car; positive when moving forward, negative when moving backward)


 2. Actions: The following actions are possible within this environment
     1. Accelerate to the left, in other words use the reverse gear (0)
     2. Neutral, don't do anything (1)
     3. Accelerate to the right, in other words drive forwards (2)
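
The two lists above can be captured in a few lines of code. The following is a minimal sketch: the `Action` enum and the `describe` helper are hypothetical names introduced here for readability and are not part of gym. The sample observation is the first one printed further down in this notebook.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the three integer actions of MountainCar-v0."""
    LEFT = 0     # accelerate to the left (reverse gear)
    NEUTRAL = 1  # do nothing
    RIGHT = 2    # accelerate to the right (drive forwards)


def describe(observation):
    """Turn a (position, velocity) observation into a readable string."""
    position, velocity = observation
    if velocity > 0:
        direction = "moving right"
    elif velocity < 0:
        direction = "moving left"
    else:
        direction = "at rest"
    return f"x={position:.3f}, {direction} at speed {abs(velocity):.4f}"


sample_observation = (-0.5687974955022129, 0.0013479667178335764)
summary = describe(sample_observation)
print(summary)
print(f"Integer to pass to env.step: {int(Action.RIGHT)}")
```

Converting with `int(...)` before calling `env.step` keeps the value a plain integer, which is what the environment expects.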
    
We can simply render the environment for a few iterations and take a look at the observations. **Note** how the velocity turns negative once the engine can no longer overcome gravity and the car starts rolling backwards.

In [3]:
observation = env.reset()  # reset all internal values to their defaults

for _ in range(50):
    env.render(mode="human")  # display the current state
    action = 2  # let's only accelerate forward
    observation, reward, done, info = env.step(action)  # apply the action to the current state of the environment
    print(f"Position:{observation[0]}, Velocity: {observation[1]}")  # take a look at the observations
    time.sleep(0.1)  # slow down the game a bit


env.close()  # don't forget to close the environment

Position:-0.5687974955022129, Velocity: 0.0013479667178335764
Position:-0.5661115762281114, Velocity: 0.0026859192741015858
Position:-0.5621076746403997, Velocity: 0.004003901587711697
Position:-0.5568155983261087, Velocity: 0.005292076314291006
Position:-0.5502748079494625, Velocity: 0.006540790376646197
Position:-0.5425341608199951, Velocity: 0.007740647129467414
Position:-0.5336515746682162, Velocity: 0.008882586151778813
Position:-0.5236936039520327, Velocity: 0.009957970716183559
Position:-0.5127349220232057, Velocity: 0.01095868192882697
Position:-0.5008577045723429, Velocity: 0.011877217450862773
Position:-0.4881509128772175, Velocity: 0.012706791695125416
Position:-0.47470947933774743, Velocity: 0.013441433539470065
Position:-0.4606334023326944, Velocity: 0.014076077005053011
Position:-0.44602676224037874, Velocity: 0.014606640092315685
Position:-0.4309966751389404, Velocity: 0.015030087101438334
Position:-0.41565220483577625, Velocity: 0.015344470303164147
Position:-0.40010325
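
To see why accelerating forward alone is not enough, the update rule can be reproduced without gym. The sketch below is a simplified pure-Python copy of the dynamics, with constants taken from the linked `mountain_car.py` as far as I can tell (please verify them against your gym version). The fixed start at x = -0.5 is an assumption; the real environment draws the start position at random.

```python
import math

# Constants as they appear in the linked mountain_car.py (verify for your gym version).
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5
FORCE, GRAVITY = 0.001, 0.0025


def step(position, velocity, action):
    """Advance the simplified model by one time step."""
    # Engine push (-1, 0 or +1 times FORCE) plus the slope-dependent pull of gravity.
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops at the left wall
    return position, velocity


# Assumed start: x = -0.5 at rest.
position, velocity = -0.5, 0.0
positions, velocities = [], []
for _ in range(200):
    position, velocity = step(position, velocity, action=2)  # always accelerate right
    positions.append(position)
    velocities.append(velocity)

print(f"Highest position reached: {max(positions):.3f} (goal: {GOAL_POSITION})")
print(f"Velocity turned negative: {any(v < 0 for v in velocities)}")
```

In this model the car only oscillates in the valley: the constant push never adds enough energy to climb to the goal, which is why a back-and-forth strategy is needed.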

### Random actions

In [4]:
env = recall()
env.reset()

for _ in range(1000):
    env.render()  # display the current state
    random_action = env.action_space.sample()  # sample a random action
    observation, reward, done, info = env.step(random_action)  # apply the action to the current state of the environment
    print(f"Reward: {reward}, Done: {done}, Info: {info}")
    if done:  # the episode has ended (goal reached or time limit hit), so start a new one
        env.reset()

env.close()  # don't forget to close the environment

Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, Info: {}
Reward: -1.0, Done: False, I

### Interaction of the agent with the system

The first task in this notebook is to fill in the following *chose_action* function, which takes the observation as an argument and returns a suitable action so that the car is able to reach the top of the mountain.

The if/elif construct defined below may serve as a starting point, but as written it does not reach the top of the mountain. See whether you can change or expand it so that the car gets there.

In [5]:
def chose_action(observation):
    position, velocity = observation
    
    if -0.1 < position < 0.1:  # if your current position falls in this interval, choose action 2 (drive forward)
        action = 2
    
    elif velocity < 0 and position < -0.2:  # if your velocity is negative and your position is smaller than -0.2, choose action 0 (drive backwards)
        action = 0
        
    else:  # otherwise do nothing
        action = 1
    return action

In [6]:
env = recall()

observation = env.reset()
for _ in range(500):
    env.render()
    action = chose_action(observation)
    observation, reward, done, info = env.step(action)
    if done:  # stop stepping once the episode has ended
        break
    time.sleep(0.001)
env.close()

**Here is one possible solution**

This function is your first agent on your reinforcement learning journey. Note, however, that the hardcoded thresholds are not representative: they encode prior knowledge of this particular environment. In a proper RL problem such values are unknown and have to be learned.

In [7]:
def chose_action_solution(observation):
    position, velocity = observation
    
    if -0.1 < position < 0.4:
        action = 2
    
    elif velocity < 0 and position < -0.2:
        action = 0
        
    else:
        action = 1
    return action

In [8]:
env = recall()

observation = env.reset()
tot_rew = 0
for _ in range(500):
    env.render()
    action = chose_action_solution(observation)
    observation, reward, done, info = env.step(action)
    tot_rew += reward
    if done:
        print(f"You got {tot_rew} points!")
        break
    time.sleep(0.001)
env.close()

You got -184.0 points!
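
The solution above relies on hand-picked position thresholds. For comparison, here is a sketch of a different, commonly used heuristic that looks only at the velocity: always push in the direction the car is already moving. This is not the notebook's solution, and it still encodes knowledge of the task. To keep the example runnable without gym, `run_episode` is a hypothetical helper that uses a simplified copy of the dynamics (constants assumed from the linked `mountain_car.py`) and a fixed start at x = -0.5.

```python
import math


def momentum_policy(observation):
    """Push in the direction the car is already moving (right when at rest)."""
    _, velocity = observation
    return 2 if velocity >= 0 else 0


def run_episode(policy, position=-0.5, velocity=0.0, max_steps=200):
    """Run a policy on a simplified copy of the MountainCar dynamics.

    Returns the number of steps taken and whether the goal was reached.
    """
    for steps in range(1, max_steps + 1):
        action = policy((position, velocity))
        # Engine push plus slope-dependent gravity, then clip speed and position.
        velocity += (action - 1) * 0.001 - math.cos(3 * position) * 0.0025
        velocity = max(-0.07, min(0.07, velocity))
        position = max(-1.2, min(0.6, position + velocity))
        if position == -1.2 and velocity < 0:
            velocity = 0.0  # the car stops at the left wall
        if position >= 0.5:  # goal position of MountainCar-v0
            return steps, True
    return max_steps, False


steps, reached_goal = run_episode(momentum_policy)
print(f"Reached goal: {reached_goal} after {steps} steps")
```

Because `momentum_policy` has the same signature as `chose_action_solution`, it can be dropped into the gym loop above in its place.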
