# Reinforcement Learning Made Easier (CartPole Game Example)
CartPole, also known as the inverted pendulum, is a game in which you try to keep a pole balanced for as long as possible. It is assumed that an object sits at the tip of the pole, which makes it unstable and very likely to fall over. The goal is to move the cart left and right so that the pole stays upright (within a certain angle) for as long as possible.

In this notebook, we will look at reinforcement learning, a field of artificial intelligence in which an agent explores the environment by itself, playing the game many times until it learns how to play it well.

![image.png](attachment:cade137f-f5b8-41cd-add3-7310aa8349bb.png)

In [1]:
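# Install the gym package
# Note: the cells below use the classic Gym API (releases before 0.26),
# where reset() returns only the observation and step() returns four values.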
!pip install gym




In [16]:
# Import the gym package
import gym


In [2]:
# Create the CartPole environment
e = gym.make('CartPole-v0')

In [3]:
# Reset the environment and obtain the first observation
obs = e.reset()
obs

array([ 0.00458956, -0.00997421,  0.0388211 , -0.0196665 ])

In [4]:
# action_space describes the actions the agent can take.
# Here it is discrete with two values: 0 (push the cart left) or 1 (push the cart right).
e.action_space

Discrete(2)

In [5]:
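# observation_space describes the observations the environment returns.
# Here it is a Box (an n-dimensional tensor with per-entry bounds) holding 4 values;
# the two velocity entries have effectively unbounded ranges.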
e.observation_space

Box(4,)

In [6]:
# Perform action 0, which pushes the cart to the left
e.step(0)

(array([ 0.00439007, -0.20563078,  0.03842777,  0.28500776]), 1.0, False, {})

## We got a tuple of four elements:
• A new observation, which is a new vector of four numbers

• A reward of 1.0

• The done flag with value False, which means that the episode is not over yet and we are more or less okay

• Extra information about the environment, which is an empty dictionary
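
To make the four elements concrete, the following sketch unpacks the tuple printed above using plain Python values, so it runs without Gym installed. The names given to the four observation entries (cart position, cart velocity, pole angle, pole angular velocity) follow the usual CartPole convention and are an assumption here, since the cell output itself does not label them.

```python
# The step() result shown above, copied as plain Python values
step_result = ([0.00439007, -0.20563078, 0.03842777, 0.28500776], 1.0, False, {})

# Unpack the four elements of the tuple
obs, reward, done, info = step_result

# Assumed meaning of the observation entries (usual CartPole convention)
cart_position, cart_velocity, pole_angle, pole_angular_velocity = obs

print("cart velocity after pushing left:", cart_velocity)
print("reward:", reward, "| episode over:", done, "| extra info:", info)
```

The cart velocity is negative, which is consistent with a push to the left.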

## Using the sample() method of the Space class on action_space and observation_space


In [7]:
e.action_space.sample()

0

In [8]:
e.action_space.sample()

1

In [9]:
e.observation_space.sample()

array([-2.9186893e-01, -2.1650322e+38,  2.3133039e-02,  1.2271323e+38],
      dtype=float32)

In [10]:
e.observation_space.sample()

array([ 9.2680448e-01, -1.8839460e+38,  3.6352718e-01,  1.4929361e+38],
      dtype=float32)

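The sample() method returns a random sample from the underlying space. For the discrete action space, that is one of the two actions; for the observation space, it is a random vector of four numbers. The very large values (around 1e38) in the second and fourth entries appear because the velocity ranges are effectively unbounded.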
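
As a rough illustration of what sampling means for each kind of space, here is a minimal sketch written with the standard library only. The classes `DiscreteSketch` and `BoxSketch` are hypothetical stand-ins invented for this example. They are not Gym's implementation, and the bounds passed to `BoxSketch` are illustrative values rather than the real CartPole limits.

```python
import random


class DiscreteSketch:
    """Hypothetical stand-in for a discrete space with actions 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Every action is equally likely
        return random.randrange(self.n)


class BoxSketch:
    """Hypothetical stand-in for a box space with per-entry bounds."""

    def __init__(self, low, high):
        self.low = list(low)
        self.high = list(high)

    def sample(self):
        # Draw each entry uniformly between its own lower and upper bound
        return [random.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]


action_space = DiscreteSketch(2)
# Illustrative bounds only: entries 2 and 4 get a very wide range
observation_space = BoxSketch([-4.8, -1e38, -0.4, -1e38], [4.8, 1e38, 0.4, 1e38])

print(action_space.sample())
print(observation_space.sample())
```

Because two of the entries are drawn from a very wide range, their sampled values are typically huge, much like the second and fourth entries in the outputs above.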

# CartPole Agent

In [15]:
if __name__ == "__main__":
    env = gym.make("CartPole-v0")
    # Gym provides a wrapper class called Monitor that tracks the agent's performance.
    # It can also record the episode as a video, but ffmpeg must be installed first.
    env = gym.wrappers.Monitor(env, "recording")
    total_reward = 0.0
    total_steps = 0
    obs = env.reset()

    while True:
        # Pick a random action and apply it to the environment
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        if done:
            break

    # Report the totals once, after the episode has ended
    print("Episode done in %d steps, total reward %.2f" % (
        total_steps, total_reward))

Episode done in 17 steps, total reward 17.00


In this loop, we sample a random action, then ask the environment to execute
it and return the next observation (obs), the reward, and the done flag. When the
episode is over, we stop the loop and show how many steps were taken and
how much reward was accumulated.
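
The same loop can be wrapped in a reusable function. The sketch below assumes the classic four-value `step()` interface used in this notebook. `run_episode`, `random_action`, and `FixedLengthEnv` are hypothetical names introduced for illustration, and `FixedLengthEnv` is a toy stand-in (it always ends after a fixed number of steps) so that the example runs without Gym installed.

```python
import random


class FixedLengthEnv:
    """Toy stand-in environment: every episode lasts exactly max_steps steps."""

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps_taken = 0

    def reset(self):
        self.steps_taken = 0
        return [0.0, 0.0, 0.0, 0.0]  # dummy first observation

    def step(self, action):
        self.steps_taken += 1
        done = self.steps_taken >= self.max_steps
        # Same tuple layout as above: observation, reward, done flag, extra info
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}


def run_episode(env, choose_action):
    """Play one episode and return (total_steps, total_reward)."""
    total_reward = 0.0
    total_steps = 0
    obs = env.reset()
    while True:
        action = choose_action(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        if done:
            break
    return total_steps, total_reward


def random_action(obs):
    # Random agent: ignores the observation and picks 0 or 1
    return random.choice([0, 1])


steps, reward = run_episode(FixedLengthEnv(max_steps=5), random_action)
print("Episode done in %d steps, total reward %.2f" % (steps, reward))
```

With a real CartPole environment, the same function could be called as `run_episode(env, lambda obs: env.action_space.sample())`, assuming the environment follows the same four-value interface.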