### Balancing Cartpole via a Random Sampling Agent using OpenAI Gym ###

In [1]:
## Importing the necessary libraries ##

import gym

In this project we are going to implement a random agent that tries to balance a CartPole. Of course it will do badly, but it gives us an idea of how to work with the OpenAI Gym platform.

For any RL project we need to create two things:
1. Environment
2. Agent
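In code, this split can be sketched as two pieces with a narrow interface: the environment turns an action into the next observation, and the agent turns an observation into an action. The `RandomAgent` class below is a hypothetical illustration (it is not part of Gym) of the kind of agent used in this project. It ignores the observation and picks one of `n_actions` actions uniformly at random.

```python
import random


class RandomAgent:
    """Hypothetical agent that ignores observations and acts uniformly at random."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        # A private RNG makes runs reproducible when a seed is given
        self.rng = random.Random(seed)

    def act(self, observation):
        # The observation is unused: a random agent neither learns nor plans
        return self.rng.randrange(self.n_actions)


agent = RandomAgent(n_actions=2, seed=0)
actions = [agent.act(observation=None) for _ in range(10)]
print(actions)
```

In Gym, `cartpole_env.action_space.sample()` plays the same role as `act`, so the notebook does not need a separate agent class.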

Creating a CartPole environment with OpenAI Gym is easy: it takes a single call to `gym.make`. We do just that in the next cell.

In [2]:
## Creating the environment ##

cartpole_env = gym.make('CartPole-v0')

  f"The environment {id} is out of date. You should consider "
  "We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) "


Don't be alarmed by the warnings. The main one just says that `CartPole-v0` is an older version of the environment (`CartPole-v1` is the newer one), and for this example that doesn't matter.

As always we need to reset the environment to get it to the first observation.

In [3]:
## Resetting the environment ##

cartpole_env.reset()

array([ 0.01247975, -0.0119193 , -0.01870995,  0.02350845], dtype=float32)
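The four numbers in this observation are, in order, the cart position, the cart velocity, the pole angle (in radians, with 0 meaning upright) and the pole's angular velocity. On reset, Gym's CartPole draws each of them uniformly from [-0.05, 0.05], which is why they are all close to zero. A small sketch that labels the values printed above (the names are our own; Gym itself returns a bare array):

```python
# Observation returned by cartpole_env.reset() in the cell above
obs = [0.01247975, -0.0119193, -0.01870995, 0.02350845]

# Descriptive labels chosen for this sketch; they are not Gym identifiers
names = ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"]
state = dict(zip(names, obs))

for name, value in state.items():
    print(f"{name:>22}: {value:+.5f}")

# Every component of a freshly reset CartPole state lies in [-0.05, 0.05]
assert all(-0.05 <= value <= 0.05 for value in obs)
```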

Now we can check the action space and the observation space of the environment through its `action_space` and `observation_space` attributes. We do this in the next two cells.

In [4]:
## Checking the action space ##

cartpole_env.action_space

Discrete(2)

In [5]:
## Checking the observation space ##

cartpole_env.observation_space

Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)

That's easy, right?

So what insights did we gain?

- The action space is discrete with just 2 actions: 0 pushes the cart to the left and 1 pushes it to the right.
- The observation space gives 4 continuous values. The first array in the brackets lists their lower bounds and the second their upper bounds.
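The bounds are easier to read after a little arithmetic. The pole-angle bound of 0.41887903 rad is 24 degrees, and the cart-position bound is 4.8. In Gym's CartPole source both are set to twice the thresholds at which an episode actually ends (pole more than 12 degrees from vertical, or cart more than 2.4 from the centre), so states that trigger termination still fit inside the space. The velocity bounds, 3.4028235e+38, are simply the largest finite float32 value, meaning the velocities are effectively unbounded. A quick check in plain Python:

```python
import math
import struct

# Upper bounds printed by cartpole_env.observation_space above
position_bound = 4.8000002
angle_bound = 0.41887903  # radians
velocity_bound = 3.4028235e38

# Convert the angle bound to degrees
angle_bound_deg = math.degrees(angle_bound)
print(f"pole angle bound: {angle_bound_deg:.4f} degrees")

# Halving the bounds gives the thresholds at which an episode terminates
print(f"episode ends if |x| > {position_bound / 2:.1f} "
      f"or |theta| > {angle_bound_deg / 2:.1f} degrees")

# Largest finite float32: bit pattern 0x7F7FFFFF decoded as an IEEE-754 single
float32_max = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
print(f"largest float32: {float32_max:.7e}")
```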

Cool.

But how do we take an action? What does taking an action return?

Well, let's see in the next cell.

In [6]:
## Taking a random action ##

action = cartpole_env.action_space.sample()

cartpole_env.step(action)

(array([ 0.01224136,  0.1834659 , -0.01823978, -0.27501845], dtype=float32),
 1.0,
 False,
 {})
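Where does the new observation come from? CartPole is a small physics simulation: each step pushes the cart with a fixed force to the left or right and advances the state by one Euler-integration step of 0.02 s. The sketch below re-implements that update in plain Python, with the constants and equations assumed from Gym's classic-control `cartpole.py` (gravity 9.8, cart mass 1.0, pole mass 0.1, pole half-length 0.5, force magnitude 10.0); `cartpole_step` is our own name, not a Gym function. The sampled action was not printed above, so we assume it was 1 (push right), which is consistent with the cart velocity jumping from about -0.01 to 0.18. Feeding in the reset observation then reproduces the printed observation to about six decimal places.

```python
import math

# Constants assumed from Gym's classic-control CartPole source
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5  # half the pole's length
POLEMASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0
TAU = 0.02  # seconds between state updates


def cartpole_step(state, action):
    """Hypothetical re-implementation of one CartPole transition (Euler integration)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Equations of motion for the cart-pole system
    temp = (force + POLEMASS_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS)
    )
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler update: positions advance using the old velocities
    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


reset_obs = (0.01247975, -0.0119193, -0.01870995, 0.02350845)
next_obs = cartpole_step(reset_obs, action=1)
print([round(v, 8) for v in next_obs])
```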

So taking a step returns 4 values: the new observation, the reward for that step, a `done` flag that tells us whether the episode has reached a terminal state, and an `info` dictionary with any additional diagnostic information (empty here).
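One caveat if you run this with a newer release: from Gym 0.26 onwards (and in its successor, Gymnasium), `step` returns five values, `(observation, reward, terminated, truncated, info)`, where `truncated` flags episodes cut off by a time limit, and `reset` returns `(observation, info)`. The four-value form in this notebook is the older API. A hypothetical helper (our own, not part of Gym) that accepts either tuple shape and always returns the four-value form:

```python
def unpack_step(result):
    """Normalize a Gym step() result to (observation, reward, done, info).

    Hypothetical helper: handles both the classic 4-tuple API and the
    5-tuple (terminated, truncated) API of Gym >= 0.26 / Gymnasium.
    """
    if len(result) == 5:
        observation, reward, terminated, truncated, info = result
        # The episode is over if it either failed or hit the time limit
        return observation, reward, terminated or truncated, info
    if len(result) == 4:
        return tuple(result)
    raise ValueError(f"expected 4 or 5 values from step(), got {len(result)}")


# Classic API result as printed above (observation shown as a plain list)
old_style = ([0.01224136, 0.1834659, -0.01823978, -0.27501845], 1.0, False, {})
# The same transition in the newer 5-value form
new_style = ([0.01224136, 0.1834659, -0.01823978, -0.27501845], 1.0, False, False, {})

print(unpack_step(old_style) == unpack_step(new_style))
```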

So with all that, let's now write our basic random CartPole-balancing agent using OpenAI Gym.

In [7]:
## Our random cartpole agent ##

# Creating the environment #

cartpole_env = gym.make('CartPole-v1')

# Resetting the environment #

obs = cartpole_env.reset()

# Initializing a variable to calculate the final reward #

total_reward = 0.0

# Initialize a variable to calculate the total number of steps taken #

total_step = 0

# Run until we reach a terminal state #

while True:
    
    # Grabbing a random action #
    
    action = cartpole_env.action_space.sample()
    
    # Agent Taking a step in the environment #
    
    obs, reward, done, _ = cartpole_env.step(action)
    
    # Adding reward #
    
    total_reward += reward
    
    # Adding step #
    
    total_step += 1
    
    # Break out of the loop once a terminal state is reached #
    
    if done:
        
        break
        
print('Total steps taken is : {} and total reward gained is : {}'.format(total_step, total_reward))

Total steps taken is : 27 and total reward gained is : 27.0


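And done. Our random agent lasted 27 steps before the episode terminated. Since every step earns a reward of 1.0, the total reward equals the number of steps, and because the actions are random, this count will differ from run to run.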

That's quite impressive :D
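Because the actions are random, a single episode says little about how good (or bad) the agent really is; averaging over many episodes gives a more reliable picture. The sketch below wraps the loop above in reusable functions. `run_episode` and `evaluate` are hypothetical names, and the code assumes an environment with the classic four-value `step` API used in this notebook (with a newer Gym or Gymnasium you would unpack five values instead).

```python
import statistics


def run_episode(env, policy):
    """Play one episode and return its total reward (classic 4-value Gym API assumed)."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


def evaluate(env, policy, n_episodes=100):
    """Run several episodes and summarize the returns."""
    returns = [run_episode(env, policy) for _ in range(n_episodes)]
    return {
        "mean": statistics.mean(returns),
        "stdev": statistics.stdev(returns) if n_episodes > 1 else 0.0,
        "min": min(returns),
        "max": max(returns),
    }
```

With Gym installed, `evaluate(cartpole_env, lambda obs: cartpole_env.action_space.sample())` would report the random agent's average score over 100 episodes instead of a single lucky or unlucky run.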