# Pendulum-v0

## Documentation

https://github.com/openai/gym/wiki/Pendulum-v0

## Description

Try to keep a frictionless pendulum standing up.

## Environment

### Observation

Type: Box(3)

| Num | Observation | Min | Max |
| --- | --- | --- | --- |
| 0 | cos($\theta$) | -1.0 | 1.0 |
| 1 | sin($\theta$) | -1.0 | 1.0 |
| 2 | $\dot{\theta}$ | -8.0 | 8.0 |
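Because the observation encodes the angle as a cosine/sine pair, it can help to convert it back to a single angle when inspecting states. The following is a minimal sketch, assuming NumPy is available; the helper name `angle_from_observation` is hypothetical and not part of Gym. The sample observation is the one returned by `env.reset()` in the test code further down.

```python
import numpy as np


def angle_from_observation(obs):
    """Recover theta in [-pi, pi] from [cos(theta), sin(theta), theta_dot]."""
    cos_theta, sin_theta, theta_dot = obs
    # arctan2 takes (y, x), i.e. (sin, cos), and resolves the correct quadrant
    theta = np.arctan2(sin_theta, cos_theta)
    return theta, theta_dot


# Observation taken from the env.reset() output in this notebook
obs = np.array([0.70205368, 0.71212403, -0.3196691])
theta, theta_dot = angle_from_observation(obs)
print(theta, theta_dot)
```

Using `arctan2` rather than `arccos` keeps the sign of the angle, which a single cosine value cannot provide.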

### Actions

Type: Box(1)

| Num | Action | Min | Max |
| --- | --- | --- | --- |
| 0 | Joint effort | -2.0 | 2.0 |
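A controller may propose an effort outside the bounds in the table. One way to keep actions valid before passing them to `env.step` is to clip them, sketched below under the assumption that the bounds are exactly the $\pm 2.0$ listed above; `MAX_TORQUE` and `clip_action` are hypothetical names used only for illustration.

```python
import numpy as np

MAX_TORQUE = 2.0  # assumed bound, taken from the action table above


def clip_action(action):
    """Limit a proposed joint effort to the range [-MAX_TORQUE, MAX_TORQUE]."""
    return np.clip(action, -MAX_TORQUE, MAX_TORQUE)


# A proposed effort of 3.5 exceeds the upper bound and is clipped
print(clip_action(np.array([3.5])))
```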

### Reward

The precise equation for reward:

$-(\theta^2 + 0.1\dot{\theta}^2+0.001a^2)$

$\theta$ is normalized between $-\pi$ and $\pi$. Therefore, the lowest reward is $-(\pi^2 + 0.1\times 8^2+0.001\times 2^2)=-16.2736044$, and the highest reward is $0$. In essence, the goal is to stay at zero angle (vertical) with the least rotational velocity and the least effort.
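The reward formula and its two extremes can be checked numerically. This is a minimal sketch, assuming NumPy; the function names `angle_normalize` and `pendulum_reward` are illustrative choices here, and the wrap-around formula is one common way to map an angle into $[-\pi, \pi)$ rather than a statement about the environment's internals.

```python
import numpy as np


def angle_normalize(theta):
    """Wrap an angle into the interval [-pi, pi)."""
    return ((theta + np.pi) % (2 * np.pi)) - np.pi


def pendulum_reward(theta, theta_dot, action):
    """Reward from the formula above: -(theta^2 + 0.1*theta_dot^2 + 0.001*a^2)."""
    theta = angle_normalize(theta)
    return -(theta**2 + 0.1 * theta_dot**2 + 0.001 * action**2)


# Extremes: hanging straight down at maximum speed and effort vs. upright at rest
worst = pendulum_reward(np.pi, 8.0, 2.0)
best = pendulum_reward(0.0, 0.0, 0.0)
print(worst, best)
```

The angle term dominates near the bottom, while the effort term is weighted so lightly that it mostly acts as a tie-breaker between otherwise similar trajectories.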

### Starting State

Random angle from $-\pi$ to $\pi$, and random velocity between $-1$ and $1$.
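To illustrate what such a starting state looks like as an observation, the sketch below draws one and converts it to the `[cos, sin, velocity]` layout. It assumes both quantities are drawn uniformly, which the description above does not state explicitly, and it uses NumPy's own generator rather than the environment's.

```python
import numpy as np

# Assumption: uniform sampling over the stated ranges
rng = np.random.default_rng(seed=0)
theta0 = rng.uniform(-np.pi, np.pi)   # initial angle
theta_dot0 = rng.uniform(-1.0, 1.0)   # initial angular velocity

# Same layout as the observation table: [cos(theta), sin(theta), theta_dot]
obs0 = np.array([np.cos(theta0), np.sin(theta0), theta_dot0])
print(obs0)
```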

### Episode Termination

There is no specified termination. Adding a maximum number of steps might be a good idea.
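A step cap can be enforced in the episode loop itself. The sketch below shows the idea with a hypothetical stand-in environment (`ConstantEnv`) so that it runs without Gym; `run_episode` and its `max_steps` default of 200 are illustrative choices, and the stand-in only mimics the four-value `step` return used elsewhere in this notebook.

```python
class ConstantEnv:
    """Hypothetical stand-in environment that never signals termination."""

    def reset(self):
        return [1.0, 0.0, 0.0]

    def step(self, action):
        # Returns (observation, reward, done, info), like env.step in this notebook
        return [1.0, 0.0, 0.0], -1.0, False, {}


def run_episode(env, policy, max_steps=200):
    """Run until the env reports done or max_steps is reached."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    while steps < max_steps:
        observation, reward, done, _ = env.step(policy(observation))
        total_reward += reward
        steps += 1
        if done:
            break
    return steps, total_reward


steps, total = run_episode(ConstantEnv(), lambda obs: 0.0, max_steps=50)
print(steps, total)
```

Passing the real environment in place of `ConstantEnv` should work as long as it exposes the same `reset` and `step` interface.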

### Solved Requirements

Not yet specified

# Test Code

In [3]:
# Import OpenAI Gym
import gym

In [4]:
# Make single pendulum swing-up environment
env = gym.make('Pendulum-v0')

[2018-05-01 00:44:59,916] Making new env: Pendulum-v0


In [5]:
# Restart the environment and return the state
env.reset()

array([ 0.70205368,  0.71212403, -0.3196691 ])

In [25]:
# Get the box object of observation_space
box_obs = env.observation_space
print(box_obs)

Box(3,)


In [13]:
# Type `box_obs.` and press Tab to see a list of attributes:
# box_obs.contains
# box_obs.high
# box_obs.low
# box_obs.sample
# box_obs.shape
# box_obs.from_jsonable
# box_obs.to_jsonable

In [17]:
# Get the box object of action_space
box_act = env.action_space
print(box_act)

Box(1,)

In [18]:
# Type `box_act.` and press Tab to see a list of attributes:
# box_act.contains
# box_act.high
# box_act.low
# box_act.sample
# box_act.shape
# box_act.from_jsonable
# box_act.to_jsonable

In [24]:
# Move one step forward
# observation, reward, done, info = env.step(action)
observation, reward, done, _ = env.step(env.action_space.sample())
print(observation)
print(reward)
print(done)

[0.3866339  0.92223328 2.76483782]
-1.5414611701988161
False


In [28]:
# Play one episode
iters = 0  # Iteration counter
MAX_ITERS = 1000  # Stop after this many iterations
done = False  # Not done at start
# Loop (not a single `if`) so the episode runs until done or MAX_ITERS
while not done and iters < MAX_ITERS:
    # Take a random action and advance the simulation by one step
    observation, reward, done, _ = env.step(env.action_space.sample())
    if iters % 10 == 0:  # Print every 10th step
        print("Observation: ", observation, " Reward: ", reward, " Done: ", done)
    iters += 1  # Increment the counter

Observation:  [-0.25063078  0.96808275  5.22360195]  Reward:  -4.241802401381725  Done:  False
...


# First Edited

May 1, 2018

# References

[1] https://github.com/openai/gym/wiki/Pendulum-v0

[2] https://www.udemy.com/deep-reinforcement-learning-in-python