# Reinforcement Learning with OpenAI Gym

In [1]:
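#!pip install pygame
#!pip install gym
# The Box2D environments (CarRacing, LunarLander, BipedalWalker) need the box2d extra:
#!pip install "gym[box2d]"
#!pip install -U mujoco-py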

In [2]:
import gym 
from gym import spaces
from gym import envs

ModuleNotFoundError: No module named 'gym'

I am using OpenAI's Gym environments for the agent. The documentation, used as the reference throughout, is here: https://gym.openai.com/docs/ and the source repository is here: https://github.com/openai/gym

If we ever want to do better than take random actions at each step, it’d probably be good to actually know what our actions are doing to the environment.

The environment’s step function returns exactly what we need. In fact, step returns four values. These are:

- observation (object): an environment-specific object representing your observation of the environment. For example, pixel data from a camera, joint angles and joint velocities of a robot, or the board state in a board game.

- reward (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.

- done (boolean): whether it’s time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)

- info (dict): diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment’s last state change). However, official evaluations of your agent are not allowed to use this for learning.


This is just an implementation of the classic “agent-environment loop”. Each timestep, the agent chooses an action, and the environment returns an observation and a reward.


In [2]:
env = gym.make("CartPole-v1")
observation = env.reset()
for _ in range(1000):
  env.render()
  action = env.action_space.sample() # your agent here (this takes random actions)
  observation, reward, done, info = env.step(action)

  if done:
    observation = env.reset()
env.close()
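
The loop above assumes the four-value `step` return described earlier. Gym 0.26 and later return five values instead, splitting `done` into `terminated` and `truncated`, and `reset` returns an `(observation, info)` pair. A minimal sketch of helpers that accept either form follows; `unpack_step` and `unpack_reset` are hypothetical names written for this note, not part of Gym.

```python
def unpack_step(result):
    """Normalise a step() result to (observation, reward, done, info).

    Accepts the older 4-tuple or the newer 5-tuple
    (observation, reward, terminated, truncated, info).
    """
    if len(result) == 5:
        observation, reward, terminated, truncated, info = result
        return observation, reward, terminated or truncated, info
    observation, reward, done, info = result
    return observation, reward, done, info


def unpack_reset(result):
    """Return only the observation from either reset() style."""
    # Heuristic: the newer API returns (observation, info) with info a dict.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result
```

With these, the loop body would read `observation, reward, done, info = unpack_step(env.step(action))` and run unchanged on either API version.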


A new environment, but this time I'll check the done flag and end each episode as soon as it is set.
I'll also print action_space and observation_space. These attributes are of type Space, and they describe the format of valid actions and observations.
The Discrete space allows a fixed range of non-negative integers, so for CartPole's Discrete(2) the valid actions are 0 and 1.
The Box space represents an n-dimensional box, so valid observations here are arrays of 4 numbers, each within the bounds given by observation_space.low and observation_space.high. Inspecting the spaces like this helps in writing generic code that works with many environments.
Box and Discrete are the most common Spaces; you can sample from a Space or check that something belongs to it.
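
To make sampling and membership checks concrete, here is a small sketch. `ToyDiscrete` and `ToyBox` are simplified stand-ins written for this note so the example runs without Gym installed; they only mimic the `sample()` and `contains()` methods of `gym.spaces.Discrete` and `gym.spaces.Box`.

```python
import random


class ToyDiscrete:
    """Stand-in for gym.spaces.Discrete: the integers 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and not isinstance(x, bool) and 0 <= x < self.n


class ToyBox:
    """Stand-in for gym.spaces.Box: per-dimension low/high bounds."""

    def __init__(self, low, high):
        assert len(low) == len(high)
        self.low, self.high = list(low), list(high)

    def sample(self):
        return [random.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )


action_space = ToyDiscrete(2)                            # shaped like CartPole's Discrete(2)
observation_space = ToyBox([-1.2, -0.07], [0.6, 0.07])   # bounds of MountainCar's Box

action = action_space.sample()
print(action_space.contains(action))             # a sampled action is always valid
print(action_space.contains(2))                  # 2 lies outside {0, 1}
print(observation_space.contains([-0.5, 0.0]))   # inside both bounds
```

With Gym installed, the same two calls work on `env.action_space` and `env.observation_space` directly.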


In [4]:
def cartPolev0():
    env = gym.make("CartPole-v0")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()

In [8]:
cartPolev0()

Discrete(2)
Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)
[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
[-0.03180542 -0.01051522 -0.00227409 -0.03158749]
[-0.03201573 -0.2056045  -0.00290584  0.26037708]
[-0.03612782 -0.01044118  0.0023017  -0.03322098]
[-0.03633664 -0.20559606  0.00163728  0.26018727]
[-0.04044856 -0.40074134  0.00684102  0.55338615]
[-0.04846339 -0.20571612  0.01790875  0.26286644]
[-0.05257771 -0.01085432  0.02316608 -0.02411452]
[-0.0527948   0.18392788  0.02268378 -0.30939922]
[-0.04911624 -0.0115098   0.0164958  -0.00964964]
[-0.04934644  0.18337174  0.01630281 -0.29708263]
[-0.045679   -0.01197878  0.01036116  0.00069701]
[-0.04591858 -0.20724778  0.0103751   0.29663092]
[-0.05006353 -0.01227525  0.01630771  0.00723809]
[-0.05030904  0.18260907  0.01645247 -0.28025526]
[-0.04665685  0.37749252  0
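
The per-environment functions in this notebook all repeat the same loop, so it can be factored into one helper that also tracks the total reward per episode. The sketch below assumes the four-value `step` API used throughout; `run_random_episodes` and `CountdownEnv` are hypothetical names, and `CountdownEnv` is a toy stand-in so the example runs without Gym.

```python
import random


def run_random_episodes(env, episodes=20, max_steps=100, render=False):
    """Run random-action episodes; return a (steps, total_reward) pair per episode."""
    results = []
    for _episode in range(episodes):
        observation = env.reset()
        total_reward = 0.0
        steps = 0
        for _step in range(max_steps):
            if render:
                env.render()
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            total_reward += reward
            steps += 1
            if done:
                break
        results.append((steps, total_reward))
    env.close()
    return results


class _TwoActions:
    """Minimal action space exposing the sample() method the helper relies on."""

    def sample(self):
        return random.randrange(2)


class CountdownEnv:
    """Toy environment: the episode ends after `length` steps, reward 1 per step."""

    action_space = _TwoActions()

    def __init__(self, length=5):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        done = self.remaining <= 0
        return self.remaining, 1.0, done, {}

    def close(self):
        pass


print(run_random_episodes(CountdownEnv(length=5), episodes=3))
```

With Gym installed, a call such as `run_random_episodes(gym.make("CartPole-v0"), render=True)` would stand in for `cartPolev0()`, minus the per-step printing.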

#### Environments

Gym comes out of the box with a diverse suite of environments that range from easy to difficult and involve many different kinds of data. You'll find a complete list here: https://gym.openai.com/envs/#classic_control

Here are some of those environments: 
- Classic control and toy text: complete small-scale tasks, mostly from the RL literature. They’re here to get you started.

- Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences. One might object that these tasks are easy for a computer. The challenge is to learn these algorithms purely from examples. These tasks have the nice property that it’s easy to vary the difficulty by varying the sequence length.

- Atari: play classic Atari games. We’ve integrated the Arcade Learning Environment (which has had a big impact on reinforcement learning research) in an easy-to-install form.

- 2D and 3D robots: control a robot in simulation. These tasks use the MuJoCo physics engine, which was designed for fast and accurate robot simulation. 

#### The Registry

Gym's main purpose is to provide a large collection of environments that expose a common interface and are versioned to allow for comparisons. You can view the full list with the envs_list function below, which prints every registered EnvSpec. Each EnvSpec defines the parameters for a particular task, including the number of trials to run and the maximum number of steps.

To keep comparisons valid in the future, environments are never changed in a way that affects performance; they are only replaced by newer versions. It's also easy to add your own environments to the registry: register() them at load time and they become available to gym.make(). This helps with portability.
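
The register-then-make idea can be sketched without Gym. The code below is a toy registry written for illustration only; `toy_register`, `toy_make`, `versions_by_name`, and `CoinFlipEnv` are hypothetical names, and Gym's real `register()` accepts more options (for example an entry point and a step limit).

```python
import re

_REGISTRY = {}
_ID_PATTERN = re.compile(r"^(?P<name>[\w:.-]+?)-v(?P<version>\d+)$")


def toy_register(env_id, factory):
    """Store a factory under a versioned ID such as 'CoinFlip-v0'."""
    if not _ID_PATTERN.match(env_id):
        raise ValueError(f"malformed environment ID: {env_id!r}")
    if env_id in _REGISTRY:
        # Follows the rule in the text: an existing ID is never changed,
        # a new version is added instead.
        raise ValueError(f"{env_id!r} is already registered; bump the version instead")
    _REGISTRY[env_id] = factory


def toy_make(env_id, **kwargs):
    """Build a fresh environment instance from its registered factory."""
    return _REGISTRY[env_id](**kwargs)


def versions_by_name():
    """Group registered IDs as {name: [versions]}, similar to the registry listing."""
    grouped = {}
    for env_id in _REGISTRY:
        match = _ID_PATTERN.match(env_id)
        grouped.setdefault(match["name"], []).append(int(match["version"]))
    return {name: sorted(versions) for name, versions in grouped.items()}


class CoinFlipEnv:
    """Placeholder environment; only its constructor matters here."""

    def __init__(self, bias=0.5):
        self.bias = bias


toy_register("CoinFlip-v0", CoinFlipEnv)
toy_register("CoinFlip-v1", lambda **kw: CoinFlipEnv(bias=kw.get("bias", 0.7)))

env = toy_make("CoinFlip-v1")
print(versions_by_name())
```

Keeping `CoinFlip-v0` untouched while adding `CoinFlip-v1` is what lets results reported against the older version stay comparable.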






##### Classic Control Registries

In [9]:
def envs_list():
    print(envs.registry.all())

envs_list()

ValuesView(├──CartPole: [ v0, v1 ]
├──MountainCar: [ v0 ]
├──MountainCarContinuous: [ v0 ]
├──Pendulum: [ v1 ]
├──Acrobot: [ v1 ]
├──LunarLander: [ v2 ]
├──LunarLanderContinuous: [ v2 ]
├──BipedalWalker: [ v3 ]
├──BipedalWalkerHardcore: [ v3 ]
├──CarRacing: [ v1 ]
├──Blackjack: [ v1 ]
├──FrozenLake: [ v1 ]
├──FrozenLake8x8: [ v1 ]
├──CliffWalking: [ v0 ]
├──Taxi: [ v3 ]
├──Reacher: [ v2 ]
├──Pusher: [ v2 ]
├──InvertedPendulum: [ v2 ]
├──InvertedDoublePendulum: [ v2 ]
├──HalfCheetah: [ v2, v3 ]
├──Hopper: [ v2, v3 ]
├──Swimmer: [ v2, v3 ]
├──Walker2d: [ v2, v3 ]
├──Ant: [ v2, v3 ]
├──Humanoid: [ v2, v3 ]
└──HumanoidStandup: [ v2 ]
)


In [10]:
def MountainCar():
    env = gym.make("MountainCar-v0")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
MountainCar()

Discrete(3)
Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
[0.6  0.07]
[-1.2  -0.07]
[-0.5661888  0.       ]
[-0.5668703  -0.00068144]
[-0.56822807 -0.00135782]
[-5.6825221e-01 -2.4097022e-05]
[-5.6794238e-01  3.0980274e-04]
[-0.566301   0.0016414]
[-0.5633402   0.00296079]
[-0.560082    0.00325815]
[-0.5555508   0.00453122]
[-0.5497803  0.0057705]
[-0.54381365  0.00596666]
[-0.53869545  0.00511818]
[-0.53346413  0.00523136]
[-0.5291588   0.00430534]
[-0.52581173  0.00334704]
[-0.5234481   0.00236364]
[-0.5210856   0.00236251]
[-0.5177419   0.00334366]
[-0.5144422   0.00329973]
[-0.51221114  0.00223107]
[-0.51006544  0.00214568]
[-0.50802124  0.00204421]
[-0.5050938   0.00292742]
[-0.5013051  0.0037887]
[-0.4966835   0.00462162]
[-0.4932635   0.00341998]
[-0.48907074  0.00419278]
[-0.48613647  0.00293428]
[-0.48448256  0.0016539 ]
[-0.48212135  0.00236121]
[-0.48007044  0.00205093]
[-0.47834504  0.00172539]
[-4.7795802e-01  3.8703013e-04]
[-4.7791222e-01  4.5792964e-05]
[-0.479208   -0

In [11]:
def MountainCarContinuous():
    env = gym.make("MountainCarContinuous-v0")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
MountainCarContinuous()

Box(-1.0, 1.0, (1,), float32)
Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
[0.6  0.07]
[-1.2  -0.07]
[-0.5254724  0.       ]
[-0.5267143  -0.00124196]
[-0.5291421  -0.00242774]
[-0.531184   -0.00204195]
[-0.5331482  -0.00196422]
[-0.53439134 -0.00124311]
[-5.3450489e-01 -1.1355035e-04]
[-0.5335957   0.00090921]
[-0.5323359   0.00125978]
[-0.5304548   0.00188108]
[-0.528673    0.00178182]
[-0.5271822   0.00149079]
[-0.5256669   0.00151531]
[-0.5236261   0.00204083]
[-0.52176    0.0018661]
[-0.51980734  0.00195267]
[-0.51779807  0.00200925]
[-0.51659316  0.00120493]
[-0.51500106  0.00159208]
[-5.1478010e-01  2.2096152e-04]
[-0.51369494  0.00108515]
[-5.1336312e-01  3.3180256e-04]
[-0.5123544   0.00100877]
[-0.510298    0.00205638]
[-0.50886923  0.00142879]
[-0.5064657   0.00240353]
[-0.5047719   0.00169379]
[-0.50354815  0.00122376]
[-5.0331664e-01  2.3149623e-04]
[-5.0314832e-01  1.6832548e-04]
[-0.5022691   0.00087922]
[-0.50080377  0.00146531]
[-0.50023955  0.00056421]
[-0.498395   

In [12]:
def Pendulum():
    env = gym.make("Pendulum-v1")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
Pendulum()

Box(-2.0, 2.0, (1,), float32)
Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)
[1. 1. 8.]
[-1. -1. -8.]
[-0.18446533  0.98283905  0.25893956]
[-0.21937779  0.97564     0.71297693]
[-0.30151895  0.95346016  1.7021736 ]
[-0.42426178  0.9055396   2.637223  ]
[-0.56338036  0.8261976   3.2065022 ]
[-0.71694076  0.69713414  4.018648  ]
[-0.8615353   0.50769764  4.7776403 ]
[-0.9664817   0.25673547  5.457347  ]
[-0.99929065 -0.03765929  5.946223  ]
[-0.9404246  -0.34000224  6.1850214 ]
[-0.8048048  -0.59353954  5.77061   ]
[-0.62378025 -0.78159976  5.2355323 ]
[-0.43758875 -0.8991752   4.413093  ]
[-0.24862905 -0.9685988   4.033014  ]
[-0.07973232 -0.99681634  3.428951  ]
[ 0.06017823 -0.99818766  2.8006332 ]
[ 0.15854092 -0.98735243  1.979962  ]
[ 0.22038326 -0.9754134   1.2598932 ]
[ 0.24109976 -0.97050035  0.4258302 ]
[ 0.21289572 -0.9770749  -0.57922405]
[ 0.14895175 -0.98884445 -1.300591  ]
[ 0.0334187  -0.99944144 -2.3216639 ]
[-0.12085983 -0.9926696  -3.0916188 ]
[-0.3192607  -0.94766694 
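
Pendulum-v1 observations are (cos θ, sin θ, angular velocity), which is why the first two bounds are ±1. A small sketch that recovers the angle with `atan2`; `pendulum_angle` is a hypothetical helper, not part of Gym.

```python
import math


def pendulum_angle(observation):
    """Return the pendulum angle in radians from a (cos, sin, velocity) observation."""
    cos_theta, sin_theta, _velocity = observation
    return math.atan2(sin_theta, cos_theta)


first = (-0.18446533, 0.98283905, 0.25893956)  # first observation printed above
print(round(pendulum_angle(first), 3))
```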

In [12]:
def Acrobot():
    env = gym.make("Acrobot-v1")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
Acrobot()

Discrete(3)
Box([ -1.        -1.        -1.        -1.       -12.566371 -28.274334], [ 1.        1.        1.        1.       12.566371 28.274334], (6,), float32)
[ 1.        1.        1.        1.       12.566371 28.274334]
[ -1.        -1.        -1.        -1.       -12.566371 -28.274334]
[ 0.9997786   0.02103973  0.9985032  -0.05469306  0.05216737  0.04802476]
[ 0.9998981   0.01427917  0.9999947  -0.00326629 -0.11658605  0.45617098]
[0.9999921  0.00398184 0.99868    0.05136434 0.0171533  0.07982733]
[ 0.99998534 -0.00541309  0.99537325  0.09608387 -0.10756489  0.35900092]
[ 0.99995065 -0.0099347   0.9927784   0.11996251  0.06364396 -0.12292524]
[ 0.99997026 -0.00771385  0.99306047  0.11760481 -0.04180589  0.099652  ]
[ 0.9999216  -0.01252124  0.9923155   0.1237335  -0.00517937 -0.03955805]
[ 0.99995214 -0.00978218  0.9947509   0.10232598  0.03131957 -0.17170615]
[ 9.9999958e-01 -9.2193013e-04  9.9831176e-01  5.8082521e-02
  5.4100335e-02 -2.6315895e-01]
[ 0.99972236  0.02356327  0.

##### Box2D Registries

In [3]:
def CarRacing():
    env = gym.make("CarRacing-v1", 
                    lap_complete_percent = 0.95,  # fraction of track tiles needed to complete a lap (a float, not a bool)
                    domain_randomize = True, 
                    continuous = True)
    print(env.action_space)
    print(env.observation_space)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t + 1))
                break
    env.close()
CarRacing()

AttributeError: module 'gym.envs.box2d' has no attribute 'CarRacing'

In [15]:
def LunarLander():
    env = gym.make("LunarLander-v2",
                    continuous = True, 
                    gravity = -10.0, 
                    enable_wind = False, 
                    wind_power = 15.0)
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
LunarLander()

AttributeError: module 'gym.envs.box2d' has no attribute 'LunarLander'

In [16]:
def LunarLanderContinuous():
    env = gym.make("LunarLanderContinuous-v2", 
                    continuous = True)
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
LunarLanderContinuous()

AttributeError: module 'gym.envs.box2d' has no attribute 'LunarLander'

In [5]:
def BipedalWalker():
    env = gym.make("BipedalWalker-v3")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
BipedalWalker()

AttributeError: module 'gym.envs.box2d' has no attribute 'BipedalWalker'

In [20]:
def BipedalWalkerHardcore():
    env = gym.make("BipedalWalkerHardcore-v3")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
BipedalWalkerHardcore()

AttributeError: module 'gym.envs.box2d' has no attribute 'BipedalWalker'

In [21]:
def CarRacing():
    env = gym.make("CarRacing-v1")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
CarRacing()

AttributeError: module 'gym.envs.box2d' has no attribute 'CarRacing'

##### Toy Text Registries

In [28]:
def Blackjack():
    env = gym.make("Blackjack-v1")
    print(env.action_space)
    print(env.observation_space)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
Blackjack()

Discrete(2)
Tuple(Discrete(32), Discrete(11), Discrete(2))
(13, 4, False)
Episode finished after 1 timesteps
(14, 2, False)
Episode finished after 1 timesteps
(10, 3, False)
(21, 3, True)
(20, 3, False)
Episode finished after 3 timesteps
(10, 8, False)
Episode finished after 1 timesteps
(20, 10, False)
Episode finished after 1 timesteps
(6, 10, False)
(12, 10, False)
Episode finished after 2 timesteps
(11, 8, False)
Episode finished after 1 timesteps
(12, 6, False)
Episode finished after 1 timesteps
(18, 2, False)
Episode finished after 1 timesteps
(13, 6, False)
(14, 6, False)
(18, 6, False)
Episode finished after 3 timesteps
(12, 3, False)
Episode finished after 1 timesteps
(20, 2, False)
Episode finished after 1 timesteps
(15, 6, False)
Episode finished after 1 timesteps
(21, 4, True)
(14, 4, False)
Episode finished after 2 timesteps
(9, 10, False)
(11, 10, False)
(21, 10, False)
Episode finished after 3 timesteps
(12, 7, True)
(12, 7, False)
Episode finished after 2 timesteps
(11, 


In [23]:
def FrozenLake():
    env = gym.make("FrozenLake-v1")
    print(env.action_space)
    print(env.observation_space)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
FrozenLake()

Discrete(4)
Discrete(16)
0
0
4
8
9
10
9
10
Episode finished after 8 timesteps
0
0
0
0
4
Episode finished after 5 timesteps
0
4
4
4
4
Episode finished after 5 timesteps
0
1
2
3
2
2
1
2
3
2
2
6
Episode finished after 12 timesteps
0
1
1
1
Episode finished after 4 timesteps
0
4
4
Episode finished after 3 timesteps
0
0
0
1
2
6
10
14
Episode finished after 8 timesteps
0
0
0
0
4
0
1
1
Episode finished after 8 timesteps
0
0
1
2
1
1
2
6
Episode finished after 8 timesteps
0
1
1
1
0
0
4
8
9
8
4
0
1
0
1
Episode finished after 15 timesteps
0
0
0
0
1
0
0
0
0
0
0
1
2
1
Episode finished after 14 timesteps
0
1
2
1
0
1
2
6
Episode finished after 8 timesteps
0
4
4
8
9
Episode finished after 5 timesteps
0
0
0
0
4
4
4
4
0
0
4
8
4
4
0
0
1
1
2
6
Episode finished after 20 timesteps
0
0
0
4
8
Episode finished after 5 timesteps
0
4
4
8
Episode finished after 4 timesteps
0
0
0
1
2
1
0
0
0
1
0
0
0
0
1
0
1
0
0
0
4
8
4
0
1
Episode finished after 25 timesteps
0
4
0
4
0
1
2
3
Episode finished after 8 timesteps
0
4
0
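
The integers printed above are flattened grid positions. Assuming the default 4x4 FrozenLake map with row-major numbering (state = row * 4 + column), they can be decoded with a short sketch; `state_to_cell` is a hypothetical helper.

```python
def state_to_cell(state, n_cols):
    """Convert a row-major state index into (row, column)."""
    return divmod(state, n_cols)


# The default FrozenLake-v1 map is 4x4, so Discrete(16) covers states 0..15.
for state in (0, 4, 8, 9, 10):
    print(state, state_to_cell(state, n_cols=4))

# The same helper applies to CliffWalking's 4x12 grid, where state 36 is the start.
print(36, state_to_cell(36, n_cols=12))
```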


In [4]:
def CliffWalking():
    env = gym.make("CliffWalking-v0")
    print(env.action_space)
    print(env.observation_space)

    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
CliffWalking()

Discrete(4)
Discrete(48)
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
x  C  C  C  C  C  C  C  C  C  C  T

36
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
x  C  C  C  C  C  C  C  C  C  C  T

36
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
x  C  C  C  C  C  C  C  C  C  C  T

36
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
x  C  C  C  C  C  C  C  C  C  C  T

36
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
x  C  C  C  C  C  C  C  C  C  C  T

36
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
x  C  C  C  C  C  C  C  C  C  C  T

36
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
o  o  o  o  o  o  o  o  o  o  o  o
x  C  

In [6]:
def Taxi():
    env = gym.make("Taxi-v3")
    print(env.action_space)
    print(env.observation_space)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
Taxi()

Discrete(6)
Discrete(500)
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+

141
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (North)
41
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (East)
61
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (Pickup)
61
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (South)
161
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (West)
141
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (East)
161
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: 
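
Each integer printed between the maps is an encoded state. Taxi-v3 packs the taxi row, taxi column, passenger location, and destination into one number out of 500 (5 x 5 x 5 x 4). The sketch below reimplements that decoding for illustration; `decode_taxi_state` is a hypothetical name (the Gym environment has its own `decode` method).

```python
LOCATIONS = ["R", "G", "Y", "B", "in taxi"]


def decode_taxi_state(state):
    """Split a Taxi-v3 state index into (row, col, passenger, destination)."""
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    row, col = divmod(state, 5)
    return row, col, LOCATIONS[passenger], LOCATIONS[destination]


for state in (141, 41, 61, 161):
    print(state, decode_taxi_state(state))
```

For example, 141 decodes to a taxi at row 1, column 2, with the passenger waiting at R and the destination at G.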

##### MuJoCo Registries

In [2]:
def Reacher():
    env = gym.make("Reacher-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
Reacher()

Compiling c:\Users\jaywo\anaconda3\lib\site-packages\mujoco_py\cymj.pyx because it changed.
[1/1] Cythonizing c:\Users\jaywo\anaconda3\lib\site-packages\mujoco_py\cymj.pyx
running build_ext
building 'mujoco_py.cymj' extension


DistutilsPlatformError: Unable to find vcvarsall.bat

In [6]:
def Pusher():
    env = gym.make("Pusher-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
Pusher()

running build_ext
building 'mujoco_py.cymj' extension


DistutilsPlatformError: Unable to find vcvarsall.bat

In [None]:
def InvertedPendulum():
    env = gym.make("InvertedPendulum-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
InvertedPendulum()

In [9]:
def InvertedDoublePendulum():
    env = gym.make("InvertedDoublePendulum-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
InvertedDoublePendulum()

DependencyNotInstalled: No module named 'mujoco_py'. (HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)

In [None]:
def HalfCheetahV2():
    env = gym.make("HalfCheetah-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
HalfCheetahV2()

In [None]:
def HalfCheetahV3():
    env = gym.make("HalfCheetah-v3")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
HalfCheetahV3()

In [None]:
def HopperV2():
    env = gym.make("Hopper-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
HopperV2()

In [None]:
def HopperV3():
    env = gym.make("Hopper-v3")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
HopperV3()

In [None]:
def SwimmerV2():
    env = gym.make("Swimmer-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
SwimmerV2()

In [None]:
def SwimmerV3():
    env = gym.make("Swimmer-v3")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
SwimmerV3()

In [None]:
def Walker2dV2():
    env = gym.make("Walker2d-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
Walker2dV2()

In [None]:
def Walker2dV3():
    env = gym.make("Walker2d-v3")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
Walker2dV3()

In [None]:
def AntV2():
    env = gym.make("Ant-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
AntV2()

In [None]:
def AntV3():
    env = gym.make("Ant-v3")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
AntV3()

In [None]:
def HumanoidV2():
    env = gym.make("Humanoid-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
HumanoidV2()

In [None]:
def HumanoidV3():
    env = gym.make("Humanoid-v3")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
HumanoidV3()

In [None]:
def HumanoidStandup():
    env = gym.make("HumanoidStandup-v2")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()
HumanoidStandup()