# Chapter 1

In [1]:
import random
import time
import gym

In probability theory, the law of large numbers is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

In [2]:
episodes = [5,100,10000,2500000]
for episode in episodes:
    h = 0; t = 0
    for i in range(episode):
        if random.random() > 0.5:
            h += 1
        else:
            t += 1
    avg_h = h/episode
    avg_t = t/episode
    print("Episodes: {:8d} Avg.Heads: {:.4f} Average Tails: {:.4f}".format(episode, avg_h, avg_t))

Episodes:        5 Avg.Heads: 0.4000 Average Tails: 0.6000
Episodes:      100 Avg.Heads: 0.4600 Average Tails: 0.5400
Episodes:    10000 Avg.Heads: 0.5027 Average Tails: 0.4973
Episodes:  2500000 Avg.Heads: 0.5000 Average Tails: 0.5000
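
To see the convergence more directly, the sketch below records how far the running proportion of heads is from 0.5 at a few checkpoints within a single run. The function name heads_deviation and the fixed seed are illustrative choices and are not part of the original notebook.

```python
import random

def heads_deviation(n_flips, checkpoints, seed=0):
    """Return {checkpoint: |proportion of heads - 0.5|} for one run of fair coin flips."""
    rng = random.Random(seed)  # private, seeded generator so the run is repeatable
    wanted = set(checkpoints)
    heads = 0
    deviations = {}
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:
            heads += 1
        if i in wanted:
            deviations[i] = abs(heads / i - 0.5)
    return deviations

if __name__ == "__main__":
    for n, dev in heads_deviation(100_000, [10, 1_000, 100_000]).items():
        print("Flips: {:7d}  |avg - 0.5| = {:.4f}".format(n, dev))
```

The deviation tends to shrink as the number of flips grows, although within one run it does not have to decrease at every checkpoint.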


You create an environment by calling gym.make() with the environment's ID.

In [3]:
env = gym.make("Taxi-v2")

Here we have created the Taxi environment. Before we can act in it, we must reset it; env.reset() returns the initial state. You can display the environment with env.render().

In [4]:
print(env.reset())
env.render()

111
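+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
(In the colored output, the taxi is the yellow cell at the start of the second row, the passenger location Y is blue, and the destination B is purple.)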



Gym also allows you to access information from the environment such as the number of states and actions.  This is extremely useful when programming an agent to solve an environment.

In [5]:
n_actions = env.action_space.n
n_states = env.observation_space.n
print("Available Actions: {} \nPossible States: {}".format(n_actions, n_states))

Available Actions: 6 
Possible States: 500
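
The 500 states can be read as 25 taxi positions on the 5x5 grid, times 5 passenger locations (the four landmarks R, G, Y, B, or inside the taxi), times 4 destinations. The sketch below assumes the index is packed as ((row * 5 + col) * 5 + passenger) * 4 + destination, with landmarks numbered R=0, G=1, Y=2, B=3; the helper names encode_state and decode_state are illustrative and are not Gym functions.

```python
def encode_state(taxi_row, taxi_col, passenger, destination):
    """Pack the four components into one integer in 0..499 (assumed layout)."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger) * 4 + destination

def decode_state(state):
    """Invert encode_state: return (taxi_row, taxi_col, passenger, destination)."""
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger, destination

if __name__ == "__main__":
    # 111 is the state printed by env.reset() above.
    print(decode_state(111))
```

Under this assumption, state 111 decodes to a taxi in row 1, column 0, a passenger waiting at Y, and a destination of B, which agrees with the rendered grid.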


In this environment we can override the current state by assigning to the s attribute of the underlying environment.

In [6]:
env.env.s = 362
env.render()

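+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
(In the colored output, the taxi is the yellow cell in the fourth row, fourth column; the passenger location R is blue, and the destination Y is purple.)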



In the Taxi environment there are 500 possible states (0-499) and 6 possible actions (0-5). To perform an action, we use the env.step() method. Each step returns a tuple containing the new state, the reward, a boolean indicating whether the episode has terminated, and an info dictionary intended for debugging (an agent that learns from it is generally considered to be cheating). Here we take action 0 (move south): the state changes to 462, we receive a reward of -1, the environment has not terminated, and the state transition happened with a probability of 1.

In [7]:
env.step(0)

(462, -1, False, {'prob': 1.0})
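
Positional tuples are easy to misread, so one option is to give the four fields names. The sketch below wraps the tuple shown above in a namedtuple; StepResult is a hypothetical helper defined here and is not a type provided by Gym.

```python
from collections import namedtuple

# Hypothetical wrapper for the 4-tuple returned by env.step(); not part of Gym.
StepResult = namedtuple("StepResult", ["state", "reward", "done", "info"])

raw = (462, -1, False, {"prob": 1.0})  # the tuple printed above
result = StepResult(*raw)

print("state:", result.state)
print("reward:", result.reward)
print("done:", result.done)
print("transition probability:", result.info["prob"])
```

With a live environment, the same wrapper could be applied as StepResult(*env.step(action)).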

The aim of the Taxi environment is to drive to the passenger (marked in blue), pick them up, and drop them off at the destination (marked in purple). With enough random actions you can solve this environment, though the total reward will be very poor.

In [8]:
state = env.reset()
counter = 0
G = 0
reward = None
while reward != 20:
    state, reward, done, info = env.step(env.action_space.sample())
    G += reward
    counter += 1

print("Solved in {} steps with a total reward of {}".format(counter, G))

Solved in 6013 steps with a total reward of -23497
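
The loop above keeps stepping until it sees a reward of 20 and never checks done. A variant that stops when the environment reports termination, or when a step budget runs out, might look like the sketch below. CorridorEnv is a hypothetical stand-in written for this example so that it runs without Gym; it only mimics the (state, reward, done, info) return shape, and its sample_action() method plays the role of env.action_space.sample().

```python
import random

class CorridorEnv:
    """Hypothetical stand-in for a Gym environment (not part of Gym).

    The agent starts in cell 0 of a short corridor and must reach the last cell.
    step() mimics the 4-tuple used above: (state, reward, done, info).
    """

    def __init__(self, length=5, seed=0):
        self.length = length
        self.rng = random.Random(seed)
        self.pos = 0

    def sample_action(self):
        return self.rng.choice([0, 1])  # 0 = left, 1 = right

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 20 if done else -1
        return self.pos, reward, done, {}

def run_random_episode(env, max_steps=10_000):
    """Take random actions until the environment reports done or max_steps is hit."""
    env.reset()
    total_reward, steps, done = 0, 0, False
    while not done and steps < max_steps:
        _, reward, done, _ = env.step(env.sample_action())
        total_reward += reward
        steps += 1
    return steps, total_reward, done

steps, G, solved = run_random_episode(CorridorEnv())
print("Finished in {} steps with a total reward of {} (solved: {})".format(steps, G, solved))
```

Returning done alongside the totals lets the caller tell a solved episode apart from one that merely ran out of steps.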


In [9]:
env = gym.make("MsPacman-v0")
state = env.reset()

For the Atari 2600 environments we can get the action meanings.

In [10]:
meanings = env.env.get_action_meanings()
for i in range(env.action_space.n):
    print("Action {}: {}".format(i, meanings[i]))

Action 0: NOOP
Action 1: UP
Action 2: RIGHT
Action 3: LEFT
Action 4: DOWN
Action 5: UPRIGHT
Action 6: UPLEFT
Action 7: DOWNRIGHT
Action 8: DOWNLEFT
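
When scripting specific moves it can be more readable to look actions up by name rather than by index. The sketch below builds that reverse lookup from the list printed above; the list is hard-coded here so the example runs without the Atari environment, and action_by_name is an illustrative name.

```python
# Action meanings copied from the output above.
meanings = ["NOOP", "UP", "RIGHT", "LEFT", "DOWN",
            "UPRIGHT", "UPLEFT", "DOWNRIGHT", "DOWNLEFT"]

# Map each action name to its integer index.
action_by_name = {name: index for index, name in enumerate(meanings)}

print(action_by_name["LEFT"])
```

With the real environment, the hard-coded list would be replaced by the result of env.env.get_action_meanings().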


We can try playing Ms. Pac-Man using completely random actions, but that will not get us very far.

In [11]:
state = env.reset()
done = False
while not done:
    state, reward, done, info = env.step(env.action_space.sample())
    time.sleep(0.05)
    env.render()

To get a complete list of all the registered environments, query the registry.

In [12]:
from gym import envs
all_envs = envs.registry.all()
env_ids = [env_spec.id for env_spec in all_envs]
for env_id in env_ids:  # avoid rebinding env, which still refers to the environment created above
    print(env_id)

AtlantisNoFrameskip-v4
VideoPinball-ram-v4
Thrower-v2
Atlantis-ramDeterministic-v4
BattleZone-ram-v4
Venture-ramNoFrameskip-v4
PrivateEye-v0
BowlingNoFrameskip-v0
NameThisGameDeterministic-v4
Seaquest-v4
Solaris-ramDeterministic-v0
KungFuMaster-ram-v0
AirRaid-v4
BankHeistDeterministic-v0
Krull-ram-v0
FishingDerby-v4
Gopher-ramDeterministic-v4
PrivateEye-v4
DemonAttack-ram-v4
Gravitar-ramDeterministic-v0
HandManipulateEgg-v0
KrullNoFrameskip-v4
RiverraidNoFrameskip-v4
AsterixNoFrameskip-v0
VideoPinballDeterministic-v4
Pong-ram-v0
Solaris-ram-v4
ElevatorActionDeterministic-v4
ChopperCommandDeterministic-v4
Asterix-ram-v4
Solaris-v4
Asteroids-v0
Pong-ramDeterministic-v4
Carnival-ram-v4
EnduroDeterministic-v0
BattleZone-ramNoFrameskip-v4
HandManipulatePen-v0
Alien-ramNoFrameskip-v0
FishingDerby-ram-v0
Kangaroo-ramNoFrameskip-v0
IceHockey-ramDeterministic-v0
CarRacing-v0
KungFuMaster-ramNoFrameskip-v0
DemonAttack-ramDeterministic-v0
Phoenix-ramNoFrameskip-v0
Alien-v4
ChopperCommand-ramNoFra