In [89]:
import gym
import torch
import torch.nn as nn

### The LunarLander-v2 Environment:
Description: Control a spaceship to land on a landing pad. Initial conditions for the spaceship and terrain are somewhat random. The landing pad is always at (0,0).

Action Space: Discrete(4)
- 0: Do nothing
- 1: Fire left orientation engine (thrust to the left)
- 2: Fire main engine (thrust up)
- 3: Fire right orientation engine
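
The bare integers above are easy to mix up in agent code. A minimal sketch of giving them names follows; `LanderAction` and its member names are hypothetical labels chosen for illustration, not part of gym.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the four Discrete(4) action indices listed above."""
    NOOP = 0        # do nothing
    FIRE_LEFT = 1   # fire left orientation engine
    FIRE_MAIN = 2   # fire main engine
    FIRE_RIGHT = 3  # fire right orientation engine


# IntEnum members compare equal to plain ints; converting with int(...)
# before handing the value to the environment is the conservative choice.
action = LanderAction.FIRE_MAIN
print(int(action), action.name)
```

With this in place, a call such as `env.step(int(LanderAction.FIRE_MAIN))` reads more clearly than `env.step(2)`.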

Observation Space: Box(8,)
- x position, $x$
- y position, $y$
- x velocity, $\dot{x}$
- y velocity, $\dot{y}$
- lander angle with horizontal axis, $\theta$
- angular velocity, $\dot{\theta}$
- is leg 1 on the ground, $l_1$ [Boolean]
- is leg 2 on the ground, $l_2$ [Boolean]
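
To make the ordering of the eight components explicit, the observation can be unpacked into named fields. This is only a sketch: `LanderState` and `unpack` are hypothetical helpers, and the sample values are copied from an observation printed by the random-agent cell in this notebook.

```python
from typing import NamedTuple


class LanderState(NamedTuple):
    """Hypothetical named view of the 8-element observation, in the order listed above."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    leg1_contact: bool
    leg2_contact: bool


def unpack(observation) -> LanderState:
    """Convert any length-8 sequence (list, tuple, array) into a LanderState."""
    x, y, vx, vy, angle, ang_vel, l1, l2 = (float(v) for v in observation)
    return LanderState(x, y, vx, vy, angle, ang_vel, bool(l1), bool(l2))


# Sample observation taken from the notebook output
obs = [-0.00970173, 1.4141862, -0.49067292, 0.05972188,
       0.0111271, 0.11000228, 0.0, 0.0]
state = unpack(obs)
print(state.y, state.leg1_contact)
```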
    
Reward: <br>
- Uses a "shaping" function that usually outputs a negative number and serves as a rough estimate of how good the current state is. At each step, the shaping value of the previous step is subtracted from the shaping value of the current step, and the difference is added to the reward. To increase the reward, the shaping value must therefore become less negative than it was at the previous step, so you are rewarded for improving your state relative to the last time step.

$$ \text{Shaping} = -100 \sqrt{x^2 + y^2}
            - 100 \sqrt{\dot{x}^2 + \dot{y}^2}
            - 100 |\theta| + 10 l_1 + 10 l_2 $$
$$ \text{reward} = \text{Shaping}_t - \text{Shaping}_{t - 1} $$

- Loses reward if the lander moves away from the landing pad
- Gains reward if the lander moves closer to the landing pad
- Incentivises decreasing or low velocity
- Incentivises keeping a leg on the ground after it has made contact <br><br>
In addition:
- -0.3 points per frame for firing the main engine (up)
- -0.03 points per frame for firing a side engine 
- -100 points if the lander crashes (end of episode)
- +100 points if the lander comes to rest on the ground (end of episode)
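
The per-step reward described above can be sketched in plain Python. This is an illustration of the formulas in this cell, not the environment's own code: `shaping` and `step_reward` are hypothetical names, a state is assumed to be the tuple `(x, y, vx, vy, theta, l1, l2)`, and the terminal bonus or penalty of 100 points is left out.

```python
import math


def shaping(x, y, vx, vy, theta, l1, l2):
    """Shaping value from the formula above (higher, i.e. less negative, is better)."""
    return (-100 * math.hypot(x, y)      # distance from the pad at (0, 0)
            - 100 * math.hypot(vx, vy)   # speed
            - 100 * abs(theta)           # tilt
            + 10 * l1 + 10 * l2)         # leg contact bonuses


def step_reward(prev_state, state, main_engine=False, side_engine=False):
    """Shaping difference between two consecutive states, minus fuel costs."""
    reward = shaping(*state) - shaping(*prev_state)
    if main_engine:
        reward -= 0.3    # cost per frame of firing the main engine
    if side_engine:
        reward -= 0.03   # cost per frame of firing a side engine
    return reward


# Lander descends from y = 1.0 to y = 0.9 with everything else at zero
prev = (0.0, 1.0, 0.0, 0.0, 0.0, 0, 0)
cur = (0.0, 0.9, 0.0, 0.0, 0.0, 0, 0)
print(step_reward(prev, cur))
print(step_reward(prev, cur, main_engine=True))
```

Moving 0.1 units closer to the pad raises the shaping value by about 10, so the step earns roughly 10 points, or about 9.7 if the main engine was firing during that frame.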

In [85]:
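env = gym.make('LunarLander-v2')

# Run one episode with a random policy, rendering each frame
for i_episode in range(1):
    observation = env.reset()
    for t in range(500):
        env.render()
        print(observation)  # log the 8-dimensional state shown in the output below
        action = env.action_space.sample()  # uniformly random action

        # Older gym API: step() returns (observation, reward, done, info)
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()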

[-0.00970173  1.4141862  -0.49067292  0.05972188  0.0111271   0.11000228
  0.          0.        ]
[-0.01455307  1.4149313  -0.49069008  0.03306899  0.01662343  0.10993652
  0.          0.        ]
[-0.0194046   1.415077   -0.4907062   0.00639559  0.02211905  0.10992263
  0.          0.        ]
[-0.02420025  1.4154488  -0.485454    0.01642852  0.02793977  0.11642508
  0.          0.        ]
[-0.02883453  1.4158064  -0.47014147  0.01575278  0.03458357  0.13288835
  0.          0.        ]
[-0.03355455  1.4164572  -0.4784057   0.02875379  0.04091541  0.12664875
  0.          0.        ]
[-0.03827477  1.4165081  -0.47842336  0.0020774   0.04724688  0.12664086
  0.          0.        ]
[-0.04292736  1.4159533  -0.46993217 -0.02481668  0.05187706  0.092612
  0.          0.        ]
[-0.04749422  1.4148152  -0.45917982 -0.05066625  0.05433574  0.04917832
  0.          0.        ]
[-0.05200224  1.4140325  -0.4537355  -0.03490587  0.05723745  0.05803918
  0.          0.        ]
[-0.05651045

  0.          0.        ]
[-0.37514788  0.39642507 -0.31151336 -1.163254   -0.31320018 -0.19819728
  0.          0.        ]
[-0.37835842  0.36969867 -0.3115171  -1.1899284  -0.32310998 -0.19819596
  0.          0.        ]
[-0.38156867  0.3423736  -0.31152096 -1.2166028  -0.3330197  -0.1981946
  0.          0.        ]
[-0.3847119   0.31441492 -0.30309278 -1.2452428  -0.34476802 -0.23496632
  0.          0.        ]
[-0.38785467  0.28585798 -0.3030985  -1.2719203  -0.3565162  -0.23496413
  0.          0.        ]
[-0.39099708  0.25670272 -0.30310443 -1.2985978  -0.36826432 -0.23496187
  0.          0.        ]
[-0.394139    0.22694916 -0.30311054 -1.3252753  -0.3800123  -0.23495963
  0.          0.        ]
[-0.3969575   0.19807778 -0.270531   -1.2862327  -0.39206702 -0.24109444
  0.          0.        ]
[-0.39954048  0.16926764 -0.24724892 -1.2835442  -0.4038944  -0.23654738
  0.          0.        ]
[-0.402183    0.13990103 -0.25488895 -1.3078834  -0.41396865 -0.2014852
  1.        

In [76]:
env.close()

In [78]:
help(env.render)

Help on method render in module gym.core:

render(mode='human', **kwargs) method of gym.wrappers.time_limit.TimeLimit instance
    Renders the environment.
    
    The set of supported modes varies per environment. (And some
    environments do not support rendering at all.) By convention,
    if mode is:
    
    - human: render to the current display or terminal and
      return nothing. Usually for human consumption.
    - rgb_array: Return an numpy.ndarray with shape (x, y, 3),
      representing RGB values for an x-by-y pixel image, suitable
      for turning into a video.
    - ansi: Return a string (str) or StringIO.StringIO containing a
      terminal-style text representation. The text can include newlines
      and ANSI escape sequences (e.g. for colors).
    
    Note:
        Make sure that your class's metadata 'render.modes' key includes
          the list of supported modes. It's recommended to call super()
          in implementations to use the functionality of this m

### Testing the Environment:


In [45]:
env.action_space

Discrete(4)

In [46]:
env.observation_space

Box(8,)

In [47]:
env.observation_space.high

array([inf, inf, inf, inf, inf, inf, inf, inf], dtype=float32)

In [48]:
env.observation_space.low

array([-inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf], dtype=float32)

In [34]:
for _ in range(20):
    print(env.action_space.sample())

0
1
3
1
1
0
0
3
1
3
2
2
0
1
0
3
2
3
0
3


In [92]:
help(nn.Module)

Help on class Module in module torch.nn.modules.module:

class Module(builtins.object)
 |  Base class for all neural network modules.
 |  
 |  Your models should also subclass this class.
 |  
 |  Modules can also contain other Modules, allowing to nest them in
 |  a tree structure. You can assign the submodules as regular attributes::
 |  
 |      import torch.nn as nn
 |      import torch.nn.functional as F
 |  
 |      class Model(nn.Module):
 |          def __init__(self):
 |              super(Model, self).__init__()
 |              self.conv1 = nn.Conv2d(1, 20, 5)
 |              self.conv2 = nn.Conv2d(20, 20, 5)
 |  
 |          def forward(self, x):
 |              x = F.relu(self.conv1(x))
 |              return F.relu(self.conv2(x))
 |  
 |  Submodules assigned in this way will be registered, and will have their
 |  parameters converted too when you call :meth:`to`, etc.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, *input, **kwargs)
 |      Call self as a function.
