## Sample Code
This notebook is a small example of the different parts of the Gym environment that we will be using. Feel free to make a copy and start your project from it; please just don't edit the original :)

First, import `gymnasium`.
 - __Check the required packages below__
 - I didn't have them installed, so you might have to download them as well.

In [2]:
"""
Required Packages (can copy and paste commands)
pip install swig
pip install "gymnasium[box2d]"
"""
import gymnasium as gym
import pygame

Here is an example of how to load the environment and move randomly (a policy that gives equal weight to every action in the action space).
 - __If you want to see the lunar lander in action!__
 - When calling `gym.make`, set `render_mode="human"`.
 - __WARNING__: Pygame does not work well with Jupyter notebooks, so setting `render_mode="human"` will cause the entire kernel to crash after the render is complete.
 - You probably want to test in either a .py file or a notebook that doesn't take long to run, so that restarting the whole kernel isn't costly.

In [11]:
env = gym.make("LunarLander-v3")
#env = gym.make("LunarLander-v3", render_mode="human")
observation, info = env.reset()

#take 100 random actions
for action_number in range(100):
    #take a random action from the action space
    action = env.action_space.sample()
    #use the action in the environment
    observation, reward, terminated, truncated, info = env.step(action)

    #if terminated or truncated are True, then it means this run of the simulation is over
    if terminated or truncated:
        observation, info = env.reset() #restart the environment (and get a fresh observation) after a simulation is complete

env.close()
if env.render_mode == "human":
    pygame.quit() #doesn't even do anything on notebooks, but this is how you would close it in a .py file

Here is pretty much the same example, but instead of taking 100 random actions, we act randomly until a run of the simulation is complete, and repeat that for 10 runs. This is probably the more common way we will be interfacing with the simulation.

In [12]:
env = gym.make("LunarLander-v3")

# Loop for 10 simulations
for simulation in range(10):
    # Reset the environment and the end-of-episode flags for each simulation;
    # otherwise the while loop below would only run for the first simulation
    observation, info = env.reset()
    terminated, truncated = False, False

    # While simulation is still active
    while not terminated and not truncated:
        #take a random action from the action space
        action = env.action_space.sample()
        #use the action in the environment
        observation, reward, terminated, truncated, info = env.step(action)

env.close()
print(f'Reward: {reward}')

Reward: -100
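
The loop above can also be wrapped in a small helper that tracks the total reward of one episode rather than only the last step's reward. This is a minimal sketch, not part of Gymnasium: `run_episode`, `policy`, and `max_steps` are hypothetical names, and the helper only assumes the `reset()` / `step()` interface used in the cells above.

```python
def run_episode(env, policy, max_steps=1000):
    """Run one episode and return (total_reward, steps_taken).

    `policy` is any callable that maps an observation to an action.
    `max_steps` is an assumed safety cap for this sketch, not a Gymnasium setting.
    """
    observation, info = env.reset()
    total_reward = 0.0
    steps = 0
    for steps in range(1, max_steps + 1):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # Stop as soon as the environment reports the episode is over
        if terminated or truncated:
            break
    return total_reward, steps
```

With an `env` created as in the cells above, a random policy would look like `run_episode(env, lambda obs: env.action_space.sample())`.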


## Key Variables

#### State
Our state is an 8-dimensional vector; the table below lists what each index holds. In the code above it is referenced as `observation`.

| Index | Type | Purpose |
|---|---|---|
| 0 | float | X coordinate |
| 1 | float | Y coordinate |
| 2 | float | X velocity |
| 3 | float | Y velocity |
| 4 | float | Angle |
| 5 | float | Angular velocity |
| 6 | boolean | Left leg in contact with floor |
| 7 | boolean | Right leg in contact with floor |
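
To avoid sprinkling index numbers like `observation[3]` through your code, the vector can be unpacked into named fields. The following is only a sketch: `LanderState` and `unpack_observation` are hypothetical helpers (not part of Gymnasium), and it assumes the two leg-contact entries arrive as numbers where nonzero means contact.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Named view of the 8 observation values (hypothetical helper)."""
    x: float
    y: float
    x_velocity: float
    y_velocity: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def unpack_observation(observation: Sequence[float]) -> LanderState:
    """Convert a raw observation into a LanderState, following the table's index order."""
    if len(observation) != 8:
        raise ValueError(f"expected 8 values, got {len(observation)}")
    values = [float(v) for v in observation[:6]]
    # Assumption: leg-contact flags are numeric, with nonzero meaning contact
    left, right = bool(observation[6]), bool(observation[7])
    return LanderState(*values, left, right)
```

For example, `unpack_observation(observation).y_velocity` reads the vertical speed without remembering that it lives at index 3.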

#### Action Space
Our action space is very similar to the one from the Markov Decision Process (MDP) lab. It is discrete with 4 possible actions, each identified by a single integer value:

| Value | Purpose |
|---|---|
| 0 | Do nothing |
| 1 | Fire left orientation engine |
| 2 | Fire main engine |
| 3 | Fire right orientation engine |
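
If the bare integers are hard to remember, they can be given names. This is a minimal sketch using Python's standard `enum` module; `LanderAction` and `describe_action` are hypothetical names chosen for illustration and are not defined by Gymnasium.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Readable names for the four discrete action values in the table above."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


def describe_action(action: int) -> str:
    """Return a lowercase, human-readable label for an action value."""
    return LanderAction(action).name.replace("_", " ").lower()
```

When stepping the environment, converting back with `int(LanderAction.FIRE_MAIN_ENGINE)` keeps the value a plain integer.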

#### Rewards
A reward is returned after every `env.step()` call, i.e. after each action. For each step, the reward:

- **Increases** the closer the lander is to the landing pad and **decreases** the further it is.
- **Increases** the slower the lander is moving and **decreases** the faster it is moving.
- **Decreases** the more the lander is tilted (i.e., the angle is not horizontal).
- **Increases** by **10 points** for each leg that is in contact with the ground.
- **Decreases** by **0.03 points** for each frame a side engine is firing.
- **Decreases** by **0.3 points** for each frame the main engine is firing.

Additionally, the episode receives:

- **-100 points** for crashing.
- **+100 points** for landing safely.

An episode is considered a **solution** if it scores at least **200 points**. This means that the agent needs to learn how to land with __minimal engine usage__.
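
To get a feel for how much engine use costs, the per-frame penalties can be added up for a sequence of actions. This is only a rough sketch: `fuel_penalty` and `is_solved` are hypothetical helpers, the constants are copied from the list above, and it assumes an episode's score is the sum of its per-step rewards.

```python
SIDE_ENGINE_COST = 0.03  # points lost per frame a side engine fires
MAIN_ENGINE_COST = 0.3   # points lost per frame the main engine fires
SOLVED_THRESHOLD = 200   # minimum episode score counted as a solution


def fuel_penalty(actions):
    """Total points lost to engine use for a sequence of action values (0-3)."""
    actions = list(actions)
    side_frames = sum(1 for a in actions if a in (1, 3))
    main_frames = sum(1 for a in actions if a == 2)
    return side_frames * SIDE_ENGINE_COST + main_frames * MAIN_ENGINE_COST


def is_solved(step_rewards):
    """True if the summed per-step rewards reach the solution threshold."""
    return sum(step_rewards) >= SOLVED_THRESHOLD
```

For instance, firing the main engine for 100 frames and a side engine for 50 frames costs about 31.5 points on its own, which is a noticeable share of the 200 needed.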

_For more information_ [go to the docs](https://gymnasium.farama.org/environments/box2d/lunar_lander/)