# [Farama Gymnasium](https://gymnasium.farama.org/)
##### Previously OpenAI Gym

Fully supported on Linux and macOS; see [GitHub](https://github.com/Farama-Foundation/Gymnasium) for details.

Most packages work on Windows. *Tested with clean Anaconda env & Python 3.11.5*

The MuJoCo and Atari environments require separate files; read [the documentation](https://gymnasium.farama.org/).

Trying to install **Box2D** on Windows? [This](https://www.youtube.com/watch?v=gMgj4pSHLww) might help.


In [7]:

pip install gymnasium

Note: you may need to restart the kernel to use updated packages.




In [8]:
pip install gymnasium[classic-control]

Collecting pygame>=2.1.3 (from gymnasium[classic-control])
  Downloading pygame-2.5.2-cp310-cp310-win_amd64.whl.metadata (13 kB)
Downloading pygame-2.5.2-cp310-cp310-win_amd64.whl (10.8 MB)
   ---------------------------------------- 10.8/10.8 MB 22.6 MB/s eta 0:00:00
Installing collected packages: pygame
Successfully installed pygame-2.5.2
Note: you may need to restart the kernel to use updated packages.




In [9]:
import gymnasium as gym

## CartPole
This environment corresponds to the version of the cart-pole problem described by **Barto, Sutton, and Anderson** in “Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems”. A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pole starts upright on the cart, and the goal is to keep it balanced by pushing the cart to the left or to the right.

In [10]:
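# render_mode="human" opens a window and draws every step.
env = gym.make("CartPole-v1", render_mode="human")
observation, info = env.reset(seed=42)  # seed the first reset for reproducibility
for _ in range(300):
    action = env.action_space.sample()  # random action; this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one ends (failure or time limit).
    if terminated or truncated:
        observation, info = env.reset()

env.close()  # release the rendering window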
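
As a minimal sketch of what a policy could look like in place of `env.action_space.sample()`, the heuristic below pushes the cart toward the side the pole is falling to. It assumes the documented CartPole observation layout (cart position, cart velocity, pole angle, pole angular velocity) and action encoding (0 = push left, 1 = push right). The names `lean_policy` and `run_cartpole` are illustrative and not part of Gymnasium.

```python
def lean_policy(observation):
    """Push the cart toward the side the pole is falling to.

    Assumes observation = (cart position, cart velocity,
    pole angle, pole angular velocity).
    """
    _, _, pole_angle, pole_angular_velocity = observation
    # A positive angle means the pole leans right; adding the angular
    # velocity lets the policy react before the lean becomes large.
    return 1 if pole_angle + pole_angular_velocity > 0 else 0


def run_cartpole(steps=300, render_mode=None):
    """Run the heuristic for `steps` steps and return the total reward."""
    import gymnasium as gym  # imported here so lean_policy works without Gymnasium

    env = gym.make("CartPole-v1", render_mode=render_mode)
    observation, info = env.reset(seed=42)
    total_reward = 0.0
    for _ in range(steps):
        action = lean_policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            observation, info = env.reset()
    env.close()
    return total_reward
```

Calling `run_cartpole(render_mode="human")` should show the pole staying up noticeably longer than under random actions, although this simple rule is not a tuned controller.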

## Mountain Car
The Mountain Car MDP has deterministic dynamics; the only randomness is the car's starting position near the bottom of a sinusoidal valley. The only possible actions are accelerations applied to the car in either direction. The goal is to accelerate the car strategically so that it reaches the goal state on top of the right hill. There are two versions of the mountain car domain in Gymnasium: one with discrete actions and one with continuous actions. This is the discrete version.

In [None]:
env = gym.make("MountainCar-v0", render_mode="human")
observation, info = env.reset(seed=42)
for _ in range(300):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()
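
One way to "strategically accelerate" is to build momentum by always pushing in the direction the car is already moving. The sketch below assumes the documented discrete Mountain Car encoding: the observation is (position, velocity) and the actions are 0 = accelerate left, 1 = no acceleration, 2 = accelerate right. The function name `momentum_policy` is illustrative.

```python
def momentum_policy(observation):
    """Accelerate in the direction the car is already moving."""
    position, velocity = observation
    if velocity < 0:
        return 0  # moving left: keep pushing left to climb the left slope
    # Moving right, or at rest: push right (the choice at rest is arbitrary).
    return 2
```

Swapping `action = env.action_space.sample()` for `action = momentum_policy(observation)` in the loop above lets the car swing back and forth with growing amplitude instead of jittering at the bottom of the valley.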

## Blackjack

The game starts with the dealer having one face-up and one face-down card, while the player has two face-up cards. All cards are drawn from an infinite deck (i.e. with replacement).

The card values are:
- Face cards (Jack, Queen, King) have a point value of 10.
- Aces can either count as 11 (called a ‘usable ace’) or 1.
- Numerical cards (2-10) have a value equal to their number.

The player has the sum of cards held. The player can request additional cards (hit) until they decide to stop (stick) or exceed 21 (bust, immediate loss).

After the player sticks, the dealer reveals their face-down card and draws cards until their sum is 17 or greater. If the dealer goes bust, the player wins.

If neither the player nor the dealer busts, the outcome (win, lose, draw) is decided by whose sum is closer to 21.
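
The rules above can be sketched in a few lines of plain Python. This is an illustration of the rules as stated here, not Gymnasium's own implementation. It assumes cards are represented as integers, with an ace stored as 1 and face cards stored as 10, and all function names are hypothetical.

```python
def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21


def hand_sum(hand):
    """Sum of the hand, counting one ace as 11 when that is usable."""
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)


def is_bust(hand):
    return hand_sum(hand) > 21


def dealer_should_draw(hand):
    """The dealer keeps drawing until their sum is 17 or greater."""
    return hand_sum(hand) < 17


def outcome(player, dealer):
    """Return +1 if the player wins, 0 for a draw, -1 if the player loses."""
    if is_bust(player):
        return -1  # the player busting is an immediate loss
    if is_bust(dealer):
        return 1
    p, d = hand_sum(player), hand_sum(dealer)
    return (p > d) - (p < d)  # whoever is closer to 21 wins
```

For example, `hand_sum([1, 10])` counts the ace as 11 and gives 21, while `hand_sum([1, 5, 10])` must count it as 1 and gives 16.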

This environment corresponds to the version of the **blackjack problem described in Example 5.1 in Reinforcement Learning: An Introduction by Sutton and Barto**.

In [None]:
env = gym.make("Blackjack-v1", render_mode="human")
observation, info = env.reset(seed=42)
for _ in range(10):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

## Frozen Lake
The game starts with the player at location [0,0] of the frozen lake grid world, with the goal located at the far extent of the world, e.g. [3,3] for the 4x4 environment.

Holes in the ice are distributed in set locations when using a pre-determined map or in random locations when a random map is generated.

The player makes moves until they reach the goal or fall in a hole.

The lake is slippery (unless disabled), so the player may sometimes move perpendicular to the intended direction (see `is_slippery`).
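
To make the slipping rule concrete, here is a small sketch of which moves can result from a chosen action. It assumes the documented Frozen Lake action encoding (0 = left, 1 = down, 2 = right, 3 = up). According to the Gymnasium documentation, the intended direction and the two perpendicular ones are equally likely by default. The helper names `slippery_outcomes` and `make_deterministic_lake` are illustrative.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def slippery_outcomes(action):
    """Directions the player may actually move in on a slippery lake.

    The result is the intended direction plus its two perpendicular
    neighbours; the opposite direction never occurs.
    """
    return [(action - 1) % 4, action, (action + 1) % 4]


def make_deterministic_lake(render_mode=None):
    """Create a Frozen Lake where every move goes exactly as intended."""
    import gymnasium as gym  # imported here so the helper above works without Gymnasium

    return gym.make("FrozenLake-v1", is_slippery=False, render_mode=render_mode)
```

Trying to move right on a slippery lake, for instance, can end in a move down, right, or up, but never left.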

Randomly generated worlds will always have a path to the goal.

Elf and stool from https://franuka.itch.io/rpg-snow-tileset. All other assets by Mel Tillery http://www.cyaneus.com/.

In [None]:
env = gym.make("FrozenLake-v1", render_mode="human")
observation, info = env.reset(seed=42)
for _ in range(100):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

## Lunar Lander
This environment is a classic rocket trajectory optimization problem. According to Pontryagin’s maximum principle, it is optimal to fire the engine at full throttle or turn it off. This is the reason why this environment has discrete actions: engine on or off.

There are two environment versions: discrete or continuous. The landing pad is always at coordinates (0,0). The coordinates are the first two numbers in the state vector. Landing outside of the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.
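
As a small illustration of the two points above, the sketch below reads the lander's coordinates from the state vector and shows how the continuous variant could be requested through the `continuous` keyword of `gym.make`. The helper names are hypothetical, and the environment ID is the one used in this notebook; it may differ in other Gymnasium releases.

```python
import math


def distance_to_pad(observation):
    """Distance from the lander to the landing pad at (0, 0).

    Assumes the first two entries of the state vector are x and y.
    """
    x, y = observation[0], observation[1]
    return math.hypot(x, y)


def make_continuous_lander(env_id="LunarLander-v2", render_mode=None):
    """Create the continuous-action variant of Lunar Lander."""
    import gymnasium as gym  # requires the Box2D extra to be installed

    return gym.make(env_id, continuous=True, render_mode=render_mode)
```

Printing `distance_to_pad(observation)` inside the step loop is a quick way to check whether an agent is actually approaching the pad.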

In [None]:
# Box2D environments need the extra: pip install "gymnasium[box2d]"
# Gymnasium 1.0 and later register this environment as "LunarLander-v3".
env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset(seed=42)
for _ in range(300):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

## Bipedal Walker
This is a simple 4-joint walker robot environment. There are two versions:

- Normal, with slightly uneven terrain.
- Hardcore, with ladders, stumps, pitfalls.

To solve the normal version, you need to get 300 points in 1600 time steps. To solve the hardcore version, you need 300 points in 2000 time steps.
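
The solving criterion can be written as a one-line check. This is a sketch of the thresholds quoted above, with the hypothetical name `is_solved`; it is not a function provided by Gymnasium.

```python
def is_solved(total_reward, steps, hardcore=False):
    """Check an episode against the thresholds quoted in the text.

    Normal: 300 points within 1600 time steps.
    Hardcore: 300 points within 2000 time steps.
    """
    step_limit = 2000 if hardcore else 1600
    return total_reward >= 300 and steps <= step_limit
```

The hardcore terrain itself is selected by passing `hardcore=True` to `gym.make("BipedalWalker-v3", ...)`.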

In [None]:
# Box2D environments need the extra: pip install "gymnasium[box2d]"
env = gym.make("BipedalWalker-v3", render_mode="human")
observation, info = env.reset(seed=42)
for _ in range(200):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

## Car Racing
The easiest control task to learn from pixels - a top-down racing environment. The generated track is random every episode.

Some indicators are shown at the bottom of the window along with the state RGB buffer. From left to right: true speed, four ABS sensors, steering wheel position, and gyroscope. To play yourself (it’s rather fast for humans), type:
```
python gymnasium/envs/box2d/car_racing.py
```


Remember: it’s a powerful rear-wheel drive car - don’t press the accelerator and turn at the same time.

In [None]:
# Box2D environments need the extra: pip install "gymnasium[box2d]"
# Gymnasium 1.0 and later register this environment as "CarRacing-v3".
env = gym.make("CarRacing-v2", render_mode="human")
observation, info = env.reset(seed=42)
for _ in range(200):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

## And many many more...
Check out the [full documentation](https://gymnasium.farama.org/) for many more environments.