import gymnasium as gym

# 1. Cartpole

![cartpole](https://gymnasium.farama.org/v0.26.3/_images/cart_pole.gif)

### State Space
The state space consists of 4 continuous variables:

* Cart position (x-axis)
* Cart velocity
* Pole angle
* Pole angular velocity

### Action Space
Discrete action space with 2 possible actions:

* 0: Push cart to the left
* 1: Push cart to the right

### Transition Dynamics
The physics engine simulates the movement of the cart and the pole based on classical mechanics. The cart moves on a frictionless track, and the pole is attached to the cart with a frictionless joint.
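
Here is a minimal pure-Python sketch of one simulation step. It assumes the classic cart-pole equations of motion with explicit Euler integration. The constants (`GRAVITY`, `MASS_CART`, `MASS_POLE`, `HALF_POLE_LENGTH`, `FORCE_MAG`, `TAU`) and the function name `cartpole_step` are illustrative assumptions and may not match Gymnasium's implementation exactly.

```python
import math
from typing import Tuple

# Assumed physical constants (classic cart-pole values, for illustration)
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_POLE_LENGTH = 0.5           # distance from the joint to the pole's centre of mass
POLE_MASS_LENGTH = MASS_POLE * HALF_POLE_LENGTH
FORCE_MAG = 10.0                 # magnitude of each push on the cart
TAU = 0.02                       # seconds between state updates

State = Tuple[float, float, float, float]  # (x, x_dot, theta, theta_dot)


def cartpole_step(state: State, action: int) -> State:
    """Advance the cart-pole state by one timestep (frictionless track and joint)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG  # 1 = push right, 0 = push left
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Equations of motion for the coupled cart and pole
    temp = (force + POLE_MASS_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler: positions use the old velocities, velocities the new accelerations
    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


# From rest, a push to the right gives the cart a positive velocity
# and the pole a negative angular velocity
print(cartpole_step((0.0, 0.0, 0.0, 0.0), action=1))
```

Positions change one step later than velocities. This is a property of the explicit Euler scheme, and the small `TAU` keeps the error manageable.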

### Reward Function
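+1 for every timestep, including the step on which the episode ends, so the return equals the episode length.

### Episode Termination Conditions
The episode ends when:

* The pole angle exceeds ±12 degrees from vertical (termination)
* The cart position is more than ±2.4 units from the center (termination)
* 500 timesteps are reached (truncation; this counts as success)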
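
The end-of-episode rules can be written as two small predicates. In this sketch the function names are hypothetical. The two checks stay separate to mirror the `terminated` and `truncated` flags returned by `env.step`. In Gymnasium the step limit is enforced by a `TimeLimit` wrapper that `gym.make` adds, not by the environment itself.

```python
import math

# Thresholds from the termination conditions listed above
X_THRESHOLD = 2.4                          # max cart distance from the center
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # 12 degrees in radians
MAX_STEPS = 500                            # step limit for CartPole-v1


def is_terminated(x: float, theta: float) -> bool:
    """True when the cart leaves the track or the pole tilts too far."""
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD


def is_truncated(step_count: int) -> bool:
    """True once the episode has lasted the maximum number of steps."""
    return step_count >= MAX_STEPS


print(is_terminated(0.0, math.radians(13)))  # True: pole tilted past 12 degrees
print(is_terminated(0.0, math.radians(10)))  # False: still within the limit
```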

### Rendering and Visualization
Provides a 2D visualization of the cart and pole system.

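# render_mode='human' opens a window and draws a frame automatically inside
# reset() and step(), so no explicit env.render() call is needed
env = gym.make('CartPole-v1', render_mode='human')

# Seed the environment and the action sampler so runs are reproducible
observation, info = env.reset(seed=42)
env.action_space.seed(42)

while True:
    # Random action: 0 = push left, 1 = push right
    action = env.action_space.sample()

    # Advance the simulation by one timestep
    observation, reward, terminated, truncated, info = env.step(action)

    print(f"Observation: {observation}")
    print(f"Reward: {reward}")
    print(f"Terminated: {terminated}")
    print(f"Truncated: {truncated}")
    print(f"Info: {info}")
    print("-" * 50)

    # terminated: pole fell or cart left the track; truncated: 500-step limit reached
    if terminated or truncated:
        break

# Close the environment and its render window
env.close()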

# 2. Mountain Car

![mountain car](https://gymnasium.farama.org/_images/mountain_car.gif)

### State Space
The state space consists of 2 continuous variables:

* Car position (x-axis): Range from -1.2 to 0.6
* Car velocity: Range from -0.07 to 0.07

### Action Space
Discrete action space with 3 possible actions:

* 0: Push left
* 1: No push
* 2: Push right

### Transition Dynamics
The car's movement is affected by gravity and the force applied. The car does not have enough power to climb the mountain directly and must build momentum by swinging back and forth.
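
Here is a minimal pure-Python sketch of the update rule. The position and velocity ranges come from the state space above. The constants `FORCE = 0.001` and `GRAVITY = 0.0025`, the `cos(3 * position)` hill shape, and the function name `mountain_car_step` are assumptions modeled on the classic mountain-car formulation.

```python
import math
from typing import Tuple

# Assumed constants (classic mountain-car values, for illustration)
FORCE = 0.001
GRAVITY = 0.0025
MIN_POSITION, MAX_POSITION = -1.2, 0.6   # ranges from the state space above
MAX_SPEED = 0.07


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def mountain_car_step(position: float, velocity: float, action: int) -> Tuple[float, float]:
    """One update: engine force plus the component of gravity along the hill."""
    # Actions 0 / 1 / 2 map to a push of -1 / 0 / +1 times FORCE
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = clip(velocity, -MAX_SPEED, MAX_SPEED)
    position = clip(position + velocity, MIN_POSITION, MAX_POSITION)
    # Hitting the left boundary stops the car instead of bouncing it back
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    return position, velocity


# Pushing right constantly from near the valley floor
pos, vel = -0.5, 0.0
highest = pos
for _ in range(200):
    pos, vel = mountain_car_step(pos, vel, action=2)
    highest = max(highest, pos)
print(f"Highest position reached: {highest:.3f}")
```

Under a constant push to the right, the car only rocks back and forth near the bottom of the valley and never reaches 0.5. A successful policy has to alternate directions to build up momentum.
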
### Reward Function
-1 for each timestep, including the one that reaches the goal, which encourages reaching the goal as quickly as possible.

### Episode Termination Conditions
The episode ends when:

* The car position reaches the goal at position 0.5 (termination)
* 200 timesteps are reached (truncation; this counts as failure)
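
Every step costs -1, so the undiscounted return is the negative episode length. An agent that never reaches the goal always scores exactly -200. The helper below is hypothetical and only spells out that arithmetic.

```python
from typing import Optional

MAX_STEPS = 200  # step limit for MountainCar-v0, from the list above


def episode_return(steps_to_goal: Optional[int]) -> int:
    """Undiscounted return under the -1-per-step reward.

    steps_to_goal is the step on which the goal was reached, or None if the
    episode was truncated at MAX_STEPS without reaching it.
    """
    steps = MAX_STEPS if steps_to_goal is None else steps_to_goal
    return -steps


print(episode_return(None))  # -200: truncated, goal never reached
print(episode_return(110))   # -110: goal reached on step 110
```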

### Rendering and Visualization
Provides a 2D visualization of the car in a valley between mountains.

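# render_mode='human' renders automatically inside reset() and step()
env = gym.make('MountainCar-v0', render_mode='human')

# Seed the environment and the action sampler so runs are reproducible
observation, info = env.reset(seed=42)
env.action_space.seed(42)

while True:
    # Random action: 0 = push left, 1 = no push, 2 = push right
    action = env.action_space.sample()

    # Advance the simulation by one timestep
    observation, reward, terminated, truncated, info = env.step(action)

    print(f"Observation: {observation}")
    print(f"Reward: {reward}")
    print(f"Terminated: {terminated}")
    print(f"Truncated: {truncated}")
    print(f"Info: {info}")
    print("-" * 50)

    # terminated: goal reached; truncated: 200-step limit reached
    if terminated or truncated:
        break

# Close the environment and its render window
env.close()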

# 3. Acrobot

![acrobot](https://gymnasium.farama.org/_images/acrobot.gif)

### State Space
The state space consists of 6 continuous variables:

* cos(θ₁): Cosine of the first joint angle
* sin(θ₁): Sine of the first joint angle
* cos(θ₂): Cosine of the second joint angle
* sin(θ₂): Sine of the second joint angle
* θ̇₁: Angular velocity of the first joint
* θ̇₂: Angular velocity of the second joint
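
Each angle is encoded as a (cos, sin) pair. This keeps the observation continuous when an angle wraps from π to -π. θ₁ = 0 means the first link hangs straight down, and θ₂ is measured relative to the first link. The raw angles, for example for plotting, can be recovered with `math.atan2`. The helper name `joint_angles` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def joint_angles(obs: Sequence[float]) -> Tuple[float, float]:
    """Recover (theta1, theta2) in radians from a 6-element Acrobot observation."""
    cos1, sin1, cos2, sin2 = obs[0], obs[1], obs[2], obs[3]
    # atan2 uses both sine and cosine, so the result is unambiguous in (-pi, pi]
    return math.atan2(sin1, cos1), math.atan2(sin2, cos2)


# Example observation with theta1 = pi/2 and theta2 = -pi/4, both links at rest
obs = [math.cos(math.pi / 2), math.sin(math.pi / 2),
       math.cos(-math.pi / 4), math.sin(-math.pi / 4), 0.0, 0.0]
print(joint_angles(obs))
```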

### Action Space
Discrete action space with 3 possible actions:

* 0: Apply -1 torque to the joint between the two links
* 1: Apply 0 torque
* 2: Apply +1 torque

### Transition Dynamics
The system simulates a chain of two links (a double pendulum) hanging from a fixed pivot. Only the joint between the two links is actuated; the joint at the pivot swings freely. The goal is to swing the free end of the lower link up to a given height.

### Reward Function
-1 for each timestep until the goal is reached, which encourages faster solutions; the step that reaches the goal yields 0.

### Episode Termination Conditions
The episode ends when:

* The free end of the second link rises more than one link length above the fixed pivot (termination)
* 500 timesteps are reached (truncation; this counts as failure)
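
The goal test and reward can be sketched as follows. The sketch assumes both links have length 1, θ₁ = 0 means the first link hangs straight down, and θ₂ is relative to the first link. Under those assumptions the height of the free end above the pivot is -cos(θ₁) - cos(θ₁ + θ₂). The function names are hypothetical.

```python
import math


def acrobot_terminal(theta1: float, theta2: float) -> bool:
    """True when the free end is more than one link length above the pivot."""
    # Height of the free end relative to the pivot (negative = below it)
    tip_height = -math.cos(theta1) - math.cos(theta1 + theta2)
    return tip_height > 1.0


def acrobot_reward(theta1: float, theta2: float) -> float:
    """-1 on every step, except 0 on the step that reaches the target height."""
    return 0.0 if acrobot_terminal(theta1, theta2) else -1.0


print(acrobot_terminal(0.0, 0.0))      # False: both links hang straight down
print(acrobot_terminal(math.pi, 0.0))  # True: both links point straight up
```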

### Rendering and Visualization
Provides a 2D visualization of the double pendulum system.

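# render_mode='human' renders automatically inside reset() and step()
env = gym.make('Acrobot-v1', render_mode='human')

# Seed the environment and the action sampler so runs are reproducible
observation, info = env.reset(seed=42)
env.action_space.seed(42)

while True:
    # Random action: 0 = -1 torque, 1 = no torque, 2 = +1 torque
    action = env.action_space.sample()

    # Advance the simulation by one timestep
    observation, reward, terminated, truncated, info = env.step(action)

    print(f"Observation: {observation}")
    print(f"Reward: {reward}")
    print(f"Terminated: {terminated}")
    print(f"Truncated: {truncated}")
    print(f"Info: {info}")
    print("-" * 50)

    # terminated: free end reached the target height; truncated: 500-step limit reached
    if terminated or truncated:
        break

# Close the environment and its render window
env.close()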

# 4. Lunar Lander

![lunar lander](https://gymnasium.farama.org/_images/lunar_lander.gif)

### State Space
The state space consists of 8 variables (6 continuous values and 2 leg-contact flags):

* Coordinates of the lander (x, y)
* Linear velocities (x, y)
* Angle of the lander
* Angular velocity
* Boolean indicating if left leg has contact with ground
* Boolean indicating if right leg has contact with ground
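
The observation arrives as a flat float array, so the two contact flags show up as 1.0 or 0.0 rather than Python booleans. Wrapping it in a `NamedTuple` makes code that reads it easier to follow. The class and field names below are hypothetical. The field order follows the list above.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Named view of the 8-element Lunar Lander observation."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: float   # 1.0 if touching the ground, else 0.0
    right_leg_contact: float  # 1.0 if touching the ground, else 0.0


def parse_observation(obs: Sequence[float]) -> LanderState:
    """Convert a flat observation (list or array) into a LanderState."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    return LanderState(*(float(v) for v in obs))


state = parse_observation([0.01, 1.4, 0.3, -0.2, 0.0, 0.0, 1.0, 0.0])
print(state.y, bool(state.left_leg_contact))
```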

### Action Space
Discrete action space with 4 possible actions:

* 0: Do nothing
* 1: Fire left orientation engine
* 2: Fire main engine
* 3: Fire right orientation engine

### Transition Dynamics
The lander is affected by gravity, thrust from the engines, and collisions with the ground. The physics are simulated using Box2D.

### Reward Function
The reward is shaped and granted at every step:

* Increases as the lander gets closer to the landing pad and moves more slowly; decreases the more it is tilted
* +10 for each leg that makes contact with the ground
* -0.3 for each frame the main engine fires
* -0.03 for each frame a side engine fires
* +100 for landing safely (coming to rest)
* -100 for crashing

An episode scoring at least 200 points is considered a solution.

### Episode Termination Conditions
The episode ends when:

* The lander crashes (its body touches the ground) or comes to rest
* The lander moves outside the viewport horizontally
* 1000 timesteps are reached (truncation)
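
A common way to implement this kind of reward is potential-based shaping. A "shaping" score summarizes how good the current state is, and each step's reward is the change in that score minus fuel costs. This sketch assumes that structure. The `-100` weights on distance, speed, and tilt are illustrative assumptions; the other values come from the list above. The function names are hypothetical. Because the leg bonus lives inside the shaping term, a leg touching down adds +10 once, and lifting off again takes it back.

```python
import math
from typing import Optional, Tuple


def shaping(x: float, y: float, vx: float, vy: float, angle: float,
            left_contact: float, right_contact: float) -> float:
    """State score: higher when closer, slower, more level, and touching down."""
    return (
        -100 * math.hypot(x, y)       # distance to the pad at the origin (assumed weight)
        - 100 * math.hypot(vx, vy)    # speed (assumed weight)
        - 100 * abs(angle)            # tilt (assumed weight)
        + 10 * left_contact           # +10 per leg on the ground
        + 10 * right_contact
    )


def step_reward(prev_shaping: Optional[float], curr_shaping: float,
                main_engine: bool, side_engine: bool,
                crashed: bool = False, at_rest: bool = False) -> Tuple[float, bool]:
    """Return (reward, terminated) for one step; crashed also covers leaving the viewport."""
    reward = 0.0 if prev_shaping is None else curr_shaping - prev_shaping
    reward -= 0.3 if main_engine else 0.0    # main-engine fuel cost per frame
    reward -= 0.03 if side_engine else 0.0   # side-engine fuel cost per frame
    if crashed:
        return -100.0, True                  # terminal penalty replaces the step reward
    if at_rest:
        return 100.0, True                   # terminal bonus replaces the step reward
    return reward, False


# Hovering in place while firing the main engine: only the fuel cost remains
s = shaping(0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0)
print(step_reward(s, s, main_engine=True, side_engine=False))  # (-0.3, False)
```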

### Rendering and Visualization
Provides a 2D visualization of the lunar lander and landing pad.

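# Requires the Box2D extra: pip install "gymnasium[box2d]"
# render_mode='human' renders automatically inside reset() and step()
env = gym.make('LunarLander-v3', render_mode='human')

# Seed the environment and the action sampler so runs are reproducible
observation, info = env.reset(seed=42)
env.action_space.seed(42)

while True:
    # Random action: 0 = nothing, 1 = left engine, 2 = main engine, 3 = right engine
    action = env.action_space.sample()

    # Advance the simulation by one timestep
    observation, reward, terminated, truncated, info = env.step(action)

    print(f"Observation: {observation}")
    print(f"Reward: {reward}")
    print(f"Terminated: {terminated}")
    print(f"Truncated: {truncated}")
    print(f"Info: {info}")
    print("-" * 50)

    # terminated: crashed, left the viewport, or came to rest; truncated: step limit reached
    if terminated or truncated:
        break

# Close the environment and its render window
env.close()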

# 5. Bipedal Walker

![bipedal walker](https://gymnasium.farama.org/_images/bipedal_walker.gif)

### State Space
The state space consists of 24 continuous values:

* Hull angle and hull angular velocity
* Hull horizontal and vertical velocity
* Hip and knee joint angles and angular velocities for both legs
* Ground-contact flag for each leg
* 10 lidar rangefinder measurements
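
Indexing a flat 24-element vector by hand is error-prone, so it can help to name the slices once. The layout below is an assumption: four hull values, then hip and knee angle/speed plus a contact flag for each leg, then 10 lidar readings. It adds up to the 24 values listed above, but verify it against the Gymnasium documentation for your installed version. `OBS_LAYOUT` and `split_observation` are hypothetical names.

```python
from typing import Dict, Sequence

# Assumed index layout of the 24-element observation (verify before relying on it)
OBS_LAYOUT: Dict[str, slice] = {
    "hull_angle": slice(0, 1),
    "hull_angular_velocity": slice(1, 2),
    "hull_velocity_xy": slice(2, 4),
    "leg1_hip_knee": slice(4, 8),    # hip angle, hip speed, knee angle, knee speed
    "leg1_contact": slice(8, 9),
    "leg2_hip_knee": slice(9, 13),   # hip angle, hip speed, knee angle, knee speed
    "leg2_contact": slice(13, 14),
    "lidar": slice(14, 24),          # 10 rangefinder readings
}


def split_observation(obs: Sequence[float]) -> Dict[str, list]:
    """Group a flat 24-element observation (list or array) into named parts."""
    if len(obs) != 24:
        raise ValueError(f"expected 24 values, got {len(obs)}")
    return {name: list(obs[s]) for name, s in OBS_LAYOUT.items()}


parts = split_observation(list(range(24)))
print(parts["lidar"])
```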

### Action Space
Continuous action space with 4 dimensions:

* Hip joint 1 torque (range: -1.0 to 1.0)
* Knee joint 1 torque (range: -1.0 to 1.0)
* Hip joint 2 torque (range: -1.0 to 1.0)
* Knee joint 2 torque (range: -1.0 to 1.0)

### Transition Dynamics
Box2D physics simulation of a bipedal robot with two legs, each having two joints (hip and knee). The robot must learn to walk forward without falling.

### Reward Function

* Forward progress is rewarded at every step; these rewards add up to 300+ points when the walker reaches the far end of the track
* -100 for falling (the hull touching the ground)
* Small penalty proportional to the applied motor torque
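
The 300+ total can come from potential-based shaping. Each step's reward is the change in a "potential" that grows with the hull's forward position, so the per-step rewards telescope to the final potential minus the initial one. This sketch assumes that structure. All coefficients (`PROGRESS_SCALE`, `TILT_PENALTY`, `TORQUE_COST`) and function names are illustrative assumptions, not Gymnasium's exact values.

```python
from typing import Optional, Sequence, Tuple

# Illustrative coefficients (assumptions for this sketch)
PROGRESS_SCALE = 130 / 30     # reward per unit of forward hull displacement
TILT_PENALTY = 5.0            # discourages a tilted hull
TORQUE_COST = 0.00035 * 80    # cost per unit of |action|, summed over the 4 joints
FALL_PENALTY = -100.0


def potential(hull_x: float, hull_angle: float) -> float:
    """Shaping potential: grows with forward position, shrinks with hull tilt."""
    return PROGRESS_SCALE * hull_x - TILT_PENALTY * abs(hull_angle)


def walker_reward(prev_potential: Optional[float], hull_x: float, hull_angle: float,
                  action: Sequence[float], fell: bool) -> Tuple[float, float]:
    """Return (reward, new_potential) for one step."""
    curr = potential(hull_x, hull_angle)
    reward = 0.0 if prev_potential is None else curr - prev_potential
    # Torque cost: each action component's magnitude is capped at 1
    reward -= TORQUE_COST * sum(min(abs(a), 1.0) for a in action)
    if fell:
        reward = FALL_PENALTY
    return reward, curr


# Rewards telescope: with zero torque, the total equals the change in potential
prev, total = None, 0.0
for x in [0.0, 0.5, 1.2, 2.0]:
    r, prev = walker_reward(prev, x, 0.0, [0.0] * 4, fell=False)
    total += r
print(round(total, 3))  # 8.667
```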

### Episode Termination Conditions
The episode ends when:

* The walker reaches the far end of the track
* The walker falls (its hull touches the ground)
* 1600 timesteps are reached (truncation)

### Rendering and Visualization
Provides a 2D visualization of the bipedal walker and terrain.

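# Requires the Box2D extra: pip install "gymnasium[box2d]"
# render_mode='human' renders automatically inside reset() and step()
env = gym.make('BipedalWalker-v3', render_mode='human')

# Seed the environment and the action sampler so runs are reproducible
observation, info = env.reset(seed=42)
env.action_space.seed(42)

while True:
    # Random 4-dimensional action with each component in [-1, 1]
    action = env.action_space.sample()

    # Advance the simulation by one timestep
    observation, reward, terminated, truncated, info = env.step(action)

    print(f"Observation: {observation}")
    print(f"Reward: {reward}")
    print(f"Terminated: {terminated}")
    print(f"Truncated: {truncated}")
    print(f"Info: {info}")
    print("-" * 50)

    # terminated: fell or reached the end of the track; truncated: 1600-step limit reached
    if terminated or truncated:
        break

# Close the environment and its render window
env.close()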