# Reinforcement Learning Practical Using Gymnasium

## Aim
To understand basic Reinforcement Learning environments in Gymnasium and interact with them using simple agents. We will explore:
- Blackjack-v1
- CartPole-v1
- FrozenLake-v1

## Environment Setup

```python
import gymnasium as gym
import numpy as np
```

## Part 1: Blackjack-v1

### Game Description
Blackjack is a card game played between a player and a dealer. The objective is to get a hand total as close as possible to 21 without exceeding it (busting).

### Card Values
- Number cards (2–10) are worth their face value.
- Face cards (Jack, Queen, King) are worth 10.
- Ace can be worth either 1 or 11. When it is counted as 11 without busting, it is called a usable ace.
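The ace rule above can be expressed as a small helper. This is a minimal sketch, not Gymnasium's code: the name hand_value is hypothetical, and cards are assumed to be point values with an ace encoded as 1 (the same convention Gymnasium uses for the dealer's showing card). At most one ace can ever count as 11, since two aces counted as 11 would already total 22.

```python
def hand_value(cards):
    """Return (total, usable_ace) for a Blackjack hand.

    Cards are point values: 2-10 for number cards, 10 for J/Q/K,
    and 1 for an ace (hypothetical encoding for this sketch).
    """
    total = sum(cards)
    # Only one ace can be upgraded from 1 to 11 (+10) without exceeding 21.
    usable_ace = 1 in cards and total + 10 <= 21
    if usable_ace:
        total += 10
    return total, usable_ace


print(hand_value([1, 6]))      # (17, True): soft 17, ace counted as 11
print(hand_value([1, 6, 10]))  # (17, False): ace must drop back to 1
```

Gymnasium reports the usable-ace flag as 0 or 1 in its observations rather than as a boolean.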

### Game Rules
- The player is dealt two face-up cards; the dealer is dealt one face-up card and one face-down card.
- The player sees their own cards and the dealer's face-up card.
- The player can:
  - **Hit (1)**: Take another card.
  - **Stick (0)**: Stop taking cards.
- If the player's total exceeds 21, the player busts and loses immediately.
- After the player sticks, the dealer reveals the hidden card and draws until the total is at least 17.
- If the dealer busts, the player wins.
- If neither busts, the hand closer to 21 wins; equal totals are a draw.

### Rewards
- Win: +1
- Draw: 0
- Lose: -1
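The dealer rule and the reward table can be combined into a settlement routine. This is a simplified sketch under stated assumptions: the helper names are hypothetical, cards use the ace-as-1 convention, upcoming cards come from a caller-supplied draw_card function instead of a random deck, and special cases such as natural-blackjack payouts are ignored.

```python
def total(cards):
    """Hand total, counting one ace (encoded as 1) as 11 when that does not bust."""
    t = sum(cards)
    return t + 10 if 1 in cards and t + 10 <= 21 else t


def play_dealer(dealer_cards, draw_card):
    """Dealer rule: keep drawing until the total is at least 17."""
    hand = list(dealer_cards)
    while total(hand) < 17:
        hand.append(draw_card())
    return hand


def settle(player_cards, dealer_cards):
    """Reward from the player's point of view: +1 win, 0 draw, -1 loss."""
    player, dealer = total(player_cards), total(dealer_cards)
    if player > 21:
        return -1  # a player bust loses regardless of the dealer's hand
    if dealer > 21:
        return 1
    # 1 if the player is higher, -1 if lower, 0 on a tie
    return (player > dealer) - (player < dealer)


# Fixed sequence of upcoming cards instead of a shuffled deck.
upcoming = iter([5, 10])
dealer = play_dealer([10, 2], lambda: next(upcoming))
print(dealer, settle([10, 9], dealer))  # [10, 2, 5] 1
```

Because the total counts a usable ace as 11, the dealer in this sketch stands on a soft 17 such as ace + 6.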

### Observation Space
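Tuple: (player_sum, dealer_card, usable_ace), where player_sum is the player's current total, dealer_card is the value of the dealer's face-up card (1 = ace), and usable_ace is 1 if the player holds a usable ace, otherwise 0.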

### Action Space
0 = STICK, 1 = HIT

```python
env = gym.make('Blackjack-v1')
print(env.observation_space)
print(env.action_space)
```

```text
Tuple(Discrete(32), Discrete(11), Discrete(2))
Discrete(2)
```
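Because all three components are discrete, the observation space is small enough for a lookup table, which is what tabular methods such as Q-learning rely on. The sketch below flattens each tuple into one of 32 × 11 × 2 = 704 rows; obs_to_index and q_table are hypothetical names, and the sizes are copied from the printed space. Many rows (for example player sums below 4, or a dealer card of 0) never occur, which only wastes a little memory.

```python
# Sizes copied from the printed space: Tuple(Discrete(32), Discrete(11), Discrete(2))
N_SUMS, N_DEALER, N_ACE = 32, 11, 2


def obs_to_index(obs):
    """Map a (player_sum, dealer_card, usable_ace) tuple to a single row index."""
    player_sum, dealer_card, usable_ace = obs
    return (player_sum * N_DEALER + dealer_card) * N_ACE + int(usable_ace)


n_states = N_SUMS * N_DEALER * N_ACE
# One row per state, one column per action (0 = STICK, 1 = HIT).
q_table = [[0.0, 0.0] for _ in range(n_states)]

print(n_states)                  # 704
print(obs_to_index((18, 8, 0)))  # 412, i.e. (18*11 + 8)*2 + 0
```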


### Random Agent

```python
for i_episode in range(5):
    state, info = env.reset()
    print(f"\nEpisode {i_episode+1}")
    while True:
        print("State:", state)
        action = env.action_space.sample()
        print("Action:", action)
        state, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            print("Final State:", state)
            print("Reward:", reward)
            break
```

```text
Episode 1
State: (18, 8, 0)
Action: 1
Final State: (28, 8, 0)
Reward: -1.0

Episode 2
State: (11, 10, 0)
Action: 1
State: (17, 10, 0)
Action: 1
State: (21, 10, 0)
Action: 1
Final State: (31, 10, 0)
Reward: -1.0

Episode 3
State: (19, 8, 0)
Action: 0
Final State: (19, 8, 0)
Reward: 1.0

Episode 4
State: (8, 3, 0)
Action: 1
State: (17, 3, 0)
Action: 0
Final State: (17, 3, 0)
Reward: -1.0

Episode 5
State: (7, 3, 0)
Action: 0
Final State: (7, 3, 0)
Reward: -1.0
```


### Rule-Based Agent

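```python
def simple_blackjack_agent(state):
    """Fixed-threshold policy: hit below 17, otherwise stick.

    dealer_card and usable_ace are unpacked but not used by this rule.
    """
    player_sum, dealer_card, usable_ace = state
    if player_sum < 17:
        return 1  # HIT
    return 0  # STICK
```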

```python
wins = losses = draws = 0
num_episodes = 10

for ep in range(num_episodes):
    state, info = env.reset()
    done = False
    total_reward = 0
    while not done:
        action = simple_blackjack_agent(state)
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        total_reward += reward
    if total_reward > 0: wins += 1
    elif total_reward < 0: losses += 1
    else: draws += 1
    print(f"Episode {ep+1}: Reward = {total_reward}")

print("Results:")
print("Wins:", wins, "Losses:", losses, "Draws:", draws)
print("Win Rate:", wins/num_episodes)
```

```text
Episode 1: Reward = -1.0
Episode 2: Reward = 1.0
Episode 3: Reward = 1.0
Episode 4: Reward = 1.0
Episode 5: Reward = 1.0
Episode 6: Reward = -1.0
Episode 7: Reward = 1.0
Episode 8: Reward = -1.0
Episode 9: Reward = 1.0
Episode 10: Reward = 1.0
Results:
Wins: 7 Losses: 3 Draws: 0
Win Rate: 0.7
```
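Ten episodes is a small sample, so the 0.7 win rate above is only a rough estimate. A simple way to gauge the noise is the binomial standard error sqrt(p(1 - p)/n), assuming episodes are independent and counting draws as non-wins; win_rate_with_error is a hypothetical helper.

```python
import math


def win_rate_with_error(wins, n):
    """Return (estimate, standard error) for a win rate over n independent episodes."""
    p = wins / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


p, se = win_rate_with_error(7, 10)
print(f"win rate = {p:.2f} +/- {se:.2f}")  # win rate = 0.70 +/- 0.14
```

A standard error of about 0.14 means the policy's true win rate could plausibly be quite far from 0.7. With the same observed rate over 1,000 episodes, the standard error would shrink to about 0.014.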


## Part 2: CartPole-v1

### Game Description
CartPole is a classic control problem where a pole is attached to a cart by a hinge. The goal is to keep the pole balanced upright by moving the cart left or right.

### State (Observation)
The observation is a vector of four continuous values:
- Cart position
- Cart velocity
- Pole angle (radians)
- Pole angular velocity

### Actions
- 0: Move cart left
- 1: Move cart right

### Game Rules
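- The episode terminates if the pole tilts more than 12 degrees from vertical or the cart moves more than 2.4 units from the center; CartPole-v1 also truncates episodes after 500 steps.
- A reward of +1 is given for every step taken, so the total reward equals the number of steps the pole stays up.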
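The end-of-episode rules can be written out explicitly. This sketch re-implements the checks for illustration (episode_status is a hypothetical name) and assumes Gymnasium's default CartPole-v1 limits: 2.4 for the cart position, 12 degrees for the pole angle, and a 500-step time limit. The two termination thresholds are half of the observation-space bounds printed below (4.8 and 0.41887903 rad).

```python
import math

# Assumed defaults for Gymnasium's CartPole-v1.
X_THRESHOLD = 2.4                         # max |cart position|
THETA_THRESHOLD = 12 * 2 * math.pi / 360  # 12 degrees in radians (about 0.2094)
MAX_STEPS = 500                           # time limit applied to CartPole-v1


def episode_status(obs, step_count):
    """Return (terminated, truncated) for an observation reached after step_count steps."""
    cart_position, _, pole_angle, _ = obs
    # Failure: the cart left the track or the pole tilted past the limit.
    terminated = abs(cart_position) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD
    # Time limit: flagged independently of termination.
    truncated = step_count >= MAX_STEPS
    return terminated, truncated


print(episode_status((0.0, 0.0, 0.25, 0.0), 42))  # (True, False)
```

Since every step earns +1, the episode return equals the number of steps survived, so 500 is the maximum return under these assumptions.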

### Objective
Maximize the total time (steps) the pole stays balanced, which maximizes the cumulative reward.

```python
env = gym.make('CartPole-v1')
print(env.observation_space)
print(env.action_space)
```

```text
Box([-4.8               -inf -0.41887903        -inf], [4.8               inf 0.41887903        inf], (4,), float32)
Discrete(2)
```


### Random Agent for CartPole

```python
for ep in range(3):
    state, info = env.reset()
    total_reward = 0
    while True:
        action = env.action_space.sample()
        state, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            print(f"Episode {ep+1}, Total Reward: {total_reward}")
            break
```

```text
Episode 1, Total Reward: 38.0
Episode 2, Total Reward: 20.0
Episode 3, Total Reward: 14.0
```
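For comparison with the random agent, here is a hand-written rule in the spirit of the Blackjack rule-based agent: push the cart toward the side the pole leans. This is a hypothetical heuristic, not a learned policy; lean_policy and run_episode are illustrative names, and run_episode only assumes the reset/step interface already used in this notebook.

```python
def lean_policy(obs):
    """Hypothetical heuristic: push the cart toward the side the pole leans."""
    _, _, pole_angle, _ = obs
    # A positive pole angle is counteracted by pushing right (action 1).
    return 1 if pole_angle > 0 else 0


def run_episode(env, policy):
    """Play one episode with `policy` and return the total reward.

    Assumes the Gymnasium-style API used above: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    obs, info = env.reset()
    total_reward = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        if terminated or truncated:
            return total_reward
```

With the CartPole environment created above, run_episode(env, lean_policy) returns one episode's total reward, which can be compared against the random-agent scores. The rule ignores cart position and both velocities, so it may still let the cart drift out of bounds.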


## Part 3: FrozenLake-v1

### Game Description
FrozenLake is a grid-world navigation problem. The agent starts at a starting tile (S) and must reach the goal tile (G) without falling into holes (H).

### Grid Symbols
- **S**: Starting position
- **F**: Frozen (safe) tile
- **H**: Hole (agent falls and loses)
- **G**: Goal (agent wins)

### Actions
- 0: Move Left
- 1: Move Down
- 2: Move Right
- 3: Move Up
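Each action corresponds to a row/column offset, and each tile is a single integer state. A minimal sketch, assuming Gymnasium's default 4×4 map and row-major numbering state = row * 4 + col; this numbering is consistent with the Discrete(16) space and with the random-agent log further down, where action 1 (Down) from state 2 leads to state 6 and action 0 (Left) from state 0 stays at 0. The names MAP, move, to_state and to_row_col are hypothetical.

```python
# Assumed default 4x4 layout (S = start, F = frozen, H = hole, G = goal).
MAP = ["SFFF",
       "FHFH",
       "FFFH",
       "HFFG"]
NROW, NCOL = len(MAP), len(MAP[0])

LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
DELTAS = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def to_state(row, col):
    return row * NCOL + col


def to_row_col(state):
    return divmod(state, NCOL)


def move(state, action):
    """Move one tile; bumping into the edge of the grid leaves the agent in place."""
    row, col = to_row_col(state)
    dr, dc = DELTAS[action]
    row = min(max(row + dr, 0), NROW - 1)
    col = min(max(col + dc, 0), NCOL - 1)
    return to_state(row, col)


print(to_row_col(6), move(0, LEFT), move(2, DOWN))  # (1, 2) 0 6
```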

### Game Rules
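- The agent moves one tile per action; moving into the edge of the grid leaves it in place.
- If the agent enters a hole, the episode ends with a loss.
- If the agent reaches the goal, the episode ends with a win.
- In slippery mode (is_slippery=True, the default), the agent moves in the intended direction with probability 1/3 and in each of the two perpendicular directions with probability 1/3.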
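The slip model can be sketched as a probability table. This assumes Gymnasium's rule that a slippery move goes in the intended direction or one of the two perpendicular directions with probability 1/3 each; with the 0 = Left, 1 = Down, 2 = Right, 3 = Up numbering, the perpendicular directions are simply the neighbouring action numbers modulo 4. The function names are hypothetical.

```python
import random

LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
NAMES = {LEFT: "Left", DOWN: "Down", RIGHT: "Right", UP: "Up"}


def slippery_outcomes(action):
    """Executed directions and their probabilities for an intended action."""
    # (action - 1) % 4 and (action + 1) % 4 are perpendicular to `action`.
    return [((action - 1) % 4, 1 / 3), (action, 1 / 3), ((action + 1) % 4, 1 / 3)]


def sample_direction(action, rng=random):
    """Draw the direction actually taken (all three outcomes equally likely)."""
    directions = [d for d, _ in slippery_outcomes(action)]
    return rng.choice(directions)


for d, p in slippery_outcomes(RIGHT):
    print(NAMES[d], round(p, 3))  # Down 0.333, Right 0.333, Up 0.333
```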

### Rewards
- Reaching the goal: +1
- Every other transition, including falling into a hole: 0

### Objective
Learn a path from start to goal that avoids holes and reaches the goal consistently.
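With is_slippery=False (as in the code below), every action has a deterministic outcome, so one valid path can be found by plain breadth-first search over the map. This is a planning sketch, not a reinforcement learning algorithm: it assumes the default 4×4 layout and gives the agent the map, whereas an RL agent has to discover such a path from rewards alone. The helper names are hypothetical.

```python
from collections import deque

MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]       # assumed default 4x4 layout
NROW, NCOL = len(MAP), len(MAP[0])
DELTAS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # index = action: Left, Down, Right, Up


def step(state, action):
    """Deterministic (is_slippery=False) transition; edges keep the agent in place."""
    row, col = divmod(state, NCOL)
    dr, dc = DELTAS[action]
    row = min(max(row + dr, 0), NROW - 1)
    col = min(max(col + dc, 0), NCOL - 1)
    return row * NCOL + col


def tile(state):
    return MAP[state // NCOL][state % NCOL]


def shortest_path(start=0):
    """Breadth-first search for the shortest action sequence from start to G avoiding holes."""
    parent = {start: None}  # state -> (previous state, action taken)
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if tile(s) == "G":
            actions = []
            while parent[s] is not None:
                s, a = parent[s]
                actions.append(a)
            return actions[::-1]
        if tile(s) == "H":
            continue  # the episode would end in a hole, so never expand it
        for a in range(4):
            nxt = step(s, a)
            if nxt not in parent:
                parent[nxt] = (s, a)
                queue.append(nxt)
    return None  # no path exists


print(shortest_path())  # [1, 1, 2, 1, 2, 2]
```

The result corresponds to Down, Down, Right, Down, Right, Right, visiting states 0, 4, 8, 9, 13, 14, 15. Replaying it with env.step in the non-slippery environment should reach the goal with reward 1.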

```python
env = gym.make('FrozenLake-v1', is_slippery=False)
print(env.observation_space)
print(env.action_space)
```

```text
Discrete(16)
Discrete(4)
```


### Random Agent for FrozenLake

```python
for ep in range(5):
    state, info = env.reset()
    done = False
    print(f"\nEpisode {ep+1}")
    while not done:
        action = env.action_space.sample()
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        print("State:", state, "Action:", action, "Reward:", reward)
    print("Episode Ended with Reward:", reward)
```

```text
Episode 1
State: 0 Action: 0 Reward: 0
State: 0 Action: 3 Reward: 0
State: 1 Action: 2 Reward: 0
State: 2 Action: 2 Reward: 0
State: 6 Action: 1 Reward: 0
State: 7 Action: 2 Reward: 0
Episode Ended with Reward: 0

Episode 2
State: 1 Action: 2 Reward: 0
State: 1 Action: 3 Reward: 0
State: 5 Action: 1 Reward: 0
Episode Ended with Reward: 0

Episode 3
State: 4 Action: 1 Reward: 0
State: 8 Action: 1 Reward: 0
State: 8 Action: 0 Reward: 0
State: 4 Action: 3 Reward: 0
State: 4 Action: 0 Reward: 0
State: 5 Action: 2 Reward: 0
Episode Ended with Reward: 0

Episode 4
State: 0 Action: 0 Reward: 0
State: 0 Action: 0 Reward: 0
State: 0 Action: 3 Reward: 0
State: 0 Action: 3 Reward: 0
State: 1 Action: 2 Reward: 0
State: 2 Action: 2 Reward: 0
State: 2 Action: 3 Reward: 0
State: 3 Action: 2 Reward: 0
State: 3 Action: 2 Reward: 0
State: 7 Action: 1 Reward: 0
Episode Ended with Reward: 0

Episode 5
State: 0 Action: 3 Reward: 0
State: 4 Action: 1 Reward: 0
State: 4 Action: 0 Reward: 0
State: 5 Action:
```

## Conclusion

- **Blackjack** shows decision making under uncertainty.
- **CartPole** demonstrates control with a continuous state space and a discrete action space.
- **FrozenLake** demonstrates navigation in a grid world.

These environments help understand basic Reinforcement Learning concepts such as states, actions, rewards, and episodes.