<a href="https://colab.research.google.com/github/ProValarous/A-Simple-React-Aap/blob/master/lab1_v0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Importing Libraries and Dependencies

In [None]:
# HIDE OUTPUT
!apt-get update > /dev/null 2>&1
!apt-get install cmake > /dev/null 2>&1
!pip install --upgrade setuptools 2>&1
!pip install ez_setup > /dev/null 2>&1
!pip install gym[atari] > /dev/null 2>&1



In [None]:
# HIDE OUTPUT
!pip install gym pyvirtualdisplay > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
!sudo apt-get install xvfb
!pip install xvfbwrapper

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libfontenc1 libxfont2 libxkbfile1 x11-xkb-utils xfonts-base xfonts-encodings
  xfonts-utils xserver-common
The following NEW packages will be installed:
  libfontenc1 libxfont2 libxkbfile1 x11-xkb-utils xfonts-base xfonts-encodings
  xfonts-utils xserver-common xvfb
0 upgraded, 9 newly installed, 0 to remove and 29 not upgraded.
Need to get 7,814 kB of archives.
After this operation, 12.0 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 libfontenc1 amd64 1:1.1.4-1build3 [14.7 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 libxfont2 amd64 1:2.0.5-1build1 [94.5 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/main amd64 libxkbfile1 amd64 1:1.1.0-1build3 [71.8 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy/main amd64 x11-xkb-utils amd64 7.7+5build4 [172 kB]
Get:5 http://archiv

In [None]:
!pip install gymnasium



In [None]:
import gym
# from gym.wrappers import Monitor
from gym.wrappers.record_video import RecordVideo
import glob
import io
import base64
from IPython.display import HTML
from pyvirtualdisplay import Display
from IPython import display as ipythondisplay

In [None]:
## Some Display Utility functions



display = Display(visible=0, size=(1400, 900))
display.start()

"""
Utility functions to enable video recording of gym environment
and displaying it.
To enable video, just do "env = wrap_env(env)"
"""


def show_video():
    mp4list = glob.glob('video/*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        # Read the file in binary mode and close it afterwards
        with open(mp4, 'rb') as f:
            video = f.read()
        encoded = base64.b64encode(video)
        ipythondisplay.display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")


def wrap_env(env):
    env = RecordVideo(env, './video')
    return env


## Gymnasium

Gymnasium is the maintained fork of OpenAI Gym, now developed by the Farama Foundation. It is a toolkit for developing and comparing **Reinforcement Learning (RL)** algorithms. It provides a wide range of **environments** that simulate different kinds of tasks, making it easier to test, benchmark, and train RL agents.

### Key Features:
1. **Environment Collection:**
    - Gymnasium provides a variety of environments ranging from simple tasks like balancing a pole to more complex ones like playing video games or controlling robots. Each environment follows a standard interface to interact with the agent, which includes:
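      - `reset()`: Resets the environment to an initial state and returns the first observation together with an `info` dictionary.
      - `step(action)`: Takes an action and returns the next observation, the reward, `terminated` (the episode reached a terminal state), `truncated` (the episode was cut off, e.g. by a time limit), and an `info` dictionary with additional information.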
    
2. **Standardized Interface:**
    - The environments in Gymnasium share a standard API, allowing researchers and developers to write agents that can work across different tasks without changing much of their code.

3. **Compatibility with Popular Libraries:**
    - It can be easily integrated with other machine learning and deep learning frameworks such as TensorFlow, PyTorch, and RL-specific libraries like **Ray RLlib** and **Stable Baselines**.

4. **Benchmarking:**
    - Gymnasium allows easy comparison of algorithms by providing a consistent testing environment. Researchers can evaluate the performance of their RL algorithms on a standardized set of tasks.

5. **Variety of Use Cases:**
    - From **classical control** problems like CartPole and MountainCar to more complex tasks like **Atari games** or even **robot simulations**, Gymnasium supports diverse domains, enabling experimentation in many different areas.


## Key Components of Reinforcement Learning (RL) Paradigm

In **Reinforcement Learning (RL)**, the goal is for an agent to learn how to perform tasks by interacting with an environment. The agent makes decisions by taking actions, and the environment responds by transitioning to a new state and giving a reward. Over time, the agent learns to maximize the cumulative reward by improving its decision-making process.

Let's break down the key components using the **CartPole** environment as an example.

---

### 1. **Environment**
- The environment is the "world" or "simulation" in which the agent operates. It provides the setting where the agent can take actions and receive feedback in the form of states and rewards.
- In the **CartPole** environment, the task is to balance a pole on a moving cart. The environment defines:
    - The physics of the system (e.g., gravity and motion).
    - The action space (how the cart can move).
    - The observation space (the agent’s perception of the system).
    
    In CartPole, the environment will change based on the actions taken by the agent to keep the pole balanced.

---

### 2. **Observations**
- **Observation** refers to the data or information that the agent receives from the environment at each time step. It allows the agent to understand the current situation of the environment.
- In the CartPole environment, the observations include:
    - The position of the cart.
    - The velocity of the cart.
    - The angle of the pole.
    - The angular velocity of the pole.
    
    These four values describe the state of the system at a given time. The agent uses this information to decide which action to take.

---

### 3. **States**
- The **state** of the environment represents the complete description of the environment at a given time. In many cases, the observation is a partial view of the state, while the full state includes all internal variables.
- In CartPole, the **state** is represented by:
    - `cart position`: Where the cart is on the track.
    - `cart velocity`: How fast the cart is moving.
    - `pole angle`: The current tilt of the pole.
    - `pole angular velocity`: How fast the pole is rotating.

    Together, these values form the complete **state** of the CartPole system at a time step, and the agent uses them to determine how to act.

---

### 4. **Actions**
- The **action** is what the agent decides to do in response to the current state. The action changes the environment and moves it to a new state.
- In the CartPole environment, the agent has two possible actions:
    - Move the cart to the **left**.
    - Move the cart to the **right**.
    
    Based on the observations (e.g., if the pole is tilting to the right), the agent might push the cart to the right so that the cart moves back underneath the pole.

---

### 5. **Rewards**
- The **reward** is the feedback from the environment that indicates how well the agent is performing its task. The agent’s goal is to maximize the total reward it receives over time.
- In CartPole, the agent receives a reward of **+1** for every time step that the pole remains upright. If the pole falls (or the cart moves too far off the track), the episode ends, and no more rewards are given.
    
    The longer the pole is balanced, the higher the total reward the agent earns. The agent’s goal is to take actions that keep the pole balanced for as long as possible.

---

### 6. **The RL Paradigm**
In RL, the agent and environment interact in a loop:

<img src="https://miro.medium.com/v2/resize:fit:908/1*the1cXDp1idTpZEvv1piAQ.png" alt="RL loop img">

1. **Agent observes** the environment’s current state.
2. **Agent selects** an action based on its observation.
3. **Environment responds** by transitioning to a new state and providing a reward.
4. **Agent learns** by updating its decision-making strategy based on the reward and the new state.

The agent repeats this process, gradually improving its actions to maximize the reward over time. In the case of CartPole, the agent’s task is to learn how to keep the pole balanced by repeatedly observing the system, taking corrective actions (moving left or right), and receiving rewards based on how long it can keep the pole upright.
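
The loop above can be sketched without any RL library. The following is a minimal sketch, assuming a hypothetical `CorridorEnv` (not part of Gymnasium) whose `reset` and `step` methods only mimic the shape of Gymnasium's API. The agent here acts randomly and does not learn, so step 4 of the loop is left out.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk along a corridor of `length` cells.

    The agent starts in cell 0 and the episode terminates at the last cell.
    """

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}  # (observation, info)

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.length - 1)
        self.steps += 1
        terminated = self.position == self.length - 1  # goal reached
        truncated = self.steps >= self.max_steps       # time limit hit
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, rng):
    """Run one episode with a random policy and return (total_reward, steps)."""
    observation, info = env.reset()      # agent observes the initial state
    total_reward = 0.0
    while True:
        action = rng.choice([0, 1])      # agent selects an action
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward           # environment responds with a reward
        if terminated or truncated:
            return total_reward, env.steps


total, steps = run_episode(CorridorEnv(), random.Random(0))
print(f"episode finished after {steps} steps with return {total}")
```

A learning agent would replace `rng.choice` with a policy that is updated from the observed rewards.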




## Basic Example: CartPole Environment

In [None]:
import gymnasium as gym


In [None]:
# Create an environment (e.g., CartPole)
env = wrap_env(gym.make('CartPole-v1',render_mode="rgb_array"))

# Reset the environment (returns the first observation and an info dict)
state, info = env.reset()

# Start the recorder (utility for displaying output)
env.start_video_recorder()

# Example of an interaction loop
for _ in range(1000):
    # Render the environment
    # env.render()

    # Sample random action from action space
    action = env.action_space.sample()

    # Step through the environment using the action
    next_state, reward, terminated, truncated, info = env.step(action)

    # Break the loop if the episode has ended (terminal state or time limit)
    if terminated or truncated:
        break


# close the video recorder(utility for displaying output)
env.close_video_recorder()

# Close the environment
env.close()

Moviepy - Building video /content/video/rl-video-episode-0.mp4.
Moviepy - Writing video /content/video/rl-video-episode-0.mp4

Moviepy - Done !
Moviepy - video ready /content/video/rl-video-episode-0.mp4


In [None]:
# Display Output
show_video()

### Here’s the line-by-line explanation:


```python
import gymnasium as gym
```
- **Explanation**: This line imports the Gymnasium library, which provides environments for reinforcement learning tasks. We import it using the alias `gym`, which is standard practice for ease of use.

---

```python
env = gym.make('CartPole-v1')
```
- **Explanation**: The `gym.make()` function creates a specific environment, in this case, `'CartPole-v1'`. This environment involves balancing a pole on a cart, a classical control problem in reinforcement learning. It initializes the environment object, which we store in the `env` variable.

---

```python
state, info = env.reset()
```
- **Explanation**: The `reset()` method resets the environment to an initial state and returns a tuple containing the first observation and an `info` dictionary. It is called to start or restart an episode.

---

```python
for _ in range(1000):
```
- **Explanation**: This line starts a loop that runs for 1000 iterations. The underscore `_` is used when the loop variable isn’t important or needed. In reinforcement learning, this loop represents time steps during which the agent interacts with the environment.

---

```python
env.render()
```
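- **Explanation**: The `render()` method produces a visual representation of the environment. In Gymnasium the render mode is chosen when the environment is created: `render_mode="human"` typically opens a graphical window, while `render_mode="rgb_array"` returns frames as arrays. The call is commented out in the example because the video recorder collects the frames.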

---

```python
action = env.action_space.sample()
```
- **Explanation**: The `action_space.sample()` method randomly selects an action from the action space of the environment. In this example, it picks a random action that the agent will take. The `action_space` defines all possible actions the agent can take.

---

```python
next_state, reward, terminated, truncated, info = env.step(action)
```
- **Explanation**: The `step()` method is where the agent interacts with the environment by taking an action. In Gymnasium it returns five values:
    - `next_state`: The observation of the environment after taking the action.
    - `reward`: The reward the agent receives for taking the action.
    - `terminated`: A boolean indicating that the episode reached a terminal state (the task succeeded or failed).
    - `truncated`: A boolean indicating that the episode was cut off from outside, typically by a time limit.
    - `info`: Additional information from the environment, often used for debugging purposes.

---

```python
if terminated or truncated:
    break
```
- **Explanation**: This condition checks whether the episode has ended, either because a terminal state was reached or because the time limit was hit. If so, the loop breaks, which prevents stepping an environment whose episode is already over.

---

```python
env.close()
```
- **Explanation**: The `close()` method closes the environment properly, releasing any resources, such as windows or processes, that were used for rendering the environment. It’s good practice to close environments after they are no longer needed.




## CartPole Implementation in Gymnasium

### **Action Space**
The action is an `ndarray` with shape `(1,)` which can take values `{0, 1}` indicating the direction of the fixed force the cart is pushed with.

* 0: Push cart to the left

* 1: Push cart to the right

Note: The velocity that is reduced or increased by the applied force is not fixed and it depends on the angle the pole is pointing. The center of gravity of the pole varies the amount of energy needed to move the cart underneath it.

### **Observation Space**

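The observation is an `ndarray` with shape `(4,)` whose values correspond to the following positions and velocities:

| Num | Observation           | Min                 | Max               |
|-----|-----------------------|---------------------|-------------------|
| 0   | Cart Position         | -4.8                | 4.8               |
| 1   | Cart Velocity         | -Inf                | Inf               |
| 2   | Pole Angle            | ~ -0.418 rad (-24°) | ~ 0.418 rad (24°) |
| 3   | Pole Angular Velocity | -Inf                | Inf               |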

Note: While the ranges above denote the possible values for observation space of each element, it is not reflective of the allowed values of the state space in an unterminated episode. Particularly:

The cart x-position (index 0) can take values between `(-4.8, 4.8)`, but the episode terminates if the cart leaves the `(-2.4, 2.4)` range.

The pole angle (index 2) can be observed between `(-.418, .418)` radians (or ±24°), but the episode terminates if the pole angle is not in the range `(-.2095, .2095)` (or ±12°).

The **observation space** in the CartPole environment is represented as a NumPy `ndarray` with a shape of `(4,)`. This means that it is a one-dimensional array containing 4 elements, each representing a specific observation related to the state of the system.

The structure of the array will look like this:

```python
observation = np.array([cart_position, cart_velocity, pole_angle, pole_angular_velocity])
```

### Each element in the `ndarray`:
1. **Cart Position**:
   - Describes the position of the cart along the track.
   - Ranges from `-4.8` to `4.8`.

2. **Cart Velocity**:
   - Represents the speed of the cart.
   - No fixed bounds (`-inf` to `inf`), but its value will vary based on how fast the cart is moving.

3. **Pole Angle**:
   - Describes the tilt of the pole relative to being perfectly vertical.
   - Ranges approximately between `-0.418` radians (around `-24°`) and `0.418` radians (around `24°`).

4. **Pole Angular Velocity**:
   - The rate at which the pole is tilting or rotating.
   - No fixed bounds (`-inf` to `inf`), but it changes depending on how quickly the pole is falling or being corrected.

#### Example `ndarray` structure:
Suppose the environment gives the following values for each observation:
- Cart position: `1.2`
- Cart velocity: `0.5`
- Pole angle: `0.1` radians
- Pole angular velocity: `-0.02`

The `ndarray` representing this observation would look like:
```python
observation = np.array([1.2, 0.5, 0.1, -0.02])
```

This array fully describes the current state of the CartPole environment at a specific point in time, and the agent will use this information to decide its next action.
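
To make the layout concrete, the four entries can be given names. The following is a minimal sketch, assuming a hypothetical `CartPoleObservation` helper (not part of Gymnasium) and a plain Python list in place of the NumPy array:

```python
import math
from typing import NamedTuple


class CartPoleObservation(NamedTuple):
    """Hypothetical helper that names the four entries of a CartPole observation."""
    cart_position: float
    cart_velocity: float
    pole_angle: float              # radians
    pole_angular_velocity: float   # radians per second


# The example values from above, in index order 0..3
obs = CartPoleObservation(*[1.2, 0.5, 0.1, -0.02])

print(obs.cart_position)                        # 1.2
print(round(math.degrees(obs.pole_angle), 1))   # 5.7 (degrees)
```

Index access still works (`obs[2]` is the pole angle), so the helper only adds readable names on top of the positional layout.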

### **Rewards**
Since the goal is to keep the pole upright for as long as possible, a reward of `+1` for every step taken, including the termination step, is allotted. The threshold for rewards is 500 for v1 and 200 for v0.

### **Episode End**
The episode ends if any one of the following occurs:

* Termination: Pole Angle is greater than ±12°

* Termination: Cart Position is greater than ±2.4 (center of the cart reaches the edge of the display)

* Truncation: Episode length is greater than 500 (200 for v0)
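
The three conditions can be written as a small check. This is a sketch under stated assumptions, not Gymnasium's actual implementation: the helper `episode_status` is hypothetical, the thresholds are taken from the list above, and truncation is assumed to trigger once the step count reaches the limit.

```python
import math

X_THRESHOLD = 2.4                       # cart position limit
THETA_THRESHOLD = 12 * math.pi / 180    # 12 degrees in radians (about 0.2095)
MAX_STEPS = 500                         # time limit for CartPole-v1 (200 for v0)


def episode_status(cart_position, pole_angle, step_count):
    """Return (terminated, truncated) for the rules listed above."""
    terminated = (
        abs(cart_position) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD
    )
    truncated = step_count >= MAX_STEPS  # assumption: limit reached counts as truncation
    return terminated, truncated


print(episode_status(0.0, 0.0, 10))    # (False, False): episode continues
print(episode_status(2.5, 0.0, 10))    # (True, False): cart left the track
print(episode_status(0.0, 0.0, 500))   # (False, True): time limit reached
```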

## Creating a Custom Gymnasium Environment

In [None]:
!pip install jdc

Collecting jdc
  Downloading jdc-0.0.9-py2.py3-none-any.whl.metadata (817 bytes)
Downloading jdc-0.0.9-py2.py3-none-any.whl (2.1 kB)
Installing collected packages: jdc
Successfully installed jdc-0.0.9


In [None]:
import numpy as np
import pygame

import gymnasium as gym
from gymnasium import spaces
import jdc

### Declaration and Initialization

Our custom environment will be based on the abstract class `gymnasium.Env`. Don't forget to add the `metadata` attribute to your class. In this, you'll specify the render modes your environment supports (like "human", "rgb_array", or "ansi") and the framerate for rendering. Every environment must support `None` as a render mode, but you don’t need to add it to `metadata`. For example, in our `GridWorldEnv`, we’ll support “rgb_array” and “human” modes and render at 4 frames per second (FPS).

The `__init__` method of the environment will take an integer `size`, which determines the grid's size. Inside `__init__`, we’ll set up variables for rendering and define the environment’s `observation_space` and `action_space`. In our case, the observation will give the location of both the agent and the target on a 2D grid. We’ll represent observations as dictionaries with keys "agent" and "target". For example, an observation could look like `{"agent": array([1, 0]), "target": array([0, 3])}`. Since there are 4 possible actions (“right”, “up”, “left”, “down”), we’ll use `Discrete(4)` for the action space. Here’s how the `GridWorldEnv` class and the `__init__` method are implemented:

In [None]:
class GridWorldEnv(gym.Env):
    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 4}
    # super().reset()

    def __init__(self, render_mode=None, size=5, perc_num_obstacle = 30 ):

        self.size = size  # The size of the square grid
        self.window_size = 512  # The size of the PyGame window
        self.perc_num_obstacle = perc_num_obstacle
        self._num_obstacles = int((self.perc_num_obstacle/100)*(self.size*self.size))

        # Observations are dictionaries with the agent's and the target's location.
        # Each location is encoded as an element of {0, ..., `size` - 1}^2,
        # represented here as an integer Box of shape (2,).

        self.observation_space = self._make_observation_space() # implement self._make_observation_space()

        # We have 4 actions, corresponding to "right", "up", "left", "down"
        self.action_space = self._make_action_space() # implement self._make_action_space()

        self._action_to_direction = self._action_to_direction() # implement self._action_to_direction()

        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode

        """
        If human-rendering is used, `self.window` will be a reference
        to the window that we draw to. `self.clock` will be a clock that is used
        to ensure that the environment is rendered at the correct framerate in
        human-mode. They will remain `None` until human-mode is used for the
        first time.
        """
        self.window = None
        self.clock = None

In [None]:
%%add_to GridWorldEnv
def _make_observation_space(self):
    observation_space = spaces.Dict(
        {
            "agent": spaces.Box(0, self.size - 1, shape=(2,), dtype=int),
            "target": spaces.Box(0, self.size - 1, shape=(2,), dtype=int),
        }
    )
    return observation_space


In [None]:
%%add_to GridWorldEnv
def _make_action_space(self):
    action_space = spaces.Discrete(4)
    return action_space

In [None]:
%%add_to GridWorldEnv
def _action_to_direction(self):
    """
        The following dictionary maps abstract actions from `self.action_space` to
        the direction we will walk in if that action is taken.
        I.e. 0 corresponds to "right", 1 to "up" etc.
    """
    action_to_direction = {
            0: np.array([1, 0]),
            1: np.array([0, 1]),
            2: np.array([-1, 0]),
            3: np.array([0, -1]),}

    return action_to_direction

### Constructing Observations From Environment States

Since we will need to compute observations both in `reset` and `step`, it is often convenient to have a (private) method `_get_obs` that translates the environment’s state into an observation.

In [None]:
%%add_to GridWorldEnv
def _get_obs(self):
    return {"agent": self._agent_location, "target": self._target_location}

We can also implement a similar method for the auxiliary information that is returned by `step` and `reset`. In our case, we would like to provide the Manhattan distance between the agent and the target:

In [None]:
%%add_to GridWorldEnv
def _get_info(self):
    return {
        "distance": np.linalg.norm(
            self._agent_location - self._target_location, ord=1
        )
    }
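
The Manhattan (L1) distance is the sum of the absolute coordinate differences, which is what `np.linalg.norm(..., ord=1)` computes for a vector. A minimal pure-Python sketch, using a hypothetical `manhattan_distance` helper and the example observation from earlier in this section:

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two grid cells."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Agent at (1, 0), target at (0, 3): |1 - 0| + |0 - 3| = 4
print(manhattan_distance((1, 0), (0, 3)))  # 4
```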

Oftentimes, info will also contain some data that is only available inside the step method (e.g. individual reward terms). In that case, we would have to update the dictionary that is returned by `_get_info` in `step`.



### Reset

The `reset` method is used to start a new episode. You can assume that the `step` method won't be called before `reset` has been called. Additionally, `reset` should be called whenever an episode ends (that is, when `step` reports `terminated` or `truncated`) to restart the environment. The `reset` method may accept a `seed` parameter, which allows users to set the random number generator to a predictable (deterministic) state. It’s recommended to use the random number generator `self.np_random` provided by the environment’s base class, `gymnasium.Env`.

If you only use this random generator, seeding is straightforward. However, you must remember to call `super().reset(seed=seed)` to ensure the base class correctly seeds the RNG. After that, we can randomly set the environment's state. In our case, this involves randomly selecting both the agent's starting position and the target's location, ensuring that the target doesn’t overlap with the agent’s position.

The `reset` method should return a tuple containing the initial observation and any auxiliary information. To create these, we can use the `_get_obs` and `_get_info` methods we implemented earlier.

In [None]:
%%add_to GridWorldEnv
def reset(self, seed=None, options=None,start_location="random",target_location="random"):
    # We need the following line to seed self.np_random.
    # The explicit two-argument form is used because zero-argument super()
    # only works for functions defined inside the class body.
    super(GridWorldEnv, self).reset(seed=seed)

    # Choose the agent's location uniformly at random
    self._start_location = self._initialize_start_location(loc=start_location) # implement _initialize_start_location()

    # We will sample the target's location randomly until it does not coincide with the agent's location
    self._target_location = self._initialize_target_location(loc=target_location) # implement _initialize_target_location()
    while np.array_equal(self._start_location, self._target_location):
        self._target_location = self._initialize_target_location(loc=target_location)

    self._agent_location = self._start_location

    self._obstacle_location = self._initialize_obstacle() # implement _initialize_obstacle()

    observation = self._get_obs()
    info = self._get_info()

    if self.render_mode == "human":
        self._render_frame()

    return observation, info
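
The effect of seeding can be illustrated outside the environment. The sketch below is an assumption-laden stand-in: it uses the standard library's `random.Random` in place of `self.np_random`, and the helper `sample_start_and_target` is hypothetical. It shows the same resampling idea as `reset`, and that a fixed seed reproduces the same layout.

```python
import random


def sample_start_and_target(size, seed=None):
    """Sample distinct start and target cells on a size x size grid (size > 1)."""
    rng = random.Random(seed)  # stand-in for the environment's self.np_random
    start = (rng.randrange(size), rng.randrange(size))
    target = start
    while target == start:     # resample until the target differs from the start
        target = (rng.randrange(size), rng.randrange(size))
    return start, target


# The same seed always reproduces the same layout
first = sample_start_and_target(5, seed=42)
second = sample_start_and_target(5, seed=42)
print(first == second)  # True
```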

In [None]:
%%add_to GridWorldEnv
def _initialize_start_location(self,loc="random"):
    if loc=='random':
        start_location = self.np_random.integers(0, self.size, size=2, dtype=int)
    else:
        start_location = np.array(loc)

    return start_location

In [None]:
%%add_to GridWorldEnv
def _initialize_target_location(self,loc="random"):
    if loc=='random':
        target_location = self.np_random.integers(0, self.size, size=2, dtype=int)
    else:
        target_location = np.array(loc)

    return target_location

In [None]:
%%add_to GridWorldEnv
def _initialize_obstacle(self):
    obstacle_location = []
    for _ in range(self._num_obstacles):
        temp = self.np_random.integers(0, self.size, size=2, dtype=int)
        # Resample until the obstacle avoids both the start and the target cell.
        # The check is re-evaluated for every new sample to avoid an infinite loop.
        while np.array_equal(temp, self._start_location) or np.array_equal(
            temp, self._target_location
        ):
            temp = self.np_random.integers(0, self.size, size=2, dtype=int)
        obstacle_location.append(temp)
    return obstacle_location

### Step

The `step` method contains most of the logic for your environment. It takes an action, calculates the new state of the environment based on that action, and returns a 5-tuple: `(observation, reward, terminated, truncated, info)`. You can check out `gymnasium.Env.step()` for more details.

After computing the new state, we check whether it is a terminal state and set `terminated` accordingly. In our `GridWorldEnv` the reward is sparse: `+1` for reaching the target, `-1` for stepping onto an obstacle, and `0` otherwise, so calculating it is straightforward once the agent's new location is known. To get the observation and info, we can use the `_get_obs` and `_get_info` methods, just like before.

In [None]:
%%add_to GridWorldEnv
def _reward_system(self,agent_location):
    # Use the location passed in by the caller rather than ignoring the argument
    agent_win = np.array_equal(agent_location, self._target_location)
    obstacle_hit = any(np.array_equal(agent_location, arr) for arr in self._obstacle_location)

    # reward logic
    if obstacle_hit:
        reward = -1
    elif agent_win:
        reward = 1
    else:
        reward = 0

    return reward

In [None]:
%%add_to GridWorldEnv
def step(self, action):
    # Map the action (element of {0,1,2,3}) to the direction we walk in
    direction = self._action_to_direction[action]
    # We use `np.clip` to make sure we don't leave the grid
    self._agent_location = np.clip(
        self._agent_location + direction, 0, self.size - 1
    )

    # An episode is done iff the agent has reached the target
    terminated = np.array_equal(self._agent_location, self._target_location)

    reward = self._reward_system(self._agent_location) # implement _reward_system()

    observation = self._get_obs()

    info = self._get_info()

    if self.render_mode == "human":
        self._render_frame()

    return observation, reward, terminated, False, info
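
The movement rule in `step` can be checked in isolation. This is a minimal pure-Python sketch, assuming a hypothetical `move` helper that clamps each coordinate the way `np.clip` does above; the action-to-direction mapping is the one defined earlier.

```python
# Same mapping as in the environment: 0 right, 1 up, 2 left, 3 down
ACTION_TO_DIRECTION = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def move(location, action, size):
    """Apply an action and clamp the result to the grid [0, size - 1]."""
    dx, dy = ACTION_TO_DIRECTION[action]
    x = min(max(location[0] + dx, 0), size - 1)
    y = min(max(location[1] + dy, 0), size - 1)
    return (x, y)


print(move((4, 2), 0, size=5))  # (4, 2): moving right at the right edge changes nothing
print(move((1, 0), 2, size=5))  # (0, 0)
```

Because of the clamping, an action that would leave the grid keeps the agent in place rather than ending the episode.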

### Render

Here, we are using PyGame for rendering. A similar approach to rendering is used in many environments that are included with Gymnasium and you can use it as a skeleton for your own environments:

In [None]:
%%add_to GridWorldEnv
def render(self):
    if self.render_mode == "rgb_array":
            return self._render_frame()

def _render_frame(self):
    if self.window is None and self.render_mode == "human":
        pygame.init()
        pygame.display.init()
        self.window = pygame.display.set_mode(
            (self.window_size, self.window_size)
        )
    if self.clock is None and self.render_mode == "human":
        self.clock = pygame.time.Clock()

    canvas = pygame.Surface((self.window_size, self.window_size))
    canvas.fill((255, 255, 255))
    pix_square_size = (
        self.window_size / self.size
    )  # The size of a single grid square in pixels

    # First we draw the target
    pygame.draw.rect(
        canvas,
        (0, 255, 0),
        pygame.Rect(
            pix_square_size * self._target_location,
            (pix_square_size, pix_square_size),
        ),
    )

    # Then we draw the start
    pygame.draw.rect(
        canvas,
        (255, 0, 0),
        pygame.Rect(
            pix_square_size * self._start_location,
            (pix_square_size, pix_square_size),
        ),
    )

    # Next we draw the obstacles
    for i in range(self._num_obstacles):
        pygame.draw.rect(
            canvas,
            (0, 0, 0),
            pygame.Rect( # Rect(left, top, width, height) -> Rect
                pix_square_size * self._obstacle_location[i], #loc
                (pix_square_size, pix_square_size), # size
            ),
        )

    # Now we draw the agent
    pygame.draw.circle(
        canvas,
        (0, 0, 255),
        (self._agent_location + 0.5) * pix_square_size,
        pix_square_size / 3,
    )

    # Finally, add some gridlines
    for x in range(self.size + 1):
        pygame.draw.line(
            canvas,
            0,
            (0, pix_square_size * x),
            (self.window_size, pix_square_size * x),
            width=3,
        )
        pygame.draw.line(
            canvas,
            0,
            (pix_square_size * x, 0),
            (pix_square_size * x, self.window_size),
            width=3,
        )

    if self.render_mode == "human":
        # The following line copies our drawings from `canvas` to the visible window
        self.window.blit(canvas, canvas.get_rect())
        pygame.event.pump()
        pygame.display.update()

        # We need to ensure that human-rendering occurs at the predefined framerate.
        # The following line will automatically add a delay to keep the framerate stable.
        self.clock.tick(self.metadata["render_fps"])
    else:  # rgb_array
        return np.transpose(
            np.array(pygame.surfarray.pixels3d(canvas)), axes=(1, 0, 2)
        )
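
The drawing code maps grid cells to pixel rectangles by multiplying the cell coordinates by `pix_square_size`. A small sketch of that arithmetic, using a hypothetical `cell_to_rect` helper and a 4x4 grid (chosen only so the pixel values come out exact):

```python
def cell_to_rect(cell, window_size=512, grid_size=4):
    """Return (left, top, width, height) in pixels for a grid cell (column, row)."""
    pix_square_size = window_size / grid_size  # size of one grid square in pixels
    left = pix_square_size * cell[0]
    top = pix_square_size * cell[1]
    return (left, top, pix_square_size, pix_square_size)


print(cell_to_rect((0, 0)))  # (0.0, 0.0, 128.0, 128.0)
print(cell_to_rect((1, 3)))  # (128.0, 384.0, 128.0, 128.0)
```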


### Close

The `close` method should close any open resources that were used by the environment. In many cases, you don’t actually have to bother to implement this method. However, in our example `render_mode` may be `"human"` and we might need to close the window that has been opened:

In [None]:
%%add_to GridWorldEnv
def close(self):
    if self.window is not None:
        pygame.display.quit()
        pygame.quit()

In other environments `close` might also close files that were opened or release other resources. You shouldn’t interact with the environment after having called `close`.
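
One way to make sure `close` is always called is to wrap the environment in `contextlib.closing` from the standard library. The sketch below uses a hypothetical `DummyEnv` in place of a real environment; only the presence of a `close()` method is assumed.

```python
from contextlib import closing


class DummyEnv:
    """Hypothetical stand-in for an environment that owns a resource."""

    def __init__(self):
        self.closed = False

    def step(self, action):
        if self.closed:
            raise RuntimeError("environment used after close()")
        return action

    def close(self):
        self.closed = True


env = DummyEnv()
with closing(env):   # close() runs on exit, even if the body raises
    env.step(0)

print(env.closed)  # True
```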

### Executing Custom Environment



In [None]:
import gymnasium as gym

# Create the custom GridWorld environment
env = wrap_env(GridWorldEnv(render_mode="rgb_array", size=10, perc_num_obstacle=30))

# Reset the environment
state,_ = env.reset()

# Start the recorder (utility for displaying output)
env.start_video_recorder()

# Example of an interaction loop
for _ in range(100):
    # Render the environment
    # env.render()

    # Sample random action from action space
    action = env.action_space.sample()

    # Step through the environment using the action
    next_state, reward, terminated, truncated, info = env.step(action)

    # Break the loop if the episode has ended (target reached or time limit)
    if terminated or truncated:
        break


# close the video recorder(utility for displaying output)
env.close_video_recorder()

# Close the environment
env.close()

Moviepy - Building video /content/video/rl-video-episode-0.mp4.
Moviepy - Writing video /content/video/rl-video-episode-0.mp4

Moviepy - Done !
Moviepy - video ready /content/video/rl-video-episode-0.mp4


In [None]:
# Display Output
show_video()

## References



1.   https://github.com/Farama-Foundation/Gymnasium



