# **Reinforcement Learning**
<img align="right" src="https://vitalflux.com/wp-content/uploads/2020/12/Reinforcement-learning-real-world-example.png">

- In reinforcement learning, an agent learns how to act in an environment through trial and error: it takes actions, observes what happens (the next state and a reward), and adjusts its behavior to collect more reward over time
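To make "learning by doing and watching what happens" concrete, here is a minimal sketch that does not use Gymnasium. It assumes a hypothetical two-armed bandit (two slot machines with win probabilities 0.2 and 0.8, chosen only for illustration) and an epsilon-greedy agent. The agent starts knowing nothing, tries arms, observes rewards, and keeps a running average of the reward for each arm.

```python
import random

def run_bandit(arm_probs, n_steps=2000, epsilon=0.1, seed=0):
    """Learn arm values by trial and error with an epsilon-greedy policy."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms    # how many times each arm was pulled
    values = [0.0] * n_arms  # running average reward per arm
    for _ in range(n_steps):
        # Explore with probability epsilon, otherwise exploit the best current estimate
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])
        # "Do stuff": pull the arm. "Watch what happens": observe a 0/1 reward
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values, counts

values, counts = run_bandit([0.2, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print(f"estimated values: {[round(v, 2) for v in values]}, best arm: {best_arm}")
```

After enough steps the estimates approach the true win probabilities and the agent mostly pulls the better arm. The environments below are richer, but the act/observe/update cycle is the same.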

If you need the latest version of gym, use the code block below:
```
!pip uninstall gym -y
!pip install gym
```

In [None]:
# !pip install -U gym==0.25.2
!pip install swig
!pip install gymnasium[atari]
!pip install gymnasium[box2d]
!pip install gymnasium[accept-rom-license]
# !pip install autorom[accept-rom-license]

In [None]:
# !pip install --upgrade IPython

In [4]:
import numpy as np
import matplotlib.pyplot as plt
import gymnasium as gym
from IPython.display import HTML
from base64 import b64encode
# Import the wrappers from gymnasium (the `gym` alias above does not apply to `from ... import`)
from gymnasium.wrappers import RecordVideo, RecordEpisodeStatistics
import torch

In [5]:
# Turn off warning messages in the Jupyter notebook
import warnings
warnings.filterwarnings('ignore')

In [6]:
# Check gym version
gym.__version__

'1.2.0'

## **Create environment**
<img width="600" align="right" src="https://gymnasium.farama.org/_images/AE_loop_dark.png">

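We use Gymnasium, the maintained successor of OpenAI's gym, to create our environments.
- First, define two functions:
    1. create an environment and save videos of it: `create_env`
    2. display a saved video: `display_video`

- When this notebook was created, the latest gym version was `0.26.2`, but the version running on Colab was `0.25.2`, which is what I worked with. The cells here import Gymnasium, whose version (checked above) is `1.2.0`.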
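The figure above shows the agent-environment loop. In Gymnasium, `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. The sketch below mimics those return values with a hypothetical `CoinFlipEnv` class (not part of Gymnasium; the real `reset` also accepts `seed` and `options` keyword arguments) so the loop can be run without installing anything.

```python
import random

class CoinFlipEnv:
    """Hypothetical toy environment mimicking Gymnasium's reset/step return values.

    The agent guesses a coin flip (action 0 or 1) and gets reward 1 for a correct guess.
    Episodes never terminate on their own; they are truncated after max_steps steps.
    """

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        observation = 0  # nothing useful to observe in this toy task
        info = {}
        return observation, info  # Gymnasium-style: (obs, info)

    def step(self, action):
        self.steps += 1
        coin = self.rng.randrange(2)
        reward = 1.0 if action == coin else 0.0
        terminated = False                        # no natural terminal state
        truncated = self.steps >= self.max_steps  # time limit reached
        return coin, reward, terminated, truncated, {}

# The agent-environment loop: act, observe, repeat until the episode ends
env = CoinFlipEnv(max_steps=10, seed=42)
obs, info = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.randrange(2)  # random policy, like env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"episode length: {env.steps}, total reward: {total_reward}")
```

Treating both `terminated` and `truncated` as the end of an episode is what lets the loop stop on a time limit as well as on a natural ending.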

<br>

To record videos on macOS or Linux, we need the `ffmpeg` package:
- macOS: `brew install ffmpeg`
- Linux: `apt install ffmpeg`

**Note**: on Linux, first run `apt update && apt upgrade -y`
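Before recording, it can help to confirm that an `ffmpeg` executable is actually visible to the notebook. A minimal check using only the standard library (the helper name `ffmpeg_available` is hypothetical):

```python
import shutil

def ffmpeg_available():
    """Return True if an `ffmpeg` executable is found on the system PATH."""
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("ffmpeg found on PATH")
else:
    print("ffmpeg not found on PATH: install it with the commands above")
```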

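In older versions of gym, recording was enabled by wrapping the environment instance after creating it:
```
video_dir = "./video/"
env = gym.wrappers.Monitor(env, video_dir)
```
`Monitor` is not available in Gymnasium; `create_env` below uses the `RecordVideo` wrapper instead.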
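Old and new versions also differ in what `step()` returns. Before gym 0.26 it returned four values, `(obs, reward, done, info)`; gym 0.26+ and Gymnasium return five, `(obs, reward, terminated, truncated, info)`. If you have to run older notebook code against either version, a small hypothetical helper like this one can normalize the result:

```python
def unpack_step(result):
    """Normalize the return value of env.step() across gym/Gymnasium versions.

    Old gym API (4 values): (obs, reward, done, info)
    New API, gym >= 0.26 and Gymnasium (5 values): (obs, reward, terminated, truncated, info)
    Returns (obs, reward, done, info), where done = terminated or truncated.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    if len(result) == 4:
        obs, reward, done, info = result
        return obs, reward, done, info
    raise ValueError(f"unexpected step() result with {len(result)} elements")

# Usage with both tuple shapes (dummy values for illustration)
print(unpack_step((0, 1.0, False, True, {})))  # -> (0, 1.0, True, {})
print(unpack_step((0, 1.0, False, {})))        # -> (0, 1.0, False, {})
```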


In [11]:
def display_video(episode=0, video_width=600):
    """
    Displays a video from a specified episode with customizable width.

    Args:
        episode (int): The episode number to load the video for. Defaults to 0.
        video_width (int): The width of the video player in pixels. Defaults to 600.

    Returns:
        IPython.display.HTML: An HTML video element that can be rendered in Jupyter notebooks.

    Note:
        - The function expects video files to be in './video/' directory with naming format 'rl-video-episode-{N}.mp4'
        - Videos are base64 encoded and embedded directly in the HTML for display
    """
    # Construct the path to the video file based on episode number
    video_path = f"./video/rl-video-episode-{episode}.mp4"

    # Read the video file as binary data (the context manager closes the file)
    with open(video_path, "rb") as f:
        video_file = f.read()

    # Encode the binary video data as base64 string
    decoded = b64encode(video_file).decode()

    # Create a data URL for the video
    video_url = f"data:video/mp4;base64,{decoded}"

    # Return an HTML video element with the embedded video
    return HTML(f"""<video width="{video_width}" controls><source src="{video_url}" type="video/mp4"></video>""")
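`display_video` needs an episode number for which a file actually exists. Assuming the `rl-video-episode-{N}.mp4` naming described in its docstring, a hypothetical helper can list the available episode numbers; the filename parsing is kept in a separate pure function so it is easy to check:

```python
import os
import re

VIDEO_NAME = re.compile(r"^rl-video-episode-(\d+)\.mp4$")

def episodes_from_filenames(filenames):
    """Extract sorted episode numbers from names like 'rl-video-episode-3.mp4'."""
    episodes = []
    for name in filenames:
        match = VIDEO_NAME.match(name)
        if match:
            episodes.append(int(match.group(1)))
    return sorted(episodes)

def list_recorded_episodes(video_folder="./video"):
    """Return the episode numbers with a saved video in video_folder ([] if it is missing)."""
    if not os.path.isdir(video_folder):
        return []
    return episodes_from_filenames(os.listdir(video_folder))

# Episode numbers you can pass to display_video
print(list_recorded_episodes())
```

Non-video files in the folder (for example metadata files) are ignored because the pattern only matches `.mp4` names.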

In [8]:
def create_env(name, render_mode="rgb_array", record=False, eps_record=50, video_folder='./video'):
    """
    Creates and configures a Gym environment with optional video recording and statistics tracking.

    Args:
        name (str): Name of the Gym environment to create (e.g., 'CartPole-v1')
        render_mode (str): Rendering mode - "human", "rgb_array", or "ansi". Defaults to "rgb_array"
        record (bool): Whether to record videos of the environment. Defaults to False
        eps_record (int): Record a video every N episodes (when record=True). Defaults to 50
        video_folder (str): Directory to save recorded videos. Defaults to './video'

    Returns:
        gym.Env: Configured Gym environment wrapped with recording and statistics tracking

    Note:
        - When record=True, videos will be saved in the specified folder with automatic naming
        - The environment is always wrapped with episode statistics tracking
    """
    # Create base Gym environment with specified render mode
    env = gym.make(name, render_mode=render_mode)

    # Optionally wrap environment with video recorder
    if record:
        # Record video every eps_record episodes (trigger function)
        env = RecordVideo(env, video_folder=video_folder,
                         episode_trigger=lambda x: x % eps_record == 0)

    # Always wrap environment with episode statistics tracker
    env = RecordEpisodeStatistics(env)

    return env
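`create_env` passes `episode_trigger=lambda x: x % eps_record == 0` to `RecordVideo`, so a video is saved for every episode index divisible by `eps_record`. The plain-Python sketch below lists which episodes that selects, assuming episodes are numbered from 0 (as the default `rl-video-episode-0.mp4` file in `display_video` suggests); `recorded_episodes` is a hypothetical helper, not part of Gymnasium.

```python
def recorded_episodes(n_episodes, eps_record):
    """Episode indices selected by create_env's trigger (x % eps_record == 0)."""
    def episode_trigger(x):
        return x % eps_record == 0
    return [episode for episode in range(n_episodes) if episode_trigger(episode)]

print(recorded_episodes(10, 1))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(recorded_episodes(200, 50))  # [0, 50, 100, 150]
```

With the default `eps_record=50`, a 200-episode run saves only four videos, which keeps disk usage and recording overhead low; `eps_record=1` records everything.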

### **Make our first games and see how they work**

#### **Lunar Lander**
<img align="center" width="400" src="https://www.gymlibrary.dev/_images/lunar_lander.gif">

In [9]:
!rm -r ./video
env = create_env("LunarLander-v3", "rgb_array", record=True, eps_record=1)
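for episode in range(10):
    done = False
    env.reset()
    while not done:
        action = env.action_space.sample()  # random action
        # Gymnasium's step() returns 5 values; the episode ends on termination or truncation
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
    if episode % 5 == 0 and episode > 0:
        print(f"- Game completed at episode: {episode}")
env.close()  # finish writing the last recorded video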

rm: cannot remove '/content/video': No such file or directory
- Game completed at episode: 5


In [12]:
display_video(0)

#### **Space Invaders**
<img align="center" width="200" src="https://www.gymlibrary.dev/_images/space_invaders.gif">

In [14]:
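import ale_py

# Importing ale_py registers the Atari "ALE/..." environments with Gymnasium;
# register_envs makes that dependency explicit
gym.register_envs(ale_py)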

In [15]:
!rm -r ./video
env = create_env("ALE/SpaceInvaders-v5", 'rgb_array', record=True, eps_record=1)
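for episode in range(10):
    done = False
    env.reset()
    while not done:
        action = env.action_space.sample()  # random action
        # Gymnasium's step() returns 5 values; the episode ends on termination or truncation
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
    if episode % 5 == 0 and episode > 0:
        print(f"- Game completed at episode: {episode}")
env.close()  # finish writing the last recorded video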

rm: cannot remove '/content/video': No such file or directory
- Game completed at episode: 5


In [16]:
display_video(8, 400)

#### **CartPole**
<img align="center" width="300" src="https://www.gymlibrary.dev/_images/cart_pole.gif">

In [21]:
!rm -r ./video
env = create_env("CartPole-v1", 'rgb_array', record=True, eps_record=1)
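for episode in range(10):
    done = False
    env.reset()
    while not done:
        action = env.action_space.sample()  # random action
        # Gymnasium's step() returns 5 values; the episode ends on termination or truncation
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
    if episode % 5 == 0 and episode > 0:
        print(f"- Game completed at episode: {episode}")
env.close()  # finish writing the last recorded video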

rm: cannot remove './video': No such file or directory
- Game completed at episode: 5


In [22]:
display_video(1)

### **How well does the agent perform?**
We can simulate many episodes and average the total reward collected per episode. **The average total reward** tells us how well an agent that takes random actions performs, which gives a baseline to compare trained agents against.

In [None]:
!rm -r /content/video
env = create_env("ALE/SpaceInvaders-v5", 'rgb_array', record=True, eps_record=1)
total_rewards = []
n_episodes = 100
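for episode in range(n_episodes):
    done = False
    env.reset()
    total_reward = 0
    while not done:
        action = env.action_space.sample()
        # Gymnasium's step() returns 5 values; the episode ends on termination or truncation
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        total_reward += reward

    total_rewards.append(total_reward)
    if episode % 20 == 0 and episode > 0:
        print(f"- Game completed at episode: {episode}")
env.close()  # finish writing the last recorded video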

- Game completed at episode: 20
- Game completed at episode: 40
- Game completed at episode: 60
- Game completed at episode: 80


In [None]:
avg_tot_reward = sum(total_rewards) / n_episodes
print(f"Average total reward over {n_episodes} episodes: {avg_tot_reward}")

Average total reward over 100 episodes: 143.75
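The average alone does not show how much returns vary from episode to episode. As a sketch (this summary is an addition, not something computed in the runs above), the standard library can also give the sample standard deviation and the standard error of the mean; `summarize_returns` is a hypothetical helper and the numbers below are illustrative only.

```python
import math
import statistics

def summarize_returns(total_rewards):
    """Mean, sample standard deviation, and standard error of per-episode returns.

    Requires at least two episodes (the sample standard deviation needs n >= 2).
    """
    n = len(total_rewards)
    mean = statistics.mean(total_rewards)
    std = statistics.stdev(total_rewards)
    sem = std / math.sqrt(n)  # uncertainty of the mean itself
    return mean, std, sem

# Illustrative numbers only, not results from the runs above
mean, std, sem = summarize_returns([100.0, 150.0, 200.0])
print(f"mean={mean:.2f}, std={std:.2f}, sem={sem:.2f}")  # mean=150.00, std=50.00, sem=28.87
```

In the notebook you would call `summarize_returns(total_rewards)` on the list collected above; a small standard error means the reported average is a stable baseline for comparing agents.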
