# **Reinforcement Learning**
<img align="right" src="https://vitalflux.com/wp-content/uploads/2020/12/Reinforcement-learning-real-world-example.png">

- In reinforcement learning, an agent learns how to act in an environment by trial and error: it takes actions, observes the resulting states and rewards, and adjusts its behavior to collect more reward.
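
The interaction loop described above can be sketched without any library. The `ToyCorridor` environment and the `run_random_episode` helper below are hypothetical, made up for illustration only. They mimic the `reset` / `step` interface used later in this notebook (a 4-tuple of state, reward, done, info), with an agent that acts at random.

```python
import random


class ToyCorridor:
    """Hypothetical 1-D corridor: start at cell 0, finish on reaching cell `length`."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        # Put the agent back at the start and return the initial state
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left (never below cell 0)
        self.position = max(0, self.position + (1 if action == 1 else -1))
        self.steps += 1
        reached_goal = self.position >= self.length
        reward = 1.0 if reached_goal else -0.1
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_random_episode(env, rng):
    """Play one episode with random actions and return the total reward."""
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = rng.choice([0, 1])
        state, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


print(run_random_episode(ToyCorridor(), random.Random(0)))
```

A learning agent would replace `rng.choice` with a policy that is updated from the observed rewards.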

If you need the latest version of gym, use the code block below:
```
!pip uninstall gym -y
!pip install gym
```

In [None]:
# !pip install -U gym==0.25.2
!pip install gym[atari]
!pip install autorom[accept-rom-license]
!pip install swig
!pip install gym[box2d]

In [111]:
# !pip install --upgrade IPython

In [153]:
import numpy as np
import matplotlib.pyplot as plt
import gym
from IPython.display import HTML
from base64 import b64encode
from gym.wrappers import record_video, record_episode_statistics
from gym.wrappers import RecordVideo, RecordEpisodeStatistics
import torch

In [156]:
# turn off or turn on warning messages in Jupyter notebooks
import warnings
warnings.filterwarnings('ignore')

In [113]:
# Check gym version
gym.__version__

'0.25.2'

## **Create environment**
<img width="600" align="right" src="https://shirsho-12.github.io/images/rl/gym.png">

We use gym from OpenAI to create our environment.
- First, define two functions:
    1. `create_env`: create the environment and save videos of its episodes
    2. `display_video`: display a saved video

- At the time this notebook was created, the latest gym version was `0.26.2`, but the version running on Colab was `0.25.2`, which is the one I worked with.

<br>

To record video on macOS or Linux we need the `ffmpeg` package:
- macOS: `brew install ffmpeg`
- Linux: `apt install ffmpeg`

**Note**: on Linux, first run `apt update && apt upgrade -y`
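
Before recording, it can help to confirm that `ffmpeg` is actually on the `PATH`. This is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical and not part of gym.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    print("ffmpeg found at:", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found; install it before recording videos")
```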

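After creating the environment instance, older gym releases that still ship the `Monitor` wrapper can record video with the code below (the rest of this notebook uses `RecordVideo` instead):
 ```
 video_dir = "/video/"
 env = gym.wrappers.Monitor(env, video_dir)
 ```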


In [143]:
def display_video(episode=0, video_width=600):
    # Path where RecordVideo saved this episode (see create_env)
    video_path = f"/content/video/rl-video-episode-{episode}.mp4"
    # Read the file and embed it in the page as a base64 data URL
    with open(video_path, "rb") as f:
        video_file = f.read()
    decoded = b64encode(video_file).decode()
    video_url = f"data:video/mp4;base64,{decoded}"
    return HTML(f"""<video width="{video_width}" controls><source src="{video_url}"></video>""")

In [124]:
def create_env(name, render_mode=None):
    # render_mode: "human", "rgb_array", or "ansi"
    env = gym.make(name, new_step_api=True, render_mode=render_mode)
    env = RecordVideo(env, video_folder='/content/video')
    return env
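
`RecordVideo` does not save every episode. As far as I know, when no `episode_trigger` is passed, gym `0.25.2` falls back to a capped cubic schedule: it records episodes whose index is a perfect cube (0, 1, 8, 27, ...) up to 1000, then every 1000th episode. That would explain why the cells below display episodes 0, 8, and 1. The function below is a standalone sketch of that rule for illustration, with a hypothetical name, not gym's own code.

```python
def cubic_schedule_sketch(episode_id):
    """Return True if this episode would be recorded under a capped cubic schedule."""
    if episode_id < 1000:
        # Perfect cubes: rounding the cube root and cubing again gives the same number
        return int(round(episode_id ** (1.0 / 3))) ** 3 == episode_id
    # After 1000 episodes, record every 1000th episode
    return episode_id % 1000 == 0


recorded = [e for e in range(10) if cubic_schedule_sketch(e)]
print(recorded)  # [0, 1, 8]
```

To record every episode instead, `RecordVideo` accepts an `episode_trigger` callable, for example `episode_trigger=lambda episode_id: True`.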

### **Make our first games and see how they work**

#### **Lunar Lander**
<img align="center" width="400" src="https://www.gymlibrary.dev/_images/lunar_lander.gif">

In [None]:
!rm -r /content/video
env = create_env("LunarLander-v2", "rgb_array")
for episode in range(10):
    done = False
    env.reset()
    while not done:
        action = env.action_space.sample()
        state, reward, done, info =  env.step(action)
    if episode % 5 == 0 and episode > 0:
        print(f"- Game completed at episode: {episode}")

In [None]:
display_video(0)

#### **Space Invaders**
<img align="center" width="200" src="https://www.gymlibrary.dev/_images/space_invaders.gif">

In [None]:
!rm -r /content/video
env = create_env("ALE/SpaceInvaders-v5", 'rgb_array')
for episode in range(10):
    done = False
    env.reset()
    while not done:
        action = env.action_space.sample()
        state, reward, done, info =  env.step(action)
    if episode % 5 == 0 and episode > 0:
        print(f"- Game completed at episode: {episode}")

In [None]:
display_video(8, 400)

#### **Cartpole**
<img align="center" width="300" src="https://www.gymlibrary.dev/_images/cart_pole.gif">

In [None]:
!rm -r /content/video
env = create_env("CartPole-v1", 'rgb_array')
for episode in range(10):
    done = False
    env.reset()
    while not done:
        action = env.action_space.sample()
        state, reward, done, info =  env.step(action)
    if episode % 5 == 0 and episode > 0:
        print(f"- Game completed at episode: {episode}")

In [None]:
display_video(1)

### **How well does the agent perform?**
We can simulate many episodes and average the total reward per episode. **The average total reward** tells us how well an agent that takes random actions performs.

In [155]:
!rm -r /content/video
env = create_env("ALE/SpaceInvaders-v5", 'rgb_array')
total_rewards = []
n_episodes = 100
for episode in range(n_episodes):
    done = False
    env.reset()
    total_reward = 0
    while not done:
        action = env.action_space.sample()
        state, reward, done, info =  env.step(action)
        total_reward += reward

    total_rewards.append(total_reward)
    if episode % 20 == 0 and episode > 0:
        print(f"- Game completed at episode: {episode}")

- Game completed at episode: 20
- Game completed at episode: 40
- Game completed at episode: 60
- Game completed at episode: 80


In [157]:
avg_tot_reward = sum(total_rewards) / n_episodes
print(f"Average total reward over {n_episodes} episodes: {avg_tot_reward}")

Average total reward over 100 episodes: 143.75
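
The average alone hides how much the episodes differ from one another. A small sketch of a fuller summary, using only the standard library, is shown below. The helper `summarize_rewards` is hypothetical, and the example values are made up rather than taken from the run above; in the notebook it could be called on `total_rewards`.

```python
import statistics


def summarize_rewards(rewards):
    """Return mean, sample standard deviation, min, and max of per-episode total rewards."""
    return {
        "mean": statistics.mean(rewards),
        # The sample standard deviation needs at least two values
        "std": statistics.stdev(rewards) if len(rewards) > 1 else 0.0,
        "min": min(rewards),
        "max": max(rewards),
    }


example_rewards = [100.0, 150.0, 200.0]  # made-up values, not results from this notebook
print(summarize_rewards(example_rewards))
```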
