# Lab 01: Setup and installation of the environment for reinforcement learning

## Reinforcement learning

Reinforcement Learning (RL) is a machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback on its actions and experiences. RL uses rewards and punishment as signals for "good" and "bad" behavior.

A reinforcement learning setup consists of two components:
1. Environment
2. Agent

Generally, at each step:
1. The **agent** outputs an ***action***, which is input to the **environment**.
2. The **environment** evolves according to its dynamics and moves to a ***new state***.
3. The **agent** observes the ***new state*** of the **environment** and (optionally) a ***reward***.

The process continues until, hopefully, the agent learns which behavior maximizes its reward.
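
The three steps above can be sketched without any RL library. The `LineWorld` class and `run_random_agent` function below are hypothetical names made up for this illustration (they are not part of Gym), and the agent simply picks random actions instead of learning:

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk along a line from cell 0 to a goal cell."""

    def __init__(self, goal=3, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Put the environment back into its initial state
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Dynamics: action 1 moves right, any other action moves left (never below 0)
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        self.steps += 1
        reached_goal = self.position == self.goal
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def run_random_agent(env, seed=0):
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = rng.choice([0, 1])              # 1. the agent outputs an action
        state, reward, done = env.step(action)   # 2. the environment moves to a new state
        total_reward += reward                   # 3. the agent observes the state and reward
    return total_reward, env.steps


total_reward, n_steps = run_random_agent(LineWorld())
print(total_reward, n_steps)
```

A learning agent would replace the random choice with a rule that depends on `state` and improves as rewards come in.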

<img src="img/RL.jpg" title="Introduction" style="width: 600px;" />

In machine learning, our job is to design an **agent** that is as smart as possible. However, the agent cannot learn without an environment. In this lab, we install Python, PyTorch, and OpenAI Gym.

## OpenAI Gym

One of the most popular simulation environments for RL is OpenAI Gym.

[OpenAI](https://openai.com) is a research company trying to develop systems exhibiting *artificial general intelligence* (AGI).
They developed Gym to support the development of RL algorithms. Gym
provides many reinforcement learning simulations and tasks. Visit [the Gym website](https://gym.openai.com) for a full list of environments.

<img src="img/RL_gym.PNG" title="Gym example" style="width: 600px;" />

## Install the environment on your PC

***Note***: You can use *Google Colab* if your computer cannot run this setup.

***Note 2***: You do not need to follow these instructions exactly. If you can set up the environment another way, just show me your result.

**System requirements**: Ubuntu (Linux) and an Nvidia GPU (the GPU is optional, but you may cry when you try to run more complex agents without one).

Things to install for the desktop version:
1. VSCode, PyCharm, Jupyter, conda, or Visual Studio (Windows users only). Select whichever you like.
2. Python 3.8.xx or later
3. Important libraries: numpy, matplotlib, etc.
4. PyTorch library (the CUDA build if you have an Nvidia GPU)
5. mujoco_py library
6. OpenAI library: gym

## VSCode
Visual Studio Code is a lightweight yet full-featured cross-platform IDE for software development that has recently caught up, in capabilities and popularity, with other popular Python IDEs such as PyCharm. It is also reputed to be easier to configure and use. We'll give it a try this semester. Download and install VSCode from the [Visual Studio Code downloads page](https://code.visualstudio.com/download).

### For Windows users
You can use Windows, but there is no manual for it here. Please follow these links:
1. Install Python from the [link](https://www.python.org/downloads/windows/) and select version 3.8.xx. (You can use a later version, but I cannot confirm that it works properly. I do know that 3.9.xx works.)
2. Install [Visual Studio Code](https://code.visualstudio.com/docs/python/python-tutorial) and set up the Python interpreter.
3. Install PyTorch from the [link](https://pytorch.org/).
4. Install [OpenAI Gym](https://towardsdatascience.com/how-to-install-openai-gym-in-a-windows-environment-338969e24d30).
5. Install [OpenAI Gym with Box2D and Mujoco](https://medium.com/@sayanmndl21/install-openai-gym-with-box2d-and-mujoco-in-windows-10-e25ee9b5c1d5).

### For Linux users

#### Step 1: Install VS Code

Download VS Code from [here](https://code.visualstudio.com/download) and install it.

#### Step 2: Install Python

If you already have Python installed, you can skip this step.

1. Open a *terminal*.
2. Run the following steps:
    1. Update and refresh the repository lists:
        - `$ sudo apt-get update`
        - `$ sudo apt-get upgrade`
    2. Install Python and the build dependencies:
        - `$ sudo apt-get install build-essential cmake python3-numpy python3-dev python3-tk libavcodec-dev libavformat-dev libavutil-dev libswscale-dev libavresample-dev libdc1394-dev libeigen3-dev libgtk-3-dev libvtk7-qt-dev`
        - Check the Python version: `$ python --version` or `$ python3 --version`
    3. Upgrade pip3: `$ sudo -H pip3 install --upgrade pip`
    4. Install Jupyter (optional): `$ pip install jupyter`
        - Test-run Jupyter Notebook: `$ jupyter notebook`
        
3. Open VS Code:
    1. Before doing anything else, create a folder to hold your code.
    2. In VS Code, go to *File* --> *Open Folder...* and select the folder path.
    
    <img src="img/VScod01.PNG" style="width: 600px;">
    
    3. You can save a **workspace** to link the folder to a workspace file. Go to *File* --> *Save Workspace* and set the workspace name. After that, you can open your code via the workspace file.
    4. Create a new file by clicking the new-file icon. Type the file name with the extension of the Python file type you want:
    
        <img src="img/VScod02.PNG" style="width: 300px;">
        
        - .py : Python script file. It runs fast, but it is harder to display the simulator.
        - .ipynb : Jupyter notebook file. You can take notes in it, like lecture notes, and run the code cell by cell. It is easy to view your simulation, but it is slower and crashes more easily.
    5. Select the interpreter as shown below:
    
    <img src="img/VScod03.PNG" style="width: 600px;">
    
    6. If you are using a Jupyter notebook file, you can select the interpreter here:
    
    <img src="img/VScod04.PNG" style="width: 600px;">
    
#### Step 3: Install PyTorch

Go to this [link](https://pytorch.org/) and scroll down until you find the section shown below. Select your options (usually Linux, Pip, Python, and CUDA 10.2, CUDA 11.3, or CPU; the website usually preselects a suitable installation mode for you). Copy the generated command into your terminal.

<img src="img/pytorch.PNG" style="width: 600px;">

#### Step 4: Install Mujoco and OpenAI Gym

You must install Mujoco first. There are two ways to do this.

1. **Install mujoco_py (full version)**. Follow the instructions at the [link](https://github.com/openai/mujoco-py).

***Note***: If you use a Jupyter notebook and want to run a terminal command, add '!' in front of the command.

In [None]:
# install important library
!sudo apt-get update
!sudo apt-get install -y libosmesa6-dev libgl1-mesa-glx libglfw3 libgl1-mesa-dev libglew-dev patchelf
# Get Mujoco
!mkdir ~/.mujoco
!wget -q https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz
!tar -zxf mujoco.tar.gz -C "$HOME/.mujoco"
!rm mujoco.tar.gz

In [None]:
# Add it to the actively loaded path and the bashrc path (these only do so much)
!echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin' >> ~/.bashrc 
!echo 'export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libGLEW.so' >> ~/.bashrc 
# THE ANNOYING ONE, FORCE IT INTO LDCONFIG SO WE ACTUALLY GET ACCESS TO IT THIS SESSION
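# Writing to /etc needs root, so go through sudo and use your own home directory
!echo "$HOME/.mujoco/mujoco210/bin" | sudo tee /etc/ld.so.conf.d/mujoco_ld_lib_path.conf > /dev/null
!sudo ldconfig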

In [None]:
# Install Mujoco-py
!pip3 install -U 'mujoco-py<2.2,>=2.1'

In [None]:
# Add Mujoco to the library paths of the running Python process
import os

# Resolve the Mujoco folder inside the current user's home directory
mujoco_bin = os.path.expanduser('~/.mujoco/mujoco210/bin')
glew_lib = '/usr/lib/x86_64-linux-gnu/libGLEW.so'
try:
    os.environ['LD_LIBRARY_PATH'] = os.environ['LD_LIBRARY_PATH'] + ':' + mujoco_bin
except KeyError:
    os.environ['LD_LIBRARY_PATH'] = mujoco_bin
try:
    os.environ['LD_PRELOAD'] = os.environ['LD_PRELOAD'] + ':' + glew_lib
except KeyError:
    os.environ['LD_PRELOAD'] = glew_lib

# Import once up front so the build output does not appear on first env initialization
import mujoco_py

If it does not work (for example, you get a GL error), uninstall it.

In [None]:
# -y skips the confirmation prompt, which a notebook cell cannot answer
!pip3 uninstall -y mujoco-py

2. Install **free-mujoco-py**

This version is easier to set up, but it is a little slower.

In [None]:
!pip3 install free-mujoco-py

3. Install gym

In [None]:
!pip install gym pyvirtualdisplay > /dev/null 2>&1
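# Quote each requirement so the shell does not treat ">" as an output redirect or expand "[...]"
# Note: this lab uses the older Gym API (Monitor wrapper, four return values from env.step),
# which newer Gym releases have changed
!pip install -U 'gym>=0.21.0'
!pip install -U 'gym[atari,accept-rom-license]'
!pip install -U 'gym[robotics,classic_control]'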

***Note***: After installing gym, try the code below. If an error occurs, either fix the problem or **uninstall mujoco-py and install free-mujoco-py** instead.

In [None]:
import gym
import time

env = gym.make("FetchPickAndPlace-v1")
env.reset()
env.render()
time.sleep(5)
env.close()

### For Google Colab

If you cannot set up your own computer, you can use Google Colab to run and train in this class. However, you **must** repeat these installation steps every time the runtime shuts down. On the free version, the runtime shuts down every 12 hours; on the Pro version (3xx baht per month), it shuts down every 24 hours.

You can access Google Colab from this [colab](https://colab.research.google.com/) link. Sign in with your e-mail and it is ready to use.

Open a Google Colab notebook and copy the code below:

#### Step 1: Install xvfb & other dependencies

In [None]:
!apt-get install x11-utils > /dev/null 2>&1 
!pip install pyglet > /dev/null 2>&1 
!apt-get install -y xvfb python-opengl > /dev/null 2>&1

#### Step 2: Install mujoco
This step is needed for 3D simulators, e.g. robots and some control plants.

Some simulators do not require it.

In [None]:
#Include this at the top of your colab code
import os
if not os.path.exists('.mujoco_setup_complete'):
  # Get the prereqs
  !apt-get -qq update
  !apt-get -qq install -y libosmesa6-dev libgl1-mesa-glx libglfw3 libgl1-mesa-dev libglew-dev patchelf
  # Get Mujoco
  !mkdir ~/.mujoco
  !wget -q https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz
  !tar -zxf mujoco.tar.gz -C "$HOME/.mujoco"
  !rm mujoco.tar.gz
  # Add it to the actively loaded path and the bashrc path (these only do so much)
  !echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin' >> ~/.bashrc 
  !echo 'export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libGLEW.so' >> ~/.bashrc 
  # THE ANNOYING ONE, FORCE IT INTO LDCONFIG SO WE ACTUALLY GET ACCESS TO IT THIS SESSION
  !echo "/root/.mujoco/mujoco210/bin" > /etc/ld.so.conf.d/mujoco_ld_lib_path.conf
  !ldconfig
  # Install Mujoco-py
  !pip3 install -U 'mujoco-py<2.2,>=2.1'
  # run once
  !touch .mujoco_setup_complete

try:
  if _mujoco_run_once:
    pass
except NameError:
  _mujoco_run_once = False
if not _mujoco_run_once:
  # Add it to the actively loaded path and the bashrc path (these only do so much)
  try:
    os.environ['LD_LIBRARY_PATH']=os.environ['LD_LIBRARY_PATH'] + ':/root/.mujoco/mujoco210/bin'
  except KeyError:
    os.environ['LD_LIBRARY_PATH']='/root/.mujoco/mujoco210/bin'
  try:
    os.environ['LD_PRELOAD']=os.environ['LD_PRELOAD'] + ':/usr/lib/x86_64-linux-gnu/libGLEW.so'
  except KeyError:
    os.environ['LD_PRELOAD']='/usr/lib/x86_64-linux-gnu/libGLEW.so'
  # presetup so we don't see output on first env initialization
  import mujoco_py
  _mujoco_run_once = True

#### Step 3: Install gym and pyvirtualdisplay

This is for showing the simulator in Google Colab (and Jupyter Notebook).

In [None]:
!pip install gym pyvirtualdisplay > /dev/null 2>&1
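# Quote each requirement so the shell does not treat ">" as an output redirect or expand "[...]"
!pip install -U 'gym>=0.21.0'
!pip install -U 'gym[atari,accept-rom-license]'
!pip install -U 'gym[robotics,classic_control]'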

You can see an example and test it in Colab from the [link](https://colab.research.google.com/drive/1WonMpHUG_0MO8jedG7ePmoGOo-JVPS7r?usp=sharing).

## Test your environment

After the installation finishes, use the code below to check that it is correct.

Import all your libraries, including matplotlib and ipythondisplay:

In [None]:
import gym
import numpy as np
import matplotlib.pyplot as plt
from IPython import display as ipythondisplay

Then import `Display` from pyvirtualdisplay and initialize the screen size (400x300 in this example):

In [None]:
from pyvirtualdisplay import Display
display = Display(visible=0, size=(400, 300))
display.start()

Using gym's "rgb_array" render functionality, render each frame to a `screen` variable, then plot it with Matplotlib.

(The figure is displayed indirectly through the IPython display.)

In [None]:
# env = gym.make("CartPole-v0")
# env = gym.make("DoubleDunk-v0")
# env = gym.make("SpaceInvaders-v0")
# env = gym.make("Acrobot-v1") # two-link pendulum (swing-up task)
# env = gym.make("Ant-v2")
env = gym.make("FetchPickAndPlace-v1")
env.reset()
prev_screen = env.render(mode='rgb_array')
plt.imshow(prev_screen)

for i in range(50):
  action = env.action_space.sample()
  obs, reward, done, info = env.step(action)
  screen = env.render(mode='rgb_array')

  plt.imshow(screen)
  ipythondisplay.clear_output(wait=True)
  ipythondisplay.display(plt.gcf())

  if done:
    break

ipythondisplay.clear_output(wait=True)
env.close()

### Save simulator video

#### Step 1: Create a video folder named "video_rl"

You can create it yourself or use Python code:

In [None]:
import os

vdo_path = 'video_rl/'
if not os.path.exists(vdo_path):
  print("No folder ", vdo_path, 'exist. Create the folder')
  os.mkdir(vdo_path)
  print("Create directory finished")
else:
  print(vdo_path, 'existed, do nothing')
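
As a shorter alternative to the check above, `os.makedirs` with `exist_ok=True` creates the folder if it is missing and does nothing if it already exists. The sketch below runs inside a temporary directory so that it leaves no files behind; `demo_path` is a made-up name used only for this illustration:

```python
import os
import tempfile

# Work inside a temporary parent directory that is deleted afterwards
with tempfile.TemporaryDirectory() as parent:
    demo_path = os.path.join(parent, "video_rl")
    os.makedirs(demo_path, exist_ok=True)  # creates the folder
    os.makedirs(demo_path, exist_ok=True)  # second call is a no-op and raises no error
    created = os.path.isdir(demo_path)

print(created)
```

In the lab itself, the same call would be `os.makedirs(vdo_path, exist_ok=True)`.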

#### Step 2: Modify the OpenAI Gym code from the previous section

In [None]:
# Add wrappers monitor library
from gym.wrappers import Monitor

In [None]:
# change environment as you want
# env = gym.make("CartPole-v0")
# env = gym.make("DoubleDunk-v0")
# env = gym.make("SpaceInvaders-v0")
# env = gym.make("Acrobot-v1") # two-link pendulum (swing-up task)
# env = gym.make("Ant-v2")
# env = gym.make("InvertedDoublePendulum-v2")

# Set environment and monitor it in video
env = Monitor(gym.make('Ant-v2'), vdo_path, force=True)
env.reset()
prev_screen = env.render(mode='rgb_array')
plt.imshow(prev_screen)

for i in range(500):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    screen = env.render(mode='rgb_array')
    # Show screen in jupyter or colab, very slow
    plt.imshow(screen)
    ipythondisplay.clear_output(wait=True)
    ipythondisplay.display(plt.gcf())

    if done:
        break

ipythondisplay.clear_output(wait=True)
env.close()

Show your video:

<video controls src="img/openai_test2.mp4" />

### Check that PyTorch is available

In [None]:
import torch

print(torch.__version__)
# True only if this PyTorch build supports CUDA and a usable GPU is detected
print(torch.cuda.is_available())

If <code>torch.cuda.is_available()</code> returns True, you can run on the GPU. Otherwise, the code still runs, but it will be slower.

## Implementing a random search policy

Now, let's implement a random search policy in the CartPole environment.

You can use a .py or .ipynb file.

1. First, import the Gym and PyTorch packages and create the CartPole environment.

In [None]:
import gym
import torch

# for real-time show
import matplotlib.pyplot as plt
from IPython import display as ipythondisplay
from pyvirtualdisplay import Display
display = Display(visible=0, size=(400, 300))
display.start()
# for save video
from gym.wrappers import Monitor
save_vdo = False

import os
vdo_path = 'video_rl/'
if not os.path.exists(vdo_path):
  print("No folder ", vdo_path, 'exist. Create the folder')
  os.mkdir(vdo_path)
  print("Create directory finished")
else:
  print(vdo_path, 'existed, do nothing')

if save_vdo:
    env = Monitor(gym.make('CartPole-v0'), vdo_path, force=True)
else:
    env = gym.make('CartPole-v0')

2. Check the dimension of the state and the number of actions

In [None]:
n_state = env.observation_space.shape
print('State shape:', n_state, 'number of state variables:', n_state[0])

n_action = env.action_space.n
print('number of action:', n_action)

3. Create a <code>run_episode</code> function that simulates one episode with the given weight and returns the total reward of the episode.

In [None]:
def run_episode(env, weight, show=False):
    # reset to default state
    state = env.reset()
    total_reward = 0
    is_done = False
    while not is_done:
        # Convert the NumPy state from the environment to a float tensor
        state = torch.from_numpy(state).float()
        # Choose the action with the highest score
        action = torch.argmax(torch.matmul(state, weight))
        # Send the action to the environment to get the next state
        state, reward, is_done, _ = env.step(action.item())
        
        if show:
            # render screen to show
            screen = env.render(mode='rgb_array')
            plt.imshow(screen)
            ipythondisplay.clear_output(wait=True)
            ipythondisplay.display(plt.gcf())
        
        # sum all rewards
        total_reward += reward
    return total_reward

The <code>weight</code> matrix $W$ maps the current state $S$ to one score per action, written $pA$. To calculate these action scores, multiply the state by the weight matrix:

$$pA=SW$$

In reinforcement learning, you can choose an action either by sampling at random (treating the scores as probabilities) or by taking the action with the maximum score. In this implementation, we select the action $a$ with the maximum score. To get the index of the maximum value, use the <code>torch.argmax()</code> function. This function returns a one-element tensor, so use <code>.item()</code> to convert it to a plain Python number.
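
To make the matrix product concrete, here is a small worked example in plain Python. The function name `linear_policy_action` and the numbers are made up for illustration; in `run_episode` the same computation is done by `torch.matmul` followed by `torch.argmax`:

```python
def linear_policy_action(state, weight):
    """Return (scores, action) for a state vector and a weight matrix.

    state  : list of n_state floats
    weight : nested list with n_state rows and n_action columns
    """
    n_action = len(weight[0])
    # scores[j] = sum_i state[i] * weight[i][j], i.e. the row vector S times W
    scores = [sum(s * row[j] for s, row in zip(state, weight))
              for j in range(n_action)]
    # argmax: index of the largest score
    action = max(range(n_action), key=lambda j: scores[j])
    return scores, action


# Made-up numbers: 4 state values and 2 actions, the same sizes as CartPole
state = [0.5, -1.0, 0.25, 2.0]
weight = [[1.0, 0.0],
          [0.0, 1.0],
          [2.0, 0.0],
          [0.0, 0.5]]
scores, action = linear_policy_action(state, weight)
print(scores, action)
```

Here the score of action 0 is 0.5·1 + 0.25·2 = 1.0 and the score of action 1 is -1.0·1 + 2.0·0.5 = 0.0, so action 0 is selected.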

4. Try running one episode with a random weight.

In [None]:
# Create random weight
weight = torch.rand(n_state[0], n_action)
# Run 1 episode to get total_reward (Show simulator)
total_reward = run_episode(env, weight, True)
print('Episode {}: {}'.format(0, total_reward))

5. OK! Let's search for the best weight, i.e. the one with the maximum reward, over 1000 episodes

In [None]:
# Initialize
best_total_reward = 0
best_weight = None
total_rewards = []
# Set number of episode
n_episode = 1000
for episode in range(n_episode):
    weight = torch.rand(n_state[0], n_action)
    # Run 1 episode to get total_reward (not show simulator)
    total_reward = run_episode(env, weight, False)
    print('Episode {}: {}'.format(episode+1, total_reward))
    # find the best weight from best reward
    if total_reward > best_total_reward:
        best_weight = weight
        best_total_reward =  total_reward
    # keep all total_rewards
    total_rewards.append(total_reward)

In [None]:
print('Average total reward over {} episode: {}'.format(
           n_episode, sum(total_rewards) / n_episode))

You can see that the rewards do not improve as the episodes progress, because each weight is drawn at random, independently of the previous results.

6. Simulate the result with the best weight

In [None]:
# Run 1 episode to get total_reward (Show simulator)
total_reward = run_episode(env, best_weight, True)

7. Plot the total_rewards

In [None]:
# This library is used for plot
# import matplotlib.pyplot as plt
plt.plot(total_rewards)
plt.xlabel('Episode')
plt.ylabel('Reward')
plt.show()

8. See the average reward of the best weight over 1000 new episodes

In [None]:
# Initialize
best_total_reward = 0
total_rewards_eval = []
# Set number of episode
n_episode = 1000
for episode in range(n_episode):
    # Run 1 episode to get total_reward (not show simulator)
    total_reward_eval = run_episode(env, best_weight, False)
    print('Episode {}: {}'.format(episode+1, total_reward_eval))
    # keep all total_rewards
    total_rewards_eval.append(total_reward_eval)
    
print('Average total reward over {} episode: {}'.format(
           n_episode, sum(total_rewards_eval) / n_episode))

## Lab work

1. Set up and install Python, PyTorch, and the OpenAI environment (with mujoco_py) in **any environment** (Windows, Linux, macOS, or Colab; PyCharm, VS Code, Jupyter, or another tool).
    - Show your result demonstrating that you can use OpenAI Gym and PyTorch.
    - Save a 3D environment to a video at least 5 seconds long.
2. (Optional) If you lack Python or PyTorch experience, please study these:
    - [Python tutorial](https://www.w3schools.com/python/)
    - [Numpy tutorial](https://www.w3schools.com/python/numpy/default.asp)
    - [MatPlotLib](https://matplotlib.org/stable/tutorials/index)
    - [PyTorch tutorial](https://pytorch.org/tutorials/)
3. Try to implement the [**Hill-climbing**](https://en.wikipedia.org/wiki/Hill_climbing) algorithm in *CartPole*. The weight for each episode is calculated by:
    $$W_n=W_b+\alpha W_r$$
    
    where $W_n$ is the new weight used as input for each episode, $W_b$ is the best weight so far, $\alpha$ is the learning rate scale, and $W_r$ is a new random weight. By default, let $\alpha=0.01$.

    - Plot the graph during training and compare random search with hill-climbing.
    - Change $\alpha$ to 0.5, 0.1, and 0.001, and observe the difference.
    - Write a short report (1-2 pages).
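
The update rule $W_n=W_b+\alpha W_r$ can be sketched on a toy problem before you apply it to CartPole. Everything in this sketch is an assumption made for illustration: the names `hill_climb` and `toy_score`, the choice to draw $W_r$ uniformly from $[-1, 1]$, and the toy objective that stands in for the total reward of an episode. It is not the lab solution, only one possible shape for the loop:

```python
import random


def hill_climb(score_fn, n_weights, alpha=0.01, n_episode=1000, seed=0):
    """Keep the best weight found so far and perturb it by alpha * random noise."""
    rng = random.Random(seed)
    best_weight = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
    best_score = score_fn(best_weight)
    history = [best_score]
    for _ in range(n_episode):
        # W_r: a new random weight (assumed here to be uniform in [-1, 1])
        noise = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
        # W_n = W_b + alpha * W_r
        new_weight = [wb + alpha * wr for wb, wr in zip(best_weight, noise)]
        new_score = score_fn(new_weight)
        # Accept the new weight only if it scores better than the best so far
        if new_score > best_score:
            best_weight, best_score = new_weight, new_score
        history.append(best_score)
    return best_weight, best_score, history


# Toy objective standing in for the episode reward: higher is better,
# with the maximum (zero) reached when the weight equals the target
target = [0.3, -0.7]


def toy_score(weight):
    return -sum((w - t) ** 2 for w, t in zip(weight, target))


best_weight, best_score, history = hill_climb(toy_score, n_weights=2, alpha=0.1)
print(best_score)
```

Unlike random search, each candidate here starts from the best weight so far, so the recorded best score never decreases. For CartPole, the score function would run one episode with the candidate weight and return its total reward.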