In [76]:
import gymnasium as gym
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
import mediapy as media
import time
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

In [None]:
import os

# Define the runtime directory
runtime_dir = "/tmp/runtime-yourusername"

# Set the environment variable
os.environ["XDG_RUNTIME_DIR"] = runtime_dir

# Create the directory if it doesn't exist (exist_ok makes a separate check unnecessary)
os.makedirs(runtime_dir, exist_ok=True)

# Ensure the directory has the correct permissions
os.chmod(runtime_dir, 0o700)


<img src='https://www.tng-project.org/static/data/lab_logo_tng.png'/>

# Install and launch

There are two ways to install Jupyter Lab on a local machine.
1. with conda - `conda install -c conda-forge jupyterlab`
2. with pip - `pip install jupyterlab`

Once installed, we can run `jupyter lab` from the terminal to start a jupyter server. By default, a new tab should open on the machine's default web browser, but if that is not the case then use one of the links provided in the command output in the terminal.

# The Interface


1. Menu - general UI and engine control
2. New items - launcher, new directory, upload files, sync files with disk.
3. Directory navigation
4. File browser
5. Open tabs bar
6. Launcher tab - new notebook, code file, text file, terminal, console, ...
7. Notebook cell settings - used for creating [reveal.js](https://revealjs.com/) presentations out of notebooks
8. Debugging tools

## Text
We can write rich text representations with [markdown](https://www.markdownguide.org/) (Markdown cell type) 
 
Markdown cells also support [$\LaTeX$](https://www.latex-project.org/) math notation: wrap the expression in dollar sign characters '\$'. For example, the following text will be rendered as the quadratic formula:  
<code>\$x_{1,2}=\frac{-b\pm\sqrt{b^2-4ac}}{2a}\$</code>  
Rendered:  
$x_{1,2}=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$  



## Running Code

Jupyter notebooks run code in a background process called a kernel. The kernel keeps the program's memory intact until it is restarted, shut down, or crashes.

Whenever a cell is run, the value of the last expression in the cell is displayed as the cell's output (unless that value is `None`).

In [77]:
for i in range(3):
    print(i)

"hello everyone" 

0
1
2


'hello everyone'

When a variable is assigned, it remains in memory until the kernel is restarted or shut down. This means that the variable can be used in any cell once it has been initialized. We can even use a variable in a cell that precedes the assignment cell, as long as we run the assignment cell first!

In [78]:
x = 2 

In [79]:
def f(x):
    return 2*x

f(x)

4
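The two behaviours above (displaying the last expression and keeping variables alive between cells) can be imitated in plain Python. The sketch below is only an illustration of the idea, not how the kernel is actually implemented; `run_cell` and `kernel_namespace` are hypothetical names introduced for this example.

```python
import ast


def run_cell(source: str, namespace: dict):
    """Execute a cell's source; return the value of its last expression, if any."""
    tree = ast.parse(source)
    last_value = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so it can be evaluated separately.
        last_expr = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        last_value = eval(compile(last_expr, "<cell>", "eval"), namespace)
    else:
        exec(compile(tree, "<cell>", "exec"), namespace)
    return last_value


# One shared dictionary plays the role of the kernel's memory.
kernel_namespace = {}

first_result = run_cell("x = 2", kernel_namespace)  # assignment: nothing to display
second_result = run_cell("def f(x):\n    return 2 * x\n\nf(x)", kernel_namespace)

print(first_result)
print(second_result)
```

Because both cells share `kernel_namespace`, the second cell can read the `x` assigned by the first one, regardless of where the cells sit in the notebook.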

By prefixing a line with the exclamation point character '!' we can run terminal commands. For example, the cell below will output `hello from terminal`.

In [80]:
!echo "hello from terminal"

hello from terminal


A common use for this feature is to install external packages via pip like so:  
`!pip install numpy`
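When the output of a command is needed as a Python value, or the code must also run outside a notebook, the standard `subprocess` module can do a similar job. The sketch below assumes a Unix-like system where `echo` exists as an executable. The `pip_command` list is built only to show the shape of the pip variant and is not executed here.

```python
import subprocess
import sys

# Run a terminal command and capture what it prints.
completed = subprocess.run(
    ["echo", "hello from terminal"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError on a non-zero exit code
)
output = completed.stdout.strip()
print(output)

# The pip equivalent, built but not run. Using sys.executable targets the
# interpreter that is running this code.
pip_command = [sys.executable, "-m", "pip", "install", "numpy"]
print(pip_command[1:])
```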

## Magic Functions

Magic functions control the functionality of certain libraries integrated with Jupyter. We can call these functions using the percent character '%'.

Some notable examples include:
- `%matplotlib inline` - Shows matplotlib figures inline with the cell output
- `%load_ext` - Load a jupyter extension, e.g. `%load_ext autoreload`.
- `%run` - Allows you to execute Python code from external .py files and other Jupyter Notebooks directly within your current notebook.
- `%load` - Allows you to insert code from an external script into the current cell of your Jupyter Notebook.
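To make the difference between `%load` and `%run` concrete, here is a rough plain-Python analogue using only the standard library. It is a sketch of the idea rather than IPython's implementation, and the file name `helper_script.py` is a hypothetical example.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "helper_script.py"  # hypothetical file name
    script_path.write_text("greeting = 'hello from script'\nanswer = 6 * 7\n")

    # Roughly what `%load` gives you: the script's source text, ready to edit.
    loaded_source = script_path.read_text()

    # Roughly what `%run` gives you: the file is executed and its variables
    # become available afterwards.
    script_globals = runpy.run_path(str(script_path))

print(loaded_source)
print(script_globals["greeting"], script_globals["answer"])
```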


In [81]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %code_wrap  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %mamba  %man  %matplotlib  %micromamba  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%code_wrap  %%debug  %%file  %%html  %%javascript  %%js  %

In [82]:
%%time
import time
for _ in range(1000):    
    time.sleep(0.01) # sleep for 0.01 seconds

CPU times: user 16.9 ms, sys: 20.4 ms, total: 37.3 ms
Wall time: 10.2 s
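The gap between CPU time and wall time in the output above comes from `time.sleep`: the process waits without doing work. A minimal sketch with the standard `time` module shows the same effect without the magic command (the loop is shortened to 20 iterations so it finishes quickly).

```python
import time

wall_start = time.perf_counter()  # wall-clock timer
cpu_start = time.process_time()   # CPU time consumed by this process

for _ in range(20):
    time.sleep(0.01)  # sleeping costs wall time but almost no CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"CPU time: {cpu_elapsed:.4f} s, wall time: {wall_elapsed:.4f} s")
```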


## Exporting

We can export a Jupyter notebook into several shareable formats out of the box. To do this, navigate via the menu to:  
File --> Save and export notebook as --> \<format\>  
Available formats include:
* HTML
* LaTeX
* PDF (requires an installation of `tex` on your local machine)
* python script
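A notebook file is a JSON document, so the idea behind the Python script export can be sketched with the standard library alone. The example below assumes the nbformat v4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`). The notebook content and the `notebook_to_script` helper are hypothetical, and the built-in exporter handles many details this sketch ignores.

```python
import json

# A tiny notebook in nbformat v4 layout (hypothetical content).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 2\n", "print(x)"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "y = x * 2"},
    ],
}


def notebook_to_script(nb: dict) -> str:
    """Join the source of all code cells, skipping every other cell type."""
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # `source` may be a single string or a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"


# Round-trip through JSON to mimic reading an .ipynb file from disk.
script = notebook_to_script(json.loads(json.dumps(notebook)))
print(script)
```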


<img src='https://miro.medium.com/max/1400/1*7oukapIBInsovpHkQB3QZg.jpeg'/>

Colab is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

A common use for notebooks is data visualization using charts. Colab makes this easy with several charting tools available as Python imports.

In [None]:
x  = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y1 = [1, 3, 5, 3, 1, 3, 5, 3, 1]
y2 = [2, 4, 6, 4, 2, 4, 6, 4, 2]

In [None]:
plt.plot(x, y1, label="line L")
plt.plot(x, y2, label="line H")
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.title("Line Graph Example")
plt.legend()
plt.show()

# Virtual Machine
The most powerful feature of Google Colab is free access to cloud GPUs. As in a local Jupyter environment, you can also run most bash commands by prefixing them with `!`.

First, turn on the GPU from `Runtime`->`Change Runtime Type`->`Hardware Acceleration`

The entire Colab notebook runs in a cloud VM. Let's investigate the VM. You will see that the current Colab notebook is running on top of `Ubuntu 18.04.3 LTS` (at the time of this writing).

In [None]:
!nvidia-smi
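The VM can also be inspected from Python itself. The sketch below uses only the standard library. It assumes that `nvidia-smi` being on the `PATH` is a reasonable hint that a GPU runtime is attached, which is not a guarantee.

```python
import os
import platform
import shutil

vm_info = {
    "operating_system": platform.platform(),
    "python_version": platform.python_version(),
    "cpu_count": os.cpu_count(),
    # Path to the NVIDIA tool, or None when it is not found on the PATH.
    "nvidia_smi_path": shutil.which("nvidia-smi"),
}

for name, value in vm_info.items():
    print(f"{name}: {value}")
```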

<img src='https://venturebeat.com/wp-content/uploads/2019/06/pytorch-e1576624094357.jpg?w=1024?w=1200&strip=all'>

<img src='https://github.com/CLAIR-LAB-TECHNION/CLAI/blob/main/tutorials/assets/tut_01_goal.png?raw=true'>

<img src='https://github.com/CLAIR-LAB-TECHNION/CLAI/blob/main/tutorials/assets/tut_01_why_pytorch.png?raw=true'>

## Why use PyTorch?

Machine learning researchers love using PyTorch. As of February 2022, PyTorch is the [most used deep learning framework on Papers With Code](https://paperswithcode.com/trends), a website for tracking machine learning research papers and the code repositories attached to them.

<img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/1*JhId218tCWv77VPFb0M5dg.png" width="800">


![example of going from an input image to a tensor representation of the image, image gets broken down into 3 colour channels as well as numbers to represent the height and width](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-tensor-shape-example-of-image.png)

<img src='https://github.com/CLAIR-LAB-TECHNION/CLAI/blob/main/tutorials/assets/tut_01_pytorch_vs_numpy.png?raw=true'>

<img src='https://github.com/CLAIR-LAB-TECHNION/CLAI/blob/main/tutorials/assets/tut_01_autograd.png?raw=true'>


<img src='https://gymnasium.farama.org/_images/gymnasium-text.png'>


Gymnasium is a project that provides an API for all single-agent reinforcement learning environments. We will outline the basics of how to use Gymnasium including its four key functions: `make`, `Env.reset`, `Env.step` and `Env.render`.

<img src='https://gymnasium.farama.org/_images/AE_loop.png' width='650'>


At the core of Gymnasium is `Env`, a high-level Python class representing a Markov decision process (MDP). The class lets users generate an initial state, transition to new states given an action, and visualise the environment. Alongside `Env`, `Wrapper` classes are provided to augment or modify the environment, in particular the agent's observations, rewards, and actions taken.
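To show the shape of this interface without any dependencies, here is a toy environment written in plain Python. It deliberately does not subclass Gymnasium's `Env`; `CoinFlipEnv` is a hypothetical class that only mirrors the return values of `reset` (observation, info) and `step` (observation, reward, terminated, truncated, info).

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the coin shown in the observation."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._steps = 0
        self._coin = 0

    def reset(self, seed=None):
        """Start a new episode and return (observation, info)."""
        if seed is not None:
            self._rng.seed(seed)
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin, {}

    def step(self, action):
        """Apply an action; return (observation, reward, terminated, truncated, info)."""
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        self._coin = self._rng.randint(0, 1)  # move to the next state
        terminated = False  # this toy task has no terminal state
        truncated = self._steps >= self.max_steps  # time limit reached
        return self._coin, reward, terminated, truncated, {}


env_sketch = CoinFlipEnv(max_steps=5)
observation, info = env_sketch.reset(seed=0)

total_reward = 0.0
steps_taken = 0
done = False
while not done:
    action = observation  # a policy that simply copies the observed coin
    observation, reward, terminated, truncated, info = env_sketch.step(action)
    total_reward += reward
    steps_taken += 1
    done = terminated or truncated

print(steps_taken, total_reward)
```

The loop is the same agent-environment cycle as in the figure: observe, act, receive a reward, and stop once the episode is terminated or truncated.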

In [83]:
# Initialize the environment with the appropriate render mode
env = gym.make("LunarLander-v3", render_mode="rgb_array")
observation, info = env.reset()

frames = []  # List to hold all frames

In [84]:
for _ in range(1_000):
    action = env.action_space.sample()  # Randomly sample an action
    observation, reward, terminated, truncated, info = env.step(action)
    frame = env.render()  # Get the frame
    frames.append(frame)  # Append frame to the list

    if terminated or truncated:
        observation, info = env.reset()  # Reset if the episode ends

env.close()


In [85]:
# Show the collected frames as a video
media.show_video(frames, fps=60)  # You can adjust FPS as needed   



## Action and observation spaces

Every environment specifies the format of valid actions and observations with the `action_space` and `observation_space` attributes.

Importantly, `Env.action_space` and `Env.observation_space` are instances of `Space`, a high-level python class that provides the key functions: `Space.contains` and `Space.sample`. Gymnasium has support for a wide range of spaces that users might need:



- `Box`: describes a bounded space with upper and lower limits of any n-dimensional shape.
- `Discrete`: describes a discrete space where ``{0, 1, ..., n-1}`` are the possible values our observation or action can take.

In [88]:
from gymnasium.spaces import Box, Discrete

observation_space = Discrete(3, start=-1, seed=42)  # {-1, 0, 1}    
observation_space.sample() # Generates a single random sample from this space.

np.int64(-1)
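What `sample` and `contains` mean for a discrete space can be sketched in a few lines of plain Python. `ToyDiscrete` below is a hypothetical stand-in written for illustration, not Gymnasium's `Discrete` class, and its random numbers will differ from the library's.

```python
import random


class ToyDiscrete:
    """Hypothetical stand-in for the space {start, ..., start + n - 1}."""

    def __init__(self, n, start=0, seed=None):
        self.n = n
        self.start = start
        self._rng = random.Random(seed)

    def sample(self):
        """Draw one element of the space uniformly at random."""
        return self.start + self._rng.randrange(self.n)

    def contains(self, value):
        """Return True if `value` is a valid element of the space."""
        return isinstance(value, int) and self.start <= value < self.start + self.n


toy_space = ToyDiscrete(3, start=-1, seed=42)  # {-1, 0, 1}
samples = [toy_space.sample() for _ in range(5)]

print(samples)
print(toy_space.contains(0), toy_space.contains(2))
```

Every value returned by `sample` satisfies `contains`, which is the property that makes random sampling a safe way to generate valid actions.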

## Modifying the environment

Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.

Gymnasium already provides many commonly used wrappers.


Some examples:

- `TimeLimit`: Issues a truncated signal if a maximum number of timesteps has been exceeded (or the base environment has issued a truncated signal).
- `ClipAction`: Clips any action passed to ``step`` such that it lies in the base environment's action space.
- `RescaleAction`: Applies an affine transformation to the action to linearly scale for a new low and high bound on the environment.
- `TimeAwareObservation`: Adds information about the index of the timestep to the observation.

In [89]:
from gymnasium.wrappers import FlattenObservation, RescaleAction, TimeAwareObservation
base_env = gym.make("CarRacing-v3")
base_env.action_space  

Box([-1.  0.  0.], 1.0, (3,), float32)

In [90]:
wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
wrapped_env.action_space

Box(0.0, 1.0, (3,), float32)
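The affine transformation behind this kind of rescaling can be written out by hand. The function below is a sketch of the mapping from the new bounds back to the base environment's bounds, assuming a simple per-component linear map; `rescale_action` is a hypothetical helper, not the wrapper's own code.

```python
def rescale_action(action, min_action, max_action, low, high):
    """Map each component from [min_action, max_action] to [low_i, high_i]."""
    return [
        lo + (hi - lo) * (a - min_action) / (max_action - min_action)
        for a, lo, hi in zip(action, low, high)
    ]


# Bounds copied from the CarRacing-v3 action space printed above.
base_low = [-1.0, 0.0, 0.0]
base_high = [1.0, 1.0, 1.0]

# An action expressed in the rescaled [0, 1] space.
rescaled = rescale_action([0.5, 0.5, 0.0], 0.0, 1.0, base_low, base_high)
print(rescaled)  # [0.0, 0.5, 0.0]
```

Under this map, the midpoint 0.5 of the first component lands on 0.0, the centre of the base range [-1, 1].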

# Stable-Baselines3
<img src="https://stable-baselines3.readthedocs.io/en/master/_static/logo.png" width="500" align="center"/>

In [91]:
# Initialize the environment
env = make_vec_env('CartPole-v1', n_envs=1)

# Setup the model
model = PPO('MlpPolicy', env, verbose=1)  
model.learn(total_timesteps=10000)

# Reset the environment
obs = env.reset()
frames = []  # List to store frames

Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 25.1     |
|    ep_rew_mean     | 25.1     |
| time/              |          |
|    fps             | 2470     |
|    iterations      | 1        |
|    time_elapsed    | 0        |
|    total_timesteps | 2048     |
---------------------------------
-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 27.1          |
|    ep_rew_mean          | 27.1          |
| time/                   |               |
|    fps                  | 1456          |
|    iterations           | 2             |
|    time_elapsed         | 2             |
|    total_timesteps      | 4096          |
| train/                  |               |
|    approx_kl            | 0.0083544385  |
|    clip_fraction        | 0.0963        |
|    clip_range           | 0.2           |
|    entropy_loss         | -0.686        |
|    explained_variance   |

In [92]:
# Simulate the environment
for i in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    frame = env.render(mode='rgb_array')  # Get frame in rgb array mode
    frames.append(frame)
    # No manual reset is needed: a vectorized env resets itself when an episode ends

env.close()


In [93]:
# Convert frames to a numpy array for compatibility with mediapy
frames_np = np.stack(frames)

# Display the video
media.show_video(frames_np, fps=30)  # Adjust FPS as needed



In [94]:
# Evaluate the trained agent
from stable_baselines3.common.evaluation import evaluate_policy
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100, warn=False)

print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

mean_reward:374.39 +/- 119.69


<img src="https://raw.githubusercontent.com/Farama-Foundation/PettingZoo/master/pettingzoo-text.png" width="500" align="center"/>

[PettingZoo](https://pettingzoo.farama.org/) is a Python library for conducting research in multi-agent reinforcement learning, akin to a multi-agent version of [Gym](https://github.com/openai/gym).