# "Stable baselines 3 - 1st steps"
> "installation, 1st experimentations"
- show_tags: true
- toc: true
- branch: master
- badges: false
- comments: true
- categories: [reinforcement learning, pytorch, sb3]

# What is stable baselines 3 (sb3)

I have just read about this new release. It is a complete rewrite of Stable Baselines 2 that drops TensorFlow entirely and is built on PyTorch (>= 1.4).

It provides many working implementations of RL algorithms, all built on the gym interface.
A very good introduction is Antonin Raffin's blog entry [Stable-Baselines3: Reliable Reinforcement Learning Implementations](https://araffin.github.io/post/sb3/).

> ## Links
> 
> GitHub repository: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)

> Documentation: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

> RL Baselines3 Zoo: [https://github.com/DLR-RM/rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

> Contrib: [https://github.com/Stable-Baselines-Team/stable-baselines3-contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

> RL Tutorial: [https://github.com/araffin/rl-tutorial-jnrr19](https://github.com/araffin/rl-tutorial-jnrr19)



# My installation

Standard installation
```bash
conda create --name stablebaselines3 python=3.7
conda activate stablebaselines3
pip install stable-baselines3[extra]
conda install -c conda-forge jupyter_contrib_nbextensions
conda install nb_conda
```
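
To confirm what actually ended up in the environment, a short check of the installed versions can help. This is only a minimal sketch: the helper name `installed_version` is mine, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, assumed present on Python 3.7


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages this post relies on
for name in ("stable-baselines3", "torch", "gym"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it from the activated `stablebaselines3` environment, otherwise it reports on whichever Python is first on the path.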

In [1]:
!conda list

# packages in environment at /home/explore/miniconda3/envs/stablebaselines3:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
absl-py                   0.12.0                   pypi_0    pypi
atari-py                  0.2.6                    pypi_0    pypi
attrs                     20.3.0             pyhd3deb0d_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.1                      py_0    conda-forge
bleach                    3.3.0              pyh44b312d_0    conda-forge
box2d                     2.3.10                   pypi_0    pypi
box2d-py                  2.3.8                    pypi_0    pypi
ca-certificates           2021.1.19            h06a4308_1  
cachetools                4.2.1                    pypi_0    pypi
certifi             

# SB3 tutorials

In [3]:
import gym

from stable_baselines3 import A2C
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

# Save a checkpoint every 5000 steps
checkpoint_callback = CheckpointCallback(save_freq=5000, save_path="/home/explore/git/guillaume/stable_baselines_3/logs/",
                                         name_prefix="rl_model")

# Evaluate the model periodically
# and auto-save the best model and evaluations
# Use a monitor wrapper to properly report episode stats
eval_env = Monitor(gym.make("LunarLander-v2"))
# Use deterministic actions for evaluation
eval_callback = EvalCallback(eval_env, best_model_save_path="/home/explore/git/guillaume/stable_baselines_3/logs/",
                             log_path="/home/explore/git/guillaume/stable_baselines_3/logs/", eval_freq=2000,
                             deterministic=True, render=False)

# Train an agent using A2C on LunarLander-v2
model = A2C("MlpPolicy", "LunarLander-v2", verbose=1)
model.learn(total_timesteps=20000, callback=[checkpoint_callback, eval_callback])

# Retrieve and reset the environment
env = model.get_env()
obs = env.reset()

# Query the agent (stochastic action here)
action, _ = model.predict(obs, deterministic=False)

Using cuda device
Creating environment from the given name 'LunarLander-v2'
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 97.6     |
|    ep_rew_mean        | -265     |
| time/                 |          |
|    fps                | 484      |
|    iterations         | 100      |
|    time_elapsed       | 1        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -1.28    |
|    explained_variance | -0.0391  |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | -5.3     |
|    value_loss         | 17.3     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 107      |
|    ep_rew_mean        | -249     |
| time/                 |          |
|    fps                | 499    

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 158       |
|    ep_rew_mean        | -242      |
| time/                 |           |
|    fps                | 323       |
|    iterations         | 1300      |
|    time_elapsed       | 20        |
|    total_timesteps    | 6500      |
| train/                |           |
|    entropy_loss       | -0.799    |
|    explained_variance | -0.000516 |
|    learning_rate      | 0.0007    |
|    n_updates          | 1299      |
|    policy_loss        | -4.78     |
|    value_loss         | 91.6      |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 162       |
|    ep_rew_mean        | -238      |
| time/                 |           |
|    fps                | 330       |
|    iterations         | 1400      |
|    time_elapsed       | 21        |
|    total_timesteps    | 7000      |
| train/    

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 208      |
|    ep_rew_mean        | -210     |
| time/                 |          |
|    fps                | 256      |
|    iterations         | 2500     |
|    time_elapsed       | 48       |
|    total_timesteps    | 12500    |
| train/                |          |
|    entropy_loss       | -0.0409  |
|    explained_variance | -0.00201 |
|    learning_rate      | 0.0007   |
|    n_updates          | 2499     |
|    policy_loss        | -0.00709 |
|    value_loss         | 1.45     |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 206       |
|    ep_rew_mean        | -206      |
| time/                 |           |
|    fps                | 262       |
|    iterations         | 2600      |
|    time_elapsed       | 49        |
|    total_timesteps    | 13000     |
| train/                |    

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 228      |
|    ep_rew_mean        | -173     |
| time/                 |          |
|    fps                | 248      |
|    iterations         | 3700     |
|    time_elapsed       | 74       |
|    total_timesteps    | 18500    |
| train/                |          |
|    entropy_loss       | -0.369   |
|    explained_variance | 0.0153   |
|    learning_rate      | 0.0007   |
|    n_updates          | 3699     |
|    policy_loss        | 0.175    |
|    value_loss         | 2.75     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 228      |
|    ep_rew_mean        | -171     |
| time/                 |          |
|    fps                | 252      |
|    iterations         | 3800     |
|    time_elapsed       | 75       |
|    total_timesteps    | 19000    |
| train/                |          |
|
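
The training log scrolls by quickly, so it can be easier to look at what `EvalCallback` wrote to `log_path`. As far as I understand, it saves an `evaluations.npz` file there with the arrays `timesteps`, `results` and `ep_lengths`. The sketch below assumes that layout; the helper name `summarize_evaluations` is mine.

```python
import os

import numpy as np

# Same folder as the log_path given to EvalCallback above
LOG_DIR = "/home/explore/git/guillaume/stable_baselines_3/logs/"


def summarize_evaluations(npz_path):
    """Return (timestep, mean reward, std reward) for each evaluation round."""
    data = np.load(npz_path)
    timesteps = data["timesteps"]
    results = data["results"]  # assumed shape: (n_evaluations, n_eval_episodes)
    return [
        (int(step), float(rewards.mean()), float(rewards.std()))
        for step, rewards in zip(timesteps, results)
    ]


npz_path = os.path.join(LOG_DIR, "evaluations.npz")
if os.path.exists(npz_path):
    for step, mean_reward, std_reward in summarize_evaluations(npz_path):
        print(f"{step:>6} steps: {mean_reward:8.1f} +/- {std_reward:.1f}")
```

With `eval_freq=2000` and 20000 training steps, I would expect about ten rows, one per evaluation round.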

# Issues

## CUDA error: CUBLAS_STATUS_INTERNAL_ERROR

Downgrading PyTorch to 1.7.1 avoids `RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)`:

```bash
pip install torch==1.7.1
```

## RuntimeError: CUDA error: invalid device function

In [1]:
!nvidia-smi

Thu Mar 25 09:13:49 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Quadro RTX 4000     Off  | 00000000:01:00.0  On |                  N/A |
| N/A   41C    P5    18W /  N/A |   2104MiB /  7982MiB |     32%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+---------------------------------------------------------------------------

`nvidia-smi` reports CUDA 11.0 on my workstation; this is the highest CUDA version the installed driver supports.

In [2]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243


In [4]:
!conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.
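
Since the PyTorch conda package ships its own `cudatoolkit`, the quickest way to see which CUDA version PyTorch really uses is to ask it directly. A minimal sketch, where the helper name `cuda_report` is mine and the import is guarded so it also runs in an environment without PyTorch:

```python
def cuda_report():
    """Collect the PyTorch version, its CUDA build and the visible GPU, if any."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    report = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```

If `built_with_cuda` does not match the `cudatoolkit` version requested at install time, the environment probably still holds an older PyTorch build.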

