# RL frameworks

An almost complete list of RL frameworks is available in [Open-source RL](https://docs.google.com/spreadsheets/d/1EeFPd-XIQ3mq_9snTlAZSsFY7Hbnmd7P5bbT8LPuMn0/edit#gid=0). It is missing [PARL](https://github.com/PaddlePaddle/PARL) and [ChainerRL](https://github.com/chainer/chainerrl).

<img src="frameworks.png">

## [OpenAI baselines](https://github.com/openai/baselines)
OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.

These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.

Usage example:
```shell
python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
```
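
The `--alg=deepq` command above trains a DQN agent. DQN regresses Q(s, a) toward the one-step TD target r + γ · max_a' Q_target(s', a'). Here Q_target is a periodically updated copy of the network, and the bootstrap term is dropped at terminal states. Below is a minimal pure-Python sketch of that target, assuming the Q-values arrive as plain lists. `dqn_targets` and `td_loss` are hypothetical helpers and are not part of the Baselines API:

```python
from typing import List, Sequence


# Hypothetical helpers for illustration; not part of the Baselines API.
def dqn_targets(
    rewards: Sequence[float],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step DQN targets: r + gamma * max_a' Q_target(s', a').

    next_q_target[i] holds the target network's Q-values for every action
    in the next state of transition i; the bootstrap term is zeroed when
    transition i ended the episode.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared TD error between Q(s, a) of the taken actions and the targets."""
    return sum((q - t) ** 2 for q, t in zip(q_taken, targets)) / len(targets)


# Two toy transitions: the second one is terminal, so it does not bootstrap.
targets = dqn_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5)
print(targets)                        # [2.0, 0.0]
print(td_loss([1.5, 0.5], targets))   # 0.25
```

The squared error keeps the sketch short. DQN implementations commonly use the Huber loss instead, because it is less sensitive to large TD errors.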

<img src="ailogo.jpg">

## [Stable baselines](https://github.com/hill-a/stable-baselines)
Stable Baselines is a set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.

You can read a detailed presentation of Stable Baselines in the [Medium article](https://towardsdatascience.com/stable-baselines-a-fork-of-openai-baselines-reinforcement-learning-made-easy-df87c4b2fc82).

These algorithms will make it easier for the research community and industry to replicate, refine, and identify new ideas, and will create good baselines to build projects on top of. We expect these tools will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones. We also hope that the simplicity of these tools will allow beginners to experiment with a more advanced toolset, without being buried in implementation details.

<img src="stableLogo.png">


## [Catalyst](https://github.com/catalyst-team/catalyst)
<img src="catalyst_logo.png">

High-level utilities for PyTorch DL and RL research. Catalyst was developed with a focus on reproducibility, fast experimentation, and reuse of code and ideas. The goal is to let you research and develop something new rather than write yet another training loop.

Break the cycle - use the Catalyst! https://arxiv.org/pdf/1903.00027.pdf
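
Reproducibility starts with fixing every source of randomness before a run. The sketch below illustrates the idea using only the standard library and does not use Catalyst's own utilities. The hypothetical `run_experiment` draws all of its noise from a seeded `random.Random`, so the same seed always yields the same result:

```python
import random
from typing import List


# Hypothetical toy experiment, used only to illustrate seeding.
def run_experiment(seed: int, steps: int = 5) -> List[float]:
    """Toy 'experiment' whose only randomness comes from a seeded RNG.

    A dedicated random.Random instance, rather than the global module state,
    isolates the run from any other code that draws random numbers.
    """
    rng = random.Random(seed)
    total = 0.0
    history = []
    for _ in range(steps):
        total += rng.gauss(0.0, 1.0)  # stand-in for a noisy training signal
        history.append(total)
    return history


print(run_experiment(42) == run_experiment(42))  # True: same seed, same run
```

In a PyTorch project you would also seed NumPy and PyTorch, for example with `torch.manual_seed`. Even then, some GPU kernels can remain nondeterministic.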

<img src="catalyst.png">

Catalyst integration with Weights & Biases: https://docs.wandb.com/library/integrations/catalyst

```python
import torch
from catalyst.dl import SupervisedRunner

# experiment setup
logdir = "./logdir"  # logs and checkpoints are written here
num_epochs = 42

# data: the "..." values are placeholders for your torch.utils.data.DataLoader objects
loaders = {"train": ..., "valid": ...}

# model, criterion, optimizer
# Net is your own torch.nn.Module subclass (it is not defined in this example)
model = Net()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)

# model runner
runner = SupervisedRunner()

# model training: the runner drives the train/valid loops instead of a hand-written loop
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir=logdir,
    num_epochs=num_epochs,
    verbose=True,
)
```
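
For comparison, the following shows roughly the kind of "regular train loop" that `runner.train` saves you from writing. It iterates over epochs, takes gradient steps on the train loader, evaluates on the valid loader, and keeps the best checkpoint. This is a hypothetical pure-Python version that fits a 1-D linear model with SGD and uses plain lists as loaders. It is not Catalyst's internal code:

```python
from typing import Dict, List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


# Hypothetical hand-written loop; not Catalyst's implementation.
def train_loop(
    loaders: Dict[str, List[Batch]], num_epochs: int, lr: float = 0.1
) -> Tuple[Tuple[float, float], float]:
    """Fit y ~ w * x + b with mini-batch SGD on an MSE loss.

    Returns the parameters with the lowest validation loss and that loss,
    i.e. the 'best checkpoint' a runner would normally keep for you.
    """
    w, b = 0.0, 0.0
    best_params, best_valid = (w, b), float("inf")
    for _ in range(num_epochs):
        # training stage: one gradient step per batch
        for batch in loaders["train"]:
            n = len(batch)
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
            w -= lr * grad_w
            b -= lr * grad_b
        # validation stage: no parameter updates
        pairs = [p for batch in loaders["valid"] for p in batch]
        valid_loss = sum((w * x + b - y) ** 2 for x, y in pairs) / len(pairs)
        # "checkpointing": remember the best parameters seen so far
        if valid_loss < best_valid:
            best_params, best_valid = (w, b), valid_loss
    return best_params, best_valid


# Noise-free data from y = 2x + 1
loaders = {
    "train": [[(0.0, 1.0), (1.0, 3.0)], [(0.5, 2.0), (-1.0, -1.0)]],
    "valid": [[(2.0, 5.0), (-0.5, 0.0)]],
}
(w, b), loss = train_loop(loaders, num_epochs=200)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Logging, schedulers, device placement, and checkpoint files would all add to a loop like this. That extra code is what the runner is meant to absorb.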

## [PARL](https://github.com/PaddlePaddle/PARL)

<img src="PARL-logo.png">

PARL is a flexible and highly efficient reinforcement learning framework.

- **Reproducible.** It provides algorithms that stably reproduce the results of many influential reinforcement learning algorithms.
- **Large scale.** It supports high-performance parallel training on thousands of CPUs and multiple GPUs.
- **Reusable.** Algorithms in the repository can be adapted directly to a new task: you define the forward network, and the training mechanism is built automatically.
- **Extensible.** New algorithms can be built quickly by inheriting from the framework's abstract classes.
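
The "Reusable" and "Extensible" points describe a layered design. A user-supplied network computes action values, an algorithm turns that network into a learning rule, and an agent connects the algorithm to the environment. PARL organizes its code around Model, Algorithm, and Agent abstractions. The plain-Python classes below only mimic that split and are not PARL's API. All names, such as `TabularModel` and `QLearning`, are hypothetical:

```python
import random
from abc import ABC, abstractmethod
from typing import List


# Hypothetical base classes mimicking a Model / Algorithm / Agent split (not PARL's API).
class Model(ABC):
    """User-defined part: maps an observation to a list of action values."""

    @abstractmethod
    def forward(self, obs: int) -> List[float]:
        ...


class Algorithm(ABC):
    """Learning rule that wraps a Model; reusable across tasks."""

    def __init__(self, model: Model) -> None:
        self.model = model

    @abstractmethod
    def learn(self, obs: int, action: int, reward: float, next_obs: int, done: bool) -> None:
        ...


class Agent:
    """Bridges an Algorithm and an environment: action selection and learning calls."""

    def __init__(self, algorithm: Algorithm, epsilon: float = 0.1, seed: int = 0) -> None:
        self.alg = algorithm
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def sample(self, obs: int) -> int:
        """Epsilon-greedy action based on the model's action values."""
        q = self.alg.model.forward(obs)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(q))
        return max(range(len(q)), key=lambda a: q[a])

    def learn(self, obs: int, action: int, reward: float, next_obs: int, done: bool) -> None:
        self.alg.learn(obs, action, reward, next_obs, done)


# Extending the design: only these two hypothetical classes are task-specific.
class TabularModel(Model):
    """A Q-table standing in for a neural network."""

    def __init__(self, n_states: int, n_actions: int) -> None:
        self.table = [[0.0] * n_actions for _ in range(n_states)]

    def forward(self, obs: int) -> List[float]:
        return list(self.table[obs])  # return a copy so callers cannot mutate the table


class QLearning(Algorithm):
    """One-step tabular Q-learning on top of a TabularModel."""

    def __init__(self, model: TabularModel, lr: float = 0.5, gamma: float = 0.9) -> None:
        super().__init__(model)
        self.lr = lr
        self.gamma = gamma

    def learn(self, obs: int, action: int, reward: float, next_obs: int, done: bool) -> None:
        target = reward if done else reward + self.gamma * max(self.model.forward(next_obs))
        row = self.model.table[obs]
        row[action] += self.lr * (target - row[action])
```

In this design, a new task needs only a new `Model`, and a new algorithm needs only a new `Algorithm` subclass. The `Agent` code stays unchanged in both cases.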

<img src="parl-decorator.png">
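
The "Large scale" point relies on running many actors in parallel. PARL exposes this through a class decorator (`parl.remote_class`). The sketch below only illustrates the decorator pattern with the standard library. The hypothetical `remote_class` wraps a class so that each public method call runs in a worker thread and returns a `Future`. PARL's actual decorator targets separate processes or machines, and its API differs:

```python
import concurrent.futures
import functools
import random


# Hypothetical decorator illustrating the pattern; not PARL's implementation.
def remote_class(cls):
    """Wrap cls so that every public method call returns a Future.

    Each wrapped instance owns a single-worker executor, so calls on one
    actor run in order while different actors run concurrently.
    """

    class Remote:
        def __init__(self, *args, **kwargs):
            self._obj = cls(*args, **kwargs)
            self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

        def __getattr__(self, name):
            # Only called for attributes not found on Remote itself.
            attr = getattr(self._obj, name)
            if name.startswith("_") or not callable(attr):
                return attr

            @functools.wraps(attr)
            def submit(*args, **kwargs):
                return self._pool.submit(attr, *args, **kwargs)

            return submit

        def close(self):
            self._pool.shutdown(wait=True)

    Remote.__name__ = cls.__name__
    return Remote


@remote_class
class RolloutWorker:
    """Hypothetical actor that 'plays' an episode with random rewards."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)

    def rollout(self, length: int) -> float:
        return sum(self.rng.random() for _ in range(length))


workers = [RolloutWorker(seed) for seed in range(4)]
futures = [w.rollout(100) for w in workers]  # returns immediately with Futures
returns = [f.result() for f in futures]      # blocks until every rollout finishes
for w in workers:
    w.close()
```

Threads keep the sketch short and easy to test. For CPU-bound rollouts in Python, a real system would use separate processes or machines to get actual parallel speedup.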

Algorithms:

- DQN
- DDDQN
- DDPG
- PPO
- IMPALA
- A2C
- A3C
- TD3
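
The source does not expand "DDDQN". It presumably refers to Double Dueling DQN, and both of those modifications are small changes to plain DQN. A dueling head combines a state value V(s) and advantages A(s, a) as Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'). Double DQN chooses the next action with the online network but evaluates it with the target network. Below is a pure-Python sketch of the two formulas. The function names are hypothetical and are not part of PARL:

```python
from typing import List, Sequence


# Hypothetical helpers illustrating the formulas; not PARL's API.
def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    done: bool,
    gamma: float = 0.99,
) -> float:
    """Double DQN target: argmax from the online net, value from the target net."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


print(dueling_q(1.0, [1.0, 3.0]))                                   # [0.0, 2.0]
print(double_dqn_target(1.0, [0.0, 5.0], [4.0, 2.0], False, 0.5))   # 2.0
```

In the second call, plain DQN would take max(next_q_target) = 4.0 and produce 3.0. Separating action selection from evaluation is how Double DQN reduces this overestimation.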

## PARL vs Catalyst in the NeurIPS 2019 Learn to Move competition

[Learn to Move leaderboard](https://www.aicrowd.com/challenges/neurips-2019-learn-to-move-walk-around/leaderboards)

<img src="tomove.png">

[First place: PARL Baidu](https://www.youtube.com/watch?v=FD2lGv-4BLE&feature=youtu.be) Bo Zhou, Hongsheng Zeng, Fan Wang, Yunxiang Li, and Hao Tian

[Second place: Catalyst](https://www.youtube.com/watch?v=WuqNdNBVzzI&feature=youtu.be) Sergey Kolesnikov and Valentin Khrulkov

[Third place: SimBodyWithDummyPlug](https://www.youtube.com/watch?v=DfPPiTsuB4E&feature=youtu.be) Dmitry Akimov



## [ChainerRL](https://github.com/chainer/chainerrl)

<img src="ChainerRL.png">

ChainerRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement learning algorithms in Python using [Chainer](https://github.com/chainer/chainer), a flexible deep learning framework.

### Algorithms

| Algorithm | Discrete Action | Continuous Action | Recurrent Model | Batch Training | CPU Async Training |
|:----------|:---------------:|:----------------:|:---------------:|:--------------:|:------------------:|
| DQN (including DoubleDQN etc.) | ✓ | ✓ (NAF) | ✓ | ✓ | x |
| Categorical DQN | ✓ | x | ✓ | ✓ | x |
| Rainbow | ✓ | x | ✓ | ✓ | x |
| IQN | ✓ | x | ✓ | ✓ | x |
| DDPG | x | ✓ | ✓ | ✓ | x |
| A3C  | ✓ | ✓ | ✓ | ✓ (A2C) | ✓ |
| ACER | ✓ | ✓ | ✓ | x | ✓ |
| NSQ (N-step Q-learning) | ✓ | ✓ (NAF) | ✓ | x | ✓ |
| PCL (Path Consistency Learning) | ✓ | ✓ | ✓ | x | ✓ |
| PPO  | ✓ | ✓ | ✓ | ✓ | x |
| TRPO | ✓ | ✓ | ✓ | ✓ | x |
| TD3 | x | ✓ | x | ✓ | x |
| SAC | x | ✓ | x | ✓ | x |

The table above lists the algorithms implemented in ChainerRL.
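
Several entries in the table rely on n-step returns instead of one-step targets. These include NSQ and the advantage estimates of A3C and its synchronous A2C variant. The return is R_t = r_t + γ r_{t+1} + ... + γ^{n-1} r_{t+n-1} + γ^n V(s_{t+n}), and the bootstrap term is dropped if the episode ends earlier. Below is a minimal sketch that assumes the rewards of one rollout segment are given as a list. `n_step_returns` is a hypothetical helper and not part of ChainerRL's API:

```python
from typing import List, Sequence


# Hypothetical helper; not part of ChainerRL's API.
def n_step_returns(
    rewards: Sequence[float], bootstrap_value: float, done: bool, gamma: float = 0.99
) -> List[float]:
    """Discounted return for every step of a rollout segment, computed backwards.

    bootstrap_value estimates the value of the state after the last reward:
    V(s) for actor-critic methods, or max_a Q(s, a) for N-step Q-learning.
    It is ignored when the segment ended the episode.
    """
    running = 0.0 if done else bootstrap_value
    returns = [0.0] * len(rewards)
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


print(n_step_returns([1.0, 1.0, 1.0], bootstrap_value=10.0, done=False, gamma=0.5))
# [3.0, 4.0, 6.0]
```

In the backward pass, the first step of the segment receives the full n-step return and the last step receives a one-step return. A3C-style methods use this scheme when they update on a fixed-length rollout.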

### Exercise: Pick a few frameworks and reproduce their standard experiments. Figure out how to modify the code.