# RL4CO Quickstart Notebook

<a href="https://colab.research.google.com/github/ai4co/rl4co/blob/main/examples/1-quickstart.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

[**Documentation**](https://rl4co.readthedocs.io/) |  [**Getting Started**](https://github.com/ai4co/rl4co/tree/main#getting-started) | [**Usage**](https://github.com/ai4co/rl4co/tree/main#usage) | [**Contributing**](#contributing) | [**Paper**](https://arxiv.org/abs/2306.17100) | [**Citation**](#cite-us)

In this notebook we will train the AttentionModel (AM) on the TSP environment for 20 nodes. On a GPU, this should take less than 2 minutes! 🚀

![Alt text](https://user-images.githubusercontent.com/48984123/245925317-0db4efdd-1c93-4991-8f09-f3c6c1f35d60.png)

### Installation

In [1]:
## Uncomment the following line to install the package from PyPI
## You may need to restart the runtime in Colab after this
## Remember to choose a GPU runtime for faster training!

# !pip install rl4co

### Imports

In [1]:
%load_ext autoreload
%autoreload 2

import torch

from rl4co.envs import FFSPEnv
from rl4co.models.zoo import MatNet
from rl4co.models.zoo.matnet.policy import MultiStageFFSPPolicy
from rl4co.utils.trainer import RL4COTrainer

/opt/homebrew/anaconda3/envs/RL/lib/python3.9/site-packages/lightning/fabric/__init__.py:41: Deprecated call to `pkg_resources.declare_namespace('lightning.fabric')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(parent)
/opt/homebrew/anaconda3/envs/RL/lib/python3.9/site-packages/lightning/pytorch/__init__.py:37: Deprecated call to `pkg_resources.declare_namespace('lightning.pytorch')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages

### Environment, Policy and Model

Full documentation of:

- Base environment class [here](https://rl4co.readthedocs.io/en/latest/_content/api/envs/base.html)
- Base policy class [here](https://rl4co.readthedocs.io/en/latest/_content/api/models/base.html)
- Base model class [here](https://rl4co.readthedocs.io/en/latest/_content/api/algos/base.html)

In [2]:
# RL4CO env based on TorchRL
params = {
    'num_stage': 3,
    'num_machine': 2,
    'num_job': 5,
    'num_machine_total': 6,
    'flatten_stages': 3,
}
env = FFSPEnv(generator_params=params)

# Policy: neural network, in this case with encoder-decoder architecture
policy = MultiStageFFSPPolicy(
    stage_cnt=2,
    embed_dim=128,
    num_encoder_layers=3,
    num_heads=8,
    feedforward_hidden=256,
)

# print(policy.encoder)
# print(policy.decoder)

# Model: MatNet. The policy defined above is not passed in, so MatNet builds its own default policy
model = MatNet(
    env,
    # baseline="rollout",
    # batch_size=512,
    # train_data_size=100_000,
    # val_data_size=10_000,
    # optimizer_kwargs={"lr": 1e-4},
)

Found 1 unused kwargs: {'num_machine_total': 6}
Found 4 unused kwargs: {'embed_dim': 256, 'num_encoder_layers': 5, 'num_heads': 16, 'normalization': 'instance'}
/opt/homebrew/anaconda3/envs/RL/lib/python3.9/site-packages/lightning/pytorch/utilities/parsing.py:199: Attribute 'env' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['env'])`.
/opt/homebrew/anaconda3/envs/RL/lib/python3.9/site-packages/lightning/pytorch/utilities/parsing.py:199: Attribute 'policy' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['policy'])`.
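To make the generator parameters above concrete, here is a minimal, self-contained sketch in plain Python (not the RL4CO implementation) of a flexible flow shop with the same sizes. It assumes that `num_machine` counts machines per stage, so the total would be `num_stage * num_machine = 6`, which may be why `num_machine_total` is reported as an unused kwarg. The `fcfs_makespan` helper and its first-come-first-served dispatch rule are hypothetical stand-ins for a learned policy.

```python
import random


def fcfs_makespan(durations, num_stage, num_machine):
    """Makespan of a first-come-first-served schedule (illustrative only).

    durations[job][stage][machine] is the processing time of `job` on
    `machine` at `stage`. Every job visits the stages in order.
    """
    num_job = len(durations)
    ready = [0] * num_job  # time at which each job has finished its previous stage
    for stage in range(num_stage):
        free_at = [0] * num_machine  # time at which each machine of this stage is free
        # Serve jobs in the order they arrive at this stage.
        for job in sorted(range(num_job), key=lambda j: ready[j]):
            machine = min(range(num_machine), key=lambda k: free_at[k])
            start = max(ready[job], free_at[machine])
            finish = start + durations[job][stage][machine]
            free_at[machine] = finish
            ready[job] = finish
    return max(ready)


# Same sizes as the notebook's generator parameters.
num_stage, num_machine, num_job = 3, 2, 5
num_machine_total = num_stage * num_machine  # assumed relationship, see lead-in

rng = random.Random(0)
durations = [
    [[rng.randint(2, 9) for _ in range(num_machine)] for _ in range(num_stage)]
    for _ in range(num_job)
]
print("total machines:", num_machine_total)
print("FCFS makespan:", fcfs_makespan(durations, num_stage, num_machine))

# Tiny hand-checkable case: 2 jobs, 2 stages, 1 machine per stage.
toy = [[[2], [3]], [[1], [4]]]
toy_makespan = fcfs_makespan(toy, num_stage=2, num_machine=1)
```

The makespan is the quantity a scheduling policy tries to minimize; a trained policy would replace the fixed dispatch rule with learned job-to-machine decisions.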


### Test greedy rollout with untrained model and plot

In [3]:
# Greedy rollouts over untrained policy
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
td_init = env.reset(batch_size=[3]).to(device)
policy = policy.to(device)
out = policy(td_init.clone(), env, phase="test", decode_type="greedy", return_actions=True)
actions_untrained = out['actions'].cpu().detach()
rewards_untrained = out['reward'].cpu().detach()

for i in range(3):
    print(f"Problem {i+1} | Cost: {-rewards_untrained[i]:.3f}")
    env.render(td_init[i], actions_untrained[i])
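For readers unfamiliar with the `decode_type="greedy"` argument, the following is a minimal sketch of what greedy decoding usually means at a single step: mask out infeasible actions, then pick the most probable remaining one. It uses only the standard library; `masked_softmax` and `select_action` are hypothetical helpers, and the `"sampling"` branch is included for contrast as an assumption about how a stochastic rollout would differ.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over feasible actions; infeasible actions get probability 0.

    Assumes at least one entry of `mask` is True.
    """
    top = max(l for l, ok in zip(logits, mask) if ok)  # subtract max for stability
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, mask, decode_type="greedy", rng=random):
    """Pick one action index from masked logits."""
    probs = masked_softmax(logits, mask)
    if decode_type == "greedy":
        # Deterministic: always the most probable feasible action.
        return max(range(len(probs)), key=lambda i: probs[i])
    if decode_type == "sampling":
        # Stochastic: draw an action in proportion to its probability.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    raise ValueError(f"unknown decode_type: {decode_type}")


logits = [2.0, 5.0, 1.0]
mask = [True, False, True]  # action 1 has the highest logit but is infeasible
greedy_choice = select_action(logits, mask, "greedy")
sampled = [select_action(logits, mask, "sampling", random.Random(s)) for s in range(20)]
print("greedy:", greedy_choice, "| sampled:", sampled)
```

With an untrained network the logits are essentially arbitrary, so the greedy rollout serves as a baseline for comparison after training.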

### Trainer

The RL4CO trainer is a wrapper around PyTorch Lightning's `Trainer` class that adds some functionality and more efficient defaults.

In [44]:
trainer = RL4COTrainer(
    max_epochs=3,
    accelerator="gpu",
    devices=1,
    logger=None,
)

Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/homebrew/anaconda3/envs/RL/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default


### Fit the model

In [45]:
trainer.fit(model)

/opt/homebrew/anaconda3/envs/RL/lib/python3.9/site-packages/lightning/pytorch/utilities/parsing.py:44: attribute 'policy' removed from hparams because it cannot be pickled
Missing logger folder: /Users/yunseongjun/Desktop/RL4RIDER/rl4co/examples/lightning_logs
val_file not set. Generating dataset instead
test_file not set. Generating dataset instead

  | Name     | Type           | Params
--------------------------------------------
0 | env      | FFSPEnv        | 0     
1 | policy   | MatNetPolicy   | 5.7 M 
2 | baseline | SharedBaseline | 0     
--------------------------------------------
5.7 M     Trainable params
0         Non-trainable params
5.7 M     Total params
22.659    Total estimated model params size (MB)
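The summary above lists a `SharedBaseline`. As a rough illustration, assuming it follows the POMO-style idea of using the mean reward over several rollouts of the same instance as the REINFORCE baseline, the advantage computation can be sketched in plain Python. The function name and data layout are hypothetical and chosen only for this example.

```python
def shared_baseline_advantages(rewards):
    """Advantage of each rollout relative to its instance's mean reward.

    rewards[i][k] is the reward of rollout k on problem instance i.
    """
    advantages = []
    for per_instance in rewards:
        baseline = sum(per_instance) / len(per_instance)  # shared across rollouts
        advantages.append([r - baseline for r in per_instance])
    return advantages


# Rewards are negative costs, so -10.0 is the best rollout of the first instance.
rewards = [[-10.0, -12.0, -14.0], [-7.0, -7.0, -7.0]]
adv = shared_baseline_advantages(rewards)
print(adv)
```

Rollouts that beat their instance's average receive a positive advantage and are reinforced; when all rollouts tie, the advantages are zero and that instance contributes no gradient signal.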


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/opt/homebrew/anaconda3/envs/RL/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]



KeyError: 'key "cost_matrix" not found in TensorDict with keys [\'action_mask\', \'done\', \'job_duration\', \'job_location\', \'job_wait_step\', \'machine_idx\', \'machine_wait_step\', \'reward\', \'run_time\', \'schedule\', \'stage_idx\', \'stage_machine_idx\', \'sub_time_idx\', \'terminated\', \'time_idx\']'
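A quick way to diagnose this kind of mismatch is to compare the keys the policy reads with the keys the environment actually emits. The sketch below uses plain lists copied from the error message; `missing_keys` is a hypothetical helper, and in the notebook the available keys would presumably come from something like `td_init.keys()`.

```python
def missing_keys(available, required):
    """Return the required keys that are absent from `available`, sorted."""
    return sorted(set(required) - set(available))


# Keys reported by the KeyError for the FFSPEnv state.
env_keys = [
    "action_mask", "done", "job_duration", "job_location", "job_wait_step",
    "machine_idx", "machine_wait_step", "reward", "run_time", "schedule",
    "stage_idx", "stage_machine_idx", "sub_time_idx", "terminated", "time_idx",
]

# Key the policy tried to read, according to the same error message.
policy_keys = ["cost_matrix"]

absent = missing_keys(env_keys, policy_keys)
print("missing from environment state:", absent)
```

Here the policy built by the model expects a `cost_matrix` entry that this environment state does not provide, which suggests the environment and policy are not a matching pair in this setup.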

### Testing

In [46]:
# Greedy rollouts over trained model (same states as previous plot)
policy = model.policy.to(device)
out = policy(td_init.clone(), phase="test", decode_type="greedy", return_actions=True)
actions_trained = out['actions'].cpu().detach()

# Plotting
import matplotlib.pyplot as plt
for i, td in enumerate(td_init):
    fig, axs = plt.subplots(1,2, figsize=(11,5))
    env.render(td, actions_untrained[i], ax=axs[0]) 
    env.render(td, actions_trained[i], ax=axs[1])
    axs[0].set_title(f"Untrained | Cost = {-rewards_untrained[i].item():.3f}")
    axs[1].set_title(r"Trained $\pi_\theta$" + f"| Cost = {-out['reward'][i].item():.3f}")

KeyError: 'key "cost_matrix" not found in TensorDict with keys [\'action_mask\', \'done\', \'job_duration\', \'job_location\', \'job_wait_step\', \'machine_idx\', \'machine_wait_step\', \'reward\', \'run_time\', \'schedule\', \'stage_idx\', \'stage_machine_idx\', \'sub_time_idx\', \'terminated\', \'time_idx\']'

We can see that even after just 3 epochs, our trained AM is able to find much better solutions than the untrained policy! 🎉

In [8]:
# Optionally, save the checkpoint for later use (e.g. in tutorials/4-search-methods.ipynb)
trainer.save_checkpoint("tsp-quickstart.ckpt")