# Training Assistance

## Loading a local file

If you created a dataset using the ‘Controller’ class, you can load it by passing the path to the local file to the ‘BaselineDataset’ class.

---

The dataset stores the average reward obtained by the ‘teacher’ and consists of tuples (𝑠, 𝑎, 𝑠′), where 𝑠 is the current state, 𝑎 is the action taken, and 𝑠′ is the next state reached after taking 𝑎. If you need the original, unprocessed information, ‘BaselineDataset’ exposes it as a dictionary in ‘dataset.data’.

In [14]:
from imitation_datasets.dataset import BaselineDataset

In [15]:
dataset = BaselineDataset("./dataset/pendulum/teacher.npz")

Creating dataset: 100%|███████████████████████████████████████████| 100/100 [00:00<00:00, 1539.42it/s]


In [16]:
dataset.average_reward, dataset.states.shape

(-100.08150779832427, torch.Size([19900, 3]))

In [17]:
state, action, next_state = dataset[0]
state.shape, action.shape, next_state.shape

(torch.Size([3]), torch.Size([1]), torch.Size([3]))
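
The shapes above are consistent with 100 episodes of 200 steps each, where the last step of every episode has no successor and therefore yields no tuple (100 × 199 = 19900). The following is a minimal NumPy sketch of how (𝑠, 𝑎, 𝑠′) tuples could be derived from flat arrays. It assumes the dictionary keys `obs`, `actions`, and `episode_starts` that the inheritance example later in this document reads from ‘dataset.data’; the helper `build_transitions` and the synthetic data are hypothetical, and the library's internal implementation may differ.

```python
import numpy as np


def build_transitions(obs, actions, episode_starts):
    """Pair each state with its action and the following state, per episode."""
    # Indices where a new episode begins, plus a sentinel for the end of the data
    starts = list(np.where(episode_starts == 1)[0]) + [len(episode_starts)]
    states, acts, next_states = [], [], []
    for start, end in zip(starts, starts[1:]):
        # The last step of an episode has no successor, so it yields no tuple
        states.append(obs[start:end - 1])
        acts.append(actions[start:end - 1])
        next_states.append(obs[start + 1:end])
    return np.concatenate(states), np.concatenate(acts), np.concatenate(next_states)


# Synthetic stand-in data: 100 episodes of 200 steps with 3-dimensional states
n_episodes, steps, state_dim = 100, 200, 3
rng = np.random.default_rng(0)
obs = rng.normal(size=(n_episodes * steps, state_dim))
actions = rng.normal(size=(n_episodes * steps, 1))
episode_starts = np.zeros(n_episodes * steps, dtype=int)
episode_starts[::steps] = 1

states, acts, next_states = build_transitions(obs, actions, episode_starts)
print(states.shape, acts.shape, next_states.shape)
# (19900, 3) (19900, 1) (19900, 3)
```

Slicing per episode keeps a tuple from pairing the final state of one episode with the first state of the next.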

## Using the dataset to assist training

A simple example of training a Behavioural Cloning agent in the LunarLander-v2 environment.
This example uses a simplistic training/evaluation loop to train an MLP with 2 hidden layers of 32 neurons each and an output layer with 4 neurons, using the Adam optimizer and a cross-entropy loss function.
We divide the data into a 70/30 split, with 700 episodes to train the agent and 300 to evaluate it.

The data is available at: https://huggingface.co/datasets/NathanGavenski/LunarLander-v2
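
The 70/30 split is made per episode rather than per sample. The sketch below illustrates one way such a split could work, mirroring the boundary slicing used by the ‘SequenceDataset’ example later in this document: with `n_episodes=700`, the "train" split keeps the first 700 episodes and any other split keeps the remainder. The helper `split_episode_starts` and the synthetic data are hypothetical, and whether ‘BaselineDataset’ splits internally in exactly this way is an assumption.

```python
import numpy as np


def split_episode_starts(episode_starts, n_episodes, split="train"):
    """Return the boundaries of the episodes that belong to the requested split."""
    starts = list(np.where(episode_starts == 1)[0])
    starts.append(len(episode_starts))  # sentinel closing the last episode
    if split == "train":
        # n_episodes episodes need n_episodes + 1 boundaries
        return starts[:n_episodes + 1]
    # Every other split receives the episodes after the first n_episodes
    return starts[n_episodes:]


# Synthetic stand-in: 1000 episodes of 5 steps each
episode_starts = np.zeros(1000 * 5, dtype=int)
episode_starts[::5] = 1

train = split_episode_starts(episode_starts, n_episodes=700, split="train")
evaluation = split_episode_starts(episode_starts, n_episodes=700, split="eval")
print(len(train) - 1, len(evaluation) - 1)  # number of episodes in each split
# 700 300
```

Under this reading, passing the same `n_episodes=700` to both splits is intentional: the value marks where the training episodes end and the evaluation episodes begin.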

In [18]:
from typing import Tuple, Union

import gymnasium as gym
import numpy as np
import torch
from torch import nn
from torch import optim
from torch.nn import LeakyReLU
from torch.utils.data import DataLoader
from torchvision.ops import MLP

from imitation_datasets.dataset import BaselineDataset
from imitation_datasets.dataset.metrics import accuracy as accuracy_fn

In [19]:
def loop(
    model: nn.Module,
    dataloader: DataLoader,
    optimizer: optim.Optimizer = None,
    loss_fn: nn.Module = None,
    train: bool = False
) -> Union[Tuple[float, float], float]:
    """This is a loop to train and evaluate the model."""
    model = model.train() if train else model.eval()
    epoch_loss = []
    epoch_acc = []
    for (state, action, next_state) in dataloader:
        predictions = model(state.float())
        if train:
            # Clear gradients from the previous batch before the backward pass
            optimizer.zero_grad()
            loss = loss_fn(predictions, action.squeeze(-1).long())
            loss.backward()
            optimizer.step()
            epoch_loss.append(loss.item())
        acc = accuracy_fn(predictions, action.squeeze(-1))
        epoch_acc.append(acc)
    if train:
        return np.mean(epoch_loss), np.mean(epoch_acc)
    return np.mean(epoch_acc)

In [20]:
# Model
bc = MLP(in_channels=8, hidden_channels=[32, 32, 4], activation_layer=LeakyReLU)
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(bc.parameters(), lr=1e-3)

# Data
dataset_train = BaselineDataset("NathanGavenski/LunarLander-v2", source="hf", n_episodes=700)
dataset_eval = BaselineDataset("NathanGavenski/LunarLander-v2", source="hf", n_episodes=700, split="eval")
dataloader_train = DataLoader(dataset_train, batch_size=32, shuffle=True)
dataloader_eval = DataLoader(dataset_eval, batch_size=32, shuffle=True)

# Train
loss, acc = loop(bc, dataloader_train, optimizer, loss_fn, train=True)
print(f"Train - Avg Loss: {round(loss, 4)} Avg Acc: {round(acc, 2)}%")

# Eval
acc = loop(bc, dataloader_eval, train=False)
print(f"Eval - Avg Acc: {round(acc, 2)}%")

Creating dataset: 100%|████████████████████████████████████████████| 700/700 [00:01<00:00, 593.86it/s]
Creating dataset: 100%|███████████████████████████████████████████| 300/300 [00:00<00:00, 1389.37it/s]


Train - Avg Loss: 0.3667 Avg Acc: 86.43%
Eval - Avg Acc: 94.02%


In [21]:
# Test
env = gym.make("LunarLander-v2")
state, _ = env.reset()
acc_reward = 0
done = False

while not done:
    with torch.no_grad():
        action = torch.argmax(bc(torch.from_numpy(state)[None]), dim=1).item()
    # Gymnasium returns (observation, reward, terminated, truncated, info)
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    acc_reward += reward
print(f"Test Reward: {round(acc_reward, 4)}")

Test Reward: 221.979
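
A single episode gives a noisy estimate of the agent's performance, so it can help to average the accumulated reward over several episodes. The following is a minimal sketch of such an evaluation. To stay self-contained it uses a hypothetical `StubEnv` that only imitates the Gymnasium `reset`/`step` return signature; `evaluate` and the lambda policy are also illustrative names, not part of the library.

```python
from statistics import mean


class StubEnv:
    """Hypothetical stand-in environment that follows the Gymnasium reset/step API."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        terminated = action == 0              # the episode ends when action 0 is taken
        truncated = self.t >= self.max_steps  # the time limit was reached
        return float(self.t), reward, terminated, truncated, {}


def evaluate(env, policy, n_episodes=5):
    """Average the accumulated reward of `policy` over several episodes."""
    returns = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        done, acc_reward = False, 0.0
        while not done:
            state, reward, terminated, truncated, _ = env.step(policy(state))
            done = terminated or truncated
            acc_reward += reward
        returns.append(acc_reward)
    return mean(returns)


print(evaluate(StubEnv(max_steps=10), policy=lambda state: 1))
# 10.0
```

With a real environment, `policy` would wrap the model call from the cell above, for example the `torch.argmax` over the output of `bc`.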


## Using through inheritance

An example of using ‘BaselineDataset’ through inheritance to create a sequential dataset.

In [22]:
from typing import Tuple
import torch
from torch.nn.utils.rnn import pad_sequence
import numpy as np
from tqdm import tqdm

In [23]:
class SequenceDataset(BaselineDataset):
    """
    Sequence dataset for the BaselineDataset from IL-Dataset.
    """
    def __init__(
        self,
        path: str,
        source: str = "local",
        split: str = "train",
        n_episodes: int = None,
    ) -> None:
        super().__init__(path, source, split, n_episodes)
        episode_starts = list(np.where(self.data["episode_starts"] == 1)[0])
        episode_starts.append(len(self.data["episode_starts"]))
        
        if n_episodes is not None:
            if split == "train":
                episode_starts = episode_starts[:n_episodes + 1]
            else:
                episode_starts = episode_starts[n_episodes:]

        self.lengths = []
        self.sequences = []
        self.sequences_actions = []
        for start, end in zip(episode_starts, tqdm(episode_starts[1:], desc="Creating sequence")):
            # Slice one full episode out of the flat arrays
            episode = self.data["obs"][start:end]
            episode = torch.from_numpy(episode)
            actions = torch.from_numpy(self.data["actions"][start:end].reshape((-1, 1)))
            self.lengths.append(episode.shape[0])
            self.sequences.append(episode)
            self.sequences_actions.append(actions)

        # Pad every episode to the length of the longest one
        self.sequences = pad_sequence(self.sequences, batch_first=True)
        self.sequences_actions = pad_sequence(self.sequences_actions, batch_first=True)

    def __len__(self) -> int:
        return self.sequences.shape[0]

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int, torch.Tensor]:
        # Returns (padded states, original episode length, padded actions)
        return self.sequences[index], self.lengths[index], self.sequences_actions[index]

In [24]:
dataset = SequenceDataset("./dataset/pendulum/teacher.npz")

Creating dataset: 100%|███████████████████████████████████████████| 100/100 [00:00<00:00, 1538.19it/s]
Creating sequence: 100%|██████████████████████████████████████████| 100/100 [00:00<00:00, 1689.63it/s]


In [25]:
dataset.sequences.shape, dataset.sequences_actions.shape

(torch.Size([100, 200, 3]), torch.Size([100, 200, 1]))

In [26]:
episode, length, action = dataset[0]
episode.shape, length, action.shape

(torch.Size([200, 3]), 200, torch.Size([200, 1]))
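
In this example every episode appears to have 200 steps, so no padding is actually added. When episodes differ in length, `pad_sequence` fills the shorter ones with zeros, and the length returned by `__getitem__` is what lets you ignore those padded steps. The following is a minimal NumPy sketch of that idea; the helpers `length_mask` and `masked_mean` and the placeholder values are hypothetical and not part of the library.

```python
import numpy as np


def length_mask(lengths, max_len):
    """Boolean mask of shape (batch, max_len): True for real steps, False for padding."""
    positions = np.arange(max_len)[None, :]
    return positions < np.asarray(lengths)[:, None]


def masked_mean(values, mask):
    """Mean over real time steps only, ignoring padded positions."""
    return (values * mask).sum() / mask.sum()


# Three hypothetical episodes with lengths 4, 2 and 3, padded to 4 steps
lengths = [4, 2, 3]
mask = length_mask(lengths, max_len=4)
print(mask.astype(int))
# [[1 1 1 1]
#  [1 1 0 0]
#  [1 1 1 0]]

per_step_loss = np.ones((3, 4))  # placeholder per-step loss values
per_step_loss[~mask] = 100.0     # give the padded steps an obviously wrong value
print(masked_mean(per_step_loss, mask))
# 1.0
```

The same comparison between positions and lengths works on torch tensors, so the mask can be built directly from a batch produced by a `DataLoader`.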