# Dataset Creation

## Looking for available environments/experts

Here we list all environments/experts available in IL-Datasets. We can also look up a specific
key to check whether IL-Datasets already provides an expert for a given environment.
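The lookup can be sketched in plain Python. This assumes, as the printed output below suggests, that `Experts.get_register()` returns a dict mapping keys to policy records. `ExpertEntry`, `find_expert` and `keys_for_environment` are hypothetical helpers, not part of IL-Datasets. Searching by environment id matters because several keys (`ant`, `ant-1`) can point at the same environment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExpertEntry:
    """Hypothetical mirror of a few fields printed for each registered Policy."""
    name: str        # Gym environment id, e.g. "Ant-v3"
    repo_id: str     # HuggingFace repository holding the weights
    filename: str    # weights file inside that repository
    threshold: float


# Hypothetical excerpt of the register, copied from the printed output.
register: Dict[str, ExpertEntry] = {
    "acrobot": ExpertEntry("Acrobot-v1", "sb3/dqn-Acrobot-v1", "dqn-Acrobot-v1.zip", -75.0),
    "ant": ExpertEntry("Ant-v3", "sb3/td3-Ant-v3", "td3-Ant-v3.zip", 5822.0),
    "ant-1": ExpertEntry("Ant-v3", "sb3/sac-Ant-v3", "sac-Ant-v3.zip", 5181.0),
}


def find_expert(register: Dict[str, ExpertEntry], key: str) -> Optional[ExpertEntry]:
    """Look up a register key; returns None when the key is not registered."""
    return register.get(key)


def keys_for_environment(register: Dict[str, ExpertEntry], env_id: str) -> List[str]:
    """Return every register key whose expert targets the given environment id."""
    return sorted(key for key, entry in register.items() if entry.name == env_id)


print(find_expert(register, "pendulum"))          # None
print(keys_for_environment(register, "Ant-v3"))   # ['ant', 'ant-1']
```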

In [1]:
import pprint
from imitation_datasets import Controller, Experts, Policy
from imitation_datasets.functions import baseline_collate, baseline_enjoy

import warnings
warnings.filterwarnings("ignore")

# Jupyter already runs an asyncio event loop, and the Controller class starts
# its own; nest_asyncio patches asyncio so that the nested loop can run.
import nest_asyncio
nest_asyncio.apply()
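To see why the patch is needed, the standalone script below (an illustration, not IL-Datasets code) calls `asyncio.run` from inside a coroutine, which is the situation a notebook cell is in. Without `nest_asyncio`, the standard library rejects the nested call. Run it as a plain script, since in a notebook the outer `asyncio.run` is itself inside Jupyter's loop.

```python
import asyncio


async def inner() -> str:
    return "done"


async def notebook_cell() -> str:
    """Simulate code that runs while an event loop is already active, as in Jupyter."""
    coro = inner()
    try:
        # Without nest_asyncio, starting a second loop from inside a running one fails.
        asyncio.run(coro)
    except RuntimeError as err:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
        return str(err)
    return "no error"


message = asyncio.run(notebook_cell())
print(message)  # asyncio.run() cannot be called from a running event loop
```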

In [2]:
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(Experts.get_register())

{   'acrobot': Policy(name='Acrobot-v1',
                      repo_id='sb3/dqn-Acrobot-v1',
                      filename='dqn-Acrobot-v1.zip',
                      threshold=-75.0,
                      algo=<class 'stable_baselines3.dqn.dqn.DQN'>,
                      policy=None,
                      internal_state=None,
                      environment=None),
    'ant': Policy(name='Ant-v3',
                  repo_id='sb3/td3-Ant-v3',
                  filename='td3-Ant-v3.zip',
                  threshold=5822.0,
                  algo=<class 'stable_baselines3.td3.td3.TD3'>,
                  policy=None,
                  internal_state=None,
                  environment=None),
    'ant-1': Policy(name='Ant-v3',
                    repo_id='sb3/sac-Ant-v3',
                    filename='sac-Ant-v3.zip',
                    threshold=5181,
                    algo=<class 'stable_baselines3.sac.sac.SAC'>,
                    policy=None,
                    internal_state=N

In [3]:
Experts.get_expert("pendulum")

## Register new expert weights and create new dataset

Since there aren't any expert weights available for the Pendulum-v1 environment, we will register a
new expert and use it to create a new dataset with only 100 episodes.

We use the PPO expert weights from: https://huggingface.co/HumanCompatibleAI/ppo-Pendulum-v1

In [4]:
from stable_baselines3 import PPO
Experts.register(
    "pendulum",
    Policy(
        name="Pendulum-v1",
        repo_id="HumanCompatibleAI/ppo-Pendulum-v1",
        filename="ppo-Pendulum-v1.zip",
        threshold=-189,
        algo=PPO,
    )
)
Experts.get_expert("pendulum")

Policy(name='Pendulum-v1', repo_id='HumanCompatibleAI/ppo-Pendulum-v1', filename='ppo-Pendulum-v1.zip', threshold=-189.0, algo=<class 'stable_baselines3.ppo.ppo.PPO'>, policy=None, internal_state=None, environment=None)

In [5]:
from imitation_datasets.dataset.random_dataset import create_arguments
args = create_arguments({
    "--game": "pendulum",
    "--episodes": "100",
    "--threads": "4",
    "--mode": "all"
})

controller = Controller(
    baseline_enjoy,
    baseline_collate,
    args.episodes,
    args.threads,
)
controller.start(args)

Running episodes: 100%|█████████████████████████████████████████████| 100/100 [00:14<00:00,  4.91it/s]
Running episodes: 100%|█████████████████████████████████████████████| 100/100 [00:16<00:00,  6.17it/s]
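The `Controller(baseline_enjoy, baseline_collate, episodes, threads)` call pairs a per-episode function with a merge step. Below is a minimal sketch of that pattern with plain asyncio, assuming episodes are independent and `threads` caps how many run at once. `enjoy`, `collate` and `run_episodes` are hypothetical stand-ins, not IL-Datasets' implementation, and `asyncio.to_thread` requires Python 3.9+.

```python
import asyncio
from typing import Dict, List

HORIZON = 200  # Pendulum-v1 episodes are truncated after 200 steps


def enjoy(episode: int) -> Dict[str, int]:
    """Hypothetical stand-in for an 'enjoy' function: roll out one episode.

    A real implementation would step the environment with the expert policy;
    here we only report how many transitions the episode would produce.
    """
    return {"episode": episode, "transitions": HORIZON}


def collate(results: List[Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical stand-in for a 'collate' function: merge per-episode results."""
    return {
        "episodes": len(results),
        "transitions": sum(r["transitions"] for r in results),
    }


async def run_episodes(episodes: int, threads: int) -> Dict[str, int]:
    limit = asyncio.Semaphore(threads)  # at most `threads` episodes in flight

    async def run_one(episode: int) -> Dict[str, int]:
        async with limit:
            # Run the blocking rollout in a worker thread so the loop stays responsive.
            return await asyncio.to_thread(enjoy, episode)

    results = await asyncio.gather(*(run_one(ep) for ep in range(episodes)))
    return collate(list(results))


summary = asyncio.run(run_episodes(episodes=100, threads=4))
print(summary)  # {'episodes': 100, 'transitions': 20000}
```

Separating the rollout from the merge keeps the workers independent: each one only produces its own episode, and the single collate step is the only place where results are combined.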


In [6]:
import numpy as np

dataset = np.load("./dataset/pendulum/teacher.npz", allow_pickle=True)
dataset["obs"].shape, dataset["actions"].shape, dataset["episode_returns"].mean()

((20000, 3), (20000,), -100.08150779832427)
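The shapes follow from the episode length. Pendulum-v1 truncates every episode at 200 steps, so 100 episodes give 100 × 200 = 20,000 transitions, each with a 3-dimensional observation. The single torque value per step appears to be stored as a flat `(20000,)` array. The helpers below are hypothetical, meant as a quick sanity check on a freshly collected dataset.

```python
import numpy as np

PENDULUM_HORIZON = 200  # Pendulum-v1 is registered with max_episode_steps=200
PENDULUM_OBS_DIM = 3    # observation = (cos(theta), sin(theta), angular velocity)


def expected_shapes(episodes: int):
    """Shapes expected for obs and (flattened) actions after `episodes` full episodes."""
    transitions = episodes * PENDULUM_HORIZON
    return (transitions, PENDULUM_OBS_DIM), (transitions,)


def check_dataset(obs: np.ndarray, actions: np.ndarray, episodes: int) -> bool:
    """True when obs and actions have exactly the expected shapes."""
    obs_shape, act_shape = expected_shapes(episodes)
    return obs.shape == obs_shape and actions.shape == act_shape


print(expected_shapes(100))  # ((20000, 3), (20000,))
```

With the dataset loaded above, `check_dataset(dataset["obs"], dataset["actions"], episodes=100)` should return `True`.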

## Create a file for HuggingFace

If the user wants to upload the data to HuggingFace afterwards, IL-Datasets provides a `baseline_to_huggingface` function,
which converts the `teacher.npz` file into a `teacher.jsonl` file that can be uploaded directly to HuggingFace.

In [7]:
from imitation_datasets.dataset.huggingface import baseline_to_huggingface
baseline_to_huggingface("./dataset/pendulum/teacher.npz", "./dataset/pendulum/teacher.jsonl")

Writing into file: 100%|████████████████████████████████████| 20000/20000 [00:00<00:00, 142621.92it/s]
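JSON Lines stores one JSON object per line, which suits a flat, per-transition layout. Below is a minimal sketch of such a conversion, assuming one record per transition with `obs` and `actions` fields. `npz_to_jsonl` is hypothetical and its schema is an assumption, so the file produced by `baseline_to_huggingface` may use different field names.

```python
import json
import os
import tempfile

import numpy as np


def npz_to_jsonl(npz_path: str, jsonl_path: str, keys=("obs", "actions")) -> int:
    """Write one JSON object per transition; returns the number of lines written.

    Hypothetical schema: each line is {"obs": [...], "actions": ...}.
    """
    with np.load(npz_path, allow_pickle=True) as data:
        arrays = {key: data[key] for key in keys}  # read each array once
    n = len(arrays[keys[0]])
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for i in range(n):
            # .tolist() turns NumPy scalars/arrays into JSON-serialisable Python types.
            record = {key: np.asarray(arrays[key][i]).tolist() for key in keys}
            f.write(json.dumps(record) + "\n")
    return n


# Round-trip on a tiny synthetic file (placeholder values, not real Pendulum data).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "toy.npz")
    dst = os.path.join(tmp, "toy.jsonl")
    np.savez(src, obs=np.arange(6, dtype=float).reshape(2, 3), actions=np.array([0.5, -1.0]))
    written = npz_to_jsonl(src, dst)
    with open(dst, encoding="utf-8") as f:
        first = json.loads(f.readline())

print(written, first)  # 2 {'obs': [0.0, 1.0, 2.0], 'actions': 0.5}
```

Reading back only the first line, as done here, is also a lightweight way to inspect a large `teacher.jsonl` without opening the whole file.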


In [8]:
# Open the generated file with the system's default application (xdg-open is Linux-only)
!xdg-open ./dataset/pendulum/teacher.jsonl