# **Tutorial 1. DBC (Diffusion Behavior Cloning) for FrankaKitchen**
## 1. Introduction

In this tutorial, we will demonstrate how to implement a basic DBC (Diffusion Behavior Cloning) agent using CleanDiffuser. DBC is an imitation learning algorithm that replicates behaviors from an offline demonstration dataset. It uses a diffusion model to generate actions conditioned on the current observation. The underlying idea is the same as in diffusion-based image generation, except that DBC generates actions instead of images, and conditions them on the state $\bm s$.
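To make this concrete, here is the standard noise-prediction training objective that the description above corresponds to. It is a sketch under assumptions: we assume CleanDiffuser's `ContinuousDiffusionSDE` is trained with noise ($\bm\epsilon$) prediction, which matches the `eps_theta` output of the backbone built in Section 3, and we leave the noise schedule $(\alpha_t, \sigma_t)$ abstract because its exact form is not specified in this tutorial.

$$
\bm a_t = \alpha_t \bm a_0 + \sigma_t \bm\epsilon, \qquad \bm\epsilon \sim \mathcal N(\bm 0, \bm I),
$$

$$
\mathcal L(\theta) = \mathbb E_{(\bm s, \bm a_0) \sim \mathcal D,\; t,\; \bm\epsilon}\left[\left\lVert \bm\epsilon_\theta(\bm a_t, t, \bm s) - \bm\epsilon \right\rVert_2^2\right],
$$

where $(\bm s, \bm a_0)$ is a state-action pair from the demonstration dataset $\mathcal D$. At test time, the agent starts from Gaussian noise and runs the learned reverse process conditioned on the current state $\bm s$, producing one action per environment step.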

Imitation learning relies on expert demonstrations. In this tutorial, we’ll tackle the RelayKitchen task, which involves a 9-DoF position-controlled Franka robot interacting with a kitchen environment. The environment includes an openable microwave, four turnable oven burners, an oven light switch, a movable kettle, two hinged cabinets, and a sliding cabinet door. The dataset contains 566 human demonstrations of activities such as opening the microwave, turning on the oven light, and moving the kettle. The goal is to train an agent that imitates these demonstrations and completes as many tasks as possible within a limited time frame.

Let’s begin by downloading the expert demonstrations (a zip archive of about 742 MB)!

In [1]:
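# `! cd` runs in a throwaway subshell, so download and extract with explicit paths instead.
! mkdir -p ./dev
! wget -P ./dev https://diffusion-policy.cs.columbia.edu/data/training/kitchen.zip
! unzip -q ./dev/kitchen.zip -d ./dev
! rm ./dev/kitchen.zip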

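Outside Jupyter, or on machines without `wget` and `unzip`, the same download can be done from Python's standard library. The sketch below is a hypothetical helper (`download_and_extract` is not part of CleanDiffuser) that mirrors the shell commands above: it extracts into `./dev`, so the data ends up at `dev/kitchen`, the path the dataset loader uses below.

```python
import pathlib
import urllib.parse
import urllib.request
import zipfile


def download_and_extract(url: str, dest_dir: str) -> pathlib.Path:
    """Download a zip archive from `url`, extract it into `dest_dir`, then delete the archive.

    Hypothetical helper, equivalent to `mkdir -p`, `wget -P`, `unzip -d`, and `rm`.
    """
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    archive = dest / pathlib.PurePosixPath(urllib.parse.urlparse(url).path).name
    urllib.request.urlretrieve(url, archive)  # like `wget -P dest_dir`
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)  # like `unzip -d dest_dir`
    archive.unlink()  # like `rm`
    return dest


# Usage (downloads about 742 MB):
# download_and_extract("https://diffusion-policy.cs.columbia.edu/data/training/kitchen.zip", "./dev")
```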

## 2. Setting up the Environment and the Dataset

CleanDiffuser provides a straightforward interface for setting up the environment and the dataset. The code below creates a gym-style environment and a PyTorch `Dataset` for the FrankaKitchen demonstrations.

In [3]:
import gym

from cleandiffuser.dataset.kitchen_dataset import KitchenDataset
from cleandiffuser.env import kitchen  # noqa: F401, imported so that the kitchen-* environments are registered with gym

env = gym.make("kitchen-all-v0")
# horizon=1 with no padding: every sample is a single (state, action) pair.
dataset = KitchenDataset("dev/kitchen", horizon=1, pad_before=0, pad_after=0, abs_action=False)

data = dataset[0]
obs, act = data["state"], data["action"]  # shapes: (horizon, obs_dim) and (horizon, act_dim)
obs_dim, act_dim = dataset.obs_dim, dataset.act_dim
print(f"Finish loading data. Observation shape: {obs.shape}. Action shape {act.shape}.")

Reading configurations for Franka
Initializing Franka sim
Finish loading data. Observation shape: (1, 60). Action shape (1, 9).
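The `horizon`, `pad_before`, and `pad_after` arguments control how demonstrations are cut into training samples. The sketch below shows one plausible reading, modeled on the sequence sampler in the Diffusion Policy codebase that this dataset comes from (CleanDiffuser's own sampler may differ in details): each sample is a window of `horizon` consecutive steps, and padding repeats the first or last frame so windows can start before or end after an episode. `sample_windows` is a hypothetical helper, not CleanDiffuser's API. With `horizon=1` and no padding, as in the cell above, every time step becomes one sample, which is consistent with `dataset[0]["state"]` having shape `(1, 60)`.

```python
import numpy as np


def sample_windows(episode: np.ndarray, horizon: int, pad_before: int = 0, pad_after: int = 0) -> np.ndarray:
    """Cut one episode of shape (T, dim) into all windows of shape (horizon, dim).

    Hypothetical illustration of sequence sampling with edge padding.
    """
    padded = np.concatenate(
        [
            np.repeat(episode[:1], pad_before, axis=0),  # repeat the first frame
            episode,
            np.repeat(episode[-1:], pad_after, axis=0),  # repeat the last frame
        ],
        axis=0,
    )
    n_windows = len(padded) - horizon + 1
    if n_windows <= 0:  # episode too short for a single window
        return np.empty((0, horizon, episode.shape[1]), dtype=episode.dtype)
    return np.stack([padded[i : i + horizon] for i in range(n_windows)])
```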


## 3. Building the Diffusion Model
Following the DBC approach, we use a diffusion model to generate expert actions conditioned on the current observation. We build a continuous-time diffusion model (`ContinuousDiffusionSDE`) with `PearceMlp` as the denoising backbone and `PearceObsCondition` as the condition network, and later sample from it with the DDPM solver. Once the networks are set up, building the diffusion model is simply a matter of combining them!

In [6]:
import torch

from cleandiffuser.diffusion import ContinuousDiffusionSDE
from cleandiffuser.nn_condition import PearceObsCondition
from cleandiffuser.nn_diffusion import PearceMlp

# Denoising backbone:
# xt (bs, act_dim) x t (bs, ) x condition (bs, condition_horizon * emb_dim) -> eps_theta (bs, act_dim)
nn_diffusion = PearceMlp(
    x_dim=act_dim, condition_horizon=1, emb_dim=128, hidden_dim=512, timestep_emb_type="untrainable_fourier"
)

# Observation encoder:
# obs (bs, condition_horizon, obs_dim) x t (bs, ) -> condition (bs, condition_horizon * emb_dim) if `flatten`
# else (bs, condition_horizon, emb_dim)
nn_condition = PearceObsCondition(obs_dim=obs_dim, emb_dim=128, flatten=True, dropout=0.0)

# Since the action space is [-1, 1], we can set `x_max` and `x_min` to constrain the generated actions.
actor = ContinuousDiffusionSDE(
    nn_diffusion,
    nn_condition,
    x_max=torch.full((act_dim,), fill_value=1.0),
    x_min=torch.full((act_dim,), fill_value=-1.0),
)
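To see how the shapes documented in the cell above fit together, here is a NumPy sketch of one forward pass. It is only a shape illustration under assumptions: the layer sizes, the LeakyReLU MLPs, and the frozen random Fourier features for the timestep (our reading of `timestep_emb_type="untrainable_fourier"`) are stand-ins, not the actual layers of `PearceMlp` or `PearceObsCondition`, and `init_mlp`, `apply_mlp`, and `fourier_embedding` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, obs_dim, act_dim, emb_dim, hidden_dim = 4, 60, 9, 128, 512


def init_mlp(dims):
    """Random weight matrices for an MLP with the given layer widths (no biases)."""
    return [rng.standard_normal((d_in, d_out)) / np.sqrt(d_in) for d_in, d_out in zip(dims[:-1], dims[1:])]


def apply_mlp(weights, x):
    for k, w in enumerate(weights):
        x = x @ w
        if k < len(weights) - 1:
            x = np.where(x > 0, x, 0.01 * x)  # LeakyReLU on hidden layers
    return x


def fourier_embedding(t, freqs):
    """Map timesteps (bs,) to (bs, 2 * len(freqs)) using fixed, untrained frequencies."""
    angles = 2 * np.pi * t[:, None] * freqs[None, :]
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=-1)


cond_net = init_mlp([obs_dim, emb_dim, emb_dim])  # stand-in for the condition network
eps_net = init_mlp([act_dim + 2 * emb_dim, hidden_dim, act_dim])  # stand-in for the backbone
freqs = rng.standard_normal(emb_dim // 2)  # frozen frequencies

obs = rng.standard_normal((bs, 1, obs_dim))  # (bs, condition_horizon=1, obs_dim)
x_t = rng.standard_normal((bs, act_dim))  # noisy actions
t = rng.uniform(size=bs)  # diffusion times

condition = apply_mlp(cond_net, obs).reshape(bs, -1)  # flatten=True -> (bs, 1 * emb_dim)
t_emb = fourier_embedding(t, freqs)  # (bs, emb_dim)
eps_theta = apply_mlp(eps_net, np.concatenate([x_t, t_emb, condition], axis=-1))  # (bs, act_dim)
```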

## 4. Training the Diffusion Model

### 4.1 PyTorch Lightning Approach
All diffusion models in CleanDiffuser are implemented as `LightningModule`s, so they can be trained directly with a PyTorch Lightning `Trainer`. PyTorch Lightning simplifies the training of deep learning models and supports features such as distributed training, mixed precision training, and automatic checkpointing with just a few lines of code. To set up the Trainer, you'll need:

- A CleanDiffuser `DiffusionModel`.
- A PyTorch `DataLoader` that yields each batch as a dictionary. By default, the keys are `x0`, `condition_cfg`, and `condition_cg`. `x0` holds samples from the target distribution and is required. `condition_cfg` and `condition_cg` are the conditions used for classifier-free guidance (CFG) and classifier guidance (CG), respectively; both are optional and can be set to `None` or omitted if unused.
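As background for `condition_cfg`: in the standard classifier-free guidance rule, the sampler mixes a conditional and an unconditional noise prediction with a guidance weight $w$. The sketch below assumes that CleanDiffuser's `w_cfg` argument (passed to `actor.sample` in the evaluation section) plays the role of $w$ in exactly this form; we have not verified the library's convention, and `cfg_combine` is a hypothetical helper.

```python
import numpy as np


def cfg_combine(eps_cond: np.ndarray, eps_uncond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: start from the unconditional noise prediction and move
    w times the distance toward the conditional one (w > 1 extrapolates past it)."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

Under this convention, `w = 1` returns the conditional prediction unchanged, so sampling with `w_cfg=1.0` amounts to plain conditional sampling.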

Here’s an example of how to set up the Trainer and train the diffusion model.

**NOTE:** The PyTorch Lightning Trainer expects batches in this specific format. You’ll either need to write a Dataset class that returns the required dictionary, or use a wrapper to adapt an existing one. The `BC_Wrapper` below takes the second approach. `KitchenDataset` organizes each batch as `batch = {"state": torch.Tensor of shape (batch_size, horizon, state_dim), "action": torch.Tensor of shape (batch_size, horizon, action_dim)}`. `BC_Wrapper` keeps the single time step of each sample (we use `horizon=1`) and converts the batch to the required format: `{"x0": torch.Tensor of shape (batch_size, action_dim), "condition_cfg": torch.Tensor of shape (batch_size, state_dim)}`.


In [8]:
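import pytorch_lightning as L
from pytorch_lightning.callbacks import ModelCheckpoint


class BC_Wrapper(torch.utils.data.Dataset):
    """Adapts KitchenDataset samples to the {"x0", "condition_cfg"} format expected by CleanDiffuser."""

    def __init__(self, dataset: torch.utils.data.Dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getattr__(self, name):
        # Forward unknown attributes (e.g. `sampler`) to the wrapped dataset. Guarding `dataset` itself
        # avoids infinite recursion when DataLoader workers unpickle the wrapper (spawn start method).
        if name == "dataset":
            raise AttributeError(name)
        return getattr(self.dataset, name)

    def __getitem__(self, idx):
        data = self.sampler.sample_sequence(idx)
        # With horizon=1, index 0 selects the only time step.
        return {
            "x0": data["action"][0],  # target: the expert action
            "condition_cfg": data["state"][0],  # condition: the current state
        }


save_path = "results/tutorial1_dbc_for_kitchen/"

dataloader = torch.utils.data.DataLoader(
    BC_Wrapper(dataset), batch_size=512, shuffle=True, num_workers=4, persistent_workers=True
)

# Save a checkpoint every 10k training steps (e.g. `dbc-step=200000.ckpt` at the end of training).
callback = ModelCheckpoint(dirpath=save_path, filename="dbc-{step}", every_n_train_steps=10_000)

trainer = L.Trainer(
    accelerator="gpu",
    devices=[0],
    max_steps=200_000,
    deterministic=True,
    log_every_n_steps=200,
    default_root_dir=save_path,
    callbacks=[callback],
)

trainer.fit(actor, dataloader)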

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]

  | Name         | Type       | Params | Mode 
----------------------------------------------------
0 | model        | ModuleDict | 782 K  | train
1 | model_ema    | ModuleDict | 782 K  | eval 
  | other params | n/a        | 20     | n/a  
----------------------------------------------------
782 K     Trainable params
782 K     Non-trainable params
1.6 M     Total params
6.261     Total estimated model params size (MB)
29        Modules in train mode
29        Modules in eval mode


Epoch 793:  65%|██████▌   | 164/252 [00:03<00:01, 44.20it/s, v_num=1, diffusion_loss=0.0499]

`Trainer.fit` stopped: `max_steps=200000` reached.




One of the key advantages of PyTorch Lightning is its native support for distributed training. By simply setting the `devices` argument to a list of GPU IDs, you can train the model on multiple GPUs. For more advanced features, such as mixed precision training or gradient accumulation, you can refer to the [PyTorch Lightning documentation](https://lightning.ai/docs/pytorch/stable/common/trainer.html).
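As a sketch of the multi-GPU setup, assuming Lightning 2.x, the dictionary below collects Trainer arguments for four GPUs; it is a hypothetical variant that this tutorial does not run. Two details are worth knowing. First, Lightning refuses `strategy="ddp"` inside Jupyter and asks for the notebook-compatible `"ddp_notebook"` instead (use `"ddp"` in a standalone script). Second, under DDP each process draws its own batch, so the number of samples per optimizer step grows with the number of devices.

```python
# Hypothetical multi-GPU variant of the Trainer arguments used above (not run in this tutorial).
per_device_batch_size = 512  # the DataLoader's batch_size
devices = [0, 1, 2, 3]

trainer_kwargs = dict(
    accelerator="gpu",
    devices=devices,
    # Notebook-compatible DDP; in a standalone script use strategy="ddp".
    strategy="ddp_notebook",
    max_steps=200_000,
    log_every_n_steps=200,
)

# Each DDP process loads its own batch, so one optimizer step sees this many samples in total.
effective_batch_size = per_device_batch_size * len(devices)

# trainer = L.Trainer(**trainer_kwargs, default_root_dir=save_path, callbacks=[callback])
```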

### 4.2 Manual Updating Approach

For some offline RL algorithms, the training process may involve multiple components that need to be updated in a specific order. In such cases, you can update the diffusion model manually with the `update_diffusion` method. This method takes a batch of data, performs one update step, and returns a dictionary of training statistics (here, the diffusion loss). Below is an example of a single training step using this approach.

In [16]:
batch = next(iter(dataloader))

update_log = actor.update_diffusion(x0=batch["x0"], condition_cfg=batch["condition_cfg"])

print(update_log)

{'diffusion_loss': 0.05957807973027229}
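To run more than one manual step, calling `next(iter(dataloader))` each time is wasteful: every call starts a new pass over the loader and discards whatever the workers had prefetched. A small generator that restarts the loader only at the end of each epoch avoids this. `infinite_batches` is a hypothetical helper; the commented loop shows how it could drive `actor.update_diffusion`, assuming the batch tensors and the model live on the same device.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def infinite_batches(loader: Iterable[T]) -> Iterator[T]:
    """Yield batches forever, starting a fresh pass over `loader` after each epoch.

    Unlike itertools.cycle, this does not cache the first epoch, so a shuffling
    DataLoader is reshuffled on every pass.
    """
    while True:
        empty = True
        for batch in loader:
            empty = False
            yield batch
        if empty:
            return  # avoid spinning forever on an empty loader


# Sketch of a manual loop (`actor` and `dataloader` come from the cells above):
# batches = infinite_batches(dataloader)
# for step in range(1_000):
#     batch = next(batches)
#     log = actor.update_diffusion(x0=batch["x0"], condition_cfg=batch["condition_cfg"])
```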


## 5. Evaluation

After training, we evaluate the model by sampling actions from it and running the agent in the environment. The `sample` method generates actions from the model. The example below runs the agent in 50 parallel environments (a vectorized gym environment) and repeats the evaluation `n_seeds = 5` times.

In [30]:
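import numpy as np

n_seeds = 5  # number of evaluation rounds
num_envs = 50  # parallel environments per round
max_tasks = 5  # report success rates for completing >= 1, ..., >= 5 tasks
success_rate_for_n_tasks = np.zeros(max_tasks)

# device for evaluation
device = "cuda:0"

# Load the final checkpoint. We created this file ourselves, so full unpickling is acceptable;
# recent PyTorch versions default to weights_only=True, which can reject Lightning checkpoints.
ckpt = torch.load(
    "results/tutorial1_dbc_for_kitchen/dbc-step=200000.ckpt", map_location=device, weights_only=False
)
actor.load_state_dict(ckpt["state_dict"])
actor.to(device).eval()

# evaluating
env_eval = gym.vector.make("kitchen-all-v0", num_envs=num_envs)
normalizer = dataset.get_normalizer()

for _ in range(n_seeds):
    obs, t = env_eval.reset(), 0
    all_done = np.zeros(num_envs, dtype=bool)
    ep_rew = np.zeros(num_envs)

    while not np.all(all_done):
        # Normalize observations like the training data; the model expects float32 inputs.
        obs = torch.tensor(normalizer["state"].normalize(obs), device=device, dtype=torch.float32)

        # All-zero prior: specifies the batch shape of the actions to sample.
        prior = torch.zeros((num_envs, act_dim), device=device)

        act, log = actor.sample(prior, solver="ddpm", sample_steps=5, condition_cfg=obs, w_cfg=1.0)
        act = act.cpu().numpy()
        act = normalizer["action"].unnormalize(act)

        obs, rew, done, info = env_eval.step(act)
        # Vector envs auto-reset finished episodes, so only count rewards of still-running ones.
        ep_rew += rew * ~all_done
        all_done = np.logical_or(all_done, done)
        t += 1

        print(f"[t={t}] Tasks completed: {ep_rew}")

    for i in range(max_tasks):
        success_rate_for_n_tasks[i] += (ep_rew >= i + 1).mean()

env_eval.close()
success_rate_for_n_tasks /= n_seeds

print(f"Success rate (>= n tasks): {success_rate_for_n_tasks}")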


The results are impressive! Compared with the official report, the tutorial model achieves higher success rates at every task threshold (averaged over 5 evaluation rounds of 50 parallel episodes) while using only 5 sampling steps and no history observations; the official model uses 50 sampling steps and 2 history observations. We believe that more advanced solvers and sampling schedules could further improve the model's performance.

|Model|Sampling Steps|History Observations|≥1 task (%)|≥2 tasks (%)|≥3 tasks (%)|≥4 tasks (%)|≥5 tasks (%)|
|---|---|---|---|---|---|---|---|
|Official|50|2|99|94|82|68|2|
|Tutorial 1|5|0|100|99.2|94.8|77.6|4|
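The per-threshold success rates reported above come from the loop at the end of each evaluation round, which can also be written as a single vectorized NumPy expression. The sketch below (`success_rates` is a hypothetical helper) assumes the final task counts `ep_rew` of every round are stacked into an array of shape `(n_rounds, num_envs)`, and that each completed task contributes a reward of 1, as the `ep_rew >= i + 1` test in the evaluation code implies.

```python
import numpy as np


def success_rates(tasks_completed: np.ndarray, max_tasks: int = 5) -> np.ndarray:
    """Fraction of episodes that completed at least n tasks, for n = 1..max_tasks.

    `tasks_completed` has shape (n_rounds, num_envs).
    """
    thresholds = np.arange(1, max_tasks + 1)  # (max_tasks,)
    hits = tasks_completed[..., None] >= thresholds  # (n_rounds, num_envs, max_tasks)
    return hits.mean(axis=-2).mean(axis=0)  # average over envs, then over rounds
```

Multiplying the result by 100 gives the percentages used in the table.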