# Robomimic Lowdim
This notebook documents how to use Robomimic Lowdim in the context of diffusion policies. "Lowdim" means that each observation consists of **a few variables** (as opposed to, e.g., images).

In [1]:
# custom command to allow code documentation without execution
from IPython.core.magic import register_cell_magic

@register_cell_magic
def skip(line, cell):
    return

## Dataset

### Dataloading from config file
The Robomimic Lowdim dataset contains data for the following tasks:
- can
- lift
- square
- tool_hang
- transport

All of these tasks are loaded with the same `RobomimicReplayLowdimDataset()` dataset class defined in `./diffusion_policy/diffusion_policy/dataset/robomimic_replay_lowdim_dataset.py`. The config file of a task must follow a specific structure; examples can be found in `./diffusion_policy/diffusion_policy/config/task/`:

```yaml
dataset:
  _target_: diffusion_policy.dataset.robomimic_replay_lowdim_dataset.RobomimicReplayLowdimDataset
  dataset_path: *dataset_path
  horizon: ${horizon}
  pad_before: ${eval:'${n_obs_steps}-1+${n_latency_steps}'}
  pad_after: ${eval:'${n_action_steps}-1'}
  obs_keys: *obs_keys
  abs_action: *abs_action
  use_legacy_normalizer: False
  rotation_rep: rotation_6d
  seed: 42
  val_ratio: 0.02
```
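The `pad_before` and `pad_after` entries are arithmetic expressions over workspace-level values. The following minimal sketch shows what they evaluate to; the three numbers are illustrative assumptions, not values taken from a specific config file.

```python
# Illustrative values only; the real ones come from the workspace config.
n_obs_steps = 2
n_action_steps = 8
n_latency_steps = 0

# Mirrors ${eval:'${n_obs_steps}-1+${n_latency_steps}'}
pad_before = n_obs_steps - 1 + n_latency_steps

# Mirrors ${eval:'${n_action_steps}-1'}
pad_after = n_action_steps - 1

print(f"pad_before={pad_before}, pad_after={pad_after}")
```

With these assumed values, a sampled sequence may start one step before the first recorded step of an episode and end up to seven steps after the last one.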

The following cell demonstrates how to use a configuration file to instantiate a dataset of your choice and wrap it in a DataLoader. Note that the task configuration is usually embedded in a workspace configuration file. We have created a dedicated `/config/test/` folder with a standalone configuration that can be loaded without such a workspace file.

In [1]:
import hydra
from omegaconf import OmegaConf
import sys
from pathlib import Path
sys.path.append(str(Path.cwd() / "diffusion_policy"))

import torch
from torch.utils.data import DataLoader
from diffusion_policy.diffusion_policy.dataset.base_dataset import BaseLowdimDataset

# step 1: specify the directory that contains the config file
config_path = "./diffusion_policy/diffusion_policy/config/test"

# step 2: compose the config and instantiate the dataset from it (only once)
with hydra.initialize(config_path=config_path, version_base=None):
    cfg = hydra.compose(config_name="lift_lowdim")
    OmegaConf.resolve(cfg)
    dataset = hydra.utils.instantiate(cfg.dataset)
print(type(dataset))

# step 3: instantiate DataLoader for dataset
train_dataloader = DataLoader(dataset, **cfg.dataloader)

Loading hdf5 to ReplayBuffer: 100%|██████████| 200/200 [00:00<00:00, 387.59it/s]

<class 'diffusion_policy.dataset.robomimic_replay_lowdim_dataset.RobomimicReplayLowdimDataset'>





The raw data is then batched by the `train_dataloader`. To see what the processed data looks like, we inspect an example batch:

In [2]:
for batch in train_dataloader:
    
    if isinstance(batch, dict):
        print(f"Batch Keys: {list(batch.keys())}")
    
    print(f"Observation Shape: {batch['obs'].shape}")
    print(f"Action Shape: {batch['action'].shape}")

    observation_batch = batch['obs']
    first_observation_sequence = observation_batch[0]
    first_observation = first_observation_sequence[0]

    action_batch = batch['action']
    first_action_sequence = action_batch[0]
    first_action = first_action_sequence[0]

    print(f"Example Observation: {first_observation}")
    print(f"Example Action: {first_action}")
    
    break

Batch Keys: ['obs', 'action']
Observation Shape: torch.Size([256, 10, 19])
Action Shape: torch.Size([256, 10, 7])
Example Observation: tensor([ 0.0264,  0.0270,  0.8314,  0.0000,  0.0000,  0.9691,  0.2466, -0.1169,
        -0.0422,  0.1804, -0.0905, -0.0152,  1.0118,  0.9972, -0.0072,  0.0740,
         0.0019,  0.0208, -0.0208])
Example Action: tensor([-0.0000,  0.0000,  0.0000,  0.0038,  0.1482,  0.0145, -1.0000])


As we can see, the batch size (as specified in the configuration file) is 256. Each sample in the batch is a sequence of 10 time steps, with a 19-dimensional observation and a 7-dimensional action per step.

### Understanding Data Preprocessing
Now that we know what a single observation looks like, let's look in more detail at the preprocessing steps that are applied during data loading. We start by reading the dataset in its original format, i.e. hdf5.

In [3]:
import h5py

data_path = "/home/luca_daniel/tum-adlr-04/diffusion_policy/data/robomimic/datasets/lift/ph/low_dim.hdf5"

with h5py.File(data_path, "r") as file:
    print("===== Original Dataset Information =====")
    print("Dataset Keys:", list(file.keys()))
    
    # the 'data' key of the dictionary contains the demonstrations, i.e. training data
    data = file['data']
    print(f"Data Keys: {list(data.keys())[:5]}")
    example_demo = data['demo_0']
    print(f"Demo Keys: {list(example_demo.keys())}")

    print(f"Observations: {list(example_demo['obs'].keys())}")
    print(f"Actions: {example_demo['actions']}")
    

===== Original Dataset Information =====
Dataset Keys: ['data', 'mask']
Data Keys: ['demo_0', 'demo_1', 'demo_10', 'demo_100', 'demo_101']
Demo Keys: ['actions', 'dones', 'next_obs', 'obs', 'rewards', 'states']
Observations: ['object', 'robot0_eef_pos', 'robot0_eef_quat', 'robot0_eef_vel_ang', 'robot0_eef_vel_lin', 'robot0_gripper_qpos', 'robot0_gripper_qvel', 'robot0_joint_pos', 'robot0_joint_pos_cos', 'robot0_joint_pos_sin', 'robot0_joint_vel']
Actions: <HDF5 dataset "actions": shape (59, 7), type "<f8">


The authors of *Diffusion Policy* use only the observations and actions from the dataset. This data is preprocessed in the following steps:
1. `_data_to_obs()`: transforms the training demonstrations into the desired format
    - reshapes the action dimension in case a dual-arm setup is used
    - extracts `pos`, `rot` and `gripper` from the raw action
    - applies `RotationTransformer()` to convert the rotation from axis-angle to the 6D rotation representation
    - wraps the result in a dictionary with the keys `obs` and `action`
2. `get_val_mask()`: determines which episodes are used for validation (specified as a boolean mask)
3. `downsample_mask()`: downsamples the training mask, i.e. uses only a subset of the *remaining* episodes for training
4. `SequenceSampler()`: returns sequences of observation/action pairs that can be used for training
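The action handling in step 1 can be sketched in plain Python. This is a minimal illustration, not the repository's `RotationTransformer`: the helper names are hypothetical, a single arm with a 7-dimensional action (`pos`, axis-angle `rot`, `gripper`) is assumed, and the 6D representation is assumed to be the first two rows of the rotation matrix.

```python
import math

def axis_angle_to_matrix(rotvec):
    """Rodrigues' formula: axis-angle vector (axis * angle) -> 3x3 rotation matrix."""
    x, y, z = rotvec
    angle = math.sqrt(x * x + y * y + z * z)
    if angle < 1e-8:
        # No rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = x / angle, y / angle, z / angle
    c, s = math.cos(angle), math.sin(angle)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def matrix_to_rotation_6d(matrix):
    """Assumed convention: concatenate the first two rows of the matrix."""
    return list(matrix[0]) + list(matrix[1])

def convert_action(raw_action):
    """Split a 7-D action into pos / rot / gripper and re-encode the rotation."""
    pos, rot, gripper = raw_action[:3], raw_action[3:6], raw_action[6:]
    rot6d = matrix_to_rotation_6d(axis_angle_to_matrix(rot))
    return list(pos) + rot6d + list(gripper)

# Example: a 90 degree rotation about the z-axis with a closed gripper.
raw = [0.1, 0.2, 0.3, 0.0, 0.0, math.pi / 2, -1.0]
converted = convert_action(raw)
print(len(converted))
```

The converted action has 3 + 6 + 1 = 10 entries. The batch printed earlier contains 7-dimensional actions, which suggests that this conversion was not active for that configuration.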

**Note**: The `SequenceSampler` samples sequences from a `ReplayBuffer`. Each demo is converted to the desired format using `_data_to_obs()` and added to the buffer as one *episode*. Since an episode is a long trajectory, we can generate many training samples from it by dividing the trajectory into shorter chunks.
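How one episode can yield many training sequences is sketched below. The functions `window_starts` and `sample_window` are hypothetical and only illustrate the idea; the sketch assumes sliding windows of length `horizon` and assumes that padded steps repeat the first or last step of the episode.

```python
def window_starts(episode_length, horizon, pad_before=0, pad_after=0):
    """Start offsets (relative to the episode) of all windows of length `horizon`."""
    # Clamp the padding so that every window overlaps the episode by at least one step.
    pad_before = min(max(pad_before, 0), horizon - 1)
    pad_after = min(max(pad_after, 0), horizon - 1)
    min_start = -pad_before
    max_start = episode_length - horizon + pad_after
    return list(range(min_start, max_start + 1))

def sample_window(episode, start, horizon):
    """Cut one window; steps outside the episode repeat the first or last step."""
    last = len(episode) - 1
    return [episode[min(max(t, 0), last)] for t in range(start, start + horizon)]

# A toy episode with 59 steps (the length of demo_0 above); entries are step indices.
episode = list(range(59))
starts = window_starts(len(episode), horizon=10, pad_before=1, pad_after=7)
windows = [sample_window(episode, s, 10) for s in starts]

print(len(windows))   # number of training sequences from this one episode
print(windows[0])     # first window, padded at the front
print(windows[-1])    # last window, padded at the end
```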

## Training Procedure

Each robot (experiment) setup has its own training workspace. The workspace is a class that inherits its basic configuration from a general parent class. For the Robomimic case:
- Training workspace class: `TrainRobomimicLowdimWorkspace(BaseWorkspace)`
- Parent class: `BaseWorkspace()`
    - The constructor receives an `OmegaConf` configuration object and, optionally, an output directory
        - The configuration is stored in the YAML file `train_robomimic_lowdim_workspace.yaml`; it specifies the horizon, the number of observations, training parameters, the policy, etc.
        - The constructor reads the seed from the configuration object and uses it to seed `numpy`, `torch` and `random`
        - It configures the policy model, an instance of the class `RobomimicLowdimPolicy` that is built from the `policy` entry of the workspace config file
        - It initializes the training state (global step and epoch)
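The constructor logic described above can be sketched as follows. `MiniWorkspace` and `seed_everything` are hypothetical names, the config is a plain dictionary, and the key path `training.seed` is an assumption; only the standard library's `random` is seeded here so that the sketch runs without extra dependencies.

```python
import random

def seed_everything(seed):
    """Seed the random number generators used during training."""
    random.seed(seed)
    # With NumPy and PyTorch in use, np.random.seed(seed) and
    # torch.manual_seed(seed) would be called here as well.

class MiniWorkspace:
    """Hypothetical stand-in for a training workspace constructor."""

    def __init__(self, cfg, output_dir=None):
        self.cfg = cfg
        self.output_dir = output_dir
        # Read the seed from the config and apply it before anything is built.
        seed_everything(cfg["training"]["seed"])
        # Training state that is later stored in checkpoints.
        self.global_step = 0
        self.epoch = 0

cfg = {"training": {"seed": 42}}
ws_a = MiniWorkspace(cfg)
sample_a = random.random()
ws_b = MiniWorkspace(cfg)
sample_b = random.random()
print(sample_a == sample_b)  # same seed, same first random number
```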

In [4]:
""" --config-name=train_diffusion_unet_real_image_workspace
    --config-name=train_robomimic_lowdim_workspace
    --config-name=train_diffusion_transformer_lowdim_workspace
"""

import diffusion_policy.workspace.train_robomimic_lowdim_workspace as trws

config_path = "./diffusion_policy/diffusion_policy/config/"
#str(pathlib.Path(__file__).parent.parent.joinpath("config"))
config_name = "train_diffusion_ours_lowdim_workspace" 
#pathlib.Path(__file__).stem
# Initialize Hydra
with hydra.initialize(config_path=config_path, version_base=None):
    cfg_ws = hydra.compose(config_name=config_name)    
    #OmegaConf.resolve(cfg_ws)

    # Loading workspace
    workspace = trws.TrainRobomimicLowdimWorkspace(cfg_ws)
    

- Methods:
    - `run`
        - Checks whether training is resumed from a checkpoint and, if so, continues from there. Resume is always set to true.
        - Loads the dataset into an object of class `BaseLowdimDataset`, using the task's YAML config file, and wraps it in a DataLoader
        - Normalizes the data
        - Configures the environment by creating an object of class `BaseLowdimRunner` from the task's YAML config file
        - Configures logging of training data with the Wandb library
        - Configures checkpointing and the transfer to the GPU
        - Launches the training loop, iterating over epochs and batches (both specified in the config files) with the model's network/transformer in `.train()` mode, and saves checkpoints
        - Logs training data
        - Evaluates epochs with the model in `.eval()` mode, according to the architecture
            - Runs the rollout and the validation
            - Updates the checkpoints with the validation data
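The control flow of `run` (resume, train per batch, validate per epoch, checkpoint) can be illustrated with a toy example. This is a sketch under strong simplifications, not the workspace code: the "model" is a single weight `w` fitted to `y = 2x` with plain gradient descent, and the checkpoint is an in-memory dictionary instead of a file.

```python
def make_batches(data, batch_size):
    """Split a list of samples into consecutive batches."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

def run(num_epochs, checkpoint=None, lr=0.01, batch_size=4):
    # Resume from a checkpoint if one is given, otherwise start fresh.
    if checkpoint:
        state = dict(checkpoint)
    else:
        state = {"w": 0.0, "epoch": 0, "global_step": 0}

    train_data = [(float(x), 2.0 * x) for x in range(1, 9)]
    val_data = [(9.0, 18.0), (10.0, 20.0)]

    while state["epoch"] < num_epochs:
        # Training phase: one gradient step per batch.
        for batch in make_batches(train_data, batch_size):
            grad = sum(2 * (state["w"] * x - y) * x for x, y in batch) / len(batch)
            state["w"] -= lr * grad
            state["global_step"] += 1

        # Evaluation phase: no parameter updates, only a validation loss.
        val_loss = sum((state["w"] * x - y) ** 2 for x, y in val_data) / len(val_data)
        state["val_loss"] = val_loss
        state["epoch"] += 1
        # A real workspace would write the state to disk at this point.

    return state

first = run(num_epochs=3)                        # train for three epochs
resumed = run(num_epochs=5, checkpoint=first)    # continue up to epoch five
print(resumed["epoch"], resumed["global_step"], round(resumed["w"], 3))
```

Resuming from the returned state gives the same result as training for five epochs in one go, because the weight and both counters are part of the checkpoint.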


## Policy

The policy is configured and executed in an object of class `RobomimicLowdimPolicy`, which inherits from class `BaseLowdimPolicy`.
- The constructor initializes the object with the action dimension, the observation dimension, the algorithm, the type of task (square, lift, etc.) and the dataset type.
    - It configures a Robomimic instance using the Robomimic library. To that end, it passes the type of data (low_dim, low_dim_sparse, low_dim_dense, or image), the type of algorithm (e.g. behavior cloning, `bc_rnn`), the task, and the dataset type.
    - It creates a model of class `PolicyAlgo` from `robomimic.algo.algo`
- Inference: the `predict_action` method takes an observation dictionary and returns an action compatible with Robomimic shapes and formatting.
- Methods for the training stage
    - `to`: selects the device (CPU, GPU, ...)
    - `set_normalizer`: normalizes the training data, either by limits or with the Gaussian method, for consistency between training and testing
    - `train_on_batch`: normalizes the batch observations and actions for training
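The two normalization modes mentioned above can be sketched for a single scalar feature. The function names are hypothetical and this is not the repository's normalizer; it assumes that "limits" maps the observed range linearly to `[-1, 1]` and that "gaussian" standardizes to zero mean and unit variance.

```python
def fit_limits(values, out_min=-1.0, out_max=1.0):
    """Linear map that sends [min(values), max(values)] to [out_min, out_max]."""
    lo, hi = min(values), max(values)  # assumes the values are not all identical
    scale = (out_max - out_min) / (hi - lo)
    offset = out_min - scale * lo
    return scale, offset

def fit_gaussian(values):
    """Linear map that yields zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / (var ** 0.5)  # assumes a non-zero variance
    offset = -mean * scale
    return scale, offset

def normalize(values, scale, offset):
    return [v * scale + offset for v in values]

def unnormalize(values, scale, offset):
    return [(v - offset) / scale for v in values]

data = [0.0, 5.0, 10.0]

limits_params = fit_limits(data)
limits_out = normalize(data, *limits_params)

gauss_params = fit_gaussian(data)
gauss_out = normalize(data, *gauss_params)

print(limits_out)
print(gauss_out)
# The same parameters must be reused at test time and inverted for predicted actions.
print(unnormalize(limits_out, *limits_params))
```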

### Creating a custom model
In this section, we create a custom model and describe the key aspects to consider when designing a customized network architecture for diffusion models. Our architecture is a simple MLP-based diffusion architecture. The first step is to create a workspace file; you can find ours, with extensive documentation, in `train_diffusion_mlp_lowdim_workspace.py`.
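To make the inputs and outputs of such a network concrete, here is a minimal NumPy sketch of an MLP noise predictor. It is not the architecture from `train_diffusion_mlp_lowdim_workspace.py`: the class `TinyNoiseMLP`, the layer sizes, the sinusoidal step embedding and `n_obs_steps=2` are all assumptions, and the weights are random and untrained. The sequence length and the dimensions match the example batch above.

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal embedding of an integer diffusion step."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

class TinyNoiseMLP:
    """Hypothetical two-layer MLP that predicts the noise added to an action sequence."""

    def __init__(self, horizon, action_dim, obs_dim, n_obs_steps,
                 hidden=64, emb_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.horizon = horizon
        self.action_dim = action_dim
        self.emb_dim = emb_dim
        # Input: flattened noisy actions, flattened observations, step embedding.
        in_dim = horizon * action_dim + n_obs_steps * obs_dim + emb_dim
        out_dim = horizon * action_dim
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, noisy_actions, t, obs):
        x = np.concatenate([
            noisy_actions.ravel(),
            obs.ravel(),
            timestep_embedding(t, self.emb_dim),
        ])
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU
        out = h @ self.w2 + self.b2
        # The output has the same shape as the noisy action sequence.
        return out.reshape(self.horizon, self.action_dim)

model = TinyNoiseMLP(horizon=10, action_dim=7, obs_dim=19, n_obs_steps=2)
noisy_actions = np.zeros((10, 7))
obs = np.zeros((2, 19))
eps_hat = model(noisy_actions, t=5, obs=obs)
print(eps_hat.shape)
```

The key design constraint is that the network conditions on the observations and the diffusion step, and returns a tensor with the same shape as the action sequence it denoises.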

## Inference

### Lowdim Wrapper


Initialize class using `RobomimicLowdimWrapper()` as described in `robomimic_lowdim_wrapper.py`
+ OpenAI Gym Simulation environment
    + `.seed(seed)`: sets the seed used to create the environment
    + `.step(action)`: performs the next action and returns `observation`, `reward`, `done`, `info`
    + `.reset()`: resets the environment (and its observation); it is possible to reset to a previously set seed
    + `.get_observation()` returns the current observation
+ An observation has the following attributes:
    + `object`
    + `pos`: position of the robot
    + `qpos`: joint positions of the robot
    + `quat`: quaternion of the robot
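The interaction pattern with such a Gym-style environment is sketched below. `DummyLowdimEnv` is a hypothetical stand-in that only mimics the four methods listed above; it does not use Robomimic, and its three-dimensional observation and reward are made up for illustration.

```python
import random

class DummyLowdimEnv:
    """Hypothetical environment with seed / reset / step / get_observation."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self._seed = None
        self._rng = random.Random()
        self._t = 0
        self._obs = [0.0, 0.0, 0.0]

    def seed(self, seed=None):
        # Store the seed; it is applied on the next reset.
        self._seed = seed

    def reset(self):
        # Re-seeding on reset makes the initial observation reproducible.
        self._rng = random.Random(self._seed)
        self._t = 0
        self._obs = [self._rng.uniform(-1.0, 1.0) for _ in range(3)]
        return self.get_observation()

    def get_observation(self):
        return list(self._obs)

    def step(self, action):
        self._obs = [o + a for o, a in zip(self._obs, action)]
        self._t += 1
        reward = -sum(abs(o) for o in self._obs)
        done = self._t >= self.max_steps
        return self.get_observation(), reward, done, {"t": self._t}

env = DummyLowdimEnv()
env.seed(42)
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    action = [-0.5 * o for o in obs]  # toy controller: move towards the origin
    obs, reward, done, info = env.step(action)
    total_reward += reward

print(info["t"], total_reward)
```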


### Lowdim Policy

Initialize class using `RobomimicLowdimPolicy()` as described in `robomimic_lowdim_policy.py`
+ Inherits from the `BaseLowdimPolicy` class as defined in `base_lowdim_policy.py`, which has the following key properties:
    + `predict_action(obs_dict)`: Function stub to predict the next action given the observation
    + `reset()`: Function stub to reset the policy
    + `set_normalizer()`: Function stub to set the policy's normalizer
+ Extends `BaseLowdimPolicy`
    + For initialization
        + `get_robo_mimic_config()`: Creates a config for Robomimic based on algorithm, observation type, task, and dataset type
        + `algo_factory()`: Initializes a model for the given algorithm based on the config and the available actions
    + For training
        + `train_on_batch()`: Uses observations and actions. Preprocesses a Robomimic batch and calls `model.train()` to train the model
        + `get_optimizer()`: Returns the optimizer used to update the policy
    + For inference
        + `predict_action()`: Predicts the next action given the observation
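The relation between the base-class stubs and a concrete policy can be sketched as follows. Both classes are hypothetical and free of Robomimic or PyTorch dependencies; the dictionary keys `obs` and `action` follow the batch keys shown earlier, and the returned action is a constant placeholder rather than a model prediction.

```python
class BasePolicySketch:
    """Hypothetical base class that only declares the interface."""

    def predict_action(self, obs_dict):
        raise NotImplementedError()

    def reset(self):
        # Stateless policies do not need to do anything here.
        pass

    def set_normalizer(self, normalizer):
        raise NotImplementedError()

class ConstantPolicy(BasePolicySketch):
    """Toy policy that always returns a zero action of the right size."""

    def __init__(self, action_dim):
        self.action_dim = action_dim
        self.normalizer = None

    def set_normalizer(self, normalizer):
        self.normalizer = normalizer

    def predict_action(self, obs_dict):
        batch_size = len(obs_dict["obs"])
        return {"action": [[0.0] * self.action_dim for _ in range(batch_size)]}

policy = ConstantPolicy(action_dim=7)
policy.reset()
result = policy.predict_action({"obs": [[0.0] * 19, [0.0] * 19]})
print(len(result["action"]), len(result["action"][0]))
```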

### Lowdim Runner

Initialize class using `RobomimicLowdimRunner(**lowdimRunner_cfg, output_dir)` as described in `robomimic_lowdim_runner.py`. Example usage is given in `test_robomimic_image_runner.py`.
+ Inherits from the `BaseLowdimRunner` class as defined in `base_lowdim_runner.py`, which has the following key properties:
    + `run()`: Function stub to run the policy
    + `save()`: Function stub to save the policy
    + `load()`: Function stub to load the policy
+ Extends `BaseLowdimRunner`
    + For initialization
        + initializes configuration attributes and paths
        + wraps `RobomimicLowdimWrapper` in a `VideoRecordingWrapper` to generate output videos
    + For training
        + initializes output directory
        + configures path for rendered output videos
    + For running
        + Locates video data and divides it into chunks
        + For each chunk, reset policy and observations
        + Then, run the simulator (i.e. obtain `action_dict` and call `.step()` until `done` for all chunks)
        + Use `env.render()` to add video paths to the output
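The chunked rollout loop can be sketched with toy classes. This is an illustration under assumptions, not the runner's implementation: the chunks are taken over a list of environments, `CountdownEnv` and `ZeroPolicy` are hypothetical, and video recording is left out.

```python
def chunked(items, chunk_size):
    """Split a list into consecutive chunks of at most `chunk_size` items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

class CountdownEnv:
    """Hypothetical environment that is done after `n_steps` steps."""

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.remaining = n_steps

    def reset(self):
        self.remaining = self.n_steps
        return [float(self.remaining)]

    def step(self, action):
        self.remaining -= 1
        done = self.remaining <= 0
        reward = 1.0 if done else 0.0
        return [float(self.remaining)], reward, done, {}

class ZeroPolicy:
    """Hypothetical policy that counts resets and returns zero actions."""

    def __init__(self):
        self.n_resets = 0

    def reset(self):
        self.n_resets += 1

    def predict_action(self, obs_list):
        return [0.0 for _ in obs_list]

def run_rollouts(envs, policy, chunk_size):
    max_rewards = []
    for chunk in chunked(envs, chunk_size):
        # Reset the policy and the observations for every chunk.
        policy.reset()
        obs = [env.reset() for env in chunk]
        done = [False] * len(chunk)
        best = [0.0] * len(chunk)
        # Step until every environment in the chunk is done.
        while not all(done):
            actions = policy.predict_action(obs)
            for i, env in enumerate(chunk):
                if done[i]:
                    continue
                obs[i], reward, done[i], _ = env.step(actions[i])
                best[i] = max(best[i], reward)
        max_rewards.extend(best)
    return max_rewards

envs = [CountdownEnv(n) for n in (3, 5, 2, 4, 1)]
policy = ZeroPolicy()
rewards = run_rollouts(envs, policy, chunk_size=2)
print(rewards, policy.n_resets)
```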
