## Robomimic Get Started Tutorial

This notebook implements a simple training loop without the extensive features offered in robomimic such as logging and hyperparameter sweeping. Please refer to the [repository](https://github.com/ARISE-Initiative/robomimic) and the [documentation](https://robomimic.github.io/docs/introduction/overview.html) for the full set of features and the rest of the pipeline.

This notebook includes the following tutorials:

1. Set up robomimic development environment
2. Download a task-specific dataset
3. Create a naive behavior cloning policy
4. Set up a simple training loop
5. Run policy training
6. Visualize the trained policy

### 0. Use GPU to accelerate training

To use a GPU runtime, click Runtime in the top menu -> Change runtime type -> select GPU as the hardware accelerator.
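After switching the runtime, one quick way to confirm that a GPU is actually attached is to ask the NVIDIA driver. The stdlib sketch below (the helper name `nvidia_gpu_visible` is hypothetical, not part of robomimic) checks whether `nvidia-smi` is on the PATH and lists at least one device. It says nothing about whether PyTorch can use the GPU; for that, `torch.cuda.is_available()` is the relevant check.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and reports at least one GPU."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False  # no NVIDIA driver tools installed (e.g. a CPU-only runtime)
    try:
        out = subprocess.run([smi, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return False
    # `nvidia-smi -L` prints one line per device, starting with "GPU <index>:"
    return out.returncode == 0 and "GPU" in out.stdout


print("NVIDIA GPU visible:", nvidia_gpu_visible())
```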

In [None]:
import os
# First, we need to decide where to host the runtime storage
USE_GDRIVE_STORAGE = True

if not USE_GDRIVE_STORAGE:
    # Option 1: use the Colab runtime storage. All trained models and downloaded
    # data will disappear after you disconnect from the runtime.
    WS_DIR = "/content/"
    os.system("git clone https://github.com/ARISE-Initiative/robomimic")
    os.system("git clone https://github.com/ARISE-Initiative/robosuite.git")

else:
    # Option 2: use your google drive as the runtime storage. You need to grant
    # permission for the colab runtime to access your google drive. You also
    # need to decide on a workspace for robomimic
    from google.colab import drive
    drive.mount('/content/drive')
    WS_DIR = "/content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector" # this should be the absolute path, e.g., "/content/drive/MyDrive/my-ws/"
    assert os.path.exists(WS_DIR)

%cd $WS_DIR

Mounted at /content/drive
/content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector


In [None]:
!pip install -e robomimic/

import sys
import os
sys.path.append('./robomimic/')

Obtaining file:///content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/robomimic
  Preparing metadata (setup.py) ... done
Collecting tensorboardX (from robomimic==0.3.0)
  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)
Collecting egl_probe>=1.0.1 (from robomimic==0.3.0)
  Downloading egl_probe-1.0.2.tar.gz (217 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: egl_probe
  Building wheel for egl_probe (setup.py) ... done
  Created wheel for egl_probe: filename=egl_probe-1.0.2-cp310-cp310-linux_x86_64.whl size=476832 sha256=edebcaa2d41de8a51c7c3e59949e9ec2290398b335ac51f5732493730b018f13
  Stored in directory: /root/.cache/pip/

In [None]:
import robomimic

print(robomimic.__file__)

/content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/./robomimic/robomimic/__init__.py


In [None]:
# !pip install -e robosuite/
# # !pip install robosuite==1.4.1

# import sys
# import os
# sys.path.append('./robosuite/')

### 1. Set up development environment

The main dependencies of robomimic are
- torch
- numpy
- h5py
- robosuite
- mujoco
- tensorboardX
- egl_probe
- matplotlib


The full list is included in the `requirements.txt` file in the repo.
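To see which of these dependencies are already importable in the current runtime, `importlib.util.find_spec` can locate a module without importing it. This is a minimal stdlib sketch; the helper name `check_dependencies` is hypothetical, and it only checks that a module is present, not its version.

```python
import importlib.util

# Import names of the main robomimic dependencies listed above
DEPENDENCIES = ("torch", "numpy", "h5py", "robosuite", "mujoco",
                "tensorboardX", "egl_probe", "matplotlib")


def check_dependencies(names=DEPENDENCIES):
    """Map each top-level module name to True if it can be found on the import path."""
    # find_spec locates the module without executing it
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_dependencies().items():
    print(f"{name:14s} {'ok' if found else 'MISSING'}")
```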

Select US keyboard

In [None]:
# install the MuJoCo Python bindings (the `mujoco` package, not the legacy `mujoco-py`)
# !pip install mujoco

## 2. Download demonstration dataset for a task

For robomimic tasks, we organize the demonstration datasets by
- task name (e.g., lift)
- data source (ph - proficient human, mh - multi human, mg - machine-generated)
- observation type (low_dim or image)

For more details on the dataset structure, visit the [robomimic documentation](https://robomimic.github.io/docs/datasets/robomimic_v0.1.html) and the [dataset tutorial](https://github.com/ARISE-Initiative/robomimic/blob/master/examples/notebooks/datasets.ipynb).


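For example, one can download the proficient human (`ph`) dataset with low-dimensional (`low_dim`) observations for the `lift` task. The cells below instead read a real-robot `can` dataset with image observations, `can_real_image.hdf5`, which is expected to already be in the `robomimic_data/` folder of the workspace.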



### Read quantities from dataset
Next, let's demonstrate how to read different quantities from the dataset. There are scripts such as `scripts/get_dataset_info.py` that can help you quickly understand the contents of a dataset, but in this example, we'll break down how to do this directly.

First, let's take a look at the number of demonstrations in the file.

In [None]:
import os
import json
import h5py
import numpy as np

download_folder = WS_DIR + "/robomimic_data/"
dataset_path = os.path.join(download_folder, "can_real_image.hdf5")
assert os.path.exists(dataset_path)

# open file
f = h5py.File(dataset_path, "r")

# each demonstration is a group under "data"
demos = list(f["data"].keys())
num_demos = len(demos)

print("hdf5 file {} has {} demonstrations".format(dataset_path, num_demos))

hdf5 file /content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/robomimic_data/can_real_image.hdf5 has 200 demonstrations


Next, let's list all of the demonstrations, along with the number of state-action pairs in each demonstration.
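h5py typically lists group members in name order, so `demo_10` sorts before `demo_2`. The cell below fixes this by stripping the `demo_` prefix (`elem[5:]`) and sorting on the integer. A minimal stdlib sketch of the same idea, plus a small length summary; both helper names are hypothetical:

```python
def sort_demo_keys(keys):
    """Sort keys of the form "demo_<N>" by their integer index N."""
    # split on the last underscore so that "demo_10" -> 10
    return sorted(keys, key=lambda k: int(k.rsplit("_", 1)[1]))


def summarize_lengths(lengths):
    """Return (min, max, mean) of a non-empty list of demo lengths."""
    return min(lengths), max(lengths), sum(lengths) / len(lengths)


keys = ["demo_10", "demo_2", "demo_1"]
print(sorted(keys))           # lexicographic: ['demo_1', 'demo_10', 'demo_2']
print(sort_demo_keys(keys))   # numeric:       ['demo_1', 'demo_2', 'demo_10']
print(summarize_lengths([102, 95, 101]))
```

In the notebook, the lengths would come from `[f["data/{}/actions".format(ep)].shape[0] for ep in demos]`; the three numbers above are the lengths of `demo_0` to `demo_2` as printed further down.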

In [None]:
# each demonstration is named "demo_#" where # is a number.
# Let's put the demonstration list in increasing episode order
inds = np.argsort([int(elem[5:]) for elem in demos])
demos = [demos[i] for i in inds]


In [None]:
for ep in demos:
    num_actions = f["data/{}/actions".format(ep)].shape[0]
    print("{} has {} samples".format(ep, num_actions))

demo_0 has 102 samples
demo_1 has 95 samples
demo_2 has 101 samples
demo_3 has 82 samples
demo_4 has 86 samples
demo_5 has 83 samples
demo_6 has 85 samples
demo_7 has 82 samples
demo_8 has 84 samples
demo_9 has 86 samples
demo_10 has 90 samples
demo_11 has 96 samples
demo_12 has 96 samples
demo_13 has 118 samples
demo_14 has 103 samples
demo_15 has 91 samples
demo_16 has 95 samples
demo_17 has 100 samples
demo_18 has 79 samples
demo_19 has 92 samples
demo_20 has 76 samples
demo_21 has 91 samples
demo_22 has 99 samples
demo_23 has 100 samples
demo_24 has 92 samples
demo_25 has 98 samples
demo_26 has 87 samples
demo_27 has 84 samples
demo_28 has 91 samples
demo_29 has 80 samples
demo_30 has 90 samples
demo_31 has 85 samples
demo_32 has 93 samples
demo_33 has 90 samples
demo_34 has 100 samples
demo_35 has 86 samples
demo_36 has 130 samples
demo_37 has 99 samples
demo_38 has 90 samples
demo_39 has 76 samples
demo_40 has 81 samples
demo_41 has 82 samples
demo_42 has 102 samples
demo_43 has 

In [None]:
def print_h5_items(name, obj):
    if isinstance(obj, h5py.Group):
        print(f"Group: {name}")
    elif isinstance(obj, h5py.Dataset):
        print(f"Dataset: {name}")

f.visititems(print_h5_items)

Group: data
Group: data/demo_0
Dataset: data/demo_0/actions
Dataset: data/demo_0/dones
Dataset: data/demo_0/interventions
Group: data/demo_0/next_obs
Dataset: data/demo_0/next_obs/dq
Dataset: data/demo_0/next_obs/ee_pose
Dataset: data/demo_0/next_obs/ee_vel
Dataset: data/demo_0/next_obs/gripper_position
Dataset: data/demo_0/next_obs/image
Dataset: data/demo_0/next_obs/image_wrist
Dataset: data/demo_0/next_obs/q
Group: data/demo_0/obs
Dataset: data/demo_0/obs/dq
Dataset: data/demo_0/obs/ee_pose
Dataset: data/demo_0/obs/ee_vel
Dataset: data/demo_0/obs/gripper_position
Dataset: data/demo_0/obs/image
Dataset: data/demo_0/obs/image_wrist
Dataset: data/demo_0/obs/q
Dataset: data/demo_0/policy_acting
Dataset: data/demo_0/rewards
Dataset: data/demo_0/states
Dataset: data/demo_0/user_acting
Group: data/demo_1
Dataset: data/demo_1/actions
Dataset: data/demo_1/dones
Dataset: data/demo_1/interventions
Group: data/demo_1/next_obs
Dataset: data/demo_1/next_obs/dq
Dataset: data/demo_1/next_obs/ee_pos
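The flat listing above repeats the same `obs`/`next_obs` layout for every demo. A more compact view can be obtained by rebuilding the hierarchy from the path names. This is a minimal stdlib sketch with hypothetical helpers `build_tree` and `print_tree`; with the open file, the names can be collected with `names = []; f.visit(names.append)`, since h5py's `visit` keeps going as long as the callback returns `None`.

```python
def build_tree(names):
    """Turn flat HDF5 paths such as "data/demo_0/obs/q" into nested dicts."""
    tree = {}
    for name in names:
        node = tree
        for part in name.split("/"):
            node = node.setdefault(part, {})
    return tree


def print_tree(tree, indent=0, max_children=None):
    """Print nested keys, showing at most `max_children` entries per level."""
    items = list(tree.items())
    shown = items if max_children is None else items[:max_children]
    for key, child in shown:
        print("  " * indent + key)
        print_tree(child, indent + 1, max_children)
    if len(shown) < len(items):
        print("  " * indent + "... ({} more)".format(len(items) - len(shown)))


# Example names copied from the listing above (a small subset)
names = ["data", "data/demo_0", "data/demo_0/actions", "data/demo_0/obs",
         "data/demo_0/obs/q", "data/demo_0/obs/image", "data/demo_1", "data/demo_1/actions"]
print_tree(build_tree(names), max_children=2)
```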

Now, let's dig into a single trajectory to take a look at some of the quantities in each demonstration.

In [None]:
# look at first demonstration
demo_key = demos[0]
demo_grp = f["data/{}".format(demo_key)]

# Each observation is a dictionary that maps modalities to numpy arrays, and
# each action is a numpy array. Let's print the per-step flags (dones, rewards,
# states, policy_acting, user_acting) and the shapes of the first camera images.
dones = demo_grp["dones"][:]
rewards = demo_grp["rewards"][:]
states = demo_grp["states"][:]

print("dones")
print(dones)
print("")
print("rewards")
print(rewards)
print("states")
print(states)
print("")


policy_acting = demo_grp["policy_acting"][:]
user_acting = demo_grp["user_acting"][:]
print("policy_acting")
print(policy_acting)
print("user_acting[:10]")
print(user_acting[:10])

img1 = demo_grp["obs"]["image"][0]
img2 = demo_grp["obs"]["image_wrist"][0]

print('image format')
print(img1.shape)
print(img2.shape)


dones
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]

rewards
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1.]
states
[]

policy_acting
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 F
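For this demo, the `dones` and `rewards` arrays above follow a simple pattern: zeros everywhere except a 1 at the final step. A minimal numpy sketch that condenses such arrays into a few numbers; treating a positive final reward as success is an assumption based on this sparse-reward pattern, not something the dataset documents, and `summarize_episode` is a hypothetical name.

```python
import numpy as np


def summarize_episode(dones, rewards):
    """Summarize one demo from its per-step `dones` and `rewards` arrays."""
    dones = np.asarray(dones)
    rewards = np.asarray(rewards, dtype=float)
    done_steps = np.flatnonzero(dones)
    return {
        "length": int(len(rewards)),
        "return": float(rewards.sum()),
        # index of the first step flagged done, or None if the demo never sets it
        "first_done": int(done_steps[0]) if done_steps.size else None,
        # assumption: sparse reward, so a positive last reward means the task was solved
        "success": bool(rewards[-1] > 0) if rewards.size else False,
    }


print(summarize_episode([0, 0, 1], [0.0, 0.0, 1.0]))
```

With the open file, this would be called as `summarize_episode(demo_grp["dones"][:], demo_grp["rewards"][:])`.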

In [None]:
def list_datasets_with_prefix(name, obj, prefix):
    if isinstance(obj, h5py.Dataset) and name.startswith(prefix):
        print(name)
        print(obj)

prefix = "data/demo_114/"

with h5py.File(download_folder + 'can_real_image.hdf5', 'r') as file:
    file.visit(lambda name: list_datasets_with_prefix(name, file[name], prefix))

data/demo_114/actions
<HDF5 dataset "actions": shape (86, 7), type "<f8">
data/demo_114/dones
<HDF5 dataset "dones": shape (86,), type "<i8">
data/demo_114/interventions
<HDF5 dataset "interventions": shape (86, 1), type "|b1">
data/demo_114/next_obs/dq
<HDF5 dataset "dq": shape (86, 7), type "<f8">
data/demo_114/next_obs/ee_pose
<HDF5 dataset "ee_pose": shape (86, 7), type "<f8">
data/demo_114/next_obs/ee_vel
<HDF5 dataset "ee_vel": shape (86, 6), type "<f8">
data/demo_114/next_obs/gripper_position
<HDF5 dataset "gripper_position": shape (86, 1), type "<f8">
data/demo_114/next_obs/image
<HDF5 dataset "image": shape (86, 120, 120, 3), type "|u1">
data/demo_114/next_obs/image_wrist
<HDF5 dataset "image_wrist": shape (86, 120, 120, 3), type "|u1">
data/demo_114/next_obs/q
<HDF5 dataset "q": shape (86, 7), type "<f8">
data/demo_114/obs/dq
<HDF5 dataset "dq": shape (86, 7), type "<f8">
data/demo_114/obs/ee_pose
<HDF5 dataset "ee_pose": shape (86, 7), type "<f8">
data/demo_114/obs/ee_vel
<H

In [None]:
for t in range(4):
    print("timestep {}".format(t))
    # each observation modality is stored as a separate dataset under "obs" and "next_obs"
    actions_t = demo_grp["actions"][t]
    print("actions")
    print(actions_t)

    # next_obs at step t should describe the same robot state as obs at step t + 1
    print('next_obs from last time step')
    for k in demo_grp["next_obs"]:
        if k in ("q", "dq", "ee_pose"):
            print(k)
            print(np.array(demo_grp["next_obs/{}".format(k)][t]))  # numpy array
    print('obs from current time step')
    for k in demo_grp["obs"]:
        if k in ("q", "dq", "ee_pose"):
            print(k)
            print(np.array(demo_grp["obs/{}".format(k)][t + 1]))  # numpy array
    print()
    print()

timestep 0
actions
[ 3.11194657e-04 -5.19574546e-03 -5.79637346e-03  1.56704119e-03
 -3.34278618e-02 -1.17795895e-01  1.00000000e+00]
next_obs from last time step
dq
[ 2.03903036e-03 -2.98083968e-04 -1.50669438e-04  4.28737584e-03
  1.38464194e-05 -1.43853540e-04  1.21640102e-03]
ee_pose
[ 4.33142183e-01 -1.27626822e-05  2.91189148e-01 -9.99000192e-01
  4.29054238e-02 -1.25313345e-02  8.22483620e-04]
q
[ 2.34757489e-04 -3.13365848e-01 -4.42706928e-04 -2.51460878e+00
  5.50330178e-04  2.22635361e+00  8.70724031e-01]
obs from current time step
dq
[ 0.00823189  0.00304526 -0.00962748 -0.0019343   0.00186787 -0.00011815
  0.00132628]
ee_pose
[ 4.33140762e-01 -9.47826353e-06  2.91164175e-01 -9.98998702e-01
  4.29464988e-02 -1.25109171e-02  8.44419934e-04]
q
[ 3.24456694e-04 -3.13313350e-01 -5.31447725e-04 -2.51462526e+00
  5.74357804e-04  2.22637547e+00  8.70794253e-01]

timestep 1
actions
[-0.00455739 -0.02020172 -0.01990417 -0.01125307 -0.03416026 -0.1285657
  1.        ]
next_obs from la
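The loop above is meant to compare an observation key's `next_obs` at step `t` with its `obs` at step `t + 1`. Whether these coincide depends on how the dataset was recorded, so rather than eyeballing a few numbers, one can compare the whole shifted sequences at once. A minimal numpy sketch; `check_next_obs_alignment` is a hypothetical helper and the tolerance is an arbitrary choice.

```python
import numpy as np


def check_next_obs_alignment(obs, next_obs, atol=1e-6):
    """Return True if next_obs[t] matches obs[t + 1] for every t < T - 1.

    `obs` and `next_obs` are arrays of shape (T, ...) for one observation key.
    """
    obs = np.asarray(obs)
    next_obs = np.asarray(next_obs)
    if obs.shape != next_obs.shape:
        return False
    # compare the shifted sequences; the last next_obs has no obs[t + 1] partner
    return bool(np.allclose(next_obs[:-1], obs[1:], atol=atol))


obs = np.arange(10.0).reshape(5, 2)
next_obs = np.vstack([obs[1:], [[99.0, 99.0]]])
print(check_next_obs_alignment(obs, next_obs))
```

On the real dataset this would be `check_next_obs_alignment(demo_grp["obs/q"][:], demo_grp["next_obs/q"][:])`, repeated for `dq` and `ee_pose`.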

### Visualizing demonstration trajectories
Finally, let's play back some of these demonstrations to visualize the data that was collected.

Because this is a real-robot dataset (the `states` array printed above is empty), the demonstrations cannot be replayed by resetting a simulator to the stored states. Instead, the cells below write the recorded camera frames (`image` and `image_wrist`) to video files.

In [None]:
import imageio

# prepare to write playback trajectories to video
video_path = os.path.join(download_folder, "playback.mp4")
video_writer = imageio.get_writer(video_path, fps=20)

In [None]:
download_folder

'/content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/robomimic_data/'

In [None]:
def playback_trajectory(demo_key, data):
    # Get the images from the hdf5 dataset
    images = f["data/{}/obs/{}".format(demo_key,data)][:]
    for img in images:
        # If the images are not already in uint8 format, you might need to convert them.
        # Here, I assume they are already in the right format.
        video_writer.append_data(img)
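`playback_trajectory` writes frames as-is, which works because the stored images are `uint8` (dtype `|u1` in the listing above). If frames arrive as floats, or if both cameras should appear side by side in one video (the way `rollout` concatenates camera views), a small numpy helper can handle it. A minimal sketch; the helper names are hypothetical and the assumption that float images lie in [0, 1] is ours.

```python
import numpy as np


def to_uint8_frame(img):
    """Convert an HxWx3 image to uint8; float images are assumed to lie in [0, 1]."""
    img = np.asarray(img)
    if img.dtype == np.uint8:
        return img
    if np.issubdtype(img.dtype, np.floating):
        img = np.clip(img, 0.0, 1.0) * 255.0
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)


def side_by_side(*frames):
    """Concatenate same-height frames horizontally into one video frame."""
    return np.concatenate([to_uint8_frame(fr) for fr in frames], axis=1)


agent = np.zeros((120, 120, 3), dtype=np.uint8)
wrist = np.ones((120, 120, 3), dtype=np.float32)
print(side_by_side(agent, wrist).shape)
```

For a combined video, one would loop over `zip(images, wrist_images)` read from `obs/image` and `obs/image_wrist` and pass `side_by_side(a, b)` to `video_writer.append_data`.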

In [None]:
from IPython.display import Video

video_path = os.path.join(download_folder, "demo_114_imgw.mp4")
video_writer = imageio.get_writer(video_path, fps=20)
playback_trajectory("demo_114", "image_wrist")
video_writer.close()  # finalize the mp4 file before displaying it
Video(video_path, embed=True)



In [None]:
video_path = os.path.join(download_folder, "demo_114_img.mp4")
video_writer = imageio.get_writer(video_path, fps=20)
playback_trajectory("demo_114", "image")
video_writer.close()  # finalize the mp4 file before displaying it
Video(video_path, embed=True)



# Run a trained policy

This section shows how to run a trained policy and visualize the rollout.

### Train from scratch with `train.py`
1. Write a config for training (see the [config module documentation](https://robomimic.github.io/docs/modules/configs.html) and the [config tutorial](https://robomimic.github.io/docs/tutorials/configs.html)).
2. Set the algorithm to BC-RNN.
3. Set the experiment name to `realdata-training`.
4. Set the dataset path in the config.

Reference results: https://robomimic.github.io/study/
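In robomimic 0.3 (the version installed above), BC-RNN is, to our understanding, the `bc` algorithm with `algo.rnn.enabled` set to true, and `train.data` holds the dataset path. One way to apply the changes from the steps above to an exported JSON config is a small dotted-key override. This stdlib sketch is hypothetical (robomimic's own `Config` objects also support direct attribute assignment), and the illustrative base dict only mimics the structure of the printed config further below.

```python
import json


def override_config(base, overrides):
    """Return a deep copy of nested dict `base` with dotted-key overrides applied.

    Raises KeyError if a key does not already exist, so a typo cannot silently
    add a new, unused setting.
    """
    cfg = json.loads(json.dumps(base))  # deep copy via a JSON round trip
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for key in parents:
            node = node[key]  # KeyError if an intermediate section is missing
        if leaf not in node:
            raise KeyError(dotted_key)
        node[leaf] = value
    return cfg


# Illustrative base config; in practice, load one exported from robomimic with json.load
base = {
    "algo_name": "bc",
    "experiment": {"name": "core_bc_rnn_lift_ph_low_dim"},
    "train": {"data": None},
    "algo": {"rnn": {"enabled": False}},
}
cfg = override_config(base, {
    "experiment.name": "realdata-training",
    "train.data": "robomimic_data/can_real_image.hdf5",
    "algo.rnn.enabled": True,
})
print(json.dumps(cfg, indent=4))
```

The result could then be written with `json.dump(cfg, fp, indent=4)` and passed to `train.py --config`.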


In [None]:
!ls /content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/

lift_ph_low_dim_epoch_1000_succ_100.pth  robomimic_data    robomimic-pre  rollout.mp4
robomimic				 robomimic_output  robosuite


In [None]:
# 50 rollouts with max horizon 400; render agentview and wrist camera images to a single video file
!python robomimic/robomimic/scripts/run_trained_agent.py --agent lift_ph_low_dim_epoch_1000_succ_100.pth --n_rollouts 50 --horizon 400 --seed 0 --video_path robomimic_data/lift_ph_rollouts.mp4 --camera_names agentview robot0_eye_in_hand

{
    "algo_name": "bc",
    "experiment": {
        "name": "core_bc_rnn_lift_ph_low_dim",
        "validate": true,
        "logging": {
            "terminal_output_to_txt": true,
            "log_tb": true
        },
        "save": {
            "enabled": true,
            "every_n_seconds": null,
            "every_n_epochs": 50,
            "epochs": [],
            "on_best_validation": false,
            "on_best_rollout_return": false,
            "on_best_rollout_success_rate": true
        },
        "epoch_every_n_steps": 100,
        "validation_epoch_every_n_steps": 10,
        "env": null,
        "additional_envs": null,
        "render": false,
        "render_video": true,
        "keep_all_videos": false,
        "video_skip": 5,
        "rollout": {
            "enabled": true,
            "n": 50,
            "horizon": 400,
            "rate": 50,
            "warmstart": 0,
            "terminate_on_success": true
        }
    },
    "train": {
        "data":

In [None]:
import argparse
import json
import h5py
import imageio
import numpy as np
import os
from copy import deepcopy

import torch

import robomimic
import robomimic.utils.file_utils as FileUtils
import robomimic.utils.torch_utils as TorchUtils
import robomimic.utils.tensor_utils as TensorUtils
import robomimic.utils.obs_utils as ObsUtils
from robomimic.envs.env_base import EnvBase
from robomimic.algo import RolloutPolicy

import urllib.request


### Download policy checkpoint
First, let's try downloading a pretrained model from our model zoo.

In [None]:
# Get pretrained checkpoint from the model zoo

ckpt_path = "lift_ph_low_dim_epoch_1000_succ_100.pth"
# Lift (Proficient Human)
urllib.request.urlretrieve(
    "http://downloads.cs.stanford.edu/downloads/rt_benchmark/model_zoo/lift/bc_rnn/lift_ph_low_dim_epoch_1000_succ_100.pth",
    filename=ckpt_path
)

assert os.path.exists(ckpt_path)


In [None]:
if os.path.exists(ckpt_path):
    abs_ckpt_path = os.path.abspath(ckpt_path)
    print(f"The absolute path of the checkpoint is: {abs_ckpt_path}")

The absolute path of the checkpoint is: /content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/lift_ph_low_dim_epoch_1000_succ_100.pth


### Loading trained policy
We have a convenient function called `policy_from_checkpoint` that builds the correct model from the checkpoint and loads the trained weights. You can also load the checkpoint manually.

In [None]:
device = TorchUtils.get_torch_device(try_to_use_cuda=True)

# restore policy
policy, ckpt_dict = FileUtils.policy_from_checkpoint(ckpt_path=ckpt_path, device=device, verbose=True)

{
    "algo_name": "bc",
    "experiment": {
        "name": "core_bc_rnn_lift_ph_low_dim",
        "validate": true,
        "logging": {
            "terminal_output_to_txt": true,
            "log_tb": true
        },
        "save": {
            "enabled": true,
            "every_n_seconds": null,
            "every_n_epochs": 50,
            "epochs": [],
            "on_best_validation": false,
            "on_best_rollout_return": false,
            "on_best_rollout_success_rate": true
        },
        "epoch_every_n_steps": 100,
        "validation_epoch_every_n_steps": 10,
        "env": null,
        "additional_envs": null,
        "render": false,
        "render_video": true,
        "keep_all_videos": false,
        "video_skip": 5,
        "rollout": {
            "enabled": true,
            "n": 50,
            "horizon": 400,
            "rate": 50,
            "warmstart": 0,
            "terminate_on_success": true
        }
    },
    "train": {
        "data":

### Creating rollout environment
The policy checkpoint also contains enough information to recreate the environment the policy was trained in. Alternatively, you can create the environment manually.

In [None]:
# create environment from saved checkpoint
env, _ = FileUtils.env_from_checkpoint(
    ckpt_dict=ckpt_dict,
    render=False, # we won't do on-screen rendering in the notebook
    render_offscreen=True, # render to RGB images for video
    verbose=True,
)

### Define the rollout loop
Now let's define the main rollout loop. The loop runs the policy to a target `horizon` and optionally writes the rollout to a video.
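To make the control flow of the loop concrete without a simulator, here is a toy version with a hypothetical `ToyEnv` whose state grows by the action and which reports success once it reaches 5. It follows the same rules as `rollout`: stop at `horizon`, stop early on `done` or success, and keep one frame out of every `video_skip` steps (here the frames are only counted). `ToyEnv` and `toy_rollout` are illustrations, not robomimic classes.

```python
class ToyEnv:
    """Hypothetical stand-in env: the state grows by the action; success once it reaches 5."""

    def reset(self):
        self.x = 0
        return {"x": self.x}

    def step(self, action):
        self.x += action
        reward = 1.0 if self.x >= 5 else 0.0
        done = False  # this toy env never signals done on its own
        return {"x": self.x}, reward, done, {}

    def is_success(self):
        return {"task": self.x >= 5}


def toy_rollout(policy, env, horizon, video_skip=5):
    """Same control flow as `rollout`, minus rendering: count frames instead of writing them."""
    obs = env.reset()
    total_reward, frames, success, step_i = 0.0, 0, False, -1
    for step_i in range(horizon):
        obs, r, done, _ = env.step(policy(obs))
        total_reward += r
        success = env.is_success()["task"]
        if step_i % video_skip == 0:  # frames kept at steps 0, video_skip, 2 * video_skip, ...
            frames += 1
        if done or success:
            break
    return dict(Return=total_reward, Horizon=step_i + 1, Success_Rate=float(success), Frames=frames)


def always_one(obs):
    return 1


print(toy_rollout(always_one, ToyEnv(), horizon=400))  # succeeds after 5 steps
print(toy_rollout(always_one, ToyEnv(), horizon=3))    # cut off by the horizon
```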

In [None]:
def rollout(policy, env, horizon, render=False, video_writer=None, video_skip=5, camera_names=None):
    """
    Helper function to carry out rollouts. Supports on-screen rendering, off-screen rendering to a video,
    and returns summary statistics for the rollout.
    Args:
        policy (instance of RolloutPolicy): policy loaded from a checkpoint
        env (instance of EnvBase): env loaded from a checkpoint or demonstration metadata
        horizon (int): maximum horizon for the rollout
        render (bool): whether to render rollout on-screen
        video_writer (imageio writer): if provided, use to write rollout to video
        video_skip (int): how often to write video frames
        camera_names (list): determines which camera(s) are used for rendering. Pass more than
            one to output a video with multiple camera views concatenated horizontally.
    Returns:
        stats (dict): some statistics for the rollout - such as return, horizon, and task success
    """
    assert isinstance(env, EnvBase)
    assert isinstance(policy, RolloutPolicy)
    assert not (render and (video_writer is not None))

    policy.start_episode()
    obs = env.reset()
    state_dict = env.get_state()

    # hack that is necessary for robosuite tasks for deterministic action playback
    obs = env.reset_to(state_dict)

    video_count = 0  # video frame counter
    total_reward = 0.
    success = False  # defined up front in case the first step raises a rollout exception
    try:
        for step_i in range(horizon):

            # get action from policy
            act = policy(ob=obs)

            # play action
            next_obs, r, done, _ = env.step(act)

            # compute reward
            total_reward += r
            success = env.is_success()["task"]

            # visualization
            if render:
                env.render(mode="human", camera_name=camera_names[0])
            if video_writer is not None:
                if video_count % video_skip == 0:
                    video_img = []
                    for cam_name in camera_names:
                        video_img.append(env.render(mode="rgb_array", height=512, width=512, camera_name=cam_name))
                    video_img = np.concatenate(video_img, axis=1) # concatenate horizontally
                    video_writer.append_data(video_img)
                video_count += 1

            # break if done or if success
            if done or success:
                break

            # update for next iter
            obs = deepcopy(next_obs)
            state_dict = env.get_state()

    except env.rollout_exceptions as e:
        print("WARNING: got rollout exception {}".format(e))

    stats = dict(Return=total_reward, Horizon=(step_i + 1), Success_Rate=float(success))

    return stats


### Run the policy
Now let's rollout the policy!
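`rollout` returns statistics for a single episode. To report numbers over many episodes, as `run_trained_agent.py --n_rollouts 50` does above, the per-episode dicts can be averaged key by key. A minimal stdlib sketch; `aggregate_rollout_stats` is a hypothetical helper, plain averaging is our choice, and the two episode dicts are made-up illustrative values.

```python
def aggregate_rollout_stats(stats_list):
    """Average a list of per-episode stats dicts key by key."""
    if not stats_list:
        raise ValueError("need at least one rollout")
    n = len(stats_list)
    return {key: sum(s[key] for s in stats_list) / n for key in stats_list[0]}


episodes = [
    {"Return": 1.0, "Horizon": 100, "Success_Rate": 1.0},
    {"Return": 0.0, "Horizon": 400, "Success_Rate": 0.0},
]
print(aggregate_rollout_stats(episodes))  # {'Return': 0.5, 'Horizon': 250.0, 'Success_Rate': 0.5}
```

With the policy and environment loaded above, the input list could be built as `[rollout(policy=policy, env=env, horizon=rollout_horizon) for _ in range(10)]`.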

In [None]:
rollout_horizon = 400
np.random.seed(0)
torch.manual_seed(0)
video_path = "rollout.mp4"
video_writer = imageio.get_writer(video_path, fps=20)

In [None]:
stats = rollout(
    policy=policy,
    env=env,
    horizon=rollout_horizon,
    render=False,
    video_writer=video_writer,
    video_skip=5,
    camera_names=["agentview"]
)
print(stats)
video_writer.close()


### Visualize the rollout

In [None]:
from IPython.display import Video
Video(video_path, embed=True)

# 3. Build a simple behavior cloning model

This follows the default hyperparameters in `robomimic/config/bc_config.py`.

In [None]:
import numpy as np
import torch
from torch.utils.data import DataLoader

import robomimic

import robomimic.utils.obs_utils as ObsUtils
import robomimic.utils.torch_utils as TorchUtils
import robomimic.utils.test_utils as TestUtils
import robomimic.utils.file_utils as FileUtils
import robomimic.utils.train_utils as TrainUtils
from robomimic.utils.dataset import SequenceDataset

from robomimic.config import config_factory
from robomimic.algo import algo_factory

In [None]:
def get_example_model(dataset_path, device):
    """
    Use a default config to construct a BC model.
    """

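    # default BC config
    config = config_factory(algo_name="bc")

    # register this dataset's observation keys with their modalities; the default
    # BC config only lists robot0_* and object keys, which this dataset does not have
    config.observation.modalities.obs.low_dim = [
        "ee_pose",           # end-effector pose (7 values, likely position + quaternion)
        "q",                 # robot joint positions (7 joints)
        "gripper_position",  # gripper position (1 value)
    ]
    config.observation.modalities.obs.rgb = [
        "image",             # third-person camera, 120x120x3 uint8
        "image_wrist",       # wrist camera, 120x120x3 uint8
    ]

    # read config to set up metadata for observation modalities (e.g. detecting rgb observations)
    ObsUtils.initialize_obs_utils_with_config(config)

    # read dataset to get some metadata for constructing model
    # all_obs_keys determines what observations we will feed to the policy
    shape_meta = FileUtils.get_shape_metadata_from_dataset(
        dataset_path=dataset_path,
        all_obs_keys=sorted((
            "ee_pose",
            "q",
            "gripper_position",
            "image",
            "image_wrist",
        )),
    )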

    # make BC model
    model = algo_factory(
        algo_name=config.algo_name,
        config=config,
        obs_key_shapes=shape_meta["all_shapes"],
        ac_dim=shape_meta["ac_dim"],
        device=device,
    )
    return model

In [None]:
print(dataset_path)

/content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/robomimic_data/can_real_image.hdf5


In [None]:
device = TorchUtils.get_torch_device(try_to_use_cuda=True)
model = get_example_model(dataset_path, device=device)

print(model)



using obs modality: low_dim with keys: ['robot0_gripper_qpos', 'robot0_eef_quat', 'robot0_eef_pos', 'object']
using obs modality: rgb with keys: []
using obs modality: depth with keys: []
using obs modality: scan with keys: []
ObservationKeyToModalityDict: ee_pose not found, adding ee_pose to mapping with assumed low_dim modality!
ObservationKeyToModalityDict: gripper_position not found, adding gripper_position to mapping with assumed low_dim modality!
ObservationKeyToModalityDict: image not found, adding image to mapping with assumed low_dim modality!
ObservationKeyToModalityDict: image_wrist not found, adding image_wrist to mapping with assumed low_dim modality!
ObservationKeyToModalityDict: q not found, adding q to mapping with assumed low_dim modality!




ObservationKeyToModalityDict: action not found, adding action to mapping with assumed low_dim modality!
BC (
  ModuleDict(
    (policy): ActorNetwork(
        action_dim=7
  
        encoder=ObservationGroupEncoder(
            group=obs
            ObservationEncoder(
                output_shape=[0]
            )
        )
  
        mlp=MLP(
            input_dim=0
            output_dim=1024
            layer_dims=(1024,)
            layer_func=Linear
            dropout=None
            act=ReLU
            output_act=ReLU
        )
  
        decoder=ObservationDecoder(
            Key(
                name=action
                shape=(7,)
                modality=low_dim
                net=(Linear(in_features=1024, out_features=7, bias=True))
            )
        )
    )
  )
)
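The log above shows `image` and `image_wrist` being assumed `low_dim`, and the summary shows an encoder with `output_shape=[0]` and an MLP with `input_dim=0`. To our reading of robomimic's `Algo` setup, both follow from the default BC config registering only the `robot0_*` and `object` keys: only observation keys listed under `config.observation.modalities.obs` are kept as policy inputs, so none of this dataset's keys reach the network. The simplified sketch below imitates that bookkeeping; it is an illustration, not robomimic's implementation, and both helper names are hypothetical.

```python
def assign_modalities(obs_keys, registered, default="low_dim"):
    """Map each observation key to a modality, falling back to `default`.

    `registered` maps a modality name to its list of keys, in the spirit of
    `config.observation.modalities.obs`.
    """
    key_to_modality = {k: mod for mod, keys in registered.items() for k in keys}
    assigned = {}
    for key in obs_keys:
        if key not in key_to_modality:
            print(f"{key} not registered, assuming {default} modality")
        assigned[key] = key_to_modality.get(key, default)
    return assigned


def policy_input_keys(obs_keys, registered):
    """Keys that would actually be fed to the policy: only the registered ones."""
    listed = {k for keys in registered.values() for k in keys}
    return [k for k in obs_keys if k in listed]


dataset_keys = ["ee_pose", "gripper_position", "image", "image_wrist", "q"]
default_bc = {"low_dim": ["robot0_eef_pos", "robot0_eef_quat", "robot0_gripper_qpos", "object"], "rgb": []}
custom = {"low_dim": ["ee_pose", "gripper_position", "q"], "rgb": ["image", "image_wrist"]}

print(policy_input_keys(dataset_keys, default_bc))  # empty: nothing reaches the encoder
print(assign_modalities(dataset_keys, custom))
```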


## 4. Build a simple training loop

Here we build a simple data loader pipeline and a training loop. Note that this code snippet is only instructional and is a stripped-down version of robomimic's main training loop (`robomimic/scripts/train.py`).

In [None]:
!ls /content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/robomimic/examples


add_new_modality.py  simple_config.py	 simple_train_loop.py
notebooks	     simple_obs_nets.py  train_bc_rnn.py


Generate a config file. Useful references:
- [Hyperparameter scan tutorial](https://robomimic.github.io/docs/tutorials/hyperparam_scan.html)
- [Config documentation](https://robomimic.github.io/docs/modules/configs.html)
- [`examples/train_bc_rnn.py`](https://github.com/ARISE-Initiative/robomimic/blob/master/examples/train_bc_rnn.py)
- [`robomimic/config/base_config.py`](https://github.com/ARISE-Initiative/robomimic/blob/master/robomimic/config/base_config.py)
- [`robomimic/config/`](https://github.com/ARISE-Initiative/robomimic/tree/master/robomimic/config)

In [None]:
!ls /content/drive/MyDrive/01_research/ICRA2023-cybersecurity/error-awareness-detector/robomimic

docs		 LICENSE      requirements-docs.txt  robomimic.egg-info  train_realdata.ipynb
examples	 MANIFEST.in  requirements.txt	     setup.py		 train_realdata_v1.ipynb
get_start.ipynb  README.md    robomimic		     tests


In [None]:
# train.py lives in robomimic/robomimic/scripts/; --config expects a JSON config file (replace the placeholder path)
!python robomimic/robomimic/scripts/train.py --config path/to/config.json --dataset robomimic_data/can_real_image.hdf5 --name real_data


In [None]:
"""
WARNING: This code snippet is only for instructive purposes, and is missing several useful
         components used during training such as logging and rollout evaluation.
"""
def get_data_loader(dataset_path):
    """
    Get a data loader to sample batches of data.
    Args:
        dataset_path (str): path to the dataset hdf5
    """
    dataset = SequenceDataset(
        hdf5_path=dataset_path,
        obs_keys=(                      # observations we want to appear in batches (must match the model's keys)
            "ee_pose",
            "gripper_position",
            "image",
            "image_wrist",
            "q",
        ),
        dataset_keys=(                  # can optionally specify more keys here if they should appear in batches
            "actions",
            "rewards",
            "dones",
        ),
        load_next_obs=True,
        frame_stack=1,
        seq_length=10,                  # length-10 temporal sequences
        pad_frame_stack=True,
        pad_seq_length=True,            # pad last obs per trajectory to ensure all sequences are sampled
        get_pad_mask=False,
        goal_mode=None,
        hdf5_cache_mode="all",          # cache dataset in memory to avoid repeated file i/o
        hdf5_use_swmr=True,
        hdf5_normalize_obs=False,
        filter_by_attribute=None,       # can optionally provide a filter key here
    )
    print("\n============= Created Dataset =============")
    print(dataset)
    print("")

    data_loader = DataLoader(
        dataset=dataset,
        sampler=None,       # no custom sampling logic (uniform sampling)
        batch_size=100,     # batches of size 100
        shuffle=True,
        num_workers=0,
        drop_last=True      # don't provide last batch in dataset pass if it's less than 100 in size
    )
    return data_loader


def run_train_loop(model, data_loader, num_epochs=50, gradient_steps_per_epoch=100):
    """
    Note: this is a stripped down version of @TrainUtils.run_epoch and the train loop
    in the train function in train.py. Logging and evaluation rollouts were removed.
    Args:
        model (Algo instance): instance of Algo class to use for training
        data_loader (torch.utils.data.DataLoader instance): torch DataLoader for
            sampling batches
    """
    # ensure model is in train mode
    model.set_train()

    for epoch in range(1, num_epochs + 1): # epoch numbers start at 1

        # iterator for data_loader - it yields batches
        data_loader_iter = iter(data_loader)

        # record losses
        losses = []

        for _ in range(gradient_steps_per_epoch):

            # load next batch from data loader
            try:
                batch = next(data_loader_iter)
            except StopIteration:
                # data loader ran out of batches - reset and yield first batch
                data_loader_iter = iter(data_loader)
                batch = next(data_loader_iter)

            # process batch for training
            input_batch = model.process_batch_for_training(batch)

            # forward and backward pass
            info = model.train_on_batch(batch=input_batch, epoch=epoch, validate=False)

            # record loss
            step_log = model.log_info(info)
            losses.append(step_log["Loss"])

        # do anything model needs to after finishing epoch
        model.on_epoch_end(epoch)

        print("Train Epoch {}: Loss {}".format(epoch, np.mean(losses)))
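`run_train_loop` treats an "epoch" as a fixed number of gradient steps rather than a full pass over the data, restarting the loader whenever it runs out. With `num_epochs=50` and `gradient_steps_per_epoch=100`, that is 50 * 100 = 5,000 gradient steps in total, while one pass of a `DataLoader` with `drop_last=True` yields `len(dataset) // batch_size` batches. The restart pattern in isolation, as a stdlib generator with the hypothetical name `cycle_steps` (the loader must be non-empty and re-iterable, as a `DataLoader` is):

```python
def cycle_steps(loader, num_steps):
    """Yield exactly `num_steps` items from a re-iterable `loader`, restarting it when exhausted."""
    it = iter(loader)
    for _ in range(num_steps):
        try:
            batch = next(it)
        except StopIteration:
            # one pass finished: start a new one (a shuffling DataLoader reshuffles here)
            it = iter(loader)
            batch = next(it)
        yield batch


print(list(cycle_steps([1, 2, 3], 7)))  # [1, 2, 3, 1, 2, 3, 1]
```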


## 5. Run policy training

We now train the model with the data loader and training loop defined above. Note that this simple training loop does not save checkpoints. For model checkpointing, take a look at the full-featured [training loop](https://github.com/ARISE-Initiative/robomimic/blob/master/robomimic/scripts/train.py#L290) and the [documentation](https://robomimic.github.io/docs/tutorials/viewing_results.html).

In [None]:
# get dataset loader
data_loader = get_data_loader(dataset_path=dataset_path)

# run training loop
run_train_loop(model=model, data_loader=data_loader, num_epochs=50, gradient_steps_per_epoch=100)

## 6. Evaluate and visualize trained policy

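Here we execute the trained policy `model` in a simulated environment and play the rollout video. This step requires the dataset's environment metadata to describe a simulation environment; the output below was produced with the simulated `Lift` task.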

In [None]:
# create simulation environment

import robomimic.utils.env_utils as EnvUtils

env_meta = FileUtils.get_env_metadata_from_dataset(dataset_path)

env = EnvUtils.create_env_from_metadata(
    env_meta=env_meta,
    env_name=env_meta["env_name"],
    render=False,
    render_offscreen=True,
    use_image_obs=False,
)

Created environment with name Lift
Action size is 7


In [None]:
from robomimic.algo import RolloutPolicy
from robomimic.utils.train_utils import run_rollout
import imageio

# create a thin wrapper around the model to interact with the environment
policy = RolloutPolicy(model)

# create a video writer
video_path = "rollout.mp4"
video_writer = imageio.get_writer(video_path, fps=20)

# run rollout
rollout_log = run_rollout(
    policy=policy,
    env=env,
    horizon=200,
    video_writer=video_writer,
    render=False
)

video_writer.close()
# print rollout results
print(rollout_log)

{'Return': 51.0, 'Horizon': 200, 'Success_Rate': 1.0}


In [None]:
# visualize rollout video

from IPython.display import HTML
from base64 import b64encode

with open(video_path, "rb") as video_file:
    mp4 = video_file.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f"""
<video width=400 controls>
      <source src="{data_url}" type="video/mp4">
</video>
""")