
v0.2.0

@cremebrule cremebrule released this 17 Dec 20:04
62ed2de

robomimic 0.2.0 Release Notes

Highlights

This release of robomimic brings integrated support for mobile manipulation datasets from the recent MOMART paper, and adds modular features for easily modifying and adding custom observation modalities and corresponding encoding networks.

MOMART Datasets

We have added integrated support for MOMART datasets, a large-scale set of multi-stage, long-horizon mobile manipulation task demonstrations in a simulated kitchen environment collected in iGibson.

Using MOMART Datasets

Datasets can be easily downloaded using download_momart_datasets.py.

For step-by-step instructions for setting up your machine environment to visualize and train with the MOMART datasets, please visit the Getting Started page.

Modular Observation Modalities

We also introduce modular features for easily modifying and adding custom observation modalities and corresponding encoding networks. A modality corresponds to a group of specific observations that should be encoded the same way.

Default Modalities

robomimic natively supports the following modalities (expected size from a raw dataset shown, excluding the optional leading batch dimension):

  • rgb (H, W, 3): Standard 3-channel color frames with values in range [0, 255]
  • depth (H, W, 1): 1-channel frame with normalized values in range [0, 1]
  • low_dim (N): low-dimensional observations, e.g., proprioception or object states
  • scan (1, N): 1-channel, single-dimension data from a laser range scanner
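
As a rough illustration of the shape conventions listed above, the following sketch checks a shape tuple against the layout expected for each modality. The helper name `matches_modality_layout` and the example sizes are hypothetical and not part of robomimic; only the shape conventions come from the list above.

```python
def matches_modality_layout(modality, shape):
    """Return True if `shape` follows the raw-dataset layout for `modality`.

    Shapes exclude the optional leading batch dimension.
    This is an illustrative helper, not a robomimic function.
    """
    if modality == "rgb":
        return len(shape) == 3 and shape[-1] == 3      # (H, W, 3)
    if modality == "depth":
        return len(shape) == 3 and shape[-1] == 1      # (H, W, 1)
    if modality == "low_dim":
        return len(shape) == 1                         # (N,)
    if modality == "scan":
        return len(shape) == 2 and shape[0] == 1       # (1, N)
    raise ValueError(f"unknown modality: {modality}")


# Hypothetical example sizes, chosen only for illustration
example_shapes = {
    "rgb": (84, 84, 3),
    "depth": (84, 84, 1),
    "low_dim": (10,),
    "scan": (1, 128),
}

for name, shape in example_shapes.items():
    print(name, shape, matches_modality_layout(name, shape))
```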

We have default encoder networks which can be configured / modified by setting relevant parameters in your config, e.g.:

# These keys should exist in your dataset
config.observation.modalities.obs.rgb = ["cam1", "cam2", "cam3"]    # Add camera observations to the RGB modality
config.observation.modalities.obs.low_dim = ["proprio", "object"]   # Add proprioception and object states to low dim modality
...

# Now let's modify the default RGB encoder network and set the feature dimension size
config.observation.encoder.rgb.core_kwargs.feature_dimension = 128
...

To see the structure of the observation modalities and encoder parameters, please see the base config module.

Custom Modalities

You can also easily add your own modality and corresponding custom encoding network! Please see our example add_new_modality.py.

Refactored Config Structure

With the introduction of modular modalities, our Config class structure has been modified slightly, which will likely break any configs you created using version 0.1.0. Below, we describe the exact config changes needed to match the current structure:

Observation Modalities

The image modality has been renamed to rgb. You will therefore need to update every place in your config that references the image modality, e.g.:

# Old format
config.observation.modalities.image.<etc>

# New format
config.observation.modalities.rgb.<etc>

The low_dim modality remains unchanged. Note, however, that we have also added integrated support for the depth and scan modalities, which can be referenced in the same way, e.g.:

config.observation.modalities.depth.<etc>
config.observation.modalities.scan.<etc>
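
If your old config is stored as JSON, the rename could be applied programmatically. The sketch below assumes the config has been loaded into a plain nested dict and that modality keys sit one level below `observation.modalities` (for example under an `obs` group, as in the example config earlier in these notes). The helper `rename_image_modality` is hypothetical and not part of robomimic.

```python
import copy


def rename_image_modality(config_dict):
    """Return a copy of `config_dict` with every 'image' modality key renamed to 'rgb'.

    Illustrative helper only; assumes a plain nested dict loaded from a v0.1.0 config.
    """
    new_config = copy.deepcopy(config_dict)
    modalities = new_config.get("observation", {}).get("modalities", {})
    for group in modalities.values():
        # Each group (e.g. "obs") maps modality names to lists of observation keys
        if isinstance(group, dict) and "image" in group:
            group["rgb"] = group.pop("image")
    return new_config


# Hypothetical v0.1.0-style fragment
old_config = {
    "observation": {
        "modalities": {
            "obs": {"low_dim": ["proprio"], "image": ["cam1"]},
        }
    }
}

new_config = rename_image_modality(old_config)
print(new_config["observation"]["modalities"]["obs"])
```

The input dict is left untouched, so the old and new versions can be compared side by side.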

Observation Encoders / Randomizer Networks

We have modularized the encoder / randomizer arguments so that they are general and are specified separately for each observation modality. All of the original arguments in v0.1.0 have been preserved, but are now re-formatted as follows:

############# OLD ##############

# Previously, a single set of arguments was specified, and it was hardcoded to process image (rgb) observations

# Assumes that you're using the VisualCore class, not general!
config.observation.encoder.visual_feature_dimension = 64
config.observation.encoder.visual_core = 'ResNet18Conv'
config.observation.encoder.visual_core_kwargs.pretrained = False
config.observation.encoder.visual_core_kwargs.input_coord_conv = False

# Pooling was hardcoded to either use spatial softmax or not; not general!
config.observation.encoder.use_spatial_softmax = True
# kwargs for spatial softmax layer
config.observation.encoder.spatial_softmax_kwargs.num_kp = 32
config.observation.encoder.spatial_softmax_kwargs.learnable_temperature = False
config.observation.encoder.spatial_softmax_kwargs.temperature = 1.0
config.observation.encoder.spatial_softmax_kwargs.noise_std = 0.0


############# NEW ##############

# Now, argument names are general (network-agnostic), and are specified per modality!

# Example for RGB, to reproduce the above configuration

# The core encoder network can be arbitrarily specified!
config.observation.encoder.rgb.core_class = "VisualCore"

# Corresponding kwargs that should be passed to the core class are specified below
config.observation.encoder.rgb.core_kwargs.feature_dimension = 64
config.observation.encoder.rgb.core_kwargs.backbone_class = "ResNet18Conv"
config.observation.encoder.rgb.core_kwargs.backbone_kwargs.pretrained = False
config.observation.encoder.rgb.core_kwargs.backbone_kwargs.input_coord_conv = False

# The pooling class can also arbitrarily be specified!
config.observation.encoder.rgb.core_kwargs.pool_class = "SpatialSoftmax"

# Corresponding kwargs that should be passed to the pooling class are specified below
config.observation.encoder.rgb.core_kwargs.pool_kwargs.num_kp = 32
config.observation.encoder.rgb.core_kwargs.pool_kwargs.learnable_temperature = False
config.observation.encoder.rgb.core_kwargs.pool_kwargs.temperature = 1.0
config.observation.encoder.rgb.core_kwargs.pool_kwargs.noise_std = 0.0

Thankfully, the observation randomization network specifications were already modularized, but were hardcoded to process the image (rgb) modality only. Thus, the only change we made is to allow the randomization kwargs to be specified per modality:

############# OLD ##############
# Previously, observation randomization was hardcoded for image / rgb modality
config.observation.encoder.obs_randomizer_class = None
config.observation.encoder.obs_randomizer_kwargs.crop_height = 76
config.observation.encoder.obs_randomizer_kwargs.crop_width = 76
config.observation.encoder.obs_randomizer_kwargs.num_crops = 1
config.observation.encoder.obs_randomizer_kwargs.pos_enc = False

############# NEW ##############

# Now, the randomization arguments are specified per modality. An example for RGB is shown below
config.observation.encoder.rgb.obs_randomizer_class = None
config.observation.encoder.rgb.obs_randomizer_kwargs.crop_height = 76
config.observation.encoder.rgb.obs_randomizer_kwargs.crop_width = 76
config.observation.encoder.rgb.obs_randomizer_kwargs.num_crops = 1
config.observation.encoder.rgb.obs_randomizer_kwargs.pos_enc = False
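
The old-to-new mapping described in this section can be summarized as a small conversion routine. This is only a sketch under stated assumptions: it operates on the `observation.encoder` portion of a config loaded as a plain dict, it handles the rgb modality only, and it sets `pool_class` to `None` when `use_spatial_softmax` is false (an assumption, since these notes only show the spatial softmax case). The function `migrate_encoder` is hypothetical and not part of robomimic.

```python
def migrate_encoder(old_encoder):
    """Map a v0.1.0-style encoder dict onto the v0.2.0 per-modality layout (rgb only).

    Illustrative helper only, following the key mapping shown in these notes.
    """
    # Assumption: no pooling class is configured when spatial softmax was disabled
    pool_class = "SpatialSoftmax" if old_encoder.get("use_spatial_softmax") else None
    return {
        "rgb": {
            "core_class": "VisualCore",
            "core_kwargs": {
                "feature_dimension": old_encoder["visual_feature_dimension"],
                "backbone_class": old_encoder["visual_core"],
                "backbone_kwargs": dict(old_encoder.get("visual_core_kwargs", {})),
                "pool_class": pool_class,
                "pool_kwargs": dict(old_encoder.get("spatial_softmax_kwargs", {})),
            },
            # Randomizer settings keep their names and move under the modality
            "obs_randomizer_class": old_encoder.get("obs_randomizer_class"),
            "obs_randomizer_kwargs": dict(old_encoder.get("obs_randomizer_kwargs", {})),
        }
    }


# Values taken from the OLD examples above
old_encoder = {
    "visual_feature_dimension": 64,
    "visual_core": "ResNet18Conv",
    "visual_core_kwargs": {"pretrained": False, "input_coord_conv": False},
    "use_spatial_softmax": True,
    "spatial_softmax_kwargs": {
        "num_kp": 32,
        "learnable_temperature": False,
        "temperature": 1.0,
        "noise_std": 0.0,
    },
    "obs_randomizer_class": None,
    "obs_randomizer_kwargs": {
        "crop_height": 76,
        "crop_width": 76,
        "num_crops": 1,
        "pos_enc": False,
    },
}

new_encoder = migrate_encoder(old_encoder)
print(new_encoder["rgb"]["core_kwargs"]["backbone_class"])
```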

You can also compare your config against the default config templates to see the exact differences in structure.