A simple testbed for robotic manipulation policies based on robomimic. All policies are rewritten in a simple, self-contained way. We may further expand it to the RoboCasa benchmark, which is also based on the robosuite simulator.
We also have policies trained and tested on the CALVIN benchmark, e.g., GR1-Training, which is the current SOTA on the hardest ABC->D task of CALVIN.
We also recommend other good frameworks / communities for robotics policy learning:
- HuggingFace's LeRobot, which currently has ACT, Diffusion Policy (only the simple PushT task), TDMPC, and VQ-BeT. LeRobot has a nice robot learning community on its Discord server.
- CleanDiffuser, which implements multiple diffusion algorithms for imitation learning and reinforcement learning. Our implementation of the diffusion algorithms differs from CleanDiffuser's, but we thank their team members for their help.
- Dr. Mu Yao organizes a nice robot learning community for Chinese researchers; see the DeepTimber website and Zhihu (知乎).
Please remember we build systems for you ヾ(^▽^*)). Feel free to ask @StarCycle if you have any questions!
[2024.11.1] Update the performance results on the PushT environment.
[2024.8.9] Several updates below. We are also merging the base Florence policy into HuggingFace LeRobot.
- Add Florence policy with the DiT diffusion action head from MDT, developed by the Intuitive Robots Lab at KIT.
- Switch from tensorboard to wandb.
- Heavily optimize training speed of Florence-series models.
- Support compilation.
[2024.7.30] Add Florence policy with an MLP action head & a diffusion transformer action head (from Cheng Chi's Diffusion Policy). Add the RT-1 policy.
[2024.7.16] Add transformer version of Diffusion Policy.
[2024.7.15] Initial release which only contains UNet version of Diffusion Policy.
Unified State and Action Space.
- All policies share the same data pre-processing pipeline and predict actions as 3D Cartesian translation + 6D rotation + gripper open/close. The 3D translation can be relative to the current gripper position (`abs_mode=False`) or in world coordinates (`abs_mode=True`).
- They perceive `obs_horizon` historical observations, generate `chunk_size` future actions, and execute `test_chunk_size` of the predicted actions. An example with `obs_horizon=3, chunk_size=4, test_chunk_size=2` (a code sketch follows after this list):
Policy sees: |o|o|o|
Policy predicts: | | |a|a|a|a|
Policy executes: | | |a|a|
- They use image input from both static and wrist cameras.
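Below is a toy sketch of this chunking scheme (pure NumPy, with a dummy policy standing in for a real one; none of these names are the repository's API):

```python
import numpy as np

# Toy illustration of the chunking scheme described above (not the repo's API).
obs_horizon, chunk_size, test_chunk_size = 3, 4, 2
action_dim = 3 + 6 + 1          # 3D translation + 6D rotation + gripper open/close

def dummy_policy(obs_window):
    """Stand-in for a policy: consumes obs_horizon observations, predicts chunk_size actions."""
    assert obs_window.shape[0] == obs_horizon
    return np.zeros((chunk_size, action_dim))

episode = [np.random.randn(96, 96, 3) for _ in range(20)]            # fake image observations
t = obs_horizon - 1
while t < len(episode):
    obs_window = np.stack(episode[t - obs_horizon + 1 : t + 1])      # last obs_horizon frames
    actions = dummy_policy(obs_window)                               # (chunk_size, action_dim)
    executed = actions[:test_chunk_size]                             # execute only the first test_chunk_size
    t += test_chunk_size                                             # then re-plan
```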
Multi-GPU training and simulation.
- We achieve multi-GPU / multi-machine training with HuggingFace accelerate.
- We achieve parallel simulation with the asynchronous vectorized environments provided by stable-baselines3. In practice, we train and evaluate the model on multiple GPUs; for each GPU training process, several parallel environments run on different CPU cores.
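A minimal sketch of this pattern, assuming HuggingFace accelerate and stable-baselines3's `SubprocVecEnv`; the network, dataset, and environment below are placeholders rather than the repository's classes:

```python
import torch
import gymnasium as gym
from accelerate import Accelerator
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env():
    # Stand-in for the repo's robomimic / PushT environment wrapper.
    return gym.make("CartPole-v1")

if __name__ == "__main__":
    # One accelerate process per GPU; `accelerate launch` spawns and coordinates them.
    accelerator = Accelerator()

    model = torch.nn.Linear(10, 2)                       # placeholder network
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    dataset = torch.utils.data.TensorDataset(torch.randn(64, 10))
    loader = torch.utils.data.DataLoader(dataset, batch_size=8)
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    # Each GPU training process owns several CPU simulators running in parallel.
    envs = SubprocVecEnv([make_env for _ in range(4)])
    obs = envs.reset()
    envs.close()
```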
Optimizing data loading pipeline and profiling.
- We implement a simple GPU data prefetching mechanism.
- Image preprocessing is performed on the GPU instead of the CPU.
- You can perform detailed profiling of the training pipeline by setting `do_profile=True` and check the trace log with `torch_tb_profiler`. Introduction to the PyTorch profiler.
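For reference, a minimal PyTorch profiler sketch similar in spirit to what `do_profile=True` enables; the schedule and log path below are illustrative, not the repository's defaults:

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(512, 512).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(
    activities=activities,
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    on_trace_ready=tensorboard_trace_handler("./profile_log"),  # trace viewable with torch_tb_profiler
    record_shapes=True,
) as prof:
    for step in range(8):
        x = torch.randn(64, 512, device=device)
        loss = model(x).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        prof.step()  # advance the profiler schedule once per training step
```

Then install `torch_tb_profiler` and run `tensorboard --logdir ./profile_log` to inspect the trace.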
Sorry...but you should tune the learning rate manually.
- We try new algorithms here, so we do not know in advance when an algorithm will converge. Thus, we use a simple constant learning rate scheduler with warmup. To get the best performance, you should set the learning rate manually: a high learning rate at the beginning and a lower learning rate at the end.
- Sometimes you need to freeze the visual encoder in the first training stage, and unfreeze it once the loss converges. This can be done by setting `freeze_vision_tower=<True/False>` in the script. A sketch of both tricks follows.
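A minimal sketch of both tricks, assuming placeholder module names (`vision_tower`, `head`) rather than the repository's actual attributes:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# Placeholder two-part model: a visual encoder and an action head.
model = torch.nn.ModuleDict({
    "vision_tower": torch.nn.Linear(64, 64),
    "head": torch.nn.Linear(64, 10),
})

def make_optimizer_and_scheduler(lr, warmup_steps=1000):
    # Constant learning rate after a linear warmup.
    optimizer = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=lr)
    scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))
    return optimizer, scheduler

# Stage 1: freeze the visual encoder and train with a relatively high learning rate.
for p in model["vision_tower"].parameters():
    p.requires_grad = False
optimizer, scheduler = make_optimizer_and_scheduler(lr=3e-4)

# ... train until the loss plateaus, then relaunch with new settings ...

# Stage 2: unfreeze the encoder and continue with a lower learning rate.
for p in model["vision_tower"].parameters():
    p.requires_grad = True
optimizer, scheduler = make_optimizer_and_scheduler(lr=1e-5)
```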
We implement the following algorithms:
Google's RT-1.
- Our implementation supports EfficientNet v1/v2, and you can directly load pretrained weights via the torchvision API. Google's implementation only supports EfficientNet v1.
- You should choose a text encoder from Sentence Transformers to generate text embeddings and send them to RT-1 (see the sketch after this list).
- Our implementation predicts multiple continuous actions (see above) instead of a single discrete action. We find this setting gives better performance.
- To get better performance, you should freeze the EfficientNet visual encoder in the first training stage and unfreeze it in the second stage.
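A minimal sketch of generating instruction embeddings with Sentence Transformers; the model name `all-MiniLM-L6-v2` is just one common choice, and the `rt1_policy` call is a placeholder, not the repository's actual interface:

```python
from sentence_transformers import SentenceTransformer

# Encode language instructions once; feed the embeddings to RT-1 as conditioning.
text_encoder = SentenceTransformer("all-MiniLM-L6-v2")
instructions = ["pick up the square nut", "place it on the peg"]
text_emb = text_encoder.encode(instructions)   # numpy array, shape (2, 384) for this model
print(text_emb.shape)
# rt1_policy(images, text_emb)                 # placeholder call; the real interface lives in the repo
```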
Cheng Chi's Diffusion Policy (UNet / Transformer).
- Our architecture is a copy of Cheng Chi's network. We test it in our pipeline and it achieves the same performance. Note that Diffusion Policy trains two ResNet visual encoders (one per camera view) from scratch, so we never freeze the visual encoders.
- We also support predicting actions in epsilon / sample / v-space and other diffusion schedulers (see the sketch after this list). The `DiffusionPolicy` wrapper can easily adapt to different network designs.
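For reference, a minimal sketch of switching the prediction space with a HuggingFace diffusers scheduler; this mirrors the options exposed by our wrapper, but the names below are diffusers' own rather than the repository's:

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(
    num_train_timesteps=100,
    beta_schedule="squaredcos_cap_v2",
    prediction_type="epsilon",        # or "sample" / "v_prediction"
)

clean_actions = torch.randn(8, 16, 10)            # (batch, chunk_size, action_dim)
noise = torch.randn_like(clean_actions)
t = torch.randint(0, scheduler.config.num_train_timesteps, (8,))
noisy_actions = scheduler.add_noise(clean_actions, noise, t)

# The training target depends on the chosen prediction space.
if scheduler.config.prediction_type == "epsilon":
    target = noise
elif scheduler.config.prediction_type == "sample":
    target = clean_actions
else:                                              # "v_prediction"
    target = scheduler.get_velocity(clean_actions, noise, t)
```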
Florence policy, developed on Microsoft's Florence-2 VLM, which is trained on VQA, OCR, detection, and segmentation tasks with 900M images.
- We develop the policy on top of the pretrained model.
- Unlike OpenVLA and RT-2, Florence-2 is much smaller, with 0.23B (Florence-2-base) or 0.7B (Florence-2-large) parameters.
- Unlike OpenVLA and RT-2, which generate discrete actions, our Florence policy generates continuous actions with a linear action head, a diffusion transformer action head from Cheng Chi's Diffusion Policy, or a DiT action head from the MDT policy.
- The following figure illustrates the architecture of the Florence policy. We always freeze the DaViT visual encoder of Florence-2, which is good enough that unfreezing it does not improve the success rate. A sketch of loading Florence-2 and freezing its encoder follows.
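A sketch of loading Florence-2 and freezing its DaViT encoder; the `vision_tower` attribute name follows Microsoft's released modeling code, but double-check it against the checkpoint you download, and `model_path` is the folder from the installation section below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "/path/to/downloaded/florence/folder"   # see "For Florence-based models" below
florence = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Freeze the DaViT image encoder; only the language/fusion layers and the action head train.
for p in florence.vision_tower.parameters():
    p.requires_grad = False

# An action head (linear / diffusion transformer / MDT DiT) is then attached on top of the
# fused vision-language features; see the Script/ configurations for the concrete variants.
```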
Square task with professional demos:
Policy | Success Rate | Model Size |
---|---|---|
RT-1 | 62% | 23.8M |
Diffusion Policy (UNet) | 88.5% | 329M |
Diffusion Policy (Transformer) | 90.5% | 31.5M |
Florence (linear head) | 88.5% | 270.8M |
Florence (diffusion head - MDT DiT) | 93.75% | 322.79M |
*The success rate is averaged over the 3 latest checkpoints. Each checkpoint is evaluated with 96 rollouts.
*For diffusion models, we save both the trained model and its exponential moving average (EMA) in each checkpoint.
PushT task:
Policy | Success Rate | Model Size |
---|---|---|
RT-1 | 52% | 23.8M |
Diffusion Policy (UNet) | 64.5% | 76M |
Florence (linear head) | 53% | 270.8M |
Florence (diffusion head - MDT DiT) | 64% | 322.79M |
*Each checkpoint is evaluated with 96 rollouts.
*A success in the PushT environment requires a final IoU > 95% (which is difficult to achieve at low rendering resolution). If you raise the resolution or reduce the threshold, the success rate will be much higher.
You can use GitHub mirror sites to avoid connection problems in some regions. Different simulators have different recommended Python versions, which are mentioned below.
conda create -n mimic python=3.x
conda activate mimic
apt install curl git libgl1-mesa-dev libgl1-mesa-glx libglew-dev libosmesa6-dev software-properties-common net-tools unzip vim virtualenv wget xpra xserver-xorg-dev libglfw3-dev patchelf cmake
git clone https://github.com/EDiRobotics/mimictest
cd mimictest
pip install -e .
Now, depending on the environment and model you want, please perform the following steps.
For Robomimic experiments.
The recommended Python version is 3.9. You need to install `robomimic` and `robosuite` via:
pip install robosuite@https://github.com/cheng-chi/robosuite/archive/277ab9588ad7a4f4b55cf75508b44aa67ec171f0.tar.gz
pip install robomimic
Recent robosuite versions have switched to DeepMind's MuJoCo 3 backend, but we still use the old version with MuJoCo 2.1. This is because the dataset was recorded in MuJoCo 2.1, whose dynamics differ slightly from MuJoCo 3.
You should also download the dataset, which contains `robomimic_image.zip` or `robomimic_lowdim.zip`, from the official link or HuggingFace. In this example, I use the hfd tool from HF-Mirror. You can set the environment variable `export HF_ENDPOINT=https://hf-mirror.com` to avoid connection problems in some regions.
apt install git-lfs aria2
wget https://hf-mirror.com/hfd/hfd.sh
chmod a+x hfd.sh
./hfd.sh EDiRobotics/mimictest_data --dataset --tool aria2c -x 9
If you only want to download a subset of the data, e.g., the square task with image input:
./hfd.sh EDiRobotics/mimictest_data --dataset --tool aria2c -x 9 --include robomimic_image/square.zip
For the PushT experiment.
The recommended Python version is 3.10. You can install the environment via
pip install gym-pusht
Then you can download the PushT dataset from the official link.
For Florence-based models.
To use Florence-based models, you should download one of them from HuggingFace, for example:
./hfd.sh microsoft/Florence-2-base --model --tool aria2c -x 9
Then set `model_path` in the script, for example:
# in Script/FlorenceImage.py
model_path = "/path/to/downloaded/florence/folder"
You need to install Florence-specific dependencies, e.g., flash-attention. You can do this with:
pip install -e .[florence]
- You should first run `accelerate config` to set environment parameters (number of GPUs, precision, etc.). We recommend using `bf16`.
- Download and unzip the dataset mentioned above.
- Please check and modify the settings (e.g., train or eval, and the corresponding options) in the script you want to run, under the `Script` directory. Each script represents a configuration of an algorithm.
- Then run `accelerate launch Script/<the script you choose>.py`.
- `GLIBCXX_3.4.30' not found
ImportError: /opt/conda/envs/test/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /lib/x86_64-linux-gnu/libLLVM-15.so.1)
You can try `conda install -c conda-forge gcc=12.1`, a magical command that automatically installs some missing dependencies.
Also check this link.
- Compiling flash-attn takes too much time
You can download a pre-built wheel from the official releases instead of building it yourself. For example (choose a wheel that matches your system):
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu118torch2.4cxx11abiTRUE-cp39-cp39-linux_x86_64.whl
pip install flash_attn-2.6.3+cu118torch2.4cxx11abiTRUE-cp39-cp39-linux_x86_64.whl
When installing PyTorch, make sure the torch CUDA version matches your CUDA driver version (e.g., 11.8).
- Cannot initialize a EGL device display
Cannot initialize a EGL device display. This likely means that your EGL driver does not support the PLATFORM_DEVICE extension, which is required for creating a headless rendering context.
You can try `conda install -c conda-forge gcc=12.1`.
- fatal error: GL/osmesa.h: No such file or directory
/tmp/pip-install-rsxccpmh/mujoco-py/mujoco_py/gl/osmesashim.c:1:23: fatal error: GL/osmesa.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
You can try conda install -c conda-forge mesalib glew glfw
or check this link.
- cannot find -lGL
/home/ubuntu/anaconda3/compiler_compat/ld: cannot find -lGL
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
You can try conda install -c conda-forge mesa-libgl-devel-cos7-x86_64
or check this link.
- SystemError: initialization of _internal failed without raising an exception
You can simply run `pip install -U numba` or check this link.
- ImportError: libGL.so.1: cannot open shared object file
apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
Or check this link.
- failed to EGL with glad
The core problem seems to be the lack of `libEGL.so.1`. You may try `apt-get update && apt-get install libegl1`. If other required packages are missing while installing libegl1, please install them as well.