TCoT: Trajectory Chain-of-thoughts for Robotic Manipulation with Failure Recovery in Vision-Language-Action Model
We introduce TCoT, a unified VLA framework that enhances the direct observation-to-action mapping with trajectory planning as well as failure detection and recovery. TCoT leverages hierarchical trajectories as a precise and compact representation of CoT reasoning for manipulation: global planning provides a high-level, goal-oriented trajectory that guides the robot toward its task objective, while local planning makes real-time adjustments to handle dynamic changes. In addition, we design a Global-Local Switching Recovery algorithm that detects failures and effectively recovers from them.
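The switching logic above can be sketched as a simple control loop. This is a minimal illustration only; the function names (`plan_global`, `plan_local`, `detect_failure`, `execute`) are hypothetical placeholders, not the actual TCoT API:

```python
def run_episode(env, policy, max_steps=100):
    """Sketch of a global-local switching control loop with failure recovery."""
    # Global planning: a high-level, goal-oriented trajectory for the whole task.
    global_traj = policy.plan_global(env.observe())
    for _ in range(max_steps):
        obs = env.observe()
        if policy.detect_failure(obs, global_traj):
            # On a detected failure, switch back to global planning to
            # re-anchor the task, then resume local refinement.
            global_traj = policy.plan_global(obs)
        # Local planning: real-time adjustment conditioned on the global plan.
        local_traj = policy.plan_local(obs, global_traj)
        env.execute(local_traj)
        if env.done():
            return True
    return False
```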
Our codebase is built on top of OpenVLA and ECoT; refer to those repositories for detailed documentation of the code and dependencies.
The global and local trajectories generated by TCoT on LIBERO look like this:
We provide a Colab notebook containing code for loading up our TCoT policy and using it to generate reasoning and actions in response to an observation. Loading the model for inference is easy:
```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

device = "cuda"
path_to_hf = "TCoT/tcot-openvla-7b-libero"
processor = AutoProcessor.from_pretrained(path_to_hf, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    path_to_hf, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device)

image = <ROBOT IMAGE OBSERVATION HERE>
instruction = <YOUR INSTRUCTION HERE>
prompt = "A chat between a curious user and an artificial intelligence assistant. " + \
         "The assistant gives helpful, detailed, and polite answers to the user's questions. " + \
         f"USER: What action should the robot take to {instruction.lower()}? ASSISTANT: TASK:"

inputs = processor(prompt, image).to(device, dtype=torch.bfloat16)
action, generated_ids = vla.predict_action(**inputs, unnorm_key="libero", max_new_tokens=1024)
generated_text = processor.batch_decode(generated_ids)[0]
```

The standard model in torch.bfloat16 requires 16 GB of GPU memory, but using bitsandbytes and 4-bit quantization lowers memory usage to around 5 GB. See the Colab for more details.
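The 4-bit option mentioned above can be sketched as follows. This is an illustrative configuration, not the exact setup from the Colab:

```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

# 4-bit quantization via bitsandbytes; reduces memory to roughly 5 GB.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

path_to_hf = "TCoT/tcot-openvla-7b-libero"
processor = AutoProcessor.from_pretrained(path_to_hf, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    path_to_hf,
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)  # with a quantization_config, the quantized weights are placed on GPU at load time
```

Inference then proceeds exactly as in the bfloat16 example above, without the explicit `.to(device)` call on the model.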
To train the models from scratch, use the following command:

```bash
bash ./vla-scripts/finetune_tcot.sh
```

To evaluate the model on LIBERO:

```bash
python experiments/robot/libero/run_libero_eval_tcot_globallocal.py + args
```

To evaluate the model on a real robot arm (AIRBOT), run the client:

```bash
python experiments/robot/airbot/airbot_client_aug_tcot.py + args
```

and the server:

```bash
python experiments/robot/airbot/deploy_tcot.py + args
```

We release two TCoT models trained as part of our work, along with the dataset of trajectory-based reasonings, available on our HuggingFace page:
- `libero_spatial_trajectory`: The trajectory-based reasoning dataset for the libero-spatial dataset.
- `libero_goal_trajectory`: The trajectory-based reasoning dataset for the libero-goal dataset.
- `libero_object_trajectory`: The trajectory-based reasoning dataset for the libero-object dataset.
- `libero_10_trajectory`: The trajectory-based reasoning dataset for the libero-10 dataset.
Explicit Notes on Model Licensing & Commercial Use: While all code in this repository is released under an MIT License, our pretrained models may inherit restrictions from the underlying base models we use. Specifically, both the above models are derived from Llama-2, and as such are subject to the Llama Community License.
See the original OpenVLA repository for detailed installation instructions.
High-level overview of repository/project file-tree:
- `prismatic` - Package source; provides core utilities for model loading, training, data preprocessing, etc.
- `experiments` - Code for evaluating the policies on a WidowX robot.
- `vla-scripts/` - Core scripts for training, fine-tuning, and deploying VLAs.
- `LICENSE` - All code is made available under the MIT License; happy hacking!
- `Makefile` - Top-level Makefile (by default, supports linting - checking & auto-fix); extend as needed.
- `pyproject.toml` - Full project configuration details (including dependencies), as well as tool configurations.
- `README.md` - You are here!


