L1ziang/Light-WAM

Light-WAM

Codebase for Light-WAM: Efficient World Action Models with State-Fusion Action Decoding. This repository provides training and evaluation pipelines for Light-WAM on LIBERO and RoboTwin2.0.

What Is Light-WAM?

[Figure: Light-WAM overview]

Light-WAM is a lightweight World Action Model for robot manipulation, centered on:

  • Wan2.1-T2V-1.3B as a frozen video backbone
  • lightweight adapters and LoRA updates
  • future-video supervision in downsampled latent space
  • learned-query pooling over adapted states
  • StateFusionActionExpert for action-chunk decoding

Repository Layout

LightWAM/
├── configs/                 # Training and evaluation configs
├── scripts/                 # Main CLI entrypoints
├── experiments/             # LIBERO, RoboTwin2.0, and real-robot experiments
├── src/lightwam/            # Model and dataset code
├── third_party/             # Simulation dependencies adapted from Fast-WAM
├── checkpoints/             # Wan weights and released checkpoints
├── data/                    # Datasets and caches
└── runs/                    # Training outputs

Environment Setup

conda create -n lightwam python=3.10 -y
conda activate lightwam
pip install -U pip
pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
pip install -e .

FFmpeg Libraries for Precompute

sudo apt-get update
sudo apt-get install -y ffmpeg libavutil-dev libavcodec-dev libavformat-dev libswscale-dev

Backbone Preparation

Set the Wan checkpoint directory first:

mkdir -p checkpoints
export DIFFSYNTH_MODEL_BASE_PATH="$(pwd)/checkpoints"

Hugging Face download command:

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir checkpoints/Wan-AI/Wan2.1-T2V-1.3B
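
Before moving on to precompute, it can help to confirm that the backbone landed where the commands above put it. The following is a minimal sketch, not part of the repository: `check_backbone` is a hypothetical helper, and it only checks that the directory exists and is non-empty, since the exact file names inside the Wan checkpoint are not listed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Light-WAM): checks that the Wan
# backbone directory exists and is non-empty under the model base path.
check_backbone() {
    # Use the first argument if given, else DIFFSYNTH_MODEL_BASE_PATH,
    # else the ./checkpoints directory created by the commands above.
    local base="${1:-${DIFFSYNTH_MODEL_BASE_PATH:-$(pwd)/checkpoints}}"
    local model_dir="${base}/Wan-AI/Wan2.1-T2V-1.3B"
    if [ -d "${model_dir}" ] && [ -n "$(ls -A "${model_dir}" 2>/dev/null)" ]; then
        echo "found: ${model_dir}"
    else
        echo "missing or empty: ${model_dir}" >&2
        return 1
    fi
}

# Report the result without aborting the surrounding shell session.
check_backbone || echo "run the huggingface-cli download command first"
```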

Data and Precompute

Raw datasets used by this repo come from Fast-WAM.

Expected local layout:

data/
├── libero_mujoco3.3.2/
│   ├── libero_10_no_noops_lerobot/
│   ├── libero_goal_no_noops_lerobot/
│   ├── libero_object_no_noops_lerobot/
│   └── libero_spatial_no_noops_lerobot/
└── robotwin2.0/
    └── robotwin2.0/
        ├── data/
        ├── meta/
        └── videos/
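
A quick way to compare a local `data/` directory against the layout above is to list whichever expected subdirectories are absent. This is a sketch under the assumption that the tree shown is complete; `list_missing_data` is a hypothetical helper, not a script from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every expected raw-dataset directory that is
# missing under the given root (default: ./data). Prints nothing if all exist.
list_missing_data() {
    local root="${1:-data}"
    local rel
    for rel in \
        libero_mujoco3.3.2/libero_10_no_noops_lerobot \
        libero_mujoco3.3.2/libero_goal_no_noops_lerobot \
        libero_mujoco3.3.2/libero_object_no_noops_lerobot \
        libero_mujoco3.3.2/libero_spatial_no_noops_lerobot \
        robotwin2.0/robotwin2.0/data \
        robotwin2.0/robotwin2.0/meta \
        robotwin2.0/robotwin2.0/videos
    do
        # Directory names are taken from the layout in this README.
        [ -d "${root}/${rel}" ] || echo "${rel}"
    done
}

list_missing_data data
```

An empty output means every directory in the expected layout is present.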

Offline cache release:

To restore the sharded RoboTwin cache locally, concatenate the parts and extract:

cat robotwin_3cam384_sharded.tar.part-* | tar -xf -
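
The restore command works because the shell expands `part-*` in sorted order, so `cat` streams the pieces back in their original sequence and `tar` reads the result from standard input. The round trip below illustrates this on a toy archive. It assumes the parts were produced by a byte-wise splitter such as `split`, which the release does not state; all file names in it are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Toy round trip: archive a directory, split it into parts, then restore it
# with the same cat | tar pattern used for the RoboTwin cache.
work="$(mktemp -d)"
mkdir -p "${work}/src/demo_cache" "${work}/restore"
printf 'toy latent\n' > "${work}/src/demo_cache/shard0.txt"

# Create the archive and cut it into 1 KiB pieces (suffixes aa, ab, ...).
tar -cf "${work}/demo_cache.tar" -C "${work}/src" demo_cache
split -b 1024 "${work}/demo_cache.tar" "${work}/demo_cache.tar.part-"

# The glob expands in sorted order, so the byte stream is reassembled intact.
cat "${work}"/demo_cache.tar.part-* | tar -xf - -C "${work}/restore"

cat "${work}/restore/demo_cache/shard0.txt"
```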

Expected local layout:

data/
├── latent_cache_Wan2.1-T2V-1.3B/
│   ├── libero_spatial_2cam224/
│   ├── libero_object_2cam224/
│   ├── libero_goal_2cam224/
│   ├── libero_10_2cam224/
│   └── robotwin_3cam384_sharded/
└── text_embeds_cache/
    ├── libero/
    └── robotwin/   # generate locally with the text-only cache command below

Precompute commands:

LIBERO_SUITE=spatial bash scripts/precompute_libero.sh
LIBERO_SUITE=object  bash scripts/precompute_libero.sh
LIBERO_SUITE=goal    bash scripts/precompute_libero.sh
LIBERO_SUITE=10      bash scripts/precompute_libero.sh
bash scripts/precompute_robotwin.sh

These commands generate the text caches and the offline future-video latent caches.
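
The five precompute commands can also be driven from a single loop. The wrapper below is a sketch, not a script from the repository: `precompute_all` and the `DRY_RUN` variable are hypothetical names. With `DRY_RUN=1` (the default here) it only prints the commands; set `DRY_RUN=0` from the repository root to run them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the precompute commands listed above.
# DRY_RUN is an illustrative variable; the repository scripts do not read it.
precompute_all() {
    local suite
    for suite in spatial object goal 10; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "LIBERO_SUITE=${suite} bash scripts/precompute_libero.sh"
        else
            # Stop at the first failing suite so the error is not buried.
            LIBERO_SUITE="${suite}" bash scripts/precompute_libero.sh || return 1
        fi
    done
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "bash scripts/precompute_robotwin.sh"
    else
        bash scripts/precompute_robotwin.sh || return 1
    fi
}

precompute_all
```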

Text-only cache commands:

LIBERO_SUITE=spatial RUN_TEXT=true RUN_VIDEO=false bash scripts/precompute_libero.sh
RUN_TEXT=true RUN_VIDEO=false bash scripts/precompute_robotwin.sh

Training

bash scripts/train_libero_spatial.sh
bash scripts/train_libero_object.sh
bash scripts/train_libero_goal.sh
bash scripts/train_libero_10.sh
bash scripts/train_robotwin.sh

Evaluation

Released checkpoints:

Evaluation environment configs:

conda env create -f ./scripts/libero_env.full.yml
conda activate lightwam-libero-eval
pip install -e . --no-deps

conda env create -f ./scripts/robotwin_env.full.yml
conda activate lightwam-robotwin-eval
pip install -e . --no-deps

For LIBERO evaluation, install the official LIBERO package first, then set:

export LIBERO_ROOT=/path/to/LIBERO
export PYTHONPATH="${LIBERO_ROOT}:${PYTHONPATH:-}"
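
In the export above, `${PYTHONPATH:-}` expands to an empty string when `PYTHONPATH` is unset, which keeps the line safe under `set -u` but leaves a trailing colon in that case. If a trailing colon is unwanted, one possible variant uses the `:+` expansion so the separator is added only when there is an existing value. `prepend_pythonpath` is a hypothetical helper, and `/path/to/LIBERO` remains a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend one entry to PYTHONPATH without leaving a
# trailing colon when PYTHONPATH is unset or empty.
prepend_pythonpath() {
    local new_entry="$1"
    # ${VAR:+text} expands to text only if VAR is set and non-empty.
    export PYTHONPATH="${new_entry}${PYTHONPATH:+:${PYTHONPATH}}"
}

# Placeholder path, as in the README; replace with the real LIBERO checkout.
export LIBERO_ROOT="${LIBERO_ROOT:-/path/to/LIBERO}"
prepend_pythonpath "${LIBERO_ROOT}"
echo "PYTHONPATH=${PYTHONPATH}"
```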

Evaluation commands:

CKPT=/path/to/checkpoints/weights/xxxx.pt bash scripts/eval_libero.sh
CKPT=/path/to/checkpoints/weights/xxxx.pt bash scripts/eval_robotwin.sh
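
Because the checkpoint path is passed through the `CKPT` environment variable, a typo may only surface after the simulator has started. The sketch below checks the file first and then prints the command to run. `run_eval` is a hypothetical helper, not part of the repository, and it assumes the two evaluation scripts named above are the only targets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate the benchmark name and checkpoint path,
# then print the matching evaluation command from this README.
run_eval() {
    local benchmark="$1" ckpt="$2"
    case "${benchmark}" in
        libero|robotwin) ;;
        *) echo "unknown benchmark: ${benchmark}" >&2; return 2 ;;
    esac
    if [ ! -f "${ckpt}" ]; then
        echo "checkpoint not found: ${ckpt}" >&2
        return 1
    fi
    echo "CKPT=${ckpt} bash scripts/eval_${benchmark}.sh"
}

# The placeholder path below does not exist, so this prints the error message.
run_eval libero /path/to/checkpoints/weights/xxxx.pt || true
```

Piping the printed line to `bash`, or replacing the final `echo` with the command itself, turns the check into a launcher.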

Citation

Acknowledgements

This codebase is primarily based on Fast-WAM. We thank the Fast-WAM project for open-sourcing a strong and practical foundation.
