ZGC-EmbodyAI/FrameSkip
FrameSkip: Learning from Fewer but More Informative Frames in VLA Training


FrameSkip is a training-time frame selection framework for Vision-Language-Action (VLA) models. Instead of treating every frame in a dense robot demonstration trajectory as equally useful supervision, FrameSkip scores trajectory frames with lightweight cues and trains primarily from fewer but more informative frames.

FrameSkip is designed as a data-layer intervention: it changes which frames are exposed during training while leaving the VLA architecture, action head, loss function, and inference procedure unchanged.

Highlights

  • Frame-level supervision allocation for VLA training.
  • Architecture-agnostic dataloader integration with no change to model inference.
  • Importance-guided frame retention using action variation, visual-action coherence, task-progress priors, and gripper-transition preservation.
  • Released checkpoints on Hugging Face.
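To make the importance-guided retention concrete, here is a minimal sketch of how two of the listed cues (action variation and gripper-transition preservation) could be combined into a per-frame score. The function name, the weighting scheme, and the normalization are illustrative assumptions, not the repository's actual implementation.

```python
import numpy as np

def frame_importance(actions, gripper, w_var=1.0, w_grip=2.0):
    """Score each frame of a demonstration trajectory (hypothetical sketch).

    actions: (T, D) array of continuous action vectors.
    gripper: (T,) binary open/close gripper state.
    Returns a (T,) array of importance scores; higher = more informative.
    """
    # Action-variation cue: magnitude of change between consecutive actions.
    diffs = np.linalg.norm(np.diff(actions, axis=0), axis=1)
    var_score = np.concatenate([[diffs[0]], diffs])  # pad the first frame
    var_score = var_score / (var_score.max() + 1e-8)  # normalize to [0, 1]

    # Gripper-transition cue: frames where the gripper toggles are upweighted
    # so they are always likely to survive retention.
    toggles = np.concatenate([[0], np.abs(np.diff(gripper))])

    return w_var * var_score + w_grip * toggles
```

In this sketch, frames with large action changes score highly, and gripper-transition frames receive an extra fixed bonus so they dominate the ranking.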

Checkpoints

Download Checkpoints

Model checkpoints are hosted in the Hugging Face collection:

VLyb/frameskip

You can download checkpoints with the Hugging Face CLI:

pip install -U "huggingface_hub[cli]"
huggingface-cli download <checkpoint-repo-name> --local-dir checkpoints/<checkpoint-name>

Replace <checkpoint-repo-name> with the checkpoint repository listed in the collection.

Load and Use Checkpoints

FrameSkip is built on the starVLA training and evaluation stack. The released checkpoints follow the standard starVLA checkpoint format and can be loaded in the same way as starVLA VLA policies.

For simulation evaluation, please refer to the model loading and evaluation workflow of the QwenGR00T architecture in starVLA, and replace the checkpoint path with the downloaded FrameSkip checkpoint.

Quick Start

The code and model checkpoints have been released. The current FrameSkip implementation lives under ./starVLA/starVLA/frameskip. A cleaner, more user-friendly version is being prepared; in the meantime, we recommend AI-assisted code reading to navigate the implementation details.

The ./starVLA directory is a full copy of the starVLA project, with additional implementations for FrameSkip-related functionality.

Method Overview

FrameSkip follows a three-stage pipeline:

  1. Score frames in each demonstration trajectory with trajectory-level cues.
  2. Retain high-importance frames under a target retention ratio.
  3. Remap dataloader queries to the retained frame indices during VLA training.

The policy is trained with the standard VLA objective, and inference is unchanged.
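The retention and remapping stages can be sketched as follows: select the top-scoring frames under a target retention ratio (optionally forcing certain indices, such as gripper transitions, to be kept), then wrap the trajectory so dataloader queries only ever see retained frames. The class and function names here are hypothetical; they illustrate the data-layer design, not the repository's actual API.

```python
def retain_frames(scores, ratio, keep=()):
    """Return sorted indices of the top `ratio` fraction of frames by score.
    Indices in `keep` (e.g. gripper transitions) are always retained.
    Hypothetical sketch of the retention step."""
    k = max(1, int(len(scores) * ratio))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(set(top) | set(keep))

class FrameSkipDataset:
    """Wraps a dense trajectory so that dataloader queries are remapped
    onto the retained frame indices; the model and loss are untouched."""

    def __init__(self, frames, retained):
        self.frames = frames        # original dense trajectory
        self.retained = retained    # indices chosen by retain_frames

    def __len__(self):
        return len(self.retained)

    def __getitem__(self, i):
        # Query i over the subset maps back to an original frame index.
        return self.frames[self.retained[i]]
```

Because only `__len__` and `__getitem__` change, any policy trained against the wrapped dataset sees fewer but higher-importance frames while its architecture and inference path stay identical.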

Citation

If you find FrameSkip useful, please cite:

TODO
