# PuppyGo

An embodied agent powered by vision-language models and large language models.

Here’s what I did:

- Build on *Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents*.
- Extract affordances and constraints from large language models and vision-language models to compose 3D value maps, which motion planners use to zero-shot synthesize trajectories for everyday manipulation tasks (see the sketch after this list).
- Combine this with an end-to-end large-model training framework such as UniAD.
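The value-map idea can be sketched in a few lines. Everything below is an illustrative assumption, not code from this repo or from VoxPoser: the grid size, the distance-based attraction, the Gaussian repulsion, and the greedy descent are stand-ins for the real VLM grounding and motion planner.

```python
import numpy as np

GRID = (40, 40, 40)  # coarse voxel grid over the workspace (hypothetical size)

def compose_value_map(targets, obstacles, bump=10.0, sigma=3.0):
    """Low values attract the end-effector (affordances), high values repel it
    (constraints). `targets` and `obstacles` are (x, y, z) voxel indices that a
    VLM would normally ground from the camera image; here they are given."""
    xs, ys, zs = np.meshgrid(*[np.arange(n) for n in GRID], indexing="ij")
    value = np.zeros(GRID)
    for t in targets:       # attraction: cost grows with distance to the target
        value += np.sqrt((xs - t[0])**2 + (ys - t[1])**2 + (zs - t[2])**2)
    for o in obstacles:     # repulsion: a Gaussian bump around each constraint
        d2 = (xs - o[0])**2 + (ys - o[1])**2 + (zs - o[2])**2
        value += bump * np.exp(-d2 / (2 * sigma**2))
    return value

def greedy_plan(value, start, max_steps=200):
    """Toy stand-in for a motion planner: walk downhill on the value map."""
    pos, path = np.array(start), [tuple(start)]
    moves = np.vstack([np.eye(3, dtype=int), -np.eye(3, dtype=int)])
    for _ in range(max_steps):
        candidates = np.clip(pos + moves, 0, np.array(GRID) - 1)
        best = min(candidates, key=lambda c: value[tuple(c)])
        if value[tuple(best)] >= value[tuple(pos)]:
            break                          # local minimum reached
        pos = best
        path.append(tuple(pos))
    return path

# "Sort the paper trash into the blue tray": the tray attracts, a fragile cup
# (a constraint the LLM might add) repels. All coordinates are made up.
value_map = compose_value_map(targets=[(30, 30, 10)], obstacles=[(18, 22, 10)])
path = greedy_plan(value_map, start=(5, 5, 10))
print(path[-1])   # the planned trajectory ends at or near the tray voxel
```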

## This Package Is Sponsorware 💰💰💰

https://github.com/sponsors/Charmve?frequency=one-time&sponsor=Charmve

This repo was only available to my sponsors on GitHub Sponsors until I reached 15 sponsors.

Learn more about Sponsorware at github.com/sponsorware/docs 💰.



## Execution under Disturbances

Because the language model's output stays the same throughout the task, we can cache it and re-evaluate the generated code against closed-loop visual feedback, which allows fast replanning with MPC. This makes VoxPoser robust to online disturbances.
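As a rough illustration of that control loop, here is a minimal MPC-style executor. The interfaces (`plan_fn`, `perceive`, `execute`) are hypothetical placeholders, not APIs from this repo or from VoxPoser; the point is only that the language model is queried once per task, while planning and execution run in a fast closed loop.

```python
import time
from typing import Callable, List, Sequence

Action = Sequence[float]  # e.g. a 6-DoF end-effector target (assumed)

def run_closed_loop(plan_fn: Callable[[object], List[Action]],
                    perceive: Callable[[], object],
                    execute: Callable[[Action], None],
                    hz: float = 5.0,
                    max_steps: int = 500) -> bool:
    """MPC-style executor: `plan_fn` is the planner built from the cached
    language-model output (queried once per task). It is cheap to re-run, so we
    replan from fresh visual feedback at every control step and execute only
    the first action of each new trajectory."""
    period = 1.0 / hz
    for _ in range(max_steps):
        obs = perceive()                 # closed-loop visual feedback
        trajectory = plan_fn(obs)        # replan against the latest scene
        if not trajectory:               # empty plan => task is complete
            return True
        execute(trajectory[0])           # take one step, then replan again;
        time.sleep(period)               # disturbances are corrected next loop
    return False
```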

"Sort the paper trash into the blue tray."

"Close the top drawer."