🧠 Prompt2Act: Mapping Prompts into Sequence of Robotic Actions with Large Foundation Models (repository cleanup in progress)

Official implementation of Prompt2Act, a system that maps open-ended multimodal prompts into real-world robotic actions via large vision-language models, a mixed execution agent, and a visual grounding module.

(Figure: p2a)

🌟 Highlights

  • Multimodal Prompt Understanding: Supports vision-language prompts such as images, sketches, pointing gestures, and free-form instructions.
  • Mixed Execution Agent: Combines predefined symbolic functions with on-the-fly code generation to execute diverse tasks (a minimal dispatch sketch follows this list).
  • Visual Grounding via VG-Marker: Automatically identifies objects and assigns semantic anchors from raw scenes using open-vocabulary segmentation + GPT-4o.
  • Supports Novel Tasks: From "Arrange pens by reference photo" to "Pick the toy pointed at by hand", with no finetuning needed.
  • Zero-shot generalization to occluded, unseen, and cluttered environments.
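
Conceptually, the mixed execution path can be pictured as a skill lookup with a code-generation fallback. The sketch below is illustrative only: the PREDEFINED_SKILLS registry, the skill decorator, and execute_step are assumed names, not the actual Prompt2Act interface.

# Illustrative sketch only: the registry, decorator, and execute_step
# are assumptions, not the shipped Prompt2Act API.
PREDEFINED_SKILLS = {}

def skill(name):
    """Register a predefined symbolic skill under a short name."""
    def register(fn):
        PREDEFINED_SKILLS[name] = fn
        return fn
    return register

@skill("pick")
def pick(target: str):
    print(f"[robot] picking {target}")

@skill("place")
def place(target: str, location: str):
    print(f"[robot] placing {target} at {location}")

def execute_step(step: dict):
    """Prefer a predefined skill; otherwise fall back to generated code."""
    fn = PREDEFINED_SKILLS.get(step["skill"])
    if fn is not None:
        return fn(*step.get("args", []))
    # In the real system an LLM would synthesize and run code for novel steps.
    raise NotImplementedError(f"would generate code for: {step['skill']}")

execute_step({"skill": "pick", "args": ["red pen"]})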

📂 Project Structure

Prompt2Act/
│
├── prompt2act/                 # Core system modules
│   ├── planner/                # LLM-based sequence planner
│   ├── executor/               # Mixed Execution Agent (predefined + code-gen)
│   ├── visual_grounding/       # VG-Marker
│   └── utils/
│
├── data/                       # Demo trajectories & prompt configs
├── models/                     # LLM/VLM interface wrappers
├── scripts/                    # Evaluation scripts and launchers
└── README.md

Hardware Configuration (in preparation)

βš™οΈ Installation

We recommend Python 3.10 and Linux/Ubuntu.

# Clone repository
git clone https://github.com/Zero-coder/Prompt2Act.git
cd Prompt2Act

# Create virtual env (optional)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

🚀 Quick Start

Run a predefined task (e.g., Arrange Pens by Reference) in simulation:

python scripts/run_demo.py \
  --task arrange_pens \
  --prompt examples/prompts/arrange_pens.json

You can also modify the prompt file to test new tasks:

{
  "instruction": "Please arrange these pens as shown in the image.",
  "image": "reference_pens.jpg"
}
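
If you write your own prompt files, a small loader like the sketch below can catch mistakes before launching scripts/run_demo.py. This is illustrative only: the field names follow the example above, and resolving the image path relative to the prompt file is an assumption.

# Sketch: validate a prompt file before running the demo script.
import json
from pathlib import Path

def load_prompt(path: str) -> dict:
    """Load a prompt JSON and sanity-check the fields used in the example above."""
    prompt_path = Path(path)
    prompt = json.loads(prompt_path.read_text())
    if "instruction" not in prompt:
        raise ValueError("prompt file needs an 'instruction' field")
    image = prompt.get("image")
    # Assumption: image paths are relative to the prompt file's directory.
    if image and not (prompt_path.parent / image).exists():
        raise FileNotFoundError(f"referenced image not found: {image}")
    return prompt

print(load_prompt("examples/prompts/arrange_pens.json"))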

🧠 Model & Assets

To fully enable Prompt2Act, you need:

  • GPT-4o / GPT-4V (OpenAI API or local Azure proxy)
  • SAM / Grounded-SAM for segmentation
  • VG-Marker (included, no training needed; a grounding sketch appears at the end of this section)
  • Predefined skills: pick, place, rotate, etc.

We provide starter checkpoints and test prompts in data/.
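
The grounding flow can be pictured as: segment the scene, number each region as a semantic anchor, then ask the VLM which anchor matches the instruction. The sketch below stubs both the segmenter and the GPT-4o query; every function name here is illustrative, not the VG-Marker API.

# Illustrative sketch of the VG-Marker flow; segmentation and the VLM call are stubbed.
def segment_scene(image_path: str) -> list[dict]:
    """Stand-in for open-vocabulary segmentation (e.g. SAM / Grounded-SAM)."""
    return [{"label": "pen", "bbox": (10, 20, 60, 40)},
            {"label": "toy", "bbox": (80, 15, 130, 70)}]

def assign_anchors(masks: list[dict]) -> dict[int, dict]:
    """Number each detected region so the VLM can refer to it by marker id."""
    return {i: m for i, m in enumerate(masks, start=1)}

def ground(instruction: str, anchors: dict[int, dict]) -> int:
    """Stand-in for a GPT-4o query: pick the anchor whose label appears in the instruction."""
    for idx, mask in anchors.items():
        if mask["label"] in instruction.lower():
            return idx
    raise LookupError("no anchor matched the instruction")

anchors = assign_anchors(segment_scene("scene.jpg"))
print("selected marker:", ground("Pick the toy pointed at by hand", anchors))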


🧪 Evaluation

To evaluate Prompt2Act under different generalization axes:

python scripts/eval_benchmark.py \
  --benchmark occlusion \
  --config configs/eval_occlusion.yaml

You can test the following generalization axes (a batch-run sketch follows the list):

  • Visual generalization (new textures, occlusion)
  • Reasoning (visual constraints, sketch understanding)
  • Embodiment shift (sim vs real)
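
To sweep several axes in one run, a wrapper like the sketch below can invoke the evaluation script once per benchmark. Only the occlusion benchmark/config pair comes from the example above; the other names are placeholders you should adapt to your configs/ folder.

# Sketch: run eval_benchmark.py once per generalization axis.
import subprocess

BENCHMARKS = {
    "occlusion": "configs/eval_occlusion.yaml",
    "reasoning": "configs/eval_reasoning.yaml",    # placeholder name
    "embodiment": "configs/eval_embodiment.yaml",  # placeholder name
}

for name, config in BENCHMARKS.items():
    subprocess.run(
        ["python", "scripts/eval_benchmark.py", "--benchmark", name, "--config", config],
        check=True,
    )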

📖 Citation

If you find this project helpful, please consider citing our paper:

@article{jiang2025prompt2act,
  title={Prompt2Act: Mapping Prompts into Sequence of Robotic Actions with Large Foundation Models},
  author={Maowei Jiang and Qi Wang and Hongfeng Ai and Zhiyong Dong and Yusong Hu and Ao Liang and Yifan Wang and Ruiqi Li and Quangao Liu and Moquan Chen and Peter BuΕ‘ and Long Zeng},
  journal={Information Fusion (under revision)},
  year={2025}
}

📬 Contact

Maintained by @Zero-coder. Please open issues or pull requests for contributions.
