Vision-language physical reasoning using Qwen3-VL with training data from MuJoCo and PhiFlow simulators. This repository is a clean export of the code, analysis, evaluation helpers, and workshop paper sources. The full private project (checkpoints, large logs, local datasets) may live one level up on your machine. To refresh this folder, run python scripts/build_github_export.py from the parent project root.
- Synthetic scenes with simulator-derived labels (no human MCQ annotation for training answers).
- LoRA SFT and optional GRPO on a managed API (Tinker); PhysBench-style evaluation scripts.
- Paper LaTeX under paper/ with bundled figures and sample images.
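
As a rough illustration of what "simulator-derived labels" means, the toy sketch below computes a multiple-choice answer from physics instead of from a human annotator. It is not this repository's pipeline: closed-form free fall stands in for MuJoCo or PhiFlow, and the names fall_time and make_example and the dictionary layout are hypothetical.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2


def fall_time(height_m: float) -> float:
    """Time for an object dropped from rest to fall height_m metres (no drag)."""
    return math.sqrt(2.0 * height_m / G)


def make_example(rng: random.Random) -> dict:
    """Build one multiple-choice item whose answer is computed, not annotated."""
    # Pick two distinct drop heights so there is always a unique answer.
    h_a, h_b = rng.sample([0.5, 1.0, 1.5, 2.0, 3.0], 2)
    answer = "A" if fall_time(h_a) < fall_time(h_b) else "B"
    return {
        "scene": {"ball_A_height_m": h_a, "ball_B_height_m": h_b},
        "question": "Both balls are released at the same time. Which lands first?",
        "choices": ["A", "B"],
        "answer": answer,  # label comes from the physics, not from a person
    }


example = make_example(random.Random(0))
print(example)
```

In a real setup the scene parameters would drive a simulator rollout and a rendered image, but the principle is the same: the ground-truth answer is read off the simulated state.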
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux / macOS
pip install -r requirements.txt
pip install "git+https://github.com/huggingface/transformers.git"

See docs/ for training and dataset guides. Do not commit .env or API keys.
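
A small post-install check can confirm that the environment is usable and that secrets come from the process environment, not from committed files. The sketch below is an assumption-laden example, not a script shipped with this repository: the helper names are hypothetical, and TINKER_API_KEY is a guessed variable name that should be replaced with whatever the docs/ guides specify.

```python
import os
from importlib import metadata
from typing import Iterable, List, Mapping, Optional

# Hypothetical variable name; check the docs/ guides for the real one.
REQUIRED_ENV = ["TINKER_API_KEY"]


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def missing_env(names: Iterable[str],
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report status without ever printing the secret values themselves.
print("transformers version:", installed_version("transformers"))
print("missing environment variables:", missing_env(REQUIRED_ENV))
```

Reading keys through os.environ keeps them out of the source tree, so nothing sensitive ends up in a commit even if .env is accidentally left untracked.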
See STRUCTURE.md.
Build instructions: paper/README.md.
MIT — see LICENSE.