A three-LoRA-adapter system for cooperative cooking. A single base model (Qwen2.5-7B-Instruct) is fine-tuned with three separate LoRA adapters that each play a distinct role at inference time:
| Adapter | Role |
|---|---|
| planner | High-level subgoal generation in natural language (e.g. "Pick up carrot from counter and put it into pot0.") |
| actor | Low-level action chunking: turns a subgoal into up to 3 ml_actions (e.g. pickup(carrot, ingredient_dispenser)) |
| coordinator | Shared-context generation + critique of planner subgoals (PASS / REJECT + feedback) |
At eval time, the coordinator's REJECT events trigger the planner to refine
its subgoal (up to 2 retries) before handing off to the actor.
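
The critique-and-refine loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function name `plan_with_critique` and the callable signatures for the planner and coordinator are hypothetical, and the sketch assumes that the last subgoal is handed to the actor even if every attempt was rejected.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not the repo's actual API):
#   planner(state, feedback)      -> subgoal text
#   coordinator(state, subgoal)   -> (passed, feedback text)
Planner = Callable[[str, Optional[str]], str]
Coordinator = Callable[[str, str], Tuple[bool, str]]

MAX_RETRIES = 2  # "up to 2 retries" after the initial proposal


def plan_with_critique(
    state: str,
    planner: Planner,
    coordinator: Coordinator,
    max_retries: int = MAX_RETRIES,
) -> Tuple[str, int]:
    """Return (subgoal, attempts_used) after at most 1 + max_retries proposals."""
    feedback: Optional[str] = None
    subgoal = ""
    attempt = 0
    for attempt in range(1, max_retries + 2):
        # The planner sees the coordinator's feedback from the previous REJECT.
        subgoal = planner(state, feedback)
        passed, feedback = coordinator(state, subgoal)
        if passed:
            break
    # Assumption: the final subgoal goes to the actor even if never PASSed.
    return subgoal, attempt
```

With the default of two retries, the planner is called at most three times per subgoal.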
End-to-end: clone the repo, pull the pre-trained adapters and data from HuggingFace into the right folders, and run evaluation.
# 1) Clone the project
git clone https://github.com/anonymous-projectpage/DriCo.git
cd DriCo
# 2) Install dependencies
pip install -U huggingface_hub
pip install -r requirements.txt # if a requirements file exists
# (or install torch / transformers / peft / overcooked_ai / wandb manually)
# 3) Pull the LoRA adapters into drico/out/
hf download anonymous-24421/DriCo-adapters \
--local-dir drico/out
# 4) (Optional) pull the training / DPO data into drico/dataset/
hf download anonymous-24421/DriCo \
--repo-type dataset \
--local-dir drico/dataset
# 5) Run evaluation
cd drico
CUDA_VISIBLE_DEVICES=0 python eval_proposed.py \
--base_model Qwen/Qwen2.5-7B-Instruct \
--adapter_root out/ \
--layout new_env \
--recipe_split test \
--statistics_save_dir data/eval_test

After step 3, the directory tree looks like:
DriCo/
└── drico/
    └── out/
        ├── planner/ ← LoRA adapter weights
        ├── actor/
        ├── coordinator/
        └── coordinator_dpo/
That's exactly the layout eval_proposed.py --adapter_root out/ expects, so
step 5 just works.
The training/eval code lives in drico/. This README sits at the repo root.
.
├── README.md # ← this file
├── drico/
│ ├── agent/
│ │ ├── hf_agent.py # Planner+Actor agent (no collab.* deps)
│ │ ├── human_agent.py # Human CLI agent (drop-in replacement)
│ │ ├── parsers.py # Parses Planner/Actor outputs
│ │ ├── hf_model.py # HuggingFace model wrapper
│ │ └── leader.py # Coordinator agent
│ │
│ ├── prompts/
│ │ ├── recipe/ # train split (seen recipes)
│ │ └── test/recipe/ # test split (unseen recipes)
│ │
│ ├── dataset/
│ │ ├── planner/ # planner SFT data
│ │ ├── actor/ # actor SFT data
│ │ ├── coordinator/ # coordinator SFT data
│ │ └── coordinator_dpo/
│ │ └── dpo.json # coordinator DPO data
│ │
│ ├── train_planner_coordinator.py
│ ├── train_actor.py
│ ├── train_coordinator_dop.py
│ ├── eval_proposed.py
│ └── out/ # default LoRA adapter output
│ ├── planner/
│ ├── actor/
│ └── coordinator/ (and coordinator_dpo/ after DPO)
│
└── … (the rest of the repo: upgraded benchmark code, layouts, etc.)
All command-line examples below assume you are running from inside drico/:
cd drico

Train each adapter independently. All three share the same base model
(Qwen/Qwen2.5-7B-Instruct); only the LoRA weights differ.
Trains the planner and coordinator adapters together from one script:
python train_planner_coordinator.py \
--data_dir dataset/planner/ \
--base_model Qwen/Qwen2.5-7B-Instruct \
--output_dir out/ \
--epochs 3 \
--bf16

Produces out/planner/ and out/coordinator/.
python train_actor.py \
--data_dir dataset/actor/ \
--base_model Qwen/Qwen2.5-7B-Instruct \
--output_dir out/ \
--bf16

Produces out/actor/.
After the coordinator SFT adapter is ready, refine it with DPO on preference pairs:
python train_coordinator_dop.py \
--data_path dataset/coordinator_dpo/dpo.json \
--coordinator_adapter out/coordinator/ \
--out_coordinator out/coordinator_dpo \
--coordinator_data_dir dataset/coordinator/

Produces out/coordinator_dpo/. Swap this in for out/coordinator/ at
eval time if you want the DPO-refined coordinator.
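
One way to picture the swap is a small helper that resolves each role to its adapter directory and prefers the DPO-refined coordinator when it is present. This is only a sketch under assumptions: `resolve_adapters` and its `prefer_dpo` parameter are hypothetical names, and eval_proposed.py may handle the swap differently (for example, by expecting you to rename or replace the folder).

```python
from pathlib import Path
from typing import Dict

ROLES = ("planner", "actor", "coordinator")


def resolve_adapters(adapter_root: str, prefer_dpo: bool = True) -> Dict[str, Path]:
    """Map each role to a directory under adapter_root.

    Hypothetical helper: uses coordinator_dpo/ for the coordinator role
    when that folder exists and prefer_dpo is True.
    """
    root = Path(adapter_root)
    paths = {role: root / role for role in ROLES}
    dpo_dir = root / "coordinator_dpo"
    if prefer_dpo and dpo_dir.is_dir():
        paths["coordinator"] = dpo_dir
    return paths
```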
Single CLI for the full Planner → Coordinator-Critique → Actor pipeline.
CUDA_VISIBLE_DEVICES=1 python eval_proposed.py \
--base_model Qwen/Qwen2.5-7B-Instruct \
--adapter_root out/ \
--layout circuit_env \
--recipe_split test \
--statistics_save_dir data/eval_test

The script:
- Loads out/{planner,actor,coordinator} as three named adapters on one shared base model.
- Evaluates every recipe in prompts/test/recipe.
- Computes DriCo-style PC (completed_milestones / total_milestones, with success ⇒ PC=1.0).
- Writes per-recipe JSONs and a summary at data/eval_test/eval_summary_test.json.
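
The PC rule above (a milestone ratio, forced to 1.0 on success) can be written down as a short sketch. The function names and the episode dictionary keys (`success`, `completed_milestones`, `total_milestones`) are assumptions for illustration; the actual per-episode JSON schema may differ.

```python
from typing import Dict, List


def progress_completeness(completed: int, total: int, success: bool) -> float:
    """PC = completed / total, except that a successful episode counts as 1.0."""
    if success:
        return 1.0
    if total <= 0:
        # Assumption: an episode with no milestones and no success scores 0.
        return 0.0
    return completed / total


def summarize(episodes: List[Dict]) -> Dict[str, float]:
    """Average success rate (SR) and PC over a non-empty list of episodes."""
    n = len(episodes)
    sr = sum(1 for e in episodes if e["success"]) / n
    pc = sum(
        progress_completeness(
            e["completed_milestones"], e["total_milestones"], e["success"]
        )
        for e in episodes
    ) / n
    return {"SR": sr, "PC": pc}
```

For example, one successful episode plus one failed episode that reached 1 of 4 milestones gives SR = 0.5 and PC = (1.0 + 0.25) / 2.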
Either player can be controlled from the CLI:
# Chef = human, Assistant = LLM
CUDA_VISIBLE_DEVICES=1 python eval_proposed.py \
--base_model Qwen/Qwen2.5-7B-Instruct \
--adapter_root out/ \
--layout circuit_env \
--recipe_split test \
--statistics_save_dir data/eval_test \
--human_player p1

| Flag value | Effect |
|---|---|
| none (default) | Both players LLM-controlled |
| p1 / chef | First player (Chef) typed by human |
| p2 / assistant | Second player (Assistant) typed by human |
When a human is in the game, --episode is forced to 1.
The human types ml_action strings each time a new one is needed
(auto-repeats across timesteps until completion). A cheat-sheet of the action
grammar is shown on the first prompt; every prompt also lists the actions
that look valid in the current state. Examples of what to type:
pickup(carrot, ingredient_dispenser)
put_obj_in_utensil(pot0)
cook(pot0)
wait(3)
deliver_soup()
Commands:
- Enter / r → repeat last action
- h → reprint the grammar
- q → end the episode
The human's last action is also converted to a natural-language subgoal that the partner LLM sees as the teammate's intent (so the partner can plan around it without any special-casing in LLM-side code).
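
The typed ml_action strings follow a simple `name(arg, ...)` shape, so they can be split into a name and an argument list before being turned into a subgoal. The sketch below is an assumption about that shape based only on the examples above; `parse_ml_action` is a hypothetical helper, not necessarily how parsers.py or human_agent.py does it.

```python
import re
from typing import List, Tuple

# Matches strings such as "pickup(carrot, ingredient_dispenser)" or "deliver_soup()".
_ACTION_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*\((.*)\)\s*$")


def parse_ml_action(text: str) -> Tuple[str, List[str]]:
    """Split an ml_action string into (action_name, [arguments])."""
    match = _ACTION_RE.match(text)
    if match is None:
        raise ValueError(f"not an ml_action: {text!r}")
    name, arg_str = match.groups()
    # Empty parentheses yield an empty argument list.
    args = [a.strip() for a in arg_str.split(",") if a.strip()]
    return name, args
```

Arguments are returned as strings, so a caller would still need to convert numeric ones such as the `3` in `wait(3)`.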
# Different recipe splits
--recipe_split test # prompts/test/recipe (default)
--recipe_split train # prompts/recipe
--recipe_split both # both folders (test wins on name conflicts)
# Restrict to specific recipes (comma-separated)
--orders pea_soup,mashed_zucchini_and_pea_patty
# Restrict to specific difficulty levels
--levels 1 # only level 1
--levels 1,3 # levels 1 and 3
--levels 2-4 # levels 2,3,4
# Turn the coordinator critique on/off
--use_critique True # default — every subgoal reviewed, refined on REJECT
--use_critique False # planner output goes straight to actor
# Episode budget
--episode 3 # 3 episodes per recipe (ignored if --human_player set)
--horizon 80 # max timesteps per episode
# Generation / dtype
--planner_max_tokens 128
--actor_max_tokens 64
--dtype bf16 # bf16 | fp16 | fp32

# Train-split recipes, level 1 only
python eval_proposed.py ... --recipe_split train --levels 1
# Both splits, just one recipe, human as Chef
python eval_proposed.py ... \
--recipe_split both \
--orders pea_soup \
--human_player p1
# Pure LLM eval, no critique, levels 2 through 4
python eval_proposed.py ... \
--recipe_split test \
--levels 2-4 \
--use_critique False

For each evaluation run, the script writes:
data/eval_test/
├── diff1_pea_soup/
│ ├── pea_soup_ep1_2026-05-12_…json # per-episode trace
│ └── pea_soup_ep2_2026-05-12_…json
├── diff2_carrot_soup/
│ └── …
└── eval_summary_test.json # aggregated metrics
eval_summary_<suffix>.json contains:
- Per-recipe and per-difficulty success rate (SR) and progress completeness (PC, DriCo-style)
- Average first_reward_t over successful episodes (completion speed)
- Drift analysis: how often the coordinator REJECTed, what drift types it flagged, how often refinement resolved them within 3 attempts
- Coordinator call counts (shared-context vs. critique-initial vs. critique-refine)
The filename suffix encodes the active filters (e.g.
eval_summary_both_lv1-3_orders-pea_soup_human-p1.json) so re-runs don't
clobber prior results.
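
As a rough illustration of how such a suffix could be assembled from the active filters, consider the sketch below. It is reverse-engineered from the two filenames shown in this README (eval_summary_test.json and the example above); the function name, the part ordering, and the handling of values not covered by those examples are assumptions.

```python
from typing import Optional


def summary_filename(
    recipe_split: str = "test",
    levels: Optional[str] = None,
    orders: Optional[str] = None,
    human_player: Optional[str] = None,
) -> str:
    """Build an eval summary filename that encodes the active filters."""
    parts = [recipe_split]
    if levels:
        parts.append(f"lv{levels}")
    if orders:
        parts.append(f"orders-{orders}")
    if human_player and human_player != "none":
        parts.append(f"human-{human_player}")
    return "eval_summary_" + "_".join(parts) + ".json"
```

Because each filter contributes its own segment, two runs with different filters write to different summary files.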
# 1. Train all three adapters
python train_planner_coordinator.py \
--data_dir dataset/planner/ \
--base_model Qwen/Qwen2.5-7B-Instruct \
--output_dir out/ --epochs 3 --bf16
python train_actor.py \
--data_dir dataset/actor/ \
--base_model Qwen/Qwen2.5-7B-Instruct \
--output_dir out/ --bf16
python train_coordinator_dop.py \
--data_path dataset/coordinator_dpo/dpo.json \
--coordinator_adapter out/coordinator/ \
--out_coordinator out/coordinator_dpo \
--coordinator_data_dir dataset/coordinator/
# 2. Evaluate on test split — full LLM team
CUDA_VISIBLE_DEVICES=1 python eval_proposed.py \
--base_model Qwen/Qwen2.5-7B-Instruct \
--adapter_root out/ \
--layout circuit_env \
--recipe_split test \
--statistics_save_dir data/eval_test
# 3. Same eval, but you play the Chef
CUDA_VISIBLE_DEVICES=1 python eval_proposed.py \
--base_model Qwen/Qwen2.5-7B-Instruct \
--adapter_root out/ \
--layout circuit_env \
--recipe_split test \
--statistics_save_dir data/eval_test \
--human_player p1

The LoRA adapters and training data are hosted on HuggingFace and mirror the local folder layout, so a one-shot download (see Quick start above) drops everything in the right place.
| Asset | Repo |
|---|---|
| LoRA adapters (4) | https://huggingface.co/anonymous-24421/DriCo-adapters |
| Training / DPO data | https://huggingface.co/datasets/anonymous-24421/DriCo |
If you only need a subset:
# Just the actor adapter
hf download anonymous-24421/DriCo-adapters \
--include "actor/*" \
--local-dir drico/out
# Just the DPO-refined coordinator
hf download anonymous-24421/DriCo-adapters \
--include "coordinator_dpo/*" \
--local-dir drico/out
# Just the DPO data
hf download anonymous-24421/DriCo \
--repo-type dataset \
--include "coordinator_dpo/*" \
--local-dir drico/dataset

If you want to use the adapters in your own code instead of going through the eval script:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype="bfloat16", device_map="auto",
)
# Register all three roles as named adapters on the same base model
model = PeftModel.from_pretrained(
    model, "anonymous-24421/DriCo-adapters",
    subfolder="planner", adapter_name="planner",
)
model.load_adapter("anonymous-24421/DriCo-adapters",
                   subfolder="actor", adapter_name="actor")
model.load_adapter("anonymous-24421/DriCo-adapters",
                   subfolder="coordinator_dpo", adapter_name="coordinator")
# Switch role per generation
model.set_adapter("planner")
# ...generate planner subgoal...
model.set_adapter("actor")
# ...generate action chunk...