DriCo: Planner / Actor / Coordinator

A three-LoRA-adapter system for cooperative cooking. A single base model (Qwen2.5-7B-Instruct) is fine-tuned with three separate LoRA adapters that each play a distinct role at inference time:

| Adapter | Role |
|---|---|
| `planner` | High-level subgoal generation in natural language (e.g. "Pick up carrot from counter and put it into pot0.") |
| `actor` | Low-level action chunking: turns a subgoal into up to 3 `ml_action`s (e.g. `pickup(carrot, ingredient_dispenser)`) |
| `coordinator` | Shared-context generation + critique of planner subgoals (PASS / REJECT + feedback) |

At eval time, the coordinator's REJECT events trigger the planner to refine its subgoal (up to 2 retries) before handing off to the actor.
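The critique loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in eval_proposed.py. The callables `plan`, `critique`, and `act`, the `Verdict` type, and the choice to hand off the last refined subgoal even if it is still rejected are all hypothetical. "2 retries" is read as two refinements after the initial subgoal, so the coordinator reviews at most three subgoals.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MAX_RETRIES = 2  # refinements allowed after the initial subgoal
MAX_CHUNK = 3    # the actor emits at most 3 ml_actions per subgoal


@dataclass
class Verdict:
    passed: bool        # True for PASS, False for REJECT
    feedback: str = ""  # coordinator feedback attached to a REJECT


def step(
    plan: Callable[[Optional[str]], str],    # hypothetical planner: feedback -> subgoal
    critique: Callable[[str], Verdict],      # hypothetical coordinator: subgoal -> verdict
    act: Callable[[str], List[str]],         # hypothetical actor: subgoal -> ml_actions
) -> Tuple[str, List[str]]:
    """One planning step: plan, critique/refine, then hand off to the actor."""
    subgoal = plan(None)
    for attempt in range(1 + MAX_RETRIES):
        verdict = critique(subgoal)
        # Assumption: after the final retry the subgoal goes to the actor even if rejected.
        if verdict.passed or attempt == MAX_RETRIES:
            break
        subgoal = plan(verdict.feedback)  # REJECT: refine using the feedback
    return subgoal, act(subgoal)[:MAX_CHUNK]
```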

Quick start

End-to-end: clone the repo, pull the pre-trained adapters and data from HuggingFace into the right folders, and run evaluation.

# 1) Clone the project
git clone https://github.com/anonymous-projectpage/DriCo.git
cd DriCo

# 2) Install dependencies
pip install -U huggingface_hub
pip install -r requirements.txt        # if a requirements file exists
# (or install torch / transformers / peft / overcooked_ai / wandb manually)

# 3) Pull the LoRA adapters into drico/out/
hf download anonymous-24421/DriCo-adapters \
    --local-dir drico/out

# 4) (Optional) pull the training / DPO data into drico/dataset/
hf download anonymous-24421/DriCo \
    --repo-type dataset \
    --local-dir drico/dataset

# 5) Run evaluation
cd drico
CUDA_VISIBLE_DEVICES=0 python eval_proposed.py \
    --base_model           Qwen/Qwen2.5-7B-Instruct \
    --adapter_root         out/ \
    --layout               new_env \
    --recipe_split         test \
    --statistics_save_dir  data/eval_test

After step 3, the directory tree looks like:

DriCo/
└── drico/
    └── out/
        ├── planner/             ← LoRA adapter weights
        ├── actor/
        ├── coordinator/
        └── coordinator_dpo/

That's exactly the layout eval_proposed.py --adapter_root out/ expects, so step 5 just works.
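To confirm the download landed where the eval script looks, a quick check like the one below can help. It is a sketch that assumes each role folder is a standard PEFT adapter directory containing `adapter_config.json`; the function name `check_adapter_root` is hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable

EXPECTED_ROLES = ("planner", "actor", "coordinator", "coordinator_dpo")


def check_adapter_root(root: str, roles: Iterable[str] = EXPECTED_ROLES) -> Dict[str, bool]:
    """Report which role folders under `root` look like PEFT adapters.

    A folder counts as an adapter if it contains adapter_config.json,
    the config file PEFT saves next to the adapter weights.
    """
    base = Path(root)
    return {role: (base / role / "adapter_config.json").is_file() for role in roles}


if __name__ == "__main__":
    status = check_adapter_root("drico/out")
    missing = [role for role, ok in status.items() if not ok]
    print("missing adapters:", missing or "none")
```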

1. Repo layout

The training/eval code lives in drico/. This README sits at the repo root.

.
├── README.md                       # ← this file
├── drico/
│   ├── agent/
│   │   ├── hf_agent.py             # Planner+Actor agent (no collab.* deps)
│   │   ├── human_agent.py          # Human CLI agent (drop-in replacement)
│   │   ├── parsers.py              # Parses Planner/Actor outputs
│   │   ├── hf_model.py             # HuggingFace model wrapper
│   │   └── leader.py               # Coordinator agent
│   │
│   ├── prompts/
│   │   ├── recipe/                 # train split (seen recipes)
│   │   └── test/recipe/            # test split (unseen recipes)
│   │
│   ├── dataset/
│   │   ├── planner/                # planner SFT data
│   │   ├── actor/                  # actor SFT data
│   │   ├── coordinator/            # coordinator SFT data
│   │   └── coordinator_dpo/
│   │       └── dpo.json            # coordinator DPO data
│   │
│   ├── train_planner_coordinator.py
│   ├── train_actor.py
│   ├── train_coordinator_dop.py
│   ├── eval_proposed.py
│   └── out/                        # default LoRA adapter output
│       ├── planner/
│       ├── actor/
│       └── coordinator/            # (and coordinator_dpo/ after DPO)
│
└── … (the rest of the repo: upgraded benchmark code, layouts, etc.)

All command-line examples below assume you are running from inside drico/:

cd drico

2. Training

Train the adapters in separate runs; the coordinator DPO step (2.3) starts from the coordinator SFT adapter produced in 2.1. All adapters share the same base model (Qwen/Qwen2.5-7B-Instruct); only the LoRA weights differ.

2.1 Planner + Coordinator (SFT)

Trains the planner and coordinator adapters together from one script:

python train_planner_coordinator.py \
    --data_dir   dataset/planner/ \
    --base_model Qwen/Qwen2.5-7B-Instruct \
    --output_dir out/ \
    --epochs     3 \
    --bf16

Produces out/planner/ and out/coordinator/.

2.2 Actor (SFT)

python train_actor.py \
    --data_dir   dataset/actor/ \
    --base_model Qwen/Qwen2.5-7B-Instruct \
    --output_dir out/ \
    --bf16

Produces out/actor/.

2.3 Coordinator (DPO refinement)

After the coordinator SFT adapter is ready, refine it with DPO on preference pairs:

python train_coordinator_dop.py \
    --data_path            dataset/coordinator_dpo/dpo.json \
    --coordinator_adapter  out/coordinator/ \
    --out_coordinator      out/coordinator_dpo \
    --coordinator_data_dir dataset/coordinator/

Produces out/coordinator_dpo/. Swap this in for out/coordinator/ at eval time if you want the DPO-refined coordinator.
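This README does not document a flag for choosing the coordinator variant, and section 3.1 says the script loads out/{planner,actor,coordinator}. Assuming those three folder names are all it reads, one way to swap in the DPO adapter without editing the script is a separate adapter root built from symlinks. The helper `make_eval_root` and the out_dpo/ directory are hypothetical:

```python
import os
from pathlib import Path


def make_eval_root(out_dir: str, eval_root: str, coordinator: str = "coordinator_dpo") -> Path:
    """Build an adapter root whose coordinator/ points at a chosen adapter folder.

    Hypothetical helper: planner/ and actor/ are symlinked from out_dir, and
    coordinator/ is symlinked to out_dir/<coordinator>.
    """
    src = Path(out_dir).resolve()
    dst = Path(eval_root)
    dst.mkdir(parents=True, exist_ok=True)
    links = {"planner": "planner", "actor": "actor", "coordinator": coordinator}
    for name, target in links.items():
        link = dst / name
        if link.is_symlink():
            link.unlink()  # replace a link left by a previous run; real dirs are never removed
        os.symlink(src / target, link, target_is_directory=True)
    return dst
```

Calling `make_eval_root("out", "out_dpo")` and then passing `--adapter_root out_dpo/` would evaluate with the DPO-refined coordinator while leaving out/ untouched.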

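Section 2.3 refines the coordinator with DPO on preference pairs. For reference, the standard per-pair DPO objective rewards the trained adapter for raising the chosen response's log-probability (relative to a frozen reference model) more than the rejected one's. The sketch below computes that loss from four summed sequence log-probabilities. Whether train_coordinator_dop.py uses exactly this form, which β it uses, and which model serves as the reference are assumptions not stated in this README.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    policy_* are log-probs under the adapter being trained; ref_* are log-probs
    under the frozen reference (assumed here to be the coordinator SFT adapter).
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)
```

With no change from the reference the loss is log 2; it falls toward 0 as the chosen response gains probability relative to the rejected one.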
3. Evaluation

A single CLI runs the full Planner → Coordinator-Critique → Actor pipeline.

3.1 Basic — all LLMs, test split

CUDA_VISIBLE_DEVICES=1 python eval_proposed.py \
    --base_model            Qwen/Qwen2.5-7B-Instruct \
    --adapter_root          out/ \
    --layout                circuit_env \
    --recipe_split          test \
    --statistics_save_dir   data/eval_test

The script:

Loads out/{planner,actor,coordinator} as three named adapters on one shared base model.
Evaluates every recipe in prompts/test/recipe.
Computes DriCo-style progress completeness, PC = completed_milestones / total_milestones, with PC set to 1.0 for any successful episode.
Writes per-recipe JSONs and a summary at data/eval_test/eval_summary_test.json.
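As a worked example of the PC definition: an episode that reaches 3 of 4 milestones without succeeding scores 3 / 4 = 0.75, while any successful episode scores 1.0. A minimal sketch (the function name is hypothetical):

```python
def progress_completeness(completed: int, total: int, success: bool) -> float:
    """DriCo-style PC: fraction of milestones reached, forced to 1.0 on success."""
    if success:
        return 1.0
    if total <= 0:
        raise ValueError("total_milestones must be positive")
    return completed / total
```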

3.2 Run a human against the LLM

Either player can be controlled from the CLI:

# Chef = human, Assistant = LLM
CUDA_VISIBLE_DEVICES=1 python eval_proposed.py \
    --base_model          Qwen/Qwen2.5-7B-Instruct \
    --adapter_root        out/ \
    --layout              circuit_env \
    --recipe_split        test \
    --statistics_save_dir data/eval_test \
    --human_player        p1

| Flag value | Effect |
|---|---|
| `none` (default) | Both players LLM-controlled |
| `p1` / `chef` | First player (Chef) typed by human |
| `p2` / `assistant` | Second player (Assistant) typed by human |

When a human is in the game, --episode is forced to 1.

The human types an ml_action string whenever a new one is needed; each action auto-repeats across timesteps until it completes. A cheat-sheet of the action grammar is shown at the first prompt, and every prompt also lists the actions that look valid in the current state. Examples of what to type:

pickup(carrot, ingredient_dispenser)
put_obj_in_utensil(pot0)
cook(pot0)
wait(3)
deliver_soup()

Commands:

Enter / r → repeat last action
h → reprint the grammar
q → end the episode
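The commands and the `name(arg, ...)` shape of the examples above can be handled roughly as follows. This sketch only checks syntax, not whether an action is valid in the current state, and the function names are hypothetical. Treating Enter with no previous action as invalid is an assumption; the README does not say what happens.

```python
import re
from typing import List, Optional, Tuple

# name(arg1, arg2, ...), e.g. pickup(carrot, ingredient_dispenser) or deliver_soup()
ML_ACTION = re.compile(r"^\s*([A-Za-z_]\w*)\s*\((.*)\)\s*$")


def parse_ml_action(text: str) -> Optional[Tuple[str, List[str]]]:
    """Split an ml_action string into (name, args); None if it is malformed."""
    m = ML_ACTION.match(text)
    if m is None:
        return None
    name, raw_args = m.groups()
    args = [a.strip() for a in raw_args.split(",") if a.strip()]
    return name, args


def handle_input(line: str, last_action: Optional[str]) -> Tuple[str, Optional[str]]:
    """Map one line of human input to (command, action)."""
    cmd = line.strip()
    if cmd in ("", "r"):  # Enter / r: repeat the last action
        return ("act", last_action) if last_action else ("invalid", None)
    if cmd == "h":        # reprint the grammar
        return "help", None
    if cmd == "q":        # end the episode
        return "quit", None
    if parse_ml_action(cmd) is None:
        return "invalid", None
    return "act", cmd
```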

The human's last action is also converted to a natural-language subgoal that the partner LLM sees as the teammate's intent (so the partner can plan around it without any special-casing in LLM-side code).
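The action-to-subgoal conversion could look like the template lookup below. The templates, their wording, and the fallback text are hypothetical (they only imitate the planner's subgoal style, e.g. "Pick up carrot from counter and put it into pot0."); the real converter is not documented here.

```python
import re
from typing import Dict

# Hypothetical templates keyed by ml_action name; {0}, {1} are the action's arguments.
TEMPLATES: Dict[str, str] = {
    "pickup": "Pick up {0} from {1}.",
    "put_obj_in_utensil": "Put the held item into {0}.",
    "cook": "Cook the contents of {0}.",
    "wait": "Wait for {0} timesteps.",
    "deliver_soup": "Deliver the soup.",
}


def action_to_subgoal(action: str) -> str:
    """Turn a human ml_action into a natural-language teammate intent."""
    fallback = f"Teammate action: {action.strip()}"
    m = re.match(r"^\s*(\w+)\s*\((.*)\)\s*$", action)
    if m is None:
        return fallback
    name, raw = m.groups()
    args = [a.strip() for a in raw.split(",") if a.strip()]
    template = TEMPLATES.get(name)
    if template is None:
        return fallback
    try:
        return template.format(*args)
    except IndexError:  # fewer arguments than the template expects
        return fallback
```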

3.3 Other useful flags

# Different recipe splits
--recipe_split test      # prompts/test/recipe       (default)
--recipe_split train     # prompts/recipe
--recipe_split both      # both folders (test wins on name conflicts)

# Restrict to specific recipes (comma-separated)
--orders pea_soup,mashed_zucchini_and_pea_patty

# Restrict to specific difficulty levels
--levels 1               # only level 1
--levels 1,3             # levels 1 and 3
--levels 2-4             # levels 2,3,4

# Turn the coordinator critique on/off
--use_critique True      # default — every subgoal reviewed, refined on REJECT
--use_critique False     # planner output goes straight to actor

# Episode budget
--episode 3              # 3 episodes per recipe (ignored if --human_player set)
--horizon  80            # max timesteps per episode

# Generation / dtype
--planner_max_tokens 128
--actor_max_tokens    64
--dtype               bf16   # bf16 | fp16 | fp32
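The `--levels` syntax and the `--recipe_split both` rule ("test wins on name conflicts") can be sketched as below. Both helpers are hypothetical, and treating each file's stem in the recipe folders as the recipe name is an assumption about the prompt layout.

```python
from pathlib import Path
from typing import Dict, List


def parse_levels(spec: str) -> List[int]:
    """Expand a --levels value such as "1", "1,3" or "2-4" into sorted levels."""
    levels = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            levels.update(range(lo, hi + 1))
        elif part:
            levels.add(int(part))
    return sorted(levels)


def collect_recipes(split: str, prompts_dir: str = "prompts") -> Dict[str, Path]:
    """Map recipe name -> path for --recipe_split test | train | both."""
    folders = {
        "train": [Path(prompts_dir, "recipe")],
        "test": [Path(prompts_dir, "test", "recipe")],
    }
    folders["both"] = folders["train"] + folders["test"]  # test scanned last
    recipes: Dict[str, Path] = {}
    for folder in folders[split]:
        for path in sorted(folder.glob("*")):
            recipes[path.stem] = path  # later folders overwrite, so test wins
    return recipes
```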

3.4 Combined examples

# Train-split recipes, level 1 only
python eval_proposed.py ... --recipe_split train --levels 1

# Both splits, just one recipe, human as Chef
python eval_proposed.py ... \
    --recipe_split both \
    --orders pea_soup \
    --human_player p1

# Pure LLM eval, no critique, levels 2 through 4
python eval_proposed.py ... \
    --recipe_split test \
    --levels 2-4 \
    --use_critique False

4. Output files

For each evaluation run, the script writes:

data/eval_test/
├── diff1_pea_soup/
│   ├── pea_soup_ep1_2026-05-12_…json     # per-episode trace
│   └── pea_soup_ep2_2026-05-12_…json
├── diff2_carrot_soup/
│   └── …
└── eval_summary_test.json                 # aggregated metrics

eval_summary_<suffix>.json contains:

Per-recipe and per-difficulty success rate (SR) and progress completeness (PC, DriCo-style)
Average first_reward_t over successful episodes (completion speed)
Drift analysis: how often the coordinator REJECTed, which drift types it flagged, and how often refinement resolved them within 3 attempts (the initial subgoal plus up to 2 refinements)
Coordinator call counts (shared-context vs. critique-initial vs. critique-refine)
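The per-difficulty metrics can be illustrated with a small aggregation over episode records. The record keys (`level`, `success`, `pc`, `first_reward_t`) are hypothetical; the actual per-episode JSON schema may differ. Note that completion speed averages only over successful episodes, as described above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def summarize(episodes: List[dict]) -> Dict[int, dict]:
    """Per-difficulty SR, mean PC, and mean first_reward_t over successes."""
    by_level: Dict[int, List[dict]] = defaultdict(list)
    for ep in episodes:
        by_level[ep["level"]].append(ep)
    summary = {}
    for level, eps in sorted(by_level.items()):
        wins = [e for e in eps if e["success"]]
        summary[level] = {
            "SR": len(wins) / len(eps),
            "PC": mean(e["pc"] for e in eps),
            # completion speed: only successful episodes have a first reward
            "avg_first_reward_t": mean(e["first_reward_t"] for e in wins) if wins else None,
        }
    return summary
```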

The filename suffix encodes the active filters (e.g. eval_summary_both_lv1-3_orders-pea_soup_human-p1.json), so runs with different filters don't overwrite each other's summaries.
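A naming scheme consistent with the two example filenames in this README (eval_summary_test.json and eval_summary_both_lv1-3_orders-pea_soup_human-p1.json) is sketched below. It is reconstructed only from those examples: using the raw `--levels` and `--orders` strings and this particular part order are assumptions, and the helper name is hypothetical.

```python
from typing import Optional


def summary_filename(split: str, levels: Optional[str] = None,
                     orders: Optional[str] = None, human: Optional[str] = None) -> str:
    """Compose eval_summary_<suffix>.json from the active filters."""
    parts = [split]
    if levels:
        parts.append(f"lv{levels}")          # raw --levels string, e.g. "1-3"
    if orders:
        parts.append(f"orders-{orders}")     # raw --orders string
    if human and human != "none":
        parts.append(f"human-{human}")
    return f"eval_summary_{'_'.join(parts)}.json"
```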

5. End-to-end example

# 1. Train all three adapters
python train_planner_coordinator.py \
    --data_dir dataset/planner/ \
    --base_model Qwen/Qwen2.5-7B-Instruct \
    --output_dir out/ --epochs 3 --bf16

python train_actor.py \
    --data_dir dataset/actor/ \
    --base_model Qwen/Qwen2.5-7B-Instruct \
    --output_dir out/ --bf16

python train_coordinator_dop.py \
    --data_path dataset/coordinator_dpo/dpo.json \
    --coordinator_adapter out/coordinator/ \
    --out_coordinator out/coordinator_dpo \
    --coordinator_data_dir dataset/coordinator/

# 2. Evaluate on test split — full LLM team
CUDA_VISIBLE_DEVICES=1 python eval_proposed.py \
    --base_model Qwen/Qwen2.5-7B-Instruct \
    --adapter_root out/ \
    --layout circuit_env \
    --recipe_split test \
    --statistics_save_dir data/eval_test

# 3. Same eval, but you play the Chef
CUDA_VISIBLE_DEVICES=1 python eval_proposed.py \
    --base_model Qwen/Qwen2.5-7B-Instruct \
    --adapter_root out/ \
    --layout circuit_env \
    --recipe_split test \
    --statistics_save_dir data/eval_test \
    --human_player p1

6. Pre-trained adapters & data on HuggingFace

The LoRA adapters and training data are hosted on HuggingFace and mirror the local folder layout, so a one-shot download (see Quick start above) drops everything in the right place.

| Asset | Repo |
|---|---|
| LoRA adapters (4) | https://huggingface.co/anonymous-24421/DriCo-adapters |
| Training / DPO data | https://huggingface.co/datasets/anonymous-24421/DriCo |

Selective download

If you only need a subset:

# Just the actor adapter
hf download anonymous-24421/DriCo-adapters \
    --include "actor/*" \
    --local-dir drico/out

# Just the DPO-refined coordinator
hf download anonymous-24421/DriCo-adapters \
    --include "coordinator_dpo/*" \
    --local-dir drico/out

# Just the DPO data
hf download anonymous-24421/DriCo \
    --repo-type dataset \
    --include "coordinator_dpo/*" \
    --local-dir drico/dataset
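The same selective downloads can be done from Python with `huggingface_hub.snapshot_download`, whose `allow_patterns` argument plays the role of `--include`. This is a sketch; the helper names are hypothetical, and the repo IDs are the ones listed in the table above.

```python
from typing import List, Optional

ADAPTER_REPO = "anonymous-24421/DriCo-adapters"
DATA_REPO = "anonymous-24421/DriCo"


def role_patterns(roles: List[str]) -> List[str]:
    """Turn folder names like "actor" into allow_patterns such as "actor/*"."""
    return [f"{role.strip('/')}/*" for role in roles]


def download_roles(roles: List[str], local_dir: str = "drico/out",
                   repo_id: str = ADAPTER_REPO, repo_type: Optional[str] = None) -> str:
    """Python counterpart of `hf download <repo> --include "<role>/*" --local-dir ...`."""
    # Imported lazily so role_patterns() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,  # None means a model repo; pass "dataset" for DATA_REPO
        allow_patterns=role_patterns(roles),
        local_dir=local_dir,
    )
```

For example, `download_roles(["actor", "coordinator_dpo"])` fetches both adapters into drico/out, and `download_roles(["coordinator_dpo"], local_dir="drico/dataset", repo_id=DATA_REPO, repo_type="dataset")` mirrors the DPO-data command.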

Manual loading in Python

If you want to use the adapters in your own code instead of going through the eval script:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen2.5-7B-Instruct"
tok  = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto",
)

# Register all three roles as named adapters on the same base model;
# each adapter is fetched from a subfolder of the HF adapter repo
model = PeftModel.from_pretrained(
    model, "anonymous-24421/DriCo-adapters",
    subfolder="planner", adapter_name="planner",
)
model.load_adapter("anonymous-24421/DriCo-adapters",
                   subfolder="actor", adapter_name="actor")
# Load the DPO-refined coordinator under the role name "coordinator"
model.load_adapter("anonymous-24421/DriCo-adapters",
                   subfolder="coordinator_dpo", adapter_name="coordinator")

# Switch role per generation; only the active adapter is applied
model.set_adapter("planner")
# ...generate planner subgoal...

model.set_adapter("actor")
# ...generate action chunk...
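To fill in the generation placeholders, a helper along these lines can run one role at a time. It is a sketch that assumes the `model` and `tok` objects from the snippet above and uses standard transformers/PEFT calls (`set_adapter`, `apply_chat_template`, `generate`); the function name `generate_as` and any message contents are hypothetical, since the actual planner and actor prompt formats are not documented here.

```python
def generate_as(model, tok, role: str, messages: list, max_new_tokens: int = 128) -> str:
    """Activate one named adapter and return only the newly generated text."""
    model.set_adapter(role)  # "planner", "actor" or "coordinator"
    prompt = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding is an assumption, not the eval script's documented setting
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    prompt_len = len(inputs["input_ids"][0])  # drop the echoed prompt tokens
    return tok.decode(output[0][prompt_len:], skip_special_tokens=True)
```

For example, calling it with role "planner" and `max_new_tokens=128`, then with role "actor" and `max_new_tokens=64`, mirrors the budgets shown for `--planner_max_tokens` and `--actor_max_tokens`.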
