A controlled dataset for testing whether finetuning turns vision-language-action (VLA) models into fancy imitation learners that break down when object positions or prompts change.
.
├── generate_control_dataset.py   # Generates 40 controlled BDDL variants (2x2 design)
├── render_examples.py            # Renders scene images for each variant
├── control_dataset/
│   ├── original_seen/            # Baseline: original position + seen prompt
│   ├── original_unseen/          # Tests language grounding: original position + unseen prompt
│   ├── shuffled_seen/            # Tests spatial generalization: shuffled position + seen prompt
│   └── shuffled_unseen/          # Tests both: shuffled position + unseen prompt
└── examples/
    ├── original_seen/
    ├── original_unseen/
    ├── shuffled_seen/
    └── shuffled_unseen/
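The four subdirectories are the cells of a 2x2 design: object position (original or shuffled) crossed with prompt (seen or unseen). The sketch below enumerates those cells. It is only an illustration of the design, not code from `generate_control_dataset.py`; the names `POSITIONS`, `PROMPTS`, and `condition_names` are hypothetical.

```python
from itertools import product

# The two factors of the 2x2 design (names here are illustrative).
POSITIONS = ("original", "shuffled")  # object placement
PROMPTS = ("seen", "unseen")          # language instruction


def condition_names():
    """Return one directory name per cell of the 2x2 design."""
    return [f"{pos}_{prompt}" for pos, prompt in product(POSITIONS, PROMPTS)]


if __name__ == "__main__":
    for name in condition_names():
        print(name)
```

If the 40 variants are split evenly across the four cells, each directory would hold 10 of them; the even split is an assumption, not something the layout alone confirms.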
python generate_control_dataset.py \
    --bddl_dir /path/to/libero/bddl_files/libero_object \
    --output_dir ./control_dataset \
    --seed 42
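After running the command, a quick sanity check is to count the files in each condition directory. The sketch below assumes the generated variants are written as `*.bddl` files directly inside the four subdirectories of the output directory; that layout, and the helper name `count_variants`, are assumptions for illustration rather than documented behavior of the script.

```python
from pathlib import Path

# Condition directories from the layout above.
CONDITIONS = ("original_seen", "original_unseen", "shuffled_seen", "shuffled_unseen")


def count_variants(output_dir):
    """Count *.bddl files per condition directory (assumed layout)."""
    root = Path(output_dir)
    counts = {}
    for name in CONDITIONS:
        cond_dir = root / name
        # A missing directory counts as zero instead of raising an error.
        counts[name] = len(list(cond_dir.glob("*.bddl"))) if cond_dir.is_dir() else 0
    return counts


if __name__ == "__main__":
    counts = count_variants("./control_dataset")
    for name, n in counts.items():
        print(f"{name}: {n}")
    print("total:", sum(counts.values()))
```

A total other than 40 would suggest that generation was incomplete or that the files live somewhere other than where this sketch looks.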