This repository packages a completed research project on prompt sensitivity in OpenVLA-7B using BridgeData images.
If the scene stays the same but the instruction wording changes, does the predicted robot action stay stable?
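Stability can be quantified as the distance between predicted actions. A minimal sketch, assuming the model emits a 7-D continuous action (translation, rotation, gripper) and using L2 distance as the drift score; the function name and toy vectors are illustrative, not the repo's actual code:

```python
import numpy as np

def action_drift(action_normal, action_variant):
    """L2 distance between two predicted 7-D action vectors.

    A larger value means the instruction rewording moved the
    predicted action further from the normal-prompt baseline.
    """
    a = np.asarray(action_normal, dtype=float)
    b = np.asarray(action_variant, dtype=float)
    return float(np.linalg.norm(a - b))

# Toy vectors (xyz delta, rpy delta, gripper) for illustration only.
normal = [0.010, -0.020, 0.030, 0.0, 0.0, 0.0, 1.0]
para   = [0.012, -0.019, 0.028, 0.0, 0.0, 0.0, 1.0]
print(action_drift(normal, para))
```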
The study covers two experiments:

- manual40: 40 manually curated BridgeData trajectories, with initial and final images (80 images total).
- auto50: 50 automatically sampled trajectories from the scripted BridgeData archive (100 image rows total).
Each image was queried under four prompt conditions:

- normal
- paraphrased
- contradictory
- neutral
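The four conditions for one task can be sketched as a small mapping. The wording below is invented for illustration; the actual prompt texts used in the study live in the metadata files:

```python
def prompt_family(task="put the carrot on the plate"):
    """Hypothetical prompt variants for a single BridgeData task."""
    return {
        "normal":        task,
        "paraphrased":   "place the carrot onto the plate",  # same goal, new wording
        "contradictory": "do not touch the carrot",          # conflicts with the goal
        "neutral":       "the sky is blue today",            # unrelated to the task
    }

for condition, text in prompt_family().items():
    print(f"{condition:>13}: {text}")
```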
Across both experiments:
- Paraphrases stayed closest to the normal prompt.
- Contradictory and neutral prompts caused larger action drift.
- Final-state images were generally more fragile than initial-state images.
Mean L2 action distance from the normal-prompt prediction, per condition (lower = more stable):

| experiment | group | n | paraphrased | contradictory | neutral |
|---|---|---|---|---|---|
| manual40 | initial | 40 | 0.028126 | 0.042365 | 0.045330 |
| manual40 | final | 40 | 0.043141 | 0.059920 | 0.070772 |
| auto50 | initial | 50 | 0.056164 | 0.069716 | 0.073405 |
| auto50 | final | 50 | 0.080668 | 0.093857 | 0.082103 |
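The bootstrap confidence intervals under analysis/ can be reproduced in spirit with percentile bootstrapping over per-trajectory drifts. A sketch under assumptions: the drift values below are synthetic, and the real analysis script may differ in resampling details:

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-trajectory L2 drifts."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample trajectories with replacement and average each resample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return values.mean(), (lo, hi)

# Synthetic stand-in for 40 per-trajectory drift values.
drifts = np.abs(np.random.default_rng(1).normal(0.05, 0.02, size=40))
mean, (lo, hi) = bootstrap_mean_ci(drifts)
print(f"mean L2 drift = {mean:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```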
Repository layout:

- code/: core scripts used to run and analyze the experiments
- metadata/: manual40 and auto50 metadata files
- results/: raw result CSVs and logs
- analysis/: summary tables, bootstrap confidence intervals, family breakdown, failure cases
- docs/: project explanation, master report, and paper draft material
- pilot_experiments/: pilot experiments and smoke tests run in Colab before moving to the main machine
- env/: environment files from the compute platform
- figures/: final results in visual form
- Read docs/Master_Report_for_Professor.pdf for the full story (especially if you work in the VLA field).
- Read docs/Research_Explanation.pdf for my philosophy behind this research.
- Read docs/OpenVLA_CoRL_Final_Draft.pdf for the paper draft.
- Read docs/OpenVLA_Practice_Research_Paper.pdf for a professional audience; this practice research paper contains everything.
This repo does not include the full 30 GB scripted BridgeData archive. Instead, it includes the metadata and result artifacts needed to understand and reproduce the reported study structure. Users should download the raw dataset separately from the official source when needed.
- `test_openvla_PATCHED_WORKING.py`
- `run_metadata_experiment.py`
- `extract_bridge_sample_clean.py`
- `fill_prompts_from_family.py`
- `run_bridge_batch_from_metadata.py`
- `build_analysis_artifacts.py`
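A summary table like the one above can be rolled up from the raw per-image results with pandas. This is a sketch under assumptions: the column names (`experiment`, `group`, `l2_para`, `l2_contra`, `l2_neutral`) are a guess at the CSV schema under results/, not documented fields, and the numbers are synthetic:

```python
import pandas as pd

# Synthetic stand-in for a per-image results CSV.
df = pd.DataFrame({
    "experiment": ["manual40"] * 4,
    "group":      ["initial", "initial", "final", "final"],
    "l2_para":    [0.02, 0.03, 0.04, 0.05],
    "l2_contra":  [0.04, 0.05, 0.06, 0.06],
    "l2_neutral": [0.04, 0.05, 0.07, 0.07],
})

# Average drift per (experiment, group), with the row count as n.
summary = (df.groupby(["experiment", "group"], sort=False)
             .agg(n=("l2_para", "size"),
                  l2_para=("l2_para", "mean"),
                  l2_contra=("l2_contra", "mean"),
                  l2_neutral=("l2_neutral", "mean"))
             .reset_index())
print(summary)
```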
- OpenVLA (CoRL 2024)
- BridgeData V2 (CoRL 2023)
- LeRobot (ICLR 2026)
- VLA survey (arXiv 2026)