Hi, thanks for your great efforts on this work. I was wondering whether there has been any attempt to test pi05 on the SIMPLERENV benchmark. I tried it myself and found that it takes a lot of effort to make it work well.
What I have tried, with detailed settings:
task: SIMPLERENV, WidowX
dataset: https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot
evaluation: https://github.com/DelinQu/SimplerEnv-OpenVLA
steps: 80k
batch size: 1024 on 32 H100 GPUs (32 per GPU)
lr: 5e-5
norm: zscore (I also tried the default, quantile norm, which performed even worse)
results (ignore the invalid nan entries, which come from the evaluation codebase):
| metric | Pi05-ft | RT-1(Converged) | RT-1(15%) | RT-1-X | RT-2-X | Octo-Base | Octo-Small | RT-1(begin) | OpenVLA | RoboVLM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| put_spoon_on_tablecloth/matching_partial | 0.708 | nan | nan | 0.167 | nan | 0.347 | 0.778 | nan | 0.041 | 0.375 |
| put_spoon_on_tablecloth/matching_entire | 0.542 | nan | nan | 0.0 | nan | 0.125 | 0.472 | nan | 0.0 | 0.208 |
| put_carrot_on_plate/matching_partial | 0.917 | nan | nan | 0.208 | nan | 0.528 | 0.278 | nan | 0.333 | 0.333 |
| put_carrot_on_plate/matching_entire | 0.667 | nan | nan | 0.042 | nan | 0.083 | 0.097 | nan | 0.0 | 0.25 |
| stack_green_block_on_yellow_block/matching_partial | 0.917 | nan | nan | 0.083 | nan | 0.319 | 0.403 | nan | 0.125 | 0.083 |
| stack_green_block_on_yellow_block/matching_entire | 0.5 | nan | nan | 0.0 | nan | 0.0 | 0.042 | nan | 0.0 | 0.083 |
| put_eggplant_in_basket/matching_partial | 0.208 | nan | nan | 0.0 | nan | 0.667 | 0.875 | nan | 0.083 | 0.0 |
| put_eggplant_in_basket/matching_entire | 0.208 | nan | nan | 0.0 | nan | 0.431 | 0.569 | nan | 0.041 | 0.0 |
Overall, the performance is somewhat worse than the results from an open-source version of pi0 (https://github.com/allenzren/open-pi-zero).
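
For readers comparing the two normalization options listed in the settings, here is a minimal sketch of how they differ. It assumes that quantile normalization maps the 1st and 99th percentiles of each action dimension to [-1, 1] and that zscore uses the per-dimension mean and standard deviation; the function names and the synthetic data are hypothetical and may not match the training codebase.

```python
import numpy as np

def zscore_normalize(x, mean, std, eps=1e-6):
    # Assumed z-score form: zero mean, unit variance per action dimension.
    return (x - mean) / (std + eps)

def quantile_normalize(x, q01, q99, eps=1e-6):
    # Assumed quantile form: map the [q01, q99] range to [-1, 1].
    return (x - q01) / (q99 - q01 + eps) * 2.0 - 1.0

# Synthetic 7-DoF actions with a few large outliers (illustrative data only).
rng = np.random.default_rng(0)
actions = rng.normal(size=(10_000, 7))
actions[:5] *= 50.0

mean, std = actions.mean(axis=0), actions.std(axis=0)
q01 = np.quantile(actions, 0.01, axis=0)
q99 = np.quantile(actions, 0.99, axis=0)

z = zscore_normalize(actions, mean, std)
q = quantile_normalize(actions, q01, q99)

# Outliers stay unbounded under both schemes, but they only affect the
# z-score statistics (mean/std), not the quantile range.
print("zscore   max |value|:", np.abs(z).max())
print("quantile max |value|:", np.abs(q).max())
```

Because the two schemes put actions on different scales, evaluation has to unnormalize with the same statistics that were used in training; a mismatch there is one thing worth ruling out when the two settings give very different results.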