Evaluation on SimplerEnv After Finetuning Pi05 Does Not Meet Expected Performance #799

Hi, thanks for your great efforts on this work. I was wondering if there have been any attempts to test pi05 on the SimplerEnv benchmark. I tried it myself and found that it takes a lot of effort to make it work well.

What I have tried, with detailed settings:
task: SimplerEnv, WidowX
dataset: https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot
evaluation: https://github.com/DelinQu/SimplerEnv-OpenVLA
steps: 80k
batch size: 1024 on 32 H100 (32 per GPU)
lr: 5e-5
norm: z-score (I also tried the default, quantile normalization, which performs even worse)
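
For readers comparing the two normalization options, here is a minimal sketch of how they differ on a single action dimension. It assumes quantile normalization maps the 1st and 99th percentiles to [-1, 1]; the function names and the sample values are hypothetical and are not taken from the training code.

```python
import statistics


def zscore_normalize(values, eps=1e-6):
    """Shift to zero mean and scale to (approximately) unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def quantile_normalize(values, eps=1e-6):
    """Map the 1st/99th percentiles to -1/+1 (assumed convention, see lead-in)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    q01, q99 = cuts[0], cuts[-1]
    return [2.0 * (v - q01) / (q99 - q01 + eps) - 1.0 for v in values]


# Hypothetical action dimension: small deltas plus one large outlier.
actions = [0.0, 0.01, -0.01, 0.02, -0.02, 0.015, -0.015, 1.0]

z = zscore_normalize(actions)
q = quantile_normalize(actions)

for raw, zi, qi in zip(actions, z, q):
    print(f"raw={raw:+.3f}  zscore={zi:+.3f}  quantile={qi:+.3f}")
```

The two schemes place the same raw actions on quite different scales when the distribution has heavy tails, so the statistics used at evaluation time need to match the ones used during finetuning.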

results (ignore the nan entries, which come from the evaluation codebase):
| task                                               | Pi05-ft | RT-1(Converged) | RT-1(15%) | RT-1-X | RT-2-X | Octo-Base | Octo-Small | RT-1(begin) | OpenVLA | RoboVLM |
|:---------------------------------------------------|:--------|:----------------|:----------|:-------|:-------|:----------|:-----------|:------------|:--------|:--------|
| put_spoon_on_tablecloth/matching_partial           | 0.708   | nan             | nan       | 0.167  | nan    | 0.347     | 0.778      | nan         | 0.041   | 0.375   |
| put_spoon_on_tablecloth/matching_entire            | 0.542   | nan             | nan       | 0.0    | nan    | 0.125     | 0.472      | nan         | 0.0     | 0.208   |
| put_carrot_on_plate/matching_partial               | 0.917   | nan             | nan       | 0.208  | nan    | 0.528     | 0.278      | nan         | 0.333   | 0.333   |
| put_carrot_on_plate/matching_entire                | 0.667   | nan             | nan       | 0.042  | nan    | 0.083     | 0.097      | nan         | 0.0     | 0.25    |
| stack_green_block_on_yellow_block/matching_partial | 0.917   | nan             | nan       | 0.083  | nan    | 0.319     | 0.403      | nan         | 0.125   | 0.083   |
| stack_green_block_on_yellow_block/matching_entire  | 0.5     | nan             | nan       | 0.0    | nan    | 0.0       | 0.042      | nan         | 0.0     | 0.083   |
| put_eggplant_in_basket/matching_partial            | 0.208   | nan             | nan       | 0.0    | nan    | 0.667     | 0.875      | nan         | 0.083   | 0.0     |
| put_eggplant_in_basket/matching_entire             | 0.208   | nan             | nan       | 0.0    | nan    | 0.431     | 0.569      | nan         | 0.041   | 0.0     |
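
To make the per-task numbers easier to compare, the following sketch averages the `matching_entire` success rate over the four tasks for each checkpoint that has valid (non-nan) entries. The values are copied from the table above; the variable names are only illustrative.

```python
from statistics import fmean

# matching_entire success rates from the table above, in task order:
# spoon on tablecloth, carrot on plate, stack green on yellow, eggplant in basket.
# Checkpoints whose columns are nan in the table are omitted.
entire_success = {
    "Pi05-ft": [0.5416666666666666, 0.6666666666666666, 0.5, 0.20833333333333334],
    "RT-1-X": [0.0, 0.042, 0.0, 0.0],
    "Octo-Base": [0.125, 0.083, 0.0, 0.431],
    "Octo-Small": [0.472, 0.097, 0.042, 0.569],
    "OpenVLA": [0.0, 0.0, 0.0, 0.041],
    "RoboVLM": [0.208, 0.25, 0.083, 0.0],
}

# Unweighted mean over the four tasks for each checkpoint.
average_entire = {name: fmean(rates) for name, rates in entire_success.items()}

# Print checkpoints from highest to lowest average full-success rate.
for name, avg in sorted(average_entire.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12}{avg:.3f}")
```

On these four tasks the finetuned pi05 checkpoint averages roughly 0.48 full success. The open-pi-zero numbers it is compared against come from that repository and are not part of this table.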

Overall, the performance is somewhat worse than the results from an open-source version of pi0 (https://github.com/allenzren/open-pi-zero).

