Thanks for your brilliant open-source work!
I have some questions about fine-tuning univla-7b on the LIBERO datasets.
I found that GPU 0 has higher memory usage than the other GPUs. Could you explain why, or give me some advice on fixing this issue?
My CLI command is as follows:
torchrun --standalone --nnodes 1 --nproc-per-node 8 finetune_libero.py \
--vla_path /path/to/checkpoints/univla \
--lam_path /path/to/checkpoints/lam-stage-2.ckpt \
--data_root_dir /path/to/dataset/libero/modified_libero_rlds \
--dataset_name libero_spatial_no_noops \
--run_root_dir /path/to/UniVLA/runs \
--adapter_tmp_dir /path/to/runs/adapter_tmp \
--batch_size 8 \
--max_steps 30005 \
--save_steps 5000 \
--learning_rate 3.5e-4 \
--grad_accumulation_steps 1 \
--image_aug True \
--shuffle_buffer_size 10000 \
--save_latest_checkpoint_only False \
--run_id_note libero_spatial
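For context on what I have checked so far: a common cause of inflated memory on GPU 0 under torchrun is that some ranks allocate tensors on the default device `cuda:0` (e.g. via a bare `.cuda()` call or a default `map_location` when loading checkpoints), instead of on their own local GPU. This is only a guess about my setup, not a claim about finetune_libero.py; the helper below is a hypothetical sketch of pinning each process to the device given by torchrun's `LOCAL_RANK` environment variable:

```python
import os

def local_device() -> str:
    """Return the per-process CUDA device string under torchrun.

    torchrun sets LOCAL_RANK (0..nproc_per_node-1) for each worker;
    defaulting to 0 keeps the helper usable in single-process runs.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"

# Inside the training script one would then do, for example:
#   torch.cuda.set_device(local_device())
#   model.to(local_device())
#   state = torch.load(ckpt_path, map_location=local_device())
```

If every rank pins its device this way and checkpoints are loaded with an explicit `map_location`, GPU 0 should hold no more than its own shard plus any rank-0-only bookkeeping (e.g. saving checkpoints).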