
Question about GPU memory usage. #59

Open
mxjmtxrm opened this issue Apr 25, 2024 · 0 comments
mxjmtxrm commented Apr 25, 2024

Hi, I tried to finetune a Llama-2-7B model with HQQ-LoRA on dual GPUs.
I found that during "Loading & Quantizing Model Shards", peak GPU memory usage reached 35 GB. What could be causing this?
The run command is:

export CUDA_VISIBLE_DEVICES=3,4
python train.py \
--world_size 2 \
--model_name /workspace/model/Llama-2-7b-chat-hf \
--gradient_accumulation_steps 2 \
--batch_size 1 \
--context_length 4096 \
--num_epochs 1 \
--sharding_strategy full_shard \
--precision bf16 \
--train_type hqq_lora \
--use_gradient_checkpointing true \
--use_cpu_offload true \
--dataset dummy \
--verbose true  
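
For reference, a minimal sketch of how peak usage could be checked per GPU via PyTorch's caching-allocator statistics (the report_peak_memory helper and tag string are illustrative, not from train.py):

import torch

def report_peak_memory(tag: str) -> None:
    # max_memory_allocated() reports the peak bytes held by the caching
    # allocator on each device since the last reset.
    for device_id in range(torch.cuda.device_count()):
        peak_gib = torch.cuda.max_memory_allocated(device_id) / 2**30
        print(f"[{tag}] cuda:{device_id} peak allocated: {peak_gib:.2f} GiB")

for device_id in range(torch.cuda.device_count()):
    torch.cuda.reset_peak_memory_stats(device_id)  # clear stats before loading
# ... load and quantize the model shards here ...
report_peak_memory("after shard loading")

If the 35 GB figure comes from nvidia-smi, note that it also includes the CUDA context and memory reserved (but not currently allocated) by the caching allocator.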

Looking forward to your reply.
