Unable to do inference on Falcon-40b fine-tuned using LORA #198
Comments
You are using FSDP for inference, right? It won't fit in a single 80 GB card.
No, I just leave the strategy set to "auto", which essentially means I am not using FSDP. I also tried using both one device and multiple devices, but the model weights just keep loading and the script gets stuck.
The model won't fit into any single 80 GB card unless you do quantization. So either you do that, or the model needs to be sharded using FSDP. I don't know why it would get stuck as you describe, but you won't be able to load it anyway without one of those techniques enabled.
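The claim above can be checked with back-of-envelope arithmetic: at roughly 40 billion parameters, the bare weights in 16-bit precision already consume nearly the entire 80 GB card, leaving no room for the KV cache or activations. A minimal sketch (the parameter count is an approximation, not an exact figure from this thread):

```python
# Rough memory estimate for Falcon-40B weights at different precisions.
# Assumption: ~40e9 parameters; real total varies slightly by checkpoint.
params = 40e9
bytes_per_param = {"fp16/bf16": 2, "int8": 1, "nf4": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 1024**3
    print(f"{dtype}: ~{gib:.1f} GiB just for weights")
```

In bf16 this lands around 74.5 GiB before any inference overhead, which is why 4-bit quantization or sharding across GPUs with FSDP is needed.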
I tried using FSDP with 8 80 GB A100 GPUs, but it still gets stuck while running the generate/lora.py script. The commands I am using are:
@guptashrey Can you please let us know the configuration you used for finetuning? I ran into OOM with 8 80 GB A100 GPUs.
Hi everyone,
I was able to finetune a Falcon-40b model using the finetune/lora.py script. Now I am trying to generate responses using the generate/lora.py script, but it gets stuck loading the model and the inference doesn't work. Can anyone help me with this?