
Unable to do inference on Falcon-40b fine-tuned using LORA #198

Closed
guptashrey opened this issue Jun 23, 2023 · 6 comments · Fixed by #275

Comments

@guptashrey

Hi everyone,

I was able to fine-tune a Falcon-40b model using the finetune/lora.py script. Now I am trying to generate responses using the generate/lora.py script, but it gets stuck loading the model and inference never runs. Can anyone help me with this?

@carmocca
Contributor

You are using FSDP for inference, right? The model won't fit on a single 80GB card.
How many devices are you using?

@guptashrey
Author

No, I left the strategy set to "auto", which means I am not using FSDP. I also tried both a single device and multiple devices, but the model weights just keep loading and the script gets stuck.

@carmocca
Contributor

The model won't fit on any single 80GB card unless you use quantization: at 16-bit precision, 40B parameters already take roughly 80 GB for the weights alone, before activations and the KV cache. So either quantize, or shard the model across devices with FSDP.

I don't know why it would get stuck as you describe, but you won't be able to load it anyway without one of the techniques above.
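
For example, single-device generation with 4-bit quantization would look something like the line below. This is only a sketch: it assumes generate/lora.py accepts the same --quantize flag as the plain generation script, which may not be true in the version you are running:

  python generate/lora.py --lora_path s3/out/adapter/mixed_40b/lit_model_lora_finetuned.pth --checkpoint_dir s3/checkpoints/tiiuae/falcon-40b --quantize "bnb.nf4"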

@guptashrey
Author

I tried using FSDP with 8 80GB A100 GPUs, but the generate/lora.py script still gets stuck.

The commands I am using are:

  1. python generate/lora.py --lora_path s3/out/adapter/mixed_40b/lit_model_lora_finetuned.pth --checkpoint_dir s3/checkpoints/tiiuae/falcon-40b --strategy "fsdp" --devices 8
  2. python generate/lora.py --lora_path s3/out/adapter/mixed_40b/lit_model_lora_finetuned.pth --checkpoint_dir s3/checkpoints/tiiuae/falcon-40b --devices 8

@gpravi

gpravi commented Jun 28, 2023

@guptashrey Could you please share the configuration you used for finetuning? I ran into OOM errors with 8 80GB A100 GPUs.
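
For context, these are the memory-related settings I have been experimenting with near the top of finetune/lora.py. The names below assume the stock lit-gpt script, so treat this as a sketch rather than a verified fix:

  # hypothetical excerpt from the top of finetune/lora.py (stock lit-gpt names)
  batch_size = 128
  micro_batch_size = 1  # lowering this was the main lever against OOM for me
  gradient_accumulation_iters = batch_size // micro_batch_size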

@weilong-web

#207
