LitGPT Python API v1 #1463
Conversation
So far, the basic use case on a single device works:

```python
# run `litgpt download EleutherAI/pythia-160m` first
from litgpt.api import LLM

llm = LLM.load("EleutherAI/pythia-160m", device_type="cuda", devices=1)
text = llm.generate("What do Llamas eat?", top_k=1)
```

I am concerned about the multi-GPU support, though. I remember from a discussion with Luca that we want the option to load the model onto multiple devices. But how would I do that with Fabric if I don't want to train / use FSDP right away? I think the natural way would be a

```python
from litgpt.api import LLM

llm = LLM.load("EleutherAI/pythia-160m", device_type="cuda", devices=4)
llm.instruction_finetune(dataset, ...)
```

approach (not implemented yet, just thinking down the road). I am honestly a bit stuck. Would the only way be to load the model on a single device and then use multiple devices only when finetuning? Any ideas here, @awaelchli? (And how does the overall code look? It feels a bit ugly to me, but I hope it's not too terrible) |
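(For context, below is a minimal sketch of what multi-device placement with plain Lightning Fabric, without FSDP, might look like; the `Linear` module is just a stand-in for the real model, and how this would plug into `LLM.load` is exactly the open question above.)

```python
# Sketch only: launch Fabric across several GPUs and place a module on each process.
# The Linear module is a placeholder for the actual GPT checkpoint.
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cuda", devices=4)  # assumption: 4 local GPUs
fabric.launch()

with fabric.init_module():
    model = torch.nn.Linear(16, 16)  # stand-in for the real model

model = fabric.setup_module(model)  # moves the module to this process's device
```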
hi @rasbt, which method do we use here for streaming the response? |
@aniketmaurya I haven't added streaming to this v1, but I can add it if it's important; it should be relatively straightforward. |
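(For reference, a streaming `generate` would most naturally be exposed as a generator that yields chunks; the sketch below is hypothetical, and `generate_stream` is not an existing LitGPT method.)

```python
# Hypothetical shape of a streaming API; not part of this PR.
from typing import Iterator

def generate_stream(prompt: str, max_new_tokens: int = 50) -> Iterator[str]:
    """Yield decoded text chunks one at a time instead of returning the full string."""
    for i in range(max_new_tokens):
        yield f"<tok{i}>"  # placeholder for a real decode step

# A consumer (e.g., a LitServe endpoint) would iterate over the chunks:
for chunk in generate_stream("What do Llamas eat?"):
    print(chunk, end="")
```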
Yes, I think we would need that for the LitServe streaming example, and also if we want to serve an OpenAI-compatible API. |
I tried to add it, but it's getting a bit messy to do all that in a single PR: streaming requires more refactoring because of the way Python treats a method or function that has both a yield and a return. We can add streaming in a separate PR later once we have the basics working. |
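(To illustrate the yield/return issue mentioned above: in Python, any function whose body contains `yield` is compiled as a generator, so a single method cannot both return the full string and stream tokens without extra refactoring. A minimal illustration, not LitGPT code:)

```python
# Any `yield` in the body turns the whole function into a generator,
# so the `return` no longer hands the string directly back to the caller.
def generate(stream: bool = False):
    text = "full response"
    if stream:
        for ch in text:
            yield ch
    return text  # becomes the StopIteration value, not a regular return value

result = generate(stream=False)
print(type(result))  # <class 'generator'>, even with stream=False
```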
If you have some time, could you take a look at whether this v1 looks structurally OK, @lantiga @awaelchli? As mentioned at the top, more features will be added later. This is a simple v1 that focuses on the basics so as not to bloat the PR too much. |
Let's merge it so that it can be used by Aniket as an experimental feature from the main branch. This is not going to be advertised or recommended to people yet until it's a bit more mature and more functionality is added. If you get a chance sometime (maybe after all the conferences), your expert feedback and a second pair of eyes would be super appreciated. |
This is a PR to implement a subset of the LitGPT Python API as discussed in #1459 (CC @aniketmaurya). This subset focuses only on the inference aspects (not training and finetuning yet).
TODOs
- `generate` method
- `checkpoint_dir` to `model`
To get the v1 out for inference soon, and to lower the reviewer burden for a single PR, these features will be added in separate PRs: