LitGPT Python API draft #1459

Merged
merged 5 commits into main from python-api on Jun 7, 2024

Conversation

rasbt
Copy link
Collaborator

@rasbt rasbt commented Jun 4, 2024

This is a first draft for the Python API.

In LitServe, the usage could then look as follows:


#######################
# LitServe example
#######################

# litgpt_server.py
import litserve as ls
from litgpt import LLM

class LitGPTAPI(ls.LitAPI):
    def setup(self, device):
        # Store the model on `self` so the other hooks can access it
        self.llm = LLM.load(path="meta-llama/llama3", hub="local", device="cuda", ...)

    def decode_request(self, request):
        # llm.generate() already encodes the prompt before passing it to the LLM. So no action needed.
        return request["prompt"]

    def predict(self, prompt):
        text = self.llm.generate(prompt, max_tokens=100, temperature=1.0, top_k=25)
        return text

    def encode_response(self, output):
        # llm.generate() already decodes the generated token IDs, so `output` is a plain string.
        return {"response": output}

if __name__ == "__main__":
    api = LitGPTAPI()
    server = ls.LitServer(api, accelerator="gpu", timeout=1000, workers_per_device=2)
    server.run(port=8000)
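
For context, a client could then query this server as follows. This is a minimal sketch; it assumes the server above is running locally and uses LitServe's default /predict endpoint:

# litgpt_client.py (hypothetical client sketch)
import requests

# POST the prompt to LitServe's default /predict endpoint; the JSON keys
# mirror what decode_request() and encode_response() expect above.
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "What do Llamas eat?"},
)
print(response.json()["response"])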

CC @lantiga @aniketmaurya

@rasbt rasbt marked this pull request as draft June 4, 2024 19:16
@aniketmaurya
Copy link
Contributor

Thanks @rasbt. This is really clean!!

@lantiga
Copy link
Contributor

lantiga commented Jun 4, 2024

This looks great @rasbt

One question related to conversations and statefulness: right now the API will take a prompt and generate a formatted prompt from it internally, and then come back with the output.

However, you may also come in with a multi-turn conversation, and you’ll have to format it properly. So we either add support for a conversation (i.e., the input type is str | dict[…]), or we provide control over the KV cache, or both (the latter can happen later).

If we allow the input to be a dict of turns, we’ll also need to expand the prompt formatting accordingly (which we know we need to do anyway).
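
To make that concrete, here is a minimal sketch of what a conversation-style input and its flattening into a formatted prompt could look like; the turn schema and the format_turns helper are assumptions for illustration, not existing LitGPT API:

# Hypothetical multi-turn input; the schema is an assumption, not existing LitGPT API.
conversation = [
    {"role": "user", "content": "What do Llamas eat?"},
    {"role": "assistant", "content": "Llamas mainly graze on grasses and hay."},
    {"role": "user", "content": "How much do they eat per day?"},
]

def format_turns(turns):
    # Naive stand-in for the expanded prompt formatting: flatten all
    # turns into a single formatted prompt string for the model.
    return "\n".join(f"{t['role']}: {t['content']}" for t in turns) + "\nassistant:"

# llm.generate() would then accept either a plain string or a
# conversation like the one above, along the lines of the
# str | dict[…] union mentioned above.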

@rasbt
Copy link
Collaborator Author

rasbt commented Jun 4, 2024

Fair point.

I guess we can handle it in two ways:

Option 1

# First interaction, returns a response dict;
# handles state, KV cache, etc. appropriately if `multiturn=True`
response_1 = llm.generate(
    prompt="What do Llamas eat?",
    response=None,   # New
    multiturn=True,  # New
    temperature=0.1,
    top_p=0.8,
    ...
)

# Pass the previous response back in, together with the follow-up prompt:
response_2 = llm.generate(
    prompt="How much do they eat per day?",
    response=response_1,  # New
    multiturn=True,       # New
    temperature=0.1,
    top_p=0.8,
    ...
)

Option 2

We reserve generate for just the simple single-prompt case, and chat is always multi-turn by default:

# internally tracks state as `llm.chat_history`
response = llm.chat(
    prompt="What do Llamas eat?",
    temperature=0.1,
    top_p=0.8,
    ...
)

What do you think @aniketmaurya @lantiga @awaelchli ?

@lantiga
Copy link
Contributor

lantiga commented Jun 4, 2024

I think the multi-turn input should always be a dict with all turns.

We shouldn’t make the API stateful (i.e., assuming the KV cache is there), or we won’t be able to orchestrate multiple conversations in the future.
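
In such a stateless design, each call carries the full conversation, so the server can interleave any number of conversations without holding a per-user KV cache between calls. A hypothetical sketch (this generate signature is an assumption, not existing API):

# The caller resends all turns on every call, so no conversation
# state lives inside the LLM object between calls. (Hypothetical API.)
turns = [{"role": "user", "content": "What do Llamas eat?"}]
answer_1 = llm.generate(turns, temperature=0.1, top_p=0.8)

turns.append({"role": "assistant", "content": answer_1})
turns.append({"role": "user", "content": "And how much water do they need?"})
answer_2 = llm.generate(turns, temperature=0.1, top_p=0.8)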

@Andrei-Aksionov
Copy link
Collaborator

Wow, that's actually happening. Thanks Sebastian 👍

@lantiga I don't understand what orchestrating multiple conversations means.

tutorials/python-api.md (5 review comments, resolved)
rasbt and others added 2 commits June 5, 2024 08:28
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
@rasbt
Copy link
Collaborator Author

rasbt commented Jun 5, 2024

@Andrei-Aksionov

@lantiga I don't understand what is an orchestration of multiple conversations.

I think this means having multiple conversations going at once. For example, think of a case where you are hosting the model via LitServe and multiple people are interacting with it (similar to how multiple people use ChatGPT). The interaction of one user shouldn't influence the interaction of another user.
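
One way to picture the orchestration: the serving layer keys each conversation by a session ID and only ever formats that session's turns, so histories never mix. A hypothetical sketch; none of this session handling is part of this PR:

# Hypothetical per-session bookkeeping on the serving side.
conversations: dict[str, list[dict]] = {}  # session_id -> list of turns

def handle_request(session_id: str, user_message: str) -> str:
    turns = conversations.setdefault(session_id, [])
    turns.append({"role": "user", "content": user_message})
    reply = llm.generate(turns)  # stateless call over this session's turns only
    turns.append({"role": "assistant", "content": reply})
    return reply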

@Andrei-Aksionov
Copy link
Collaborator

Thanks for the explanation.
I haven't tried ChatGPT (it's blocked here); I've only seen how it works in a video 😂

@rasbt rasbt mentioned this pull request Jun 5, 2024
@rasbt rasbt marked this pull request as ready for review June 7, 2024 19:57
@rasbt
Copy link
Collaborator Author

rasbt commented Jun 7, 2024

Merging this documentation here so that we don't have too many open PRs sitting around for a long time. This will then be implemented in two steps: V1 for the version @aniketmaurya needs ASAP, and then adding the training functionality.

@rasbt rasbt merged commit 0bb34ab into main Jun 7, 2024
9 checks passed
@rasbt rasbt deleted the python-api branch June 7, 2024 20:13