LitGPT Python API draft #1459

Merged
merged 5 commits into main from python-api on Jun 7, 2024

Conversation

rasbt
Copy link
Collaborator

@rasbt rasbt commented Jun 4, 2024

This is a first draft for the Python API.

In LitServe, the usage could then look as follows:


#######################
# LitServe example
#######################

# litgpt_server.py
import litserve as ls
from litgpt import LLM

class LitGPTAPI(ls.LitAPI):
    def setup(self, device):
        # Store the model on `self` so the other hooks can access it
        self.llm = LLM.load(path="meta-llama/llama3", hub="local", device="cuda", ...)

    def decode_request(self, request):
        # llm.generate() already encodes the prompt before passing it to the LLM. So no action needed.
        return request["prompt"]

    def predict(self, prompt):
        text = self.llm.generate(prompt, max_tokens=100, temperature=1.0, top_k=25)
        return text

    def encode_response(self, output):
        # llm.generate() already decodes the generated token IDs, so `output` is a plain string.
        return {"response": output}

if __name__ == "__main__":
    api = LitGPTAPI()
    server = ls.LitServer(api, accelerator="gpu", timeout=1000, workers_per_device=2)
    server.run(port=8000)
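
For context, a client could then query this server as follows. This is a minimal sketch; it assumes the server above is running locally and uses LitServe's default /predict endpoint:

# litgpt_client.py (hypothetical client sketch)
import requests

# POST the prompt to LitServe's default /predict endpoint; the JSON keys
# mirror what decode_request() and encode_response() expect above.
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "What do Llamas eat?"},
)
print(response.json()["response"])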

CC @lantiga @aniketmaurya

@rasbt rasbt marked this pull request as draft June 4, 2024 19:16
@aniketmaurya
Copy link
Contributor

Thanks @rasbt. This is really clean!!

@lantiga
Copy link
Contributor

lantiga commented Jun 4, 2024

This looks great @rasbt

One question related to conversations and statefulness: right now the API will take a prompt and generate a formatted prompt from it internally, and then come back with the output.

However, you may also come in with a multi-turn conversation, and you’ll have to format it properly. So we either add support for a conversation (i.e., the input type is str | dict[…]), or we provide control over the KV cache, or both (the latter can happen later).

If we allow the input to be a dict of turns, we’ll also need to expand the prompt formatting accordingly (which we know we need to do anyway).
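
To make that concrete, here is a minimal sketch of what a conversation-style input and its flattening into a formatted prompt could look like; the turn schema and the format_turns helper are assumptions for illustration, not existing LitGPT API:

# Hypothetical multi-turn input; the schema is an assumption, not existing LitGPT API.
conversation = [
    {"role": "user", "content": "What do Llamas eat?"},
    {"role": "assistant", "content": "Llamas mainly graze on grasses and hay."},
    {"role": "user", "content": "How much do they eat per day?"},
]

def format_turns(turns):
    # Naive stand-in for the expanded prompt formatting: flatten all
    # turns into a single formatted prompt string for the model.
    return "\n".join(f"{t['role']}: {t['content']}" for t in turns) + "\nassistant:"

# llm.generate() would then accept either a plain string or a
# conversation like the one above, along the lines of the
# str | dict[…] union mentioned above.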

@rasbt
Copy link
Collaborator Author

rasbt commented Jun 4, 2024

Fair point.

I guess we can handle it in two ways:

Option 1

# First interaction, returns a response dict;
# handles state, KV cache, etc. appropriately if `multiturn=True`
response_1 = llm.generate(
    prompt="What do Llamas eat?",
    response=None,   # New
    multiturn=True,  # New
    temperature=0.1,
    top_p=0.8,
    ...
)

# Pass the previous response back in, together with the follow-up prompt:
response_2 = llm.generate(
    prompt="How much do they eat per day?",
    response=response_1,  # New
    multiturn=True,       # New
    temperature=0.1,
    top_p=0.8,
    ...
)

Option 2

We reserve generate for just the simple single-prompt case, and chat is always multi-turn by default:

# internally tracks state as `llm.chat_history`
response = llm.chat(
    prompt="What do Llamas eat?",
    temperature=0.1,
    top_p=0.8,
    ...
)

What do you think @aniketmaurya @lantiga @awaelchli ?

@lantiga
Copy link
Contributor

lantiga commented Jun 4, 2024

I think the multi-turn input should always be a dict with all turns.

We shouldn’t make the API stateful (i.e., assuming the KV cache is there), or we won’t be able to orchestrate multiple conversations in the future.
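
In such a stateless design, each call carries the full conversation, so the server can interleave any number of conversations without holding a per-user KV cache between calls. A hypothetical sketch (this generate signature is an assumption, not existing API):

# The caller resends all turns on every call, so no conversation
# state lives inside the LLM object between calls. (Hypothetical API.)
turns = [{"role": "user", "content": "What do Llamas eat?"}]
answer_1 = llm.generate(turns, temperature=0.1, top_p=0.8)

turns.append({"role": "assistant", "content": answer_1})
turns.append({"role": "user", "content": "And how much water do they need?"})
answer_2 = llm.generate(turns, temperature=0.1, top_p=0.8)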

@Andrei-Aksionov
Copy link
Collaborator

Wow, that's actually happening. Thanks Sebastian 👍

@lantiga I don't understand what orchestrating multiple conversations means.

tutorials/python-api.md (5 review comments, resolved)
rasbt and others added 2 commits June 5, 2024 08:28
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
@rasbt
Copy link
Collaborator Author

rasbt commented Jun 5, 2024

@Andrei-Aksionov

@lantiga I don't understand what is an orchestration of multiple conversations.

I think this means having multiple conversations going at once. For example, think of a case where you are hosting the model via LitServe and multiple people are interacting with it (similar to how multiple people use ChatGPT). The interaction of one user shouldn't influence the interaction of another user.
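
One way to picture the orchestration: the serving layer keys each conversation by a session ID and only ever formats that session's turns, so histories never mix. A hypothetical sketch; none of this session handling is part of this PR:

# Hypothetical per-session bookkeeping on the serving side.
conversations: dict[str, list[dict]] = {}  # session_id -> list of turns

def handle_request(session_id: str, user_message: str) -> str:
    turns = conversations.setdefault(session_id, [])
    turns.append({"role": "user", "content": user_message})
    reply = llm.generate(turns)  # stateless call over this session's turns only
    turns.append({"role": "assistant", "content": reply})
    return reply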

@Andrei-Aksionov
Copy link
Collaborator

Thanks for the explanation.
I haven't tried ChatGPT (it's blocked here); I've only seen how it works in a video 😂

@rasbt rasbt mentioned this pull request Jun 5, 2024
@rasbt rasbt marked this pull request as ready for review June 7, 2024 19:57
@rasbt
Copy link
Collaborator Author

rasbt commented Jun 7, 2024

Merging this documentation here so that we don't have too many open PRs sitting around for a long time. This will then be implemented in two steps: V1 for the version @aniketmaurya needs ASAP, and then adding the training functionality.

@rasbt rasbt merged commit 0bb34ab into main Jun 7, 2024
9 checks passed
@rasbt rasbt deleted the python-api branch June 7, 2024 20:13