Added support for Triton chat completion using trtlllm generate endpo… #3895

giritatavarty-8451 · 2024-05-29T16:12:39Z

Added support for Triton trtllm chat completion endpoints. Chat completion is supported through:

- the generate endpoint, using the trtllm payload
- the infer endpoint, using the traditional Triton endpoint
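To make the difference between the two routes concrete, here is a minimal sketch of the request bodies each one might take. It is not the code from this PR. The helper names (`messages_to_prompt`, `build_generate_payload`, `build_infer_payload`) are hypothetical, and the tensor names `text_input` and `max_tokens` are assumptions that depend on how the model is configured in Triton.

```python
from typing import Any, Dict, List


def messages_to_prompt(messages: List[Dict[str, str]]) -> str:
    """Flatten chat messages into one prompt string (hypothetical helper)."""
    return "\n".join(m["content"] for m in messages)


def build_generate_payload(
    messages: List[Dict[str, str]], max_tokens: int = 100, temperature: float = 0.1
) -> Dict[str, Any]:
    """Body for POST /v2/models/<model>/generate.

    The generate route takes a flat JSON object whose keys map to the
    model's input tensors (names here are assumed, not confirmed).
    """
    return {
        "text_input": messages_to_prompt(messages),
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def build_infer_payload(
    messages: List[Dict[str, str]], max_tokens: int = 100
) -> Dict[str, Any]:
    """Body for POST /v2/models/<model>/infer.

    The infer route describes every input tensor explicitly: name, shape,
    datatype, and data.
    """
    return {
        "inputs": [
            {
                "name": "text_input",  # assumed tensor name
                "shape": [1, 1],
                "datatype": "BYTES",
                "data": [messages_to_prompt(messages)],
            },
            {
                "name": "max_tokens",  # assumed tensor name
                "shape": [1, 1],
                "datatype": "INT32",
                "data": [max_tokens],
            },
        ]
    }
```

Both helpers only build dictionaries, so they can be checked without a running Triton server before being posted to the matching route.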

Triton ChatCompletion Support

Internally tested on llama3 and a custom gpt2 model.

Relevant issues

Not using async completion yet. Reserved for future use

Type

🆕 New Feature

Changes

[REQUIRED] Testing - Attach a screenshot of any new tests passing local

If UI changes, send a screenshot/GIF of working UI fixes

Started the proxy on localhost and pointed it at an HTTP Triton endpoint.
Result: chat completion works as expected.

Request sent to the model set on the litellm proxy (`litellm --model`):

```python
import openai

# Assumption: the proxy listens on http://localhost:4000; adjust base_url and api_key to your setup.
client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")
response = client.chat.completions.create(model="llama3", messages=[
    {"role": "user",
     "content": """<|begin_of_text|><|start_header_id|>user<|end_header_id|> 
        Poem on meaning of life  of an AI engineer 
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""}
], max_tokens=100, temperature=0.1)
print(response.choices[0].message.content)
```

```text
Here's a poem on the meaning of life from the perspective of an AI engineer:

In silicon halls, I dwell,
A mind of code, a heart that swells.
I seek to understand, to learn and to grow,
But as I delve deeper, I begin to wonder, "What's the point of it all?"
```

…int and custom infer endpoint

vercel · 2024-05-29T16:12:43Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| litellm | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | May 29, 2024 4:13pm |

krrishdholakia · 2024-05-29T16:57:02Z

assigning to @ishaan-jaff who worked on the initial PR.

at first glance @giritatavarty-8451 i think you're missing running the test on

litellm/litellm/tests/test_embedding.py

Line 522 in c76deb8

async def test_triton_embeddings():

Run like:

cd litellm/litellm

pytest tests/test_embedding.py::test_triton_embeddings

ishaan-jaff

LGTM! I believe this should pass the triton embedding test too

ishaan-jaff · 2024-05-29T20:44:27Z

Hi @giritatavarty-8451 Reverted for the following reasons. Can you fix the issues and we're happy to merge this in again:

- Add streaming support for acompletion
- Fix the linting errors reported by `python3 -m mypy litellm/llms/triton.py --ignore-missing-imports`:

litellm/llms/triton.py:71: error: Item "StreamingChoices" of "Choices | StreamingChoices" has no attribute "message"  [union-attr]
litellm/llms/triton.py:191: error: Unsupported target for indexed assignment ("Collection[str]")  [index]
litellm/llms/triton.py:198: error: List item 0 has incompatible type "dict[str, Sequence[Any]]"; expected "str"  [list-item]
litellm/llms/triton.py:205: error: "Collection[str]" has no attribute "append"  [attr-defined]
litellm/llms/triton.py:209: error: "Collection[str]" has no attribute "append"  [attr-defined]
litellm/llms/triton.py:213: error: Incompatible types in assignment (expression has type "set[Any]", variable has type "dict[str, Collection[str]]")  [assignment]
litellm/llms/triton.py:228: error: "Session" has no attribute "timeout"  [attr-defined]
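As a rough illustration of how errors of this kind are commonly resolved, the sketch below narrows a `Choices | StreamingChoices` union with `isinstance` before reading `.message`, and gives the payload an explicit `Dict[str, Any]` annotation so mypy does not infer a narrow value type from the literal. The classes are stand-ins written for this example, not litellm's own, and the function names are hypothetical; the real fix in `triton.py` may look different.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class Message:
    content: str


@dataclass
class Choices:  # stand-in for the non-streaming choice type
    message: Message


@dataclass
class StreamingChoices:  # stand-in for the streaming choice type
    delta: str = ""


def first_message_content(choices: List[Union[Choices, StreamingChoices]]) -> str:
    """Read `.message` only after narrowing the union (addresses [union-attr])."""
    first = choices[0]
    if isinstance(first, Choices):
        return first.message.content
    raise TypeError("expected a non-streaming choice")


def build_payload(prompt: str) -> Dict[str, Any]:
    """Annotate the dict explicitly so indexed assignment and append type-check."""
    payload: Dict[str, Any] = {"inputs": []}
    payload["inputs"].append({"name": "text_input", "data": [prompt]})
    payload["parameters"] = {"stream": False}  # a dict literal, not a set literal
    return payload
```

The `"Session" has no attribute "timeout"` error is a separate matter: `requests.Session` has no `timeout` attribute, and the timeout is normally passed per call, for example `session.post(url, json=payload, timeout=30)`.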

Added support for Triton chat completion using trtlllm generate endpo…

a58dc68

…int and custom infer endpoint

vercel bot deployed to Preview May 29, 2024 16:13 View deployment

krrishdholakia requested a review from ishaan-jaff May 29, 2024 16:55

krrishdholakia assigned ishaan-jaff May 29, 2024

ishaan-jaff approved these changes May 29, 2024

ishaan-jaff merged commit e8c1e87 into BerriAI:main May 29, 2024
2 checks passed

ishaan-jaff mentioned this pull request May 29, 2024

Revert "Added support for Triton chat completion using trtlllm generate endpo…" #3900

Merged

giritatavarty-8451 mentioned this pull request May 31, 2024

Litellm triton chatcompletion support - Resubmit of #3895 #3905

Open
