
Add support for dedicated and serverless inference endpoints via inference API #238

Merged: 15 commits into argilla-io:main on Jan 14, 2024

Conversation

philschmid (Contributor) commented on Jan 11, 2024

What does this PR do?

This PR currently adds a dirty implementation of how we could support dedicated Inference Endpoints and serverless Inference Endpoints via the Inference API.

On init we check whether the provided "endpoint_name_or_model_id" (happy to revert to "endpoint_name") is available serverless, using the list_deployed_models method.
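
For illustration, here is a minimal sketch of how such an init-time check could look. This is not the PR's actual code: the helper name resolve_client is hypothetical, and the sketch assumes huggingface_hub's InferenceClient.list_deployed_models and get_inference_endpoint.

from huggingface_hub import InferenceClient, get_inference_endpoint

def resolve_client(endpoint_name_or_model_id: str, token: str) -> InferenceClient:
    # Hypothetical helper, not the PR's code: decide whether the given ID is
    # deployed serverless on the Inference API, falling back to a dedicated
    # Inference Endpoint of the same name.
    deployed = InferenceClient(token=token).list_deployed_models()
    # list_deployed_models returns a mapping of task -> deployed model IDs
    if any(endpoint_name_or_model_id in models for models in deployed.values()):
        # Serverless: the model is already deployed on the Inference API
        return InferenceClient(model=endpoint_name_or_model_id, token=token)
    # Otherwise, treat the value as the name of a dedicated Inference Endpoint
    endpoint = get_inference_endpoint(endpoint_name_or_model_id, token=token)
    return endpoint.client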

Example:

import os

from distilabel.llm import InferenceEndpointsLLM
from distilabel.tasks import TextGenerationTask

token = os.getenv("HF_TOKEN")  # hf_...

llm = InferenceEndpointsLLM(
    "openchat/openchat-3.5-0106",  # serverless model ID or dedicated endpoint name
    token=token,
    task=TextGenerationTask(),
    max_new_tokens=512,
)

result = llm.generate([{"input": "What are critique LLMs?"}])
result

Note: I haven't worked on the docs yet.

philschmid changed the title from "dirty support for inference API" to "Dirty support for inference API" on Jan 11, 2024
Wauplin (Contributor) left a comment


Made a quick review. Usage of InferenceClient/InferenceEndpoint looks good to me!

Four review threads on src/distilabel/llm/huggingface/inference_endpoints.py (outdated; resolved)
philschmid and others added 4 commits on January 11, 2024 at 14:50, each co-authored by Lucain <lucainp@gmail.com>
plaguss (Contributor) commented on Jan 11, 2024

Thank you very much @philschmid for your contribution and @Wauplin for the fast review 😄. Looks good; let me take a look at the docs, they shouldn't need many updates.

plaguss changed the title from "Dirty support for inference API" to "Add support for dedicated and serverless inference endpoints via inference API" on Jan 11, 2024
plaguss (Contributor) commented on Jan 11, 2024

Could you please update the new variable name in this example and in the docs in this snippet? Also, if you could install pre-commit, the errors from the tests should be resolved.
Note: I updated the PR name.

ignacioct (Contributor) left a comment


Some minor remarks on the code, but it looks really nice! I'll test the functionality in a bit and get back with more feedback.

ignacioct (Contributor) left a comment


Code's working on my side

plaguss merged commit 78147a7 into argilla-io:main on Jan 14, 2024 (4 checks passed)
davidberenstein1957 added this to the 0.4.0 milestone on Jan 17, 2024
Labels: none yet
Projects: none yet
Linked issues: none yet
5 participants