
Add support for dedicated and serverless inference endpoints via inference API #238

Merged: 15 commits into argilla-io:main on Jan 14, 2024

Conversation

philschmid (Contributor) commented on Jan 11, 2024

What does this PR do?

This PR currently adds a dirty implementation of how we could support dedicated Inference Endpoints and serverless Inference Endpoints via the Inference API.

On init we check whether the provided "endpoint_name_or_model_id" (happy to revert to "endpoint_name") is available serverless, using the list_deployed_models method.
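
For illustration, here is a minimal sketch of how such an init-time check could look. This is not the PR's actual code: the helper name resolve_client is hypothetical, and the sketch assumes huggingface_hub's InferenceClient.list_deployed_models and get_inference_endpoint.

from huggingface_hub import InferenceClient, get_inference_endpoint

def resolve_client(endpoint_name_or_model_id: str, token: str) -> InferenceClient:
    # Hypothetical helper, not the PR's code: decide whether the given ID is
    # deployed serverless on the Inference API, falling back to a dedicated
    # Inference Endpoint of the same name.
    deployed = InferenceClient(token=token).list_deployed_models()
    # list_deployed_models returns a mapping of task -> deployed model IDs
    if any(endpoint_name_or_model_id in models for models in deployed.values()):
        # Serverless: the model is already deployed on the Inference API
        return InferenceClient(model=endpoint_name_or_model_id, token=token)
    # Otherwise, treat the value as the name of a dedicated Inference Endpoint
    endpoint = get_inference_endpoint(endpoint_name_or_model_id, token=token)
    return endpoint.client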

Example:

import os

from distilabel.llm import InferenceEndpointsLLM
from distilabel.tasks import TextGenerationTask

token = os.getenv("HF_TOKEN")  # hf_...

llm = InferenceEndpointsLLM(
    "openchat/openchat-3.5-0106",  # serverless model ID or dedicated endpoint name
    token=token,
    task=TextGenerationTask(),
    max_new_tokens=512,
)

result = llm.generate([{"input": "What are critique LLMs?"}])
result

Note: I haven't worked on the docs yet.

philschmid changed the title from "dirty support for inference API" to "Dirty support for inference API" on Jan 11, 2024
Wauplin (Contributor) left a comment


Made a quick review. Usage of InferenceClient/InferenceEndpoint looks good to me!

Four review threads on src/distilabel/llm/huggingface/inference_endpoints.py (outdated; resolved)
philschmid and others added 4 commits on January 11, 2024 at 14:50, each co-authored by Lucain <lucainp@gmail.com>
plaguss (Contributor) commented on Jan 11, 2024

Thank you very much @philschmid for your contribution and @Wauplin for the fast review 😄. Looks good; let me take a look at the docs, they shouldn't need many updates.

plaguss changed the title from "Dirty support for inference API" to "Add support for dedicated and serverless inference endpoints via inference API" on Jan 11, 2024
plaguss (Contributor) commented on Jan 11, 2024

Could you please update the new variable name in this example and in the docs in this snippet? Also, if you could install pre-commit, the errors from the tests should be resolved.
Note: I updated the PR name.

ignacioct (Contributor) left a comment


Some minor remarks on the code, but it looks really nice! I'll test the functionality in a bit and get back with more feedback.

ignacioct (Contributor) left a comment


Code's working on my side

plaguss merged commit 78147a7 into argilla-io:main on Jan 14, 2024 (4 checks passed)
davidberenstein1957 added this to the 0.4.0 milestone on Jan 17, 2024
Labels: none yet
Projects: none yet
Linked issues: none yet
5 participants