## Using LLMs via Hugging Face Inference Client

Thankfully, Hugging Face has made its new [__Inference Client__](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client) free to use, with basic rate limits in place so that nobody can make unlimited requests against its servers.

The best part is that you can access 150,000+ deep learning models without worrying about your own infrastructure, much as with the Inference API.

In [1]:
import huggingface_hub
print(f"huggingface_hub version: {huggingface_hub.__version__}")
# Version should be >= 0.36.0 for Inference Providers to work
from huggingface_hub import InferenceClient


Refer to the [documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient) whenever you need more detail on function names, arguments, and return types.

In [2]:
from dotenv import load_dotenv
import os

load_dotenv()
hf_key = os.getenv("HF_TOKEN")
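
`os.getenv` returns `None` when the variable is missing, so a forgotten `.env` entry may only show up later, when a request is made. A minimal sketch of an early check, assuming the variable is named `HF_TOKEN` as in the cell above; `require_token` is a hypothetical helper written for illustration and is not part of `huggingface_hub`:

```python
import os


def require_token(var_name: str = "HF_TOKEN") -> str:
    """Return the token stored in the environment variable, or raise if absent.

    Hypothetical helper: the name and error message are illustrative only.
    """
    token = os.getenv(var_name)
    if not token:
        # Covers both a missing variable (None) and an empty string.
        raise RuntimeError(
            f"{var_name} is not set; add it to your .env file or export it in the shell."
        )
    return token
```

Calling `hf_key = require_token()` right after `load_dotenv()` would then stop the notebook at this cell instead of at the first API call.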

In [3]:
# Using a model available via HuggingFace Inference Providers
# Note: Only models with "warm" inference status work with the free API
model_name = "meta-llama/Llama-3.1-8B-Instruct"
client = InferenceClient(token=hf_key)

chat = [
    { "role": "user", "content": "Explain what is Generative AI in 2 bullet points" },
]

response = client.chat_completion(chat, model=model_name, max_tokens=1000)
print(response)

ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content='Here are 2 bullet points explaining what Generative AI is:\n\n• **Definition**: Generative AI refers to a type of artificial intelligence that can create new, original content such as images, music, text, or videos using algorithms and machine learning models. These models are trained on large datasets and can learn patterns, styles, and structures to generate new content that is often indistinguishable from human-created work.\n\n• **Applications**: Generative AI has numerous applications across various industries, including art and design, music and audio production, writing and content creation, and even product design. Some examples of generative AI include generating realistic images of people, creating new music tracks, or producing automated content such as news articles or social media posts.', reasoning=None, tool_call_id=None

In [4]:
# Extract the message content from the ChatCompletion response
print(response.choices[0].message.content)

Here are 2 bullet points explaining what Generative AI is:

• **Definition**: Generative AI refers to a type of artificial intelligence that can create new, original content such as images, music, text, or videos using algorithms and machine learning models. These models are trained on large datasets and can learn patterns, styles, and structures to generate new content that is often indistinguishable from human-created work.

• **Applications**: Generative AI has numerous applications across various industries, including art and design, music and audio production, writing and content creation, and even product design. Some examples of generative AI include generating realistic images of people, creating new music tracks, or producing automated content such as news articles or social media posts.
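
The attribute path `response.choices[0].message.content` used above can be wrapped in a small function so that later cells do not repeat it. The sketch below is only an illustration: `first_reply` is a hypothetical helper, and `fake_response` is a stand-in built with `SimpleNamespace` that mimics the attribute layout of the printed response so the example runs without a network call or token.

```python
from types import SimpleNamespace


def first_reply(response) -> str:
    """Return the text of the first choice, or an empty string if there is none."""
    if not response.choices:
        return ""
    content = response.choices[0].message.content
    # Content may be None in some responses; normalise to an empty string.
    return content or ""


# Stand-in object with the same attribute layout as the response printed above.
# It is NOT a real ChatCompletionOutput, just enough structure for the demo.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            index=0,
            message=SimpleNamespace(role="assistant", content="Hello!"),
        )
    ]
)

print(first_reply(fake_response))
```

With a real client, the same helper would be called as `first_reply(response)` on the object returned by `client.chat_completion(...)`.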
