# From OpenAI to Open LLMs with Messages API

In [1]:
!pip install --upgrade -q huggingface_hub


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m23.3.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Create an Inference Endpoint using `huggingface_hub`

The `huggingface_hub` Python library allows you to programmatically create and manage Inference Endpoints with just a few steps. Here, we'll use it to deploy the powerful [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as an endpoint running on [Text Generation Inference](https://huggingface.co/docs/text-generation-inference/index), our high-performance inference solution for serving LLMs in production.

We need to specify the endpoint name and model repository for the text-generation task. Setting the endpoint type to `protected` means a valid Hugging Face token is required to access the deployed API. We also need to configure the hardware requirements, such as vendor, region, accelerator, instance type, and size. You can check out the list of available resources [here](https://api.endpoints.huggingface.cloud/#get-/v2/provider).

In [30]:
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "mixtral-8x7b-instruct-v0-1-demo",
    repository="mistralai/Mixtral-8x7B-Instruct-v0.1",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_type="p4de",
    instance_size="2xlarge",
    namespace="HF-test-lab",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_INPUT_LENGTH": "1024",
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_TOTAL_TOKENS": "32000",
            "MAX_BATCH_TOTAL_TOKENS": "1024000",
            "MODEL_ID": "/repository",
        },
        # "url": "ghcr.io/huggingface/text-generation-inference:1.4.0",  # must be >= 1.4.0
        "url": "ghcr.io/huggingface/text-generation-inference:sha-ee1cf51",
    },
)

endpoint.wait()
print(endpoint.status)

running


It will take a few minutes for our deployment to spin up. We can use the `.wait()` utility to block the running thread until the endpoint reaches a final "running" state. Once it's running, we can run a quick check to confirm everything is working as expected.

In [None]:
endpoint.client.text_generation(
    "<s>[INST] Why is open-source so important? [/INST]",
    max_new_tokens=100,
    do_sample=True,
)

Great, we now have a working deployment! But notice how we needed to carefully format the prompt according to the model's instruction format? While our [chat templates](https://huggingface.co/docs/transformers/chat_templating) handle all of this nuance, the new Messages API makes things even simpler...
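To make the formatting burden concrete, here is a minimal sketch of what building that prompt by hand looks like once a conversation has more than one turn. The `format_mixtral_prompt` helper is hypothetical and only approximates the alternating `[INST] ... [/INST]` layout used in the quick check above; the tokenizer's own chat template remains the source of truth.

```python
def format_mixtral_prompt(messages, bos_token="<s>", eos_token="</s>"):
    """Approximate the Mixtral instruct layout for alternating user/assistant turns.

    Assumption: roles strictly alternate, starting with "user".
    """
    prompt = bos_token
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("Roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            # User turns are wrapped in instruction markers
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            # Assistant turns are appended verbatim and closed with the EOS token
            prompt += message["content"] + eos_token
    return prompt


prompt = format_mixtral_prompt(
    [{"role": "user", "content": "Why is open-source so important?"}]
)
print(prompt)
```

With the Messages API, this templating happens on the server, so client code only passes the list of messages.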

## Using the Messages API via the OpenAI SDK

The added support for messages in TGI makes Inference Endpoints directly compatible with the OpenAI Chat Completion API. This means that any existing script that uses OpenAI models via the OpenAI client libraries can be pointed at any open-source LLM running on a TGI endpoint instead!

The example below shows how to make this transition and stream responses from our Inference Endpoint. Simply replace the `base_url` with your endpoint URL (be sure to include the `v1/` suffix) and populate the `api_key` field with a valid Hugging Face user token.
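Because a missing or doubled slash in the URL is an easy mistake to make, the suffix rule can be captured in a small helper before the client is created. This is only a sketch: `to_openai_base_url` is a hypothetical name used for illustration and is not part of `huggingface_hub` or the OpenAI SDK.

```python
def to_openai_base_url(endpoint_url: str) -> str:
    """Return the endpoint URL with exactly one trailing `/v1/` suffix."""
    trimmed = endpoint_url.rstrip("/")
    if trimmed.endswith("/v1"):
        # Suffix already present, only restore the trailing slash
        return trimmed + "/"
    return trimmed + "/v1/"


base_url = to_openai_base_url(
    "https://ey1416en78lct0cg.us-east-1.aws.endpoints.huggingface.cloud/"
)
print(base_url)
```

The resulting string can be passed as `base_url` when constructing the client.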

In [2]:
from openai import OpenAI

# init the client but point it to TGI
client = OpenAI(
    base_url="https://ey1416en78lct0cg.us-east-1.aws.endpoints.huggingface.cloud/"
    + "v1/",
    api_key="",  # fill in a valid Hugging Face user token
)
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500,
)

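# iterate over the stream and print each chunk as it arrives
for message in chat_completion:
    # delta.content can be None on some chunks, so fall back to an empty string
    print(message.choices[0].delta.content or "", end="")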

 Open-source software (OSS) is important for several reasons:

1. Cost-effective: OSS is typically free to use, which can help organizations save money on software licensing fees.
2. Flexibility and customization: Because the source code is openly available, users can modify and customize OSS to meet their specific needs.
3. Community-driven development: OSS is often developed and maintained by a community of developers, which can lead to faster innovation and bug fixes compared to proprietary software.
4. Transparency: The open-source nature of the software allows users to see exactly how it works, which can lead to increased trust and security.
5. Interoperability: OSS is often designed to be compatible with a wide range of platforms and systems, which can make it easier to integrate with existing infrastructure.
6. Encourage Innovation: Open-source software allows for a wider range of people to contribute to the development and improvement of the software which can lead to more inno