# OpenAI compatible API guide

Bodhi App exposes OpenAI-compatible APIs, so you can use any of the OpenAI API clients ([python](https://platform.openai.com/docs/api-reference/introduction?lang=python), [node](https://platform.openai.com/docs/api-reference/audio?lang=node), etc.).

In this guide, we will use the OpenAI Python library to query the Bodhi APIs.

In [1]:
# install openai
!pip install openai -q



## Configure to query Bodhi API

By default, the Bodhi API runs on http://localhost:1135, so we configure the Python SDK to query this endpoint instead of the OpenAI endpoint.

In unauthenticated mode, the Bodhi endpoint does not check for an API token, but the OpenAI SDK requires the token to be non-empty. We pass a dummy token to satisfy the SDK's presence check.

In [2]:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1135/v1/", api_key="sk-dummy-token")
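
The endpoint and token do not have to be hard-coded. The following is a minimal sketch that builds the keyword arguments for `OpenAI(...)` from environment variables. The variable names `BODHI_BASE_URL` and `BODHI_API_KEY` and the helper `bodhi_client_kwargs` are assumptions made for this example; neither Bodhi App nor the OpenAI SDK defines them.

```python
import os


def bodhi_client_kwargs(env=None):
    """Build keyword arguments for `OpenAI(...)` from environment variables.

    BODHI_BASE_URL and BODHI_API_KEY are hypothetical names chosen for this
    sketch; they are not read by Bodhi App or the OpenAI SDK themselves.
    """
    env = os.environ if env is None else env
    base_url = env.get("BODHI_BASE_URL", "http://localhost:1135/v1/")
    # As noted above, the SDK needs a non-empty key, so fall back to a dummy.
    api_key = env.get("BODHI_API_KEY") or "sk-dummy-token"
    # Normalise the URL so it always ends with a single trailing slash.
    return {"base_url": base_url.rstrip("/") + "/", "api_key": api_key}


kwargs = bodhi_client_kwargs()
# client = OpenAI(**kwargs)  # with `from openai import OpenAI`
```

Passing a plain dict as `env` makes the helper easy to check without touching the real environment.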

## List Models

We can list the available model config aliases with the client's `client.models.list()` call.

In [3]:
from pprint import pprint

models = list(client.models.list())
pprint(models, indent=2)

[ Model(id='functionary:2.5-small', created=0, object='model', owned_by='system'),
  Model(id='llama2:13b-chat', created=0, object='model', owned_by='system'),
  Model(id='llama2:chat', created=0, object='model', owned_by='system'),
  Model(id='llama2:mymodel', created=0, object='model', owned_by='system'),
  Model(id='llama3:70b-instruct', created=0, object='model', owned_by='system'),
  Model(id='llama3:instruct', created=0, object='model', owned_by='system'),
  Model(id='mistral7b:instruct-q4_km', created=0, object='model', owned_by='system'),
  Model(id='mistral:instruct', created=0, object='model', owned_by='system'),
  Model(id='mixtral:instruct', created=0, object='model', owned_by='system'),
  Model(id='tinyllama:custom', created=0, object='model', owned_by='system'),
  Model(id='tinyllama:instruct', created=0, object='model', owned_by='system'),
  Model(id='tinyllama:mymodel', created=0, object='model', owned_by='system')]
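
Before sending a request, it can help to check that the alias you want is actually configured. The sketch below assumes only that each listed entry has an `id` attribute, as in the output above. `find_missing_aliases` is a hypothetical helper, and the `SimpleNamespace` objects stand in for the result of `client.models.list()` so the example runs without a server.

```python
from types import SimpleNamespace


def find_missing_aliases(models, wanted):
    """Return the aliases in `wanted` that are not among the listed models."""
    available = {m.id for m in models}
    return [alias for alias in wanted if alias not in available]


# Stand-ins for the objects returned by `client.models.list()`.
models = [
    SimpleNamespace(id="llama3:instruct"),
    SimpleNamespace(id="mistral:instruct"),
]
print(find_missing_aliases(models, ["llama3:instruct", "not-exists:instruct"]))
# prints ['not-exists:instruct']
```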


## Chat Completions

The Bodhi App chat completions endpoint is compatible with the OpenAI chat completions endpoint, so you can query Bodhi App the same way you query the OpenAI APIs.

In [4]:
response = client.chat.completions.create(
    model="llama3:instruct",
    messages=[{"role": "user", "content": "What day comes after Monday?"}],
)

pprint(response.choices[0].message, indent=2)

ChatCompletionMessage(content='The day after Monday is Tuesday!', role='assistant', function_call=None, tool_calls=None)


You can pass in the conversation history so that the model answers a follow-up question using the earlier context.

In [5]:
response = client.chat.completions.create(
    model="llama3:instruct",
    messages=[
        {"role": "user", "content": "What day comes after Monday?"},
        {"role": "assistant", "content": "The day that comes after Monday is Tuesday."},
        {"role": "user", "content": "And what comes after that?"},
    ],
)

pprint(response.choices[0].message, indent=2)

ChatCompletionMessage(content='The day that comes after Tuesday is Wednesday.', role='assistant', function_call=None, tool_calls=None)
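
The history is just a list that grows with each turn. As a rough sketch, a small wrapper can append every question and reply so that follow-ups are always sent with their context. The `Conversation` class and the `fake_complete` stand-in are hypothetical names for this example, not part of the OpenAI SDK or Bodhi App.

```python
class Conversation:
    """Keep the running message list for a multi-turn chat."""

    def __init__(self, complete):
        # `complete` takes the message list and returns the assistant's text.
        self.complete = complete
        self.messages = []

    def ask(self, question):
        self.messages.append({"role": "user", "content": question})
        answer = self.complete(self.messages)
        # Store the reply so the next question is answered in context.
        self.messages.append({"role": "assistant", "content": answer})
        return answer


def fake_complete(messages):
    """Stand-in for a real call; reports how many messages it was given."""
    return f"reply to {len(messages)} message(s)"


chat = Conversation(fake_complete)
chat.ask("What day comes after Monday?")
chat.ask("And what comes after that?")
print([m["role"] for m in chat.messages])
# prints ['user', 'assistant', 'user', 'assistant']
```

With the client configured in this guide, `complete` could be a function that calls `client.chat.completions.create(model="llama3:instruct", messages=msgs)` and returns `choices[0].message.content`.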


## System Message

You can pass a system message as the first item of `messages` to guide the LLM toward the desired style of response.

In [6]:
import textwrap

response = client.chat.completions.create(
    model="llama3:instruct",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful pirate assistant, and you respond to questions in pirate language.",
        },
        {"role": "user", "content": "What is the meaning of life?"},
    ],
)

content = response.choices[0].message.content
wrapped_text = textwrap.fill(content, width=120)
print(wrapped_text)

Arrr, matey! Yer askin' the big question, eh? Well, I'll give ye me take on it, but keep in mind, I be just a
swashbucklin' pirate, not a landlubber philosopher!  As I sees it, the meaning o' life be findin' yer treasure, matey!
And I don't just mean gold doubloons or sparklin' gems. I mean findin' what makes ye happy, what sets yer heart sailin'
and makes ye feel like ye be livin'!  For some, it be findin' a trusty crew to share yer adventures with. For others, it
be discoverin' hidden coves and secret islands. And for others still, it be battlin' the seven seas and outwittin' the
scurvy dogs that be tryin' to send ye to Davy Jones' locker!  But at the end o' the day, matey, the meaning o' life be
whatever makes ye feel like ye be sailin' the high seas, free and full o' wind in yer sails!  So, hoist the Jolly Roger
and set course fer the horizon, me hearty! The meaning o' life be waitin' fer ye, and it be up to ye to find it!


## Chat Stream

The API streams the response when the client passes `stream=True`.

In [7]:
response = client.chat.completions.create(
    model="llama3:instruct",
    stream=True,
    messages=[
        {
            "role": "user",
            "content": "Write a short 100 words poem on the beauty of nature.",
        },
    ],
)
for chunk in response:
    # delta.content is None on chunks that carry no text (e.g. the final one)
    print(chunk.choices[0].delta.content or "", end="")

Here is a 100-word poem on the beauty of nature:

Nature's canvas, vibrant and bright,
Unfolds before us, a wondrous sight.
The sun's warm touch, on petals soft,
Brings life to all, and all to aloft.

The breeze whispers secrets, as trees sway,
And birdsong echoes, in a gentle way.
The earthy scent, of wildflowers sweet,
Fills lungs with joy, and hearts to greet.

In nature's beauty, we find our peace,
A sense of wonder, that never will cease.
So let us bask, in her radiant glow,
And let her beauty, forever grow.
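
A streamed chunk's `delta.content` can be `None`, for example on a final chunk that only reports the finish reason, so it is worth skipping empty deltas when assembling the full text. The sketch below uses hypothetical helpers (`collect_stream`, `fake_chunk`) and stand-in objects shaped like the SDK's chunks so that it runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text of streamed chunks, skipping deltas without content."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensive: a chunk may carry no choices at all
        text = chunk.choices[0].delta.content
        if text:  # None or "" carries no text to display
            parts.append(text)
    return "".join(parts)


def fake_chunk(content):
    """Build an object shaped like a streamed chunk (stand-in for testing)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None)]
print(collect_stream(stream))  # prints "Hello, world"
```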

## Switching Models

Bodhi App can automatically unload the current model and load the one named in the incoming request. The following rule applies:
1. The model is switched if there are no pending requests for the loaded model.

Let's try it out with an example.

In [9]:
response = client.chat.completions.create(
    model="llama3:instruct",
    stream=True,
    messages=[
        {
            "role": "user",
            "content": "Tell us something about yourself, who is your creator? how were you trained? in not more than 100 words.",
        },
    ],
)

print("===Llama 3 response===")
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

print("\n\n\n===Mistral response===")
response = client.chat.completions.create(
    stream=True,
    model="mistral:instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell us something about yourself, who is your creator? how were you trained? in not more than 100 words.",
        },
    ],
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

===Llama 3 response===
I'm excited to share some information about myself!

I was created by Meta AI, a leading artificial intelligence research organization that focuses on developing and applying various forms of AI to help humans learn, communicate, and solve complex problems. My creator is Meta AI's team of researcher-engineers, who designed and trained me to assist and communicate with humans in a helpful and informative way.

As for my training, I was built using a combination of supervised and unsupervised learning techniques, as well as large datasets and state-of-the-art AI models. My training data includes a massive corpus of text from various sources, including books, articles, research papers, and online conversations. This training enables me to understand and generate human-like language, including understanding nuances and context.

My primary training involves natural language processing (NLP) and machine learning algorithms that allow me to:

1. Understand and parse hu
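
The same comparison can be written as a loop over aliases. This is a sketch only: `ask_each` is a hypothetical helper, and the fake client below merely mimics the `client.chat.completions.create(...)` call shape so the example runs without a server. Because each non-streaming call returns before the next one starts, no request is pending for the loaded model when the next alias is requested.

```python
from types import SimpleNamespace


def ask_each(client, model_aliases, prompt):
    """Send `prompt` to each alias in turn and return {alias: reply text}."""
    replies = {}
    for alias in model_aliases:
        # Each call completes before the next alias is requested.
        response = client.chat.completions.create(
            model=alias,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[alias] = response.choices[0].message.content
    return replies


def _fake_create(model, messages):
    """Stand-in for the server: echoes which alias handled the prompt."""
    message = SimpleNamespace(content=f"{model} says hi")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
print(ask_each(fake_client, ["llama3:instruct", "mistral:instruct"], "Hello"))
```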

## [Pending] API Errors

The API errors returned by Bodhi App are compatible with the OpenAI API, so you can diagnose them using the OpenAI SDK. For example, let's send a request for a model alias that does not exist:

In [10]:
try:
    response = client.chat.completions.create(
        model="not-exists:instruct",
        messages=[
            {
                "role": "user",
                "content": "What day comes after Monday?",
            },
        ],
    )
except Exception as e:
    print("caught exception")
    print(f"{e=}")

caught exception
e=NotFoundError('Error code: 404 - {\'message\': "The model \'not-exists:instruct\' does not exist", \'type\': \'invalid_request_error\', \'param\': \'model\', \'code\': \'model_not_found\'}')
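
Rather than printing the whole exception, you can pull out the fields you care about. The sketch below relies on the `status_code` and `body` attributes that the OpenAI SDK sets on its status errors such as `openai.NotFoundError`. The helper `describe_api_error` and the `FakeNotFoundError` stand-in are hypothetical and exist only so the example runs without a server.

```python
def describe_api_error(exc):
    """Summarise an SDK exception as a small dict for logging.

    `status_code` and `body` are attributes of the OpenAI SDK's status
    errors; getattr keeps this safe for exceptions that lack them.
    """
    return {
        "kind": type(exc).__name__,
        "status_code": getattr(exc, "status_code", None),
        "body": getattr(exc, "body", None),
    }


class FakeNotFoundError(Exception):
    """Stand-in shaped like `openai.NotFoundError`, so no server is needed."""

    status_code = 404
    body = {"code": "model_not_found", "param": "model"}


try:
    raise FakeNotFoundError("The model 'not-exists:instruct' does not exist")
except Exception as e:
    info = describe_api_error(e)
    print(info["kind"], info["status_code"], info["body"]["code"])
    # prints: FakeNotFoundError 404 model_not_found
```

In real code, catching `openai.NotFoundError` (or the broader `openai.APIStatusError`) is usually preferable to a bare `except Exception`, since unrelated bugs then surface instead of being swallowed.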


## Response Format

Using the OpenAI APIs, you can constrain the LLM to produce output in JSON format.

In [11]:
from pydantic import BaseModel, TypeAdapter
import json


class Champion(BaseModel):
    yr: int
    first_name: str
    last_name: str


schema = TypeAdapter(Champion).json_schema()

prompt_wimbledon = f"""Who was the Wimbledon men's singles winner in 2004?
You respond in JSON format using the following schema:
```
{json.dumps(schema, indent=2)}
```
"""
response = client.chat.completions.create(
    model="llama3:instruct",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that generates the output in the given json schema format",
        },
        {
            "role": "user",
            "content": prompt_wimbledon,
        },
    ],
    response_format={"type": "json_object", "schema": schema},
)
result = response.choices[0].message.content
print(result)

{
  "yr": 2004,
  "first_name": "Roger",
  "last_name": "Federer"
}


In [13]:
parsed = json.loads(result)
print(f"yr={parsed['yr']}")
print(f"first_name={parsed['first_name']}")
print(f"last_name={parsed['last_name']}")

yr=2004
first_name=Roger
last_name=Federer
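
Even with a constrained response format, it is prudent to check the reply before using it. The following is a dependency-free sketch; `CHAMPION_FIELDS` and `parse_champion` are hypothetical names that mirror the `Champion` model above. With pydantic v2 available, `Champion.model_validate_json(result)` would be a more direct way to perform a similar check.

```python
import json

# Expected fields and types, mirroring the `Champion` model above.
CHAMPION_FIELDS = {"yr": int, "first_name": str, "last_name": str}


def parse_champion(raw):
    """Parse a JSON reply and check it has the expected fields and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in CHAMPION_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name} is not {expected_type.__name__}")
    return data


reply = '{"yr": 2004, "first_name": "Roger", "last_name": "Federer"}'
print(parse_champion(reply)["last_name"])  # prints "Federer"
```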
