# OpenAI compatible API guide

The Bodhi App exposes OpenAI-compatible APIs, so you can use any of the OpenAI API clients ([python](https://platform.openai.com/docs/api-reference/introduction?lang=python), [node](https://platform.openai.com/docs/api-reference/audio?lang=node), etc.).

In this guide, we use the OpenAI Python library to query the Bodhi APIs.

In [2]:
# install openai
!pip install openai -q

## Configure to query Bodhi API

By default, the Bodhi API runs on http://localhost:1135, so we configure the Python SDK to query this endpoint instead of the OpenAI endpoint.

Also, in unauthenticated mode, the Bodhi endpoint does not check for an API token, but the OpenAI SDK requires the key to be non-empty. We pass a dummy token to satisfy the SDK's presence check.

In [3]:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1135/v1/", api_key="sk-dummy-token")
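If the server runs on a different host or port, the base URL can be derived rather than hard-coded. The following is a minimal sketch, not part of the Bodhi App or the OpenAI SDK: the helper name `bodhi_base_url` and the environment variable `BODHI_BASE_URL` are hypothetical names chosen for illustration.

```python
import os


def bodhi_base_url(env=None, default="http://localhost:1135"):
    """Return an OpenAI-style base URL (ending in /v1/) for a Bodhi server.

    BODHI_BASE_URL is a hypothetical variable name used only in this sketch.
    """
    env = os.environ if env is None else env
    root = env.get("BODHI_BASE_URL", default).rstrip("/")
    # Accept values that already include the /v1 suffix.
    if not root.endswith("/v1"):
        root += "/v1"
    return root + "/"


print(bodhi_base_url(env={}))  # prints http://localhost:1135/v1/
```

The returned string can be passed as `base_url` when constructing `OpenAI(...)`.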

## List Models

We can list the available model config aliases with the client's `client.models.list()` call.

In [5]:
from pprint import pprint

models = list(client.models.list())
pprint(models, indent=2)

[ Model(id='llama2:13b-chat', created=0, object='model', owned_by='system'),
  Model(id='llama2:chat', created=0, object='model', owned_by='system'),
  Model(id='llama2:mymodel', created=0, object='model', owned_by='system'),
  Model(id='llama3:70b-instruct', created=0, object='model', owned_by='system'),
  Model(id='llama3:instruct', created=0, object='model', owned_by='system'),
  Model(id='mistral:instruct', created=0, object='model', owned_by='system'),
  Model(id='mixtral:instruct', created=0, object='model', owned_by='system'),
  Model(id='tinyllama:custom', created=0, object='model', owned_by='system'),
  Model(id='tinyllama:instruct', created=0, object='model', owned_by='system'),
  Model(id='tinyllama:mymodel', created=0, object='model', owned_by='system')]
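Each entry exposes its alias through the `id` field. As a small sketch, the aliases can be grouped by the part before the colon. Reading an alias as `family:variant` is an assumption based on the ids shown in this listing, and `group_aliases` is a hypothetical helper name.

```python
from collections import defaultdict


def group_aliases(model_ids):
    """Group alias ids of the form 'family:variant' by their family prefix."""
    groups = defaultdict(list)
    for model_id in model_ids:
        family, _, variant = model_id.partition(":")
        groups[family].append(variant)
    return dict(groups)


ids = ["llama3:70b-instruct", "llama3:instruct", "mistral:instruct"]
print(group_aliases(ids))
# prints {'llama3': ['70b-instruct', 'instruct'], 'mistral': ['instruct']}
```

With a live client, the same helper could be called as `group_aliases(m.id for m in client.models.list())`.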


## Chat Completions

The Bodhi App chat completions endpoint is compatible with the OpenAI chat completions endpoint, so you can query the Bodhi App the same way you query the OpenAI APIs.

In [8]:
response = client.chat.completions.create(
    model="llama3:instruct",
    messages=[{"role": "user", "content": "What day comes after Monday?"}],
)

pprint(response.choices[0].message, indent=2)

ChatCompletionMessage(content='The day that comes after Monday is Tuesday.', role='assistant', function_call=None, tool_calls=None)
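Because the endpoint follows the OpenAI wire format, the same request can also be expressed without the SDK. The sketch below only builds the HTTP request and does not send it. It assumes the server accepts the standard OpenAI path `chat/completions` under the `/v1/` base URL, which is the path the SDK itself posts to. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, api_key="sk-dummy-token"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_chat_request(
    "http://localhost:1135/v1/",
    "llama3:instruct",
    [{"role": "user", "content": "What day comes after Monday?"}],
)
print(request.get_method(), request.full_url)
```

With the server running, `urllib.request.urlopen(request)` would send it. In the OpenAI response format, the reply text sits under `choices[0].message.content` of the returned JSON.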


You can pass in the history of your conversation and ask follow-up questions that rely on the previous context.

In [9]:
response = client.chat.completions.create(
    model="llama3:instruct",
    messages=[
        {"role": "user", "content": "What day comes after Monday?"},
        {"role": "assistant", "content": "The day that comes after Monday is Tuesday."},
        {"role": "user", "content": "And what comes after that?"},
    ],
)

pprint(response.choices[0].message, indent=2)

ChatCompletionMessage(content='The day that comes after Tuesday is Wednesday.', role='assistant', function_call=None, tool_calls=None)
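Building the history by hand gets tedious over several turns. One possible way to automate it is sketched below. The names `chat_turn` and `make_send` are hypothetical, and the `send` callable is an indirection added here so the bookkeeping can be exercised without a running server.

```python
def chat_turn(send, history, user_text):
    """Append the user message, call send(history), and record the reply."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def make_send(client, model):
    """Wrap an OpenAI client so it can be used as the send callable."""
    def send(messages):
        response = client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content
    return send


# Usage with the client configured earlier (requires a running server):
# history = []
# send = make_send(client, "llama3:instruct")
# chat_turn(send, history, "What day comes after Monday?")
# chat_turn(send, history, "And what comes after that?")
```

After each call, `history` holds the full alternating user and assistant transcript, which is what the endpoint needs to answer from context.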


## System Message

You can pass a system message as the first item of `messages`. It guides the LLM toward the desired response.

In [15]:
import textwrap

response = client.chat.completions.create(
    model="llama3:instruct",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful pirate assistant, and you respond to questions in pirate language.",
        },
        {"role": "user", "content": "What is the meaning of life?"},
    ],
)

content = response.choices[0].message.content
wrapped_text = textwrap.fill(content, width=120)
print(wrapped_text)

Arrrr, ye landlubber be askin' a question that's been puzzlin' swashbucklers fer centuries! Yer lookin' fer the meaning
o' life, eh? Alright then, matey, settle yerself down with a pint o' grog and listen close.  For a pirate like meself,
the meaning o' life be findin' yer treasure, whether that be gold doubloons, hidden booty, or the thrill o' the high
seas! It be about livin' life on yer own terms, chartin' yer own course, and takin' risks to find what makes ye happy.
But for those who don't be sea-faring scoundrels like meself, the meaning o' life be different. Maybe it be findin' yer
purpose, whether that be helpin' others, creatin' somethin' beautiful, or simply enjoyin' the journey. Maybe it be about
makin' amends fer past mistakes, or findin' forgiveness and peace.  So hoist the colors, me hearty, and remember: the
meaning o' life be what ye make o' it! Fair winds and following seas to ye, matey!
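When the same conversation is reused with different personas, it helps to keep the system message separate from the rest of the history. A minimal sketch, with `with_system_message` as a hypothetical helper name:

```python
def with_system_message(messages, system_text):
    """Return a new list with system_text as the first (and only) system message."""
    rest = [m for m in messages if m.get("role") != "system"]
    return [{"role": "system", "content": system_text}] + rest


messages = with_system_message(
    [{"role": "user", "content": "What is the meaning of life?"}],
    "You are a helpful pirate assistant, and you respond to questions in pirate language.",
)
print([m["role"] for m in messages])  # prints ['system', 'user']
```

The helper returns a new list and leaves its input untouched, so the original history can be reused with another system message.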


## Chat Stream

The API supports streaming when the client passes `stream=True`.

In [16]:
response = client.chat.completions.create(
    model="llama3:instruct",
    stream=True,
    messages=[
        {
            "role": "user",
            "content": "Write a short 100 words poem on the beauty of nature.",
        },
    ],
)
for chunk in response:
    # delta.content is None on chunks that carry no text (such as the final one)
    print(chunk.choices[0].delta.content or "", end="")

Nature's beauty, a wondrous sight
A canvas of colors, shining bright
The sun's warm touch, on skin so fair
As petals unfold, with scents to share

The trees stand tall, their leaves a sway
Dancing to the wind's gentle way
Rivers flow, a melody so sweet
Reflecting the beauty, at our feet

In every moment, a work of art
Nature's beauty, a treasure to the heart
Filling us with wonder, awe and peace
A reminder of life's simple release.
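A streamed chunk may carry no text: its `delta.content` can be `None`, typically on the last chunk. To keep the full reply as a single string, the deltas can be accumulated while skipping such chunks. This is a sketch with hypothetical names. `fake_chunk` only imitates the attribute shape of a streamed chunk so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensively skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if text:  # skip None and empty deltas
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in with the same attribute shape as a streamed chunk (sketch only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


print(collect_stream([fake_chunk("Nature's "), fake_chunk("beauty"), fake_chunk(None)]))
# prints Nature's beauty
```

With a live client, `collect_stream(response)` would consume the iterator returned by `client.chat.completions.create(..., stream=True)`.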

## Switching Models

Bodhi App can automatically unload the current model and load the model named in the incoming request. The following rule applies:
1. The model is switched if there are no pending requests for the loaded model.

Let's try it out with an example.

In [23]:
response = client.chat.completions.create(
    model="llama3:instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell us something about yourself, who is your creator? how were you trained?",
        },
    ],
)

print("===Llama 3 response===")
print(response.choices[0].message.content)


response = client.chat.completions.create(
    model="mistral:instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell us something about yourself, who is your creator? how were you trained?",
        },
    ],
)

print("\n\n\n===Mistral response===")
content = response.choices[0].message.content
wrapped_text = textwrap.fill(content, width=120)
print(wrapped_text)

===Llama 3 response===
I'm excited to share some information about myself!

I am LLaMA, a large language model trained by a team of researcher at Meta AI. My creators are a group of talented individuals who specialize in natural language processing (NLP) and machine learning. They designed me to generate human-like text responses to a wide range of topics and questions.

My training data consists of a massive corpus of text from various sources, including books, articles, research papers, and websites. This corpus is updated regularly to keep my knowledge up-to-date and ensure that I can provide accurate and relevant responses.

I was trained using a combination of supervised and unsupervised learning techniques. Supervised learning involves providing me with labeled data, where the correct responses are already known, and I learn to predict the correct responses. Unsupervised learning allows me to discover patterns and relationships in the data on my own, which helps me to generalize 
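The comparison above can be wrapped in a small helper that queries several aliases one after another. Issuing the calls sequentially means each request has completed before the next alias is requested, which fits the switching rule stated earlier. The names `ask_each` and `make_ask` are hypothetical, and the `ask` callable is an indirection added for this sketch.

```python
def ask_each(ask, models, prompt):
    """Call ask(model, prompt) for each model in turn and collect the answers."""
    answers = {}
    for model in models:
        # Each call returns before the next one starts, so no request is
        # pending for the loaded model when the next alias is requested.
        answers[model] = ask(model, prompt)
    return answers


def make_ask(client):
    """Wrap an OpenAI client so it can be used as the ask callable."""
    def ask(model, prompt):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    return ask


# Usage with the client configured earlier (requires a running server):
# answers = ask_each(make_ask(client), ["llama3:instruct", "mistral:instruct"],
#                    "Tell us something about yourself.")
```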

## [Pending] API Errors

The API errors returned by the Bodhi App are compatible with the OpenAI API, so you can diagnose them using the OpenAI SDK. For example, let's send a request for a model alias that does not exist:

In [19]:
try:
    response = client.chat.completions.create(
        model="not-exists:instruct",
        messages=[
            {
                "role": "user",
                "content": "What day comes after Monday?",
            },
        ],
    )
except Exception as e:
    print("caught exception")
    print(f"{e=}")

caught exception
e=InternalServerError("Error code: 500 - {'message': 'receiver stream abruptly closed', 'type': 'internal_server_error', 'param': None, 'code': 'internal_server_error'}")
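To inspect such an error programmatically rather than printing it, its fields can be read defensively. The sketch below assumes the exception exposes `status_code` and `body` attributes, as `openai.APIStatusError` does in the 1.x SDK. `describe_api_error` is a hypothetical helper, and it falls back to `str(exc)` when those attributes are absent.

```python
def describe_api_error(exc):
    """Summarize an SDK exception using attributes read defensively."""
    status = getattr(exc, "status_code", None)  # assumed present on HTTP status errors
    body = getattr(exc, "body", None)           # assumed to hold the parsed error payload
    message = body.get("message") if isinstance(body, dict) else None
    return {
        "kind": type(exc).__name__,
        "status": status,
        "message": message or str(exc),
    }


# Usage inside the except block of the cell above:
# except Exception as e:
#     print(describe_api_error(e))
```

Reading the attributes with `getattr` keeps the helper usable for connection failures and other exceptions that carry no HTTP status.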
