# Setup & Verify SDKs

The exercises in this section depend on several Large Language Model providers. Run this notebook to make sure your environment variables are set up and the SDKs are installed correctly. Refresh the code segments from the relevant LLM quickstarts periodically so that you are working with the latest APIs.
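Before running any provider cell, it can help to confirm that the packages this notebook imports are installed. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical, and the list assumes the three top-level modules imported in the cells of this notebook (`openai`, `requests`, `lamini`).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules imported by the cells in this notebook (assumed list).
REQUIRED = ["openai", "requests", "lamini"]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

If anything is reported missing, installing from `requirements.txt` is the likely fix.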

---

## 1. With OpenAI

We should use the [latest documentation](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/) and API (v1.0+) to validate our learnings. Here are the [quickstart](https://platform.openai.com/docs/quickstart?context=python) steps and samples we can use to validate our setup:

1. **Set Up Python Environment** - I already have this set up with the Dev Container configuration, using Python 3.
2. **Install OpenAI Python Package** - Added to default `requirements.txt`. We can upgrade manually with `pip install --upgrade openai`.
3. **Configure API Key** - Create an OpenAI account, get an API key, and set it as the environment variable `OPENAI_API_KEY` in `.env`.
4. **Validate Usage** - Run the code cells below to verify that the `Chat Completions` and `Assistants` capabilities work by default.

#### Chat Completions API

In [65]:
# Create an OpenAI client instance
from openai import OpenAI
client = OpenAI()

# Verify Chat Completions API works
# https://platform.openai.com/docs/guides/text-generation/chat-completions-api
#
# Chat models take a list of messages as input and return a model-generated message as output. 
# It's designed to make multi-turn conversations easy but also works for single-turn tasks with no conversation.
completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
    {"role": "user", "content": "Compose a limerick that explains the concept of recursion in programming"}
  ]
)
print(completion.choices[0].message.content)

There once was a function so sly,
In itself, it did love to lie.
With a call to its name,
It played the recursion game,
Continuing on 'til the stack said goodbye!


#### Chat Completions API (Streaming)

In [66]:
# Create an OpenAI client instance
from openai import OpenAI
client = OpenAI()

# Verify Chat Completions API works with STREAMING
# https://platform.openai.com/docs/api-reference/streaming
#
# This allows the server to stream responses back to the client as they are generated, 
# allowing clients to show partial results for certain requests
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Send the lyrics to the US National Anthem one sentence at a time"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Oh, say can you see, by the dawn's early light,
What so proudly we hailed at the twilight's last gleaming,
Whose broad stripes and bright stars through the perilous fight,
O'er the ramparts we watched, were so gallantly streaming?
And the rocket's red glare, the bombs bursting in air,
Gave proof through the night that our flag was still there.
O say does that star-spangled banner yet wave
O’er the land of the free and the home of the brave?
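The streaming loop above prints each fragment as it arrives. If the full text is also needed afterwards, the fragments can be joined. This is a minimal sketch: `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the `choices[0].delta.content` shape used in the cell above so the example runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text fragments of a chat completion stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        content = chunk.choices[0].delta.content
        if content is not None:  # the final chunk typically has no content
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in object with the same attribute path as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Oh, say "), fake_chunk("can you see"), fake_chunk(None)]
print(collect_stream(demo))
```

With a live client, passing the `stream` object from the cell above in place of `demo` should work the same way, assuming the chunk shape has not changed.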

#### Assistants API

In [67]:
# Create an OpenAI client instance
from openai import OpenAI
client = OpenAI()
  
# Verify Assistants API 
# https://platform.openai.com/docs/assistants/overview?context=without-streaming
#
# Build your own Assistant (chat function with deployed endpoint) to respond to user queries
# Leverage models, tools and knowledge for creating effective responses.
# 3 Tool Types Available: Code Interpreter, Retrieval and Function calling.

# Step 1: Create an Assistant
assistant = client.beta.assistants.create(
  name="Math Tutor",
  instructions="You are a personal math tutor. Write and run code to answer math questions.",
  tools=[{"type": "code_interpreter"}],
  model="gpt-4-turbo-preview",
)

# Step 2: Create a Thread
thread = client.beta.threads.create()

# Step 3: Add a Message to the Thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation `3x + 11 = 14`. Can you help me?"
)

# Step 4: Create a Run (Non-Streaming)
run = client.beta.threads.runs.create_and_poll(
  thread_id=thread.id,
  assistant_id=assistant.id,
  instructions="Please address the user as Jane Doe. The user has a premium account."
)

# Step 5 (Optional): List Run steps in completed Run
if run.status == 'completed': 
  steps = client.beta.threads.runs.steps.list(
    thread_id=thread.id,
    run_id=run.id
  )
  print("\n5. Steps:\n")
  for step in steps.data:
      print("Step Type:", step.step_details.type)
      print("Total Tokens Used:", step.usage.total_tokens)
      print()
else:
  print("\n5. Status:\n", run.status)

# Step 6 (Optional): List Messages in completed Run
if run.status == 'completed': 
  messages = client.beta.threads.messages.list(
    thread_id=thread.id
  )
  print("\n6. Messages:\n")
  for message in messages.data:
      print("Role:", message.role)
      print("Value:", message.content[0].text.value)
      print()
else:
  print("\n6. Status:\n",run.status)


5. Steps:

Step Type: message_creation
Total Tokens Used: 304

Step Type: tool_calls
Total Tokens Used: 252

Step Type: message_creation
Total Tokens Used: 189


6. Messages:

Role: assistant
Value: The solution to the equation \(3x + 11 = 14\) is \(x = 1\). If you have any more questions or need further assistance, feel free to ask!

Role: assistant
Value: Sure, Jane! To solve the equation \(3x + 11 = 14\), we need to isolate the variable \(x\) on one side of the equation. I will show you step by step how to do that. Let's begin.

Role: user
Value: I need to solve the equation `3x + 11 = 14`. Can you help me?
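Note that the messages in the output above appear newest first: the user's question is listed last. A small sketch for reading the thread in conversation order follows. The helper name `chronological` is hypothetical, the fake objects only imitate the `role` and `content[0].text.value` attributes used in the cell above, and the sketch assumes every message starts with a text part.

```python
from types import SimpleNamespace


def chronological(messages):
    """Return (role, text) pairs oldest-first from a newest-first message list."""
    return [(m.role, m.content[0].text.value) for m in reversed(messages)]


def fake_message(role, value):
    """Stand-in with the same attribute path as a thread message."""
    text = SimpleNamespace(value=value)
    return SimpleNamespace(role=role, content=[SimpleNamespace(text=text)])


# Same order as the listing above: newest message first.
listed = [
    fake_message("assistant", "The solution is x = 1."),
    fake_message("assistant", "Sure, Jane! Let's begin."),
    fake_message("user", "I need to solve the equation `3x + 11 = 14`."),
]

for role, text in chronological(listed):
    print(f"{role}: {text}")
```

The list endpoint may also accept an ordering parameter; treat that as something to confirm in the current API reference rather than a given.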



#### Assistants API (Streaming)

In [68]:
# Create an OpenAI client instance
from openai import OpenAI
client = OpenAI()
  
# Verify Assistants API 
# https://platform.openai.com/docs/assistants/overview?context=without-streaming
#
# Build your own Assistant (chat function with deployed endpoint) to respond to user queries
# Leverage models, tools and knowledge for creating effective responses.
# 3 Tool Types Available: Code Interpreter, Retrieval and Function calling.

# Step 1: Create an Assistant
assistant = client.beta.assistants.create(
  name="Math Tutor",
  instructions="You are a personal math tutor. Write and run code to answer math questions.",
  tools=[{"type": "code_interpreter"}],
  model="gpt-4-turbo-preview",
)

# Step 2: Create a Thread
thread = client.beta.threads.create()

# Step 3: Add a Message to the Thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation `3x + 11 = 14`. Can you help me?"
)
# Step 4: Create a Run (Streaming)
from typing_extensions import override
from openai import AssistantEventHandler
 
# First, we create an EventHandler class to define
# how we want to handle the events in the response stream.
 
class EventHandler(AssistantEventHandler):    
  @override
  def on_text_created(self, text) -> None:
    print(f"\nassistant > ", end="", flush=True)
      
  @override
  def on_text_delta(self, delta, snapshot):
    print(delta.value, end="", flush=True)
      
  def on_tool_call_created(self, tool_call):
    print(f"\nassistant > {tool_call.type}\n", flush=True)
  
  def on_tool_call_delta(self, delta, snapshot):
    if delta.type == 'code_interpreter':
      if delta.code_interpreter.input:
        print(delta.code_interpreter.input, end="", flush=True)
      if delta.code_interpreter.outputs:
        print(f"\n\noutput >", flush=True)
        for output in delta.code_interpreter.outputs:
          if output.type == "logs":
            print(f"\n{output.logs}", flush=True)
 
# Then, we use the `stream` SDK helper
# with the `EventHandler` class to create the Run 
# and stream the response.
 
with client.beta.threads.runs.stream(
  thread_id=thread.id,
  assistant_id=assistant.id,
  instructions="Please address the user as Jane Doe. The user has a premium account.",
  event_handler=EventHandler(),
) as stream:
  stream.until_done()


assistant > Sure! To solve the equation \(3x + 11 = 14\) for \(x\), we first need to isolate \(x\) on one side of the equation. This can be done by following these steps:

1. Subtract \(11\) from both sides of the equation to eliminate the \(+11\) on the left side.
2. Divide both sides of the equation by \(3\) to solve for \(x\).

Let's calculate the value of \(x\).
assistant > code_interpreter

from sympy import symbols, Eq, solve

# Define the variable
x = symbols('x')

# Define the equation
equation = Eq(3*x + 11, 14)

# Solve the equation for x
solution = solve(equation, x)
solution

output >

[1]

assistant > The solution to the equation \(3x + 11 = 14\) is \(x = 1\).

---

## 2. With Azure OpenAI

We should use the latest documentation and [Quickstarts](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python) to validate our learnings. Here are the common quickstart steps:

1. **Set Up Python Environment** - I already have this set up with the Dev Container configuration, using Python 3.
2. **Install OpenAI Python Package** - Added to default `requirements.txt`. We can upgrade manually with `pip install --upgrade openai`.
3. **Configure API Key** - Create an Azure OpenAI resource, [visit Azure Portal](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python#retrieve-key-and-endpoint) and retrieve the Endpoint, API Key and Deployment Name.
4. **Update Environment Variables** - Set the `AZURE_`-prefixed variables in the `.env` file (see `.env.sample` for reference).
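A quick way to catch a missing or empty variable before creating the client is sketched below. The variable names are the ones read by the Azure cells in this notebook; the helper `missing_env_vars` is hypothetical, and the `env` argument exists only so the check can be tried against a plain dictionary.

```python
import os

# Variables read by the Azure OpenAI cells in this notebook.
AZURE_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(AZURE_VARS)
print("Missing Azure variables:", missing if missing else "none")
```

This only checks the process environment, so it assumes the `.env` file has already been loaded by the Dev Container or by whatever loader the project uses.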

### Completions API (Single Prompt)

In [69]:
# Create an Azure OpenAI client instance
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),  
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),  
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
)
deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT")

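# Verify Completions API works
# Note: client.completions.create sends a plain prompt, so the deployment must be a completions-capable model.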
# https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python#create-a-new-python-application

# Try a single-message completion task
start_phrase = 'Write a tagline for the March Madness games. '
completion = client.completions.create(
    model=deployment_name, 
    prompt=start_phrase, 
    max_tokens=25
)
print('Prompt:\n' + start_phrase)
print("Completion:\n"+ completion.choices[0].text)

Prompt:
Write a tagline for the March Madness games. 
Completion:
 Choose one or make your own. 23 is number 1! March Madness, who's basketballs will drop first?


### Chat Completions API (Multi-Turn)

In [70]:
import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version="2024-02-01"
)
deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT")

response = client.chat.completions.create(
    model=deployment_name,  # Azure expects the deployment name here (e.g., "gpt-35-turbo")
    messages=[
      {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
      {"role": "user", "content": "Compose a limerick that explains the concept of recursion in programming"}
    ]
)
print(response.choices[0].message.content)

There once was a coder named Larry,
Whose problem was getting quite hairy,
But with recursion in sight,
He found the way to write
A function that called itself, oh so merry!

And so the program worked like a charm,
Through iterations and loops, no alarm,
For recursion, you see,
Is a powerful key
To solving problems with code's magic arm.
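The cell above sends a single user message. A multi-turn exchange works by resending the whole message list with each new turn appended. The sketch below shows only that bookkeeping. `add_turn` is a hypothetical helper and the assistant text is a placeholder: in a live session it would come from `response.choices[0].message.content`.

```python
def add_turn(history, role, content):
    """Return a new message list with one more turn appended."""
    return history + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a poetic assistant."}]
history = add_turn(history, "user", "Compose a limerick about recursion")
# Placeholder for the model's reply from the previous request.
history = add_turn(history, "assistant", "<model reply goes here>")
history = add_turn(history, "user", "Now explain the same idea in plain prose")

for message in history:
    print(message["role"], "->", message["content"])
```

Passing `history` as the `messages` argument of the next request gives the model the earlier turns as context. Returning a new list rather than mutating in place is just a design preference here.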


---

## 3. Hugging Face

We should use the [latest documentation](https://huggingface.co/docs) and focus initially on the [Text Generation Inference](https://huggingface.co/docs/text-generation-inference/index) capability. We can use this in two ways:
 1. Use the [Inference Client](https://huggingface.co/docs/text-generation-inference/basic_tutorials/consuming_tgi#inference-client) to talk to a deployed model directly.
 1. Use the [Messages API](https://huggingface.co/docs/text-generation-inference/messages_api) to interact with a deployed model through an API compatible with the OpenAI Chat Completions API.

Note: [Text Generation Inference](https://github.com/huggingface/text-generation-inference) is a [Hugging Face toolkit](https://huggingface.co/docs/text-generation-inference/index) for deploying and serving Large Language Models (LLMs). It works well with popular open-source LLMs including Llama, Falcon, StarCoder, etc. To use the clients above, we need a running instance of the Inference API. There are a few options:
1. Run a local TGI server. Use the [Docker](https://huggingface.co/docs/text-generation-inference/quicktour) quickstart, or [install locally from source](https://huggingface.co/docs/text-generation-inference/installation).
1. Explore it using the [Hugging Chat](https://huggingface.co/chat) UI which works against their production TGI deployment endpoint.
1. Explore hosting your own service in the cloud - for example [using Azure Container Instances](https://medium.com/thedeephub/deploy-hugging-face-text-generation-inference-on-azure-container-instance-3709eb3d3187).
1. Use the free (but rate-limited) [Serverless Inference API](https://huggingface.co/docs/api-inference/index) endpoint provided by Hugging Face for testing and evaluation purposes - works with models in hub. 
1. Use their paid production [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) service and bring your own model deployment (e.g., using Azure Container Registry).

Running locally works only if you have [supported hardware](https://huggingface.co/docs/text-generation-inference/supported_models#supported-hardware). Given that, our best option is to explore ideas using the [Free Serverless Inference API](https://huggingface.co/docs/api-inference/index) for now. Here are the [quickstart](https://huggingface.co/docs/api-inference/quicktour) steps:

1. **[Get an API Token](https://huggingface.co/docs/api-inference/quicktour#get-your-api-token)**: Sign up for a free account to get one.
1. **[Select your Model](https://huggingface.co/docs/api-inference/quicktour#running-inference-with-api-requests)**: Pick one from the Hub and customize your URL.
1. **[Run Inference on Endpoint](https://huggingface.co/docs/api-inference/quicktour#running-inference-with-api-requests)**: Invoke REST API and use [detailed parameters](https://huggingface.co/docs/api-inference/detailed_parameters) as needed for task.
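The three steps above can be folded into one small factory so the endpoint URL and headers are built in a single place. This is a sketch, not the official client: `make_query` is a hypothetical helper, and the `post` argument is assumed to be any callable shaped like `requests.post`, which keeps the example runnable without a network call.

```python
def make_query(model, token, post):
    """Build a query(payload) function bound to one hosted model endpoint.

    `post` is any callable with the same shape as requests.post.
    """
    url = f"https://api-inference.huggingface.co/models/{model}"
    headers = {"Authorization": f"Bearer {token}"}

    def query(payload):
        response = post(url, headers=headers, json=payload)
        return response.json()

    return query


class FakeResponse:
    """Minimal stand-in for a requests.Response."""

    def __init__(self, data):
        self._data = data

    def json(self):
        return self._data


def fake_post(url, headers=None, json=None):
    """Echo the request back instead of calling the network."""
    return FakeResponse({"url": url, "payload": json})


query = make_query("openai-community/gpt2", "hf_placeholder_token", fake_post)
print(query({"inputs": "My name is Betty Crocker and I am"}))
```

In the notebook, the real call would pass `requests.post` and the token from `os.getenv("HUGGING_FACE_API_KEY")` instead of the fakes.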

 



### Basic TGI Inference - Text query

In [71]:
# The Hosted API Inference endpoint allows you to interact with deployed models from the Hub.
# The basic inference endpoint allows you to interact with models using a simple text input.
# The advanced inference endpoint allows you to interact with models using detailed parameters that map to a Task.

# Let's try the basic inference first.
# Visit the Model Hub: https://huggingface.co/models
# Select a model and visit its Model Card: https://huggingface.co/openai-community/gpt2
# Verify that it supports the **Inference API** (see card right)
# Invoke the model with a simple string input (may not work for all)
import os
apikey=os.getenv("HUGGING_FACE_API_KEY")
#model=os.getenv("HUGGING_FACE_MODEL")
model="openai-community/gpt2"

import requests
model_ep = f"https://api-inference.huggingface.co/models/{model}"
headers = {"Authorization": f"Bearer {apikey}"}
def query(payload):
    response = requests.post(model_ep, headers=headers, json=payload)
    return response.json()

# Formulate query for model specific needs
data = query("My name is Betty Crocker and I am  ")
print("Response:\n "+ data[0].get("generated_text"))


Response:
 My name is Betty Crocker and I am   a doctor. Not only did the doctor give me an epidural but also an epidural to make me a better husband and father to my daughter after an affair went wrong. He even had


### Task-Specific TGI Inference - Use Parameters

If you now replace the model with a different model like "mistralai/Mistral-7B-Instruct-v0.2" and run the same code, you will likely see errors, even though the model card tells us it supports the Inference API. This is because the model may require _detailed parameters_ for use with the default task we are trying to achieve. For example, we can see that this model card indicates it is useful for Text Generation tasks, and from [the Text Generation Task](https://huggingface.co/docs/api-inference/detailed_parameters#text-generation-task) definition, we see it requires a mandatory **inputs** parameter. Let's try creating these "task-specific" requests next. You can find detailed parameters for the following tasks at their respective links.

1. **Natural Language Processing** 👇🏽 <br/> [Fill Mask Task](https://huggingface.co/docs/api-inference/detailed_parameters#fill-mask-task) --- [Summarization Task](https://huggingface.co/docs/api-inference/detailed_parameters#summarization-task) --- [Question Answering Task](https://huggingface.co/docs/api-inference/detailed_parameters#question-answering-task) --- [Table Question Answering Task](https://huggingface.co/docs/api-inference/detailed_parameters#table-question-answering-task) --- [Sentence Similarity Task](https://huggingface.co/docs/api-inference/detailed_parameters#sentence-similarity-task) --- [Text Classification Task](https://huggingface.co/docs/api-inference/detailed_parameters#text-classification-task) --- [Text Generation Task](https://huggingface.co/docs/api-inference/detailed_parameters#text-generation-task) --- [Text2Text Generation Task](https://huggingface.co/docs/api-inference/detailed_parameters#text2text-generation-task) --- [Token Classification Task](https://huggingface.co/docs/api-inference/detailed_parameters#token-classification-task) --- [Named Entity Recognition Task (NER)](https://huggingface.co/docs/api-inference/detailed_parameters#named-entity-recognition-ner-task) --- [Translation Task](https://huggingface.co/docs/api-inference/detailed_parameters#translation-task) --- [Zero-Shot Classification Task](https://huggingface.co/docs/api-inference/detailed_parameters#zero-shot-classification-task) --- [Conversational Task](https://huggingface.co/docs/api-inference/detailed_parameters#conversational-task) --- [Feature Extraction Task](https://huggingface.co/docs/api-inference/detailed_parameters#feature-extraction-task)
2. **Audio** 👇🏽 <br/> [Audio Classification Task](https://huggingface.co/docs/api-inference/detailed_parameters#audio-classification-task) --- [Automatic Speech Recognition Task](https://huggingface.co/docs/api-inference/detailed_parameters#automatic-speech-recognition-task) 
3. **Computer Vision** 👇🏽 <br/> [Image Classification Task](https://huggingface.co/docs/api-inference/detailed_parameters#image-classification-task) --- [Object Detection Task](https://huggingface.co/docs/api-inference/detailed_parameters#object-detection-task) --- [Image Segmentation Task](https://huggingface.co/docs/api-inference/detailed_parameters#image-segmentation-task)
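As a rough illustration of the request shape these task pages describe, the sketch below assembles a payload from the mandatory `inputs` value plus optional `parameters` and `options` sections. `build_payload` is a hypothetical helper, and the specific keys in the example (`max_new_tokens`, `wait_for_model`) are assumptions to check against the detailed parameters page for the chosen task.

```python
def build_payload(inputs, parameters=None, options=None):
    """Assemble a task request body; only `inputs` is always included."""
    payload = {"inputs": inputs}
    if parameters:
        payload["parameters"] = parameters
    if options:
        payload["options"] = options
    return payload


# Text generation style request (parameter and option names are assumptions).
text_gen = build_payload(
    "The answer to the universe is",
    parameters={"max_new_tokens": 50},
    options={"wait_for_model": True},
)
print(text_gen)

# Question answering uses a structured `inputs` value instead of a string.
qa = build_payload({"question": "Where do I live?", "context": "I live in Berlin"})
print(qa)
```

Keeping the three sections separate mirrors how the task examples later in this notebook are written.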

In the section below, we can look at a few examples - and we can refer to the docs to replicate the usage for others. By default each documented task also provides a _recommended model_ for you to use. However, I also find it useful to visit the [Model Hub](https://huggingface.co/models) and select the task category (at left) and pick a new or trending model (at right) to try out with the task parameters.

🚨 **WARNING** | Hugging Face is a Model Hub with many trending and less-popular models.
 - When you look up a model card (e.g., https://huggingface.co/deepset/roberta-base-squad2 ) you may see a note indicating _This model can be loaded on Inference API (serverless)_, **but** also an alert below it that says **Model not loaded yet**.
 - This means that the model may get loaded on demand, so the first serverless invocation may fail as the request triggers the initial load. Subsequent requests should pass.
 - _The returned response may say something like this_: ``` {'error': 'Model deepset/roberta-base-squad2 is currently loading', 'estimated_time': 20.0}```
 - Use the `estimated_time` value at the client to make sure you wait and retry.
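The wait-and-retry advice above can be sketched as follows. `query_with_retry` is a hypothetical helper: `send` is assumed to be any function that takes a payload and returns the parsed JSON (such as the `query` functions in the cells below), and the loading check relies only on the `error` and `estimated_time` keys shown in the sample response. The `sleep` argument is injectable so the logic can be exercised without actually waiting.

```python
import time


def query_with_retry(send, payload, max_attempts=5, max_wait=60.0, sleep=time.sleep):
    """Call send(payload), waiting and retrying while the model reports loading."""
    data = None
    for attempt in range(1, max_attempts + 1):
        data = send(payload)
        loading = (
            isinstance(data, dict)
            and "error" in data
            and "estimated_time" in data
        )
        if not loading or attempt == max_attempts:
            return data
        # Wait for the server's estimate, capped so one call cannot stall for long.
        sleep(min(float(data["estimated_time"]), max_wait))
    return data


# Simulated endpoint: first call reports loading, second call succeeds.
responses = iter([
    {"error": "Model deepset/roberta-base-squad2 is currently loading", "estimated_time": 20.0},
    {"answer": "Berlin"},
])
waits = []
result = query_with_retry(lambda p: next(responses), {"inputs": "demo"}, sleep=waits.append)
print(result, waits)
```

The cap of 60 seconds and the limit of five attempts are arbitrary defaults for illustration; a model such as phi-2 with a long load estimate may need larger values.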


In [57]:
# Define your API Key and Model
# Verify that the MODEL you selected shows the Summarization TASK in its model card.

#### NLP: Text Generation LLM (Mistral)

In [55]:
import os
apikey=os.getenv("HUGGING_FACE_API_KEY")
model="mistralai/Mistral-7B-Instruct-v0.2"

import requests
model_ep = f"https://api-inference.huggingface.co/models/{model}"
headers = {"Authorization": f"Bearer {apikey}"}
def query(payload):
    response = requests.post(model_ep, headers=headers, json=payload)
    return response.json()

# Text Generation Payload
data = query({
    "inputs": "The answer to the universe is"
})
print("Response:\n ", data)


Response:
  [{'generated_text': 'The answer to the universe is 42: a popular and iconic quote that quenches the enormous curiosity of millions, possibly billions, of viewers of science-fiction teaser, The Hitchhiker’s Guide to the Galaxy, written by Douglas Adams. A question from a very young friend made me ponder: what inspired scholars to come up with a number to answer the universe’s riddle? My best guess is that this nonsensical quote fills an emptiness for many.\n'}]


#### NLP: Text Generation SLM (phi-2)

In [82]:
# Model Loading Estimate
#   {'error': 'Model microsoft/phi-2 is currently loading', 'estimated_time': 222.3747100830078}
import os
apikey=os.getenv("HUGGING_FACE_API_KEY")
model="microsoft/phi-2"

import requests
model_ep = f"https://api-inference.huggingface.co/models/{model}"
headers = {"Authorization": f"Bearer {apikey}"}
def query(payload):
    response = requests.post(model_ep, headers=headers, json=payload)
    return response.json()

# Text Generation Payload
data = query("Write a detailed analogy between mathematics and a lighthouse.")
print("Response:\n ", data)

Response:
  {'error': 'Model microsoft/phi-2 is currently loading', 'estimated_time': 222.3747100830078}


#### NLP: Summarization (bart-large-CNN)

In [56]:

import os
apikey=os.getenv("HUGGING_FACE_API_KEY")
model="facebook/bart-large-cnn"

import requests
model_ep = f"https://api-inference.huggingface.co/models/{model}"
headers = {"Authorization": f"Bearer {apikey}"}
def query(payload):
    response = requests.post(model_ep, headers=headers, json=payload)
    return response.json()

# Text Summarization Payload
data = query(
    {
        "inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.",
        "parameters": {"do_sample": False},
    }
)
print("Response:\n ", data)


Response:
  [{'summary_text': 'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world.'}]


#### NLP: Question Answering (roberta-base-squad2)

In [60]:
import os
apikey=os.getenv("HUGGING_FACE_API_KEY")
model="deepset/roberta-base-squad2"

import requests
model_ep = f"https://api-inference.huggingface.co/models/{model}"
headers = {"Authorization": f"Bearer {apikey}"}
def query(payload):
    response = requests.post(model_ep, headers=headers, json=payload)
    return response.json()

# Question Answering Payload
data = query(
    {
        "inputs": {
            "question": "Where do I live?",
            "context": "MMy name is Wolfgang and I live in Berlin",
        }
    }
)

# This model is often not loaded by default till the first request - so you see this:
#  {'error': 'Model deepset/roberta-base-squad2 is currently loading', 'estimated_time': 20.0}
# Back off and retry in a while. Result will look something like this:
#  {'score': 0.9326565265655518, 'start': 11, 'end': 16, 'answer': 'Clara'}
print("Response:\n ", data)


Response:
  {'score': 0.9151504039764404, 'start': 35, 'end': 41, 'answer': 'Berlin'}
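The `start` and `end` values in the response are character offsets into the context string, which gives a cheap sanity check on the answer. The sketch below reuses the exact context sent in the cell above (including its doubled leading "M") and the response shown; `answer_span` is a hypothetical helper.

```python
def answer_span(context, result):
    """Slice the context using the start/end offsets from a QA response."""
    return context[result["start"]:result["end"]]


# Context exactly as sent in the cell above, and the response it produced.
context = "MMy name is Wolfgang and I live in Berlin"
result = {"score": 0.9151504039764404, "start": 35, "end": 41, "answer": "Berlin"}

span = answer_span(context, result)
print(span, span == result["answer"])
```

Because the offsets depend on the exact context string, editing the context shifts them.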


#### NLP: Sentence Similarity (bge-m3)

In [62]:
import os
apikey=os.getenv("HUGGING_FACE_API_KEY")
model="BAAI/bge-m3"

import requests
model_ep = f"https://api-inference.huggingface.co/models/{model}"
headers = {"Authorization": f"Bearer {apikey}"}
def query(payload):
    response = requests.post(model_ep, headers=headers, json=payload)
    return response.json()

# Sentence Similarity Payload
data = query(
    {
        "inputs": {
            "source_sentence": "That is a happy person",
            "sentences": ["That is a happy dog", "That is a very happy person", "Today is a sunny day"],
        }
    }
)
print("Response:\n ", data)

Response:
  [0.8589119911193848, 0.9666367769241333, 0.7509792447090149]
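The response is a bare list of scores in the same order as the `sentences` that were sent, so pairing them back up is left to the client. A minimal sketch using the values from the cell above:

```python
# Sentences as sent in the request, and scores as returned above.
sentences = [
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
scores = [0.8589119911193848, 0.9666367769241333, 0.7509792447090149]

# Pair each sentence with its score and sort from most to least similar.
ranked = sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)

for sentence, score in ranked:
    print(f"{score:.3f}  {sentence}")
```

The closest match to "That is a happy person" comes out first.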


#### NLP: Zero-Shot Classification (bart-large-mnli)

In [74]:
import os
apikey=os.getenv("HUGGING_FACE_API_KEY")
model="facebook/bart-large-mnli"

import requests
model_ep = f"https://api-inference.huggingface.co/models/{model}"
headers = {"Authorization": f"Bearer {apikey}"}
def query(payload):
    response = requests.post(model_ep, headers=headers, json=payload)
    return response.json()

# Zero-Shot Classification Payload
data = query(
    {
        "inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
        "parameters": {"candidate_labels": ["refund", "legal", "faq"]},
    }
)
print("Response:\n ", data)

Response:
  {'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.8777878284454346, 0.10522636026144028, 0.01698581501841545]}
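For routing decisions it is often enough to take the best-scoring label and ignore it when the score is too low. The sketch below applies that idea to the response shown above. `top_label` is a hypothetical helper and the 0.5 threshold is an arbitrary choice for illustration.

```python
def top_label(result, threshold=0.5):
    """Return the highest-scoring label, or None if it falls below the threshold."""
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return label if score >= threshold else None


# Labels and scores copied from the response above.
result = {
    "labels": ["refund", "faq", "legal"],
    "scores": [0.8777878284454346, 0.10522636026144028, 0.01698581501841545],
}

print(top_label(result))
print(top_label(result, threshold=0.9))
```

Using `max` rather than taking the first element avoids relying on the response being sorted.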


---

## 4. Lamini

Lamini is an LLM platform [optimized for enterprise fine tuning](https://lamini-ai.github.io/about/).
 - Explore the [Lamini SDK](https://github.com/lamini-ai/lamini-sdk/)
 - Explore tools for [better inference](https://lamini-ai.github.io/inference/quick_tour/) 
 - Explore tools for [better training](https://lamini-ai.github.io/training/quick_tour/)

To get started:
 - Create an account and get an API key
 - Validate the key works with sample questions as shown

Note: The free account only gives you 200 calls _total_ (no refresh) so use it wisely.
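One way to stretch a fixed call budget while experimenting is to avoid resending a prompt that has already been answered. The sketch below wraps any prompt-to-text function in a simple in-memory cache. `cached` is a hypothetical helper, not part of the Lamini SDK, and `fake_generate` stands in for a real call such as `llm.generate` from the cells below.

```python
def cached(generate):
    """Wrap a prompt -> text function so repeated prompts reuse the first answer."""
    cache = {}

    def wrapper(prompt):
        if prompt not in cache:
            cache[prompt] = generate(prompt)  # only new prompts reach the API
        return cache[prompt]

    return wrapper


# Stand-in for a real model call; counts how often it is invoked.
calls = []


def fake_generate(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"


ask = cached(fake_generate)
ask("How to convert inches to centimeters?")
ask("How to convert inches to centimeters?")
print("API calls made:", len(calls))
```

The cache lives only in memory, so it resets when the kernel restarts, and it assumes that reusing an earlier answer for the same prompt is acceptable.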



In [7]:
# Install the lamini package
# !pip install --upgrade lamini

## Configure the API key
import lamini
import os
lamini.api_key = os.getenv("LAMINI_API_KEY")

In [23]:
## Validate setup with a named model from Hugging Face 
## By default the free-tier user has support for these base models (identified in error message)
'''
'hf-internal-testing/tiny-random-gpt2', 
'EleutherAI/pythia-70m', 'EleutherAI/pythia-70m-deduped', 'EleutherAI/pythia-70m-v0', 
'EleutherAI/pythia-70m-deduped-v0', 'EleutherAI/neox-ckpt-pythia-70m-deduped-v0', 'EleutherAI/neox-ckpt-pythia-70m-v1', 
'EleutherAI/neox-ckpt-pythia-70m-deduped-v1', 'EleutherAI/gpt-neo-125m', 'EleutherAI/pythia-160m', 
'EleutherAI/pythia-160m-deduped', 'EleutherAI/pythia-160m-deduped-v0', 'EleutherAI/neox-ckpt-pythia-70m', 
'EleutherAI/neox-ckpt-pythia-160m', 'EleutherAI/neox-ckpt-pythia-160m-deduped-v1', 'EleutherAI/pythia-2.8b', 
'EleutherAI/pythia-410m', 'EleutherAI/pythia-410m-v0', 'EleutherAI/pythia-410m-deduped', 
'EleutherAI/pythia-410m-deduped-v0', 'EleutherAI/neox-ckpt-pythia-410m', 'EleutherAI/neox-ckpt-pythia-410m-deduped-v1', 
'cerebras/Cerebras-GPT-111M', 'cerebras/Cerebras-GPT-256M', 'meta-llama/Llama-2-7b-hf', 
'meta-llama/Llama-2-7b-chat-hf', 'meta-llama/Llama-2-13b-chat-hf', 'meta-llama/Llama-2-70b-chat-hf', 
'Intel/neural-chat-7b-v3-1', 'mistralai/Mistral-7B-Instruct-v0.1', 'microsoft/phi-2'
'''

## Option 1: Use a named model to get an endpoint for requests
## Models may not be pre-loaded in HF inference service - you will then see this error, so retry:
##     Downloading the 'cerebras/Cerebras-GPT-111M' model. 
##     Please try again in a few minutes.
llm = lamini.Lamini("cerebras/Cerebras-GPT-111M")
print(llm.generate("How to convert inches to centimeters? Answer in 2 sentences"))

2024-04-15:03:33:11,976 INFO     [lamini.py:33] Using 3.10 InferenceQueue Interface


status code: 513 https://api.lamini.ai/v1/completions


APIError: API error {'detail': "error_id: 243549526076307102879929981439376352577: Downloading the 'Intel/neural-chat-7b-v3-1' model. Please try again in a few minutes."}

In [17]:
## Option 2: Use pre-defined Mistral runner
llm = lamini.MistralRunner()
llm("How to convert inches to centimeters? Answer in 2 sentences")

2024-04-15:03:29:20,700 INFO     [lamini.py:33] Using 3.10 InferenceQueue Interface


' To convert inches to centimeters, you can multiply the number of inches by 2.54. For example, 1 inch is equal to 2.54 centimeters.'

In [18]:
## Option 3: Use pre-defined Llama-2 runner
llama = lamini.LlamaV2Runner()
llama("How to convert inches to centimeters? Answer in 2 sentences")

2024-04-15:03:29:35,899 INFO     [lamini.py:33] Using 3.10 InferenceQueue Interface


'  Of course! To convert inches to centimeters, you can use the following conversion factor: 1 inch = 2.54 centimeters. Therefore, if you want to convert a measurement in inches to centimeters, you can simply multiply it by 2.54.'