<a target="_blank" href="https://colab.research.google.com/github/amjadraza/datafy-llm-workshop/blob/main/notebooks/01_LiteLLM_Models.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Hands-on: LLMs using the LiteLLM Package

https://docs.litellm.ai/

**LiteLLM is a lightweight package that simplifies calling OpenAI, Azure, Cohere, Anthropic, and Huggingface API endpoints.** It handles:

- translating inputs to each provider's completion and embedding endpoints
- guaranteeing consistent output: text responses are always available at `['choices'][0]['message']['content']`
- mapping exceptions: common exceptions across providers are mapped to the OpenAI exception types
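
Because errors from every provider surface as the same family of exception types, a single handler can wrap any `completion` call. The block below is a minimal sketch rather than anything LiteLLM ships: `call_with_retry` is a hypothetical helper, and the exception classes to retry on are supplied by the caller, since which classes exist depends on the installed package versions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a listed exception, wait and retry with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            # Give up and re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call would look like `call_with_retry(lambda: completion("gpt-3.5-turbo", messages), retry_on=(SomeRateLimitError,))`, where `SomeRateLimitError` is a placeholder for whichever exception class you decide is worth retrying.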

Standard output object: responses follow the OpenAI API specification. For details, see https://platform.openai.com/docs/api-reference/chat/object

```
<OpenAIObject chat.completion id=chatcmpl-81WDml2zvKjlA6Vm8QMsNK8hfEVPU at 0x7a09a3cb9080> JSON: {
  "id": "chatcmpl-81WDml2zvKjlA6Vm8QMsNK8hfEVPU",
  "object": "chat.completion",
  "created": 1695372878,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am an AI language model, so I don't have feelings, but I'm here to help you. How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 31,
    "total_tokens": 44
  },
  "response_ms": 2123.604
}

```
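
To show how the fields of this object are read, here is a small sketch that works on a plain dictionary copy of the sample above. `first_message_text` is a hypothetical helper, and the use of `.get` assumes a dict-like object such as parsed JSON; the notebook cells below index the live response with square brackets instead.

```python
from typing import Any, Mapping, Optional


def first_message_text(response: Mapping[str, Any]) -> Optional[str]:
    """Return the first choice's message content, or None if it is absent."""
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")


# Dictionary copy of the relevant parts of the sample object shown above.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I am an AI language model, so I don't have "
                "feelings, but I'm here to help you. How can I assist you today?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 31, "total_tokens": 44},
}

print(first_message_text(sample))
print(sample["usage"]["total_tokens"])
```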

# 🔑 LiteLLM Keys (Access Claude-2, Llama2-70b, etc.)

With LiteLLM keys, you can try some models for free. The list of supported models is available at the link below.

https://docs.litellm.ai/docs/proxy_api#supported-models-for-litellm-key

> LiteLLM provides a free $10 community-key for testing all providers on LiteLLM. You can replace this with your own key. Send email to krrish@berri.ai for dedicated key.

In [1]:
!pip install litellm -qU


# OpenAI

LiteLLM supports OpenAI API models. For details on the supported OpenAI models, see https://docs.litellm.ai/docs/providers/openai.


In [None]:
import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "sk-litellm-7_NPZhMGxY2GoHC59LgbDw" # [OPTIONAL] replace with your openai key


messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
# response = completion("gpt-3.5-turbo", messages)

response = completion("gpt-4", messages)

In [None]:
response

<OpenAIObject chat.completion id=chatcmpl-81XRg12rOdAsk8PXxhxECedXxv5FF at 0x7da0eb3ea250> JSON: {
  "id": "chatcmpl-81XRg12rOdAsk8PXxhxECedXxv5FF",
  "object": "chat.completion",
  "created": 1695377584,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "As an AI, I don't have feelings, but I'm here and ready to assist you. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 27,
    "total_tokens": 40
  },
  "response_ms": 3832.11
}

In [None]:
response['choices'][0]['message']['content']

"As an AI, I don't have feelings, but I'm here and ready to assist you. How can I help you today?"

# VertexAI / Google PaLM

LiteLLM also supports Google's PaLM models through VertexAI.

**Prerequisites**

- Your project ID: `litellm.vertex_project = "hardy-device-38811"`
- Your project location: `litellm.vertex_location = "us-central1"`

**Learning on VertexAI**

https://medium.com/generative-ai/google-palm-api-generative-models-for-code-generation-275589c5bd71

In [None]:
!pip install google-cloud-aiplatform

In [None]:
# Authenticating user
from google.colab import auth as google_auth
google_auth.authenticate_user()

In [None]:
import litellm
import os
from litellm import completion
litellm.vertex_project = "generative-ai-training" # Your Project ID
litellm.vertex_location = "us-central1"  # proj location

messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]

response = completion(model="chat-bison", messages = messages)

In [None]:
response

<ModelResponse chat.completion id=chatcmpl-812932b3-2c83-4a6b-912b-d65aa6243141 at 0x7a09989ec4a0> JSON: {
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": " ```litellm\ndef say_hi():\n  print(\"Hi!\")\n\nsay_hi()\n```",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-812932b3-2c83-4a6b-912b-d65aa6243141",
  "created": 1695372887.6308095,
  "response_ms": 937.9159999999999,
  "model": "chat-bison",
  "usage": {
    "prompt_tokens": null,
    "completion_tokens": null,
    "total_tokens": null
  }
}

In [None]:
response['choices'][0]['message']['content']

' ```litellm\ndef say_hi():\n  print("Hi!")\n\nsay_hi()\n```'

# Anthropic
LiteLLM supports Claude-1, Claude-1.2, and Claude-2.

In [None]:
import os
from litellm import completion

# set env - [OPTIONAL] replace with your anthropic key
os.environ["ANTHROPIC_API_KEY"] = "sk-litellm-7_NPZhMGxY2GoHC59LgbDw"

messages = [{"role": "user", "content": "Hey! how's it going?"}]
response = completion(model="claude-2", messages=messages)
print(response)

{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop_sequence",
      "index": 0,
      "message": {
        "content": " I'm doing well, thanks for asking! How about you?",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-f6885cab-33b7-4897-b4c4-583137e93d0e",
  "created": 1695377630.360206,
  "model": "claude-2",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 13,
    "total_tokens": 27
  },
  "response_ms": 1799.288
}


In [None]:
response['choices'][0]['message']['content']

" I'm doing well, thanks for asking! How about you?"

In [None]:
# Streaming
messages = [{"role": "user", "content": "Hey! how's it going?"}]
st_response = completion(model="claude-instant-1", messages=messages, stream=True)

In [None]:
type(st_response)

litellm.utils.CustomStreamWrapper

In [None]:
st_response

<litellm.utils.CustomStreamWrapper at 0x7a099897ce50>

In [None]:
for chunk in st_response:
    print(chunk["choices"][0]["delta"]["content"])  # same as openai format

 I
'm
 doing
 well
,
 thanks
 for
 asking
!
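
The loop above prints one fragment per chunk. To rebuild the full reply, the fragments can be concatenated. The following is a minimal sketch that runs on stand-in dictionaries instead of a live stream: `join_stream` is a hypothetical helper, and it assumes each chunk is dict-like and follows the `choices[0].delta.content` layout used in the loop above.

```python
from typing import Any, Iterable, Mapping


def join_stream(chunks: Iterable[Mapping[str, Any]]) -> str:
    """Concatenate the delta content of OpenAI-format streaming chunks."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:  # skip chunks that carry no text
            parts.append(content)
    return "".join(parts)


# Stand-in chunks mirroring the fragments printed above (not a live API call).
fake_chunks = [
    {"choices": [{"delta": {"content": piece}}]}
    for piece in [" I", "'m", " doing", " well", ",", " thanks", " for", " asking", "!"]
]
print(join_stream(fake_chunks))
```

With a real stream, `join_stream(st_response)` would consume the wrapper once; a stream cannot be iterated a second time after that.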


# AI21
LiteLLM supports j2-light, j2-mid and j2-ultra from AI21.

They're available to use without a waitlist.

In [None]:
import os
from litellm import completion

# set env variable - [OPTIONAL] replace with your ai21 key
os.environ["AI21_API_KEY"] = "sk-litellm-7_NPZhMGxY2GoHC59LgbDw"

messages = [{"role": "user", "content": "Write me a poem about the blue sky"}]

response = completion(model="j2-light", messages=messages)

# Together AI
LiteLLM supports all models on Together AI. See https://docs.litellm.ai/docs/providers/togetherai for the full list of supported models.

You can change the model name to experiment with different models hosted on the Together AI platform.

In [None]:
import os
from litellm import completion

# set env variable - [OPTIONAL] replace with your together ai key
os.environ["TOGETHERAI_API_KEY"] = "sk-litellm-7_NPZhMGxY2GoHC59LgbDw"

messages = [{"role": "user", "content": "Write me a poem about the blue sky"}]

response = completion(model="together_ai/togethercomputer/Llama-2-7B-32K-Instruct", messages=messages)

print(response)

{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "\n\n\nThe blue sky, a canvas of infinite possibility\nAbove us",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-4f9619fb-0c78-45dc-9531-f584f7dec794",
  "created": 1695374542.4376473,
  "model": "togethercomputer/Llama-2-7B-32K-Instruct",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 13,
    "total_tokens": 27
  },
  "response_ms": 1040.355
}


In [None]:
response['choices'][0]['message']['content']

'\n\n\nThe blue sky, a canvas of infinite possibility\nAbove us'

# **Huggingface**
LiteLLM supports Huggingface models that use the text-generation-inference format or the Conversational task format.

- Text-generation-inference (TGI) models
- Conversational-task models
- Non-TGI/conversational-task LLMs

By default, LiteLLM assumes that you are calling a model with the text-generation-inference format (e.g. Llama2, Falcon, WizardCoder, MPT).

To change this, set `task="conversational"` in the completion call. Example:

In [None]:
import os
from litellm import completion
import litellm

## Llama2 public Huggingface endpoint

**Usage**

You need to tell LiteLLM when you're calling Huggingface. Do that by setting it as part of the model name - completion(model="huggingface/<model_name>",...).
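
To illustrate the naming convention, the sketch below separates a provider prefix from the rest of a model string. It is only an illustration and not how LiteLLM parses model names internally: `split_provider` is a hypothetical helper, and the prefix list contains just the three prefixes that appear in this notebook.

```python
from typing import Optional, Tuple

# Prefixes used in this notebook; an illustrative subset, not a complete list.
KNOWN_PREFIXES = ("huggingface", "together_ai", "openrouter")


def split_provider(
    model: str, known_prefixes: Tuple[str, ...] = KNOWN_PREFIXES
) -> Tuple[Optional[str], str]:
    """Split 'provider/model_name' into (provider, model_name).

    Returns (None, model) when the string has no recognised provider prefix.
    """
    prefix, separator, rest = model.partition("/")
    if separator and prefix in known_prefixes:
        return prefix, rest
    return None, model


print(split_provider("huggingface/tiiuae/falcon-7b"))
print(split_provider("gpt-4"))
```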

*We used the Falcon-7b and Llama2-7b LLMs.*

In [None]:
# [OPTIONAL] set env var
os.environ["HUGGINGFACE_API_KEY"] = ""

model_falcon="huggingface/tiiuae/falcon-7b"
model_llama = "huggingface/NousResearch/llama-2-7b-chat-hf"

**Text-generation-inference (TGI) - Falcon-7b LLMs**

In [None]:
messages = [{ "content": "There's a llama in my garden 😱 What should I do?","role": "user"}]

# Call 'tiiuae/falcon-7b' hosted on HF Inference endpoints
response = completion(model=model_falcon, messages=messages)

print(response)

{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "\nLlamas are a very popular pet in the UK, but they are not native to the",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-3f19e7a7-a230-49bc-8f1c-c24927809200",
  "created": 1695297005.7203326,
  "response_ms": 831.654,
  "model": "tiiuae/falcon-7b",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 20,
    "total_tokens": 34
  }
}


In [None]:
print("Content : ", response['choices'][0]['message']['content'])
print(f"Prompt Tokens {response['usage']['prompt_tokens']} + Completion Tokens {response['usage']['completion_tokens']} = Total Tokens {response['usage']['total_tokens']}")

Content :  
Llamas are a very popular pet in the UK, but they are not native to the
Prompt Tokens 14 + Completion Tokens 20 = Total Tokens 34
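
Token counts are not always present: the VertexAI `chat-bison` response earlier in this notebook reports `null` for every usage field. When totalling usage across several calls, it helps to treat missing values as zero. The block below is a sketch under that assumption; `sum_usage` is a hypothetical helper that operates on plain dictionaries.

```python
from typing import Any, Dict, Iterable, Mapping

USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def sum_usage(responses: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Add up token usage across responses, counting missing or null fields as 0."""
    totals = {key: 0 for key in USAGE_KEYS}
    for response in responses:
        usage = response.get("usage") or {}
        for key in USAGE_KEYS:
            totals[key] += usage.get(key) or 0
    return totals


# Usage blocks copied from responses shown in this notebook.
falcon = {"usage": {"prompt_tokens": 14, "completion_tokens": 20, "total_tokens": 34}}
vertex = {"usage": {"prompt_tokens": None, "completion_tokens": None, "total_tokens": None}}
print(sum_usage([falcon, vertex]))
```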


**Conversational-task (BlenderBot, etc.) - Blenderbot-400M LLMs**

Key Change: completion(..., task="conversational")



In [None]:
messages = [{ "content": "There's a llama in my garden 😱 What should I do?","role": "user"}]

# e.g. Call 'facebook/blenderbot-400M-distill' hosted on HF Inference endpoints
response = completion(model='huggingface/facebook/blenderbot-400M-distill', messages=messages, task="conversational")

print(response)

{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": " I'm not sure what to do, but I do know that llamas are herbivorous mammals.",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-9f718bf5-2906-4028-920f-6ca10538269a",
  "created": 1695295229.5946302,
  "response_ms": 12365.456,
  "model": "facebook/blenderbot-400M-distill",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 21,
    "total_tokens": 35
  }
}


In [None]:
print("Content : ", response['choices'][0]['message']['content'])
print(f"Prompt Tokens {response['usage']['prompt_tokens']} + Completion Tokens {response['usage']['completion_tokens']} = Total Tokens {response['usage']['total_tokens']}")

Content :   I'm not sure what to do, but I do know that llamas are herbivorous mammals.
Prompt Tokens 14 + Completion Tokens 21 = Total Tokens 35


**Models with Prompt Formatting - Blenderbot-400M LLM**

For models with special prompt templates (e.g. Llama2), we format the prompt to fit their template.

- What if we don't support a model you need? You can specify your own custom prompt formatting in case we don't have your model covered yet.

- Does this mean you have to specify a prompt for all models? No. By default we'll concatenate your message content to make a prompt.

Custom prompt templates

In [None]:
# Register your own custom prompt template
litellm.register_prompt_template(
        model = 'huggingface/facebook/blenderbot-400M-distill',
        roles={
            "system": {
                "pre_message": "[INST] <<SYS>>\n",
                "post_message": "\n<</SYS>>\n [/INST]\n"
            },
            "user": {
                "pre_message": "[INST] ",
                "post_message": " [/INST]\n"
            },
            "assistant": {
                "post_message": "\n"
            }
        }
    )

def test_huggingface_custom_model():
    response = completion(model='huggingface/facebook/blenderbot-400M-distill', messages=messages, task="conversational")
    return response

test_huggingface_custom_model()

<ModelResponse chat.completion id=chatcmpl-becd7e26-dfe1-4b9b-93f0-1b02bc4f8fa8 at 0x7e9dcae20fe0> JSON: {
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": " I'm not sure what to do, but I do know that llamas are herbivorous mammals.",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-becd7e26-dfe1-4b9b-93f0-1b02bc4f8fa8",
  "created": 1695296587.5900128,
  "response_ms": 772.165,
  "model": "facebook/blenderbot-400M-distill",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 21,
    "total_tokens": 35
  }
}

In [None]:
print("Content : ", response['choices'][0]['message']['content'])
print(f"Prompt Tokens {response['usage']['prompt_tokens']} + Completion Tokens {response['usage']['completion_tokens']} = Total Tokens {response['usage']['total_tokens']}")

Content :   I'm not sure what to do, but I do know that llamas are herbivorous mammals.
Prompt Tokens 14 + Completion Tokens 21 = Total Tokens 35
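
To make the `roles` dictionary registered above more concrete, here is a sketch of how `pre_message` and `post_message` could wrap each message, with plain concatenation as the fallback when no template is given. This is an illustration of the idea only, not LiteLLM's actual implementation; `render_prompt` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Same role wrappers as in the register_prompt_template call above.
LLAMA2_STYLE_ROLES = {
    "system": {
        "pre_message": "[INST] <<SYS>>\n",
        "post_message": "\n<</SYS>>\n [/INST]\n",
    },
    "user": {"pre_message": "[INST] ", "post_message": " [/INST]\n"},
    "assistant": {"post_message": "\n"},
}


def render_prompt(
    messages: List[Dict[str, str]],
    roles: Optional[Dict[str, Dict[str, str]]] = None,
) -> str:
    """Wrap each message in its role's pre/post text and join the results.

    With no roles given, the message contents are simply concatenated.
    """
    roles = roles or {}
    parts = []
    for message in messages:
        wrapper = roles.get(message["role"], {})
        parts.append(
            wrapper.get("pre_message", "")
            + message["content"]
            + wrapper.get("post_message", "")
        )
    return "".join(parts)


demo = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]
print(render_prompt(demo, LLAMA2_STYLE_ROLES))
```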


# **Error Exploration**

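The calls below fail with `APIError`: Llama2-7b called with the default TGI format, and Falcon-7b called with `task="conversational"`, with or without a custom prompt template.

**Text-generation-inference (TGI) - Llama2-7b LLMs**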

In [None]:
messages = [{ "content": "There's a llama in my garden 😱 What should I do?","role": "user"}]

# Call 'NousResearch/llama-2-7b-chat-hf' hosted on HF Inference endpoints
response = completion(model=model_llama, messages=messages)

print(response)

APIError: ignored

In [None]:
messages = [{ "content": "There's a llama in my garden 😱 What should I do?","role": "user"}]

response = completion(model=model_falcon, messages=messages, task="conversational")

print(response)

APIError: ignored

In [None]:
# Register your own custom prompt template
litellm.register_prompt_template(
        model_falcon,
        roles={
            "system": {
                "pre_message": "[INST] <<SYS>>\n",
                "post_message": "\n<</SYS>>\n [/INST]\n"
            },
            "user": {
                "pre_message": "[INST] ",
                "post_message": " [/INST]\n"
            },
            "assistant": {
                "post_message": "\n"
            }
        }
    )

def test_huggingface_custom_model():
    response = completion(model=model_falcon, messages=messages, task="conversational")
    print(response['choices'][0]['message']['content'])
    return response

test_huggingface_custom_model()

APIError: ignored
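
When exploring which model and task combinations work, it can be convenient to let failures be recorded instead of stopping the cell. The sketch below tries each model in turn and collects errors by name. `try_models` is a hypothetical helper; it takes the call function as an argument, so `completion` can be passed in, and it catches the broad `Exception` class purely for exploration.

```python
from typing import Any, Callable, Dict, List, Tuple


def try_models(
    call: Callable[..., Any],
    models: List[str],
    messages: List[Dict[str, str]],
    **kwargs: Any,
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Call each model once; return (results by model, error text by model)."""
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for model in models:
        try:
            results[model] = call(model=model, messages=messages, **kwargs)
        except Exception as exc:  # broad on purpose: this is for exploration
            errors[model] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

For instance, `results, errors = try_models(completion, [model_llama, model_falcon], messages)` would run both Huggingface models from this section and leave any failure message in `errors`.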

# OpenRouter

LiteLLM also supports models hosted by OpenRouter.

https://docs.litellm.ai/docs/providers/openrouter

In [4]:
import os
from litellm import completion
os.environ["OPENROUTER_API_KEY"] = ""

messages = [{ "content": "Hello, how are you?","role": "user"}]

response = completion(
            model="openrouter/openai/gpt-4",
            messages=messages,
        )

In [5]:
response

<OpenAIObject id=gen-je3Qttz9haQRN30DUb1GD1F2cbsO at 0x7d1ee82c13a0> JSON: {
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "As an artificial intelligence, I don't have feelings, but I'm functioning as expected. Thanks for asking! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "model": "gpt-4-0613",
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 29,
    "total_tokens": 42
  },
  "id": "gen-je3Qttz9haQRN30DUb1GD1F2cbsO"
}

In [6]:
# Google PaLM 2 chat-bison
response = completion(
            model="openrouter/google/palm-2-chat-bison",
            messages=messages,
        )
response

<OpenAIObject id=gen-pZ3jQLYmSPCZF73hvxNnTW3bHJYP at 0x7d1f01a39710> JSON: {
  "id": "gen-pZ3jQLYmSPCZF73hvxNnTW3bHJYP",
  "model": "chat-bison@001",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "I am doing well, thank you for asking."
      }
    }
  ]
}