The Portkey x LlamaIndex integration brings advanced AI gateway capabilities, full-stack observability, and prompt management to apps built on LlamaIndex.
In a nutshell, Portkey extends the familiar OpenAI schema to make LlamaIndex work with 200+ LLMs, without importing a different class for each provider or configuring your code separately for each one. Portkey makes your LlamaIndex apps reliable, fast, and cost-efficient.
pip install -U portkey-ai
Import the `OpenAI` class from LlamaIndex as you normally would, along with Portkey's `createHeaders` helper and the `PORTKEY_GATEWAY_URL` constant.
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
Configure your model details using Portkey's Config object schema. This is where you can define the provider and model name, model parameters, set up fallbacks, retries, and more.
config = {
"provider":"openai",
"api_key":"YOUR_OPENAI_API_KEY",
"override_params": {
"model":"gpt-4o",
"max_tokens":64
}
}
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
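The placeholder strings above are fine for a quick test, but real keys are better kept out of source code. Below is a minimal sketch of building the same Config from an environment variable. The helper name `config_from_env` and the variable name `OPENAI_API_KEY` are assumptions made for illustration and are not part of the Portkey SDK; only the shape of the dict comes from the example above.

```python
import os

# Placeholder so the sketch runs as-is; in practice, export a real key in your shell.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

def config_from_env(provider="openai", model="gpt-4o", max_tokens=64,
                    key_var="OPENAI_API_KEY"):
    """Build a Config dict, reading the provider API key from the environment.

    Hypothetical helper: it only assembles the dict shown in the example above.
    """
    api_key = os.environ.get(key_var)
    if not api_key:
        raise RuntimeError(f"Set the {key_var} environment variable first")
    return {
        "provider": provider,
        "api_key": api_key,
        "override_params": {"model": model, "max_tokens": max_tokens},
    }

config = config_from_env()
print(config["provider"], config["override_params"])
```

The resulting `config` can be passed to `createHeaders(config=config)` exactly like the hand-written dict.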
Here are basic integration examples using the `complete` and `chat` methods, with streaming on and off.
{% tabs %} {% tab title="Chat + Streaming" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"openai",
"api_key":"YOUR_OPENAI_API_KEY",
"override_params": {
"model":"gpt-4o",
"max_tokens":64
}
}
#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
##### Streaming Mode #####
resp = portkey.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
assistant: Arrr, matey! They call me Captain Barnacle Bill, the most colorful pirate to ever sail the seven seas! With a parrot on me shoulder and a treasure map in me hand, I'm always ready for adventure! What be yer name, landlubber?
{% endtab %}
{% tab title="Complete + Streaming" %}
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"openai",
"api_key":"YOUR_OPENAI_API_KEY",
"override_params": {
"model":"gpt-4o",
"max_tokens":64
}
}
#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
resp=portkey.complete("Paul Graham is ")
print(resp)
##### Streaming Mode #####
resp=portkey.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")
a computer scientist, entrepreneur, and venture capitalist. He is best known for co-founding the startup accelerator Y Combinator and for his work on programming languages and web development. Graham is also a prolific writer and has published essays on a wide range of topics, including startups, technology, and education.
{% endtab %}
{% tab title="Async" %}
import asyncio
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"openai",
"api_key":"YOUR_OPENAI_API_KEY",
"override_params": {
"model":"gpt-4o",
"max_tokens":64
}
}
#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"
async def main():
    portkey = OpenAI(
        api_base=PORTKEY_GATEWAY_URL,
        default_headers=createHeaders(
            api_key="YOUR_PORTKEY_API_KEY",
            config=config
        )
    )
    resp = await portkey.acomplete("Paul Graham is ")
    print(resp)
    ##### Streaming Mode #####
    resp = await portkey.astream_complete("Paul Graham is ")
    async for delta in resp:
        print(delta.delta, end="")
asyncio.run(main())
{% endtab %} {% endtabs %}
By routing your LlamaIndex requests through Portkey, you get access to the following production-grade features:
- Interoperability: Call various LLMs like Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, and AWS Bedrock with minimal code changes.
- Caching: Speed up your requests and save money on LLM calls by storing past responses in the Portkey cache. Choose between Simple and Semantic cache modes.
- Reliability: Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, and request timeouts.
- Observability: Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more. Send custom metadata and trace IDs for better analytics and debugging.
- Prompt Management: Use Portkey as a centralized hub to store, version, and experiment with prompts across multiple LLMs, and seamlessly retrieve them in your LlamaIndex app for easy integration.
- Continuous Improvement: Improve your LlamaIndex app by capturing qualitative & quantitative user feedback on your requests.
- Security & Compliance: Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
Many of these features are driven by Portkey's Config architecture. The Portkey app makes it easy to create, manage, and version your Configs so that you can reference them in LlamaIndex.
Head over to the Configs tab in the Portkey app, where you can save provider Configs along with their reliability and caching settings. Each Config has an associated slug that you can reference in your LlamaIndex code.
If you want to use a saved Config from the Portkey app in your LlamaIndex code but need to modify parts of it before making a request, you can fetch it through Portkey's Configs API and adjust it in code. This keeps the convenience of saved Configs while letting you adapt them to a specific request.
Here's an example of how you can fetch a saved Config using the Configs API and override the `model` parameter:
{% tabs %} {% tab title="Overriding Model in a Saved Config" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
import requests
import os
def create_config(config_slug, model):
    # Fetch the saved Config from the Portkey Configs API
    url = f'https://api.portkey.ai/v1/configs/{config_slug}'
    headers = {
        'x-portkey-api-key': os.environ.get("PORTKEY_API_KEY"),
        'content-type': 'application/json'
    }
    response = requests.get(url, headers=headers).json().get('data').get('config')
    # Override the model defined in the saved Config
    response['override_params']['model'] = model
    return response

config = create_config("pc-llamaindex-xx", "gpt-4-turbo")
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key=os.environ.get("PORTKEY_API_KEY"),
config=config
)
)
messages = [ChatMessage(role="user", content="1729")]
resp = portkey.chat(messages)
print(resp)
{% endtab %} {% endtabs %}
In this example:
- We define a helper function `create_config` that takes a `config_slug` and a `model` as parameters.
- Inside the function, we make a GET request to the Portkey Configs API endpoint to fetch the saved Config using the provided `config_slug`.
- We extract the `config` object from the API response.
- We update the `model` parameter in the `override_params` section of the Config with the provided `model`.
- Finally, we return the customized Config.
We can then use this customized Config when initializing the OpenAI client from LlamaIndex, ensuring that our specific `model` override is applied to the saved Config.
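One caveat: the assignment `response['override_params']['model'] = model` raises a `KeyError` when the saved Config has no `override_params` section. A defensive variant of just the override step might look like the sketch below. The function name `override_model` is hypothetical, and it works on a plain dict so it can be tried without calling the Configs API.

```python
import copy

def override_model(config, model):
    """Return a copy of a Config dict with override_params["model"] set to `model`.

    Hypothetical helper; creates the override_params section if it is missing.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("override_params", {})["model"] = model
    return updated

# Stand-in for a Config fetched from the API that lacks override_params
saved = {"virtual_key": "openai-virtual-key"}
customized = override_model(saved, "gpt-4-turbo")
print(customized)
```

Working on a copy keeps the fetched Config reusable if several model variants are derived from it.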
For more details on working with Configs in Portkey, refer to the Config documentation.
Now that we have the OpenAI code up and running, let's see how you can use Portkey to send the request across multiple LLMs - we'll show Anthropic, Gemini, and Mistral. For the full list of providers & LLMs supported, check out this doc.
Switching providers just requires changing 3 lines of code:
- Change the provider name
- Change the API key, and
- Change the model name
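Because only those three values differ, the per-provider Configs can be generated from a small lookup table. The sketch below assumes a hypothetical `PROVIDER_MODELS` table and `provider_config` helper; the provider slugs and model names are the ones used in this guide, and the API key is a placeholder.

```python
# Provider slug -> model name, as used in the examples in this guide
PROVIDER_MODELS = {
    "anthropic": "claude-3-opus-20240229",
    "google": "gemini-1.5-flash-latest",
    "mistral-ai": "codestral-latest",
}

def provider_config(provider, api_key, max_tokens=64):
    """Build a Config dict for one of the providers in PROVIDER_MODELS (hypothetical helper)."""
    if provider not in PROVIDER_MODELS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return {
        "provider": provider,            # 1. provider name
        "api_key": api_key,              # 2. API key
        "override_params": {
            "model": PROVIDER_MODELS[provider],  # 3. model name
            "max_tokens": max_tokens,
        },
    }

for name in PROVIDER_MODELS:
    print(provider_config(name, "YOUR_PROVIDER_API_KEY"))
```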
{% tabs %} {% tab title="Anthropic" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"anthropic",
"api_key":"YOUR_ANTHROPIC_API_KEY",
"override_params": {
"model":"claude-3-opus-20240229",
"max_tokens":64
}
}
#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %}
{% tab title="Gemini" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"google",
"api_key":"YOUR_GOOGLE_GEMINI_API_KEY",
"override_params": {
"model":"gemini-1.5-flash-latest",
"max_tokens":64
}
}
#### You can also reference a saved Config instead ####
#### config = "pc-gemini-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %}
{% tab title="Mistral" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"mistral-ai",
"api_key":"YOUR_MISTRAL_AI_API_KEY",
"override_params": {
"model":"codestral-latest",
"max_tokens":64
}
}
#### You can also reference a saved Config instead ####
#### config = "pc-mistral-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %} {% endtabs %}
{% hint style="info" %} We recommend saving your cloud details to Portkey vault and getting a corresponding Virtual Key.
Explore the Virtual Key documentation here. {% endhint %}
{% tabs %} {% tab title="Azure OpenAI" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"virtual_key":"AZURE_OPENAI_PORTKEY_VIRTUAL_KEY"
}
#### You can also reference a saved Config instead ####
#### config = "pc-azure-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %}
{% tab title="AWS Bedrock" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"virtual_key":"AWS_BEDROCK_PORTKEY_VIRTUAL_KEY"
}
#### You can also reference a saved Config instead ####
#### config = "pc-bedrock-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %}
{% tab title="Google Vertex AI" %}
{% hint style="info" %}
Vertex AI uses OAuth2 to authenticate its requests, so you also need to send the access token along with the request. You can do this by passing it as the `api_key` in the OpenAI client.
Run `gcloud auth print-access-token` in your terminal to get your Vertex AI access token.
{% endhint %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"virtual_key":"VERTEX_AI_PORTKEY_VIRTUAL_KEY"
}
#### You can also reference a saved Config instead ####
#### config = "pc-vertex-xx"
portkey = OpenAI(
api_key="YOUR_VERTEX_AI_ACCESS_TOKEN", # Get by running gcloud auth print-access-token in terminal
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %} {% endtabs %}
Check out Portkey docs for Ollama and other privately hosted models.
{% tabs %} {% tab title="Ollama" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"ollama",
"custom_host":"https://7cc4-3-235-157-146.ngrok-free.app", # Your Ollama ngrok URL
"override_params": {
"model":"llama3"
}
}
#### You can also reference a saved Config instead ####
#### config = "pc-ollama-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %} {% endtabs %}
Explore the full list of providers supported on Portkey here.
You can speed up your requests and cut LLM costs by storing past responses in the Portkey cache. There are 2 cache modes:
- Simple: Matches requests verbatim. Perfect for repeated, identical prompts. Works on all models including image generation models.
- Semantic: Matches responses for requests that are semantically similar. Ideal for denoising requests with extra prepositions, pronouns, etc.
To enable Portkey cache, just add the `cache` params to your config object.
{% tabs %} {% tab title="Simple Cache Config" %}
config = {
"provider":"mistral-ai",
"api_key":"YOUR_MISTRAL_AI_API_KEY",
"override_params": {
"model":"codestral-latest",
"max_tokens":64
},
"cache": {
"mode": "simple",
"max_age": 60000
}
}
{% endtab %}
{% tab title="Semantic Cache Config" %}
config = {
"provider":"mistral-ai",
"api_key":"YOUR_MISTRAL_AI_API_KEY",
"override_params": {
"model":"codestral-latest",
"max_tokens":64
},
"cache": {
"mode": "semantic",
"max_age": 60000
}
}
{% endtab %} {% endtabs %}
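Since the cache settings are just one more key in the Config, they can be layered onto any existing Config dict. A minimal sketch, assuming a hypothetical helper named `with_cache`; the two `mode` values and the `max_age` key are taken from the examples above.

```python
CACHE_MODES = {"simple", "semantic"}  # the two modes described above

def with_cache(config, mode="simple", max_age=60000):
    """Return a new Config dict with a cache section added (hypothetical helper)."""
    if mode not in CACHE_MODES:
        raise ValueError(f"mode must be one of {sorted(CACHE_MODES)}, got {mode!r}")
    # Build a new dict so the original Config can still be used without caching
    return {**config, "cache": {"mode": mode, "max_age": max_age}}

base = {
    "provider": "mistral-ai",
    "api_key": "YOUR_MISTRAL_AI_API_KEY",
    "override_params": {"model": "codestral-latest", "max_tokens": 64},
}
cached = with_cache(base, mode="semantic")
print(cached["cache"])
```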
For more cache settings, check out the documentation here.
Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, and set automatic retries or request timeouts, all through Configs.
{% tabs %} {% tab title="Fallback from OpenAI to Anthropic" %}
config = {
"strategy": {
"mode": "fallback"
},
"targets": [
{
"virtual_key": "openai-virtual-key",
"override_params": {
"model": "gpt-4o"
}
},
{
"virtual_key": "anthropic-virtual-key",
"override_params": {
"model": "claude-3-opus-20240229",
"max_tokens":64
}
}
]
}
{% endtab %}
{% tab title="Load Balance 2 API Keys" %}
config = {
"strategy": {
"mode": "loadbalance"
},
"targets": [
{
"virtual_key": "openai-virtual-key-1",
"weight":1
},
{
"virtual_key": "openai-virtual-key-2",
"weight":1
}
]
}
{% endtab %}
{% tab title="Automatic Retries" %}
config = {
"retry": {
"attempts": 5
},
"virtual_key": "virtual-key-xxx"
}
{% endtab %}
{% tab title="Request Timeouts" %}
config = {
"strategy": { "mode": "fallback" },
"request_timeout": 10000,
"targets": [
{ "virtual_key": "open-ai-xxx" },
{ "virtual_key": "azure-open-ai-xxx" }
]
}
{% endtab %} {% endtabs %}
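In the load-balancing example above, both targets carry a `weight` of 1. Assuming weights are interpreted relative to their sum (an assumption here; the Loadbalancing docs are the authority on the exact semantics), the expected share of traffic per target can be estimated as in this sketch. `traffic_shares` is a hypothetical helper, not a Portkey function.

```python
def traffic_shares(config):
    """Estimate each target's fraction of requests from its weight.

    Assumes shares are proportional to weight / sum(weights).
    """
    targets = config["targets"]
    total = sum(target["weight"] for target in targets)
    if total <= 0:
        raise ValueError("At least one target needs a positive weight")
    return {target["virtual_key"]: target["weight"] / total for target in targets}

config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"virtual_key": "openai-virtual-key-1", "weight": 1},
        {"virtual_key": "openai-virtual-key-2", "weight": 1},
    ],
}
print(traffic_shares(config))
```

Under that assumption, equal weights split traffic evenly, and weights of 3 and 1 would send roughly three quarters of requests to the first target.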
Explore deeper documentation for each feature here - Fallbacks, Loadbalancing, Retries, Timeouts.
Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more.
Using Portkey, you can also send custom metadata with each of your requests to further segment your logs for better analytics. Similarly, you can also trace multiple requests to a single trace ID and filter or view them separately in Portkey logs.
{% hint style="info" %}
Custom Metadata and Trace ID information is sent in `default_headers`.
{% endhint %}
{% tabs %} {% tab title="Sending Custom Metadata" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = "pc-xxxx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config,
metadata={
"_user": "USER_ID",
"environment": "production",
"session_id": "1729"
}
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %}
{% tab title="Sending Trace ID" %}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = "pc-xxxx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config,
trace_id="YOUR_TRACE_ID_HERE"
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
{% endtab %} {% endtabs %}
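Trace IDs are plain strings, so any scheme that gives related requests the same ID works. One possible sketch generates a UUID when no ID is supplied and bundles it with the metadata. `tracing_options` is a hypothetical helper; the `metadata` and `trace_id` keys mirror the `createHeaders` arguments shown in the tabs above.

```python
import uuid

def tracing_options(user_id, environment, session_id, trace_id=None):
    """Bundle metadata and a trace ID for a group of related requests (hypothetical helper)."""
    return {
        "metadata": {
            "_user": user_id,
            "environment": environment,
            "session_id": session_id,
        },
        # Reuse the caller's trace ID if given; otherwise create a fresh one
        "trace_id": trace_id or str(uuid.uuid4()),
    }

options = tracing_options("USER_ID", "production", "1729")
print(options)
```

The result can be unpacked into the same call used in the examples above, alongside `api_key` and `config`, as `**options`.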
Check out Observability docs here.
Portkey features an advanced Prompts platform tailor-made for better prompt engineering. With Portkey, you can:
- Store Prompts with Access Control and Version Control: Keep all your prompts organized in a centralized location, easily track changes over time, and manage edit/view permissions for your team.
- Parameterize Prompts: Define variables and mustache-approved tags within your prompts, allowing for dynamic value insertion when calling LLMs. This enables greater flexibility and reusability of your prompts.
- Experiment in a Sandbox Environment: Quickly iterate on different LLMs and parameters to find the optimal combination for your use case, without modifying your LlamaIndex code.
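To make the parameterization idea concrete, here is a tiny sketch of how a `{{variable}}` tag gets replaced by a value. It is illustrative only: `render_template` is a hypothetical function, the template text is made up, and this is not how Portkey renders prompts internally.

```python
import re

def render_template(template, variables):
    """Replace {{name}} tags with values from `variables` (illustrative only)."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"Missing value for variable '{name}'")
        return str(variables[name])
    # Match {{name}} with optional spaces inside the braces
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

print(render_template("Write a one-line review of {{movie}}.", {"movie": "Dune 2"}))
# prints: Write a one-line review of Dune 2.
```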
To use a Portkey prompt template in LlamaIndex:
- Create your prompt template on the Portkey app, and save it to get an associated `Prompt ID`
- Before making a LlamaIndex request, render the prompt template using the Portkey SDK
- Transform the retrieved prompt to be compatible with LlamaIndex and send the request!
{% tabs %} {% tab title="Portkey Prompts in LlamaIndex" %}
import json
import os
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey
### Initialize Portkey client with API key
client = Portkey(api_key=os.environ.get("PORTKEY_API_KEY"))
### Render the prompt template with your prompt ID and variables
prompt_template = client.prompts.render(
prompt_id="pp-prompt-id",
variables={ "movie":"Dune 2" }
).data.dict()
config = {
"virtual_key":"GROQ_VIRTUAL_KEY", # You need to send the virtual key separately
"override_params":{
"model":prompt_template["model"], # Set the model name based on the value in the prompt template
"temperature":prompt_template["temperature"] # Similarly, you can also set other model params
}
}
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key=os.environ.get("PORTKEY_API_KEY"),
config=config
)
)
### Transform the rendered prompt into LlamaIndex-compatible format
messages = [ChatMessage(content=msg["content"], role=msg["role"]) for msg in prompt_template["messages"]]
resp = portkey.chat(messages)
print(resp)
{% endtab %} {% endtabs %}
Explore Prompt Management docs here.
Now that you know how to trace & log your LlamaIndex requests to Portkey, you can also start capturing user feedback to improve your app!
You can append qualitative as well as quantitative feedback to any `trace ID` with the `portkey.feedback.create` method:
{% tabs %} {% tab title="Adding Feedback" %}
from portkey_ai import Portkey
portkey = Portkey(
api_key="PORTKEY_API_KEY"
)
feedback = portkey.feedback.create(
trace_id="YOUR_LLAMAINDEX_TRACE_ID",
value=5, # Integer between -10 and 10
weight=1, # Optional
metadata={
# Pass any additional context here like comments, _user and more
}
)
print(feedback)
{% endtab %} {% endtabs %}
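Since `value` must be an integer between -10 and 10, it can help to validate it before calling the API. A sketch, assuming a hypothetical `feedback_payload` helper that builds the keyword arguments shown above:

```python
def feedback_payload(trace_id, value, weight=1, metadata=None):
    """Validate a feedback value and build kwargs for portkey.feedback.create.

    Hypothetical helper; the -10..10 integer range comes from the example above.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("value must be an integer")
    if not -10 <= value <= 10:
        raise ValueError("value must be between -10 and 10")
    return {
        "trace_id": trace_id,
        "value": value,
        "weight": weight,
        "metadata": metadata or {},
    }

payload = feedback_payload("YOUR_LLAMAINDEX_TRACE_ID", 5, metadata={"comment": "helpful"})
print(payload)
```

The dict can then be unpacked into the call from the example above: `portkey.feedback.create(**payload)`.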
Check out the Feedback documentation for a deeper dive.
When you onboard more team members to help out on your LlamaIndex app, permissioning, budgeting, and access management can become a mess! Using Portkey, you can set budget limits on provider API keys and implement fine-grained user roles and permissions to:
- Control access: Restrict team members' access to specific features, Configs, or API endpoints based on their roles and responsibilities.
- Manage costs: Set budget limits on API keys to prevent unexpected expenses and ensure that your LLM usage stays within your allocated budget.
- Ensure compliance: Implement strict security policies and audit trails to maintain compliance with industry regulations and protect sensitive data.
- Simplify onboarding: Streamline the onboarding process for new team members by assigning them appropriate roles and permissions, eliminating the need to share sensitive API keys or secrets.
- Monitor usage: Gain visibility into your team's LLM usage, track costs, and identify potential security risks or anomalies through comprehensive monitoring and reporting.
Read more about Portkey's Security & Enterprise offerings here.
Join the Portkey Discord to connect with other practitioners, discuss your LlamaIndex projects, and get help troubleshooting your queries.
For more detailed information on each feature and how to use them, please refer to the Portkey Documentation.