Skip to content

Latest commit

 

History

History
845 lines (673 loc) · 30.2 KB

llama-index-python.md

File metadata and controls

845 lines (673 loc) · 30.2 KB

LlamaIndex (Python)

The Portkey x LlamaIndex integration brings advanced AI gateway capabilities, full-stack observability, and prompt management to apps built on LlamaIndex.

In a nutshell, Portkey extends the familiar OpenAI schema to make Llamaindex work with 200+ LLMs without the need for importing different classes for each provider or having to configure your code separately. Portkey makes your Llamaindex apps reliable, fast, and cost-efficient.

Getting Started

1. Install the Portkey SDK

pip install -U portkey-ai

2. Import the necessary classes and functions

Import the OpenAI class in Llamaindex as you normally would, along with Portkey's helper functions createHeaders and PORTKEY_GATEWAY_URL

from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

3. Configure model details

Configure your model details using Portkey's Config object schema. This is where you can define the provider and model name, model parameters, set up fallbacks, retries, and more.

config = {
    "provider":"openai",
    "api_key":"YOUR_OPENAI_API_KEY",
    "override_params": {
        "model":"gpt-4o",
        "max_tokens":64
    }
}

4. Pass Config details to OpenAI client with necessary headers

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

Example: OpenAI

Here are basic integrations examples on using the complete and chat methods with streaming on & off.

{% tabs %} {% tab title="Chat + Streaming" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"openai",
    "api_key":"YOUR_OPENAI_API_KEY",
    "override_params": {
        "model":"gpt-4o",
        "max_tokens":64
    }
}

#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

##### Streaming Mode #####

resp = portkey.stream_chat(messages)

for r in resp:
    print(r.delta, end="")

assistant: Arrr, matey! They call me Captain Barnacle Bill, the most colorful pirate to ever sail the seven seas! With a parrot on me shoulder and a treasure map in me hand, I'm always ready for adventure! What be yer name, landlubber? {% endtab %}

{% tab title="Complete + Streaming" %}

from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"openai",
    "api_key":"YOUR_OPENAI_API_KEY",
    "override_params": {
        "model":"gpt-4o",
        "max_tokens":64
    }
}

#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

resp=portkey.complete("Paul Graham is ")
print(resp)

##### Streaming Mode #####

resp=portkey.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")

a computer scientist, entrepreneur, and venture capitalist. He is best known for co-founding the startup accelerator Y Combinator and for his work on programming languages and web development. Graham is also a prolific writer and has published essays on a wide range of topics, including startups, technology, and education. {% endtab %}

{% tab title="Async" %}

import asyncio
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"openai",
    "api_key":"YOUR_OPENAI_API_KEY",
    "override_params": {
        "model":"gpt-4o",
        "max_tokens":64
    }
}

#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"

async def main():
    portkey = OpenAI(
        api_base=PORTKEY_GATEWAY_URL,
        default_headers=createHeaders(
            api_key="PORTKEY_API_KEY",
            config=config
        )
    )
    
    resp = await portkey.acomplete("Paul Graham is ")
    print(resp)
    
    ##### Streaming Mode #####
    
    resp = await portkey.astream_complete("Paul Graham is ")
    async for delta in resp:
        print(delta.delta, end="")

asyncio.run(main())

{% endtab %} {% endtabs %}


Enabling Portkey Features

By routing your LlamaIndex requests through Portkey, you get access to the following production-grade features:

  1. Interoperability: Call various LLMs like Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, and AWS Bedrock with minimal code changes.
  2. Caching: Speed up your requests and save money on LLM calls by storing past responses in the Portkey cache. Choose between Simple and Semantic cache modes.
  3. Reliability: Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, and request timeouts.
  4. Observability: Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more. Send custom metadata and trace IDs for better analytics and debugging.
  5. Prompt Management: Use Portkey as a centralized hub to store, version, and experiment with prompts across multiple LLMs, and seamlessly retrieve them in your LlamaIndex app for easy integration.
  6. Continuous Improvement: Improve your LlamaIndex app by capturing qualitative & quantitative user feedback on your requests.
  7. Security & Compliance: Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.

Much of these features are driven by Portkey's Config architecture. On the Portkey app, we make it easy to help you create, manage, and version your Configs so that you can reference them easily in Llamaindex.

Saving Configs in the Portkey App

Head over to the Configs tab in Portkey app where you can save various provider Configs along with the reliability and caching features. Each Config has an associated slug that you can reference in your Llamaindex code.

Overriding a Saved Config

If you want to use a saved Config from the Portkey app in your LlamaIndex code but need to modify certain parts of it before making a request, you can easily achieve this using Portkey's Configs API. This approach allows you to leverage the convenience of saved Configs while still having the flexibility to adapt them to your specific needs.

Here's an example of how you can fetch a saved Config using the Configs API and override the model parameter:

{% tabs %} {% tab title="Overriding Model in a Saved Config" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
import requests
import os

def create_config(config_slug,model):
    url = f'https://api.portkey.ai/v1/configs/{config_slug}'
    headers = {
        'x-portkey-api-key': os.environ.get("PORTKEY_API_KEY"),
        'content-type': 'application/json'
    }
    response = requests.get(url, headers=headers).json().get('data').get('config')
    response['override_params']['model']=model
    return response

config=create_config("pc-llamaindex-xx","gpt-4-turbo")

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key=os.environ.get("PORTKEY_API_KEY"),
        config=config
    )
)

messages = [ChatMessage(role="user", content="1729")]

resp = portkey.chat(messages)
print(resp)

{% endtab %} {% endtabs %}

In this example:

  1. We define a helper function get_customized_config that takes a config_slug and a model as parameters.
  2. Inside the function, we make a GET request to the Portkey Configs API endpoint to fetch the saved Config using the provided config_slug.
  3. We extract the config object from the API response.
  4. We update the model parameter in the override_params section of the Config with the provided custom_model.
  5. Finally, we return the customized Config.

We can then use this customized Config when initializing the OpenAI client from LlamaIndex, ensuring that our specific model override is applied to the saved Config.

For more details on working with Configs in Portkey, refer to the Config documentation.


1. Interoperability - Calling Anthropic, Gemini, Mistral, and more

Now that we have the OpenAI code up and running, let's see how you can use Portkey to send the request across multiple LLMs - we'll show Anthropic, Gemini, and Mistral. For the full list of providers & LLMs supported, check out this doc.

Switching providers just requires changing 3 lines of code:

  1. Change the provider name
  2. Change the API key, and
  3. Change the model name

{% tabs %} {% tab title="Anthropic" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"anthropic",
    "api_key":"YOUR_ANTHROPIC_API_KEY",
    "override_params": {
        "model":"claude-3-opus-20240229",
        "max_tokens":64
    }
}

#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %}

{% tab title="Gemini" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"google",
    "api_key":"YOUR_GOOGLE_GEMINI_API_KEY",
    "override_params": {
        "model":"gemini-1.5-flash-latest",
        "max_tokens":64
    }
}

#### You can also reference a saved Config instead ####
#### config = "pc-gemini-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %}

{% tab title="Mistral" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"mistral-ai",
    "api_key":"YOUR_MISTRAL_AI_API_KEY",
    "override_params": {
        "model":"codestral-latest",
        "max_tokens":64
    }
}

#### You can also reference a saved Config instead ####
#### config = "pc-mistral-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %} {% endtabs %}

Calling Azure, Google Vertex, AWS Bedrock

{% hint style="info" %} We recommend saving your cloud details to Portkey vault and getting a corresponding Virtual Key.

Explore the Virtual Key documentation here. {% endhint %}

{% tabs %} {% tab title="Azure OpenAI" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "virtual_key":"AZURE_OPENAI_PORTKEY_VIRTUAL_KEY"
}

#### You can also reference a saved Config instead ####
#### config = "pc-azure-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %}

{% tab title="AWS Bedrock" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "virtual_key":"AWS_BEDROCK_PORTKEY_VIRTUAL_KEY"
}

#### You can also reference a saved Config instead ####
#### config = "pc-bedrock-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %}

{% tab title="Google Vertex AI" %} {% hint style="info" %} Vertex AI uses OAuth2 to authenticate its requests, so you need to send the access token additionally along with the request - you can do this while by sending it as the api_key in the OpenAI client.

Run gcloud auth print-access-token in your terminal to get your Vertex AI access token. {% endhint %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "virtual_key":"VERTEX_AI_PORTKEY_VIRTUAL_KEY"
}

#### You can also reference a saved Config instead ####
#### config = "pc-vertex-xx"

portkey = OpenAI(
    api_key="YOUR_VERTEX_AI_ACCESS_TOKEN", # Get by running gcloud auth print-access-token in terminal
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %} {% endtabs %}

Calling Local or Privately Hosted Models like Ollama

Check out Portkey docs for Ollama and other privately hosted models.

{% tabs %} {% tab title="Ollama" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"ollama",
    "custom_host":"https://7cc4-3-235-157-146.ngrok-free.app", # Your Ollama ngrok URL
    "override_params": {
        "model":"llama3"
    }    
}

#### You can also reference a saved Config instead ####
#### config = "pc-azure-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %} {% endtabs %}

Explore full list of the providers supported on Portkey here.


2. Caching

You can speed up your requests and save money on your LLM requests by storing past responses in the Portkey cache. There are 2 cache modes:

  • Simple: Matches requests verbatim. Perfect for repeated, identical prompts. Works on all models including image generation models.
  • Semantic: Matches responses for requests that are semantically similar. Ideal for denoising requests with extra prepositions, pronouns, etc.

To enable Portkey cache, just add the cache params to your config object.

{% tabs %} {% tab title="Simple Cache Config" %}

config = {
    "provider":"mistral-ai",
    "api_key":"YOUR_MISTRAL_AI_API_KEY",
    "override_params": {
        "model":"codestral-latest",
        "max_tokens":64
    },
    "cache": {
        "mode": "simple",
        "max_age": 60000
  }
}

{% endtab %}

{% tab title="Semantic Cache Config" %}

config = {
    "provider":"mistral-ai",
    "api_key":"YOUR_MISTRAL_AI_API_KEY",
    "override_params": {
        "model":"codestral-latest",
        "max_tokens":64
    },
    "cache": {
        "mode": "semantic",
        "max_age": 60000
  }
}

{% endtab %} {% endtabs %}

For more cache settings, check out the documentation here.


3. Reliability

Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, or set request timeouts - all set through Configs.

{% tabs %} {% tab title="Fallback from OpenAI to Anthropic" %}

config = {
    "strategy": {
      "mode": "fallback"
    },
    "targets": [
    {
      "virtual_key": "openai-virtual-key",
      "override_params": {
        "model": "gpt-4o"
      }
    },
    {
      "virtual_key": "anthropic-virtual-key",
      "override_params": {
          "model": "claude-3-opus-20240229",
          "max_tokens":64
      }
    }
  ]
}

{% endtab %}

{% tab title="Load Balance 2 API Keys" %}

config = {
    "strategy": {
      "mode": "loadbalance"
    },
    "targets": [
    {
      "virtual_key": "openai-virtual-key-1",
      "weight":1
    },
    {
      "virtual_key": "openai-virtual-key-2",
      "weight":1
    }
  ]
}

{% endtab %}

{% tab title="Automatic Retries" %}

config = {
    "retry": {
        "attempts": 5
    },
    "virtual_key": "virtual-key-xxx"
}

{% endtab %}

{% tab title="Request Timeouts" %}

config = {
  "strategy": { "mode": "fallback" },
  "request_timeout": 10000,
  "targets": [
    { "virtual_key": "open-ai-xxx" },
    { "virtual_key": "azure-open-ai-xxx" }
  ]
}

{% endtab %} {% endtabs %}

Explore deeper documentation for each feature here - Fallbacks, Loadbalancing, Retries, Timeouts.


4. Observability

Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more.

Using Portkey, you can also send custom metadata with each of your requests to further segment your logs for better analytics. Similarly, you can also trace multiple requests to a single trace ID and filter or view them separately in Portkey logs.

{% hint style="info" %} Custom Metadata and Trace ID information is sent in default_headers. {% endhint %}

{% tabs %} {% tab title="Sending Custom Metadata" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = "pc-xxxx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config,
        metadata={
            "_user": "USER_ID",
            "environment": "production",
            "session_id": "1729"
        }
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %}

{% tab title="Sending Trace ID" %}

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = "pc-xxxx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config,
        trace_id="YOUR_TRACE_ID_HERE"
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

{% endtab %} {% endtabs %}

Portkey shows these details separately for each log:

Check out Observability docs here.


5. Prompt Management

Portkey features an advanced Prompts platform tailor-made for better prompt engineering. With Portkey, you can:

  • Store Prompts with Access Control and Version Control: Keep all your prompts organized in a centralized location, easily track changes over time, and manage edit/view permissions for your team.
  • Parameterize Prompts: Define variables and mustache-approved tags within your prompts, allowing for dynamic value insertion when calling LLMs. This enables greater flexibility and reusability of your prompts.
  • Experiment in a Sandbox Environment: Quickly iterate on different LLMs and parameters to find the optimal combination for your use case, without modifying your LlamaIndex code.

Here's how you can leverage Portkey's Prompt Management in your LlamaIndex application:

  1. Create your prompt template on the Portkey app, and save it to get an associated Prompt ID
  2. Before making a Llamaindex request, render the prompt template using the Portkey SDK
  3. Transform the retrieved prompt to be compatible with LlamaIndex and send the request!

Example: Using a Portkey Prompt Template in LlamaIndex

{% tabs %} {% tab title="Portkey Prompts in LlamaIndex" %}

import json
import os
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey

###  Initialize Portkey client with API key

client = Portkey(api_key=os.environ.get("PORTKEY_API_KEY"))

### Render the prompt template with your prompt ID and variables

prompt_template = client.prompts.render(
    prompt_id="pp-prompt-id",
    variables={ "movie":"Dune 2" }
).data.dict()

config = {
    "virtual_key":"GROQ_VIRTUAL_KEY", # You need to send the virtual key separately
    "override_params":{
        "model":prompt_template["model"], # Set the model name based on the value in the prompt template
        "temperature":prompt_template["temperature"] # Similarly, you can also set other model params
    }
}

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key=os.environ.get("PORTKEY_API_KEY"),
        config=config
    )
)

### Transform the rendered prompt into LlamaIndex-compatible format

messages = [ChatMessage(content=msg["content"], role=msg["role"]) for msg in prompt_template["messages"]]

resp = portkey.chat(messages)
print(resp)

{% endtab %} {% endtabs %}

Explore Prompt Management docs here.


6. Continuous Improvement

Now that you know how to trace & log your Llamaindex requests to Portkey, you can also start capturing user feedback to improve your app!

You can append qualitative as well as quantitative feedback to any trace ID with the portkey.feedback.create method:

{% tabs %} {% tab title="Adding Feedback" %}

from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY"
)

feedback = portkey.feedback.create(
    trace_id="YOUR_LLAMAINDEX_TRACE_ID",
    value=5,  # Integer between -10 and 10
    weight=1,  # Optional
    metadata={
        # Pass any additional context here like comments, _user and more
    }
)

print(feedback)

{% endtab %} {% endtabs %}

Check out the Feedback documentation for a deeper dive.


7. Security & Compliance

When you onboard more team members to help out on your Llamaindex app - permissioning, budgeting, and access management can become a mess! Using Portkey, you can set budget limits on provide API keys and implement fine-grained user roles and permissions to:

  • Control access: Restrict team members' access to specific features, Configs, or API endpoints based on their roles and responsibilities.
  • Manage costs: Set budget limits on API keys to prevent unexpected expenses and ensure that your LLM usage stays within your allocated budget.
  • Ensure compliance: Implement strict security policies and audit trails to maintain compliance with industry regulations and protect sensitive data.
  • Simplify onboarding: Streamline the onboarding process for new team members by assigning them appropriate roles and permissions, eliminating the need to share sensitive API keys or secrets.
  • Monitor usage: Gain visibility into your team's LLM usage, track costs, and identify potential security risks or anomalies through comprehensive monitoring and reporting.

Read more about Portkey's Security & Enterprise offerings here.


Join Portkey Community

Join the Portkey Discord to connect with other practitioners, discuss your LlamaIndex projects, and get help troubleshooting your queries.

Link to Discord

For more detailed information on each feature and how to use them, please refer to the Portkey Documentation.