# Portkey | Building Resilient LlamaIndex Apps

**Portkey** is a full-stack LLMOps platform that productionizes your Gen AI app reliably and securely.

### Key Features of Portkey's Integration with LlamaIndex:

1. **AI Gateway**:
    - **Automated Fallbacks & Retries**: Ensure your application remains functional even if a primary service fails.
    - **Load Balancing**: Efficiently distribute incoming requests among multiple models.
    - **Semantic Caching**: Reduce costs and latency by intelligently caching results.
    
2. **Observability**:
    - **Logging**: Keep track of all requests for monitoring and debugging.
    - **Request Tracing**: Understand the journey of each request for optimization.
    - **Custom Tags**: Segment and categorize requests for better insights.

To harness these features, let's start with the setup:


#### **Step 1: Get your Portkey API key**

Log into [Portkey here](https://app.portkey.ai/), click the profile icon at the top right, and select "Copy API Key". Then set your OpenAI and Anthropic API keys as well.

You do not need to install **any** other SDKs or import them in your LlamaIndex app.

Here's a step-by-step guide to Portkey features and their integration with LlamaIndex:

In [None]:
# !pip install portkey-ai -U

# Set the portkey api key as environment variable.
import os

os.environ["PORTKEY_API_KEY"] = "PORTKEY_API_KEY"
os.environ["OPENAI_API_KEY"] = ""
os.environ["ANTHROPIC_API_KEY"] = ""
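
Blank keys like the ones above usually surface later as confusing authentication errors. Here is a minimal sketch of a pre-flight check you could run first. The helper name `missing_keys` is hypothetical and not part of Portkey or LlamaIndex:

```python
import os


def missing_keys(names, env=None):
    """Return the names whose value is unset or blank in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# Illustrative input: a plain dict, so the example does not depend on your shell.
example_env = {"PORTKEY_API_KEY": "pk-example", "OPENAI_API_KEY": ""}
print(missing_keys(["PORTKEY_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"], example_env))
# prints ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY']
```

Calling `missing_keys([...])` without the second argument checks the real environment instead.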

In [None]:
# Import the Portkey LLM integration from LlamaIndex and the Portkey SDK
# (the SDK itself is installed by the `pip install portkey-ai` line above)
from llama_index.llms import Portkey, ChatMessage
import portkey as pk

#### **Step 2: Configure Portkey Features**

To harness the full potential of Portkey's integration with LlamaIndex, you can configure various features as illustrated below. Here's a guide to all Portkey features and the expected values:

| Feature             | Config Key            | Value (Type)                                                                    | Required                            |
|---------------------|-----------------------|---------------------------------------------------------------------------------|-------------------------------------|
| API Key             | `api_key`             | `string`                                                                        | ✅ Required (can be set externally) |
| Mode                | `mode`                | `fallback`, `loadbalance`, `single`                                             | ✅ Required                         |
| Cache Type          | `cache_status`        | `simple`, `semantic`                                                            | ❔ Optional                         |
| Force Cache Refresh | `cache_force_refresh` | `True`, `False`                                                                 | ❔ Optional                         |
| Cache Age           | `cache_age`           | `integer` (in seconds)                                                          | ❔ Optional                         |
| Trace ID            | `trace_id`            | `string`                                                                        | ❔ Optional                         |
| Retries             | `retry`               | `integer` [0,5]                                                                 | ❔ Optional                         |
| Metadata            | `metadata`            | `json object` [More info](https://docs.portkey.ai/key-features/custom-metadata) | ❔ Optional                         |
| Base URL            | `base_url`            | `url`                                                                           | ❔ Optional                         |

Here's an example of how to set up some of these features:

In [None]:
portkey_client = Portkey(
    mode="single",
)

# Since we have defined the Portkey API Key with os.environ, we do not need to set api_key again here
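
If you build these settings from user input or a config file, it can help to check them against the table before constructing the client. The sketch below mirrors the value constraints exactly as the table lists them. `config_problems` is a hypothetical helper, not a Portkey function, and the allowed values may differ in your SDK version:

```python
# Allowed values copied from the feature table; adjust if your SDK version differs.
ALLOWED_MODES = {"fallback", "loadbalance", "single"}
ALLOWED_CACHE = {"simple", "semantic"}


def config_problems(config):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if config.get("mode") not in ALLOWED_MODES:
        problems.append("mode must be one of " + ", ".join(sorted(ALLOWED_MODES)))
    cache = config.get("cache_status")
    if cache is not None and cache not in ALLOWED_CACHE:
        problems.append("cache_status must be 'simple' or 'semantic'")
    retry = config.get("retry")
    if retry is not None and not (isinstance(retry, int) and 0 <= retry <= 5):
        problems.append("retry must be an integer from 0 to 5")
    cache_age = config.get("cache_age")
    if cache_age is not None and not (isinstance(cache_age, int) and cache_age >= 0):
        problems.append("cache_age must be a non-negative integer (seconds)")
    return problems


print(config_problems({"mode": "single", "retry": 7}))
# prints ['retry must be an integer from 0 to 5']
```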

#### **Step 3: Constructing the LLM**

With the Portkey integration, constructing an LLM is simplified. Use the `LLMOptions` function for all providers, with the exact same keys you're accustomed to in your OpenAI or Anthropic constructors. The only new key is `weight`, essential for the load balancing feature.

In [None]:
openai_llm = pk.LLMOptions(
    provider="openai", model="gpt-4", virtual_key="open-ai-key-66ah788"
)

The code above uses `LLMOptions` to set up an LLM with the OpenAI provider and the GPT-4 model. The same call works for other providers, so configuration stays consistent across them.

#### **Step 4: Activate the Portkey LLM**

Once you've constructed the LLM using the `LLMOptions` function, the next step is to activate it with Portkey. This step is essential to ensure that all the Portkey features are available for your LLM.

In [None]:
portkey_client.add_llms(openai_llm)

And that's it! In just 4 steps, you have added production capabilities to your LlamaIndex app.

#### **Testing the Integration**

Let's ensure that everything is set up correctly. Below, we create a simple chat scenario and pass it through our Portkey-enhanced LLM to see the response.

In [None]:
messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What can you do?"),
]
print("Testing Portkey Llamaindex integration:")
response = portkey_client.chat(messages)
print(response)

#### **Recap and References**

Congratulations! You've successfully set up and tested the Portkey integration with LlamaIndex. To recap:

1. Install the SDK with `pip install portkey-ai`.
2. Import `Portkey` from `llama_index.llms`.
3. Grab your Portkey API key from [here](https://app.portkey.ai/).
4. Construct your Portkey client with `portkey_client = Portkey(mode="fallback")`, or whichever Portkey mode you want.
5. Construct your provider LLM with `openai_llm = pk.LLMOptions(provider="openai", model="gpt-4")`.
6. Add the provider LLM to the Portkey client with `portkey_client.add_llms(openai_llm)`.
7. Call the Portkey methods as you would with any other LLM, for example `portkey_client.chat(messages)`.

Here's the guide to all the functions and their params:
- [Portkey LLM Constructor](#step-2-configure-portkey-features)
- [LLMOptions Constructor](https://github.com/Portkey-AI/rubeus-python-sdk/blob/4cf3e17b847225123e92f8e8467b41d082186d60/rubeus/api_resources/utils.py#L179)
- [List of Portkey + LlamaIndex Features](#key-features-of-portkeys-integration-with-llamaindex)


#### **Implementing Fallbacks and Retries with Portkey**

Fallbacks and retries are essential for building resilient AI applications. With Portkey, implementing these features is straightforward:

- **Fallbacks**: If a primary service or model fails, Portkey will automatically switch to a backup model.
- **Retries**: If a request fails, Portkey can be configured to retry the request multiple times.

Below, we demonstrate how to set up fallbacks and retries using Portkey:

In [None]:
portkey_client = Portkey(mode="fallback")
messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What can you do?"),
]

llm1 = pk.LLMOptions(
    provider="openai",
    model="gpt-4",
    retry_settings={"on_status_codes": [429, 500], "attempts": 2},
    virtual_key="open-ai-key-66ah788",
)
llm2 = pk.LLMOptions(
    provider="openai", model="gpt-3.5-turbo", virtual_key="open-ai-key-66ah788"
)

# Register both LLMs in order: llm1 is the primary, llm2 is the fallback
portkey_client.add_llms(llm_params=[llm1, llm2])

print("Testing Fallback & Retry functionality:")
response = portkey_client.chat(messages)
print(response)
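
To make the control flow concrete, here is a plain-Python sketch of what "retry, then fall back" means. It is not Portkey's implementation. `ProviderError` and `call_with_fallback` are hypothetical names, and treating `attempts` as the number of retries after the first try is an assumption:

```python
class ProviderError(Exception):
    """Hypothetical error carrying an HTTP-like status code."""

    def __init__(self, status_code):
        super().__init__(f"provider failed with status {status_code}")
        self.status_code = status_code


def call_with_fallback(targets, prompt):
    """Try each target in order; retry a target only on its configured status codes.

    Each target is a dict with:
      "call": function taking the prompt and returning a response string
      "attempts": retries allowed after the first try (default 0)
      "on_status_codes": status codes that justify a retry (default none)
    """
    last_error = None
    for target in targets:
        retries = target.get("attempts", 0)
        retry_codes = set(target.get("on_status_codes", []))
        for _ in range(retries + 1):
            try:
                return target["call"](prompt)
            except ProviderError as err:
                last_error = err
                if err.status_code not in retry_codes:
                    break  # not retryable: move on to the next target
    raise RuntimeError("all targets failed") from last_error


primary_calls = []  # records every call made to the primary


def flaky_primary(prompt):
    primary_calls.append(prompt)
    raise ProviderError(429)  # simulate a provider that is always rate limited


def backup(prompt):
    return f"backup answered: {prompt}"


targets = [
    {"call": flaky_primary, "attempts": 2, "on_status_codes": [429, 500]},
    {"call": backup},
]
print(call_with_fallback(targets, "What can you do?"))
# prints: backup answered: What can you do?
```

In this run the primary is tried three times (one try plus two retries) before the backup takes over.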

#### **Implementing Load Balancing with Portkey**

Load balancing ensures that incoming requests are efficiently distributed among multiple models. This not only enhances the performance but also provides redundancy in case one model fails.

With Portkey, implementing load balancing is simple. You need to:

- Define the `weight` parameter for each LLM. This weight determines how requests are distributed among the LLMs.
- Ensure that the sum of weights for all LLMs equals 1.

Here's an example of setting up load balancing with Portkey:


In [None]:
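# Note: the feature table above lists this mode as `loadbalance`, while this cell uses `ab_test`;
# use the name your installed SDK version accepts.
portkey_client = Portkey(mode="ab_test")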
messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What can you do?"),
]

llm1 = pk.LLMOptions(
    provider="openai", virtual_key="open-ai-key-66ah788", model="gpt-4", weight=0.2
)
llm2 = pk.LLMOptions(
    provider="openai",
    virtual_key="open-ai-key-66ah788",
    model="gpt-3.5-turbo",
    weight=0.8,
)

portkey_client.add_llms(llm_params=[llm1, llm2])

print("Testing Loadbalance functionality:")
response = portkey_client.chat(messages)
print(response)
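
If you want intuition for how weights translate into traffic, the sketch below simulates weighted routing locally with the 0.2 / 0.8 split from the cell above. It only illustrates the idea and says nothing about how Portkey routes internally. `pick_target` is a hypothetical helper:

```python
import math
import random


def pick_target(weights, rng=random):
    """Pick one target name with probability proportional to its weight."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"weights must sum to 1, got {total}")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"gpt-4": 0.2, "gpt-3.5-turbo": 0.8}
rng = random.Random(0)  # seeded so repeated runs give the same counts
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_target(weights, rng)] += 1
print(counts)  # roughly a 20/80 split
```

The `ValueError` mirrors the rule above that the weights must sum to 1.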

#### **Implementing Semantic Caching with Portkey**

Semantic caching is a caching mechanism that takes the meaning of a request into account. Instead of caching solely on exact input matches, it identifies similar requests and serves cached results, which cuts redundant requests, improves response times, and saves money.

Let's see how to implement semantic caching with Portkey:

In [None]:
import time

portkey_client = Portkey(mode="single")

openai_llm = pk.LLMOptions(
    provider="openai",
    virtual_key="open-ai-key-66ah788",
    model="gpt-3.5-turbo",
    cache_status="semantic",
)
portkey_client.add_llms(openai_llm)

current_messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What are the ingredients of a pizza?"),
]

print("Testing Portkey Semantic Cache:")

start = time.time()
response = portkey_client.chat(current_messages)
elapsed = time.time() - start

print(response)
print("\n--------------------------------------\n")
print(f"Served in {elapsed:.2f} seconds.")

new_messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="Ingredients of pizza"),
]

print("\n--------------------------------------\n")

print("Testing Portkey Semantic Cache:")

start = time.time()
response = portkey_client.chat(new_messages)
elapsed = time.time() - start

print(response)
print("\n--------------------------------------\n")
print(f"Served in {elapsed:.2f} seconds.")
print("\n--------------------------------------\n")
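
To see why the second, differently worded prompt can be a cache hit, here is a toy semantic cache. It uses word-count vectors and cosine similarity as a stand-in for real embeddings. The class name, the 0.5 threshold, and the similarity measure are all assumptions for illustration and do not describe Portkey's cache:

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, prompt):
        """Return the response cached for the most similar prompt, or None."""
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))


cache = ToySemanticCache()
cache.put("What are the ingredients of a pizza?", "Dough, tomato sauce, cheese")
print(cache.get("Ingredients of pizza"))        # similar wording: cache hit
print(cache.get("How tall is Mount Everest?"))  # unrelated: prints None
```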

Portkey's cache supports two more cache-critical functions, Force Refresh and Age:

- `cache_force_refresh`: Force-send a request to your provider instead of serving it from the cache.
- `cache_age`: Set the interval, in seconds, after which the cached entry for this particular string is automatically refreshed.

Here's how you can use it:

In [None]:
# Set the cache status to `semantic`, force a refresh, and set cache_age to 50 seconds.
openai_llm = pk.LLMOptions(
    provider="openai",
    virtual_key="open-ai-key-66ah788",
    model="gpt-3.5-turbo",
    cache_status="semantic",
    cache_force_refresh=True,
    cache_age=50,
)
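
The two settings are easier to reason about with a small local model. The sketch below is a generic age-limited cache with a force-refresh switch. `AgedCache` is a hypothetical class written for this guide, and a fake clock keeps the example deterministic:

```python
import time


class AgedCache:
    """Cache whose entries expire after `max_age` seconds."""

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self.store = {}  # key -> (stored_at, value)

    def fetch(self, key, compute, force_refresh=False):
        """Return the cached value for `key`, calling `compute()` on a miss.

        A miss happens when the key is absent, at least max_age seconds old,
        or when force_refresh is True.
        """
        now = self.clock()
        hit = self.store.get(key)
        if not force_refresh and hit is not None and now - hit[0] < self.max_age:
            return hit[1]
        value = compute()
        self.store[key] = (now, value)
        return value


fake_now = [0.0]  # mutable fake clock, in seconds
cache = AgedCache(max_age=50, clock=lambda: fake_now[0])
calls = []


def provider():
    calls.append(1)
    return f"response #{len(calls)}"


first = cache.fetch("pizza", provider)                       # miss: calls the provider
second = cache.fetch("pizza", provider)                      # hit: served from the cache
forced = cache.fetch("pizza", provider, force_refresh=True)  # bypasses the cache
fake_now[0] = 60.0
expired = cache.fetch("pizza", provider)                     # older than 50 s: refreshed
print(first, second, forced, expired, sep=" | ")
# prints: response #1 | response #1 | response #2 | response #3
```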

#### **Observability with Portkey**

Having insight into your application's behavior is paramount. Portkey's observability features allow you to monitor, debug, and optimize your AI applications with ease. You can track each request, understand its journey, and segment them based on custom tags. This level of detail can help in identifying bottlenecks, optimizing costs, and enhancing the overall user experience.

Here's how to set up observability with Portkey:

In [None]:
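# Custom tags that let you segment requests in Portkey
metadata = {
    "_environment": "production",
    "_prompt": "test",
    "_user": "user",
    "_organisation": "acme",
}

# Attach the tags to the LLM options so they are sent with each request
openai_llm = pk.LLMOptions(
    provider="openai",
    virtual_key="open-ai-key-66ah788",
    model="gpt-3.5-turbo",
    metadata=metadata,
)

portkey_client = Portkey(mode="single")
portkey_client.add_llms(openai_llm)

print("Testing Observability functionality:")
response = portkey_client.chat(messages)
print(response)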
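
Tracing works best when every request carries its own ID and a consistent set of tags. Here is one possible way to generate both, assuming only that `trace_id` accepts any string, as the feature table states. `new_trace_id` and `with_tags` are hypothetical helpers:

```python
import uuid


def new_trace_id(prefix="llamaindex"):
    """Build a unique trace ID of the form '<prefix>-<32 hex characters>'."""
    return f"{prefix}-{uuid.uuid4().hex}"


def with_tags(base_metadata, **extra):
    """Return a copy of the metadata with request-specific tags added."""
    merged = dict(base_metadata)
    merged.update(extra)
    return merged


base = {"_environment": "production", "_organisation": "acme"}
request_metadata = with_tags(base, _user="user-42", _prompt="pizza-faq")
trace_id = new_trace_id()
print(trace_id, request_metadata)
```

Keeping the shared tags in `base` and copying them per request avoids accidentally leaking one user's tags into another request.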

#### **AI Gateway with Portkey**

Portkey is an open-source AI gateway that powers features like load balancing and fallbacks. It acts as an intermediary, ensuring that your requests are processed optimally. One of the advantages of using Portkey is its flexibility. You can easily customize its behavior, redirect requests to different providers, or even bypass logging to Portkey.

Here's an example of customizing the behavior with Portkey:

```py
portkey_client.base_url = None
```

#### **Feedback with Portkey**

Continuous improvement is a cornerstone of AI. To ensure your models and applications evolve and serve users better, feedback is vital. Portkey's Feedback API offers a straightforward way to gather weighted feedback from users, allowing you to refine and improve over time.

Read more about [Feedback here](https://docs.portkey.ai/key-features/feedback-api).

Here's how to use the Feedback API with Portkey:

In [None]:
import requests
import json

# Endpoint URL
url = "https://api.portkey.ai/v1/feedback"

# Headers
headers = {
    "x-portkey-api-key": "<YOUR PORTKEY API KEY>",
    "Content-Type": "application/json",
}

# Data
data = {"trace_id": "REQUEST_TRACE_ID", "value": 1}

# Making the request
response = requests.post(url, headers=headers, data=json.dumps(data))

# Print the response
print(response.text)
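
If you send feedback from several places in your app, it may be convenient to build the request in one function. The sketch below reuses the endpoint, headers, and payload from the cell above. `build_feedback_request` is a hypothetical helper, and the 10-second timeout is an arbitrary choice. It only builds the arguments and sends nothing:

```python
import json

FEEDBACK_URL = "https://api.portkey.ai/v1/feedback"  # endpoint from the cell above


def build_feedback_request(api_key, trace_id, value):
    """Return keyword arguments suitable for requests.post(**kwargs)."""
    if not trace_id:
        raise ValueError("trace_id is required to link feedback to a request")
    return {
        "url": FEEDBACK_URL,
        "headers": {"x-portkey-api-key": api_key, "Content-Type": "application/json"},
        "data": json.dumps({"trace_id": trace_id, "value": value}),
        "timeout": 10,  # seconds; avoids waiting forever on a stalled connection
    }


kwargs = build_feedback_request("<YOUR PORTKEY API KEY>", "REQUEST_TRACE_ID", 1)
print(kwargs["data"])
# prints {"trace_id": "REQUEST_TRACE_ID", "value": 1}
# To send it: response = requests.post(**kwargs); response.raise_for_status()
```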

#### **Conclusion**

Integrating Portkey with LlamaIndex simplifies the process of building robust and resilient AI applications. Features like semantic caching, observability, load balancing, feedback, and fallbacks help you keep latency and cost down and keep improving your app over time.

By following this guide, you've set up and tested the Portkey integration with LlamaIndex. As you continue to build and deploy AI applications, remember to leverage the full potential of this integration!

For further assistance or questions, reach out to the developers [on Twitter](https://twitter.com/portkeyai).

Join our community of practitioners putting LLMs into production [on Discord](https://discord.gg/tmnpp4pqzv).
