# Portkey

**Portkey** is a full-stack LLMOps platform that helps you take your Gen AI app to production reliably and securely.

#### Key Features of Portkey's Integration with Langchain:

<img src="https://3798672042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FeWEp2XRBGxs7C1jgAdk7%2Fuploads%2FjDGBQvw5aFOCqctr0xwp%2FColab%20Version%202.png?alt=media&token=16057c99-b86c-416c-932e-c2b71549c506" alt="header" width=600 />


1. **🚪 AI Gateway**:
    - **[Automated Fallbacks & Retries](#🔁-implementing-fallbacks-and-retries-with-portkey)**: Ensure your application remains functional even if a primary service fails.
    - **[Load Balancing](#⚖️-implementing-load-balancing-with-portkey)**: Efficiently distribute incoming requests among multiple models.
    - **[Semantic Caching](#🧠-implementing-semantic-caching-with-portkey)**: Reduce costs and latency by intelligently caching results.
2. **[🔬 Observability](#🔬-observability-with-portkey)**:
    - **Logging**: Keep track of all requests for monitoring and debugging.
    - **Requests Tracing**: Understand the journey of each request for optimization.
    - **Custom Tags**: Segment and categorize requests for better insights.
3. **[📝 Continuous Improvement with User Feedback](#📝-feedback-with-portkey)**:
    - **Feedback Collection**: Seamlessly gather feedback on any served request, be it on a generation or conversation level.
    - **Weighted Feedback**: Obtain nuanced information by attaching weights to user feedback values.
    - **Feedback Metadata**: Incorporate custom metadata with the feedback to provide context, allowing for richer insights and analyses.
4. **🔑 Secure Key Management**:
    - **Virtual Keys**: Portkey transforms original provider keys into virtual keys, ensuring your primary credentials remain untouched.
    - **Multiple Identifiers**: Ability to add multiple keys for the same provider or the same key under different names for easy identification without compromising security.

To harness these features, let's start with the setup:

<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/llm/portkey.ipynb" target="_blank">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" width=150 />
</a>

In [2]:
# Installing Langchain & Portkey SDK
# !pip install langchain
# !pip install -U portkey-ai

# Importing necessary libraries and modules
from langchain.llms import Portkey
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.chat_models import ChatPortkey
import portkey as pk

You do not need to install **any** other SDKs or import them in your Langchain app.

#### **Step 1️⃣: Get your Portkey API Key and your Virtual Keys for OpenAI, Anthropic, and more**

**[Portkey API Key](https://app.portkey.ai/)**: Log into [Portkey here](https://app.portkey.ai/), then click the profile icon at the top left and select "Copy API Key".

In [1]:
import os
os.environ["PORTKEY_API_KEY"] = "<YOUR_PORTKEY_API_KEY>"  # paste the key copied from the dashboard

**[Virtual Keys](https://docs.portkey.ai/key-features/ai-provider-keys)**
1. Navigate to the "Virtual Keys" page on [Portkey dashboard](https://app.portkey.ai/) and hit the "Add Key" button located at the top right corner.
2. Choose your AI provider (OpenAI, Anthropic, Cohere, HuggingFace, etc.), assign a unique name to your key, and, if needed, jot down any relevant usage notes. Your virtual key is ready!
3. Now copy and paste the keys below - you can use them anywhere within the Portkey ecosystem and keep your original key secure and untouched.

<img src="https://3798672042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FeWEp2XRBGxs7C1jgAdk7%2Fuploads%2F66S1ik16Gle8jS1u6smr%2Fvirtual_keys.png?alt=media&token=2fec1c39-df4e-4c93-9549-7445a833321c" alt="header" width=600 />
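
One way to picture a virtual key is as an alias that stands in for the real provider key, so application code never handles the original credential. The sketch below is a toy in-memory illustration of that idea only. `VirtualKeyVault`, its methods, and the alias format are hypothetical and say nothing about how Portkey stores or resolves keys internally.

```python
import secrets


class VirtualKeyVault:
    """Toy in-memory vault: callers only see the alias, never the real key.

    Hypothetical class for illustration; not part of the Portkey SDK.
    """

    def __init__(self):
        self._real_keys = {}  # alias -> real provider key

    def add_key(self, name: str, provider_key: str) -> str:
        # Assumed alias format: the chosen name plus six random hex characters.
        alias = f"{name}-{secrets.token_hex(3)}"
        self._real_keys[alias] = provider_key
        return alias

    def resolve(self, alias: str) -> str:
        """Look up the real key; only the gateway side would ever call this."""
        return self._real_keys[alias]

    def revoke(self, alias: str) -> None:
        """Drop one alias without touching the underlying provider key."""
        self._real_keys.pop(alias, None)


vault = VirtualKeyVault()
real_key = "sk-example-not-a-real-key"
prod_alias = vault.add_key("openai-prod", real_key)
dev_alias = vault.add_key("openai-dev", real_key)  # same key under a different name
print(prod_alias, dev_alias)
```

Registering the same provider key under two names mirrors the "Multiple Identifiers" feature described earlier: each alias can be revoked on its own while the original key stays unchanged.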

In [3]:
openai_virtual_key_a = "open-ai-key-6317bc"
openai_virtual_key_b = ""

anthropic_virtual_key_a = ""
anthropic_virtual_key_b = ""

cohere_virtual_key_a = ""
cohere_virtual_key_b = ""

If you don't want to use Portkey's Virtual keys, you can also use your AI provider keys directly.

In [4]:
os.environ["OPENAI_API_KEY"] = ""
os.environ["ANTHROPIC_API_KEY"] = ""

#### **Step 2️⃣: Configure Portkey Features**

To harness the full potential of Portkey's integration with Langchain, you can configure various features as illustrated above. Here's a guide to all Portkey features and the expected values:

| Feature             | Config Key            | Value (Type)                                                                     | Required                            |
|---------------------|-----------------------|----------------------------------------------------------------------------------|-------------------------------------|
| API Key             | `api_key`             | `string`                                                                         | ✅ Required (can be set externally) |
| Mode                | `mode`                | `fallback`, `loadbalance`, `single`                                              | ✅ Required                         |
| Cache Type          | `cache_status`        | `simple`, `semantic`                                                             | ❔ Optional                         |
| Force Cache Refresh | `cache_force_refresh` | `True`, `False`                                                                  | ❔ Optional                         |
| Cache Age           | `cache_age`           | `integer` (in seconds)                                                           | ❔ Optional                         |
| Trace ID            | `trace_id`            | `string`                                                                         | ❔ Optional                         |
| Retries             | `retry`               | `integer` [0,5]                                                                  | ❔ Optional                         |
| Metadata            | `metadata`            | `json object` [More info](https://docs.portkey.ai/key-features/custom-metadata) | ❔ Optional                         |
| Base URL            | `base_url`            | `url`                                                                            | ❔ Optional                         |

* `api_key` and `mode` are required values.
* You can set your Portkey API key using the Portkey constructor or you can also set it as an environment variable.
* There are **3** modes - Single, Fallback, Loadbalance.
  * **Single** - This is the standard mode. Use it if you do not want Fallback OR Loadbalance features.
  * **Fallback** - Set this mode if you want to enable the Fallback feature. [Check out the guide here](#implementing-fallbacks-and-retries-with-portkey).
  * **Loadbalance** - Set this mode if you want to enable the Loadbalance feature. [Check out the guide here](#implementing-load-balancing-with-portkey).

Here's an example of how to set up some of these features:

In [5]:
portkey_client = Portkey(
    mode="single",
)

portkey_chat_client = ChatPortkey(
    mode="single",
)

# Since we have defined the Portkey API Key with os.environ, we do not need to set api_key again here

#### **Step 3️⃣: Constructing the LLM**

With the Portkey integration, constructing an LLM is simplified. Use the `LLMOptions` function for all providers, with the exact same keys you're accustomed to in your OpenAI or Anthropic constructors. The only new key is `weight`, essential for the load balancing feature.

In [6]:
openai_chat_llm = pk.LLMOptions(
    provider="openai",
    model="gpt-4",
    virtual_key=openai_virtual_key_a,
)

openai_text_llm = pk.LLMOptions(
    provider="openai",
    model="text-davinci-003",
    virtual_key=openai_virtual_key_a,
)

The above code illustrates how to utilize the `LLMOptions` function to set up an LLM with the OpenAI provider and the GPT-4 model. This same function can be used for other providers as well, making the integration process streamlined and consistent across various providers.

#### **Step 4️⃣: Activate the Portkey Client**

Once you've constructed the LLM using the `LLMOptions` function, the next step is to activate it with Portkey. This step is essential to ensure that all the Portkey features are available for your LLM.

In [7]:
portkey_client.add_llms(openai_text_llm)
portkey_chat_client.add_llms(openai_chat_llm)

ChatPortkey(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, mode='single', model='gpt-4', llm={}, streaming=False, llms=[LLMOptions(model='gpt-4', suffix=None, max_tokens=None, temperature=None, top_k=None, top_p=None, n=None, stop_sequences=None, timeout=None, retry_settings=None, functions=None, function_call=None, logprobs=None, echo=None, stop=None, presence_penalty=None, frequency_penalty=None, best_of=None, logit_bias=None, user=None, organization=None, prompt=None, messages=None, provider=<ProviderTypes.OPENAI: 'openai'>, api_key=None, virtual_key='open-ai-key-6317bc', cache=None, cache_age=None, cache_status=None, cache_force_refresh=None, trace_id=None, metadata=None, weight=None, retry=None, deployment_id=None, resource_name=None, api_version=None)])

And that's it! In just 4 steps, you have added production capabilities to your Langchain app.

#### **🔧 Testing the Integration**

Let's ensure that everything is set up correctly. Below, we create a simple chat scenario and pass it through our Portkey client to see the response.

In [8]:
messages = [
    SystemMessage(
        content="You are a helpful assistant that translates English to French."
    ),
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    ),
]
print("Testing Portkey Langchain integration:")
response = portkey_chat_client(messages)
print(response)

Testing Portkey Langchain integration:
content="J'aime la programmation." additional_kwargs={} example=False


Here's how your logs will appear on your [Portkey dashboard](https://app.portkey.ai/):

<img src="https://portkey.ai/blog/content/images/2023/09/Log-1.png" alt="Logs" width=600 />

#### **⏩ Streaming Responses**

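Streaming with Portkey goes through the same client objects you have already set up. The clients in this guide are used in two ways:

1. Call the client directly, for example `portkey_chat_client(messages)`, to get the complete response at once.
2. Call `.stream(...)`, for example `portkey_client.stream(prompt)` or `portkey_chat_client.stream(messages)`, to iterate over the response chunk by chunk.

The `Portkey` client expects a string input (`str`), while the `ChatPortkey` client works with a list of message objects such as `SystemMessage` and `HumanMessage`.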

**Example usage:**

In [9]:
# Let's set up a prompt and then use the client's stream method to obtain a streamed response.

prompt = "Why is the sky blue ?"

print("Testing Stream Complete:")
response = portkey_client.stream(prompt)
for i in response:
    print(i, end="", flush=True)

# Let's prepare a set of chat messages and then use the chat client's stream method to get a streamed chat response.
print("\n\n", "-"*50)
print("Testing Stream Chat:")

messages = [
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="What can you do?"),
]
response = portkey_chat_client.stream(messages)
for i in response:
    print(i.content, end="", flush=True)

Testing Stream Complete:


The sky is blue because of the way sunlight scatters off the gases

 --------------------------------------------------
Testing Stream Chat:
As an AI assistant, I can perform a variety of tasks, including:

1. Provide information: I can help you find information about almost any topic, whether you're curious about the weather forecast, news updates, historical facts, or scientific concepts. 

2. Manage schedules: I can remind you about upcoming appointments, meetings, and special events. I can also help set alarms and timers to support time management.

3. Provide recommendations: Whether you're looking for a good book, a new recipe, or the best local pizza place, I can provide recommendations based on the information available to me.

4. Answer questions: Ask me anything! I may not always have the answer, but I will do my best to give you a helpful response. 

6. Educate: I can help you learn about various topics by providing explanations and facts.

7. Assi
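
The loops above print each chunk as it arrives. If you also want the full text once the stream ends, you can collect the chunks while printing them. The sketch below shows that pattern with a stand-in generator: `fake_stream` and `consume` are hypothetical helpers for illustration and do not call Portkey or any provider.

```python
from typing import Iterator


def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Yield `text` in small pieces, standing in for a streamed LLM response."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def consume(stream: Iterator[str]) -> str:
    """Print chunks as they arrive and return the assembled text."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)  # show partial output immediately
        parts.append(chunk)
    print()
    return "".join(parts)


full_text = consume(fake_stream("The sky is blue because of Rayleigh scattering."))
```

With the chat client from this guide, each streamed item exposes its text on `.content`, so the same loop would append `chunk.content` instead of `chunk`.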

#### **🔍 Recap and References**

Congratulations! 🎉 You've successfully set up and tested the Portkey integration with Langchain. To recap the steps:

1. Install the SDK: `pip install portkey-ai`
2. Import the clients: `from langchain.llms import Portkey` and `from langchain.chat_models import ChatPortkey`
3. Grab your Portkey API Key and create your virtual provider keys from [here](https://app.portkey.ai/).
4. Construct your Portkey client to set trace id, cache, metadata, retry count, and mode: `portkey_client = Portkey(mode="fallback")`
5. Construct your provider LLM with LLMOptions: `openai_llm = pk.LLMOptions(provider="openai", model="gpt-4", virtual_key=openai_virtual_key_a)`
6. Add the LLM to Portkey with `portkey_client.add_llms(openai_llm)`
7. Call the Portkey client as you would any other Langchain LLM, for example `portkey_client.stream(prompt)` or, with the chat client, `portkey_chat_client(messages)`

Here's the guide to all the functions and their params:
- [Portkey LLM Constructor](#step-2-add-all-the-portkey-features-you-want-as-illustrated-below-by-calling-the-portkey-class)
- [LLMOptions Constructor](https://github.com/Portkey-AI/rubeus-python-sdk/blob/4cf3e17b847225123e92f8e8467b41d082186d60/rubeus/api_resources/utils.py#L179)
- [LLMBase Constructor](https://github.com/Portkey-AI/portkey-python-sdk/blob/main/portkey/api_resources/utils.py#L185)
- [List of Portkey + Langchain Features](#portkeys-integration-with-Langchain-adds-the-following-production-capabilities-to-your-apps-out-of-the-box)

#### **🔁 Implementing Fallbacks and Retries with Portkey**

Fallbacks and retries are essential for building resilient AI applications. With Portkey, implementing these features is straightforward:

- **Fallbacks**: If a primary service or model fails, Portkey will automatically switch to a backup model.
- **Retries**: If a request fails, Portkey can be configured to retry the request multiple times.

Below, we demonstrate how to set up fallbacks and retries using Portkey:

In [10]:
portkey_chat_client = ChatPortkey(mode="fallback")
messages = [
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="What can you do?"),
]

llm1 = pk.LLMOptions(
    provider="openai",
    model="gpt-4",
    retry_settings={"on_status_codes": [429, 500], "attempts": 2},
    virtual_key=openai_virtual_key_a,
)

llm2 = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_b,
)

portkey_chat_client.add_llms(llm_params=[llm1, llm2])

print("Testing Fallback & Retry functionality:")
response = portkey_chat_client(messages)
print(response)

Testing Fallback & Retry functionality:
content='As an AI, I can assist you with a variety of tasks such as:\n\n1. Providing information on a wide range of topics.\n2. Offering reminders for your tasks and appointments.\n3. Sending messages or making calls.\n4. Setting alarms or timers for you.\n5. Checking your calendar for upcoming events.\n6. Answering general knowledge queries.\n7. Reading out loud texts or articles.\n8. Managing to-do lists.\n9. Offering language translation.\n10. Providing weather updates.\n\nTo utilise most effectively, you just need to ask and I will do my best to assist you.' additional_kwargs={} example=False
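
To make the behavior concrete, here is a conceptual sketch of fallback combined with retries. It is not Portkey's implementation: `ProviderError`, `call_with_fallback`, and the two fake providers are hypothetical, and `attempts` is assumed to mean the total number of tries per provider. The retryable status codes mirror the `on_status_codes` value used in the example above.

```python
RETRY_STATUS_CODES = {429, 500}  # mirrors "on_status_codes" in the example above


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a provider returned."""

    def __init__(self, status_code: int):
        super().__init__(f"provider returned HTTP {status_code}")
        self.status_code = status_code


def call_with_fallback(providers, prompt, attempts=2):
    """Try each provider in order; retry retryable errors up to `attempts` times."""
    last_error = None
    for name, call in providers:
        for _ in range(attempts):
            try:
                return name, call(prompt)
            except ProviderError as err:
                last_error = err
                if err.status_code not in RETRY_STATUS_CODES:
                    break  # non-retryable: move on to the next provider
    raise RuntimeError("all providers failed") from last_error


primary_calls = []


def flaky_primary(prompt):
    primary_calls.append(prompt)
    raise ProviderError(429)  # always rate-limited in this demo


def backup(prompt):
    return f"echo: {prompt}"


used, answer = call_with_fallback(
    [("gpt-4", flaky_primary), ("gpt-3.5-turbo", backup)], "hi"
)
print(used, answer, len(primary_calls))
```

In this demo the primary model is tried twice before the backup answers, which is the order of operations the fallback mode is meant to give you.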


#### **⚖️ Implementing Load Balancing with Portkey**

Load balancing ensures that incoming requests are efficiently distributed among multiple models. This not only enhances the performance but also provides redundancy in case one model fails.

With Portkey, implementing load balancing is simple. You need to:

- Define the `weight` parameter for each LLM. This weight determines how requests are distributed among the LLMs.
- Ensure that the sum of weights for all LLMs equals 1.

Here's an example of setting up load balancing with Portkey:

In [11]:
portkey_chat_client = ChatPortkey(mode="ab_test")

messages = [
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="What can you do?"),
]

llm1 = pk.LLMOptions(
    provider="openai",
    model="gpt-4",
    virtual_key=openai_virtual_key_a,
    weight=0.2,
)

llm2 = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_a,
    weight=0.8,
)

portkey_chat_client.add_llms(llm_params=[llm1, llm2])

print("Testing Loadbalance functionality:")
response = portkey_chat_client(messages)
print(response)

Testing Loadbalance functionality:
content='I can assist you in various tasks such as answering questions, providing information, giving recommendations, setting reminders and alarms, scheduling events, calculating numbers, converting units, giving weather updates, playing music, making phone calls, sending messages, and more. Just let me know what you need help with!' additional_kwargs={} example=False
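
The effect of the weights can be illustrated with a small simulation. This is a sketch of weighted random selection in general, not Portkey's routing code: `pick_llm` is a hypothetical helper, and the 0.2 / 0.8 split matches the weights in the example above.

```python
import random
from collections import Counter


def pick_llm(weighted_llms, rng):
    """Pick one model name at random, in proportion to its weight."""
    names = [name for name, _ in weighted_llms]
    weights = [weight for _, weight in weighted_llms]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the simulation is repeatable
llms = [("gpt-4", 0.2), ("gpt-3.5-turbo", 0.8)]
counts = Counter(pick_llm(llms, rng) for _ in range(10_000))
print(counts)
```

Over many requests, roughly 20% land on `gpt-4` and 80% on `gpt-3.5-turbo`. Any single request can still go to either model.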


#### **🧠 Implementing Semantic Caching with Portkey**

Semantic caching is a caching mechanism that matches requests by meaning rather than by exact text. Instead of caching based solely on exact input matches, it identifies similar requests and serves cached results, which cuts redundant provider calls, lowers latency, and reduces cost.

Let's see how to implement semantic caching with Portkey:

In [12]:
import time

portkey_chat_client = ChatPortkey(mode="single")

openai_llm = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_a,
    cache_status="semantic",
)

portkey_chat_client.add_llms(openai_llm)

current_messages = [
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="What are the ingredients of a pizza?"),
]

print("Testing Portkey Semantic Cache:")

start = time.time()
response = portkey_chat_client(current_messages)
end = time.time() - start

print(response)
print(f"{'-'*50}\nServed in {end} seconds.\n{'-'*50}")

new_messages = [
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="Ingredients of pizza"),
]

print("Testing Portkey Semantic Cache:")

start = time.time()
response = portkey_chat_client(new_messages)
end = time.time() - start

print(response)
print(f"{'-'*50}\nServed in {end} seconds.\n{'-'*50}")

Testing Portkey Semantic Cache:
content='The ingredients commonly found in a pizza are dough, tomato sauce, cheese (typically mozzarella), and various toppings such as pepperoni, mushrooms, onions, bell peppers, olives, and more. Additionally, herbs and spices like oregano, basil, and garlic can be added for flavor.' additional_kwargs={} example=False
--------------------------------------------------
Served in 2.572169065475464 seconds.
--------------------------------------------------
Testing Portkey Semantic Cache:
content='The basic ingredients of pizza include:\n\n1. Pizza crust or dough\n2. Pizza sauce or marinara sauce\n3. Cheese (typically mozzarella, but other varieties like cheddar, provolone, or Parmesan can be used)\n4. Toppings (such as pepperoni, sausage, mushrooms, onions, bell peppers, black olives, anchovies, etc.)\n5. Olive oil (for brushing the crust or drizzling on top)\n6. Herbs and spices (such as oregano, basil, garlic powder, red pepper flakes)\n7. Optional ing
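
The idea of matching similar rather than identical prompts can be sketched with a toy cache. This is an illustration only: it uses word-set overlap (Jaccard similarity) as a stand-in for real semantic matching and makes no claim about how Portkey measures similarity. `ToySemanticCache` and the 0.4 threshold are assumptions chosen for the demo.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets: |A ∩ B| / |A ∪ B|."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


class ToySemanticCache:
    """Hypothetical cache that returns a stored answer for similar prompts."""

    def __init__(self, threshold: float = 0.4):
        self.threshold = threshold
        self.entries: list[tuple[str, str]] = []  # (prompt, response)

    def get(self, prompt: str):
        best = max(self.entries, key=lambda e: similarity(prompt, e[0]), default=None)
        if best and similarity(prompt, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((prompt, response))


cache = ToySemanticCache()
cache.put("What are the ingredients of a pizza?", "dough, sauce, cheese, toppings")
hit = cache.get("Ingredients of pizza")   # similar wording: served from cache
miss = cache.get("Why is the sky blue?")  # unrelated: not served from cache
print(hit, miss)
```

The two pizza prompts share 3 of 7 distinct words, so they clear the threshold, while the unrelated prompt shares only one word and falls through to the provider.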

Portkey's cache supports two more options, Force Refresh and Age:

- `cache_force_refresh`: Force-send a request to your provider instead of serving it from the cache.
- `cache_age`: Set how long, in seconds, a cached entry for this particular string is kept before it is automatically refreshed.

Here's how you can use it:

In [13]:
# Setting the cache status as `semantic`, forcing a refresh, and setting cache_age to 60s.
openai_llm = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_a,
    cache_status="semantic",
    cache_force_refresh=True,
    cache_age=60,
)
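
How the two options interact can be sketched with a small time-based cache. This is a conceptual model, not Portkey's cache: `TTLCache`, its `fetch` method, and the injected clock are hypothetical names used so the example runs without waiting for real time to pass.

```python
import time


class TTLCache:
    """Hypothetical cache whose entries expire after `cache_age` seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (value, stored_at)

    def fetch(self, key, compute, cache_age=60, force_refresh=False):
        """Return the cached value unless it is older than `cache_age` seconds
        or `force_refresh` is set, in which case `compute()` is called again."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and not force_refresh and now - entry[1] < cache_age:
            return entry[0]
        value = compute()
        self._store[key] = (value, now)
        return value


fake_now = [0.0]  # mutable fake clock, in seconds
cache = TTLCache(clock=lambda: fake_now[0])
calls = []


def ask_provider():
    calls.append(1)
    return f"answer #{len(calls)}"


first = cache.fetch("q", ask_provider)    # miss: calls the provider
second = cache.fetch("q", ask_provider)   # hit: served from the cache
fake_now[0] = 61.0                        # more than 60 seconds later
third = cache.fetch("q", ask_provider)    # expired: calls the provider again
fourth = cache.fetch("q", ask_provider, force_refresh=True)  # forced: calls again
print(first, second, third, fourth)
```

A forced refresh also overwrites the stored entry, so later requests are served the fresh value.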

#### **🔬 Observability with Portkey**

Having insight into your application's behavior is paramount. Portkey's observability features allow you to monitor, debug, and optimize your AI applications with ease. You can track each request, understand its journey, and segment them based on custom tags. This level of detail can help in identifying bottlenecks, optimizing costs, and enhancing the overall user experience.

Here's how to set up observability with Portkey:

In [14]:
metadata = {
    "_environment": "production",
    "_prompt": "test",
    "_user": "user",
    "_organisation": "acme",
}

trace_id = "Langchain_portkey"

portkey_chat_client = ChatPortkey(mode="single")

openai_llm = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_a,
    metadata=metadata,
    trace_id=trace_id,
)

portkey_chat_client.add_llms(openai_llm)

print("Testing Observability functionality:")
response = portkey_chat_client(messages)
print(response)

Testing Observability functionality:
content="I can do several things:\n\n1. Answer questions: I can provide information on a wide range of topics.\n\n2. Provide recommendations: Whether it's book recommendations, movie suggestions, or travel destinations, I can help you find something that suits your preferences.\n\n3. Set reminders: If you need help remembering important dates or tasks, I can set up reminders for you.\n\n4. Solve problems: If you're facing a specific problem or challenge, I can offer suggestions or solutions to help you out.\n\n5. Assist with organization: From creating to-do lists to organizing your schedule, I can assist you with managing your tasks and time effectively.\n\n6. Offer general assistance: If you have any other specific requests or needs, feel free to ask, and I'll do my best to assist you." additional_kwargs={} example=False
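
Once requests carry a trace ID and metadata, they can be grouped and filtered. The sketch below shows the kind of segmentation those tags make possible, using made-up log records. The record layout, the `latency_ms` field, and both helper functions are assumptions for illustration, not Portkey's log schema or API.

```python
from collections import defaultdict
from statistics import mean

# Made-up log records; field names are illustrative, not Portkey's log schema.
logs = [
    {"trace_id": "Langchain_portkey", "latency_ms": 820,
     "metadata": {"_user": "user", "_environment": "production"}},
    {"trace_id": "Langchain_portkey", "latency_ms": 640,
     "metadata": {"_user": "user", "_environment": "production"}},
    {"trace_id": "other_trace", "latency_ms": 300,
     "metadata": {"_user": "another_user", "_environment": "staging"}},
]


def group_by_trace(records):
    """Collect all records that share a trace ID."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["trace_id"]].append(record)
    return dict(grouped)


def filter_by_tag(records, key, value):
    """Keep only records whose metadata has the given tag value."""
    return [r for r in records if r["metadata"].get(key) == value]


traces = group_by_trace(logs)
production = filter_by_tag(logs, "_environment", "production")
avg_latency = mean(r["latency_ms"] for r in production)
print(len(traces["Langchain_portkey"]), avg_latency)
```

Grouping by trace ID follows one user journey across requests, while filtering by a metadata tag compares segments such as production and staging.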


#### **🌉 Open Source AI Gateway**

Portkey's AI Gateway uses the [open source project Rubeus](https://github.com/portkey-ai/rubeus) internally. Rubeus powers features like LLM interoperability, load balancing, and fallbacks, and acts as the intermediary that routes your requests to the configured providers.

One of the advantages of using Portkey is its flexibility. You can easily customize its behavior, redirect requests to different providers, or even bypass logging to Portkey altogether.

Here's an example of customizing the behavior with Portkey:

```py
portkey_client.base_url = None
```

#### **📝 Feedback with Portkey**

Continuous improvement is a cornerstone of AI. To ensure your models and applications evolve and serve users better, feedback is vital. Portkey's Feedback API offers a straightforward way to gather weighted feedback from users, allowing you to refine and improve over time.

Here's how to use the Feedback API with Portkey. You can read more about [Feedback here](https://docs.portkey.ai/key-features/feedback-api).

In [15]:
import requests
import json

# Endpoint URL
url = "https://api.portkey.ai/v1/feedback"

# Headers
headers = {
    "x-portkey-api-key": os.environ.get("PORTKEY_API_KEY"),
    "Content-Type": "application/json",
}

# Data
data = {"trace_id": "REQUEST_TRACE_ID", "value": 1}

# Making the request
response = requests.post(url, headers=headers, data=json.dumps(data))

# Print the response
print(response.text)

{"status":"success","message":"Feedback saved"}


All the feedback with `weight` and `value` for each trace id is available on the Portkey dashboard:

<img src="https://portkey.ai/blog/content/images/2023/09/feedback.png" alt="Feedback" width=600 />
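
The request above sends only `trace_id` and `value`. A small helper can assemble richer payloads before posting them. In this sketch, `build_feedback_payload` is a hypothetical function, and the `weight` and `metadata` field names are assumptions based on the features described in this guide, so check the Feedback API documentation before relying on them. No network request is made.

```python
import json


def build_feedback_payload(trace_id, value, weight=None, metadata=None):
    """Assemble a feedback body, leaving out optional fields that are not set.

    The "weight" and "metadata" keys are assumed names; verify them against
    the Feedback API documentation.
    """
    payload = {"trace_id": trace_id, "value": value}
    if weight is not None:
        payload["weight"] = weight
    if metadata:
        payload["metadata"] = metadata
    return payload


basic = build_feedback_payload("REQUEST_TRACE_ID", 1)
weighted = build_feedback_payload(
    "REQUEST_TRACE_ID", 1, weight=0.5, metadata={"source": "thumbs_up_button"}
)
print(json.dumps(weighted))
```

The resulting dictionary can be passed to the `requests.post` call shown earlier via `data=json.dumps(payload)`.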

#### **✅ Conclusion**

Integrating Portkey with Langchain simplifies the process of building robust and resilient AI applications. With features like semantic caching, observability, load balancing, feedback, and fallbacks, you can ensure optimal performance and continuous improvement.

By following this guide, you've set up and tested the Portkey integration with Langchain. As you continue to build and deploy AI applications, remember to leverage the full potential of this integration!

For further assistance or questions, reach out to the developers ➡️ <br />
<a href="https://twitter.com/intent/follow?screen_name=portkeyai" target="_blank">
  <img src="https://img.shields.io/twitter/follow/portkeyai?style=social&logo=twitter" alt="Twitter">
</a>

Join our community of practitioners putting LLMs into production ➡️ <br />
<a href="https://discord.gg/sDk9JaNfK8" target="_blank">
  <img src="https://img.shields.io/discord/1143393887742861333?logo=discord" alt="Discord">
</a>