# How to add fallbacks to a runnable

Adding fallbacks to a Runnable in LangChain keeps your system robust when an LLM API behaves unpredictably or another runtime error occurs.

# 1. Why Use Fallbacks?
Error Handling: Manage rate limits, downtime, or other errors from an LLM API.

Dynamic Adjustments: Use different prompts or models when switching between LLMs.

Performance Optimization: Start with a faster/cheaper model and fall back to a better-performing one when necessary.


# 2. Steps to Implement Fallbacks
Install Required Packages

Ensure you have the required LangChain packages installed.

In [None]:
%pip install --upgrade --quiet langchain langchain-openai


# Basic Fallback for LLM API Errors
If an error occurs while invoking one model, you can define a fallback to another.
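As a rough mental model, a runnable with fallbacks tries the primary first and then each fallback in order, returning the first successful result. The following is a plain-Python sketch of that idea, not LangChain's actual implementation. The helper `invoke_with_fallbacks` and the stand-in model functions are hypothetical, and re-raising the first error when every attempt fails is an assumption about how the library behaves.

```python
from typing import Callable, Sequence


def invoke_with_fallbacks(
    primary: Callable[[str], str],
    fallbacks: Sequence[Callable[[str], str]],
    prompt: str,
) -> str:
    """Try the primary callable, then each fallback, and return the first success."""
    first_error = None
    for model in [primary, *fallbacks]:
        try:
            return model(prompt)
        except Exception as exc:
            # Remember the first failure so it can be raised if nothing succeeds
            if first_error is None:
                first_error = exc
    raise first_error


def rate_limited_model(prompt: str) -> str:
    # Hypothetical stand-in for a model whose API is rate limited
    raise RuntimeError("rate limit")


def healthy_model(prompt: str) -> str:
    # Hypothetical stand-in for a working fallback model
    return f"answer to: {prompt}"


# A different, healthy fallback rescues the call
rescued = invoke_with_fallbacks(rate_limited_model, [healthy_model], "hello")

# Falling back to the same failing model cannot help: the error propagates
try:
    invoke_with_fallbacks(rate_limited_model, [rate_limited_model], "hello")
    same_model_failed = False
except RuntimeError:
    same_model_failed = True
```

The second call illustrates why a fallback should normally be a different model or provider: if it shares the failure of the primary, the caller still sees the error.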


In [4]:
import getpass
import os
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = getpass.getpass()


In [18]:
from unittest.mock import patch
import httpx
from openai import RateLimitError
from langchain_openai import ChatOpenAI
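# from langchain_anthropic import ChatAnthropic  # Requires the langchain-anthropic package

# Initialize OpenAI model (with no retries to avoid retrying on rate limits)
openai_llm = ChatOpenAI(model="gpt-4", max_retries=0)

# Initialize fallback model (using a different model for fallback)
# anthropic_llm = ChatAnthropic(model="claude-3-haiku-20240307")

# Create a model with fallbacks.
# NOTE: because the Anthropic lines are commented out, the fallback here is the same
# OpenAI model, so an OpenAI outage or rate limit hits the fallback as well.
# For a real fallback, uncomment them and use openai_llm.with_fallbacks([anthropic_llm]).
llm_with_fallbacks = openai_llm.with_fallbacks([openai_llm])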

# Build a RateLimitError to simulate OpenAI rejecting the request
mock_request = httpx.Request("GET", "/")
mock_response = httpx.Response(429, request=mock_request)
error = RateLimitError("Rate limit", response=mock_response, body="Rate limit exceeded")

# Patch the OpenAI client so that every chat completion call raises the error
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print("Trying to invoke OpenAI...")
        # with_fallbacks tries the primary model first, then each fallback in order
        result = llm_with_fallbacks.invoke("Why did the chicken cross the road?")
        print(f"Result from fallback model: {result}")
    except RateLimitError as e:
        # Reached only when the primary model and every fallback have failed
        print("Caught RateLimitError:", str(e))
        print("Triggering fallback model due to rate limit")
    except Exception as e:
        print(f"Caught unexpected error: {str(e)}")
        # You can define other fallback models or mechanisms here


Trying to invoke OpenAI...
Caught RateLimitError: Rate limit
Triggering fallback model due to rate limit


In [19]:
try:
    print("Trying to invoke OpenAI...")
    # Same call without the patch, so the real OpenAI API is used
    result = llm_with_fallbacks.invoke("Why did the chicken cross the road?")
    print(f"Result from fallback model: {result}")
except RateLimitError as e:
    print("Caught RateLimitError:", str(e))
    print("Triggering fallback model due to rate limit")
except Exception as e:
    print(f"Caught unexpected error: {str(e)}")
    # You can define other fallback models or mechanisms here


Trying to invoke OpenAI...
Result from fallback model: content='To get to the other side.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 15, 'total_tokens': 22, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4-0613', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-17b80de7-3c93-4c4e-853d-ddcb654e879d-0' usage_metadata={'input_tokens': 15, 'output_tokens': 7, 'total_tokens': 22, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}


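The unpatched call returned the expected answer: "To get to the other side." Note that `model_name` is gpt-4-0613, so the primary OpenAI model answered and no fallback was needed (ChatAnthropic is commented out in the code above). In the patched run, by contrast, the RateLimitError reached the `except` block because the only fallback is the same OpenAI model, which failed in the same way.

Here's a breakdown of the output:

* Content: "To get to the other side." (the answer to the joke)
* Metadata: It includes token usage details, such as:
  * completion_tokens: 7
  * prompt_tokens: 15
  * total_tokens: 22
* Model Info: The model that responded is gpt-4-0613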
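To check programmatically which model produced an answer, you can read `model_name` from the response metadata. The sketch below works on a plain dictionary copied from the output above so that it runs without an API key; with a real response you would pass `result.response_metadata`. The helper name `answered_by_fallback` is hypothetical, and the prefix comparison assumes that dated snapshot names such as gpt-4-0613 start with the requested model name.

```python
def answered_by_fallback(response_metadata: dict, primary_model: str) -> bool:
    """Return True when the responding model does not look like the primary model."""
    model_name = response_metadata.get("model_name", "")
    # Providers often return a dated snapshot name, e.g. "gpt-4-0613" for "gpt-4"
    return not model_name.startswith(primary_model)


# Subset of the response_metadata printed above
metadata = {
    "token_usage": {"completion_tokens": 7, "prompt_tokens": 15, "total_tokens": 22},
    "model_name": "gpt-4-0613",
    "finish_reason": "stop",
}

used_fallback = answered_by_fallback(metadata, primary_model="gpt-4")
print("Fallback used:", used_fallback)
```

A check like this is only a heuristic. It is most useful when the primary and fallback models come from different families with clearly different names.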

# Fallback for Entire Chains
Fallbacks are also applicable for complex chains, where each LLM may require its own tailored prompt.

In [20]:
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import OpenAI, ChatOpenAI

# Chain 1: Chat Model
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a nice assistant."),
    ("human", "Why did the {animal} cross the road?")
])
chat_model = ChatOpenAI(model="gpt-fake")  # Nonexistent model name, so this chain fails
chat_chain = chat_prompt | chat_model | StrOutputParser()

# Chain 2: Completion-style (non-chat) model with its own prompt
prompt_template = """Instructions: Include a compliment in your response.

Question: Why did the {animal} cross the road?"""
prompt = PromptTemplate.from_template(prompt_template)
regular_model = OpenAI()
regular_chain = prompt | regular_model

# Combine Chains with Fallbacks
chain_with_fallbacks = chat_chain.with_fallbacks([regular_chain])
response = chain_with_fallbacks.invoke({"animal": "turtle"})
print(response)




Response: To show off its slow-but-steady determination and gracefulness.


# Fallback for Context Window Issues
Handle inputs that exceed one model's context window by falling back to a model with a larger context window.

This example uses two ChatOpenAI models with different context sizes, gpt-3.5-turbo and gpt-3.5-turbo-16k. The second is the fallback: if the first raises an error because the input is too long, LangChain automatically sends the request to the second.

In [24]:
from langchain_openai import ChatOpenAI

# Initialize two models with different context window sizes
short_llm = ChatOpenAI(model="gpt-3.5-turbo")  # Regular context window
long_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")  # Larger context window

# Create a fallback chain: if short_llm fails, use long_llm
llm_with_fallbacks = short_llm.with_fallbacks([long_llm])

# Long input intended to stress the context window of the smaller model
inputs = "What is the next number: " + ", ".join([str(i) for i in range(1, 4000)])

try:
    # Invoke the model with fallbacks
    response = llm_with_fallbacks.invoke(inputs)
    print("Response:", response)
except Exception as e:
    print(f"Error: {e}")



Response: content='4000' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 2, 'prompt_tokens': 15009, 'total_tokens': 15011, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-22d158e4-9c61-435a-a294-981073cb27a0-0'


# Explanation:

1. Models Setup: We define two models, one (gpt-3.5-turbo) for regular context windows and another (gpt-3.5-turbo-16k) for larger context windows.

2. Fallback Mechanism: The with_fallbacks() method is used to create a fallback chain where, if the shorter model (short_llm) encounters an issue (e.g., exceeding the context window), the fallback model (long_llm) is used.

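3. Handling Long Inputs: The input string is intentionally long (a generated sequence of numbers) so that it can exceed the context window of the smaller model. In the run shown above, however, the prompt was 15,009 tokens and `model_name` is gpt-3.5-turbo, so the primary model handled the request itself and the fallback was not triggered. The fallback only runs when the primary model raises an error.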

# Expected Result:
* If the prompt exceeds the context window of gpt-3.5-turbo, the fallback model (gpt-3.5-turbo-16k) will be used to process the input.
* Otherwise gpt-3.5-turbo answers directly, as in the output above. The `model_name` field in `response_metadata` shows which model responded.
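To see the context-window fallback fire without depending on the current limits of hosted models, the behavior can be simulated in plain Python. In this sketch the models, the word-based limit, and `ContextLengthError` are all hypothetical. It also shows a design choice worth considering: only the length error triggers the fallback, while other errors propagate. In LangChain, the `exceptions_to_handle` argument of `with_fallbacks` serves a similar purpose.

```python
class ContextLengthError(Exception):
    """Hypothetical error raised when a prompt is longer than a model accepts."""


def make_model(name: str, max_words: int):
    # Build a stand-in model that rejects prompts longer than max_words
    def model(prompt: str) -> str:
        if not prompt.strip():
            raise ValueError("empty prompt")
        if len(prompt.split()) > max_words:
            raise ContextLengthError(f"{name}: prompt too long")
        return name  # return the name so we can see which model handled the call
    return model


def invoke_with_context_fallback(short_model, long_model, prompt: str) -> str:
    try:
        return short_model(prompt)
    except ContextLengthError:
        # Only the length error triggers the fallback; other errors propagate
        return long_model(prompt)


short_model = make_model("short", max_words=10)
long_model = make_model("long", max_words=10_000)

short_prompt = "What is the next number: 1, 2, 3"
long_prompt = "What is the next number: " + ", ".join(str(i) for i in range(1, 4000))

handled_short = invoke_with_context_fallback(short_model, long_model, short_prompt)
handled_long = invoke_with_context_fallback(short_model, long_model, long_prompt)
print(handled_short, handled_long)
```

Restricting the fallback to one error type avoids paying for the larger model when the failure has nothing to do with input length.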

# Fallback to a Better Model for Specific Tasks
Start with a cheaper model and fall back to a more capable one when additional precision or parsing is required.

The example below shows how to use fallbacks in LangChain to switch to a more suitable model when the primary model cannot handle the input. This is useful for more complex or specialized queries:

In [26]:
from langchain_openai import ChatOpenAI

# Define two models: one with a regular context window and another with a larger context window
regular_model = ChatOpenAI(model="gpt-3.5-turbo")  # Standard model with a smaller context window
extended_model = ChatOpenAI(model="gpt-3.5-turbo-16k")  # Model with a larger context window

# Create a fallback system where the extended model will be used for long or complex queries
llm_with_fallbacks = regular_model.with_fallbacks([extended_model])

# Example task that requires a larger context window (e.g., processing a long input or a complex query)
task_description = (
    "Summarize this research paper, which contains detailed information about the latest trends in AI and machine learning. "
    "The paper includes numerous references, methodologies, and experimental results that provide insights into the field of deep learning and natural language processing."
)

try:
    # with_fallbacks tries regular_model first and switches to extended_model automatically on error
    response = llm_with_fallbacks.invoke(task_description)
    print("Regular Model Response:", response)
except Exception as e:
    # Reached only if regular_model and the extended_model fallback both failed
    print(f"Error with regular model: {e}")
    print("Switching to extended model...")

    # Last-resort manual retry against the extended model
    response = extended_model.invoke(task_description)
    print("Extended Model Response:", response)


Regular Model Response: content='This research paper delves into the latest trends in AI and machine learning, focusing on deep learning and natural language processing. The paper includes detailed information on methodologies, experimental results, and references to provide insights into the advancements in these areas. It highlights the importance of deep learning and natural language processing in the field of AI and discusses the potential applications and implications of these technologies.' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 74, 'prompt_tokens': 53, 'total_tokens': 127, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-5cda80b1-50fc-489b-96f5-06c213e18f24-0'


# Explanation:
1. Models Setup: We're using two versions of the ChatOpenAI model:
   * regular_model with a default context window.
   * extended_model with a larger context window to handle lengthier tasks.
2. Fallback Mechanism: The with_fallbacks() method combines both models so that if regular_model raises an error (for example, because the input exceeds its context window), extended_model is used automatically.
3. Handling Specific Tasks: A fallback is triggered by an error, not by the difficulty of the query. In the run above the prompt was only 53 tokens, so regular_model (gpt-3.5-turbo) answered and the fallback was not used.

# Expected Result:
* If regular_model completes the request without an error, it handles the task without switching.
* If regular_model raises an error, such as for an input that is too long, extended_model (with the larger context window) processes the request.
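The "additional precision or parsing" case mentioned at the start of this section can be sketched without any API calls. Here a chain consists of a model followed by a strict date parser, and a parsing failure is what triggers the switch to the more capable model. Both model functions and their canned replies are hypothetical stand-ins, not real model output.

```python
from datetime import datetime


def cheap_model(question: str) -> str:
    # Hypothetical cheaper model that wraps the date in extra words
    return "Sure! The date is 2024-01-15."


def capable_model(question: str) -> str:
    # Hypothetical stronger model that follows the format instruction exactly
    return "2024-01-15"


def parse_date(text: str) -> datetime:
    # Strict parser: raises ValueError unless the text is exactly YYYY-MM-DD
    return datetime.strptime(text, "%Y-%m-%d")


def date_chain(question: str) -> datetime:
    """Run model + parser with the cheap model, then with the capable one on failure."""
    try:
        return parse_date(cheap_model(question))
    except ValueError:
        # The failure came from the parser, so the whole chain is retried
        return parse_date(capable_model(question))


parsed = date_chain("Reply with only a date in YYYY-MM-DD format.")
print(parsed.date())
```

The point of the design is that the fallback wraps the model and the parser together. Attaching it to the model alone would not help, because the cheap model itself returns without raising.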

# 3. Best Practices for Using Fallbacks
* Disable Automatic Retries: Set max_retries=0 on the initial LLM. Otherwise it retries the failing call first, which delays the fallback from being triggered.
* Different Prompts for Different Models: Customize the prompts to suit the capabilities of each fallback model.
* Monitor Performance: Continuously track fallback usage to understand why errors occur and refine the primary model or chain.
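One lightweight way to approach the monitoring point is to count, per model, how often a call succeeds and which error types occur. The sketch below uses only the standard library. `monitored_invoke`, the stand-in models, and the counter key format are hypothetical; a production setup would more likely rely on logging or a tracing tool.

```python
from collections import Counter

fallback_stats = Counter()


def monitored_invoke(models, prompt):
    """Try (name, callable) pairs in order and record what happened to each."""
    first_error = None
    for name, model in models:
        try:
            result = model(prompt)
            fallback_stats[f"{name}:success"] += 1
            return result
        except Exception as exc:
            # Count failures per model and error type for later analysis
            fallback_stats[f"{name}:{type(exc).__name__}"] += 1
            if first_error is None:
                first_error = exc
    raise first_error


def flaky_primary(prompt):
    # Hypothetical primary model that always times out
    raise TimeoutError("primary timed out")


def steady_backup(prompt):
    # Hypothetical fallback model that always answers
    return "ok"


models = [("primary", flaky_primary), ("backup", steady_backup)]
for _ in range(3):
    monitored_invoke(models, "ping")

print(dict(fallback_stats))
```

A high count for a single error type on the primary model suggests fixing the root cause, such as quota or prompt length, instead of relying on the fallback indefinitely.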