```python
# Import paths for legacy langchain (<0.1) and openai (<1.0)
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.output_parsers import DatetimeOutputParser

from openai.error import RateLimitError
from unittest.mock import patch
```

## What is a Fallback

When working with language models, you will often run into failures in the underlying APIs, such as rate limiting or downtime. As you move your LLM applications into production, guarding against these failures becomes increasingly important. That's why we've introduced the concept of fallbacks.

A `fallback` is an alternative plan used when the primary one fails: if a call to one model or chain raises an error, the fallback is tried in its place.
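To make the idea concrete, here is a minimal, framework-free sketch of the fallback pattern in plain Python. The names `run_with_fallbacks`, `flaky_model`, and `backup_model` are hypothetical and only illustrate the control flow; in LangChain itself the equivalent is the `with_fallbacks` method on runnables such as chat models, whose internals this sketch does not reproduce.

```python
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_fallbacks(
    runnables: Sequence[Callable[[T], R]],
    value: T,
    exceptions_to_handle: Tuple[Type[BaseException], ...] = (Exception,),
) -> R:
    """Try each callable in order and return the first successful result.

    Only exceptions listed in `exceptions_to_handle` trigger a fallback;
    any other exception propagates immediately.
    """
    if not runnables:
        raise ValueError("need at least one runnable")
    last_error: Optional[BaseException] = None
    for runnable in runnables:
        try:
            return runnable(value)
        except exceptions_to_handle as err:
            last_error = err  # remember the failure and try the next option
    # Every option failed with a handled exception: surface the last one.
    raise last_error


def flaky_model(prompt: str) -> str:
    # Hypothetical primary model whose API is currently down.
    raise TimeoutError("primary API is down")


def backup_model(prompt: str) -> str:
    # Hypothetical backup model that answers normally.
    return f"backup answer to: {prompt}"


print(run_with_fallbacks([flaky_model, backup_model], "Why did the chicken cross the road?"))
# prints "backup answer to: Why did the chicken cross the road?"
```

Restricting `exceptions_to_handle` is the main design choice: falling back on every exception can mask genuine bugs, so in practice you would list only the transient errors you expect.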

## Fallback for LLM API Errors

This is perhaps the most common use case for fallbacks. A request to an LLM API can fail for many reasons: the API could be down, you could have hit a rate limit, and so on. Fallbacks help protect against these failures. To demonstrate, we mock the OpenAI client so that every call raises a `RateLimitError`.

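```python
# Set max_retries=0 so the client does not retry on rate limits and similar
# errors; the error then surfaces immediately instead of being retried.
with open("openai_api.txt") as f:
    api_key = f.read().strip()  # drop any trailing newline from the key file

openai_llm = ChatOpenAI(max_retries=0, openai_api_key=api_key)

# Simulate a rate-limited API: every call to the OpenAI client raises RateLimitError.
with patch("openai.ChatCompletion.create", side_effect=RateLimitError()):
    try:
        print(openai_llm.invoke("Why did the chicken cross the road?"))
    except RateLimitError:
        print("Hit error")
```

```
Hit error
```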
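The cell above only shows the failure. The following minimal sketch, in plain Python, shows what a fallback adds: when the primary call is rate limited, a second model answers instead. `primary_llm`, `fallback_llm`, `invoke_with_fallback`, and the local `RateLimitError` class are hypothetical stand-ins, not the OpenAI or LangChain objects; with LangChain you would attach the second model via `with_fallbacks` rather than writing the `try`/`except` by hand.

```python
class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""


def primary_llm(prompt: str) -> str:
    # Simulates the patched OpenAI call: every request is rate limited.
    raise RateLimitError("429: too many requests")


def fallback_llm(prompt: str) -> str:
    # Simulates a second provider that is currently healthy.
    return "To get to the other side."


def invoke_with_fallback(prompt: str) -> str:
    try:
        return primary_llm(prompt)
    except RateLimitError:
        # Only rate limiting triggers the fallback; bugs such as a TypeError
        # still surface instead of being silently masked.
        return fallback_llm(prompt)


print(invoke_with_fallback("Why did the chicken cross the road?"))
# prints "To get to the other side."
```

This is also why the cell above sets `max_retries=0`: if the primary client kept retrying internally, the fallback would only run after all those retries had been exhausted.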


## Fallback for Long Inputs

One of the biggest limiting factors of LLMs is their context window. Usually you can count and track the length of prompts before sending them to an LLM, but where that is hard or complicated, you can fall back to a model with a longer context window.
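For comparison, a minimal sketch of the "count first" approach: estimate the prompt length and pick a model up front. It assumes the common rule of thumb of roughly 4 characters per token for English text; a real tokenizer (for OpenAI models, the `tiktoken` library) gives exact counts. The model names, `estimate_tokens`, `pick_model`, and the `reply_budget` parameter are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English.
    return max(1, len(text) // 4)


def pick_model(prompt: str, short_limit: int = 4097, reply_budget: int = 256) -> str:
    """Return which (hypothetical) model to use, reserving room for the reply."""
    if estimate_tokens(prompt) + reply_budget <= short_limit:
        return "short-context-model"
    return "long-context-model"


long_prompt = "What is the next number: " + ", ".join(["one", "two"] * 4000)

print(pick_model("What is 2 + 2?"))  # prints "short-context-model"
print(estimate_tokens(long_prompt))  # prints 10005
print(pick_model(long_prompt))  # prints "long-context-model"
```

The heuristic estimates about 10,000 tokens for this prompt, while the API reports 16,012 for the same input below. Estimates like this can be far off for unusual text, which is exactly the situation where a fallback is the more robust option.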

```python
with open("openai_api.txt") as f:
    openai_llm = ChatOpenAI(openai_api_key=f.read().strip())

# Build a prompt far longer than the default model's context window.
inputs = "What is the next number: " + ", ".join(["one", "two"] * 4000)

try:
    print(openai_llm.invoke(inputs))
except Exception as e:
    # The API rejects the request because the prompt exceeds the context window.
    print(e)
```

```
This model's maximum context length is 4097 tokens. However, your messages resulted in 16012 tokens. Please reduce the length of the messages.
```
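A minimal sketch of how a fallback handles this case: the short-context model rejects the prompt, and a longer-context model takes over. To stay self-contained, the sketch passes token counts directly instead of tokenizing text. `ContextLengthError`, `make_model`, and the 16,384-token limit of the second model are hypothetical; only the 4,097-token limit and the 16,012-token prompt come from the error above. The sketch also ignores tokens reserved for the model's reply.

```python
from typing import Callable


class ContextLengthError(Exception):
    """Hypothetical stand-in for the API error raised when a prompt is too long."""


def make_model(name: str, max_tokens: int) -> Callable[[int], str]:
    # Returns a fake model that accepts a prompt length (in tokens).
    def model(prompt_tokens: int) -> str:
        if prompt_tokens > max_tokens:
            raise ContextLengthError(
                f"{name}: maximum context length is {max_tokens} tokens, "
                f"got {prompt_tokens}"
            )
        return f"{name} handled {prompt_tokens} tokens"

    return model


short_model = make_model("short-context", 4097)
long_model = make_model("long-context", 16384)  # hypothetical larger window


def invoke(prompt_tokens: int) -> str:
    try:
        return short_model(prompt_tokens)
    except ContextLengthError:
        # Too long for the default model: retry on the longer-context one.
        return long_model(prompt_tokens)


print(invoke(100))  # prints "short-context handled 100 tokens"
print(invoke(16012))  # prints "long-context handled 16012 tokens"
```

Short prompts keep using the cheaper default model; only oversized ones pay for the larger context window.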


## Fallback to a Better Model

Often we ask models to produce output in a specific format (like JSON). Models like GPT-3.5 handle this reasonably well but sometimes struggle. This is a natural fit for fallbacks: try GPT-3.5 first (faster and cheaper), and if parsing its output fails, retry with GPT-4.
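A minimal sketch of the idea in plain Python, assuming the desired output is a date in `YYYY-MM-DD` form. `cheap_model` and `strong_model` are hypothetical stand-ins for GPT-3.5 and GPT-4 with canned replies, and `parse_date` uses the standard library rather than LangChain's `DatetimeOutputParser`.

```python
from datetime import datetime


def cheap_model(prompt: str) -> str:
    # Hypothetical weaker model: answers in prose, ignoring the format request.
    return "It was sometime in the summer of 1969."


def strong_model(prompt: str) -> str:
    # Hypothetical stronger model: follows the requested format.
    return "1969-07-20"


def parse_date(text: str) -> datetime:
    # Raises ValueError if the text is not exactly YYYY-MM-DD.
    return datetime.strptime(text.strip(), "%Y-%m-%d")


def answer_with_fallback(prompt: str) -> datetime:
    for model in (cheap_model, strong_model):
        try:
            # Parsing is part of each attempt, so a format failure
            # moves on to the next (stronger) model.
            return parse_date(model(prompt))
        except ValueError:
            continue
    raise ValueError("no model produced a parseable date")


print(answer_with_fallback("When did Apollo 11 land on the moon? Answer as YYYY-MM-DD."))
# prints "1969-07-20 00:00:00"
```

The key design point is that the fallback wraps the model *and* the parser together. If it wrapped only the model call, the cheap model's reply would count as a success and the parsing error would surface afterwards, outside the fallback logic; in LangChain terms, the fallback belongs on the whole prompt, model, and parser chain.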