LangChain provides an optional caching layer for chat models. This is useful for two reasons:

- It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
- It can speed up your application by reducing the number of API calls you make to the LLM provider.
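To make the cost argument concrete, here is a minimal, library-free sketch of exact-match caching. `ExactMatchCache`, `fake_model`, and the `model_config` strings are hypothetical names invented for this illustration; they are not LangChain classes and this is not LangChain's implementation. The idea is only that a repeated prompt with the same model settings is answered from a dictionary instead of triggering another request.

```python
from typing import Callable, Dict, Tuple


class ExactMatchCache:
    """Toy exact-match cache (illustrative stand-in, not a LangChain class)."""

    def __init__(self) -> None:
        # Keyed on the prompt AND the model settings, so the same prompt
        # sent with different settings does not reuse an old answer.
        self._store: Dict[Tuple[str, str], str] = {}

    def get_or_call(
        self, prompt: str, model_config: str, call: Callable[[str], str]
    ) -> str:
        key = (prompt, model_config)
        if key not in self._store:
            # Cache miss: make one real request and remember the answer.
            self._store[key] = call(prompt)
        return self._store[key]


api_calls = 0  # counts how many "billed" requests were made


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a request to an LLM provider."""
    global api_calls
    api_calls += 1
    return f"response to: {prompt}"


cache = ExactMatchCache()
config = "model-a,temperature=0"  # assumed label for the model settings

first = cache.get_or_call("Tell me a joke", config, fake_model)
second = cache.get_or_call("Tell me a joke", config, fake_model)  # cache hit
third = cache.get_or_call("Tell me another joke", config, fake_model)  # miss

print(api_calls)  # 2: three lookups, but only two requests
```

Note that only an identical prompt produces a hit in this sketch; a reworded prompt is treated as a new request.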

In [1]:
from langchain.cache import InMemoryCache, SQLiteCache
from langchain.globals import set_llm_cache
from langchain_openai import ChatOpenAI

### InMemoryCache

In [2]:
set_llm_cache(InMemoryCache())

In [3]:
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

In [4]:
%%time
llm.invoke("Tell me joke about cats")

CPU times: user 17.5 ms, sys: 9 ms, total: 26.4 ms
Wall time: 967 ms


AIMessage(content='Why was the cat sitting on the computer?\n\nBecause it wanted to keep an eye on the mouse!', response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 12, 'total_tokens': 32}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-5fbc421f-0c50-47bb-a195-9d59045f46ee-0')

In [5]:
%%time
llm.invoke("Tell me joke about cats")

CPU times: user 510 µs, sys: 178 µs, total: 688 µs
Wall time: 661 µs


AIMessage(content='Why was the cat sitting on the computer?\n\nBecause it wanted to keep an eye on the mouse!', response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 12, 'total_tokens': 32}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-5fbc421f-0c50-47bb-a195-9d59045f46ee-0')
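The `%%time` magic only works inside IPython or Jupyter. In a plain script, a similar comparison can be made with `time.perf_counter`. The sketch below is an assumption-laden illustration: `slow_model` is a hypothetical stand-in that sleeps to imitate network latency, and `cached_invoke` is a toy dictionary cache rather than LangChain's `InMemoryCache`, so it runs without an API key.

```python
import time

_cache = {}  # toy in-memory cache: prompt -> response


def slow_model(prompt):
    """Hypothetical stand-in for a provider call with ~200 ms latency."""
    time.sleep(0.2)
    return f"response to: {prompt}"


def cached_invoke(prompt):
    """Return the cached response if present, otherwise call the model."""
    if prompt not in _cache:
        _cache[prompt] = slow_model(prompt)
    return _cache[prompt]


def timed(prompt):
    """Run cached_invoke and return (result, elapsed wall time in seconds)."""
    start = time.perf_counter()
    result = cached_invoke(prompt)
    return result, time.perf_counter() - start


first_result, first_seconds = timed("Tell me a joke about cats")
second_result, second_seconds = timed("Tell me a joke about cats")

print(f"first call:  {first_seconds * 1000:.1f} ms")
print(f"second call: {second_seconds * 1000:.3f} ms")
```

The second call skips the simulated latency entirely, which mirrors the drop from hundreds of milliseconds to microseconds in the notebook output above.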

### SQLiteCache

In [7]:
# Remove any existing db file (-f suppresses the error if it does not exist)
!rm -f .langchain.db


In [9]:
# Create a SQLite db to back the cache
set_llm_cache(SQLiteCache(database_path='.langchain.db'))

In [10]:
%%time
llm.invoke("Tell me a joke")

CPU times: user 13.4 ms, sys: 4.71 ms, total: 18.1 ms
Wall time: 1.36 s


AIMessage(content='Why did the scarecrow win an award? Because he was outstanding in his field!', response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 11, 'total_tokens': 28}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-70ad6f33-092f-4815-98db-2874cbe27460-0')

In [11]:
%%time
llm.invoke("Tell me a joke")

CPU times: user 49.1 ms, sys: 33.9 ms, total: 83 ms
Wall time: 82.3 ms


AIMessage(content='Why did the scarecrow win an award? Because he was outstanding in his field!', id='run-70ad6f33-092f-4815-98db-2874cbe27460-0')
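The practical difference from the in-memory cache is that a file-backed cache survives a process restart. The sketch below illustrates that property with the standard-library `sqlite3` module. The table layout and the `open_cache`, `lookup`, and `update` helpers are assumptions made up for this example; they do not reflect the schema or internals that LangChain's `SQLiteCache` actually uses.

```python
import os
import sqlite3
import tempfile


def open_cache(path):
    """Open (or create) a toy cache database at the given path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "prompt TEXT, model_config TEXT, response TEXT, "
        "PRIMARY KEY (prompt, model_config))"
    )
    return conn


def lookup(conn, prompt, model_config):
    """Return the cached response, or None on a cache miss."""
    row = conn.execute(
        "SELECT response FROM cache WHERE prompt = ? AND model_config = ?",
        (prompt, model_config),
    ).fetchone()
    return row[0] if row else None


def update(conn, prompt, model_config, response):
    """Store a response and commit so it is written to disk."""
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
        (prompt, model_config, response),
    )
    conn.commit()


# Use a temporary directory so the example does not touch .langchain.db.
db_path = os.path.join(tempfile.mkdtemp(), "toy_cache.db")

conn = open_cache(db_path)
miss = lookup(conn, "Tell me a joke", "model-a")  # nothing stored yet
update(conn, "Tell me a joke", "model-a", "a cached joke")
conn.close()

# Reopening the file stands in for restarting the application.
conn = open_cache(db_path)
hit = lookup(conn, "Tell me a joke", "model-a")
conn.close()

print(miss, hit)  # None a cached joke
```

Reading from disk is slower than a dictionary lookup, which is consistent with the cached SQLite call above taking tens of milliseconds rather than microseconds.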