# Cache
LangChain's LLM cache stores the outputs of LLM calls so that identical requests are not recomputed every time, which speeds up responses and reduces cost.

#### Why use cache?
* Avoid repeated API calls for the same input.
* Save tokens (and cost).
* Improve performance (faster responses).

#### How it works
* When you pass a prompt to the LLM, LangChain first checks the cache.
* If the same prompt has already been seen with the same model settings, it returns the stored response immediately, without calling the API.
* If not, it calls the LLM, stores the result, and then returns it.
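
The three steps above can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `cached_call`, `fake_llm`, and the dictionary used as a cache are hypothetical names introduced here, and the settings string stands in for whatever model configuration is part of the cache key.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: the prompt text plus a string describing the model settings.
CacheKey = Tuple[str, str]


def cached_call(
    cache: Dict[CacheKey, str],
    prompt: str,
    llm_settings: str,
    call_llm: Callable[[str], str],
) -> str:
    """Return the cached response if present; otherwise call the model and store the result."""
    key = (prompt, llm_settings)
    if key in cache:             # 1. check the cache first
        return cache[key]        # 2. hit: return the stored response
    response = call_llm(prompt)  # 3. miss: call the model,
    cache[key] = response        #    store the result,
    return response              #    and return it


api_calls = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real (slow, billed) API call; records each invocation."""
    api_calls.append(prompt)
    return f"response to: {prompt}"


cache: Dict[CacheKey, str] = {}
first = cached_call(cache, "Tell me a joke", "temperature=0", fake_llm)
second = cached_call(cache, "Tell me a joke", "temperature=0", fake_llm)
print(first == second, len(api_calls))  # True 1
```

The second call never reaches `fake_llm`, which is why only one API call is recorded. Changing the settings string produces a different key, so it would count as a miss.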

#### Supported Backends

* In-memory (fast, resets when process ends).
* SQLite (persistent, lightweight).
* Redis (distributed, scalable).

Other custom caches can be implemented.
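
To give a feel for what a persistent backend involves, here is a toy SQLite-backed store built only on the standard library. It is a sketch under assumptions: `TinySQLiteCache` is a hypothetical class, not LangChain's own SQLite cache, and its `lookup` / `update` / `clear` methods are only loosely modeled on the shape of a cache interface. It stores plain strings for simplicity.

```python
import sqlite3
from typing import Optional


class TinySQLiteCache:
    """Toy persistent cache (hypothetical; not LangChain's SQLite backend)."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to keep entries across runs.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT, llm_string TEXT, response TEXT, "
            "PRIMARY KEY (prompt, llm_string))"
        )

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        """Return the stored response for this prompt and settings, or None on a miss."""
        row = self.conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm_string = ?",
            (prompt, llm_string),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        """Insert or overwrite the response for this prompt and settings."""
        self.conn.execute(
            "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
            (prompt, llm_string, response),
        )
        self.conn.commit()

    def clear(self) -> None:
        """Remove every cached entry."""
        self.conn.execute("DELETE FROM llm_cache")
        self.conn.commit()


store = TinySQLiteCache()
store.update("Tell me a joke", "temperature=0", "a stored joke")
print(store.lookup("Tell me a joke", "temperature=0"))  # a stored joke
print(store.lookup("Tell me a joke", "temperature=1"))  # None
```

The trade-off between the listed backends is mostly about where this table lives: in process memory, in a local file, or on a shared server.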

In [2]:
import os
from langchain_openai import OpenAI
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache
from dotenv import load_dotenv

In [5]:
# Load environment variables (such as the API key) from a .env file
load_dotenv()

True

In [6]:
# Enable in-memory caching
set_llm_cache(InMemoryCache())

# Initialize an LLM; the API key is read from the OPENAI_API_KEY environment variable
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, api_key=os.getenv("OPENAI_API_KEY"))

# First call → will query the API
response1 = llm.invoke("Tell me a joke about LangChain")
print("First call:", response1)

# Second call with the same prompt → served from cache
response2 = llm.invoke("Tell me a joke about LangChain")
print("Second call (cached):", response2)

First call: 

Why did the programming language go on a diet?

Because it wanted to be a lean, mean LangChain machine!
Second call (cached): 

Why did the programming language go on a diet?

Because it wanted to be a lean, mean LangChain machine!


### PromptTemplate

In [7]:
import os
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.globals import set_llm_cache
from langchain.cache import InMemoryCache

# Enable a fresh in-memory cache (this replaces any cache set in an earlier cell)
set_llm_cache(InMemoryCache())

# Reuse the `llm` object initialized in the earlier cell

# Define a simple prompt template
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Tell me a joke about {topic}"
)

# Format the prompt
formatted_prompt = prompt.format(topic="LangChain")

# First call → API hit
response1 = llm.invoke(formatted_prompt)
print("First call:", response1)

# Second call with the same prompt → cache hit
response2 = llm.invoke(formatted_prompt)
print("Second call (cached):", response2)


First call: 

Why did the programming language go on a diet?

Because it wanted to be a lean, mean LangChain machine!
Second call (cached): 

Why did the programming language go on a diet?

Because it wanted to be a lean, mean LangChain machine!
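
The cache only ever sees the final, formatted prompt text, so two calls hit the same entry only when the template variables produce exactly the same string. The sketch below illustrates this with plain `str.format` standing in for the template, and a hypothetical `is_cache_hit` helper with a set as the cache. It is an illustration of the idea, not LangChain code.

```python
# Plain str.format stands in for the prompt template here (an assumption for illustration).
template = "Tell me a joke about {topic}"

seen = set()  # hypothetical cache keyed on the final prompt text


def is_cache_hit(prompt: str) -> bool:
    """Report whether this exact prompt text was seen before, then record it."""
    hit = prompt in seen
    seen.add(prompt)
    return hit


results = [
    is_cache_hit(template.format(topic="LangChain")),  # first time: miss
    is_cache_hit(template.format(topic="LangChain")),  # identical text: hit
    is_cache_hit(template.format(topic="Python")),     # different topic: miss
    is_cache_hit(template.format(topic="langchain")),  # different casing: miss
]
print(results)  # [False, True, False, False]
```

Even a change in casing or whitespace yields a different prompt string and therefore a miss.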


### ChatPromptTemplate

In [10]:
import os
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.globals import set_llm_cache
from langchain.cache import InMemoryCache

# Enable a fresh in-memory cache (this replaces any cache set in an earlier cell)
set_llm_cache(InMemoryCache())

# Define chat-style prompt template
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that tells jokes."),
    ("human", "Tell me a joke about {topic}")
])

# Format with input
formatted_chat_prompt = chat_prompt.format_messages(topic="LangChain")

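# First call → API hit
# Note: `llm` is still the completion-style OpenAI model from the earlier cell, so the
# chat messages are flattened into one text prompt, which is likely why the output
# starts with "System:". ChatOpenAI is imported above but not used in this cell.
response1 = llm.invoke(formatted_chat_prompt)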
print("First call:", response1)

# Second call with the same prompt → cache hit
response2 = llm.invoke(formatted_chat_prompt)
print("Second call (cached):", response2)


First call: 

System: Why did the blockchain developer switch to LangChain? Because he wanted to make his code more secure and unbreakable!
Second call (cached): 

System: Why did the blockchain developer switch to LangChain? Because he wanted to make his code more secure and unbreakable!
