# Lesson 6.2: Caching and Callbacks

---

In previous lessons, we learned how to build complex chains and Agents with LangChain Expression Language (LCEL). As LLM applications grow larger and are used more frequently, two important aspects to consider are **performance** and **observability**. This lesson will introduce **Caching** for speed and cost reduction, along with **Callbacks** for monitoring and controlling execution flow.

## 1. Caching

### 1.1. Why is Caching Needed?

**Caching** is a technique of storing the results of expensive computations (like LLM calls) so that when the same computation is requested again, the result can be returned immediately from the cache instead of being re-executed.

* **Reduce API Costs:** Hosted LLM APIs typically charge per token. If the same question is asked many times, caching means you pay only for the first call.
* **Increase Response Speed:** Retrieving data from cache is significantly faster than calling the LLM from scratch, improving user experience.
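* **Avoid Redundant LLM Calls for Duplicate Questions:** Especially useful in Question Answering (Q&A) applications or chatbots where users often repeat the same questions. Note that the standard caches below only match prompts that are exactly identical; a rephrased question misses the cache unless you use a semantic cache.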
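The idea behind an exact-match cache fits in a few lines of plain Python. The sketch below is not LangChain code. `fake_llm` is a hypothetical stand-in for a paid API call, and the dictionary plays the role of the cache.

```python
# Minimal exact-match cache sketch. `fake_llm` is a hypothetical stand-in
# for a paid LLM API call; it only counts how often it is really invoked.
calls = 0


def fake_llm(prompt: str) -> str:
    """Pretend to call a paid LLM API (hypothetical)."""
    global calls
    calls += 1
    return f"Answer to: {prompt}"


cache: dict[str, str] = {}


def cached_llm(prompt: str) -> str:
    """Return the stored answer if this exact prompt was seen before."""
    if prompt in cache:
        return cache[prompt]      # cache hit: no API call, no cost
    answer = fake_llm(prompt)     # cache miss: pay for one real call
    cache[prompt] = answer
    return answer


questions = ["What is LangChain?", "What is LangChain?", "What is RAG?", "What is LangChain?"]
answers = [cached_llm(q) for q in questions]
print(f"{len(questions)} questions, {calls} real LLM calls")
```

Four questions cost only two real calls, because the repeated prompt is answered from the dictionary.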



### 1.2. Types of Cache in LangChain

LangChain supports various cache types, suitable for different environments and storage requirements:

* **`InMemoryCache`:**
    * **Concept:** The simplest cache, storing data in the application's RAM.
    * **Pros:** Very fast, easy to set up.
    * **Cons:** Data is lost when the application shuts down and cannot be shared between application instances.
    * **When to use:** Development, testing, or small applications that don't require data persistence.
* **`SQLiteCache`:**
    * **Concept:** Stores data in a SQLite database file.
    * **Pros:** Data is persistent (not lost when the application shuts down), easier to set up than dedicated DB servers.
    * **Cons:** Performance might not be as high as specialized DB servers for heavy loads.
    * **When to use:** Local applications requiring data persistence without complex DB server setup.
* **`RedisCache`:**
    * **Concept:** Uses Redis (an in-memory data structure store) as the cache backend.
    * **Pros:** Very fast, can be shared across multiple applications/servers, supports TTL (Time To Live) for data.
    * **Cons:** Requires Redis server installation and management.
    * **When to use:** Production applications requiring high performance, shared cache, and scalability.
* **Other types:** LangChain also supports other cache types like `GPTCache`, `UpstashRedisCache`, etc.
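The sketch below makes the difference between `InMemoryCache` and `SQLiteCache` concrete. It is a simplified, hypothetical persistent cache built on Python's standard `sqlite3` module. It borrows the `lookup`/`update`/`clear` method names of LangChain's `BaseCache` interface, but it is not LangChain's implementation: it stores plain strings, and the `llm_string` value used here is a made-up placeholder.

```python
import os
import sqlite3
import tempfile
from typing import Optional


class TinySQLiteCache:
    """Hypothetical, simplified persistent cache (not LangChain's SQLiteCache).

    Mirrors the lookup/update/clear shape of LangChain's BaseCache, but stores
    plain strings instead of LangChain Generation objects.
    """

    def __init__(self, database_path: str) -> None:
        self.conn = sqlite3.connect(database_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            " prompt TEXT, llm_string TEXT, response TEXT,"
            " PRIMARY KEY (prompt, llm_string))"
        )

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT response FROM cache WHERE prompt = ? AND llm_string = ?",
            (prompt, llm_string),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (prompt, llm_string, response),
        )
        self.conn.commit()

    def clear(self) -> None:
        self.conn.execute("DELETE FROM cache")
        self.conn.commit()

    def close(self) -> None:
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "demo_cache.db")
llm_string = "gpt-3.5-turbo|t=0.7"  # hypothetical placeholder for the model config

# First "application run": store an answer, then shut down.
first_run = TinySQLiteCache(db_path)
first_run.update("What is LangChain?", llm_string, "A framework for LLM apps.")
first_run.close()

# Second "application run": a fresh object reads the answer back from disk.
second_run = TinySQLiteCache(db_path)
restored = second_run.lookup("What is LangChain?", llm_string)
print(restored)
```

The second object finds the answer because it lives in the database file. A purely in-memory cache would have lost it when the first object was discarded.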

### 1.3. How to Set Up and Use Caching

To use caching in LangChain, you need to set up a cache backend and then enable it.

```python
# Install libraries if not already installed
# pip install langchain-openai langchain-community

import os
import time

from langchain_core.caches import InMemoryCache
from langchain_core.globals import get_llm_cache, set_llm_cache
from langchain_openai import ChatOpenAI

# Set the environment variable for the OpenAI API key
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# 1. Set up the caching backend
# Enable InMemoryCache globally for every LLM / chat model call
set_llm_cache(InMemoryCache())
print("InMemoryCache has been set up.")

# To use SQLiteCache instead:
# from langchain_community.cache import SQLiteCache
# set_llm_cache(SQLiteCache(database_path=".langchain.db"))
# print("SQLiteCache has been set up.")

# To use RedisCache instead (requires a running Redis server and the redis package):
# pip install redis
# from langchain_community.cache import RedisCache
# from redis import Redis
# set_llm_cache(RedisCache(Redis(host="localhost", port=6379, db=0)))  # optionally add ttl=3600
# print("RedisCache has been set up.")

# 2. Initialize the LLM (or a Chain/Agent that uses it)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

# 3. Use the LLM as usual
print("\n--- Practical Caching Example ---")

# Call 1: cache miss, so the LLM is actually called
print("Call 1:")
start_time = time.time()
response1 = llm.invoke("Tell a short story about an adventurous dog.")
end_time = time.time()
print(f"Response time: {end_time - start_time:.2f} seconds")
print(f"Response: {response1.content[:50]}...")

# Call 2: same prompt and same model settings, so the result comes from the cache
print("\nCall 2 (same prompt):")
start_time = time.time()
response2 = llm.invoke("Tell a short story about an adventurous dog.")
end_time = time.time()
print(f"Response time: {end_time - start_time:.2f} seconds")
print(f"Response: {response2.content[:50]}...")

# Call 3: different prompt, so the LLM is actually called again
print("\nCall 3 (different prompt):")
start_time = time.time()
response3 = llm.invoke("Tell a short story about a cute cat.")
end_time = time.time()
print(f"Response time: {end_time - start_time:.2f} seconds")
print(f"Response: {response3.content[:50]}...")

# Clear the cache (if needed)
get_llm_cache().clear()
print("\nCache cleared.")
```

**Explanation:**
* You set up the cache by passing a cache object (e.g., `InMemoryCache()`) to `set_llm_cache()`. This replaces the older `langchain.llm_cache = ...` assignment, which is deprecated.
* Once the cache is set, any LLM call (including calls made inside a Chain or Agent) whose prompt and model settings match a cached entry is served from the cache.
* The second call with the same prompt is much faster than the first because no API request is made. It also returns exactly the same text as the first call, even though `temperature=0.7`, because the cached response is reused.
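LangChain's built-in caches key each entry on the prompt together with an `llm_string` that encodes the model configuration. The same prompt sent with different settings (for example, another temperature) is therefore a cache miss. The plain-Python sketch below illustrates that keying. `make_llm_string` is a hypothetical helper, not LangChain's own serialization.

```python
from typing import Any


def make_llm_string(**params: Any) -> str:
    """Hypothetical helper: serialize model settings into a stable string.

    LangChain derives its own llm_string from the model's parameters;
    this simplified version just sorts the keyword arguments.
    """
    return str(sorted(params.items()))


cache: dict[tuple[str, str], str] = {}
prompt = "Tell a short story about an adventurous dog."

# Store a response produced with temperature=0.7.
cache[(prompt, make_llm_string(model="gpt-3.5-turbo", temperature=0.7))] = (
    "Once upon a time, a dog named Max..."
)

# Same prompt and same settings (argument order does not matter): cache hit.
hit = cache.get((prompt, make_llm_string(temperature=0.7, model="gpt-3.5-turbo")))

# Same prompt, different temperature: cache miss, so a real call would happen.
miss = cache.get((prompt, make_llm_string(model="gpt-3.5-turbo", temperature=0.0)))

print(hit is not None, miss is None)
```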


---

## 2. Callbacks

### 2.1. Concept of Callbacks

**Callbacks** in LangChain are a powerful mechanism that allows you to **monitor and control the execution flow** of LangChain components (LLMs, Chains, Tools, Agents). They provide "hooks" where you can attach your custom functions to perform actions when a specific event occurs.
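Conceptually, a callback system is the observer pattern: the framework calls every registered handler at fixed points in a run. The sketch below is a plain-Python illustration with hypothetical names (`PrintingHandler`, `run_with_hooks`), not LangChain's callback manager.

```python
from typing import Any, Callable


class PrintingHandler:
    """Hypothetical handler with three hooks: start, end and error."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def on_start(self, name: str, inputs: Any) -> None:
        self.events.append(f"start:{name}")

    def on_end(self, name: str, outputs: Any) -> None:
        self.events.append(f"end:{name}")

    def on_error(self, name: str, error: BaseException) -> None:
        self.events.append(f"error:{name}")


def run_with_hooks(name: str, fn: Callable[[Any], Any], inputs: Any, handlers: list) -> Any:
    """Call fn(inputs), notifying every handler before, after, or on failure."""
    for h in handlers:
        h.on_start(name, inputs)
    try:
        outputs = fn(inputs)
    except Exception as exc:
        for h in handlers:
            h.on_error(name, exc)
        raise  # the hook observes the error; it does not swallow it
    for h in handlers:
        h.on_end(name, outputs)
    return outputs


handler = PrintingHandler()
run_with_hooks("upper", str.upper, "hello", [handler])
try:
    run_with_hooks("divide", lambda x: 1 / x, 0, [handler])
except ZeroDivisionError:
    pass
print(handler.events)
```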



### 2.2. Callback Events

LangChain provides various callback events, allowing you to monitor application activity in detail:

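* `on_llm_start`, `on_llm_end`, `on_llm_error`: When an LLM starts, ends, or encounters an error. Chat models emit `on_chat_model_start` instead of `on_llm_start`; if a handler does not implement it, LangChain falls back to calling `on_llm_start`.
* `on_llm_new_token`: When a streaming LLM produces a new token (useful for streaming output to the user).
* `on_chain_start`, `on_chain_end`, `on_chain_error`: When a Chain starts, ends, or encounters an error.
* `on_tool_start`, `on_tool_end`, `on_tool_error`: When a Tool starts, ends, or encounters an error (most often triggered by Agents, but they also fire when a tool is invoked directly).
* `on_agent_action`, `on_agent_finish`: When an Agent takes an action or finishes.
* `on_text`: When a component emits arbitrary text, for example a legacy chain logging its formatted prompt.
* `on_retriever_start`, `on_retriever_end`, `on_retriever_error`: When a Retriever starts, ends, or encounters an error.
* And many other events.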
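Streaming is the event family that fires many times per call: when a model streams its output, LangChain calls `on_llm_new_token` once per generated chunk. The plain-Python sketch below mimics that flow. Only the method name mirrors LangChain's hook. `fake_stream` and `TokenCollector` are hypothetical.

```python
from typing import Any, Iterator


def fake_stream(text: str) -> Iterator[str]:
    """Hypothetical streaming model: yields the answer one word at a time."""
    for word in text.split(" "):
        yield word + " "


class TokenCollector:
    """Hypothetical handler that reacts to each new token."""

    def __init__(self) -> None:
        self.tokens: list[str] = []

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # A real UI might flush `token` to the screen here.
        self.tokens.append(token)


collector = TokenCollector()
for tok in fake_stream("LangChain helps build LLM apps"):
    collector.on_llm_new_token(tok)

print(len(collector.tokens), "".join(collector.tokens).strip())
```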

You can create a **Custom Callback Handler** by inheriting from `BaseCallbackHandler` and overriding the corresponding methods for the events you want to monitor.
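`BaseCallbackHandler` supplies do-nothing defaults for its event hooks, so a subclass only overrides the methods it cares about. By default, LangChain also logs an exception raised inside a handler as a warning instead of crashing the run, unless the handler sets `raise_error = True`. The sketch below reproduces both ideas in plain Python. `BaseHandler`, `ChainLogger`, `BrokenHandler` and `dispatch` are hypothetical stand-ins, not LangChain classes.

```python
import logging
from typing import Any


class BaseHandler:
    """Hypothetical stand-in for BaseCallbackHandler: hooks are no-ops."""

    raise_error = False  # mirrors the idea of LangChain's `raise_error` flag

    def on_chain_start(self, inputs: Any, **kwargs: Any) -> None: ...

    def on_chain_end(self, outputs: Any, **kwargs: Any) -> None: ...


class ChainLogger(BaseHandler):
    """Custom handler: overrides only the event it cares about."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def on_chain_start(self, inputs: Any, **kwargs: Any) -> None:
        self.log.append(f"chain started with {inputs}")


class BrokenHandler(BaseHandler):
    """A buggy handler, used to show that it does not break other handlers."""

    def on_chain_start(self, inputs: Any, **kwargs: Any) -> None:
        raise RuntimeError("bug in handler")


def dispatch(handlers: list, event: str, *args: Any) -> None:
    """Call `event` on each handler; log handler bugs unless raise_error is set."""
    for h in handlers:
        try:
            getattr(h, event)(*args)
        except Exception:
            if h.raise_error:
                raise
            logging.warning("Error in %s.%s callback", type(h).__name__, event)


logger = ChainLogger()
handlers = [BrokenHandler(), logger]
dispatch(handlers, "on_chain_start", {"question": "What is LangChain?"})
dispatch(handlers, "on_chain_end", {"text": "..."})
print(logger.log)
```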

### 2.3. Applications of Callbacks

Callbacks have many applications in developing and deploying LLM applications:

* **Logging:** Record detailed activities of LLMs, Chains, Tools for debugging or later analysis.
* **Monitoring:** Track performance (response time, token count), costs, and errors in real-time.
* **Performance Analysis:** Collect data to analyze and optimize components.
* **Custom Error Handling:** Perform specific actions when errors occur (e.g., send notifications, retry).
* **Progress Tracking:** Provide feedback to the user about the progress of a long task.
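* **Flow Control:** React to events while a run is in progress, for example aborting a long Agent run by raising an exception from a handler (the handler must set `raise_error = True`; otherwise LangChain only logs the exception).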
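For monitoring, note that LangChain passes a unique `run_id` keyword argument to each handler method, so a handler can match every start event with its end event even when calls overlap. The sketch below measures per-call latency this way. `LatencyMonitor` and `fake_llm_call` are hypothetical, and the hook signatures are simplified (LangChain's real `on_llm_start` also receives a `serialized` argument).

```python
import time
import uuid
from typing import Any
from uuid import UUID


class LatencyMonitor:
    """Hypothetical monitoring handler: measures each LLM call by run_id."""

    def __init__(self) -> None:
        self._starts: dict[UUID, float] = {}
        self.latencies: dict[UUID, float] = {}

    def on_llm_start(self, prompts: list[str], *, run_id: UUID, **kwargs: Any) -> None:
        self._starts[run_id] = time.perf_counter()

    def on_llm_end(self, response: Any, *, run_id: UUID, **kwargs: Any) -> None:
        started = self._starts.pop(run_id)
        self.latencies[run_id] = time.perf_counter() - started


def fake_llm_call(prompt: str, monitor: LatencyMonitor) -> str:
    """Hypothetical LLM call that reports to the monitor like a callback manager would."""
    run_id = uuid.uuid4()
    monitor.on_llm_start([prompt], run_id=run_id)
    time.sleep(0.05)  # pretend network latency
    answer = prompt.upper()
    monitor.on_llm_end(answer, run_id=run_id)
    return answer


monitor = LatencyMonitor()
for p in ["hi", "hello"]:
    fake_llm_call(p, monitor)
for rid, seconds in monitor.latencies.items():
    print(f"{rid}: {seconds:.3f}s")
```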

### 2.4. Practical Example: Using Callbacks to Monitor Chain or Agent Activity

We will create a simple Custom Callback Handler to log key events when a Chain or Agent runs.

```python
# Install libraries if not already installed
# (AgentExecutor and create_tool_calling_agent below come from the pre-1.0 langchain package)
# pip install "langchain<1.0" langchain-openai langchain-community

import os
from typing import Any, Dict, List, Optional

from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Set the environment variable for the OpenAI API key
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"


def _run_name(serialized: Optional[Dict[str, Any]], kwargs: Dict[str, Any]) -> str:
    """Return a readable component name; recent LangChain versions may pass serialized=None."""
    return kwargs.get("name") or (serialized or {}).get("name") or "<unnamed>"


# 1. Create a Custom Callback Handler
class MyCustomHandler(BaseCallbackHandler):
    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        """Runs when an LLM starts (chat models fall back to this hook as well)."""
        print("\n--- LLM START ---")
        print(f"  Model: {_run_name(serialized, kwargs)}")
        print(f"  Prompts: {prompts}")

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Runs when an LLM finishes successfully."""
        print("--- LLM END ---")
        print(f"  Output: {response.generations[0][0].text[:50]}...")

    def on_llm_error(self, error: BaseException, **kwargs: Any) -> None:
        """Runs when an LLM raises an error."""
        print("--- LLM ERROR ---")
        print(f"  Error: {error}")

    def on_chain_start(
        self, serialized: Optional[Dict[str, Any]], inputs: Any, **kwargs: Any
    ) -> None:
        """Runs when a Chain (or any nested Runnable step) starts."""
        print(f"\n=== CHAIN START: {_run_name(serialized, kwargs)} ===")
        print(f"  Chain input: {inputs}")

    def on_chain_end(self, outputs: Any, **kwargs: Any) -> None:
        """Runs when a Chain finishes successfully."""
        print("=== CHAIN END ===")
        print(f"  Chain output: {outputs}")

    def on_tool_start(
        self, serialized: Optional[Dict[str, Any]], input_str: str, **kwargs: Any
    ) -> None:
        """Runs when a Tool starts (usually called by an Agent)."""
        print(f"\n--- TOOL START: {_run_name(serialized, kwargs)} ---")
        print(f"  Tool input: {input_str}")

    def on_tool_end(self, output: Any, **kwargs: Any) -> None:
        """Runs when a Tool finishes successfully; output may not be a plain string."""
        print("--- TOOL END ---")
        print(f"  Tool output: {str(output)[:50]}...")

    def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
        """Runs when the Agent decides on an action."""
        print("\n--- AGENT ACTION ---")
        print(f"  Action: {action.tool} with input: {action.tool_input}")

    def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
        """Runs when the Agent finishes."""
        print("\n--- AGENT FINISH ---")
        print(f"  Final result: {str(finish.return_values.get('output', ''))[:50]}...")


# 2. Initialize the LLM and the Chain (an LCEL pipeline: prompt | llm)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
prompt_template = PromptTemplate.from_template("Answer the following question: {question}")
llm_chain = prompt_template | llm

# 3. Use Callbacks
print("--- Practical Callbacks with Chain ---")

# Option 1: pass the callback handler per call via config (inherited by every nested step)
response_chain = llm_chain.invoke(
    {"question": "What is LangChain?"},
    config={"callbacks": [MyCustomHandler()]},
)
print(f"\nFinal response from Chain: {response_chain.content[:100]}...")

# Option 2: attach the handler to the LLM itself (applies to every call made through this LLM)
# llm_with_callback = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7, callbacks=[MyCustomHandler()])
# llm_chain_with_callback = prompt_template | llm_with_callback
# print("\n--- Practical Callbacks attached to the LLM ---")
# response_chain_attached = llm_chain_with_callback.invoke({"question": "What is RAG?"})
# print(f"\nFinal response from Chain (attached to LLM): {response_chain_attached.content[:100]}...")

# 4. Callbacks with an Agent (requires extra libraries for the Tools)
# pip install google-search-results numexpr
# os.environ["SERPAPI_API_KEY"] = "YOUR_SERPAPI_API_KEY"
import numexpr
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_community.utilities import SerpAPIWrapper
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import Tool, tool

# Initialize Tools (OpenAI tool calling does not allow spaces in tool names)
search_tool = Tool(
    name="google_search",
    func=SerpAPIWrapper().run,
    description="Useful when you need to search Google for current events or factual data.",
)


@tool
def calculator(expression: str) -> str:
    """Evaluate a numeric math expression such as '2 * (3 + 4)'."""
    # Small custom tool built on numexpr; it is not a LangChain built-in.
    return str(numexpr.evaluate(expression))


tools = [search_tool, calculator]

# Agent prompt: the scratchpad (tool calls and their results) goes after the user input
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the available tools to answer questions."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create the Agent. A tool-calling agent matches this chat-style prompt;
# create_react_agent would instead need a text ReAct prompt with {tools},
# {tool_names} and a string {agent_scratchpad}.
agent = create_tool_calling_agent(llm, tools, agent_prompt)

# Create the Agent Executor (verbose=False so the callback handler controls the output)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=False)

print("\n--- Practical Callbacks with Agent ---")
response_agent = agent_executor.invoke(
    {"input": "What's the weather like today in Da Nang?"},
    config={"callbacks": [MyCustomHandler()]},  # pass the callback handler to the Agent Executor
)
print(f"\nFinal response from Agent: {response_agent['output'][:100]}...")

print("\n--- End of Callbacks practical example ---")
```


**Explanation:**
* We create a `MyCustomHandler` class inheriting from `BaseCallbackHandler` and override methods like `on_llm_start`, `on_llm_end`, `on_chain_start`, `on_chain_end`, `on_tool_start`, `on_tool_end`, `on_agent_action`, `on_agent_finish`.
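* When you run `llm_chain.invoke` or `agent_executor.invoke` and pass `config={"callbacks": [MyCustomHandler()]}`, the corresponding methods in `MyCustomHandler` are called automatically as events occur during the Chain or Agent execution. Handlers passed through `config` are inherited by nested steps, so you also see events from the components inside the Chain or Agent (for example, the prompt template and the LLM).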
* You will see `print` statements from `MyCustomHandler` appear, showing you exactly when the LLM, Chain, or Tool starts/ends, and what their inputs/outputs are. This is extremely useful for debugging and monitoring.
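A single `invoke` produces many nested events: a chain starts, then the prompt and the model inside it start and end. LangChain passes a `parent_run_id` keyword argument to handler methods, and a handler can use it to print an indented trace. The sketch below is plain Python with a hypothetical `TreeLogger` and generic `on_start`/`on_end` hooks. The event sequence is simulated rather than produced by LangChain.

```python
import uuid
from typing import Any, Optional
from uuid import UUID


class TreeLogger:
    """Hypothetical handler that indents each run under its parent run."""

    def __init__(self) -> None:
        self.depth: dict[UUID, int] = {}
        self.lines: list[str] = []

    def on_start(
        self, name: str, *, run_id: UUID, parent_run_id: Optional[UUID] = None, **kwargs: Any
    ) -> None:
        # Top-level runs (no known parent) get depth 0; children go one level deeper.
        level = self.depth[parent_run_id] + 1 if parent_run_id in self.depth else 0
        self.depth[run_id] = level
        self.lines.append("  " * level + f"start {name}")

    def on_end(self, name: str, *, run_id: UUID, **kwargs: Any) -> None:
        self.lines.append("  " * self.depth[run_id] + f"end {name}")


# Simulated event order of a prompt | llm chain, with freshly generated run ids.
log = TreeLogger()
chain_id, prompt_id, llm_id = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()
log.on_start("RunnableSequence", run_id=chain_id)
log.on_start("PromptTemplate", run_id=prompt_id, parent_run_id=chain_id)
log.on_end("PromptTemplate", run_id=prompt_id)
log.on_start("ChatOpenAI", run_id=llm_id, parent_run_id=chain_id)
log.on_end("ChatOpenAI", run_id=llm_id)
log.on_end("RunnableSequence", run_id=chain_id)
print("\n".join(log.lines))
```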


---

## Lesson Summary

This lesson introduced two important concepts for optimizing and monitoring LangChain applications: **Caching** and **Callbacks**. You learned **why caching is needed**: it reduces API costs and speeds up responses. You also saw the common cache types `InMemoryCache`, `SQLiteCache`, and `RedisCache`. For **Callbacks**, you learned the **concept** of hooking into the execution flow, the key **callback events**, and their **applications** in logging, monitoring, and performance analysis. Finally, you used **Callbacks** to observe the activity of a Chain and an Agent step by step, giving you deeper insight into how LangChain components interact.