# 🔔 LangChain Callbacks Deep Dive

Callbacks are a powerful mechanism in LangChain that lets you **hook into specific stages** of your LLM application's execution, such as the start and end of a chain, each model call, and each streamed token. They enable:

- 📊 **Monitoring** - Track token usage, costs, and performance
- 🐛 **Debugging** - Inspect prompts, responses, and intermediate steps
- 📝 **Logging** - Record events for audit trails and analytics
- ⚡ **Streaming** - Handle real-time token-by-token output
- 🔄 **Custom Actions** - Trigger side effects during execution

---

## Table of Contents
1. [Setup](#setup)
2. [Built-in Callbacks](#built-in-callbacks)
3. [Custom Callback Handlers](#custom-callback-handlers)
4. [Streaming Callbacks](#streaming-callbacks)
5. [Cost & Token Tracking](#cost-tracking)
6. [Advanced: Logging to File](#file-logging)
7. [Async Callbacks](#async-callbacks)

<a id="setup"></a>
## 1. Setup

First, let's load our environment variables and import the necessary modules.


In [5]:
from dotenv import load_dotenv, find_dotenv
import time
from datetime import datetime
from typing import Any, Dict, List, Optional
from uuid import UUID

# LangChain imports (core abstractions, OpenAI integration, community utilities)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_core.callbacks import BaseCallbackHandler, StdOutCallbackHandler
from langchain_community.callbacks.manager import get_openai_callback
from langchain_core.outputs import LLMResult

load_dotenv(find_dotenv())

True

<a id="built-in-callbacks"></a>
## 2. Built-in Callbacks

LangChain provides several built-in callback handlers. The most common is `StdOutCallbackHandler`, which prints chain start and finish events to the console.

### 2.1 StdOutCallbackHandler (Using Modern LCEL)

> ⚠️ **Note**: We're using LCEL (LangChain Expression Language) with the pipe `|` operator instead of the deprecated `LLMChain`.


In [6]:
# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o-mini")

# Create a prompt template
prompt = PromptTemplate.from_template("Tell me a joke about {topic}")

# Create a chain using LCEL (modern approach)
chain = prompt | llm

# Use StdOutCallbackHandler to see execution details
handler = StdOutCallbackHandler()

# Invoke with callbacks in config
result = chain.invoke(
    {"topic": "rabbits"}, 
    config={"callbacks": [handler]}
)

print("\n📤 Final Result:")
print(result.content)



> Entering new RunnableSequence chain...


> Entering new PromptTemplate chain...

> Finished chain.

> Finished chain.

📤 Final Result:
What do you call a line of rabbits hopping backward? 

A receding hare-line!


<a id="custom-callback-handlers"></a>
## 3. Custom Callback Handlers

You can create custom handlers by extending `BaseCallbackHandler`. This gives you fine-grained control over what happens at each stage of execution.

### Available Callback Methods

| Method | Triggered When |
|--------|----------------|
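| `on_llm_start` | LLM begins processing |
| `on_chat_model_start` | Chat model begins processing (if a handler doesn't implement it, LangChain falls back to `on_llm_start` with the messages rendered as strings) |
| `on_llm_end` | LLM finishes generating |
| `on_llm_error` | LLM encounters an error |
| `on_chain_start` | Chain begins execution |
| `on_chain_end` | Chain completes |
| `on_chain_error` | Chain encounters an error |
| `on_llm_new_token` | New token generated (streaming) |

The table is not exhaustive: handlers can also implement tool, retriever, and agent events such as `on_tool_start`, `on_retriever_start`, and `on_agent_action`.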

### 3.1 Comprehensive Custom Handler


In [7]:
class DetailedCallbackHandler(BaseCallbackHandler):
    """A comprehensive callback handler that tracks all major events."""
    
    def __init__(self):
        super().__init__()
        self.start_time = None

    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        **kwargs
    ) -> None:
        """Called when LLM starts processing (chat models reach this via the fallback)."""
        self.start_time = time.time()
        print("=" * 50)
        print("🚀 LLM STARTED")
        print(f"⏰ Time: {datetime.now().strftime('%H:%M:%S')}")
        print(f"📝 Prompt: {prompts[0][:100]}...")
        print("=" * 50)

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        """Called when LLM finishes generating."""
        elapsed = time.time() - self.start_time if self.start_time else 0
        print("\n" + "=" * 50)
        print("✅ LLM COMPLETED")
        print(f"⏱️  Duration: {elapsed:.2f}s")

        # Extract token usage if the provider reports it in llm_output
        if response.llm_output:
            token_usage = response.llm_output.get('token_usage', {})
            print(f"📊 Tokens - Input: {token_usage.get('prompt_tokens', 'N/A')}, "
                  f"Output: {token_usage.get('completion_tokens', 'N/A')}, "
                  f"Total: {token_usage.get('total_tokens', 'N/A')}")
        print("=" * 50)

    def on_llm_error(self, error: Exception, **kwargs) -> None:
        """Called when LLM encounters an error."""
        print(f"❌ LLM ERROR: {error}")

    def on_chain_start(
        self,
        serialized: Dict[str, Any],
        inputs: Dict[str, Any],
        **kwargs
    ) -> None:
        """Called when chain starts."""
        print(f"\n🔗 Chain Started with inputs: {inputs}")

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs) -> None:
        """Called when chain completes."""
        print("🔗 Chain Completed")


# Test the detailed handler
detailed_handler = DetailedCallbackHandler()

result = chain.invoke(
    {"topic": "programming"},
    config={"callbacks": [detailed_handler]}
)

print(f"\n📤 Response:\n{result.content}")


🔗 Chain Started with inputs: {'topic': 'programming'}

🔗 Chain Started with inputs: {'topic': 'programming'}
🔗 Chain Completed
🚀 LLM STARTED
⏰ Time: 16:40:40
📝 Prompt: Human: Tell me a joke about programming...

✅ LLM COMPLETED
⏱️  Duration: 2.01s
📊 Tokens - Input: 13, Output: 12, Total: 25
🔗 Chain Completed

📤 Response:
Why do programmers prefer dark mode?

Because light attracts bugs!


<a id="streaming-callbacks"></a>
## 4. Streaming Callbacks

Streaming lets you receive tokens as they're generated, which makes the application feel more responsive. LangChain calls `on_llm_new_token` once for each streamed chunk, so a handler that implements it can render output incrementally.

In [8]:
class StreamingCallbackHandler(BaseCallbackHandler):
    """Handler for streaming tokens as they're generated."""
    
    def __init__(self):
        super().__init__()
        self.tokens = []
        self.token_count = 0

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        """Called once per streamed chunk (usually one token; may be empty)."""
        self.tokens.append(token)
        self.token_count += 1
        # Print token without newline for streaming effect
        print(token, end="", flush=True)

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        """Called when generation completes."""
        print(f"\n\n✅ Streaming complete! Generated {self.token_count} tokens.")


# Create a streaming-enabled LLM
streaming_llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
streaming_chain = prompt | streaming_llm

# Create handler and invoke
stream_handler = StreamingCallbackHandler()

print("🎬 Streaming Response:\n")
result = streaming_chain.invoke(
    {"topic": "artificial intelligence"},
    config={"callbacks": [stream_handler]}
)

🎬 Streaming Response:

Why did the robot go on a diet?

Because it had too many bytes!

✅ Streaming complete! Generated 18 tokens.


<a id="cost-tracking"></a>
## 5. Cost & Token Tracking

LangChain provides a `get_openai_callback` context manager that aggregates token counts and an estimated cost for every OpenAI call made inside the `with` block, without passing a handler to each call.


In [9]:
# Track costs and token usage with context manager
with get_openai_callback() as cb:
    # Run multiple calls to accumulate costs
    result1 = chain.invoke({"topic": "cats"})
    result2 = chain.invoke({"topic": "dogs"})
    result3 = chain.invoke({"topic": "birds"})

# Print comprehensive usage report
print("=" * 50)
print("📊 USAGE REPORT")
print("=" * 50)
print(f"🔢 Total Tokens:      {cb.total_tokens:,}")
print(f"   ├─ Prompt Tokens:  {cb.prompt_tokens:,}")
print(f"   └─ Output Tokens:  {cb.completion_tokens:,}")
print(f"💰 Total Cost:        ${cb.total_cost:.6f}")
print(f"📞 Successful Calls:  {cb.successful_requests}")
print("=" * 50)


📊 USAGE REPORT
🔢 Total Tokens:      101
   ├─ Prompt Tokens:  39
   └─ Output Tokens:  62
💰 Total Cost:        $0.000043
📞 Successful Calls:  3
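`get_openai_callback` computes `total_cost` from per-model prices that LangChain ships with. As a rough cross-check, assume gpt-4o-mini list prices of $0.15 per million input tokens and $0.60 per million output tokens (an assumption here; prices change, so verify them against OpenAI's pricing page). With those prices, the token counts in the report above reproduce the reported cost:

```python
# Assumed gpt-4o-mini list prices in USD per 1M tokens (verify against current pricing).
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate USD cost from token counts using the assumed prices above."""
    return (prompt_tokens * INPUT_PRICE_PER_M + completion_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Token counts taken from the usage report above.
cost = estimate_cost(prompt_tokens=39, completion_tokens=62)
print(f"${cost:.6f}")  # $0.000043
```

Output tokens cost four times as much as input tokens under these prices, so long completions dominate the bill even when prompts are larger.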


In [10]:
result1

AIMessage(content='Why was the cat sitting on the computer? \n\nBecause it wanted to keep an eye on the mouse!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 13, 'total_tokens': 34, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_11f3029f6b', 'id': 'chatcmpl-ClvMITmw9c74UtC9xpufcXH1Svj3A', 'finish_reason': 'stop', 'logprobs': None}, id='run-00db6cf9-00f3-4a5c-a346-c9c954bf2b0d-0', usage_metadata={'input_tokens': 13, 'output_tokens': 21, 'total_tokens': 34, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
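Each returned `AIMessage` also carries a `usage_metadata` dict with `input_tokens`, `output_tokens`, and `total_tokens`, as `result1` shows. If you only need token counts, you can total them straight from the messages. A minimal sketch; `sum_usage` is a hypothetical helper, and messages whose `usage_metadata` is missing or `None` are skipped:

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def sum_usage(messages: Iterable) -> Dict[str, int]:
    """Add up input/output/total tokens from each message's usage_metadata."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for message in messages:
        usage = getattr(message, "usage_metadata", None)
        if not usage:
            continue  # some providers or code paths may not report usage
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


# Stand-ins shaped like AIMessage; the first reuses result1's numbers above.
demo = [
    SimpleNamespace(usage_metadata={"input_tokens": 13, "output_tokens": 21, "total_tokens": 34}),
    SimpleNamespace(usage_metadata=None),
]
print(sum_usage(demo))  # {'input_tokens': 13, 'output_tokens': 21, 'total_tokens': 34}
```

In the notebook, `sum_usage([result1, result2, result3])` should agree with the `get_openai_callback` token totals for the same three calls.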

<a id="file-logging"></a>
## 6. Advanced: Logging to File

For production applications, you'll often want to log callback events to a file for auditing and debugging.


In [None]:
import json

class FileLoggingHandler(BaseCallbackHandler):
    """Logs LLM events to a JSON Lines file (one JSON object per line) for auditing."""
    
    def __init__(self, log_file: str = "llm_logs.json"):
        self.log_file = log_file
        self.logs = []
        
    def _log_event(self, event_type: str, data: dict):
        """Helper to log events with timestamp."""
        event = {
            "timestamp": datetime.now().isoformat(),
            "event_type": event_type,
            "data": data
        }
        self.logs.append(event)
        
        # Append the event as a single line of JSON (JSON Lines format)
        with open(self.log_file, "a") as f:
            f.write(json.dumps(event) + "\n")
    
    def on_llm_start(self, serialized, prompts, **kwargs):
        # Chat models reach this method through LangChain's fallback from on_chat_model_start
        self._log_event("llm_start", {
            "prompts": prompts,
            "model": (serialized or {}).get("kwargs", {}).get("model_name", "unknown")
        })
    
    def on_llm_end(self, response: LLMResult, **kwargs):
        token_usage = {}
        if response.llm_output:
            token_usage = response.llm_output.get("token_usage", {})
        
        self._log_event("llm_end", {
            "response": response.generations[0][0].text if response.generations else "",
            "token_usage": token_usage
        })
    
    def on_llm_error(self, error: Exception, **kwargs):
        self._log_event("llm_error", {"error": str(error)})


# Test file logging
file_handler = FileLoggingHandler("llm_audit_log.json")

result = chain.invoke(
    {"topic": "space exploration"},
    config={"callbacks": [file_handler]}
)

print("✅ Response generated and logged!")
print("📁 Logs written to: llm_audit_log.json")
print("\n📄 Latest log entry:")
print(json.dumps(file_handler.logs[-1], indent=2))
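Because `_log_event` appends one JSON object per line, the log file is in JSON Lines format rather than a single JSON document, so `json.load` on the whole file fails with an "Extra data" error once it holds more than one event. A minimal sketch for reading it back line by line; `read_log_events` is a hypothetical helper:

```python
import json
from typing import Any, Dict, List, Optional


def read_log_events(path: str, event_type: Optional[str] = None) -> List[Dict[str, Any]]:
    """Parse a JSON Lines log file, optionally keeping only one event_type."""
    events = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            if event_type is None or event.get("event_type") == event_type:
                events.append(event)
    return events
```

For example, `for e in read_log_events("llm_audit_log.json", "llm_end"): print(e["timestamp"], e["data"]["token_usage"])` lists the token usage of every completed call in the audit log.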


<a id="async-callbacks"></a>
## 7. Async Callbacks

When you run chains asynchronously (`ainvoke`, `astream`), an `AsyncCallbackHandler` lets each callback `await` I/O, such as sending a token over a websocket, instead of blocking the event loop.


In [None]:
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler

class AsyncStreamHandler(AsyncCallbackHandler):
    """Async handler for non-blocking streaming."""

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        """Handle each new token asynchronously."""
        print(token, end="", flush=True)
        # Simulate async work (e.g., sending to websocket)
        await asyncio.sleep(0.01)

    async def on_llm_start(self, serialized, prompts, **kwargs) -> None:
        print("🚀 Async LLM started...\n")

    async def on_llm_end(self, response, **kwargs) -> None:
        print("\n\n✅ Async streaming complete!")


async def run_async_example():
    """Run async streaming example."""
    async_handler = AsyncStreamHandler()
    async_llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
    async_chain = prompt | async_llm
    
    result = await async_chain.ainvoke(
        {"topic": "quantum computing"},
        config={"callbacks": [async_handler]}
    )
    return result

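# Run the async example (top-level await works in Jupyter/IPython;
# in a plain script, call asyncio.run(run_async_example()) instead)
print("🎬 Async Streaming Demo:\n")
await run_async_example()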


## 8. Combining Multiple Callbacks

You can pass several callback handlers at once. Each one receives every event, so you can keep separate concerns (such as timing and persistent logging) in separate handlers.


In [None]:
# Create multiple handlers for different purposes
timing_handler = DetailedCallbackHandler()  # For timing and debug info
log_handler = FileLoggingHandler("combined_logs.json")  # For persistent logs

# Use multiple handlers together
result = chain.invoke(
    {"topic": "machine learning"},
    config={"callbacks": [timing_handler, log_handler]}
)

print(f"\n📤 Final response:\n{result.content}")


## 📚 Summary

| Callback Type | Use Case | Key Methods / Usage |
|---------------|----------|---------------------|
| `StdOutCallbackHandler` | Quick debugging | Built-in; pass via `config={"callbacks": [...]}` |
| `BaseCallbackHandler` | Custom sync handlers | `on_llm_start`, `on_llm_end`, etc. |
| `AsyncCallbackHandler` | Non-blocking async ops | `async on_llm_*` methods |
| `get_openai_callback` | Cost/token tracking | Context manager (`with get_openai_callback() as cb:`) |

### Key Takeaways

1. **Use LCEL** (`prompt | llm`) instead of deprecated `LLMChain`
2. **Callbacks are composable** - use multiple handlers for different concerns
3. **Streaming callbacks** provide real-time UX with `on_llm_new_token`
4. **Track token usage and costs** in production; for OpenAI models, `get_openai_callback` reports totals and an estimated cost
5. **Log to files** for auditing and debugging in production

### Further Reading

- [LangChain Callbacks Documentation](https://python.langchain.com/docs/concepts/callbacks/)
- [LangChain Expression Language (LCEL)](https://python.langchain.com/docs/concepts/lcel/)
