# LangChain 1.0 Integration

## The Framework to Rule All of the APIs

Welcome to this guide to LangChain 1.0! This notebook walks you through the core concepts of LangChain, a library that puts a common interface over the different providers and packages in the LLM space.

### What You'll Learn
- **Models**: Abstractions for LLMs with invoke, stream, and batch capabilities
- **Tools**: Connecting models to the real world
- **Agents**: LLM + Tools + Autonomy
- **Memory**: Managing conversation context
- **Messages**: The unit of context
- **Advanced Features**: Prompt caching, rate limiting, and more

### The Trade-off
Instead of learning every provider's API (OpenAI, Anthropic, Gemini, Cohere, Pinecone, Chroma, etc.), you learn one framework: LangChain's abstractions.

## Setup and Installation

First, let's install the necessary packages for LangChain 1.0:

# Install required packages
!pip install langchain langchain-openai langchain-anthropic langchain-core langgraph python-dotenv -q

In [None]:
# Import necessary libraries
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Verify API key is loaded
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in environment variables. Please create a .env file with your API key.")

---

## 1. Models

### What is a Model?
A **Model** is LangChain's abstraction over an LLM. Depending on the underlying model, it can support:
- **Tool calling** → calling external tools
- **Structured Output** → model response follows a defined format
- **Multimodality** → process more than text (images, audio, video)
- **Reasoning** → performs multi-step reasoning

### What Models Can Do:
- **Invoke** → generate response and output a message
- **Stream** → generate response and output it in real time
- **Batch** → send multiple requests (improves processing time)

### Method 1: Using `init_chat_model` (Unified Interface)

In [None]:
from langchain.chat_models import init_chat_model

# Initialize a model using the unified interface
model = init_chat_model(
    model="gpt-4o-mini",
    model_provider="openai",
    temperature=0.7
)

print(f"Model initialized: {model}")

### Method 2: Using Model Classes Directly

In [None]:
from langchain_openai import ChatOpenAI

# Initialize using the class directly
chat_model = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7
)

print(f"ChatOpenAI model: {chat_model}")

### Model Operation 1: Invoke

**Invoke** generates a complete response and outputs a message.

In [None]:
# Simple invoke example
response = model.invoke("Explain what LangChain is in one sentence.")
print(f"Response: {response.content}")
print(f"\nFull response object: {response}")

### Model Operation 2: Stream

**Stream** generates a response and outputs it in real-time, which is great for user experience.

In [None]:
# Stream example
print("Streaming response:")
for chunk in model.stream("Tell me a short story about AI in 3 sentences."):
    print(chunk.content, end="", flush=True)
print("\n")

### Model Operation 3: Batch

**Batch** sends multiple requests at once to improve processing time.

#### Important Notes on Batching:
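- LangChain parallelizes model calls on the **client side**, so each input is still sent as a separate request
- This is different from the server-side batch APIs that some **inference providers** offer
- `batch` returns results in input order; `batch_as_completed` yields them as they finish, each paired with the index of its input so you can match it to its request
- Use `max_concurrency` to control concurrent calls and avoid overwhelming the API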
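
To make the notes above concrete, here is a minimal standard-library sketch of client-side parallelism with a concurrency cap. It does not use LangChain: `slow_fake_model` and `batch_as_completed_sketch` are hypothetical stand-ins, and the sketch only assumes that results arrive as `(index, output)` pairs in completion order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; longer prompts take longer."""
    time.sleep(0.01 * len(prompt))
    return f"answer to: {prompt}"


def batch_as_completed_sketch(prompts, max_concurrency=2):
    """Yield (index, output) pairs as each call finishes."""
    # max_workers plays the role of max_concurrency: at most that many calls run at once.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(slow_fake_model, p): i for i, p in enumerate(prompts)}
        for future in as_completed(futures):
            yield futures[future], future.result()


prompts = ["a much longer prompt", "short", "medium one"]
results = [None] * len(prompts)
completion_order = []
for index, output in batch_as_completed_sketch(prompts):
    completion_order.append(index)
    results[index] = output  # the index restores the input order

print("Completion order:", completion_order)
print("Results in input order:", results)
```

Writing each output into `results[index]` is what lets you recover the input order even when short prompts finish first.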

In [None]:
# Batch example
questions = [
    "What is machine learning?",
    "What is deep learning?",
    "What is natural language processing?"
]

# Batch process all questions
responses = model.batch(questions)

for i, response in enumerate(responses):
    print(f"Q{i+1}: {questions[i]}")
    print(f"A{i+1}: {response.content}")
    print("-" * 50)

In [None]:
# Batch with max_concurrency to control rate
responses = model.batch(
    questions,
    config={"max_concurrency": 2}  # Only 2 concurrent requests
)

print(f"Processed {len(responses)} responses with controlled concurrency")

---

## 2. Messages

### The Unit of Context

Messages hold the input and output of the model. Each message has:
- **Role**: Who is speaking
- **Content**: What is being said
- **Metadata**: Additional information

### Message Types:
1. **System Message** → context and model instructions
2. **Human Message** → user input
3. **AI Message** → message by the LLM
4. **Tool Message** → output of tool calls

In [None]:
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage

# Create different message types
system_msg = SystemMessage(content="You are a helpful AI assistant specializing in Python programming.")
human_msg = HumanMessage(content="How do I create a list in Python?")
ai_msg = AIMessage(content="You can create a list in Python using square brackets: my_list = [1, 2, 3]")

print("System Message:")
print(f"  Role: {system_msg.type}")
print(f"  Content: {system_msg.content}")
print("\nHuman Message:")
print(f"  Role: {human_msg.type}")
print(f"  Content: {human_msg.content}")
print("\nAI Message:")
print(f"  Role: {ai_msg.type}")
print(f"  Content: {ai_msg.content}")

### Using Messages with Models

In [None]:
# Invoke model with messages
messages = [
    SystemMessage(content="You are a helpful assistant that explains concepts concisely."),
    HumanMessage(content="What is LangChain?")
]

response = model.invoke(messages)
print(f"Response: {response.content}")

### Conversation with Message History

In [None]:
# Multi-turn conversation
conversation = [
    SystemMessage(content="You are a helpful AI tutor."),
    HumanMessage(content="What is a variable in programming?"),
]

# First response
response1 = model.invoke(conversation)
conversation.append(AIMessage(content=response1.content))

# Follow-up question
conversation.append(HumanMessage(content="Can you give me an example in Python?"))
response2 = model.invoke(conversation)

print("Conversation:")
for msg in conversation:
    print(f"\n{msg.type.upper()}: {msg.content}")
print(f"\nAI: {response2.content}")

---

## 3. Tools

### Connecting Models to the Real World

**Tools** connect the models to the real world. They require:
- **Tool Schema** (name, description, argument definitions)
- **Function or coroutine** to execute
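
To see what a tool schema contains, the sketch below derives a name, a description, and argument definitions from a plain Python function using only the standard library. It is an illustration, not how the `@tool` decorator is implemented; `describe_tool` and `add` are hypothetical names.

```python
import inspect


def describe_tool(func):
    """Build a minimal schema dict from a function's signature and docstring."""
    signature = inspect.signature(func)
    args = {}
    for name, param in signature.parameters.items():
        annotation = param.annotation
        args[name] = {
            "type": getattr(annotation, "__name__", str(annotation)),
            # A parameter without a default value must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "args": args,
    }


def add(a: float, b: float = 0.0) -> float:
    """Add two numbers."""
    return a + b


schema = describe_tool(add)
print(schema)
```

The model only ever sees this kind of description, which is why clear names, type hints, and docstrings matter for tool calling.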

#### Key Terms:
- **Tool calling** and **function calling** → two names for the same thing!
- **bind_tools** → allows the model to CHOOSE to use the tool
- **Coroutine** → a special function that can pause, let other code run, and continue where it left off (great for APIs and LLMs)

### Tool Execution Flow:
1. Model calls the tool
2. You execute the tool
3. You give the result back to the model

**Note:** If you want this to happen automatically, you need an **AGENT**!
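
The three-step flow above can be sketched without any provider. Everything here is a hypothetical stand-in: `fake_tool_model` imitates a model that requests one tool call and then answers, and messages are plain dicts whose `name`, `args`, and `id` keys mirror the tool-call fields used later in this notebook.

```python
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b


TOOL_REGISTRY = {"multiply": multiply}  # hypothetical registry: tool name -> function


def fake_tool_model(messages):
    """Hypothetical stand-in for a model: request a tool once, then answer."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "ai", "content": f"The result is {last['content']}.", "tool_calls": []}
    return {
        "role": "ai",
        "content": "",
        "tool_calls": [{"name": "multiply", "args": {"a": 6, "b": 7}, "id": "call_1"}],
    }


demo_messages = [{"role": "human", "content": "What is 6 times 7?"}]

# Step 1: the model asks for a tool call.
ai_message = fake_tool_model(demo_messages)
demo_messages.append(ai_message)

# Step 2: you execute each requested tool yourself.
for call in ai_message["tool_calls"]:
    tool_result = TOOL_REGISTRY[call["name"]](**call["args"])
    # Step 3: hand the result back, tagged with the id of the originating call.
    demo_messages.append(
        {"role": "tool", "content": str(tool_result), "tool_call_id": call["id"]}
    )

final = fake_tool_model(demo_messages)
print(final["content"])
```

The `tool_call_id` is what ties each result to the request that produced it when the model asks for several tools at once.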

### Creating a Simple Tool with the @tool Decorator

In [None]:
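from langchain_core.tools import tool
from datetime import datetime
from zoneinfo import ZoneInfo

# Simple tool using @tool decorator
@tool
def get_current_time(timezone: str = "UTC") -> str:
    """Get the current time in a specific timezone.
    
    Args:
        timezone: IANA timezone name such as "UTC" or "Europe/Paris" (default: UTC)
    
    Returns:
        The current time as a string
    """
    try:
        # Resolve the requested timezone instead of using the machine's local time
        current_time = datetime.now(ZoneInfo(timezone))
    except (KeyError, ValueError, OSError):
        return f"Unknown timezone: {timezone}"
    return f"The current time in {timezone} is {current_time.strftime('%Y-%m-%d %H:%M:%S')}"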

@tool
def calculate_sum(a: float, b: float) -> float:
    """Calculate the sum of two numbers.
    
    Args:
        a: First number
        b: Second number
    
    Returns:
        The sum of a and b
    """
    return a + b

@tool
def get_weather(city: str) -> str:
    """Get the weather for a specific city.
    
    Args:
        city: The name of the city
    
    Returns:
        A description of the weather (simulated)
    """
    # Simulated weather data
    return f"The weather in {city} is sunny with a temperature of 72°F."

print("Tools created:")
print(f"1. {get_current_time.name}: {get_current_time.description}")
print(f"2. {calculate_sum.name}: {calculate_sum.description}")
print(f"3. {get_weather.name}: {get_weather.description}")

### Creating Tools with Pydantic Models (Complex Inputs)

In [None]:
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    """Input schema for search tool."""
    query: str = Field(description="The search query")
    max_results: int = Field(default=5, description="Maximum number of results to return")
    category: str = Field(default="general", description="Search category (general, news, images)")

@tool(args_schema=SearchInput)
def search_web(query: str, max_results: int = 5, category: str = "general") -> str:
    """Search the web for information.
    
    Returns:
        Simulated search results
    """
    return f"Found {max_results} results for '{query}' in {category} category: [Result 1], [Result 2], ..."

print(f"Complex tool created: {search_web.name}")
print(f"Args schema: {search_web.args_schema.model_json_schema()}")

### Binding Tools to Models

Use **bind_tools** to allow the model to choose when to use tools.

In [None]:
# Bind tools to the model
tools = [get_current_time, calculate_sum, get_weather]
model_with_tools = model.bind_tools(tools)

print(f"Model now has access to {len(tools)} tools")

### Tool Calling Example (Manual Execution)

In [None]:
# Ask the model a question that requires a tool
response = model_with_tools.invoke("What's 25 + 37?")

print(f"Response type: {type(response)}")
print(f"Content: {response.content}")

# Check if the model wants to use a tool
if response.tool_calls:
    print("\nTool calls requested:")
    for tool_call in response.tool_calls:
        print(f"  Tool: {tool_call['name']}")
        print(f"  Arguments: {tool_call['args']}")
        print(f"  ID: {tool_call['id']}")
        
        # Execute the tool manually
        if tool_call['name'] == 'calculate_sum':
            result = calculate_sum.invoke(tool_call['args'])
            print(f"  Result: {result}")
else:
    print("No tool calls made")

### Multiple Tool Calls Example

In [None]:
# Question requiring multiple tools
response = model_with_tools.invoke(
    "What's the weather in Paris and what's the sum of 15 and 28?"
)

print("Model response:")
if response.tool_calls:
    print(f"Model wants to use {len(response.tool_calls)} tool(s):\n")
    for i, tool_call in enumerate(response.tool_calls, 1):
        print(f"Tool Call {i}:")
        print(f"  Name: {tool_call['name']}")
        print(f"  Args: {tool_call['args']}")
        print()

---

## 4. Agents

### LLM + Tools + Autonomy

An **Agent** is a loop in which the LLM calls tools, reads their results, and repeats until it has fulfilled the user's request.

#### Key Features:
- The LLM stops when it achieves its goal OR when the iteration limit is reached
- Created using `create_agent` (which uses LangGraph)
- Can handle multiple tools, parallel tool calling, dynamic tools, and error handling
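
The loop and its stopping conditions can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the `create_agent` implementation: `run_agent`, `scripted_model`, `looping_model`, and `demo_tools` are hypothetical, and messages are plain dicts.

```python
def run_agent(model, tools, user_input, max_iterations=5):
    """Call the model, run requested tools, stop on a final answer or the limit."""
    messages = [{"role": "human", "content": user_input}]
    for _ in range(max_iterations):
        reply = model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            # No tool requested: the model considers the goal achieved.
            return reply["content"], messages
        for call in reply["tool_calls"]:
            output = tools[call["name"]](**call["args"])
            messages.append(
                {"role": "tool", "content": str(output), "tool_call_id": call["id"]}
            )
    return "Stopped: iteration limit reached.", messages


def scripted_model(messages):
    """Hypothetical model: asks for one addition, then reports the result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        call = {"name": "add", "args": {"a": 42, "b": 58}, "id": "call_1"}
        return {"role": "ai", "content": "", "tool_calls": [call]}
    return {"role": "ai", "content": f"42 + 58 = {tool_results[-1]['content']}", "tool_calls": []}


def looping_model(messages):
    """Hypothetical model that never finishes, so the limit has to stop it."""
    call = {"name": "add", "args": {"a": 1, "b": 1}, "id": f"call_{len(messages)}"}
    return {"role": "ai", "content": "", "tool_calls": [call]}


demo_tools = {"add": lambda a, b: a + b}
answer, history = run_agent(scripted_model, demo_tools, "What's 42 + 58?")
stuck_answer, stuck_history = run_agent(looping_model, demo_tools, "loop", max_iterations=3)
print(answer)
print(stuck_answer)
```

The second run shows why the iteration limit matters: without it, a model that keeps requesting tools would never return.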

#### Components:
- **Model** (static or dynamic)
- **Tools** (multiple tools, parallel calling, retry logic)
- **System Prompt**
- **Memory**
- **Streaming**
- **Middleware** (modify original behavior)

**Important:** Tools can access runtime information!

In [None]:
from langchain.agents import create_agent

# Create an agent with tools
agent = create_agent(
    model=model,
    tools=tools,
    system_prompt="You are a helpful assistant that can check the time, calculate sums, and get weather information."
)

print("Agent created with ReAct pattern")
print(f"Tools available: {[tool.name for tool in tools]}")

### Running the Agent

In [None]:
# Simple agent execution
result = agent.invoke({
    "messages": [HumanMessage(content="What's 42 + 58?")]
})

print("Agent execution result:")
print(f"Final answer: {result['messages'][-1].content}")

In [None]:
# Complex agent execution with multiple steps
result = agent.invoke({
    "messages": [HumanMessage(content="What's the weather in London? Also, what's the current time?")]
})

print("\nAgent conversation:")
for i, message in enumerate(result['messages']):
    if hasattr(message, 'content') and message.content:
        print(f"\n[{message.type.upper()}]: {message.content}")
    if hasattr(message, 'tool_calls') and message.tool_calls:
        print(f"\n[TOOL CALLS]:")
        for tc in message.tool_calls:
            print(f"  - {tc['name']}({tc['args']})")

### Streaming Agent Responses

In [None]:
# Stream agent execution
print("Streaming agent response:\n")

# stream_mode="updates" yields one dict per step, keyed by the node that ran.
# Node names depend on the library version (for example "model" and "tools"),
# so iterate over whatever keys arrive instead of hard-coding them.
for chunk in agent.stream(
    {"messages": [HumanMessage(content="Calculate the sum of 100 and 250, then tell me about it.")]},
    stream_mode="updates",
):
    for node_name, update in chunk.items():
        for msg in (update or {}).get('messages', []):
            if getattr(msg, 'content', None):
                print(f"[{node_name}/{msg.type}]: {msg.content}")

---

## 5. Memory

### Short-Term Memory in Conversations

**Memory** refers to the model's capacity to remember previous interactions within a single conversation.

#### Issues with Memory:
- LLMs perform poorly with too much information
- More context = slower, more costly, and worse quality

#### LangChain's Approach:
- Saves messages in an **AgentState**
- **Production** → store in a database
- **MVPs/Testing** → kept in memory (RAM)
- You can build custom memory by subclassing the state class
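
As a rough picture of the in-memory option, the sketch below keeps one message list per conversation thread in a dictionary. `InMemoryThreadStore` is a hypothetical class written for this illustration, not a LangChain or LangGraph API; a production setup would back the same interface with a database.

```python
from collections import defaultdict


class InMemoryThreadStore:
    """Hypothetical store: keeps one message list per conversation thread."""

    def __init__(self):
        self._threads = defaultdict(list)

    def append(self, thread_id, role, content):
        """Add a message to the given thread."""
        self._threads[thread_id].append({"role": role, "content": content})

    def history(self, thread_id):
        """Return a copy so callers cannot mutate stored state by accident."""
        return list(self._threads.get(thread_id, []))


store = InMemoryThreadStore()
store.append("thread-1", "human", "My name is Alice.")
store.append("thread-1", "ai", "Nice to meet you, Alice.")
store.append("thread-2", "human", "Hello!")

print(len(store.history("thread-1")))
print(len(store.history("thread-2")))
```

Everything in this store disappears when the process exits, which is fine for testing and the reason production systems persist state elsewhere.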

### Memory with Agents (Built-in)

In [None]:
# Without a checkpointer, the agent keeps no state between invoke() calls,
# so we carry the conversation forward by passing the full message list back in

conversation_state = {
    "messages": [
        HumanMessage(content="My name is Alice and I live in Tokyo.")
    ]
}

# First interaction
result1 = agent.invoke(conversation_state)
print("First interaction:")
print(result1['messages'][-1].content)

# Add to conversation
conversation_state['messages'] = result1['messages']
conversation_state['messages'].append(HumanMessage(content="What city did I say I live in?"))

# Second interaction (should remember)
result2 = agent.invoke(conversation_state)
print("\nSecond interaction:")
print(result2['messages'][-1].content)

### Custom Memory Management

In [None]:
# Simple in-memory conversation history
class ConversationMemory:
    def __init__(self, max_messages: int = 10):
        self.messages = []
        self.max_messages = max_messages
    
    def add_message(self, message):
        """Add a message to memory."""
        self.messages.append(message)
        # Keep only the last max_messages
        if len(self.messages) > self.max_messages:
            # Always keep system messages, then fill the remaining slots with the newest messages
            system_msgs = [m for m in self.messages if isinstance(m, SystemMessage)]
            other_msgs = [m for m in self.messages if not isinstance(m, SystemMessage)]
            keep = max(self.max_messages - len(system_msgs), 0)
            self.messages = system_msgs + (other_msgs[-keep:] if keep else [])
    
    def get_messages(self):
        """Get all messages."""
        return self.messages
    
    def clear(self):
        """Clear all messages."""
        self.messages = []

# Create memory
memory = ConversationMemory(max_messages=6)

# Add system message
memory.add_message(SystemMessage(content="You are a helpful assistant."))

# Conversation with memory
memory.add_message(HumanMessage(content="I'm learning Python."))
response1 = model.invoke(memory.get_messages())
memory.add_message(AIMessage(content=response1.content))

print("First exchange:")
print(f"Human: I'm learning Python.")
print(f"AI: {response1.content}")

# Follow-up
memory.add_message(HumanMessage(content="What programming language am I learning?"))
response2 = model.invoke(memory.get_messages())
memory.add_message(AIMessage(content=response2.content))

print("\nSecond exchange:")
print(f"Human: What programming language am I learning?")
print(f"AI: {response2.content}")

print(f"\nTotal messages in memory: {len(memory.get_messages())}")

---

## 6. Advanced Features

### Prompt Caching

**Prompt Caching** reduces latency and cost when requests repeat the same prompt prefix.

#### Two Types:
1. **Implicit**: The provider does this automatically (OpenAI, Gemini)
2. **Explicit**: You mark what should be cached, which gives you more control over the savings
   - ChatOpenAI → `prompt_cache_key`
   - Anthropic → `AnthropicPromptCachingMiddleware`
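
Provider caches generally match on the beginning of the prompt, so it helps to put static content first and per-request content last. The sketch below measures the shared prefix of two prompts in characters as a rough proxy for tokens; `STATIC_INSTRUCTIONS`, `static_first`, and `dynamic_first` are hypothetical names for this illustration.

```python
import os

STATIC_INSTRUCTIONS = "You are an expert Python programmer. Follow the team style guide.\n"


def shared_prefix_length(first: str, second: str) -> int:
    """Number of leading characters the two prompts have in common."""
    return len(os.path.commonprefix([first, second]))


def static_first(user: str, question: str) -> str:
    """Static instructions first, per-request details last."""
    return f"{STATIC_INSTRUCTIONS}User: {user}\nQuestion: {question}"


def dynamic_first(user: str, question: str) -> str:
    """Per-request details first, which breaks the shared prefix early."""
    return f"User: {user}\n{STATIC_INSTRUCTIONS}Question: {question}"


reusable = shared_prefix_length(
    static_first("Alice", "Reverse a string."),
    static_first("Bob", "Check whether a number is prime."),
)
wasted = shared_prefix_length(
    dynamic_first("Alice", "Reverse a string."),
    dynamic_first("Bob", "Check whether a number is prime."),
)
print(f"Static-first shared prefix: {reusable} characters")
print(f"Dynamic-first shared prefix: {wasted} characters")
```

With the static instructions first, the whole instruction block is shared between requests; with the user name first, the prompts diverge almost immediately.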

In [None]:
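# Example with implicit caching (OpenAI)
# Both calls share the same system message as a prefix. Providers usually only
# cache prefixes above a minimum length (1,024 tokens for OpenAI), so a prompt
# this short illustrates the pattern but will not actually be cached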
system_prompt = "You are an expert Python programmer. You write clean, efficient, and well-documented code."

# First call
messages1 = [
    SystemMessage(content=system_prompt),
    HumanMessage(content="Write a function to reverse a string.")
]
response1 = model.invoke(messages1)

# Second call with same system message (a long enough shared prefix would hit the cache)
messages2 = [
    SystemMessage(content=system_prompt),
    HumanMessage(content="Write a function to check if a number is prime.")
]
response2 = model.invoke(messages2)

print("Prompt caching example:")
print("Both calls share the same system message, which a provider can cache once the prefix is long enough.")
print(f"\nResponse 1 excerpt: {response1.content[:100]}...")
print(f"\nResponse 2 excerpt: {response2.content[:100]}...")

### Rate Limiting

Rate limiting caps the number of invocations in a given time period.

In [None]:
import time
from datetime import datetime

class RateLimiter:
    def __init__(self, max_calls: int, time_window: int):
        """Initialize rate limiter.
        
        Args:
            max_calls: Maximum number of calls allowed
            time_window: Time window in seconds
        """
        self.max_calls = max_calls
        self.time_window = time_window
        self.calls = []
    
    def can_make_call(self) -> bool:
        """Check if a call can be made."""
        now = time.time()
        # Remove old calls outside the time window
        self.calls = [call_time for call_time in self.calls if now - call_time < self.time_window]
        return len(self.calls) < self.max_calls
    
    def record_call(self):
        """Record a call."""
        self.calls.append(time.time())

# Example: 3 calls per 10 seconds
rate_limiter = RateLimiter(max_calls=3, time_window=10)

print("Rate limiting example (3 calls per 10 seconds):")
for i in range(5):
    if rate_limiter.can_make_call():
        rate_limiter.record_call()
        print(f"Call {i+1}: Allowed at {datetime.now().strftime('%H:%M:%S')}")
    else:
        print(f"Call {i+1}: Rate limited at {datetime.now().strftime('%H:%M:%S')}")
    time.sleep(1)
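
The `RateLimiter` above rejects calls that exceed the limit. A common alternative is to wait for a free slot instead. The sketch below is one way to do that with a sliding window; `BlockingRateLimiter` is a hypothetical class, and the `clock` and `sleep` parameters are assumptions added so the behavior can be tested without real waiting. LangChain also ships an `InMemoryRateLimiter` in `langchain_core.rate_limiters`; this sketch is independent of it.

```python
import time
from collections import deque


class BlockingRateLimiter:
    """Sliding-window limiter that waits for a free slot instead of rejecting."""

    def __init__(self, max_calls, time_window, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.time_window = time_window
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()

    def acquire(self):
        """Block until a call is allowed, record it, and return the seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.time_window:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return waited
            # Wait until the oldest recorded call leaves the window.
            delay = self.time_window - (now - self._calls[0])
            self._sleep(delay)
            waited += delay


limiter = BlockingRateLimiter(max_calls=3, time_window=0.2)
waits = [limiter.acquire() for _ in range(4)]
print(waits)
```

The first three calls go through immediately; the fourth waits until the oldest call leaves the window.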

### Token Usage Tracking

Token counts are returned as part of each invocation response.

In [None]:
# Token usage is included in the response metadata
response = model.invoke("Explain artificial intelligence in 2 sentences.")

print("Token usage information:")
if getattr(response, 'usage_metadata', None):
    print(f"Input tokens: {response.usage_metadata.get('input_tokens', 'N/A')}")
    print(f"Output tokens: {response.usage_metadata.get('output_tokens', 'N/A')}")
    print(f"Total tokens: {response.usage_metadata.get('total_tokens', 'N/A')}")
elif hasattr(response, 'response_metadata'):
    print(f"Response metadata: {response.response_metadata}")
else:
    print("Token information not available in this response")
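
Per-response counts become more useful when you add them up across calls. The sketch below accumulates usage from plain dicts shaped like `response.usage_metadata`; the sample values are made up for illustration and `add_usage` is a hypothetical helper.

```python
def add_usage(total, usage):
    """Add one response's token counts to a running total; missing usage counts as zero."""
    for key in ("input_tokens", "output_tokens", "total_tokens"):
        total[key] = total.get(key, 0) + (usage or {}).get(key, 0)
    return total


# Hypothetical usage dicts, shaped like `response.usage_metadata`.
sample_usages = [
    {"input_tokens": 12, "output_tokens": 40, "total_tokens": 52},
    None,  # some responses may not report usage
    {"input_tokens": 8, "output_tokens": 25, "total_tokens": 33},
]

running_total = {}
for usage in sample_usages:
    add_usage(running_total, usage)

print(running_total)
```

In the notebook you would call `add_usage(running_total, response.usage_metadata)` after each invocation.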

### Configuration of Invocation

A `RunnableConfig` dictionary lets you attach tags, metadata, and limits to an invocation, which is helpful for monitoring applications in production.

In [None]:
from langchain_core.runnables import RunnableConfig

# Configure the invocation
config = RunnableConfig(
    tags=["production", "user-query"],
    metadata={
        "user_id": "user_123",
        "session_id": "session_456",
        "environment": "production"
    },
    max_concurrency=2
)

# Invoke with config
response = model.invoke(
    "What is machine learning?",
    config=config
)

print("Invocation with configuration:")
print(f"Response: {response.content[:100]}...")
print(f"\nConfiguration used:")
print(f"  Tags: {config.get('tags')}")
print(f"  Metadata: {config.get('metadata')}")

---

## 7. Structured Output

One of the key features of models is the ability to generate **structured output**: responses that follow a defined format.

In [None]:
from pydantic import BaseModel, Field
from typing import List

# Define output schema
class Person(BaseModel):
    """Information about a person."""
    name: str = Field(description="The person's name")
    age: int = Field(description="The person's age")
    occupation: str = Field(description="The person's occupation")
    skills: List[str] = Field(description="List of the person's skills")

# Create a model with structured output
structured_model = model.with_structured_output(Person)

# Get structured output
result = structured_model.invoke(
    "Tell me about Alice, a 28-year-old software engineer who knows Python, JavaScript, and machine learning."
)

print("Structured Output:")
print(f"Name: {result.name}")
print(f"Age: {result.age}")
print(f"Occupation: {result.occupation}")
print(f"Skills: {', '.join(result.skills)}")
print(f"\nType: {type(result)}")

In [None]:
# More complex structured output
class Article(BaseModel):
    """A news article."""
    title: str = Field(description="Article title")
    summary: str = Field(description="Brief summary")
    key_points: List[str] = Field(description="Main points from the article")
    category: str = Field(description="Article category (tech, business, science, etc.)")

article_model = model.with_structured_output(Article)

result = article_model.invoke(
    "Write an article about the recent advances in AI language models, focusing on their capabilities and impact."
)

print("Structured Article Output:")
print(f"\nTitle: {result.title}")
print(f"\nCategory: {result.category}")
print(f"\nSummary: {result.summary}")
print(f"\nKey Points:")
for i, point in enumerate(result.key_points, 1):
    print(f"  {i}. {point}")

---

## 8. Coroutines and Async Operations

A **coroutine** is a special type of function that can pause, let other code run, and continue where it left off.

**Good for:** Increasing responsiveness and handling tasks that involve waiting (like LLM and API calls)

In [None]:
import asyncio

# Async tool example
@tool
async def async_search(query: str) -> str:
    """Asynchronously search for information.
    
    Args:
        query: The search query
    
    Returns:
        Simulated search results
    """
    # Simulate async API call
    await asyncio.sleep(1)
    return f"Search results for '{query}': [Result 1], [Result 2], [Result 3]"

# Async model invocation
async def async_example():
    # Multiple concurrent invocations
    tasks = [
        model.ainvoke("What is Python?"),
        model.ainvoke("What is JavaScript?"),
        model.ainvoke("What is Go?")
    ]
    
    responses = await asyncio.gather(*tasks)
    
    for i, response in enumerate(responses, 1):
        print(f"\nResponse {i}: {response.content[:80]}...")

# Run async example (top-level await works in notebooks; in a script, use asyncio.run(async_example()))
print("Running async operations (3 concurrent model calls):")
await async_example()
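
`asyncio.gather` starts every call at once. When an API limits concurrent requests, a semaphore can cap how many run at the same time. The sketch below uses only the standard library; `fake_llm_call` is a hypothetical stand-in for a network-bound model call.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a network-bound model call."""
    await asyncio.sleep(delay)
    return f"answer to: {prompt}"


async def gather_with_limit(prompts, limit=2):
    """Run calls concurrently, but never more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt):
        async with semaphore:
            return await fake_llm_call(prompt)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(p) for p in prompts))


async def main():
    topics = ["Python", "JavaScript", "Go", "Rust"]
    start = time.perf_counter()
    answers = await gather_with_limit(topics, limit=2)
    return answers, time.perf_counter() - start


# This form is for a plain script. In a notebook, an event loop is already
# running, so use `answers, elapsed = await main()` instead.
answers, elapsed = asyncio.run(main())
print(answers)
print(f"Elapsed: {elapsed:.2f}s")
```

With four calls and a limit of two, the calls run in two waves rather than one after another.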

---

## 9. Complete Agent Example with All Features

Let's put it all together with a comprehensive example that demonstrates:
- Multiple tools
- Agent with memory
- Structured output
- Token tracking
- Configuration

In [None]:
# Define comprehensive tools
@tool
def calculate(expression: str) -> str:
    """Safely evaluate a mathematical expression.
    
    Args:
        expression: A mathematical expression (e.g., '2 + 2', '10 * 5')
    
    Returns:
        The result of the calculation
    """
    try:
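        # Whitelist digits, operators, and parentheses so eval() cannot reach names or calls.
        # Note: this still permits '**', so very large exponents can be slow to compute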
        allowed_chars = set('0123456789+-*/.() ')
        if not all(c in allowed_chars for c in expression):
            return "Error: Expression contains invalid characters"
        result = eval(expression)
        return f"The result of {expression} is {result}"
    except Exception as e:
        return f"Error calculating: {str(e)}"

@tool
def get_user_info(user_id: str) -> str:
    """Get information about a user.
    
    Args:
        user_id: The user's ID
    
    Returns:
        User information (simulated)
    """
    # Simulated user database
    users = {
        "alice": "Alice (ID: alice) - Software Engineer, specializes in Python and AI",
        "bob": "Bob (ID: bob) - Data Scientist, expert in machine learning",
        "charlie": "Charlie (ID: charlie) - Product Manager, 5 years experience"
    }
    return users.get(user_id.lower(), f"User {user_id} not found")

@tool
def create_task(title: str, priority: str = "medium") -> str:
    """Create a new task.
    
    Args:
        title: The task title
        priority: Task priority (low, medium, high)
    
    Returns:
        Confirmation message
    """
    task_id = hash(title) % 10000
    return f"Task created: '{title}' (ID: {task_id}, Priority: {priority})"

# Create comprehensive agent
comprehensive_tools = [calculate, get_user_info, create_task, get_weather, get_current_time]

comprehensive_agent = create_agent(
    model=model,
    tools=comprehensive_tools,
    system_prompt="""You are a helpful AI assistant with access to multiple tools.
    You can perform calculations, get user information, create tasks, check weather, and get the current time.
    Always be helpful and provide detailed responses."""
)

print("Comprehensive agent created with tools:")
# Use a loop variable other than `tool` so the @tool decorator is not overwritten
for t in comprehensive_tools:
    print(f"  - {t.name}")
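
The `calculate` tool above relies on a character whitelist before calling `eval`. An alternative that avoids `eval` entirely is to parse the expression and walk its syntax tree, allowing only the operators you choose. The sketch below is one possible approach using the standard `ast` module; `safe_arithmetic` is a hypothetical helper, not part of LangChain.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arithmetic(expression: str):
    """Evaluate +, -, *, / and parentheses over numeric literals only."""

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.Constant):
            # Accept plain numbers only; bool is a subclass of int, so exclude it.
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        # Anything else (names, calls, '**', attribute access) is rejected.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return evaluate(ast.parse(expression, mode="eval"))


print(safe_arithmetic("25 * 17 + 100"))
print(safe_arithmetic("(2 + 3) * -4"))
```

Because exponentiation is not in the allowed operators, an input such as `9 ** 9 ** 9` raises `ValueError` instead of consuming CPU time.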

In [None]:
# Test the comprehensive agent
test_queries = [
    "Calculate 25 * 17 + 100",
    "Get information about user alice and create a high priority task to review her code",
    "What's the weather in San Francisco and what time is it?"
]

for i, query in enumerate(test_queries, 1):
    print(f"\n{'='*60}")
    print(f"Query {i}: {query}")
    print('='*60)
    
    result = comprehensive_agent.invoke({
        "messages": [HumanMessage(content=query)]
    })
    
    print(f"\nFinal Response: {result['messages'][-1].content}")