# 🛠️ Exploring Tools in LangChain

## Learning Objectives
In this notebook, you will learn:
1. **What are Tools?** - Interfaces that enable LLMs to interact with external systems
2. **Built-in Tools** - How to use pre-built tools like Wikipedia and Tavily Search
3. **Custom Tools** - How to create your own tools with proper type validation
4. **Tool Calling** - How LLMs automatically select and invoke the right tools

## Prerequisites
- Basic understanding of LangChain
- Familiarity with Python decorators and type hints
- API keys for OpenAI, Tavily, and OpenWeatherMap (optional)

---
## 📦 Step 1: Environment Setup

First, let's set up our environment by suppressing warnings and, if needed, installing the required packages.

In [None]:
# ============================================================================
# ENVIRONMENT SETUP: Suppress Warnings for Cleaner Output
# ============================================================================
# We suppress warnings to keep our notebook output clean and focused
# In production, you may want to review warnings for debugging

from warnings import filterwarnings
filterwarnings('ignore')

print("✅ Warnings suppressed successfully!")

In [None]:
# ============================================================================
# OPTIONAL: Install Required Packages
# ============================================================================
# Uncomment and run these lines if you haven't installed the packages yet
# These packages are required for LangChain functionality

# Core LangChain packages
# !pip install langchain==0.3.14
# !pip install langchain-openai==0.3.0
# !pip install langchain-community==0.3.14

# Data extraction and utility packages
# !pip install wikipedia==1.4.0    # For Wikipedia tool
# !pip install markitdown           # For extracting content from URLs
# !pip install rich                 # For pretty-printing JSON output

---
## 🔑 Step 2: Configure LLM and API Keys

We'll initialize our LLM using helper functions that handle API key management.

> **Note**: Make sure you have a `.env` file with your API keys:
> - `OPENAI_API_KEY` - For OpenAI models
> - `GROQ_API_KEY` - For Groq models
> - `TAVILY_API_KEY` - For Tavily Search ([Get free key](https://tavily.com/#api))
> - `WEATHER_API_KEY` - For the OpenWeatherMap API used by the weather tool ([Get free key](https://openweathermap.org/api))

In [None]:
# ============================================================================
# SETUP: Import LLM Helper Functions & Initialize LLM
# ============================================================================
# We use helper functions to create LLM instances with proper configuration
# These functions handle API key loading from .env and model configuration

import os
import sys

# Add parent directory to path for importing helpers
sys.path.append(os.path.abspath("../.."))

# Import our LLM factory functions
# - get_groq_llm(): Creates a Groq-hosted LLM (fast inference with open-source models)
# - get_openai_llm(): Creates an OpenAI GPT model
# - get_databricks_llm(): Creates a Databricks-hosted LLM
from helpers.utils import get_groq_llm, get_openai_llm, get_databricks_llm

print("✅ LLM helpers imported successfully!")

# -----------------------------------------------------------------------------
# Initialize the LLM
# Choose your preferred LLM provider by uncommenting the appropriate line
# -----------------------------------------------------------------------------
llm = get_databricks_llm("databricks-gemini-2-5-pro")  # Databricks-hosted Gemini
# llm = get_groq_llm()        # Fast, open-source models hosted by Groq
# llm = get_openai_llm()      # OpenAI's GPT models

# Print which LLM we're using
if hasattr(llm, 'model_name'):
    print(f"🤖 LLM initialized: {llm.model_name}")
elif hasattr(llm, 'model'):
    print(f"🤖 LLM initialized: {llm.model} (Databricks)")
else:
    print("🤖 LLM initialized: Unknown model")

---
## 🔧 Part 1: Exploring Built-in Tools

LangChain provides several pre-built tools that you can use out-of-the-box. These tools wrap common APIs and services, making it easy to give your LLM access to external capabilities.

### Key Concepts:
- **Tool**: An interface that an agent/LLM can use to interact with the world
- **API Wrapper**: Handles the low-level API calls
- **Tool Attributes**: Each tool has `name`, `description`, and `args`
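
To make the split between a wrapper and a tool concrete, here is a minimal plain-Python sketch. It is not LangChain's implementation: `EchoAPIWrapper` and `SimpleTool` are hypothetical names, and the wrapper fakes its "API call" so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


class EchoAPIWrapper:
    """Hypothetical stand-in for an API wrapper: it owns the low-level call."""

    def run(self, query: str) -> str:
        # A real wrapper would issue an HTTP request here.
        return f"results for: {query}"


@dataclass
class SimpleTool:
    """Hypothetical tool record: the metadata an LLM reads, plus the callable."""
    name: str
    description: str
    func: Callable[[str], str]

    @property
    def args(self) -> dict:
        # A single string argument, as for a one-input search tool.
        return {"query": {"type": "string"}}

    def invoke(self, payload: dict) -> str:
        return self.func(payload["query"])


echo_tool = SimpleTool(
    name="echo_search",
    description="Look up a query and return the raw result text.",
    func=EchoAPIWrapper().run,
)
print(echo_tool.name, echo_tool.args)
print(echo_tool.invoke({"query": "Microsoft"}))
```

The wrapper knows how to talk to the service; the tool only adds the name, description, and argument schema that a model needs in order to choose it.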

### 1.1 📚 Wikipedia Tool

The Wikipedia tool enables you to tap into Wikipedia's vast knowledge base through their API. This is useful for retrieving factual information about entities, concepts, and events.

In [None]:
# ============================================================================
# WIKIPEDIA TOOL: Setup and Configuration
# ============================================================================
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# Configure the Wikipedia API wrapper
# - top_k_results: Number of Wikipedia pages to return (default: 3)
# - doc_content_chars_max: Cap on the length of the returned text, in characters (default: 4000)
wiki_api_wrapper = WikipediaAPIWrapper(
    top_k_results=3,
    doc_content_chars_max=8000
)

# Create the Wikipedia tool by wrapping the API
wiki_tool = WikipediaQueryRun(api_wrapper=wiki_api_wrapper)

# Inspect the tool's attributes - these are what the LLM uses to understand the tool
print("📋 Tool Name:", wiki_tool.name)
print("📝 Tool Description:", wiki_tool.description)
print("📦 Tool Arguments:", wiki_tool.args)

In [None]:
# ============================================================================
# WIKIPEDIA TOOL: Demonstration
# ============================================================================
# Let's test the Wikipedia tool by searching for information about Microsoft

result = wiki_tool.invoke({"query": "Microsoft"})
print("🔍 Wikipedia Search Result for 'Microsoft':")
print("=" * 60)
print(result[:2000])  # Print first 2000 characters for readability
print("\n... [truncated for display]")

#### 🎨 Customizing Built-in Tools

You can customize any built-in tool by wrapping it with your own name, description, and behavior. This is useful when you want to:
- Provide a more specific description for your use case
- Rename the tool for clarity
- Add pre/post-processing logic
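
As a sketch of the pre/post-processing idea, the helper below wraps any string-to-string function so that the query is normalized before the call and the output is capped afterwards. `with_pre_post` and `fake_lookup` are hypothetical names used only for illustration; `fake_lookup` stands in for a call such as `wiki_api_wrapper.run`.

```python
from typing import Callable


def with_pre_post(func: Callable[[str], str], max_chars: int = 200) -> Callable[[str], str]:
    """Wrap `func` so the query is normalized and the output is capped."""

    def wrapped(query: str) -> str:
        cleaned = " ".join(query.split())  # pre-processing: collapse whitespace
        output = func(cleaned)
        return output[:max_chars]          # post-processing: cap the length

    return wrapped


def fake_lookup(query: str) -> str:
    """Hypothetical stand-in for a real lookup function."""
    return f"[{query}] " + "x" * 500


short_lookup = with_pre_post(fake_lookup, max_chars=50)
result = short_lookup("  Alan   Turing ")
print(len(result), result[:13])
```

A function wrapped this way can then be passed as the `func` argument when constructing a `Tool`.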

In [None]:
# ============================================================================
# CUSTOMIZING TOOLS: Creating a Custom Wikipedia Tool
# ============================================================================
from langchain_core.tools import Tool

# Create a custom version of the Wikipedia tool with our own name and description
wiki_tool_custom = Tool(
    name="Wikipedia",
    func=wiki_api_wrapper.run,  # Use the same underlying function
    description="Useful when you need a detailed answer about general knowledge, "
                "historical facts, famous people, places, or scientific concepts."
)

# Compare the custom tool attributes
print("📋 Custom Tool Name:", wiki_tool_custom.name)
print("📝 Custom Tool Description:", wiki_tool_custom.description)
print("📦 Custom Tool Arguments:", wiki_tool_custom.args)

# Test the custom tool (note: uses 'tool_input' instead of 'query')
print("\n🔍 Testing custom tool with 'AI':")
print(wiki_tool_custom.invoke({"tool_input": "AI"})[:500])

### 1.2 🔍 Tavily Search Tool

**Tavily Search API** is a search engine optimized for LLMs and RAG (Retrieval-Augmented Generation). It provides:
- Real-time web search results
- Clean, structured output
- Advanced search capabilities
- Raw content extraction from web pages
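
One common use of structured search output is to turn it into a context block for a prompt. The sketch below assumes each result is a dict with `url` and `content` keys, which is the shape the demonstration code in this section reads. The function name and the sample data are made up for illustration.

```python
def format_results(results: list, max_chars: int = 200) -> str:
    """Turn search hits into a numbered context block for a prompt."""
    blocks = []
    for i, hit in enumerate(results, 1):
        url = hit.get("url", "N/A")
        content = (hit.get("content") or "")[:max_chars]  # tolerate missing content
        blocks.append(f"[{i}] {url}\n{content}")
    return "\n\n".join(blocks)


# Made-up sample data in the assumed shape.
sample = [
    {"url": "https://example.com/a", "content": "First snippet."},
    {"url": "https://example.com/b", "content": None},
]
context = format_results(sample)
print(context)
```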

In [None]:
# ============================================================================
# TAVILY SEARCH TOOL: Setup and Configuration
# ============================================================================
# Note: Requires TAVILY_API_KEY in your environment variables

from langchain_community.tools.tavily_search import TavilySearchResults

# Configure the Tavily search tool
# - max_results: Maximum number of search results to return
# - search_depth: 'basic' or 'advanced' (advanced provides more detailed results)
# - include_raw_content: Whether to include each page's cleaned, parsed content in addition to the snippet
tavily_tool = TavilySearchResults(
    max_results=5,
    search_depth='advanced',
    include_raw_content=True
)

# Inspect tool attributes
print("📋 Tool Name:", tavily_tool.name)
print("📝 Tool Description:", tavily_tool.description)
print("📦 Tool Arguments:", tavily_tool.args)

In [None]:
# ============================================================================
# TAVILY SEARCH TOOL: Demonstration
# ============================================================================
# Let's search for information about Microsoft

results = tavily_tool.invoke("Tell me about Microsoft")

print("🔍 Tavily Search Results:")
print("=" * 60)
for i, result in enumerate(results[:3], 1):  # Show first 3 results
    print(f"\n📄 Result {i}:")
    print(f"   URL: {result.get('url', 'N/A')}")
    print(f"   Content: {result.get('content', 'N/A')[:200]}...")

---
## 🔨 Part 2: Building Custom Tools

While built-in tools are useful, you'll often need to create custom tools for your specific use cases. LangChain provides several ways to create tools.

### Key Components of a Tool:
1. **Name** - A unique identifier for the tool
2. **Description** - Explains what the tool does (LLM uses this to decide when to use it)
3. **Args Schema** - JSON schema defining the input parameters
4. **Function** - The actual code that runs when the tool is invoked
5. **Return Direct** - Whether an agent should return the tool's output directly to the user instead of passing it back to the LLM

### 2.1 ➕ Building a Simple Math Tool

The simplest way to create a tool is using the `@tool` decorator. Let's create a basic multiplication tool.
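
A decorator of this kind can work by reading metadata the function already carries. The sketch below is a plain-Python illustration of that idea, not LangChain's implementation: `simple_tool` is a hypothetical decorator that copies the function name, docstring, and parameter annotations onto the function object.

```python
import inspect
from typing import Any, Callable


def simple_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Attach name, description, and args metadata to a plain function."""
    func.name = func.__name__
    func.description = inspect.getdoc(func) or ""
    func.args = {}
    for param in inspect.signature(func).parameters.values():
        if param.annotation is inspect.Parameter.empty:
            func.args[param.name] = "any"  # no hint, so nothing to validate against
        else:
            func.args[param.name] = getattr(param.annotation, "__name__", str(param.annotation))
    return func


@simple_tool
def add(a: float, b):
    """Add two numbers."""
    return a + b


print(add.name, "|", add.description, "|", add.args)
```

Note how the unannotated parameter `b` ends up typed as `any`: a decorator can only describe what the function signature tells it.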

In [None]:
# ============================================================================
# CUSTOM TOOLS: Simple Tool with @tool Decorator
# ============================================================================
from langchain_core.tools import tool

@tool
def multiply(a, b):
    """Multiply two numbers."""
    return a * b

# Inspect the automatically generated tool attributes
print("📋 Tool Name:", multiply.name)
print("📝 Tool Description:", multiply.description)
print("📦 Tool Arguments:", multiply.args)
print("🔧 Tool Type:", type(multiply))

# Test the tool with different inputs
print("\n🧮 Testing multiply tool:")
print(f"   2 × 3 = {multiply.invoke({'a': 2, 'b': 3})}")
print(f"   2.1 × 3.2 = {multiply.invoke({'a': 2.1, 'b': 3.2})}")

# Note: Without type hints, the tool accepts any type!
print(f"   2 × 'abc' = {multiply.invoke({'a': 2, 'b': 'abc'})} (string repeated!)")

### 2.2 🔒 Building a Type-Safe Tool with Pydantic

The `multiply` tool above has no type hints, so `@tool` has nothing to validate against and accepts any input. For production use, define the inputs with a **Pydantic** schema. This approach:
- Enforces type safety
- Provides better descriptions for each argument
- Generates proper JSON schemas for LLM understanding
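
To see what input validation buys you, here is a stdlib-only sketch that checks arguments against a function's type hints before calling it. It is a simplification under stated assumptions: `validate_args` is a hypothetical helper, it only handles hints that are callable types such as `float`, and Pydantic's own coercion rules are richer and differ in detail.

```python
from typing import get_type_hints


def validate_args(func, raw_args: dict) -> dict:
    """Coerce each argument to its annotated type, or raise ValueError."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # only parameters are validated
    validated = {}
    for name, expected in hints.items():
        if name not in raw_args:
            raise ValueError(f"missing argument: {name}")
        try:
            validated[name] = expected(raw_args[name])
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"{name}={raw_args[name]!r} is not a valid {expected.__name__}"
            ) from exc
    return validated


def multiply_numbers(a: float, b: float) -> float:
    return a * b


ok = validate_args(multiply_numbers, {"a": 2, "b": "3.5"})
print(ok)

try:
    validate_args(multiply_numbers, {"a": 2, "b": "abc"})
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

The bad input is rejected before the function body runs, instead of silently producing a repeated string.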

In [None]:
# ============================================================================
# CUSTOM TOOLS: Type-Safe Tool with Pydantic Schema
# ============================================================================
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool

# Define the input schema using Pydantic
# This provides type validation and rich descriptions
class CalculatorInput(BaseModel):
    """Input schema for calculator operations."""
    a: float = Field(description="The first number to multiply")
    b: float = Field(description="The second number to multiply")

# Define the function with proper type hints
def multiply_safe(a: float, b: float) -> float:
    """Multiply two numbers safely."""
    return a * b

# Create a StructuredTool with the Pydantic schema
multiply = StructuredTool.from_function(
    func=multiply_safe,
    name="multiply",
    description="Use this tool to multiply two numbers together. "
                "Both inputs must be valid numbers (integers or decimals).",
    args_schema=CalculatorInput,
    return_direct=True  # In an agent, return this tool's output to the user as-is
)

# Inspect the tool - note the improved argument schema
print("📋 Tool Name:", multiply.name)
print("📝 Tool Description:", multiply.description)
print("📦 Tool Arguments:", multiply.args)

# Test with valid input
print("\n✅ Valid input (2 × 3):", multiply.invoke({"a": 2, "b": 3}))

# Test with invalid input - this will raise a validation error!
print("\n❌ Invalid input (2 × 'abc'):")
try:
    multiply.invoke({"a": 2, "b": 'abc'})
except Exception as e:
    print(f"   Validation Error: {type(e).__name__}")

### 2.3 🌐 Building a Web Search & Information Extraction Tool

Let's create a more sophisticated tool that:
1. Searches the web using Tavily
2. Extracts content from the found URLs using MarkItDown
3. Returns clean, structured information
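
The three steps above can be sketched offline by passing the search and extraction steps in as functions. Everything here is hypothetical scaffolding: `fake_search` and `fake_extract` stand in for Tavily and MarkItDown, so the control flow (skip failed pages, tolerate missing titles) can be tried without network access.

```python
from typing import Callable


def search_and_extract(
    query: str,
    search: Callable[[str], list],
    extract: Callable[[str], tuple],
) -> list:
    """Run search, then extract (title, text) per URL, skipping failures."""
    docs = []
    for hit in search(query):
        try:
            title, text = extract(hit["url"])
        except Exception as exc:  # one bad page should not abort the batch
            print(f"skipped {hit['url']}: {exc}")
            continue
        docs.append(((title or "") + "\n" + text).strip())
    return docs


# Hypothetical stand-ins so the sketch runs offline.
def fake_search(query: str) -> list:
    return [
        {"url": "https://example.com/ok"},
        {"url": "https://example.com/blocked"},
        {"url": "https://example.com/untitled"},
    ]


def fake_extract(url: str) -> tuple:
    if url.endswith("blocked"):
        raise RuntimeError("403 Forbidden")
    title = None if url.endswith("untitled") else "A title"
    return title, f"body of {url}"


docs = search_and_extract("anything", fake_search, fake_extract)
print(len(docs))
```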

In [None]:
# ============================================================================
# CUSTOM TOOLS: Advanced Web Search & Content Extraction Tool
# ============================================================================
from markitdown import MarkItDown
from langchain_community.tools.tavily_search import TavilySearchResults
from tqdm import tqdm
import requests

# Initialize the search tool and markdown converter
tavily_tool = TavilySearchResults(
    max_results=5,
    search_depth='advanced',
    include_answer=False,
    include_raw_content=True
)
md = MarkItDown()

@tool
def search_web_extract_info(query: str) -> list:
    """
    Search the web for a query and extract useful information from the search results.
    
    This tool:
    1. Searches the web using Tavily Search API
    2. Visits each result URL
    3. Extracts and cleans the text content
    4. Returns a list of extracted documents
    
    Args:
        query: The search query to look up on the web
        
    Returns:
        list: A list of extracted text content from web pages
    """
    # Step 1: Search the web
    results = tavily_tool.invoke(query)
    docs = []
    
    # Step 2: Extract content from each URL
    for result in tqdm(results, desc="Extracting content"):
        try:
            extracted_info = md.convert(result['url'])
            # title can be None for pages without one; fall back to an empty string
            text_title = (extracted_info.title or '').strip()
            text_content = extracted_info.text_content.strip()
            docs.append(text_title + '\n' + text_content)
        except Exception as e:
            # Report the actual failure rather than assuming the site blocked us
            print(f'⚠️ Extraction failed for url: {result["url"]} ({type(e).__name__}: {e})')
    
    return docs

print("✅ Web search tool created successfully!")
print("📋 Tool Name:", search_web_extract_info.name)
print("📝 Tool Description:", search_web_extract_info.description[:100] + "...")

In [None]:
# ============================================================================
# WEB SEARCH TOOL: Demonstration
# ============================================================================
# Test the web search tool (this may take a few seconds)

docs = search_web_extract_info.invoke('OpenAI GPT-4o')

print(f"\n📚 Extracted {len(docs)} documents")
if docs:
    print("\n📄 First document preview:")
    print("=" * 60)
    print(docs[0][:500] + "...")

### 2.4 🌤️ Building a Weather Tool

Let's create a tool that fetches real-time weather data using the OpenWeatherMap API.
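
Because the city name ends up in a URL query string, it should be percent-encoded so that names containing spaces are sent correctly. The sketch below builds the request URL for the endpoint used in this section with `urllib.parse.urlencode`. `build_weather_url` is a hypothetical helper, `DEMO_KEY` is a placeholder rather than a working key, and the `IN` default simply mirrors the country code this notebook's tool uses.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city: str, api_key: str, country: str = "IN") -> str:
    """Return the request URL with every query parameter percent-encoded."""
    params = {"q": f"{city},{country}", "appid": api_key, "units": "metric"}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_weather_url("New Delhi", api_key="DEMO_KEY")
print(url)
```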

In [None]:
# ============================================================================
# CUSTOM TOOLS: Weather API Tool
# ============================================================================
import requests
import rich

# Get the Weather API key from environment
WEATHER_API_KEY = os.getenv('WEATHER_API_KEY')

@tool
def get_weather(query: str) -> dict:
    """
    Get the current weather for a city in India using the OpenWeatherMap API.

    Args:
        query: The name of the city to get weather for (e.g., 'Bangalore', 'Mumbai')

    Returns:
        dict: Weather data including temperature, humidity, and conditions,
              or an error message if the city is not found
    """
    # The country code is fixed to IN, so only Indian cities resolve.
    # Passing params lets requests URL-encode names such as 'New Delhi'.
    response = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": f"{query},IN", "appid": WEATHER_API_KEY, "units": "metric"},
        timeout=10,
    )
    data = response.json()

    if data.get("name"):
        return data
    else:
        return {"error": "Weather Data Not Found", "city": query}

# Test the weather tool
print("🌤️ Testing Weather Tool:")
result = get_weather.invoke("Bangalore")
rich.print_json(data=result)

---
## 🤖 Part 3: LLM Tool Calling

Now comes the exciting part! **Tool calling** (also known as function calling) is the ability for an LLM to:
1. Understand available tools from their descriptions
2. Decide which tool(s) to use based on user input
3. Generate the correct arguments for the tool
4. Receive the tool's output once your code has run the tool, and incorporate it into its response

### Key Insight:
> The LLM doesn't actually execute the tools - it generates the tool calls (name + arguments). 
> Your code is responsible for actually running the tools!
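
As a plain-Python sketch of that division of labor, the snippet below takes a list of tool-call requests, runs the matching functions, and builds one result message per request. The registry, the sample requests, and the message layout are assumptions for illustration. The `name` and `args` keys mirror the tool-call dicts used later in this notebook, and the `id` field is included so each result can be paired with its request.

```python
import json


def multiply_numbers(a: float, b: float) -> float:
    return a * b


# Hypothetical registry of plain functions standing in for real tools.
REGISTRY = {"multiply": multiply_numbers}

# What a model might return: requests to call tools, not results.
model_tool_calls = [
    {"name": "multiply", "args": {"a": 2.1, "b": 3.5}, "id": "call_1"},
    {"name": "unknown_tool", "args": {}, "id": "call_2"},
]


def run_tool_calls(tool_calls: list) -> list:
    """Execute each requested call and return one result message per call."""
    messages = []
    for call in tool_calls:
        func = REGISTRY.get(call["name"])
        if func is None:
            content = f"error: no tool named {call['name']!r}"
        else:
            content = json.dumps(func(**call["args"]))
        messages.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return messages


tool_messages = run_tool_calls(model_tool_calls)
for message in tool_messages:
    print(message)
```

Reporting an unknown tool name as an error message, rather than raising, lets the model see the problem and try again on the next turn.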

### 3.1 🔗 Native Tool Calling (Recommended)

Most modern LLMs (OpenAI, Anthropic, Gemini, etc.) have native support for tool calling. This is the recommended approach as it's more reliable and efficient.

In [None]:
# ============================================================================
# TOOL CALLING: Binding Tools to an LLM
# ============================================================================
# We create an LLM with tools "bound" to it
# This tells the LLM what tools are available and how to use them

tools = [multiply, search_web_extract_info, get_weather]
llm_with_tools = llm.bind_tools(tools)

print("✅ LLM bound with tools:")
for t in tools:
    print(f"   - {t.name}: {t.description[:50]}...")

In [None]:
# ============================================================================
# TOOL CALLING: Let the LLM Decide Which Tools to Use
# ============================================================================
from langchain_core.messages import HumanMessage, ToolMessage
from pprint import pprint

# Create a prompt that requires multiple tools
prompt = """
Given only the tools at your disposal, mention tool calls for the following tasks:
Do not change the query given for any search tasks
1. What is 2.1 times 3.5
2. What is the current weather in Bangalore today
3. What are the 4 major Agentic AI Design Patterns
"""

# Invoke the LLM - it returns an AIMessage whose tool_calls attribute lists the requested calls
results = llm_with_tools.invoke(prompt)

print("🤖 LLM decided to call these tools:")
print("=" * 60)
pprint(results.tool_calls)

In [None]:
# ============================================================================
# TOOL CALLING: Execute the Tool Calls
# ============================================================================
# Now we actually run the tools that the LLM requested

# Create a mapping of tool names to tool functions
toolkit = {
    "multiply": multiply,
    "search_web_extract_info": search_web_extract_info,
    "get_weather": get_weather
}

print("🔧 Executing tool calls:")
print("=" * 60)

for tool_call in results.tool_calls:
    tool_name = tool_call["name"].lower()
    selected_tool = toolkit[tool_name]
    
    print(f"\n📞 Calling tool: {tool_call['name']}")
    print(f"   Arguments: {tool_call['args']}")
    
    tool_output = selected_tool.invoke(tool_call["args"])
    
    # Pretty print the output (truncate if too long)
    output_str = str(tool_output)
    if len(output_str) > 200:
        print(f"   Result: {output_str[:200]}...")
    else:
        print(f"   Result: {tool_output}")
    print("-" * 40)

### 3.2 📝 Prompt-Based Tool Calling (For LLMs Without Native Support)

Some older or open-source LLMs don't have native tool calling support. For these models, we can use a **prompt engineering** approach to get the LLM to output tool calls in a structured format (like JSON).
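
With this approach the model's reply is free text, so the parsing step has to cope with extra chatter or Markdown fences around the JSON. The sketch below shows one stdlib-only way to do that. `parse_tool_call` is a hypothetical helper, and the `name` and `arguments` keys are the ones this section's prompt asks the model to produce.

```python
import json
import re


def parse_tool_call(raw: str) -> dict:
    """Extract a {'name', 'arguments'} dict from model text, or raise ValueError."""
    # Take the outermost {...} span, which drops fences and chatter around it.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(call.get("name"), str) or not isinstance(call.get("arguments"), dict):
        raise ValueError("expected 'name' (string) and 'arguments' (object) keys")
    return call


fence = "`" * 3  # built this way to avoid a literal fence inside this example
raw_reply = (
    "Sure!\n" + fence + "json\n"
    '{"name": "multiply", "arguments": {"a": 2.1, "b": 3.5}}\n' + fence
)
call = parse_tool_call(raw_reply)
print(call)
```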

In [None]:
# ============================================================================
# PROMPT-BASED TOOL CALLING: Setup
# ============================================================================
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import render_text_description

# Render tool descriptions as plain text for the prompt
rendered_tools = render_text_description(tools)
print("📋 Tool descriptions for the prompt:")
print("=" * 60)
print(rendered_tools)

In [None]:
# ============================================================================
# PROMPT-BASED TOOL CALLING: Create the Prompt Template
# ============================================================================
# This prompt instructs the LLM to output tool calls as JSON

system_prompt = f"""\
You are an assistant that has access to the following set of tools.
Here are the names and descriptions for each tool:

{rendered_tools}

Given the user instructions, for each instruction do the following:
 - Return the name and input of the tool to use.
 - Return your response as a JSON blob with 'name' and 'arguments' keys.
 - The `arguments` should be a dictionary, with keys corresponding
   to the argument names and the values corresponding to the requested values.
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}")
])

print("✅ Prompt template created!")

In [None]:
# ============================================================================
# PROMPT-BASED TOOL CALLING: Create the Chain and Execute
# ============================================================================

# Create a chain: Prompt → LLM → JSON Parser
chain = prompt | llm | JsonOutputParser()

# Define our instructions (each will result in a tool call)
instructions = [
    {"input": "What is 2.1 times 3.5"},
    {"input": "What is the current weather in Mumbai"},
    {"input": "Tell me about the current state of Agentic AI in the industry"}
]

# Run all instructions in parallel using map()
responses = chain.map().invoke(instructions)

print("🤖 LLM generated these tool calls:")
print("=" * 60)
for i, resp in enumerate(responses, 1):
    print(f"\n{i}. Tool: {resp['name']}")
    print(f"   Args: {resp['arguments']}")

In [None]:
# ============================================================================
# PROMPT-BASED TOOL CALLING: Execute the Tools
# ============================================================================

print("🔧 Executing tool calls:")
print("=" * 60)

for tool_call in responses:
    tool_name = tool_call["name"].lower()
    selected_tool = toolkit[tool_name]
    
    print(f"\n📞 Calling tool: {tool_call['name']}")
    tool_output = selected_tool.invoke(tool_call["arguments"])
    
    # Pretty print the output (truncate if too long)
    output_str = str(tool_output)
    if len(output_str) > 300:
        print(f"   Result: {output_str[:300]}...")
    else:
        print(f"   Result: {tool_output}")

---
## 📝 Summary

In this notebook, we learned:

### 1. Built-in Tools
- **WikipediaQueryRun**: Query Wikipedia for information
- **TavilySearchResults**: Advanced web search optimized for LLMs
- Tools can be customized with your own name and description

### 2. Custom Tools
- Use the `@tool` decorator for simple tools
- Use `StructuredTool` with Pydantic for type-safe tools
- Tools should have clear descriptions for LLM understanding

### 3. Tool Calling
- **Native**: Use `llm.bind_tools()` for LLMs with built-in support
- **Prompt-based**: Use prompt engineering for other LLMs
- The LLM decides which tools to use and generates arguments
- Your code is responsible for executing the actual tools

### Next Steps
- Move on to the next notebook to see how to build complete **Tool-Calling Agents**
- Learn about the **Agent Loop** and how agents handle multi-step tasks