# Module 4: External Tools & Integrations with LangChain 1.0

**Building on Previous Modules:**
- Module 1: Built basic agents
- Module 2: Learned LangGraph workflows
- Module 3: Mastered the `@tool` decorator
- Module 4: **Connect to the real world with external integrations!**

**What you'll learn:**
- 📊 Load data from CSV files (employee records)
- 📄 Process PDF documents (HR policies, handbooks)
- 🔍 Perform web searches (external information)
- 🤖 Build intelligent document Q&A systems
- 🔧 Combine multiple data sources in one agent

**Real HR Use Case:**
Build an intelligent HR assistant that can:
- Query employee database (CSV)
- Answer policy questions (PDF)
- Find external HR best practices (Web Search)
- Provide comprehensive, multi-source answers

**Time:** 2-3 hours

## Setup: Install Dependencies

In [None]:
# Install LangChain 1.0 and integration packages
!pip install --pre -U langchain langchain-openai langgraph langchain-community langchain-text-splitters
!pip install pypdf  # For PDF processing
!pip install faiss-cpu  # For vector storage
!pip install -U ddgs  # For web search (free, no API key needed!); formerly published as duckduckgo-search

## Setup: Configure API Keys

In [None]:
from google.colab import userdata
import os

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
print("✅ API Keys configured!")

---
# Lab 1: CSV Loader - Employee Database 📊

**Objective:** Load and query employee data from CSV files.

**Why CSV Loaders?**
- Most HR systems export to CSV
- Easy to update and maintain
- Perfect for structured employee data

## Part 1: Create Sample Employee CSV

In [None]:
import csv

# Create sample employee data matching our HR use case
employee_data = [
    ["employee_id", "name", "department", "position", "email", "phone", "hire_date", "salary", "leave_balance"],
    ["101", "Priya Sharma", "Engineering", "Senior Developer", "priya.sharma@company.com", "+91-9876543210", "2020-03-15", "₹12,00,000", "12"],
    ["102", "Rahul Verma", "Engineering", "Manager", "rahul.verma@company.com", "+91-9876543211", "2018-06-20", "₹18,00,000", "8"],
    ["103", "Anjali Patel", "HR", "Director", "anjali.patel@company.com", "+91-9876543212", "2015-01-10", "₹25,00,000", "15"],
    ["104", "Arjun Reddy", "Sales", "Team Lead", "arjun.reddy@company.com", "+91-9876543213", "2019-09-05", "₹15,00,000", "10"],
    ["105", "Sneha Gupta", "Marketing", "Specialist", "sneha.gupta@company.com", "+91-9876543214", "2021-11-22", "₹10,00,000", "5"],
    ["106", "Karan Singh", "Engineering", "Junior Developer", "karan.singh@company.com", "+91-9876543215", "2023-02-14", "₹8,00,000", "20"],
    ["107", "Pooja Reddy", "HR", "Recruiter", "pooja.reddy@company.com", "+91-9876543216", "2022-07-01", "₹9,00,000", "18"],
    ["108", "Vikram Patel", "Sales", "Executive", "vikram.patel@company.com", "+91-9876543217", "2020-12-10", "₹11,00,000", "14"],
]

# Write to CSV file
with open('employees.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(employee_data)

print("✅ Employee CSV created!")
print(f"Total employees: {len(employee_data) - 1}")
print("\nFile: employees.csv")

## Part 2: Load CSV with LangChain CSVLoader

In [None]:
from langchain_community.document_loaders import CSVLoader

# Load the CSV file
loader = CSVLoader(
    file_path='employees.csv',
    source_column='employee_id',  # Use employee_id as the source
    encoding='utf-8'
)

documents = loader.load()

print(f"✅ Loaded {len(documents)} employee records\n")
print("Sample Document:")
print("=" * 70)
print(f"Content:\n{documents[0].page_content}")
print(f"\nMetadata: {documents[0].metadata}")
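
To make the loader's output less of a black box, here is a rough standard-library approximation of what each loaded record looks like: one document per row, with `column: value` lines as content and the `source_column` value stored in the metadata. `SimpleDocument` and `rows_to_documents` are hypothetical names used only for this sketch; they are not part of LangChain, and the real `CSVLoader` has more options.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(csv_text: str, source_column: str) -> list[SimpleDocument]:
    """Turn each CSV row into one document made of 'column: value' lines."""
    docs = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        # Assumption: metadata carries the source value and the row index.
        docs.append(SimpleDocument(content, {"source": row[source_column], "row": index}))
    return docs


sample = (
    "employee_id,name,department\n"
    "101,Priya Sharma,Engineering\n"
    "102,Rahul Verma,Engineering\n"
)
docs = rows_to_documents(sample, source_column="employee_id")
print(docs[0].page_content)
print(docs[0].metadata)
```

Because every row becomes its own document, each employee can later be retrieved or cited individually by its `source` value.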

## Part 3: Create a @tool for CSV Querying

In [None]:
from langchain_core.tools import tool
from typing import Annotated
import csv

@tool
def search_employee_database(query: Annotated[str, "Search query for employee information (name, ID, department, etc.)"]) -> str:
    """Search the employee database by name, ID, department, or position.
    Returns matching employee information."""
    
    query_lower = query.lower()
    results = []
    
    with open('employees.csv', 'r', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Search across multiple fields
            searchable_text = ' '.join([
                row.get('employee_id', ''),
                row.get('name', ''),
                row.get('department', ''),
                row.get('position', '')
            ]).lower()
            
            if query_lower in searchable_text:
                results.append(
                    f"ID: {row['employee_id']} | {row['name']} | "
                    f"{row['department']} - {row['position']} | "
                    f"Leave: {row['leave_balance']} days | "
                    f"Email: {row['email']}"
                )
    
    if results:
        return "\n".join(results)
    return f"No employees found matching '{query}'"

# Test the tool
print("Testing Employee Search Tool:")
print("=" * 70)
result = search_employee_database.invoke({"query": "Engineering"})
print(result)
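
The tool above matches the whole query as a single substring, so a short term like "Engineering" works but a full sentence will not match any record. One possible variation, sketched here with the standard library only, is to match on individual words instead. `search_by_terms` and `SAMPLE_CSV` are hypothetical names for this illustration, and the word-overlap rule is an assumption, not the tutorial's method.

```python
import csv
import io

SAMPLE_CSV = """employee_id,name,department,position
101,Priya Sharma,Engineering,Senior Developer
102,Rahul Verma,Engineering,Manager
103,Anjali Patel,HR,Director
"""


def search_by_terms(csv_text: str, query: str) -> list[str]:
    """Return names of employees sharing at least one whole word with the query."""
    # Strip surrounding punctuation so "have?" becomes "have".
    terms = {word.strip("?,.!'\"").lower() for word in query.split()}
    terms.discard("")
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row_words = set(" ".join(row.values()).lower().split())
        if terms & row_words:  # any shared word counts as a match
            matches.append(row["name"])
    return matches


print(search_by_terms(SAMPLE_CSV, "How many leave days does Rahul have?"))
print(search_by_terms(SAMPLE_CSV, "Engineering"))
```

Word matching is more forgiving of natural-language input, at the cost of false positives when a common word in the question (such as "manager") also appears in a record.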

---
# Lab 2: PDF Loader - HR Policy Documents 📄

**Objective:** Process PDF documents and make them queryable.

**Why PDF Loaders?**
- Company policies are usually in PDF
- Employee handbooks, contracts
- Need to answer questions from these documents

## Part 1: Create Sample HR Policy PDF

In [None]:
# Install reportlab first so the imports below succeed
!pip install -q reportlab

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet

# Create HR Policy PDF
doc = SimpleDocTemplate("hr_policy.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []

# Title
story.append(Paragraph("COMPANY HR POLICIES", styles['Title']))
story.append(Spacer(1, 12))

# Policy content
policies = [
    ("Leave Policy", "Employees are entitled to 20 days of paid leave per year. Leave must be approved by the immediate supervisor at least 2 weeks in advance for planned absences. Sick leave requires medical documentation for absences exceeding 3 consecutive days. Unused leave can be carried forward up to 5 days to the next year."),
    ("Work From Home Policy", "Employees can work from home up to 2 days per week with manager approval. Remote work requires stable internet connection and dedicated workspace. Core working hours (10 AM - 4 PM) must be maintained. All communication tools must be active during working hours."),
    ("Performance Review", "Performance reviews are conducted bi-annually in June and December. Reviews assess goal achievement, skill development, and team contribution. Salary increments are based on performance ratings. Employees can request additional feedback sessions with managers at any time."),
    # "INR" is used instead of the rupee sign, which ReportLab's built-in fonts may not render
    ("Benefits", "All full-time employees receive health insurance, life insurance, and provident fund benefits. Health insurance covers employee and immediate family. Annual health check-ups are provided. Education reimbursement up to INR 50,000 per year for job-related courses."),
    ("Code of Conduct", "Employees must maintain professional behavior and respect colleagues. Discrimination or harassment of any kind is strictly prohibited. Company resources should be used responsibly. Confidential information must not be shared externally."),
]

for title, content in policies:
    story.append(Paragraph(f"<b>{title}</b>", styles['Heading2']))
    story.append(Spacer(1, 6))
    story.append(Paragraph(content, styles['BodyText']))
    story.append(Spacer(1, 12))

doc.build(story)
print("✅ HR Policy PDF created: hr_policy.pdf")

## Part 2: Load and Process PDF

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter  # LangChain 1.0: splitters live in their own package

# Load PDF
pdf_loader = PyPDFLoader("hr_policy.pdf")
pdf_documents = pdf_loader.load()

print(f"✅ Loaded {len(pdf_documents)} pages from PDF\n")
print("First page content:")
print("=" * 70)
print(pdf_documents[0].page_content[:500])

# Split into chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
pdf_chunks = text_splitter.split_documents(pdf_documents)
print(f"\n✅ Split into {len(pdf_chunks)} chunks for processing")
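
To see how `chunk_size` and `chunk_overlap` interact, the following is a deliberately simplified sliding-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers paragraph and sentence boundaries); `chunk_text` is a hypothetical helper that only illustrates the two parameters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


policy = "Employees are entitled to 20 days of paid leave per year. " * 20
chunks = chunk_text(policy, chunk_size=500, chunk_overlap=50)

print(f"{len(chunks)} chunks")
# The tail of one chunk is repeated at the head of the next one.
print(chunks[0][-50:] == chunks[1][:50])
```

The repeated tail is what keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.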

## Part 3: Create Vector Store for PDF Q&A

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(pdf_chunks, embeddings)

print("✅ Vector store created from HR policy PDF")
print(f"Indexed {len(pdf_chunks)} text chunks")

# Test retrieval
query = "What is the leave policy?"
results = vectorstore.similarity_search(query, k=2)

print(f"\nTest Query: '{query}'")
print("=" * 70)
print("Top Result:")
print(results[0].page_content)
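
Conceptually, `similarity_search` embeds the query and returns the chunks whose vectors are closest to it. The sketch below imitates that idea with word counts and cosine similarity from the standard library. It is only a toy: real embeddings are dense vectors from a model, FAISS uses its own index structures and distance metric, and `embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


policy_chunks = [
    "Employees are entitled to 20 days of paid leave per year.",
    "Employees can work from home up to 2 days per week with manager approval.",
    "Performance reviews are conducted bi-annually in June and December.",
]
print(top_k("How many days of paid leave do employees get?", policy_chunks, k=1))
```

Unlike this word-overlap toy, model embeddings can also match paraphrases that share no words with the chunk.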

## Part 4: Create @tool for Policy Questions

In [None]:
@tool
def query_hr_policies(question: Annotated[str, "Question about company HR policies"]) -> str:
    """Answer questions about company HR policies including leave, work from home, 
    performance reviews, benefits, and code of conduct."""
    
    # Retrieve relevant policy sections
    relevant_docs = vectorstore.similarity_search(question, k=3)
    
    # Combine context
    context = "\n\n".join([doc.page_content for doc in relevant_docs])
    
    # Use LLM to answer based on context
    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    
    prompt = f"""Based on the following HR policy information, answer the question.

Policy Information:
{context}

Question: {question}

Answer concisely based only on the policy information provided:"""
    
    response = llm.invoke(prompt)
    return response.content

# Test the tool
print("Testing HR Policy Query Tool:")
print("=" * 70)
result = query_hr_policies.invoke({"question": "How many work from home days are allowed?"})
print(result)

---
# Lab 3: Web Search Integration 🔍

**Objective:** Add web search capability for external information.

**Why Web Search?**
- Find latest HR best practices
- Research industry standards
- Get current information not in company docs

## Part 1: Set up DuckDuckGo Search (Free!)

In [None]:
from langchain_community.tools import DuckDuckGoSearchRun

# Create search tool
search = DuckDuckGoSearchRun()

# Test it
print("Testing Web Search:")
print("=" * 70)
result = search.run("best HR practices for remote work 2025")
print(result[:500])  # First 500 chars

## Part 2: Create Custom Search Tool

In [None]:
@tool
def search_hr_best_practices(topic: Annotated[str, "HR topic to search for (e.g., 'remote work policies', 'employee benefits')"]) -> str:
    """Search the web for HR best practices and industry standards on a specific topic.
    Useful when company policies don't have information or when you need external references."""
    
    search_tool = DuckDuckGoSearchRun()
    query = f"HR best practices {topic} 2025"
    
    try:
        results = search_tool.run(query)
        return f"Web search results for '{topic}':\n\n{results[:1000]}"  # Limit to 1000 chars
    except Exception as e:
        return f"Search unavailable: {str(e)}"

# Test
print("Testing HR Best Practices Search:")
print("=" * 70)
result = search_hr_best_practices.invoke({"topic": "employee wellness programs"})
print(result[:600])
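
Free search endpoints can fail intermittently, for example when requests are rate limited. One way to make the `try`/`except` in the tool a little more forgiving is to retry with exponential backoff before giving up. This is a minimal sketch: `run_with_retries` and `flaky_search` are hypothetical helpers, and the attempt count and delays are arbitrary assumptions.

```python
import time
from typing import Callable


def run_with_retries(search_fn: Callable[[str], str], query: str,
                     attempts: int = 3, base_delay: float = 1.0) -> str:
    """Call search_fn(query), retrying with exponential backoff on any error."""
    last_error = None
    for attempt in range(attempts):
        try:
            return search_fn(query)
        except Exception as error:  # broad on purpose: the tool must not crash the agent
            last_error = error
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ... base_delay
    return f"Search unavailable after {attempts} attempts: {last_error}"


calls = {"count": 0}


def flaky_search(query: str) -> str:
    """Hypothetical search function that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return f"results for {query}"


print(run_with_retries(flaky_search, "remote work policies", base_delay=0.01))
```

Inside the tool, `search_fn` would be the search tool's `run` method. Returning an error string rather than raising lets the agent continue with its other sources.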

---
# Lab 4: Building the Complete HR Assistant 🤖

**Objective:** Combine all tools into one intelligent HR agent.

**The agent can:**
- ✅ Search employee database (CSV)
- ✅ Answer policy questions (PDF)
- ✅ Find external best practices (Web)
- ✅ Provide comprehensive, multi-source answers

## Create the Multi-Source HR Agent

In [None]:
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

# Collect all our tools
hr_tools = [
    search_employee_database,
    query_hr_policies,
    search_hr_best_practices
]

# Create the agent
hr_agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=hr_tools,
    system_prompt="""You are an intelligent HR assistant with access to:
    1. Employee database (search_employee_database)
    2. Company HR policies (query_hr_policies)
    3. External HR best practices (search_hr_best_practices)
    
    When answering questions:
    - First check if you need employee-specific information
    - Then consult company policies for internal guidelines
    - Use web search only for external best practices or topics not covered internally
    - Provide comprehensive answers citing your sources
    
    Be helpful, professional, and cite which source you're using."""
)

print("✅ Comprehensive HR Agent created!")
print(f"\nAvailable tools: {len(hr_tools)}")
for hr_tool in hr_tools:  # avoid shadowing the imported @tool decorator
    print(f"  - {hr_tool.name}")

## Test the Complete System

In [None]:
def ask_hr_agent(question: str):
    """Helper function to ask the HR agent a question."""
    print(f"\n{'='*70}")
    print(f"Question: {question}")
    print(f"{'='*70}")
    
    result = hr_agent.invoke({
        "messages": [{"role": "user", "content": question}]
    })
    
    print(f"\n{result['messages'][-1].content}")
    print(f"\n{'='*70}")

# Test 1: Employee lookup
ask_hr_agent("Who are the employees in the Engineering department and how much leave do they have?")

# Test 2: Policy question
ask_hr_agent("What is our company's work from home policy?")

# Test 3: Combined query
ask_hr_agent("How many leave days does Priya Sharma have? Is this in line with our company policy?")

# Test 4: External best practices
ask_hr_agent("What are the current best practices for hybrid work policies in 2025?")

---
# Lab 5: Advanced Integration - Combining with LangGraph 🔧

**Objective:** Build a workflow that uses multiple document sources strategically.

In [None]:
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class HRQueryState(TypedDict):
    """State for HR query workflow."""
    query: str
    employee_info: str
    policy_info: str
    external_info: str
    final_answer: str
    messages: Annotated[list, add_messages]

# Define workflow nodes
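def check_employee_data(state: HRQueryState):
    """Look up any employees whose first name appears in the query."""
    query = state['query'].lower()
    # The search tool does substring matching, so pass a short term (a name)
    # instead of the full question, which would never match a record.
    known_names = ['priya', 'rahul', 'anjali', 'arjun', 'sneha', 'karan', 'pooja', 'vikram']
    matched_names = [name for name in known_names if name in query]

    if matched_names:
        results = [search_employee_database.invoke({"query": name}) for name in matched_names]
        return {"employee_info": "\n".join(results)}
    return {"employee_info": "Not needed"}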

def check_policies(state: HRQueryState):
    """Check company policies."""
    query = state['query'].lower()
    needs_policy = any(keyword in query for keyword in 
                      ['policy', 'leave', 'work from home', 'benefits', 'review'])
    
    if needs_policy:
        result = query_hr_policies.invoke({"question": state['query']})
        return {"policy_info": result}
    return {"policy_info": "Not needed"}

def check_external(state: HRQueryState):
    """Check external best practices if needed."""
    query = state['query'].lower()
    needs_external = any(keyword in query for keyword in 
                        ['best practice', 'industry', 'standard', 'current', '2025'])
    
    if needs_external:
        result = search_hr_best_practices.invoke({"topic": state['query']})
        return {"external_info": result[:500]}  # Limit length
    return {"external_info": "Not needed"}

def synthesize_answer(state: HRQueryState):
    """Combine all information into final answer."""
    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini")
    
    prompt = f"""Based on the following information, answer the user's question comprehensively.

User Question: {state['query']}

Employee Information: {state.get('employee_info', 'None')}

Company Policy: {state.get('policy_info', 'None')}

External Best Practices: {state.get('external_info', 'None')}

Provide a clear, comprehensive answer citing relevant sources:"""
    
    response = llm.invoke(prompt)
    return {
        "final_answer": response.content,
        "messages": [("assistant", response.content)]
    }

# Build workflow
workflow = StateGraph(HRQueryState)
workflow.add_node("check_employee", check_employee_data)
workflow.add_node("check_policy", check_policies)
workflow.add_node("check_external", check_external)
workflow.add_node("synthesize", synthesize_answer)

workflow.add_edge(START, "check_employee")
workflow.add_edge("check_employee", "check_policy")
workflow.add_edge("check_policy", "check_external")
workflow.add_edge("check_external", "synthesize")
workflow.add_edge("synthesize", END)

hr_workflow = workflow.compile()
print("✅ HR Workflow created!")

## Test the Workflow

In [None]:
# Test the workflow
test_query = "How many leave days does Rahul Verma have? Is this aligned with company policy and industry standards?"

result = hr_workflow.invoke({
    "query": test_query,
    "employee_info": "",
    "policy_info": "",
    "external_info": "",
    "final_answer": "",
    "messages": []
})

print(f"\n{'='*70}")
print(f"Query: {test_query}")
print(f"{'='*70}")
print(f"\nFinal Answer:\n{result['final_answer']}")
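
The three `check_*` nodes all apply the same idea: a keyword list decides whether a source is consulted. Pulled out of the graph, that routing logic can be sketched and tested on its own. `ROUTING_KEYWORDS` and `plan_sources` are hypothetical names; the keyword lists are copied from the node functions in this lab.

```python
ROUTING_KEYWORDS = {
    "employee": ["employee", "priya", "rahul", "anjali", "who", "staff"],
    "policy": ["policy", "leave", "work from home", "benefits", "review"],
    "external": ["best practice", "industry", "standard", "current", "2025"],
}


def plan_sources(query: str) -> list[str]:
    """Return the sources whose keywords appear (as substrings) in the query."""
    query_lower = query.lower()
    return [
        source
        for source, keywords in ROUTING_KEYWORDS.items()
        if any(keyword in query_lower for keyword in keywords)
    ]


print(plan_sources(
    "How many leave days does Rahul Verma have? "
    "Is this aligned with company policy and industry standards?"
))
print(plan_sources("What is our work from home policy?"))
```

Substring checks are cheap but blunt: a word such as "whole" contains "who" and would trigger the employee lookup, which is one reason the agent in Lab 4 lets the model choose tools instead.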

---
# Summary & Key Learnings

## What You Built:

1. **CSV Loader** 📊
   - Loaded employee database
   - Created searchable employee records
   - Built @tool for employee queries

2. **PDF Loader** 📄
   - Processed HR policy documents
   - Created vector store for Q&A
   - Built @tool for policy questions

3. **Web Search** 🔍
   - Integrated DuckDuckGo search
   - Created tool for external best practices
   - No API key needed!

4. **Complete HR Agent** 🤖
   - Combined all data sources
   - Intelligent tool selection
   - Multi-source answers

5. **LangGraph Workflow** 🔧
   - Strategic data retrieval
   - Sequential retrieval steps (employee, policy, external)
   - Comprehensive synthesis

## Integration Patterns:

| Data Source | Loader / Tool | Use Case | Best For |
|-------------|---------------|----------|----------|
| CSV | CSVLoader | Structured data | Employee records, sales data |
| PDF | PyPDFLoader | Documents | Policies, contracts, reports |
| Web | DuckDuckGoSearchRun | External info | Best practices, news |
| Google Drive | GoogleDriveLoader | Cloud docs | Shared documents |

## Best Practices:

1. **Choose the Right Loader**
   - CSV for structured tabular data
   - PDF for formatted documents
   - Web search for current external information

2. **Optimize Vector Stores**
   - Chunk documents appropriately (300-500 chars)
   - Use overlap for context preservation
   - Index strategically

3. **Tool Design**
   - Clear, specific purposes
   - Good docstrings for LLM understanding
   - Handle errors gracefully

4. **Multi-Source Strategy**
   - Internal data first (CSV, PDF)
   - External when needed (Web)
   - Synthesize comprehensively

## Next Steps:

- Explore more loaders: Google Docs, Notion, Confluence
- Add more advanced retrieval (for example, combining keyword and vector search, or re-ranking results)
- Implement caching for performance
- Build specialized agents for different departments
- Deploy as a production service

---

**More Integrations to Explore:**
- Slack integration for team communication
- Google Drive for document access
- Notion for knowledge bases
- SQL databases for live data
- APIs for real-time information

Check the [LangChain Integrations](https://docs.langchain.com/oss/python/integrations/providers/all_providers) for 200+ more!

# Exercises

## Exercise 1: Add More Employee Data
Create a second CSV with department information (budget, headcount, manager) and create a tool to query it.

## Exercise 2: PDF Processing
Create an employee handbook PDF and add a tool to search through it.

## Exercise 3: Multi-Document Q&A
Build a system that can answer questions requiring information from both CSV and PDF.

## Exercise 4: Advanced Search
Implement a tool that searches the web for competitive salary information for specific roles.

## Bonus: Build a Complete HR Portal
Combine all learnings from Modules 1-4 to build a comprehensive HR system with:
- Employee management (CSV)
- Policy Q&A (PDF)
- External benchmarking (Web)
- Workflow automation (LangGraph)
- Professional tools (@tool decorator)