# Day 1, Session 3: Building Agent Workflows with LangGraph

## From Simple Chains to Complex Workflows

In the previous session, we built a ReAct agent from scratch. Now we'll use LangGraph to build a more sophisticated workflow with explicit state management, conditional routing, and parallel execution. These are patterns that production agent systems commonly rely on.

### What is LangGraph?

LangGraph is a framework for building stateful, multi-step applications with LLMs. It provides:
- **State Management**: Track data throughout the workflow
- **Graph Structure**: Define complex agent flows
- **Conditional Logic**: Route based on outputs
- **Parallel Execution**: Run multiple steps simultaneously
- **Checkpointing**: Save and resume workflows

### Our Goal

Build a production-ready invoice processing workflow that:
1. Extracts invoice data
2. Validates multiple aspects in parallel
3. Makes approval decisions
4. Handles edge cases gracefully

In [None]:
# Download real invoice and receipt images first
import requests
import zipfile
import io
import os

# Dropbox shared link for the folder
dropbox_url = "https://www.dropbox.com/scl/fo/m9hyfmvi78snwv0nh34mo/AMEXxwXMLAOeve-_yj12ck8?rlkey=urinkikgiuven0fro7r4x5rcu&st=hv3of7g7&dl=1"

print(f"Downloading real invoice data from: {dropbox_url}")

try:
    response = requests.get(dropbox_url, timeout=60)  # Fail instead of hanging on a stalled download
    response.raise_for_status()

    # Read the content as a zip file
    with zipfile.ZipFile(io.BytesIO(response.content)) as z:
        # Extract all contents to a directory named 'downloaded_images'
        z.extractall("downloaded_images")

    print("✅ Downloaded and extracted images to 'downloaded_images' folder.")
    
    # List downloaded files
    for root, dirs, files in os.walk("downloaded_images"):
        for file in files:
            print(f"  📄 {os.path.join(root, file)}")

except Exception as e:
    print(f"❌ Error downloading images: {e}")

# Install required packages
!pip install -q langgraph langchain langchain-community

# Configuration for course LLM server
OLLAMA_URL = "http://XX.XX.XX.XX"  # Instructor provides
API_TOKEN = "YOUR_TOKEN_HERE"
MODEL = "qwen3:8b"

## Step 1: Create State Definition

State is the core concept in LangGraph. It holds all data that flows through your workflow.

In [None]:
from typing import TypedDict, List, Optional, Dict, Any
from dataclasses import dataclass
from datetime import datetime, timedelta
import json

# Define our workflow state
class InvoiceState(TypedDict):
    # Input
    invoice_id: str
    raw_text: Optional[str]
    
    # Extracted data
    vendor_name: Optional[str]
    amount: Optional[float]
    currency: Optional[str]
    invoice_date: Optional[str]
    due_date: Optional[str]
    payment_terms: Optional[str]
    vat_number: Optional[str]
    line_items: Optional[List[Dict]]
    
    # Validation results
    vat_valid: Optional[bool]
    vendor_risk_score: Optional[float]
    payment_terms_approved: Optional[bool]
    
    # Workflow metadata
    errors: List[str]
    warnings: List[str]
    approval_status: Optional[str]  # 'approved', 'rejected', 'manual_review'
    processing_time: Optional[float]
    steps_executed: List[str]

print("✅ State structure defined")
print("State tracks:")
print("- Invoice data (vendor, amount, dates)")
print("- Validation results (VAT, risk, terms)")
print("- Workflow metadata (errors, approval status)")
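
Every field except the four bookkeeping entries is filled in later, so the runs in this session start from a partial state. A small factory can keep those starting dictionaries consistent. The helper name `new_invoice_state` is an assumption made for illustration and is not part of LangGraph; this is only a minimal sketch:

```python
from typing import Any, Dict


def new_invoice_state(invoice_id: str) -> Dict[str, Any]:
    """Return a fresh starting state (hypothetical helper, not a LangGraph API).

    Each call creates new list objects, so two runs never share the same
    errors/warnings/steps_executed lists by accident.
    """
    return {
        "invoice_id": invoice_id,
        "errors": [],
        "warnings": [],
        "steps_executed": [],
    }


first = new_invoice_state("INV-2024-001")
second = new_invoice_state("INV-2024-001")
first["errors"].append("example error")
print(second["errors"])  # The second state is unaffected by the first
```

Building the state from a function rather than copying a literal avoids one run's lists leaking into another.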

## Step 2: Create Node Functions

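Each node in our graph is a function that takes the current state and returns an update to it. The nodes below modify the state in place and return the whole dictionary, which works for a sequential graph.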
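
Stripped of the framework, a node is an ordinary function on a dictionary. The sketch below applies two toy nodes in order with a plain loop to show that contract. `run_in_order` and both node names are illustrative assumptions, and LangGraph's real scheduler does considerably more (routing, merging, checkpointing):

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def add_vendor(state: State) -> State:
    """Toy node: fill in a vendor name and record that the step ran."""
    state["vendor_name"] = "TechSupplies Co."
    state["steps_executed"].append("add_vendor")
    return state


def flag_missing_amount(state: State) -> State:
    """Toy node: add a warning when no amount has been extracted."""
    if state.get("amount") is None:
        state["warnings"].append("amount missing")
    state["steps_executed"].append("flag_missing_amount")
    return state


def run_in_order(state: State, nodes: List[Node]) -> State:
    """Apply each node to the state in sequence (illustration only)."""
    for node in nodes:
        state = node(state)
    return state


final = run_in_order(
    {"invoice_id": "INV-2024-001", "warnings": [], "steps_executed": []},
    [add_vendor, flag_missing_amount],
)
print(final["steps_executed"])
print(final["warnings"])
```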

In [None]:
import random
import time

# Node 1: Extract invoice data
def extract_invoice_data(state: InvoiceState) -> InvoiceState:
    """Extract structured data from invoice"""
    print("📄 Extracting invoice data...")
    
    # Simulate extraction (in production, would use OCR/NLP)
    if state["invoice_id"] == "INV-2024-001":
        state["vendor_name"] = "TechSupplies Co."
        state["amount"] = 15000.00
        state["currency"] = "USD"
        state["invoice_date"] = "2024-01-15"
        state["payment_terms"] = "Net 30"
        state["vat_number"] = "GB123456789"
        state["line_items"] = [
            {"description": "Laptops", "quantity": 5, "unit_price": 2000},
            {"description": "Software Licenses", "quantity": 10, "unit_price": 500}
        ]
    else:
        state["errors"].append(f"Invoice {state['invoice_id']} not found")
    
    state["steps_executed"].append("extract_data")
    return state

# Node 2: Validate VAT
def validate_vat(state: InvoiceState) -> InvoiceState:
    """Validate VAT number"""
    print("🔍 Validating VAT number...")
    
    if state.get("vat_number"):  # .get avoids a KeyError when extraction set no VAT number
        # Simple validation (in production, use VIES API)
        if state["vat_number"].startswith("GB") and len(state["vat_number"]) == 11:
            state["vat_valid"] = True
        else:
            state["vat_valid"] = False
            state["warnings"].append("VAT number format appears invalid")
    
    state["steps_executed"].append("validate_vat")
    return state

# Node 3: Check vendor risk
def check_vendor_risk(state: InvoiceState) -> InvoiceState:
    """Assess vendor risk score"""
    print("📊 Checking vendor risk...")
    
    # Simulate risk scoring
    vendor_scores = {
        "TechSupplies Co.": 0.2,  # Low risk
        "NewVendor Inc.": 0.7,    # High risk
        "Unknown": 0.9            # Very high risk
    }
    
    state["vendor_risk_score"] = vendor_scores.get(
        state["vendor_name"], 
        0.5  # Default medium risk
    )
    
    if state["vendor_risk_score"] > 0.6:
        state["warnings"].append(f"High risk vendor (score: {state['vendor_risk_score']})")
    
    state["steps_executed"].append("check_vendor_risk")
    return state

# Node 4: Verify payment terms
def verify_payment_terms(state: InvoiceState) -> InvoiceState:
    """Check if payment terms align with policy"""
    print("💰 Verifying payment terms...")
    
    # Business rules
    if state["payment_terms"] == "Net 30":
        state["payment_terms_approved"] = True
    elif state["payment_terms"] == "Net 60" and state["amount"] > 10000:
        state["payment_terms_approved"] = True
    elif state["payment_terms"] == "Net 90":
        state["payment_terms_approved"] = False
        state["warnings"].append("Net 90 requires CFO approval")
    else:
        state["payment_terms_approved"] = False
    
    # Calculate due date (use .get: keys that were never set are absent from the state)
    if state.get("invoice_date") and state.get("payment_terms"):
        days = int(state["payment_terms"].split()[1])
        invoice_date = datetime.strptime(state["invoice_date"], "%Y-%m-%d")
        due_date = invoice_date + timedelta(days=days)
        state["due_date"] = due_date.strftime("%Y-%m-%d")
    
    state["steps_executed"].append("verify_payment_terms")
    return state

# Node 5: Make approval decision
def make_approval_decision(state: InvoiceState) -> InvoiceState:
    """Decide whether to approve, reject, or flag for review"""
    print("✅ Making approval decision...")
    
    # Decision logic
    if state["errors"]:
        state["approval_status"] = "rejected"
    elif (state.get("vendor_risk_score") or 0) > 0.8:
        state["approval_status"] = "manual_review"
    elif not state.get("vat_valid"):
        state["approval_status"] = "manual_review"
    elif not state.get("payment_terms_approved"):
        state["approval_status"] = "manual_review"
    else:
        state["approval_status"] = "approved"
    
    state["steps_executed"].append("make_decision")
    return state

print("✅ All node functions created")
print("Nodes: extract → validate (3 checks) → decision")
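
`verify_payment_terms` derives the due date by splitting strings such as "Net 30". A slightly more defensive version of that parsing step could look like the sketch below. `parse_net_days` and `compute_due_date` are hypothetical helper names, and the pattern only covers the "Net N" form used in this session:

```python
import re
from datetime import datetime, timedelta
from typing import Optional


def parse_net_days(payment_terms: str) -> Optional[int]:
    """Return N for terms written as 'Net N', or None for any other format."""
    match = re.fullmatch(r"\s*Net\s+(\d+)\s*", payment_terms, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def compute_due_date(invoice_date: str, payment_terms: str) -> Optional[str]:
    """Add the 'Net N' days to a YYYY-MM-DD date; None if the terms are unparseable."""
    days = parse_net_days(payment_terms)
    if days is None:
        return None
    issued = datetime.strptime(invoice_date, "%Y-%m-%d")
    return (issued + timedelta(days=days)).strftime("%Y-%m-%d")


print(compute_due_date("2024-01-15", "Net 30"))          # 2024-02-14
print(compute_due_date("2024-01-15", "Due on receipt"))  # None
```

Returning `None` for unknown formats lets the calling node add a warning instead of raising in the middle of the workflow.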

## Step 3: Build the Graph

Now we connect our nodes into a workflow graph with conditional routing.

In [None]:
from langgraph.graph import StateGraph, END

# Create the graph
workflow = StateGraph(InvoiceState)

# Add nodes
workflow.add_node("extract", extract_invoice_data)
workflow.add_node("validate_vat", validate_vat)
workflow.add_node("check_risk", check_vendor_risk)
workflow.add_node("verify_terms", verify_payment_terms)
workflow.add_node("decide", make_approval_decision)

# Define conditional routing
def route_after_extraction(state: InvoiceState) -> str:
    """Route based on extraction results"""
    if state["errors"]:
        return "decide"  # Skip validation if extraction failed
    return "continue"

# Set entry point
workflow.set_entry_point("extract")

# Add edges (connections between nodes)
workflow.add_conditional_edges(
    "extract",
    route_after_extraction,
    {
        "continue": "validate_vat",
        "decide": "decide"
    }
)

# Validation steps, chained sequentially here (Step 7 runs them in parallel)
workflow.add_edge("validate_vat", "check_risk")
workflow.add_edge("check_risk", "verify_terms")
workflow.add_edge("verify_terms", "decide")

# End after decision
workflow.add_edge("decide", END)

# Compile the graph
app = workflow.compile()

print("✅ Workflow graph compiled")
print("Flow: Extract → Validate (3 checks) → Decision → End")
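
The dictionary passed to `add_conditional_edges` maps the router's return value to a node name. That lookup can be pictured in plain Python. `resolve_next_node` is an illustrative assumption and says nothing about how LangGraph implements routing internally:

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def route_after_extraction(state: State) -> str:
    """Same rule as in the graph: go straight to the decision if extraction failed."""
    return "decide" if state["errors"] else "continue"


# Router label -> name of the next node, as in the graph definition
PATH_MAP = {"continue": "validate_vat", "decide": "decide"}


def resolve_next_node(
    state: State,
    router: Callable[[State], str],
    path_map: Dict[str, str],
) -> str:
    """Translate the router's label into the name of the next node."""
    label = router(state)
    if label not in path_map:
        raise KeyError(f"Router returned unmapped label: {label!r}")
    return path_map[label]


print(resolve_next_node({"errors": []}, route_after_extraction, PATH_MAP))
print(resolve_next_node({"errors": ["Invoice not found"]}, route_after_extraction, PATH_MAP))
```

Keeping the labels ("continue", "decide") separate from node names means the router does not need to change when a node is renamed.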

## Step 4: Visualize the Workflow

LangGraph can generate a visual representation of our workflow.

In [None]:
# Visualize the graph (draw_png requires the pygraphviz package)
try:
    from IPython.display import Image, display
    display(Image(app.get_graph().draw_png()))
except Exception:
    print("Graphviz not available. Here's the text representation:")
    print("\n📊 Workflow Structure:")
    print("┌─────────┐")
    print("│ Extract │")
    print("└────┬────┘")
    print("     │ (if success)")
    print("┌────▼────┐")
    print("│Validate │")
    print("│  VAT    │")
    print("└────┬────┘")
    print("┌────▼────┐")
    print("│ Check   │")
    print("│  Risk   │")
    print("└────┬────┘")
    print("┌────▼────┐")
    print("│ Verify  │")
    print("│ Terms   │")
    print("└────┬────┘")
    print("┌────▼────┐")
    print("│ Decide  │")
    print("└────┬────┘")
    print("     │")
    print("    END")

## Step 5: Execute the Workflow

Let's process an invoice through our complete workflow.

In [None]:
# Process a valid invoice
print("=" * 60)
print("PROCESSING INVOICE: INV-2024-001")
print("=" * 60)

# Initialize state
initial_state = {
    "invoice_id": "INV-2024-001",
    "errors": [],
    "warnings": [],
    "steps_executed": []
}

# Track execution time
start_time = time.time()

# Run the workflow
result = app.invoke(initial_state)

# Calculate processing time
result["processing_time"] = time.time() - start_time

# Display results
print("\n📋 WORKFLOW RESULTS:")
print("-" * 40)
print(f"Invoice ID: {result['invoice_id']}")
print(f"Vendor: {result.get('vendor_name', 'N/A')}")
print(f"Amount: ${result.get('amount', 0):,.2f} {result.get('currency', '')}")
print(f"Payment Terms: {result.get('payment_terms', 'N/A')}")
print(f"Due Date: {result.get('due_date', 'N/A')}")

print("\n✅ VALIDATION RESULTS:")
print(f"VAT Valid: {result.get('vat_valid', False)}")
print(f"Vendor Risk Score: {result.get('vendor_risk_score', 'N/A')}")
print(f"Payment Terms Approved: {result.get('payment_terms_approved', False)}")

print("\n🎯 FINAL DECISION:")
print(f"Status: {result.get('approval_status', 'Unknown').upper()}")

if result["warnings"]:
    print("\n⚠️ WARNINGS:")
    for warning in result["warnings"]:
        print(f"  - {warning}")

print("\n⏱️ PERFORMANCE:")
print(f"Processing Time: {result['processing_time']:.2f} seconds")
print(f"Steps Executed: {', '.join(result['steps_executed'])}")
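
The sample invoice carries both a total (`amount`) and `line_items`, so one inexpensive sanity check is to compare the two. The functions below are a hypothetical extra check, not a node in the workflow above, and the 0.01 tolerance is an assumption:

```python
from typing import Dict, List


def line_items_total(line_items: List[Dict]) -> float:
    """Sum quantity * unit_price over all line items."""
    return float(sum(item["quantity"] * item["unit_price"] for item in line_items))


def amount_matches_line_items(
    amount: float, line_items: List[Dict], tolerance: float = 0.01
) -> bool:
    """True when the invoice total agrees with its line items within the tolerance."""
    return abs(amount - line_items_total(line_items)) <= tolerance


# Line items from the simulated INV-2024-001 extraction
sample_items = [
    {"description": "Laptops", "quantity": 5, "unit_price": 2000},
    {"description": "Software Licenses", "quantity": 10, "unit_price": 500},
]

print(line_items_total(sample_items))                     # 5*2000 + 10*500 = 15000.0
print(amount_matches_line_items(15000.00, sample_items))  # True
```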

## Step 6: Test Error Handling

Let's see how the workflow handles errors and edge cases.

In [None]:
# Test with non-existent invoice
print("=" * 60)
print("TESTING ERROR HANDLING: Non-existent Invoice")
print("=" * 60)

error_state = {
    "invoice_id": "INV-INVALID-999",
    "errors": [],
    "warnings": [],
    "steps_executed": []
}

error_result = app.invoke(error_state)

print("\n🔍 ERROR HANDLING RESULTS:")
print(f"Approval Status: {error_result.get('approval_status', 'Unknown')}")
print(f"Errors: {error_result['errors']}")
print(f"Steps Executed: {error_result['steps_executed']}")
print("\nNote: Workflow correctly skipped validation when extraction failed!")

# Test with high-risk vendor
print("\n" + "=" * 60)
print("TESTING EDGE CASE: High-Risk Vendor")
print("=" * 60)

# Replace the extract node to simulate a high-risk vendor
def high_risk_extract(state):
    state["vendor_name"] = "Unknown"  # Matches the 0.9 entry in vendor_scores
    state["amount"] = 50000
    state["payment_terms"] = "Net 90"
    state["vat_number"] = "INVALID123"
    state["steps_executed"].append("extract_data")
    return state

# Create modified workflow for testing
test_workflow = StateGraph(InvoiceState)
test_workflow.add_node("extract", high_risk_extract)
test_workflow.add_node("validate_vat", validate_vat)
test_workflow.add_node("check_risk", check_vendor_risk)
test_workflow.add_node("verify_terms", verify_payment_terms)
test_workflow.add_node("decide", make_approval_decision)

test_workflow.set_entry_point("extract")
test_workflow.add_edge("extract", "validate_vat")
test_workflow.add_edge("validate_vat", "check_risk")
test_workflow.add_edge("check_risk", "verify_terms")
test_workflow.add_edge("verify_terms", "decide")
test_workflow.add_edge("decide", END)

test_app = test_workflow.compile()

risk_state = {
    "invoice_id": "INV-RISKY-001",
    "errors": [],
    "warnings": [],
    "steps_executed": []
}

risk_result = test_app.invoke(risk_state)

print("\n🚨 HIGH-RISK RESULTS:")
print(f"Vendor: {risk_result.get('vendor_name')}")
print(f"Amount: ${risk_result.get('amount', 0):,.2f}")
print(f"Risk Score: {risk_result.get('vendor_risk_score')}")
print(f"Approval Status: {risk_result.get('approval_status').upper()}")
print(f"Warnings: {risk_result['warnings']}")

## Step 7: Advanced Features - Parallel Execution

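LangGraph runs nodes that share the same incoming edge in the same step, which can reduce latency when the checks are independent. Because those branches all update the state in that step, any key written by more than one branch needs a reducer, and nodes should return only the fields they changed.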

In [None]:
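import copy
import operator
from typing import Annotated, List
from langgraph.graph import StateGraph, END

# Branches that run in the same step all write to the state, so the shared list
# fields need a reducer (here: concatenation) instead of last-write-wins.
class ParallelInvoiceState(InvoiceState):
    errors: Annotated[List[str], operator.add]
    warnings: Annotated[List[str], operator.add]
    steps_executed: Annotated[List[str], operator.add]

LIST_KEYS = ("errors", "warnings", "steps_executed")

def as_partial_update(node):
    """Wrap a node so it returns only what it changed (new list items, changed fields)."""
    def wrapped(state):
        before = dict(state)
        after = node(copy.deepcopy(before))
        update = {}
        for key, value in after.items():
            if key in LIST_KEYS:
                new_items = value[len(before.get(key, [])):]
                if new_items:
                    update[key] = new_items
            elif key not in before or before[key] != value:
                update[key] = value
        return update
    return wrapped

# Create parallel validation workflow
parallel_workflow = StateGraph(ParallelInvoiceState)

# Add nodes (wrapped so that parallel branches do not overwrite each other)
parallel_workflow.add_node("extract", as_partial_update(extract_invoice_data))
parallel_workflow.add_node("validate_vat", as_partial_update(validate_vat))
parallel_workflow.add_node("check_risk", as_partial_update(check_vendor_risk))
parallel_workflow.add_node("verify_terms", as_partial_update(verify_payment_terms))
parallel_workflow.add_node("decide", as_partial_update(make_approval_decision))

# Set entry point
parallel_workflow.set_entry_point("extract")

# Fan out: all three validation nodes are scheduled in the same step
parallel_workflow.add_edge("extract", "validate_vat")
parallel_workflow.add_edge("extract", "check_risk")
parallel_workflow.add_edge("extract", "verify_terms")

# Fan in: all validation nodes lead to decision
parallel_workflow.add_edge("validate_vat", "decide")
parallel_workflow.add_edge("check_risk", "decide")
parallel_workflow.add_edge("verify_terms", "decide")

parallel_workflow.add_edge("decide", END)

# Compile parallel workflow
parallel_app = parallel_workflow.compile()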

print("=" * 60)
print("PARALLEL EXECUTION DEMO")
print("=" * 60)

# Run parallel workflow
parallel_state = {
    "invoice_id": "INV-2024-001",
    "errors": [],
    "warnings": [],
    "steps_executed": []
}

print("\n🚀 Running validations in PARALLEL...")
start = time.time()
parallel_result = parallel_app.invoke(parallel_state)
parallel_time = time.time() - start

print("\n⚡ PARALLEL EXECUTION COMPLETE!")
print(f"Time: {parallel_time:.2f} seconds")
print(f"Steps executed: {parallel_result['steps_executed']}")
print("\nNote: The three validations were scheduled in the same step.")
print("For I/O-bound checks such as API calls, this can be faster than running them one after another.")
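
When several branches update the state in the same step, their list updates have to be merged rather than overwritten. The sketch below imitates that fan-out/fan-in pattern with the standard library only: three toy checks run in a thread pool, each returns a partial update, and list values are concatenated. All names here are assumptions for illustration. It is a mental model, not LangGraph's scheduler:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Update = Dict[str, Any]


def vat_check(state: State) -> Update:
    """Toy branch: return only the fields this check owns."""
    return {
        "vat_valid": state["vat_number"].startswith("GB"),
        "steps_executed": ["validate_vat"],
    }


def risk_check(state: State) -> Update:
    return {"vendor_risk_score": 0.2, "steps_executed": ["check_vendor_risk"]}


def terms_check(state: State) -> Update:
    return {
        "payment_terms_approved": state["payment_terms"] == "Net 30",
        "steps_executed": ["verify_payment_terms"],
    }


def merge_update(state: State, update: Update) -> State:
    """Reducer-style merge: concatenate list values, overwrite everything else."""
    merged = dict(state)
    for key, value in update.items():
        if isinstance(value, list):
            merged[key] = merged.get(key, []) + value
        else:
            merged[key] = value
    return merged


def fan_out_fan_in(state: State, branches: List[Callable[[State], Update]]) -> State:
    """Run all branches concurrently, then merge their updates in submission order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        # Each branch receives its own shallow copy of the state.
        futures = [pool.submit(branch, dict(state)) for branch in branches]
        updates = [future.result() for future in futures]
    for update in updates:
        state = merge_update(state, update)
    return state


extracted = {
    "vat_number": "GB123456789",
    "payment_terms": "Net 30",
    "steps_executed": ["extract_data"],
}
merged_state = fan_out_fan_in(extracted, [vat_check, risk_check, terms_check])
print(merged_state["steps_executed"])
```

Merging in submission order keeps the result deterministic even though the branches may finish in any order.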

## Step 8: State Persistence and Checkpointing

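LangGraph supports saving workflow state through a checkpointer. Each run is identified by a `thread_id`, and the state saved under that ID can be inspected or resumed later.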

In [None]:
from langgraph.checkpoint.memory import MemorySaver

# Create workflow with checkpointing
memory = MemorySaver()

# Compile with checkpointer
checkpointed_app = workflow.compile(checkpointer=memory)

print("=" * 60)
print("CHECKPOINTING DEMO")
print("=" * 60)

# Run with thread ID for tracking
config = {"configurable": {"thread_id": "invoice-001"}}

checkpoint_state = {
    "invoice_id": "INV-2024-001",
    "errors": [],
    "warnings": [],
    "steps_executed": []
}

# Run workflow
print("\n📁 Running workflow with checkpointing...")
result_with_checkpoint = checkpointed_app.invoke(checkpoint_state, config)

# Get state history
print("\n📜 State History:")
for state in checkpointed_app.get_state_history(config):
    if state.values:
        print(f"  Step: {state.values.get('steps_executed', [])} ")
        print(f"  Status: {state.values.get('approval_status', 'processing')}")
    break  # Just show latest for demo

print("\n✅ Workflow state saved (in memory, for this session) and can be resumed!")
print("This is useful for:")
print("- Long-running workflows")
print("- Human-in-the-loop approval steps")
print("- Debugging and auditing")
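
Conceptually, a checkpointer stores a snapshot of the state after each step, keyed by a thread ID, so that a run can be inspected or continued later. The class below is a deliberately simplified stand-in written for this explanation. `SnapshotStore` and its methods are hypothetical and do not mirror the `MemorySaver` interface:

```python
import copy
from collections import defaultdict
from typing import Any, Dict, List, Optional

State = Dict[str, Any]


class SnapshotStore:
    """Hypothetical in-memory snapshot store keyed by thread ID (illustration only)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[State]] = defaultdict(list)

    def save(self, thread_id: str, state: State) -> None:
        # Deep-copy so later mutations of `state` cannot alter the snapshot.
        self._history[thread_id].append(copy.deepcopy(state))

    def latest(self, thread_id: str) -> Optional[State]:
        """Return a copy of the most recent snapshot, or None for an unknown thread."""
        snapshots = self._history.get(thread_id)
        return copy.deepcopy(snapshots[-1]) if snapshots else None

    def history(self, thread_id: str) -> List[State]:
        """Return copies of all snapshots for a thread, oldest first."""
        return copy.deepcopy(self._history.get(thread_id, []))


store = SnapshotStore()
state: State = {"invoice_id": "INV-2024-001", "steps_executed": []}

state["steps_executed"].append("extract_data")
store.save("invoice-001", state)  # Snapshot after step 1

state["steps_executed"].append("validate_vat")
store.save("invoice-001", state)  # Snapshot after step 2

resumed = store.latest("invoice-001")
print(len(store.history("invoice-001")))
print(resumed["steps_executed"])
```

Because each snapshot is a copy, the history doubles as an audit trail: earlier entries still show the state as it was at that step.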

## Key Learnings

### What We Built:

1. **Stateful Workflow**
   - Defined typed state that flows through all nodes
   - Each node transforms state predictably
   - State accumulates data and metadata

2. **Conditional Routing**
   - Workflow adapts based on data
   - Skip unnecessary steps on errors
   - Different paths for different scenarios

3. **Parallel Execution**
   - Independent validations run in the same step
   - Can reduce latency when the checks are I/O-bound
   - All results converge for decision

4. **Error Handling**
   - Graceful degradation on failures
   - Clear error and warning tracking
   - Appropriate routing based on errors

5. **Production Features**
   - Checkpointing for persistence
   - State history for auditing
   - Resumable workflows

### LangGraph vs Manual Implementation:

- **Structure**: Graph definition is declarative and clear
- **Complexity**: Schedules parallel branches for you once shared keys have reducers
- **Reliability**: Built-in state management and checkpointing; error routing stays explicit in the graph
- **Scalability**: Easy to add new nodes and conditions
- **Debugging**: State tracking and visualization built-in

### Real-World Applications:

- Document processing pipelines
- Multi-step approval workflows
- Data validation and enrichment
- Complex decision trees
- Human-in-the-loop systems

### Next Steps:

In the next session, we'll integrate this with:
- Real OCR for document extraction
- LLM-based decision making
- External API integrations
- Production deployment patterns