# Day 1, Session 3 - Lab: Advanced Workflows with LangGraph

## Building Production-Ready Agent Workflows

In this lab, you'll build agent workflows with LangGraph that combine state management, conditional routing, parallel execution, and error recovery. These are the building blocks that production agent systems commonly rely on.

### Lab Objectives

By completing this lab, you will:
1. Design and implement stateful workflows with LangGraph
2. Create conditional routing based on processing results
3. Implement parallel tool execution for performance
4. Add comprehensive error handling and recovery
5. Build a complete invoice approval workflow
6. Add state persistence and checkpointing

### Success Criteria

You've successfully completed this lab when you can:
- ✅ Create a workflow with at least 5 nodes and conditional routing
- ✅ Handle multiple error scenarios gracefully
- ✅ Demonstrate parallel execution of validation steps
- ✅ Save and restore workflow state
- ✅ Process invoices end-to-end with full audit trail

### Time Estimate: 95 minutes

---

## Part 1: Environment Setup and Data Download (10 minutes)

First, let's set up our environment and download real invoice data.

In [None]:
# Download real invoice and receipt images
import requests
import zipfile
import io
import os
import json
import time
from datetime import datetime, timedelta
from typing import TypedDict, List, Optional, Dict, Any

# Dropbox shared link for the folder
dropbox_url = "https://www.dropbox.com/scl/fo/m9hyfmvi78snwv0nh34mo/AMEXxwXMLAOeve-_yj12ck8?rlkey=urinkikgiuven0fro7r4x5rcu&st=hv3of7g7&dl=1"

print(f"Downloading real invoice data from: {dropbox_url}")

try:
    # Use a timeout so the cell cannot hang indefinitely on a stalled connection
    response = requests.get(dropbox_url, timeout=60)
    response.raise_for_status()

    # Read the content as a zip file
    with zipfile.ZipFile(io.BytesIO(response.content)) as z:
        # Extract all contents to a directory named 'downloaded_images'
        z.extractall("downloaded_images")

    print("✅ Downloaded and extracted images to 'downloaded_images' folder.")
    
    # List downloaded files
    for root, dirs, files in os.walk("downloaded_images"):
        for file in files:
            print(f"  📄 {os.path.join(root, file)}")

except Exception as e:
    print(f"❌ Error downloading images: {e}")

# Install required packages
!pip install -q langgraph langchain langchain-community

print("\n✅ Environment setup complete!")

### Task 1.1: Define Comprehensive Workflow State

**Your Task**: Create a TypedDict that will hold all data flowing through your workflow.

**Requirements**:
- Include fields for input data, processing results, and workflow metadata
- Support multiple validation stages
- Track errors, warnings, and processing history
- Include timing and performance metrics
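
One possible shape for this state is sketched below. The field names follow the hints in the cell that follows, but the types, the `InvoiceWorkflowStateSketch` name, and the `new_state` helper are assumptions made for illustration. The sketch annotates the list fields with `operator.add`: LangGraph treats the second argument of `Annotated` as a reducer for merging updates to that key, which matters once several validation nodes append to `errors` or `processing_log` in the same step.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict


class InvoiceWorkflowStateSketch(TypedDict, total=False):
    # Input fields
    invoice_id: str
    raw_data: Dict[str, Any]
    document_path: Optional[str]

    # Extracted data
    vendor_info: Dict[str, Any]
    amounts: Dict[str, float]
    dates: Dict[str, str]
    line_items: List[Dict[str, Any]]

    # Validation results (each key is written by exactly one node)
    business_rules_valid: bool
    vendor_approved: bool
    amount_approved: bool
    risk_score: float
    compliance_check: bool

    # Workflow metadata; list fields use a reducer so updates are appended
    current_step: str
    errors: Annotated[List[str], operator.add]
    warnings: Annotated[List[str], operator.add]
    processing_log: Annotated[List[Dict[str, Any]], operator.add]
    start_time: float
    end_time: Optional[float]
    total_duration: Optional[float]
    approval_status: str
    next_action: str


def new_state(invoice_id: str, start_time: float) -> InvoiceWorkflowStateSketch:
    """Build an initial state with empty accumulators (hypothetical helper)."""
    return {
        "invoice_id": invoice_id,
        "current_step": "start",
        "errors": [],
        "warnings": [],
        "processing_log": [],
        "start_time": start_time,
        "approval_status": "pending",
    }
```

Keys without a reducer are simply overwritten by the latest update, so in this sketch `current_step` should only be written by nodes that do not run in parallel with each other.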

In [None]:
# TODO: Define your comprehensive workflow state
# This should include all data that flows through your workflow

class InvoiceWorkflowState(TypedDict):
    # TODO: Define state fields
    # Input fields:
    # - invoice_id, raw_data, document_path
    
    # Extracted data:
    # - vendor_info, amounts, dates, line_items
    
    # Validation results:
    # - business_rules_valid, vendor_approved, amount_approved
    # - risk_score, compliance_check
    
    # Workflow metadata:
    # - current_step, errors, warnings, processing_log
    # - start_time, end_time, total_duration
    # - approval_status, next_action
    
    pass

print("✅ Workflow state defined")
print("State includes comprehensive tracking of:")
print("- Input and extracted data")
print("- Multi-stage validation results")
print("- Workflow metadata and audit trail")
print("- Performance and timing metrics")

---

## Part 2: Implementing Workflow Nodes (25 minutes)

Create the individual processing nodes that will form your workflow.

### Task 2.1: Create Data Extraction Node

**Your Task**: Implement a node that extracts structured data from invoices.

**Requirements**:
- Extract vendor, amount, dates, line items
- Handle missing or malformed data gracefully
- Log extraction confidence and any issues
- Update state with extracted information
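
As a rough illustration of these requirements, the sketch below extracts from a mock database and returns only the keys it changes. `MOCK_INVOICES`, `REQUIRED_FIELDS`, the `extraction_confidence` key, and the confidence formula are all assumptions invented for this example, not part of the lab's data or of LangGraph.

```python
import time
from typing import Any, Dict

# Hypothetical mock database standing in for real OCR/extraction output
MOCK_INVOICES: Dict[str, Dict[str, Any]] = {
    "INV-001": {
        "vendor": "Acme Supplies",
        "amount": 1250.00,
        "invoice_date": "2024-01-15",
        "due_date": "2024-02-14",
        "line_items": [{"description": "Widgets", "quantity": 50, "unit_price": 25.0}],
    },
    # Deliberately incomplete record to exercise the warning path
    "INV-002": {"vendor": "Globex", "amount": None, "invoice_date": "2024-01-20"},
}

REQUIRED_FIELDS = ("vendor", "amount", "invoice_date", "due_date", "line_items")


def extract_invoice_data_sketch(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a partial state update with extracted fields, warnings, and a log entry."""
    started = time.time()
    invoice_id = state.get("invoice_id", "")
    record = MOCK_INVOICES.get(invoice_id)

    if record is None:
        # Unknown invoice: report an error instead of raising
        return {
            "current_step": "extract_data",
            "errors": [f"Invoice {invoice_id!r} not found"],
            "processing_log": [
                {"step": "extract_data", "status": "failed", "seconds": time.time() - started}
            ],
        }

    missing = [field for field in REQUIRED_FIELDS if record.get(field) is None]
    # Crude confidence: the fraction of required fields that are present
    confidence = 1 - len(missing) / len(REQUIRED_FIELDS)

    return {
        "current_step": "extract_data",
        "vendor_info": {"name": record.get("vendor")},
        "amounts": {"total": record.get("amount")},
        "dates": {
            "invoice_date": record.get("invoice_date"),
            "due_date": record.get("due_date"),
        },
        "line_items": record.get("line_items") or [],
        "extraction_confidence": confidence,
        "warnings": [f"Missing field: {field}" for field in missing],
        "processing_log": [
            {"step": "extract_data", "status": "ok", "seconds": time.time() - started}
        ],
    }
```

Returning a partial update rather than mutating and returning the whole state keeps the node easy to test in isolation, since the function can be called with a plain dictionary.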

In [None]:
def extract_invoice_data_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Extract structured data from invoice.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with extracted data
    """
    print("🔍 Extracting invoice data...")
    
    # TODO: Implement data extraction logic
    # 1. Get invoice_id from state
    # 2. Create mock database or use real extraction
    # 3. Extract all relevant fields
    # 4. Calculate confidence scores
    # 5. Log any issues or warnings
    # 6. Update state with results
    
    start_time = time.time()
    
    # Your extraction logic here:
    
    
    # Log processing step
    processing_time = time.time() - start_time
    # TODO: Add to processing_log in state
    
    return state

def validate_data_quality_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Validate the quality and completeness of extracted data.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with quality assessment
    """
    print("✅ Validating data quality...")
    
    # TODO: Implement quality validation
    # 1. Check for required fields
    # 2. Validate data formats (dates, amounts)
    # 3. Check for reasonable values
    # 4. Assess overall confidence
    # 5. Set quality flags in state
    
    # Your validation logic here:
    
    
    return state

print("✅ Data extraction nodes implemented")

### Task 2.2: Create Parallel Validation Nodes

**Your Task**: Implement nodes that can run in parallel for different validation checks.

**Requirements**:
- Business rules validation
- Vendor verification
- Compliance checking
- Each node should be independent and runnable in parallel
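
Independence here mostly means that each node reads what it needs and writes only its own keys. The sketch below shows two such nodes; the `APPROVED_VENDORS` table, the `tax_id` field, and the risk formula are hypothetical values chosen for illustration. The only key both nodes touch is `warnings`, which would need a list reducer in the state definition so that parallel updates are appended rather than conflicting.

```python
from typing import Any, Dict

# Hypothetical reference data; a real system would query a vendor master
APPROVED_VENDORS: Dict[str, Dict[str, int]] = {
    "Acme Supplies": {"late_payments": 0, "years_active": 6},
    "Initech": {"late_payments": 2, "years_active": 1},
}


def verify_vendor_sketch(state: Dict[str, Any]) -> Dict[str, Any]:
    """Write only vendor-related keys so the node can run beside other validators."""
    name = state.get("vendor_info", {}).get("name")
    history = APPROVED_VENDORS.get(name)
    if history is None:
        return {
            "vendor_approved": False,
            "risk_score": 1.0,
            "warnings": [f"Vendor {name!r} is not on the approved list"],
        }
    # Illustrative scoring: late payments and short history both raise risk
    new_vendor_penalty = 0.3 if history["years_active"] < 2 else 0.0
    risk = min(1.0, 0.2 * history["late_payments"] + new_vendor_penalty)
    return {"vendor_approved": risk < 0.5, "risk_score": risk, "warnings": []}


def check_compliance_sketch(state: Dict[str, Any]) -> Dict[str, Any]:
    """Write only the compliance key; the tax ID check is a stand-in rule."""
    tax_id = state.get("vendor_info", {}).get("tax_id")
    passed = bool(tax_id)
    return {
        "compliance_check": passed,
        "warnings": [] if passed else ["Missing vendor tax ID"],
    }
```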

In [None]:
def validate_business_rules_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Validate invoice against business rules.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with business rules validation
    """
    print("📋 Validating business rules...")
    
    # TODO: Implement business rules validation
    # 1. Check amount thresholds
    # 2. Validate payment terms
    # 3. Check approval requirements
    # 4. Verify budget availability
    # 5. Update state with validation results
    
    # Your business rules logic here:
    
    
    return state

def verify_vendor_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Verify vendor information and history.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with vendor verification
    """
    print("🏢 Verifying vendor...")
    
    # TODO: Implement vendor verification
    # 1. Check vendor in approved list
    # 2. Look up payment history
    # 3. Calculate risk score
    # 4. Check for any flags or issues
    # 5. Update state with verification results
    
    # Your vendor verification logic here:
    
    
    return state

def check_compliance_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Check compliance requirements (tax, legal, etc.).
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with compliance check
    """
    print("⚖️ Checking compliance...")
    
    # TODO: Implement compliance checking
    # 1. Validate tax information
    # 2. Check regulatory requirements
    # 3. Verify required documentation
    # 4. Check for compliance flags
    # 5. Update state with compliance results
    
    # Your compliance logic here:
    
    
    return state

print("✅ Parallel validation nodes implemented")

### Task 2.3: Create Decision and Action Nodes

**Your Task**: Implement nodes for making approval decisions and taking actions.

**Requirements**:
- Aggregate all validation results
- Apply decision logic for approval/rejection
- Handle escalation scenarios
- Generate audit reports
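
The aggregation step can be sketched as a small pure function. The thresholds `AUTO_APPROVE_LIMIT` and `RISK_ESCALATION`, the status strings, and the rule that a compliance failure rejects outright are assumptions for this example; your own policy may differ.

```python
from typing import Any, Dict

AUTO_APPROVE_LIMIT = 5000.0  # hypothetical amount above which a human must sign off
RISK_ESCALATION = 0.5        # hypothetical risk score that triggers manual review

VALIDATION_KEYS = ("business_rules_valid", "vendor_approved", "compliance_check")


def make_approval_decision_sketch(state: Dict[str, Any]) -> Dict[str, Any]:
    """Aggregate validation results into approved / escalated / rejected."""
    if state.get("errors"):
        # Blocking errors from earlier steps short-circuit the decision
        return {"approval_status": "rejected", "next_action": "notify_submitter"}

    # A missing result counts as a failed check
    failed = [key for key in VALIDATION_KEYS if not state.get(key, False)]
    amount = state.get("amounts", {}).get("total") or 0.0
    risk = state.get("risk_score", 1.0)

    if "compliance_check" in failed:
        status, action = "rejected", "notify_submitter"
    elif failed or risk >= RISK_ESCALATION or amount > AUTO_APPROVE_LIMIT:
        status, action = "escalated", "manual_review"
    else:
        status, action = "approved", "schedule_payment"

    return {
        "approval_status": status,
        "next_action": action,
        "processing_log": [{"step": "make_decision", "failed_checks": failed}],
    }
```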

In [None]:
def make_approval_decision_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Make final approval decision based on all validation results.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with approval decision
    """
    print("🎯 Making approval decision...")
    
    # TODO: Implement decision logic
    # 1. Aggregate all validation results
    # 2. Check for any blocking errors
    # 3. Evaluate risk factors
    # 4. Determine approval level needed
    # 5. Set final decision in state
    
    # Your decision logic here:
    
    
    return state

def handle_escalation_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Handle cases requiring manual review or escalation.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with escalation actions
    """
    print("🚨 Handling escalation...")
    
    # TODO: Implement escalation handling
    # 1. Identify escalation triggers
    # 2. Determine appropriate escalation level
    # 3. Prepare escalation package
    # 4. Set escalation status
    # 5. Schedule follow-up actions
    
    # Your escalation logic here:
    
    
    return state

def generate_audit_report_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Generate comprehensive audit report of processing.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with audit report
    """
    print("📊 Generating audit report...")
    
    # TODO: Implement audit report generation
    # 1. Collect all processing steps
    # 2. Summarize validation results
    # 3. Include timing and performance data
    # 4. Create structured audit log
    # 5. Store report in state
    
    # Your audit report logic here:
    
    
    return state

print("✅ Decision and action nodes implemented")

---

## Part 3: Building the LangGraph Workflow (20 minutes)

Now we'll construct the actual workflow graph with conditional routing.

### Task 3.1: Create Conditional Routing Functions

**Your Task**: Implement functions that determine workflow routing based on state.

**Requirements**:
- Route based on data quality
- Handle error scenarios
- Determine if escalation is needed
- Support different approval paths
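
A routing function only inspects the state and returns a label; the graph maps that label to a node when the conditional edge is registered. The sketch below assumes a hypothetical `extraction_confidence` key, a 0.6 threshold, and a `handle_escalation` label, none of which are fixed by the lab.

```python
from typing import Any, Dict

MIN_CONFIDENCE = 0.6  # hypothetical cut-off for continuing with low-quality data


def route_after_extraction_sketch(state: Dict[str, Any]) -> str:
    """Stop on errors or low confidence, otherwise continue to quality validation."""
    if state.get("errors"):
        return "error"
    if state.get("extraction_confidence", 0.0) < MIN_CONFIDENCE:
        return "error"
    return "validate_quality"


def route_after_decision_sketch(state: Dict[str, Any]) -> str:
    """Send escalated invoices to manual handling; everything else gets audited."""
    if state.get("approval_status") == "escalated":
        return "handle_escalation"
    return "generate_audit"
```

Keeping routers free of side effects makes them straightforward to unit test with plain dictionaries before wiring them into a graph.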

In [None]:
from langgraph.graph import StateGraph, END

def route_after_extraction(state: InvoiceWorkflowState) -> str:
    """
    Determine routing after data extraction.
    
    Args:
        state: Current workflow state
        
    Returns:
        Next node name or routing decision
    """
    # TODO: Implement routing logic
    # 1. Check if extraction was successful
    # 2. Evaluate data quality
    # 3. Decide whether to continue or stop
    # 4. Return appropriate next step
    
    # Your routing logic here:
    
    
    return "validate_quality"  # Default route

def route_after_validation(state: InvoiceWorkflowState) -> str:
    """
    Determine routing after validation steps.
    
    Args:
        state: Current workflow state
        
    Returns:
        Next node name or routing decision
    """
    # TODO: Implement validation routing
    # 1. Check validation results
    # 2. Determine if all validations passed
    # 3. Check for escalation triggers
    # 4. Route to appropriate next step
    
    # Your routing logic here:
    
    
    return "make_decision"  # Default route

def route_after_decision(state: InvoiceWorkflowState) -> str:
    """
    Determine routing after approval decision.
    
    Args:
        state: Current workflow state
        
    Returns:
        Next node name or END
    """
    # TODO: Implement decision routing
    # 1. Check approval decision
    # 2. Determine if escalation is needed
    # 3. Route to escalation or completion
    
    # Your routing logic here:
    
    
    return "generate_audit"  # Default route

print("✅ Conditional routing functions implemented")

### Task 3.2: Construct the Workflow Graph

**Your Task**: Build the complete LangGraph workflow with all nodes and edges.

**Requirements**:
- Include all implemented nodes
- Add conditional routing between nodes
- Support parallel execution of validation steps
- Handle error paths and escalation

In [None]:
def build_invoice_workflow():
    """
    Build the complete invoice processing workflow.
    
    Returns:
        Compiled LangGraph workflow
    """
    # TODO: Create and configure the workflow graph
    # 1. Create StateGraph with your state type
    # 2. Add all nodes
    # 3. Set entry point
    # 4. Add conditional edges
    # 5. Add parallel validation paths
    # 6. Connect to END
    
    workflow = StateGraph(InvoiceWorkflowState)
    
    # Add nodes
    # workflow.add_node("extract_data", extract_invoice_data_node)
    # workflow.add_node("validate_quality", validate_data_quality_node)
    # etc...
    
    # Set entry point
    # workflow.set_entry_point("extract_data")
    
    # Add conditional edges
    # workflow.add_conditional_edges(
    #     "extract_data",
    #     route_after_extraction,
    #     {
    #         "validate_quality": "validate_quality",
    #         "error": END
    #     }
    # )
    
    # Your workflow construction here:
    
    
    return workflow.compile()

# Build the workflow
print("🔧 Building invoice processing workflow...")
# invoice_workflow = build_invoice_workflow()
print("✅ Workflow constructed successfully")

# Display workflow structure
print("\n📋 Workflow Structure:")
print("1. Extract Data → Quality Check")
print("2. Parallel Validations: Business Rules | Vendor Check | Compliance")
print("3. Approval Decision → Escalation (if needed)")
print("4. Audit Report Generation → END")

---

## Part 4: Error Handling and Recovery (15 minutes)

Implement robust error handling throughout your workflow.

### Task 4.1: Add Error Handling to Nodes

**Your Task**: Enhance your nodes with comprehensive error handling.

**Requirements**:
- `try`/`except` blocks around critical operations
- Log errors with context
- Set appropriate error flags in state
- Provide recovery suggestions
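
One way to meet these requirements without repeating the same `try`/`except` in every node is a decorator that converts exceptions into structured error records. This is a sketch under assumptions: the `with_error_handling` decorator, the error record fields, and the `KNOWN_DOCUMENTS` path are all hypothetical, and the file lookup is simulated with a set rather than real disk access.

```python
import functools
import time
from typing import Any, Callable, Dict

StateUpdate = Dict[str, Any]


def _error_record(step: str, exc: Exception, recoverable: bool, suggestion: str) -> Dict[str, Any]:
    """Describe a failure with enough context for logging and recovery."""
    return {
        "step": step,
        "type": type(exc).__name__,
        "message": str(exc),
        "recoverable": recoverable,
        "suggestion": suggestion,
    }


def with_error_handling(step: str, suggestion: str) -> Callable:
    """Wrap a node so exceptions become error entries in its state update."""

    def decorator(fn: Callable[[Dict[str, Any]], StateUpdate]) -> Callable:
        @functools.wraps(fn)
        def wrapper(state: Dict[str, Any]) -> StateUpdate:
            started = time.perf_counter()
            try:
                update = dict(fn(state))
                status = "ok"
            except (ValueError, FileNotFoundError) as exc:
                # Expected failures: record them with a recovery suggestion
                update = {"errors": [_error_record(step, exc, True, suggestion)]}
                status = "failed"
            except Exception as exc:
                # Unexpected failures: mark as non-recoverable for escalation
                update = {"errors": [_error_record(step, exc, False, "Escalate for manual review")]}
                status = "failed"
            entry = {"step": step, "status": status, "seconds": time.perf_counter() - started}
            update["processing_log"] = list(update.get("processing_log", [])) + [entry]
            return update

        return wrapper

    return decorator


KNOWN_DOCUMENTS = {"downloaded_images/invoice_001.png"}  # hypothetical path


@with_error_handling("extract_data", "Check document_path and re-upload the file")
def extract_sketch(state: Dict[str, Any]) -> StateUpdate:
    path = state.get("document_path")
    if not path:
        raise ValueError("document_path is required")
    if path not in KNOWN_DOCUMENTS:
        raise FileNotFoundError(path)
    return {"vendor_info": {"name": "Acme Supplies"}}
```

A recovery node can then branch on the `recoverable` flag of the most recent error record instead of parsing message strings.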

In [None]:
def robust_extract_data_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Enhanced data extraction with comprehensive error handling.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with extraction results or error information
    """
    print("🔍 Extracting invoice data (with error handling)...")
    
    try:
        # TODO: Implement robust extraction
        # 1. Validate input parameters
        # 2. Perform extraction with error checking
        # 3. Validate extraction results
        # 4. Set success flags
        
        # Your robust extraction logic here:
        
        
        pass
        
    except ValueError as e:
        # Handle data validation errors
        print(f"❌ Data validation error: {e}")
        # TODO: Set error state
        
    except FileNotFoundError as e:
        # Handle missing file errors
        print(f"❌ File not found: {e}")
        # TODO: Set error state
        
    except Exception as e:
        # Handle unexpected errors
        print(f"❌ Unexpected error: {e}")
        # TODO: Set error state
    
    return state

def create_error_recovery_node(state: InvoiceWorkflowState) -> InvoiceWorkflowState:
    """
    Attempt to recover from errors or provide alternatives.
    
    Args:
        state: Current workflow state
        
    Returns:
        Updated state with recovery attempts
    """
    print("🔄 Attempting error recovery...")
    
    # TODO: Implement error recovery
    # 1. Analyze error types
    # 2. Attempt appropriate recovery strategies
    # 3. Provide alternative processing paths
    # 4. Update state with recovery results
    
    # Your recovery logic here:
    
    
    return state

print("✅ Error handling and recovery implemented")

---

## Part 5: State Persistence and Checkpointing (10 minutes)

Add the ability to save and resume workflow execution.

### Task 5.1: Implement State Persistence

**Your Task**: Add checkpointing to save and restore workflow state.

**Requirements**:
- Save state at key checkpoints
- Resume from saved state
- Handle state versioning
- Provide state inspection capabilities

In [None]:
# MemorySaver lives in the checkpoint.memory submodule
from langgraph.checkpoint.memory import MemorySaver
import uuid

def create_checkpointed_workflow() -> StateGraph:
    """
    Create workflow with checkpointing enabled.
    
    Returns:
        Workflow with checkpointing
    """
    # TODO: Create workflow with memory saver
    # 1. Create MemorySaver instance
    # 2. Build workflow as before
    # 3. Compile with checkpointer
    
    memory = MemorySaver()
    workflow = build_invoice_workflow()
    
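    # Compile with checkpointer.
    # Note: build_invoice_workflow() already returns a compiled graph, which
    # cannot be compiled again. Build the uncompiled StateGraph here (or
    # refactor the builder to return it) and compile that with the checkpointer:
    # checkpointed_workflow = graph.compile(checkpointer=memory)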
    
    # Your checkpointing setup here:
    
    
    return workflow  # Replace with checkpointed version

def save_workflow_state(workflow, state, thread_id):
    """
    Save workflow state for later resumption.
    
    Args:
        workflow: The workflow instance
        state: Current state to save
        thread_id: Unique thread identifier
    """
    # TODO: Implement state saving
    # 1. Create config with thread_id
    # 2. Save state to checkpoint
    # 3. Log save operation
    
    config = {"configurable": {"thread_id": thread_id}}
    
    # Your saving logic here:
    
    
    pass

def resume_workflow_state(workflow, thread_id):
    """
    Resume workflow from saved state.
    
    Args:
        workflow: The workflow instance
        thread_id: Thread identifier to resume
        
    Returns:
        Restored state or None if not found
    """
    # TODO: Implement state restoration
    # 1. Create config with thread_id
    # 2. Retrieve state from checkpoint
    # 3. Validate state integrity
    # 4. Return restored state
    
    config = {"configurable": {"thread_id": thread_id}}
    
    # Your restoration logic here:
    
    
    return None

print("✅ State persistence and checkpointing implemented")

---

## Part 6: Testing and Evaluation (15 minutes)

Test your complete workflow with various scenarios.

### Task 6.1: Test Standard Processing

**Your Task**: Test your workflow with a standard invoice.

**Requirements**:
- Process a complete invoice workflow
- Verify all nodes execute correctly
- Check state transitions
- Validate final results

In [None]:
def test_standard_processing():
    """
    Test workflow with a standard invoice scenario.
    """
    print("=" * 60)
    print("TESTING STANDARD INVOICE PROCESSING")
    print("=" * 60)
    
    # TODO: Create test scenario
    # 1. Create initial state with test invoice
    # 2. Run workflow
    # 3. Analyze results
    # 4. Verify expected outcomes
    
    # Create initial state
    initial_state = {
        # Your test state here
    }
    
    # Run workflow
    # workflow = build_invoice_workflow()
    # result = workflow.invoke(initial_state)
    
    # Your testing logic here:
    
    
    print("✅ Standard processing test completed")

def test_error_scenarios():
    """
    Test workflow with various error scenarios.
    """
    print("\n" + "=" * 60)
    print("TESTING ERROR SCENARIOS")
    print("=" * 60)
    
    error_scenarios = [
        {"name": "Invalid Invoice ID", "invoice_id": "INVALID-999"},
        {"name": "High Risk Vendor", "invoice_id": "RISK-001"},
        {"name": "Excessive Amount", "invoice_id": "LARGE-001"}
    ]
    
    for scenario in error_scenarios:
        print(f"\n🧪 Testing: {scenario['name']}")
        
        # TODO: Test each error scenario
        # 1. Create state for scenario
        # 2. Run workflow
        # 3. Verify error handling
        # 4. Check recovery attempts
        
        # Your error testing logic here:
        
        
        pass
    
    print("\n✅ Error scenario testing completed")

def test_parallel_execution():
    """
    Test parallel execution of validation nodes.
    """
    print("\n" + "=" * 60)
    print("TESTING PARALLEL EXECUTION")
    print("=" * 60)
    
    # TODO: Test parallel processing
    # 1. Create scenario that triggers parallel validation
    # 2. Measure execution time
    # 3. Verify all validations complete
    # 4. Check that parallel execution is faster
    
    # Your parallel testing logic here:
    
    
    print("✅ Parallel execution testing completed")

# Run all tests
print("🧪 Running comprehensive workflow tests...")
# test_standard_processing()
# test_error_scenarios()
# test_parallel_execution()
print("\n🎉 All tests completed!")

### Task 6.2: Performance Analysis

**Your Task**: Analyze the performance characteristics of your workflow.

**Requirements**:
- Measure execution times for each node
- Compare sequential vs parallel execution
- Analyze bottlenecks
- Provide optimization recommendations
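
A small sketch of the measurement side, using only the standard library. The helper names `time_node`, `summarize`, and `parallel_speedup` are hypothetical. The speedup estimate compares the sum of the parallel nodes' mean times against the slowest one, which is an idealized upper bound that ignores scheduling overhead.

```python
import statistics
import time
from typing import Any, Callable, Dict, List


def time_node(fn: Callable[[Dict[str, Any]], Any], state: Dict[str, Any], runs: int = 5) -> List[float]:
    """Call a node several times on a copy of the state and collect durations."""
    durations = []
    for _ in range(runs):
        started = time.perf_counter()
        fn(dict(state))
        durations.append(time.perf_counter() - started)
    return durations


def summarize(performance_data: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Compute mean and worst-case time per node, skipping nodes with no samples."""
    return {
        name: {"mean": statistics.mean(samples), "max": max(samples)}
        for name, samples in performance_data.items()
        if samples
    }


def slowest_node(summary: Dict[str, Dict[str, float]]) -> str:
    """Name of the node with the highest mean time (the likely bottleneck)."""
    return max(summary, key=lambda name: summary[name]["mean"])


def parallel_speedup(summary: Dict[str, Dict[str, float]], parallel_nodes: List[str]) -> float:
    """Ideal speedup: sequential cost (sum) divided by parallel cost (slowest branch)."""
    means = [summary[name]["mean"] for name in parallel_nodes]
    return sum(means) / max(means)
```

For example, validators averaging 0.1 s, 0.3 s, and 0.2 s give an ideal speedup of about 2x, since the 0.3 s branch bounds the parallel step.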

In [None]:
def analyze_workflow_performance():
    """
    Analyze workflow performance and identify optimizations.
    """
    print("=" * 60)
    print("WORKFLOW PERFORMANCE ANALYSIS")
    print("=" * 60)
    
    # TODO: Implement performance analysis
    # 1. Run workflow multiple times
    # 2. Collect timing data for each node
    # 3. Calculate statistics
    # 4. Identify bottlenecks
    # 5. Provide recommendations
    
    performance_data = {
        "extract_data": [],
        "validate_quality": [],
        "business_rules": [],
        "vendor_check": [],
        "compliance": [],
        "make_decision": [],
        "audit_report": []
    }
    
    # Your performance analysis here:
    
    
    print("\n📊 Performance Summary:")
    print("- Average total execution time: XX.XX seconds")
    print("- Fastest node: XXXX (X.XX seconds)")
    print("- Slowest node: XXXX (X.XX seconds)")
    print("- Parallel speedup: X.Xx faster")
    
    print("\n🚀 Optimization Recommendations:")
    print("1. Cache vendor lookup results")
    print("2. Optimize data validation rules")
    print("3. Implement more aggressive parallel processing")
    print("4. Add result caching for repeated invocations")

# analyze_workflow_performance()
print("📈 Performance analysis framework ready")

---

## Lab Summary and Self-Assessment

### What You've Accomplished

If you've completed all tasks, you've successfully:
- ✅ Built a sophisticated stateful workflow with LangGraph
- ✅ Implemented conditional routing and parallel execution
- ✅ Added comprehensive error handling and recovery
- ✅ Created state persistence and checkpointing
- ✅ Built a production-ready invoice processing system
- ✅ Analyzed performance and identified optimizations

### Self-Assessment Questions

Answer these to check your understanding:

1. **What are the advantages of stateful workflows over simple function chains?**
   - Your answer:

2. **How does conditional routing improve workflow flexibility?**
   - Your answer:

3. **What are the benefits of parallel execution in validation workflows?**
   - Your answer:

4. **How does state persistence help with long-running processes?**
   - Your answer:

5. **What production considerations are important for workflow deployment?**
   - Your answer:

### Next Steps

In the next session, you'll learn how to:
- Integrate real OCR and document processing
- Connect to external systems and databases
- Implement human-in-the-loop workflows
- Deploy workflows at enterprise scale

### Advanced Challenges (Optional)

If you finish early, try these:
1. Add a node for generating approval emails
2. Implement workflow versioning and migration
3. Create a dashboard for monitoring workflow execution
4. Add machine learning for risk prediction
5. Implement A/B testing for different approval strategies
6. Add real-time notifications for stakeholders
7. Create a workflow designer UI for business users