# Day 1, Session 1: Introduction to HuggingFace Pipelines
## 🚀 Building Foundations for Multimodal Invoice Processing

---

### 📚 **What You'll Learn in This Session**

By the end of this demonstration, you will understand:

1. **🏗️ HuggingFace Pipeline Architecture**
   - How pipelines abstract complex AI models into simple APIs
   - The three-layer architecture: Pipeline → Model → Tokenizer/Feature Extractor
   - Why this matters for production AI systems

2. **🧠 AI Task Categories for Invoice Processing**
   - **Text Understanding**: Sentiment analysis, question answering, named entity recognition
   - **Computer Vision**: Image classification, object detection, OCR
   - **Multimodal Processing**: Document understanding that combines text + vision

3. **⚡ Performance & Production Considerations**
   - GPU acceleration benefits (up to 10-100x speedup, depending on model size and batch size)
   - Memory management strategies for limited resources
   - Model selection trade-offs (accuracy vs speed vs memory)

4. **🎯 Real-World Applications**
   - Processing actual invoice and receipt images
   - Extracting structured data from unstructured documents
   - Building the foundation for automated accounts payable

---

### 🌍 **Why This Matters: The Invoice Processing Challenge**

**The Problem:**
- Companies process millions of invoices annually
- Manual data entry costs $15-40 per invoice
- Human error rates: 1-3% 
- Processing delays impact cash flow

**The AI Solution:**
- Automated extraction reduces costs to $1-5 per invoice
- Error rates drop to <0.1% with proper validation
- Processing time: minutes instead of hours
- 24/7 processing capability

**Technical Foundation:**
- HuggingFace provides 500,000+ pre-trained models
- Models handle 100+ languages and document types
- GPU acceleration enables real-time processing
- This session shows you HOW to build these systems

---

### 🎯 **Session Objectives**

**Technical Goals:**
- ✅ Understand HuggingFace pipeline architecture
- ✅ Demonstrate text and vision processing capabilities  
- ✅ Preview document understanding for invoice analysis
- ✅ Measure GPU acceleration benefits
- ✅ Learn memory management strategies

**Business Impact:**
- ✅ Process real invoice and receipt images
- ✅ Extract structured data automatically
- ✅ Understand production deployment requirements
- ✅ Calculate ROI potential for automation

**Hands-On Experience:**
- ✅ Run state-of-the-art AI models with 3 lines of code
- ✅ Compare different approaches to document processing
- ✅ Measure performance across CPU vs GPU
- ✅ Handle real business documents

---

### 🏗️ **HuggingFace Pipeline Architecture Explained**

```
📄 Input Document
       ↓
🔄 Feature Extraction (Image → Tensors, Text → Tokens)
       ↓  
🧠 Transformer Model (BERT, LayoutLM, Vision Transformer)
       ↓
📊 Post-Processing (Logits → Human-Readable Results)
       ↓
✅ Structured Output (JSON, Classes, Bounding Boxes)
```

**Why Pipelines Matter:**
- **Abstraction**: Hide complex tokenization, model inference, post-processing
- **Standardization**: Same API for 500,000+ models
- **Performance**: Optimized for GPU acceleration and batching
- **Production-Ready**: Built-in error handling and resource management

**Three Types We'll Use:**
1. **Text Pipelines**: Process invoice text content
2. **Vision Pipelines**: Analyze document images
3. **Multimodal Pipelines**: Combine text + vision for document understanding
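
The flow in the diagram above can be sketched in plain Python. This is a toy stand-in, not the `transformers` implementation: the class name `ToySentimentPipeline`, the keyword weights, and the two labels are all invented for illustration. It only shows how the stages hand data to each other: raw input to features, features to logits, logits to a readable result.

```python
import math

class ToySentimentPipeline:
    """Toy stand-in for a text-classification pipeline (illustrative only)."""

    # Hypothetical keyword weights playing the role of a trained model
    WEIGHTS = {"thank": 2.0, "pleased": 2.0, "discount": 1.0,
               "overdue": -2.0, "late": -1.5, "fees": -1.0}
    LABELS = ["NEGATIVE", "POSITIVE"]

    def preprocess(self, text):
        # Feature extraction: raw text -> lowercase tokens without punctuation
        return [tok.strip(".,!?%").lower() for tok in text.split()]

    def forward(self, tokens):
        # "Model": sum keyword weights into one logit per label
        score = sum(self.WEIGHTS.get(tok, 0.0) for tok in tokens)
        return [-score, score]  # logits for NEGATIVE, POSITIVE

    def postprocess(self, logits):
        # Softmax turns logits into probabilities; report the best label
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        probs = [e / sum(exps) for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {"label": self.LABELS[best], "score": probs[best]}

    def __call__(self, text):
        return self.postprocess(self.forward(self.preprocess(text)))


toy = ToySentimentPipeline()
print(toy("Payment received on time. Thank you for your business."))
print(toy("Invoice is 90 days overdue. Please pay immediately."))
```

The returned dictionary uses the same `label` / `score` shape that the sentiment pipeline in Step 2 prints, which is why one calling convention can serve very different models.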

---

### 📋 **Prerequisites Check**
- ✅ Google Colab with GPU enabled (T4 recommended)
- ✅ Basic Python knowledge
- ✅ Understanding of JSON data structures
- ✅ Familiarity with business invoice concepts

**💡 Pro Tip:** Enable the GPU via Runtime → Change runtime type → GPU. Speedups of up to 10-100x are possible, depending on the model and batch size.

---

Let's begin building the future of automated document processing! 🚀

## Step 1: Environment Setup and GPU Check

First, let's verify GPU availability, download the sample images, and install required packages.

In [None]:
# Check whether a GPU is visible to this runtime (e.g. Google Colab)
import subprocess

try:
    # Query name and memory of each visible GPU.
    # Raises FileNotFoundError if nvidia-smi is missing, CalledProcessError if it fails.
    gpu_info = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.free", "--format=csv,noheader"],
        text=True,
    )
    print("GPU is available")
    print("\nGPU Information:")
    print("=" * 50)
    print(gpu_info.strip())
except (subprocess.CalledProcessError, FileNotFoundError):
    print("No GPU detected. Please enable GPU in Runtime > Change runtime type")
    print("This notebook will still work but will be much slower on CPU.")

In [None]:
# Download real invoice and receipt images
import requests
import zipfile
import io
import os

# Dropbox shared link for the folder
dropbox_url = "https://www.dropbox.com/scl/fo/m9hyfmvi78snwv0nh34mo/AMEXxwXMLAOeve-_yj12ck8?rlkey=urinkikgiuven0fro7r4x5rcu&st=hv3of7g7&dl=1"

print(f"Downloading real invoice data from: {dropbox_url}")

try:
    response = requests.get(dropbox_url, timeout=120)  # avoid hanging forever on a stalled download
    response.raise_for_status()

    # Read the content as a zip file
    with zipfile.ZipFile(io.BytesIO(response.content)) as z:
        # Extract all contents to a directory named 'downloaded_images'
        z.extractall("downloaded_images")

    print("✅ Downloaded and extracted images to 'downloaded_images' folder.")
    
    # List downloaded files
    downloaded_files = []
    for root, dirs, files in os.walk("downloaded_images"):
        for file in files:
            if file.lower().endswith(('.png', '.jpg', '.jpeg')):
                downloaded_files.append(os.path.join(root, file))
                print(f"  📄 {os.path.join(root, file)}")
    
    # Store file paths for later use
    INVOICE_IMAGES = [f for f in downloaded_files if 'invoice' in f.lower()]
    RECEIPT_IMAGES = [f for f in downloaded_files if 'receipt' in f.lower()]
    
    print(f"\nFound {len(INVOICE_IMAGES)} invoice images and {len(RECEIPT_IMAGES)} receipt images")

except requests.exceptions.RequestException as e:
    print(f"❌ Error downloading the file: {e}")
    INVOICE_IMAGES = []
    RECEIPT_IMAGES = []
except zipfile.BadZipFile:
    print("❌ Error: The downloaded file is not a valid zip file.")
    INVOICE_IMAGES = []
    RECEIPT_IMAGES = []
except Exception as e:
    print(f"❌ An unexpected error occurred: {e}")
    INVOICE_IMAGES = []
    RECEIPT_IMAGES = []

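# Install required packages
# (pytesseract and the tesseract-ocr binary are needed for the OCR step of the document QA pipeline in Step 5)
print("\nInstalling required packages...")
!pip install -q transformers torch torchvision pillow accelerate
!pip install -q timm sentencepiece pytesseract
!apt-get install -y -qq tesseract-ocr

print("Packages installed successfully")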

In [None]:
# Import libraries and check versions
import torch
import transformers
from PIL import Image
import requests
from io import BytesIO
import time
import warnings
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
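
The later cells repeat the expression `0 if torch.cuda.is_available() else -1` each time they build a pipeline. One option is to compute the device once and reuse it. The names `pick_device` and `DEVICE` are hypothetical helpers, not part of the notebook; the sketch assumes a single-GPU runtime such as a Colab T4.

```python
def pick_device():
    """Return 0 (first CUDA GPU) when one is usable, else -1 (CPU)."""
    try:
        import torch  # imported lazily so the helper also works where torch is absent
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


DEVICE = pick_device()
print(f"Pipelines will run on: {'GPU 0' if DEVICE == 0 else 'CPU'}")
```

A pipeline could then be created with `pipeline("sentiment-analysis", device=DEVICE)`, which keeps the CPU fallback in one place.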

## Step 2: Text Pipeline Demo - Sentiment Analysis

We'll start with sentiment analysis on invoice-related text to understand pipeline basics.

In [None]:
from transformers import pipeline

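# Create a sentiment analysis pipeline.
# The model is named explicitly (it is the library's default for this task)
# so results stay reproducible across library versions.
print("Loading sentiment analysis model...")
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0 if torch.cuda.is_available() else -1  # Use GPU if available
)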

# Test with invoice-related scenarios
invoice_texts = [
    "Payment received on time. Thank you for your business.",
    "Invoice is 90 days overdue. Please pay immediately.",
    "We're pleased to offer you a 15% discount on this invoice.",
    "Additional late fees have been applied to your account.",
    "Your invoice has been processed successfully."
]

print("\n" + "="*60)
print("SENTIMENT ANALYSIS RESULTS")
print("="*60)

for text in invoice_texts:
    result = sentiment_analyzer(text)[0]
    print(f"\nText: {text}")
    print(f"Sentiment: {result['label']} (confidence: {result['score']:.2%})")
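
The loop above calls the pipeline once per sentence. Text-classification pipelines also accept a list and return one `{'label', 'score'}` dictionary per input, so the results can be post-processed in one pass. The sketch below flags low-confidence predictions for manual review. `label_texts`, `fake_analyzer`, and the 0.90 threshold are assumptions made for illustration; the stand-in analyzer exists only so the example runs without downloading a model.

```python
def label_texts(analyzer, texts, min_score=0.90):
    """Run all texts in one call and flag low-confidence predictions."""
    results = analyzer(list(texts))  # list in -> list of {'label', 'score'} dicts out
    return [
        {"text": t, "label": r["label"], "score": r["score"],
         "needs_review": r["score"] < min_score}
        for t, r in zip(texts, results)
    ]


# Stand-in analyzer (hypothetical); in the notebook you would pass
# `sentiment_analyzer` instead.
def fake_analyzer(texts):
    return [{"label": "NEGATIVE" if "overdue" in t else "POSITIVE",
             "score": 0.75 if "discount" in t else 0.99} for t in texts]


rows = label_texts(fake_analyzer, [
    "Invoice is 90 days overdue. Please pay immediately.",
    "We're pleased to offer you a 15% discount on this invoice.",
])
for row in rows:
    status = "REVIEW" if row["needs_review"] else "ok"
    print(row["label"], f"{row['score']:.2f}", status)
```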

## Step 3: Question Answering Pipeline - Extracting Information

Question answering models can extract specific information from text - a key capability for invoice processing.

In [None]:
# Create a question-answering pipeline
print("Loading question-answering model...")
qa_pipeline = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
    device=0 if torch.cuda.is_available() else -1
)

# Sample invoice context
invoice_context = """
INVOICE #2024-1234
Date: January 15, 2024
Due Date: February 15, 2024

Bill To:
Acme Corporation
123 Business Street
New York, NY 10001

Description: Professional consulting services
Hours: 40
Rate: $150/hour
Subtotal: $6,000
Tax (8%): $480
Total Amount Due: $6,480

Payment Terms: Net 30 days
Late Fee: 1.5% per month
"""

# Questions to ask about the invoice
questions = [
    "What is the invoice number?",
    "What is the total amount due?",
    "When is the payment due?",
    "What is the hourly rate?",
    "What are the payment terms?"
]

print("\n" + "="*60)
print("EXTRACTING INVOICE INFORMATION")
print("="*60)

for question in questions:
    answer = qa_pipeline(question=question, context=invoice_context)
    print(f"\nQuestion: {question}")
    print(f"Answer: {answer['answer']}")
    print(f"Confidence: {answer['score']:.2%}")
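
Extracted answers are plain strings, so a production system would normally validate them before trusting them. As a minimal sketch, the check below parses the amounts from the sample invoice above and confirms that subtotal plus tax equals the total. The function names `parse_amount` and `totals_are_consistent` are hypothetical, and the parser only handles simple dollar strings such as `$6,480`.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Convert a string such as '$6,480' to Decimal; return None if it is not a number."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def totals_are_consistent(subtotal, tax, total):
    """True when all three values parse and subtotal + tax equals total."""
    amounts = [parse_amount(x) for x in (subtotal, tax, total)]
    if any(a is None for a in amounts):
        return False
    return amounts[0] + amounts[1] == amounts[2]


print(totals_are_consistent("$6,000", "$480", "$6,480"))       # True
print(totals_are_consistent("$6,000", "$480", "$6,840"))       # False: digits transposed
print(totals_are_consistent("$6,000", "$480", "Net 30 days"))  # False: not an amount
```

`Decimal` is used instead of `float` so that currency arithmetic stays exact.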

## Step 4: Vision Pipeline - Image Classification

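Computer vision capabilities are essential for document analysis. Let's explore image classification on document samples. Note that `google/vit-base-patch16-224` was trained on ImageNet-1k, so it predicts everyday object categories rather than document types; the goal here is to see the vision pipeline API in action, not to classify invoices accurately.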

In [None]:
# Test image classification on real downloaded documents
print("Loading image classification model...")
image_classifier = pipeline(
    "image-classification",
    model="google/vit-base-patch16-224",
    device=0 if torch.cuda.is_available() else -1
)

print("\n" + "="*60)
print("IMAGE CLASSIFICATION RESULTS")
print("="*60)

# Use downloaded images if available, otherwise fallback to URLs
test_images = []

if INVOICE_IMAGES:
    test_images.append(("Downloaded Invoice", INVOICE_IMAGES[0]))
if RECEIPT_IMAGES:
    test_images.append(("Downloaded Receipt", RECEIPT_IMAGES[0]))

# Fallback to online samples if no downloaded images
if not test_images:
    sample_images = {
        "Sample Invoice": "https://raw.githubusercontent.com/naiveHobo/InvoiceNet/master/invoices/1.png",
        "Sample Receipt": "https://raw.githubusercontent.com/Asprise/receipt-ocr/main/receipt-java/sub-receipt.jpg",
    }
    
    for name, url in sample_images.items():
        test_images.append((name, url))

for name, path_or_url in test_images:
    try:
        print(f"\nProcessing: {name}")
        
        if path_or_url.startswith('http'):
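            # Download from URL (fail fast on HTTP errors instead of handing an error page to PIL)
            response = requests.get(path_or_url, timeout=30)
            response.raise_for_status()
            image = Image.open(BytesIO(response.content))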
        else:
            # Load from local file
            image = Image.open(path_or_url)
        
        # Resize image if too large
        if image.size[0] > 500 or image.size[1] > 500:
            image.thumbnail((500, 500), Image.Resampling.LANCZOS)
        
        # Measure inference time
        start_time = time.time()
        results = image_classifier(image, top_k=3)
        inference_time = time.time() - start_time
        
        print(f"Image size: {image.size}")
        print(f"Inference time: {inference_time:.3f} seconds")
        print("\nTop predictions:")
        for i, result in enumerate(results, 1):
            print(f"  {i}. {result['label']}: {result['score']:.2%}")
            
        # Display the image
        from IPython.display import display
        display(image)
        
    except Exception as e:
        print(f"Error processing {name}: {e}")

## Step 5: Document Understanding Preview

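Document question-answering combines vision and language understanding - the core of modern invoice processing. The `impira/layoutlm-document-qa` pipeline first runs OCR on the image (via Tesseract) and then passes the recognized words and their positions on the page to the model.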

In [None]:
# Load document question-answering pipeline
print("Loading document question-answering model...")
print("This model can read and understand documents.")
print("Note: This is a larger model and may take a moment to download...\n")

doc_qa = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
    device=0 if torch.cuda.is_available() else -1
)

# Use downloaded invoice image if available, otherwise fallback
if INVOICE_IMAGES:
    invoice_path = INVOICE_IMAGES[0]
    print(f"Using downloaded invoice: {invoice_path}")
    invoice_image = Image.open(invoice_path)
else:
    # Fallback to sample image
    invoice_url = "https://raw.githubusercontent.com/naiveHobo/InvoiceNet/master/invoices/1.png"
    print(f"Using sample invoice from: {invoice_url}")
    response = requests.get(invoice_url, timeout=30)
    response.raise_for_status()
    invoice_image = Image.open(BytesIO(response.content))

try:
    # Questions to ask about the invoice
    questions = [
        "What is the invoice number?",
        "What is the total amount?",
        "What is the date?",
        "Who is the vendor?",
        "What is the due date?"
    ]
    
    print("\n" + "="*60)
    print("DOCUMENT UNDERSTANDING - INVOICE ANALYSIS")
    print("="*60)
    print("\nDemonstrating multimodal document understanding.")
    print("The model reads and interprets invoice images directly.\n")
    
    for question in questions:
        print(f"\nQuestion: {question}")
        
        start_time = time.time()
        result = doc_qa(image=invoice_image, question=question)
        inference_time = time.time() - start_time
        
        if result:
            answer = result[0] if isinstance(result, list) else result
            print(f"Answer: {answer.get('answer', 'Not found')}")
            if 'score' in answer:
                print(f"Confidence: {answer['score']:.2%}")
            print(f"Time: {inference_time:.2f} seconds")
    
    # Display the invoice
    print("\nInvoice being analyzed:")
    invoice_image.thumbnail((600, 800), Image.Resampling.LANCZOS)
    from IPython.display import display
    display(invoice_image)
    
except Exception as e:
    print(f"Document QA error: {e}")
    print("Common causes: missing OCR dependencies (pytesseract, tesseract-ocr) or an unreadable image.")
    print("We'll explore more robust solutions in upcoming sessions.")
    
    # Still display the image for reference
    print("\nInvoice image for reference:")
    invoice_image.thumbnail((600, 800), Image.Resampling.LANCZOS)
    from IPython.display import display
    display(invoice_image)
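
LayoutLM-style models receive, for every OCR word, a bounding box scaled to a 0-1000 grid so that positions are independent of image resolution. The sketch below shows that normalization step. `normalize_box` is a hypothetical helper, and the words, pixel coordinates, and page size are made-up values rather than output from the invoice above.

```python
def normalize_box(box, image_width, image_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 grid used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / image_width),
        int(1000 * y0 / image_height),
        int(1000 * x1 / image_width),
        int(1000 * y1 / image_height),
    ]


# Hypothetical OCR output: (word, pixel box) pairs from a 1240 x 1754 page scan
ocr_words = [("INVOICE", (100, 80, 320, 120)), ("#2024-1234", (340, 80, 560, 120))]
word_boxes = [(word, normalize_box(box, 1240, 1754)) for word, box in ocr_words]
for word, box in word_boxes:
    print(word, box)
```

The document QA pipeline is documented as accepting precomputed pairs of this shape through a `word_boxes` argument, which would skip its built-in OCR; check that against the `transformers` version you have installed before relying on it.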

## Step 6: Performance Comparison - CPU vs GPU

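Understanding the performance benefits of GPU acceleration is critical for production systems. The benchmark below sends one short sentence at a time through a small model, so it measures per-call latency; larger models and batched inputs generally gain more from the GPU.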

In [None]:
# Compare CPU vs GPU performance
test_text = "This invoice requires immediate payment to avoid late fees."

print("="*60)
print("PERFORMANCE COMPARISON: CPU vs GPU")
print("="*60)

if torch.cuda.is_available():
    # Test on GPU
    gpu_pipeline = pipeline("sentiment-analysis", device=0)
    
    # Warm up
    _ = gpu_pipeline(test_text)
    
    # Measure GPU time
    start = time.time()
    for _ in range(100):
        _ = gpu_pipeline(test_text)
    gpu_time = time.time() - start
    
    # Test on CPU
    cpu_pipeline = pipeline("sentiment-analysis", device=-1)
    
    # Warm up
    _ = cpu_pipeline(test_text)
    
    # Measure CPU time
    start = time.time()
    for _ in range(100):
        _ = cpu_pipeline(test_text)
    cpu_time = time.time() - start
    
    print(f"\nCPU Time (100 iterations): {cpu_time:.2f} seconds")
    print(f"GPU Time (100 iterations): {gpu_time:.2f} seconds")
    print(f"\nGPU Speedup: {cpu_time/gpu_time:.1f}x faster")
    
    # Clean up
    del gpu_pipeline
    del cpu_pipeline
else:
    print("GPU not available for comparison.")
    print("Enable GPU in Runtime > Change runtime type for better performance.")
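
The timing pattern above (warm up, then time a fixed number of calls) can be factored into a reusable helper. This is a sketch, not part of the notebook: `benchmark` is a hypothetical name, it uses `time.perf_counter` rather than `time.time`, and it reports the median so a single slow call does not dominate. A stand-in workload is timed here so the block runs without a model.

```python
import statistics
import time


def benchmark(fn, *args, warmup=3, runs=20):
    """Time fn(*args) and return median and mean latency in milliseconds."""
    for _ in range(warmup):          # warm-up calls are not timed
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000)
    return {"median_ms": statistics.median(samples),
            "mean_ms": statistics.fmean(samples),
            "runs": runs}


# Stand-in workload: sorting a reversed list of 10,000 integers
stats = benchmark(sorted, list(range(10_000, 0, -1)))
print(f"median {stats['median_ms']:.3f} ms over {stats['runs']} runs")
```

In the notebook, `benchmark(gpu_pipeline, test_text)` and `benchmark(cpu_pipeline, test_text)` would give directly comparable numbers.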

## Step 7: Memory Management

Effective memory management is essential when working with large models on limited GPU resources.

In [None]:
import gc

def get_gpu_memory():
    """Get current GPU memory usage"""
    if torch.cuda.is_available():
        return torch.cuda.memory_allocated() / 1e9, torch.cuda.memory_reserved() / 1e9
    return 0, 0

print("="*60)
print("GPU MEMORY MANAGEMENT")
print("="*60)

if torch.cuda.is_available():
    # Check initial memory
    allocated, reserved = get_gpu_memory()
    print(f"\nInitial GPU Memory:")
    print(f"  Allocated: {allocated:.2f} GB")
    print(f"  Reserved: {reserved:.2f} GB")
    
    # Load a model
    print("\nLoading a model...")
    test_pipeline = pipeline("text-generation", model="gpt2", device=0)
    
    allocated, reserved = get_gpu_memory()
    print(f"\nAfter loading GPT-2:")
    print(f"  Allocated: {allocated:.2f} GB")
    print(f"  Reserved: {reserved:.2f} GB")
    
    # Clear memory
    print("\nClearing GPU memory...")
    del test_pipeline
    gc.collect()
    torch.cuda.empty_cache()
    
    allocated, reserved = get_gpu_memory()
    print(f"\nAfter cleanup:")
    print(f"  Allocated: {allocated:.2f} GB")
    print(f"  Reserved: {reserved:.2f} GB")
    
    print("\nMemory Management Best Practices:")
    print("  1. Delete pipelines when done: del pipeline_name")
    print("  2. Clear cache: torch.cuda.empty_cache()")
    print("  3. Use smaller models when possible")
    print("  4. T4 GPU has 16GB - plan model selection accordingly")
else:
    print("GPU not available. Memory management is less critical on CPU.")
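
The delete, collect, empty-cache sequence above is easy to forget when an exception interrupts a cell. One way to make it automatic is to keep the pipeline inside a function and clean up in a `finally` block. `run_and_release` is a hypothetical helper, and the stand-in factory below replaces a real model so the sketch runs anywhere, with or without `torch` installed.

```python
import gc

try:
    import torch  # optional here: the helper degrades gracefully without it
except ImportError:
    torch = None


def run_and_release(factory, work):
    """Build a pipeline with factory(), apply work(pipe), then free it."""
    pipe = factory()
    try:
        return work(pipe)
    finally:
        del pipe        # drop the reference held by this function
        gc.collect()    # collect any reference cycles right away
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached, unused GPU blocks


# Stand-in factory; in the notebook this could be
# lambda: pipeline("text-generation", model="gpt2", device=0)
result = run_and_release(lambda: str.upper, lambda pipe: pipe("invoice paid"))
print(result)
```

Because the pipeline object never leaves the function, no stray notebook variable keeps it alive after the call returns.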

## Key Learnings

### Technical Insights:
1. **Pipeline Architecture**: HuggingFace pipelines abstract complex model operations into simple function calls
2. **Automatic Model Management**: Models are downloaded and cached on first use
3. **GPU Acceleration**: Can speed up inference by up to 10-100x; the actual gain depends on model size and batch size, so measure it as in Step 6
4. **Memory Constraints**: T4 GPU's 16GB requires careful model selection and memory management
5. **Multimodal Capabilities**: Document understanding models combine vision and language processing
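
For the memory constraint in point 4, a rough first estimate is parameter count times bytes per parameter. The sketch below covers the weights only; activations, the CUDA context, and allocator overhead come on top. `weights_gb` is a hypothetical helper, and the figure of about 124 million parameters for GPT-2 small is an approximation.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weights_gb(num_params, dtype="float32"):
    """Approximate size of the model weights alone, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# GPT-2 small: roughly 124 million parameters (approximate figure)
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weights_gb(124_000_000, dtype):.2f} GB")
```

The float32 estimate of about 0.5 GB can be compared with the "Allocated" figure that Step 7 prints after loading GPT-2.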

### Practical Applications:
1. **Text Analysis**: Sentiment analysis and question answering for invoice text processing
2. **Vision Processing**: Image classification as a building block for document type detection (an ImageNet model needs fine-tuning on document classes first)
3. **Document Understanding**: Direct extraction of information from invoice images
4. **Performance Optimization**: GPU usage is critical for production throughput
5. **Resource Management**: Proper cleanup prevents out-of-memory errors

### Next Steps:
1. Session 2: Building a complete invoice extraction agent
2. Session 3: Advanced computer vision for layout understanding
3. Session 4: Deploying with Ollama for production use
4. Session 5: Optimization and scaling strategies

## Additional Resources

### Model Exploration

In [None]:
# List pipeline tasks relevant to document processing, plus model recommendations

print("="*60)
print("AVAILABLE PIPELINE TASKS")
print("="*60)

# Common tasks for document processing
relevant_tasks = [
    "text-classification",
    "token-classification", 
    "question-answering",
    "document-question-answering",
    "image-classification",
    "object-detection",
    "image-to-text",
    "zero-shot-classification"
]

print("\nTasks relevant for invoice processing:")
for task in relevant_tasks:
    print(f"  - {task}")

print("\nModel Resources:")
print("  HuggingFace Model Hub: https://huggingface.co/models")
print("  - 500,000+ available models")
print("  - Filter by task, language, and license")
print("  - Community ratings and usage statistics")
print("  - Detailed model documentation")

print("\nRecommended Models for Invoice Processing:")
print("  - LayoutLM: Document understanding with layout awareness")
print("  - Donut: End-to-end document AI without OCR")
print("  - TrOCR: Transformer-based optical character recognition")
print("  - Table Transformer: Table detection and structure recognition")