# PDF Handling and Citations with AWS Bedrock

This notebook demonstrates how to work with PDFs and extract citations using Claude models on AWS Bedrock.

## Table of Contents
1. [Understanding PDF Handling in Bedrock](#understanding-pdf-handling)
2. [Prerequisites and Setup](#prerequisites)
3. [Basic PDF Processing](#basic-pdf-processing)
4. [Working with Citations](#citations)
5. [Practical Examples with Sample PDF](#examples)
6. [Advanced Techniques](#advanced)

## Prerequisites

- AWS account with Bedrock access
- AWS credentials configured
- Required Python packages installed

In [None]:
# Install required packages
!pip install boto3 PyPDF2

## Understanding PDF Handling in Bedrock <a id="understanding-pdf-handling"></a>

### How PDF Handling Works

Amazon Bedrock's PDF handling capability allows Claude models to directly process PDF documents in their native format. Here's how it works:

1. **Native PDF Processing**: The PDF is sent to the model in its original form, as a base64-encoded string with InvokeModel or as raw bytes with Converse
2. **Full Document Understanding**: Claude can see and understand:
   - Text content with preserved formatting
   - Document structure (headers, paragraphs, lists)
   - Tables and their relationships
   - Images and diagrams within the PDF
   - Page layout and organization

### Benefits Over Text Extraction

Traditional approaches often extract text from PDFs before sending to language models. Native PDF handling offers significant advantages:

#### 1. **Preserved Context and Layout**
- Maintains spatial relationships between elements
- Preserves table structures and formatting
- Retains connection between images and their captions
- Keeps headers, footers, and page numbers in context

#### 2. **Visual Understanding**
- Can interpret charts, graphs, and diagrams
- Understands visual hierarchies and emphasis
- Recognizes formatting cues (bold, italic, font sizes)
- Can reference specific page locations

#### 3. **Accurate Citations**
- Provides exact page numbers for references
- Can quote specific passages with their locations
- Maintains document integrity for compliance needs
- Enables precise source attribution

#### 4. **Reduced Preprocessing**
- No need for complex PDF parsing libraries
- Eliminates text extraction errors
- Handles complex layouts automatically
- Works with scanned documents (with good OCR)

#### 5. **Better Comprehension**
- Understands document flow and structure
- Can navigate between sections effectively
- Recognizes document types (reports, papers, forms)
- Maintains mathematical formulas and special characters

### When to Use PDF Handling

Native PDF handling is ideal for:
- Academic papers and research documents
- Legal documents requiring precise citations
- Technical manuals with diagrams
- Financial reports with complex tables
- Multi-column layouts
- Documents with mixed content types

## Setup and Configuration <a id="prerequisites"></a>

In [2]:
import json
import boto3
import base64
import io
import os
from botocore.config import Config
from botocore.exceptions import ClientError
from PyPDF2 import PdfReader, PdfWriter
from IPython.display import IFrame, display

# Configure the Bedrock client
def setup_bedrock_client(region='us-east-1'):
    """Setup Bedrock client with retry configuration"""
    config = Config(
        region_name=region,
        retries=dict(max_attempts=10, mode='standard')  # bounded retries with backoff
    )
    
    client = boto3.client(
        service_name='bedrock-runtime',
        config=config
    )
    
    return client

# Utility function for displaying metrics
def display_metrics(response_body, description=""):
    """Display usage metrics including prompt caching information"""
    usage = response_body.get('usage', {})
    
    print(f"\n{description} Metrics:")
    print(f"  Input tokens: {usage.get('input_tokens', 0)}")
    print(f"  Output tokens: {usage.get('output_tokens', 0)}")
    print(f"  Total tokens: {usage.get('input_tokens', 0) + usage.get('output_tokens', 0)}")
    
    # Check for prompt caching metrics
    cache_creation_input_tokens = usage.get('cache_creation_input_tokens', 0)
    cache_read_input_tokens = usage.get('cache_read_input_tokens', 0)
    
    if cache_creation_input_tokens > 0:
        print(f"  Cache creation tokens: {cache_creation_input_tokens}")
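    if cache_read_input_tokens > 0:
        print(f"  Cache read tokens: {cache_read_input_tokens}")
        # input_tokens excludes cached tokens, so divide by the full prompt size
        total_prompt_tokens = usage.get('input_tokens', 0) + cache_creation_input_tokens + cache_read_input_tokens
        print(f"  Cache hit rate: {cache_read_input_tokens / total_prompt_tokens * 100:.1f}%")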

# Initialize the client
bedrock_client = setup_bedrock_client()
print("Bedrock client initialized successfully!")

Bedrock client initialized successfully!
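
The cache hit rate reported by `display_metrics` depends on which token counts go into the denominator. The sketch below is one way to compute it, assuming (as in Anthropic's usage reporting) that `input_tokens` counts only uncached tokens, so the full prompt size is the sum of the three input counters. `cache_hit_rate` is a hypothetical helper and the usage numbers are made up for illustration.

```python
def cache_hit_rate(usage):
    """Return the share of prompt tokens served from cache, as a percentage.

    Assumes `input_tokens` excludes cached tokens, so the full prompt size is
    input_tokens + cache_creation_input_tokens + cache_read_input_tokens.
    """
    uncached = usage.get("input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    total = uncached + written + read
    if total == 0:
        return 0.0
    return read / total * 100


# Illustrative usage dicts (made-up numbers, not real API output)
first_call = {"input_tokens": 20, "cache_creation_input_tokens": 5000, "output_tokens": 100}
second_call = {"input_tokens": 20, "cache_read_input_tokens": 5000, "output_tokens": 100}

print(f"First call:  {cache_hit_rate(first_call):.1f}%")
print(f"Second call: {cache_hit_rate(second_call):.1f}%")
```

Guarding against a zero total keeps the helper safe for empty or missing usage data.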


## Quick Start: Basic Invoke and Converse Examples

Before diving into the full implementation, let's start with simple examples of the `invoke_model` and `converse` API calls with a PDF as input.

**Note**: The examples below show the correct format for both APIs. The key difference is in how documents are structured in the request.

In [3]:
# Example 1: Basic Invoke API call with PDF
def basic_invoke_example(pdf_path):
    """
    Demonstrates a simple invoke call with PDF handling and metrics tracking.
    """
    
    # Encode the PDF
    with open(pdf_path, "rb") as pdf_file:
        encoded_pdf = base64.b64encode(pdf_file.read()).decode("utf-8")
    
    # Prepare the request
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": encoded_pdf
                        },
                        "title": "Technical Report",
                        "cache_control": {"type": "ephemeral"}
                    },
                    {
                        "type": "text",
                        "text": "What is this document about? Please provide a brief summary."
                    }
                ]
            }
        ],
        "max_tokens": 256
    }
    
    # Make the invoke call
    try:
        response = bedrock_client.invoke_model(
            modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
            contentType='application/json',
            accept='application/json',
            body=json.dumps(request_body)
        )
        
        # Parse and display the response
        response_body = json.loads(response['body'].read())
        
        print("=== Basic Invoke Example ===")
        print("Model Response:")
        for block in response_body.get('content', []):
            if block.get('type') == 'text':
                print(block.get('text', ''))
        
        # Display metrics with prompt caching information
        display_metrics(response_body, "Basic Invoke")
        
        return response_body
        
    except ClientError as e:
        print(f"Error: {e}")
        return None

# Run the basic invoke example
if os.path.exists("sample_technical_report.pdf"):
    invoke_response = basic_invoke_example("sample_technical_report.pdf")
else:
    print("Please ensure sample_technical_report.pdf is in the current directory")

=== Basic Invoke Example ===
Model Response:
This document is a technical report (TR-2024-001) titled "Advanced Language Models: Performance Analysis and Implementation Guidelines" published by an AI Research Division in June 2025.

**Brief Summary:**

The report presents a comprehensive evaluation of four leading language models (GPT-3.5, Claude 2, Claude 3, and GPT-4) across 50 diverse tasks. The study focuses on performance metrics, implementation best practices, and practical applications for organizations.

**Key Findings:**
- Claude 3 achieved the highest accuracy at 92% for complex reasoning tasks
- Modern language models showed 15% improvement in accuracy over previous generations
- 2.3x increase in processing efficiency when properly optimized
- Token optimization can reduce costs by up to 40%
- Proper prompt engineering increases success rates by 25%
- Hybrid approaches work best for enterprise applications

**Main Content Areas:**
1. **Performance Analysis** - Detailed compa
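
The summary above stops mid-word because the request capped the output at `max_tokens: 256`. Here is a minimal sketch of detecting that case, assuming the response carries the `stop_reason` field (InvokeModel body) or the `stopReason` field (Converse response). `was_truncated` is a hypothetical helper and the sample dicts are hand-built, not real API output.

```python
def was_truncated(response):
    """Return True if the model stopped because it hit the output token limit.

    Accepts either an InvokeModel response body (`stop_reason`) or a
    Converse response (`stopReason`).
    """
    reason = response.get("stop_reason") or response.get("stopReason")
    return reason == "max_tokens"


# Hand-built examples, not real API output
invoke_body = {"stop_reason": "max_tokens", "content": [{"type": "text", "text": "Detailed compa"}]}
converse_resp = {"stopReason": "end_turn", "output": {"message": {"content": [{"text": "Done."}]}}}

for name, resp in [("invoke", invoke_body), ("converse", converse_resp)]:
    if was_truncated(resp):
        print(f"{name}: output was cut off; consider raising max_tokens")
    else:
        print(f"{name}: response completed normally")
```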

In [4]:
# Example 2: Basic Converse API call with PDF
def basic_converse_example(pdf_path):
    """
    Demonstrates a simple converse call with PDF handling and metrics tracking.
    The converse API provides a more conversational interface.
    """
    
    # Read the PDF as raw bytes (not base64 encoded)
    with open(pdf_path, "rb") as pdf_file:
        pdf_bytes = pdf_file.read()
    
    # Prepare the message - note the different format for converse
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "document": {
                        "name": "Technical Report",
                        "format": "pdf",
                        "source": {
                            "bytes": pdf_bytes  # Raw bytes, not base64
                        },
                    }
                },
                {
                    "text": "What is the main topic of this document? Please provide a brief summary."
                },
            ]
        }
    ]
    
    # Make the converse call
    try:
        response = bedrock_client.converse(
            modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
            messages=messages,
            inferenceConfig={
                "maxTokens": 512,
                "temperature": 0.5
            }
        )
        
        print("\n=== Basic Converse Example ===")
        print("Model Response:")
        
        # Extract and display the response
        output = response.get('output', {})
        if 'message' in output:
            content = output['message'].get('content', [])
            for item in content:
                if 'text' in item:
                    print(item['text'])
        
        # Show usage information with prompt caching metrics
        usage = response.get('usage', {})
        print(f"\nConverse API Metrics:")
        print(f"  Input tokens: {usage.get('inputTokens', 0)}")
        print(f"  Output tokens: {usage.get('outputTokens', 0)}")
        print(f"  Total tokens: {usage.get('totalTokens', 0)}")
        
        # Check for prompt caching in converse API (if available)
        if 'cacheCreationInputTokens' in usage:
            print(f"  Cache creation tokens: {usage.get('cacheCreationInputTokens', 0)}")
        if 'cacheReadInputTokens' in usage:
            cache_read = usage.get('cacheReadInputTokens', 0)
            print(f"  Cache read tokens: {cache_read}")
            if cache_read > 0:
                print(f"  Cache hit rate: {cache_read / usage.get('inputTokens', 1) * 100:.1f}%")
        
        return response
        
    except ClientError as e:
        print(f"Error: {e}")
        return None

# Run the basic converse example
if os.path.exists("sample_technical_report.pdf"):
    converse_response = basic_converse_example("sample_technical_report.pdf")
else:
    print("Please ensure sample_technical_report.pdf is in the current directory")


=== Basic Converse Example ===
Model Response:
The main topic of this document is **Advanced Language Models: Performance Analysis and Implementation Guidelines**.

## Brief Summary

This technical report (TR-2024-001) from an AI Research Division provides a comprehensive evaluation of state-of-the-art language models and practical guidance for their implementation. 

**Key aspects covered:**

- **Performance Analysis**: Comparative evaluation of four leading language models (GPT-3.5, Claude 2, Claude 3, and GPT-4) across 50 diverse tasks, measuring accuracy, latency, throughput, and cost efficiency

- **Key Findings**: 
  - Claude 3 achieved the highest accuracy (92%) in complex reasoning tasks
  - Token optimization can reduce costs by up to 40%
  - Proper prompt engineering improves success rates by 25%
  - Overall 15% improvement in accuracy and 2.3x increase in processing efficiency over previous generations

- **Implementation Guidelines**: Best practices for production deployme

## Understanding Invoke vs Converse APIs

Amazon Bedrock provides two main APIs for interacting with Claude models:

### InvokeModel API
- **Purpose**: Direct model invocation with full control over request format
- **Use case**: Single-turn interactions, streaming responses, custom formatting
- **Format**: Uses Anthropic's message format directly
- **Features**: Supports all model features including beta features

### Converse API
- **Purpose**: Simplified conversational interface
- **Use case**: Multi-turn conversations, standardized format across models
- **Format**: Uses AWS's unified message format
- **Features**: Simplified but may not support all beta features immediately

### Key Differences in PDF Handling

1. **Document Structure**:
   - **InvokeModel**: Uses `type: "document"` in content array
   - **Converse**: Uses `document` object with specific structure

2. **PDF Encoding**:
   - **InvokeModel**: Requires base64-encoded PDF data in `source.data`
   - **Converse**: Requires raw PDF bytes in `source.bytes` (NOT base64!)

3. **Response Format**:
   - **InvokeModel**: Returns raw model response
   - **Converse**: Returns structured AWS response format
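
Because the two APIs report usage under different key styles (`input_tokens` in the InvokeModel body, `inputTokens` in the Converse response), downstream code may prefer a single shape. The sketch below shows one way to normalize them. `normalize_usage` is a hypothetical helper and the sample dicts are hand-built.

```python
def normalize_usage(response):
    """Return token usage as {'input': int, 'output': int, 'total': int}.

    Handles the InvokeModel body (snake_case keys) and the Converse
    response (camelCase keys) used elsewhere in this notebook.
    """
    usage = response.get("usage", {})
    if "inputTokens" in usage or "outputTokens" in usage:  # Converse style
        tokens_in = usage.get("inputTokens", 0)
        tokens_out = usage.get("outputTokens", 0)
    else:  # InvokeModel (Anthropic) style
        tokens_in = usage.get("input_tokens", 0)
        tokens_out = usage.get("output_tokens", 0)
    return {"input": tokens_in, "output": tokens_out, "total": tokens_in + tokens_out}


# Hand-built samples mirroring the two response shapes
invoke_sample = {"usage": {"input_tokens": 5635, "output_tokens": 188}}
converse_sample = {"usage": {"inputTokens": 5635, "outputTokens": 188, "totalTokens": 5823}}

print(normalize_usage(invoke_sample))
print(normalize_usage(converse_sample))
```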

Choose based on your needs:
- Use **InvokeModel** for maximum control and latest features
- Use **Converse** for simplified multi-turn conversations

### Important Note on PDF Encoding
A common error when using the Converse API with PDFs is passing base64-encoded data instead of raw bytes. Always remember:
- **InvokeModel**: `encode_pdf()` → base64 string
- **Converse**: `pdf_file.read()` → raw bytes
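
The encoding difference can be seen side by side by building both content blocks from the same bytes. This is a sketch only: `invoke_document_block` and `converse_document_block` are hypothetical helpers that mirror the request shapes used in this notebook, and the placeholder bytes stand in for a real PDF so the code runs without a file.

```python
import base64


def invoke_document_block(pdf_bytes, title):
    """Content block for InvokeModel: the PDF travels as a base64 string."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("utf-8"),
        },
        "title": title,
    }


def converse_document_block(pdf_bytes, name):
    """Content block for Converse: the PDF travels as raw bytes."""
    return {"document": {"name": name, "format": "pdf", "source": {"bytes": pdf_bytes}}}


# Stand-in bytes so the sketch runs without a real file
fake_pdf = b"%PDF-1.4 placeholder"

invoke_block = invoke_document_block(fake_pdf, "Technical Report")
converse_block = converse_document_block(fake_pdf, "Technical Report")

print(type(invoke_block["source"]["data"]).__name__)                 # str
print(type(converse_block["document"]["source"]["bytes"]).__name__)  # bytes
```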

## Basic PDF Processing Functions <a id="basic-pdf-processing"></a>

In [5]:
def encode_pdf(pdf_path):
    """Encode a PDF file to base64"""
    with open(pdf_path, "rb") as pdf_file:
        encoded_pdf = base64.b64encode(pdf_file.read()).decode("utf-8")
    return encoded_pdf

def encode_pdf_from_bytes(pdf_bytes):
    """Encode PDF bytes to base64"""
    return base64.b64encode(pdf_bytes).decode("utf-8")

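def extract_pdf_pages(pdf_path, start_page=0, end_page=None):
    """Extract pages [start_page, end_page) from a PDF (0-indexed, end exclusive)"""
    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    
    total_pages = len(reader.pages)
    # Use `is None` so an explicit end_page of 0 is not mistaken for "no limit"
    end_page = total_pages if end_page is None else end_page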
    
    for i in range(start_page, min(end_page, total_pages)):
        writer.add_page(reader.pages[i])
    
    output_buffer = io.BytesIO()
    writer.write(output_buffer)
    output_buffer.seek(0)
    
    return output_buffer.getvalue()

print("PDF processing functions loaded!")

PDF processing functions loaded!
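
`extract_pdf_pages` treats `start_page` as 0-indexed and `end_page` as exclusive, and clamps the range to the document length. The sketch below isolates just that range logic, with no PDF library involved, so the indexing convention is easy to check. `resolve_page_range` is a hypothetical helper.

```python
def resolve_page_range(total_pages, start_page=0, end_page=None):
    """Return the 0-indexed page indices a (start_page, end_page) pair selects.

    `end_page` is exclusive; None means "through the last page". Values past
    the end of the document are clamped.
    """
    if start_page < 0:
        raise ValueError("start_page must be >= 0")
    stop = total_pages if end_page is None else min(end_page, total_pages)
    return list(range(start_page, stop))


print(resolve_page_range(10, 0, 3))    # first three pages
print(resolve_page_range(10, 8))       # last two pages
print(resolve_page_range(10, 8, 50))   # end clamped to the document length
```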


### Simple PDF Analysis

In [6]:
def analyze_pdf(pdf_path, question, max_tokens=1024):
    """Analyze a PDF with a specific question and track metrics"""
    client = bedrock_client
    encoded_pdf = encode_pdf(pdf_path)
    
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": encoded_pdf
                        },
                        "title": "Technical Report"
                    },
                    {
                        "type": "text",
                        "text": question
                    }
                ]
            }
        ],
        "max_tokens": max_tokens
    }
    
    response = client.invoke_model(
        modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
        contentType='application/json',
        accept='application/json',
        body=json.dumps(request_body)
    )
    
    response_body = json.loads(response['body'].read())
    
    # Display metrics inline
    display_metrics(response_body, "PDF Analysis")
    
    return response_body

## Working with Citations <a id="citations"></a>

In [18]:
def analyze_pdf_with_citations(pdf_path, question, max_tokens=1024):
    """Analyze a PDF and extract citations with metrics tracking"""
    client = bedrock_client
    encoded_pdf = encode_pdf(pdf_path)
    
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": encoded_pdf
                        },
                        "title": "Technical Report",
                        "citations": {
                            "enabled": True
                        }
                    },
                    {
                        "type": "text",
                        "text": question
                    }
                ]
            }
        ],
        "max_tokens": max_tokens
    }
    
    response = client.invoke_model(
        modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
        contentType='application/json',
        accept='application/json',
        body=json.dumps(request_body)
    )
    
    response_body = json.loads(response['body'].read())
    
    # Display metrics
    display_metrics(response_body, "Citations Analysis")
    
    return response_body

def format_citations(content):
    """Extract and format citations from response"""
    citations = []
    for block in content:
        if block.get('citations'):
            for citation in block.get('citations', []):
                citations.append({
                    'text': citation.get('cited_text', ''),
                    'document': citation.get('document_title', ''),
                    'start_page': citation.get('start_page_number', 'N/A'),
                    'end_page': citation.get('end_page_number', 'N/A')
                })
    return citations

def display_response_with_citations(response):
    """Display the response text and citations"""
    if not response:
        return
    
    # Extract main text
    full_text = ""
    for block in response.get('content', []):
        if block.get('type') == 'text':
            full_text += block.get('text', '')
    
    print("Response:")
    print(full_text)
    print("\n" + "="*50 + "\n")
    
    # Extract and display citations
    citations = format_citations(response.get('content', []))
    if citations:
        print("Citations:")
        for i, citation in enumerate(citations, 1):
            print(f"\n[{i}] \"{citation['text']}\"")
            print(f"    Source: {citation['document']}")
            print(f"    Pages: {citation['start_page']}-{citation['end_page']}")
    else:
        print("No citations found.")

print("Citation functions loaded!")

Citation functions loaded!
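
To make the citation fields concrete, the sketch below runs on a hand-built content list in the shape that `format_citations` reads. It assumes page citations are 1-indexed with an exclusive `end_page_number`, so a passage on page 3 alone would carry start 3 and end 4. `page_label` is a hypothetical helper, and the sample block is illustrative rather than real model output.

```python
def page_label(citation):
    """Render a page range, treating end_page_number as exclusive (assumption)."""
    start = citation.get("start_page_number")
    end = citation.get("end_page_number")
    if start is None or end is None:
        return "N/A"
    last = max(start, end - 1)
    return f"p. {start}" if last == start else f"pp. {start}-{last}"


# Hand-built blocks in the shape format_citations() reads; not real model output
sample_content = [
    {"type": "text", "text": "The report compares four models. "},
    {
        "type": "text",
        "text": "Token optimization can reduce costs by up to 40%.",
        "citations": [
            {
                "type": "page_location",
                "cited_text": "(illustrative quoted passage)",
                "document_title": "Technical Report",
                "start_page_number": 3,
                "end_page_number": 4,
            }
        ],
    },
]

labels = []
for block in sample_content:
    for citation in block.get("citations", []):
        labels.append(page_label(citation))
        print(f'{block["text"]} [{citation["document_title"]}, {labels[-1]}]')
```

If the exclusive-end assumption holds for your responses, printing `start_page`-`end_page` directly would overstate each range by one page.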


### Working with Text Documents

In [8]:
def analyze_text_with_citations(text_content, question, title="Document", max_tokens=1024):
    """Analyze text content and extract citations with metrics"""
    client = bedrock_client
    
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "text",
                            "media_type": "text/plain",
                            "data": text_content
                        },
                        "title": title,
                        "citations": {
                            "enabled": True
                        }
                    },
                    {
                        "type": "text",
                        "text": question
                    }
                ]
            }
        ],
        "max_tokens": max_tokens
    }
    
    response = client.invoke_model(
        modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
        contentType='application/json',
        accept='application/json',
        body=json.dumps(request_body)
    )
    
    response_body = json.loads(response['body'].read())
    
    # Display metrics
    display_metrics(response_body, "Text Analysis with Citations")
    
    return response_body
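
Plain-text documents have no pages, so their citations are expected to point at character ranges instead. The sketch below assumes a `char_location` citation with a 0-indexed `start_char_index` and an exclusive `end_char_index`, and checks that the quoted text matches the source. `verify_char_citation` is a hypothetical helper and the citation is hand-built.

```python
def verify_char_citation(source_text, citation):
    """Check that a character-range citation matches the source text.

    Assumes 0-indexed start_char_index and exclusive end_char_index.
    """
    start = citation["start_char_index"]
    end = citation["end_char_index"]
    return source_text[start:end] == citation["cited_text"]


source_text = "Bedrock supports PDFs. Citations point back to the source."
quote = "Citations point back to the source."
start = source_text.index(quote)

# Hand-built citation in the assumed char_location shape
sample_citation = {
    "type": "char_location",
    "cited_text": quote,
    "document_title": "Document",
    "start_char_index": start,
    "end_char_index": start + len(quote),
}

print(verify_char_citation(source_text, sample_citation))
```

With citations of this kind, the page fields read by `format_citations` would fall back to `'N/A'`.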

### Streaming Responses

In [9]:
def stream_pdf_analysis(pdf_path, question, max_tokens=1024):
    """Stream the analysis of a PDF with metrics tracking"""
    client = bedrock_client
    encoded_pdf = encode_pdf(pdf_path)
    
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": encoded_pdf
                        },
                        "title": "Technical Report"
                    },
                    {
                        "type": "text",
                        "text": question
                    }
                ]
            }
        ],
        "max_tokens": max_tokens
    }
    
    try:
        response = client.invoke_model_with_response_stream(
            modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
            contentType='application/json',
            accept='application/json',
            body=json.dumps(request_body)
        )
        
        # Process the stream and collect metrics
        full_response = ""
        metrics = {}
        
        for event in response.get('body'):
            chunk = event.get('chunk')
            if chunk:
                chunk_data = json.loads(chunk.get('bytes').decode())
                
                # Collect text
                if chunk_data.get('type') == 'content_block_delta':
                    delta = chunk_data.get('delta', {})
                    if delta.get('type') == 'text_delta':
                        text = delta.get('text', '')
                        full_response += text
                        print(text, end='', flush=True)
                
                # Collect metrics from message_stop event
                elif chunk_data.get('type') == 'message_stop':
                    if 'amazon-bedrock-invocationMetrics' in chunk_data:
                        bedrock_metrics = chunk_data['amazon-bedrock-invocationMetrics']
                        metrics['inputTokens'] = bedrock_metrics.get('inputTokenCount', 0)
                        metrics['outputTokens'] = bedrock_metrics.get('outputTokenCount', 0)
                        metrics['totalTokens'] = metrics['inputTokens'] + metrics['outputTokens']
                        
                        # Check for cache metrics
                        if 'cacheCreationInputTokenCount' in bedrock_metrics:
                            metrics['cacheCreationTokens'] = bedrock_metrics['cacheCreationInputTokenCount']
                        if 'cacheReadInputTokenCount' in bedrock_metrics:
                            metrics['cacheReadTokens'] = bedrock_metrics['cacheReadInputTokenCount']
        
        # Display metrics after streaming
        print("\n\nStreaming Metrics:")
        print(f"  Input tokens: {metrics.get('inputTokens', 'N/A')}")
        print(f"  Output tokens: {metrics.get('outputTokens', 'N/A')}")
        print(f"  Total tokens: {metrics.get('totalTokens', 'N/A')}")
        
        if 'cacheCreationTokens' in metrics:
            print(f"  Cache creation tokens: {metrics['cacheCreationTokens']}")
        if 'cacheReadTokens' in metrics:
            print(f"  Cache read tokens: {metrics['cacheReadTokens']}")
            if metrics.get('inputTokens', 0) > 0:
                print(f"  Cache hit rate: {metrics['cacheReadTokens'] / metrics['inputTokens'] * 100:.1f}%")
        
        return full_response
    
    except ClientError as e:
        print(f"Error: {e.response['Error']['Code']} - {e.response['Error']['Message']}")
        return None

print("Streaming function loaded!")

Streaming function loaded!
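
The event handling in `stream_pdf_analysis` can be exercised without calling Bedrock by feeding it hand-built events. The sketch below repeats the same parsing steps on a fake stream. `collect_stream` and `make_event` are hypothetical helpers, and the events are simplified (a real stream also carries other event types such as `message_start`).

```python
import json


def collect_stream(events):
    """Accumulate text and Bedrock invocation metrics from stream events."""
    text_parts = []
    metrics = {}
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue
        data = json.loads(chunk["bytes"].decode("utf-8"))
        if data.get("type") == "content_block_delta":
            delta = data.get("delta", {})
            if delta.get("type") == "text_delta":
                text_parts.append(delta.get("text", ""))
        elif data.get("type") == "message_stop":
            metrics = data.get("amazon-bedrock-invocationMetrics", {})
    return "".join(text_parts), metrics


def make_event(payload):
    """Wrap a dict the way the stream delivers it: JSON bytes under chunk.bytes."""
    return {"chunk": {"bytes": json.dumps(payload).encode("utf-8")}}


# Hand-built events, not real API output
fake_events = [
    make_event({"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello, "}}),
    make_event({"type": "content_block_delta", "delta": {"type": "text_delta", "text": "PDF!"}}),
    make_event({"type": "message_stop",
                "amazon-bedrock-invocationMetrics": {"inputTokenCount": 12, "outputTokenCount": 4}}),
]

text, metrics = collect_stream(fake_events)
print(text)
print(metrics)
```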


### Analyzing Specific Pages

In [10]:
def analyze_pdf_section(pdf_path, start_page, end_page, question):
    """Analyze a specific section of a PDF with metrics tracking"""
    
    # Extract specific pages
    pdf_section = extract_pdf_pages(pdf_path, start_page, end_page)
    encoded_section = encode_pdf_from_bytes(pdf_section)
    
    client = bedrock_client
    
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": encoded_section
                        },
                        "title": f"Document (pages {start_page+1}-{end_page})",
                        "citations": {
                            "enabled": True
                        }
                    },
                    {
                        "type": "text",
                        "text": question
                    }
                ]
            }
        ],
        "max_tokens": 1024
    }

    response = client.invoke_model(
        modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
        contentType='application/json',
        accept='application/json',
        body=json.dumps(request_body)
    )
    
    response_body = json.loads(response['body'].read())
    
    # Display metrics
    display_metrics(response_body, f"Section Analysis (pages {start_page+1}-{end_page})")
    
    return response_body
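
Because the model only sees the extracted pages, any page numbers it cites are relative to the excerpt (starting at 1), not to the original file. The sketch below maps them back, assuming cited pages are 1-indexed and `start_page` is the 0-indexed value passed to `analyze_pdf_section`. `to_original_page` is a hypothetical helper.

```python
def to_original_page(cited_page, start_page):
    """Map a 1-indexed page number within an excerpt to the original PDF.

    `start_page` is the 0-indexed first page of the excerpt, as passed to
    analyze_pdf_section().
    """
    if cited_page < 1:
        raise ValueError("cited_page is 1-indexed")
    return start_page + cited_page


# Excerpt built from original pages 5-8 (start_page=4, end_page=8)
print(to_original_page(1, start_page=4))  # first excerpt page
print(to_original_page(4, start_page=4))  # last excerpt page
```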

### Helper Functions

In [11]:
def create_document_qa_system(pdf_path):
    """Create a Q&A system for a document with metrics tracking"""
    
    def ask_question(question):
        response = analyze_pdf_with_citations(pdf_path, question)
        display_response_with_citations(response)
        return response
    
    return ask_question

def analyze_research_paper(pdf_path):
    """Analyze a research paper with predefined questions and cumulative metrics"""
    
    questions = [
        "What is the main research question or hypothesis?",
        "What methodology was used in this research?",
        "What are the key findings or results?",
        "What limitations are mentioned?",
        "What are the future research directions suggested?"
    ]
    
    results = {}
    cumulative_metrics = {
        'total_input_tokens': 0,
        'total_output_tokens': 0,
        'total_cache_creation_tokens': 0,
        'total_cache_read_tokens': 0,
        'calls_with_cache_hits': 0
    }
    
    for i, question in enumerate(questions, 1):
        print(f"\n{'='*60}")
        print(f"Question {i}/{len(questions)}: {question}")
        print(f"{'='*60}\n")
        
        response = analyze_pdf_with_citations(pdf_path, question)
        display_response_with_citations(response)
        
        # Update cumulative metrics
        if response:
            usage = response.get('usage', {})
            cumulative_metrics['total_input_tokens'] += usage.get('input_tokens', 0)
            cumulative_metrics['total_output_tokens'] += usage.get('output_tokens', 0)
            cumulative_metrics['total_cache_creation_tokens'] += usage.get('cache_creation_input_tokens', 0)
            cumulative_metrics['total_cache_read_tokens'] += usage.get('cache_read_input_tokens', 0)
            
            if usage.get('cache_read_input_tokens', 0) > 0:
                cumulative_metrics['calls_with_cache_hits'] += 1
        
        results[question] = response
    
    # Display cumulative metrics
    print("\n" + "="*60)
    print("CUMULATIVE METRICS SUMMARY")
    print("="*60)
    print(f"Total API calls: {len(questions)}")
    print(f"Total input tokens: {cumulative_metrics['total_input_tokens']:,}")
    print(f"Total output tokens: {cumulative_metrics['total_output_tokens']:,}")
    print(f"Total tokens: {cumulative_metrics['total_input_tokens'] + cumulative_metrics['total_output_tokens']:,}")
    
    if cumulative_metrics['total_cache_creation_tokens'] > 0:
        print(f"\nPrompt Caching Statistics:")
        print(f"  Cache creation tokens: {cumulative_metrics['total_cache_creation_tokens']:,}")
        print(f"  Cache read tokens: {cumulative_metrics['total_cache_read_tokens']:,}")
        print(f"  Calls with cache hits: {cumulative_metrics['calls_with_cache_hits']}/{len(questions)}")
        
        if cumulative_metrics['total_cache_read_tokens'] > 0:
            cache_efficiency = (cumulative_metrics['total_cache_read_tokens'] / 
                              cumulative_metrics['total_input_tokens'] * 100)
            print(f"  Overall cache efficiency: {cache_efficiency:.1f}%")
            
            # Calculate token savings
            tokens_saved = cumulative_metrics['total_cache_read_tokens'] * 0.9  # 90% discount on cached tokens
            print(f"  Estimated tokens saved: {int(tokens_saved):,}")
    
    return results
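
The `0.9` factor above treats cache reads as 90% cheaper than regular input. A slightly fuller estimate would also account for cache writes, which may be billed at a premium. The sketch below expresses usage in regular-input-token equivalents. Both multipliers are assumptions: `0.1` matches the 90% discount used above, and `1.25` is a placeholder write premium to replace with the rate for your model and region. The usage numbers are made up.

```python
def billed_input_equivalent(usage_list, read_multiplier=0.1, write_multiplier=1.25):
    """Express input usage across calls in regular-input-token equivalents.

    The multipliers are assumptions: adjust them to your model's pricing.
    """
    total = 0.0
    for usage in usage_list:
        total += usage.get("input_tokens", 0)
        total += usage.get("cache_creation_input_tokens", 0) * write_multiplier
        total += usage.get("cache_read_input_tokens", 0) * read_multiplier
    return total


def uncached_input_equivalent(usage_list):
    """Input tokens that would be billed if nothing were cached."""
    return sum(
        u.get("input_tokens", 0)
        + u.get("cache_creation_input_tokens", 0)
        + u.get("cache_read_input_tokens", 0)
        for u in usage_list
    )


# Made-up usage for five questions against one cached document
calls = [{"input_tokens": 30, "cache_creation_input_tokens": 6000}]
calls += [{"input_tokens": 30, "cache_read_input_tokens": 6000} for _ in range(4)]

with_cache = billed_input_equivalent(calls)
without_cache = uncached_input_equivalent(calls)
print(f"With caching:    {with_cache:,.0f} token-equivalents")
print(f"Without caching: {without_cache:,.0f} tokens")
print(f"Savings:         {(1 - with_cache / without_cache) * 100:.1f}%")
```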

### Example 1: Basic PDF Analysis

Let's start with a simple question about the document:

In [13]:
# Basic analysis without citations
# Assumes pdf_path already points to the sample PDF, e.g. "sample_technical_report.pdf"
print("Analyzing the PDF without citations...")
response = analyze_pdf(pdf_path, "What is the main topic of this document?")

if response:
    for block in response.get('content', []):
        if block.get('type') == 'text':
            print(block.get('text', ''))

Analyzing the PDF without citations...

PDF Analysis Metrics:
  Input tokens: 5635
  Output tokens: 188
  Total tokens: 5823
The main topic of this document is **Advanced Language Models: Performance Analysis and Implementation Guidelines**. 

This technical report provides a comprehensive analysis of state-of-the-art language models, specifically focusing on:

1. **Performance evaluation** - Comparing four leading language models (GPT-3.5, Claude 2, Claude 3, and GPT-4) across 50 diverse tasks
2. **Implementation best practices** - Guidelines for deploying these models in production environments
3. **Practical applications** - How organizations can effectively utilize these technologies

The report presents key findings such as Claude 3's superior performance in complex reasoning tasks (92% accuracy), cost optimization strategies that can reduce expenses by up to 40%, and the benefits of proper prompt engineering. It's aimed at helping organizations understand how to select, implement

### Example 2: Extracting Key Findings with Citations

Now let's use the citation feature to get precise references:

In [19]:
# Analysis with citations
print("Extracting key findings with citations...")
response = analyze_pdf_with_citations(pdf_path, "What are the key findings mentioned in this report?")
display_response_with_citations(response)

Extracting key findings with citations...

Citations Analysis Metrics:
  Input tokens: 6807
  Output tokens: 253
  Total tokens: 7060
Response:
Based on the technical report, the key findings are:

• Claude 3 demonstrates superior performance in complex reasoning tasks with 92% accuracy
• Token optimization strategies can reduce costs by up to 40% without compromising quality
• Hybrid approaches combining multiple models yield best results for enterprise applications
• Proper prompt engineering increases task success rates by an average of 25%

Additionally, the report highlights that modern language models show a 15% improvement in accuracy metrics over previous generations and a 2.3x increase in processing efficiency when properly optimized.

The performance analysis also reveals specific comparative metrics, showing that Claude 3 achieved 92% accuracy with 140ms latency, outperforming other models like GPT-4 (91% accuracy, 150ms latency), Claude 2 (88% accuracy, 135ms latency), and 
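
The notebook's `display_response_with_citations` helper is defined earlier and not repeated here. As a rough illustration of what citation handling can look like, the sketch below assumes the response follows the Anthropic Messages format, where text blocks may carry a `citations` list and PDF citations have type `page_location` with a `start_page_number` field. The function name `format_with_page_refs` and the sample response are hypothetical and hand-written, not real model output.

```python
def format_with_page_refs(response):
    """Join text blocks, appending [p. N] markers where a block carries page citations.

    Assumes the Anthropic Messages response shape: a "content" list of blocks,
    where text blocks may include a "citations" list of page_location entries.
    """
    parts = []
    for block in response.get("content", []):
        if block.get("type") != "text":
            continue
        text = block.get("text", "")
        # Collect the distinct starting pages cited by this block, in order.
        pages = sorted({
            c["start_page_number"]
            for c in (block.get("citations") or [])
            if c.get("type") == "page_location" and "start_page_number" in c
        })
        if pages:
            text += " [p. " + ", ".join(str(p) for p in pages) + "]"
        parts.append(text)
    return "".join(parts)


# Hand-written sample in the assumed response shape (not real model output).
sample = {
    "content": [
        {"type": "text", "text": "Key finding: "},
        {
            "type": "text",
            "text": "Claude 3 reaches 92% accuracy",
            "citations": [
                {
                    "type": "page_location",
                    "cited_text": "Claude 3 | 92",
                    "document_index": 0,
                    "start_page_number": 2,
                    "end_page_number": 3,
                }
            ],
        },
        {"type": "text", "text": "."},
    ]
}

print(format_with_page_refs(sample))
```

Keeping the page markers inline makes it easy to spot-check each claim against the PDF page it came from.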

### Example 3: Understanding Visual Content

PDFs often contain charts and graphs. Let's ask about the visual content:

In [21]:
# Analyzing visual content
print("Analyzing charts and visual data...")
response = analyze_pdf_with_citations(
    pdf_path, 
    "Describe any charts or graphs in the document and what they show. What insights do they provide?"
)
display_response_with_citations(response)

Analyzing charts and visual data...

Citations Analysis Metrics:
  Input tokens: 6817
  Output tokens: 425
  Total tokens: 7242
Response:
The document contains two charts that provide insights into language model performance:

## 1. Model Performance Comparison Chart

This is a bar chart comparing four language models (GPT-3.5, Claude 2, Claude 3, and GPT-4) across two metrics:
- **Accuracy %** (shown in blue bars)
- **Speed Score** (shown in purple bars)

**Key insights:**
- Claude 3 demonstrates the highest accuracy at 92%, followed closely by GPT-4 at 91%
- GPT-3.5 shows the lowest accuracy at 85% but appears to have competitive speed performance
- Claude 2 achieves 88% accuracy with medium performance across other metrics
- The chart visually demonstrates the trade-offs between accuracy and speed across different models

## 2. Token Usage Pattern Chart

This is a line graph showing token usage over a 30-day period, with:
- X-axis: Days of the month (0-30)
- Y-axis: Tokens in thousa

### Example 4: Analyzing Tables

Let's extract and understand table data from the PDF:

In [66]:
# Analyzing table data
print("Extracting and analyzing table information...")
response = analyze_pdf_with_citations(
    pdf_path, 
    "What tables are in this document? Extract the performance metrics table and explain what it shows."
)
display_response_with_citations(response)

Extracting and analyzing table information...
Response:
Based on my review of the document, there is one table present:

**Table 1: Model Performance Metrics** (found on page 2)

Here is the extracted table:

| Model    | Accuracy (%) | Latency (ms) | Tokens/sec | Cost Efficiency |
|----------|--------------|--------------|------------|-----------------|
| GPT-3.5  | 85           | 120          | 2,500      | High           |
| Claude 2 | 88           | 135          | 2,200      | Medium         |
| Claude 3 | 92           | 140          | 2,100      | Medium         |
| GPT-4    | 91           | 150          | 1,800      | Low            |

## What the table shows:

This table presents a comparative analysis of four leading language models across four key performance dimensions:

1. **Accuracy (%)**: Claude 3 demonstrates superior performance with 92% accuracy, followed by GPT-4 (91%), Claude 2 (88%), and GPT-3.5 (85%).

2. **Latency (ms)**: Response time measurements, with GPT-3.5 be

### Example 5: Creating a Document Q&A System

Let's create an interactive Q&A system for our technical report:

In [67]:
# Create a Q&A system for the technical report
qa_system = create_document_qa_system(pdf_path)

# Ask various questions
questions = [
    "What are the implementation guidelines mentioned?",
    "What models are compared in this report?",
    "What are the conclusions about language model capabilities?"
]

for question in questions:
    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print(f"{'='*60}\n")
    qa_system(question)
    print("\n")


Question: What are the implementation guidelines mentioned?

Response:
Based on the technical report, here are the implementation guidelines mentioned for deploying language models in production environments:

## Model Selection
Choose models based on specific use case requirements. Claude 3 excels at complex reasoning, while GPT-3.5 offers optimal speed for simple tasks.

## Prompt Engineering
Invest in developing clear, structured prompts. Include examples and explicit instructions to improve output quality.

## Cost Optimization
Implement token counting and caching strategies. Use smaller models for initial filtering before engaging larger models.

## Quality Assurance
Establish automated testing pipelines with diverse test cases. Monitor model outputs for drift and degradation.

## Security Considerations
Implement proper input sanitization and output filtering. Never expose raw model outputs without validation.

The report emphasizes that with proper implementation strategies, th
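
Since a Q&A loop often repeats questions against the same document, one option is to memoize answers. The sketch below is not the notebook's `create_document_qa_system` (defined earlier); it is a hypothetical variant, `make_cached_qa`, that takes the model call as an injected `ask(pdf_bytes, question)` callable so the caching logic can be shown without any AWS dependency.

```python
import hashlib


def make_cached_qa(pdf_bytes, ask, cache=None):
    """Return a function that answers questions about one PDF, caching repeats.

    `ask(pdf_bytes, question)` is a hypothetical callable that performs the
    actual model request. `cache` may be a shared dict; keys include a hash of
    the document so several documents can share one cache safely.
    """
    doc_hash = hashlib.sha256(pdf_bytes).hexdigest()
    cache = {} if cache is None else cache

    def qa(question):
        # Normalize whitespace and case so trivially different spellings hit the cache.
        key = (doc_hash, question.strip().lower())
        if key not in cache:
            cache[key] = ask(pdf_bytes, question)
        return cache[key]

    return qa
```

Lower-casing the question is a deliberate simplification; drop it if case matters for your prompts.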

### Example 6: Comprehensive Research Paper Analysis

Let's perform a full analysis of the document as if it were a research paper:

In [70]:
# Comprehensive analysis
print("Performing comprehensive document analysis...")
print("This will analyze the document from multiple perspectives...\n")

# Note: This might take a moment as it makes multiple API calls
results = analyze_research_paper(pdf_path)

Performing comprehensive document analysis...
This will analyze the document from multiple perspectives...


Question: What is the main research question or hypothesis?

Response:
Based on the technical report, the main research focus appears to be a comprehensive performance evaluation and practical implementation analysis of advanced language models, rather than testing a specific hypothesis. 

The research aims to address several key questions:

1. **Performance Comparison**: How do leading language models perform across diverse tasks spanning natural language understanding, generation, and reasoning capabilities

2. **Implementation Effectiveness**: How can organizations implement these technologies effectively, with the goal of providing actionable recommendations for organizations looking to implement these technologies effectively

3. **Optimization Strategies**: How various optimization approaches (token optimization, prompt engineering, hybrid approaches) impact performance an

### Example 7: Streaming Response for Better UX

For long analyses, streaming provides immediate feedback:

In [22]:
# Streaming analysis for real-time feedback
print("Streaming analysis of the document...\n")
print("Summary: ", end="")
summary = stream_pdf_analysis(
    pdf_path, 
    "Provide a comprehensive summary of this technical report, including all major sections."
)
print("\n\nStreaming complete!")

Streaming analysis of the document...

Summary: # Technical Report TR-2024-001: Advanced Language Models - Comprehensive Summary

## Overview
This comprehensive technical report from the AI Research Division (published June 2025) analyzes state-of-the-art language models, focusing on performance metrics, implementation strategies, and practical applications for organizations.

## Executive Summary
The report demonstrates that modern language models have achieved remarkable capabilities in natural language understanding and generation. Key improvements include:
- **15% improvement** in accuracy metrics over previous generations
- **2.3x increase** in processing efficiency when properly optimized
- Actionable recommendations for effective organizational implementation

## Key Research Findings

### Performance Highlights
- **Claude 3**: Superior performance in complex reasoning tasks with **92% accuracy**
- **Token optimization**: Can reduce costs by up to **40%** without quality comprom
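
The `stream_pdf_analysis` helper used above is defined earlier in the notebook. For readers who want to see the shape of the streaming loop, here is a minimal sketch. It assumes the event stream yields dicts of the form `{"chunk": {"bytes": ...}}` (as the body of `invoke_model_with_response_stream` does) and that each chunk decodes to an Anthropic streaming event, with text arriving in `content_block_delta` events of delta type `text_delta`. The function names are hypothetical.

```python
import json


def iter_text_deltas(events):
    """Yield text fragments from a stream of Bedrock-style chunk events.

    Assumes each event looks like {"chunk": {"bytes": b"<json>"}} and that the
    JSON payload is an Anthropic streaming event.
    """
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue  # skip non-chunk events in the stream
        payload = json.loads(chunk["bytes"])
        if payload.get("type") != "content_block_delta":
            continue  # message_start, content_block_stop, etc. carry no text
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


def print_stream(events):
    """Print fragments as they arrive and return the full text at the end."""
    pieces = []
    for text in iter_text_deltas(events):
        print(text, end="", flush=True)
        pieces.append(text)
    return "".join(pieces)
```

Separating the parsing generator from the printing loop keeps the parser easy to test with hand-built events.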

### Example 8: Analyzing Specific Pages

For our 3-page document, let's analyze the Implementation Guidelines on page 3:

In [72]:
# Analyze only page 3 (Implementation Guidelines)
print("Analyzing page 3 (Implementation Guidelines)...")
response = analyze_pdf_section(
    pdf_path, 
    2, 3,  # Pages are 0-indexed, so page 3 is index 2
    "Summarize all the implementation guidelines and best practices mentioned on this page."
)
display_response_with_citations(response)

Analyzing page 3 (Implementation Guidelines)...
Response:
Based on the technical report, here are the implementation guidelines and best practices for deploying language models in production environments:

## Model Selection
Choose models based on specific use case requirements. Claude 3 excels at complex reasoning, while GPT-3.5 offers optimal speed for simple tasks.

## Prompt Engineering
Invest in developing clear, structured prompts. Include examples and explicit instructions to improve output quality.

## Cost Optimization
Implement token counting and caching strategies. Use smaller models for initial filtering before engaging larger models.

## Quality Assurance
Establish automated testing pipelines with diverse test cases. Monitor model outputs for drift and degradation.

## Security Considerations
Implement proper input sanitization and output filtering. Never expose raw model outputs without validation.

## Technical Infrastructure
All models were tested using AWS EC2 p3.2xlar
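
The call above passes `2, 3` to select page 3, which is easy to get wrong by one. Assuming `analyze_pdf_section` takes a 0-indexed start and an exclusive end (an inference from that call, since its definition is not shown in this section), a small hypothetical helper can convert human page numbers into those arguments:

```python
def to_zero_based_range(first_page, last_page):
    """Convert 1-based inclusive page numbers to a (start, end) pair.

    The result uses a 0-indexed start and an exclusive end, which is the
    convention assumed here for analyze_pdf_section.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("pages must satisfy 1 <= first_page <= last_page")
    return first_page - 1, last_page


start, end = to_zero_based_range(3, 3)  # page 3 only
print(start, end)  # 2 3
```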

## Advanced Techniques <a id="advanced"></a>

### Comparing PDF Analysis vs Text Extraction

Let's demonstrate what native PDF handling captures that plain text extraction loses:

In [73]:
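# Extract text from PDF for comparison
from PyPDF2 import PdfReader  # harmless if already imported earlier in the notebook

def extract_text_from_pdf(pdf_path):
    """Extract plain text from every page of the PDF, one page after another."""
    reader = PdfReader(pdf_path)
    # Guard against pages that yield no text (e.g. image-only pages)
    return "\n".join((page.extract_text() or "") for page in reader.pages) + "\n"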

# Compare approaches
print("1. Asking about visual content with PDF handling:")
response = analyze_pdf(pdf_path, "What does the performance comparison chart show?")
if response:
    for block in response.get('content', []):
        if block.get('type') == 'text':
            print(block.get('text', ''))

print("\n" + "="*60 + "\n")

print("2. With text extraction, we lose visual information:")
extracted_text = extract_text_from_pdf(pdf_path)
print(f"Extracted text length: {len(extracted_text)} characters")
print("Sample of extracted text:")
print(extracted_text[:500] + "...")
print("\nNote: Charts, formatting, and visual elements are lost!")

1. Asking about visual content with PDF handling:
Based on the performance comparison chart on page 2, it shows two key visualizations:

## Model Performance Comparison (Bar Chart)
This displays performance metrics for four language models:
- **GPT-3.5**: ~85% accuracy, ~80 speed score
- **Claude 2**: ~88% accuracy, ~85 speed score  
- **Claude 3**: ~92% accuracy, ~80 speed score
- **GPT-4**: ~91% accuracy, ~75 speed score

The chart uses two metrics represented by different colored bars:
- **Accuracy %** (shown in blue/teal)
- **Speed Score** (shown in purple/magenta)

## Token Usage Pattern (Line Chart)
This shows token usage over a 30-day period, displaying:
- Usage starting around 1,000 tokens (thousands) at the beginning of the month
- Peak usage of approximately 1,500 tokens around day 5-10
- A general decline through the middle of the month to around 500 tokens
- A slight uptick toward the end of the month back to around 750 tokens

## Key Insights
- **Claude 3** achieved the hi
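
To make the difference concrete at the request level, the sketch below builds the message content for both approaches. It assumes the Anthropic Messages content-block format (a `document` block with a base64 `source` for native PDF handling, versus a plain `text` block for extracted text). The notebook's own `analyze_pdf` is defined earlier, and the function names here are hypothetical.

```python
import base64


def build_pdf_content(pdf_bytes, question, enable_citations=False):
    """Content blocks for native PDF handling: the raw file plus the question."""
    document = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("ascii"),
        },
    }
    if enable_citations:
        document["citations"] = {"enabled": True}
    return [document, {"type": "text", "text": question}]


def build_text_content(extracted_text, question):
    """Content blocks for the text-extraction approach: only characters survive."""
    prompt = f"<document>\n{extracted_text}\n</document>\n\n{question}"
    return [{"type": "text", "text": prompt}]
```

In the first form the model receives the file itself, so charts and layout remain available. In the second it only sees whatever the extractor managed to recover.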

## Best Practices Summary

### PDF Handling Best Practices

1. **File Size Management**
   - Large PDFs (>10MB) may need compression
   - Extract relevant pages when possible
   - Consider splitting very large documents

2. **Citation Usage**
   - Enable citations for compliance and verification needs
   - Use citations when exact references are required
   - Citations slightly increase input tokens; the effect on response time is usually small, but measure it for your workload

3. **Error Handling**
   - Always implement retry logic for throttling
   - Validate PDF encoding before sending
   - Handle malformed PDFs gracefully

4. **Performance Optimization**
   - Use streaming for better user experience
   - Cache responses for frequently accessed documents
   - Consider prompt caching for repeated analyses

5. **Security Considerations**
   - Validate PDF sources before processing
   - Be aware of sensitive data in documents
   - Implement appropriate access controls
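
Two of the practices above, validating the PDF before sending and retrying on throttling, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the 10MB ceiling mirrors the rule of thumb in the list and is not an official service limit, the header check is deliberately strict, and the names `validate_and_encode_pdf` and `call_with_retry` are hypothetical.

```python
import base64
import random
import time

# Mirrors the ">10MB" rule of thumb above; not an official Bedrock limit.
MAX_PDF_BYTES = 10 * 1024 * 1024


def validate_and_encode_pdf(pdf_bytes, max_bytes=MAX_PDF_BYTES):
    """Check basic PDF sanity, then return the base64 string to send."""
    # Strict check: some real-world PDFs have leading junk before the header.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("file does not start with the %PDF- header")
    if len(pdf_bytes) > max_bytes:
        raise ValueError(f"PDF is {len(pdf_bytes)} bytes; limit is {max_bytes}")
    return base64.b64encode(pdf_bytes).decode("ascii")


def call_with_retry(func, is_throttled, max_attempts=5, base_delay=1.0,
                    sleep=time.sleep):
    """Call func(), retrying with exponential backoff plus jitter on throttling.

    `is_throttled(exc)` decides whether an exception is worth retrying;
    any other exception, or the final failed attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts or not is_throttled(exc):
                raise
            # 1x, 2x, 4x, ... the base delay, plus random jitter
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            sleep(delay)
```

With boto3, `is_throttled` could check for a `botocore.exceptions.ClientError` whose `response["Error"]["Code"]` is `"ThrottlingException"`. The `sleep` parameter is injectable so the backoff can be tested without waiting.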

## Conclusion

AWS Bedrock's native PDF handling with Claude models provides powerful capabilities for document analysis:

- **Direct PDF processing** preserves layout, tables, and images that text extraction discards
- **Citation extraction** enables precise source attribution
- **Visual understanding** allows analysis of charts and diagrams
- **Streaming support** provides real-time feedback

This notebook has demonstrated how to leverage these features for various use cases, from simple Q&A to comprehensive document analysis.

## Next Steps

1. Try the examples with your own PDFs
2. Build specialized analyzers for your document types
3. Integrate with other AWS services for complete workflows
4. Experiment with different prompt strategies for better results

Happy PDF processing! 🚀