# 🔍 PaddleOCR-VL Demo on AMD GPU

This notebook demonstrates how to run the **PaddleOCR-VL** (Vision-Language OCR) model on an AMD GPU with the vLLM backend.

## Features
- **Layout Detection**: Detect document structure (text, images, tables, titles)
- **VL Recognition**: Vision-Language model for accurate text recognition
- **vLLM Backend**: High-performance inference on AMD GPU


## 1. Check vLLM Server Status

First, let's verify that the vLLM server is running and ready.


In [None]:
import requests
import time
from IPython.display import clear_output

def wait_for_vllm(url="http://localhost:8118/v1/models", timeout=300, interval=5):
    """
    Wait for vLLM server to be ready.
    
    Args:
        url: vLLM models endpoint
        timeout: Maximum wait time in seconds (default: 5 minutes)
        interval: Check interval in seconds
    
    Returns:
        True if server is ready, False if timeout
    """
    start_time = time.time()
    attempt = 0
    
    print("🔄 Waiting for vLLM server to be ready...")
    print("   (This may take 1-2 minutes as the model loads)")
    print()
    
    while time.time() - start_time < timeout:
        attempt += 1
        elapsed = int(time.time() - start_time)
        
        try:
            response = requests.get(url, timeout=5)
            if response.status_code == 200:
                models = response.json()
                clear_output(wait=True)
                print("✅ vLLM Server is ready!")
                print(f"   Time elapsed: {elapsed} seconds")
                print(f"   Available models: {[m['id'] for m in models['data']]}")
                return True
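        except (requests.RequestException, ValueError, KeyError):
            # Server not reachable yet, or the response is not the expected JSON; retry
            pass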
        
        # Show progress
        clear_output(wait=True)
        spinner = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"][attempt % 10]
        print(f"{spinner} Waiting for vLLM server... ({elapsed}s elapsed)")
        print(f"   Attempt {attempt}, checking every {interval}s")
        print(f"   Timeout: {timeout}s")
        
        time.sleep(interval)
    
    clear_output(wait=True)
    print(f"❌ Timeout after {timeout} seconds")
    print("   Please check vLLM server logs: /var/log/vllm_server.log")
    return False

# Wait for vLLM server
wait_for_vllm()
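
The readiness check above pulls model IDs out of the JSON body returned by `/v1/models`. As a minimal sketch, that parsing step can be isolated into a small helper that tolerates a missing or malformed `data` field. The name `extract_model_ids` and the sample payload are illustrative assumptions, not output captured from a real vLLM server.

```python
def extract_model_ids(payload):
    """Return the model IDs from a /v1/models-style response body.

    Entries without an "id" key, and non-dict payloads, are skipped
    instead of raising.
    """
    if not isinstance(payload, dict):
        return []
    entries = payload.get("data") or []
    return [m["id"] for m in entries if isinstance(m, dict) and "id" in m]


# Hypothetical payload shaped like the response the cell above expects.
sample = {"object": "list", "data": [{"id": "example-model", "object": "model"}]}
print(extract_model_ids(sample))
```

Keeping the parsing separate from the polling loop means an unexpected response body results in an empty list rather than an exception inside the retry logic.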


## 2. Load PaddleOCR-VL Pipeline

Load the PaddleOCR-VL pipeline using the vLLM backend configuration file.


In [None]:
import os
os.chdir("/opt/PaddleX")

# Set environment variable to skip model source check
os.environ["PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK"] = "True"

from paddlex import create_pipeline

# Create the PaddleOCR-VL pipeline with vLLM backend
pipeline = create_pipeline(pipeline="PaddleOCR-VL-vllm.yaml")
print("✅ Pipeline loaded successfully!")
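
The cell above relies on `os.chdir` so that the relative name `PaddleOCR-VL-vllm.yaml` resolves. A minimal sketch of a pre-flight check that reports a missing config file with a clear message before the pipeline is created; `find_config` is a hypothetical helper, not part of PaddleX.

```python
from pathlib import Path


def find_config(name, base_dir="."):
    """Return the absolute path of a config file, or raise if it is missing."""
    path = (Path(base_dir) / name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Pipeline config not found: {path}")
    return path


# Example call using the paths from the cell above; wrapped in try/except
# so the sketch also runs on machines where /opt/PaddleX does not exist.
try:
    config_path = find_config("PaddleOCR-VL-vllm.yaml", "/opt/PaddleX")
    print(f"Config found: {config_path}")
except FileNotFoundError as err:
    print(err)
```

The returned path could then be passed as `create_pipeline(pipeline=str(config_path))`, assuming `create_pipeline` treats an absolute path the same way it treats the relative one used above.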


## 3. Display Test Image

Let's view the demo image before running OCR.


In [None]:
from IPython.display import Image, display

# Display the test image
test_image = "/opt/PaddleX/test/paddleocr_vl_demo.png"
print(f"Test image: {test_image}")
display(Image(filename=test_image, width=600))


## 4. Run OCR Inference

Execute the PaddleOCR-VL pipeline on the test image.


In [None]:
%%time

# Run inference
result = pipeline.predict(test_image)

print("✅ Inference completed!")
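
PaddleX pipelines commonly return results lazily from `predict`. If that holds here, the `%%time` figure for the cell above would not include the actual inference, and `result` could be iterated only once. A minimal sketch under that assumption: drain the results into a list inside the timed region. `timed_collect` and `fake_predict` are hypothetical names used only for illustration.

```python
import time


def timed_collect(make_results):
    """Call make_results(), drain the iterable it returns, and time the whole step."""
    start = time.perf_counter()
    results = list(make_results())
    elapsed = time.perf_counter() - start
    return results, elapsed


# Stand-in for a lazy predict call. In this notebook the argument would be
# `lambda: pipeline.predict(test_image)` instead.
def fake_predict():
    for page in ("page-1", "page-2"):
        yield page


results, elapsed = timed_collect(fake_predict)
print(f"{len(results)} result(s) in {elapsed:.3f}s")
```

Because `results` is a plain list, it can be looped over as many times as needed in later cells.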


## 5. View Results

Display the extracted text content from the document.


In [None]:
# Process and display results
for res in result:
    # Get parsing results - res is a PaddleOCRVLResult (dict-like)
    parsing_list = res.get('parsing_res_list', [])
    
    print("=" * 60)
    print("📄 EXTRACTED CONTENT")
    print("=" * 60)
    
    for i, block in enumerate(parsing_list):
        # block is a PaddleOCRVLBlock object, access attributes directly
        label = getattr(block, 'label', 'unknown')
        content = getattr(block, 'content', '')
        
        if content:  # Only show blocks with content
            label_emoji = {
                'doc_title': '📌',
                'paragraph_title': '📍',
                'text': '📝',
                'image': '🖼️',
                'table': '📊',
                'vision_footnote': '📎'
            }.get(label, '•')
            
            print(f"\n{label_emoji} [{label.upper()}]")
            print("-" * 40)
            print(content[:500] + ('...' if len(content) > 500 else ''))
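
Before reading the full output, it can help to see how many blocks of each type were detected. A minimal sketch that counts non-empty blocks per label, relying only on the `label` and `content` attributes used in the cell above. `summarize_blocks` is a hypothetical helper, and the demo blocks are stand-ins rather than real pipeline output.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_blocks(blocks):
    """Count blocks per label, skipping blocks with empty content."""
    return Counter(
        getattr(block, "label", "unknown")
        for block in blocks
        if getattr(block, "content", "")
    )


# Hypothetical stand-ins for the parsed blocks of one page.
demo_blocks = [
    SimpleNamespace(label="doc_title", content="Sample title"),
    SimpleNamespace(label="text", content="First paragraph."),
    SimpleNamespace(label="text", content="Second paragraph."),
    SimpleNamespace(label="image", content=""),
]

for label, count in summarize_blocks(demo_blocks).most_common():
    print(f"{label}: {count}")
```

In this notebook the function would be called with `parsing_list` inside the `for res in result:` loop.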


---

## 🎉 Congratulations!

You have successfully run PaddleOCR-VL on an AMD GPU with the vLLM backend.

### Resources
- [PaddleX Documentation](https://paddlepaddle.github.io/PaddleX/)
- [PaddleOCR-VL Tutorial](https://github.com/PaddlePaddle/PaddleX)
- [AMD ROCm](https://rocm.docs.amd.com/)
