# 🤖 OCR and Text Extraction Workflow

This notebook demonstrates a multi-agent system using Upsonic:

1. **OCR Agent**: Reads text from an ID card image
2. **Extractor Agent**: Extracts specific information (ID Number) from OCR output

**Flow**: Image → OCR → Text Extraction

## 📦 Installation

First, install the required packages:

In [None]:
# Install direct dependencies with exact versions
# (transitive dependencies come automatically)
!pip install -q upsonic==0.71.5 python-dotenv==1.2.1

## 🔑 API Key Setup

Enter your OpenAI API key (required for the agents):

In [None]:
import os
from getpass import getpass

# Set your OpenAI API key
openai_api_key = getpass('Enter your OpenAI API key: ')
os.environ['OPENAI_API_KEY'] = openai_api_key

print("✅ API key configured")
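
Re-running the cell above prompts for the key every time. A minimal sketch of a helper that reuses a key already present in the environment and prompts only when it is missing follows; `ensure_api_key` and its `prompt` parameter are hypothetical names introduced here for illustration, not part of Upsonic.

```python
import os
from getpass import getpass


def ensure_api_key(var_name="OPENAI_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting only if it is missing.

    Hypothetical helper: `prompt` is injectable so the function can be
    exercised without interactive input.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        key = prompt(f"Enter your {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} must not be empty")
        # Store the key so later cells and libraries can read it
        os.environ[var_name] = key
    return key


# Usage in the notebook:
# ensure_api_key()
```

Rejecting an empty value here surfaces a missing key at setup time rather than at the first model call.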

## 📤 Upload Image

Upload an ID card image to analyze:

In [None]:
from google.colab import files
from IPython.display import Image, display

# Upload your own image
uploaded = files.upload()

# Get the uploaded filename
if uploaded:
    image_path = list(uploaded.keys())[0]
    print(f"✅ Image uploaded: {image_path}")

    # Display the image
    display(Image(filename=image_path, width=400))
else:
    print("⚠️ No image uploaded. Please run this cell again and select an image.")
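
The upload cell depends on `google.colab`, which is only available in Colab. When running elsewhere, `image_path` can be set by hand; the sketch below checks such a path before it is handed to a task. `validate_image_path` and the `SUPPORTED_SUFFIXES` set are assumptions made for illustration, not limits documented by Upsonic or OpenAI.

```python
from pathlib import Path

# Assumption: a conservative set of common image formats, not an Upsonic limit
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def validate_image_path(path):
    """Return the path as a string if it points to an existing image file.

    Hypothetical helper for running the notebook outside Colab.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No image file at {p}")
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported image type: {p.suffix or '(none)'}")
    return str(p)


# Usage outside Colab (the file name is a placeholder):
# image_path = validate_image_path("id_card.png")
```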

## 🛠️ Import Libraries

In [None]:
from upsonic import Agent, Task

## 🤖 Define Agents

Create two specialized agents:
- **OCR Agent**: Expert at reading text from images
- **Extractor Agent**: Expert at finding specific information in text

In [None]:
# OCR Agent: Reads text from images
ocr_agent = Agent(
    name="OCR Agent",
    role="Text Recognition Specialist",
    goal="Extract all visible text from images accurately",
    instructions="""You are an expert OCR (Optical Character Recognition) specialist.
    Your job is to carefully read images and extract all visible text with high accuracy.
    You pay attention to every detail and return complete text content.""",
    model="openai/gpt-4o"
)

print("✅ OCR Agent created")

# Extractor Agent: Finds specific information
extractor_agent = Agent(
    name="Extractor Agent",
    role="Information Extraction Specialist",
    goal="Find and extract specific information from text",
    instructions="""You are a data extraction expert. You excel at finding specific
    information in text documents. You are precise and always return exactly
    what is requested, nothing more, nothing less.""",
    model="openai/gpt-4o"
)

print("✅ Extractor Agent created")

## 📋 Define Tasks

Create tasks for each agent:
1. **OCR Task**: Read all text from the image
2. **Extraction Task**: Find the ID Number from OCR output

In [None]:
# Task 1: OCR - Read text from image
ocr_task = Task(
    description="""Read the ID card image and extract ALL visible text.
    
    Return all text you can see in the image, maintaining the original structure.
    Include all fields and their values.""",
    context=[image_path]  # Image file path in context
)

print("✅ OCR Task created")

# Task 2: Extract ID Number
extraction_task = Task(
    description="""From the OCR text provided in the context, find and extract
    ONLY the ID Number value.
    
    Look for a field labeled 'ID Number' and return only the number itself.
    If you find it, return just the number. If not found, return 'NOT_FOUND'.""",
    context=[ocr_task]  # This task depends on ocr_task output
)

print("✅ Extraction Task created")

## ▶️ Execute Workflow

Run the multi-agent workflow:

In [None]:
print("=" * 70)
print("OCR AND TEXT EXTRACTION WORKFLOW")
print("=" * 70)
print()

# ============================================================================
# STEP 1: OCR - Reading image
# ============================================================================
print("\n" + "=" * 70)
print("STEP 1: OCR - Reading image")
print("=" * 70)
ocr_result = ocr_agent.print_do(ocr_task)
print("\n📄 OCR Output:")
print(ocr_result)
print(f"\n📊 Characters read: {len(str(ocr_result))}")

# ============================================================================
# STEP 2: EXTRACTION - Finding ID Number
# ============================================================================
print("\n" + "=" * 70)
print("STEP 2: EXTRACTION - Finding ID Number")
print("=" * 70)
extraction_result = extractor_agent.print_do(extraction_task)
print(f"\n🔍 Extracted ID Number: {extraction_result}")

# ============================================================================
# FINAL SUMMARY
# ============================================================================
print("\n" + "=" * 70)
print("WORKFLOW COMPLETE")
print("=" * 70)
print(f"\n✅ OCR Agent read {len(str(ocr_result))} characters")
print(f"✅ Extractor Agent extracted: {extraction_result}")
print()
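
The extraction task asks the model to return either the bare number or the sentinel `'NOT_FOUND'`, but model output can still carry stray whitespace or quotes. A small post-processing sketch, assuming the result can be converted with `str()` as the cell above already does; `normalize_id_number` is a hypothetical helper and makes no assumption about the ID format itself.

```python
def normalize_id_number(raw):
    """Return a cleaned ID string, or None when nothing was found.

    Hypothetical helper: trims whitespace and surrounding quotes or
    backticks, and maps the 'NOT_FOUND' sentinel to None.
    """
    if raw is None:
        return None
    text = str(raw).strip().strip("'\"`").strip()
    if not text or text.upper() == "NOT_FOUND":
        return None
    return text


# Usage in the notebook:
# id_number = normalize_id_number(extraction_result)
# if id_number is None:
#     print("ID Number not found in the OCR text")
```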

## 📊 Results

The workflow demonstrates:
- **Agent collaboration**: OCR Agent output becomes input for Extractor Agent
- **Task chaining**: Tasks can depend on each other via `context`
- **Specialized roles**: Each agent focuses on what it does best

---

### 🔄 Try It Yourself

1. Upload a different ID card image
2. Modify the extraction task to find different fields (Name, Date of Birth, etc.)
3. Add more agents to the workflow
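
For the second suggestion, one possible approach is to generate the task description from a field label instead of hard-coding "ID Number". The sketch below mirrors the wording of the extraction task in this notebook; `build_extraction_description` is a hypothetical helper, and the commented usage reuses `Task`, `ocr_task`, and `extractor_agent` as defined in the cells above.

```python
def build_extraction_description(field_label):
    """Build an extraction prompt for one labeled field (hypothetical helper)."""
    return (
        "From the OCR text provided in the context, find and extract\n"
        f"ONLY the {field_label} value.\n\n"
        f"Look for a field labeled '{field_label}' and return only the value itself.\n"
        "If you find it, return just the value. If not found, return 'NOT_FOUND'."
    )


# Usage in the notebook:
# dob_task = Task(
#     description=build_extraction_description("Date of Birth"),
#     context=[ocr_task],
# )
# dob_result = extractor_agent.print_do(dob_task)
```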

### 📚 Learn More

- [Upsonic Documentation](https://docs.upsonic.co)
- [GitHub Repository](https://github.com/Upsonic/Upsonic)